Re: Lucene scoring components

2018-07-17 Thread Adrien Grand
You could extend this class and provide your own implementation to incorporate term frequency into the final score. For the record, you might want to look into BM25Similarity, which takes term frequency into account, but in a way that gives a much lower score contribution to hits than

Re: Lucene scoring overall score

2018-07-17 Thread Adrien Grand
You could use IndexSearcher#explain, which tells you how the score of a document is computed. On Tue, Jul 17, 2018 at 19:06, wrote: > Hi,- > > how can i check the contributions from different fields indexed in the > hits doc's score? > > Best regards > > >

Lucene scoring overall score

2018-07-17 Thread baris . kazar
Hi, how can I check the contributions of the different indexed fields to a hit document's score? Best regards

Re: Lucene scoring components

2018-07-17 Thread baris . kazar
I forgot to include the doc that I was referring to: https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html Best regards On 7/17/18 1:01 PM, baris.ka...@oracle.com wrote: Hi,- is there a way to diminish the tf(t in d) component to 1? i dont want

Lucene scoring components

2018-07-17 Thread baris . kazar
Hi, is there a way to reduce the tf(t in d) component to 1? I don't want the number of times a word appears to affect the scoring for my app. Best regards
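To picture what the question above asks for, the sketch below compares a square-root term-frequency factor (the damping used by Lucene's classic TF-IDF similarity) with a binary factor that returns 1 for any match. It is plain Java with no Lucene dependency; `TfFactors`, `classicTf`, and `binaryTf` are illustrative names, not Lucene classes. In Lucene itself the same change would presumably go into the `tf` method of a `TFIDFSimilarity` subclass, which is an assumption based on the reply in this thread rather than something the snippet spells out.

```java
/** Illustrative only: these helpers are not part of Lucene. */
class TfFactors {

    /** Square-root damping: more occurrences raise the factor, but sub-linearly. */
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Binary variant: any positive frequency contributes exactly 1. */
    static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // Print both factors side by side for a few sample frequencies.
        for (int freq : new int[] {1, 4, 9}) {
            System.out.println(freq + ": classic=" + classicTf(freq)
                    + ", binary=" + binaryTf(freq));
        }
    }
}
```

With the binary variant, a document mentioning a term nine times gets the same term-frequency factor as one mentioning it once, so the remaining factors (such as idf and length normalisation) decide the ranking.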

Lucene Scoring

2018-07-15 Thread Baris Kazar
modified but the order of results is pretty much the same. What happens is that when part of the search string is found in both fields, those entries are returned first, since Lucene scoring treats the number of occurrences as dominant. But I want the search string to be fully-matched

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Mikhail Khludnev
fwiw https://issues.apache.org/jira/browse/LUCENE-5867 is going to be released soon. On Mon, Jan 9, 2017 at 2:17 PM, Rajnish kamboj wrote: > My application does not require scoring/ranking. All data is equally > important for me. > > Search query can return any

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Rajnish kamboj
Thanks for the quick responses. I will try the approach. Does bypassing scoring also increase search performance? Regards Rajnish On Monday, January 9, 2017, Ian Lea wrote: > oal.search.ConstantScoreQuery? > > "A query that wraps another query and simply returns a constant

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Ian Lea
oal.search.ConstantScoreQuery? "A query that wraps another query and simply returns a constant score equal to the query boost for every document that matches the query. It therefore simply strips of all scores and returns a constant one." -- Ian. On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Michael McCandless
Just wrap your Query in a ConstantScoreQuery. Lucene will optimize the query execution to not read term frequencies from disk, not compute scores, etc. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 9, 2017 at 6:17 AM, Rajnish kamboj wrote: > My

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Taher Galal
Hi, What about writing your own scoring that just gives a value of 1 to all the documents that are hits? On Mon, Jan 9, 2017 at 12:17 PM, Rajnish kamboj wrote: > My application does not require scoring/ranking. All data is equally > important for me. > > Search query

Disabling Lucene Scoring/Ranking

2017-01-09 Thread Rajnish kamboj
My application does not require scoring/ranking. All data is equally important for me. A search query can return any documents matching the search criteria. So, is there a way to disable scoring/ranking altogether? Or is there a better solution? Regards Rajnish

Changing the lucene scoring function

2015-11-21 Thread Victor Makarenkov
Hi everybody! I would appreciate it if you could refer me to some example or explanation of how to change the scoring function of Lucene. I would expect 2 options: 1. changing some configuration, so the ranking function becomes, say, Okapi BM25 instead of the standard similarity 2. Is there any

Re: Changing the lucene scoring function

2015-11-21 Thread Doug Turnbull
Hi Victor You want to look at setting a similarity other than TF IDF. For example here's BM25 Similarity https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/BM25Similarity.html And the "setSimilarity" method on IndexSearcher

Lucene Scoring in Exact and Phrase Matching

2015-11-18 Thread JayJones11
ined ? [1] https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html#explain

Lucene scoring

2013-03-12 Thread lucas van overberghe
not be a document where his name pops up a few times but instead be the contact details of Peter, where his name might pop up only once. How would we go about implementing this? Is it necessary to change the Lucene scoring algorithm or is there a better/easier way? Thanks and kind regards, Lucas Van Overberghe

Re: Lucene scoring

2013-03-12 Thread Ian Lea
on Peter. The first result should therefore not be a document where his name pops up a few times but instead be the contact details of Peter, where his name might pop up only once. How would we go about implementing this? Is it necessary to change the Lucene scoring algorithm or is there a better

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-23 Thread Yuval Kesten
to normal... So cool! -Original Message- From: Yuval Kesten [mailto:ykes...@yahoo-inc.com] Sent: Wednesday, February 22, 2012 7:29 PM To: java-user@lucene.apache.org Subject: RE: Custom lucene scoring - Dot product between field boost and query boost Hi all, Inspired by another thread here

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Em
! -Original Message- From: Em [mailto:mailformailingli...@yahoo.de] Sent: Tuesday, February 21, 2012 6:07 PM To: java-user@lucene.apache.org Subject: Re: Custom lucene scoring - Dot product between field boost and query boost Hi Yuval, 1. Performances: I am calculating all the TF/IDF

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Alan Woodward
before doing the indexing and obviously before the searching. Thanks! -Original Message- From: Em [mailto:mailformailingli...@yahoo.de] Sent: Tuesday, February 21, 2012 6:07 PM To: java-user@lucene.apache.org Subject: Re: Custom lucene scoring - Dot product between field boost

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Yuval Kesten
has better ideas - please share! -Original Message- From: Alan Woodward [mailto:alan.woodw...@romseysoftware.co.uk] Sent: Wednesday, February 22, 2012 4:00 PM To: java-user@lucene.apache.org Subject: Re: Custom lucene scoring - Dot product between field boost and query boost Hi Yuval

Custom lucene scoring - Dot product between field boost and query boost

2012-02-21 Thread Yuval Kesten
Hi, I want to use Lucene with the following scoring logic: When I index my documents I want to set for each field a score/weight. When I query my index I want to set for each query term a score/weight. I will NEVER index or query with many instances of the same field - In each query (document)

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-21 Thread Yuval Kesten
The same question is formatted nicer here: http://stackoverflow.com/questions/9380188/custom-lucene-scoring-dot-product-between-field-boost-and-query-boost Thanks! -Original Message- From: Yuval Kesten [mailto:ykes...@yahoo-inc.com] Sent: Tuesday, February 21, 2012 5:18 PM To: java-user

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-21 Thread Em
Hi Yuval, 1. Performances: I am calculating all the TF/IDF stuff and NORMS for nothing... You aren't calculating that much, since you declared all those values as constants. What are you worried about? 2. The score I get from the TopScoreDocCollector is not the same as I get from the

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-21 Thread Yuval Kesten
@lucene.apache.org Subject: Re: Custom lucene scoring - Dot product between field boost and query boost Hi Yuval, 1. Performances: I am calculating all the TF/IDF stuff and NORMS for nothing... You aren't calculating that much, since you declared all those values as constants. What are you

Lucene scoring and random result order

2011-08-25 Thread Yanick Gamelin
Hi all, I have the following problem with Lucene not being deterministic. I use a MultiSearcher to process a search, and when I get hits with the same score, those are returned in a random order. I wouldn't care much about the order of the hits with the same score if I could get them all, so I could

RE: Lucene scoring and random result order

2011-08-25 Thread Sendros, Jason
[] { SortField.FIELD_SCORE, new SortField(POSITION,SortField.INT) }); -Original Message- From: Yanick Gamelin [mailto:yanick.game...@ericsson.com] Sent: Thursday, August 25, 2011 3:02 PM To: java-user@lucene.apache.org Subject: Lucene scoring and random result order Hi all, I have the following
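The reply above sorts on score first and on a secondary field second, so that equal-score hits come back in a fixed order. The idea can be sketched without Lucene as an ordinary two-key comparator. Everything in this block is hypothetical scaffolding: `StableHitOrder`, its nested `Hit` class, and the `position` field merely stand in for Lucene's hits and for the POSITION field named in the snippet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: a two-key ordering for hits, independent of Lucene. */
class StableHitOrder {

    /** Hypothetical stand-in for a search hit. */
    static class Hit {
        final int position;
        final float score;

        Hit(int position, float score) {
            this.position = position;
            this.score = score;
        }
    }

    /** Highest score first; equal scores fall back to ascending position. */
    static final Comparator<Hit> BY_SCORE_THEN_POSITION = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.position, b.position);
        }
    };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<Hit>(hits);
        Collections.sort(copy, BY_SCORE_THEN_POSITION);
        return copy;
    }
}
```

Because the secondary key is unique per hit in this sketch, the result no longer depends on the order in which the individual searchers happened to return their hits.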

Word Confidence in Lucene scoring

2011-08-13 Thread Saar Carmi
Hi, Does Lucene support setting word confidence for every word in the document, to influence the scoring? As suggested by the MAVIS project (http://research.microsoft.com/en-us/projects/mavis/), when indexing Speech Recognition text one needs to take into account how confident the recognition of a word is.

Re: Lucene Scoring

2010-07-07 Thread manjula wijewickrema
...@apache.org wrote: On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: Hi, In my application, I input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have

Re: Lucene Scoring

2010-07-06 Thread manjula wijewickrema
input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and some other pages

Re: Lucene Scoring

2010-07-06 Thread Ian Lea
: Hi, In my application, I input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

Lucene Scoring

2010-07-05 Thread manjula wijewickrema
Hi, In my application, I input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and some

Re: Lucene Scoring

2010-07-05 Thread Grant Ingersoll
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: Hi, In my application, I input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java

Lucene scoring and short fields

2008-02-07 Thread daniel rosher
Hi All, Given that Lucene scoring can favour shorter fields in documents, in the past we've had to pad out 'unreasonably' short fields to a set minimum (with basically nonsense words). I'm wondering how others might have dealt with this issue. Another option is to have a custom Similarity class

Re: Lucene scoring and short fields

2008-02-07 Thread Chris Hostetter
: (with basically nonsense words), I'm wondering how others might have : dealt with this issue. : : Another option is to have a custom Similarity class with an altered : lengthNorm method? that is what i would recommend ... it's exactly what SweetSpotSimilarity does (you define a plateau of
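The difference between the default length normalisation and a plateau-style one can be sketched in a few lines of plain Java. The first helper uses the 1/sqrt(number of terms) form quoted elsewhere in this archive. The second follows the curve as I recall it from the SweetSpotSimilarity documentation, so treat that formula, and the parameter names `min`, `max`, and `steepness`, as assumptions to check against the javadoc rather than as the library's exact code. `LengthNorms` itself is an illustrative class, not part of Lucene.

```java
/** Illustrative only: length-normalisation curves, independent of Lucene. */
class LengthNorms {

    /** Default-style norm: the shorter the field, the larger the factor. */
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * Plateau-style norm: every length inside [min, max] gets 1.0, and the
     * factor falls off on both sides at a rate controlled by steepness.
     */
    static double plateauLengthNorm(int numTerms, int min, int max, double steepness) {
        // distance is 0 inside the plateau and grows linearly outside it.
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Compare both curves for a very short, a typical, and a long field.
        for (int n : new int[] {1, 5, 12}) {
            System.out.println(n + " terms: classic=" + classicLengthNorm(n)
                    + ", plateau=" + plateauLengthNorm(n, 3, 10, 0.5));
        }
    }
}
```

With a plateau, a field that is shorter than `min` is penalised instead of rewarded, which removes the incentive to pad short fields with nonsense words.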

Re: Lucene scoring: coord_q_d factor

2006-12-19 Thread Doug Cutting
Karl Koch wrote: Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous? We independently developed coordination-level matching combined with TFxIDF when I worked at Apple. This is documented in:

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Soeren Pekrul
Karl Koch wrote: If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and those who read this? I understand that sentence: The natural solution is to correlate a term's matching value with its

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Karl Koch
-user@lucene.apache.org Subject: Re: Lucene scoring: coord_q_d factor Karl Koch wrote: If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and those who read this? I understand that sentence

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Soeren Pekrul
Soeren Pekrul wrote: The score for a document is the sum of the term weights w(tf, idf) for each containing term. So you have already the combination of coordination level matching with IDF. Now it is possible that your query requests three terms A, B and C. Two of them (A and B) are quite

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Grant Ingersoll
FYI: The Wiki has a fair number of resources on IR: http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a link to this conversation, which contains a lot of useful information) Karl, if you are so inclined, please feel free to add any of the references you have found

Re: Lucene scoring: coord_q_d factor

2006-12-13 Thread Karl Koch
Do you know about any papers that discuss this? Karl Original Message Date: Wed, 13 Dec 2006 10:31:41 -0500 From: Yonik Seeley [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: Re: Lucene scoring: coord_q_d factor On 12/13/06, Karl Koch [EMAIL PROTECTED] wrote

Re: Lucene scoring: coord_q_d factor

2006-12-13 Thread Paul Elschot
On Wednesday 13 December 2006 16:42, Karl Koch wrote: Do you know about any papers that discuss this? Coordination is called "co-ordination" in the original idf paper by K. Spärck Jones, "A statistical interpretation of term specificity and its application in retrieval", Journal of Documentation

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Karl Koch
separately since they actually also relate to the new Lucene scoring algorithm (they have not changed). Thank you for your time again :) Karl Original Message Date: Mon, 11 Dec 2006 22:41:56 -0800 From: Doron Cohen [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: Re

Lucene scoring: Term frequency normalisation

2006-12-12 Thread Karl Koch
Hi, I have a question about the current Lucene scoring algorithm. In this scoring algorithm, the term frequency is calculated by using the square root of the number of occurring terms as described in http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf

Lucene scoring: coord_q_d factor

2006-12-12 Thread Karl Koch
basis was the decision made to have it? Does anybody know a paper (in Information Retrieval, Information Seeking, etc.) or other more general information about this? Best Regards, Karl P.S.: This is my second question about Lucene scoring (current version). It relates to the question I posted

Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Soeren Pekrul
Hello Karl, I’m very interested in the details of Lucene’s scoring as well. Karl Koch wrote: For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with norm_q : sqrt(sum_t((tf_q*idf_t)^2)) which is also called cosine normalisation. This is a technique that

Re: Lucene scoring: Term frequency normalisation

2006-12-12 Thread Marvin Humphrey
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote: However, what exactly is the advantage of using square root instead of log? Speaking anecdotally, I wouldn't say there's an advantage. There's a predictable effect: very long documents are rewarded, since the damping factor is not as strong.
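The damping effect described in this reply is easy to see numerically. The sketch below compares square-root damping with one common logarithmic form, 1 + ln(freq). The thread does not say which log variant was meant, so that choice is an assumption, and `TfDamping` is an illustrative class, not Lucene code.

```java
/** Illustrative only: two ways of damping raw term frequency. */
class TfDamping {

    /** Square-root damping, as in the tf factor discussed in this thread. */
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    /** One common logarithmic damping (an assumed variant): 1 + ln(freq). */
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        // Print both curves so the divergence at high frequencies is visible.
        for (int freq : new int[] {1, 10, 100, 1000}) {
            System.out.println(freq + ": sqrt=" + sqrtTf(freq) + ", log=" + logTf(freq));
        }
    }
}
```

The two curves agree at a frequency of 1 and stay close for small counts, but the square root keeps growing much faster for large counts. That is why documents long enough to repeat a term many times gain more under square-root damping than under the logarithmic form.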

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: The coord(q,d) normalisation is a score factor based on how many of the query terms are found in the specified document. and described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On
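As a small illustration of the factor being discussed, the sketch below computes coordination as the fraction of query terms that a document matches. That fraction is how I understand the default coord(q,d) to be defined, but the snippet above does not state the formula, so treat it as an assumption. `CoordFactor` and its parameter names are illustrative.

```java
/** Illustrative only: coordination as the fraction of query terms matched. */
class CoordFactor {

    /**
     * @param overlap    number of query terms found in the document
     * @param maxOverlap total number of terms in the query (must be positive)
     */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 2 of 3 query terms versus one matching all 3.
        System.out.println("2 of 3: " + coord(2, 3));
        System.out.println("3 of 3: " + coord(3, 3));
    }
}
```

Multiplying the summed term weights by this factor rewards documents that match more of the query's terms, independently of how heavily any single term is weighted.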

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: Is there any other paper that actually shows the benefit of doing this particular normalisation with coord_q_d? I am not suggesting here that it is not useful, I am just looking for evidence how the idea developed. I think it's a mischaracterization to call coordination a

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Doron Cohen
Karl Koch [EMAIL PROTECTED] wrote: For the documents Lucene employs its norm_d_t which is explained as: norm_d_t : square root of number of tokens in d in the same field as t Actually (by default) it is: 1 / sqrt(#tokens in d with same field as t) basically just the square root of the

Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Karl Koch
Subject: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed) [EMAIL PROTECTED] wrote: According to these sources, the Lucene scoring formula in version 1.2 is: score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d Hi Karl

Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-11 Thread Doron Cohen
Well it doesn't, since there is no justification of why it is the way it is. It's like saying, here is that car with 5 wheels... enjoy driving. - I think the explanations there would also answer at least some of your questions. I hoped it would answer *some* of the questions... (not all)

Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-09 Thread TheRanger
/200307.mbox/[EMAIL PROTECTED] ). According to these sources, the Lucene scoring formula in version 1.2 is: score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d where * score (q,d) : score for document d given query q * sum_t : sum for all terms t
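The formula quoted above can be transcribed directly into code, which makes it easier to see how the pieces interact. The sketch below is a literal reading of the expression as posted, with norm_q taken from the definition sqrt(sum_t((tf_q*idf_t)^2)) quoted in a reply elsewhere in this archive. How tf_d, idf_t, and norm_d_t are themselves derived is cut off in the snippet, so they are simply passed in as inputs. `Lucene12Score` and its parameter names are illustrative, not Lucene code.

```java
/** Illustrative only: a literal transcription of the quoted scoring formula. */
class Lucene12Score {

    /** norm_q = sqrt(sum_t((tf_q * idf_t)^2)), the cosine normalisation of the query. */
    static double queryNorm(double[] tfQ, double[] idf) {
        double sumOfSquares = 0.0;
        for (int t = 0; t < tfQ.length; t++) {
            double w = tfQ[t] * idf[t];
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }

    /**
     * score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d.
     * All arrays are indexed by query term and must have the same length;
     * the query is assumed to be non-empty so that norm_q is not zero.
     */
    static double score(double[] tfQ, double[] tfD, double[] idf,
                        double[] normD, double[] boost, double coord) {
        double normQ = queryNorm(tfQ, idf);
        double sum = 0.0;
        for (int t = 0; t < tfQ.length; t++) {
            double queryWeight = tfQ[t] * idf[t] / normQ;
            double docWeight = tfD[t] * idf[t] / normD[t];
            sum += queryWeight * docWeight * boost[t];
        }
        return sum * coord;
    }

    public static void main(String[] args) {
        // Hypothetical single-term query: tf_q=1, idf=2, tf_d=3, norm_d_t=4, boost=1, coord=1.
        double s = score(new double[] {1}, new double[] {3}, new double[] {2},
                new double[] {4}, new double[] {1}, 1.0);
        System.out.println("score = " + s);
    }
}
```

Note that idf_t enters twice, once through the query weight and once through the document weight, so rare terms are emphasised quadratically before the query normalisation is applied.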

Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread qaz zaq
Hi, I have a question about Lucene scoring. In the following example, how can I ensure that doc1 has a higher score than doc2 if I search for A*? In other words, I want to boost the docs that match on their leading terms. doc1: Aterm Bterm Cterm doc2: Bterm Aterm Cterm

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Doron Cohen
think prefix queries (e.g. A*) are supported in a phrase, and if so you would need to extend it a bit.. Hope this helps, Doron qaz zaq [EMAIL PROTECTED] wrote on 03/10/2006 09:50:24: Hi, I have a question about the lucene scoring. In my following example, how can I ensure the doc1 has

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Chris Hostetter
: does not pour affinity information into the score - i.e. both doc1 and doc2 : in your example would get the same score, and the SpanFirstQuery would only : allow you to limit the set of returned documents - Hoss, do you agree with : this? Oh ... hmmm ... i think you're right. SpanScorer

RE: Lucene Scoring

2006-03-08 Thread Pasha Bizhan
Hi, From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Anyone have a doc or something that would allow me to explain this to execs? A "Lucene Scoring for Dummies" idea... explaining a math algorithm to an exec or someone with no knowledge is not that easy :) http://lucene.apache.org/java/docs

Re: Lucene Scoring

2006-03-08 Thread markharw00d
[EMAIL PROTECTED] wrote: Anyone have a doc or something that would allow me to explain this to execs? Roughly speaking: * Documents containing *all* the search terms are good * Matches on rare words are better than for common words * Long documents are not as good as short ones * Documents

Re: Lucene Scoring

2006-03-08 Thread Chris Hostetter
: Roughly speaking: : : * Documents containing *all* the search terms are good : * Matches on rare words are better than for common words : * Long documents are not as good as short ones : * Documents which mention the search terms many times are good Be wary of the distinction between term and

Re: Lucene scoring bounds ??

2005-06-20 Thread Erik Hatcher
On Jun 18, 2005, at 7:39 PM, Paul Libbrecht wrote: I read the lucene-book about scoring and read a bit of the javadoc but I can't seem to find anywhere the expected bounds for the score value. I had believed the score would end up between 0 and 1 but I seem to keep having values

Lucene scoring bounds ??

2005-06-19 Thread Paul Libbrecht
Hi, I read the lucene-book about scoring and read a bit of the javadoc but I can't seem to find anywhere the expected bounds for the score value. I had believed the score would end up between 0 and 1 but I seem to keep having values under 0.2. It may be due to my special requests