You could extend this class and provide your own implementation to
incorporate term frequency into the final score. For the record, you might
want to look into BM25Similarity, which takes term frequency into account,
but in a way that gives a much lower score contribution to hits than
You could use IndexSearcher#explain, which tells you how the score of a
document is computed.
On Tue, Jul 17, 2018 at 19:06, wrote:
> Hi,-
>
> How can I check how much each indexed field contributes to a hit
> document's score?
>
> Best regards
>
>
>
Hi,-
How can I check how much each indexed field contributes to a hit
document's score?
Best regards
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:
I forgot to include the doc that I was referring to:
https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
Best regards
On 7/17/18 1:01 PM, baris.ka...@oracle.com wrote:
Hi,-
Is there a way to reduce the tf(t in d) component to 1? I don't want
Hi,-
Is there a way to reduce the tf(t in d) component to 1? I don't want
the number of times a word appears to affect the scoring for my app.
Best regards
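The effect of flattening tf(t in d) to 1 can be sketched in plain Java without any Lucene dependency. The class and method names below (TfFlatteningSketch, classicTf, flatTf) are hypothetical; in Lucene itself the equivalent change would presumably go into an overridden tf method of a TFIDFSimilarity subclass, which this sketch does not attempt.

```java
// Plain-Java illustration only; no Lucene classes are used here.
final class TfFlatteningSketch {
    // Classic TF-IDF damping: square root of the raw term frequency.
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened variant (an assumption about the desired behaviour):
    // any positive frequency counts as exactly 1, so repeats no longer matter.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 4f, 9f}) {
            System.out.println(freq + " -> classic " + classicTf(freq)
                    + ", flat " + flatTf(freq));
        }
    }
}
```

With the flat variant, a document mentioning a term nine times gets the same tf contribution as one mentioning it once.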
modified but the order of results is pretty much the same.
What happens is that when part of the search string is found in both fields,
those entries are ranked first, since Lucene treats the number of
occurrences as the dominant factor in scoring.
But I want the search string to be fully-matched
fwiw https://issues.apache.org/jira/browse/LUCENE-5867 is going to be
released soon.
On Mon, Jan 9, 2017 at 2:17 PM, Rajnish kamboj
wrote:
> My application does not require scoring/ranking. All data is equally
> important for me.
>
> Search query can return any
Thanks for quick responses..
I will try the approach..
Does bypassing scoring also increase search performance?
Regards
Rajnish
On Monday, January 9, 2017, Ian Lea wrote:
> oal.search.ConstantScoreQuery?
>
> "A query that wraps another query and simply returns a constant
oal.search.ConstantScoreQuery?
"A query that wraps another query and simply returns a constant score equal
to the query boost for every document that matches the query. It therefore
simply strips of all scores and returns a constant one."
--
Ian.
On Mon, Jan 9, 2017 at 11:39 AM, Taher Galal
Just wrap your Query in a ConstantScoreQuery. Lucene will optimize
the query execution to not read term frequencies from disk, not
compute scores, etc.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Jan 9, 2017 at 6:17 AM, Rajnish kamboj wrote:
> My
Hi,
What about writing your own scoring that just gives a value of 1 to all the
documents that are hits?
On Mon, Jan 9, 2017 at 12:17 PM, Rajnish kamboj
wrote:
> My application does not require scoring/ranking. All data is equally
> important for me.
>
> Search query
My application does not require scoring/ranking. All data is equally
important for me.
Search query can return any documents matching search criteria.
So, is there a way to disable scoring/ranking altogether?
Or is there a better solution?
Regards
Rajnish
Hi everybody!
I would appreciate it if you could refer me to an example or explanation of
how to change the scoring function of Lucene.
I would expect 2 options:
1. changing some configuration, so the ranking function becomes, say, Okapi
BM25 instead of the standard similarity
2. Is there any
Hi Victor
You want to look at setting a similarity other than TF-IDF. For example,
here's BM25Similarity:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/BM25Similarity.html
And the "setSimilarity" method on IndexSearcher
ined ?
[1]
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html#explain
not be a document where his name
pops up a few times but instead be the contact details of Peter where
his name might pop up only once.
How would we go about implementing this? Is it necessary to change the
Lucene scoring algorithm or is there a better/easier way?
Thanks and kind regards,
Lucas Van Overberghe
to normal... So cool!
-----Original Message-----
From: Yuval Kesten [mailto:ykes...@yahoo-inc.com]
Sent: Wednesday, February 22, 2012 7:29 PM
To: java-user@lucene.apache.org
Subject: RE: Custom lucene scoring - Dot product between field boost and query
boost
Hi all,
Inspired by another thread here
!
-----Original Message-----
From: Em [mailto:mailformailingli...@yahoo.de]
Sent: Tuesday, February 21, 2012 6:07 PM
To: java-user@lucene.apache.org
Subject: Re: Custom lucene scoring - Dot product between field boost and
query boost
Hi Yuval,
1. Performances: I am calculating all the TF/IDF
before doing the
indexing and obviously before the searching.
Thanks!
-----Original Message-----
From: Em [mailto:mailformailingli...@yahoo.de]
Sent: Tuesday, February 21, 2012 6:07 PM
To: java-user@lucene.apache.org
Subject: Re: Custom lucene scoring - Dot product between field boost
has better ideas - please share!
-----Original Message-----
From: Alan Woodward [mailto:alan.woodw...@romseysoftware.co.uk]
Sent: Wednesday, February 22, 2012 4:00 PM
To: java-user@lucene.apache.org
Subject: Re: Custom lucene scoring - Dot product between field boost and query
boost
Hi Yuval
Hi,
I want to use Lucene with the following scoring logic:
When I index my documents I want to set for each field a score/weight.
When I query my index I want to set for each query term a score/weight.
I will NEVER index or query with many instances of the same field - In each
query (document)
The same question is formatted nicer here:
http://stackoverflow.com/questions/9380188/custom-lucene-scoring-dot-product-between-field-boost-and-query-boost
Thanks!
-----Original Message-----
From: Yuval Kesten [mailto:ykes...@yahoo-inc.com]
Sent: Tuesday, February 21, 2012 5:18 PM
To: java-user
Hi Yuval,
1. Performances: I am calculating all the TF/IDF stuff and NORMS for
nothing...
You aren't calculating that much, since you declared all those values as
constants. What are you worried about?
2. The score I get from the TopScoreDocCollector is not the same as I
get from the
Hi all,
I have the following problem with Lucene not being deterministic.
I use a MultiSearcher to process a search and when I get hits with the same score,
those are returned in a random order.
I wouldn't care much about the order of the hits with the same score if I could get
them all, so I could
[] { SortField.FIELD_SCORE, new
SortField(POSITION,SortField.INT) });
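The idea behind sorting on score first and a secondary field second can be sketched in plain Java, independent of Lucene's Sort and SortField classes. The Hit type, its position field, and the helper names below are hypothetical stand-ins, and the sketch assumes each hit carries a numeric value suitable as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    // Hypothetical stand-in for a search hit: an id, a score, and a numeric tie-breaker.
    static final class Hit {
        final String id;
        final float score;
        final int position;

        Hit(String id, float score, int position) {
            this.id = id;
            this.score = score;
            this.position = position;
        }
    }

    // Highest score first; equal scores fall back to ascending position.
    static final Comparator<Hit> BY_SCORE_THEN_POSITION =
            Comparator.comparingDouble((Hit h) -> h.score)
                    .reversed()
                    .thenComparingInt(h -> h.position);

    // Returns the ids in sorted order without mutating the input list.
    // The order is deterministic as long as positions are unique among tied hits.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_SCORE_THEN_POSITION);
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Two hits with the same score then always come back in the same relative order, whichever searcher returned them first.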
-----Original Message-----
From: Yanick Gamelin [mailto:yanick.game...@ericsson.com]
Sent: Thursday, August 25, 2011 3:02 PM
To: java-user@lucene.apache.org
Subject: Lucene scoring and random result order
Hi all,
I have the following
Hi
Does Lucene support setting a confidence value for every word in the document,
to influence the scoring?
As suggested by the MAVIS project
(http://research.microsoft.com/en-us/projects/mavis/), when
indexing speech recognition text one needs to take into account how confident
the recognition of a word is.
Hi,
In my application, I input only a single-term query (at one time) and get back
the corresponding scores for those queries. But I am struggling a little to
understand Lucene scoring. I have referred to
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
and
some
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote:
Hi,
In my application, I input only a single-term query (at one time) and get back
the corresponding scores for those queries. But I am struggling a little to
understand Lucene scoring. I have referred to
http://lucene.apache.org/java
Hi All,
Given that Lucene scoring can favour shorter fields in documents, in the
past we've had to pad out 'unreasonably' short fields to a set minimum
(with basically nonsense words). I'm wondering how others might have
dealt with this issue.
Another option is to have a custom Similarity class
: (with basically nonsense words), I'm wondering how others might have
: dealt with this issue.
:
: Another option is to have a custom Similarity class with an altered
: lengthNorm method?
that is what i would recommend ... it's exactly what SweetSpotSimilarity
does (you define a plateau of
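A plateau-shaped length norm of the kind described here can be sketched in plain Java. The formula below is an assumption modeled on how SweetSpotSimilarity is commonly described, and the names (PlateauLengthNormSketch, min, max, steepness) are hypothetical: every field length between min and max gets the same norm of 1, and lengths outside that range are penalised by their distance from it.

```java
// Plain-Java illustration only; this is not the SweetSpotSimilarity class itself.
final class PlateauLengthNormSketch {
    // Returns 1.0 for any numTerms in [min, max]; outside the plateau the norm
    // decays with distance from the nearest edge, scaled by steepness.
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero inside the plateau, twice the distance to the nearest edge outside it.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    public static void main(String[] args) {
        for (int len : new int[] {1, 3, 5, 10, 20}) {
            System.out.println(len + " terms -> " + lengthNorm(len, 3, 10, 0.5f));
        }
    }
}
```

With a plateau of 3 to 10 terms, a 3-term field and a 10-term field score identically, so there is no need to pad short fields with filler words.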
Karl Koch wrote:
Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous?
We independently developed coordination-level matching combined with
TFxIDF when I worked at Apple. This is documented in:
Karl Koch wrote:
If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and that of others reading this.
I understand that sentence:
The natural solution is to correlate a term's matching value with its
-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor
Karl Koch wrote:
If I do not misunderstand that extract, I would say it suggests the
combination of coordination level matching with IDF. I am interested in your
view and that of others reading this.
I understand that sentence
Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf) for
each containing term. So you have already the combination of
coordination level matching with IDF. Now it is possible that your query
requests three terms A, B and C. Two of them (A and B) are quite
FYI: The Wiki has a fair number of resources on IR:
http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a
link to this conversation, which contains a lot of useful information)
Karl, if you are so inclined, please feel free to add any of the
references you have found
Do you know about any papers that discuss this?
Karl
-----Original Message-----
Date: Wed, 13 Dec 2006 10:31:41 -0500
From: Yonik Seeley [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor
On 12/13/06, Karl Koch [EMAIL PROTECTED] wrote
On Wednesday 13 December 2006 16:42, Karl Koch wrote:
Do you know about any papers that discuss this?
Coordination is called "co-ordination" in the original idf paper by
K. Spärck Jones, "A statistical interpretation of term specificity
and its application in retrieval", Journal of Documentation
separately since they actually also relate to
the new Lucene scoring algorithm (they have not changed). Thank you for your
time again :)
Karl
-----Original Message-----
Date: Mon, 11 Dec 2006 22:41:56 -0800
From: Doron Cohen [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: Re
Hi,
I have a question about the current Lucene scoring algorithm. In this scoring
algorithm, the term frequency is calculated by using the square root of the
number of occurrences of the term, as described in
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf
basis was the decision made to have
it? Does anybody know a paper (in Information Retrieval, Information Seeking,
etc.) or other more general information about this?
Best Regards,
Karl
P.S.: This is my second question about Lucene scoring (current version). It
relates to the question I posted
Hello Karl,
I’m very interested in the details of Lucene’s scoring as well.
Karl Koch wrote:
For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with
norm_q : sqrt(sum_t((tf_q*idf_t)^2))
which is also called cosine normalisation. This is a technique that
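The query normalisation quoted above, norm_q = sqrt(sum_t((tf_q*idf_t)^2)), can be written out as a few lines of plain Java. This is only a sketch of that one formula; the class name, method name, and array-based inputs are hypothetical.

```java
// Plain-Java illustration of norm_q = sqrt(sum_t((tf_q * idf_t)^2)).
final class QueryNormSketch {
    // tfQ[t] is the frequency of term t in the query, idf[t] its inverse document frequency.
    static double queryNorm(double[] tfQ, double[] idf) {
        if (tfQ.length != idf.length) {
            throw new IllegalArgumentException("tfQ and idf must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int t = 0; t < tfQ.length; t++) {
            double weight = tfQ[t] * idf[t];
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Two query terms, each occurring once, with idf values 3 and 4.
        System.out.println(queryNorm(new double[] {1, 1}, new double[] {3, 4}));
    }
}
```

Because norm_q depends only on the query, dividing by it rescales every document's score for that query by the same amount and leaves their relative order unchanged.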
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:
However, what exactly is the advantage of using square root instead
of log?
Speaking anecdotally, I wouldn't say there's an advantage. There's a
predictable effect: very long documents are rewarded, since the
damping factor is not as strong.
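The difference in damping strength is easy to see numerically. The plain-Java sketch below compares sqrt(freq) with one common logarithmic variant, 1 + ln(freq); that particular log form is an assumption chosen for illustration, as are the class and method names.

```java
// Plain-Java comparison of two term-frequency damping functions.
final class TfDampingSketch {
    // Square-root damping, as in the classic Lucene tf formula.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // Logarithmic damping (assumed variant): 1 + ln(freq) for freq > 0, else 0.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 10, 100, 1000}) {
            System.out.println(freq + ": sqrt=" + sqrtTf(freq) + " log=" + logTf(freq));
        }
    }
}
```

Both functions agree at a frequency of 1, but the square root keeps growing much faster, so documents long enough to repeat a term many times gain more under sqrt than under log.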
Karl Koch wrote:
The coord(q,d) normalisation is a score factor based on how many of
the query terms are found in the specified document, and is described
here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
Does this have a theoretical base? On
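As a concrete illustration of a factor "based on how many of the query terms are found", the sketch below uses the simple ratio of matched query terms to total query terms. Treating coord as exactly this ratio is an assumption here, and the class and parameter names are hypothetical.

```java
// Plain-Java illustration of a coordination factor as a simple ratio.
final class CoordSketch {
    // overlap: number of query terms found in the document;
    // maxOverlap: total number of terms in the query.
    static float coord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0) {
            throw new IllegalArgumentException("maxOverlap must be positive");
        }
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println(coord(2, 3)); // two of three query terms matched
        System.out.println(coord(3, 3)); // all query terms matched
    }
}
```

A document matching every query term keeps its full score, while one matching two of three terms is scaled down to two thirds.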
Karl Koch wrote:
Is there any other paper that actually shows the benefit of doing
this particular normalisation with coord_q_d? I am not suggesting
here that it is not useful, I am just looking for evidence how the
idea developed.
I think it's a mischaracterization to call coordination a
Karl Koch [EMAIL PROTECTED] wrote:
For the documents Lucene employs
its norm_d_t which is explained as:
norm_d_t : square root of number of tokens in d in the same field as t
Actually (by default) it is:
1 / sqrt(#tokens in d with same field as t)
basically just the square root of the
Subject: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula
needed)
[EMAIL PROTECTED] wrote:
According to these sources, the Lucene scoring formula in version 1.2
is:
score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
boost_t) * coord_q_d
Hi Karl
Well it doesn't since there is no justification of why it is the
way it is. It's like saying, here is that car with 5 wheels... enjoy
driving.
- I think the explanations there would also answer at least some of your
questions.
I hoped it would answer *some* of the questions... (not all)
/200307.mbox/[EMAIL
PROTECTED] ).
According to these sources, the Lucene scoring formula in version 1.2 is:
score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
boost_t) * coord_q_d
where
* score (q,d) : score for document d given query q
* sum_t : sum for all terms t
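The formula as quoted can be evaluated directly. The plain-Java sketch below follows it term by term; the class name, parameter names, and the choice to pass every factor in as a precomputed number are assumptions made for illustration, not how Lucene computes them internally.

```java
// Plain-Java evaluation of
// score(q,d) = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d
final class Lucene12ScoreSketch {
    // All arrays are indexed by query term t and must have the same length.
    static double score(double[] tfQ, double[] tfD, double[] idf, double[] normD,
                        double[] boost, double normQ, double coord) {
        double sum = 0.0;
        for (int t = 0; t < idf.length; t++) {
            double queryWeight = tfQ[t] * idf[t] / normQ;    // tf_q * idf_t / norm_q
            double docWeight = tfD[t] * idf[t] / normD[t];   // tf_d * idf_t / norm_d_t
            sum += queryWeight * docWeight * boost[t];
        }
        return sum * coord;
    }

    public static void main(String[] args) {
        // One query term with made-up values, only to show the arithmetic.
        double s = score(new double[] {1}, new double[] {3}, new double[] {2},
                new double[] {4}, new double[] {1}, 2.0, 0.5);
        System.out.println(s);
    }
}
```

Note that idf_t enters twice, once through the query weight and once through the document weight, so rare terms are rewarded quadratically.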
Hi,
I have a question about the lucene scoring. In my following example, how can
I ensure that doc1 has a higher score than doc2 if I search for A*? In
other words, I want to boost the docs which match on their leading terms.
doc1: Aterm Bterm Cterm
doc2: Bterm Aterm Cterm
think prefix queries (e.g. A*) are supported in
a phrase, and if so you would need to extend it a bit..
Hope this helps,
Doron
qaz zaq [EMAIL PROTECTED] wrote on 03/10/2006 09:50:24:
Hi,
I have a question about the lucene scoring. In my following
example, how can I ensure the doc1 has
: does not pour affinity information into the score - i.e. both doc1 and doc2
: in your example would get the same score, and the SpanFirstQuery would only
: allow you to limit the set of returned documents - Hoss, do you agree with
: this?
Oh ... hmmm ... i think you're right. SpanScorer
Hi,
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Anyone have a doc or something that would allow me to explain
this to execs? A Lucene Scoring for Dummies
idea... explaining a math algo to an exec or someone with no
knowledge is not that easy :)
http://lucene.apache.org/java/docs
[EMAIL PROTECTED] wrote:
Anyone have a doc or something that would allow me to explain this to execs?
Roughly speaking:
* Documents containing *all* the search terms are good
* Matches on rare words are better than for common words
* Long documents are not as good as short ones
* Documents
: Roughly speaking:
:
: * Documents containing *all* the search terms are good
: * Matches on rare words are better than for common words
: * Long documents are not as good as short ones
: * Documents which mention the search terms many times are good
Be wary of the distinction between term and
On Jun 18, 2005, at 7:39 PM, Paul Libbrecht wrote:
I read the lucene-book about scoring and read a bit of the javadoc
but I can't seem to find anywhere the expected bounds for
the score value.
I had believed the score would end up between 0 and 1 but I seem to
keep having values
Hi,
I read the lucene-book about scoring and read a bit of the javadoc but
I can't seem to find anywhere the expected bounds for the score
value.
I had believed the score would end up between 0 and 1 but I seem to keep
having values under 0.2. It may be due to my special requests
61 matches