Result order when score is the same

2011-04-13 Thread kenf_nc
I'm using version 1.4.1. It appears that when several documents in a result
set have the same score, the secondary sort is by 'indexed_at' ascending.
Can this be altered in the config XML files? For example, could I make the
secondary sort indexed_at descending, or use a different field, say
document title?

Thanks,
Ken

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Result-order-when-score-is-the-same-tp2816127p2816127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is sort order when 'score' is the same a Lucene thing? Should I ask on the
Lucene forum?



Re: Result order when score is the same

2011-04-13 Thread Rob Casson
you could just explicitly send multiple sorts...from the tutorial:

 sort=inStock asc, price desc

cheers.




Re: Result order when score is the same

2011-04-13 Thread Jonathan Rochkind
In real life though, it seems unlikely that the relevancy score will 
ever be identical, so the second sort field will never be used.  Is 
relevancy score ever identical?  Rarely at any rate.




Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Au contraire, I have almost 4 million documents, representing businesses in
the US, and having the score be the same is a very common occurrence.

It is quite clear from testing that if the score is the same, then it sorts on
indexed_at ascending. It seems silly to make me add a sort on every query;
there should be some configuration to modify this. However, if I make all my
queries include sort=score+desc,indexed_at+desc, will that have a
detrimental performance effect?



Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma
If you set omitNorms and omitTermFreqAndPositions on the query field(s) and use
no funky boost functions, all results will have an identical score in AND
queries (or queries with one search term). IDF has no effect on the ordering
because, with AND, every match contains every term; queryNorm is the same
across the result set, fieldNorm is 1 and TF is 1.

It's not really an uncommon use case. Some business owners just do not care
about normalizing or term frequencies.
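
To make the tie concrete, here is a minimal Python sketch of a simplified TF-IDF style score. It is only a toy model under stated assumptions, not Lucene's actual scoring code: the function names, the toy documents and the IDF formula are illustrative, and coord and boosts are left out.

```python
import math

def idf(doc_freq, num_docs):
    # Simplified inverse document frequency (illustrative formula).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def and_query_score(doc_terms, query_terms, doc_freqs, num_docs, query_norm=1.0):
    # AND semantics: a document only matches if it contains every query term.
    if not all(term in doc_terms for term in query_terms):
        return None
    tf = 1.0          # omitTermFreqAndPositions: term frequency is ignored
    field_norm = 1.0  # omitNorms: no length normalisation
    return query_norm * sum(
        tf * idf(doc_freqs[term], num_docs) ** 2 * field_norm
        for term in query_terms
    )

# Toy documents: term -> raw frequency. The frequencies and lengths differ,
# but they play no part in the score once tf and fieldNorm are fixed at 1.
docs = {
    "Doc 3": {"pizza": 1, "chicago": 1},
    "Doc 4": {"pizza": 5, "chicago": 2, "delivery": 1},
    "Doc 5": {"pizza": 2, "chicago": 9, "deep": 1, "dish": 1},
    "Doc 9": {"pizza": 1},  # lacks "chicago", so it does not match the AND query
}
query = ["pizza", "chicago"]
doc_freqs = {t: sum(1 for d in docs.values() if t in d) for t in query}

scores = {name: and_query_score(terms, query, doc_freqs, len(docs))
          for name, terms in docs.items()}
matching = {name: s for name, s in scores.items() if s is not None}
print(matching)  # every matching document carries the same score
```

Because the only per-document inputs (tf and fieldNorm) are constant, the score depends on the query alone, so every match ties.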



Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma
Sorting a large set is costly: the more fields you sort on, the more memory is
consumed (and likely cached).

If I remember correctly, the result set will be ordered according to Lucene
docIDs if there's nothing else to sort on.

If I read correctly, you don't want to specify that fixed sort parameter for
every query, right? You can simply add the parameter as a default (or as a
constant, i.e. an invariant) in your request handler configuration in
solrconfig.
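
As a rough sketch of what such a default might look like, the snippet below holds a hypothetical solrconfig.xml fragment in a Python string, purely so that it can be parsed and checked for well-formedness. The handler name and the indexed_at field are assumptions taken from this thread, not a verified configuration for any particular install.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for solrconfig.xml. The handler name "standard" and
# the field "indexed_at" are assumptions; adjust them to the actual setup.
SOLRCONFIG_FRAGMENT = """
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- Used whenever a request does not send its own sort parameter. -->
    <str name="sort">score desc, indexed_at desc</str>
  </lst>
</requestHandler>
"""

# Parse the fragment to confirm it is well-formed and read back the default.
handler = ET.fromstring(SOLRCONFIG_FRAGMENT.strip())
defaults = {entry.get("name"): entry.text
            for entry in handler.find("lst[@name='defaults']")}
print(defaults["sort"])
```

A value under "defaults" can still be overridden by a sort parameter on the request; placing the same entry in a list named "invariants" instead would make it fixed for every request to that handler.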



Re: Result order when score is the same

2011-04-13 Thread kenf_nc
Is a new DocID generated every time a doc with the same UniqueID is added to
the index? If so, then docID must be incremental and would look like
indexed_at ascending. What I see (and why it's a problem for me) is the
following.

A search brings back the first 5 documents in a result set of, say, 60. The
scores and titles are as follows (simulated):
1) 6.5, Doc 1
2) 6.3, Doc 2
3) 4.7, Doc 3
4) 4.7, Doc 4
5) 4.7, Doc 5
---
6) 4.7, Doc 6
7) 4.7, Doc 7
8) 4.4, Doc 8

If I query 6 times, the results come back like that every time. However, if I
change a field in Doc 4, a field that is not part of the search, it gets the
same score, but the results are now this:
1) 6.5, Doc 1
2) 6.3, Doc 2
3) 4.7, Doc 3
4) 4.7, Doc 5
5) 4.7, Doc 6
---
6) 4.7, Doc 7
7) 4.7, Doc 4
8) 4.4, Doc 8

So, in a specific situation I'm looking at, a user sees 5 items on a UI
page and clicks a button to 'favorite' document #4. I update Doc 4 and
(because it was architecturally better) I re-issue the search. So from the
user's viewpoint, they 'favorited' number 4 and it disappeared from their
screen. Not a good user experience.
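
The behaviour above can be reproduced with a small Python model. It assumes that ties are broken by ascending docID and that an update is a delete followed by a fresh insert with a new, higher docID; the ToyIndex class is purely illustrative and ignores real-world details such as segments and merges.

```python
class ToyIndex:
    """Toy model: each add gets the next docID; re-adding a key replaces the old entry."""

    def __init__(self):
        self.next_doc_id = 0
        self.docs = {}  # unique key -> (doc_id, score)

    def add(self, key, score):
        self.docs.pop(key, None)  # an update is modelled as delete + insert
        self.docs[key] = (self.next_doc_id, score)
        self.next_doc_id += 1

    def search(self):
        # Score descending; ties fall back to ascending docID (index order).
        ranked = sorted(self.docs.items(),
                        key=lambda item: (-item[1][1], item[1][0]))
        return [key for key, _ in ranked]

index = ToyIndex()
for key, score in [("Doc 1", 6.5), ("Doc 2", 6.3), ("Doc 3", 4.7), ("Doc 4", 4.7),
                   ("Doc 5", 4.7), ("Doc 6", 4.7), ("Doc 7", 4.7), ("Doc 8", 4.4)]:
    index.add(key, score)

before = index.search()
index.add("Doc 4", 4.7)  # the 'favorite' update: same score, new docID
after = index.search()

print(before[:5])  # first page before the update
print(after[:5])   # first page after the update: Doc 4 has dropped off it
```

In this model Doc 4 moves behind Doc 7, which matches the second listing above.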

If I could modify the secondary sort when score is the same, then worst case
doc 4 would pop to the top of the user's screen but not disappear. Better
would be to secondary sort on Title or some other fixed field that exists on
all documents. But I would want the sort to be at the system level; I don't
want the overhead of sorting every query I ever make.





Re: Result order when score is the same

2011-04-13 Thread Markus Jelsma

> Is a new DocID generated every time a doc with the same UniqueID is added to
> the index? If so, then docID must be incremental and would look like
> indexed_at ascending. What I see (and why it's a problem for me) is the
> following.

Yes, Solr removes the old document and inserts a new one when updating an
existing document.

 
> A search brings back the first 5 documents in a result set of, say, 60. The
> scores and titles are as follows (simulated):
> 1) 6.5, Doc 1
> 2) 6.3, Doc 2
> 3) 4.7, Doc 3
> 4) 4.7, Doc 4
> 5) 4.7, Doc 5
> ---
> 6) 4.7, Doc 6
> 7) 4.7, Doc 7
> 8) 4.4, Doc 8
>
> If I query 6 times, the results come back like that every time. However, if I
> change a field in Doc 4, a field that is not part of the search, it gets
> the same score, but the results are now this:
> 1) 6.5, Doc 1
> 2) 6.3, Doc 2
> 3) 4.7, Doc 3
> 4) 4.7, Doc 5
> 5) 4.7, Doc 6
> ---
> 6) 4.7, Doc 7
> 7) 4.7, Doc 4
> 8) 4.4, Doc 8

The above scenario makes sense indeed.

> So, in a specific situation I'm looking at, a user sees 5 items on a UI
> page and clicks a button to 'favorite' document #4. I update Doc 4 and
> (because it was architecturally better) I re-issue the search. So from the
> user's viewpoint, they 'favorited' number 4 and it disappeared from their
> screen. Not a good user experience.

I agree. If you don't want this to happen, you must ensure that the index order
is never used in a search.

> If I could modify the secondary sort when score is the same, then worst case
> doc 4 would pop to the top of the user's screen but not disappear. Better
> would be to secondary sort on Title or some other fixed field that exists
> on all documents. But I would want the sort to be at the system level; I
> don't want the overhead of sorting every query I ever make.

Well, sub-sorts must be used to avoid the index order being used for output.
Maybe sorting on creation time (obviously not update time) as a final sort is
allowed in your use case. It'll take some resources, but if the business
requirements are as such then the resource penalty must be met or accepted.

What do you mean by sorting on the system level? You need that overhead if you
don't want the index order reflected in your result set, and the index order
will still show through wherever the final sub-sort also produces ties.
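
A small Python sketch of the sub-sort idea, under the same toy assumptions as the scenario quoted above (an update assigns a new, higher docID): ties on score are broken by a field that does not change on update, here a hypothetical title field, and docID is only consulted if both score and title tie.

```python
def rank(docs):
    # Score descending, then title ascending; docID is the last resort and
    # only matters when both score and title are identical.
    ordered = sorted(docs, key=lambda d: (-d["score"], d["title"], d["doc_id"]))
    return [d["title"] for d in ordered]

scores = [6.5, 6.3, 4.7, 4.7, 4.7, 4.7, 4.7, 4.4]
docs = [{"doc_id": i, "title": f"Doc {i + 1}", "score": s}
        for i, s in enumerate(scores)]

before = rank(docs)

# Model the update of Doc 4 as delete + re-insert with a new, higher docID.
updated = [d for d in docs if d["title"] != "Doc 4"]
updated.append({"doc_id": len(docs), "title": "Doc 4", "score": 4.7})

after = rank(updated)
print(before == after)  # the ordering no longer depends on docID
```

The extra key costs a comparison only among documents whose scores tie, but the field still has to be available for sorting on every query.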

 
 
 


Re: Result order when score is the same

2011-04-13 Thread Jonathan Rochkind



> But, I would want the sort to be at the system level, I don't
> want the overhead of sorting every query I ever make.


How would 'doing it at the system level' avoid the 'overhead of sorting 
every query'?  Every query has to be sorted, if you want it sorted.


Beyond setting a default sort parameter among your default request
parameters, I don't think there's any way to tell Solr "when I ask for
sort by score, I REALLY mean sort by score, then by X", which is what
I think you asked earlier.


Just send the sort that you want: score desc, some_other_field desc. If
that's what you want. The second field will really only be used for
identical scores, plus Solr sorts pretty darn efficiently. I'd be pretty
surprised if you were able to see any measurable performance difference
at all from adding a second field to your sort parameter.
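
For illustration, here is a minimal Python sketch of building such a request URL with an explicit two-field sort. The base URL, the query term and the helper name build_select_url are assumptions made up for this example; only the sort parameter syntax comes from the thread.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, sort_fields, rows=5):
    # sort_fields is a list of (field, direction) pairs, most significant first.
    sort = ", ".join(f"{field} {direction}" for field, direction in sort_fields)
    return base_url + "?" + urlencode({"q": query, "sort": sort, "rows": rows})

# Hypothetical endpoint and query; indexed_at is the field named in this thread.
url = build_select_url(
    "http://localhost:8983/solr/select",
    "pizza",
    [("score", "desc"), ("indexed_at", "desc")],
)
print(url)
```

urlencode takes care of escaping the spaces and the comma in the sort value, so the parameter does not need to be hand-encoded.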


Beware that the design you describe, updating the Solr index on user
action, can often run into trouble in Solr as you scale. Solr can only
handle so many commits in a given short period of time before it starts
having trouble, at least in 1.4.1. I am not sure of the status in 3.1
of some of the near-real-time features meant to ameliorate this
problem, at least in some cases. But this is potentially a far bigger
performance headache, eventually, than worrying about a second
field on your sort affecting performance.


Re: Result order when score is the same

2011-04-13 Thread Otis Gospodnetic
Hi Ken,

It sounds like you want to just sort by time changed/added (reverse chrono
order). I would not worry about performance issues just yet unless you have some
reason to think this is going to cause problems (e.g. giant index, low RAM).
Jonathan is right about commits, and about the NRT-ness of search in a typical
Solr master-slave setup. In other words, even if you update the doc, the change
will be on the master, and your user will still see the same results in the same
order until the next time the index is replicated from the master.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


