[jira] Commented: (LUCENE-2215) paging collector

2010-03-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850072#action_12850072
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

bq. Let's be careful about the semantics here, Grant. Most if not all 
applications implement paging indeed, but I believe only FEW actually store 
user contexts between searches. PagingCollector relies on the application to 
store the lowest ranking doc that was returned previously, which means storing 
context between a user's searches.

I think, assuming the math plays out, that once you show the gains to be had 
here, esp. for deep paging, storing an int and a float is trivial.  If they are 
implementing paging, they are already keeping state about what page they are 
on.  

bq. Now they will need to think where do I get this low score from?

Sorry, but if that is so hard to figure out, then I don't see how they have 
any business writing a Lucene application to begin with.  A simple javadoc that 
says these two values are taken from the last result of the previously seen 
page should do the trick.
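To make the "an int and a float" point concrete, here is a minimal sketch of how an application might carry those two values between requests, for example in a next-page link. `PageCursor` and its token format are hypothetical, not part of any patch attached to this issue; the score is encoded through its raw bits so the round trip is exact.

```java
/**
 * Hypothetical holder (not a Lucene class) for the last hit of a page:
 * its score and doc id, which the next page's search starts after.
 */
final class PageCursor {
  final float score;
  final int doc;

  PageCursor(float score, int doc) {
    this.score = score;
    this.doc = doc;
  }

  /** Encodes as "<hex float bits>:<doc>"; raw bits keep the float exact. */
  String encode() {
    return Integer.toHexString(Float.floatToIntBits(score)) + ":" + doc;
  }

  /** Parses a token produced by encode(). */
  static PageCursor decode(String token) {
    int sep = token.indexOf(':');
    if (sep < 0) {
      throw new IllegalArgumentException("malformed cursor: " + token);
    }
    // parseUnsignedInt accepts hex strings with the sign bit set (negative scores)
    int bits = Integer.parseUnsignedInt(token.substring(0, sep), 16);
    int doc = Integer.parseInt(token.substring(sep + 1));
    return new PageCursor(Float.intBitsToFloat(bits), doc);
  }
}
```

Usage: after rendering a page, put `new PageCursor(last.score, last.doc).encode()` (where `last` is the page's final `ScoreDoc`) into the next-page link, and decode it on the following request.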

At any rate, let's put up the patches and find out instead of debating.  I 
should have time today to do mine.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2215) paging collector

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850086#action_12850086
 ] 

Shai Erera commented on LUCENE-2215:


Sure, let's wait for the patch and some perf. results.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849639#action_12849639
 ] 

Michael McCandless commented on LUCENE-2215:


This is a neat collector!

I like the idea of chaining/filtering... couldn't we put this in core
(under TFC/TSDC.create), but instead of doubling the 12 specialized
(anonymous) impls we now have, just delegate?

I.e., we'd make a FilteredCollector, taking another collector when it's
created, and then on every collect call, only if the hit is weak
enough (i.e., worse than what the app provided as prev low score/doc)
would it forward it to the delegate?  I guess we should test perf w/
(the new additions to benchmark -- yay!) to see if specializing the
code (even anonymously) is warranted.
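A FilteredCollector along these lines could look roughly like the sketch below. To stay self-contained it uses a simplified, hypothetical `HitSink` interface instead of Lucene's real `Collector` API (which also involves `setScorer`, `setNextReader` and `acceptsDocsOutOfOrder`), and it assumes the HitQueue ordering discussed in this thread: higher score first, ties going to the lower doc id.

```java
/** Simplified stand-in for a collector; NOT Lucene's Collector API. */
interface HitSink {
  void collect(int doc, float score);
}

/**
 * Hypothetical filtering wrapper: forwards a hit only if it ranks strictly
 * after the last hit the application already showed (prevScore, prevDoc).
 */
final class FilteredHitSink implements HitSink {
  private final HitSink delegate;
  private final float prevScore;
  private final int prevDoc;

  FilteredHitSink(HitSink delegate, float prevScore, int prevDoc) {
    this.delegate = delegate;
    this.prevScore = prevScore;
    this.prevDoc = prevDoc;
  }

  @Override
  public void collect(int doc, float score) {
    if (score > prevScore) {
      return; // ranked above the previous page's last hit: already returned
    }
    if (score == prevScore && doc <= prevDoc) {
      return; // tie on score, lower or equal doc id: already returned
    }
    delegate.collect(doc, score); // weaker hit: let the wrapped collector rank it
  }
}
```

Any top-N implementation can sit behind the wrapper, which is the appeal of delegation: the non-paging path is untouched, at the cost of one extra virtual call per hit on the paging path.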

The indent whitespace needs to be fixed to 2 spaces...


[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849843#action_12849843
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

Mike, don't you think, though, that with a fairly simple update of some of 
the clauses to short-circuit appropriately, we can just hook this into 
the existing collectors w/o any need for delegation or changes?  Let me try 
a patch.  Now that the benchmark stuff is in, we should be able to test.


[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849851#action_12849851
 ] 

Uwe Schindler commented on LUCENE-2215:
---

Hey, and I want to fix the NaN thing in TSDC: LUCENE-2271

Maybe when we delegate, we can also use my cool code that switches the delegate 
to remove one comparison after the queue is full.
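Uwe's LUCENE-2271 code is not reproduced here, but the general idea of switching delegates once the queue is full can be sketched as follows: while the queue is still filling, every hit is inserted without comparing against the bottom; once it holds `numHits` entries, collection switches (once) to a strategy that first checks the current weakest hit. All names are hypothetical, and `java.util.PriorityQueue` stands in for Lucene's own queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector that swaps its per-hit strategy once full. */
final class SwitchingTopN {
  static final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  private interface Strategy { void collect(int doc, float score); }

  // Weakest first: lower score is weaker; on equal scores the higher doc id is weaker.
  private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
    int c = Float.compare(a.score, b.score);
    return c != 0 ? c : Integer.compare(b.doc, a.doc);
  };

  private final int numHits;
  private final PriorityQueue<Hit> queue;
  private Strategy strategy;

  SwitchingTopN(int numHits) {
    if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
    this.numHits = numHits;
    this.queue = new PriorityQueue<>(numHits, WEAKEST_FIRST);
    this.strategy = this::collectWhileFilling;
  }

  void collect(int doc, float score) {
    strategy.collect(doc, score);
  }

  // Filling phase: no comparison against the bottom is needed.
  private void collectWhileFilling(int doc, float score) {
    queue.add(new Hit(doc, score));
    if (queue.size() == numHits) {
      strategy = this::collectWhenFull; // switch exactly once
    }
  }

  // Full phase: replace the bottom only if the new hit outranks it.
  private void collectWhenFull(int doc, float score) {
    Hit bottom = queue.peek();
    if (score > bottom.score || (score == bottom.score && doc < bottom.doc)) {
      queue.poll();
      queue.add(new Hit(doc, score));
    }
  }

  /** Retained hits, best first. */
  List<Hit> results() {
    List<Hit> out = new ArrayList<>(queue);
    out.sort(WEAKEST_FIRST.reversed());
    return out;
  }
}
```

The design point is that "is the queue full yet?" is answered once, by the swap, instead of being re-checked for every hit.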

[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849863#action_12849863
 ] 

Michael McCandless commented on LUCENE-2215:


bq. ...through a fairly simple update of some of the clauses to appropriate 
short circuit things that we can just hook this into the existing collectors 
w/o any need for delegation or changes? Let me try a patch. Now that the 
benchmark stuff is in, we should be able to test.

This'd make me nervous...

I.e., I don't think we should insert bytecodes for the 99.9% of searches that 
wouldn't make use of this, even if we can't uncover a slowdown with 
benchmarking.

We should still benchmark it though (I'm curious)... we should also benchmark 
the delegate solution.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849961#action_12849961
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

Yeah, but one could make the argument, Mike, that the existing optimizations 
are useless for the most common case, since I think it's safe to say most 
applications implement paging.  Of course, that being said, most users don't 
page all that deeply.  Also, for something like Solr that prefetches the top 50 
it might not be good, either.  Still, in my mind it is one additional boolean 
check, as in:
{code}
if ((current stuff) || (pagingInfoPresent && paging check))
...
{code}

pagingInfoPresent can be determined at construction time, and that whole clause 
would short-circuit very quickly.

That being said, delegation could be done at construction time, too, and it 
separates things more cleanly.  I'll try to put up my version tomorrow.
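The single-flag variant Grant describes can be sketched the same way. Below, a hypothetical top-N collector fixes `pagingInfoPresent` in its constructor and folds the paging test into the skip condition with `&&`, so a non-paging search evaluates only the flag. It assumes the HitQueue ordering described later in the thread (higher score first, lower doc id on ties) and again uses `java.util.PriorityQueue` rather than Lucene's queue; whether the extra branch is measurable is exactly what the benchmarks discussed here would have to show.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector with the paging check inlined behind a final flag. */
final class InlinePagingTopN {
  private static final class ScoredDoc {
    final int doc;
    final float score;
    ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
  }

  // Weakest first: lower score, then higher doc id on ties.
  private static final Comparator<ScoredDoc> WEAKEST_FIRST = (a, b) -> {
    int c = Float.compare(a.score, b.score);
    return c != 0 ? c : Integer.compare(b.doc, a.doc);
  };

  private final int numHits;
  private final boolean pagingInfoPresent; // fixed at construction time
  private final float afterScore;          // last hit of the previous page
  private final int afterDoc;
  private final PriorityQueue<ScoredDoc> queue;

  /** First page: no paging info. */
  InlinePagingTopN(int numHits) {
    this(numHits, false, 0f, -1);
  }

  /** Later pages: collect only hits ranked after (afterScore, afterDoc). */
  InlinePagingTopN(int numHits, float afterScore, int afterDoc) {
    this(numHits, true, afterScore, afterDoc);
  }

  private InlinePagingTopN(int numHits, boolean paging, float afterScore, int afterDoc) {
    this.numHits = numHits;
    this.pagingInfoPresent = paging;
    this.afterScore = afterScore;
    this.afterDoc = afterDoc;
    this.queue = new PriorityQueue<>(numHits, WEAKEST_FIRST);
  }

  void collect(int doc, float score) {
    boolean full = queue.size() == numHits;
    // Shape of: if ((current stuff) || (pagingInfoPresent && paging check)) skip.
    // With pagingInfoPresent == false, && short-circuits before the paging check.
    if ((full && !beatsBottom(doc, score)) || (pagingInfoPresent && alreadyShown(doc, score))) {
      return;
    }
    if (full) {
      queue.poll(); // evict the current weakest hit
    }
    queue.add(new ScoredDoc(doc, score));
  }

  private boolean beatsBottom(int doc, float score) {
    ScoredDoc bottom = queue.peek();
    return score > bottom.score || (score == bottom.score && doc < bottom.doc);
  }

  // Ranked at or above the previous page's last hit, so it was already returned.
  private boolean alreadyShown(int doc, float score) {
    return score > afterScore || (score == afterScore && doc <= afterDoc);
  }

  /** Doc ids of the retained hits, best first. */
  List<Integer> docs() {
    List<ScoredDoc> hits = new ArrayList<>(queue);
    hits.sort(WEAKEST_FIRST.reversed());
    List<Integer> out = new ArrayList<>();
    for (ScoredDoc h : hits) out.add(h.doc);
    return out;
  }
}
```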

[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850002#action_12850002
 ] 

Shai Erera commented on LUCENE-2215:


bq. since I think it's safe to say most applications implement paging

Let's be careful about the semantics here, Grant. Most if not all applications 
implement paging indeed, but I believe only FEW actually store user contexts 
between searches. PagingCollector relies on the application to store the lowest 
ranking doc that was returned previously, which means storing context between 
a user's searches.

I agree w/ Mike's statement that 99.9% of the searches would never run that 
code, which is why I've proposed a delegation/wrapper approach from the 
beginning. I also think that we should make some allowances here and there for 
the uncommon case, and favor better software design over specialized 
code. A Collector filter approach for some rare (or even less common) cases 
seems very reasonable to me.

Also, I think that if we add to TSDC a create method which takes into account 
the previously scored lowest doc, it will confuse people. Now they will need to 
think "where do I get this low score from?" But perhaps after I see the code, 
it won't seem like such a bad thing. I just have a feeling TSDC and TFC should be 
left on their own, and extreme paging stuff should either be its own 
specialized collector or a wrapper.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849193#action_12849193
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

{quote}
I must admit I don't like throwing UOE. I imagine the naive user calling one of 
these and getting hit w/ UOE out of nowhere, really :). Perhaps it's a sign 
PagingCollector should not be a sub-class of TopDocsCollector? It does not 
benefit from it in any way because it overrides all the main methods, impls 
them or throws UOE for those it doesn't like. So perhaps it should just be a 
TopScorePagingCollector which copies some of the functionality of TSDC, but is 
not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the 
rest don't make any sense).
{quote}

I agree, I'm not a huge fan of it either, but it is bad form to call them when using 
this collector, and I'd rather people learn that up front.  Like I said in the 
last comment, I think we'd be better off trying to integrate this into a lower 
level and not even having a special collector.  If we just added a create 
option that took in the necessary info, then we could just mod the existing 
collectors, possibly.  Then those two topDocs methods could just be 
deprecated/removed.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849200#action_12849200
 ] 

Shai Erera commented on LUCENE-2215:


So what's the motivation for declaring PagingCollector a TopDocsCollector? Do 
you envision someone requesting a TopDocsCollector without caring whether it's 
TSDC, TFC or PagingCollector? I would rather have it extend TDC directly, and then 
you won't need to throw UOE for the rest of the methods ...

What about renaming it to TopScorePagingCollector?

[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849207#action_12849207
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

I'm saying PagingColl. doesn't even exist and it is just folded into the two 
existing In/Out Collectors with a new create() method that knows when it's 
paging and when it's not.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849216#action_12849216
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

{quote}
The only complication I see is that if we want to make it extremely efficient, 
we'll need to double the number of Collector impls for TSDC and TFC (the 
internal instances that are created) ... 
{quote}

I'm not convinced yet.  I think we can likely make it short circuit quite fast 
for the non-paging case, but rather than guess, let's benchmark.  I'm 
extracting my Benchmark collector stuff right now on LUCENE-2343.  I also am 
not sure we need to double the number of collectors.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848842#action_12848842
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

I think in order to properly implement this, topDocs() needs to be non-final; 
otherwise there are some oddities in initializing a PQ with more results than are 
available once paging.  Updated patch shortly.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848896#action_12848896
 ] 

Shai Erera commented on LUCENE-2215:


I've reviewed PagingCollector.java and the first thing I have to say about it 
is that I really like it! :) It saves lots of unnecessary heapify code, if the 
application can afford to store the last (lowest) SD.

I have a few comments/questions.

I don't understand what getLastScoreDoc is for. Is it just a utility method? Is 
it something the app can compute by itself? Anyway, it lacks javadocs, so 
perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
  // if the scores are the same and the doc is less than or equal to the
  // previous pass lowest hit doc then skip because this collector favors
  // lower number documents.
  return;

I think there's a typo in the comment "favors lower number documents", while 
it seems to prefer higher doc IDs? The way I understand it, regardless of 
whether docs are collected in or out of order, HitQueue ensures that when scores 
are equal, the lowest IDs are favored. Thus the first round always keeps the 
lowest IDs among the docs whose scores match. The next round will favor the 
docs whose IDs come next, and so forth ... am I right? (just clarifying my 
understanding).
If that's the case, I think it would be good to spell it out in the comment, 
and also mention that it means the document has already been returned 
previously (as is documented in the previous 'if').
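The ordering Shai describes can be checked with a small in-memory sketch: rank hits by score descending and doc id ascending, treat anything that ranks at or above the previous page's last hit as already returned, and page through a list with many tied scores. `TiePaging`, `Hit`, `RANK` and `pageAfter` are hypothetical names; sorting the whole candidate list stands in for the bounded queue a real collector would use, since the point here is only which hits land on which page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TiePaging {
  static final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  /** HitQueue-style rank: higher score first; on equal scores, lower doc id first. */
  static final Comparator<Hit> RANK = (a, b) -> {
    int c = Float.compare(b.score, a.score);
    return c != 0 ? c : Integer.compare(a.doc, b.doc);
  };

  /**
   * Next page of at most k hits ranked strictly after last
   * (the previous page's final hit), or the first page if last is null.
   */
  static List<Hit> pageAfter(List<Hit> hits, Hit last, int k) {
    List<Hit> candidates = new ArrayList<>();
    for (Hit h : hits) {
      // RANK.compare(h, last) <= 0: h ranks at or above last, so it was already returned
      if (last == null || RANK.compare(h, last) > 0) {
        candidates.add(h);
      }
    }
    candidates.sort(RANK);
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }
}
```

Because ties are broken by doc id, the (score, doc) pair identifies a unique position in the ranking, so as long as the index does not change between requests, consecutive pages neither overlap nor skip a tied document.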

The last 'else' really looks like TSDC's out-of-order version, which makes me 
wonder whether PagingCollector can be viewed as a filter on top of TSDC (and 
possibly even TopFieldCollector)? So if a hit should be collected, it just 
calls super.collect? I realize though that a Collector is a hotspot and we want 
to minimize 'if' statements, let alone method calls, as much as possible. But it 
just feels so strongly like it should be a filter ... :). And you wouldn't need 
to specifically handle in-order vs. out-of-order collection ... and w/ the right design, it can 
also wrap a TFC or any other TDC implementation ...

BTW, I've noticed that you don't track maxScore - is it assumed that the 
application stores it from the first round? If so, I'd document it, because the 
application needs to know it should use TSDC for the first round, and 
PagingCollector from the second round on.

Also, PagingCollector offers a ctor which does not force the application to 
pass in a ScoreDoc. See my comment from above - it might be misleading, because 
if you use this collector right from the very first search, you lose the 
maxScore tracking. I also don't see why it should be allowed - if a dummy 
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. 
I think this collector should be used only from the second round, and a single 
ctor which forces a ScoreDoc to be passed would make more sense. If the 
application wishes to shoot itself in the foot (performance-wise), it can pass a 
dummy SD itself.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848904#action_12848904
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

bq. BTW, I've noticed that you don't track maxScore

Good point.  I think we probably should track it, so that the PagingColl could 
be used right from the get go.

We might also consider deprecating the topDocs() methods that take in 
parameters and think about how the paging collector might be integrated at a 
lower level in the other collectors, such that one doesn't even have to think 
about calling a diff. collector.

[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848908#action_12848908
 ] 

Shai Erera commented on LUCENE-2215:


I must admit I don't like throwing UOE. I imagine the naive user calling one of 
these and getting hit w/ UOE out of nowhere, really :). Perhaps it's a sign 
PagingCollector should not be a sub-class of TopDocsCollector? It does not 
benefit from it in any way because it overrides all the main methods, impls 
them or throws UOE for those it doesn't like. So perhaps it should just be a 
TopScorePagingCollector which copies some of the functionality of TSDC, but is 
not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the 
rest don't make any sense).

Notice the different name I propose - to make it clear it's a collector that 
can be used for paging through a scored list of results.

BTW, I liked that the if/else clauses were separated, b/c you could include 
meaningful documentation for each. Right now those are just very long lines.

About in-order, I think the only thing you will save is the last 'else'. Read 
my comment above about wrapping TSDC ... not sure about it, but it would make it 
more elegant.

I'll review the rest of the patch. I haven't yet understood what PagingIterable 
is for ...

[jira] Commented: (LUCENE-2215) paging collector

2010-02-12 Thread jm (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833024#action_12833024
 ] 

jm commented on LUCENE-2215:


Kudos Aaron, this is cool for what I need. 

I just integrated it into my project, and upgraded to 3.0 just to get this in. 
But I am having an issue in my first test:

java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:96)
    at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:67)
    at org.apache.lucene.search.PagingCollector.<init>(PagingCollector.java:43)
    at org.apache.lucene.search.PagingCollector.<init>(PagingCollector.java:39)
    at org.apache.lucene.search.IterablePaging$PagingIterator.search(IterablePaging.java:158)
    at org.apache.lucene.search.IterablePaging$PagingIterator.<init>(IterablePaging.java:151)
    at org.apache.lucene.search.IterablePaging.iterator(IterablePaging.java:140)
    at ...CombinedLuceneDBStep.proceed(CombinedLuceneDBStep.java:71)

I use it like this:

    MultiSearcher ms = new MultiSearcher(indexes);
    TotalHitsRef totalHitsRef = new TotalHitsRef();
    ProgressRef progressRef = new ProgressRef();
    IterablePaging paging = new IterablePaging(ms, lucquery, NB_LUCENE_HITS_PER_BATCH);

I have no clue where the issue lies. I am using MultiSearcher, and norms are 
disabled. Or maybe I broke something while upgrading to 3.0... I got the files 
as of Feb 11th.





[jira] Commented: (LUCENE-2215) paging collector

2010-02-12 Thread javi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833041#action_12833041
 ] 

javi commented on LUCENE-2215:
--

Disregard my previous comment. Some refactoring in my codebase to get this in 
left NB_LUCENE_HITS_PER_BATCH uninitialized. So far it is working sweetly; I 
will report back when I finish my tests.

Thanks
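
javi's diagnosis matches a common Java pitfall: a non-final int field that is never assigned silently holds 0, and that value presumably ends up as the size of the collector's HitQueue. A minimal fail-fast guard is sketched below. The field declaration and checkedBatchSize are hypothetical stand-ins for the application's code; the check would go wherever the batch size is read, before IterablePaging is constructed.

```java
// Hypothetical sketch: NB_LUCENE_HITS_PER_BATCH mirrors the name in the comment
// above, but this declaration and the guard are illustrative only.
class BatchSizeSketch {

    // A non-final field that is never assigned keeps Java's default value, 0.
    static int NB_LUCENE_HITS_PER_BATCH;

    /** Rejects a batch size that could never return a hit. */
    static int checkedBatchSize(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("hits per batch must be > 0, got " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(NB_LUCENE_HITS_PER_BATCH); // prints 0
        try {
            checkedBatchSize(NB_LUCENE_HITS_PER_BATCH);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "hits per batch must be > 0, got 0"
        }
    }
}
```

Failing at configuration time turns an ArrayIndexOutOfBoundsException deep inside the collector into an error that names the misconfigured value.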





[jira] Commented: (LUCENE-2215) paging collector

2010-01-19 Thread Adam Heinz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802276#action_12802276
 ] 

Adam Heinz commented on LUCENE-2215:


Awesome, thanks!  I'll schedule some time in the coming week to patch our dev 
installation and sic some QA guys on it.

