[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850072#action_12850072 ]

Grant Ingersoll commented on LUCENE-2215:

bq. Let's be careful about the semantics here Grant. Most if not all applications implement paging indeed, but I believe only FEW actually store user contexts between searches. PagingCollector relies on the application to store the lowest ranking doc that was returned previously, which means storing context between user's searches.

I think, assuming the math plays out, that once you show the gains to be had here, esp. for deep paging, storing an int and a float is trivial. If they are implementing paging, they are already keeping state about what page they are on.

bq. Now they will need to think where do I get this low score from?

Sorry, but if that is that hard to figure out, then I don't see how they have any business writing a Lucene application to begin with. A simple javadoc that says these two values are taken from the last result of the previously seen page should do the trick.

At any rate, let's put up the patches and find out instead of debating. I should have time today to do mine.

paging collector

Key: LUCENE-2215
URL: https://issues.apache.org/jira/browse/LUCENE-2215
Project: Lucene - Java
Issue Type: New Feature
Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
Attachments: IterablePaging.java, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java

http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898

Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
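The "int and a float" that the comment above refers to can be carried between requests in a small token. The following is a minimal sketch under stated assumptions: `PageCursor` and its token format are hypothetical and are not part of the attached patch or of Lucene. The score is stored as raw bits so that it round-trips exactly.

```java
// Hypothetical holder for the last hit of the previous page (doc ID + score).
final class PageCursor {
  final int doc;
  final float score;

  PageCursor(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }

  // Encodes the cursor as "<doc>:<score bits in hex>" so the float survives
  // a round trip without any decimal rounding.
  String encode() {
    return doc + ":" + Integer.toHexString(Float.floatToIntBits(score));
  }

  // Parses a token produced by encode().
  static PageCursor decode(String token) {
    int sep = token.indexOf(':');
    int doc = Integer.parseInt(token.substring(0, sep));
    int bits = Integer.parseUnsignedInt(token.substring(sep + 1), 16);
    return new PageCursor(doc, Float.intBitsToFloat(bits));
  }
}
```

An application could hand such a token back to the client (for example as a request parameter) rather than keeping per-user state on the server; whether that fits depends on the application.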
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850086#action_12850086 ]

Shai Erera commented on LUCENE-2215:

Sure, let's wait for the patch and some perf. results.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849639#action_12849639 ]

Michael McCandless commented on LUCENE-2215:

This is a neat collector! I like the idea of chaining/filtering... couldn't we put this in core (under TFC/TSDC.create), but instead of doubling the 12 specialized (anonymous) impls we now have, just delegate? Ie, we'd make a FilteredCollector, taking another collector when it's created, and then on every collect call, only if the hit is weak enough (ie is worse than what the app provided as prev low score/doc) would it forward it to the delegate?

I guess we should test perf w/ (the new additions to benchmark -- yay!) to see if specializing the code (even anonymously) is warranted.

The indent whitespace needs to be fixed to 2 spaces...
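The delegating idea described above can be sketched without touching the specialized collectors. This is a minimal sketch, not Lucene's API: `HitSink`, `PagingFilter` and `CountingSink` are hypothetical stand-ins for a Collector, the proposed FilteredCollector and its delegate. The tie-break assumes that, among equal scores, lower doc IDs were returned on the earlier page.

```java
// Hypothetical stand-in for a Lucene Collector; not the real interface.
interface HitSink {
  void collect(int doc, float score);
}

// Forwards a hit to the delegate only if it ranks below the last hit
// of the previous page (prev low score/doc supplied by the application).
final class PagingFilter implements HitSink {
  private final HitSink delegate;
  private final int prevLowDoc;
  private final float prevLowScore;

  PagingFilter(HitSink delegate, int prevLowDoc, float prevLowScore) {
    this.delegate = delegate;
    this.prevLowDoc = prevLowDoc;
    this.prevLowScore = prevLowScore;
  }

  @Override
  public void collect(int doc, float score) {
    // Weak enough: a lower score, or the same score with a higher doc ID
    // (assumed tie-break order).
    if (score < prevLowScore || (score == prevLowScore && doc > prevLowDoc)) {
      delegate.collect(doc, score);
    }
  }
}

// Trivial delegate that only records what it receives.
final class CountingSink implements HitSink {
  int count;
  int lastDoc = -1;

  @Override
  public void collect(int doc, float score) {
    count++;
    lastDoc = doc;
  }
}
```

The cost of this design is one extra method call per forwarded hit, which is the trade-off the benchmark mentioned above would have to measure.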
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849843#action_12849843 ]

Grant Ingersoll commented on LUCENE-2215:

Mike, don't you think, though, that through a fairly simple update of some of the clauses to appropriately short circuit things, we can just hook this into the existing collectors w/o any need for delegation or changes? Let me try a patch. Now that the benchmark stuff is in, we should be able to test.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849851#action_12849851 ]

Uwe Schindler commented on LUCENE-2215:

Hey, and I want to fix the NaN thing in TSDC: LUCENE-2271. Maybe when we delegate, we can also use my cool code that switches the delegate to remove on comparison after the queue is full.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849863#action_12849863 ]

Michael McCandless commented on LUCENE-2215:

bq. ...through a fairly simple update of some of the clauses to appropriately short circuit things, we can just hook this into the existing collectors w/o any need for delegation or changes? Let me try a patch. Now that the benchmark stuff is in, we should be able to test.

This'd make me nervous... Ie I don't think we should insert bytecodes for the 99.9% of searches that wouldn't make use of this, even if we can't uncover a slowdown with benchmarking.

We should still benchmark it though (I'm curious)... we should also benchmark the delegate solution.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849961#action_12849961 ]

Grant Ingersoll commented on LUCENE-2215:

Yeah, but one could make the argument, Mike, that the existing optimizations are useless for the most common case, since I think it's safe to say most applications implement paging. Of course, that being said, most users don't page all that deeply. Also, for something like Solr that prefetches the top 50 it might not be good, either.

Still, in my mind it is one additional boolean check, as in:

{code}
if ( (current stuff) || (pagingInfoPresent == true && paging check) ) ...
{code}

pagingInfoPresent can be determined at construction time and that whole clause would be short circuited very quickly.

That being said, delegation could be done at construction time, too, and more cleanly separates things. I'll try to put up my version tomorrow.
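The inline check in the pseudocode above could look roughly like the following. This is a sketch under stated assumptions, not the patch itself: `InlinePagingCollector`, `queueFloor` and `alreadyReturned` are hypothetical names, `queueFloor` stands in for the score at the top of the collector's priority queue ("current stuff"), and no real queue is maintained here.

```java
// Hypothetical collector with the paging test folded into the existing
// "is this hit competitive?" check.
final class InlinePagingCollector {
  private final boolean pagingInfoPresent; // fixed at construction time
  private final int prevLowDoc;
  private final float prevLowScore;
  // Stand-in for pqTop.score; a real collector would update it from its queue.
  private float queueFloor = Float.NEGATIVE_INFINITY;
  int collected;

  // Non-paging case: the extra clause short circuits on the boolean.
  InlinePagingCollector() {
    this.pagingInfoPresent = false;
    this.prevLowDoc = -1;
    this.prevLowScore = Float.NaN;
  }

  // Paging case: the application passes the last hit of the previous page.
  InlinePagingCollector(int prevLowDoc, float prevLowScore) {
    this.pagingInfoPresent = true;
    this.prevLowDoc = prevLowDoc;
    this.prevLowScore = prevLowScore;
  }

  void setQueueFloor(float floor) {
    this.queueFloor = floor;
  }

  void collect(int doc, float score) {
    // (current stuff) || (pagingInfoPresent && paging check)
    if (score <= queueFloor || (pagingInfoPresent && alreadyReturned(doc, score))) {
      return;
    }
    collected++; // a real collector would insert into its priority queue here
  }

  // True if the hit ranks at or above the last hit of the previous page,
  // assuming lower doc IDs win ties.
  private boolean alreadyReturned(int doc, float score) {
    return score > prevLowScore || (score == prevLowScore && doc <= prevLowDoc);
  }
}
```

Whether the extra branch is measurable for the non-paging case is exactly what the thread proposes to benchmark; this sketch makes no claim either way.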
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850002#action_12850002 ]

Shai Erera commented on LUCENE-2215:

bq. since I think it's safe to say most applications implement paging

Let's be careful about the semantics here Grant. Most if not all applications implement paging indeed, but I believe only FEW actually store user contexts between searches. PagingCollector relies on the application to store the lowest ranking doc that was returned previously, which means storing context between user's searches.

I agree w/ Mike's statement that 99.9% of the searches would never run that code, which is why I've proposed a delegation/wrapper approach from the beginning. I also think that we should make some allowances here and there, for the non-common case, and introduce better software design rather than specialized code. A Collector filter approach for some rare (or even less common) cases seems very reasonable to me.

Also, I think that if we add to TSDC a create method which takes into account the previously scored lowest doc, it will confuse people. Now they will need to think where do I get this low score from? - but perhaps after I see the code, it wouldn't be such a bad thing. I just have a feeling TSDC and TFC should be left on their own, and extreme paging stuff should either be its own specialized collector, or a wrapper.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849193#action_12849193 ]

Grant Ingersoll commented on LUCENE-2215:

{quote}
I must admit I don't like throwing UOE. I imagine the naive user calling one of these and hit w/ UOE out of nowhere really. Perhaps it's a sign PagingCollector should not be a sub-class of TopDocsCollector? It does not benefit from it in any way because it overrides all the main methods, impls them or throws UOE for those it doesn't like. So perhaps it should just be a TopScorePagingCollector which copies some of the functionality of TSDC, but is not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the rest don't make any sense).
{quote}

I agree, not a huge fan of it either, but it is bad form to call it when using this collector and I'd rather people learn that up front.

Like I said in the last comment, I think we'd be better off trying to integrate this at a lower level and not even having a special collector. If we just added a create option that took in the necessary info, then we could just mod the existing collectors, possibly. Then those two topDocs methods could just be deprecated/removed.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849200#action_12849200 ]

Shai Erera commented on LUCENE-2215:

So what's the motivation for declaring PagingCollector a TopDocsCollector? Would you envision one requesting a TopDocsCollector without caring whether it's TSDC, TFC or PagingCollector? I would rather have it extend TDC directly, and then you won't need to throw UOE for the rest of the methods ...

What about renaming it to TopScorePagingCollector?
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849207#action_12849207 ]

Grant Ingersoll commented on LUCENE-2215:

I'm saying PagingColl. doesn't even exist and it is just folded into the two existing In/Out Collectors with a new create() method that knows when it's paging and when it's not.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849216#action_12849216 ]

Grant Ingersoll commented on LUCENE-2215:

{quote}
The only complication I see is that if we want to make it extremely efficient, we'll need to double the number of Collector impls for TSDC and TFC (the internal instances that are created) ...
{quote}

I'm not convinced yet. I think we can likely make it short circuit quite fast for the non-paging case, but rather than guess, let's benchmark. I'm extracting my Benchmark collector stuff right now on LUCENE-2343. I also am not sure we need to double the number of collectors.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848842#action_12848842 ]

Grant Ingersoll commented on LUCENE-2215:

I think in order to properly implement this, topDocs() needs to be non-final, otherwise there are some oddities in initing a PQ with more results than are available once paging. Updated patch shortly.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896 ]

Shai Erera commented on LUCENE-2215:

I've reviewed PagingCollector.java and the first thing I have to say about it is that I really like it! :) Saves lots of unnecessary heapify code, if the application can allow itself to store the lowest last SD.

I have a few comments/questions.

I don't understand what getLastScoreDoc is for? Is it just a utility method? Is it something the app can compute by itself? Anyway, it lacks javadocs, so perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:

{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
  // if the scores are the same and the doc is less than or equal to the
  // previous pass lowest hit doc then skip because this collector favors
  // lower number documents.
  return;
{code}

I think there's a typo in the comment "favors lower number documents" .. while it seems to prefer higher doc IDs? The way I understand it, regardless of whether docs are collected in/out of order, HitQueue ensures that when scores are equal, the lowest IDs are favored. Thus the first round always keeps the lowest IDs among the docs whose scores match. The next round will favor the docs whose IDs come next, and so forth ... am I right? (just clarifying my understanding). If that's the case, I think it'll be good if it's spelled out in the comment, and also mention that it means that document has already been returned previously (like it's documented in the previous 'if').

The last 'else' really looks like TSDC's out-of-order version, which makes me think whether PagingCollector can be viewed as a filter on top of TSDC (and possibly even TopFieldCollector)? So if a hit should be collected, it just calls super.collect? I realize though that a Collector is a hotspot and we want to minimize 'if' let alone method call statements as much as possible. But it just feels so strong that it should be a filter ... :). And you wouldn't need to specifically handle in/out orderness ... and w/ the right design, it can also wrap a TFC or any other TDC implementation ...

BTW, I've noticed that you don't track maxScore - is it assumed that the application stores it from the first round? If so I'd document it, because the application needs to know it should use TSDC the first round, and PagingCollector the second round.

Also, PagingCollector offers a ctor which does not force the application to pass in a ScoreDoc. See my comment from above - it might be misleading, because if you use this collector right from the very first search, you lose the maxScore tracking. I also don't see why it should be allowed - if a dummy previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. I think this collector should be used only from the second round, and a single ctor which forces a ScoreDoc to be passed would make more sense. If the application wishes to shoot itself in the leg (performance-wise), it can pass a dummy SD itself.
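The tie-break reading in the review above (higher score first, then lower doc ID, with each round continuing after the last hit of the previous one) can be checked with a small simulation. This is a sketch for illustration only: `Hit` and `TieBreakPaging` are hypothetical, and a full sort is used instead of a priority queue purely to make the ordering visible.

```java
// Hypothetical (doc, score) pair standing in for a ScoreDoc.
final class Hit {
  final int doc;
  final float score;

  Hit(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

final class TieBreakPaging {
  private TieBreakPaging() {}

  // Ranking order assumed from the discussion: higher score first,
  // then lower doc ID among equal scores.
  static int compare(Hit a, Hit b) {
    if (a.score != b.score) {
      return a.score > b.score ? -1 : 1;
    }
    return Integer.compare(a.doc, b.doc);
  }

  // Returns the next page: the best pageSize hits that rank strictly after
  // prev, or the top of the ranking when prev is null (first round).
  static java.util.List<Hit> nextPage(java.util.List<Hit> all, Hit prev, int pageSize) {
    java.util.List<Hit> candidates = new java.util.ArrayList<>();
    for (Hit h : all) {
      if (prev == null || compare(h, prev) > 0) {
        candidates.add(h);
      }
    }
    candidates.sort(TieBreakPaging::compare);
    int end = Math.min(pageSize, candidates.size());
    return new java.util.ArrayList<>(candidates.subList(0, end));
  }
}
```

Feeding the last hit of each page back in as `prev` yields disjoint pages regardless of the order in which the hits were supplied, which matches the in/out-of-order point made in the review.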
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848904#action_12848904 ]

Grant Ingersoll commented on LUCENE-2215:

bq. BTW, I've noticed that you don't track maxScore

Good point. I think we probably should track it, so that the PagingColl could be used right from the get go.

We might also consider deprecating the topDocs() methods that take in parameters and think about how the paging collector might be integrated at a lower level in the other collectors, such that one doesn't even have to think about calling a diff. collector.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848908#action_12848908 ]

Shai Erera commented on LUCENE-2215:

I must admit I don't like throwing UOE. I imagine the naive user calling one of these and hit w/ UOE out of nowhere really :). Perhaps it's a sign PagingCollector should not be a sub-class of TopDocsCollector? It does not benefit from it in any way because it overrides all the main methods, impls them or throws UOE for those it doesn't like. So perhaps it should just be a TopScorePagingCollector which copies some of the functionality of TSDC, but is not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the rest don't make any sense). Notice the different name I propose - to make it clear it's a collector that can be used for paging through a scored list of results.

BTW, I liked that the if/else clauses were separated, b/c you could include meaningful documentation for each. Right now those are just very long lines.

About in-order, I think the only thing you will save is the last 'else'. Read my comment above about wrapping TSDC ... not sure about it, but it will make it more elegant.

I'll review the rest of the patch. Didn't yet understand what PagingIterable is for ...
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833024#action_12833024 ]

jm commented on LUCENE-2215:

Kudos Aaron, this is cool for what I need. I just integrated in my project, upgraded to 3.0 just to get this in. But I am having an issue in my first test:

java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:96)
    at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:67)
    at org.apache.lucene.search.PagingCollector.<init>(PagingCollector.java:43)
    at org.apache.lucene.search.PagingCollector.<init>(PagingCollector.java:39)
    at org.apache.lucene.search.IterablePaging$PagingIterator.search(IterablePaging.java:158)
    at org.apache.lucene.search.IterablePaging$PagingIterator.<init>(IterablePaging.java:151)
    at org.apache.lucene.search.IterablePaging.iterator(IterablePaging.java:140)
    at ...CombinedLuceneDBStep.proceed(CombinedLuceneDBStep.java:71)

I use it like this:

    MultiSearcher ms = new MultiSearcher(indexes);
    TotalHitsRef totalHitsRef = new TotalHitsRef();
    ProgressRef progressRef = new ProgressRef();
    IterablePaging paging = new IterablePaging(ms, lucquery, NB_LUCENE_HITS_PER_BATCH);

I have no clue where the issue lies, I am using MultiSearcher, and norms are disabled. Or maybe I screwed up something while upgrading to 3.0... I got the files as of feb 11th.
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833041#action_12833041 ]

javi commented on LUCENE-2215:

Disregard my previous comment... There was some refactoring in my codebase to get this in and NB_LUCENE_HITS_PER_BATCH was uninitialized... so far it is working sweetly, I will report when I finish my tests.

thanks
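The failure reported earlier in the thread came from an uninitialized batch size reaching the queue constructor (presumably 0, the default for a Java int field). A minimal sketch of failing fast with a clearer message follows; `PageSizes.requirePositive` is a hypothetical helper, not part of Lucene or of the attached IterablePaging.java.

```java
// Hypothetical argument check for a hits-per-page value.
final class PageSizes {
  private PageSizes() {}

  // Rejects zero or negative page sizes before any queue is allocated,
  // and returns the value unchanged otherwise.
  static int requirePositive(int hitsPerPage) {
    if (hitsPerPage <= 0) {
      throw new IllegalArgumentException("hitsPerPage must be > 0, got " + hitsPerPage);
    }
    return hitsPerPage;
  }
}
```

A caller could wrap the constant at the call site, for example `PageSizes.requirePositive(NB_LUCENE_HITS_PER_BATCH)`, so that a misconfiguration surfaces as an IllegalArgumentException naming the parameter.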
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802276#action_12802276 ]

Adam Heinz commented on LUCENE-2215:

Awesome, thanks! I'll schedule some time in the coming week to patch our dev installation and sic some QA guys on it.