Re: Collector is collecting more than the specified hits
Sorry, searchAfter only works if you are sorting by score or by fields. It seems like you are sorting by docID? I.e., at first you want the top 100 hits sorted by docID, then the next 100, and so on?

If so, you could modify your collector so that you tell it up front the afterDocID (the last docID from the previous page). In your collect method, if the docID is <= afterDocID, don't collect it; once the number of hits collected == 100, throw a custom exception to abort the search. That does the same thing that searchAfter does for normal sorting.

Mike McCandless
http://blog.mikemccandless.com

On Mon, Feb 17, 2014 at 7:05 PM, saisantoshi saisantosh...@gmail.com wrote:
> As I mentioned in my original post, I am calling it like this:
>
>     MyCollector collector;
>     TopScoreDocCollector topScore = TopScoreDocCollector.create(firstIndex + numHits, true);
>     IndexSearcher searcher = new IndexSearcher(reader);
>     try {
>         collector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
>         searcher.search(query, (Filter) null, collector);
>     } finally {
>     }
>
> The searchAfter method does not take a collector. I want collector.collect(int doc) to be called only for the next set of hits, not from the start. If a request comes in for the first set, it would be like:
>
>     TopScoreDocCollector topScore = TopScoreDocCollector.create(0+100, true);
>
> and the collector should be called only for hits 1-100. For the next set:
>
>     TopScoreDocCollector topScore = TopScoreDocCollector.create(101+100, true);
>
> it should be called for hits 101-200, not 0-200.
>
> Thanks,
> Sai.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117901.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
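A minimal sketch of the afterDocID idea from the reply above, written without Lucene so that it runs on its own. `DocCollector`, `PageFullException`, `AfterDocIdCollector` and `AfterDocIdPagingDemo` are hypothetical names invented for illustration; `DocCollector` only stands in for the `collect(int doc)` callback of Lucene's `Collector`. It assumes hits arrive in increasing docID order. In a real Lucene 4.x collector you would probably also need to add the segment's docBase (from `setNextReader`) to get a global docID and return false from `acceptsDocsOutOfOrder()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the collect(int doc) callback of Lucene's Collector. */
interface DocCollector {
    void collect(int docID);
}

/** Unchecked exception used purely as control flow to abort the search loop. */
class PageFullException extends RuntimeException {
    PageFullException() {
        super(null, null, false, false); // no message, no stack trace needed
    }
}

/** Collects up to pageSize docIDs greater than afterDocID, then aborts. */
class AfterDocIdCollector implements DocCollector {
    private final int afterDocID; // last docID of the previous page, -1 for the first page
    private final int pageSize;
    private final List<Integer> hits = new ArrayList<>();

    AfterDocIdCollector(int afterDocID, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        this.afterDocID = afterDocID;
        this.pageSize = pageSize;
    }

    @Override
    public void collect(int docID) {
        if (docID <= afterDocID) {
            return; // already returned on an earlier page
        }
        hits.add(docID);
        if (hits.size() == pageSize) {
            throw new PageFullException(); // page is full: stop the search
        }
    }

    List<Integer> hits() {
        return hits;
    }
}

class AfterDocIdPagingDemo {
    /** Simulates a searcher calling collect() once per matching docID, in docID order. */
    static List<Integer> page(int[] matchingDocIDs, int afterDocID, int pageSize) {
        AfterDocIdCollector collector = new AfterDocIdCollector(afterDocID, pageSize);
        try {
            for (int docID : matchingDocIDs) {
                collector.collect(docID);
            }
        } catch (PageFullException expected) {
            // the collector asked us to stop early
        }
        return collector.hits();
    }

    public static void main(String[] args) {
        int[] matches = {2, 3, 5, 8, 13, 21, 34};
        List<Integer> first = page(matches, -1, 3);
        int lastDocID = first.get(first.size() - 1);
        List<Integer> second = page(matches, lastDocID, 3);
        System.out.println(first + " then " + second);
    }
}
```

Note that the documents of earlier pages are still visited; they are just rejected cheaply, and the search stops as soon as the current page is full.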
Re: Collector is collecting more than the specified hits
The above works fine, but how do I get the state of the *last docID*? Also, there will be multiple users accessing this, and we need to maintain the integrity of the last docID. Can we know the last docID from the collector's collect call?

Thanks,
Ranjith.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4118048.html
Re: Collector is collecting more than the specified hits
You look at the hits you got back, save the docID of the very last hit, and use that on the follow-on search to get the next page. This is how searchAfter works ... but you need to ensure you use the same searcher for follow-on requests; otherwise the docIDs are not comparable. E.g. use SearcherLifetimeManager.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Feb 18, 2014 at 11:14 AM, saisantoshi saisantosh...@gmail.com wrote:
> The above works fine, but how do I get the state of the *last docID*? Also, there will be multiple users accessing this, and we need to maintain the integrity of the last docID. Can we know the last docID from the collector's collect call?
>
> Thanks,
> Ranjith.
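One way to keep the last docID safe with multiple users is to hand each client its own immutable cursor that pairs the last docID with a token for the snapshot it came from. The sketch below only models that idea in plain Java: `PageCursor`, `SnapshotRegistry` and `CursorPagingDemo` are hypothetical names, and the registry merely imitates the record-a-token / acquire-by-token role that SearcherLifetimeManager plays in the reply above (as far as I know, its real `acquire` can also return null once a searcher has been pruned). A snapshot here is just a sorted array of matching docIDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Immutable per-client paging state: which snapshot was searched, where the last page ended. */
final class PageCursor {
    final long snapshotToken;
    final int lastDocID; // -1 before the first page

    PageCursor(long snapshotToken, int lastDocID) {
        this.snapshotToken = snapshotToken;
        this.lastDocID = lastDocID;
    }
}

/** Hypothetical token -> snapshot registry, standing in for SearcherLifetimeManager. */
final class SnapshotRegistry {
    private final Map<Long, int[]> snapshots = new ConcurrentHashMap<>();
    private final AtomicLong tokens = new AtomicLong();

    /** Remembers a snapshot (sorted matching docIDs) and returns its token. */
    long record(int[] sortedDocIDs) {
        long token = tokens.incrementAndGet();
        snapshots.put(token, sortedDocIDs.clone());
        return token;
    }

    /** Returns the snapshot for a token, or null if the token is unknown. */
    int[] acquire(long token) {
        return snapshots.get(token);
    }
}

final class CursorPagingDemo {
    /** Returns up to pageSize docIDs after cursor.lastDocID, read from the cursor's own snapshot. */
    static List<Integer> nextPage(SnapshotRegistry registry, PageCursor cursor, int pageSize) {
        int[] docIDs = registry.acquire(cursor.snapshotToken);
        if (docIDs == null) {
            throw new IllegalStateException("snapshot no longer available; restart from page 1");
        }
        List<Integer> page = new ArrayList<>();
        for (int docID : docIDs) {
            if (page.size() >= pageSize) {
                break;
            }
            if (docID > cursor.lastDocID) {
                page.add(docID);
            }
        }
        return page;
    }

    /** Builds the cursor for the client's next request from the page just returned. */
    static PageCursor advance(PageCursor cursor, List<Integer> page) {
        if (page.isEmpty()) {
            return cursor;
        }
        return new PageCursor(cursor.snapshotToken, page.get(page.size() - 1));
    }

    public static void main(String[] args) {
        SnapshotRegistry registry = new SnapshotRegistry();
        PageCursor cursor = new PageCursor(registry.record(new int[] {1, 4, 6, 9, 12}), -1);

        List<Integer> page1 = nextPage(registry, cursor, 2);
        cursor = advance(cursor, page1);

        // The index changes; new clients get a new snapshot, this client keeps its own.
        registry.record(new int[] {1, 6, 9, 12});

        List<Integer> page2 = nextPage(registry, cursor, 2);
        System.out.println(page1 + " then " + page2);
    }
}
```

Because the cursor travels with each request and is never mutated, two users paging at the same time cannot overwrite each other's last docID.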
Re: Collector is collecting more than the specified hits
There might be an issue with this approach: the document whose docID is saved might be deleted before the next call to search, and I am not sure whether that breaks the search functionality when it happens.

Thanks,
Ranjith.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4118096.html
Re: Collector is collecting more than the specified hits
As long as you guarantee your follow-on search uses the same searcher, then that's a non-issue. I.e., only a new searcher could see an index change like a new deletion.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Feb 18, 2014 at 2:32 PM, saisantoshi saisantosh...@gmail.com wrote:
> There might be an issue with this approach: the document whose docID is saved might be deleted before the next call to search, and I am not sure whether that breaks the search functionality when it happens.
>
> Thanks,
> Ranjith.
Re: Collector is collecting more than the specified hits
The collector is collecting all the documents. Let's say I have 50k documents and I want the collector to give me results for a given start and maxHits. Can we get this functionality from Lucene? For example, the very first time I want to collect hits 0-100, and the next time hits 100-200. What the collector seems to do right now is collect all 200 documents and give me the 100. Can the collector do this intelligently, by remembering the old search results and running only for the next 100?

Thanks,
Sai.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117858.html
Re: Collector is collecting more than the specified hits
This is exactly what searchAfter is for (deep paging).

Mike McCandless
http://blog.mikemccandless.com

On Mon, Feb 17, 2014 at 3:12 PM, saisantoshi saisantosh...@gmail.com wrote:
> The collector is collecting all the documents. Let's say I have 50k documents and I want the collector to give me results for a given start and maxHits. Can we get this functionality from Lucene? For example, the very first time I want to collect hits 0-100, and the next time hits 100-200. What the collector seems to do right now is collect all 200 documents and give me the 100. Can the collector do this intelligently, by remembering the old search results and running only for the next 100?
>
> Thanks,
> Sai.
Re: Collector is collecting more than the specified hits
Could you please elaborate on the above? I am not sure whether the collector is already doing this or whether I need to call some other API.

Thanks,
Sai.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117883.html
Re: Collector is collecting more than the specified hits
As I mentioned in my original post, I am calling it like this:

    MyCollector collector;
    TopScoreDocCollector topScore = TopScoreDocCollector.create(firstIndex + numHits, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        collector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
        searcher.search(query, (Filter) null, collector);
    } finally {
    }

The searchAfter method does not take a collector. I want collector.collect(int doc) to be called only for the next set of hits, not from the start. If a request comes in for the first set, it would be like:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(0+100, true);

and the collector should be called only for hits 1-100. For the next set:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(101+100, true);

it should be called for hits 101-200, not 0-200.

Thanks,
Sai.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117901.html
Re: Collector is collecting more than the specified hits
This is how Collector works: it is called for every document matching the query, and then its job is to choose which of those hits to keep. This is because in general the hits to keep can come at any time, not just the first N hits you see; e.g. the best scoring hit may be the very last one.

But if you have prior knowledge, e.g. that your index is already pre-sorted by the criteria that you sort by at query time, then indeed after seeing the first N hits you can stop; to do this you must throw your own exception, and catch it up above. See Lucene's TimeLimitingCollector for a similar example ...

Mike McCandless
http://blog.mikemccandless.com

On Fri, Feb 14, 2014 at 2:47 AM, saisantoshi saisantosh...@gmail.com wrote:
> The problem with the collector below is that the collect method does not stop after the numHits count has been reached. Is there a way to stop the collector from collecting docs after it has reached the specified numHits? For example:
>
>     TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, true);
>     // TopScoreDocCollector topScore = TopScoreDocCollector.create(30, true);
>
> I would expect the collector below to pause/exit after it has collected the specified numHits (in this case 30). But what's happening is that the collector collects all the docs, which delays searches. Can we configure the collect method below to stop after it has reached the specified numHits? Please let me know if there is any issue with the collector below.
>
>     public class MyCollector extends PositiveScoresOnlyCollector {
>         private IndexReader indexReader;
>
>         public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) {
>             super(topScore);
>             this.indexReader = indexReader;
>         }
>
>         @Override
>         public void collect(int doc) {
>             try {
>                 // Custom Logic
>                 super.collect(doc);
>             } catch (Exception e) {
>             }
>         }
>     }
>
>     // Usage:
>     MyCollector collector;
>     TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, true);
>     IndexSearcher searcher = new IndexSearcher(reader);
>     try {
>         collector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
>         searcher.search(query, (Filter) null, collector);
>     } finally {
>     }
>
> Thanks,
> Sai.
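The point made in this reply, that a top-N collector has to look at every hit because the best one may arrive last, can be illustrated in plain Java. This is only a sketch under stated assumptions: `TopNDemo`, `topNByScore` and `firstNInDocOrder` are hypothetical names, scores are handed over as a simple array in visiting order, and nothing here is Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNDemo {
    /**
     * Keeps the n highest scores. Every score must be inspected, because a
     * later hit can always displace the weakest hit kept so far.
     */
    static List<Float> topNByScore(float[] scoresInVisitOrder, int n) {
        List<Float> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap: head is the weakest kept hit
        for (float score : scoresInVisitOrder) {
            if (queue.size() < n) {
                queue.add(score);
            } else if (score > queue.peek()) {
                queue.poll();      // drop the weakest kept hit
                queue.add(score);  // keep the better one
            }
        }
        result.addAll(queue);
        Collections.sort(result, Collections.reverseOrder()); // best first
        return result;
    }

    /**
     * When hits already arrive in the desired result order (e.g. by docID),
     * the first n hits are final and the loop can stop early.
     */
    static List<Integer> firstNInDocOrder(int[] docIDsInOrder, int n) {
        List<Integer> result = new ArrayList<>();
        for (int docID : docIDsInOrder) {
            if (result.size() >= n) {
                break; // nothing later can change the answer
            }
            result.add(docID);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.1f, 0.5f, 1.5f}; // the best score comes last
        System.out.println(topNByScore(scores, 2));
        System.out.println(firstNInDocOrder(new int[] {3, 7, 8, 20, 31}, 2));
    }
}
```

Inside a real collector there is no loop of your own to break out of, which is why the reply suggests throwing a custom exception and catching it around the search call.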
Re: Collector is collecting more than the specified hits
I am not interested in the scores at all. My requirement is simple: I only need the first 100 hits, or the numHits I specify (irrespective of their scores). The collector should stop after collecting the specified numHits. Is there a way to tell the collector to stop after collecting numHits? Please correct me if I am wrong. I am trying to do the following:

    public void collect(int doc) throws IOException {
        // Stop passing hits to the delegate once maxHits have been collected.
        if (collector.getTotalHits() < maxHits) {
            delegate.collect(doc);
        }
    }

I have to write a separate collector extending Collector because I am not able to call getTotalHits() if I am using PositiveScoresOnlyCollector.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117441.html
Collector is collecting more than the specified hits
The problem with the collector below is that the collect method does not stop after the numHits count has been reached. Is there a way to stop the collector from collecting docs after it has reached the specified numHits? For example:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, true);
    // TopScoreDocCollector topScore = TopScoreDocCollector.create(30, true);

I would expect the collector below to pause/exit after it has collected the specified numHits (in this case 30). But what's happening is that the collector collects all the docs, which delays searches. Can we configure the collect method below to stop after it has reached the specified numHits? Please let me know if there is any issue with the collector below.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.PositiveScoresOnlyCollector;

    public class MyCollector extends PositiveScoresOnlyCollector {
        private IndexReader indexReader;

        public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) {
            super(topScore);
            this.indexReader = indexReader;
        }

        @Override
        public void collect(int doc) {
            try {
                // Custom Logic
                super.collect(doc);
            } catch (Exception e) {
                // any exception is swallowed here
            }
        }
    }

    // Usage:
    MyCollector collector;
    TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        collector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
        searcher.search(query, (Filter) null, collector);
    } finally {
    }

Thanks,
Sai.

--
View this message in context: http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329.html