Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless
Sorry, searchAfter only works if you are sorting by score or by fields.

It seems like you are sorting by docID?  I.e., at first you want the top
100 hits sorted by docID, then the next 100, etc.?

If so, you could just modify your collector so that you tell it up
front the afterDocID (= last docID from the previous page), and in
your collect method, if the docID is <= afterDocID, don't collect it,
and then if the number of hits collected == 100, throw a custom
exception to abort the search.  That will do the same thing that
searchAfter does for normal sorting.
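
The suggestion above can be sketched in plain Java. This is only a model
of the control flow, not Lucene code: the names AfterDocIdPaging,
PageCollector, PageFullException and searchPage are hypothetical, and the
int[] of matching docIDs stands in for the hits a real search would feed
to collect(). In a real Lucene 4.x Collector, collect(int doc) receives a
segment-relative docID, so the docBase from setNextReader would have to
be added before comparing against afterDocID.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the idea; no Lucene classes are used.
class AfterDocIdPaging {

    // Hypothetical exception whose only job is to abort the scan.
    static final class PageFullException extends RuntimeException {
        PageFullException() {
            super("page full", null, false, false); // skip stack-trace capture
        }
    }

    // Stand-in for a collector: collect() is called once per matching
    // docID, in increasing docID order.
    static final class PageCollector {
        private final int afterDocId;
        private final int pageSize;
        final List<Integer> hits = new ArrayList<>();

        PageCollector(int afterDocId, int pageSize) {
            if (pageSize < 1) {
                throw new IllegalArgumentException("pageSize must be >= 1");
            }
            this.afterDocId = afterDocId;
            this.pageSize = pageSize;
        }

        void collect(int docId) {
            if (docId <= afterDocId) {
                return; // already returned on an earlier page
            }
            hits.add(docId);
            if (hits.size() == pageSize) {
                throw new PageFullException(); // page is full: abort
            }
        }
    }

    // Stand-in for the search loop: feeds docIDs and catches the abort.
    static List<Integer> searchPage(int[] matchingDocIds, int afterDocId, int pageSize) {
        PageCollector collector = new PageCollector(afterDocId, pageSize);
        try {
            for (int docId : matchingDocIds) {
                collector.collect(docId);
            }
        } catch (PageFullException expected) {
            // normal early exit once the page is full
        }
        return collector.hits;
    }

    public static void main(String[] args) {
        int[] matches = {2, 5, 7, 11, 13, 17, 19};
        List<Integer> page1 = searchPage(matches, -1, 3);
        int lastDocId = page1.get(page1.size() - 1);
        List<Integer> page2 = searchPage(matches, lastDocId, 3);
        System.out.println(page1 + " " + page2); // prints [2, 5, 7] [11, 13, 17]
    }
}
```

Passing -1 as afterDocId requests the first page, since every docID is
greater than -1.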

Mike McCandless

http://blog.mikemccandless.com




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Collector is collecting more than the specified hits

2014-02-18 Thread saisantoshi
The above works fine, but how do I get the *last docID*? Also, multiple
users will be accessing this, and we need to maintain the integrity of
the last docID. Can we get the last docID from the collector's collect
call?

Thanks,
Ranjith.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4118048.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless
You look at the hits you got back, and save the docID of the very last
hit, and use that on the follow-on search to get the next page.
This is how searchAfter works ... but you need to ensure you use the
same searcher for follow-on requests; otherwise the docIDs are not
comparable.  E.g. use SearcherLifetimeManager.
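
A rough plain-Java model of this bookkeeping follows. It does not use
Lucene: SnapshotRegistry is a toy stand-in for the role that
SearcherLifetimeManager plays (its real record(IndexSearcher) call
returns a long token, and acquire(long) hands back the same searcher or
null once it has been pruned). The names PagingCursorSketch, Cursor, Page
and nextPage are hypothetical. The last docID travels with each request
as part of a cursor, so concurrent users never share it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class PagingCursorSketch {

    // What a client sends back to request the next page. Immutable.
    static final class Cursor {
        final long snapshotToken; // identifies the point-in-time view
        final int lastDocId;      // last docID of the previous page, -1 for none

        Cursor(long snapshotToken, int lastDocId) {
            this.snapshotToken = snapshotToken;
            this.lastDocId = lastDocId;
        }
    }

    // One page of hits plus the cursor for the page after it.
    static final class Page {
        final List<Integer> docIds;
        final Cursor next;

        Page(List<Integer> docIds, Cursor next) {
            this.docIds = docIds;
            this.next = next;
        }
    }

    // Toy stand-in for a searcher registry: each token maps to a frozen,
    // sorted array of matching docIDs.
    static final class SnapshotRegistry {
        private final Map<Long, int[]> snapshots = new ConcurrentHashMap<>();
        private final AtomicLong nextToken = new AtomicLong();

        long record(int[] matchingDocIds) {
            long token = nextToken.incrementAndGet();
            snapshots.put(token, matchingDocIds.clone());
            return token;
        }

        int[] acquire(long token) {
            return snapshots.get(token); // null if the token is unknown
        }
    }

    static Page nextPage(SnapshotRegistry registry, Cursor cursor, int pageSize) {
        int[] docs = registry.acquire(cursor.snapshotToken);
        if (docs == null) {
            throw new IllegalStateException("snapshot gone; restart from the first page");
        }
        List<Integer> hits = new ArrayList<>();
        for (int docId : docs) {
            if (docId > cursor.lastDocId) {
                hits.add(docId);
                if (hits.size() == pageSize) {
                    break;
                }
            }
        }
        // Save the docID of the very last hit for the follow-on request.
        int last = hits.isEmpty() ? cursor.lastDocId : hits.get(hits.size() - 1);
        return new Page(hits, new Cursor(cursor.snapshotToken, last));
    }

    public static void main(String[] args) {
        SnapshotRegistry registry = new SnapshotRegistry();
        long token = registry.record(new int[] {1, 4, 6, 9, 12});
        Page userA1 = nextPage(registry, new Cursor(token, -1), 2);
        Page userB1 = nextPage(registry, new Cursor(token, -1), 2);
        Page userA2 = nextPage(registry, userA1.next, 2);
        System.out.println(userA1.docIds + " " + userB1.docIds + " " + userA2.docIds);
    }
}
```

Because each cursor names the snapshot it was produced from, a
follow-on request is always answered against the same view of the index
that produced the saved docID.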

Mike McCandless

http://blog.mikemccandless.com





Re: Collector is collecting more than the specified hits

2014-02-18 Thread saisantoshi
There might be an issue with this approach: the docID that is saved
might be deleted before the next call to search, and I am not sure
whether that breaks the search functionality.

Thanks,
Ranjith.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4118096.html



Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless
As long as you guarantee your follow-on search uses the same searcher
then that's a non-issue.

I.e., only a new searcher could see an index change like a new deletion.

Mike McCandless

http://blog.mikemccandless.com





Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
The collector is collecting all the documents. Let's say I have 50k
documents and I want the collector to give me results for a given start
and maxHits. Can we get this functionality from Lucene? For example, the
very first time I want to collect hits 0-100, and the next time I want
to collect hits 100-200. What the collector seems to do right now is
collect all 200 documents and give me the last 100. Can the collector do
this intelligently, by remembering the old search results and running
only for the next 100?

Thanks,
Sai.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117858.html



Re: Collector is collecting more than the specified hits

2014-02-17 Thread Michael McCandless
This is exactly what searchAfter is for (deep paging).

Mike McCandless

http://blog.mikemccandless.com





Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
Could you please elaborate on the above? I am not sure whether the
collector is already doing this or whether I need to call another API.

Thanks,
Sai.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117883.html



Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
As I mentioned in my original post, I am calling it like this:

    MyCollector collector;
    // Ask for enough hits to cover every page up to the requested one.
    TopScoreDocCollector topScore =
        TopScoreDocCollector.create(firstIndex + numHits, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        collector = new MyCollector(indexReader,
            new PositiveScoresOnlyCollector(topScore));
        searcher.search(query, (Filter) null, collector);
    } finally {

    }

The searchAfter method does not take a collector. I want
collector.collect(int doc) to be called only for the next set of hits,
not from the start. If a request comes in for the first set, it would
be:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(0+100, true);

and the collector should be called only for hits 1-100. For the next
set:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(101+100, true);

it should be called for hits 101-200 and not 0-200.

Thanks,
Sai.







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117901.html



Re: Collector is collecting more than the specified hits

2014-02-14 Thread Michael McCandless
This is how Collector works: it is called for every document matching
the query, and then its job is to choose which of those hits to keep.

This is because in general the hits to keep can come at any time, not
just the first N hits you see; e.g. the best scoring hit may be the
very last one.
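
To make that point concrete, here is a small plain-Java sketch (no
Lucene; TopNByScoreSketch and topN are hypothetical names) of keeping
the N best scores with a min-heap. The heap can only be trusted after
the last score has been inspected, because any later hit may displace
one that was kept earlier.

```java
import java.util.PriorityQueue;

class TopNByScoreSketch {

    // Returns the n highest scores in descending order.
    // Every score has to be looked at, whatever n is.
    static float[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap: the weakest of the currently kept hits sits on top.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float score : scores) {
            if (heap.size() < n) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();      // drop the weakest kept hit
                heap.add(score);  // keep the better one instead
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // smallest first, so fill from the end
        }
        return out;
    }

    public static void main(String[] args) {
        // The best score (1.5) arrives last.
        float[] top = topN(new float[] {0.2f, 0.9f, 0.1f, 0.4f, 1.5f}, 2);
        System.out.println(top[0] + " " + top[1]); // prints 1.5 0.9
    }
}
```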

But if you have prior knowledge, e.g. that your index is already
pre-sorted by the criteria that you sort by at query time, then indeed
after seeing the first N hits you can stop; to do this you must throw
your own exception, and catch it up above.  See Lucene's
TimeLimitingCollector for a similar example ...
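
As a hedged illustration of the throw-and-catch pattern just described,
the plain-Java model below counts how many times collect() runs before
the abort. It is not Lucene code: EarlyTerminationSketch,
FirstNCollector, StopCollectingException and run are hypothetical
names, and the int[] stands in for hits arriving in the pre-sorted
order.

```java
class EarlyTerminationSketch {

    // Hypothetical exception used purely to stop the scan.
    static final class StopCollectingException extends RuntimeException {
        StopCollectingException() {
            super("enough hits", null, false, false); // no stack trace needed
        }
    }

    // Keeps the first maxHits docIDs it sees, then aborts.
    static final class FirstNCollector {
        private final int[] kept;
        int count;   // number of hits kept so far
        int visited; // number of times collect() was called

        FirstNCollector(int maxHits) {
            if (maxHits < 1) {
                throw new IllegalArgumentException("maxHits must be >= 1");
            }
            this.kept = new int[maxHits];
        }

        void collect(int docId) {
            visited++;
            kept[count++] = docId;
            if (count == kept.length) {
                throw new StopCollectingException();
            }
        }
    }

    // Stand-in for the code "up above" that runs the search and
    // catches the exception.
    static FirstNCollector run(int[] matchingDocIds, int maxHits) {
        FirstNCollector collector = new FirstNCollector(maxHits);
        try {
            for (int docId : matchingDocIds) {
                collector.collect(docId);
            }
        } catch (StopCollectingException expected) {
            // the collector has what it needs; remaining hits are skipped
        }
        return collector;
    }

    public static void main(String[] args) {
        int[] matches = new int[50_000];
        for (int i = 0; i < matches.length; i++) {
            matches[i] = i;
        }
        FirstNCollector c = run(matches, 100);
        System.out.println(c.count + " kept, " + c.visited + " visited"); // prints 100 kept, 100 visited
    }
}
```

This is only valid under the precondition stated above: the hits must
already arrive in the order you want, otherwise the first N seen are
not the best N.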

Mike McCandless

http://blog.mikemccandless.com





Re: Collector is collecting more than the specified hits

2014-02-14 Thread saisantoshi
I am not interested in the scores at all. My requirement is simple: I
only need the first 100 hits, or the numHits I specify (irrespective of
their scores). The collector should stop after collecting the specified
numHits. Is there a way to tell the collector to stop after collecting
numHits?

Please correct me if I am wrong. I am trying to do the following:

    public void collect(int doc) throws IOException {
        // Pass the hit on only while fewer than maxHits hits have been
        // collected, so nothing more is collected once maxHits is reached.
        if (collector.getTotalHits() < maxHits) {
            delegate.collect(doc);
        }
    }

I have to write a separate collector extending Collector because I
cannot call getTotalHits() if I am using PositiveScoresOnlyCollector.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329p4117441.html



Collector is collecting more than the specified hits

2014-02-13 Thread saisantoshi
The problem with the collector below is that the collect method does not
stop after the numHits count has been reached. Is there a way to stop
the collector from collecting docs after it has reached the specified
numHits?

For example:

    TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, true);
    // e.g. TopScoreDocCollector topScore = TopScoreDocCollector.create(30, true);

I would expect the collector below to pause/exit after it has collected
the specified numHits (in this case 30). But what is happening is that
the collector collects all the docs, which delays searches. Can we
configure the collect method below to stop after it has reached the
specified numHits? Please let me know if there is any issue with the
collector below.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.PositiveScoresOnlyCollector;

    public class MyCollector extends PositiveScoresOnlyCollector {

        private IndexReader indexReader;

        public MyCollector(IndexReader indexReader,
                           PositiveScoresOnlyCollector topScore) {
            super(topScore);
            this.indexReader = indexReader;
        }

        // Called once for every document that matches the query.
        @Override
        public void collect(int doc) {
            try {
                // Custom logic
                super.collect(doc);
            } catch (Exception e) {
                // Note: any exception is swallowed here.
            }
        }
    }

    // Usage:

    MyCollector collector;
    TopScoreDocCollector topScore =
        TopScoreDocCollector.create(numHits, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        collector = new MyCollector(indexReader,
            new PositiveScoresOnlyCollector(topScore));
        searcher.search(query, (Filter) null, collector);
    } finally {

    }

Thanks,
Sai.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collector-is-collecting-more-than-the-specified-hits-tp4117329.html