RE: Increase search performance

2018-02-02 Thread Atul Bisaria
Thanks for the feedback!

-Original Message-
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Friday, February 02, 2018 1:42 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance

If needsScores returns false on the collector, then scores won't be computed.

Your prototype should work well.

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
Hi Adrien,

Please correct me if I am wrong, but I believe using an extended IntComparator 
in a custom Sort object for randomization would still score documents (when 
using IndexSearcher.search(Query, int, Sort), for example).

So I tried a custom collector via IndexSearcher.search(Query, Collector), where 
the custom collector does not score documents at all.

I have refactored RandomOrderCollector to fix the memory usage problem, as 
described below. Let me know if this looks ok now.

class RandomOrderCollector extends SimpleCollector
{
    private int maxHitsRequired;
    private int docBase;

    private ScoreDoc[] matches;

    private int numHits;

    private Random random = new Random();

    public RandomOrderCollector(int maxHitsRequired)
    {
        this.maxHitsRequired = maxHitsRequired;
        this.matches = new ScoreDoc[maxHitsRequired];
    }

    @Override
    public boolean needsScores()
    {
        return false;
    }

    @Override
    public void collect(int doc) throws IOException
    {
        int absoluteDoc = docBase + doc;
        int randomScore = random.nextInt(); // assign a random score to each doc

        if (numHits < maxHitsRequired)
        {
            matches[numHits++] = new ScoreDoc(absoluteDoc, randomScore);
        }
        else
        {
            int index = random.nextInt(maxHitsRequired);
            if (matches[index].score < randomScore)
            {
                matches[index] = new ScoreDoc(absoluteDoc, randomScore);
            }
        }
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException
    {
        super.doSetNextReader(context);
        this.docBase = context.docBase;
    }

    public ScoreDoc[] getHits()
    {
        return matches;
    }
}
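[Editor's note: the collect() logic above keeps a bounded buffer, but evicting a random slot only when its stored random score is lower may not produce an exactly uniform sample. For comparison, classic reservoir sampling (Algorithm R) keeps a uniform random subset of up to k doc ids in O(k) memory. A self-contained sketch, independent of Lucene; class and method names are illustrative only:]

```java
import java.util.Random;

// Classic reservoir sampling (Algorithm R): keeps a uniform random sample
// of up to k items from a stream, using O(k) memory however many items
// are seen. Names here are illustrative, not Lucene's.
class Reservoir {
    private final int k;
    private final int[] sample;
    private final Random random;
    private int seen; // number of items observed so far

    Reservoir(int k, long seed) {
        this.k = k;
        this.sample = new int[k];
        this.random = new Random(seed);
    }

    void collect(int docId) {
        if (seen < k) {
            sample[seen] = docId;             // fill the reservoir first
        } else {
            int j = random.nextInt(seen + 1); // keep new item with probability k/(seen+1)
            if (j < k) {
                sample[j] = docId;
            }
        }
        seen++;
    }

    int[] hits() {
        int n = Math.min(seen, k); // truncate if fewer than k items were seen
        int[] out = new int[n];
        System.arraycopy(sample, 0, out, 0, n);
        return out;
    }
}

public class ReservoirDemo {
    public static void main(String[] args) {
        Reservoir r = new Reservoir(5, 42L);
        for (int doc = 0; doc < 1000; doc++) {
            r.collect(doc);
        }
        System.out.println(r.hits().length); // memory stayed at k entries
        System.out.println("ok");
    }
}
```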

Best Regards,
Atul Bisaria

-Original Message-
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Thursday, February 01, 2018 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance

Yes, this collector won't perform well if you have many matches since memory 
usage is linear with the number of matches. A better option would be to extend 
e.g. IntComparator and implement getNumericDocValues by returning a fake 
NumericDocValues instance that e.g. does a bit mix of the doc id and a 
per-request seed (for instance, HPPC's BitMixer can do that:
https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java
).
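[Editor's note: the per-doc "score" Adrien describes can be computed on the fly instead of stored: mix the doc id with a per-request seed through an integer finalizer and sort by the mixed value. The sketch below is Lucene-free and uses murmur3-style mixing constants (HPPC's BitMixer is in the same family); it only demonstrates that the derived order is deterministic for a given seed:]

```java
import java.util.Arrays;
import java.util.Comparator;

public class MixDemo {
    // Murmur3-style integer finalizer: mixes the doc id with a per-request
    // seed so that sorting by the result yields a pseudo-random order while
    // keeping no per-document state in memory.
    static int mix(int docId, int seed) {
        int h = docId ^ seed;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        final int seed = 0x9e3779b9; // one fresh seed per search request
        Integer[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        Integer[] b = a.clone();
        Arrays.sort(a, Comparator.comparingInt((Integer d) -> mix(d, seed)));
        Arrays.sort(b, Comparator.comparingInt((Integer d) -> mix(d, seed)));
        // Same seed => same order on every run; a new seed reshuffles.
        System.out.println(Arrays.equals(a, b) ? "deterministic" : "nondeterministic");
    }
}
```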

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
Hi Adrien,

Thanks for your reply.

I have also tried testing with UsageTrackingQueryCachingPolicy, but did not 
observe a significant change in either latency or throughput.

Given my specific search requirements of no scoring and sorting the search 
results in a random order (the reason for the custom Sort object), I have also 
explored writing a custom collector and observed quite a difference in latency 
figures.

Let me know if this custom collector code has any loopholes I could be 
missing:

class RandomOrderCollector extends SimpleCollector
{
    private int maxHitsRequired;
    private int docBase;

    private List<Integer> matches = new ArrayList<>();

    public RandomOrderCollector(int maxHitsRequired)
    {
        this.maxHitsRequired = maxHitsRequired;
    }

    @Override
    public boolean needsScores()
    {
        return false;
    }

    @Override
    public void collect(int doc) throws IOException
    {
        matches.add(docBase + doc);
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException
    {
        super.doSetNextReader(context);
        this.docBase = context.docBase;
    }

    public List<Integer> getHits()
    {
        Collections.shuffle(matches);
        maxHitsRequired = Math.min(matches.size(), maxHitsRequired);

        return matches.subList(0, maxHitsRequired);
    }
}
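[Editor's note: stripped of the Lucene types, the collector above amounts to "collect every match, then shuffle and take the first k". The plain-Java sketch below (illustrative names) shows the same logic and makes the cost visible: memory grows with the total number of matches rather than with k, which is the issue Adrien raises in his reply:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleDemo {
    // Collect all matching doc ids, shuffle, and return at most k of them.
    // Simple and uniform, but the list holds every match, so memory is
    // linear in the number of matches.
    static List<Integer> randomHits(List<Integer> allMatches, int k, Random random) {
        List<Integer> copy = new ArrayList<>(allMatches);
        Collections.shuffle(copy, random);
        return copy.subList(0, Math.min(copy.size(), k));
    }

    public static void main(String[] args) {
        List<Integer> matches = new ArrayList<>();
        for (int doc = 0; doc < 100; doc++) {
            matches.add(doc); // pretend these doc ids came from collect()
        }
        List<Integer> hits = randomHits(matches, 10, new Random(1L));
        System.out.println(hits.size());
    }
}
```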

Best Regards,
Atul Bisaria

-Original Message-
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Wednesday, January 31, 2018 6:33 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance

Hi Atul,


On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisa...@ericsson.com> wrote:

> 1. Using ConstantScoreQuery so that scoring overhead is removed since
> scoring is not required in my search use case. I also use a custom
> Sort object which does not sort by score (see code below).
>

If you don't sort by score, then wrapping with a ConstantScoreQuery won't help 
as Lucene will figure out scores are not needed anyway.


> 2. Using query cache
>
>
>
> My understanding is that query cache would cache query results and
> hence lead to significant increase in performance. Is this understanding 
> correct?
>

It depends on what you mean by performance. If you are optimizing for worst-case 
latency, then the query cache might make things worse, because caching a query 
requires visiting all of its matches, while normal query execution can sometimes 
skip over non-interesting matches (e.g. in conjunctions).

However, if you are looking to improve throughput, then the default policy of 
the query cache, which caches queries that look reused, usually helps.


> I am using Lucene version 5.4.1 where query cache seems to be enabled
> by default (https://issues.apache.org/jira/browse/LUCENE-6784), but I
> am not able to see any significant change in search performance.
>




> Here is the code I am testing with:
>
>
>
> DirectoryReader reader = DirectoryReader.open(directory); // using MMapDirectory
>
> IndexSearcher searcher = new IndexSearcher(reader); // IndexReader and IndexSearcher are created only once
>
> searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
>

Don't do that: this will always cache all filters, which usually makes things 
slower for the reason mentioned above. I would rather advise that you use an 
instance of UsageTrackingQueryCachingPolicy.
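[Editor's note: as a rough mental model only (not Lucene's actual implementation or constants), a usage-tracking policy admits a query to the cache once it has shown up a few times recently, so one-off filters never pay the cache-population cost:]

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a usage-tracking cache policy: remember recently seen query
// keys and only cache a query once it has appeared MIN_FREQUENCY times in
// the sliding window. Both constants are made up for illustration.
public class UsagePolicyDemo {
    static final int HISTORY = 256;
    static final int MIN_FREQUENCY = 3;

    private final Deque<String> history = new ArrayDeque<>();

    boolean shouldCache(String queryKey) {
        long prior = history.stream().filter(queryKey::equals).count();
        history.addLast(queryKey);
        if (history.size() > HISTORY) {
            history.removeFirst();
        }
        return prior + 1 >= MIN_FREQUENCY; // cache only once it looks reused
    }

    public static void main(String[] args) {
        UsagePolicyDemo policy = new UsagePolicyDemo();
        System.out.println(policy.shouldCache("status:active")); // first sighting
        System.out.println(policy.shouldCache("status:active")); // second
        System.out.println(policy.shouldCache("status:active")); // looks reused now
    }
}
```

The real Lucene policy also weighs other factors (e.g. how expensive the query is to cache), so this only captures the "cache what looks reused" idea.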


Increase search performance

2018-01-30 Thread Atul Bisaria
In the search use case in my application, I don't need to score query results 
since all results are equal. Query patterns are also more or less fixed.



Given these conditions, I am trying to increase search performance by



1. Using ConstantScoreQuery so that scoring overhead is removed since 
scoring is not required in my search use case. I also use a custom Sort object 
which does not sort by score (see code below).



Is this enough to remove scoring overhead in search?



2. Using query cache



My understanding is that the query cache would cache query results and hence 
lead to a significant increase in performance. Is this understanding correct?



I am using Lucene version 5.4.1 where query cache seems to be enabled by 
default (https://issues.apache.org/jira/browse/LUCENE-6784), but I am not able 
to see any significant change in search performance.



Here is the code I am testing with:



DirectoryReader reader = DirectoryReader.open(directory); // using MMapDirectory

IndexSearcher searcher = new IndexSearcher(reader); // IndexReader and IndexSearcher are created only once

searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);

// search code
QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = new ConstantScoreQuery(parser.parse("text"));

ScoreDoc[] hits = searcher.search(query, 20, sort).scoreDocs;





Given the above conditions in my application, is there anything more I can do to 
get better search performance?