Re: Solr and FieldCache

J.J. Larrea Thu, 20 Sep 2007 11:10:07 -0700

At 5:30 PM +0200 9/20/07, Walter Ferrara wrote:
>I have an index with several fields, but just one stored: ID (string,
>unique).
>I need to access that ID field for each of the top "nodes" docs in my
>results (this is done inside a handler I wrote); the code looks like:
>
>     Hits hits = searcher.search(query);
>     for(int i=0; i<nodes; i++) {
>            id[i]=hits.doc(i).get("ID");
>            score[i]=hits.score(i);
>     }
>
>I noticed that retrieving the code is slow.
>
>if I use the FieldCache, like:
>id[i]=FieldCache.DEFAULT.getStrings(searcher.getReader(),
>"ID")[hits.id(i)];


I assume you're putting FieldCache.DEFAULT.getStrings(searcher.getReader(),
"ID") in an array outside the loop, saving 2 redundant method calls per 
iteration.
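
Something along these lines, as a minimal self-contained sketch rather than your actual handler. The Supplier stands in for FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID") and the int[] stands in for the hits.id(i) values; the class and method names are made up for illustration.

```java
import java.util.function.Supplier;

public class HoistedIdLookup {
    /**
     * Copies the ID of each of the top "nodes" hits out of a per-document
     * array that is fetched exactly once.
     *
     * idsByDoc  hypothetical stand-in for
     *           FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID")
     * hitDocIds hypothetical stand-in for the hits.id(i) values, in rank order
     */
    static String[] topIds(Supplier<String[]> idsByDoc, int[] hitDocIds, int nodes) {
        // Fetch the array once, outside the loop, instead of once per hit.
        String[] ids = idsByDoc.get();

        int n = Math.min(nodes, hitDocIds.length);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            // Plain array access indexed by the internal document number.
            out[i] = ids[hitDocIds[i]];
        }
        return out;
    }
}
```

The loop body is then just an array lookup per hit.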

>after the first execution (the initialization of the cache takes some
>time), it seems to run much faster.

Do note that FieldCache.DEFAULT is caching the indexed values, not the stored 
values.  Since your field is an ID you are probably indexing it in such a way 
that both are identical, e.g. with KeywordTokenizer, so you're not seeing a 
difference.

>But what happens when Solr reloads the index (after a commit or an
>optimize, for example)?
>Will it refresh the cache with the new reader (in the warmup process?), or
>will it be the first query execution of that code (with the new reader)
>that forces the refresh? (This could mean that every first query
>after a reload will be slower.)

It is refreshed by Lucene the first time the FieldCache array is requested from 
the new IndexReader.
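
To make the "first request per reader" behavior concrete, here is a rough sketch of a cache that is keyed by reader and filled lazily. It is an illustration of the behavior described above, not Lucene's actual code; the class name and the loader function are assumptions of mine.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical illustration: one cached value per reader, built on first request. */
public class PerReaderCache<R, V> {
    // Weak keys let an entry become collectable once its reader is no longer referenced.
    private final Map<R, V> byReader = new WeakHashMap<>();
    private final Function<R, V> loader;

    public PerReaderCache(Function<R, V> loader) {
        this.loader = loader;
    }

    public synchronized V get(R reader) {
        V value = byReader.get(reader);
        if (value == null) {
            // First request for this reader: pay the full load cost now.
            value = loader.apply(reader);
            byReader.put(reader, value);
        }
        return value;
    }
}
```

A new reader is a new key, so the first lookup against it pays the load cost again; that is the slow first query you are asking about.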

>Is there any way to tell Solr to cache and warm up this "ID" field
>when needed?

Absolutely, just put a warmup query in solrconfig.xml which makes a request that 
invokes FieldCache.DEFAULT.getStrings on that field.

Simplest would probably be to invoke your custom handler, perhaps passing 
arguments that restrict it to a single document so that little else gets 
cached; since getStrings returns the entire array, one pass through your 
loop is enough to populate it.
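
A sketch of what that could look like in solrconfig.xml, using a QuerySenderListener on the newSearcher event. The handler name "myhandler", the query text, and the "nodes" parameter are placeholders I made up; substitute whatever your handler is registered as and whatever arguments it actually accepts.

```xml
<!-- Runs against each new searcher before it starts serving requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- "myhandler" is a placeholder for your custom handler's name -->
      <str name="qt">myhandler</str>
      <!-- any query that matches at least one document -->
      <str name="q">some query</str>
      <!-- hypothetical argument limiting the handler to one document -->
      <str name="nodes">1</str>
    </lst>
  </arr>
</listener>
```

A matching listener on the firstSearcher event would cover the case where Solr has just started and there is no previous searcher.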

If that's not easy with your handler, you could achieve the same effect by 
setting up a handler which facets on the ID field, sorting by ID 
(facet.sort=false), and only asks for a single value (facet.limit=1) (the 
entire id[docid] array will get scanned to count references to that ID, but 
that ensures it gets paged in).

- J.J.
