[jira] Commented: (SOLR-52) Lazy Field loading

Yonik Seeley (JIRA) Sat, 07 Oct 2006 09:01:37 -0700

    [ http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440682 ]
            
Yonik Seeley commented on SOLR-52:
----------------------------------


+1, looks good.

There are some small backward incompatibilities (any place that returns a 
Fieldable, like getUniqueKeyField), but they can't be helped, and those APIs 
are fairly expert-level anyway.

My only concern was about a memory increase for lazy-loaded short fields.  I 
reviewed some of the LazyField code just now, and it looks like this shouldn't 
be the case:
 - LazyField is an inner class that contains 3 extra members.  Its outer 
class, to which it retains a reference, is FieldsReader.  The fieldsReader 
instance is a member of SegmentReader, and has the same lifetime as the 
SegmentReader.  Hence the LazyField won't extend the lifetime of any other 
objects.
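
To illustrate the lifetime argument, here is a minimal standalone sketch of 
how a non-static inner class holds on to its enclosing instance.  The names 
Reader and LazyValue are hypothetical stand-ins, not the Lucene classes, and 
the String[] replaces the real stored-fields file.

```java
class InnerClassReferenceSketch {

    // Hypothetical stand-in for the reader that owns the stored data.
    static class Reader {
        private final String[] stored;

        Reader(String... stored) {
            this.stored = stored;
        }

        // Non-static inner class: every instance carries a hidden reference
        // to the Reader that created it, so it can read 'stored' on demand.
        class LazyValue {
            private final int index;
            private String cached;

            LazyValue(int index) {
                this.index = index;
            }

            // The value is fetched from the enclosing Reader on first access only.
            String get() {
                if (cached == null) {
                    cached = stored[index];
                }
                return cached;
            }

            // Exposes the enclosing instance that this object keeps reachable.
            Reader owner() {
                return Reader.this;
            }
        }

        LazyValue lazy(int index) {
            return new LazyValue(index);
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader("id-1", "a long body");
        Reader.LazyValue value = reader.lazy(1);
        System.out.println(value.owner() == reader);
        System.out.println(value.get());
    }
}
```

The only object the lazy value keeps reachable is its owning reader, so if 
that reader already lives as long as its own owner, nothing else is pinned.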

One thing I did see is that the internal char[] buffer used to read the 
string in LazyField is kept as a member (hence the data will be stored in the 
field *twice*).  I think this is probably a bug, and I'll bring it up on the 
Lucene list.

Ideas for future optimizations:
- if there is no document cache, change lazy to no-load
- special case: if only a single field (like the ID field) is selected out of 
many documents to be returned, consider bypassing the doc cache and using 
LOAD_AND_BREAK if we know there is only a single value.
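
A rough sketch of how those two ideas might combine into one per-field 
decision.  This is a standalone model, not the patch or the Lucene API: the 
Action enum only borrows the names of Lucene's field selector results, and 
choose() with its parameters is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SelectorPolicySketch {

    // Hypothetical enum; the constant names echo Lucene's selector results.
    enum Action { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

    // Decides what to do with one stored field while reading a document.
    //   requested    - fields the request actually needs
    //   useDocCache  - whether the loaded document will be put in the doc cache
    //   singleValued - whether 'field' is known to have exactly one value
    static Action choose(String field, Set<String> requested,
                         boolean useDocCache, boolean singleValued) {
        if (requested.contains(field)) {
            // One single-valued field, cache bypassed: stop reading after it.
            if (!useDocCache && requested.size() == 1 && singleValued) {
                return Action.LOAD_AND_BREAK;
            }
            return Action.LOAD;
        }
        // Unrequested fields only need a lazy placeholder when the document
        // is cached and may be asked for those fields later.
        return useDocCache ? Action.LAZY_LOAD : Action.NO_LOAD;
    }

    public static void main(String[] args) {
        Set<String> idOnly = Collections.singleton("id");
        Set<String> two = new HashSet<>(Arrays.asList("id", "title"));
        System.out.println(choose("id", idOnly, false, true));
        System.out.println(choose("body", idOnly, false, false));
        System.out.println(choose("body", two, true, false));
    }
}
```

Whether the cache can safely be bypassed for a given request is the open 
question; the sketch simply takes that as an input.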

> Lazy Field loading
> ------------------
>
>                 Key: SOLR-52
>                 URL: http://issues.apache.org/jira/browse/SOLR-52
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Mike Klaas
>         Assigned To: Mike Klaas
>            Priority: Minor
>         Attachments: lazyfields_patch.diff
>
>
> Add lazy field loading to solr.
> Currently solr reads all stored fields and filters the undesired fields based 
> on the field list.  This is usually not a performance concern, but when using 
> solr to store large numbers of fields, or just one large field (doc contents, 
> eg. for highlighting), it is perceptible.
> Now, there is a concern with the doc cache of SolrIndexSearcher, which 
> assumes it has the whole document in the cache.  To maintain this invariant, 
> it is still the case that all the fields in a document are loaded in a 
> searcher.doc(i) call.  However, if a field set is given to the method, only 
> the given fields are loaded directly, while the rest are loaded lazily.
> Some concerns about lazy field loading
>   1. Lazy fields are only valid while the IndexReader is open.  I believe this 
> is fine since the IndexReader is kept alive by the SolrIndexSearcher, so all 
> docs in the cache have the reader available.  
>   2. It is slower to read a field lazily and retrieve its value later than to 
> retrieve it directly to begin with (though I don't know how much--depends on 
> i/o factors).  We certainly don't want this to be the common case.  I added 
> an optional call which accumulates all the fields likely to be used in the 
> request (highlighting, response writing), and populates the IndexSearcher 
> cache a priori.  This has the added advantage of concentrating doc retrieval 
> in a single place, which is nice from a performance testing perspective.
>  3. LazyFields are incompatible with the sundry Field declarations scattered 
> about Solr.  I believe I've changed all the necessary locations to Fieldable.
> Comments appreciated

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
