[ http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440682 ]

Yonik Seeley commented on SOLR-52:
----------------------------------
+1, looks good.

There are some small backward incompatibilities (any place that returns a Fieldable, like getUniqueKeyField), but it can't be helped, and it's fairly expert level anyway.

My only concern was about a memory increase for lazy-loaded short fields. I reviewed some of the LazyField code just now, and it looks like this shouldn't be the case:
- LazyField is an inner class that contains an extra 3 members. Its outer class, to which it retains a reference, is FieldsReader. The fieldsReader instance is a member of SegmentReader, and has the same lifetime as the SegmentReader. Hence the LazyField won't extend the lifetime of any other objects.

One thing I did see is that the internal char[] buffer used to read the string in LazyField is a member for some reason (hence the data will be stored in the field *twice*). I think this is probably a bug, and I'll bring it up on the Lucene list.

Ideas for future optimizations:
- if there is no document cache, change lazy to no-load
- special cases: if only a single field (like the ID field) is selected out of many documents to be returned, consider bypassing the doc cache and using LOAD_AND_BREAK if we know there is only a single value.

> Lazy Field loading
> ------------------
>
> Key: SOLR-52
> URL: http://issues.apache.org/jira/browse/SOLR-52
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Priority: Minor
> Attachments: lazyfields_patch.diff
>
>
> Add lazy field loading to Solr.
> Currently Solr reads all stored fields and filters the undesired fields based
> on the field list. This is usually not a performance concern, but when using
> Solr to store large numbers of fields, or just one large field (doc contents,
> e.g. for highlighting), it is perceptible.
> Now, there is a concern with the doc cache of SolrIndexSearcher, which
> assumes it has the whole document in the cache.
> To maintain this invariant,
> it is still the case that all the fields in a document are loaded in a
> searcher.doc(i) call. However, if a field set is given to the method, only
> the given fields are loaded directly, while the rest are loaded lazily.
>
> Some concerns about lazy field loading:
> 1. Lazy fields are only valid while the IndexReader is open. I believe this
> is fine since the IndexReader is kept alive by the SolrIndexSearcher, so all
> docs in the cache have the reader available.
> 2. It is slower to read a field lazily and retrieve its value later than to
> retrieve it directly to begin with (though I don't know how much; it depends
> on i/o factors). We certainly don't want this to be the common case. I added
> an optional call which accumulates all the fields likely to be used in the
> request (highlighting, response writing), and populates the IndexSearcher
> cache a priori. This has the added advantage of concentrating doc retrieval
> in a single place, which is nice from a performance testing perspective.
> 3. LazyFields are incompatible with the sundry Field declarations scattered
> about Solr. I believe I've changed all the necessary locations to Fieldable.
>
> Comments appreciated
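
To make the loading scheme in the issue description concrete, here is a rough, self-contained Java sketch. It is not the patch and uses no Lucene or Solr classes: LazyDocSketch, LazyValue, loadDoc and the storage function are all hypothetical names invented for illustration. It only shows the idea that a document always contains every stored field (so a cache can assume it holds the whole document), while only the requested fields are read up front and the rest are read on first access.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Lucene or Solr.
final class LazyDocSketch {

    /** A field value that is read from storage on first access and then kept. */
    static final class LazyValue {
        private final String name;
        private final Function<String, String> storage; // stand-in for a stored-field reader
        private String value;
        private boolean loaded;

        LazyValue(String name, Function<String, String> storage, boolean eager) {
            this.name = name;
            this.storage = storage;
            if (eager) {
                get(); // requested fields are read immediately
            }
        }

        String get() {
            if (!loaded) {
                value = storage.apply(name); // the deferred (slower) read happens here
                loaded = true;
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    /** Returns a document with an entry for every field; only the wanted ones are read now. */
    static Map<String, LazyValue> loadDoc(Set<String> allFields, Set<String> wanted,
                                          Function<String, String> storage) {
        Map<String, LazyValue> doc = new LinkedHashMap<>();
        for (String field : allFields) {
            doc.put(field, new LazyValue(field, storage, wanted.contains(field)));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "42");
        stored.put("title", "Lazy fields");
        stored.put("body", "a large stored field");

        int[] reads = {0}; // counts how often storage is actually touched
        Function<String, String> storage = name -> {
            reads[0]++;
            return stored.get(name);
        };

        Map<String, LazyValue> doc =
                loadDoc(stored.keySet(), Collections.singleton("id"), storage);
        System.out.println("fields in doc: " + doc.size());
        System.out.println("reads after load: " + reads[0]);
        System.out.println("body loaded: " + doc.get("body").isLoaded());
        System.out.println("body: " + doc.get("body").get());
        System.out.println("reads after lazy access: " + reads[0]);
    }
}
```

In this sketch a lazy value keeps a reference to its storage function, which loosely mirrors the point in the comment above about LazyField retaining a reference to its outer reader: that is harmless only as long as the reader outlives the cached documents.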
