Thanks for tackling this Mike ... I've been dreading the whole issue of Lazy Loading but your patch gives me hope. I haven't had a chance to try it out, but reading through it, it seems a lot more straight forward then I'd feared.
A couple of concerns jump out at me though, starting with the biggee... : Now, there is a concern with the doc cache of SolrIndexSearcher, which : assumes it has the whole document in the cache. To maintain this : invariant, it is still the case that all the fields in a document are : loaded in a searcher.doc(i) call. However, if a field set is given to : teh method, only the given fields are loaded directly, while the rest : are loaded lazily. it doesn't look like this would actually be an invarient with your patch. Consider the case of two request handlers: X which expects/uses all fields, and Y which only uses a few fields consistently so the rest are left to lazy loading. If Y hits doc N first it puts N in cache with only partial fields, isn't X forced to use lazy loading every time after that? Or does lazy loading not work that way? ... would the lazy fields still only be loaded once (the first time X asked for them) but then stored in the Document (which is still in the cache) ? ... even if that is hte case would hte performance hit of X loading each field Lazily one at a time be more expensive then refetching completely? (can you tell how little i understand the mechanics of Lazy Loading in general?) Depending on what would happen in the situation i described, perhaps the Solr document cache should only be used by the single arg version of SolrIndexSearcher.doc? (if we do this, two arg version of readDocs needs changed to use it directly) An alternate idea: the single arg version could check if an item found in the cache contains lazy fields and if so re-fetch and recache the full Document? FWIW: This isn't an obscure situation, The main index I deal with has a lot of metadata documents (one per category) which are *HUGE*, and there are two custom request handlers which use those documents -- one uses every field in them, and the other uses only a single field. That second handler would be an ideal candidate to start using your new SolrIndexSearcher.doc(int,Set<String>) method, but it would suck if doing thta ment that the other handler's performance suffered because it started taking longer to fetch all of the fields) Other smaller issues... 1) it's not clear to me ... will optimizePreFetchDocs trigger an NPE if there are highlight fields but no uniqueKey for the schema? (same potential in HighlightUtils which has the same line) 2) why doesn't optimizePreFetchDocs use SolrIndexSearcher.readDocs (looks like cut/paste of method body) 3) should we be concerned about letting people specify prefixes/suffixes of the fields they want to forcably load for dynamicFields instead of just a Set<String> of names? .. or should we cross that bridge when we come to it? (I ask because we have no cache aware method that takes in a FieldSelector, just the one that takes in the Set<String>) And a few minor nits... * lists in javadoc comments should us HTML <li> tags so they can be read cleanly in generated docs (see SolrPluginUtils.optimizePreFetchDocs) * there are some double imports of Fieldable in several classes (it looks like maybe they were already importing it once for no reason) -Hoss
