Thanks for tackling this Mike ... I've been dreading the whole issue of
Lazy Loading but your patch gives me hope.  I haven't had a chance to try
it out, but reading through it, it seems a lot more straight forward then
I'd feared.

A couple of concerns jump out at me though, starting with the biggee...

: Now, there is a concern with the doc cache of SolrIndexSearcher, which
: assumes it has the whole document in the cache.  To maintain this
: invariant, it is still the case that all the fields in a document are
: loaded in a searcher.doc(i) call.  However, if a field set is given to
: teh method, only the given fields are loaded directly, while the rest
: are loaded lazily.

it doesn't look like this would actually be an invarient with your patch.
Consider the case of two request handlers: X which expects/uses
all fields, and Y which only uses a few fields consistently so the
rest are left to lazy loading.   If Y hits doc N first it puts N in
cache with only partial fields, isn't X forced to use lazy loading every
time after that?

Or does lazy loading not work that way? ... would the lazy fields still
only be loaded once (the first time X asked for them) but then stored in
the Document (which is still in the cache) ? ... even if that is hte case
would hte performance hit of X loading each field Lazily one at a time be
more expensive then refetching completely?

(can you tell how little i understand the mechanics of Lazy Loading in
general?)

Depending on what would happen in the situation i described, perhaps
the Solr document cache should only be used by the single arg version of
SolrIndexSearcher.doc? (if we do this, two arg version of readDocs needs
changed to use it directly)

An alternate idea: the single arg version could check if an item found in
the cache contains lazy fields and if so re-fetch and recache the full
Document?

FWIW: This isn't an obscure situation, The main index I deal with has a
lot of metadata documents (one per category) which are *HUGE*, and there
are two custom request handlers which use those documents -- one uses
every field in them, and the other uses only a single field.  That second
handler would be an ideal candidate to start using your new
SolrIndexSearcher.doc(int,Set<String>) method, but it would suck if doing
thta ment that the other handler's performance suffered because it started
taking longer to fetch all of the fields)


Other smaller issues...

1) it's not clear to me ... will optimizePreFetchDocs trigger an NPE if
there are highlight fields but no uniqueKey for the schema?
(same potential in HighlightUtils which has the same line)

2) why doesn't optimizePreFetchDocs use SolrIndexSearcher.readDocs (looks
like cut/paste of method body)

3) should we be concerned about letting people specify prefixes/suffixes
of the fields they want to forcably load for dynamicFields instead of just
a Set<String> of names? .. or should we cross that bridge when we come to
it?  (I ask because we have no cache aware method that takes in a
FieldSelector, just the one that takes in the Set<String>)


And a few minor nits...

* lists in javadoc comments should us HTML <li> tags so they can be read
  cleanly in generated docs (see SolrPluginUtils.optimizePreFetchDocs)
* there are some double imports of Fieldable in several classes (it
  looks like maybe they were already importing it once for no reason)



-Hoss

Reply via email to