I'm using Solr 1.4. My observations and this page 
http://wiki.apache.org/solr/DistributedSearchDesign#line-254 indicate that the 
general strategy for Distributed Search is something like:
        1. Query the shards with the user's query and "fl=unique_field,score".
        2. Re-query (maybe a subset of) the shards for certain documents by
           unique_field, with the field list the user requested.
        3. Maybe re-query the shards again to flesh out faceting info.
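
To make the request count concrete, here is a toy Python sketch of steps 1 and 2 against in-memory "shards". It is not Solr code: the SHARDS data, the function names, and the request counting are all made up for illustration, and faceting (step 3) is left out.

```python
import heapq

# Hypothetical in-memory "shards": each maps unique_field -> stored fields.
SHARDS = [
    {"a": {"title": "apple", "score": 0.9}, "b": {"title": "banana", "score": 0.2}},
    {"c": {"title": "cherry", "score": 0.7}, "d": {"title": "date", "score": 0.4}},
]


def phase_one(shards, rows):
    """Step 1: ask every shard for (score, id) only, then merge the top `rows`."""
    candidates = []
    requests = 0
    for shard_no, shard in enumerate(shards):
        requests += 1  # one request per shard
        for doc_id, doc in shard.items():
            candidates.append((doc["score"], doc_id, shard_no))
    return heapq.nlargest(rows, candidates), requests


def phase_two(shards, top, fields):
    """Step 2: re-query only the shards that own a winning document."""
    by_shard = {}
    for _score, doc_id, shard_no in top:
        by_shard.setdefault(shard_no, []).append(doc_id)
    results = {}
    for shard_no, ids in by_shard.items():
        for doc_id in ids:
            doc = shards[shard_no][doc_id]
            results[doc_id] = {f: doc[f] for f in fields}
    return results, len(by_shard)  # one extra request per involved shard


def distributed_search(shards, rows, fields):
    """Run both phases; return the ordered docs and the total shard requests."""
    top, first_requests = phase_one(shards, rows)
    docs, second_requests = phase_two(shards, top, fields)
    ordered = [(doc_id, docs[doc_id]) for _score, doc_id, _shard in top]
    return ordered, first_requests + second_requests
```

In this toy model, asking for the top 2 documents across 2 shards costs 4 shard requests, half of which exist only to fetch stored fields.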

I'm seeing a significant performance penalty with Distributed Search because 
of these additional queries, and it seems like there are some obvious 
optimizations that could avoid them in certain cases. 

For example, a way to say "I claim the fields I'm requesting are small enough 
that querying again for stored fields is worse than just getting the stored 
fields in the first request". 
(assert_tiny_data=true&fl=tiny_stored_field,unique_field) 
Or, "If the field list of the original query is contained in the first round of 
shard requests, don't bother querying again for more fields". 
(fl=unique_field,score)
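
As a rough illustration of how those two shortcuts could be decided, here is a small Python sketch. It is not Solr code: assert_tiny_data is the hypothetical parameter proposed above, parse_fl and plan_shard_requests are made-up names, and treating fl as comma- or space-separated is an assumption.

```python
# Fields the first round of shard requests already asks for (step 1 above).
FIRST_PHASE_FIELDS = {"unique_field", "score"}


def parse_fl(fl):
    """Split an fl value into a set of names (comma or space separated, assumed)."""
    return {name for name in fl.replace(",", " ").split() if name}


def plan_shard_requests(fl, assert_tiny_data=False):
    """Return (first-round field list, whether a second round is needed)."""
    requested = parse_fl(fl)
    if assert_tiny_data:
        # Shortcut 1: the caller claims the stored fields are small, so fetch
        # them in the first round and skip the re-query.
        return sorted(requested | FIRST_PHASE_FIELDS), False
    if requested and requested <= FIRST_PHASE_FIELDS:
        # Shortcut 2: the first round already returns everything requested.
        return sorted(FIRST_PHASE_FIELDS), False
    # Default: current behavior, ids and scores first, stored fields later.
    # An empty fl lands here too, since it is assumed to mean "all fields".
    return sorted(FIRST_PHASE_FIELDS), True
```

For example, plan_shard_requests("unique_field,score") reports that no second round is needed, while plan_shard_requests("title,unique_field") keeps the two-round behavior.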

Has anyone else looked into this? I'd be interested to learn whether there are 
issues that make these kinds of shortcuts difficult before I dig in.

Thanks,
  -Jeff Wartes
