[ https://issues.apache.org/jira/browse/SOLR-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shawn Smith updated SOLR-1880:
------------------------------

    Attachment: one-pass-query-v1.4.0.patch

Attached a version of the patch that can be applied to the v1.4.0 source. The trunk patch above assumes a couple of fixes made since v1.4.0.

> Performance: Distributed Search should skip GET_FIELDS stage if EXECUTE_QUERY stage gets all fields
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1880
>                 URL: https://issues.apache.org/jira/browse/SOLR-1880
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Shawn Smith
>         Attachments: one-pass-query-v1.4.0.patch, one-pass-query.patch
>
>
> Right now, a typical distributed search using QueryComponent makes two HTTP
> requests to each shard:
> # STAGE_EXECUTE_QUERY executes one HTTP request to each shard to get the top N
> ids and sort keys, then merges the results to produce a final list of document
> IDs (PURPOSE_GET_TOP_IDS).
> # STAGE_GET_FIELDS executes a second HTTP request to each shard to get the
> document field values for the final list of document IDs (PURPOSE_GET_FIELDS).
> If the "fl" param is just "id" or just "id,score", all document data to
> return has already been fetched by STAGE_EXECUTE_QUERY, so the second
> STAGE_GET_FIELDS request is completely unnecessary. Eliminating that second
> HTTP request can make a big difference in overall performance.
> Also, if the "fl" param only requests id, score and sort columns, it would
> probably be cheaper to fetch the final sort column data in STAGE_EXECUTE_QUERY,
> which has to read the sort column data anyway, and skip STAGE_GET_FIELDS.
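
For readers who want the condition from the description spelled out, here is a minimal sketch of the "fl" check. It is not taken from either attached patch and does not use Solr's classes: the class OnePassCheck and the methods parseFl and canSkipGetFields are hypothetical names, and the unique key field name is passed in as a plain string. It only covers the first case in the description ("id" or "id,score"), not the sort-column idea.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Solr or of the attached patches. */
public final class OnePassCheck {
    private OnePassCheck() {}

    /** Splits an "fl" value such as "id,score" on commas and whitespace. */
    static Set<String> parseFl(String fl) {
        Set<String> fields = new LinkedHashSet<>();
        if (fl == null) {
            return fields;
        }
        for (String f : fl.trim().split("[,\\s]+")) {
            if (!f.isEmpty()) {
                fields.add(f);
            }
        }
        return fields;
    }

    /**
     * Returns true when every requested field is either the unique key or
     * "score", i.e. data the first stage already collects for merging.
     * Assumption: a missing or empty "fl", or one without the unique key,
     * is treated conservatively and still needs the second stage.
     */
    static boolean canSkipGetFields(String fl, String uniqueKey) {
        Set<String> fields = parseFl(fl);
        if (!fields.contains(uniqueKey)) {
            return false;
        }
        for (String f : fields) {
            if (!f.equals(uniqueKey) && !f.equals("score")) {
                return false; // some other stored field (or "*") was requested
            }
        }
        return true;
    }
}
```

With a unique key named "id", canSkipGetFields("id,score", "id") is true, while canSkipGetFields("id,title", "id") and canSkipGetFields("*", "id") are false, so those requests would still go through STAGE_GET_FIELDS.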