[ 
https://issues.apache.org/jira/browse/OAK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated OAK-3129:
---------------------------------
    Description: 
{{SolrQueryIndex}} and {{FilterQueryParser}} use the 
{{OakSolrConfiguration#getRows}} setting to define the number of documents 
fetched in each batch while iterating the {{Cursor}} resulting from a query.
While this is an optimization that avoids loading all the results into memory 
when, e.g., only the first 10 results of the {{Cursor}} are visited, it 
performs poorly when the result set's cardinality is 10 times the 'rows' 
setting or more, because each JCR query then triggers 10 or more Solr queries 
(with the attendant network and Solr call latencies).

To avoid that, we could use the 'rows' setting for the first request to Solr 
and then adapt the subsequent paged request (controlled by the start and rows 
Solr HTTP parameters) to cover the rest of the result set, so that no more 
than 2 Solr queries are needed in total. This can be done by reading the 
_numFound_ value from the response header of the first query and setting the 
start/rows parameters accordingly.
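For illustration, a minimal sketch of the two-phase paging logic (a 
hypothetical helper, not the actual Oak patch): given the configured 'rows' 
used for the first Solr request and the numFound value read from its 
response, derive the start/rows of the single follow-up request.

```java
// Hypothetical sketch of the proposed paging scheme; class and method
// names are illustrative, not taken from the Oak codebase.
public class TwoPhasePaging {

    /** A start/rows parameter pair for a Solr request. */
    static final class Page {
        final long start;
        final long rows;
        Page(long start, long rows) {
            this.start = start;
            this.rows = rows;
        }
    }

    /**
     * Returns the start/rows of the follow-up request that fetches the
     * entire remainder of the result set, or null when the first request
     * (which used 'configuredRows') already covered everything.
     */
    static Page secondRequest(long configuredRows, long numFound) {
        if (numFound <= configuredRows) {
            return null; // one Solr query was enough
        }
        // skip what the first request returned, fetch all the rest at once
        return new Page(configuredRows, numFound - configuredRows);
    }

    public static void main(String[] args) {
        Page p = secondRequest(10, 105);
        System.out.println(p.start + "," + p.rows); // prints 10,95
        System.out.println(secondRequest(10, 7) == null); // prints true
    }
}
```

With this, a result set of 105 documents and rows=10 costs exactly two Solr 
queries (start=0/rows=10, then start=10/rows=95) instead of eleven.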

  was:
{{SolrQueryIndex}} and {{FilterQueryParser}} use the 
{{OakSolrConfiguration#getRows}} setting in order to set the number of 
documents that should be fetched in batches while iterating the {{Cursor}} 
resulting from a certain query.
While this is an optimization that avoids loading all the results in memory in 
cases where only e.g. the first 10 results of the {{Cursor}} are visited, it 
tends to perform really bad when resultsets' cardinality is 10 times or more 
bigger than the 'rows' setting, because for each JCR query, 10 or more Solr 
queries are performed (with the additional network, Solr calls, etc. latencies).

In order to avoid that we could make use of the 'rows' setting in order to 
perform the first request to Solr and then adapt the subsequent paged requests 
(controlled by start and rows Solr HTTP parameters) to be run against the rest 
of the resultset in no more than 2 Solr queries. This can be done by looking at 
the _numFound_ value from Solr's response header and set the start/rows 
parameters accordingly.


> SolrQueryIndex making too many Solr requests per JCR query
> ----------------------------------------------------------
>
>                 Key: OAK-3129
>                 URL: https://issues.apache.org/jira/browse/OAK-3129
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: solr
>    Affects Versions: 1.2.2, 1.3.2, 1.0.17
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.2.4, 1.3.3, 1.0.18
>
>
> {{SolrQueryIndex}} and {{FilterQueryParser}} use the 
> {{OakSolrConfiguration#getRows}} setting to define the number of documents 
> fetched in each batch while iterating the {{Cursor}} resulting from a query.
> While this is an optimization that avoids loading all the results into 
> memory when, e.g., only the first 10 results of the {{Cursor}} are visited, 
> it performs poorly when the result set's cardinality is 10 times the 'rows' 
> setting or more, because each JCR query then triggers 10 or more Solr 
> queries (with the attendant network and Solr call latencies).
> To avoid that, we could use the 'rows' setting for the first request to 
> Solr and then adapt the subsequent paged request (controlled by the start 
> and rows Solr HTTP parameters) to cover the rest of the result set, so that 
> no more than 2 Solr queries are needed in total. This can be done by 
> reading the _numFound_ value from the response header of the first query 
> and setting the start/rows parameters accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)