Thanks a ton, Dwane. I went through the article and the documentation link.
This corresponds exactly to my use case.
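For reference, here is roughly what the cursorMark loop from the article looks
like with pysolr (a sketch only: the Solr URL and collection name are
placeholders, "id" is assumed to be the uniqueKey field, process() stands in
for whatever is done with each document, and it relies on pysolr exposing
nextCursorMark on its Results object):

import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=120)

cursor = "*"
while True:
    # cursorMark requires a sort on the uniqueKey field and no start offset.
    results = solr.search("*:*", rows=10000, sort="id asc", cursorMark=cursor)
    for doc in results.docs:
        process(doc)  # placeholder for per-document handling
    if results.nextCursorMark == cursor:
        break  # the cursor stopped advancing, so the result set is exhausted
    cursor = results.nextCursorMark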

Best
Goutham

On Fri, Sep 25, 2020 at 2:59 PM Dwane Hall <dwaneh...@hotmail.com> wrote:

> Goutham, I suggest you read Hossman's excellent article on deep paging and
> why returning rows=(some large number) is a bad idea. It provides a
> thorough overview of the concept and will explain it better than I ever
> could (
> https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
> In short, if you want to extract that many documents out of your corpus, use
> cursorMark, streaming expressions, or Solr's parallel SQL interface (which
> uses streaming expressions under the hood):
> https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.
>
> Thanks,
>
> Dwane
> ------------------------------
> *From:* Goutham Tholpadi <gtholp...@gmail.com>
> *Sent:* Friday, 25 September 2020 4:19 PM
> *To:* solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> *Subject:* Solr queries slow down over time
>
> Hi,
>
> I have around 30M documents in Solr, and I am running repeated *:* queries
> with rows=10000, stepping start through 0, 10000, 20000, and so on, in a
> loop in my script (using pysolr).
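
(For reference, that loop is roughly the following sketch; the Solr URL,
collection name, and process() helper are placeholders.)

import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=120)

start = 0
while True:
    # Each page makes Solr collect and sort the first start+rows matches,
    # so this call gets slower as start grows (the deep-paging problem).
    results = solr.search("*:*", start=start, rows=10000)
    if not results.docs:
        break
    for doc in results.docs:
        process(doc)  # placeholder for per-document handling
    start += 10000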
>
> At the start of the iteration, the calls to Solr were taking less than 1
> second each. After running for a few hours (with start at around 27M), I
> found that each call was taking around 30-60 seconds.
>
> Any pointers on why the same fetch of 10000 records takes much longer now?
> Does Solr need to load all 27M documents before returning the last 10000 records?
> Is there a better way to do this operation using Solr?
>
> Thanks!
> Goutham
>
