Thanks a ton, Dwane. I went through the article and the documentation link. It matches my use case exactly.
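
In case it helps anyone else who lands on this thread: the article explains
that for start=N, Solr has to collect and sort the first N+rows matches on
every request, which is why my calls kept slowing down as start grew. Below
is a minimal sketch of the cursorMark loop I am switching to with pysolr
(the URL is a placeholder, and I am assuming the uniqueKey field is "id"):

    import pysolr

    # Sketch only: the URL is a placeholder and "id" is assumed to be
    # this collection's uniqueKey field.
    solr = pysolr.Solr("http://localhost:8983/solr/mycollection", timeout=120)

    cursor = "*"  # the initial cursor mark
    while True:
        results = solr.search("*:*", **{
            "rows": 10000,
            "sort": "id asc",      # cursor paging needs a sort ending on the uniqueKey
            "cursorMark": cursor,  # do not combine "start" with a cursor
        })
        for doc in results.docs:
            pass  # process each document here
        if results.nextCursorMark == cursor:
            break  # the cursor stops advancing once results are exhausted
        cursor = results.nextCursorMark

Unlike start/rows paging, each request here does a roughly constant amount
of work no matter how deep into the result set the loop has gone.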
Best,
Goutham

On Fri, Sep 25, 2020 at 2:59 PM Dwane Hall <dwaneh...@hotmail.com> wrote:

> Goutham, I suggest you read Hossman's excellent article on deep paging and
> why returning rows=(some large number) is a bad idea. It provides a
> thorough overview of the concept and will explain it better than I ever
> could (
> https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
> In short, if you want to extract that many documents out of your corpus,
> use cursorMark, streaming expressions, or Solr's parallel SQL interface
> (which uses streaming expressions under the hood):
> https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html
>
> Thanks,
>
> Dwane
> ------------------------------
> *From:* Goutham Tholpadi <gtholp...@gmail.com>
> *Sent:* Friday, 25 September 2020 4:19 PM
> *To:* solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> *Subject:* Solr queries slow down over time
>
> Hi,
>
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=10000, changing start to 0, 10000, 20000, and so on, in a loop
> in my script (using pysolr).
>
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
>
> Any pointers on why the same fetch of 10000 records takes much longer now?
> Does Solr need to load all 27M records before returning the last 10000?
> Is there a better way to do this operation using Solr?
>
> Thanks!
> Goutham
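
P.S. For the archives, an untested sketch of the streaming-expression route
Dwane mentioned, using plain HTTP against the /stream endpoint (collection
name and host are placeholders; the /export handler requires docValues on
every field in fl, and streaming expressions assume a SolrCloud setup):

    import requests

    # Untested sketch: "mycollection" and the host are placeholders.
    # /export requires docValues on every field listed in fl.
    expr = 'search(mycollection, q="*:*", fl="id", sort="id asc", qt="/export")'
    resp = requests.post(
        "http://localhost:8983/solr/mycollection/stream",
        data={"expr": expr},
    )
    resp.raise_for_status()

    # The response is a single JSON object; for ~30M docs you would parse
    # it incrementally (e.g. with ijson) instead of loading it all at once.
    for doc in resp.json()["result-set"]["docs"]:
        if doc.get("EOF"):  # Solr appends an EOF tuple to end the stream
            break
        print(doc["id"])  # process each tuple here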