Re: How to retrieve 200K documents from Solr 4.10.2
Hi Obaid,

You may also want to check out
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

Emir

On 13.10.2016 00:33, Nick Vasilyev wrote:
> Check out cursorMark, it should be available in your release. There is
> some good information on this page:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
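A minimal sketch of the request Emir's link describes, building the URL for the /export handler. Assumptions worth flagging: the handler must be configured for your Solr version, every field in fl and sort must have docValues enabled, and a sort is mandatory; the collection name abc, field url, and sort field id are placeholders from the thread, not verified settings.

```python
from urllib.parse import urlencode

def export_url(base, collection, field, sort_field="id"):
    """Build an /export request that streams the full sorted result set.

    /export avoids deep-paging cost entirely, but requires docValues on
    every exported and sorted field, and an explicit sort.
    """
    params = urlencode({
        "q": "*:*",
        "fl": field,                    # single field, as in the question
        "sort": f"{sort_field} asc",    # /export refuses requests without a sort
    })
    return f"{base}/solr/{collection}/export?{params}"
```

For example, export_url("http://SOLR_HOST", "abc", "url") yields the request to stream just the url field for all 200K documents in one pass.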
Re: How to retrieve 200K documents from Solr 4.10.2
Check out cursorMark, it should be available in your release. There is
some good information on this page:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

On Wed, Oct 12, 2016 at 5:46 PM, Salikeen, Obaid <
obaid.salik...@iacpublishinglabs.com> wrote:
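The cursorMark approach Nick points at can be sketched as a loop: pass cursorMark=* on the first request, re-send the returned nextCursorMark on each subsequent request, and stop when the cursor stops changing. This assumes a JSON response, that id is the collection's uniqueKey (the sort must end with the uniqueKey for the cursor to be stable), and that url is the field being collected; the fetch parameter is injectable purely so the loop can be exercised without a live Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_all(base, collection, field, unique_key="id", rows=1000, fetch=None):
    """Yield `field` from every document via cursorMark deep paging.

    Unlike start/rows, each request is cheap regardless of how far into
    the result set it is, and results are consistent because the sort on
    the uniqueKey imposes a stable total order.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url) as resp:
                return json.load(resp)
    cursor = "*"                             # initial cursor per the Solr docs
    while True:
        params = urlencode({
            "q": "*:*",
            "fl": field,
            "rows": rows,
            "sort": f"{unique_key} asc",     # cursorMark requires uniqueKey tiebreak
            "cursorMark": cursor,
            "wt": "json",
        })
        data = fetch(f"{base}/solr/{collection}/select?{params}")
        for doc in data["response"]["docs"]:
            yield doc[field]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:            # unchanged cursor means we are done
            break
        cursor = next_cursor
```

Note that start is never sent: the cursor encodes the position, so Solr does not have to collect and discard all preceding documents on every page, which is exactly what makes start=199000&rows=1000 slow.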
How to retrieve 200K documents from Solr 4.10.2
Hi,

I am using Solr 4.10.2. I have 200K documents sitting on a Solr cluster
(it has 3 nodes), and let me first state that I am new to Solr. I want to
retrieve all documents from Solr (essentially just one field from each
document).

What is the best way of fetching this much data without overloading the
Solr cluster?

Approach I tried:
I run the following query every minute to fetch a batch of 1000
documents. On each run, I advance start by 1000:

http://SOLR_HOST/solr/abc/select?q=*:*&fq=&start=1&rows=1000&fl=url&wt=csv&csv.header=false&hl=false

However, with the above approach, I have two issues:

1. The Solr cluster gets overloaded, i.e. it slows down.

2. I am not sure that start=X&rows=1000 gives me the correct results
(changing rows=2 or rows=4 gives me totally different results, which is
why I am not confident the results are correct).

Thanks
Obaid
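On issue 2: the query above has no explicit sort, and with q=*:* every document scores identically, so each shard is free to return ties in a different order on different requests; that alone can explain the inconsistent pages. A sketch of the same start/rows paging made deterministic by sorting on the uniqueKey (here assumed to be id, which is not confirmed by the thread). Note this only fixes correctness; the cost of start still grows with the offset, which is why cursorMark or /export is the better answer for a full dump.

```python
from urllib.parse import urlencode

def page_url(base, collection, field, page, rows=1000):
    """Build a start/rows page request with a stable order.

    The explicit sort on the uniqueKey makes page boundaries repeatable
    across requests and shards. The offset (start) still forces Solr to
    collect and skip page * rows documents, so deep pages stay expensive.
    """
    params = urlencode({
        "q": "*:*",
        "fl": field,
        "start": page * rows,    # grows linearly; deep pages get slow
        "rows": rows,
        "sort": "id asc",        # stable tiebreak; without it pages can shuffle
        "wt": "csv",
        "csv.header": "false",
    })
    return f"{base}/solr/{collection}/select?{params}"
```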