Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-13 Thread Emir Arnautovic

Hi Obaid,

You may also want to check out 
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
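
For illustration, a rough Python sketch of streaming from /export
(assumptions on my side: the handler is configured in solrconfig.xml,
the url field and a sort field such as id have docValues enabled, and
SOLR_HOST stays a placeholder as in your URL):

# Rough sketch: stream the full result set from the /export handler.
# /export requires an explicit sort, and both the sorted and the
# exported fields must have docValues enabled in the schema.
import requests

resp = requests.get(
    "http://SOLR_HOST/solr/abc/export",
    params={
        "q": "*:*",
        "fl": "url",        # the single field you want back
        "sort": "id asc",   # assumed docValues uniqueKey field
    },
    stream=True,
)
resp.raise_for_status()
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))

Streaming the response keeps the client from buffering all 200K
documents in memory at once.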


Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Nick Vasilyev
Check out cursorMark; it should be available in your release. There is
some good information on this page:

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
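
For example, a minimal cursor loop in Python (assuming the collection
is named abc as in your URL and the uniqueKey field is id):

# Minimal cursorMark pagination sketch. The sort must end on the
# uniqueKey field, and you are done when nextCursorMark stops
# changing between requests.
import requests

url = "http://SOLR_HOST/solr/abc/select"
cursor = "*"
while True:
    data = requests.get(url, params={
        "q": "*:*",
        "fl": "url",
        "rows": 1000,
        "sort": "id asc",
        "cursorMark": cursor,
        "wt": "json",
    }).json()
    for doc in data["response"]["docs"]:
        print(doc.get("url"))
    if data["nextCursorMark"] == cursor:
        break
    cursor = data["nextCursorMark"]

Unlike start/rows paging, each request stays cheap no matter how deep
into the result set you are.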


On Wed, Oct 12, 2016 at 5:46 PM, Salikeen, Obaid <
obaid.salik...@iacpublishinglabs.com> wrote:

> Hi,
>
> I am using Solr 4.10.2. I have 200K documents sitting on Solr cluster (it
> has 3 nodes), and let me first state that I am new Solr. I want to retrieve
> all documents from Sold (essentially just one field from each document).
>
> What is the best way of fetching this much data without overloading Solr
> cluster?
>
>
> Approach I tried:
> I tried using the following API (running every minute) to fetch a batch of
> 1000 documents every minute. On Each run, I initialize start with the new
> index i.e adding 1000.
> http://SOLR_HOST/solr/abc/select?q=*:*&fq=&start=1&rows=
> 1000&fl=url&wt=csv&csv.header=false&hl=false
>
> However, with the above approach, I have two issues:
>
> 1.   Solr cluster gets overloaded i.e it slows down
>
> 2.   I am not sure if start=X&rows=1000 would give me the correct
> results (changing rows=2 or rows=4 gives me totally different results,
> which is why I am not confident if I will get the correct results).
>
>
> Thanks
> Obaid
>
>


How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Salikeen, Obaid
Hi,

I am using Solr 4.10.2. I have 200K documents sitting on a Solr cluster (it has 3 
nodes), and let me first state that I am new to Solr. I want to retrieve all 
documents from Solr (essentially just one field from each document).

What is the best way of fetching this much data without overloading the Solr 
cluster?


Approach I tried:
I tried using the following API, running every minute, to fetch a batch of 1000 
documents per run. On each run, I set start to the next offset, i.e. I add 1000.
http://SOLR_HOST/solr/abc/select?q=*:*&fq=&start=1&rows=1000&fl=url&wt=csv&csv.header=false&hl=false
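
In code, my loop looks roughly like this (a Python sketch; I use
wt=json here only to make parsing easy):

# Sketch of the batched fetch described above: advance start by 1000
# on each pass until no more documents come back.
import requests

start = 0
while True:
    data = requests.get("http://SOLR_HOST/solr/abc/select", params={
        "q": "*:*",
        "start": start,
        "rows": 1000,
        "fl": "url",
        "wt": "json",
    }).json()
    docs = data["response"]["docs"]
    if not docs:
        break
    for doc in docs:
        print(doc.get("url"))
    start += 1000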

However, with the above approach, I have two issues:

1.   The Solr cluster gets overloaded, i.e. it slows down.

2.   I am not sure whether start=X&rows=1000 gives me correct and complete 
results (changing rows=2 or rows=4 returns totally different results, which is 
why I am not confident I will get the correct results).


Thanks
Obaid