Re: Retrieving 1000 records at a time

2016-02-19 Thread Mark Robinson
Thanks Shawn!

Best,
Mark.

On Wed, Feb 17, 2016 at 7:48 PM, Shawn Heisey  wrote:

> On 2/17/2016 3:49 PM, Mark Robinson wrote:
> > I have around 121 fields, of which 12 are indexed and almost all
> > 121 are stored.
> > Average size of a doc is 10KB.
> >
> > I was checking for start=0, rows=1000.
> > We were querying a Solr instance on another server, so I think
> > network lag may also have come into the picture.
> >
> > I did not go for any caching, as I wanted a good response time on the
> > first query itself.
>
> Stored fields, which contain the data that is returned to the client in
> the response, are compressed on disk.  Uncompressing this data can
> contribute to the time on a slow query, but I do not think it can
> explain 30 seconds of delay.  Very large documents can be particularly
> slow to decompress, but you have indicated that each entire document is
> about 10K in size, which is not huge.
>
> It is more likely that the delay is caused by one of two things,
> possibly both:
>
> * Extremely long garbage collection pauses due to a heap that is too
> small, or very large (beyond 32GB) with inadequate GC tuning.
> * Not enough system memory to effectively cache the index.
>
> Some additional info that may be helpful in tracking this down further:
>
> * For each core on one machine, the size on disk of the data directory.
> * For each core, the number of documents and the number of deleted
> documents.
> * The max heap size for the Solr JVM.
> * Whether there is more than one Solr instance per server.
> * The total installed memory size in the server.
> * Whether or not the server is used for other applications.
> * What operating system the server is running.
> * Whether the index is distributed or contained in a single core.
> * Whether Solr is in SolrCloud mode or not.
> * Solr version.
>
> Thanks,
> Shawn
>
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Shawn Heisey
On 2/17/2016 3:49 PM, Mark Robinson wrote:
> I have around 121 fields, of which 12 are indexed and almost all
> 121 are stored.
> Average size of a doc is 10KB.
>
> I was checking for start=0, rows=1000.
> We were querying a Solr instance on another server, so I think
> network lag may also have come into the picture.
>
> I did not go for any caching, as I wanted a good response time on the
> first query itself.

Stored fields, which contain the data that is returned to the client in
the response, are compressed on disk.  Uncompressing this data can
contribute to the time on a slow query, but I do not think it can
explain 30 seconds of delay.  Very large documents can be particularly
slow to decompress, but you have indicated that each entire document is
about 10K in size, which is not huge.

It is more likely that the delay is caused by one of two things,
possibly both:

* Extremely long garbage collection pauses due to a heap that is too
small, or very large (beyond 32GB) with inadequate GC tuning.
* Not enough system memory to effectively cache the index.
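
(If the heap has never been tuned, it may still be sitting at a small
default.  A minimal sketch of where to start looking, assuming the
bin/solr.in.sh include script from a 5.x install -- treat the variable
names and values as placeholders to verify against your own version:

  SOLR_HEAP="8g"
  # fixed 8GB heap (-Xms/-Xmx); staying below 32GB keeps compressed oops
  GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
  # log GC activity so multi-second pauses become visible

Then correlate any long pauses in the GC log with your slow queries.)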

Some additional info that may be helpful in tracking this down further:

* For each core on one machine, the size on disk of the data directory.
* For each core, the number of documents and the number of deleted
documents.
* The max heap size for the Solr JVM.
* Whether there is more than one Solr instance per server.
* The total installed memory size in the server.
* Whether or not the server is used for other applications.
* What operating system the server is running.
* Whether the index is distributed or contained in a single core.
* Whether Solr is in SolrCloud mode or not.
* Solr version.

Thanks,
Shawn



Re: Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Thanks Joel and Chris!

I have around 121 fields, of which 12 are indexed and almost all
121 are stored.
Average size of a doc is 10KB.

I was checking for start=0, rows=1000.
We were querying a Solr instance on another server, so I think
network lag may also have come into the picture.

I did not go for any caching, as I wanted a good response time on the
first query itself.

Thanks much for the links and suggestions. I will go through each of them.

Best,
Mark.

On Wed, Feb 17, 2016 at 5:26 PM, Chris Hostetter 
wrote:

>
> : I have a requirement where I need to retrieve 10,000 to 15,000 records
> : at a time from SOLR.
> : With 20 or 100 records everything happens in milliseconds.
> : When it goes to 1000 or 10,000, it is taking more time... even 30
> : seconds.
>
> So far all you've really told us about your setup is that some
> queries with "rows=1000" are slow -- you haven't given us much else
> to work with.  For example, it's not obvious whether you mean that you
> are using start=0 in all of those queries and they are slow, or that
> you are paginating through results (i.e. increasing the start
> param) 1000 at a time and it starts getting slow as you page deeply.
>
> You also haven't told us anything about the fields you are returning --
> how many are there? What data types are they? Are they large string
> values?
>
> How are you measuring the time? Are you sure network lag, or client-side
> processing of the data as Solr returns it, isn't the bulk of the time you
> are measuring?  What does the QTime in the Solr responses for these slow
> queries say?
>
> My best guesses are that either you are doing deep paging and conflating
> the increased response time for deep results with an increase in response
> time for large rows params (because you are getting "deeper" faster with a
> large rows value), or you are seeing an increase in processing time on the
> client due to the large volume of data being returned -- possibly even
> with SolrJ, which by default parses the entire response into Java
> data structures before returning to the client.
>
> Without more concrete information, it's hard to give you advice beyond
> guesses.
>
>
> potentially helpful links...
>
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>
> https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Chris Hostetter

: I have a requirement where I need to retrieve 10,000 to 15,000 records
: at a time from SOLR.
: With 20 or 100 records everything happens in milliseconds.
: When it goes to 1000 or 10,000, it is taking more time... even 30 seconds.

So far all you've really told us about your setup is that some
queries with "rows=1000" are slow -- you haven't given us much else
to work with.  For example, it's not obvious whether you mean that you
are using start=0 in all of those queries and they are slow, or that
you are paginating through results (i.e. increasing the start
param) 1000 at a time and it starts getting slow as you page deeply.
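
(If deep paging turns out to be the issue, the usual fix is cursorMark --
a hedged sketch, assuming your uniqueKey field is named "id":

  q=*:*&rows=1000&sort=id+asc&cursorMark=*

each response then carries a nextCursorMark value that you pass as the
cursorMark of the next request; the sort must include the uniqueKey.)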

You also haven't told us anything about the fields you are returning --
how many are there? What data types are they? Are they large string
values?

How are you measuring the time? Are you sure network lag, or client-side
processing of the data as Solr returns it, isn't the bulk of the time you
are measuring?  What does the QTime in the Solr responses for these slow
queries say?
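
(QTime is in the responseHeader of every response, e.g.

  "responseHeader": {"status": 0, "QTime": 834, ...}

it is server-side search time in milliseconds and excludes network
transfer and client-side parsing, so comparing it to your end-to-end
measurement shows where the time actually goes.)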

My best guesses are that either you are doing deep paging and conflating
the increased response time for deep results with an increase in response
time for large rows params (because you are getting "deeper" faster with a
large rows value), or you are seeing an increase in processing time on the
client due to the large volume of data being returned -- possibly even
with SolrJ, which by default parses the entire response into Java
data structures before returning to the client.
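
(If SolrJ buffering is the problem, one option is to stream documents
through a callback instead of materializing the whole response.  An
untested sketch against the 5.x SolrJ API -- the URL and core name are
placeholders:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.StreamingResponseCallback;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrDocument;

  public class StreamDocs {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client =
          new HttpSolrClient("http://localhost:8983/solr/collection1");
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      client.queryAndStreamResponse(q, new StreamingResponseCallback() {
        @Override
        public void streamSolrDocument(SolrDocument doc) {
          // handle each document as it arrives, instead of buffering
          // the entire response in memory first
        }
        @Override
        public void streamDocListInfo(long numFound, long start,
                                      Float maxScore) {
          // result-set header (numFound etc.) arrives before the docs
        }
      });
      client.close();
    }
  }

this keeps client memory flat, though it won't help if the time is
actually spent on the server.)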

Without more concrete information, it's hard to give you advice beyond
guesses.


potentially helpful links...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
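
(For a taste of the export handler from the third link -- hedged, since
/export requires every fl and sort field to have docValues, and "id" and
"price" here are placeholder field names:

  http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id,price

it streams the entire sorted result set without the deep-paging cost of
large start values.)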



-Hoss
http://www.lucidworks.com/


Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
Also, are you ranking documents by score?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:59 PM, Joel Bernstein  wrote:

> A few questions for you: What types of fields and how many fields will you
> be retrieving? What version of Solr are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
> wrote:
>
>> Hi,
>>
>> I have a requirement where I need to retrieve 10,000 to 15,000 records
>> at a time from SOLR.
>> With 20 or 100 records everything happens in milliseconds.
>> When it goes to 1000 or 10,000, it is taking more time... even 30
>> seconds.
>>
>> Will Solr be able to return 10,000 records at a time in less than, say,
>> 200 milliseconds?
>>
>> I have read that disk reads are costly, so we have to batch results --
>> the fewer records retrieved in a batch, the faster the response when
>> using SOLR.
>>
>> So is Solr a straight-away NO candidate in a situation where 10,000
>> records should be retrieved in <= 200 ms?
>>
>> A quick response would be very helpful.
>>
>> Thanks!
>> Mark
>>
>
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
A few questions for you: What types of fields and how many fields will you
be retrieving? What version of Solr are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
wrote:

> Hi,
>
> I have a requirement where I need to retrieve 10,000 to 15,000 records
> at a time from SOLR.
> With 20 or 100 records everything happens in milliseconds.
> When it goes to 1000 or 10,000, it is taking more time... even 30
> seconds.
>
> Will Solr be able to return 10,000 records at a time in less than, say,
> 200 milliseconds?
>
> I have read that disk reads are costly, so we have to batch results --
> the fewer records retrieved in a batch, the faster the response when
> using SOLR.
>
> So is Solr a straight-away NO candidate in a situation where 10,000
> records should be retrieved in <= 200 ms?
>
> A quick response would be very helpful.
>
> Thanks!
> Mark
>


Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Hi,

I have a requirement where I need to retrieve 10,000 to 15,000 records
at a time from SOLR.
With 20 or 100 records everything happens in milliseconds.
When it goes to 1000 or 10,000, it is taking more time... even 30 seconds.

Will Solr be able to return 10,000 records at a time in less than, say,
200 milliseconds?

I have read that disk reads are costly, so we have to batch results --
the fewer records retrieved in a batch, the faster the response when
using SOLR.
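
(i.e. paging with the start and rows parameters, for example:

  ?q=*:*&start=0&rows=1000      first batch
  ?q=*:*&start=1000&rows=1000   second batch, and so on)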

So is Solr a straight-away NO candidate in a situation where 10,000
records should be retrieved in <= 200 ms?

A quick response would be very helpful.

Thanks!
Mark