Re: Retrieving 1000 records at a time

2016-02-19 Thread Mark Robinson
Thanks Shawn!

Best,
Mark.

On Wed, Feb 17, 2016 at 7:48 PM, Shawn Heisey  wrote:

> On 2/17/2016 3:49 PM, Mark Robinson wrote:
> > I have around 121 fields, of which 12 are indexed and almost all
> > 121 are stored.
> > Average size of a doc is 10KB.
> >
> > I was checking for start=0, rows=1000.
> > We were querying a Solr instance on another server, so I think
> > network lag may also have come into the picture.
> >
> > I did not go for any caching, as I wanted a good response time on the
> > first query itself.
>
> Stored fields, which contain the data that is returned to the client in
> the response, are compressed on disk.  Uncompressing this data can
> contribute to the time on a slow query, but I do not think it can
> explain 30 seconds of delay.  Very large documents can be particularly
> slow to decompress, but you have indicated that each entire document is
> about 10K in size, which is not huge.
>
> It is more likely that the delay is caused by one of two things,
> possibly both:
>
> * Extremely long garbage collection pauses due to a heap that is too
> small, or very large (beyond 32GB) with inadequate GC tuning.
> * Not enough system memory to effectively cache the index.
>
> Some additional info that may be helpful in tracking this down further:
>
> * For each core on one machine, the size on disk of the data directory.
> * For each core, the number of documents and the number of deleted
> documents.
> * The max heap size for the Solr JVM.
> * Whether there is more than one Solr instance per server.
> * The total installed memory size in the server.
> * Whether or not the server is used for other applications.
> * What operating system the server is running.
> * Whether the index is distributed or contained in a single core.
> * Whether Solr is in SolrCloud mode or not.
> * Solr version.
>
> Thanks,
> Shawn
>
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Shawn Heisey
On 2/17/2016 3:49 PM, Mark Robinson wrote:
> I have around 121 fields, of which 12 are indexed and almost all
> 121 are stored.
> Average size of a doc is 10KB.
>
> I was checking for start=0, rows=1000.
> We were querying a Solr instance on another server, so I think
> network lag may also have come into the picture.
>
> I did not go for any caching, as I wanted a good response time on the
> first query itself.

Stored fields, which contain the data that is returned to the client in
the response, are compressed on disk.  Uncompressing this data can
contribute to the time on a slow query, but I do not think it can
explain 30 seconds of delay.  Very large documents can be particularly
slow to decompress, but you have indicated that each entire document is
about 10K in size, which is not huge.

It is more likely that the delay is caused by one of two things,
possibly both:

* Extremely long garbage collection pauses due to a heap that is too
small, or very large (beyond 32GB) with inadequate GC tuning.
* Not enough system memory to effectively cache the index.
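
(If the heap has never been tuned, it may still be sitting at a small
default.  A minimal sketch of where to start looking, assuming the
bin/solr.in.sh include script from a 5.x install -- treat the variable
names and values as placeholders to verify against your own version:

  SOLR_HEAP="8g"
  # fixed 8GB heap (-Xms/-Xmx); staying below 32GB keeps compressed oops
  GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
  # log GC activity so multi-second pauses become visible

Then correlate any long pauses in the GC log with your slow queries.)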

Some additional info that may be helpful in tracking this down further:

* For each core on one machine, the size on disk of the data directory.
* For each core, the number of documents and the number of deleted
documents.
* The max heap size for the Solr JVM.
* Whether there is more than one Solr instance per server.
* The total installed memory size in the server.
* Whether or not the server is used for other applications.
* What operating system the server is running.
* Whether the index is distributed or contained in a single core.
* Whether Solr is in SolrCloud mode or not.
* Solr version.

Thanks,
Shawn



Re: Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Thanks Joel and Chris!

I have around 121 fields, of which 12 are indexed and almost all
121 are stored.
Average size of a doc is 10KB.

I was checking for start=0, rows=1000.
We were querying a Solr instance on another server, so I think
network lag may also have come into the picture.

I did not go for any caching, as I wanted a good response time on the
first query itself.

Thanks much for the links and suggestions. I will go through each of them.

Best,
Mark.

On Wed, Feb 17, 2016 at 5:26 PM, Chris Hostetter 
wrote:

>
> : I have a requirement where I need to retrieve 10,000 to 15,000 records
> : at a time from SOLR.
> : With 20 or 100 records everything happens in milliseconds.
> : When it goes to 1000 or 10,000, it is taking more time... even 30
> : seconds.
>
> So far all you've really told us about your setup is that some
> queries with "rows=1000" are slow -- you haven't given us much else
> to work with.  For example, it's not obvious whether you mean that you
> are using start=0 in all of those queries and they are slow, or that
> you are paginating through results (i.e. increasing the start
> param) 1000 at a time and it starts getting slow as you page deeply.
>
> You also haven't told us anything about the fields you are returning --
> how many are there? What data types are they? Are they large string
> values?
>
> How are you measuring the time? Are you sure network lag, or client-side
> processing of the data as Solr returns it, isn't the bulk of the time you
> are measuring?  What does the QTime in the Solr responses for these slow
> queries say?
>
> My best guesses are that either you are doing deep paging and conflating
> the increased response time for deep results with an increase in response
> time for large rows params (because you are getting "deeper" faster with a
> large rows value), or you are seeing an increase in processing time on the
> client due to the large volume of data being returned -- possibly even
> with SolrJ, which by default parses the entire response into Java
> data structures before returning to the client.
>
> Without more concrete information, it's hard to give you advice beyond
> guesses.
>
>
> potentially helpful links...
>
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>
> https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Chris Hostetter

: I have a requirement where I need to retrieve 10,000 to 15,000 records
: at a time from SOLR.
: With 20 or 100 records everything happens in milliseconds.
: When it goes to 1000 or 10,000, it is taking more time... even 30 seconds.

So far all you've really told us about your setup is that some
queries with "rows=1000" are slow -- you haven't given us much else
to work with.  For example, it's not obvious whether you mean that you
are using start=0 in all of those queries and they are slow, or that
you are paginating through results (i.e. increasing the start
param) 1000 at a time and it starts getting slow as you page deeply.
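
(If deep paging turns out to be the issue, the usual fix is cursorMark --
a hedged sketch, assuming your uniqueKey field is named "id":

  q=*:*&rows=1000&sort=id+asc&cursorMark=*

each response then carries a nextCursorMark value that you pass as the
cursorMark of the next request; the sort must include the uniqueKey.)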

You also haven't told us anything about the fields you are returning --
how many are there? What data types are they? Are they large string
values?

How are you measuring the time? Are you sure network lag, or client-side
processing of the data as Solr returns it, isn't the bulk of the time you
are measuring?  What does the QTime in the Solr responses for these slow
queries say?
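
(QTime is in the responseHeader of every response, e.g.

  "responseHeader": {"status": 0, "QTime": 834, ...}

it is server-side search time in milliseconds and excludes network
transfer and client-side parsing, so comparing it to your end-to-end
measurement shows where the time actually goes.)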

My best guesses are that either you are doing deep paging and conflating
the increased response time for deep results with an increase in response
time for large rows params (because you are getting "deeper" faster with a
large rows value), or you are seeing an increase in processing time on the
client due to the large volume of data being returned -- possibly even
with SolrJ, which by default parses the entire response into Java
data structures before returning to the client.
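
(If SolrJ buffering is the problem, one option is to stream documents
through a callback instead of materializing the whole response.  An
untested sketch against the 5.x SolrJ API -- the URL and core name are
placeholders:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.StreamingResponseCallback;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrDocument;

  public class StreamDocs {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client =
          new HttpSolrClient("http://localhost:8983/solr/collection1");
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      client.queryAndStreamResponse(q, new StreamingResponseCallback() {
        @Override
        public void streamSolrDocument(SolrDocument doc) {
          // handle each document as it arrives, instead of buffering
          // the entire response in memory first
        }
        @Override
        public void streamDocListInfo(long numFound, long start,
                                      Float maxScore) {
          // result-set header (numFound etc.) arrives before the docs
        }
      });
      client.close();
    }
  }

this keeps client memory flat, though it won't help if the time is
actually spent on the server.)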

Without more concrete information, it's hard to give you advice beyond
guesses.


potentially helpful links...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
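
(For a taste of the export handler from the third link -- hedged, since
/export requires every fl and sort field to have docValues, and "id" and
"price" here are placeholder field names:

  http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id,price

it streams the entire sorted result set without the deep-paging cost of
large start values.)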



-Hoss
http://www.lucidworks.com/


Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
Also, are you ranking documents by score?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:59 PM, Joel Bernstein  wrote:

> A few questions for you: What types of fields and how many fields will you
> be retrieving? What version of Solr are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
> wrote:
>
>> Hi,
>>
>> I have a requirement where I need to retrieve 10,000 to 15,000 records
>> at a time from SOLR.
>> With 20 or 100 records everything happens in milliseconds.
>> When it goes to 1000 or 10,000, it is taking more time... even 30
>> seconds.
>>
>> Will Solr be able to return 10,000 records at a time in less than, say,
>> 200 milliseconds?
>>
>> I have read that disk reads are costly, so we have to batch results --
>> the fewer records retrieved in a batch, the faster the response when
>> using SOLR.
>>
>> So is Solr a straight-away NO candidate in a situation where 10,000
>> records should be retrieved in <= 200 ms?
>>
>> A quick response would be very helpful.
>>
>> Thanks!
>> Mark
>>
>
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
A few questions for you: What types of fields and how many fields will you
be retrieving? What version of Solr are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
wrote:

> Hi,
>
> I have a requirement where I need to retrieve 10,000 to 15,000 records
> at a time from SOLR.
> With 20 or 100 records everything happens in milliseconds.
> When it goes to 1000 or 10,000, it is taking more time... even 30
> seconds.
>
> Will Solr be able to return 10,000 records at a time in less than, say,
> 200 milliseconds?
>
> I have read that disk reads are costly, so we have to batch results --
> the fewer records retrieved in a batch, the faster the response when
> using SOLR.
>
> So is Solr a straight-away NO candidate in a situation where 10,000
> records should be retrieved in <= 200 ms?
>
> A quick response would be very helpful.
>
> Thanks!
> Mark
>


Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Hi,

I have a requirement where I need to retrieve 10,000 to 15,000 records
at a time from SOLR.
With 20 or 100 records everything happens in milliseconds.
When it goes to 1000 or 10,000, it is taking more time... even 30 seconds.

Will Solr be able to return 10,000 records at a time in less than, say,
200 milliseconds?

I have read that disk reads are costly, so we have to batch results --
the fewer records retrieved in a batch, the faster the response when
using SOLR.
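
(i.e. paging with the start and rows parameters, for example:

  ?q=*:*&start=0&rows=1000      first batch
  ?q=*:*&start=1000&rows=1000   second batch, and so on)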

So is Solr a straight-away NO candidate in a situation where 10,000
records should be retrieved in <= 200 ms?

A quick response would be very helpful.

Thanks!
Mark