Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Shawn Heisey
On 2/12/2016 2:57 AM, Matteo Grolla wrote: > tell me if I'm wrong but qtime accounts for search time excluding the > fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the > results on the client on a LAN infrastructure for 300kB response). debug > explains how much of

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Jack Krupansky
Thanks for that critical clarification. Try... 1. A different response writer to see if that impacts the clock time. 2. Selectively remove fields from the fl field list to see if some particular field has some issue. 3. If you simply return only the ID for the document, how fast/slow is that?
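
In SolrJ terms, Jack's three checks might look like the sketch below. This is a minimal sketch, assuming a local Solr with a collection named "mycollection" and a stored field "title" (both placeholders); the 4.0-era API spelled the client HttpSolrServer rather than HttpSolrClient, but the idea is identical. Check 1 (a different response writer) is easiest to test with curl by comparing &wt=json against &wt=javabin.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FlExperiment {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection").build()) {
                // Baseline: all stored fields, 1000 rows.
                time(client, new SolrQuery("*:*").setRows(1000));
                // Check 3: return only the ID. If this is fast, stored-field
                // retrieval or response writing is the bottleneck, not the search.
                time(client, new SolrQuery("*:*").setRows(1000).setFields("id"));
                // Check 2: add fields back one at a time to find a costly one.
                time(client, new SolrQuery("*:*").setRows(1000).setFields("id", "title"));
            }
        }

        static void time(HttpSolrClient client, SolrQuery q) throws Exception {
            long start = System.nanoTime();
            QueryResponse rsp = client.query(q);
            System.out.printf("fl=%s qtime=%dms wall=%dms%n", q.getFields(),
                    rsp.getQTime(), (System.nanoTime() - start) / 1_000_000);
        }
    }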

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Matteo Grolla
Hi Jack, tell me if I'm wrong but qtime accounts for search time excluding the fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the results on the client on a LAN infrastructure for 300kB response). debug explains how much of qtime is used by each search component. For me
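
For reference, the gap Matteo describes is visible directly in SolrJ: getQTime() is the server-side search time, while getElapsedTime() is measured on the client and also covers stored-field retrieval, response writing, and transfer. A minimal sketch, assuming the client from the sketch above; the example values in the comments are the numbers from this thread:

    SolrQuery q = new SolrQuery("*:*").setRows(1000);
    QueryResponse rsp = client.query(q);
    System.out.println("qtime   = " + rsp.getQTime() + " ms");       // server-side search, e.g. ~90
    System.out.println("elapsed = " + rsp.getElapsedTime() + " ms"); // client-observed, e.g. ~30000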

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Erick Erickson
I agree with everyone else that this seems very unusual, but here are some additional possible options: If (and only if) you're returning "simple" fields (i.e. numerics and strings) you could consider the Streaming Aggregation stuff. It's built to return rows without going to disk. The restriction is
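
A minimal sketch of the streaming route Erick mentions, using the /export handler through SolrJ's SolrStream. Everything here assumes an upgrade, since /export and the streaming API arrived after Solr 4.0, and it requires docValues on the sorted and returned fields; the URL and field names are placeholders:

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");
    params.set("fl", "id,price_f");     // "simple" docValues fields only
    params.set("sort", "id asc");       // /export requires an explicit sort
    params.set("qt", "/export");        // stream the full result set
    SolrStream stream = new SolrStream("http://localhost:8983/solr/mycollection", params);
    try {
        stream.open();
        Tuple tuple;
        while (!(tuple = stream.read()).EOF) {
            // rows arrive incrementally instead of being buffered server-side
            System.out.println(tuple.getString("id"));
        }
    } finally {
        stream.close();
    }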

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Binoy Dalal
If you're fetching large text fields, consider highlighting on them and just returning the snippets. I faced such a problem some time ago and highlighting sped things up nearly 10x for us. On Thu, 11 Feb 2016, 15:03 Matteo Grolla wrote: > Hi, > I'm trying to
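
A sketch of Binoy's highlighting trick in SolrJ: skip the large stored field in fl and return a short snippet per document instead. The query and the field name "text" are placeholders, and the client is the one from the first sketch:

    import java.util.List;
    import java.util.Map;

    SolrQuery q = new SolrQuery("text:(some query)");
    q.setRows(1000);
    q.setFields("id");            // do not return the big field itself
    q.setHighlight(true);
    q.addHighlightField("text");  // snippet source
    q.setHighlightSnippets(1);
    q.setHighlightFragsize(100);  // ~100-char fragments
    QueryResponse rsp = client.query(q);
    // snippets come back keyed as docId -> field -> fragments
    Map<String, Map<String, List<String>>> snippets = rsp.getHighlighting();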

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Upayavira, I'm working with Solr 4.0, sorting on score (default). I tried setting the document cache size to 2048, so all docs of a single request fit (2 requests fit, actually). If I execute a query the first time, it takes 24s. I re-execute it, with all docs in the documentCache, and it
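
For context, the documentCache Matteo is tuning lives in solrconfig.xml; a sketch with the size from this thread (2048, enough for two full 1000-row requests). autowarmCount stays 0 because the documentCache cannot be autowarmed (internal doc IDs change between searchers):

    <documentCache class="solr.LRUCache"
                   size="2048"
                   initialSize="2048"
                   autowarmCount="0"/>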

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
I see a lot of time spent in splitOnTokens, which is called by (last part of stack trace) BinaryResponseWriter$Resolver.writeResultsBody() ... solr.search.ReturnFields.wantsField() commons.io.FilenameUtils.wildcardMatch() commons.io.FilenameUtils.splitOnTokens() 2016-02-11 15:42 GMT+01:00
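
What those frames suggest: when fl contains a glob pattern, ReturnFields.wantsField() is called for every stored field of every returned document, and each call wildcard-matches via commons-io, which re-splits the pattern in splitOnTokens. A toy illustration of the per-call cost; the field name, pattern, and counts are made up:

    import org.apache.commons.io.FilenameUtils;

    public class WildcardCost {
        public static void main(String[] args) {
            long start = System.nanoTime();
            // roughly one match per field per document:
            // 1000 docs x 30 fields = 30,000 calls
            for (int i = 0; i < 30_000; i++) {
                FilenameUtils.wildcardMatch("title_s", "*_s");
            }
            System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }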

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Yonik, after the first query I find 1000 docs in the document cache. I'm using curl to send the request and requesting javabin format to mimic the application. GC activity is low. I managed to load the entire 50GB index into the filesystem cache; after that, queries don't cause disk activity

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla wrote: > Hi Yonik, > after the first query I find 1000 docs in the document cache. > I'm using curl to send the request and requesting javabin format to mimic > the application. > GC activity is low. > I managed to load

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Is this a scenario that was working fine and suddenly deteriorated, or has it always been slow? -- Jack Krupansky On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a Solr application. > The bottleneck is queries that request

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Responses have always been slow, but previously the time was dominated by faceting. After a few optimizations this is my bottleneck. My suggestion has been to properly implement paging and reduce rows; unfortunately this is not possible, at least not soon. 2016-02-11 16:18 GMT+01:00 Jack Krupansky
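
For completeness, the proper-paging fix Matteo mentions would look roughly like this with cursorMark deep paging. One caveat: cursorMark needs Solr 4.7+, so on the 4.0 in this thread plain start/rows pages would be the fallback. The sort fields are illustrative; a cursor sort must end on the uniqueKey:

    import org.apache.solr.common.params.CursorMarkParams;

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);                             // small pages instead of rows=1000
    q.addSort("score", SolrQuery.ORDER.desc);
    q.addSort("id", SolrQuery.ORDER.asc);       // uniqueKey tie-breaker, required
    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);
        // ... consume rsp.getResults() ...
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break;         // same mark twice = done
        cursor = next;
    }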

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
[image: Inline image 1] 2016-02-11 16:05 GMT+01:00 Matteo Grolla : > I see a lot of time spent in splitOnTokens > > which is called by (last part of stack trace) > > BinaryResponseWriter$Resolver.writeResultsBody() > ... > solr.search.ReturnFields.wantsField() >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Thanks Toke, yes, they are long times, and Solr qtime (to execute the query) is a fraction of a second. The response in javabin format is around 300kB. Currently I can't limit the rows requested or the fields requested; those are fixed for me. 2016-02-11 13:14 GMT+01:00 Toke Eskildsen

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Upayavira
On Thu, Feb 11, 2016, at 09:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a Solr application. > The bottleneck is queries that request 1000 rows from Solr. > Unfortunately the application can't be modified at the moment; can you > suggest what could be done on the Solr side

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Hi Matteo, as an addition to Upayavira's observation, how is the memory assigned for that Solr instance? How much memory is assigned to Solr, and how much is left for the OS? Is this a VM on top of a physical machine? Is the real physical memory used, or could swapping happen frequently? Is

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla wrote: > Thanks Toke, yes, they are long times, and Solr qtime (to execute the > query) is a fraction of a second. > The response in javabin format is around 300kB. OK, that tells us a lot. And if you actually tested so that

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Are queries scaling linearly - does a query for 100 rows take 1/10th the time (1 sec vs. 10 sec, or 3 sec vs. 30 sec)? Does the app need/expect exactly 1,000 documents for the query, or is that just what this particular query happened to return? What does the query look like? Is it complex or use
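
One way to answer Jack's scaling question empirically, reusing the client from the first sketch (the query is a placeholder):

    int[] rowCounts = {10, 100, 400, 1000};
    for (int rows : rowCounts) {
        SolrQuery q = new SolrQuery("*:*").setRows(rows);
        long start = System.nanoTime();
        QueryResponse rsp = client.query(q);
        System.out.printf("rows=%d qtime=%dms wall=%dms%n",
                rows, rsp.getQTime(), (System.nanoTime() - start) / 1_000_000);
    }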

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but still relatively bad. Even 50ms for 10 rows would be considered barely okay. But... again it depends on query complexity - simple queries should be well under 50 ms for decent modern hardware. -- Jack Krupansky On Thu, Feb 11,

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Jack, response time scales with rows. The relationship doesn't seem linear, but below 400 rows times are much faster. I view query times in the Solr logs and they are fast: the same query with rows=1000 takes 8s; with rows=10 it takes 0.2s. 2016-02-11 16:22 GMT+01:00 Jack Krupansky

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Again, first things first... debugQuery=true and see which Solr search components are consuming the bulk of qtime. -- Jack Krupansky On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla wrote: > virtual hardware, 200ms is taken on the client until response is written to >
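
Jack's debugQuery check from SolrJ, for reference. One caveat consistent with the rest of the thread: the debug timing section only breaks down qtime per search component; stored-field fetch and response writing happen afterwards and will not show up there:

    SolrQuery q = new SolrQuery("*:*").setRows(1000);
    q.setShowDebugInfo(true);   // same as debugQuery=true
    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getDebugMap().get("timing"));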

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Out of curiosity, have you tried to debug that Solr version to see which text arrives at the splitOnTokens method? In the latest Solr that part has changed completely. I'd be curious to understand what it tries to tokenise by ? and * ! Cheers On 11 February 2016 at 16:33, Matteo Grolla

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Virtual hardware; 200ms is taken on the client until the response is written to disk. qtime on Solr is ~90ms, not great but acceptable. Is it possible that the method FilenameUtils.splitOnTokens is really so heavy when requesting a lot of rows on slow hardware? 2016-02-11 17:17 GMT+01:00 Jack Krupansky

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Toke Eskildsen
On Thu, 2016-02-11 at 11:53 +0100, Matteo Grolla wrote: > I'm working with Solr 4.0, sorting on score (default). > I tried setting the document cache size to 2048, so all docs of a single > request fit (2 requests fit, actually). > If I execute a query the first time, it takes 24s. > I