Re: Performance help for heavy indexing workload

Walter Underwood Tue, 12 Feb 2008 07:48:32 -0800

That does seem really slow. Is the index on NFS-mounted storage?

wunder


On 2/12/08 7:04 AM, "Erick Erickson" <[EMAIL PROTECTED]> wrote:

> Well, the *first* sort to the underlying Lucene engine is expensive since
> it builds up the terms to sort. I wonder if you're closing and opening the
> underlying searcher for every request? This is a definite limiter.
> 
> Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask
> me how to change this behavior <G>. But your comment about
> frequent updates to the index prompted this question....
> 
> Best
> Erick
> 
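Erick's point about the first sort can be illustrated with a toy model. The sketch below is not Lucene or Solr code: `ToySearcher`, its `_sort_cache`, and the `cache_builds` counter are hypothetical names invented for illustration. It only assumes what Erick describes, namely that the first sorted search on a searcher has to build up the sort values for the whole index, and that this work is lost when the searcher is closed and reopened.

```python
# Toy model only: this is NOT Lucene or Solr code. All names are hypothetical.
class ToySearcher:
    """Stand-in for an index searcher with a lazily built per-field sort cache."""

    def __init__(self, docs):
        self.docs = docs           # list of dicts, one per document
        self._sort_cache = {}      # field name -> list of values, indexed by doc id
        self.cache_builds = 0      # how many times the expensive step ran

    def _values_for(self, field):
        if field not in self._sort_cache:
            # Expensive step: touches every document in the index,
            # not only the documents that match the query.
            self._sort_cache[field] = [d[field] for d in self.docs]
            self.cache_builds += 1
        return self._sort_cache[field]

    def search(self, predicate, sort_field, rows=1):
        values = self._values_for(sort_field)
        hits = [i for i, d in enumerate(self.docs) if predicate(d)]
        hits.sort(key=lambda i: values[i])
        return [self.docs[i] for i in hits[:rows]]


def wanted(doc):
    # Mirrors the user_id:39 filter from the thread.
    return doc["user_id"] == 39


docs = [{"user_id": i % 50, "date_added": 1000 - i} for i in range(1000)]

# One long-lived searcher: the sort cache is built once and then reused.
shared = ToySearcher(docs)
for _ in range(5):
    shared.search(wanted, "date_added")

# A new searcher per request: every request pays the build cost again.
builds = 0
for _ in range(5):
    fresh = ToySearcher(docs)
    fresh.search(wanted, "date_added")
    builds += fresh.cache_builds

print(shared.cache_builds, builds)
```

In this model the shared searcher builds its cache once across five requests, while the reopen-per-request loop builds it five times. Whether that matches the real cause here depends on how often the searcher is actually reopened, which the thread does not establish.
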
> On Feb 12, 2008 3:54 AM, James Brady <[EMAIL PROTECTED]> wrote:
> 
>> Hi again,
>> More analysis showed that the extraordinarily long query times only
>> appeared when I specify a sort. A concrete example:
>> 
>> For a querystring such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
>> The QTime is ~500ms.
>> For a querystring such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
>> The QTime is ~75s.
>> 
>> I.e. I am using the StandardRequestHandler to search for a user
>> entered term ("apache" above) and filtering by a user_id field.
>> 
>> This seems to be the case for every sort option except score asc and
>> score desc. Please tell me Solr doesn't sort all matching documents
>> before applying boolean filters?
>> 
>> James
>> 
>> Begin forwarded message:
>> 
>>> From: James Brady <[EMAIL PROTECTED]>
>>> Date: 11 February 2008 23:38:16 GMT-08:00
>>> To: solr-user@lucene.apache.org
>>> Subject: Performance help for heavy indexing workload
>>> 
>>> Hello,
>>> I'm looking for some configuration guidance to help improve
>>> performance of my application, which tends to do a lot more
>>> indexing than searching.
>>> 
>>> At present, it needs to index around two documents / sec - a
>>> document being the stripped content of a webpage. However,
>>> performance was so poor that I've had to disable indexing of the
>>> webpage content as an emergency measure. In addition, some search
>>> queries take an inordinate length of time - regularly over 60 seconds.
>>> 
>>> This is running on a medium sized EC2 instance (2 x 2GHz Opterons
>>> and 8GB RAM), and there's not too much else going on on the box. In
>>> total, there are about 1.5m documents in the index.
>>> 
>>> I'm using a fairly standard configuration - the things I've tried
>>> changing so far have been parameters like maxMergeDocs, mergeFactor
>>> and the autoCommit options. I'm only using the
>>> StandardRequestHandler, no faceting. I have a scheduled task
>>> causing a database commit every 15 seconds.
>>> 
>>> Obviously, every workload varies, but could anyone comment on
>>> whether this sort of hardware should, with proper configuration, be
>>> able to manage this sort of workload?
>>> 
>>> I can't see signs of Solr being IO-bound, CPU-bound or memory-
>>> bound, although my scheduled commit operation, or perhaps GC, does
>>> spike up the CPU utilisation at intervals.
>>> 
>>> Any help appreciated!
>>> James
>> 
>>
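The two requests James compares differ only in the `sort` parameter. A minimal sketch of building both query strings from one shared parameter set, so that a timing comparison changes nothing else, might look like this. The parameter values are copied from his message; `BASE_PARAMS` and `build_query` are illustrative names, and no host or path is assumed. Note that `urlencode` escapes a few characters differently from the original strings (`*` becomes `%2A`, and the space in the sort clause becomes `+` rather than `%20`), which is equivalent once decoded.

```python
from urllib.parse import urlencode, parse_qs

# Parameter values taken from the two query strings in James's message.
BASE_PARAMS = {
    "indent": "on",
    "version": "2.2",
    "q": "apache user_id:39",
    "start": "0",
    "rows": "1",
    "fl": "*,score",
    "qt": "standard",
    "wt": "standard",
    "explainOther": "",
}


def build_query(sort=None):
    """Return the query string, optionally with a sort clause appended."""
    params = dict(BASE_PARAMS)
    if sort is not None:
        params["sort"] = sort
    return "?" + urlencode(params)


unsorted_qs = build_query()
sorted_qs = build_query(sort="date_added asc")

print(unsorted_qs)
print(sorted_qs)
```

Appending each string to the select URL of the Solr instance under test and comparing the reported QTime values reproduces the comparison from the message.
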
