That does seem really slow. Is the index on NFS-mounted storage?

wunder
On 2/12/08 7:04 AM, "Erick Erickson" <[EMAIL PROTECTED]> wrote:

> Well, the *first* sort to the underlying Lucene engine is expensive, since
> it builds up the terms to sort. I wonder if you're closing and opening the
> underlying searcher for every request? This is a definite limiter.
>
> Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask
> me how to change this behavior <G>. But your comment about
> frequent updates to the index prompted this question...
>
> Best
> Erick
>
> On Feb 12, 2008 3:54 AM, James Brady <[EMAIL PROTECTED]> wrote:
>
>> Hi again,
>> More analysis showed that the extraordinarily long query times only
>> appear when I specify a sort. A concrete example:
>>
>> For a query string such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
>> the QTime is ~500 ms.
>> For a query string such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
>> the QTime is ~75 s.
>>
>> That is, I am using the StandardRequestHandler to search for a
>> user-entered term ("apache" above) and filtering by a user_id field.
>>
>> This seems to be the case for every sort option except score asc and
>> score desc. Please tell me Solr doesn't sort all matching documents
>> before applying boolean filters?
>>
>> James
>>
>> Begin forwarded message:
>>
>>> From: James Brady <[EMAIL PROTECTED]>
>>> Date: 11 February 2008 23:38:16 GMT-08:00
>>> To: solr-user@lucene.apache.org
>>> Subject: Performance help for heavy indexing workload
>>>
>>> Hello,
>>> I'm looking for some configuration guidance to help improve the
>>> performance of my application, which does a lot more indexing
>>> than searching.
>>>
>>> At present, it needs to index around two documents per second, a
>>> document being the stripped content of a web page. However,
>>> performance was so poor that I've had to disable indexing of the
>>> web page content as an emergency measure. In addition, some search
>>> queries take an inordinate length of time, regularly over 60 seconds.
>>>
>>> This is running on a medium-sized EC2 instance (2 x 2 GHz Opterons
>>> and 8 GB RAM), and there's not much else running on the box. In
>>> total, there are about 1.5 million documents in the index.
>>>
>>> I'm using a fairly standard configuration; the things I've tried
>>> changing so far have been parameters like maxMergeDocs, mergeFactor
>>> and the autoCommit options. I'm only using the
>>> StandardRequestHandler, no faceting. I have a scheduled task
>>> causing a database commit every 15 seconds.
>>>
>>> Obviously, every workload varies, but could anyone comment on
>>> whether this sort of hardware should, with proper configuration, be
>>> able to manage this sort of workload?
>>>
>>> I can't see signs of Solr being IO-bound, CPU-bound or
>>> memory-bound, although my scheduled commit operation, or perhaps GC,
>>> does spike the CPU utilisation at intervals.
>>>
>>> Any help appreciated!
>>> James
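Decoding James's two query strings makes the comparison explicit: the only parameter that differs is the sort on date_added. A small sketch using Python's standard urllib.parse (the variable names are illustrative only):

```python
from urllib.parse import parse_qs

# The two query strings from the thread, rejoined onto single lines
# (leading '?' dropped).
unsorted_qs = ("indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1"
               "&fl=*%2Cscore&qt=standard&wt=standard&explainOther=")
sorted_qs = unsorted_qs + "&sort=date_added%20asc"


def params(qs: str) -> dict:
    # keep_blank_values=True so the empty explainOther= parameter is kept
    return parse_qs(qs, keep_blank_values=True)


a, b = params(unsorted_qs), params(sorted_qs)
# Parameters present only in the sorted request
diff = {k: b[k] for k in b.keys() - a.keys()}
print(a["q"])
print(diff)
```

Decoded, q is "apache user_id:39" and the sorted request adds only sort=date_added asc, so the roughly 150x difference in QTime comes from the sort alone.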
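Erick's point about the *first* sort can be illustrated with a toy model. This is a hypothetical sketch of the general idea behind Lucene's field cache as we understand it, not Lucene's or Solr's actual code: the first sort on a field loads that field's value for every document in the index, regardless of how many documents matched, and the loaded values are reused only while the same searcher stays open. ToySearcher and its methods are invented for illustration.

```python
from typing import Dict, List


class ToySearcher:
    """Hypothetical stand-in for an open index searcher (not a real Lucene/Solr API)."""

    def __init__(self, field_values: Dict[str, List[str]]):
        # field name -> per-document values, indexed by document id
        self.field_values = field_values
        self.sort_cache: Dict[str, List[str]] = {}
        self.docs_loaded = 0  # counts values read while filling the cache

    def _values_for(self, field: str) -> List[str]:
        # First sort on a field: load the value of EVERY document in the
        # index, however few documents the query matched.
        if field not in self.sort_cache:
            values = list(self.field_values[field])
            self.docs_loaded += len(values)
            self.sort_cache[field] = values
        return self.sort_cache[field]

    def search_sorted(self, matching_ids: List[int], field: str) -> List[int]:
        values = self._values_for(field)
        # The sort itself only orders the matching documents.
        return sorted(matching_ids, key=lambda doc: values[doc])


index = {"date_added": ["2008-02-%02d" % (i % 28 + 1) for i in range(1000)]}

searcher = ToySearcher(index)
searcher.search_sorted([5, 3, 900], "date_added")
searcher.search_sorted([7, 1], "date_added")
print(searcher.docs_loaded)  # 1000: the cache was filled once and reused

# Reopening the searcher (e.g. after every commit) starts with an empty
# cache, so the next sorted query pays the full loading cost again.
searcher = ToySearcher(index)
searcher.search_sorted([7, 1], "date_added")
print(searcher.docs_loaded)  # 1000 again, on the new searcher
```

In this toy model, sorting by relevance score needs no per-field cache, which would be consistent with the slowdown appearing for every sort except score asc and score desc. It also suggests why a commit every 15 seconds could hurt: if each commit opens a new searcher, sorted queries may rarely find a warm cache for an index of about 1.5 million documents.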