On 31-Oct-07, at 11:54 PM, Haishan Chen wrote:


> Date: Wed, 31 Oct 2007 17:54:53 -0700
> Subject: Re: Phrase Query Performance Question
> From: [EMAIL PROTECTED]
> To: solr- [EMAIL PROTECTED]
>
> "hurricane katrina" is a very expensive query against a collection
> focused on Hurricane Katrina. There will be many matches in many
> documents. If you want to measure worst-case, this is fine.
>
> I'd try other things, like:
>
> * ninth ward
> * Ray Nagin
> * Audubon Park
> * Canal Street
> * French Quarter
> * FEMA mistakes
> * storm surge
> * Jackson Square
>
> Of course, real query logs are the only real test.
>
> wunder

These terms are not frequent in my index, so I believe they are going to be fast. The thing is that I feel 2 million documents is a small index. 100,000 or 200,000 hits is a small set and should always have sub-second query performance. Right now I am querying only one field and the response takes almost one second. I feel I won't be able to achieve sub-second performance if I add a bit more complexity to the query.

Many of the category terms in my index will appear in more than 5% of the documents, and those category terms are very popular search terms. So the examples I gave were not extreme cases for my index.

I think that you are somewhat misguided about what constitutes a small set. A query term that appears in 5-10% of the index in a natural language corpus is _extremely_ frequent. Not quite on the order of stopwords, but getting there. As a comparison, on an extremely large corpus that I have handy, documents containing both the word 'auto' and 'repair' (not necessarily adjacent) constitute 0.1% of the index. The frequency of the phrase "auto repair" is 0.025%.

At that phrase frequency, 200k docs would be the hit count from an 800-million-doc corpus.
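
To make the comparison concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration; the figures are the ones quoted in this thread (a 2-million-document index with 100,000 to 200,000 hits, and a phrase frequency of 0.025%).

```python
def expected_hits(corpus_size: int, frequency_percent: float) -> int:
    """Number of matching documents for a term or phrase that occurs
    in frequency_percent percent of a corpus of corpus_size documents."""
    return round(corpus_size * frequency_percent / 100)


def hit_percentage(hits: int, corpus_size: int) -> float:
    """Share of the corpus (in percent) that a given hit count represents."""
    return 100 * hits / corpus_size


if __name__ == "__main__":
    # A phrase at 0.025% frequency needs an 800-million-doc corpus
    # to produce 200k hits.
    print(expected_hits(800_000_000, 0.025))   # 200000

    # In a 2-million-doc index, 100k to 200k hits is 5-10% of the index.
    print(hit_percentage(100_000, 2_000_000))  # 5.0
    print(hit_percentage(200_000, 2_000_000))  # 10.0
```

In other words, a result set that looks small in absolute terms is a very large fraction of a 2-million-document index.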

What data are you indexing, and what is the intended effect of the phrase queries you are performing? Perhaps getting at the issue from this end would be more productive than hammering at the PhraseQuery performance question.

When I started Tomcat I saw this message:
The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path

Does that mean that query performance will be better if I use the Apache Tomcat Native library? Does anyone have experience with that?

Unlikely, though it might help you slightly at a high query rate with high cache hit ratios.

-Mike
