Re: Phrase Query Performance Question and score threshold
On 11/5/07, Haishan Chen [EMAIL PROTECTED] wrote:
> As for the first issue: the number of different phrase queries with performance issues that I have found so far is about 10.

If these are normal phrase queries (no slop), a good solution might be to simply index and query these phrases as a single token. One could do this with a SynonymFilter. Oh, and no, a score threshold won't help performance.

> I believe there will be a lot more that I just haven't tried. It can be solved by using faster hardware, though. Also, I believe it would help if Solr had a distributed search architecture similar to Nutch's, so that it can scale out instead of scaling up.

It's coming...

-Yonik
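To make the single-token idea concrete, here is a minimal sketch in plain Python. It is not Solr's SynonymFilter and does not use any Solr API; the phrase list, the underscore join, and the function name `merge_phrases` are all assumptions made for illustration. The same transformation would have to be applied at both index time and query time.

```python
# Hypothetical illustration only: this is NOT Solr's SynonymFilter.
# The phrase list below is an assumed example.
KNOWN_PHRASES = {("auto", "repair"), ("car", "repair")}


def merge_phrases(tokens, phrases=KNOWN_PHRASES):
    """Replace each known two-word phrase with a single joined token."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            # Emit one token for the whole phrase and skip both words.
            out.append("_".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

With the phrase collapsed into one token, a search for it becomes an ordinary term lookup, so no position data has to be read.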
Re: Phrase Query Performance Question
He means extremely frequent, and I agree.

--wunder

On 11/2/07 1:51 AM, Haishan Chen [EMAIL PROTECTED] wrote:
> Thanks for the advice. You certainly have a point. I believe you mean a query term that appears in 5-10% of an index in a natural language corpus is extremely INFREQUENT?
RE: Phrase Query Performance Question
From: [EMAIL PROTECTED] Subject: Re: Phrase Query Performance Question Date: Thu, 1 Nov 2007 11:25:26 -0700 To: solr-user@lucene.apache.org On 31-Oct-07, at 11:54 PM, Haishan Chen wrote:Date: Wed, 31 Oct 2007 17:54:53 -0700 Subject: Re: Phrase Query Performance Question From: [EMAIL PROTECTED] To: solr- [EMAIL PROTECTED] hurricane katrina is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like: * ninth ward * Ray Nagin * Audubon Park * Canal Street * French Quarter * FEMA mistakes * storm surge * Jackson Square Of course, real query logs are the only real test. wunder These terms are not frequent in my index. I believe they are going to be fast. The thing is that I feel 2 million documents is a small index. 100,000 or 200,000 hits is a small set and should always have sub second query performance. Now I am only querying one field and the response is almost one second. I feel I can't achieve sub second performance if I add a bit more complexity to the query. Many of the category terms in my index will appear in more than 5% of the documents and those category terms are very popular search terms. So the example I gave were not extreme cases for my index I think that you are somewhat misguided about what constitutes a small set. A query term that appears in 5-10% of the index in a natural language corpus is _extremely_ frequent. Not quite on the order of stopwords, but getting there. As a comparison, on an extremely large corpus that I have handy, documents containing both the word 'auto' and 'repair' (not necessarily adjacent) constitute 0.1% of the index. The frequency of the phrase auto repair is 0.025%. @200k docs would be the response rate from an 800million-doc corpus. What data are you indexing, what what is the intended effect of the phrase queries you are performing? 
> Perhaps getting at the issue from this end would be more productive than hammering at the phrase query performance question.

Thanks for the advice. You certainly have a point. I believe you mean a query term that appears in 5-10% of an index in a natural language corpus is extremely INFREQUENT?

> > When I start Tomcat I see this message: "The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path". Does that mean that query performance will be better if I use the Apache Tomcat Native library? Does anyone have experience with that?
>
> Unlikely, though it might help you slightly at a high query rate with high cache hit ratios.
>
> -Mike

I have tried the Apache Tomcat Native library on my Windows machine and you are right: there is no obvious difference in query performance.

I have also tried the index on a Linux machine.

The Windows machine: Windows 2003, one Intel(R) Xeon(TM) CPU 3.00 GHz (quad-core CPU), 4 GB RAM
The Linux machine: (not sure what version of Linux), two Intel(R) Xeon(R) CPU E5310 1.6 GHz (quad-core CPU), 4 GB RAM

Both systems have RAID 5, but I don't know how the arrays differ. I found a substantial indexing performance improvement on the Linux machine. On the Windows machine indexing took more than 5 hours, but it took only one hour to index 2 million documents on the Linux system. I am really happy to see that. I guess both Linux and the extra CPU contributed to the improvement.

Query performance is almost the same, though. The CPU on the Linux machine is slower, so I think that if the Linux system were using the same CPU as the Windows system, query performance would improve too. Both indexing and querying are CPU bound, if I am right.

I guess I have enough on this question, but I still want to try solr-trunk. I will update everyone later.

Thanks
-Haishan
Re: Phrase Query Performance Question
On 2-Nov-07, at 10:03 AM, Haishan Chen wrote:
> > Date: Fri, 2 Nov 2007 07:32:30 -0700
> > Subject: Re: Phrase Query Performance Question
> > From: [EMAIL PROTECTED]
> > To: solr- [EMAIL PROTECTED]
> >
> > He means extremely frequent and I agree.
> >
> > --wunder
>
> Then it means a PHRASE (a combination of terms, excluding stopwords) that appears in 5% to 10% of an index should NOT be that frequent? I guess I get the idea. Phrases should be rarer than individual keywords.

5-10% is moderately high even for a _single_ keyword, let alone the conjunction of two keywords, let alone the _exact phrase_ of two keywords (non-stopwords in all of this discussion). As I mentioned, the 'natural' rate of 'auto'+'repair' on a corpus hundreds of times bigger than yours (web documents) is .1%, and the rate of the phrase 'auto repair' is .025%.

It still feels to me that you are trying to do something unique with your phrase queries. Unfortunately, you still haven't said what you are trying to do in general terms, which makes it very difficult for people to help you.

-Mike
Re: Phrase Query Performance Question
: It still feels to me that you are trying doing something unique with your
: phrase queries. Unfortunately, you still haven't said what you are trying to
: do in general terms, which makes it very difficult for people to help you.

Agreed. This seems like a very special case, but we don't know what the case is. If there are specific phrases you know in advance that you will care about, and those phrases occur as frequently as the individual words, then the best way to deal with them is to index each phrase as a single Term (and ignore the individual words).

Speaking more generally to Mike's point...

http://people.apache.org/~hossman/#xyproblem

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?

See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
RE: Phrase Query Performance Question
Date: Fri, 2 Nov 2007 12:31:29 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Re: Phrase Query Performance Question

> : It still feels to me that you are trying doing something unique with your
> : phrase queries. Unfortunately, you still haven't said what you are trying to
> : do in general terms, which makes it very difficult for people to help you.
>
> Agreed. This seems like a very special case, but we don't know what the case is. If there are specific phrases you know in advance that you will care about, and those phrases occur as frequently as the individual words, then the best way to deal with them is to index each phrase as a single Term (and ignore the individual words).
>
> Speaking more generally to Mike's point...
>
> http://people.apache.org/~hossman/#xyproblem
>
> Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?
>
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
> -Hoss

I think the documents I am indexing cannot be considered natural language documents. They are constructed following certain rules and then fed into the indexing process. I guess that, because of the rules, many of the targeted search terms have a high document frequency. I am under no obligation to achieve quarter-second performance; I am just interested to see whether it is achievable.

Thanks everyone for offering advice.

-Haishan
Re: Phrase Query Performance Question
On 31-Oct-07, at 11:54 PM, Haishan Chen wrote: Date: Wed, 31 Oct 2007 17:54:53 -0700 Subject: Re: Phrase Query Performance Question From: [EMAIL PROTECTED] To: solr- [EMAIL PROTECTED] hurricane katrina is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like: * ninth ward * Ray Nagin * Audubon Park * Canal Street * French Quarter * FEMA mistakes * storm surge * Jackson Square Of course, real query logs are the only real test. wunder These terms are not frequent in my index. I believe they are going to be fast. The thing is that I feel 2 million documents is a small index. 100,000 or 200,000 hits is a small set and should always have sub second query performance. Now I am only querying one field and the response is almost one second. I feel I can't achieve sub second performance if I add a bit more complexity to the query. Many of the category terms in my index will appear in more than 5% of the documents and those category terms are very popular search terms. So the example I gave were not extreme cases for my index I think that you are somewhat misguided about what constitutes a small set. A query term that appears in 5-10% of the index in a natural language corpus is _extremely_ frequent. Not quite on the order of stopwords, but getting there. As a comparison, on an extremely large corpus that I have handy, documents containing both the word 'auto' and 'repair' (not necessarily adjacent) constitute 0.1% of the index. The frequency of the phrase auto repair is 0.025%. @200k docs would be the response rate from an 800million-doc corpus. What data are you indexing, what what is the intended effect of the phrase queries you are performing? Perhaps getting at the issue from this end would be more productive than hammering at the phrasequery performance question. 
When I start tomcat I saw this message: The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path Is that mean if I use Apache Tomcat Native library the query performance will be better. Anyone has experience on that? Unlikely, though it might help you slightly at a high query rate with high cache hit ratios. -Mike
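The 800-million figure in Mike's comparison can be checked with a short calculation. This is only a worked version of the arithmetic using the numbers quoted in the thread; the function names are made up for illustration.

```python
def corpus_size_for_hits(hits, phrase_rate_percent):
    """Corpus size at which a phrase with the given document rate yields `hits`."""
    return hits / (phrase_rate_percent / 100.0)


def rate_percent(hits, corpus_size):
    """Share of the corpus, in percent, that matches."""
    return 100.0 * hits / corpus_size


# Mike's web corpus: the phrase "auto repair" matches 0.025% of documents,
# so 200,000 phrase hits would need roughly an 800-million-document corpus.
needed = corpus_size_for_hits(200_000, 0.025)
print(f"{needed:,.0f} documents")

# Haishan's index: about 100,000 hits in 2 million documents is 5%,
# which is 200 times the 0.025% rate seen on the web corpus.
print(rate_percent(100_000, 2_000_000) / 0.025)
```

In other words, a phrase matching 5% of a 2-million-document index is about 200 times more common there than the same phrase is in general web text.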
RE: Phrase Query Performance Question
From: [EMAIL PROTECTED] Subject: Re: Phrase Query Performance Question Date: Tue, 30 Oct 2007 11:22:17 -0700 To: solr-user@lucene.apache.org On 30-Oct-07, at 6:09 AM, Yonik Seeley wrote: On 10/30/07, Haishan Chen [EMAIL PROTECTED] wrote: Thanks a lot for replying Yonik! I am running solr on a windows 2003 server (standard version). intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is locate on Raid5 with 2 million documents. Is there any way to improve query performance without moving to more powerful computer? I understand that the query performances of phrase query (auto repair) has to do with the number of documents containing the two words. In fact the number of documents that have auto and repair are about 10. It is like 5% of the documents containing auto and repair. It seems to me 937 ms is too slower. Chen, that does seem slow I'm not sure why. 1) was this the first search on the index? if so, try running some other searches to warm things up first. Indeed--phrase matching uses a completely different part of the index, so that needs to be warmed too. One thing to try is solr trunk: it contains some speedups for phrase queries (though perhaps not as substantial as you hope for). -MIke Thanks for replying. The statistics I collected were not on the first query. And I believe I was runing JVM on server mode. I configure tomcat to use the server version of JVM.dll. I guess that is the way to set it on windows. I execute the same phrase query (auto repair) over and over again and that is the best performance I observe. Also when I did the test I disable all solr cache. I want to see the performance without Solr cache I am currently trying to test the index on linux system with similar hardware. It will take me some time to set it up. I read a discussion between Doug cutting and Andrzej Bialecki about lucene performance. 
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/[EMAIL PROTECTED]

It mentioned that http://websearch.archive.org/katrina/ (in Nutch) had 10M documents and that a search for hurricane katrina was able to return in 1.35 seconds with 600,867 hits, although the computer it was using might be more powerful than mine. I feel 937 ms for a phrase query on a single field is rather slow. Nutch actually expands a search into more complex queries. My index and the number of hits on my query (auto repair) are about one fifth of those for websearch.archive.org and its test query. So I feel a reasonable performance for my query should be less than 300 ms. I am not sure if I am right on that logic.

Anyway, I will collect the statistics on Linux first and try out other options.

Thanks a lot
Haishan
Re: Phrase Query Performance Question
On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200512.mbox/[EMAIL PROTECTED] It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of hurricane katrina was able to return in 1.35 seconds with 600,867 hits. Althought the computer it was using might be more powerful than mine. I feel 937ms for a phrase query on a single field is kind of slower. Nutch actually expand a search to more complex queries. My index and the number of hits on my query (auto repair) is about one fifth of websearch.archive.org and its testing query. So I feel a reasonable performance for my query should be less than 300 ms. I am not sure if I am right on that logic. I'm not sure that it is reasonable, but I'm not sure that it isn't. However, have you tried other queries? 937ms seems a little high, even for phrase queries. Anyway I will collect the statistic on linux first and try out other options. Have you tried using the performance enhancements present in solr-trunk? -Mike
RE: Phrase Query Performance Question
From: [EMAIL PROTECTED]
Subject: Re: Phrase Query Performance Question
Date: Wed, 31 Oct 2007 15:25:42 -0700
To: solr-user@lucene.apache.org

> On 31-Oct-07, at 2:40 PM, Haishan Chen wrote:
> > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/[EMAIL PROTECTED]
> >
> > It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of hurricane katrina was able to return in 1.35 seconds with 600,867 hits. Althought the computer it was using might be more powerful than mine. I feel 937ms for a phrase query on a single field is kind of slower. Nutch actually expand a search to more complex queries. My index and the number of hits on my query (auto repair) is about one fifth of websearch.archive.org and its testing query. So I feel a reasonable performance for my query should be less than 300 ms. I am not sure if I am right on that logic.
>
> I'm not sure that it is reasonable, but I'm not sure that it isn't. However, have you tried other queries? 937ms seems a little high, even for phrase queries.
>
> > Anyway I will collect the statistic on linux first and try out other options.
>
> Have you tried using the performance enhancements present in solr-trunk?
>
> -Mike

Here are some query statistics. The phrase queries look slow to me. These are queries that have more than 10 hits. For those that return a couple of thousand hits the response time is quite fast. But this is a query on one field only.

(auto repair)              100384 hits    946 ms
(auto repair)              100384 hits     31 ms
(car repair~100)           112183 hits    766 ms
(car repair)               112183 hits     63 ms
(business service~100)    1209751 hits   1500 ms
(business service)        1209751 hits    234 ms
(shopping center~100)      119481 hits    359 ms
(shopping center~100)      119481 hits     63 ms

I don't know what solr-trunk is yet, but I will find out.

Thank you
Haishan
Re: Phrase Query Performance Question
hurricane katrina is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like: * ninth ward * Ray Nagin * Audubon Park * Canal Street * French Quarter * FEMA mistakes * storm surge * Jackson Square Of course, real query logs are the only real test. wunder On 10/31/07 3:25 PM, Mike Klaas [EMAIL PROTECTED] wrote: On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200512.mbox/[EMAIL PROTECTED] It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of hurricane katrina was able to return in 1.35 seconds with 600,867 hits. Althought the computer it was using might be more powerful than mine. I feel 937ms for a phrase query on a single field is kind of slower. Nutch actually expand a search to more complex queries. My index and the number of hits on my query (auto repair) is about one fifth of websearch.archive.org and its testing query. So I feel a reasonable performance for my query should be less than 300 ms. I am not sure if I am right on that logic. I'm not sure that it is reasonable, but I'm not sure that it isn't. However, have you tried other queries? 937ms seems a little high, even for phrase queries. Anyway I will collect the statistic on linux first and try out other options. Have you tried using the performance enhancements present in solr-trunk? -Mike
RE: Phrase Query Performance Question
: (auto repair) 100384 hits 946 ms
: (auto repair) 100384 hits 31ms
: (car repair~100) 112183 hits 766 ms
: (car repair) 112183 hits 63 ms
: (business service~100) 1209751 hits 1500 ms
: (business service) 1209751 hits 234 ms
: (shopping center~100) 119481 hits 359 ms
: (shopping center~100) 119481 hits 63 ms

If I'm reading those numbers right, every document in your corpus containing the words auto or repair also contains the exact phrase auto repair with no slop ... this seems HIGHLY unlikely.

Can you show us *exactly* what the query URLs you are using look like, and show us what the request handler section of your solrconfig.xml looks like?

Also: where are you getting these times from? Are these from the logging output Solr produces, or from the client you have hitting Solr?

: I don't know what is solr-trunk yet but I will find out

He's referring to the unreleased development code, which you can check out from the trunk of the Solr subversion repository...

http://lucene.apache.org/solr/version_control.html

-Hoss
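Hoss's question about where the times come from matters because a client-side stopwatch also includes network transfer and response handling, while Solr reports its own query time as `QTime` (in milliseconds) in the `responseHeader` of each response. Below is a rough sketch of comparing the two, assuming the response is requested as JSON (`wt=json`). The `fetch` callable is a hypothetical stand-in for whatever HTTP client is in use; it is not part of any Solr client library.

```python
import json
import time


def compare_timings(fetch):
    """Return (client_ms, server_ms) for one request.

    `fetch` is a hypothetical zero-argument callable that performs the
    HTTP request and returns the JSON response body as a string.
    """
    start = time.perf_counter()
    body = fetch()
    # Wall-clock time seen by the client, including transfer time.
    client_ms = (time.perf_counter() - start) * 1000.0
    # Time Solr itself reports for processing the query.
    server_ms = json.loads(body)["responseHeader"]["QTime"]
    return client_ms, server_ms
```

A large gap between the two numbers would point at transfer or client overhead rather than at the phrase query itself.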
RE: Phrase Query Performance Question
Date: Wed, 31 Oct 2007 19:19:07 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: Phrase Query Performance Question

> : (auto repair) 100384 hits 946 ms
> : (auto repair) 100384 hits 31ms
> : (car repair~100) 112183 hits 766 ms
> : (car repair) 112183 hits 63 ms
> : (business service~100) 1209751 hits 1500 ms
> : (business service) 1209751 hits 234 ms
> : (shopping center~100) 119481 hits 359 ms
> : (shopping center~100) 119481 hits 63 ms
>
> if i'm reading those numbers right, every document in your corpus containing the words auto or repair also contains the exact phrase auto repair with no slop ... this seems HIGHLY unlikely.
>
> can you show us *exactly* what the query URLs you are using look like, and show us what the request handler section of your solrconfig.xml looks like.

Yes, that's exactly what the documents are like. The documents are categorized. I indexed the category together with the content of the documents using the text field type. The URL I used is select?q=content:("auto repair"~100)&fl=title. All other options, like faceting and highlighting, are not used.

> also: where are you getting these times from? are these from the logging output solr produces, or from the client you have hitting solr?
>
> : I don't know what is solr-trunk yet but I will find out
>
> he's refering to the unreleased develoment code, which you can checkout from the trunk of the SOlr subversion repository...
>
> http://lucene.apache.org/solr/version_control.html
>
> -Hoss

I am getting the times from the client browser.

Thanks
-Haishan
RE: Phrase Query Performance Question
Date: Wed, 31 Oct 2007 17:54:53 -0700 Subject: Re: Phrase Query Performance Question From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org hurricane katrina is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like: * ninth ward * Ray Nagin * Audubon Park * Canal Street * French Quarter * FEMA mistakes * storm surge * Jackson Square Of course, real query logs are the only real test. wunder These terms are not frequent in my index. I believe they are going to be fast. The thing is that I feel 2 million documents is a small index. 100,000 or 200,000 hits is a small set and should always have sub second query performance. Now I am only querying one field and the response is almost one second. I feel I can't achieve sub second performance if I add a bit more complexity to the query. Many of the category terms in my index will appear in more than 5% of the documents and those category terms are very popular search terms. So the example I gave were not extreme cases for my index When I start tomcat I saw this message: The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path Is that mean if I use Apache Tomcat Native library the query performance will be better. Anyone has experience on that? Thanks a lot -Haishan On 10/31/07 3:25 PM, Mike Klaas [EMAIL PROTECTED] wrote: On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: http://mail-archives.apache.org/mod_mbox/lucene-java-user/ 200512.mbox/[EMAIL PROTECTED] It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of hurricane katrina was able to return in 1.35 seconds with 600,867 hits. Althought the computer it was using might be more powerful than mine. I feel 937ms for a phrase query on a single field is kind of slower. 
> > Nutch actually expands a search into more complex queries. My index and the number of hits on my query (auto repair) are about one fifth of those for websearch.archive.org and its test query. So I feel a reasonable performance for my query should be less than 300 ms. I am not sure if I am right on that logic.
>
> I'm not sure that it is reasonable, but I'm not sure that it isn't. However, have you tried other queries? 937ms seems a little high, even for phrase queries.
>
> > Anyway, I will collect the statistics on Linux first and try out other options.
>
> Have you tried using the performance enhancements present in solr-trunk?
>
> -Mike
RE: Phrase Query Performance Question
Thanks a lot for replying, Yonik!

I am running Solr on a Windows 2003 server (Standard edition) with an Intel Xeon CPU at 3.00 GHz and 4.00 GB RAM. The index is located on RAID 5 and holds 2 million documents. Is there any way to improve query performance without moving to a more powerful computer?

I understand that the performance of the phrase query (auto repair) has to do with the number of documents containing the two words. In fact the number of documents that have auto and repair are about 10. It is like 5% of the documents containing auto and repair. It seems to me 937 ms is too slow.

Would it be faster if I ran Solr on a Linux system? If so, how much faster would it be in general? My performance target for this kind of phrase query is a quarter of a second or so. Any advice on how to achieve this on the above hardware?

Thanks a lot
Haishan

> Re: phrase query performance
> Yonik Seeley
> Fri, 26 Oct 2007 08:09:52 -0700
>
> The differences lie in Lucene. Instead of thinking of phrase queries as slow, think of term queries as fast :-) Phrase queries need to read and consider position information that term queries do not.
>
> -Yonik
>
> On 10/26/07, Haishan Chen [EMAIL PROTECTED] wrote:
> > I am a new Solr user and wonder if anyone can help me with these questions. I used Solr to index about two million documents and queried it using the standard request handler. I disabled all caches. I found that the phrase query was substantially slower than the usual query. The statistics I collected are as follows. I was querying one field only.
> >
> > content:(auto repair)        47 ms  repeatable
> > content:("auto repair")     937 ms  repeatable
> > content:("auto repair"~1)   766 ms  repeatable
> >
> > What are the factors affecting phrase query performance? How come the phrase query content:("auto repair") is almost 20 times slower than content:(auto repair)? I also notice that the phrase query with a slop is always faster than the one without a slop. Is the difference I observe here a performance problem of Lucene or Solr?
> >
> > It would be appreciated if anyone can help.
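Yonik's point that phrase queries must read position information, while term queries need only document ids, can be illustrated with a toy positional inverted index. This is a plain-Python sketch for intuition only, not how Lucene stores or scores its postings; every name in it is made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]} for a list of token lists."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def term_query(index, term):
    """Only the document ids are needed; positions are never read."""
    return set(index.get(term, {}))


def phrase_query(index, first, second):
    """Exact two-word phrase: the words must be adjacent in the same doc."""
    hits = set()
    postings_a = index.get(first, {})
    postings_b = index.get(second, {})
    # Only documents containing both words can match the phrase.
    for doc_id in postings_a.keys() & postings_b.keys():
        positions_b = set(postings_b[doc_id])
        # Extra work per candidate document: walk the position lists.
        if any(p + 1 in positions_b for p in postings_a[doc_id]):
            hits.add(doc_id)
    return hits
```

For example, with the documents `auto repair shop`, `repair your auto` and `auto parts`, the term query for `auto` matches all three, while the phrase query for `auto repair` has to inspect positions in the first two and keeps only the first. The per-document position check is the extra cost, and it grows with the number of documents that contain both words.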
Re: Phrase Query Performance Question
On 10/30/07, Haishan Chen [EMAIL PROTECTED] wrote:
> Thanks a lot for replying Yonik! I am running solr on a windows 2003 server (standard version). intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is locate on Raid5 with 2 million documents. Is there any way to improve query performance without moving to more powerful computer? I understand that the query performances of phrase query (auto repair) has to do with the number of documents containing the two words. In fact the number of documents that have auto and repair are about 10. It is like 5% of the documents containing auto and repair. It seems to me 937 ms is too slower.

Chen, that does seem slow; I'm not sure why.
1) Was this the first search on the index? If so, try running some other searches to warm things up first.
2) Was the JVM in server mode? (start with -server)
3) Shut down unrelated things on the system so that there is more memory available to the OS to cache the index files.

> Would it be faster if I run solr on linux system?

Maybe... Lucene does rely on the OS caching often-used parts of the index, so this can differ the most between Windows and Linux. If you have a Linux box lying around, trying it out quickly to remove that variable would be a good idea.

-Yonik
Re: Phrase Query Performance Question
On 30-Oct-07, at 6:09 AM, Yonik Seeley wrote:
> On 10/30/07, Haishan Chen [EMAIL PROTECTED] wrote:
> > Thanks a lot for replying Yonik! I am running solr on a windows 2003 server (standard version). intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is locate on Raid5 with 2 million documents. Is there any way to improve query performance without moving to more powerful computer? I understand that the query performances of phrase query (auto repair) has to do with the number of documents containing the two words. In fact the number of documents that have auto and repair are about 10. It is like 5% of the documents containing auto and repair. It seems to me 937 ms is too slower.
>
> Chen, that does seem slow I'm not sure why.
> 1) was this the first search on the index? if so, try running some other searches to warm things up first.

Indeed--phrase matching uses a completely different part of the index, so that needs to be warmed too.

One thing to try is solr trunk: it contains some speedups for phrase queries (though perhaps not as substantial as you hope for).

-Mike
Phrase Query Performance Question
I am a new Solr user and wonder if anyone can help me with these questions. I used Solr to index about two million documents and queried it using the standard request handler. I disabled all caches. I found that the phrase query was substantially slower than the usual query. The statistics I collected are as follows. I was querying one field only.

content:(auto repair)        47 ms  repeatable
content:("auto repair")     937 ms  repeatable
content:("auto repair"~1)   766 ms  repeatable

What are the factors affecting phrase query performance? How come the phrase query content:("auto repair") is almost 20 times slower than content:(auto repair)? I also notice that the phrase query with a slop is always faster than the one without a slop. Is the performance difference I observed here between the phrase query and the regular query a performance problem of Lucene or Solr?

I was having trouble starting a new discussion thread earlier. Hopefully I have done it right this time.

It would be appreciated if anyone can help.

Haishan