Query Performance of -field:[* TO *]
Hi guys,

I'm trying to limit the size of my index, and one of the things I have done is to not populate certain fields when the majority of the documents have the same value. For example, if most of the documents in my index have a field color with the value green, I will not populate that field. I will populate it only if the color is something else.

My question is: if I make a query such as -color:[* TO *], will it be much slower than -color:green?

Thanks for your help in advance.

Best,
Sammy Yu
debugQuery missing boost
Hi,

I'm trying to get some information on how boost is used in the ranking calculation, via the debugQuery parameter, for the following query:

(bodytext:iphone OR bodytext:firmware)^2.0 OR dateCreatedYear:2009^5.0

For one of the matching documents I can see:

4.7144237 = (MATCH) sum of:
  2.2903786 = (MATCH) sum of:
    0.7662499 = (MATCH) weight(bodytext:iphon in 8339166), product of:
      0.427938 = queryWeight(bodytext:iphon), product of:
        5.729801 = idf(docFreq=76646, numDocs=8682037)
        0.07468636 = queryNorm
      1.7905629 = (MATCH) fieldWeight(bodytext:iphon in 8339166), product of:
        1.0 = tf(termFreq(bodytext:iphon)=1)
        5.729801 = idf(docFreq=76646, numDocs=8682037)
        0.3125 = fieldNorm(field=bodytext, doc=8339166)
    1.5241286 = (MATCH) weight(bodytext:firmwar in 8339166), product of:
      0.60354054 = queryWeight(bodytext:firmwar), product of:
        8.081 = idf(docFreq=7300, numDocs=8682037)
        0.07468636 = queryNorm
      2.5253127 = (MATCH) fieldWeight(bodytext:firmwar in 8339166), product of:
        1.0 = tf(termFreq(bodytext:firmwar)=1)
        8.081 = idf(docFreq=7300, numDocs=8682037)
        0.3125 = fieldNorm(field=bodytext, doc=8339166)
  2.424045 = (MATCH) weight(dateCreatedYear:2009^5.0 in 8339166), product of:
    0.6727613 = queryWeight(dateCreatedYear:2009^5.0), product of:
      5.0 = boost
      3.603128 = idf(docFreq=642831, numDocs=8682037)
      0.03734318 = queryNorm
    3.603128 = (MATCH) fieldWeight(dateCreatedYear:2009 in 8339166), product of:
      1.0 = tf(termFreq(dateCreatedYear:2009)=1)
      3.603128 = idf(docFreq=642831, numDocs=8682037)
      1.0 = fieldNorm(field=dateCreatedYear, doc=8339166)

This shows that the 5.0 boost in dateCreatedYear:2009^5.0 is being applied. However, the 2.0 boost in (bodytext:iphone OR bodytext:firmware)^2.0 is missing. How is the 2.0 boost being applied to the score?

Thanks,
Sammy
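One possible reading of the numbers in this post, offered as an observation rather than a statement about Lucene internals: the queryNorm reported for the two bodytext clauses (0.07468636) is exactly twice the queryNorm reported for dateCreatedYear (0.03734318). That would be consistent with the 2.0 group boost having been folded into the queryNorm shown for the clauses inside the parentheses instead of appearing on its own boost line. The sketch below only re-does the arithmetic with the values copied from the output above; the helper names are made up for illustration.

```python
import math

# Values copied from the debugQuery output in the post.
QUERY_NORM_BODYTEXT = 0.07468636   # reported for both bodytext clauses
QUERY_NORM_DATE = 0.03734318       # reported for dateCreatedYear
FIELD_NORM_BODYTEXT = 0.3125


def query_weight(idf, query_norm, boost=1.0):
    """queryWeight as listed in the output: product of boost, idf, queryNorm."""
    return boost * idf * query_norm


def field_weight(tf, idf, field_norm):
    """fieldWeight as listed in the output: product of tf, idf, fieldNorm."""
    return tf * idf * field_norm


# Ratio between the two reported queryNorm values.
norm_ratio = QUERY_NORM_BODYTEXT / QUERY_NORM_DATE

# Recompute each clause score from its listed factors.
iphone = query_weight(5.729801, QUERY_NORM_BODYTEXT) * field_weight(
    1.0, 5.729801, FIELD_NORM_BODYTEXT
)
firmware = query_weight(8.081, QUERY_NORM_BODYTEXT) * field_weight(
    1.0, 8.081, FIELD_NORM_BODYTEXT
)
year = query_weight(3.603128, QUERY_NORM_DATE, boost=5.0) * field_weight(
    1.0, 3.603128, 1.0
)
total = iphone + firmware + year

print("queryNorm ratio (bodytext / dateCreatedYear):", norm_ratio)
print("recomputed total score:", total)
```

If the ratio comes out as 2.0 and the recomputed total agrees with the reported 4.7144237 up to rounding, the 2.0 boost is accounted for in the score even though no separate boost line is printed for the group.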
Re: SOLR 1.4 and 1.3 diff and other
Hi Yonik,

Thanks for the quick response. Do you know the release schedule for 1.4, or whether it is possible to backport the NIO implementation into 1.3? If you could give me a pointer, that would be great. It seems like a huge performance gain that would be of value to a lot of people.

Thanks,
Sammy

On Wed, Dec 17, 2008 at 5:36 PM, Yonik Seeley ysee...@gmail.com wrote:
> On Wed, Dec 17, 2008 at 7:52 PM, Sammy Yu temi...@gmail.com wrote:
>> I read somewhere that there are contention issues with the current cache implementation of LRUCache in 1.3 in that it is synchronous, could this be the reason why the filter queries are slow?
>
> Probably not. The change is much more likely due to using a non-blocking NIO implementation in Lucene.
>
> -Yonik
Slow Response time after optimize
Hi guys,

I have a typical master/slave setup running Solr 1.3.0. I did some basic scalability testing with JMeter, tweaked our environment, and determined that we can handle approximately 26 simultaneous threads with end-to-end response times under 200ms, even with index distribution typically running every 5 minutes. However, as soon as I issue a single optimize on the master, the response time goes up to over 500ms and does not seem to recover. After I restarted, the response time went back down to 200ms.

My index is approximately 5 GB in size, and the queries are simple disjunction queries such as title:iphone OR bodytext:iphone.

Has anybody seen this issue before?

Thanks,
Sammy
Re: Standard request with functional query
Hey guys,

Thanks for the response, but how would I make recency a factor in scoring documents with the standard request handler? The query

(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:ord(dateCreated)^0.1

seems to do something very similar to just sorting by dateCreated, rather than making dateCreated a part of the score.

Thanks,
Sammy

On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu temi...@gmail.com wrote:
> Hi guys,
>
> I have a standard query that searches across multiple text fields, such as
>
> q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware
>
> This comes back with documents that have iphone and firmware (I know I can use the dismax handler, but it seems to be really slow), which is great. Now I want to give some more weight to more recent documents (there is a dateCreated field in each document). So I've modified the query as such:
>
> (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:ord(dateCreated)^0.1
>
> URL-encoded to
>
> q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3Aord(dateCreated)^0.1
>
> However, the results are not as one would expect. The first few documents only come back with the word iphone and appear to be sorted by date created. It seems to completely ignore the score and use the dateCreated field for the score.
>
> On a not directly related issue, it seems that if you put the weight within the double quotes:
>
> (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"
>
> the parser complains:
>
> org.apache.lucene.queryParser.ParseException: Cannot parse '(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:ord(dateCreated)^0.1': Expected ',' at position 16 in 'ord(dateCreated)^0.1'
>
> Thanks,
> Sammy
Standard request with functional query
Hi guys,

I have a standard query that searches across multiple text fields, such as

q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware

This comes back with documents that have iphone and firmware (I know I can use the dismax handler, but it seems to be really slow), which is great. Now I want to give some more weight to more recent documents (there is a dateCreated field in each document). So I've modified the query as such:

(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:ord(dateCreated)^0.1

URL-encoded to

q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3Aord(dateCreated)^0.1

However, the results are not as one would expect. The first few documents only come back with the word iphone and appear to be sorted by date created. It seems to completely ignore the score and use the dateCreated field for the score.

On a not directly related issue, it seems that if you put the weight within the double quotes:

(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"

the parser complains:

org.apache.lucene.queryParser.ParseException: Cannot parse '(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:ord(dateCreated)^0.1': Expected ',' at position 16 in 'ord(dateCreated)^0.1'

Thanks,
Sammy
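For reference, the URL-encoded form quoted in this post can be reproduced with Python's standard library. This is only a sketch of the encoding step; the variable names are illustrative, and nothing here talks to a Solr server. Passing safe="()^" is an assumption made purely to match the string in the post, which leaves the parentheses and the caret unescaped. The stricter default encoding escapes those characters as well and is the more conservative choice when building request URLs.

```python
from urllib.parse import quote_plus

# The query string exactly as written in the post.
query = (
    "(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware)"
    " AND _val_:ord(dateCreated)^0.1"
)

# Matches the post: ':' becomes %3A and spaces become '+',
# while '(', ')' and '^' are left as they are.
encoded_like_post = quote_plus(query, safe="()^")

# Stricter variant: also escapes '(', ')' and '^'.
encoded_strict = quote_plus(query)

print("q=" + encoded_like_post)
print("q=" + encoded_strict)
```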
Re: SOLR query times
Hi Grant,

Thanks for your response. I'm trying to simulate our production environment's search traffic, which has a very low cache hit rate. Turning off the caches can help us better understand query times, and the load on the slaves when distribution occurs, with a small list of pre-canned queries.

If the latency is caused by loading and caching of Lucene's segments, is there a way to force Lucene to preload them? This seems to be the case in our production environment: when SOLR restarts, the load spikes and it takes a couple of hours before it settles down.

Also, are there generally accepted ways of doing scalability and performance characterization?

Thanks,
Sammy

On Sun, Oct 12, 2008 at 8:17 AM, Grant Ingersoll [EMAIL PROTECTED] wrote:
> This is pretty typical. The first query is always more expensive, as Lucene lazily loads some pieces of the index into memory and you may see the FieldCache in action, depending on sorting, not to mention you are also seeing operating system caching take place.
>
> Is there some reason you don't want these or are you just trying to understand the why?
>
> -Grant
>
> On Oct 10, 2008, at 6:25 PM, Sammy Yu wrote:
>> Hi,
>>
>> I'm using SOLR 1.3 on an index with approximately 8 million documents. I would like to disable SOLR's cache so that it is easier for me to test the scenario when there is a small likelihood of cache hits. I've disabled caching by commenting out the filterCache, queryResultCache, and documentCache sections in solrconfig.xml as suggested by the Wiki. It seems disabled because the admin interface no longer shows any entries in the Cache section. However, it appears that there is still some sort of caching taking place. The first time I make a specific query it takes around 100 msec; subsequent queries take around 15 msec. Is there some sort of caching happening at the Lucene level?
>>
>> Thanks for your help,
>> Sammy Yu
>
> --
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
SOLR query times
Hi,

I'm using SOLR 1.3 on an index with approximately 8 million documents. I would like to disable SOLR's cache so that it is easier for me to test the scenario when there is a small likelihood of cache hits. I've disabled caching by commenting out the filterCache, queryResultCache, and documentCache sections in solrconfig.xml as suggested by the Wiki. It seems disabled because the admin interface no longer shows any entries in the Cache section.

However, it appears that there is still some sort of caching taking place. The first time I make a specific query it takes around 100 msec; subsequent queries take around 15 msec. Is there some sort of caching happening at the Lucene level?

Thanks for your help,
Sammy Yu
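The first-query versus repeat-query gap described here can be measured with a small harness like the one below. It is a minimal sketch, not a benchmarking tool: time_query and make_fake_backend are hypothetical names, and the fake backend merely sleeps on the first call for a given query so the script runs without a server. To measure a real instance, run_query would be replaced by a function that issues the HTTP request to whatever select URL the installation exposes.

```python
import time
from statistics import median


def time_query(run_query, query, repeats=5):
    """Run one query several times.

    Returns (first_ms, median_of_later_ms) so the cold first call can be
    compared against the warm repeats.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms[0], median(timings_ms[1:])


def make_fake_backend(cold_seconds=0.05):
    """Stand-in for a search server (hypothetical): slow on the first call
    for each distinct query, fast afterwards."""
    seen = set()

    def run_query(query):
        if query not in seen:
            seen.add(query)
            time.sleep(cold_seconds)  # simulated cold-start cost
        return query

    return run_query


first_ms, later_ms = time_query(
    make_fake_backend(), "title:iphone OR bodytext:iphone"
)
print(f"first call: {first_ms:.1f} ms, median of repeats: {later_ms:.1f} ms")
```

Reporting the first call separately from the median of the repeats keeps a single cold request from skewing an average, which is the distinction the post is asking about.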