Re: Help with relevance failure in Solr 1.3
Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results.

wunder

On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote:
> Restarting Solr fixes it. If I remember correctly, a sync and commit does not fix it. I have disabled snappuller this time, so I can study the broken instance.
>
> wunder
Re: Help with relevance failure in Solr 1.3
It just occurred to me that a query cache issue could potentially cause this. If it's caching, it would most likely be a query.equals() implementation incorrectly returning true. Perhaps check JaroWinkler.equals() first?

Also, when one server starts to return bad results, have you tried using explainOther=id:id_of_other_doc_that_should_score_higher?

-Yonik
http://www.lucidimagination.com

On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood wunderw...@netflix.com wrote:
> Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results.
>
> wunder
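Yonik's hypothesis can be illustrated with a minimal Java sketch. It is not Solr's actual cache code: a plain `HashMap` stands in for the query result cache, and `BrokenFuzzyKey`/`FixedFuzzyKey` are hypothetical classes modelled loosely on a custom fuzzy query. When `equals()` and `hashCode()` ignore the query term, a lookup for one query can return the entry cached for a different one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache key modelled loosely on a custom fuzzy query.
// Deliberate bug: the term is ignored by equals() and hashCode().
class BrokenFuzzyKey {
    final String field;
    final String term;

    BrokenFuzzyKey(String field, String term) {
        this.field = field;
        this.term = term;
    }

    @Override
    public boolean equals(Object o) {
        // Only the field is compared, so different terms look identical.
        return o instanceof BrokenFuzzyKey && ((BrokenFuzzyKey) o).field.equals(field);
    }

    @Override
    public int hashCode() {
        return field.hashCode();
    }
}

// Corrected key: every value that changes the results takes part in equality.
class FixedFuzzyKey {
    final String field;
    final String term;

    FixedFuzzyKey(String field, String term) {
        this.field = field;
        this.term = term;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedFuzzyKey)) return false;
        FixedFuzzyKey other = (FixedFuzzyKey) o;
        return field.equals(other.field) && term.equals(other.term);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, term);
    }
}

class QueryCacheDemo {
    // Caches a result under `first`, then looks up `second` in the same cache.
    static String lookupAfterCaching(Object first, Object second) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(first, "results for first query");
        return cache.getOrDefault(second, "cache miss");
    }

    public static void main(String[] args) {
        // Different terms, same field: the broken key returns the wrong entry.
        System.out.println(lookupAfterCaching(
                new BrokenFuzzyKey("exact_base", "changeling"),
                new BrokenFuzzyKey("exact_base", "the")));
        // The fixed key correctly reports a miss.
        System.out.println(lookupAfterCaching(
                new FixedFuzzyKey("exact_base", "changeling"),
                new FixedFuzzyKey("exact_base", "the")));
    }
}
```

The cached value itself is correct in both runs; only the key comparison decides whether the wrong query gets it back.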
Re: Help with relevance failure in Solr 1.3
The JaroWinkler equals() was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit, right? I could run with a cache size of 0, since our middle-tier HTTP cache is leaving almost nothing for the caches to do.

I'll try that explain. The stored fields for the correct doc are fine, because I can see them when I use a single-term query. The indexed fields seem OK, because that query works.

wunder

On 4/14/09 9:11 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> It just occurred to me that a query cache issue could potentially cause this. If it's caching, it would most likely be a query.equals() implementation incorrectly returning true. Perhaps check JaroWinkler.equals() first?
>
> Also, when one server starts to return bad results, have you tried using explainOther=id:id_of_other_doc_that_should_score_higher?
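A sketch of the explainOther request Yonik suggests, built as a URL for the `groups_people` handler from the config in this thread. The host, port, `/select` path with `qt` handler selection, and the document id `12345` are placeholders and assumptions; `debugQuery` and `explainOther` are Solr request parameters, but it is worth confirming that your Solr version supports `explainOther` before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExplainOtherUrl {
    // URL-encodes a parameter value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds a debug request: q is scored normally, and explainOther asks Solr
    // to also explain the document matching "id:<otherId>".
    static String build(String solrBase, String handler, String q, String otherId) {
        return solrBase + "/select"
                + "?qt=" + enc(handler)
                + "&q=" + enc(q)
                + "&debugQuery=on"
                + "&explainOther=" + enc("id:" + otherId);
    }

    public static void main(String[] args) {
        // Placeholder host, port, and document id.
        System.out.println(build("http://localhost:8983/solr", "groups_people",
                "the changeling", "12345"));
    }
}
```

Comparing the explain for the document that should win against the explain for the current top hit shows directly which query clauses are (or are not) contributing on the broken server.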
Re: Help with relevance failure in Solr 1.3
On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood wunderw...@netflix.com wrote:
> The JaroWinkler equals was broken, but I fixed that a month ago. Query cache sounds possible, but those are cleared on a commit, right?

Yes, but if you use autowarming, those items are regenerated and if there is a problem with equals() then it could re-appear (the cache items are correct, it's just the lookup that returns the wrong one).

-Yonik
http://www.lucidimagination.com
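Yonik's point about autowarming can be sketched in Java as follows. This is a simplified, hypothetical model, not Solr's cache-warming code: warming re-executes each old key against the new searcher, so every stored value is correct, yet a key with a broken `equals()` still lets a different query hit that entry after the commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class AutowarmDemo {
    // Hypothetical key whose equals()/hashCode() ignore the query text,
    // standing in for a query class with a broken equals().
    static final class BadKey {
        final String field;
        final String text;

        BadKey(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).field.equals(field);
        }

        @Override
        public int hashCode() {
            return field.hashCode();
        }
    }

    // Simplified autowarming: re-run every key from the old cache against the
    // new searcher and store the freshly computed (correct) result.
    static Map<BadKey, String> autowarm(Map<BadKey, String> oldCache,
                                        Function<BadKey, String> newSearcher) {
        Map<BadKey, String> newCache = new HashMap<>();
        for (BadKey key : oldCache.keySet()) {
            newCache.put(key, newSearcher.apply(key));
        }
        return newCache;
    }

    public static void main(String[] args) {
        Function<BadKey, String> searcher = k -> "results for '" + k.text + "'";
        Map<BadKey, String> oldCache = new HashMap<>();
        BadKey first = new BadKey("exact_base", "the");
        oldCache.put(first, searcher.apply(first));

        // After a commit the cache is rebuilt, and each stored value is correct...
        Map<BadKey, String> warmed = autowarm(oldCache, searcher);
        // ...but a different query still collides with the warmed key.
        System.out.println(warmed.get(new BadKey("exact_base", "changeling")));
    }
}
```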
Re: Help with relevance failure in Solr 1.3
But why would it work for a few days, then go bad and stay bad? It fails for every multi-term query, even those not in cache. I ran a test with more queries than the cache size. We do use autowarming.

wunder

On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> Yes, but if you use autowarming, those items are regenerated and if there is a problem with equals() then it could re-appear (the cache items are correct, it's just the lookup that returns the wrong one).
Re: Help with relevance failure in Solr 1.3
Are there changes occurring when it goes bad that maybe aren't committed?

On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
> But why would it work for a few days, then go bad and stay bad? It fails for every multi-term query, even those not in cache. I ran a test with more queries than the cache size. We do use autowarming.
>
> wunder

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Help with relevance failure in Solr 1.3
Nope. This is a slave, so no indexing happens, just a sync. The sync happens once per day. It went bad at a different time.

wunder

On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote:
> Are there changes occurring when it goes bad that maybe aren't committed?
Re: Help with relevance failure in Solr 1.3
Is bad memory a possibility? I.e., is it the same machine all the time? Is there any recognizable pattern for when it happens?

-Grant (grasping at straws)

On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
> Nope. This is a slave, so no indexing happens, just a sync. The sync happens once per day. It went bad at a different time.
>
> wunder
Re: Help with relevance failure in Solr 1.3
I already ruled out cosmic rays. It has happened on different hardware and at different times of day, including under low load. The only thing associated with it is load from a new faceted browse thing we turned on.

wunder

On 4/14/09 2:23 PM, Grant Ingersoll gsing...@apache.org wrote:
> Is bad memory a possibility? I.e., is it the same machine all the time? Is there any recognizable pattern for when it happens?
>
> -Grant (grasping at straws)
Re: Help with relevance failure in Solr 1.3
OK, I guess details on the new faceting stuff would be in order. Which faceting are you using? Are you sure that it never occurred before (i.e., that it didn't just slip under the radar)? Obviously, the key here is reproducibility, but this has all the earmarks of some weird threading issue, at least IMO.

On Apr 14, 2009, at 5:32 PM, Walter Underwood wrote:
> I already ruled out cosmic rays. It has happened on different hardware and at different times of day, including under low load. The only thing associated with it is load from a new faceted browse thing we turned on.
>
> wunder
Re: Help with relevance failure in Solr 1.3
On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote:
> Normally, both "changeling" and "the changeling" work fine. This one server is misbehaving like this for all multi-term queries. Yes, it is VERY weird that the term "changeling" does not show up in the explain.
>
> A server will occasionally go bad and stay in that state. In one case, two servers went bad and both gave the same wrong results.

What's the solution for when they go bad? Do you have to restart Solr or reboot or what?
Re: Help with relevance failure in Solr 1.3
Restarting Solr fixes it. If I remember correctly, a sync and commit does not fix it. I have disabled snappuller this time, so I can study the broken instance.

wunder

On 4/11/09 5:03 AM, Grant Ingersoll gsing...@apache.org wrote:
> What's the solution for when they go bad? Do you have to restart Solr or reboot or what?
Re: Help with relevance failure in Solr 1.3
If you don't see the attachments, you can get them here: http://wunderwood.org/solr/

wunder

On 4/10/09 10:56 AM, Walter Underwood wunderw...@netflix.com wrote:
> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and I would appreciate any ideas.
>
> Occasionally, a server will start returning results with really poor relevance. Single-term queries work fine, but multi-term queries are scored based on the most common term (lowest IDF). I don't see anything in the logs when this happens. We have a monitor doing a search for the 100 most popular movies once per minute to catch this, so we know when it was first detected.
>
> I'm attaching two explain outputs, one for the query "changeling" and one for "the changeling".
>
> We are running Solr 1.3 with Lucene 2.4.0, and have added a fuzzy query using JaroWinkler matching.
>
> I'd appreciate ideas about where to look, what debug output to try, etc.
>
> wunder
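The monitor described above could look roughly like this Java sketch. Everything here is an assumption: `failingTitles`, the expected-id map, and the `search` function (a placeholder for a real Solr query) are hypothetical, and the check simply flags any title whose known-correct document is missing from the top few hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RelevanceMonitor {
    // Returns the titles whose expected document id is missing from the
    // top `depth` results. `search` is a placeholder for a real Solr call.
    static List<String> failingTitles(Map<String, String> expectedIdByTitle,
                                      Function<String, List<String>> search,
                                      int depth) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> e : expectedIdByTitle.entrySet()) {
            List<String> ids = search.apply(e.getKey());
            List<String> top = ids.subList(0, Math.min(depth, ids.size()));
            if (!top.contains(e.getValue())) {
                failures.add(e.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("changeling", "movie-1");       // hypothetical ids
        expected.put("the changeling", "movie-1");
        // Fake search standing in for Solr: the multi-term query misses movie-1.
        Function<String, List<String>> fakeSearch = q -> q.equals("changeling")
                ? Arrays.asList("movie-1", "movie-2")
                : Arrays.asList("movie-7", "movie-8", "movie-9");
        System.out.println(failingTitles(expected, fakeSearch, 3));
    }
}
```

To run it once per minute, the check could be wrapped in a `Runnable` and scheduled with `ScheduledExecutorService.scheduleAtFixedRate(task, 0, 1, TimeUnit.MINUTES)`. Including both single-term and multi-term titles matters here, since only the multi-term ones fail in this bug.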
Re: Help with relevance failure in Solr 1.3
On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and I would appreciate any ideas.
>
> Occasionally, a server will start returning results with really poor relevance. Single-term queries work fine, but multi-term queries are scored based on the most common term (lowest IDF). I don't see anything in the logs when this happens. We have a monitor doing a search for the 100 most popular movies once per minute to catch this, so we know when it was first detected.
>
> I'm attaching two explain outputs, one for the query "changeling" and one for "the changeling".

I'm not sure what exactly you are asking, so bear with me... Are you saying that "the changeling" normally returns results just fine and then periodically goes bad, or are you saying you don't understand why "the changeling" scores differently from "changeling"? Looking at the explains, it is weird that in the "the changeling" case, the term "changeling" doesn't even show up as a term.

Can you share your dismax configuration? That will be easier to parse than trying to make sense of the debug query parsing.

-Grant
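For reference, a sketch of Lucene's classic IDF formula as used by `DefaultSimilarity.idf()` in the Lucene 2.x line: idf = 1 + ln(numDocs / (docFreq + 1)). The document counts below are hypothetical. Because idf enters a term's score twice (in both the query weight and the field weight), a rare term such as "changeling" should normally dominate a common term such as "the"; scoring that tracks only the lowest-IDF term suggests the rare term is not contributing at all.

```java
class IdfDemo {
    // Same formula as Lucene's classic DefaultSimilarity.idf():
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 100_000; // hypothetical index size
        // Hypothetical document frequencies for a common and a rare term.
        System.out.printf("the:        %.3f%n", idf(60_000, numDocs));
        System.out.printf("changeling: %.3f%n", idf(3, numDocs));
    }
}
```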
Re: Help with relevance failure in Solr 1.3
Normally, both "changeling" and "the changeling" work fine. This one server is misbehaving like this for all multi-term queries. Yes, it is VERY weird that the term "changeling" does not show up in the explain.

A server will occasionally go bad and stay in that state. In one case, two servers went bad and both gave the same wrong results.

Here is the dismax config. "groups" means movies. The title* fields are stemmed and stopped; the exact* fields are not.

    <!-- groups and people -->
    <requestHandler name="groups_people" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">none</str>
        <float name="tie">0.01</float>
        <str name="qf">
          exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
          title^3.0 title_alt^3.0 title_base^4.0
        </str>
        <str name="pf">
          exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0
          title^3.0 title_alt^4.0 title_base^6.0
        </str>
        <str name="bf">
          search_popularity^100.0
        </str>
        <str name="mm">1</str>
        <int name="ps">100</int>
        <str name="fl">id,type,movieid,personid,genreid</str>
      </lst>
      <lst name="appends">
        <str name="fq">type:group OR type:person</str>
      </lst>
    </requestHandler>

wunder

On 4/10/09 12:51 PM, Grant Ingersoll gsing...@apache.org wrote:
> I'm not sure what exactly you are asking, so bear with me... Are you saying that "the changeling" normally returns results just fine and then periodically goes bad, or are you saying you don't understand why "the changeling" scores differently from "changeling"? Looking at the explains, it is weird that in the "the changeling" case, the term "changeling" doesn't even show up as a term.
>
> Can you share your dismax configuration? That will be easier to parse than trying to make sense of the debug query parsing.
>
> -Grant
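As background for the `tie` value in the dismax config: dismax builds, for each query term, a DisjunctionMaxQuery over the `qf` fields, and a DisjunctionMaxQuery combines per-field scores as the maximum plus `tie` times the sum of the others. The Java sketch below reproduces only that combination rule, with hypothetical field scores; with `tie=0.01`, each term's score is almost entirely its best-matching field.

```java
class DisMaxTieDemo {
    // DisjunctionMaxQuery combination rule: the best field's score plus
    // `tie` times the sum of the other fields' scores.
    static double combine(double[] fieldScores, double tie) {
        if (fieldScores.length == 0) {
            return 0.0; // no matching fields, no score
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores for one term across several qf fields.
        double[] scores = {2.0, 1.5, 0.5};
        System.out.println(combine(scores, 0.01));
    }
}
```

With `mm=1`, only one of the per-term clauses has to match, so a document matching just "the" can still be returned; whether the right movie ranks first then depends on the "changeling" clause actually contributing its score.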