Re: When searching for !...@#$%^*() all documents are matched incorrectly
On Monday 01 June 2009 16:50, Sam Michaels wrote:
> So the fix for this problem would be
> 1. Stop using WordDelimiterFilter for queries (what is the alternative?), OR
> 2. Not allow any search strings without any alphanumeric characters.

We ran into this same problem while replacing all characters using a PatternReplaceFilter. I've been working around this bug by using a LengthFilter to filter out tokens of zero length.

.øs
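A rough Python model of the workaround described above (not Solr code): a pattern-replace step that deletes every character of a token leaves an empty token behind, and a length filter with a minimum length of 1 drops it from the token stream. The regex, the helper names and the min/max bounds are illustrative assumptions; in a Solr schema the equivalent step would be a solr.LengthFilterFactory (with its min and max attributes) placed after the replace filter. This only models the token-stream effect, not what the query parser does afterwards.

```python
import re

def pattern_replace(tokens, pattern, replacement=""):
    # Apply a regex replacement to every token, like a pattern-replace filter.
    return [re.sub(pattern, replacement, t) for t in tokens]

def length_filter(tokens, min_len=1, max_len=255):
    # Keep only tokens whose length lies within [min_len, max_len].
    return [t for t in tokens if min_len <= len(t) <= max_len]

tokens = "Bathing !@#$%^*()".split()
replaced = pattern_replace(tokens, r"[^A-Za-z0-9]")
print(replaced)                 # ['Bathing', '']
print(length_filter(replaced))  # ['Bathing']
```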
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Walter,

The analysis link does not produce any matches for either @ or !...@#$%^*() when I try to match them against "bathing". I'm worried that this might be the symptom of another problem (one that has not revealed itself yet), and I want to get to the bottom of this.

Thank you.
sm

Walter Underwood wrote:
> Use the [analysis] link on the Solr admin UI to get more info on how this is being interpreted.
>
> However, I am curious about why this is important. Do users enter this query often? If not, maybe it is not something to spend time on.
>
> wunder

--
View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23815688.html
Sent from the Solr - User mailing list archive at Nabble.com.
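To see why the analysis page reports no match, here is a crude Python approximation of the text_title query chain: whitespace tokenizer, then a WordDelimiterFilter-like split, then lowercasing. It is only a sketch: synonyms, case-change splitting, token positions and duplicate removal are left out, the matching is ASCII-only, and the function names are made up for illustration. What it shows is that a token made only of punctuation yields no terms at all, so there is nothing to compare against "bathing".

```python
import re

def whitespace_tokenize(text):
    # Like a whitespace tokenizer: split on runs of whitespace only.
    return text.split()

def word_delimiter(tokens):
    # Very rough stand-in for WordDelimiterFilter: keep runs of ASCII letters
    # or digits, drop every other character, and when a token splits into
    # several parts also emit the catenated form (cf. catenateAll="1").
    out = []
    for tok in tokens:
        parts = re.findall(r"[A-Za-z]+|[0-9]+", tok)
        out.extend(parts)
        if len(parts) > 1:
            out.append("".join(parts))
    return out

def lowercase(tokens):
    return [t.lower() for t in tokens]

def analyze_query(text):
    return lowercase(word_delimiter(whitespace_tokenize(text)))

print(analyze_query("Bathing"))  # ['bathing']
print(analyze_query("@"))        # []
print(analyze_query("#$%^*()"))  # []
print(analyze_query("Wi-Fi"))    # ['wi', 'fi', 'wifi']
```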
Re: When searching for !...@#$%^*() all documents are matched incorrectly
OK, here's the deal:

<str name="rawquerystring">-features:foo features:(\...@#$%\^\*\(\))</str>
<str name="querystring">-features:foo features:(\...@#$%\^\*\(\))</str>
<str name="parsedquery">-features:foo</str>
<str name="parsedquery_toString">-features:foo</str>

The text analysis is throwing away non-alphanumeric characters (probably the WordDelimiterFilter). The Lucene (and Solr) query parser throws away term queries when the token is zero length (after analysis). Solr then interprets the leftover -features:foo as "all documents not containing foo in the features field", so you get a bunch of matches.

-Yonik
http://www.lucidimagination.com

On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels mas...@yahoo.com wrote:
> Walter, The analysis link does not produce any matches for either @ or !...@#$%^*() strings when I try to match against bathing.
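The mechanism Yonik describes can be modelled in a few lines of Python. This is a toy, not Lucene's parser: each clause is analyzed, clauses whose analysis yields no tokens are silently dropped, and what remains is evaluated against a tiny in-memory index. The documents, field values and helper names are hypothetical. Both Sam's query and the -features:foo example (moved to the title field here) degrade into queries that match far more than intended.

```python
import re

def analyze(text):
    # Crude stand-in for the query analyzer: lowercased alphanumeric runs.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def parse(clauses):
    # Analyze each (occur, field, text) clause and silently drop the ones
    # that produce no tokens, as described in the message above.
    parsed = []
    for occur, field, text in clauses:
        tokens = analyze(text)
        if tokens:
            parsed.append((occur, field, tokens))
    return parsed

def clause_matches(doc, field, tokens):
    return all(t in doc.get(field, set()) for t in tokens)

def search(docs, parsed):
    must = [c for c in parsed if c[0] == "MUST"]
    should = [c for c in parsed if c[0] == "SHOULD"]
    must_not = [c for c in parsed if c[0] == "MUST_NOT"]
    # Start from every document: this is how a purely negative query
    # ends up meaning "all docs except the excluded ones".
    hits = set(docs)
    for _, field, tokens in must:
        hits = {d for d in hits if clause_matches(docs[d], field, tokens)}
    if should and not must:
        hits = {d for d in hits
                if any(clause_matches(docs[d], f, t) for _, f, t in should)}
    for _, field, tokens in must_not:
        hits = {d for d in hits if not clause_matches(docs[d], field, tokens)}
    return hits

docs = {
    1: {"activity_type": {"name"}, "title": {"bathing"}},
    2: {"activity_type": {"name"}, "title": {"dressing"}},
    3: {"activity_type": {"other"}, "title": {"foo"}},
}

# (activity_type:NAME) AND title:(@) -> the title clause vanishes
sam = parse([("MUST", "activity_type", "NAME"), ("MUST", "title", "@")])
print(sam)                # [('MUST', 'activity_type', ['name'])]
print(search(docs, sam))  # {1, 2}

# -title:foo title:(@) -> only the negative clause survives
neg = parse([("MUST_NOT", "title", "foo"), ("SHOULD", "title", "@")])
print(search(docs, neg))  # {1, 2}
```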
Re: When searching for !...@#$%^*() all documents are matched incorrectly
So the fix for this problem would be either:
1. Stop using WordDelimiterFilter for queries (what is the alternative?), OR
2. Not allow any search strings without any alphanumeric characters.

SM.

Yonik Seeley-2 wrote:
> The Lucene (and Solr) query parser throws away term queries when the token is zero length (after analysis). Solr then interprets the left over -features:foo as all documents not containing foo in the features field, so you get a bunch of matches.

--
View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23816242.html
Sent from the Solr - User mailing list archive at Nabble.com.
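Option 2 can be enforced in the client before the query is ever sent. A minimal sketch, assuming the application builds the title clause from raw user input; has_searchable_chars is a hypothetical helper, and the letter-or-digit test is a heuristic that may not agree with WordDelimiterFilter on every Unicode character.

```python
import re

# [^\W_] matches any Unicode letter or digit (a "word" char other than "_").
_ALNUMERIC = re.compile(r"[^\W_]")

def has_searchable_chars(user_input: str) -> bool:
    # True if the string contains at least one letter or digit, i.e. it has
    # a chance of surviving analysis instead of being dropped as a clause.
    return _ALNUMERIC.search(user_input) is not None

for s in ["Bathing", "@", "#$%^*()", "café", "4x4", "__"]:
    print(repr(s), has_searchable_chars(s))
```

When the check fails, the application can show an empty result page instead of issuing the query.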
Re: When searching for !...@#$%^*() all documents are matched incorrectly
On Mon, Jun 1, 2009 at 10:50 AM, Sam Michaels mas...@yahoo.com wrote:
> So the fix for this problem would be
> 1. Stop using WordDelimiterFilter for queries (what is the alternative?), OR
> 2. Not allow any search strings without any alphanumeric characters.

That's a short-term workaround for you, yes. I would classify this surprising behavior as a bug we should eventually fix, though. Could you open a JIRA issue for it?

-Yonik
http://www.lucidimagination.com
Re: When searching for !...@#$%^*() all documents are matched incorrectly
As per relevance, no results should be returned. But all the results are returned, in alphabetical order.

Walter Underwood wrote:
> I'm really curious. What is the most relevant result for that query?
>
> wunder
>
> On 5/30/09 7:35 PM, Ryan McKinley ryan...@gmail.com wrote:
>> two key things to try (for anyone ever wondering why a query matches documents)

--
View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23804060.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Upon some further experimentation, I found out that even @ matches all the documents. However, when I append the wildcard * to @ (@*), there is no match...

SM

Sam Michaels wrote:
> Hi, I'm running Solr 1.3/Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)) all the documents are returned even though there is not a single match.

--
View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23804381.html
Sent from the Solr - User mailing list archive at Nabble.com.
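One plausible reading of this observation, assuming Lucene's classic query parser behaves here as it normally does: prefix and wildcard terms are not passed through the field's analyzer (apart from optional lowercasing), so @* stays a prefix query for the literal "@", while a plain @ is analyzed down to nothing and its clause disappears. The toy index and the helper names below are hypothetical.

```python
import re

def analyze(text):
    # Crude stand-in for the title analyzer: lowercased alphanumeric runs.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# Terms indexed for the single example document (title "Bathing").
indexed_terms = set(analyze("Bathing"))  # {'bathing'}

def term_clause(text):
    # None means analysis left nothing, so the clause is dropped entirely;
    # otherwise report whether every analyzed token is an indexed term.
    tokens = analyze(text)
    if not tokens:
        return None
    return all(t in indexed_terms for t in tokens)

def prefix_clause(prefix):
    # Assumption: the prefix skips analysis (only lowercased), so "@" stays "@".
    p = prefix.lower()
    return any(term.startswith(p) for term in indexed_terms)

print(term_clause("@"))       # None: clause vanishes, the rest of the query decides
print(prefix_clause("@"))     # False: no indexed term starts with "@"
print(prefix_clause("Bath"))  # True
```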
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Here is the output from the debug query when I'm trying to match the string @ against Bathing (should not match):

<str name="GLOM-1">
3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
  0.9994 = queryWeight(activity_type:NAME), product of:
    3.2689075 = idf(docFreq=153, numDocs=1489)
    0.30591258 = queryNorm
  3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
    1.0 = tf(termFreq(activity_type:NAME)=1)
    3.2689075 = idf(docFreq=153, numDocs=1489)
    1.0 = fieldNorm(field=activity_type, doc=0)
</str>

It looks like the AND clause in the search string is ignored...

SM.

ryantxu wrote:
> 1. add debugQuery=true and look at the explain text below -- anything that contributed to the score is listed there

--
View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23807341.html
Sent from the Solr - User mailing list archive at Nabble.com.
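The explain text is consistent with the title clause having been dropped. Assuming Lucene's DefaultSimilarity (idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared clause weights)), the numbers can be reproduced from a query containing activity_type:NAME alone: queryNorm then equals 1/idf, queryWeight is about 1.0, and the score equals the fieldWeight. The two factors listed under queryWeight (3.2689075 and 0.30591258) multiply to about 1.0, so the archived "0.9994" appears to be damaged.

```python
import math

# Assumed: Lucene DefaultSimilarity formulas (natural log).
num_docs, doc_freq = 1489, 153
idf = 1.0 + math.log(num_docs / (doc_freq + 1))

# queryNorm = 1 / sqrt(sum of squared weights of all clauses in the query).
# If activity_type:NAME (boost 1.0) is the only clause left, the sum is idf**2.
query_norm = 1.0 / math.sqrt(idf ** 2)

query_weight = idf * query_norm       # idf * queryNorm, about 1.0 here
tf, field_norm = 1.0, 1.0             # values taken from the explain text
field_weight = tf * idf * field_norm  # fieldWeight
score = query_weight * field_weight

print(f"idf={idf:.7f} queryNorm={query_norm:.8f} "
      f"queryWeight={query_weight:.7f} score={score:.7f}")
```

Had the title clause survived, its squared weight would be added to the sum, queryNorm would be smaller than 1/idf, and queryWeight for activity_type:NAME would fall below 1.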
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Use the [analysis] link on the Solr admin UI to get more info on how this is being interpreted.

However, I am curious about why this is important. Do users enter this query often? If not, maybe it is not something to spend time on.

wunder

On 5/31/09 2:56 PM, Sam Michaels mas...@yahoo.com wrote:
> Looks like the AND clause in the search string is ignored...
When searching for !...@#$%^*() all documents are matched incorrectly
Hi,

I'm running Solr 1.3/Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all the documents are returned even though there is not a single match. There is no title that matches the string (which has been escaped).

My document structure is as follows:

    <doc>
      <str name="activity_type">NAME</str>
      <str name="title">Bathing</str>
    </doc>

The title field is of type text_title, which is described below.

    <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

When I run the query against Luke, no results are returned. Any suggestions are appreciated.
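The effect on the whole query can be sketched in a few lines. The Python below is a hypothetical model, not Solr's query parser: it analyzes each AND clause, drops any clause whose text yields zero tokens (the behavior discussed in this thread), and checks what is left against the example document. It assumes activity_type is an untokenized string field, which the post does not state (the explain output's activity_type:NAME only suggests it), and it treats a query with no clauses left as matching nothing, which is also an assumption of the sketch.

```python
import re

def analyze_title(text):
    # Rough stand-in for the text_title analyzer: whitespace split, keep only
    # ASCII alphanumeric runs (WordDelimiterFilter-like), lowercase.
    # Synonyms are omitted.
    return [p.lower() for tok in text.split() for p in re.findall(r"[A-Za-z0-9]+", tok)]

# Hypothetical per-field analysis. activity_type is ASSUMED to be an
# untokenized string field, so its value is used verbatim as one term.
ANALYZERS = {
    "title": analyze_title,
    "activity_type": lambda text: [text],
}

def parse_required_clauses(clauses):
    """Turn (field, text) pairs joined by AND into a list of (field, terms).

    A clause whose text analyzes to zero tokens is silently dropped instead
    of becoming a clause that matches nothing.
    """
    query = []
    for field, text in clauses:
        terms = ANALYZERS[field](text)
        if terms:
            query.append((field, terms))
    return query

def matches(doc, query):
    """Every remaining clause must have at least one term in the doc's field.

    An empty query is treated as matching nothing (assumption of this sketch).
    """
    return bool(query) and all(
        any(term in ANALYZERS[field](doc[field]) for term in terms)
        for field, terms in query
    )

doc = {"activity_type": "NAME", "title": "Bathing"}
query = parse_required_clauses([("activity_type", "NAME"), ("title", "@")])
print(query)                # [('activity_type', ['NAME'])]
print(matches(doc, query))  # True: the title clause vanished during parsing
```

With the title clause gone, only activity_type:NAME remains, so every document of that type matches; that agrees with the debug explain output, where activity_type:NAME is the only contributor to the score.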
Re: When searching for !...@#$%^*() all documents are matched incorrectly
Two key things to try (for anyone ever wondering why a query matches documents):

1. Add debugQuery=true and look at the explain text below: anything that contributed to the score is listed there.
2. Check /admin/analysis.jsp: this will let you see how analyzers break text up into tokens.

Not sure offhand, but I'm guessing the WordDelimiterFilterFactory has something to do with it...

On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:

Hi,

I'm running Solr 1.3/Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all the documents are returned even though there is not a single match. There is no title that matches the string (which has been escaped).

My document structure is as follows:

    <doc>
      <str name="activity_type">NAME</str>
      <str name="title">Bathing</str>
    </doc>

The title field is of type text_title, which is described below.

    <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

When I run the query against Luke, no results are returned. Any suggestions are appreciated.
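For the first suggestion, here is a minimal Python sketch that builds a /select request URL with debugQuery=true. The base URL (a local Solr at http://localhost:8983/solr with the default /select handler) is an assumption; adjust it for your install. In the response, compare the parsed query in the debug section with the query you sent to see whether a clause was dropped.

```python
from urllib.parse import urlencode

# ASSUMPTION: a local Solr instance at this address using the default
# /select request handler. Change host, port and path for your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

def debug_query_url(q):
    """Build a /select URL asking Solr to include debug information
    (parsed query and per-document score explanations) in the response."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "debugQuery": "true"})

print(debug_query_url("title:foo"))
# http://localhost:8983/solr/select?q=title%3Afoo&debugQuery=true
```

urlencode percent-encodes the query text, so characters such as :, (, ) and spaces survive the trip to Solr unchanged.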
Re: When searching for !...@#$%^*() all documents are matched incorrectly
I'm really curious. What is the most relevant result for that query?

wunder

On 5/30/09 7:35 PM, Ryan McKinley <ryan...@gmail.com> wrote:

Two key things to try (for anyone ever wondering why a query matches documents):

1. Add debugQuery=true and look at the explain text below: anything that contributed to the score is listed there.
2. Check /admin/analysis.jsp: this will let you see how analyzers break text up into tokens.

Not sure offhand, but I'm guessing the WordDelimiterFilterFactory has something to do with it...

On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:

Hi,

I'm running Solr 1.3/Java 1.6. When I run a query like (activity_type:NAME) AND title:(\...@#$%\^\*\(\)), all the documents are returned even though there is not a single match. There is no title that matches the string (which has been escaped).

My document structure is as follows:

    <doc>
      <str name="activity_type">NAME</str>
      <str name="title">Bathing</str>
    </doc>

The title field is of type text_title, which is described below.
    <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

When I run the query against Luke, no results are returned. Any suggestions are appreciated.