Re: When searching for !@#$%^&*() all documents are matched incorrectly

Sam Michaels Mon, 01 Jun 2009 08:27:45 -0700

Yonik,

Done, here is the link.
https://issues.apache.org/jira/browse/SOLR-1196


SM.


Yonik Seeley-2 wrote:
> 
> On Mon, Jun 1, 2009 at 10:50 AM, Sam Michaels <mas...@yahoo.com> wrote:
>>
>> So the fix for this problem would be either
>>
>> 1. Stop using WordDelimiterFilter for queries (what is the alternative?)
>> OR
>> 2. Do not allow any search strings without alphanumeric characters.
> 
> As a short-term workaround for you, yes.
> I would classify this surprising behavior as a bug we should
> eventually fix though.  Could you open a JIRA issue for it?
> 
> -Yonik
> http://www.lucidimagination.com
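Workaround 2 can be enforced on the client before a request is ever sent. The following is a minimal Java sketch, assuming the application builds the title clause itself from raw user input. The class `QueryGuard` and the method `hasSearchableChars` are hypothetical names, and "searchable" is approximated as "contains at least one Unicode letter or digit". That approximation may not match WordDelimiterFilter's rules exactly.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical client-side check for workaround 2: only send a title clause if the
 * user's input contains at least one letter or digit. Input without any would be
 * reduced to zero tokens by the text_title analyzer and dropped by the query parser.
 */
final class QueryGuard {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number character
    private static final Pattern SEARCHABLE = Pattern.compile("[\\p{L}\\p{N}]");

    static boolean hasSearchableChars(String userInput) {
        return userInput != null && SEARCHABLE.matcher(userInput).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSearchableChars("!@#$%^&*()")); // false
        System.out.println(hasSearchableChars("Bathing"));    // true
    }
}
```

If the check fails, the application can return an empty result (or a message) without querying Solr at all. Simply omitting the title clause would reproduce the very behaviour reported in this thread.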
> 
>> SM.
>>
>>
>> Yonik Seeley-2 wrote:
>>>
>>> OK, here's the deal:
>>>
>>> <str name="rawquerystring">-features:foo
>>> features:(\...@#$%\^&\*\(\))</str>
>>> <str name="querystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
>>> <str name="parsedquery">-features:foo</str>
>>> <str name="parsedquery_toString">-features:foo</str>
>>>
>>> The text analysis is throwing away non-alphanumeric characters (probably
>>> the WordDelimiterFilter). The Lucene (and Solr) query parser throws
>>> away term queries when the token is zero-length after analysis.
>>> Solr then interprets the leftover "-features:foo" as "all documents
>>> not containing foo in the features field", so you get a bunch of
>>> matches.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
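The three steps Yonik describes can be reproduced with a small in-memory model. This is a toy written for illustration, not Lucene or Solr code. The analyzer is reduced to "keep runs of letters and digits, lowercased". A clause whose text analyzes to zero tokens is dropped. A query left with only negative clauses is evaluated as all documents minus the excluded ones, which is the Solr behaviour described above (plain Lucene returns nothing for a purely negative BooleanQuery). All class and method names and the three sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Lucene/Solr code) of the dropped-clause behaviour. */
final class DroppedClauseDemo {
    enum Occur { SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String text;
        Clause(Occur occur, String text) { this.occur = occur; this.text = text; }
    }

    // Step 1: crude analyzer that keeps only runs of letters/digits, lowercased.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) tokens.add(part.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    static Set<Integer> search(Map<Integer, Set<String>> index, List<Clause> clauses) {
        List<String> should = new ArrayList<>();
        List<String> mustNot = new ArrayList<>();
        for (Clause c : clauses) {
            List<String> tokens = analyze(c.text);
            if (tokens.isEmpty()) continue; // Step 2: no tokens, so the clause is silently dropped
            (c.occur == Occur.SHOULD ? should : mustNot).addAll(tokens);
        }
        Set<Integer> hits = new TreeSet<>();
        if (should.isEmpty() && mustNot.isEmpty()) return hits; // nothing left to run
        for (Map.Entry<Integer, Set<String>> doc : index.entrySet()) {
            Set<String> terms = doc.getValue();
            // Step 3: with no positive clause left, start from "all documents"
            boolean candidate = should.isEmpty() || should.stream().anyMatch(terms::contains);
            boolean excluded = mustNot.stream().anyMatch(terms::contains);
            if (candidate && !excluded) hits.add(doc.getKey());
        }
        return hits;
    }

    // Runs "-features:<excluded> features:<wanted>" against three tiny documents.
    static Set<Integer> run(String excluded, String wanted) {
        Map<Integer, Set<String>> index = new LinkedHashMap<>();
        index.put(1, new TreeSet<>(analyze("Bathing")));
        index.put(2, new TreeSet<>(analyze("foo bar")));
        index.put(3, new TreeSet<>(analyze("Dressing")));
        return search(index, Arrays.asList(
                new Clause(Occur.MUST_NOT, excluded),
                new Clause(Occur.SHOULD, wanted)));
    }

    public static void main(String[] args) {
        System.out.println(run("foo", "bathing"));    // [1]
        System.out.println(run("foo", "!@#$%^&*()")); // [1, 3]
    }
}
```

With a real word, only the matching document comes back. With the punctuation-only string, the positive clause disappears and every document without "foo" matches, just as in Yonik's debug output.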
>>>
>>>
>>> On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <mas...@yahoo.com> wrote:
>>>>
>>>> Walter,
>>>>
>>>> The analysis link does not produce any matches for either the @ or the
>>>> !@#$%^&*() string when I try to match against "Bathing". I'm worried that
>>>> this might be the symptom of another problem (which has not revealed
>>>> itself yet) and want to get to the bottom of this.
>>>>
>>>> Thank you.
>>>> sm
>>>>
>>>>
>>>> Walter Underwood wrote:
>>>>>
>>>>> Use the [analysis] link on the Solr admin UI to get more info on
>>>>> how this is being interpreted.
>>>>>
>>>>> However, I am curious about why this is important. Do users enter
>>>>> this query often? If not, maybe it is not something to spend time on.
>>>>>
>>>>> wunder
>>>>>
>>>>> On 5/31/09 2:56 PM, "Sam Michaels" <mas...@yahoo.com> wrote:
>>>>>
>>>>>>
>>>>>> Here is the output from the debug query when I'm trying to match the
>>>>>> string @ against Bathing (it should not match):
>>>>>>
>>>>>> <str name="GLOM-1">
>>>>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
>>>>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
>>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>>     0.30591258 = queryNorm
>>>>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
>>>>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
>>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>>     1.0 = fieldNorm(field=activity_type, doc=0)
>>>>>> </str>
>>>>>>
>>>>>> Looks like the AND clause in the search string is ignored...
>>>>>>
>>>>>> SM.
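The explain numbers can be checked by hand. Assume Lucene's DefaultSimilarity (the default in Solr 1.3) and a boost of 1.0. Then idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(termFreq), and queryNorm = 1 / sqrt(sum of squared clause weights). With docFreq=153 and numDocs=1489 this gives idf ≈ 3.26891. queryNorm comes out as 1/idf (to float precision), which is what you would expect if activity_type:NAME were the only clause that reached the searcher. That is consistent with the title clause having been dropped. The class and method names below are hypothetical.

```java
import java.util.Locale;

/** Recomputes the explain output above, assuming DefaultSimilarity and boost 1.0. */
final class ExplainMath {
    // DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // queryNorm = 1 / sqrt(sum of squared clause weights); with one clause that is 1 / idf
    static double queryNorm(double... clauseWeights) {
        double sumOfSquares = 0.0;
        for (double w : clauseWeights) sumOfSquares += w * w;
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // score = queryWeight * fieldWeight = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm)
    static double score(int termFreq, int docFreq, int numDocs, float fieldNorm) {
        double idf = idf(docFreq, numDocs);
        double queryWeight = idf * queryNorm(idf);
        double fieldWeight = Math.sqrt(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = idf(153, 1489);
        System.out.println(String.format(Locale.ROOT,
                "idf=%.5f queryNorm=%.5f score=%.5f",
                idf, queryNorm(idf), score(1, 153, 1489, 1.0f)));
    }
}
```

Had the title clause contributed a term, its squared weight would have been added to the sum inside queryNorm, and queryWeight would no longer be ≈ 1.0.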
>>>>>>
>>>>>>
>>>>>> ryantxu wrote:
>>>>>>>
>>>>>>> Two key things to try (for anyone ever wondering why a query matches
>>>>>>> documents):
>>>>>>>
>>>>>>> 1.  Add &debugQuery=true and look at the explain text below the
>>>>>>> results: anything that contributed to the score is listed there.
>>>>>>> 2.  Check /admin/analysis.jsp: this will let you see how analyzers
>>>>>>> break text up into tokens.
>>>>>>>
>>>>>>> Not sure offhand, but I'm guessing the WordDelimiterFilterFactory has
>>>>>>> something to do with it...
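The first suggestion is easy to script. When the raw query contains characters such as & or #, it has to be URL-encoded. Otherwise the & ends the q parameter early and the # starts a URL fragment. Below is a minimal Java sketch using java.net.URLEncoder. The base URL http://localhost:8983/solr/select (the stock Solr example's select handler) and the class name DebugQueryUrl are assumptions, not taken from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a Solr select URL with debugQuery switched on. */
final class DebugQueryUrl {
    static String build(String selectUrl, String rawQuery) {
        try {
            // Percent-encode the whole query so '&', '#', '%', etc. stay inside q.
            return selectUrl + "?q=" + URLEncoder.encode(rawQuery, "UTF-8") + "&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; adjust host, port and path for your setup.
        System.out.println(build("http://localhost:8983/solr/select", "title:(fish & chips)"));
        // http://localhost:8983/solr/select?q=title%3A%28fish+%26+chips%29&debugQuery=true
    }
}
```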
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm running Solr 1.3/Java 1.6.
>>>>>>>>
>>>>>>>> When I run a query like
>>>>>>>>
>>>>>>>>   (activity_type:NAME) AND title:(\...@#$%\^&\*\(\))
>>>>>>>>
>>>>>>>> all the documents are returned even though there is not a single
>>>>>>>> match: no title matches the (escaped) search string.
>>>>>>>>
>>>>>>>> My document structure is as follows:
>>>>>>>>
>>>>>>>> <doc>
>>>>>>>> <str name="activity_type">NAME</str>
>>>>>>>> <str name="title">Bathing</str>
>>>>>>>> ....
>>>>>>>> </doc>
>>>>>>>>
>>>>>>>>
>>>>>>>> The title field is of type text_title, which is defined below:
>>>>>>>>
>>>>>>>> <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
>>>>>>>>   <analyzer type="index">
>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>     <!-- in this example, we will only use synonyms at query time
>>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>>>>>>>>             ignoreCase="true" expand="false"/>
>>>>>>>>     -->
>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>>   </analyzer>
>>>>>>>>   <analyzer type="query">
>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>>>>             ignoreCase="true" expand="true"/>
>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>>   </analyzer>
>>>>>>>> </fieldType>
>>>>>>>>
>>>>>>>> When I run the query against Luke, no results are returned. Any
>>>>>>>> suggestions are appreciated.
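To see what the query-side chain of text_title does to each input, here is a rough Java approximation. It covers the whitespace tokenizer, word-delimiter splitting on punctuation and case changes plus a catenated form, lowercasing, and de-duplication. It is a simplification for illustration, not the real Solr classes. Synonyms and number handling are left out, and the de-duplication (a LinkedHashSet) is cruder than RemoveDuplicatesTokenFilter, which only drops duplicates at the same position. The class name TitleAnalyzerSketch is hypothetical. What it shows is the thread's diagnosis: a token made only of punctuation produces no terms at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy approximation (NOT the real Solr classes) of the text_title query analyzer. */
final class TitleAnalyzerSketch {

    // Split one whitespace token into word parts: break on any non-letter/digit
    // and on a lower-to-upper case change (splitOnCaseChange="1").
    static List<String> wordParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) { // delimiter: close the current part
                flush(current, parts);
                continue;
            }
            if (current.length() > 0
                    && Character.isLowerCase(current.charAt(current.length() - 1))
                    && Character.isUpperCase(c)) { // "PowerShot" -> "Power" | "Shot"
                flush(current, parts);
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    static List<String> analyze(String text) {
        Set<String> out = new LinkedHashSet<>(); // keeps first occurrence, drops repeats
        for (String token : text.trim().split("\\s+")) {
            List<String> parts = wordParts(token);
            for (String p : parts) out.add(p.toLowerCase(Locale.ROOT));
            if (parts.size() > 1) { // rough stand-in for catenateWords / catenateAll
                out.add(String.join("", parts).toLowerCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Bathing"));    // [bathing]
        System.out.println(analyze("PowerShot"));  // [power, shot, powershot]
        System.out.println(analyze("!@#$%^&*()")); // []
    }
}
```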
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 
> 

