Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-06-09 Thread Øystein F. Steimler

On Monday 01 June 2009 16:50, Sam Michaels wrote:
 So the fix for this problem would be

 1. Stop using WordDelimiterFilter for queries (what is the alternative) OR
 2. Not allow any search strings without any alphanumeric characters..

We ran into this same problem while replacing all characters using a 
PatternReplaceFilter. I've been working around this bug by using a 
LengthFilter to filter out tokens of zero length.

.øs
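
A minimal sketch of the workaround described above, in plain Java rather than real Solr or Lucene code. The class and method names are hypothetical, and the regular expression is only an assumption about what "replacing all characters" might look like. In a Solr schema the second step would presumably be a solr.LengthFilterFactory entry with a minimum length of 1 placed after the replacing filter; the exact configuration used is not given in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these helpers imitate the effect of a
// pattern-replace step followed by a length filter. They are not Solr classes.
class ZeroLengthTokenDemo {

    // Stand-in for a pattern-replace step: split on whitespace, then strip
    // every non-alphanumeric character from each token (assumed pattern).
    static List<String> stripNonAlphanumerics(String input) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            tokens.add(token.replaceAll("[^A-Za-z0-9]", ""));
        }
        return tokens;
    }

    // Stand-in for a length filter: keep only tokens of at least minLength characters.
    static List<String> dropShorterThan(List<String> tokens, int minLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() >= minLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = stripNonAlphanumerics("Bathing @ #!");
        // Punctuation-only tokens survive the replace step as empty strings.
        System.out.println("after replace: " + raw.size() + " tokens");
        // A minimum length of 1 removes them.
        System.out.println("after length filter: " + dropShorterThan(raw, 1));
    }
}
```

Note that this only removes empty tokens from the stream. If every token of a clause is removed, the clause still analyzes to nothing, so the query-side behaviour discussed elsewhere in this thread may need separate handling.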


Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-06-01 Thread Sam Michaels


Walter,

The analysis link does not produce any matches for either @ or !...@#$%^*()
strings when I try to match against bathing. I'm worried that this might be
the symptom of another problem (which has not revealed itself yet) and want
to get to the bottom of this...

Thank you.
sm



-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23815688.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-06-01 Thread Yonik Seeley

OK, here's the deal:

<str name="rawquerystring">-features:foo features:(\...@#$%\^\*\(\))</str>
<str name="querystring">-features:foo features:(\...@#$%\^\*\(\))</str>
<str name="parsedquery">-features:foo</str>
<str name="parsedquery_toString">-features:foo</str>

The text analysis is throwing away non-alphanumeric chars (probably
the WordDelimiterFilter).  The Lucene (and Solr) query parser throws
away term queries when the token is zero length (after analysis).
Solr then interprets the leftover -features:foo as all documents
not containing foo in the features field, so you get a bunch of
matches.

-Yonik
http://www.lucidimagination.com
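
The chain of events described above can be sketched as a toy model. This is not Lucene or Solr code: the class, the method names, and the "strip everything that is not a letter or digit" analyzer are all assumptions made for illustration, and the punctuation-only clause is an illustrative stand-in for the query in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: imitates a parser that drops clauses whose text
// analyzes to nothing, then checks whether only prohibited clauses remain.
class EmptyClauseDemo {

    // Hypothetical analyzer: keep letters and digits, discard everything else.
    static String analyze(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Each clause has the form "field:text", with an optional leading '-'
    // marking it as prohibited. Clauses that analyze to an empty string are dropped.
    static List<String> parse(List<String> clauses) {
        List<String> kept = new ArrayList<>();
        for (String clause : clauses) {
            String text = clause.substring(clause.indexOf(':') + 1);
            if (!analyze(text).isEmpty()) {
                kept.add(clause);
            }
        }
        return kept;
    }

    // True when at least one clause remains and every remaining clause is prohibited.
    static boolean isPureNegative(List<String> clauses) {
        if (clauses.isEmpty()) {
            return false;
        }
        for (String clause : clauses) {
            if (!clause.startsWith("-")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> kept = parse(Arrays.asList("-features:foo", "features:(#$%^*)"));
        // prints: [-features:foo] pureNegative=true
        System.out.println(kept + " pureNegative=" + isPureNegative(kept));
    }
}
```

Once the punctuation-only clause is gone, only the prohibited clause is left, which is the situation in which every document lacking foo is returned.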



Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-06-01 Thread Sam Michaels


So the fix for this problem would be

1. Stop using WordDelimiterFilter for queries (what is the alternative) OR
2. Not allow any search strings without any alphanumeric characters..

SM.
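
Option 2 could be approximated on the client side, before the query is sent to Solr. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and what the application does with a rejected string (show an error, return no results) is left open.

```java
// Hypothetical client-side guard: reject user input that contains no
// letter or digit, since such input may analyze to zero tokens.
class QueryGuard {

    // Returns true if the input contains at least one letter or digit.
    static boolean hasAlphanumeric(String userInput) {
        if (userInput == null) {
            return false;
        }
        for (int i = 0; i < userInput.length(); ) {
            int codePoint = userInput.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"Bathing", "@", "#$%^*()"}) {
            System.out.println(input + " -> " + (hasAlphanumeric(input) ? "send to Solr" : "reject"));
        }
    }
}
```

The check is deliberately cruder than the real analysis chain: it only guarantees that some character could survive, not that the analyzer will actually produce a token.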



-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23816242.html
Sent from

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-06-01 Thread Yonik Seeley

On Mon, Jun 1, 2009 at 10:50 AM, Sam Michaels mas...@yahoo.com wrote:

 So the fix for this problem would be

 1. Stop using WordDelimiterFilter for queries (what is the alternative) OR
 2. Not allow any search strings without any alphanumeric characters..

Short term workaround for you, yes.
I would classify this surprising behavior as a bug we should
eventually fix though.  Could you open a JIRA issue for it?

-Yonik
http://www.lucidimagination.com


Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-31 Thread Sam Michaels


As per relevance, no results should be returned. But all the results are
returned in alphabetical order.


Walter Underwood wrote:
 
 I'm really curious. What is the most relevant result for that query?
 
 wunder
 
 On 5/30/09 7:35 PM, Ryan McKinley ryan...@gmail.com wrote:
 
 two key things to try (for anyone ever wondering why a query matches
 documents)
 
 1.  add debugQuery=true and look at the explain text below --
 anything that contributed to the score is listed there
 2.  check /admin/analysis.jsp -- this will let you see how analyzers
 break text up into tokens.
 
 Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
 something to do with it...
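
 For the first tip, the query has to be URL-encoded before debugQuery=true is appended, which matters for queries full of special characters. A minimal sketch follows; the base URL is an assumption (a local Solr instance with the default select handler), and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a select URL with debugQuery=true so the response includes the
// parsed query and the explain text. The base URL is supplied by the caller.
class DebugQueryUrl {

    static String build(String baseUrl, String query) {
        try {
            return baseUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed address of a local Solr instance; adjust to the real deployment.
        System.out.println(build("http://localhost:8983/solr/select", "title:@"));
    }
}
```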
 
 

-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23804060.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-31 Thread Sam Michaels


Upon some further experimentation, I found out that even @ matches all the
documents. However, when I append the wildcard * to @ (@*), there is no
match...

SM


Sam Michaels wrote:
 
 Hi,
 
 I'm running Solr 1.3/Java 1.6.  
 
 When I run a query like  - (activity_type:NAME) AND
 title:(\...@#$%\^\*\(\)) all the documents are returned even though there
 is not a single match. There is no title that matches the string (which
 has been escaped). 
 
 My document structure is as follows
 
 <doc>
   <str name="activity_type">NAME</str>
   <str name="title">Bathing</str>
 
 </doc>
 
 
 The title field is of type text_title which is described below. 
 
 <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
 When I run the query against Luke, no results are returned. Any
 suggestions are appreciated.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23804381.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-31 Thread Sam Michaels


Here is the output from the debug query when I'm trying to match the String @
against Bathing (should not match)

<str name="GLOM-1">
3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
  0.9994 = queryWeight(activity_type:NAME), product of:
    3.2689075 = idf(docFreq=153, numDocs=1489)
    0.30591258 = queryNorm
  3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
    1.0 = tf(termFreq(activity_type:NAME)=1)
    3.2689075 = idf(docFreq=153, numDocs=1489)
    1.0 = fieldNorm(field=activity_type, doc=0)
</str>

Looks like the AND clause in the search string is ignored...

SM.



-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23807341.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-31 Thread Walter Underwood

Use the [analysis] link on the Solr admin UI to get more info on
how this is being interpreted.

However, I am curious about why this is important. Do users enter
this query often? If not, maybe it is not something to spend time on.

wunder

On 5/31/09 2:56 PM, Sam Michaels mas...@yahoo.com wrote:

 
 Here is the output from the debug query when I'm trying to match the String @
 against Bathing (should not match)
 
 <str name="GLOM-1">
 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
   0.9994 = queryWeight(activity_type:NAME), product of:
     3.2689075 = idf(docFreq=153, numDocs=1489)
     0.30591258 = queryNorm
   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
     1.0 = tf(termFreq(activity_type:NAME)=1)
     3.2689075 = idf(docFreq=153, numDocs=1489)
     1.0 = fieldNorm(field=activity_type, doc=0)
 </str>
 
 Looks like the AND clause in the search string is ignored...
 
 SM.
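
The explain output above can be cross-checked by hand. The following is a minimal sketch, assuming Lucene's classic DefaultSimilarity formulas (idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(termFreq)) and a query boost of 1; the function name is made up for illustration. Only the activity_type clause appears in the explain tree, and these numbers reproduce its score with no contribution from title, which is consistent with the title clause having been dropped from the query.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene idf (assumed here): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Values taken from the explain output: docFreq=153, numDocs=1489.
idf = classic_idf(doc_freq=153, num_docs=1489)

# With a single scoring clause, queryNorm is 1 / sqrt(idf^2),
# so queryWeight = idf * queryNorm is very close to 1.0.
query_norm = 1.0 / math.sqrt(idf ** 2)
query_weight = idf * query_norm

# fieldWeight = tf * idf * fieldNorm, with termFreq=1 and fieldNorm=1.0.
field_weight = math.sqrt(1) * idf * 1.0

score = query_weight * field_weight
print(round(idf, 4), round(query_norm, 4), round(score, 4))
```

If the title clause had survived parsing as a required term, a document without that term could not have matched at all.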
 
 
 ryantxu wrote:
 
 two key things to try (for anyone ever wondering why a query matches
 documents)
 
 1.  add debugQuery=true and look at the explain text below --
 anything that contributed to the score is listed there
 2.  check /admin/analysis.jsp -- this will let you see how analyzers
 break text up into tokens.
 
 Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
 something to do with it...
 
 

When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Sam Michaels


Hi,

I'm running Solr 1.3/Java 1.6.  

When I run a query like  - (activity_type:NAME) AND title:(\...@#$%\^\*\(\))
all the documents are returned even though there is not a single match.
There is no title that matches the string (which has been escaped). 

My document structure is as follows

<doc>
<str name="activity_type">NAME</str>
<str name="title">Bathing</str>

</doc>


The title field is of type text_title which is described below. 

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I run the query against Luke, no results are returned. Any suggestions
are appreciated.


-- 
View this message in context: 
http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
Sent from the Solr - User mailing list archive at Nabble.com.
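
To see why a punctuation-only query can end up with no terms at all, the query-time chain of text_title can be roughly imitated outside Solr. This is only a sketch under stated assumptions: the regular expression stands in for WordDelimiterFilter's word and number parts, the synonym, catenate, case-change and duplicate-removal steps are not modelled, and the function name analyze is hypothetical rather than any Solr API.

```python
import re


def analyze(text):
    """Rough stand-in for the text_title query analyzer (not Solr code)."""
    tokens = []
    for chunk in text.split():  # WhitespaceTokenizer: split on whitespace
        # Approximation of WordDelimiterFilter: keep runs of letters or
        # digits and discard every other character as a delimiter.
        parts = re.findall(r"[A-Za-z]+|[0-9]+", chunk)
        # LowerCaseFilter: lowercase each surviving part.
        tokens.extend(part.lower() for part in parts)
    return tokens


print(analyze("Bathing"))
print(analyze("@"))
print(analyze("@#$%^*()"))
```

Under these assumptions the title "Bathing" yields one token while the punctuation-only inputs yield none, so there is nothing left for the title clause to search for.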

Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Ryan McKinley

two key things to try (for anyone ever wondering why a query matches documents)

1.  add debugQuery=true and look at the explain text below --
anything that contributed to the score is listed there
2.  check /admin/analysis.jsp -- this will let you see how analyzers
break text up into tokens.

Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
something to do with it...
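
As an illustration of step 1, a request with debugQuery=true can be assembled as below. This is a hedged sketch: the host, port and /solr/select path are assumptions about a default local install, and debug_url is a hypothetical helper name; only the debugQuery parameter itself comes from the advice above.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust for the actual deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(query, base=SOLR_SELECT):
    """Build a select URL that asks Solr to include debug/explain output."""
    params = {"q": query, "debugQuery": "true"}
    # urlencode takes care of escaping ':' and spaces in the query string.
    return base + "?" + urlencode(params)


url = debug_url("activity_type:NAME AND title:Bathing")
print(url)
```

In the response, the parsedquery entry shows which clauses survived analysis, and the explain entry lists what contributed to each score.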



Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Walter Underwood

I'm really curious. What is the most relevant result for that query?

wunder

On 5/30/09 7:35 PM, Ryan McKinley ryan...@gmail.com wrote:

 two key things to try (for anyone ever wondering why a query matches
 documents)
 
 1.  add debugQuery=true and look at the explain text below --
 anything that contributed to the score is listed there
 2.  check /admin/analysis.jsp -- this will let you see how analyzers
 break text up into tokens.
 
 Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
 something to do with it...
 
 
