Re: query parsing wildcards

2007-11-28 Thread Charles Hornberger
I should have Googled better. It seems that my question has been asked
and answered already, and not just once:

  http://www.nabble.com/Using-wildcard-with-accented-words-tf4673239.html
  http://groups.google.com/group/acts_as_solr/browse_thread/thread/42920dc2dcc5fa88

On Nov 28, 2007 9:42 AM, Charles Hornberger
[EMAIL PROTECTED] wrote:
 I'm confused by some behavior I'm seeing in Solr (I'm using 1.2.0). I
 have a field named description, declared with the following
 fieldType:

 <fieldType name="textTightUnstemmed" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="1"
             catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 The problem I'm having is that when I search for description:deck*, I
 get the results I expect; when I search for description:Deck*, I get
 nothing. I want both queries to return the same result set. (I'm using
 the standard request handler.)

 Interestingly, when I search for description:Deck from the web
 interface, the debug output shows that the query term is converted to
 lowercase:

 <str name="rawquerystring">description:Deck</str>
 <str name="querystring">description:Deck</str>
 <str name="parsedquery">description:deck</str>
 <str name="parsedquery_toString">description:deck</str>

 ... but when I search for description:Deck*, it shows that it is not:

 <str name="rawquerystring">description:Deck*</str>
 <str name="querystring">description:Deck*</str>
 <str name="parsedquery">description:Deck*</str>
 <str name="parsedquery_toString">description:Deck*</str>

 What am I doing wrong here?

 Also, when I use the Field Analysis tool for description:Deck*, it
 shows the following (sorry for the bad copy/paste):

 Query Analyzer

 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
   term position     1
   term text         Deck*
   term type         word
   source start,end  0,5

 org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
 expand=false, ignoreCase=true}
   term position     1
   term text         Deck*
   term type         word
   source start,end  0,5

 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
 ignoreCase=true}
   term position     1
   term text         Deck*
   term type         word
   source start,end  0,5

 org.apache.solr.analysis.WordDelimiterFilterFactory
 {generateNumberParts=0, catenateWords=1, generateWordParts=0,
 catenateAll=0, catenateNumbers=1}
   term position     1
   term text         Deck
   term type         word
   source start,end  0,4

 org.apache.solr.analysis.LowerCaseFilterFactory {}
   term position     1
   term text         deck
   term type         word
   source start,end  0,4

 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
   term position     1
   term text         deck
   term type         word
   source start,end  0,4

 Thanks,
 Charlie



Re: query parsing wildcards

2007-11-28 Thread Chris Hostetter

: I should have Googled better. It seems that my question has been asked
: and answered already, and not just once:

right, wildcard and prefix queries aren't analyzed by the query 
parser (there's more on the why of this in the Lucene-Java FAQ).
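
Since wildcard terms skip analysis, one possible workaround is to 
lowercase them in the client before the query is sent. The sketch below 
assumes the field's index-time analysis lowercases terms (as the 
LowerCaseFilterFactory in the fieldType above does); the helper name 
lowercase_wildcard_terms is hypothetical and not part of Solr. It does 
not handle escaped wildcards or quoted phrases.

```python
import re

# Matches any whitespace-delimited chunk that contains a wildcard
# character (* or ?). Hypothetical client-side helper, not a Solr API.
_WILDCARD_CHUNK = re.compile(r"\S*[*?]\S*")


def lowercase_wildcard_terms(query):
    """Lowercase only the term part of chunks that contain a wildcard.

    Field names (the text before the last ':') and chunks without
    wildcards, such as the AND/OR operators, are left untouched.
    """
    def _lower(match):
        chunk = match.group(0)
        # rpartition keeps the field name and ':' intact when present;
        # with no ':' the whole chunk is treated as the term.
        field, sep, term = chunk.rpartition(":")
        return field + sep + term.lower()

    return _WILDCARD_CHUNK.sub(_lower, query)


print(lowercase_wildcard_terms("description:Deck*"))
```

Plain terms such as description:Deck are left alone here because the 
query parser already runs them through the field's analyzer.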

To clarify one other part of your question

:  Also, when I use the Field Analysis tool for description:Deck*, it
:  shows the following (sorry for the bad copy/paste):

the analysis tool only shows you the analysis portion of 
indexing/querying ... it knows nothing about which query parser you are 
using, so it doesn't know anything about any special query parser 
characters (like *).  The output it gave you shows you what the 
standard request handler would have done if you'd used it to search 
for...
     description:"Deck*"
or:  description:Deck\*

(where the * character is 'escaped')
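
To illustrate the escaping, here is a minimal sketch of a client-side 
helper that backslash-escapes query parser special characters so they 
are treated as literal text. The name escape_query_chars and the exact 
character set are assumptions based on the special characters listed in 
the Lucene query parser syntax documentation; check them against the 
version you run.

```python
# Characters assumed to be special to the Lucene query parser.
# '&' and '|' are escaped individually, which also covers '&&' and '||'.
_SPECIAL_CHARS = set('\\+-!(){}[]^"~*?:&|')


def escape_query_chars(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in _SPECIAL_CHARS else ch for ch in text)


print("description:" + escape_query_chars("Deck*"))
```

Only the term value is escaped, not the field name and its colon, so 
the field prefix is added afterwards.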



-Hoss