RE: Solr* != solr*

George Aroush Fri, 01 Aug 2008 08:22:45 -0700

Hi Erik and all,

I'm still trying to solve this issue, and I'd like to know how others might
have solved it in their clients.  I can't modify the Solr / Lucene code, and
I'm using Solr 1.2.


What I have done is simple.  Given a user input, I break it into words and
then check each word.  Any word that contains a wildcard (* or ?) I
lowercase.
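
For illustration only, the logic described above could be sketched roughly
as follows.  This is a minimal sketch in Python, not the actual client code
(the client language isn't stated here), and the function name is
hypothetical.  It assumes words are separated by plain whitespace and makes
no attempt to handle field prefixes, quoted phrases, or escaped characters.

```python
def lowercase_wildcard_terms(query):
    """Lowercase only the whitespace-separated words that contain * or ?.

    Hypothetical helper: it splits on generic whitespace, not with the
    analyzer configured in Lucene/Solr.
    """
    words = query.split()  # runs of whitespace collapse to single breaks
    fixed = [w.lower() if ("*" in w or "?" in w) else w for w in words]
    return " ".join(fixed)


print(lowercase_wildcard_terms("Sol* AND Lucene"))  # prints: sol* AND Lucene
```

Because only words containing a wildcard are touched, operators such as
AND / OR / NOT pass through unchanged.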

While the logic is simple, I'm not comfortable with it because the
word-breaker isn't based on the analyzer in use by Lucene.  In my case, I
can't tell which analyzer is used.

So my question is: did you run into this problem and, if so, how did you
work around it?  That is, is breaking on generic whitespace (independent of
the analyzer in use) "good enough"?

Thanks.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 01, 2008 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr* != solr*
> 
> George - wildcard expressions, in Lucene/Solr's QueryParser, 
> are not analyzed.  There is one trick in the API that isn't 
> yet wired to Solr's configuration, and that is
> setLowercaseExpandedTerms(true).
> This would solve the Sol* issue because when indexed all 
> terms for the "text" field are lowercased during analysis.
> 
> A functional alternative, of course, is to have the client 
> lowercase the query expression before requesting to Solr 
> (careful, though - consider AND/OR/NOT).
> 
>       Erik
> 
> 
> 
> On Jul 1, 2008, at 8:14 PM, George Aroush wrote:
> 
> > Hi Folks,
> >
> > Can someone tell me what I might have set up wrong?  After indexing my
> > data, I can search just fine on, let's say, "sol*" but not on "Sol*"
> > (note upper case 'S' vs. lower case 's'); I get 0 hits.
> >
> > Here is my customized schema.xml setting:
> >
> >    <fieldType name="text" class="solr.TextField"
> > positionIncrementGap="100">
> >      <analyzer type="index">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >        <!-- in this example, we will only use synonyms at query time
> >        <filter class="solr.SynonymFilterFactory"
> > synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.EnglishPorterFilterFactory"
> > protected="protwords.txt"/>
> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >      </analyzer>
> >      <analyzer type="query">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >        <!-- <filter class="solr.SynonymFilterFactory"
> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.EnglishPorterFilterFactory"
> > protected="protwords.txt"/>
> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >      </analyzer>
> >    </fieldType>
> >
> > Btw, "Solr", "solr", "sOlr", etc. work.  It's a problem with
> > wildcards.
> >
> > Thanks in advance.
> >
> > -- George
>
