Re: searching exact phrase with stop word returns bad results

Upayavira Wed, 13 Mar 2013 02:14:18 -0700

Exact phrase search isn't the exact string match you are thinking of.
A phrase search for "foo bar" looks up the terms foo and bar, and
then checks whether they sit at consecutive positions (one position
apart). If punctuation has been removed during analysis, it *cannot*
play a part in a search of any kind.
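To make that concrete, here is a rough Python model of the email_type analyzer quoted further down (whitespace tokenizer, lowercase, WordDelimiterFilterFactory with preserveOriginal="1" and generateWordParts="1"), plus a position-only phrase matcher. It is an approximation assumed for illustration, not Lucene's code, and analyze and phrase_match are hypothetical names. The '@' only survives inside the preserved whole-word token, so "@gmail.com" and "gmail.com@" both reduce to gmail followed by com:

```python
import re

def analyze(text):
    """Approximation (not Lucene's code) of the quoted email_type chain:
    whitespace split, lowercase, then WordDelimiterFilter-style splitting on
    non-alphanumerics with preserveOriginal=1. Returns (term, position) pairs."""
    tokens = []
    pos = 0
    for word in text.lower().split():
        parts = re.findall(r"[a-z0-9]+", word)
        # preserveOriginal=1: the unsplit word is kept at the current position.
        tokens.append((word, pos))
        if parts == [word] or not parts:
            pos += 1  # nothing was split off; move on to the next position
            continue
        # generateWordParts=1: each alphanumeric part gets its own position,
        # the first one sharing the position of the preserved original.
        for part in parts:
            tokens.append((part, pos))
            pos += 1
    return tokens

def phrase_match(doc_text, query_text):
    """Match on positions only. Query terms stacked at the same position are
    treated as alternatives, as in a multi-term phrase query."""
    doc = set(analyze(doc_text))
    slots = {}
    for term, pos in analyze(query_text):
        slots.setdefault(pos, set()).add(term)
    if not slots:
        return False
    starts = {pos for _, pos in doc}
    return any(
        all(any((t, start + qpos) in doc for t in alts) for qpos, alts in slots.items())
        for start in starts
    )

print(analyze("john.doe@gmail.com"))
# [('john.doe@gmail.com', 0), ('john', 0), ('doe', 1), ('gmail', 2), ('com', 3)]
print(phrase_match("john.doe@gmail.com", "@gmail.com"))  # True
print(phrase_match("john.doe@gmail.com", "gmail.com@"))  # True, via gmail + com
```

Neither query can see the '@': the only term carrying it is the preserved original, which matches nothing in the document except the full address.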


You may be able to achieve what you want by replacing the
WhitespaceTokenizer with a PatternTokenizer and removing the
WordDelimiterFilterFactory.
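A sketch of the PatternTokenizer idea, again as a hypothetical Python model rather than Solr itself. The assumed pattern turns each run of letters and digits into a token and every other non-space character into its own one-character token, so '@' and '.' get positions of their own. In schema terms this would presumably be a solr.PatternTokenizerFactory with group="0" and a pattern along the lines of [A-Za-z0-9]+|[^A-Za-z0-9\s], followed by LowerCaseFilterFactory; that config is untested here, so check it on admin/analysis before relying on it.

```python
import re

# Hypothetical token pattern (an assumption for this sketch): alphanumeric runs
# are tokens, and every other non-space character is its own one-char token.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+|[^a-z0-9\s]")

def analyze(text):
    """Lowercase, then emit each regex match as a token at the next position
    (roughly what a match-extracting pattern tokenizer plus lowercasing do)."""
    return [(m.group(0), pos) for pos, m in enumerate(TOKEN_PATTERN.finditer(text.lower()))]

def phrase_match(doc_text, query_text):
    """True if the query terms occur in the document at consecutive positions."""
    doc = set(analyze(doc_text))
    query = analyze(query_text)
    if not query:
        return False
    # Candidate start positions: wherever the first query term occurs.
    starts = {pos for term, pos in doc if term == query[0][0]}
    return any(all((term, start + qpos) in doc for term, qpos in query) for start in starts)

doc = "john.doe@gmail.com"
for q in ["john", "doe", "gmail.com", "@gmail.com", "gmail.com@"]:
    print(q, phrase_match(doc, q))  # True for all but "gmail.com@"
```

The trade-off is that punctuation becomes significant everywhere: a phrase search for "gmail com" would no longer match, because the '.' token now sits between the two words.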

Upayavira

On Wed, Mar 13, 2013, at 08:41 AM, adfel70 wrote:
> I want the following behaviour.
> If "john....@gmail.com" is indexed to the field:
> 1. searching 'john' or 'doe' or 'gmail.com' will retrieve the doc.
> 2. searching '"@gmail.com"' will retrieve the doc.
> 3. searching '"gmail.com@"' will not retrieve the doc.
> 
> I can accomplish all of these except 3.
> Because the word delimiter removes '@', when I search "@gmail.com" or
> "gmail.com@" it's like searching "gmail.com", which returns unwanted
> results.
> This is an exact phrase search, so I would expect only docs with the
> exact phrase I search (including punctuation) to be retrieved.
> 
> How can I achieve this?
> 
> Thanks.
> 
> 
> 
> Jack Krupansky-2 wrote
> > The Word Delimiter Filter will remove all punctuation characters. That is 
> > its function.
> > 
> > Maybe you should first describe in simple English what your token/term
> > rules are; then it would be clearer which tokenizer and filters would be
> > most appropriate.
> > 
> > -- Jack Krupansky
> > 
> > -----Original Message----- 
> > From: adfel70
> > Sent: Tuesday, March 12, 2013 3:14 AM
> > To: solr-user@.apache
> > Subject: Re: searching exact phrase with stop word returns bad results
> > 
> > I see that there is no token with @.
> > The question is why.
> > This is my field type:
> > <fieldtype name="email_type" class="solr.TextField"
> >            positionIncrementGap="100" autoGeneratePhraseQueries="false"
> >            omitNorms="true">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >             splitOnCaseChange="0"/>
> >   </analyzer>
> > </fieldtype>
> > any idea?
> > 
> > 
> > 
> > Erick Erickson wrote
> >> Take a look at admin/analysis for the field in question, feed it values
> >> and see how they are tokenized. My guess is that the token in the index
> >> is abc@ (single token), which of course won't match the fragment
> >> "@gmail.com" (assuming gmail.com@ is a typo)...
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Wed, Mar 6, 2013 at 5:43 AM, adfel70 <adfel70@> wrote:
> >>
> >>> Hi
> >>>
> >>> I have emails indexed with the default text_general fieldType.
> >>>
> >>> I find that if the email "abc@" is indexed, and I search for
> >>> "gmail.com@" (exact phrase search), I get a result, while I should
> >>> not get one.
> >>>
> >>> Any idea how to solve this?
> >>>
> >>> thanks.
> >>>
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>> http://lucene.472066.n3.nabble.com/searching-exact-phrase-with-stop-word-returns-bad-results-tp4045180.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> 
