Re: KeywordTokenizerFactory - trouble with "exact" matches

Aleksander Akerø Wed, 29 Jan 2014 11:37:39 -0800

Thanks a lot, I'll try the autoGeneratePhraseQueries property and see how
that works.
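
For reference, a minimal sketch of what that change might look like, assuming the attribute is set on the "keyword" field type quoted further down in this thread. It is untested against this index, and a reindex is needed after changing it:

```xml
<!-- Sketch only: the "keyword" field type from this thread with
     autoGeneratePhraseQueries switched on. Everything else is unchanged. -->
<fieldType name="keyword" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>
```

Whether this helps here is an open question. The attribute only takes effect when the analysis of a single query chunk yields more than one token, so if the query parser has already split "FE 009" on whitespace before the field analyzer runs, no phrase would be generated.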


Regarding the reindexing tip, it's a good tip, but due to my current "on
the fly" setup on the servers at work I basically have to build the project
with Maven and deploy it to Tomcat, wherein the index lies, and I therefore
have to reindex each time, otherwise the index would be empty. Also, I
usually use the "clean" parameter when testing with DIH. So that
shouldn't be a problem.

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-29 Alexandre Rafalovitch <arafa...@gmail.com>

> I think the whitespace might also be the issue. The query gets parsed
> by the standard component, which splits it on whitespace before passing
> the individual pieces into the field searches.
>
> Try enabling autoGeneratePhraseQueries on the field (or field type)
> and reindexing. See if that makes a difference.
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Jan 29, 2014 at 9:55 PM, Aleksander Akerø
> <aleksan...@gurusoft.no> wrote:
> > update:
> >
> > Guessing that this has nothing to do with the tokenizer. Tried to use the
> > string fieldtype as well, but still the same results. So this must have to
> > do with some other solr config.
> >
> > What confuses me is that when I search "1005" which is another valid value
> > to search for, it works perfectly, but then again, this query contains no
> > whitespace.
> >
> > Any ideas?
> >
> >
> >
> > 2014-01-29 Aleksander Akerø <aleksan...@gurusoft.no>
> >
> >> Thanks for the quick answer, but it doesn't help if I remove the lowercase
> >> filter like so:
> >>
> >>         <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
> >>             <analyzer type="index">
> >>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>             </analyzer>
> >>             <analyzer type="query">
> >>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>             </analyzer>
> >>         </fieldType>
> >>
> >> I still need to add quotes to the search query to get results. And the
> >> weird thing is that if I use the analysis page and put in "FE 009" (again,
> >> without quotes) for both index and query values, it highlights the result
> >> to show a match, but when I search using the GUI it gives me no results.
> >> The same happens when posting directly to the /select requestHandler via GET.
> >>
> >> This is what I post using GET:
> >> http://mysite.com/solr/corename/select?q=number:FE%20009&qf=number => this does not work
> >> http://mysite.com/solr/corename/select?q=number:"FE%20009"&qf=number => this works
> >>
> >> Really starting to wonder if I am doing something terribly wrong somewhere.
> >>
> >> This is my requestHandler btw, pretty basic:
> >> <!-- #### Default handler #### -->
> >>     <requestHandler name="/select" class="solr.SearchHandler">
> >>         <lst name="defaults">
> >>             <str name="echoParams">explicit</str>
> >>             <str name="defType">edismax</str>
> >>             <str name="q.alt">*:*</str>
> >>             <str name="rows">10</str>
> >>             <str name="fl">*,score</str>
> >>             <str name="qf">number</str>
> >>         </lst>
> >>     </requestHandler>
> >>
> >>
> >>
> >> 2014-01-29 Aruna Kumar Pamulapati <apamulap...@gmail.com>
> >>
> >>> Hi,
> >>>
> >>> I think the misunderstanding you are having is about the lowercase factory:
> >>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
> >>>
> >>> You are correct about KeywordTokenizerFactory, but the lowercase factory
> >>> creates tokens by lowercasing all letters and dropping non-letters.
> >>>
> >>> The best place to play and learn these pipelines is the Solr admin panel =>
> >>> analysis page.
> >>>
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>>
> >>> On Wed, Jan 29, 2014 at 9:05 AM, Aleksander Akerø
> >>> <aleksan...@gurusoft.no> wrote:
> >>>
> >>> > Hi, I'll try properly this time.
> >>> >
> >>> > According to solr documentation the solr.KeywordTokenizerFactory should
> >>> > not do any tokenizing at all. Thus, if I understand this correctly, it
> >>> > should only return exact matches given that this is the only analyzer
> >>> > defined in the field type. Such as the following config:
> >>> >
> >>> > Fieldtypes:
> >>> >         <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
> >>> >             <analyzer type="index">
> >>> >                 <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>> >                 <filter class="solr.LowerCaseFilterFactory"/>
> >>> >             </analyzer>
> >>> >             <analyzer type="query">
> >>> >                 <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>> >                 <filter class="solr.LowerCaseFilterFactory"/>
> >>> >             </analyzer>
> >>> >         </fieldType>
> >>> >
> >>> > Fields:
> >>> >         <field name="number" type="keyword" indexed="true" stored="true" required="false" />
> >>> >
> >>> > But it doesn't seem to work this way for me. In the index I have values
> >>> > like "FE 009", "EE 009", "ED 009" and "FE 009-1" (without the quotes, of
> >>> > course). But when I search for "FE 009" (without quotes), I get no results.
> >>> > It seems that I have to add quotes to the search query in order to retrieve
> >>> > any results, but that won't work for me, as I later on have to expand the
> >>> > index with other fields that need whitespace tokenization and such. Or would
> >>> > that work regardless of quotes? I have come to understand that wrapping the
> >>> > query in quotes forces it to be analyzed as one token, no matter what.
> >>> >
> >>> > If I get this to work I would also like to add the
> >>> > "solr.EdgeNGramFilterFactory" to the index-side analyzer, thus adding
> >>> > trailing wildcard matches. E.g. return "FE 009-1" and "FE 009-2" as well as
> >>> > "FE 009" when searching for "FE 009", but not "EE 009" or "ED 009". Would
> >>> > that be an OK way to do it?
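> >>> >
> >>> > A rough sketch of that idea, assuming a new field type name
> >>> > ("keyword_prefix", made up for illustration) and gram sizes of 1 to 25,
> >>> > which are guesses and would need to cover the longest stored value:
> >>> >
> >>> > ```xml
> >>> > <!-- Sketch only: edge n-grams are added on the index side, so every
> >>> >      leading prefix of a stored value becomes a searchable term.
> >>> >      The name "keyword_prefix" and the gram sizes are assumptions. -->
> >>> > <fieldType name="keyword_prefix" class="solr.TextField" positionIncrementGap="100">
> >>> >     <analyzer type="index">
> >>> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>> >         <filter class="solr.LowerCaseFilterFactory"/>
> >>> >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
> >>> >     </analyzer>
> >>> >     <!-- No n-gram filter here: the query stays one lowercased token. -->
> >>> >     <analyzer type="query">
> >>> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>> >         <filter class="solr.LowerCaseFilterFactory"/>
> >>> >     </analyzer>
> >>> > </fieldType>
> >>> > ```
> >>> >
> >>> > With this, "FE 009-1" would be indexed as "f", "fe", "fe ", ... up to
> >>> > "fe 009-1", so a query that reaches the field as the single token
> >>> > "fe 009" should match "FE 009" and "FE 009-1" but not "EE 009".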
> >>> >
> >>> >
> >>>
> >>
> >>
>
