Re: standard tokenizer seemingly splitting on dot

Mikhail Khludnev Tue, 02 May 2023 11:02:52 -0700

The analyzer is configured in schema.xml. That said, splitting on the dot is
exactly what I would expect from StandardTokenizer.
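
If the goal is to match file names such as XYZ.tif as a single term, one
option is a field type that splits only on whitespace. A minimal sketch,
assuming a hypothetical type name (text_ws_lower) and a hypothetical dynamic
field suffix (*_fname); neither exists in your schema, so adjust to taste:

```xml
<!-- Hypothetical field type: the name "text_ws_lower" is an assumption. -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "XYZ.tif" stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercases tokens so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical dynamic field using that type; the "*_fname" suffix is an assumption. -->
<dynamicField name="*_fname" type="text_ws_lower" indexed="true" stored="true"/>
```

Because this changes index-time analysis, existing documents would need to be
reindexed before queries against the new field behave as intended.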


On Tue, May 2, 2023 at 8:48 PM Bill Tantzen <[email protected]>
wrote:

> Mikhail,
> Thanks for the quick reply.  Here is the parser info:
>
> <str name="QParser">LuceneQParser</str>
>
> ~~Bill
>
> On Tue, May 2, 2023 at 12:43 PM Mikhail Khludnev <[email protected]> wrote:
>
> > Hello Bill,
> > Which analyzer is configured for metadata_txt?  Perhaps you need to tune it
> > accordingly.
> >
> > On Tue, May 2, 2023 at 7:40 PM Bill Tantzen <[email protected]>
> > wrote:
> >
> > > In my solr 9.2 schema, I am leveraging the dynamicField
> > >
> > > <dynamicField name="*_txt" type="text_general" indexed="true"
> > > stored="true"/>
> > >
> > > which tokenizes with solr.StandardTokenizerFactory for index and query.
> > >
> > > However, when I query with, for example,
> > > <str name="q">metadata_txt:XYZ.tif</str>
> > >
> > > I see many more hits than I expect.  When I add debug=true to the query, I
> > > see:
> > > <str name="rawquerystring">metadata_txt:XYZ.tif</str>
> > > <str name="querystring">metadata_txt:XYZ.tif</str>
> > > <str name="parsedquery">metadata_txt:XYZ metadata_txt:tif</str>
> > >
> > > But I expect that dots not followed by whitespace will be kept as part of
> > > the token, that is, the parsed query should remain "metadata_txt:XYZ.tif",
> > > but solr appears to be splitting into two tokens.
> > >
> > > Can somebody point out what I am misunderstanding?
> > > Thanks,
> > > ~~Bill
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>
>
> --
> Human wheels spin round and round
> While the clock keeps the pace... -- John Mellencamp
> ________________________________________________________________
> Bill Tantzen    University of Minnesota Libraries
> 612-626-9949 (U of M)    612-325-1777 (cell)
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
