Thanks Michael,
The dismax handler does indeed run and escape all non-standard characters
before handing it off to the analysers and tokenisers. This fix looks like it
belongs in the handler rather than the parser. I wrote a SearchComponent
handler to do the same thing at that level and can
Right - QueryParsers generally do a first pass, parsing incoming Strings
using their operator characters to tokenize the input, and only after that
do they pass the tokens (or phrases) to an Analyzer. I haven't checked
Dismax - not sure how it does its parsing exactly, but I doubt you can just
"tur
My impression is that these quotes are part of the dismax query
syntax, i.e. they should be handled before the analysis happens.
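Following that reasoning, any mapping of typographic quotes would have to happen on the raw query string before dismax parses it (for example in a custom handler or SearchComponent, as described above), not in the analyzer chain. Below is a minimal sketch of that string-level step. It assumes the goal is to turn typographic double quotes into ASCII `"` so the parser treats them as phrase delimiters. The class and method names are hypothetical and not part of Solr's API, and the set of characters mapped is only an illustrative choice.

```java
import java.util.Map;

// Hypothetical pre-parse step (not a Solr API): rewrite typographic double
// quotes in the raw user query to ASCII '"' before the query parser sees it.
// This operates on the query string itself, not inside an Analyzer.
class QuoteMapper {

    // Illustrative mapping only; which characters to fold is a policy choice.
    private static final Map<Character, Character> QUOTES = Map.of(
        '\u201C', '"',  // LEFT DOUBLE QUOTATION MARK
        '\u201D', '"',  // RIGHT DOUBLE QUOTATION MARK
        '\u201E', '"'   // DOUBLE LOW-9 QUOTATION MARK
    );

    // Copy the input, replacing any mapped character and keeping all others.
    static String mapQuotes(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            char mapped = QUOTES.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "\u201Cfree text\u201D search";
        System.out.println(mapQuotes(raw)); // prints "free text" search
    }
}
```

Whether to fold these characters at all is a judgment call, since some users may want them kept in their terms.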
On Mon, Jan 21, 2019 at 8:09 PM Walter Underwood wrote:
> First, check which transforms are already handled by Unicode
> normalization. Put this in all of your an
Thanks Walter,
The testing and research I have done with solr.ICUNormalizer2CharFilterFactory
leads me to believe that quotes are not normalised.
I also attempted this with character folding; there are many implementations
out there, but none of them actually seem to work.
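One quick way to check this outside Solr is to run the quote characters through Unicode normalization directly. The sketch below uses Java's built-in `java.text.Normalizer` with NFKC as a stand-in for the ICU filter, which is an assumption: `ICUNormalizer2CharFilterFactory` defaults to `nfkc_cf` (NFKC plus case folding), and case folding should not affect punctuation, so the results for these characters are expected to match. Curly quotes such as U+201C and U+2019 have no compatibility decomposition, so NFKC leaves them unchanged, whereas a fullwidth quote (U+FF02) is folded to an ASCII double quote.

```java
import java.text.Normalizer;

// Sketch: does NFKC normalization map typographic quotes to ASCII?
// java.text.Normalizer (NFKC) approximates Solr's ICU normalizer here;
// this is not Solr's actual analysis chain.
class QuoteNormalizationCheck {

    // NFKC = compatibility decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Render each code point as U+XXXX so visually similar characters differ.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u201C", // LEFT DOUBLE QUOTATION MARK
            "\u201D", // RIGHT DOUBLE QUOTATION MARK
            "\u2019", // RIGHT SINGLE QUOTATION MARK
            "\uFF02"  // FULLWIDTH QUOTATION MARK
        };
        for (String s : samples) {
            System.out.println(codePoints(s) + " -> " + codePoints(nfkc(s)));
        }
    }
}
```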
I’ll look into the draft.
Thank
First, check which transforms are already handled by Unicode normalization. Put
this in all of your analyzer chains:
You probably need this in solrconfig.xml:
I really cannot think of a reason to use unnormalized Unicode in Solr. That
should be in all the sample files.
For searc
I think this is probably better discussed on solr-user, or maybe solr-dev,
since it is the dismax parser you are talking about, which really lives in
Solr. However, my 2c - this seems somewhat dubious. Maybe people want to
include those in their terms? Also, it leads to a kind of slippery slope:
woul