Re: StandardTokenizer and splitting on mixedcase strings
Dan:

The admin UI analysis page is invaluable for understanding exactly what
element of your analysis chain does what. So when you restructure your
analysis chain you can use it to see if the input transforms the way
you want it to.

Best,
Erick

On Mon, Feb 26, 2018 at 7:21 AM, Shawn Heisey wrote:
> On 2/23/2018 10:55 AM, Rick Leir wrote:
>> Lowercase filter before the tokenizer?
>
> Unless somebody invents a lowercasing CharFilter, which I don't think
> exists currently, that's not possible.
>
> Groups of Solr analysis components always run in the following order:
>
> First CharFilter entries are run.
> Then the Tokenizer is run.
> Then Filter entries are run.
>
> Within each group, individual components run in the order they are
> configured, but the filters will always run after charfilters and the
> tokenizer.
>
> Thanks,
> Shawn
Re: StandardTokenizer and splitting on mixedcase strings
On 2/23/2018 10:55 AM, Rick Leir wrote:
> Lowercase filter before the tokenizer?

Unless somebody invents a lowercasing CharFilter, which I don't think
exists currently, that's not possible.

Groups of Solr analysis components always run in the following order:

First CharFilter entries are run.
Then the Tokenizer is run.
Then Filter entries are run.

Within each group, individual components run in the order they are
configured, but the filters will always run after charfilters and the
tokenizer.

Thanks,
Shawn
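As an illustration of the ordering described above, here is a minimal
sketch of a schema field type with one component from each group. The
field type name "text_example" and the particular choice of
HTMLStripCharFilterFactory are assumptions made for the example and do
not come from the thread.

```xml
<!-- Hypothetical field type; the name "text_example" is illustrative only. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Group 1: char filters see the raw character stream first. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Group 2: the tokenizer splits the filtered text into tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Group 3: token filters run on the tokenizer's output,
         in the order they are listed. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

In this sketch the lowercasing happens only after tokenization, which is
why a LowerCaseFilterFactory entry cannot affect how the tokenizer
splits its input.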
Re: StandardTokenizer and splitting on mixedcase strings
Dan,

Lowercase filter before the tokenizer?

Cheers -- Rick

On February 23, 2018 6:08:27 AM EST, "Dan ." wrote:
>Hi,
>
>The StandardTokenizerFactory splits strings like 'JavaScript' into
>'Java' and 'Script', but then searches with 'javascript' do not match
>the document.
>
>Is there a solr way to prevent StandardTokenizer from splitting
>mixedcase strings?
>
>Cheers,
>Dan

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: StandardTokenizer and splitting on mixedcase strings
Hi Dan,

StandardTokenizerFactory does not do this. Maybe you have a filter in
your analysis chain that does this? E.g. WordDelimiterFilterFactory has
this capability.

--
Steve
www.lucidworks.com

> On Feb 23, 2018, at 6:08 AM, Dan . wrote:
>
> Hi,
>
> The StandardTokenizerFactory splits strings like 'JavaScript' into 'Java'
> and 'Script', but then searches with 'javascript' do not match the document.
>
> Is there a solr way to prevent StandardTokenizer from splitting mixedcase
> strings?
>
> Cheers,
> Dan
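If a word delimiter filter does turn out to be the cause, one possible
approach is to switch off its case-change splitting. The following is
only a sketch: the field type name "text_nosplit" is hypothetical, and
it assumes the chain uses WordDelimiterFilterFactory as mentioned above
(newer releases also offer WordDelimiterGraphFilterFactory, which
accepts the same attribute). Dan's actual schema is not shown in the
thread.

```xml
<!-- Hypothetical field type; the name "text_nosplit" is illustrative only. -->
<fieldType name="text_nosplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- splitOnCaseChange="0" stops the filter from splitting a token
         at a lowercase-to-uppercase transition such as 'JavaScript'. -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"/>
    <!-- Lowercase afterwards so that a query for 'javascript' can match. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, 'JavaScript' would be expected to stay a single
token and be lowercased, but it is worth confirming on the admin UI
analysis page against the real schema.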