Re: StandardTokenizer and splitting on mixedcase strings

2018-02-26 Thread Erick Erickson
Dan:

The admin UI analysis page is invaluable for understanding exactly
what each element of your analysis chain does. So when you restructure
your analysis chain, you can use it to check that the input is
transformed the way you want.

Best,
Erick

On Mon, Feb 26, 2018 at 7:21 AM, Shawn Heisey wrote:
> On 2/23/2018 10:55 AM, Rick Leir wrote:
>> Lowercase filter before the tokenizer?
>
> Unless somebody invents a lowercasing CharFilter, which I don't think
> exists currently, that's not possible.
>
> Groups of Solr analysis components always run in the following order:
>
> First CharFilter entries are run.
> Then the Tokenizer is run.
> Then Filter entries are run.
>
> Within each group, individual components run in the order they are
> configured, but the filters will always run after charfilters and the
> tokenizer.
>
> Thanks,
> Shawn
>


Re: StandardTokenizer and splitting on mixedcase strings

2018-02-26 Thread Shawn Heisey
On 2/23/2018 10:55 AM, Rick Leir wrote:
> Lowercase filter before the tokenizer?

Unless somebody invents a lowercasing CharFilter, which I don't think
exists currently, that's not possible.

Groups of Solr analysis components always run in the following order:

First CharFilter entries are run.
Then the Tokenizer is run.
Then Filter entries are run.

Within each group, individual components run in the order they are
configured, but the filters will always run after charfilters and the
tokenizer.
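As a rough illustration of that ordering, here is a toy Python model of
an analysis chain. It is only a sketch: analyze, simple_tokenizer and
lowercase_filter are hypothetical stand-ins, not Solr or Lucene classes,
and the real tokenizer is far more involved than a regex.

```python
import re

def analyze(text, char_filters, tokenizer, token_filters):
    """Toy model: char filters first, then the tokenizer, then token filters."""
    # 1. Char filters see the raw text, before any tokens exist.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. A single tokenizer turns the filtered text into tokens.
    tokens = tokenizer(text)
    # 3. Token filters see the token stream, in the order they are configured.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

# Hypothetical stand-ins for real components (assumptions, not Solr code).
def simple_tokenizer(text):
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

print(analyze("Learn JavaScript", [], simple_tokenizer, [lowercase_filter]))
# prints ['learn', 'javascript']
```

In this model the lowercasing step can only ever see tokens, because it
is a token filter. To lowercase before tokenization it would have to be
written as a char filter that takes and returns text.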

Thanks,
Shawn



Re: StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Rick Leir
Dan,
Lowercase filter before the tokenizer?
Cheers -- Rick

On February 23, 2018 6:08:27 AM EST, "Dan ." wrote:
>Hi,
>
>The StandardTokenizerFactory splits strings like 'JavaScript' into 'Java'
>and 'Script', but then searches with 'javascript' do not match the document.
>
>Is there a solr way to prevent StandardTokenizer from splitting mixedcase
>strings?
>
>Cheers,
>Dan

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: StandardTokenizer and splitting on mixedcase strings

2018-02-23 Thread Steve Rowe
Hi Dan,

StandardTokenizerFactory does not do this.

Maybe you have a filter in your analysis chain that does this? For
example, WordDelimiterFilterFactory can split tokens on case changes.
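To show why a case-change split produces the mismatch Dan describes,
here is a small Python sketch. It is a toy model under stated
assumptions, not the actual WordDelimiterFilterFactory logic:
split_on_case_change and index_terms are hypothetical helpers, and
whitespace splitting stands in for the tokenizer.

```python
import re

def split_on_case_change(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()

def index_terms(text, lowercase_first):
    """Toy chain: whitespace tokenizer, optional early lowercasing,
    case-change splitting, then lowercasing of whatever parts remain."""
    tokens = text.split()
    if lowercase_first:
        # With the case information gone, there is nothing left to split on.
        tokens = [t.lower() for t in tokens]
    parts = [p for t in tokens for p in split_on_case_change(t)]
    return [p.lower() for p in parts]

print(index_terms("JavaScript", lowercase_first=False))  # prints ['java', 'script']
print(index_terms("JavaScript", lowercase_first=True))   # prints ['javascript']
```

In the first case the index holds 'java' and 'script', so a query for
'javascript' finds no matching term. If the filter is wanted for other
reasons, its splitOnCaseChange option can be set to 0 to turn off this
particular split.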

--
Steve
www.lucidworks.com

> On Feb 23, 2018, at 6:08 AM, Dan . wrote:
> 
> Hi,
> 
> The StandardTokenizerFactory splits strings like 'JavaScript' into 'Java'
> and 'Script', but then searches with 'javascript' do not match the document.
> 
> Is there a solr way to prevent StandardTokenizer from splitting mixedcase
> strings?
> 
> Cheers,
> Dan