Re: Lowercase all characters in String

2016-10-11 Thread Zheng Lin Edwin Yeo
Thanks Ahmet and Walter.

It works.

Regards,
Edwin


On 11 October 2016 at 23:36, Walter Underwood wrote:

> Like this:
>
> 
> 
>   
> 
> 
>   
> 
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 11, 2016, at 7:43 AM, Ahmet Arslan wrote:
> >
> > Hi,
> >
> > KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can
> > add TrimFilter too.
> >
> > Ahmet
> >
> >
> > On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > Hi,
> >
> > Would like to find out, what is the best way to lowercase all the text,
> > while preserving all the tokens.
> >
> > As I need to preserve every character of the text (including symbols and
> > white space), I'm using String. However, I can't put the
> > LowerCaseFilterFactory in String.
> >
> > I found that we can use WhitespaceTokenizerFactory, followed by
> > LowerCaseFilterFactory. Although WhitespaceTokenizerFactory can preserve
> > the symbols, it will still split on Whitespace, which is what we do not
> > want. This is because we may have words like 'One' and 'One Way'. If we use
> > the WhitespaceTokenizerFactory and search for 'One', it will return records
> > with 'One Way' too, which is what we do not want.
> >
> > Is there other way which we can achieve this?
> >
> > I'm using Solr 6.2.1.
> >
> > Regards,
> > Edwin
>
>


Re: Lowercase all characters in String

2016-10-11 Thread Walter Underwood
Like this:



  


  


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 11, 2016, at 7:43 AM, Ahmet Arslan wrote:
> 
> Hi,
> 
> KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add 
> TrimFilter too.
> 
> Ahmet
> 
> 
> On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo wrote:
> Hi,
> 
> Would like to find out, what is the best way to lowercase all the text,
> while preserving all the tokens.
> 
> As I need to preserve every character of the text (including symbols and
> white space), I'm using String. However, I can't put the
> LowerCaseFilterFactory in String.
> 
> I found that we can use WhitespaceTokenizerFactory, followed by
> LowerCaseFilterFactory. Although WhitespaceTokenizerFactory can preserve
> the symbols, it will still split on Whitespace, which is what we do not
> want. This is because we may have words like 'One' and 'One Way'. If we use
> the WhitespaceTokenizerFactory and search for 'One', it will return records
> with 'One Way' too, which is what we do not want.
> 
> Is there other way which we can achieve this?
> 
> I'm using Solr 6.2.1.
> 
> Regards,
> Edwin



Re: Lowercase all characters in String

2016-10-11 Thread Ahmet Arslan
Hi,

KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add 
TrimFilter too.

Ahmet
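
For readers of the archive, a minimal sketch of what this suggestion could look like in the schema. The field type name "string_lowercase" and the field name "title_lc" are hypothetical, chosen only for illustration; the tokenizer and filter classes are the ones named above. KeywordTokenizerFactory emits the whole input as a single token, so 'One Way' stays one term and a search for 'one' does not match it.

```xml
<!-- Hypothetical field type name; class names are standard Solr factories. -->
<fieldType name="string_lowercase" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- Keeps the entire field value as one token (no splitting on whitespace or symbols). -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases that single token at both index and query time. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Optional: strips leading and trailing whitespace. Leave it out if every
         whitespace character has to be preserved.
    <filter class="solr.TrimFilterFactory"/>
    -->
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="title_lc" type="string_lowercase" indexed="true" stored="true"/>
```

Only the indexed term is lowercased; the stored value returned in search results keeps its original case.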


On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo wrote:
Hi,

Would like to find out, what is the best way to lowercase all the text,
while preserving all the tokens.

As I need to preserve every character of the text (including symbols and
white space), I'm using String. However, I can't put the
LowerCaseFilterFactory in String.

I found that we can use WhitespaceTokenizerFactory, followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory can preserve
the symbols, it will still split on Whitespace, which is what we do not
want. This is because we may have words like 'One' and 'One Way'. If we use
the WhitespaceTokenizerFactory and search for 'One', it will return records
with 'One Way' too, which is what we do not want.

Is there other way which we can achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin


Lowercase all characters in String

2016-10-11 Thread Zheng Lin Edwin Yeo
Hi,

I would like to find out the best way to lowercase all the text
while preserving all the tokens.

As I need to preserve every character of the text (including symbols and
white space), I'm using String. However, I can't put the
LowerCaseFilterFactory in String.

I found that we can use WhitespaceTokenizerFactory, followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory can preserve
the symbols, it will still split on Whitespace, which is what we do not
want. This is because we may have words like 'One' and 'One Way'. If we use
the WhitespaceTokenizerFactory and search for 'One', it will return records
with 'One Way' too, which is what we do not want.

Is there another way we can achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin