Re: ICUTokenizer and CJK

2010-11-23 Thread Robert Muir
On Mon, Nov 22, 2010 at 6:50 PM, Burton-West, Tom wrote:
> Hi all,
>
> I see in the javadoc for the ICUTokenizer that it has special handling for
> Lao,Myanmar, Khmer word breaking but no details in the javadoc about what it
> does with CJK, which for C and J appears to be breaking into unigrams

ICUTokenizer and CJK

2010-11-22 Thread Burton-West, Tom
Hi all,

I see in the javadoc for the ICUTokenizer that it has special handling for Lao, Myanmar, and Khmer word breaking, but there are no details in the javadoc about what it does with CJK, which for C and J appears to be breaking into unigrams. Is this correct?

Tom
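To make "breaking into unigrams" concrete, here is a rough sketch in plain Java. It does not call ICUTokenizer and is not how ICUTokenizer is implemented; it only imitates the behavior described above, under the assumption that each Han ideograph becomes its own token while other letter/digit runs stay together. The class name `UnigramSketch` and the method `tokenize` are made up for illustration, and kana and other scripts get no special treatment here.

```java
import java.util.ArrayList;
import java.util.List;

public class UnigramSketch {

    // Hypothetical helper (not part of Lucene or ICU): emits every Han code
    // point as a single-character token and groups any other run of
    // letters/digits into one token. Everything else acts as a separator.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (han || !Character.isLetterOrDigit(cp)) {
                // A Han character or a separator ends the current non-Han run.
                if (run.length() > 0) {
                    tokens.add(run.toString());
                    run.setLength(0);
                }
                if (han) {
                    // Unigram: one token per ideograph.
                    tokens.add(new String(Character.toChars(cp)));
                }
            } else {
                run.appendCodePoint(cp);
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed Latin and Chinese input; the Chinese part comes out one
        // character per token.
        System.out.println(tokenize("Lucene 中文分词"));
    }
}
```

Comparing this kind of output against what the real tokenizer emits for the same string would be one way to check whether the unigram observation holds.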