
RE: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Steven A Rowe

Hi Daniel,

On 09/22/2008 at 12:49 AM, Daniel Noll wrote:
> I have a question about Korean tokenisation. Currently there is a rule in StandardTokenizerImpl.jflex which looks like this:
>
>     ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+

LUCENE-1126 https://issues.apache.org/jira/browse/LUCENE-1126
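To make the effect of the quoted rule concrete, here is a minimal sketch in Python that mimics it with ordinary regular expressions. The character classes below are simplified stand-ins chosen for illustration (ASCII letters, ASCII digits, and two Hangul ranges), not the actual LETTER, DIGIT and KOREAN definitions from StandardTokenizerImpl.jflex. The second pattern, ALPHANUM_SPLIT, is purely hypothetical and is included only to contrast grouping with non-grouping; it is not a rule proposed anywhere in this thread.

```python
import re

# Simplified stand-ins for the JFlex macros (assumptions, not the real definitions).
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
KOREAN = r"[\uac00-\ud7af\u1100-\u11ff]"  # Hangul syllables and Hangul Jamo

# Mirrors the quoted rule: ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+
ALPHANUM_GROUPED = re.compile(f"(?:{LETTER}|{DIGIT}|{KOREAN})+")

# Hypothetical contrast: Hangul runs kept apart from letter/digit runs.
ALPHANUM_SPLIT = re.compile(f"(?:{LETTER}|{DIGIT})+|{KOREAN}+")


def tokenize(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "한국어123 abc"
    print(tokenize(ALPHANUM_GROUPED, sample))
    print(tokenize(ALPHANUM_SPLIT, sample))
```

Under these assumptions, the grouped pattern returns the Hangul run and the digits that follow it as one token, while the contrast pattern returns them as two. That is the behaviour the thread is asking about.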

Re: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Daniel Noll

Steven A Rowe wrote:
> Korean has been treated differently from Chinese and Japanese since LUCENE-461 https://issues.apache.org/jira/browse/LUCENE-461. The grouping of Hangul with digits was introduced in this issue.

Certainly I found LUCENE-461 during my search, and certainly grouping