Terry Steichen wrote:
I tested StandardAnalyzer (which uses StandardTokenizer) by inputting a set of strings, which produced the following results:
aa/bb/cc/dd was tokenized into 4 terms: aa, bb, cc, dd
aa/bb/cc/d1 was tokenized into 3 terms: aa, bb, cc/d1
aa/bb/c1/dd was tokenized into 2 terms: aa, bb/c1/dd
Shouldn't the tokenizing be the same regardless of the presence or absence of numeric characters?
- Original Message -
From: Doug Cutting [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, December 30, 2002 1:42 PM
Subject: Re: Incomprehensible (to me) tokenizing behavior
Terry Steichen wrote:
I tested StandardAnalyzer (which uses StandardTokenizer) by inputting a set of strings, which produced the following results:
aa/bb/cc/dd was tokenized into 4 terms: aa, bb, cc, dd
aa/bb/cc/d1 was tokenized into 3 terms: aa, bb, cc/d1
aa/bb/c1/dd was tokenized into 2 terms: aa, bb/c1/dd
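The pattern above is explainable by StandardTokenizer's grammar, which has a rule for number-like tokens: runs of alphanumeric segments joined by punctuation are kept as a single term when digits are involved (the grammar treats them like serial or product numbers), while a slash between two purely alphabetic segments is a token break. Below is a minimal Python sketch of that rule, assuming a simplified "join across '/' when either neighboring segment contains a digit" approximation; it is not Lucene's actual JFlex grammar, just an illustration that reproduces the three results quoted above.

```python
def simplified_standard_tokenize(text):
    """Approximate StandardTokenizer's handling of '/'-joined segments.

    Hypothetical simplification of the grammar's NUM rule: a '/' is
    kept inside a token when at least one of the two segments it joins
    contains a digit (serial-number-like runs); otherwise the '/' is a
    token boundary.
    """
    segments = text.split("/")
    has_digit = [any(c.isdigit() for c in s) for s in segments]

    tokens, current = [], segments[0]
    for i in range(1, len(segments)):
        if has_digit[i] or has_digit[i - 1]:
            # digit on either side: keep the slash inside the token
            current += "/" + segments[i]
        else:
            # two purely alphabetic segments: break the token here
            tokens.append(current)
            current = segments[i]
    tokens.append(current)
    return tokens

print(simplified_standard_tokenize("aa/bb/cc/dd"))  # ['aa', 'bb', 'cc', 'dd']
print(simplified_standard_tokenize("aa/bb/cc/d1"))  # ['aa', 'bb', 'cc/d1']
print(simplified_standard_tokenize("aa/bb/c1/dd"))  # ['aa', 'bb/c1/dd']
```

Note how a single digit-bearing segment pulls both of its neighbors into one token, which is why "c1" in the third position glues "bb" and "dd" together.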