Re: Erroneous tokenization behavior

2016-09-14 Thread Sattam Alsubaiee
…System.out.println(term); } stream.end(); stream.close(); } } -- Steve, www.lucidworks.com
> On Sep 13, 2016, at 7:59 PM, Sattam Alsubaiee <salsuba...@gmail.com> wrote:
> Hi Michael,
> Yes, that…

Re: Erroneous tokenization behavior

2016-09-13 Thread Sattam Alsubaiee
Are you wanting to discard the too-long terms (the 4.7.x behavior)? -- Mike McCandless, http://blog.mikemccandless.com
> On Tue, Sep 13, 2016 at 12:42 AM, Sattam Alsubaiee <salsuba...@gmail.com> wrote:
> I'm trying to understand the tokenization behavior…

Erroneous tokenization behavior

2016-09-12 Thread Sattam Alsubaiee
I'm trying to understand the tokenization behavior in Lucene. When using the StandardTokenizer in Lucene version 4.7.1 and tokenizing the string "Tokenize me!" with the max token length set to 4, I get only the token "me"; but when using Lucene version 4.10.4, I get the following…
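The behavior difference the thread describes can be sketched without Lucene itself: in 4.7.x a term longer than the max token length was discarded entirely, while later versions emit it in max-length chunks. Below is a minimal plain-Java simulation of those two policies, assuming a crude whitespace/punctuation tokenizer as a stand-in for StandardTokenizer (the class and method names here are hypothetical, not Lucene APIs):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLengthDemo {
    // Crude stand-in for StandardTokenizer: split on runs of non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // 4.7.x-style policy: a token longer than max is dropped entirely.
    static List<String> discardLong(List<String> tokens, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() <= max) out.add(t);
        }
        return out;
    }

    // Later-style policy: a token longer than max is emitted in max-sized chunks.
    static List<String> chunkLong(List<String> tokens, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int i = 0; i < t.length(); i += max) {
                out.add(t.substring(i, Math.min(i + max, t.length())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Tokenize me!");
        System.out.println(discardLong(tokens, 4)); // prints [me]
        System.out.println(chunkLong(tokens, 4));   // prints [Toke, nize, me]
    }
}
```

With max token length 4, "Tokenize" (8 chars) is either dropped, leaving only "me", or split into "Toke" and "nize", matching the two outputs reported for 4.7.1 and 4.10.4 respectively.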