Thanks, Steve.
Sattam
On Tue, Sep 13, 2016 at 5:51 PM, Steve Rowe wrote:
Hi Sattam,
You’re right, StandardTokenizer's behavior changed (in 4.9.1/4.10) to split
long tokens at maxTokenLength rather than ignore tokens longer than
maxTokenLength.
You can simulate the old behavior by setting maxTokenLength to the length of
the longest token you want to be able to
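The approach described above (the message is truncated in the archive) amounts to: keep maxTokenLength large enough that no real token gets split, then discard tokens longer than the limit you actually care about, which is what StandardTokenizer itself did before 4.9.1/4.10. In Lucene that discarding step would typically be a LengthFilter chained after the tokenizer. Below is a plain-Java sketch of the idea, not Lucene's actual code: the regex split stands in for StandardTokenizer, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class OldBehaviorSim {
    // Stand-in for StandardTokenizer: split on runs of non-alphanumeric
    // characters (the real tokenizer implements UAX#29 word boundaries).
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Stand-in for LengthFilter: drop tokens longer than maxLen,
    // reproducing the pre-4.9.1 "ignore long tokens" behavior.
    static List<String> lengthFilter(List<String> tokens, int maxLen) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() <= maxLen) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // "Tokenize" (8 chars) is discarded; only "me" survives.
        System.out.println(lengthFilter(tokenize("Tokenize me!"), 4)); // [me]
    }
}
```

With a real analysis chain, the equivalent would be a StandardTokenizer (with a generous maxTokenLength) followed by a LengthFilter capped at the desired limit.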
Hi Michael,
Yes, that's the desired behavior. The setMaxTokenLength method is supposed
to allow that.
Cheers,
Sattam
On Tue, Sep 13, 2016 at 11:57 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
I guess this was a change in behavior in those versions.
Are you wanting to discard the too-long terms (the 4.7.x behavior)?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Sep 13, 2016 at 12:42 AM, Sattam Alsubaiee wrote:
I'm trying to understand the tokenization behavior in Lucene. When using the
StandardTokenizer in Lucene 4.7.1 to tokenize the string "Tokenize me!" with
the maximum token length set to 4, I get only the token "me", but when using
Lucene 4.10.4, I get the following
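The original question is truncated in the archive, but the behavior it contrasts can be simulated in plain Java: from 4.9.1/4.10 onward, a token longer than maxTokenLength is emitted in maxTokenLength-sized chunks instead of being dropped. This sketch is not Lucene's implementation, just an illustration of the splitting rule using a regex split in place of the real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBehaviorSim {
    // Simulates the 4.9.1+/4.10 StandardTokenizer behavior: tokens longer
    // than maxLen are split into chunks of at most maxLen characters.
    static List<String> tokenizeSplitting(String text, int maxLen) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{Alnum}]+")) {
            for (int i = 0; i < t.length(); i += maxLen) {
                out.add(t.substring(i, Math.min(i + maxLen, t.length())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "Tokenize" (8 chars) becomes "Toke" + "nize"; "me" passes through.
        System.out.println(tokenizeSplitting("Tokenize me!", 4)); // [Toke, nize, me]
    }
}
```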