-Original Message-
From: Steve Rowe
Sent: Friday, August 23, 2013 12:30 AM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
Dan,
StandardTokenizer implements the word boundary rules from the Unicode Text
Segmentation standard annex UAX#29
details the rules that the tokenizer uses (in addition to
extensive examples.) That's what I mean by deep dive.
-- Jack Krupansky
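To make the UAX#29 point concrete: the underscore (U+005F) has Unicode general category Pc (Connector_Punctuation), which UAX#29 places in the ExtendNumLet word-break class, and rules WB13a/WB13b say not to break between letters and ExtendNumLet. The sketch below is only a rough Python approximation using a regex, not Lucene's implementation; the name `rough_word_tokens` is hypothetical and introduced for illustration.

```python
import re
import unicodedata

# U+005F LOW LINE has general category Pc (Connector_Punctuation),
# which UAX#29 maps to the ExtendNumLet word-break class.
UNDERSCORE_CATEGORY = unicodedata.category("_")


def rough_word_tokens(text):
    """Very rough stand-in for UAX#29 word segmentation.

    Assumption: letters, digits and underscore stick together and
    everything else splits. Lowercasing mimics the lowercase filter
    that StandardAnalyzer applies after tokenizing.
    """
    return [m.group(0).lower() for m in re.finditer(r"\w+", text)]


print(UNDERSCORE_CATEGORY)
print(rough_word_tokens("Pacific_Rim"))
print(rough_word_tokens("Pacific Rim"))
```

Under this approximation the underscore never causes a split, which matches the single pacific_rim token reported in the thread.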
-Original Message-
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
On 8/21/2013 7:54 PM, Floyd Wu wrote:
When using StandardAnalyzer to tokenize the string Pacific_Rim, I get:
ST
text         raw_bytes                           start  end  type      position
pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]  0      11   ALPHANUM  1
How can I make this string be tokenized into the two tokens Pacific and
Rim?
Should I set _ as a stopword?
Please kindly help with this.
Many thanks.
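A stopword would not help with the output shown above, because a stop filter only ever sees the whole token pacific_rim, never a bare _. One way to get two tokens is to turn underscores into spaces before tokenizing. The sketch below shows the idea in plain Python; in Solr the analogous step would presumably be a char filter such as PatternReplaceCharFilterFactory placed ahead of the tokenizer, which is an assumption and not something stated in this thread. The function name is hypothetical.

```python
import re


def split_on_underscore_then_tokenize(text):
    """Replace each underscore with a space, then split into lowercase
    word tokens (a toy approximation, not Lucene's tokenizer)."""
    cleaned = text.replace("_", " ")
    return [m.group(0).lower() for m in re.finditer(r"\w+", cleaned)]


print(split_on_underscore_then_tokenize("Pacific_Rim"))
```

Doing the replacement before tokenization, rather than filtering tokens afterwards, keeps the character offsets of the two resulting tokens distinct.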