Re: How to avoid underscore sign indexing problem?

2013-08-23 Thread Jack Krupansky
-----Original Message-----
From: Steve Rowe
Sent: Friday, August 23, 2013 12:30 AM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
Dan,
StandardTokenizer implements the word boundary rules from the Unicode Text Segmentation standard annex UAX#29 ...
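Steve's point can be illustrated with a toy model. Under UAX#29 the underscore has general category Pc (connector punctuation) and belongs to the ExtendNumLet word-break class. That class does not allow a break between it and an adjacent letter or digit (rules WB13a/WB13b), so Pacific_Rim stays one word while Pacific-Rim splits. The Python sketch below is an assumption-heavy simplification that applies only that one idea; `word_char` and `toy_segment` are hypothetical names, and this is not Lucene's StandardTokenizer, which implements the full rule set.

```python
import unicodedata


def word_char(ch: str) -> bool:
    # Letters and digits, plus connector punctuation (category "Pc", e.g. "_"),
    # which UAX#29 puts in the ExtendNumLet class and does not break around.
    return ch.isalnum() or unicodedata.category(ch) == "Pc"


def toy_segment(text: str) -> list[str]:
    """Toy approximation of UAX#29 word segmentation (NOT Lucene's code).

    Runs of word characters form tokens; any other character ends a token.
    Runs containing no letter or digit (e.g. a lone "_") are dropped.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if word_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return [t for t in tokens if any(c.isalnum() for c in t)]


if __name__ == "__main__":
    print(toy_segment("Pacific_Rim"))  # ['Pacific_Rim']
    print(toy_segment("Pacific-Rim"))  # ['Pacific', 'Rim']
    print(toy_segment("Pacific Rim"))  # ['Pacific', 'Rim']
```

The sketch keeps only the rule relevant to this thread; the real tokenizer applies many more (for example around apostrophes, periods, and numbers), so it is meant for intuition, not as a replacement.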

Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
... details the rules that the tokenizer uses (in addition to extensive examples). That's what I mean by deep dive.
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign ...

Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
On 8/21/2013 7:54 PM, Floyd Wu wrote:
When using StandardAnalyzer to tokenize the string Pacific_Rim, I get: ST ...

Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Dan Davis
... the rules that the tokenizer uses (in addition to extensive examples). That's what I mean by deep dive.
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing ...

Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Steve Rowe
... That's what I mean by deep dive.
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
On 8/21/2013 7:54 PM, Floyd Wu wrote:
When using ...

How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
When using StandardAnalyzer to tokenize the string Pacific_Rim, I get:
ST
text         raw_bytes                           start  end  type        position
pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]  0      11   <ALPHANUM>  1
How can I make this string tokenize into the two tokens Pacific and Rim? Should I set _ as a stopword? Please kindly help with this. Many thanks.
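The raw_bytes column shows where the single token comes from: byte 0x5f is the underscore itself, and under the UAX#29 word boundary rules that StandardTokenizer follows, "_" joins the letters on either side. The Python sketch below is a toy model, not Lucene code; `decode_raw_bytes`, `toy_tokenize`, `drop_stopwords`, and `map_underscores` are hypothetical helpers. Under those assumptions it illustrates two points: a stopword list cannot split pacific_rim, because a stopword filter removes whole tokens rather than characters, whereas replacing "_" with a space before tokenization does yield two tokens. In Solr, a pre-tokenization step of that kind is normally done with a char filter (for example `solr.MappingCharFilterFactory` or `solr.PatternReplaceCharFilterFactory`); this is one possible approach, not necessarily what the replies in this thread recommended.

```python
import unicodedata

# Space-separated hex copied from the raw_bytes column of the ST row above.
RAW_BYTES = "70 61 63 69 66 69 63 5f 72 69 6d"


def decode_raw_bytes(hex_bytes: str) -> str:
    # bytes.fromhex accepts spaces between byte pairs.
    return bytes.fromhex(hex_bytes).decode("utf-8")


def toy_tokenize(text: str) -> list[str]:
    # Rough stand-in for StandardTokenizer (an assumption, not Lucene code):
    # letters, digits and connector punctuation such as "_" stay in one token;
    # any other character ends the current token. Lowercasing is omitted.
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch.isalnum() or unicodedata.category(ch) == "Pc":
            current += ch
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


def drop_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    # A stopword filter removes whole tokens; it never splits one.
    return [t for t in tokens if t not in stopwords]


def map_underscores(text: str) -> str:
    # Hypothetical pre-tokenization step, playing the role a Solr char filter
    # would: turn "_" into a space so the tokenizer sees a word boundary.
    return text.replace("_", " ")


if __name__ == "__main__":
    text = decode_raw_bytes(RAW_BYTES)
    print(text)                                          # pacific_rim
    print(toy_tokenize(text))                            # ['pacific_rim']
    print(drop_stopwords(toy_tokenize(text), {"_"}))     # ['pacific_rim']
    print(toy_tokenize(map_underscores("Pacific_Rim")))  # ['Pacific', 'Rim']
```

Because the mapping runs before tokenization, the tokenizer never sees the underscore, which is why this works where a stopword cannot.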

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Shawn Heisey
On 8/21/2013 7:54 PM, Floyd Wu wrote:
When using StandardAnalyzer to tokenize the string Pacific_Rim, I get:
ST
text         raw_bytes                           start  end  type        position
pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]  0      11   <ALPHANUM>  1
How can I make this string tokenize into the two tokens Pacific and Rim? Should I set _ as ...

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?
On 8/21/2013 7:54 PM, Floyd Wu wrote:
When using StandardAnalyzer to tokenize the string Pacific_Rim, I get: ST ...

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
... that the tokenizer uses (in addition to extensive examples). That's what I mean by deep dive.
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem ...