RE: Case where StandardAnalyzer doesn't remove punctuation

2012-03-26 Thread colm.mchugh
Hi Steve, thanks for your response. Totally makes sense, given that the comma character is widely used in written number syntax (e.g. 1000 is the same as 1,000). Thanks also for the notes re the mailing list and nabble. Colm.
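For readers following the thread: since Lucene 3.1, StandardTokenizer follows the Unicode UAX #29 word-break rules. Among these, WB11 and WB12 keep a comma or period between two digits inside a single word, which is why "1,000" survives as one token. The class below is a hypothetical, simplified sketch of only that digit rule. It is not Lucene's StandardTokenizer and makes no attempt at lowercasing, stop words, or the other UAX #29 rules.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of one UAX #29 idea: a comma or period with a
 * digit on both sides does not split a word. Everything else here is
 * simplified: any non-letter/digit character other than such a comma or
 * period ends the current token.
 */
public class NumericCommaDemo {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean digitBefore = i > 0 && Character.isDigit(text.charAt(i - 1));
            boolean digitAfter = i + 1 < text.length() && Character.isDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if ((c == ',' || c == '.') && digitBefore && digitAfter) {
                // Digit, comma/period, digit: keep the separator inside the token ("1,000").
                current.append(c);
            } else if (current.length() > 0) {
                // Whitespace or other punctuation (e.g. "St,") ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", tokenize("1,000 Main St, Dublin"))); // 1,000 | Main | St | Dublin
        System.out.println(String.join(" | ", tokenize("Unit 5,Main St")));        // Unit | 5 | Main | St
    }
}
```

The second input shows the flip side: a comma between a digit and a letter is still treated as a separator, so only digit-comma-digit sequences stay joined.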

Case where StandardAnalyzer doesn't remove punctuation

2012-03-23 Thread colm.mchugh
I'm using Lucene to search address data, and came across an interesting case where StandardAnalyzer appears not to remove punctuation (a comma). To illustrate, the following code snippet uses StandardAnalyzer to analyze an address, printing out each analyzed token. The output of the code snippet

RE: Case where StandardAnalyzer doesn't remove punctuation

2012-03-23 Thread Steven A Rowe
v3.5.0 StandardTokenizer, since it uses Unicode 6.0.0).

Steve

-----Original Message-----
From: colm.mchugh [mailto:colm.mch...@mapflow.com]
Sent: Thursday, March 22, 2012 9:23 AM
To: dev@lucene.apache.org
Subject: Case where StandardAnalyzer doesn't remove punctuation

I'm using Lucene to search