[ https://issues.apache.org/jira/browse/LUCENE-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mason updated LUCENE-3979:
--------------------------------

    Summary: NGramTokenizer strips whitespace, with no option to keep leading and trailing whitespace  (was: NGramTokenizer)

> NGramTokenizer strips whitespace, with no option to keep leading and trailing
> whitespace
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3979
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3979
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>   Affects Versions: 2.9.2, 3.0
>         Environment: n/a
>            Reporter: David Mason
>            Priority: Minor
>              Labels: tokenizer, whitespace
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> org.apache.lucene.analysis.ngram.NGramTokenizer removes whitespace, making a
> search for literal strings like " test" and "test " equivalent to "test".
> Searching with relevant whitespace is sometimes desired, particularly where
> ngrams are used.
> This could be fixed by either removing .trim() from the line shown below, or
> by providing a flag to specifically set trimming behaviour (keeping trim=true
> as the default so that existing code using this analyzer is not broken).
> 111: inStr = new String(chars).trim();  // remove any trailing empty strings
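
To illustrate the behaviour described in the report, here is a minimal standalone sketch. It is not the Lucene implementation: the class NGramTrimSketch, the method ngrams, and the boolean trim parameter are hypothetical names chosen for this example, and the order in which grams are produced is only illustrative. It assumes only that the input is passed through String.trim() before n-grams are generated, as the quoted line 111 suggests, and shows how an opt-out flag (defaulting to true in callers) would preserve leading and trailing whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramTrimSketch {

    // Hypothetical helper, NOT part of Lucene: builds character n-grams of
    // sizes minGram..maxGram. When trim is true the input is trimmed first,
    // mirroring the .trim() call quoted in the issue; when false, leading
    // and trailing whitespace is kept and can appear inside grams.
    static List<String> ngrams(String input, int minGram, int maxGram, boolean trim) {
        String s = trim ? input.trim() : input;
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= s.length(); start++) {
                grams.add(s.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // With trimming, " test" and "test" yield the same bigrams.
        System.out.println(ngrams(" test", 2, 2, true));   // [te, es, st]
        // Without trimming, the leading space survives in the first bigram.
        System.out.println(ngrams(" test", 2, 2, false));  // [ t, te, es, st]
    }
}
```

With trim=true the queries " test", "test " and "test" are indistinguishable, which is the complaint in the report; with trim=false the whitespace-bearing grams remain available for matching.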