[ https://issues.apache.org/jira/browse/SOLR-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798266#action_12798266 ]
Chris Male commented on SOLR-1710:
----------------------------------

Just wondering what the return value of WordDelimiterIterator#next() is supposed to indicate. I see that it returns either the end index or DONE, but the filter never seems to use this value. Does it have a role?

> convert worddelimiterfilter to new tokenstream API
> --------------------------------------------------
>
>                 Key: SOLR-1710
>                 URL: https://issues.apache.org/jira/browse/SOLR-1710
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1710.patch, SOLR-1710.patch
>
>
> This one was a doozy, attached is a patch to convert it to the new
> tokenstream API.
> Some of the logic was split into WordDelimiterIterator (exposes a
> BreakIterator-like api for iterating subwords)
> the filter is much more efficient now, no cloning.
> before applying the patch, copy the existing WordDelimiterFilter to
> OriginalWordDelimiterFilter
> the patch includes a testcase (TestWordDelimiterBWComp) which generates
> random strings from various subword combinations.
> For each random string, it compares output against the existing
> WordDelimiterFilter for all 512 combinations of boolean parameters.
> NOTE: due to bugs found (SOLR-1706), this currently only tests 256 of these
> combinations. The bugs discovered in SOLR-1706 are fixed here.
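
For context on the question above: the description calls the iterator "BreakIterator-like", and in java.text.BreakIterator the value returned by next() is how the caller learns each boundary, with DONE marking the end of the text. The sketch below shows only that JDK convention. It is not code from the patch, and the class and method names (BreakIteratorDemo, wordEnds) are made up for illustration. Whether WordDelimiterIterator#next() is meant to be consumed the same way is exactly the open question.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration of the java.text.BreakIterator contract only; the names here
// are hypothetical and nothing is taken from the SOLR-1710 patch.
class BreakIteratorDemo {

    // Collects every boundary that next() returns until it reports DONE.
    static List<Integer> wordEnds(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.first(); // position the iterator at the first boundary (offset 0)

        List<Integer> ends = new ArrayList<>();
        // The return value of next() is the end offset of the current segment,
        // or BreakIterator.DONE once the text is exhausted.
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            ends.add(end);
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("ab cd"));
    }
}
```

If an iterator also exposes the end offset some other way (a field or an accessor, say), a caller only has to compare the return value with DONE and can ignore the index itself, which may be why the filter appears not to use it.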