[ https://issues.apache.org/jira/browse/SOLR-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798260#action_12798260 ]
Chris Male commented on SOLR-1710:
----------------------------------

I am working with this patch with the goal of simplifying its logic and increasing readability. Seems great thus far though.

> convert worddelimiterfilter to new tokenstream API
> --------------------------------------------------
>
>                 Key: SOLR-1710
>                 URL: https://issues.apache.org/jira/browse/SOLR-1710
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1710.patch, SOLR-1710.patch
>
>
> This one was a doozy, attached is a patch to convert it to the new tokenstream API.
> Some of the logic was split into WordDelimiterIterator (exposes a BreakIterator-like api for iterating subwords)
> the filter is much more efficient now, no cloning.
> before applying the patch, copy the existing WordDelimiterFilter to OriginalWordDelimiterFilter
> the patch includes a testcase (TestWordDelimiterBWComp) which generates random strings from various subword combinations.
> For each random string, it compares output against the existing WordDelimiterFilter for all 512 combinations of boolean parameters.
> NOTE: due to bugs found (SOLR-1706), this currently only tests 256 of these combinations. The bugs discovered in SOLR-1706 are fixed here.
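
For readers following the testing approach in the issue description, here is a rough sketch of how an exhaustive comparison over boolean parameters could look. It is not the code from the patch and does not use the Lucene/Solr API. The class name FlagCombinationSketch, its methods, and the stand-in "filter" functions are all hypothetical, and the count of nine flags is only inferred from the figure of 512 (2^9) combinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;

// Hypothetical sketch: compare a reference and a candidate implementation
// for every combination of boolean parameters. Not the test from the patch.
public class FlagCombinationSketch {

    // Assumption: nine independent boolean parameters, giving 2^9 = 512 combinations.
    static final int NUM_FLAGS = 9;

    // Turn the low bits of a mask into an array of flags (bit i -> flags[i]).
    static boolean[] decode(int mask, int numFlags) {
        boolean[] flags = new boolean[numFlags];
        for (int i = 0; i < numFlags; i++) {
            flags[i] = (mask & (1 << i)) != 0;
        }
        return flags;
    }

    // Build a random input by concatenating between 1 and maxParts subwords.
    static String randomInput(Random random, String[] subwords, int maxParts) {
        int parts = 1 + random.nextInt(maxParts);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts; i++) {
            sb.append(subwords[random.nextInt(subwords.length)]);
        }
        return sb.toString();
    }

    // Run both implementations for every flag combination and
    // return the masks for which their outputs differ.
    static List<Integer> findMismatches(
            String input,
            BiFunction<String, boolean[], List<String>> reference,
            BiFunction<String, boolean[], List<String>> candidate) {
        List<Integer> mismatches = new ArrayList<>();
        for (int mask = 0; mask < (1 << NUM_FLAGS); mask++) {
            boolean[] flags = decode(mask, NUM_FLAGS);
            if (!reference.apply(input, flags).equals(candidate.apply(input, flags))) {
                mismatches.add(mask);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-in "filter": splits on '-' when flag 0 is set, otherwise keeps the input whole.
        BiFunction<String, boolean[], List<String>> standIn =
                (s, flags) -> flags[0] ? Arrays.asList(s.split("-")) : Arrays.asList(s);

        String[] subwords = {"foo", "Bar", "123", "-", "'s"};
        Random random = new Random(42L); // fixed seed so a failure can be reproduced
        for (int i = 0; i < 100; i++) {
            String input = randomInput(random, subwords, 5);
            List<Integer> mismatches = findMismatches(input, standIn, standIn);
            if (!mismatches.isEmpty()) {
                System.out.println("Mismatch for \"" + input + "\" with masks " + mismatches);
            }
        }
        System.out.println("Comparison finished.");
    }
}
```

In a real test the two functions would wrap the original and the converted filter and return their emitted tokens. Encoding the flags as a bit mask keeps the loop flat and makes a failing combination easy to report and replay.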