Add "minPartLength" to WordDelimiterFilter
------------------------------------------

                 Key: SOLR-293
                 URL: https://issues.apache.org/jira/browse/SOLR-293
             Project: Solr
          Issue Type: New Feature
          Components: update
    Affects Versions: 1.3
            Reporter: Mike Klaas
            Assignee: Mike Klaas
            Priority: Minor
             Fix For: 1.3


WordDelimiterFilter (WDF) is handy, but it over-tokenizes when faced with short word parts, such as:

A9
R2D2
mp3
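To make the over-tokenization concrete, here is a minimal, simplified sketch of how tokens like these break into word parts. It assumes only three split rules (non-alphanumeric delimiters, letter/digit transitions, and lower-to-upper case changes). The class `WordPartSplitter` and its `split` method are hypothetical illustrations, not WDF's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of WDF-style part splitting (not Solr's implementation).
public class WordPartSplitter {

    private static final int LOWER = 0, UPPER = 1, DIGIT = 2, OTHER = 3;

    private static int typeOf(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    /** Splits at delimiters, letter/digit transitions, and lower-to-upper case changes. */
    public static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prevType = OTHER;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            int type = typeOf(c);
            if (type == OTHER) {
                // Non-alphanumeric characters act as delimiters and are dropped.
                flush(current, parts);
            } else {
                boolean letterDigitBoundary =
                    prevType != OTHER && (type == DIGIT) != (prevType == DIGIT);
                boolean caseBoundary = prevType == LOWER && type == UPPER;
                if (letterDigitBoundary || caseBoundary) {
                    flush(current, parts);
                }
                current.append(c);
            }
            prevType = type;
        }
        flush(current, parts);
        return parts;
    }

    // Moves the accumulated characters (if any) into the part list.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        for (String t : new String[] {"A9", "R2D2", "mp3"}) {
            System.out.println(t + " -> " + split(t));
        }
    }
}
```

Under these assumptions, "A9" yields [A, 9], "R2D2" yields [R, 2, D, 2], and "mp3" yields [mp, 3]: every part is one or two characters long.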

This creates one- or two-character tokens, which are extremely slow to query 
because their document frequency is so high (they contribute to a significant 
portion of our slowest queries).

This patch adds a "minPartLength" option that suppresses generation of word 
parts shorter than a given length.  It is recommended to use it together with 
catenateAll so that no tokens are lost.
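The interaction between the two options can be sketched as follows. This is a hedged illustration, not the patch itself: the class `MinPartLengthSketch`, its `emit` method, and the choice to append the catenated token last are all hypothetical. It assumes the word has already been split into parts, drops parts shorter than `minPartLength`, and, when `catenateAll` is set, still emits one token joining all parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of minPartLength combined with catenateAll (not the actual patch).
public class MinPartLengthSketch {

    /**
     * Emits the parts that are at least minPartLength characters long and,
     * if catenateAll is set and there is more than one part, a token joining every part.
     */
    public static List<String> emit(List<String> parts, int minPartLength, boolean catenateAll) {
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            // Short parts (high document frequency, slow to query) are suppressed.
            if (part.length() >= minPartLength) {
                out.add(part);
            }
        }
        if (catenateAll && parts.size() > 1) {
            // The catenated token survives even when every individual part was dropped.
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> r2d2 = List.of("R", "2", "D", "2");
        System.out.println(emit(r2d2, 3, false));                // []
        System.out.println(emit(r2d2, 3, true));                 // [R2D2]
        System.out.println(emit(List.of("mp", "3"), 2, true));   // [mp, mp3]
    }
}
```

The first call shows why pairing with catenateAll matters: without it, "R2D2" would produce no tokens at all once every short part is suppressed.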

I'll add factory options and tests if we decide to include this (and are happy 
with the parameter name).
