Ashok:

You really, _really_ need to dive into the admin/analysis page.
That'll show you exactly what WDFF (and all the other elements of your
chain) do to input tokens. Understanding the index and query-time
implications of all the settings in WDFF takes a while.

But from what you're describing, WDFF may not be what you're looking
for anyway. Some of the regex filters could split, for instance, on
all non-alphanumeric characters.
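
For example, something along these lines might do it. This is an
untested sketch: the fieldType name is made up, and I'm using the
pattern tokenizer (a tokenizer rather than a filter) on the assumption
that splitting on every run of non-alphanumeric characters is really
what you want:

        <!-- hypothetical fieldType name, for illustration only -->
        <fieldType name="text_alnum_split" class="solr.TextField">
          <analyzer>
            <!-- with the default group="-1" the pattern is treated
                 as a delimiter, so this splits on each run of
                 non-alphanumeric characters -->
            <tokenizer class="solr.PatternTokenizerFactory"
              pattern="[^A-Za-z0-9]+"
            />
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldType>

Run a few of your problem terms through the analysis page against it
before committing to anything.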

Best
Erick

On Wed, Apr 17, 2013 at 12:25 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 4/16/2013 8:12 PM, Ashok wrote:
>> It looks like any 'word' that starts with a digit is treated as a numeric
>> string.
>>
>> Setting generateNumberParts="1" instead of "0" seems to generate the right
>> tokens in this case but need to see if it has any other impacts on the
>> finalized token list...
>
> I have a fieldType that is using WDF with the following settings on the
> index side.  Both index and query analysis show it behaving correctly
> with terms that start with numbers, on versions 4.2.1 and 3.5.0:
>
>         <filter class="solr.WordDelimiterFilterFactory"
>           splitOnCaseChange="1"
>           splitOnNumerics="1"
>           stemEnglishPossessive="1"
>           generateWordParts="1"
>           generateNumberParts="1"
>           catenateWords="1"
>           catenateNumbers="1"
>           catenateAll="0"
>           preserveOriginal="1"
>         />
>
> It has different settings on the query side, but generateNumberParts is
> 1 for both:
>
>         <filter class="solr.WordDelimiterFilterFactory"
>           splitOnCaseChange="1"
>           splitOnNumerics="1"
>           stemEnglishPossessive="1"
>           generateWordParts="1"
>           generateNumberParts="1"
>           catenateWords="0"
>           catenateNumbers="0"
>           catenateAll="0"
>           preserveOriginal="0"
>         />
>
> I haven't tried it with generateNumberParts set to 0.
>
> Thanks,
> Shawn
>
