Ashok: You really, _really_ need to dive into the admin/analysis page. That'll show you exactly what WDFF (and all the other elements of your chain) do to input tokens. Understanding the index and query-time implications of all the settings in WDFF takes a while.
But from what you're describing, WDFF may not be what you're looking for anyway; some of the regex filters could split, for instance, on all non-alphanum characters.

Best
Erick

On Wed, Apr 17, 2013 at 12:25 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 4/16/2013 8:12 PM, Ashok wrote:
>> It looks like any 'word' that starts with a digit is treated as a numeric
>> string.
>>
>> Setting generateNumberParts="1" instead of "0" seems to generate the right
>> tokens in this case but need to see if it has any other impacts on the
>> finalized token list...
>
> I have a fieldType that is using WDF with the following settings on the
> index side. Both index and query analysis show it behaving correctly
> with terms that start with numbers, on versions 4.2.1 and 3.5.0:
>
> <filter class="solr.WordDelimiterFilterFactory"
>   splitOnCaseChange="1"
>   splitOnNumerics="1"
>   stemEnglishPossessive="1"
>   generateWordParts="1"
>   generateNumberParts="1"
>   catenateWords="1"
>   catenateNumbers="1"
>   catenateAll="0"
>   preserveOriginal="1"
> />
>
> It has different settings on the query side, but generateNumberParts is
> 1 for both:
>
> <filter class="solr.WordDelimiterFilterFactory"
>   splitOnCaseChange="1"
>   splitOnNumerics="1"
>   stemEnglishPossessive="1"
>   generateWordParts="1"
>   generateNumberParts="1"
>   catenateWords="0"
>   catenateNumbers="0"
>   catenateAll="0"
>   preserveOriginal="0"
> />
>
> I haven't tried it with generateNumberParts set to 0.
>
> Thanks,
> Shawn
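
On the regex suggestion above: one way it might look is a pattern-based tokenizer in place of WDFF. This is only a rough sketch. The fieldType name text_alnum_split is made up for illustration, and the pattern and the lowercase filter are assumptions about what you want, so check the result on the admin/analysis page before relying on it.

```xml
<!-- Hypothetical fieldType: the name and pattern are illustrative only -->
<fieldType name="text_alnum_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="-1" treats the regex as a delimiter, so every run of
         non-alphanumeric characters separates tokens -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^A-Za-z0-9]+" group="-1"/>
    <!-- Optional: case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a pattern like this, a term that starts with a digit and contains no other punctuation should stay a single token, since nothing in the chain splits on letter/digit boundaries. Whether that is what you want depends on your data.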