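You can create an index on to_tsvector(replace(foo, '-', ' ')) and then search using ...match..(replace(foo, ...), ...), applying the same replace() to the search string so that the query is tokenized the same way as the indexed text.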
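A rough sketch of what I mean, in case it helps. The table name "items", the index name and the 'simple' configuration are my assumptions, not something from your setup; only the column name foo comes from the description above. Note that an expression index needs the two-argument form of to_tsvector with the configuration named explicitly, because the one-argument form depends on default_text_search_config and is not immutable.

```sql
-- Hypothetical table for illustration; substitute your own table and column.
CREATE TABLE items (id serial PRIMARY KEY, foo text);
INSERT INTO items (foo) VALUES ('NL83-V-74-001-001');

-- Expression index on the hyphen-free text. The configuration ('simple' is
-- an assumption, pick whichever one you normally use) must be given
-- explicitly for the expression to be usable in an index.
CREATE INDEX items_foo_fts_idx ON items
    USING gin (to_tsvector('simple', replace(foo, '-', ' ')));

-- The WHERE clause repeats the indexed expression exactly so the planner
-- can use the index, and the search string gets the same replace() so that
-- 'v-74' is parsed as two separate tokens rather than 'v' and '-74'.
SELECT id, foo
FROM items
WHERE to_tsvector('simple', replace(foo, '-', ' '))
      @@ plainto_tsquery('simple', replace('v-74', '-', ' '));
```

The catch with this approach is that every query has to spell out the same expression and configuration as the index, otherwise the index is not used.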
On Mon, Oct 4, 2010 at 11:41 AM, Arthur van der Wal <[email protected]> wrote:
> Hi,
>
> I want to change the way PostgreSQL splits text into tokens, for example:
>
> plainto_tsquery("v-74") should split it up as "v" & "74" instead of "v" &
> "-74".
>
> Another example:
>
> select to_tsvector('NL83-V-74-001-001')
> '-001':5,6 '74':4 'nl83':2 'nl83-v':1 'v':3
>
> Searching for 'v-71' does not find the database entry as the '-' in 'v-71'
> is not indexed. It's hard to determine when PostgreSQL splits things up by
> '-' and when not.
>
> I tried writing my own parser (based on the test_parser example) which
> does nothing more than split at '-', however it seems to me that the logic
> for finding 'base' words and derivatives that postgres does so nicely
> doesn't work anymore.
>
> Another way would be to disable the (signed) int tokeniser and have the
> unsigned int tokeniser accept preceding 0's.
>
> Can anybody point me in the right direction as in how to tackle this
> problem?
>
> Thanks very much in advance,
>
> Arthur van der Wal
>
