Reece Hart <[EMAIL PROTECTED]> writes:
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?
Yeah, looks like it:
regression=# select * from ts_debug('MCL1 MCL-1');
   alias   |       description        | token |  dictionaries  |  dictionary  | lexemes
-----------+--------------------------+-------+----------------+--------------+---------
 numword   | Word, letters and digits | MCL1  | {simple}       | simple       | {mcl1}
 blank     | Space symbols            |       | {}             |              |
 asciiword | Word, all ASCII          | MCL   | {english_stem} | english_stem | {mcl}
 int       | Signed integer           | -1    | {simple}       | simple       | {-1}
(4 rows)
I had thought you might get a "numhword" output, but that only seems to
happen if there's at least one letter after the dash:
regression=# select * from ts_debug('MCL1 MCL-X1');
      alias      |               description                | token  |  dictionaries  |  dictionary  | lexemes
-----------------+------------------------------------------+--------+----------------+--------------+----------
 numword         | Word, letters and digits                 | MCL1   | {simple}       | simple       | {mcl1}
 blank           | Space symbols                            |        | {}             |              |
 numhword        | Hyphenated word, letters and digits      | MCL-X1 | {simple}       | simple       | {mcl-x1}
 hword_asciipart | Hyphenated word part, all ASCII          | MCL    | {english_stem} | english_stem | {mcl}
 blank           | Space symbols                            | -      | {}             |              |
 hword_numpart   | Hyphenated word part, letters and digits | X1     | {simple}       | simple       | {x1}
(6 rows)
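If a custom parser is more than you want to take on, one alternative is to normalize the names before they ever reach to_tsvector. What follows is only a rough Python sketch of the rule described in the question (drop a hyphen when it's followed by 1 or 2 chars from [a-zA-Z0-9]). The function name strip_short_hyphen is made up for illustration, and two extra conditions are assumptions added here, not part of the original rule: an alphanumeric must precede the hyphen, and none may follow the short suffix, so that a standalone "-1" and ordinary hyphenated words are left alone.

```python
import re

# Assumed rule: a hyphen that sits between an alphanumeric character and a
# 1- or 2-character alphanumeric suffix is dropped. The lookbehind keeps a
# standalone "-1" (a signed integer) intact; the lookahead keeps longer
# hyphenated words such as "anti-inflammatory" intact.
_SHORT_SUFFIX = re.compile(
    r"(?<=[A-Za-z0-9])-([A-Za-z0-9]{1,2})(?![A-Za-z0-9])"
)


def strip_short_hyphen(text: str) -> str:
    """Return text with short-suffix hyphens removed (hypothetical helper)."""
    return _SHORT_SUFFIX.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_short_hyphen("MCL1 MCL-1 MCL-X1 anti-inflammatory -1"))
```

The same normalization would have to be applied to the query text as well, since a search string containing MCL-1 goes through the same parser and would otherwise still be split into MCL and -1.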
regards, tom lane
--
Sent via pgsql-general mailing list ([email protected])