Hello, I've followed the steps on the following wiki page to enable a synonym dictionary, but I'm not getting the results I expect.
https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search#synonym_dictionary

Spelled-out numbers do get translated to digits (six -> 6), but digits don't get translated to words (6 -> six). When I test the synonym dictionary directly, it looks like it works:

select ts_lexize('synonym_larl', '6');
 ts_lexize
-----------
 {six}
(1 row)

But when I look at metabib.title_field_entry for a record that has been reindexed, I see the following:

select * from metabib.title_field_entry where source=102449 limit 100;
   id    | source | field |                          value                           | index_vector
---------+--------+-------+----------------------------------------------------------+------------------------------
 2402931 | 102449 |     6 | Little house on the prairie Season 6 [disc 2] test seven | '2':9A,13C,20C '6':7A,12C,18C '7':14C 'disc':8A,19C 'hous':13C 'house':2A 'littl':12C 'little':1A 'on':3A,14C 'prairi':16C 'prairie':5A 'season':6A,17C 'seven':11A,22C 'test':10A,21C 'the':4A,15C

"seven" gets indexed as both 'seven' and '7', but '2' and '6' are not also indexed as 'two' and 'six'. So I'm wondering whether the search configuration needs to cover numeric tokens to make that work.
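One way to see where every token of that title goes is to run the whole string through ts_debug at once. This is a minimal diagnostic sketch, not part of the wiki procedure; it assumes the synonym_larl configuration created by the wiki steps is installed, and the WHERE clause only hides whitespace and punctuation tokens.

```sql
-- Diagnostic sketch: list each token in the title, how the default parser
-- classifies it, and which dictionaries synonym_larl consults for it.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('synonym_larl',
              'Little house on the prairie Season 6 [disc 2] test seven')
WHERE alias <> 'blank';  -- hide spaces and punctuation such as '[' and ']'
```

Any token whose dictionaries column does not include synonym_larl can never pick up a synonym, no matter what the synonym file contains.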
select * from ts_debug('synonym_larl', '6');
 alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 6     | {simple}     | simple     | {6}

\dF+ synonym_larl;
Text search configuration "public.synonym_larl"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | synonym_larl
 asciiword       | synonym_larl
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | simple
 hword_asciipart | synonym_larl
 hword_numpart   | simple
 hword_part      | simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | simple

Maybe the uint token needs to be mapped to synonym_larl as well? If so, would that have any bad side effects?

We would also like to map '&' -> 'and' and 'and' -> '&', but tsearch doesn't seem to treat '&' as a token at all; the parser classifies it as a blank (space symbol):

select * from ts_debug('synonym_larl', '&');
 alias |  description  | token | dictionaries | dictionary | lexemes
-------+---------------+-------+--------------+------------+---------
 blank | Space symbols | &     | {}           |            |

Going the other way works fine, and the '&' ends up in the index:

select * from ts_debug('synonym_larl', 'and');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | and   | {synonym_larl} | synonym_larl | {&}

Thanks,
Josh

Lake Agassiz Regional Library - Moorhead MN
larl.org
Josh Stompro | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
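Both changes asked about above could be sketched roughly as follows. This is an untested sketch that assumes the public.synonym_larl configuration shown by \dF+; the sample title 'Pride & Prejudice' is made up, and how the '&' substitution would be wired into Evergreen's indexing normalizers (and applied to search input as well) is not covered here. Listing simple after synonym_larl for uint is deliberate: a synonym dictionary returns NULL for words that are not in its file, so numbers without an entry fall through to simple and are still indexed as digits instead of being dropped. Records indexed before the change would presumably need to be reindexed to pick it up.

```sql
-- Untested sketch; assumes the synonym_larl configuration from the wiki.

-- 1. Send unsigned integers through the synonym dictionary first. Numbers
--    with no entry in the synonym file fall through to simple, so they are
--    still indexed as plain digits.
ALTER TEXT SEARCH CONFIGURATION synonym_larl
    ALTER MAPPING FOR uint WITH synonym_larl, simple;

-- Check the new dictionary chain for a numeric token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('synonym_larl', '6');

-- 2. The default parser treats '&' as a blank token, so no dictionary ever
--    sees it. Rewriting it to ' and ' before parsing lets the existing
--    'and' -> '&' synonym entry map both spellings to the same lexeme.
--    (Sample title is hypothetical.)
SELECT to_tsvector('synonym_larl',
                   replace('Pride & Prejudice', '&', ' and '));
```

The substitution approach keeps the parser untouched; the alternative of writing a custom parser that emits '&' as a word token is considerably more work.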
