[OPEN-ILS-GENERAL] Synonym Dictionary - Numbers, &

Josh Stompro Thu, 25 May 2017 08:38:46 -0700

Hello, I've followed the steps on the following wiki page to enable a synonym
dictionary, but I'm not getting the results I expect:


https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search#synonym_dictionary

Spelled-out numbers do get translated to digits (six -> 6), but digits don't get
translated to words (6 -> six).
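For reference, a dictionary built from PostgreSQL's synonym template reads a file with one "word replacement" pair per line, so mapping in both directions takes two lines per pair. The sketch below assumes synonym_larl was created from that template. The file name larl_synonyms and its entries are hypothetical, not our actual file.

```sql
-- Assumed definition (hypothetical file name). The synonym file lives at
-- $SHAREDIR/tsearch_data/larl_synonyms.syn and holds one
-- "word replacement" pair per line; each direction is its own line, e.g.:
--   six  6
--   6    six
--   and  &
CREATE TEXT SEARCH DICTIONARY synonym_larl (
    TEMPLATE = synonym,
    SYNONYMS = larl_synonyms
);
```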

When I test the synonym dictionary directly with something like the following,
it appears to work:
select ts_lexize('synonym_larl', '6');
ts_lexize
-----------
{six}
(1 row)
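ts_lexize() passes its argument straight to the dictionary, bypassing both the parser and the configuration's token-type mappings. A successful ts_lexize() test therefore doesn't show that to_tsvector() will ever route that token to the dictionary. Comparing the two paths side by side makes the difference visible. This assumes, as in this message, that both the dictionary and the configuration are named synonym_larl.

```sql
-- Dictionary alone: no parsing, no token-type mapping.
SELECT ts_lexize('synonym_larl', '6');

-- Full pipeline: the parser labels '6' as uint, and the configuration's
-- uint mapping decides which dictionaries see it.
SELECT to_tsvector('synonym_larl', '6');

-- Same pipeline for a spelled-out number, which is parsed as asciiword.
SELECT to_tsvector('synonym_larl', 'six');
```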

But when I look at metabib.title_field_entry for a record that has been
reindexed, I see the following:
select * from metabib.title_field_entry where source=102449 limit 100;
   id    | source | field |                          value                           | index_vector
---------+--------+-------+----------------------------------------------------------+--------------
 2402931 | 102449 |     6 | Little house on the prairie Season 6 [disc 2] test seven | '2':9A,13C,20C '6':7A,12C,18C '7':14C 'disc':8A,19C 'hous':13C 'house':2A 'littl':12C 'little':1A 'on':3A,14C 'prairi':16C 'prairie':5A 'season':6A,17C 'seven':11A,22C 'test':10A,21C 'the':4A,15C

'seven' is indexed as both 'seven' and '7', but '2' and '6' are not expanded to
'two' and 'six'.

So does the text search configuration need to map numeric tokens to the synonym
dictionary for this to work?

select * from ts_debug('synonym_larl', '6');
 alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 6     | {simple}     | simple     | {6}

\dF+ synonym_larl;
Text search configuration "public.synonym_larl"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | synonym_larl
 asciiword       | synonym_larl
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | simple
 hword_asciipart | synonym_larl
 hword_numpart   | simple
 hword_part      | simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | simple

Maybe the uint token type needs to be mapped to synonym_larl as well? If so,
would that have any bad side effects?
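One way to try the idea, offered as an untested sketch: map uint to the synonym dictionary first and fall back to simple. When a synonym dictionary doesn't recognize a token, PostgreSQL hands it to the next dictionary in the mapping, so numbers that aren't in the synonym file should still be indexed as plain digits. With synonym_larl as the only dictionary for uint, numbers missing from the synonym file would produce no lexeme from this configuration, which is one likely side effect. Whether int needs the same change is left open. Stored index_vector values presumably won't change until records are reindexed.

```sql
-- Sketch: route unsigned integers through the synonym dictionary first.
-- Tokens the synonym dictionary doesn't recognize fall through to
-- "simple", which keeps the digits as-is.
ALTER TEXT SEARCH CONFIGURATION synonym_larl
    ALTER MAPPING FOR uint WITH synonym_larl, simple;

-- Check which dictionary now produces the lexeme for '6'.
SELECT * FROM ts_debug('synonym_larl', '6');
```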

We would also like to map '&' -> 'and' and 'and' -> '&', but it doesn't look
like tsearch knows how to categorize '&' as a token; the parser treats it as a
blank:

select * from ts_debug('synonym_larl', '&');
 alias |  description  | token | dictionaries | dictionary | lexemes
-------+---------------+-------+--------------+------------+---------
 blank | Space symbols | &     | {}           |            |

The other direction works fine, and '&' ends up in the index:

select * from ts_debug('synonym_larl', 'and');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | and   | {synonym_larl} | synonym_larl | {&}
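Because the default parser classifies '&' as blank, no dictionary ever sees it, so a synonym file entry alone can't map it. A common workaround, sketched here as an assumption rather than an existing Evergreen feature, is to rewrite the character before the text reaches to_tsvector(), for example with regexp_replace(). The sample title is made up, and where to hook this into Evergreen's indexing is left open.

```sql
-- Sketch: turn a standalone '&' into the word 'and' before parsing, so it
-- takes the same asciiword path that ts_debug shows for 'and' above.
SELECT to_tsvector(
    'synonym_larl',
    regexp_replace('Pride & Prejudice', '\s&\s', ' and ', 'g')
);
```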

Thanks
Josh


Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro     | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
