Hi,
I would like to extend Similarity with the following behavior: if the query
is "A B C" and a field contains "B C", I would like to call that a "match"
and return a score of 1 (2/2). If the query is "A B C" and the field
contains "B D", I would like to call that a partial match
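The scoring rule above can be written as "matched field terms divided by total field terms". The sketch below shows only that arithmetic. It is standalone Java, not a Lucene Similarity subclass, and the class and method names are hypothetical. The message is cut off before the partial-match score is stated, so scoring "B D" as 1/2 is an assumption that the same ratio applies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: computes the fraction of a
// field's whitespace-separated terms that also occur in the query.
class FieldCoverageScore {
    static double score(String query, String field) {
        if (field.trim().isEmpty()) {
            return 0.0; // an empty field matches nothing
        }
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        List<String> fieldTerms = Arrays.asList(field.trim().split("\\s+"));
        // Count field terms that the query also contains.
        long matched = fieldTerms.stream().filter(queryTerms::contains).count();
        return (double) matched / fieldTerms.size();
    }

    public static void main(String[] args) {
        System.out.println(score("A B C", "B C")); // 1.0 (2/2: full match)
        System.out.println(score("A B C", "B D")); // 0.5 (1/2: assumed partial-match score)
    }
}
```

The denominator is the field length, not the query length. That choice is what makes "B C" score 2/2 rather than 2/3.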
I worked at a couple of search engine vendors (Infoseek Ultraseek and
MarkLogic), and user dictionaries are important for linguistic processing.
Every application has some local jargon.
With languages that don’t separate words with spaces (Chinese and Japanese),
the tokenizer needs the user dictionary
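To show why a missing dictionary entry changes token boundaries in unsegmented text, here is a minimal greedy longest-match segmenter. This is a simplified, swapped-in technique. Lucene's Japanese tokenizer (Kuromoji) uses cost-based lattice (Viterbi) segmentation, not greedy matching. The class name, the `maxWordLength` parameter, and the sample dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene's algorithm: greedy left-to-right
// longest match over a word set, falling back to one character.
// Works on UTF-16 chars, so it ignores surrogate pairs (fine for BMP CJK text).
class LongestMatchSegmenter {
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            String token = text.substring(i, i + 1); // single-character fallback
            // Try the longest candidate first, shrinking until a dictionary hit.
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dictionary.contains(candidate)) {
                    token = candidate;
                    break;
                }
            }
            tokens.add(token);
            i += token.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> base = Set.of("関西", "国際", "空港");
        System.out.println(segment("関西国際空港", base, 8)); // [関西, 国際, 空港]

        // Adding a user entry keeps the compound term as one token.
        Set<String> withUserEntry = Set.of("関西", "国際", "空港", "関西国際空港");
        System.out.println(segment("関西国際空港", withUserEntry, 8)); // [関西国際空港]
    }
}
```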
Hello Bruno,
It's an important and commonly used feature. Feel free to chime in on the
improvements you have in mind. Thanks.
Best,
Christian
On Sat, May 18, 2024 at 9:40 PM Bruno Roustant wrote:
> Hi,
>
> While looking at the various usages of Map with Integer keys, I found
> ja.dict.UserD
I previously worked at a Japanese e-commerce (EC) company, and they had over
200,000 user dictionary entries. I am not sure whether they still use such a
large user dictionary, but the tokenizer and char/token filters cannot
handle many writing variations on their own. So this is an important
feature for Japanese text processing.
Be