Question about extending Similarity

2024-05-21, Georgios Georgiadis
Hi, I would like to extend Similarity to have the following functionality: if the query is "A B C" and a field contains "B C", then I would like to call that a "match" and return a score of 1 (2/2). If the query is "A B C" and the field contains "B D", then I would like to call that a partial matc…
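The scoring rule in the question can be expressed as "the fraction of the field's terms that also occur in the query". The sketch below is plain Java and does not use Lucene's Similarity API. `FieldCoverage` and `score` are hypothetical names, terms are assumed to be separated by whitespace, and because the snippet is cut off, scoring "B D" as 1/2 is an assumption about what the partial match should return. Within Lucene, a Similarity scores one term at a time, so a whole-field ratio like this would likely need a custom query rather than a Similarity alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Lucene): scores a field by the fraction
 * of its whitespace-separated terms that also appear in the query.
 */
class FieldCoverage {

    static double score(String query, String field) {
        // An empty field has no terms, so it cannot match anything.
        if (field.isBlank()) {
            return 0.0;
        }
        // Distinct query terms, for constant-time membership checks.
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        String[] fieldTerms = field.trim().split("\\s+");

        // Count how many of the field's terms the query covers.
        int matched = 0;
        for (String term : fieldTerms) {
            if (queryTerms.contains(term)) {
                matched++;
            }
        }
        // "B C" against "A B C" gives 2/2 = 1.0; "B D" gives 1/2 = 0.5.
        return (double) matched / fieldTerms.length;
    }
}
```

With this rule, `FieldCoverage.score("A B C", "B C")` returns 1.0 (a full match) and `FieldCoverage.score("A B C", "B D")` returns 0.5, so any score below 1.0 can be treated as a partial match.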

Re: How much is ja.dict.UserDictionary used?

2024-05-21, Walter Underwood
I worked at a couple of search engine vendors (Infoseek Ultraseek and MarkLogic), and user dictionaries are important for linguistic processing. Every application has some local jargon. With languages that don’t separate words with spaces (Chinese and Japanese), the tokenizer needs the user dic…

Re: How much is ja.dict.UserDictionary used?

2024-05-21, Christian Moen
Hello Bruno,

It's an important and commonly used feature. Feel free to chime in on the improvements you have in mind. Thanks.

Best,
Christian

On Sat, May 18, 2024 at 9:40 PM Bruno Roustant wrote:
> Hi,
>
> While looking at the various usages of Map with Integer keys, I found
> ja.dict.UserD…

Re: How much is ja.dict.UserDictionary used?

2024-05-21, Kazuaki Hiraga
I worked at a Japanese e-commerce company before, and they used to have over 200,000 user dictionary entries. I am not sure they still use such a large user dictionary, but the tokenizer and char/token filters cannot handle several writing variations, so this is an important feature for Japanese text handling. Be…