Re: [lucy-user] Synonyms with Lucy

Nils Diewald Thu, 04 Jul 2013 14:35:43 -0700

Hi Nathan,

That's a good idea for synonymy!

I think that independent offsets would be a good addition to core, too
(if they are not already possible).
This would, for example, also allow for compound tokenization (like
https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html).
So if you have the word "Donaudampfschiff", you could index
"schiff" as well as "Donaudampfschiff" and, if you like, give
"schiff" the complete offset of "Donaudampfschiff" (as a
"Donaudampfschiff" is just a special type of "schiff").
This wouldn't be feasible with expanded queries, as the number of
possible "schiff" compounds is unlimited.
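
To make the compound case concrete, here is a minimal sketch in plain
Python. It is not Lucy or Lucene code: the Token type, the function name
and the naive substring match against a small dictionary are all
assumptions made up for illustration. The point is only that the part
"schiff" is emitted with the offsets of the whole compound.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    text: str
    start: int  # inclusive character offset into the original string
    end: int    # exclusive character offset into the original string


def tokenize_with_compound_parts(text: str, dictionary: Set[str]) -> List[Token]:
    """Whitespace-tokenize `text`; for every word, also emit each dictionary
    entry contained in it, reusing the offsets of the whole word.

    Hypothetical helper: real decompounding needs more than a substring test.
    """
    tokens: List[Token] = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)  # search from the end of the previous word
        end = start + len(word)
        pos = end
        lowered = word.lower()
        tokens.append(Token(lowered, start, end))
        for part in sorted(dictionary):
            if part != lowered and part in lowered:
                # The part inherits the complete offset of the compound.
                tokens.append(Token(part, start, end))
    return tokens


if __name__ == "__main__":
    for token in tokenize_with_compound_parts("Das Donaudampfschiff", {"schiff"}):
        print(token)
```

With this, a search for "schiff" would find the document and a
highlighter working on the stored offsets would mark the whole word
"Donaudampfschiff".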

Best,
Nils

On 04.07.2013 22:41, Nathan Kurz wrote:
> Hi Nils --
>
> I don't think this is directly supported, but it seems like a good addition.
>
> Another approach might be to expand to the synonyms in the query
> rather than in the index. That is, expand a search for
> [examplification] to [example OR examplification], which should
> already highlight correctly.
>
> You'd be trading a less efficient query for a smaller index.
>
> --nate
>
> On Thu, Jul 4, 2013 at 6:07 AM, Nils Diewald <*@b**n.de> wrote:
>> Hello,
>> I'm working with Lucene as well as with Lucy and I'm wondering if there
>> is a possibility to store multiple terms with independent offset
>> information in Lucy, as is possible with Lucene.
>>
>> Example:
>> The string "This is an example" should be indexed with the
>> offset information:
>> * this,0-4
>> * is,5-7
>> * an,8-10
>> * example,11-18
>> * examplification,11-18
>> so that when the user searches for "examplification", the highlighter
>> highlights the synonym "example".
>>
>> I'd be grateful for any hints in the right direction. Thank you all for this
>> awesome tool!
>> Best, Nils
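
The example from the original question can be sketched as follows. This
is plain Python and deliberately uses neither the Lucy nor the Lucene
API: build_index, highlight and the synonym table are hypothetical names
chosen for illustration, and offsets are (start, end) character
positions with an exclusive end, as in the list above.

```python
from typing import Dict, List, Tuple

Posting = Tuple[int, int]  # (start, end) character offsets, end exclusive


def build_index(text: str, synonyms: Dict[str, List[str]]) -> Dict[str, List[Posting]]:
    """Map each lowercased term, and each of its synonyms, to the offsets
    of the surface word it was derived from (hypothetical helper)."""
    index: Dict[str, List[Posting]] = {}
    pos = 0
    for word in text.split():
        start = text.index(word, pos)  # search from the end of the previous word
        end = start + len(word)
        pos = end
        term = word.lower()
        # The synonym is stored with the same offsets as the original term.
        for entry in [term] + synonyms.get(term, []):
            index.setdefault(entry, []).append((start, end))
    return index


def highlight(text: str, index: Dict[str, List[Posting]], query: str) -> str:
    """Wrap every stored offset range for `query` in <b>...</b>."""
    out: List[str] = []
    last = 0
    for start, end in sorted(index.get(query.lower(), [])):
        out.append(text[last:start])
        out.append("<b>" + text[start:end] + "</b>")
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    text = "This is an example"
    index = build_index(text, {"example": ["examplification"]})
    print(highlight(text, index, "examplification"))
```

Here a search for "examplification" highlights "example", because both
terms were stored with the offsets 11-18.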
