Your best bet is to preprocess queries and expand synonyms in your own
application layer. The Lucene/Solr synonym implementation, design, and
architecture are fairly lightweight (although the FST is a big improvement) and
not designed for large and dynamic synonym sets.
Do you need multi-word
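As a rough illustration of expanding synonyms in the application layer, here is a minimal sketch. It assumes single-word terms, a hypothetical in-memory `SYNONYMS` table standing in for whatever synonym source is actually used, and the classic Lucene query syntax where `term^boost` carries a weight. The names `expand_term`, `expand_query`, and `max_synonyms` are made up for this example, and terms are not escaped.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an external synonym source: word -> (synonym, weight).
SYNONYMS: Dict[str, List[Tuple[str, float]]] = {
    "car": [("automobile", 0.9), ("vehicle", 0.6), ("auto", 0.8)],
}


def expand_term(term: str, max_synonyms: int = 50) -> str:
    """Return a Lucene-style clause for a term plus its weighted synonyms."""
    # Keep only the highest-weighted synonyms so the clause stays bounded.
    ranked = sorted(SYNONYMS.get(term, []), key=lambda s: s[1], reverse=True)
    clauses = [term] + [f"{word}^{weight}" for word, weight in ranked[:max_synonyms]]
    if len(clauses) == 1:
        return term  # no synonyms known, leave the term untouched
    return "(" + " OR ".join(clauses) + ")"


def expand_query(query: str, max_synonyms: int = 50) -> str:
    """Expand each whitespace-separated term independently (single words only)."""
    return " ".join(expand_term(t, max_synonyms) for t in query.lower().split())


print(expand_query("fast car", max_synonyms=2))
```

Capping the expansion with `max_synonyms` is one way to keep a term with hundreds of synonyms from producing an unbounded query; whether dropping the low-weight tail is acceptable depends on the application.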
The examples I've seen so far are single words. But I learned something new
today: the number of synonyms returned for a word may be in the range of
hundreds, sometimes even thousands.
So I'm not sure query-time synonyms will work at all. What do you think?
Shai
On Thu, Jul 18, 2013 at 3:21
. Although,
for hundreds or thousands of synonyms it would probably hit the 2048 common
limit for URLs in some containers, which would need to be raised.
-- Jack Krupansky
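To get a feel for when an expanded query would run into that limit, here is a small sketch that measures the length of a GET URL carrying the query. The base URL is only a placeholder, `URL_LIMIT` takes the 2048 figure mentioned above, and the 500 synonyms are synthetic tokens made up for the example.

```python
from urllib.parse import urlencode

URL_LIMIT = 2048  # the common container limit mentioned above; configurable


def request_url_length(query: str,
                       base: str = "http://localhost:8983/solr/select") -> int:
    """Length of a GET URL that carries `query` as the q parameter.

    The base URL is a placeholder for this sketch, not a real deployment.
    """
    return len(base + "?" + urlencode({"q": query}))


def fits_in_url(query: str, limit: int = URL_LIMIT) -> bool:
    """True if the encoded request stays within the container's URL limit."""
    return request_url_length(query) <= limit


# 500 synthetic synonyms, each six characters long ("syn000" .. "syn499").
big_query = "(" + " OR ".join(f"syn{i:03d}" for i in range(500)) + ")"

print(fits_in_url("car"), fits_in_url(big_query))
```

With six-character terms, a few hundred synonyms are already enough to exceed 2048 characters once the ` OR ` separators are encoded.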
From: Shai Erera
Sent: Thursday, July 18, 2013 8:54 AM
To: dev@lucene.apache.org
Subject: Re: Programmatic Synonyms Filter
Container (e.g., Tomcat) limit. Configurable. I don’t recall the specifics.
-- Jack Krupansky
From: Shai Erera
Sent: Thursday, July 18, 2013 9:46 AM
To: dev@lucene.apache.org
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)
Actually, after chatting w/ Mike about it, he made
There are two serious issues with query-time synonyms: speed and correctness.
1. Expanding a term to 1000 synonyms at query time means 1000 term lookups.
This will not be fast. Expanding the term at index time means 1000 posting list
entries, but only one term lookup at query time.
2. Query
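The speed point can be sketched with a toy inverted index that counts term lookups. This is not how Lucene stores postings; `ToyIndex`, the three sample documents, and the four-word synonym group are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyIndex:
    """Minimal inverted index that counts term-dictionary lookups."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.lookups = 0

    def add(self, doc_id: int, terms: Iterable[str]) -> None:
        for term in terms:
            self.postings[term].add(doc_id)

    def search_any(self, terms: Iterable[str]) -> Set[int]:
        hits: Set[int] = set()
        for term in terms:
            self.lookups += 1  # one dictionary lookup per query term
            hits |= self.postings.get(term, set())
        return hits

    def posting_count(self) -> int:
        return sum(len(docs) for docs in self.postings.values())


group: List[str] = ["car", "automobile", "auto", "vehicle"]
synonyms = {word: [w for w in group if w != word] for word in group}
docs = {1: ["red", "automobile"], 2: ["fast", "car"], 3: ["green", "bike"]}

# Query-time expansion: index raw terms, then look up the term and every synonym.
query_time = ToyIndex()
for doc_id, terms in docs.items():
    query_time.add(doc_id, terms)
query_hits = query_time.search_any(["car"] + synonyms["car"])

# Index-time expansion: post each document under all synonyms of its terms,
# then look up only the original query term.
index_time = ToyIndex()
for doc_id, terms in docs.items():
    expanded = list(terms) + [s for t in terms for s in synonyms.get(t, [])]
    index_time.add(doc_id, expanded)
index_hits = index_time.search_any(["car"])

print(query_hits == index_hits, query_time.lookups, index_time.lookups)
print(query_time.posting_count(), index_time.posting_count())
```

Both variants return the same documents here. The query-time index does one lookup per synonym, while the index-time index does a single lookup at the cost of extra postings.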
Index-time synonyms mean you bloat the index with a lot of new postings,
most of which are just duplicates of each other. And in my case, because
every synonym has a weight, I cannot even consider postings
deduplication...
There's a tradeoff here (as usual). Both approaches have pros and
Adding terms to posting lists is about the most space-efficient thing you can
do in a search engine, so I would not worry too much about that.
wunder
On Jul 18, 2013, at 10:06 AM, Shai Erera wrote:
We index time synonyms means you bloat the index with a lot of new postings,
most of them are
much do you know about the frequency of synonym updates for this synonym
source API?
-- Jack Krupansky
From: Shai Erera
Sent: Thursday, July 18, 2013 1:06 PM
To: dev@lucene.apache.org
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)
We index time synonyms means you bloat