Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Jack Krupansky
Your best bet is to preprocess queries and expand synonyms in your own application layer. The Lucene/Solr synonym implementation, design, and architecture are fairly lightweight (although FST is a big improvement) and are not designed for large, dynamic synonym sets. Do you need multi-word …
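A minimal sketch of that application-layer approach, assuming single-word synonyms held in an in-memory map. The class name QueryExpander and the sample synonyms are hypothetical. It rewrites each query term into an OR group before the query string is handed to Lucene/Solr; it does not escape query-syntax characters and does not handle multi-word synonyms.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical application-side expander: each user term becomes an OR group
// of the term plus its synonyms, built before the query reaches Lucene/Solr.
class QueryExpander {
    private final Map<String, List<String>> synonyms;

    QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Expands a whitespace-separated query into Lucene query syntax.
    // Assumes single-word terms with no special characters needing escaping.
    String expand(String userQuery) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            Set<String> group = new LinkedHashSet<>(); // keeps order, drops duplicates
            group.add(term);
            group.addAll(synonyms.getOrDefault(term, List.of()));
            // A term without synonyms needs no parentheses.
            clauses.add(group.size() == 1 ? term : "(" + String.join(" OR ", group) + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        QueryExpander expander = new QueryExpander(
                Map.of("car", List.of("automobile", "auto")));
        System.out.println(expander.expand("red car"));
        // prints: red AND (car OR automobile OR auto)
    }
}
```

Because expansion happens before the query string is built, the synonym set can be as large or as dynamic as the application's own storage allows, rather than being bound to an analyzer's configuration.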

2013-07-18 Shai Erera
The examples I've seen so far are single words. But today I learned something new: the number of synonyms returned for a word may be in the hundreds, sometimes even thousands. So I'm not sure query-time synonyms can work at all. What do you think?
Shai
On Thu, Jul 18, 2013 at 3:21 …

2013-07-18 Jack Krupansky
… Although, for hundreds or thousands of synonyms, it would probably hit the common 2048 limit on URL length in some containers, which would need to be raised.
-- Jack Krupansky (replying to Shai Erera, Thursday, July 18, 2013 8:54 AM)
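To get a feel for the URL length issue, here is a rough sketch that URL-encodes an expanded query and compares it with the 2048 figure mentioned above. The Solr URL, the 300 made-up synonyms and the ASSUMED_URL_LIMIT constant are illustrative assumptions; real container limits differ and are configurable.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Rough illustration only: base URL, synonym count and limit are assumptions.
class UrlLengthCheck {
    // Figure quoted in the thread; actual containers differ and the limit can be changed.
    static final int ASSUMED_URL_LIMIT = 2048;

    // Builds a GET URL with the query in the q parameter (spaces become '+').
    static String selectUrl(String baseUrl, String query) {
        return baseUrl + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical expansion of one term into 300 made-up synonyms.
        StringBuilder q = new StringBuilder("(car");
        for (int i = 0; i < 300; i++) {
            q.append(" OR syn").append(i);
        }
        q.append(')');

        String url = selectUrl("http://localhost:8983/solr/select", q.toString());
        System.out.println("URL length: " + url.length());
        if (url.length() > ASSUMED_URL_LIMIT) {
            System.out.println("Over the assumed limit: send q in a POST body instead");
        }
    }
}
```

Sending the parameters as a form-encoded POST body is one common way around such limits, since Solr accepts POST requests to its search handlers as well as GET.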

2013-07-18 Shai Erera

2013-07-18 Jack Krupansky
Container (e.g., Tomcat) limit. Configurable. I don’t recall the specifics.
-- Jack Krupansky (replying to Shai Erera, Thursday, July 18, 2013 9:46 AM: "Actually, after chatting w/ Mike about it, he made …")

2013-07-18 Walter Underwood
There are two serious issues with query-time synonyms: speed and correctness.
1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. This will not be fast. Expanding the term at index time means 1000 posting list entries, but only one term lookup at query time.
2. Query …
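The lookup-versus-postings tradeoff in point 1 can be illustrated with a toy in-memory inverted index. This is plain Java, not Lucene's API, and the class and method names are made up. Query-time expansion does one dictionary lookup per synonym; index-time expansion posts each matching document under every synonym so a query needs a single lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index (not Lucene's API) that counts term lookups.
class SynonymLookupCost {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    int lookups = 0;

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    Set<Integer> lookup(String term) {
        lookups++;
        return postings.getOrDefault(term, Set.of());
    }

    int postingCount() {
        return postings.values().stream().mapToInt(Set::size).sum();
    }

    // Query time: look up the term and every synonym, then OR the results.
    Set<Integer> searchQueryTime(List<String> group) {
        Set<Integer> hits = new TreeSet<>();
        for (String t : group) {
            hits.addAll(lookup(t));
        }
        return hits;
    }

    // Index time: a document containing any group member is posted under every member.
    void addIndexTime(int docId, String term, List<String> group) {
        if (group.contains(term)) {
            for (String t : group) {
                add(docId, t);
            }
        } else {
            add(docId, term);
        }
    }

    public static void main(String[] args) {
        List<String> group = List.of("car", "automobile", "auto", "motorcar");
        String[] docs = {"automobile", "car", "boat"};

        SynonymLookupCost qt = new SynonymLookupCost();
        SynonymLookupCost it = new SynonymLookupCost();
        for (int id = 0; id < docs.length; id++) {
            qt.add(id, docs[id]);
            it.addIndexTime(id, docs[id], group);
        }

        System.out.println("query-time: hits=" + qt.searchQueryTime(group)
                + " lookups=" + qt.lookups + " postings=" + qt.postingCount());
        // prints: query-time: hits=[0, 1] lookups=4 postings=3
        System.out.println("index-time: hits=" + it.lookup("car")
                + " lookups=" + it.lookups + " postings=" + it.postingCount());
        // prints: index-time: hits=[0, 1] lookups=1 postings=9
    }
}
```

Both strategies find the same documents; the difference is where the cost lands: extra lookups per query, or extra postings in the index.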

2013-07-18 Shai Erera
With index-time synonyms, you bloat the index with a lot of new postings, most of which are just duplicates of each other. And in my case, because every synonym has a weight, I cannot even consider deduplicating postings... There's a tradeoff here (as usual). Both approaches have pros and …
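A toy model of why per-synonym weights get in the way of sharing postings, assuming the weight is stored with each posting (for example as a payload) and a hypothetical bidirectional car/auto pair with a made-up weight of 0.9. Both terms end up with postings for the same documents, which without weights could be shared, but the weights differ per document, so the two lists are not interchangeable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not a Lucene API: every posting carries the weight of the
// synonym relation that produced it (e.g. what a payload might hold).
class WeightedPostings {
    static final class Posting {
        final int docId;
        final float weight;

        Posting(int docId, float weight) {
            this.docId = docId;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return docId + ":" + weight;
        }
    }

    final Map<String, List<Posting>> index = new TreeMap<>();

    // Index-time expansion: the original term gets weight 1.0, each synonym its own weight.
    void addDoc(int docId, String term, Map<String, Float> weightedSynonyms) {
        index.computeIfAbsent(term, t -> new ArrayList<>()).add(new Posting(docId, 1.0f));
        weightedSynonyms.forEach((syn, w) ->
                index.computeIfAbsent(syn, t -> new ArrayList<>()).add(new Posting(docId, w)));
    }

    static List<Integer> docIds(List<Posting> postings) {
        List<Integer> ids = new ArrayList<>();
        for (Posting p : postings) {
            ids.add(p.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        WeightedPostings wp = new WeightedPostings();
        // Hypothetical bidirectional synonym pair with a made-up weight of 0.9.
        wp.addDoc(0, "car", Map.of("auto", 0.9f));
        wp.addDoc(1, "auto", Map.of("car", 0.9f));

        List<Posting> car = wp.index.get("car");
        List<Posting> auto = wp.index.get("auto");
        System.out.println("car  -> " + car);   // [0:1.0, 1:0.9]
        System.out.println("auto -> " + auto);  // [0:0.9, 1:1.0]
        // Same documents, different weights: the two lists cannot share storage.
        System.out.println("same doc IDs: " + docIds(car).equals(docIds(auto)));
    }
}
```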

2013-07-18 Walter Underwood
Adding terms to posting lists is about the most space-efficient thing you can do in a search engine, so I would not worry too much about that.
wunder (replying to Shai Erera, Jul 18, 2013, at 10:06 AM)
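For context on why extra postings are cheap: postings lists are sorted by document ID, so engines typically store the gaps between IDs with a compact integer encoding. The sketch below shows classic gap plus variable-byte encoding, the same 7-bits-per-byte scheme as Lucene's VInt. Lucene's real postings formats are more elaborate (the default format since 4.1 packs gaps in fixed-size blocks), so treat this only as an illustration of the principle; the class name is made up.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Simplified gap + variable-byte encoding of a sorted doc-ID list (illustrative only).
class VByteGaps {
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - prev; // small numbers when docs are dense
            prev = docId;
            // 7 payload bits per byte; high bit set means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int prev = 0;
        int i = 0;
        while (i < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap; // undo the delta
            docIds.add(prev);
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 150, 151, 152, 1000};
        byte[] enc = encode(docs);
        System.out.println(docs.length * 4 + " bytes as raw ints, " + enc.length + " bytes encoded");
        // prints: 28 bytes as raw ints, 9 bytes encoded
        System.out.println(decode(enc));
        // prints: [3, 7, 8, 150, 151, 152, 1000]
    }
}
```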

2013-07-18 Uwe Schindler

2013-07-18 Jack Krupansky
… much do you know about the frequency of synonym updates for this synonym source API?
-- Jack Krupansky (replying to Shai Erera, Thursday, July 18, 2013 1:06 PM)
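Update frequency matters because index-time expansion needs a reindex whenever synonyms change, while query-time expansion only needs the new set swapped in. A minimal sketch of the latter, assuming the synonym source can be modeled as a Supplier returning the full current map; the ReloadableSynonyms class and that source shape are hypothetical, since the thread does not describe the actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical wrapper: the Supplier stands in for the external synonym source API.
class ReloadableSynonyms {
    private final Supplier<Map<String, List<String>>> source;
    // Immutable snapshot that lookups read; replaced wholesale on reload.
    private final AtomicReference<Map<String, List<String>>> current;

    ReloadableSynonyms(Supplier<Map<String, List<String>>> source) {
        this.source = source;
        this.current = new AtomicReference<>(Map.copyOf(source.get()));
    }

    List<String> synonymsOf(String term) {
        return current.get().getOrDefault(term, List.of());
    }

    // Copies the source's current contents; lookups already running keep the old snapshot.
    void reload() {
        current.set(Map.copyOf(source.get()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> remote = new HashMap<>();
        remote.put("car", List.of("auto"));
        ReloadableSynonyms syns = new ReloadableSynonyms(() -> remote);

        remote.put("car", List.of("auto", "automobile")); // source changes...
        System.out.println(syns.synonymsOf("car"));       // prints [auto] until reload()
        syns.reload();
        System.out.println(syns.synonymsOf("car"));       // prints [auto, automobile]
    }
}
```

How often reload() would run (on a timer, on a change notification) depends on exactly the update frequency being asked about.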

2013-07-18 SUJIT PAL

2013-07-18 Shai Erera