Another great use case for synonyms is misspellings. I saw one synonym list
in which the top synonym was the phrase "dead mouse" (which doesn't look
misspelled at all); I won't tell you what its "proper" synonym was, other
than to say that it was VERY app/culture-dependent. It was also interesting
because the user's original query phrase needed to be given a much lower
weighting in order to find what the user was "likely" looking for.
-- Jack Krupansky
-----Original Message-----
From: Walter Underwood
Sent: Wednesday, December 12, 2012 7:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the
synonym?
If you have tons of content, you can do selective reindexing. You only need
to reindex the docs containing the new terms. If I add a synonym for
"babysitter" and "baby sitter", then I can do a search for documents
containing either of those, and only reindex those.
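A minimal sketch of that selection step, assuming an in-memory dict of documents stands in for the index (in a real deployment the selection would be a search for the two phrases; the function name and sample data are hypothetical):

```python
def docs_to_reindex(docs, new_terms):
    """Return the ids of documents whose text contains any new synonym term."""
    needles = [term.lower() for term in new_terms]
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(needle in text.lower() for needle in needles)
    ]


# Hypothetical sample data: only the first two documents mention either variant.
docs = {
    1: "Looking for a babysitter on weekends",
    2: "Baby sitter wanted, two kids",
    3: "Dog walker available",
}

stale = docs_to_reindex(docs, ["babysitter", "baby sitter"])
print(stale)  # only these ids need to be sent through indexing again
```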
Reverse weighting to even out the IDF would work, but it could be pretty
tweaky. If one synonym is very rare, you put in small weight, but then you
index several documents with that term and then it is overweighted.
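To illustrate why this is tweaky, here is a rough sketch. It assumes a simplified model in which a term's contribution is boost times IDF, with the classic Lucene-style IDF of 1 + ln(N / (df + 1)); the document counts are made up. Boosts are computed once to level the group, then the rare term becomes common and the old boosts no longer level anything:

```python
import math


def idf(num_docs, doc_freq):
    # Classic Lucene-style inverse document frequency (assumed here).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def leveling_boosts(num_docs, doc_freqs):
    """Pick boosts so that boost * idf is the same for every synonym."""
    idfs = {term: idf(num_docs, df) for term, df in doc_freqs.items()}
    target = min(idfs.values())
    return {term: target / value for term, value in idfs.items()}


# Made-up counts: "tv" is rare when the boosts are chosen.
N = 100_000
before = {"television": 5000, "tv": 50}
boosts = leveling_boosts(N, before)

# Later, 4950 more documents containing "tv" are indexed.
after = {"television": 5000, "tv": 5000}
effective = {
    term: boosts[term] * idf(N + 4950, df) for term, df in after.items()
}
print(boosts)
print(effective)  # the two values have drifted apart again
```

The boosts would have to be recomputed whenever document frequencies shift noticeably.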
wunder
On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:
Sure, synonyms have lots of issues and choosing index vs. query is simply
picking your poison, but it all depends on your app and your data and your
user expectations, and you, the developer, have tools to moderate a lot of
these issues.
Index-time synonyms have the problem (among others) that they cannot be
changed without reindexing.
One technique is to simulate the query-time synonym filter expansion by
having your app preprocess user queries to expand to the OR of the
synonyms and then boost or de-boost the synonyms as makes sense for your
app.
For example,
(tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
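A minimal sketch of such a preprocessing step, assuming the app keeps its own table of synonyms and boosts (the table, the function name, and the boost values are illustrative, not part of Solr):

```python
def expand_with_synonyms(term, synonyms):
    """Rewrite a term as a boosted OR clause; unknown terms pass through."""
    variants = synonyms.get(term.lower())
    if not variants:
        return term
    clauses = []
    for text, boost in variants:
        # Multi-word synonyms must be quoted so they stay a phrase.
        quoted = f'"{text}"' if " " in text else text
        clauses.append(f"{quoted}^{boost}")
    return "(" + " OR ".join(clauses) + ")"


# Hypothetical app-side synonym table with per-variant boosts.
SYNONYMS = {
    "tv": [("tv", 0.5), ("television", 2.5), ("boob tube", 0.0001)],
}

print(expand_with_synonyms("tv", SYNONYMS))
```

Because the table lives in the app, the boosts can be tuned without touching the index.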
-- Jack Krupansky
-----Original Message-----
From: Steve Rowe
Sent: Wednesday, December 12, 2012 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the
synonym?
Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate
per-doc, so using it in the way I suggested will not allow for synonym IDF
leveling across documents. Also, scoring obviously includes more factors
than IDF.
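A toy sketch of the per-doc behaviour, assuming the usual disjunction-max combination (best sub-query score plus a tie-breaker times the rest) and made-up sub-query scores:

```python
def dismax_score(sub_scores, tie_breaker=0.0):
    """Score ONE document: its best sub-query score plus tie_breaker * the rest."""
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# Hypothetical per-document scores for the sub-queries [tv, television].
# Each document matches only one variant, so the max is just that variant's
# own score, which still carries that variant's own IDF.
doc_with_tv = [8.6, 0.0]
doc_with_television = [0.0, 4.0]

print(dismax_score(doc_with_tv))
print(dismax_score(doc_with_television))
```

The max is taken within a document, not across documents, so the document holding the rarer variant still comes out ahead.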
On Dec 12, 2012, at 5:18 PM, Steve Rowe <sar...@gmail.com> wrote:
But couldn't the IDF problem be fixed by applying the same IDF to all
synonyms, e.g. via DisjunctionMaxQuery? (Maybe the ideal would be an
average, not a max.)
(E)dismax applies this query per-field, but AFAICT there is nothing
stopping anybody (modulo query parser construction :) ) from using it on
synonyms in the same field.
Steve
On Dec 12, 2012, at 12:50 PM, Walter Underwood <wun...@wunderwood.org>
wrote:
Query parsers cannot fix the IDF problem or make query-time synonyms
faster. Query synonym expansion makes more search terms. More search
terms are more work at query time.
The IDF problem is real; I've run up against it. The rarest variant
of the synonym has the highest score. This is probably the opposite of
what you want. For me, it was "TV" and "television". Documents with "TV"
had higher scores than those with "television".
wunder
On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
@wunder
It is a misconception (well, supported by that wiki description) that the
query-time synonym filter has these problems. It is actually the default
parser that is causing these problems. Look at this if you still think
that index-time synonyms are a cure for all:
https://issues.apache.org/jira/browse/LUCENE-4499
@joe
If you can use the flexible query parser (as linked in by @Swati) then all
you need to do is to define a different field with a different tokenizer
chain and then swap the field names before the analyzer processes the
document (and then rewrite the field name back). For example, we have
fields called "author" and "author_nosyn".
roman
On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood
<wun...@wunderwood.org> wrote:
Query time synonyms have known problems. They are slower, cause incorrect
IDF, and don't work for phrase synonyms.
Apply synonyms at index time and you will have none of those problems.
See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
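A toy illustration of the IDF point only, assuming a tiny in-memory inverted index and single-word synonyms (the synonym table and documents are made up). With expansion at index time, every variant ends up in the same set of documents, so the variants share one document frequency:

```python
from collections import defaultdict

# Hypothetical single-word synonym group.
SYNONYMS = {
    "tv": {"tv", "television"},
    "television": {"tv", "television"},
}


def build_index(docs, expand):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            # With expand=True, every synonym of the token is indexed too.
            terms = SYNONYMS.get(token, {token}) if expand else {token}
            for term in terms:
                index[term].add(doc_id)
    return index


docs = {
    1: "TV stand",
    2: "television repair",
    3: "television antenna",
    4: "radio",
}

plain = build_index(docs, expand=False)
expanded = build_index(docs, expand=True)

print(len(plain["tv"]), len(plain["television"]))        # frequencies differ
print(len(expanded["tv"]), len(expanded["television"]))  # frequencies match
```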
wunder
On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
Query-time analyzers are still applied, even if you include a string in
quotes. Would you expect "foo" to not match "Foo" just because it's
enclosed in quotes?
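As a rough illustration, here is a toy analysis chain (lowercasing plus synonym expansion). It is a simplification, not Solr's actual parser, and the synonym table is hypothetical. The quotes only mark the input as a phrase; the text inside still goes through the same chain:

```python
# Hypothetical synonym table for the toy analyzer.
SYNONYMS = {"home": ["home", "house"], "house": ["home", "house"]}


def analyze(text):
    """Lowercase, split on whitespace, and expand synonyms at each position."""
    return [SYNONYMS.get(token, [token]) for token in text.lower().split()]


def parse_and_analyze(query):
    """Return (is_phrase, analyzed positions) for a raw user query."""
    is_phrase = len(query) >= 2 and query[0] == '"' and query[-1] == '"'
    # Quotes are stripped by the parser; analysis happens either way.
    text = query[1:-1] if is_phrase else query
    return is_phrase, analyze(text)


print(parse_and_analyze('"Home"'))
print(parse_and_analyze("Home"))
```

Both calls produce the same analyzed terms; only the phrase flag differs.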
Also look at this, someone who had similar requirements:
http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
-----Original Message-----
From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
Sent: Wednesday, December 12, 2012 12:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the
synonym?
I'm applying only query-time synonyms, so I have the original values
stored and indexed.
I would have expected that if I search a string in quotation marks, I'll get
the exact match, without applying a synonym.
Is there any way to achieve that?
Upayavira wrote
You can only search against terms that are stored in your index. If
you have applied index time synonyms, you can't remove them at query
time.
You can, however, use copyField to clone an incoming field to another
field that doesn't use synonyms, and search against that field instead.
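On the query side, the app then has to decide which of the two fields to search. A minimal sketch, assuming hypothetical field names "text" (with synonyms) and "text_nosyn" (the copyField target without them), and assuming quoted input is the signal for an exact search:

```python
def build_params(user_query, syn_field="text", exact_field="text_nosyn"):
    """Route quoted input to the synonym-free field, the rest to the synonym field."""
    q = user_query.strip()
    quoted = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    if quoted:
        # Quoted input: search the copy that has no synonym filter.
        return {"q": f"{exact_field}:{q}"}
    # Unquoted input: group the terms so the field applies to all of them.
    return {"q": f"{syn_field}:({q})"}


print(build_params('"home"'))
print(build_params("home"))
```

Whether quoting should mean "no synonyms" is an app-level decision; the routing rule here is only one possible choice.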
Upayavira
On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@ wrote:
Hi
I have a field type with a defined synonym.txt which retrieves both
records with "home" and "house" when I search either one of them.
I want to be able to search this field on the specific value that I
enter, without the synonym filter.
is it possible?
thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
Walter Underwood
wun...@wunderwood.org