Hi,

I have had to deal with this kind of side effect with multi-word synonyms.
Our project uses Solr with an extensive synonym list, and a list that
big sometimes produces wrong matches like the one Anuvenk noticed,
for instance:

> dui => drunk driving defense
>  or
> dui,drunk driving defense,drunk driving law
> query for "dui" matches "dui => drunk driving defense" and "dui,drunk driving 
> defense,drunk driving law"

To prevent this kind of behavior I gave every "synonym family" (that
is, a single line in the file) a unique identifier, so the list looks
like:

dui => HIER_FAMILIY_01
drunk driving defense => HIER_FAMILIY_01
SYN_FAMILY_01, dui,drunk driving defense,drunk driving law

I also set the synonym filter to expand=false at both index time and
query time.
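
A sketch of what such a setup might look like in schema.xml. The field
type name "text_syn", the whitespace tokenizer and ignoreCase="true" are
assumptions made for illustration only; the part that matters here is
expand="false" on both the index and the query analyzer.

```xml
<!-- Hypothetical field type; the name and tokenizer are illustrative assumptions -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="false": a comma-separated line is reduced to its first entry
         (here the family identifier) instead of being expanded to every variant -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- same synonyms file and same expand setting as at index time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

Since the synonym filter is part of the index analyzer here, documents
have to be re-indexed whenever synonyms.txt changes.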

This way, the synonyms matched in documents (multi-word or single-word)
are replaced with their family identifier rather than with every
possible variant. Indexing with expand=true would add the individual
words of each variant to the documents, where they could be matched on
their own, ignoring the fact that they belong to a multi-word
expression, and this can produce wrong matches (the synonyms get mixed
up) at query time.

So a query for "dui" is rewritten by the synonym filter at query time
to "HIER_FAMILIY_01" or "SYN_FAMILY_01". Documents that contain only
single words like "drunk", "driving" or "law" are not matched, since
only a document containing the phrase "drunk driving law" would have
been indexed with "SYN_FAMILY_01".

The approach worked pretty well on our project and we have not noticed
any side effects on searches; it only removes matched documents that
were "noise" caused by the synonym mix issue.

I think it would be useful to add this kind of approach to the Solr
synonym filter section of the wiki.

Cheers

Laurent


On Dec 2, 2007 3:41 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi (changing to solr-user list)
>
> Yes it is, especially if the terms left of => are multi-spaced.  Check out 
> the Wiki, one page there explains this nicely.
>
> Otis
> -
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: anuvenk <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Saturday, December 1, 2007 1:21:49 AM
> Subject: Re: synonyms
>
>
> Ideally, would it be a good idea to pass the index data through the
>  synonyms
> filter while indexing?
> Also,
> say i have this mapping
> dui => drunk driving defense
>  or
> dui,drunk driving defense,drunk driving law
>
> so matches for dui, will also bring up matches for drunk driving law
>  (the
> whole phrase) or does it also bring up all matches for 'drunk' ,
> 'driving','law'  ?
>
>
>
> Yonik Seeley wrote:
> >
> > On Nov 30, 2007 5:39 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> >> Should data be re-indexed everytime synonyms like
> >> word1,word2
> >> or
> >> word1 => word2
> >>
> >> are added to synonyms.txt
> >
> > Yes, if it changes the index (if it's used in the index anaylzer as
> > opposed to just the query analyzer).
> >
> > -Yonik
> >
> >
>
> --
> View this message in context:
>  http://www.nabble.com/synonyms-tf4925232.html#a14100346
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>
>
>
>
