Hi,

I have had to deal with this kind of side effect regarding multi-word synonyms. We installed Solr on a project that makes extensive use of synonyms: a big list that can sometimes produce wrong matches, such as the one noticed by Anuvenk, for instance:
> dui => drunk driving defense
> or
> dui,drunk driving defense,drunk driving law
> query for "dui" matches "dui => drunk driving defense" and "dui,drunk driving
> defense,drunk driving law"

To prevent this kind of behavior, I gave every "synonym family" (that is, a single line in the file) a unique identifier, so the list looks like this:

dui => HIER_FAMILIY_01
drunk driving defense => HIER_FAMILIY_01
SYN_FAMILY_01, dui,drunk driving defense,drunk driving law

I also set the synonym filter to expand=false at both index time and query time. This way, the synonyms matched in documents (multi-word or single-word) are replaced with their family identifier rather than with all the possible variants. Indexing with expand=true would add words to documents that could then be matched on their own, ignoring the fact that they belong to a multi-word expression, and this could end up producing wrong matches (the synonym mix-up) at query time.

So a query for "dui" is rewritten by the synonym filter at query time to "HIER_FAMILIY_01" or "SYN_FAMILY_01", and documents that contain only single words like "drunk", "driving" or "law" are not matched, since only a document containing the phrase "drunk driving law" would have been indexed with "SYN_FAMILY_01".

The approach worked pretty well on our project and we have not noticed any side effects on searches; it only removes matched documents that were "noise" caused by the synonym mix-up.

I think it could be useful to add this kind of approach to the Solr synonym filter section of the wiki.

Cheers
Laurent

On Dec 2, 2007 3:41 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi (changing to solr-user list)
>
> Yes it is, especially if the terms left of => are multi-spaced. Check out
> the Wiki, one page there explains this nicely.
>
> Otis
> -
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: anuvenk <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Saturday, December 1, 2007 1:21:49 AM
> Subject: Re: synonyms
>
>
> Ideally, would it be a good idea to pass the index data through the
> synonyms filter while indexing?
> Also,
> say i have this mapping
> dui => drunk driving defense
> or
> dui,drunk driving defense,drunk driving law
>
> so matches for dui, will also bring up matches for drunk driving law
> (the whole phrase) or does it also bring up all matches for 'drunk' ,
> 'driving','law' ?
>
>
> Yonik Seeley wrote:
> >
> > On Nov 30, 2007 5:39 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> >> Should data be re-indexed everytime synonyms like
> >> word1,word2
> >> or
> >> word1 => word2
> >>
> >> are added to synonyms.txt
> >
> > Yes, if it changes the index (if it's used in the index anaylzer as
> > opposed to just the query analyzer).
> >
> > -Yonik
> >
>
> --
> View this message in context:
> http://www.nabble.com/synonyms-tf4925232.html#a14100346
> Sent from the Solr - Dev mailing list archive at Nabble.com.
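
P.S. To illustrate the family-identifier idea from my reply at the top of this message, here is a small toy simulation in Python. It is only a sketch, not Solr's actual synonym filter: the names FAMILIES, build_rules, analyze and matches are hypothetical, the tokenizer is a plain whitespace split, and a document is assumed to match when it contains every analyzed query token. It only uses the SYN_FAMILY_01 line of the list.

```python
# Toy simulation of the "family identifier" idea. This is NOT Solr's
# SynonymFilter; all names below are hypothetical and for illustration only.

FAMILIES = {
    "SYN_FAMILY_01": ["dui", "drunk driving defense", "drunk driving law"],
}


def build_rules(families):
    """Map each synonym, as a tuple of tokens, to its family identifier."""
    rules = {}
    for family_id, phrases in families.items():
        for phrase in phrases:
            rules[tuple(phrase.lower().split())] = family_id
    return rules


def analyze(text, rules):
    """Replace the longest matching phrase at each position with its family id.

    This mimics expand=false: a matched synonym is replaced by one token
    (the family identifier) instead of being expanded to every variant.
    """
    tokens = text.lower().split()
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            family_id = rules.get(tuple(tokens[i:i + size]))
            if family_id is not None:
                out.append(family_id)
                i += size
                break
        else:
            # No synonym starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def matches(query, document, rules):
    """Assumed matching rule: every analyzed query token is in the document."""
    doc_tokens = set(analyze(document, rules))
    return all(token in doc_tokens for token in analyze(query, rules))


RULES = build_rules(FAMILIES)
```

With these rules, matches("dui", "experts in drunk driving law", RULES) is True, because both sides are reduced to SYN_FAMILY_01, while matches("dui", "driving while drunk is against the law", RULES) is False, because the isolated words "drunk", "driving" and "law" are never replaced by the family identifier.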