Hmmmm... The Analyzer shows me *almost* what I am expecting to see. When I
show it being verbose with debug info, I can see exactly what is going on,
which is great. Thanks for the tip.

What's happening (for most of my test cases) is that some of the synonyms
are multiple words (and it's a big synonym list), and the word delimiter
is then creating even more terms. The analyzer finds a match on individual
words (the highlighted words), but the query engine builds a more complex
query.

Consider:

a document with the text "the quick brown fox jumps over the lazy dog" in a
"body" field of type "text", as in the schema mentioned above.

a synonym list like:

dog,canine,mut,domestic dog,barker
wretch,dog
hound,dog,pooch,doggy

and query for the word "dog"

The analyzer creates terms at two positions, like this:

Term position 1: dog,canin,mut,domest,barker,wretch,hound,pooch,doggi
Term position 2: dog

(here, the synonym "domestic dog" for "dog" creates two tokens: "domestic"
and "dog")

It also highlights the word "dog" in the query, so the analyzer can find it.

The query is parsed into: MultiPhraseQuery(text:"(dog canin mut domest
barker wretch hound pooch doggi) dog") 

This only matches a document with "dog dog" or "canine dog" or "domestic
dog" (etc.) in it. If these words are separated, e.g. "a canine is a kind of
dog", then we get no match! :(
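
To make the behaviour concrete, here is a rough sketch of how I understand a
query of this shape to match. It is a toy model in Python, not Lucene code:
the function name, the token lists, and the hand-stemmed tokens are all
assumptions for illustration. It only mimics the rule "every position in the
query must match at consecutive positions in the document".

```python
# Toy model of a multi-phrase query: one set of acceptable terms per position.
# This is NOT Lucene code; it only mimics "consecutive positions must all match".
QUERY = [
    {"dog", "canin", "mut", "domest", "barker", "wretch", "hound", "pooch", "doggi"},
    {"dog"},
]


def multi_phrase_match(doc_tokens, query=QUERY):
    """Return True if some run of consecutive tokens matches every query position."""
    width = len(query)
    for start in range(len(doc_tokens) - width + 1):
        window = doc_tokens[start:start + width]
        if all(tok in allowed for tok, allowed in zip(window, query)):
            return True
    return False


# Token lists are stemmed by hand (an assumption, not real analyzer output).
adjacent = ["a", "domest", "dog", "bark"]
separated = ["a", "canin", "is", "a", "kind", "of", "dog"]
single = ["the", "quick", "brown", "fox", "jump", "over", "the", "lazi", "dog"]

print(multi_phrase_match(adjacent))   # True: "domest" is directly followed by "dog"
print(multi_phrase_match(separated))  # False: "canin" and "dog" are not adjacent
print(multi_phrase_match(single))     # False: "dog" has no synonym right before it
```

In this model the second position ("dog") is mandatory, so a lone "dog" or a
separated pair can never satisfy the query.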

Why does a two-word synonym require a two-word match for all synonyms?

I was also hoping that the synonym list might be one-way, i.e. dog expands to
hound but not wretch in the example above. Is there a way to do this too?
(That might be a story for another thread.)

Thanks,
Matt



-- 
View this message in context: 
http://www.nabble.com/Synonyms-list-breaks-solr-tp18401710p18405876.html
Sent from the Solr - User mailing list archive at Nabble.com.