Re: SOLR Thesaurus

Péter Király Fri, 10 Dec 2010 15:12:51 -0800

Hi Chris,

Thanks for your description. I need to think about this a little
more, and then I will come back with some detailed questions. The main
problem is that synonymy is only one kind of relation, while a thesaurus
may contain 6-10 kinds of relations, and it depends on the user which
of those relation types should be treated the way synonyms are.
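
To make the difference concrete, here is a minimal sketch in plain Python
(not Solr code) of a thesaurus keyed by relation type, where the caller
picks which relations take part in the expansion. The relation codes
(SYN, NT, BT, RT), the sample entries, and the expand() helper are my own
assumptions for illustration only.

```python
from typing import Dict, Iterable, List

# Hypothetical thesaurus: term -> relation type -> related terms.
# Relation codes and entries are invented for this example.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "fruit": {
        "SYN": ["fruits"],           # synonyms
        "NT": ["apple", "banana"],   # narrower terms
        "BT": ["food"],              # broader terms
        "RT": ["orchard"],           # related terms
    },
}


def expand(term: str, relations: Iterable[str]) -> str:
    """Return an OR query for `term` plus the terms reachable through
    the selected relation types (one hop only)."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for rel in relations:
        for related in entry.get(rel, []):
            if related not in terms:
                terms.append(related)
    if len(terms) == 1:
        # Unknown term, or no selected relation matched: leave it alone.
        return term
    return "(" + " OR ".join(terms) + ")"
```

With only narrower terms selected, expand("fruit", ["NT"]) gives
(fruit OR apple OR banana); a flat synonym list cannot express that
per-request choice without one list per relation type.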


Péter

2010/12/10 Chris Hostetter <[email protected]>:
>
> : My imagined use case:
> : - the user enters a term and maybe turns on a flag to get not just
> : that term, but all terms that are somehow related to it (usually the
> : synonyms and narrower terms).
> : - Solr first finds the queried term(s) in the thesaurus, then finds the
> : related terms, then modifies and issues the query,
> : e.g. the query is fruits, and it becomes (fruit OR apple OR banana OR ...)
> :
> : This use case is different from the synonym handler, which - as far as
> : I know - modifies the index and injects synonyms at the position of
> : the original word. My use case supposes that we maintain the thesaurus as
> : a separate "database" (maybe another Solr index).
>
> the use case you describe *could* be solved using the SynonymFilter -- you
> can configure it to be used at query time (for query expansion) *or* you
> can configure it to be used at index time (for reduction or expansion)
>
> just express your thesaurus in the synonyms.txt format and configure it in
> your schema.xml
>
> The two gotchas to watch out for with this kind of approach are multiword
> synonyms and the way Lucene's QueryParser treats whitespace as a
> metacharacter.
>
> in general, if you're going to do this kind of major query expansion, you
> probably want to use something like the "FieldQParser" which doesn't treat
> whitespace as special so user input like...
>        United States
> ...makes it to the analyzer as one chunk of text, and can be looked up as
> is in your thesaurus.
>
> The multiword synonym issue is more complicated - i don't have the energy
> to fully explain it right now, but for query time expansion it can be a
> real pain in the ass.  one workaround is to index shingle-esque terms
> instead of the individual words in your synonyms, but that defeats the
> point of your goal of having an external thesaurus that can be modified
> independently of the index.
>
> My suggestion would be to write a simple little ThesaurusQParser, that can
> use an instance of the SynonymFilter directly to preprocess the input
> text to get a list of all the Related Terms, and then delegate to another
> QParser to generate an appropriate Query for each of them (typically a
> PhraseQuery) which your ThesaurusQParser would then combine into a giant
> BooleanQuery (except you may want to consider a DisjunctionMaxQuery
> instead because of the scoring factors)
>
> ThesaurusQParser would require very little code, because SynonymFilter
> would be doing all the hard work.
>
>
> -Hoss
>
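
Just to check that I understand the suggested parser, here is a rough
sketch of its flow in plain Python. It does not use any Solr or Lucene
API: the RELATED table stands in for whatever the SynonymFilter would
return, the field name "text" is an assumption, and the result is a query
string in Lucene syntax rather than real PhraseQuery and BooleanQuery
objects.

```python
from typing import Dict, List

# Hypothetical stand-in for the related terms the SynonymFilter would emit.
RELATED: Dict[str, List[str]] = {
    "united states": ["usa", "united states of america"],
}


def phrase(text: str, field: str) -> str:
    """Build one quoted phrase clause, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def thesaurus_query(user_input: str, field: str = "text") -> str:
    """Look the whole input up as one chunk, then OR the phrase clauses."""
    # Lowercase and collapse runs of whitespace, but keep the input as a
    # single lookup key so multiword entries can match.
    key = " ".join(user_input.lower().split())
    candidates = [key] + RELATED.get(key, [])
    return " OR ".join(phrase(c, field) for c in candidates)
```

For the input United States this yields
text:"united states" OR text:"usa" OR text:"united states of america".
A real implementation would combine the clauses as query objects, which
is where the BooleanQuery versus DisjunctionMaxQuery choice comes in.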
