We are trying to get edismax to handle collocations mapped to a single
token. To do so we need to manipulate the "chunks" (as Hoss referred to
them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/)
generated by the dismax parser. We have numerous collocations
(multi-word expressions whose meaning does not follow from their
constituent words). For example, at index time "real estate" is mapped
to "real_estate" so that it does not collide with searches for "estate"
or "real value". We therefore need the "chunks" to reflect this same
mapping of multi-word phrases to a single token, which is done during
indexing (via the synonym filter).
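For reference, the index-time mapping we use looks roughly like the
following (the field type name and synonyms file name are illustrative,
not our actual config):

```xml
<!-- schema.xml: index-time collocation mapping (names are illustrative) -->
<fieldType name="text_colloc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- collocations.txt contains lines like: real estate => real_estate -->
    <filter class="solr.SynonymFilterFactory" synonyms="collocations.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```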
In an ideal world, we could simply specify a queryAnalyzerFieldType to
be used to pre-process the query string before it is divided into
"chunks" (similar to what the SpellChecker component supports).
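To be concrete about the analogy: the spellcheck component already
exposes exactly this kind of hook in solrconfig.xml, and we are looking
for an edismax equivalent (the "textSpell" name below is just the usual
example field type):

```xml
<!-- solrconfig.xml: the existing spellcheck hook we'd like edismax to mirror -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- names a field type whose analyzer pre-processes the query string -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <!-- ... spellchecker definitions ... -->
</searchComponent>
```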
But our impression thus far is that no such hook exists for edismax and
that we will need to customize
org.apache.solr.search.ExtendedDismaxQParser.splitIntoClauses(String,
boolean).
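As a sketch of the workaround we have in mind (the class below is our
own invention, not part of Solr — the real change would wire something
like it into splitIntoClauses or just ahead of it), the pre-processing
itself can be kept Solr-free: rewrite the raw query string so that known
collocations become their single-token forms before the parser chunks it:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not a Solr class): rewrites known collocations in the
// raw query string to their single-token forms, mirroring what the synonym
// filter does to the token stream at index time.
public class CollocationRewriter {
    // Insertion order matters: register longer collocations first so that,
    // e.g., a mapping for "real estate agent" would win over "real estate".
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    public void addCollocation(String phrase, String token) {
        // (?i) = case-insensitive; \b anchors keep "surreal estate" untouched.
        rules.put(Pattern.compile("(?i)\\b" + Pattern.quote(phrase) + "\\b"),
                  token);
    }

    public String rewrite(String query) {
        String result = query;
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            result = rule.getKey().matcher(result)
                         .replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        CollocationRewriter r = new CollocationRewriter();
        r.addCollocation("real estate", "real_estate");
        System.out.println(r.rewrite("cheap Real Estate near me"));
        // prints: cheap real_estate near me
    }
}
```

A plain string rewrite like this ignores quoted phrases and field-scoped
terms, which is part of why hooking into the parser's own clause
handling still seems necessary.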
Is it correct that the only pre-processing dismax performs is stopword
removal? And is it reasonable to expect that the customization can be
limited to splitIntoClauses(String, boolean)?
Regards,
Christopher