We are trying to get edismax to handle collocations mapped to a single
token. To do so we need to manipulate the "chunks" (as Hoss referred to
them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/)
generated by the dismax parser. We have numerous collocations
(multi-word expressions whose meaning does not follow from their
constituent words). For example, at index time "real estate" is mapped
to "real_estate" so that it does not collide with searches for "estate"
or "real value". We therefore need the "chunks" to reflect this same
mapping of multi-word phrases to a single token, which is done during
indexing (via the synonym filter).
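For reference, the index-time mapping we use looks roughly like the
following (the field type name and synonyms file name are illustrative,
not our actual config):

```xml
<!-- schema.xml: index-time collocation mapping (names are illustrative) -->
<fieldType name="text_colloc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- collocations.txt contains lines like: real estate => real_estate -->
    <filter class="solr.SynonymFilterFactory" synonyms="collocations.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```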
In an ideal world, we could simply specify a queryAnalyzerFieldType to
be used to pre-process the query string before it is divided into
"chunks" (similar to what the SpellChecker component supports).
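To be concrete about the analogy: the spellcheck component already
exposes exactly this kind of hook in solrconfig.xml, and we are looking
for an edismax equivalent (the "textSpell" name below is just the usual
example field type):

```xml
<!-- solrconfig.xml: the existing spellcheck hook we'd like edismax to mirror -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- names a field type whose analyzer pre-processes the query string -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <!-- ... spellchecker definitions ... -->
</searchComponent>
```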
But our impression thus far is that no such hook exists for edismax and
that we will need to customize
org.apache.solr.search.ExtendedDismaxQParser.splitIntoClauses(String,
boolean).
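As a sketch of the workaround we have in mind (the class below is our
own invention, not part of Solr — the real change would wire something
like it into splitIntoClauses or just ahead of it), the pre-processing
itself can be kept Solr-free: rewrite the raw query string so that known
collocations become their single-token forms before the parser chunks it:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not a Solr class): rewrites known collocations in the
// raw query string to their single-token forms, mirroring what the synonym
// filter does to the token stream at index time.
public class CollocationRewriter {
    // Insertion order matters: register longer collocations first so that,
    // e.g., a mapping for "real estate agent" would win over "real estate".
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    public void addCollocation(String phrase, String token) {
        // (?i) = case-insensitive; \b anchors keep "surreal estate" untouched.
        rules.put(Pattern.compile("(?i)\\b" + Pattern.quote(phrase) + "\\b"),
                  token);
    }

    public String rewrite(String query) {
        String result = query;
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            result = rule.getKey().matcher(result)
                         .replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        CollocationRewriter r = new CollocationRewriter();
        r.addCollocation("real estate", "real_estate");
        System.out.println(r.rewrite("cheap Real Estate near me"));
        // prints: cheap real_estate near me
    }
}
```

A plain string rewrite like this ignores quoted phrases and field-scoped
terms, which is part of why hooking into the parser's own clause
handling still seems necessary.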
Is it correct that the only pre-processing dismax performs is stopword
removal? And is it reasonable to expect that the customization can be
limited to splitIntoClauses(String, boolean)?
Regards,
Christopher