Currently during highlighting, the query string is analyzed by the analyzer returned by IndexSchema.getQueryAnalyzer(). (If you step through the code, you'll see that the Query object representing the analyzed-and-parsed query string is generated before SolrHighlighter's key doHighlighting() method gets called.)
Two things to emphasize here:

* Query analysis takes place independently of which field is being highlighted. (In other words, the query analyzer used does not vary depending on which hl.fl is currently under consideration.)
* Under the hood, this analyzer delegates to a separate sub-Analyzer for each field referenced in the query itself. (For example, if you have the query "body_text:smith AND num:5", then "smith" will perhaps be analyzed using an analyzer with stopword analysis, stemming, etc., while "5" will be analyzed with something simpler, more appropriate for a numeric-only field.)

Or, to summarize: Query analysis during highlighting is a function of the fields being *searched*, and *not* a function of the fields being *highlighted*.

It seems to me that this behavior might be backwards. That is, what we'd really want is for query analysis during highlighting to be a function of the fields being highlighted (i.e. the hl.fl params), and *not* of the fields being mentioned in the query.

Let me try to sketch the use case that leads me to think this. I have an index with two fields:

body (default field; word bigram analyzer)
kwic (text is copyTo'd here from the body field; non-bigram analyzer)

(By "word bigram analyzer" I mean one that might analyze the input "once upon a time" into the token stream "once", "once upon", "upon", "upon a", "a", "a time".)

Let's say I want to search for "audit trail" (with quotes), and use hl.fl=kwic. If I use the current highlight mechanism, then it will be the *body* field's bigram-generating analyzer that will be used to construct the Query object used in highlighting. The resulting object, in my case, is something like this:

TermQuery: "audit trail"

Note that my kwic field was not analyzed with word bigrams, though, so although it contains "audit" and "trail" as adjacent tokens, it does not contain the composite token "audit trail". As such, when this TermQuery is used for highlighting, no snippets will be generated.
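
To make the mismatch concrete, here is a toy sketch in plain Python rather than Lucene. The two analyzer functions, the sample text, and the term_matches helper are all made up for illustration; they are not Solr or Lucene APIs, and the toy bigram analyzer also emits the final word on its own. The single query term "audit trail" is taken as given from the description above rather than derived.

```python
def unigram_analyze(text):
    """Hypothetical stand-in for the kwic field's non-bigram analyzer."""
    return text.lower().split()


def bigram_analyze(text):
    """Hypothetical stand-in for the body field's word bigram analyzer.

    Emits each word, followed by the bigram starting at that word.
    """
    words = text.lower().split()
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)
        if i + 1 < len(words):
            tokens.append(word + " " + words[i + 1])
    return tokens


def term_matches(term, field_tokens):
    """A single-term query matches only if that exact token was indexed."""
    return term in field_tokens


# Illustrative document text, indexed into both fields.
doc_text = "the audit trail was incomplete"
body_tokens = bigram_analyze(doc_text)   # contains the token "audit trail"
kwic_tokens = unigram_analyze(doc_text)  # contains "audit" and "trail" only

# The highlight query as described above: one composite term.
query_term = "audit trail"

print(term_matches(query_term, body_tokens))  # matches the body field
print(term_matches(query_term, kwic_tokens))  # does not match the kwic field
```

The composite term exists only in the token stream of the field that was indexed with bigrams, which is why a highlighter looking for it in kwic would come up empty.
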
(This no-snippets situation probably depends on a few other details of my situation that I haven't mentioned. But I'm trying to avoid drowning people in details here.)

In contrast, let's suppose that, when doing my analysis for query highlighting, I *ignore* which particular fields are being *searched*, and instead use the query analyzer for the hl.fl field being requested. In this case my hl.fl=kwic, and so the non-bigram analyzer will be used, and so my Query object for highlighting will be something like this:

PhraseQuery: audit, trail

Unlike the earlier TermQuery, this PhraseQuery works fine for highlighting with the kwic field, and generates nice snippets.

Does this example make any sense? It would probably be more helpful to provide a test case, but I'll have to figure out how to make one that would provide a compelling use case here but that will also run without requiring you to download and apply patches from JIRA.

I've thrown together a patch that makes this highlighting analysis change, and it doesn't seem to break the test suite in any major ways. I may put it up on JIRA, but it's kind of a hack. What's more, *how* to make the change in behavior I'm talking about is sort of a separate question from whether it's wildly off course even in theory.

What do you think?

Chris