: Subject: Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.
Oh, what I would give for time to sit and document in depth how some of this stuff works (assuming I first had time to verify that it really does work the way I think).

The nutshell answer is that as far as Solr (1.4) is concerned, the main unit of "query parsing" is a QParser ... lots of places in the code base may care about parsing different strings for the purpose of producing a Query object, but ultimately they all use a QParser.

QParsers are plugins that you can configure instances of in your solrconfig.xml and assign names to. By default, all of the various pieces of code in Solr that do any sort of query-related parsing use some basic convention to pick a QParser by name -- so StandardRequestHandler uses the QParser named "lucene" for parsing the "q" param, while DisMaxRequestHandler uses a QParser named "dismax" for "q", and "func" for the "bf" param. So if you wanted to make some change so that *any* code path anywhere attempting to use the lucene syntax got your custom query parsing logic, you could configure a QParser with the name "lucene" and override the default.

The brilliantly confusing magic comes into play when strings to be parsed start with the "local params" syntax (i.e. "{!foo a=f b=z}blah blah") ... that tells the parsing code to override whatever QParser it would have used for that string, and to pass everything after the "}" character to the parser named "foo", with a=f and b=z added to the list of SolrParams it's already got (from the query string, or default params in solrconfig, etc...).

For most types of queries, the QParser ultimately uses Lucene's "QueryParser" class, or some subclass of it (the DisMaxQueryParser used by the DisMaxQParserPlugin is a subclass of QueryParser), and 9 times out of 10, if people want to customize query parsing without inventing a 100% new syntax, they also write a subclass.
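To make the name-based lookup and the local params override a bit more concrete, here is a minimal Java sketch. It is a toy model, not Solr code: QParserDispatchSketch, ToyParser, register() and parse() are hypothetical names, "parsing" just returns a descriptive string instead of a Lucene Query, and the local params handling is deliberately naive (whitespace-separated key=value pairs, no quoting, no "}" inside values). Letting a local param win over a request param with the same key is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of name-based parser lookup plus "local params" override.
// Hypothetical names throughout; this is NOT Solr's QParser/QParserPlugin API.
public class QParserDispatchSketch {

    /** Stand-in for a QParser: turns a query string plus params into a description of a "Query". */
    public interface ToyParser {
        String parse(String body, Map<String, String> params);
    }

    private final Map<String, ToyParser> parsers = new HashMap<>();

    /** Registering under an existing name replaces (overrides) the previous parser. */
    public void register(String name, ToyParser parser) {
        parsers.put(name, parser);
    }

    /**
     * Uses the parser named defaultName, unless qstr starts with "{!name k=v ...}",
     * in which case "name" is used and the k=v pairs are layered over requestParams.
     */
    public String parse(String qstr, String defaultName, Map<String, String> requestParams) {
        String name = defaultName;
        String body = qstr;
        Map<String, String> params = new LinkedHashMap<>(requestParams);
        if (qstr.startsWith("{!")) {
            int end = qstr.indexOf('}');
            if (end < 0) {
                throw new IllegalArgumentException("unterminated local params: " + qstr);
            }
            // Naive split: whitespace-separated tokens, first token is the parser name.
            String[] tokens = qstr.substring(2, end).trim().split("\\s+");
            name = tokens[0];
            for (int i = 1; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq > 0) {
                    // In this sketch a local param wins over a request param with the same key.
                    params.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
                }
            }
            body = qstr.substring(end + 1); // everything after '}' goes to the chosen parser
        }
        ToyParser parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered as: " + name);
        }
        return parser.parse(body, params);
    }

    public static void main(String[] args) {
        QParserDispatchSketch d = new QParserDispatchSketch();
        d.register("lucene", (q, p) -> "lucene[" + q + "]");
        d.register("foo", (q, p) -> "foo[" + q + "] a=" + p.get("a") + " b=" + p.get("b"));

        System.out.println(d.parse("title:solr", "lucene", Map.of()));
        System.out.println(d.parse("{!foo a=f b=z}blah blah", "lucene", Map.of()));

        // Overriding the name "lucene" affects every caller that asks for it by name.
        d.register("lucene", (q, p) -> "custom[" + q + "]");
        System.out.println(d.parse("title:solr", "lucene", Map.of()));
    }
}
```

Running main() should print lucene[title:solr], then foo[blah blah] a=f b=z, then custom[title:solr]. Re-registering the name "lucene" changes the behavior for every caller that looks it up by name, which is the same idea as configuring your own QParser named "lucene" in solrconfig.xml.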
Coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completely new QueryParser framework, which (I'm told) is supposed to make it much easier to create custom query parser syntaxes, but I haven't had time to look at it to see what all the fuss is about. So in theory you could use it to implement a new QParserPlugin in Solr 1.4.

No matter how you ultimately implement code that goes from "String" to "Query", you have to be concerned about the type of data in the field that the Query object refers to (if it was lowercased at index time, you want to lowercase at query time, etc...). Solr does its best to help query parsers out by supporting an <analyzer type="query"/> in the schema.xml, so that the schema creator can specify how to "analyze" a piece of input when building queries, but depending on the query syntax it's not always easy to get the behavior you expect from a particular query parser / analyzer pair. (This part of query parsing typically trips people up when dealing with multiword synonyms, or analyzers that don't tokenize on whitespace, because the normal Lucene QueryParser uses whitespace as part of its markup, and breaks up the input on the whitespace boundaries before it ever passes those chunks of input to the analyzers.)

: But trying to traipse through the code to get "the big picture" is a bit involved.

Like I said: the world of query parsing in Solr all revolves around the QParser API ... if you want to make sense of it, start there, and work out in both directions.

PS: please, please, please ... as you make progress on understanding these internals, feel free to plagiarize this email as the starting point of a new wiki page documenting your understanding for others who come along with the same question.

-Hoss
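The whitespace pitfall with multiword synonyms described above can be illustrated with a small toy model in Java. This is only a sketch under stated assumptions: WhitespaceSplitSketch, analyze() and analyzeChunks() are hypothetical, the "analyzer" is not Lucene's Analyzer/TokenStream API, and the "new york" to "nyc" synonym is made up. analyze() sees the whole input at once, while analyzeChunks() mimics a parser that breaks the input on whitespace first and hands each chunk to the analyzer separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model; NOT Lucene's Analyzer API or Solr's synonym filter.
public class WhitespaceSplitSketch {

    // Hypothetical multiword synonym: the two-token phrase "new york" becomes "nyc".
    static final Map<String, String> SYNONYMS = Map.of("new york", "nyc");

    /** Toy query-time analyzer: lowercase, tokenize on whitespace, then apply two-token synonyms. */
    public static List<String> analyze(String text) {
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i + 1 < tokens.size()) {
                String syn = SYNONYMS.get(tokens.get(i) + " " + tokens.get(i + 1));
                if (syn != null) {
                    out.add(syn);
                    i++; // the synonym consumed two tokens
                    continue;
                }
            }
            out.add(tokens.get(i));
        }
        return out;
    }

    /** Mimics a parser that splits on whitespace first and analyzes each chunk on its own. */
    public static List<String> analyzeChunks(String text) {
        List<String> out = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "New York pizza";
        System.out.println(analyze(input));       // [nyc, pizza]
        System.out.println(analyzeChunks(input)); // [new, york, pizza]
    }
}
```

Analyzing "New York pizza" as a whole yields [nyc, pizza], but chunk-by-chunk analysis yields [new, york, pizza]: the synonym rule never fires because no single chunk ever contains both words.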