Re: Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.

Chris Hostetter Thu, 20 Aug 2009 19:17:06 -0700

: Subject: Overview of Query Parsing API Stack? / Dismax parsing,
:     new 1.4  parsing, etc.


Oh, what i would give for time to sit and document in depth how some of 
this stuff works (assuming i first had time to verify that it really does 
work the way i think).

The nutshell answer is that as far as Solr (1.4) is concerned, the main 
unit of "query parsing" is a QParser ... lots of places in the code base 
may care about parsing different strings for the purposes of producing a 
Query object, but ultimately they all use a QParser.

QParsers are plugins that you can configure instances of in your 
solrconfig.xml and assign names to.  by default, all of the various pieces 
of code in Solr that do any sort of query related parsing use some basic 
convention to pick a QParser by name -- so StandardRequestHandler uses the 
QParser named "lucene" for parsing the "q" param, while 
DisMaxRequestHandler uses a QParser named "dismax" for "q", and "func" for 
the "bf" param.  so if you wanted to make some change so that *any* code 
path anywhere attempting to use the lucene syntax got your custom query 
parsing logic, you could configure a QParser with the name "lucene" and 
override the default.
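
To make the lookup-by-name idea concrete, here is a rough sketch in plain 
Java.  It does not use Solr's real classes: ParserRegistrySketch, its 
methods, and the strings the toy "parsers" return are all hypothetical 
stand-ins.  The only point it tries to show is that callers ask for a 
parser by name, so registering something else under "lucene" changes what 
every such caller gets.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy stand-in for a name-to-parser lookup; these are NOT Solr's real classes. */
final class ParserRegistrySketch {
    // Hypothetical: a "parser" is just a function from a query string to a description.
    private final Map<String, Function<String, String>> parsers = new HashMap<>();

    ParserRegistrySketch() {
        // Defaults registered under the conventional names mentioned above.
        register("lucene", q -> "LuceneQuery(" + q + ")");
        register("dismax", q -> "DisMaxQuery(" + q + ")");
        register("func", q -> "FunctionQuery(" + q + ")");
    }

    /** Registering under an existing name replaces the earlier parser. */
    void register(String name, Function<String, String> parser) {
        parsers.put(name, parser);
    }

    /** Callers pick a parser by name and never see which implementation is behind it. */
    String parse(String parserName, String queryString) {
        Function<String, String> parser = parsers.get(parserName);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown query parser: " + parserName);
        }
        return parser.apply(queryString);
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        System.out.println(registry.parse("lucene", "title:solr"));

        // Override the default "lucene" parser with custom logic.
        registry.register("lucene", q -> "MyCustomQuery(" + q + ")");
        System.out.println(registry.parse("lucene", "title:solr"));
    }
}
```

In real Solr the registration happens in solrconfig.xml rather than in 
code, so treat the register() call as an analogy only.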

The brilliantly confusing magic comes into play when strings to be parsed 
start with the "local params" syntax (ie: "{!foo a=f b=z}blah blah") ... 
that tells the parsing code to override whatever QParser it would have 
used for that string, and to pass everything after the "}" character to the 
parser named "foo", with a=f and b=z added to the list of SolrParams it's 
already got (from the query string, or default params in solrconfig, 
etc...)
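
As a rough illustration of what that prefix handling amounts to, here is a 
toy splitter in plain Java.  It is not Solr's actual implementation: the 
class LocalParamsSketch and its fields are made up for this example, it 
assumes the key=value pairs are separated by whitespace, and it ignores 
quoting and anything else the real syntax supports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy splitter for a "{!name k=v ...}body" prefix; hypothetical, not Solr code. */
final class LocalParamsSketch {
    final String parserName;          // parser that should handle the body
    final Map<String, String> params; // extra params found inside the braces
    final String body;                // everything after the closing '}'

    private LocalParamsSketch(String parserName, Map<String, String> params, String body) {
        this.parserName = parserName;
        this.params = params;
        this.body = body;
    }

    static LocalParamsSketch split(String input, String defaultParser) {
        Map<String, String> params = new LinkedHashMap<>();
        if (!input.startsWith("{!")) {
            // No prefix: keep whatever parser the caller would have used anyway.
            return new LocalParamsSketch(defaultParser, params, input);
        }
        int close = input.indexOf('}');
        if (close < 0) {
            throw new IllegalArgumentException("Missing '}' in: " + input);
        }
        String parserName = defaultParser;
        String inside = input.substring(2, close).trim();
        if (!inside.isEmpty()) {
            String[] tokens = inside.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq < 0 && i == 0) {
                    parserName = tokens[i]; // a bare first token names the parser
                } else if (eq > 0) {
                    params.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
                } else {
                    throw new IllegalArgumentException("Bad local param: " + tokens[i]);
                }
            }
        }
        return new LocalParamsSketch(parserName, params, input.substring(close + 1));
    }

    public static void main(String[] args) {
        LocalParamsSketch s = split("{!foo a=f b=z}blah blah", "lucene");
        System.out.println(s.parserName + " " + s.params + " '" + s.body + "'");
    }
}
```

A string without the prefix comes back untouched with the default parser 
name, which matches the "override whatever it would have used" behavior 
described above.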

For most types of queries, the QParser ultimately uses Lucene's 
"QueryParser" class, or some subclass of it (DisMaxQueryParser used by the 
DisMaxQParserPlugin is a subclass of "QueryParser") and 9 times out of 10 
if people want to customize query parsing without inventing a 100% new 
syntax, they also write a subclass.

coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completely new 
QueryParser framework, which (i'm told) is supposed to make it much easier 
to create custom query parser syntaxes, but i haven't had time to look at 
it to see what all the fuss is about.  so in theory you could use it to 
implement a new QParserPlugin in Solr 1.4.

no matter how you ultimately implement code that goes from "String" to 
"Query" you have to be concerned about the type of data in the field that 
the Query object refers to (if it was lowercased at index time, you want to 
lowercase at query time, etc...).  Solr does its best to help query 
parsers out by supporting an <analyzer type="query"/> in the schema.xml so 
that the schema creator can specify how to "analyze" a piece of 
input when building queries, but depending on the query syntax it's not 
always easy to get the behavior you expect from a particular query parser 
/ analyzer pair.  (This part of query parsing typically trips people up 
when dealing with multiword synonyms, or analyzers that don't tokenize on 
whitespace, because the normal Lucene QueryParser uses whitespace as part 
of its markup, and breaks up the input on the whitespace boundaries 
before it ever passes those chunks of input to the analyzers.)
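
Here is a small plain-Java sketch of why the whitespace splitting matters 
for multiword synonyms.  Nothing in it is Lucene or Solr code: the 
"analyzer" is a made-up function that lowercases its input and rewrites 
the phrase "new york" to the single token "nyc", and the synonym itself is 
just an assumption for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy demo of split-then-analyze versus analyze-the-whole-input; not Lucene code. */
final class WhitespaceSplitSketch {

    /** Hypothetical analyzer: lowercase, then map the phrase "new york" to "nyc". */
    static List<String> analyze(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).replace("new york", "nyc").trim();
        if (normalized.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(normalized.split("\\s+")));
    }

    /** Mimics a parser that breaks input on whitespace before the analyzer sees it. */
    static List<String> splitThenAnalyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            tokens.addAll(analyze(chunk)); // the analyzer only ever sees one word at a time
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The analyzer sees the whole phrase, so the synonym applies.
        System.out.println(analyze("New York pizza"));          // [nyc, pizza]
        // The analyzer sees "New", "York", "pizza" separately, so it never matches.
        System.out.println(splitThenAnalyze("New York pizza")); // [new, york, pizza]
    }
}
```

If the index-time analysis produced "nyc" but the query side ends up with 
"new" and "york", the query can miss documents it was expected to match.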

: But trying traipse through the code to get "the big picture" is a bit
: involved.

like i said: the world of query parsing in Solr all revolves around the 
QParser API ... if you want to make sense of it, start there, and work out 
in both directions.

PS: please, please, please ... as you make progress on understanding these 
internals, feel free to plagiarize this email as the starting point of a 
new wiki page documenting your understanding for others who come along 
with the same question.


-Hoss
