Re: Update Plugins (was Re: Handling disparate data sources in Solr)

Chris Hostetter Fri, 19 Jan 2007 15:05:07 -0800

: 1) I think it should be a ServletFilter applied to all requests that
: will only process requests with a registered handler.


I'm not sure what "it" is in the above sentence ... i believe from the
context of the rest of hte message you are you refering to
using a ServletFilter instead of a Servlet -- i honestly have no opinion
about that either way.

: 2) I think the RequestParser should take care off parsing
: ContentStreams *and* SolrParams - not just the streams.  The dispatch
: servlet/filter should never call req.getParameter().

If that's the case, then the RequestParser is in control of the URL
structure ... except that it's not in control of the path info since
that's how we pick the RequestParser in the first place ... what if
we decide later that we want to change the URL structure -- then every
RequestParser would have to be changed.

: 3) I think the dispatcher picks the Handler and either calls it
: directly or passes it to SolrCore.  It does not put "qt" in the
: SolrParams and have SolrCore extract it (again)

that's perfectly fine with me - i only had it that way because that's how
RequestHandler execution currently works, i wanted to leave anything not
directly related to what i was suggestion exactly the way it was currently
in my psuedo code.

: == Arguments for a ServletFilter: ==
: If we implement the dispatcher as a Filter:
: * the URL is totally customizable from solrconfig.xml

can you explain this more ... why does a ServletFilter make the URL more
customizable then an alternative (which i believe is jsut a Servlet)

: If we implement the dispatcher as a Servlet
: * we have to define a 'base' path for each servlet - this would make
: the names longer then then need to be and add potential confusion in
: the configuration.

whoa ... hold on a minute, even if we use a ServletFilter do do all of the
dispatching instead of a Servlet we still need a base path right?
... even if we ignore the current admin pages and assume we're going to
replace them all with new RequestHandlers when we do this, what happens a
year from now when we decide we want to add some new piece of
functionality that needs a differnet Servlet/ServletFilter ... if we've
got a Filter matching on "/*" don't we burn every possible bridge we have
for adding something else latter.

: Consider the servlet 'update' and another servlet 'select'  When our
: proposed changes, these could both be the same servlet class
: configured to distinct paths.  Now lets say you want to call:
:   http://localhost/solr/update/xml?params
: Would that be configured in solrconfig.xml as <handler name="xml"?
: name="update/xml"?  If it is "update/xml" would it only really work if
: the 'update' servlet were configured properly?

it would only make sense to map that as "xml" ... the SolrCore (and hte
solrconfig.xml) shouldn't have any knowledge of the Servlet/ServletFilter
base paths because it should be possible to use the SolrCore independent
of any ServletContainer (if for no other reason in unit tests)

: == Why RequestParser should handle SolrParams AND Streams  ==
: * It is a MUCH easier interface to understand then pre/process

I won't argue with you there.

: * The dispatch filter does not need to touch req.getParameter()
: * It alows for plugable behavior to extract parameters - not just streams.
:
: Consider the current discussion on "POST for Queries"
:  (http://www.nabble.com/Using-HTTP-Post-for-Queries-tf3039973.html)
: It seems like we may need a few ways to parse params out of the
: request.  The way one handles the parameters directly affects the
: streams.  This logic should be contained in a single place.

The intent there is to use a regular CGI form encoded POST body to express
more params then the client feels safe putting in a URL, under teh API
i was suggesting that would be solved with a "No-Op" RequestParser that
has empty preProcess and process methods.  when the Servlet (or ServletFilter)
builds the Solrparams (in between calling parser.preProcess and
parser.process) it gets *all* of hte form encoded params from the
HttpServletRequest (because no code has touched the input stream)

an alternative situation in which you might want to "Query using HTTP
POST" is if you had an XmlQPRequestHandler that understood the
xml-query-parser syntax from this contrib...
  http://svn.apache.org/viewvc/lucene/java/trunk/contrib/xml-query-parser/
...which expected to read the XML from the ContentStreams of the
SolrRequest, and you wnated to put the XML in the raw POSt body of the
request (the same way our current update POSTs work) but there were other
options XmlQPRequestHandler wanted to get out of the
SolrRequest's SolrParams.

that would be handled by a "RawPostRequestParser" whose process
method would be a No-Op, but the preProcess method would make a
ContentStream out of the InputStream from the HttpServletRequest -- then
the Servlet/ServletFilter would parse the url useing the
HttpServletRequest.getParameter() methods (which are now safe to call
without damanging the InputStream).

(That RawPostRequestParser would be reused along with an XmlUpdateHandler
that we refactor the existing <add><doc> updated logic from the core to
support the legacy /update URLs)

A third use case of doing queries with POST might be that you want to use
standard CGI form encoding/multi-part file upload semantics of HTTP to
send an XML file (or files) to the above mentioned XmlQPRequestHandler ...
so then we have "MultiPartMimeRequestParser" that has a No-Op preProcess
method, and uses the Commons FileUpload code with a
org.apache.commons.fileupload.RequestContext it builds out of the header
info passed to preProcess by the Servlet.

: == The Dispatcher should pick the handler  ==

: There is no reason it would need to inject 'qt' into the solr params
: just so it can be pulled out by SolrCore (using the @depricated
: function: solrReq.getQueryType()!)

I agree ... as i said, i just left it that way in my psuedo code because i
didn't have a strong opinion about it and wanted to leave things that
worked okay now as they were in an attempt to not confuse the point i was
trying to make -- i guess it didn't work :)

: If the dispatcher is required to put a parameter in SolrParams, we
: could not make the RequestParser in charge of filling the SolrParams.
: This would require something like your pre/process system.

well .. even if the Servlet/ServletFilter dispatches directly to the
Handler without putting a "qt" in the SolrParams ... i still prefer that
the RequestParser not be the one parsing the URL.

: == Pseudo-Java ==

I'm happily on board with all of the code you posted except this line...

:         SolrQueryRequest solrReq = parser.parse( request );

...i really, really, REALLY don't like the idea that the RequestParser
Impls -- classes users should be free to write on their own and plugin to
Solr using the solrconfig.xml -- are responsible for the URL parsing and
parameter extraction.  Maybe calling them "RequestParser" in my suggested
design is missleading, maybe a better name like "StreamExtractor" would be
better ... but they shouldn't be in charge of doing anything with the URL.

Imagine if 3 years ago, when Yonik and I were first hammering out the API
for SolrRequestHandlers, we had picked this...

   public interface SolrRequestHandlers extends SolrInfoMBean {
     public void init(NamedList args);
     public void handleRequest(HttpServletRequest req, SolrQueryResponse rsp);
   }

...and the only thing the Servlet did was format the response?  Not only
would Unit tests have been a lot harder to write, but we'd be
screwed today, because we wouldn't be able to change the Servlet and
change how RequestHandlers are used without breaking any existing
RequestHandlers written by clients that might be depending on having the
full HttpServletRequest and do crazy things with the path.

keeping HttpServletRequest out of the API for RequestParsers helps us
future-proof against breaking plugins down the road.


-Hoss

Re: Update Plugins (was Re: Handling disparate data sources in Solr)

Reply via email to