Re: Update Plugins (was Re: Handling disparate data sources in Solr)

Ryan McKinley Fri, 19 Jan 2007 19:34:48 -0800

(Note: this is different than what I have suggested before.  Treat it
as brainstorming on how to take what I have suggested and mesh it with
your concerns)


What if:

The RequestParser is not part of the core API - It would be a
helper function for Servlets and Filters that call the core API.  It
could be configured in web.xml rather than solrconfig.xml.  A
RequestDispatcher (Servlet or Filter) would be configured with a
single RequestParser.

The RequestParser would be in charge of taking HttpRequest and determining:
 1) The RequestHandler
 2) The SolrRequest (Params & Streams)
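
As a rough sketch of that split (every name below is hypothetical, not an
existing Solr class, and SimpleHttpRequest is a stand-in for
HttpServletRequest so the example runs without a servlet container), the
parser would be a single interface that returns both pieces. The default
implementation shown here assumes the handler name is simply the request
path without its leading slash:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HttpServletRequest (assumption, not a Solr type).
final class SimpleHttpRequest {
    final String path;                 // e.g. "/update"
    final Map<String, String> params;  // decoded query parameters
    final List<String> bodies;         // raw posted bodies, one per stream

    SimpleHttpRequest(String path, Map<String, String> params, List<String> bodies) {
        this.path = path;
        this.params = params;
        this.bodies = bodies;
    }
}

// Hypothetical result type: the two things the parser has to determine.
final class ParsedRequest {
    final String handlerName;          // 1) which RequestHandler to call
    final Map<String, String> params;  // 2) the request: params ...
    final List<String> streams;        //    ... and content streams

    ParsedRequest(String handlerName, Map<String, String> params, List<String> streams) {
        this.handlerName = handlerName;
        this.params = params;
        this.streams = streams;
    }
}

// The single interface; it sits next to the Servlet/Filter, outside the core API.
interface RequestParser {
    ParsedRequest parse(SimpleHttpRequest req);
}

// Assumed default behaviour: handler name = path minus the leading slash.
final class DefaultRequestParser implements RequestParser {
    @Override
    public ParsedRequest parse(SimpleHttpRequest req) {
        String handlerName = req.path.startsWith("/") ? req.path.substring(1) : req.path;
        return new ParsedRequest(
                handlerName,
                Collections.unmodifiableMap(new LinkedHashMap<String, String>(req.params)),
                Collections.unmodifiableList(new ArrayList<String>(req.bodies)));
    }
}
```

Nothing past the parser sees the HTTP request in this sketch; the rest of
the code only deals with the handler name, the params, and the streams.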

It would not be the most 'pluggable' of plugins, but I am still having
trouble imagining anything beyond a single default RequestParser.
I am assuming that anything that needs a *really* complex way of extracting
ContentStreams will do it in the Handler, not the RequestParser.  For
reference see my argument for a separate DocumentParser interface in:
http://www.nabble.com/Re%3A-Update-Plugins-%28was-Re%3A-Handling-disparate-data-sources-in-Solr%29-p8386161.html

In my view, the default one could be mapped to "/*" and a custom one
could be mapped to "/mycustomparser/*"

This would drop the ':' from my proposed URL and change the scheme to look like:
/parser/path/the/parser/knows/how/to/extract/?params

This would give people a relatively easy way to implement 'restful'
URLs if they need to.  (but they would have to edit web.xml)
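
To illustrate the prefix idea, here is a small sketch that mimics in plain
Java what the servlet mappings in web.xml would do: pick a parser by URL
prefix and hand it the rest of the path.  ParserMapping and the parser
names are hypothetical; in a real deployment the container does this
matching, not Solr code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: chooses a parser name by URL prefix,
// falling back to the default parser (the "/*" mapping).
final class ParserMapping {
    private final Map<String, String> prefixToParser = new LinkedHashMap<String, String>();
    private final String defaultParser;

    ParserMapping(String defaultParser) {
        this.defaultParser = defaultParser;
    }

    // Registers a prefix such as "/mycustomparser" (no trailing slash).
    void map(String prefix, String parserName) {
        prefixToParser.put(prefix, parserName);
    }

    // Takes a request path without the query string and
    // returns { parserName, remainingPath }.
    String[] resolve(String path) {
        for (Map.Entry<String, String> e : prefixToParser.entrySet()) {
            String prefix = e.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return new String[] { e.getValue(), path.substring(prefix.length()) };
            }
        }
        return new String[] { defaultParser, path };
    }
}
```

With "/mycustomparser" mapped to a custom parser, a path such as
"/mycustomparser/docs/42" would reach that parser with "/docs/42" left for
it to interpret, while "/select" would fall through to the default one.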

: Would that be configured in solrconfig.xml as <handler name="xml"?
: name="update/xml"?  If it is "update/xml" would it only really work if
: the 'update' servlet were configured properly?

it would only make sense to map that as "xml" ... the SolrCore (and the
solrconfig.xml) shouldn't have any knowledge of the Servlet/ServletFilter
base paths because it should be possible to use the SolrCore independent
of any ServletContainer (if for no other reason in unit tests)


Correct, SolrCore should not care what the request path is.  That is
why I want to deprecate the execute( ) function that assumes the
handler is defined by 'qt'

Unit tests should be handled by execute( handler, req, res )

If I had my druthers, it would be:
 res = handler.execute( req )
but that is too big of a leap for now :)
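
A minimal sketch of the difference between the two execute( ) styles,
using stripped-down hypothetical types (Req, Res, Handler and Core are
stand-ins for illustration, not the real Solr classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down request/response/handler types.
final class Req { final Map<String, String> params = new HashMap<String, String>(); }
final class Res { final Map<String, Object> values = new HashMap<String, Object>(); }

interface Handler {
    void handleRequest(Req req, Res res);
}

final class Core {
    private final Map<String, Handler> handlers = new HashMap<String, Handler>();

    void register(String name, Handler h) { handlers.put(name, h); }

    Handler getHandler(String name) { return handlers.get(name); }

    // Style to deprecate: the core assumes the handler is named by 'qt'.
    void execute(Req req, Res res) {
        execute(getHandler(req.params.get("qt")), req, res);
    }

    // Proposed style: the caller (servlet, filter, or unit test) picks the handler.
    void execute(Handler handler, Req req, Res res) {
        if (handler == null) {
            throw new IllegalArgumentException("no handler given");
        }
        handler.handleRequest(req, res);
    }
}
```

A unit test can then register a handler and call
core.execute( core.getHandler("name"), req, res ) directly, with no
request path or 'qt' parameter involved.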

...

A third use case of doing queries with POST might be that you want to use
standard CGI form encoding/multi-part file upload semantics of HTTP to
send an XML file (or files) to the above mentioned XmlQPRequestHandler ...
so then we have "MultiPartMimeRequestParser" ...


I agree with all your use cases.  It just seems like a LOT of complex
overhead to extract the general aspects of translating a
URL+Params+Streams => Handler+Request(Params+Streams)

Again, since the number of 'RequestParsers' is small, it seems overly
complex to have a separate plugin to extract URL, another to extract
the Handler, and another to extract the streams.  Particularly since
the decisions on how you parse the URL can totally affect the other
aspects.


...i really, really, REALLY don't like the idea that the RequestParser
Impls -- classes users should be free to write on their own and plug in to
Solr using the solrconfig.xml -- are responsible for the URL parsing and
parameter extraction.  Maybe calling them "RequestParser" in my suggested
design is misleading, maybe a name like "StreamExtractor" would be
better ... but they shouldn't be in charge of doing anything with the URL.


What if it were configured in web.xml, would you feel more comfortable
letting it determine how the URL is parsed and streams are extracted?

Imagine if 3 years ago, when Yonik and I were first hammering out the API
for SolrRequestHandlers, we had picked this...

   public interface SolrRequestHandlers extends SolrInfoMBean {
     public void init(NamedList args);
     public void handleRequest(HttpServletRequest req, SolrQueryResponse rsp);
   }


Thank goodness you didn't!  I'm confident you won't let me (or anyone)
talk you into something like that!  You guys made a lot of good
choices and Solr is an amazing platform for it.

That said, the task at issue is: how do we convert an arbitrary
HttpServletRequest into a SolrRequest?

I am proposing we have a single interface to do this:
 SolrRequest r = RequestParser.parse( HttpServletRequest  )

You are proposing that this be broken down further.  Something like:
 Handler h = (the filter) getHandler( req.getPath() )
 SolrParams = (the filter) do stuff to extract the params (using
parser.preProcess())
 ContentStreams = parser.parse( request )

While it is not great to have plugins manipulate the HttpRequest -
someone needs to do it.  In my opinion, the RequestParser's job is to
isolate *everything* *else* from the HttpServletRequest.

Again, since the number of RequestParsers is small, it seems ok (to me)


keeping HttpServletRequest out of the API for RequestParsers helps us
future-proof against breaking plugins down the road.


I agree.  This is why I suggest the RequestParser is not a core part
of the API, just a helper class for Servlets and Filters.


ryan
