Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: > Ah ... this is the one problem with high volume on an involved thread ...
: > i'm sending replies to messages you write after you've already read
: > other replies to other messages you sent and changed your mind :)

: Should we start a new thread?

I don't think it would make a difference ... we just need to slow down :)

: Ok, now (I think) I see the difference between our ideas.
:
: From your code, it looks like you want the RequestParser to extract
: 'qt' that defines the RequestHandler. In my proposal, the
: RequestHandler is selected independent of the RequestParser.

no, no, no ... i'm sorry if i gave that impression ... the RequestParser
*only* worries about getting streams, it shouldn't have any way of even
*guessing* what RequestHandler is going to be used. for reference:

http://www.nabble.com/Re%3A-p8438292.html

note that i never mention "qt" ... instead i refer to
"core.execute(solrReq, solrRsp);" doing exactly what it does today ...
core.execute will call getRequestHandler(solrReq.getQueryType()) to pick
the RequestHandler to use. the Servlet is what creates the SolrRequest
object, and puts whatever SolrParams it wants (including "qt") in that
SolrRequest before asking the SolrCore to take care of it.

: What do you imagine happens in:
: >
: > String p = pickRequestParser(req);

let's use the URL syntax you've been talking about that people seem to
have agreed looks good (assuming i understand correctly) ...

  /servlet/${requesthandler}:${requestparser}?param1=val1&param2=val2

what i was suggesting was that the servlet which uses that URL structure
might have a utility method called pickRequestParser that would look
like...
private String pickRequestParser(HttpServletRequest req) {
  String[] pathParts = req.getPathInfo().split(":");
  if (pathParts.length < 2 || "".equals(pathParts[1]))
    return "default"; // or "standard", or null -- whatever
  return pathParts[1];
}

: If the RequestHandler is defined by the RequestParser, I would
: suggest something like:

again, i can't emphasize enough that that's not what i was proposing ...
i am in no way shape or form trying to talk you out of the idea that it
should be possible to specify the RequestParser, the RequestHandler, and
the OutputWriter all as part of the URL, and completely independent of
each other. the RequestHandler and the OutputWriter could be specified
as regular SolrParams that come from any part of the HTTP request, but
the RequestParser needs to come from some part of the URL that can be
inspected without any risk of affecting the raw post stream (ie: no
HttpServletRequest.getParameter() calls)

: I still don't see why:
:
: > // let the parser preprocess the streams if it wants...
: > Iterable s = solrParser.preprocess(getStreamInfo(req), new Pointer() {
: >     InputStream get() { return req.getInputStream(); }
: >   });
: >
: > SolrParams params = makeSolrRequest(req);
: >
: > // let the parser decide what to do with the existing streams,
: > // or provide new ones
: > s = solrParser.process(solrReq, s);
: >
: > // ServletSolrRequest is a basic impl of SolrRequest
: > SolrRequest solrReq = new ServletSolrRequest(params, s);
:
: can not be contained entirely in:
:
:   SolrRequest solrReq = parser.parse( req );

because then the RequestParser would be defining how the URL is getting
parsed -- the makeSolrRequest utility placeholder i described had the
wrong name, i should have called it makeSolrParams ... it would look
something like this in the URL syntax i described above...
private SolrParams makeSolrParams(HttpServletRequest req) {
  // this class is already in our code base, used as is
  SolrParams p = new ServletSolrParams(req);
  String[] pathParts = req.getPathInfo().split(":");
  if ("".equals(pathParts[0]))
    return p;
  Map tmp = new HashMap();
  tmp.put("qt", pathParts[0]);
  return new DefaultSolrParams(new MapSolrParams(tmp), p);
}

the nutshell version of everything i'm trying to say is...

SolrRequest
 - models all info about a request to solr to do something:
   - the key=val params associated with that request
   - any streams of data associated with that request

RequestParser(s)
 - different instances for different sources of streams
 - is given two chances to generate ContentStreams:
   - once using the raw stream from the HTTP request
   - once using the params for the SolrRequest

SolrServlet
 - the only thing with direct access to the HttpServletRequest; shields
   the other interface APIs from the mechanics of HTTP
 - dictates the URL structure
   - determines the name of the RequestParser to use
   - lets the parser have the raw input stream
   - determines where SolrParams for the request come from
   - lets the parser have the params to make more streams if it wants to.

SolrCore
 - does all of the name lookups for processing a SolrRequest:
 -
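Taken together, the two utility methods above are just a little string
surgery on getPathInfo(). A rough, self-contained sketch of that
handler/parser split (class and method names are mine, not anything in
the thread or in Solr):

```java
// Hypothetical helper illustrating the "/handler:parser" URL split
// discussed above; names are illustrative, not actual Solr code.
public class UrlParts {

  /** everything before the ':' names the RequestHandler (the "qt") */
  public static String handlerOf(String pathInfo) {
    String[] parts = pathInfo.split(":");
    return parts[0];
  }

  /** everything after the ':' names the RequestParser, defaulting to "standard" */
  public static String parserOf(String pathInfo) {
    String[] parts = pathInfo.split(":");
    if (parts.length < 2 || "".equals(parts[1]))
      return "standard";
    return parts[1];
  }

  public static void main(String[] args) {
    System.out.println(handlerOf("/my/update/csv:remoteurls")); // /my/update/csv
    System.out.println(parserOf("/my/update/csv:remoteurls"));  // remoteurls
    System.out.println(parserOf("/select"));                    // standard
  }
}
```

the point being that the servlet owns this logic entirely; the parser
never sees the raw URL.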
RE: separate log files
Hi Solr devs,

I'm running multiple instances of Solr, which are all using the same war
file to load from. To log to separate files I implemented the following
kludge.

-Ben

23d22
< import org.apache.solr.request.SolrQueryResponse;
24a24
> import org.apache.solr.request.SolrQueryResponse;
33a34,36
>
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
34a38,39
> import java.io.InputStream;
> import java.io.OutputStream;
35a41,42
> import java.util.Properties;
> import java.util.logging.LogManager;
47a55,80
> /*
>  * switch java.util.logging.Logger appenders
>  *
>  * Add the following to the web context file
>  */
> private void switchAppenders(String prefix) {
>   String logParam = "org.apache.juli.FileHandler.prefix";
>   log.info("switching appender to " + logParam + "=" + prefix);
>   Properties props = new Properties();
>   try {
>     InputStream configStream = getClass().getResourceAsStream("/logging.properties");
>     props.load(configStream);
>     configStream.close();
>     props.setProperty(logParam, prefix);
>     ByteArrayOutputStream os = new ByteArrayOutputStream();
>     props.store((OutputStream)os, "LOGGING PROPERTIES");
>     LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(os.toByteArray()));
>     log.info("props: " + props.toString());
>   }
>   catch (Exception e) {
>     String errMsg = "Error: Cannot load configuration file; Cause: " + e.getMessage();
>     log.info(errMsg);
>   }
> }
>
48a82
>
52c86,91
<
---
>
>   // change the logging properties
>   String prefix = (String)c.lookup("java:comp/env/solr/log-prefix");
>   if (prefix != null)
>     switchAppenders(prefix);
>
64a104
>

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 17 January 2007 6:04 AM
To: solr-user@lucene.apache.org
Subject: Re: separate log files

> : I wonder if jetty or tomcat can be configured to put logging output
> : for different webapps in different log files...
> i've never tried it, but the tomcat docs do talk about tomcat
> providing a custom implementation of java.util.logging
> specifically for this purpose.
>
> Ben: please take a look at this doc...
>
> http://tomcat.apache.org/tomcat-5.5-doc/logging.html
>
> ...specifically the section on java.util.logging (since that's
> what Solr uses) ... I believe you'll want something like the
> "Example logging.properties file to be placed in common/classes"
> so that you can control the logging.
>
> Please let us all know if this works for you ... it would
> make a great addition to the SolrTomcat wiki page.
>
> : On 1/15/07, Ben Incani <[EMAIL PROTECTED]> wrote:
> : > Hi Solr users,
> : >
> : > I'm running multiple instances of Solr, which are all using
> : > the same war file to load from.
> : >
> : > Below is an example of the servlet context file used for each
> : > application.
> : >
> : >   ... debug="0" crossContext="true" ...
> : >   ... value="/var/local/app1" override="true" />
> : >
> : > Hence each application is using the same
> : > WEB-INF/classes/logging.properties file to configure logging.
> : >
> : > I would like each instance to log to separate log files such as;
> : > app1-solr.yyyy-mm-dd.log
> : > app2-solr.yyyy-mm-dd.log
> : > ...
> : >
> : > Is there an easy way to append the context path to
> : > org.apache.juli.FileHandler.prefix
> : > E.g.
> : > org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.
> : >
> : > Or would this require a code change?
> : >
> : > Regards
> : >
> : > -Ben
>
> -Hoss
Re: graduation todo list
On 1/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: The old website has been redirected to the new, so those links are
: less important.

one hitch to this is that the symlink from when we moved the javadocs
is now gone, so links like this found in the wiki (and in mail archives)
no longer work...

http://incubator.apache.org/solr/docs/api/org/apache/solr/request/DisMaxRequestHandler.html

...instead of re-adding a symlink, we should probably put in a .htaccess
file to do a redirect from http://lucene.apache.org/solr/docs/(.*) to
http://lucene.apache.org/solr/$1

So people don't keep linking to the docs url? Sounds fine to me...
I'm on my way home, but I'll handle it later if no one else does so
first.

-Yonik
Re: graduation todo list
: The old website has been redirected to the new, so those links are
: less important.

one hitch to this is that the symlink from when we moved the javadocs
is now gone, so links like this found in the wiki (and in mail archives)
no longer work...

http://incubator.apache.org/solr/docs/api/org/apache/solr/request/DisMaxRequestHandler.html

...instead of re-adding a symlink, we should probably put in a .htaccess
file to do a redirect from http://lucene.apache.org/solr/docs/(.*) to
http://lucene.apache.org/solr/$1

-Hoss
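For the record, a .htaccess rule along these lines would do that
redirect (an untested sketch, assuming mod_alias is available on the
lucene.apache.org host):

```apache
# Sketch: redirect old /solr/docs/... URLs to the new location.
# Assumes mod_alias's RedirectMatch is enabled for .htaccess files.
RedirectMatch permanent ^/solr/docs/(.*)$ http://lucene.apache.org/solr/$1
```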
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: I was... then you talked me out of it! You are correct, the client
: should determine the RequestParser independent of the RequestHandler.

Ah ... this is the one problem with high volume on an involved thread ...
i'm sending replies to messages you write after you've already read
other replies to other messages you sent and changed your mind :)

Should we start a new thread?

Here's a more fleshed out version of the pseudo-java i posted earlier,
with all of my addendums inlined and a few simple method calls changed
to try and make the purpose more clear...

Ok, now (I think) I see the difference between our ideas.

From your code, it looks like you want the RequestParser to extract
'qt' that defines the RequestHandler. In my proposal, the
RequestHandler is selected independent of the RequestParser.

What do you imagine happens in:

  String p = pickRequestParser(req);

This looks like you would have to have a standard way (per servlet) of
getting the RequestParser. How do you envision that? What would be the
standard way to choose your request parser?

If the RequestHandler is defined by the RequestParser, I would suggest
something like:

interface SolrRequest {
  RequestHandler getHandler();
  Iterable getContentStreams();
  SolrParams getParams();
}

interface RequestParser {
  SolrRequest getRequest( HttpServletRequest req );
  // perhaps remove getHandler() from SolrRequest and add:
  RequestHandler getHandler();
}

And then configure a servlet or filter with the RequestParser:

  SolrRequestFilter ...
  RequestParser org.apache.solr.parser.StandardRequestParser

Given that the number of RequestParsers is realistically small (as
Yonik mentioned), I think this could be a good solution. To update my
current proposal:

1. Servlet/Filter defines the RequestParser
2. RequestParser parses handler & request from HttpServletRequest
3. handled essentially as before

To update the example URLs, defined by the "StandardRequestParser":

  /path/to/handler/?param

where /path/to/handler is the "name" defined in solrconfig.xml

To use a different RequestParser, it would need to be configured in
web.xml:

  /customparser/whatever/path/i/like

- - - - - - - - - - - - - -

I still don't see why:

  // let the parser preprocess the streams if it wants...
  Iterable s = solrParser.preprocess(getStreamInfo(req), new Pointer() {
      InputStream get() { return req.getInputStream(); }
    });

  SolrParams params = makeSolrRequest(req);

  // let the parser decide what to do with the existing streams,
  // or provide new ones
  s = solrParser.process(solrReq, s);

  // ServletSolrRequest is a basic impl of SolrRequest
  SolrRequest solrReq = new ServletSolrRequest(params, s);

can not be contained entirely in:

  SolrRequest solrReq = parser.parse( req );

assuming the SolrRequest interface includes:

  Iterable getContentStreams();

the parser can use req.getInputStream() however it likes - either to
make params and/or to build ContentStreams

- - - - - - - -

good good

ryan
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
Cool. I think i need more examples... concrete is good :-)

I don't quite grok your format below... is it one line or two?

  /path/defined/in/solrconfig:parser?params
  /${handler}:${parser}

Is that simply

  /${handler}:${parser}?params

yes. the ${} is just to show what is extracted from the request URI,
not a specific example

Imagine you have a CsvUpdateHandler defined in solrconfig.xml with a
"name"="my/update/csv". The standard RequestParser could extract the
parameters and Iterable for each of the following requests:

POST: /my/update/csv/?separator=,&fields=foo,bar,baz
 (body) "10,20,30"

POST: /my/update/csv/
 multipart post with 5 files and 6 form fields
 (unlike the previous example, the handler would get 5 input streams
 rather than 1)

GET: /my/update/csv/?post.remoteURL=http://..&separator=,&fields=foo,bar,baz&...
 fill the stream with the content from a remote URL

GET: /my/update/csv/?post.body=bodycontent&fields=foo,bar,baz&...
 use 'bodycontent' as the input stream. (note, this does not make much
 sense for csv, but is a useful example)

POST: /my/update/csv:remoteurls/?separator=,&fields=foo,bar,baz
 (body) http://url1,http://url2,http://url3...
 In this case we would use a custom RequestParser ("remoteurls") that
 would read the post body and convert it to a stream of content urls.

- - - - - - -

The URL path (everything before the ':') would be entirely defined and
configured by solrconfig.xml

A filter would see if the request path matches a registered handler -
if not it will pass it up the filter chain. This would allow custom
filters and servlets to co-exist in the top level URL path.
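The "remoteurls" idea above is really just "split the post body into
URLs". A minimal sketch of that conversion step (ignoring the servlet
plumbing; the class and method names here are invented, not part of any
patch):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the core of a hypothetical "remoteurls" RequestParser:
// turn a post body like "http://url1,http://url2" into the list of
// URLs whose contents would become the request's ContentStreams.
public class RemoteUrlsParser {

  public static List<String> parseBody(String postBody) {
    List<String> urls = new ArrayList<String>();
    for (String part : postBody.split(",")) {
      part = part.trim();
      if (part.length() > 0)
        urls.add(part);
    }
    return urls;
  }

  public static void main(String[] args) {
    // each returned URL would then be opened as one input stream
    System.out.println(parseBody("http://url1, http://url2"));
  }
}
```

everything else (opening the streams, wrapping them in an Iterable)
would be shared with the other parsers.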
Consider:

 solrconfig.xml: ...
 web.xml: MyRestfulDelete -> /mydelete/*

 POST: /delete?id=AAA would be sent to DeleteHandler
 POST: /mydelete/AAA/ would be sent to MyRestfulDelete

Alternatively, you could have:

 solrconfig.xml: ...
 web.xml: MyRestfulDelete -> /delete/*

 POST: /standard/delete?id=AAA would be sent to DeleteHandler
 POST: /delete/AAA/ would be sent to MyRestfulDelete

I am suggesting we do not try to have the default request
servlet/filter support extracting parameters from the URL. I think this
is a reasonable tradeoff to be able to have the request path easily
user-configurable using the *existing* plugin configuration.

- - - - - - - -

In a previous email, you mentioned changing the URL structure. With
this proposal, we would continue to support:

  /select?wt=XXX

for the Csv example, you would also be able to call:

  GET: /select?qt=/my/update/csv/&post.remoteURL=http://..&sepa...

ryan
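For concreteness, the web.xml half of the first example might look
roughly like this (a sketch using standard servlet 2.4 elements; the
servlet class name is invented, and the original XML in this message
was stripped by the mail archive):

```xml
<!-- sketch: route /mydelete/* to a custom servlet, leaving -->
<!-- everything else to the standard Solr dispatch filter   -->
<servlet>
  <servlet-name>MyRestfulDelete</servlet-name>
  <servlet-class>com.example.MyRestfulDeleteServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>MyRestfulDelete</servlet-name>
  <url-pattern>/mydelete/*</url-pattern>
</servlet-mapping>
```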
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: However, I'm not yet convinced the benefits are worth the costs. If
: the number of RequestParsers remains small, and within the scope of
: being included in the core, that functionality could just be included
: in a single non-pluggable RequestParser.
:
: I'm not convinced it's a bad idea either, but I'd like to hear about
: use cases for new RequestParsers (new ways of generically getting an
: input stream)?

I don't really see it being a very high cost ... and even if we can't
imagine any other potential user-written RequestParser, we already know
of at least 4 use cases we want to support out of the box for getting
streams:

 1) raw post body (as a single stream)
 2) multi-part post body (file upload, potentially several streams)
 3) local file(s) specified by path (1 or more streams)
 4) remote resource(s) specified by URL(s) (1 or more streams)

...we could put all that logic in a single class that looks at a
SolrParam to pick which method to use, or we could extract each one
into its own class using a common interface ... either way we can
hardcode the list of viable options if we want to avoid the issue of
letting the client configure them ... but i still think it's worth the
effort to talk about what that common interface might be.

I think my idea of having both a preProcess and a process method in
RequestParser so it can do things before and after the Servlet has
extracted SolrParams from the URL would work in all of the cases we've
thought of.

-Hoss
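The "single class that looks at a SolrParam" variant of the four cases
above could be pictured like this (a sketch only; the param names
"post.file"/"post.remoteURL" echo earlier examples in the thread but
are not a real Solr API):

```java
// Sketch: pick one of the four out-of-the-box stream sources based on
// the request. The string results stand in for the four strategies.
public class StreamSourcePicker {

  public static String pick(boolean multipart, String fileParam, String urlParam) {
    if (fileParam != null)  return "localfile";   // case 3: local file path(s)
    if (urlParam != null)   return "remoteurl";   // case 4: remote URL(s)
    if (multipart)          return "multipart";   // case 2: multi-part post body
    return "rawpost";                             // case 1: raw post body
  }

  public static void main(String[] args) {
    System.out.println(pick(false, null, null));            // rawpost
    System.out.println(pick(true, null, null));             // multipart
    System.out.println(pick(false, "/tmp/data.csv", null)); // localfile
  }
}
```

the pluggable alternative just replaces this if/else chain with a name
lookup against registered parser instances.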
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: I was... then you talked me out of it! You are correct, the client
: should determine the RequestParser independent of the RequestHandler.

Ah ... this is the one problem with high volume on an involved thread ...
i'm sending replies to messages you write after you've already read
other replies to other messages you sent and changed your mind :)

: Are you suggesting there would be multiple servlets each with
: different methods to get the SolrParams from the url? How does the
: servlet know if it can touch req.getParameter()?

I'm suggesting that there *could* be multiple Servlets with multiple
URL structures ... my worry is not that we need multiple options now,
it's that i don't want to come up with an API for writing plugins that
then has to be thrown out down the road if we want/need to change the
URL

: How would the default servlet fill up SolrParams?

prior to calling RequestParser.preProcess, it would only access very
limited parts of the HttpServletRequest -- the bare minimum it needs to
pick a RequestParser ... probably just the path, maybe the HTTP Headers
-- but if we had a URL structure where we really wanted to specify the
RequestParser in a URL param it could do it using getQueryString

*after* calling RequestParser.preProcess the Servlet can access any
part of the HttpServletRequest (because if the RequestParser wanted to
use the raw POST InputStream it would have, and if it doesn't then it's
fair game to let HttpServletRequest pull data out of it when the
Servlet calls HttpServletRequest.getParameterMap() -- or any of the
other HttpServletRequest methods) to build up the SolrParams however it
wants based on the URL structure it wants to use ... then
RequestParser.process can use those SolrParams to get any other streams
it may want and add them to the SolrRequest.

Here's a more fleshed out version of the pseudo-java i posted earlier,
with all of my addendums inlined and a few simple method calls changed
to try and make the purpose more clear...
// Simple interface for holding a lazy reference to something
interface Pointer<T> {
  T get();
}

interface RequestParser {

  public void init(NamedList nl); // the usual

  /** will be passed the raw input stream from the
   * HttpServletRequest, ... as well as whatever other HttpServletRequest
   * header info we decide is important for the RequestParser to know
   * about the stream, and is safe for Servlets to access and make
   * available to the RequestParser (ie: HTTP method, content-type,
   * content-length, etc...)
   *
   * I'm using a NamedList instance instead of passing the
   * HttpServletRequest to maintain a good abstraction -- only the
   * Servlet knows about HTTP, so if we ever want to write an RMI
   * interface to Solr, the same RequestParser plugins will still work
   * ... in practice it might be better to explicitly spell out every
   * piece of info about the stream we want to pass
   *
   * This is the method where a RequestParser which is going to use the
   * raw POST body to build up either a single stream, or several
   * streams from a multi-part request, has the info it needs to do so.
   */
  public Iterable preProcess(NamedList streamInfo, Pointer s);

  /** guaranteed that the second arg will be the result from
   * a previous call to preProcess, and that the Iterable from
   * preProcess will not have been inspected or touched in any way, nor
   * will any references to it be maintained after this call.
   *
   * this is the method where a RequestParser which is going to use
   * request params to open streams from local files, or remote URLs,
   * can do so -- a particularly ambitious RequestParser could use both
   * the raw POST data *and* remote files specified in params, because
   * it has the choice of what to do with the Iterable it returned from
   * the earlier preProcess call.
   */
  public Iterable process(SolrRequest request, Iterable i);
}

class SolrUberServlet extends HttpServlet {

  // servlet specific method which does minimal inspection of
  // req to determine the parser name based on the URL
  private String pickRequestParser(HttpServletRequest req) { ... }

  // extracts just the most crucial info about the HTTP stream from the
  // HttpServletRequest, so it can be passed to RequestParser.preProcess
  // must be careful not to use anything that might access the stream.
  private NamedList getStreamInfo(HttpServletRequest req) { ... }

  // builds the SolrParams for the request using servlet specific URL
  // rules; this method is free to use anything in the
  // HttpServletRequest because it won't be called until after
  // preProcess
  private SolrParams makeSolrRequestParams(HttpServletRequest req) { ... }

  public void service(HttpServletRequest req, HttpServletResponse response) {
    SolrCore core = getCore();
    Solr(Query)Response solrRsp = new Solr(Query)Response();
    String p = pickRequestParser(req)
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465857 ]

Yonik Seeley commented on SOLR-80:
----------------------------------

From an interface point of view, I'm heavily leaning toward getting rid
of the restriction that lucene queries can't be all negative. This
would allow using a negative-only query anywhere one can currently use
a positive query. One could simply and naturally do fq=-id:10 to filter
out a single document.

> negative filter queries
> -----------------------
>
>          Key: SOLR-80
>          URL: https://issues.apache.org/jira/browse/SOLR-80
>      Project: Solr
>   Issue Type: New Feature
>   Components: search
>     Reporter: Yonik Seeley
>
> There is a need for negative filter queries to avoid long filter
> generation times and large caching requirements.
> Currently, if someone wants to filter out a small number of documents,
> they must specify the complete set of documents to express those
> negative conditions against.
>   q=foo&fq=id:[* TO *] -id:101
> In this example, to filter out a single document, the complete set of
> documents (minus one) is generated, and a large bitset is cached. You
> could also add the restriction to the main query, but that doesn't
> work with the dismax handler which doesn't have a facility for this.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
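The change Yonik describes boils down to: if every top-level clause of
a query is negative, treat it as applied against the full document set.
At the query-string level that is equivalent to prepending a match-all
clause, which can be sketched like this (an illustration of the idea
with naive whitespace splitting, not Solr's actual implementation):

```java
// Sketch: rewrite a purely-negative query string like "-id:10" into
// "(*:* -id:10)" so it means "everything except". Top-level clauses
// are approximated by whitespace splitting, which ignores quoting
// and parentheses -- good enough to show the idea.
public class NegativeQueryRewrite {

  /** true if every whitespace-separated top-level clause starts with '-' */
  public static boolean allNegative(String q) {
    for (String clause : q.trim().split("\\s+")) {
      if (!clause.startsWith("-")) return false;
    }
    return true;
  }

  public static String rewrite(String q) {
    return allNegative(q) ? "(*:* " + q.trim() + ")" : q;
  }

  public static void main(String[] args) {
    System.out.println(rewrite("-id:10"));     // (*:* -id:10)
    System.out.println(rewrite("foo -id:10")); // foo -id:10
  }
}
```

so fq=-id:10 would behave like fq=(*:* -id:10), without the user having
to spell out the id:[* TO *] range from the issue description.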
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > > Yes, this proposal would fix the URL structure to be
> > >   /path/defined/in/solrconfig:parser?params
> > >   /${handler}:${parser}
> > >
> > > I *think* this cleanly handles most cases cleanly and simply. The
> > > only exception is where you want to extract variables from the
> > > URL path.
> >
> > But that's not a hypothetical case, extracting variables from the
> > URL path is something I need now (to add metadata about the data in
> > the raw post body, like the CSV separator).
> >
> > POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
> > with a body of "10,20,30"
>
> Sorry, by "in the URL" I mean "in the URL path." The RequestParser
> can extract whatever it likes from getQueryString()
>
> The url you list above could absolutely be handled with the proposed
> format.

Cool. I think i need more examples... concrete is good :-)

I don't quite grok your format below... is it one line or two?

  /path/defined/in/solrconfig:parser?params
  /${handler}:${parser}

Is that simply

  /${handler}:${parser}?params

Or is it all one line where you actually have params twice?

-Yonik
Re: graduation todo list
OK, I think I got all the important stuff on
http://wiki.apache.org/solr/TaskList
except subversion pointers from the Wiki. If you run across broken svn
links, please help fix them.

The old website has been redirected to the new, so those links are
less important.

-Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Yes, this proposal would fix the URL structure to be
> >   /path/defined/in/solrconfig:parser?params
> >   /${handler}:${parser}
> >
> > I *think* this cleanly handles most cases cleanly and simply. The
> > only exception is where you want to extract variables from the URL
> > path.
>
> But that's not a hypothetical case, extracting variables from the URL
> path is something I need now (to add metadata about the data in the
> raw post body, like the CSV separator).
>
> POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
> with a body of "10,20,30"

Sorry, by "in the URL" I mean "in the URL path." The RequestParser can
extract whatever it likes from getQueryString()

The url you list above could absolutely be handled with the proposed
format. The thing that could not be handled is:

  http://localhost:8983/solr/csv/foo/bar/baz/ with body "10,20,30"

> > There are plenty of ways to rewrite RESTful urls into a
> > path+params structure. If someone absolutely needs RESTful urls, it
> > can easily be implemented with a new Filter/Servlet that picks the
> > 'handler' and directly creates a SolrRequest from the URL path.
>
> While being able to customize something is good, having really good
> defaults is better IMO :-)
>
> We should also be focused on exactly what we want our standard update
> URLs to look like in parallel with the design of how to support them.

again, i totally agree. My point is that I don't think we need to make
the dispatch filter handle *all* possible ways someone may want to
structure their request. It should offer the best defaults possible.
If that is not sufficient, someone can extend it.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yes, this proposal would fix the URL structure to be
>   /path/defined/in/solrconfig:parser?params
>   /${handler}:${parser}
>
> I *think* this cleanly handles most cases cleanly and simply. The
> only exception is where you want to extract variables from the URL
> path.

But that's not a hypothetical case, extracting variables from the URL
path is something I need now (to add metadata about the data in the raw
post body, like the CSV separator).

POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
with a body of "10,20,30"

> There are plenty of ways to rewrite RESTful urls into a path+params
> structure. If someone absolutely needs RESTful urls, it can easily be
> implemented with a new Filter/Servlet that picks the 'handler' and
> directly creates a SolrRequest from the URL path.

While being able to customize something is good, having really good
defaults is better IMO :-)

We should also be focused on exactly what we want our standard update
URLs to look like in parallel with the design of how to support them.

As a side note, with a change of URLs, we get a "free" chance to change
whatever we want about the parameters or response format... backward
compatibility only applies to the original URLs IMO.

-Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: I'm confused by your sentence "A RequestParser converts a
: HttpServletRequest to a SolrRequest." .. i thought you were
: advocating that the servlet parse the URL to pick a RequestHandler,
: and then the RequestHandler dictates the RequestParser?

I was... then you talked me out of it! You are correct, the client
should determine the RequestParser independent of the RequestHandler.

: : /path/registered/in/solr/config:requestparser?params
: :
: : If no ':' is in the URL, use 'standard' parser
: :
: : 1. The URL path determines the RequestHandler
: : 2. The URL path determines the RequestParser
: : 3. SolrRequest = RequestParser.parse( HttpServletRequest )
: : 4. handler.handleRequest( req, res );
: : 5. write the response
:
: do you mean the path before the colon determines the RequestHandler
: and the path after the colon determines the RequestParser?

yes, that is my proposal.

: fine too ... i was specifically trying to avoid making any design
: decisions that required a particular URL structure, in what you
: propose we are dictating more than just the "/handler/path:parser"
: piece of the URL, we are also dictating that the Parser decides how
: the rest of the path and all URL query string data will be
: interpreted ...

Yes, this proposal would fix the URL structure to be

  /path/defined/in/solrconfig:parser?params
  /${handler}:${parser}

I *think* this cleanly handles most cases cleanly and simply. The only
exception is where you want to extract variables from the URL path.
There are plenty of ways to rewrite RESTful urls into a path+params
structure. If someone absolutely needs RESTful urls, it can easily be
implemented with a new Filter/Servlet that picks the 'handler' and
directly creates a SolrRequest from the URL path. In my opinion, for
this level of customization it is reasonable that people edit web.xml
and put in their own servlets and filters.

: what i'm proposing is that the Servlet decide how to get the
: SolrParams out of an HttpServletRequest, using whatever URL that
: servlet wants;

I guess I'm not understanding this yet: Are you suggesting there would
be multiple servlets each with different methods to get the SolrParams
from the url? How does the servlet know if it can touch
req.getParameter()? How would the default servlet fill up SolrParams?

: I think i'm getting confused ... i thought you were advocating that
: RequestParsers be implemented as ServletFilters (or Servlets) ...

Originally I was... but again, you talked me out of it. (this time not
totally) I think the /path:parser format is clear and allows for most
everything off the shelf. If you want to do something different, that
can easily be a custom filter (or servlet)

Essentially, i think it is reasonable for people to skip
'RequestParsers' in a custom servlet and be able to build the
SolrRequest directly. This level of customization is reasonable to
handle directly with web.xml
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
OK, trying to catch up on this huge thread... I think I see why it's
become more complicated than I originally envisioned.

What I originally thought:
1) add a way to get a Reader or InputStream from SolrQueryRequest, and
   then reuse it for updates too
2) use the plugin name in the URL
3) write code that could handle multi-part post, or could grab args
   from the URL.
4) profit!

I think the main additional complexity is the idea that the
RequestParser (#3) be both pluggable and able to be specified in the
actual request. I hadn't considered that, and it's an interesting idea.

Without a pluggable RequestParser:
- something like the CSV loader would have to check the params for a
  "file" param and if so, open the local file itself

With a pluggable RequestParser:
- the LocalFileRequestParser would be specified in the url (like
  /update/csv:local) and it will handle looking for the "file" param
  and opening the file. The CSV plugin can be a little simpler by just
  getting a Reader.
- a new way of getting a stream could be developed (a new
  RequestParser) and most stream oriented plugins could just use it.

However, I'm not yet convinced the benefits are worth the costs. If
the number of RequestParsers remains small, and within the scope of
being included in the core, that functionality could just be included
in a single non-pluggable RequestParser.

I'm not convinced it's a bad idea either, but I'd like to hear about
use cases for new RequestParsers (new ways of generically getting an
input stream)?

-Yonik
Re: subversion move
On 1/18/07, Zaheed Haque <[EMAIL PROTECTED]> wrote: A minor detail.. Logo on the Solr page still says "incubator"..shouldn't that be taken off now? or are they some "Graduation holding period" :=) I already changed this yesterday in subversion, but I didn't want to update the live site until the new download link worked (the download mirrors take a while to sync). It should be updated sometime today. -Yonik
Re: subversion move
Hi: A minor detail.. Logo on the Solr page still says "incubator"..shouldn't that be taken off now? or are they some "Graduation holding period" :=) Cheers

On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

Solr's source in subversion has moved within the ASF repository to
https://svn.apache.org/repos/asf/lucene/solr/
(Thanks Doug!)

The easiest way to change your working directories is to use "svn switch". For example, if you have the "trunk" of solr checked out, cd to that directory and execute

  svn switch https://svn.apache.org/repos/asf/lucene/solr/trunk

Don't forget to change any SVN paths that may be configured in your IDEs too.

-Yonik
[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465689 ]

Hoss Man commented on SOLR-112:
-------------------------------

Damn Ryan ... you keep taking on cool features and churning out patches too fast for me to read them!

this sounds like a cool idea, the one big caveat is documenting exactly how the NamedList "merge" method you wrote is expected to work ... ie:
* what does it do if both named lists have the same key?
* does it do deep merging of nested named lists/collections?
* what does it do if one list has an element without a name (first and foremost a NamedList is an ordered list after all - the names are optional)?

..as far as unit tests go, the easiest way to test something like this is to start by writing a unit test of just the NamedList merging logic -- independent of anything else (this class would be a good place to put a test of the SOLR-107 changes too by the way). next would be to test that the merge logic is getting used as you expect, with a test that uses a config file with several handlers all inheriting various properties from one another, and then a test that does queries against them -- the easiest way to validate that the init params are getting inherited correctly would probably be to use an "EchoConfigRequestHandler" that looked something like this...
public class EchoConfigRequestHandler implements SolrRequestHandler {
  private NamedList initParams;
  public void init(NamedList nl) { initParams = nl; }
  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
    rsp.add("initParams", initParams);
  }
}

the AbstractSolrTestCase makes it easy for you to use any solrconfig file you want by overriding a method -- so you could even write one test case using one config file with lots of examples of inherited init params and a test method for each asserting that the params are what is expected, and then subclass it with another test class that's exactly the same except for using a different solrconfig file where all of the params are duplicated out explicitly -- testing your assumptions so to speak.

> Hierarchical Handler Config
> ---------------------------
>
>                 Key: SOLR-112
>                 URL: https://issues.apache.org/jira/browse/SOLR-112
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.2
>            Reporter: Ryan McKinley
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: SOLR-112.patch
>
>
> From J.J. Larrea on SOLR-104
> 2. What would make this even more powerful would be the ability to "subclass"
> (meaning refine and/or extend) request handler configs: If the requestHandler
> element allowed an attribute extends="" and chained the SolrParams, then one
> could do something like:
>
>   <requestHandler name="search/products/all" class="solr.DisMaxRequestHandler">
>     <lst name="defaults">
>       <float name="tie">0.01</float>
>       <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
>       ... much more, per the "dismax" example in the sample solrconfig.xml ...
>     </lst>
>   </requestHandler>
>
> ... and replacing the "partitioned" example ...
>
>   <requestHandler name="..." extends="search/products/all">
>     <lst name="defaults">
>       <str name="fq">inStock:true</str>
>     </lst>
>   </requestHandler>

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
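The merge-semantics questions Hoss raises above could be pinned down with a small sketch. Assuming a simplified stand-in for NamedList (this is not Solr's actual class, and the merge contract shown is just one possible answer to his questions): entries with matching names are replaced in place, everything else -- including unnamed entries -- is appended in order, and nested lists are replaced wholesale rather than merged recursively:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for Solr's NamedList: an ordered list of (name, value)
// pairs where names may repeat and may even be null.
class SimpleNamedList {
    private final List<String> names = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    void add(String name, Object value) {
        names.add(name);
        values.add(value);
    }

    int indexOf(String name) {
        for (int i = 0; i < names.size(); i++) {
            String n = names.get(i);
            if (n == null ? name == null : n.equals(name)) return i;
        }
        return -1;
    }

    Object get(String name) {
        int i = indexOf(name);
        return i < 0 ? null : values.get(i);
    }

    int size() { return names.size(); }

    // One possible "merge" contract: entries from 'overrides' replace
    // same-named entries in place; unmatched entries (including unnamed
    // ones, which never match) are appended, preserving order. Shallow
    // only -- nested lists are replaced wholesale, not merged recursively.
    void merge(SimpleNamedList overrides) {
        for (int i = 0; i < overrides.size(); i++) {
            String name = overrides.names.get(i);
            int idx = name == null ? -1 : indexOf(name);
            if (idx >= 0) {
                values.set(idx, overrides.values.get(i));
            } else {
                add(name, overrides.values.get(i));
            }
        }
    }
}
```

With this contract, a "partitioned" handler extending the dismax example would keep the parent's qf but get its own fq appended.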
RE: Update Plugins (was Re: Handling disparate data sources in Solr)
: > With all this talk about plugins, registries etc., /me can't help
: > thinking that this would be a good time to introduce the Spring IoC
: > container to manage this stuff.

I don't have a lot of familiarity with Spring except for the XML configuration file used for telling the Spring context what objects you want it to create on startup, what constructor args to pass them, what methods to call, and so on -- with an easy ability to tell it to pass one object you had it construct as a param to another object you are having it construct.

on the whole, it seems really nice, and eventually using it to replace a lot of the home-grown configuration in Solr would probably make a lot of sense ... but i don't think migrating to Spring is necessary as part of the current push to support more configurable plugins for updates ... Solr already has a pretty decent set of utilities for allowing class instances to be specified in the xml config file and have configuration arguments passed to them on initialization ... it's not as fancy as Spring and it doesn't support as many features, but it works well enough that it should be easy to use with the new plugins we start to add -- switching to Spring right now would probably only complicate the issues, and probably wouldn't make adding Update plugins any easier.

equally important: adding a few new types of plugins now probably won't make it any harder to switch to something like Spring later ... which as i said, is something i definitely anticipate happening

-Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: I think the confusion is that (in my view) the RequestParser is the
: *only* object able to touch the stream. I don't think anything should
: happen between preProcess() and process(); A RequestParser converts a
: HttpServletRequest to a SolrRequest. Nothing else will touch the
: servlet request.

that makes it the RequestParser's responsibility to dictate the URL format (if it's the only one that can touch the HttpServletRequest)

i was proposing a method by which the Servlet could determine the URL format -- there could in fact be multiple servlets supporting different URL formats if we had some need for it -- and the RequestParser could generate streams based on the raw POST data and/or any streams it wants to find based on the SolrParams generated from the URL (ie: local files, remote resources, etc)

I'm confused by your sentence "A RequestParser converts a HttpServletRequest to a SolrRequest." .. i thought you were advocating that the servlet parse the URL to pick a RequestHandler, and then the RequestHandler dictates the RequestParser?

: /path/registered/in/solr/config:requestparser?params
:
: If no ':' is in the URL, use 'standard' parser
:
: 1. The URL path determines the RequestHandler
: 2. The URL path determines the RequestParser
: 3. SolrRequest = RequestParser.parse( HttpServletRequest )
: 4. handler.handleRequest( req, res );
: 5. write the response

do you mean the path before the colon determines the RequestHandler and the path after the colon determines the RequestParser? ... that would work fine too ...
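Steps 1 and 2 of the scheme quoted above (splitting the handler path from the parser name at the colon) might look like the following sketch. Everything here is illustrative, not actual Solr code; it just defaults to "standard" when no ':' appears, as the proposal suggests:

```java
// Hypothetical helper for the "/path/registered/in/solr/config:requestparser"
// URL scheme: everything before the first ':' names the RequestHandler,
// everything after it names the RequestParser.
class DispatchSketch {
    static String[] splitPath(String pathInfo) {
        int colon = pathInfo.indexOf(':');
        if (colon < 0) {
            // no ':' in the URL -> use the 'standard' parser
            return new String[] { pathInfo, "standard" };
        }
        return new String[] { pathInfo.substring(0, colon),
                              pathInfo.substring(colon + 1) };
    }
}
```

So "/update/csv:local" would dispatch to the "/update/csv" handler with the "local" parser, while a plain "/select" falls back to the standard parser.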
i was specifically trying to avoid making any design decisions that required a particular URL structure. in what you propose we are dictating more than just the "/handler/path:parser" piece of the URL, we are also dictating that the Parser decides how the rest of the path and all URL query string data will be interpreted -- which means if we have a PostBodyRequestParser and a LocalFileRequestParser and a RemoteUrlRequestParser which all use the query string params to get the SolrParams for the request (and in the case of the last two: to know what file/url to parse), and then we decide that we want to support a URL structure that is more REST-like and uses the path for including information, now we have to write a new version of all of those RequestParsers (a subclass of each, probably) that knows what our new URL structure looks like ... even if that never comes up, every RequestParser (even custom ones written by users to use some crazy proprietary binary protocols we've never heard of to fetch streams of data) has to worry about extracting the SolrParams out of the URL.

what i'm proposing is that the Servlet decides how to get the SolrParams out of an HttpServletRequest, using whatever URL that servlet wants; the RequestParser decides how to get the ContentStreams needed for that request -- in a way that can work regardless of whether the stream is actually part of the HttpServletRequest, or just referenced by a param in the request; the RequestHandler decides what to do with those params and streams; and the ResponseWriter decides how to format the results produced by the RequestHandler back to the client.

: > : If anyone needs to customize this chain of events, they could easily
: > : write their own Servlet/Filter

: I don't *think* this would happen often, and people would only do
: it if they are unhappy with the default URL structure -> behavior
: mapping. I am not suggesting this would be the normal way to
: configure solr.
I think i'm getting confused ... i thought you were advocating that RequestParsers be implemented as ServletFilters (or Servlets) ... but if that were the case it wouldn't just be about changing the URL structure, it would be about picking new ways to get streams .. but that doesn't seem to be what you are suggesting, so i'm not sure what i was misunderstanding.

-Hoss
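Pulling together the separation of concerns argued for in this thread -- the Servlet owns URL-to-params, the RequestParser owns streams, the RequestHandler owns processing, the ResponseWriter owns formatting -- here is a toy end-to-end sketch. Every interface and name below is invented for illustration; none of it is real Solr code, and a String stands in for an actual content stream:

```java
import java.util.Map;
import java.util.function.Function;

// Toy model of the proposed pipeline. Note the RequestParser equivalent
// (StreamSource) only ever sees the already-extracted params, never the
// URL -- so changing the URL scheme only changes the ParamsSource.
class SeparationSketch {
    interface ParamsSource { Map<String, String> params(String url); }     // servlet's job
    interface StreamSource { String stream(Map<String, String> params); }  // parser's job
    interface Handler { String handle(Map<String, String> p, String s); }  // handler's job
    interface ResponseWriter extends Function<String, String> {}           // writer's job

    static String execute(String url, ParamsSource ps, StreamSource ss,
                          Handler h, ResponseWriter w) {
        Map<String, String> params = ps.params(url);   // servlet parses the URL
        String stream = ss.stream(params);             // parser fetches the stream
        return w.apply(h.handle(params, stream));      // handler + writer finish up
    }
}
```

Swapping in a REST-like URL scheme would mean replacing only the ParamsSource implementation, which is exactly the maintenance argument Hoss makes above.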