[jira] Created: (SOLR-113) Some example + post.sh in docs in client/solrb
Some example + post.sh in docs in client/solrb -- Key: SOLR-113 URL: https://issues.apache.org/jira/browse/SOLR-113 Project: Solr Issue Type: Wish Components: clients - ruby - flare Environment: OSX 10.4 Reporter: Antonio Eggberg Priority: Trivial I tried Flare today, really nice :-) It would be nice to add some example docs, like the current Solr distro, for the Ruby/Flare client. If I understand correctly, the exampledocs in Solr (i.e. /example/exampledocs) are not compatible with solrb. Maybe I am doing something wrong? If so, please clarify and delete the issue. The issue is not so important, but good for the folks that are impatient. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Can this be achieved? (Was: document support for file system crawling)
: What I am really talking about, is this: There is a growing market for : simple search solutions that can work out of the box, and that can still : be customized. Something that: : - organizations can use on their network, out of the box : I am not looking to change Solr in that direction. But take a look at : Solr. Or Nutch. They are already built on Lucene and many other : projects. Why/not build something on top of this? Something more/else? : : I don't think that anyone is arguing that this product shouldn't exist : in the open-source world, just that it shouldn't be part of Solr's : mandate. It sounds like a cool project (though the closer you get to Exactly. Eivind: earlier in this thread, you were talking about having more crawling features and document parsing features built in to Solr, and i got the impression that you didn't like the idea that they could be loosely coupled external applications ... but if your interest is in having an enterprise search solution that people can deploy on a box and have it start working for them, then there is no reason for all of that code to run in a single JVM using a single code base -- i'm going to go out on a limb and guess that the Google Appliances run more than a single process :) given a collection of loosely coupled pieces, including Solr, including Nutch, including whatever future document parsing contribs might be written for either Solr or Nutch ... you could bundle them all together into an enterprise search system that when installed deployed them all and coupled them together and had a GUI for configuring them ... but that would be a separate project from Solr -- just as Solr and Nutch are separate projects from Java-Lucene ... it's all about layers built on top of layers that allow for reuse. -Hoss
Re: Can this be achieved? (Was: document support for file system crawling)
On 1/19/07 10:33 AM, Chris Hostetter [EMAIL PROTECTED] wrote: [...] but if your interest is in having an enterprise search solution that people can deploy on a box and have it start working for them, then there is no reason for all of that code to run in a single JVM using a single code base -- i'm going to go out on a limb and guess that the Google Appliances run more than a single process :) Ultraseek does exactly that and is a single multi-threaded process. A single process is much easier for the admin. A multi-process solution is more complicated to start up, monitor, shut down, and upgrade. There is decent demand for a spidering enterprise search engine. Look at the Google Appliance, Ultraseek, and IBM OmniFind. The free IBM OmniFind Yahoo! Edition uses Lucene. I'd love to see the Ultraseek spider connected to Solr, but that depends on Autonomy. wunder -- Walter Underwood Search Guru, Netflix
RE: separate log files
: I'm running multiple instances of Solr, which are all using the same war : file to load from. To log to separate files I implemented the following : kludge. Ben: I'm glad you managed to get your situation working, but did you try the instructions on the Tomcat documentation page about configuring separate loggers per context? if it didn't work, did you try mailing the tomcat user list? what you have here is definitely a kludge as you say ... and not something i would recommend in general ... for starters, it assumes there will always be a logging.properties file; besides the possibility that it won't be there, this also doesn't play nicely with the possibility of someone using the java.util.logging.config.file or java.util.logging.config.class properties ... not to mention the fact that Servlet containers are totally within their right to control logging programmatically using the public LogManager APIs based on configuration options from their own config files well before any applications are initialized ... and this approach would undo any of that configuration -- which could break the servlet container's own logs, not just the logging info from the individual webapps.
: !--start SolrServlet.java.diff--
: 23d22
: import org.apache.solr.request.SolrQueryResponse;
: 24a24
: import org.apache.solr.request.SolrQueryResponse;
: 33a34,36
: import java.io.ByteArrayInputStream;
: import java.io.ByteArrayOutputStream;
: 34a38,39
: import java.io.InputStream;
: import java.io.OutputStream;
: 35a41,42
: import java.util.Properties;
: import java.util.logging.LogManager;
: 47a55,80
: /*
:  * switch java.util.logging.Logger appenders
:  *
:  * Add the following to the web context file
:  * <Environment name="solr/log-prefix" type="java.lang.String" value="log-prefix." override="false" />
:  */
: private void switchAppenders(String prefix) {
:     String logParam = "org.apache.juli.FileHandler.prefix";
:     log.info("switching appender to " + logParam + "=" + prefix);
:     Properties props = new Properties();
:     try {
:         InputStream configStream = getClass().getResourceAsStream("/logging.properties");
:         props.load(configStream);
:         configStream.close();
:         props.setProperty(logParam, prefix);
:         ByteArrayOutputStream os = new ByteArrayOutputStream();
:         props.store((OutputStream)os, "LOGGING PROPERTIES");
:         LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(os.toByteArray()));
:         log.info("props: " + props.toString());
:     }
:     catch(Exception e) {
:         String errMsg = "Error: Cannot load configuration file; Cause: " + e.getMessage();
:         log.info(errMsg);
:     }
: }
:
: 48a82
:
: 52c86,91
:
: ---
:
: // change the logging properties
: String prefix = (String)c.lookup("java:comp/env/solr/log-prefix");
: if (prefix != null)
:     switchAppenders(prefix);
:
: 64a104
:
: !--end SolrServlet.java.diff--
: : : -Original Message- : From: Chris Hostetter [mailto:[EMAIL PROTECTED] : Sent: Wednesday, 17 January 2007 6:04 AM : To: solr-user@lucene.apache.org : Subject: Re: separate log files : : : : I wonder if jetty or tomcat can be configured to put logging output : : for different webapps in different log files... : : i've never tried it, but the tomcat docs do talk about tomcat : providing a custom implementation of java.util.logging : specifically for this purpose. : : Ben: please take a look at this doc... : : http://tomcat.apache.org/tomcat-5.5-doc/logging.html : : ..specifically the section on java.util.logging (since that's : what Solr : uses) ... I believe you'll want something like the Example : logging.properties file to be placed in common/classes so : that you can control the logging. : : Please let us all know if this works for you ... it would : make a great addition to the SolrTomcat wiki page.
: : : : On 1/15/07, Ben Incani [EMAIL PROTECTED] wrote: : : Hi Solr users, : : : : I'm running multiple instances of Solr, which are all using : the same war : : file to load from. : : : : Below is an example of the servlet context file used for each : : application. : : : : <Context path="/app1-solr" docBase="/var/usr/solr/solr-1.0.war" : : debug="0" crossContext="true"> : : <Environment name="solr/home" type="java.lang.String" : : value="/var/local/app1" override="true" /> : : </Context> : : : : Hence each application is using the same : : WEB-INF/classes/logging.properties file to configure logging. : : : : I would like each instance to log to separate log : files such as; : : app1-solr.-mm-dd.log : : app2-solr.-mm-dd.log : : ... : : : : Is there an easy way to append
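For reference, the per-context configuration that the Tomcat 5.5 logging docs describe (and that Hoss recommends over the kludge above) looks roughly like this; this is a sketch, and the exact levels and directory are placeholders to adapt:

```properties
# Hypothetical WEB-INF/classes/logging.properties for one Solr webapp
# instance under Tomcat 5.5's JULI. Each instance gets its own copy with a
# different "prefix", yielding separate log files per webapp.
handlers = org.apache.juli.FileHandler

org.apache.juli.FileHandler.level = FINE
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = app1-solr.
```

With this in place no code changes to SolrServlet are needed; the container picks the file per classloader, which is what the SolrTomcat wiki suggestion amounts to.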
Re: Can this be achieved? (Was: document support for file system crawling)
On 1/19/07, Walter Underwood [EMAIL PROTECTED] wrote: Ultraseek does exactly that and is a single multi-threaded process. A single process is much easier for the admin. A multi-process solution is more complicated to start up, monitor, shut down, and upgrade. There is decent demand for a spidering enterprise search engine. Look at the Google Appliance, Ultraseek, and IBM OmniFind. The free IBM OmniFind Yahoo! Edition uses Lucene. I'd love to see the Ultraseek spider connected to Solr, but that depends on Autonomy. You could accomplish this by throwing them together as various webapps in a single container instance. -Mike
[jira] Created: (SOLR-114) HashDocSet new hash(), andNot(), union()
HashDocSet new hash(), andNot(), union() Key: SOLR-114 URL: https://issues.apache.org/jira/browse/SOLR-114 Project: Solr Issue Type: Improvement Components: search Reporter: Yonik Seeley Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union(). While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.
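The "hash directly into a linear scan" idea can be illustrated with a toy open-addressing int set. This is an illustrative sketch, not the actual HashDocSet code from the patch; the hash constant, sizing, and class name are assumptions:

```java
// Toy sketch of an open-addressing set of doc ids: lookup computes one hash
// slot, then scans linearly until it hits the doc or an empty slot. No
// chaining, no second hash function -- simpler and cache-friendly.
class LinearProbeIntSet {
    private final int[] table; // stores doc+1 so that 0 marks an empty slot
    private final int mask;    // table length is a power of two

    LinearProbeIntSet(int capacity) {
        // Oversize the table to keep the load factor low (short probe runs).
        int size = Integer.highestOneBit(Math.max(capacity, 2)) * 4;
        table = new int[size];
        mask = size - 1;
    }

    private int slot(int doc) {
        int h = doc * 0x9E3779B1;       // multiplicative hash spreads the bits
        return (h ^ (h >>> 16)) & mask; // fold high bits down, mask to table
    }

    void add(int doc) {                 // doc must be >= 0
        int i = slot(doc);
        while (table[i] != 0 && table[i] != doc + 1) {
            i = (i + 1) & mask;         // linear probe, wrapping around
        }
        table[i] = doc + 1;
    }

    boolean exists(int doc) {
        int i = slot(doc);
        while (table[i] != 0) {
            if (table[i] == doc + 1) return true;
            i = (i + 1) & mask;
        }
        return false;                   // reached an empty slot: not present
    }
}
```

The termination argument is the key design point: because the table is never full, every probe run ends at an empty slot, so exists() needs no separate collision structure.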
[jira] Updated: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-114: -- Attachment: hashdocset.patch
[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466154 ] Yonik Seeley commented on SOLR-114: --- Performance results: - HashDocSet.exists() is 13% faster - HashDocSet.intersectionSize() is thus 9% faster - HashDocSet.union() is 20 times faster - HashDocSet.andNot() is 27 times faster Tested with Sun JDK6 -server on a P4
[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466160 ] Hoss Man commented on SOLR-114: --- quick questions... 1) what test did you run to get those numbers? ... even if we don't commit it, we should attach it to this Jira issue 2) we should probably test at least the Sun 1.5 JVM as well, right?
[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466166 ] Yonik Seeley commented on SOLR-114: --- The performance tests are commented out in the TestDocSet test... I had other changes in my tree related to negative queries and only selected the two source files for diffs. I had quickly tested Java5 to make sure it was still faster in all instances, and it was. Numbers were about the same, some speedups larger and some smaller than Java6.
[jira] Updated: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-114: -- Attachment: test.patch
[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466176 ] Yonik Seeley commented on SOLR-114: --- tested on an AMD Opteron, 64 bit mode, Java5 -server -Xbatch and exists() was 8.5% faster, intersectionSize() was 7% faster. I didn't bother testing union() and andNot(), as they are obviously going to be much faster.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/19/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 1/19/07, Chris Hostetter [EMAIL PROTECTED] wrote: whoa ... hold on a minute, even if we use a ServletFilter to do all of the dispatching instead of a Servlet we still need a base path right? I thought that's what the filter gave you... the ability to filter any URL to the /solr webapp, and Ryan was doing a lookup on the next element for a request handler. yes, this is the beauty of a Filter. It *can* process the request and/or it can pass it along. There is no problem at all with mapping a filter to all requests and a servlet to some paths. The filter will only handle paths declared in solrconfig.xml; everything else will be handled however it is defined in web.xml (As a sidenote, wicket 2.0 replaces their dispatch servlet with a filter - it makes it MUCH easier to have their app co-exist with other things in a shared URL structure.)
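The filter behavior Ryan describes (consume registered paths, pass everything else along the chain) can be modeled without a servlet container; the class and method names below are illustrative, not Solr's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// A container-free model of the dispatch-filter pattern: paths registered
// (e.g. from solrconfig.xml) are handled by the filter itself; any other
// path falls through to the rest of the chain (e.g. /admin JSPs in web.xml).
interface HandlerModel {
    String handle(String path);
}

class DispatchFilterModel {
    private final Map<String, HandlerModel> registered = new HashMap<>();
    private final HandlerModel passThrough; // stands in for chain.doFilter(...)

    DispatchFilterModel(HandlerModel passThrough) {
        this.passThrough = passThrough;
    }

    void register(String path, HandlerModel h) {
        registered.put(path, h);
    }

    String dispatch(String path) {
        HandlerModel h = registered.get(path);
        // Registered path: the filter consumes the request.
        // Unknown path: let the container's normal mappings handle it.
        return (h != null) ? h.handle(path) : passThrough.handle(path);
    }
}
```

In a real Filter, passThrough would be chain.doFilter(request, response); the point is that mapping the filter to /* costs nothing for unregistered URLs.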
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
then all is fine and dandy ... but what happens if someone tries to configure a plugin with the name admin ... now all of the existing admin pages break. that is exactly what you would expect to happen if you map a handler to /admin. The person configuring solrconfig.xml is saying Hey, use this instead of the default /admin. I want mine to make sure you are logged in using my custom authentication method. In addition, it may be reasonable (sometime in the future) to implement /admin as a RequestHandler. This could be a clean way to address SOLR-58 (xml with stylesheets, or JSON, etc...) also: what happens a year from now when we add some completely new Servlet/ServletFilter to Solr, and want to give it a unique URL... http://host:/solr/bar/ obviously, I think the default solr settings should be prudent about selecting URLs. The standard configuration should probably map most things to /select/xxx or /update/xxx. ...we could put it earlier in the processing chain before the existing ServletFilter, but then we break any users that have registered a plugin with the name bar. Even if we move this to have a prefix path, we run into the exact same issue when sometime down the line solr has a default handler mapped to 'bar': /solr/dispatcher/bar But, if it ever becomes a problem, we can add an excludes pattern to the filter-config that would skip processing even if it maps to a known handler. more short term: if there is no prefix that the ServletFilter requires, then supporting the legacy http://host:/solr/update and http://host:/solr/select URLs becomes harder, I don't think /update or /select need to be legacy URLs. They can (and should) continue to work as they currently do using a new framework. The reason I was suggesting that the Handler interface adds support to ask for the default RequestParser and/or ResponseWriter is to support this exact issue.
(However in the case of path=/select the filter would need to get the handler from ?qt=xxx) - - - - - All that said, this could just as cleanly map everything to: /solr/dispatch/update/xml /solr/cmd/update/xml /solr/handle/update/xml /solr/do/update/xml thoughts?
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/19/07, Ryan McKinley [EMAIL PROTECTED] wrote: All that said, this could just as cleanly map everything to: /solr/dispatch/update/xml /solr/cmd/update/xml /solr/handle/update/xml /solr/do/update/xml thoughts? That was my original assumption (because I was thinking of using servlets, not a filter), but I see little advantage to scoping under additional path elements. I also agree with the other points you make. -Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
(Note: this is different than what i have suggested before. Treat it as brainstorming on how to take what i have suggested and mesh it with your concerns) What if: The RequestParser would not be part of the core API - it would be a helper function for Servlets and Filters that call the core API. It could be configured in web.xml rather than solrconfig.xml. A RequestDispatcher (Servlet or Filter) would be configured with a single RequestParser. The RequestParser would be in charge of taking the HttpRequest and determining: 1) The RequestHandler 2) The SolrRequest (Params + Streams) It would not be the most 'pluggable' of plugins, but I am still having trouble imagining anything beyond a single default RequestParser. Assuming anything doing *really* complex ways of extracting ContentStreams will do it in the Handler, not the request parser. For reference see my argument for a separate DocumentParser interface in: http://www.nabble.com/Re%3A-Update-Plugins-%28was-Re%3A-Handling-disparate-data-sources-in-Solr%29-p8386161.html In my view, the default one could be mapped to /* and a custom one could be mapped to /mycustomparser/* This would drop the ':' from my proposed URL and change the scheme to look like: /parser/path/the/parser/knows/how/to/extract/?params This would give people a relatively easy way to implement 'restful' URLs if they need to. (but they would have to edit web.xml) : Would that be configured in solrconfig.xml as <handler name="xml">? : name="update/xml"? If it is update/xml would it only really work if : the 'update' servlet were configured properly? it would only make sense to map that as xml ... the SolrCore (and the solrconfig.xml) shouldn't have any knowledge of the Servlet/ServletFilter base paths because it should be possible to use the SolrCore independent of any ServletContainer (if for no other reason than in unit tests) Correct, SolrCore should not care what the request path is.
That is why I want to deprecate the execute( ) function that assumes the handler is defined by 'qt' Unit tests should be handled by execute( handler, req, res ) If I had my druthers, It would be: res = handler.execute( req ) but that is too big of a leap for now :) ... A third use case of doing queries with POST might be that you want to use standard CGI form encoding/multi-part file upload semantics of HTTP to send an XML file (or files) to the above mentioned XmlQPRequestHandler ... so then we have MultiPartMimeRequestParser ... I agree with all your use cases. It just seems like a LOT of complex overhead to extract the general aspects of translating a URL+Params+Streams = Handler+Request(Params+Streams) Again, since the number of 'RequestParsers' is small, it seems overly complex to have a separate plugin to extract URL, another to extract the Handler, and another to extract the streams. Particularly since the decisions on how you parse the URL can totally affect the other aspects. ...i really, really, REALLY don't like the idea that the RequestParser Impls -- classes users should be free to write on their own and plug in to Solr using the solrconfig.xml -- are responsible for the URL parsing and parameter extraction. Maybe calling them RequestParser in my suggested design is misleading, maybe a name like StreamExtractor would be better ... but they shouldn't be in charge of doing anything with the URL. What if it were configured in web.xml, would you feel more comfortable letting it determine how the URL is parsed and streams are extracted? Imagine if 3 years ago, when Yonik and I were first hammering out the API for SolrRequestHandlers, we had picked this... public interface SolrRequestHandlers extends SolrInfoMBean { public void init(NamedList args); public void handleRequest(HttpServletRequest req, SolrQueryResponse rsp); } Thank goodness you didn't! I'm confident you won't let me (or anyone) talk you into something like that!
You guys made a lot of good choices and solr is an amazing platform for it. That said, the task at issue is: How do we convert an arbitrary HttpServletRequest into a SolrRequest? I am proposing we have a single interface to do this: SolrRequest r = RequestParser.parse( HttpServletRequest ) You are proposing this is broken down further. Something like: Handler h = (the filter) getHandler( req.getPath() ) SolrParams = (the filter) do stuff to extract the params (using parser.preProcess()) ContentStreams = parser.parse( request ) While it is not great to have plugins manipulate the HttpRequest - someone needs to do it. In my opinion, the RequestParser's job is to isolate *everything* *else* from the HttpServletRequest. Again, since the number of RequestParsers is small, it seems ok (to me). : keeping HttpServletRequest out of the API for RequestParsers helps us future-proof against breaking plugins down the road. I agree. This is why i suggest the RequestParser is not a core part of the API, just a helper class
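The "single interface" proposal (SolrRequest r = RequestParser.parse( HttpServletRequest )) can be sketched, container-free, like this; all class and method names are illustrative rather than actual Solr APIs, and the qt fallback for /select follows the behavior discussed earlier in the thread:

```java
import java.util.Map;

// A container-free model of Ryan's proposed single parsing interface.
// SolrRequestModel stands in for SolrRequest; the real version would also
// carry ContentStreams extracted from the HTTP body.
class SolrRequestModel {
    final String handlerName;
    final Map<String, String> params;

    SolrRequestModel(String handlerName, Map<String, String> params) {
        this.handlerName = handlerName;
        this.params = params;
    }
}

interface RequestParserModel {
    SolrRequestModel parse(String path, Map<String, String> params);
}

// The default parser: the handler name comes from the path (e.g. /update/xml
// maps to the handler registered as "update/xml"), except for the legacy
// /select URL, where it comes from the qt parameter.
class DefaultParserModel implements RequestParserModel {
    public SolrRequestModel parse(String path, Map<String, String> params) {
        if (path.equals("/select") && params.containsKey("qt")) {
            return new SolrRequestModel(params.get("qt"), params);
        }
        return new SolrRequestModel(path.substring(1), params); // strip leading '/'
    }
}
```

Since a dispatch servlet/filter holds exactly one parser, swapping in a custom URL scheme means registering a different RequestParserModel, not touching the core.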
Re: [jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()
On 1/19/07, Yonik Seeley (JIRA) [EMAIL PROTECTED] wrote: [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466176 ] Yonik Seeley commented on SOLR-114: --- tested on an AMD opteron, 64 bit mode, Java5 -server -Xbatch and exists() was 8.5% faster, intersectionSize() was 7% faster. I didn't bother testing union(), andNot(), as they are obviously going to be much faster. Nice job! -Mike
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
First Ryan, thank you for your patience on this *very* long hash session. Most wouldn't last that long unless it were a flame war ;-) And thanks to Hoss, who seems to have the highest read+response bandwidth of anyone I've ever seen (I'll admit I've only been selectively reading this thread, with good intentions of coming back to it). On 1/19/07, Ryan McKinley [EMAIL PROTECTED] wrote: It would not be the most 'pluggable' of plugins, but I am still having trouble imagining anything beyond a single default RequestParser. Assuming anything doing *really* complex ways of extracting ContentStreams will do it in the Handler not the request parser. Agreed... a custom handler opening various streams not covered by the default will most easily be handled by the handler opening the streams themselves. This would give people a relatively easy way to implement 'restful' URLs if they need to. (but they would have to edit web.xml) A handler could alternately get the rest of the path (absent params), right? Correct, SolrCore should not care what the request path is. That is why I want to deprecate the execute( ) function that assumes the handler is defined by 'qt' Unit tests should be handled by execute( handler, req, res ) How does the unit test get the handler? If I had my druthers, It would be: res = handler.execute( req ) but that is too big of a leap for now :) Yep... esp since the response writers now need the request for parameters, for the searcher (streaming docs, etc). You guys made a lot of good choices and solr is an amazing platform for it. I just wish I had known Lucene when I *started* Sol(a)r ;-) I am proposing we have a single interface to do this: SolrRequest r = RequestParser.parse( HttpServletRequest ) That's currently what new SolrServletRequest(HttpServletRequest) does. We just need to figure out how to get InputStreams, Readers, etc. I agree. This is why i suggest the RequestParser is not a core part of the API, just a helper class for Servlets and Filters.
Sounds good as a practical starting point to me. If we need more in the future, we can add it then. USECASE: The XML update plugin using the woodstox XML parser: Woodstox docs say to give the parser an InputStream (with char encoding, if available) for best performance. This is also preferable since if the charset isn't specified, the parser can try to snoop it from the stream. So, the handler needs to be able to get an InputStream, and HTTP headers. Other plugins (CSV) will ask for a Reader and expect the details to be ironed out for it. Method1: come up with ways to expose all this info through an interface... a headers object could be added to the SolrRequest context (see getContext()) Method2: consider it a more special case, have an XML update servlet that puts that info into the SolrRequest (perhaps via the context again) -Yonik
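The Woodstox point can be demonstrated with the standard StAX API (which Woodstox implements): hand the parser the raw InputStream rather than a Reader, so it can detect the encoding from the XML declaration itself. A minimal sketch; the helper class is hypothetical, not Solr code:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.InputStream;

// Feed the StAX parser bytes, not chars: createXMLStreamReader(InputStream)
// lets the parser sniff the charset from the stream/XML declaration, which
// is what the Woodstox docs recommend for performance and correctness.
class FirstElement {
    static String firstElementName(InputStream in) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    return r.getLocalName(); // e.g. the root of an update message
                }
            }
            return null;
        } catch (XMLStreamException e) {
            return null; // malformed input; a real handler would report this
        }
    }
}
```

A Reader-based overload also exists for plugins (like CSV) that want the charset details ironed out for them, which matches the two cases in the message above.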
[jira] Created: (SOLR-115) replace BooleanQuery.getClauses() with clauses()
replace BooleanQuery.getClauses() with clauses() Key: SOLR-115 URL: https://issues.apache.org/jira/browse/SOLR-115 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Priority: Minor Basically, take advantage of http://issues.apache.org/jira/browse/LUCENE-745 after we update lucene versions.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: First Ryan, thank you for your patience on this *very* long hash I could not agree more ... as i was leaving work this afternoon, it occurred to me I really hope Ryan realizes i like all of his ideas, i'm just wondering if they can be better -- most people I work with don't have the stamina to deal with my design reviews :) What occurred to me as i was *getting* home was that since I seem to be the only one that's (overly) worried about the RequestParser/HTTP abstraction -- and since i haven't managed to convince Ryan after all of my badgering -- it's probably just me being paranoid. I think in general, the approach you've outlined should work great -- i'll reply to some of your more recent comments directly. -Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/19/07, Chris Hostetter [EMAIL PROTECTED] wrote: : First Ryan, thank you for your patience on this *very* long hash I could not agree more ... as i was leaving work this afternoon, it occured to me I really hope Ryan realizes i like all of his ideas, i'm just wondering if they can be better -- most people I work with don't have the stamina to deal with my design reviews :) Thank you both! This is the first time I've taken the time and effort to contribute to an open source project. I'm learning the pace/etiquette etc as I go along :) Honestly your critique is refreshing - I'm used to working alone or directing others. I *think* we are close to something we will all be happy with.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
what!? .. really? ... you don't think the ones i mentioned before are things we should support out of the box? - no stream parser (needed for simple GETs) - single stream from raw post body (needed for current updates) - multiple streams from multipart mime in post body (needed for SOLR-85) - multiple streams from files specified in params (needed for SOLR-66) - multiple streams from remote URL specified in params I have imagined the single default parser handles *all* the cases you just mentioned. GET: read params from paramMap(). Check those params for special params that send you to one or many remote streams. POST: depending on headers/content type etc you parse the body as a single stream, multi-part files or read the params. It will take some careful design, but I think all the standard cases can be handled by a single parser.
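The GET/POST case analysis above can be sketched as a single chooser. The parameter names (stream.file, stream.url) and the enum are illustrative placeholders, not settled API:

```java
import java.util.Map;

// Model of the "single default parser handles all the cases" idea: decide
// where the content streams come from based on the shape of the request.
enum StreamSource { NONE, RAW_POST_BODY, MULTIPART, LOCAL_FILE, REMOTE_URL }

class StreamSourceChooser {
    static StreamSource choose(String method, String contentType,
                               Map<String, String> params) {
        // Params pointing at streams work for GET or POST (SOLR-66 style).
        if (params.containsKey("stream.file")) return StreamSource.LOCAL_FILE;
        if (params.containsKey("stream.url"))  return StreamSource.REMOTE_URL;
        // A plain GET carries no body at all.
        if (!"POST".equals(method)) return StreamSource.NONE;
        // Multipart file upload (SOLR-85 style) vs. today's raw /update body.
        if (contentType != null && contentType.startsWith("multipart/")) {
            return StreamSource.MULTIPART;
        }
        return StreamSource.RAW_POST_BODY;
    }
}
```

Each branch maps to one bullet in the list above, which is why one carefully designed default parser can cover all of them.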
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Ryan McKinley [EMAIL PROTECTED] wrote:
: : what!? .. really? ... you don't think the ones i mentioned before are
: : things we should support out of the box?
: :  - no stream parser (needed for simple GETs)
: :  - single stream from raw post body (needed for current updates)
: :  - multiple streams from multipart mime in post body (needed for SOLR-85)
: :  - multiple streams from files specified in params (needed for SOLR-66)
: :  - multiple streams from remote URLs specified in params
:
: I have imagined the single default parser handling *all* the cases you
: just mentioned.

Yes, this is what I had envisioned. And if we come up with another cool standard one, we can add it and all the current/older handlers get that additional behavior for free.

-Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: : This would drop the ':' from my proposed URL and change the scheme to
: : look like: /parser/path/the/parser/knows/how/to/extract/?params
:
: i was totally okay with the : syntax (although we should double check
: if ':' is actually a legal unescaped URL character) .. but i'm confused
: by this new suggestion ... is "parser" the name of the parser in that
: example, and "path/the/parser/knows/how/to/extract" data that the
: parser may use to build the SolrRequest with? (ie: perhaps the
: RequestHandler) would parser names be required to not have slashes in
: them in that case?

(Working with the assumption that most cases can be defined by a single request parser.)

I am/was suggesting that a dispatch servlet/filter has a single request parser. The default request parser will choose the handler based on names defined in solrconfig.xml. If someone needs a custom RequestParser, it would be linked to a new servlet/filter (possibly) mapped to a distinct prefix. If it is not possible to handle most standard stream cases with a single request parser, I will go back to the /path:parser format.

I suggest it is configured in web.xml because that is a configurable place that is not solrconfig.xml. I don't think it is, or should be, a highly configurable component.

: : Thank goodness you didn't! I'm confident you won't let me (or anyone)
: : talk you into something like that! You guys made a lot of good
:
: the point i was trying to make is that if we make a RequestParser
: interface with a parseRequest(HttpServletRequest req) method, it
: amounts to just as much badness -- the key is we *can* make that
: interface as long as all the implementations are in the Solr code base
: where we can keep an eye on them, and people have to go way, WAY,
: *WAY* into Solr to start changing them.

Yes, implementing a RequestParser is more like writing a custom Servlet than adding a Tokenizer.
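For the /path:parser fallback format mentioned above, the URL-splitting step is simple enough to sketch. This is a hypothetical helper, not Solr code; the class name and the convention that everything after the last ':' in the path is the parser name are assumptions for illustration.

```java
public class PathParserSplit {

    /**
     * Split a path like "/update:xml" into { handlerPath, parserName }.
     * If the path has no ':', the parser name is null and the default
     * request parser would be used.
     */
    public static String[] split(String path) {
        int i = path.lastIndexOf(':');
        if (i < 0) {
            return new String[] { path, null };
        }
        return new String[] { path.substring(0, i), path.substring(i + 1) };
    }
}
```

(For the side question in the quote: RFC 3986 does permit ':' unescaped in a path segment, though some client libraries handle it inconsistently.)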
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: : I have imagined the single default parser handling *all* the cases you
: : just mentioned.
:
: A ... a lot of confusing things make more sense now ... but some things
: are more confusing: if there is only one parser, and it decides what to
: do based entirely on param names and HTTP headers, then what's the
: point of having the parser name be part of the path in your URL design?

I didn't think it would be part of the URL anymore.

: : POST: depending on headers/content type etc., parse the body as a
: : single stream or as multi-part files, or read the params.
: :
: : It will take some careful design, but I think all the standard cases
: : can be handled by a single parser.
:
: that scares me ... not only does it rely on the client code sending the
: correct content-type

Not really... that would perhaps be the default, but the parser (or a handler) can make intelligent decisions about that. If you put the parser in the URL, then there's *that* to be messed up by the client.

: (i don't trust HTTP client code -- but for the sake of argument let's
: assume all clients are perfect) what happens when a person wants to
: send a MIME multi-part message *AS* the raw post body -- so the
: RequestHandler gets it as a single ContentStream (ie: single input
: stream, mime type of multipart/mixed)?

Multi-part posts will have the content-type set correctly, or it won't work. The big use-case I see is browser file upload, and browsers will set it correctly.

: This may sound like a completely ridiculous idea, but consider the
: situation where someone is indexing email ... they've written a
: RequestHandler that knows how to parse multipart MIME emails and
: convert them to documents; they want to POST them directly to Solr and
: let their RequestHandler deal with them as a single entity.

We should not preclude wacky handlers from doing things for themselves, calling our stuff as utility methods.
: ..i think life would be a lot simpler if we kept the RequestParser name
: as part of the URL, completely determined by the client (since the
: client knows what it's trying to send) ... even if there are only 2 or
: 3 types of RequestParsing being done.

Having to do different types of posts to different URLs doesn't seem optimal, especially if we can do it in one.

-Yonik
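The index-raw-MIME-email case raised in this exchange could be accommodated without putting the parser name in the URL, by letting a request param override the content-type-based decision. This is purely a sketch of that compromise; the param name `stream.type=raw` and the helper below are invented for illustration, not part of any proposal in the thread.

```java
import java.util.Map;

public class RawBodyOverrideSketch {

    /**
     * True if the post body should reach the RequestHandler as one
     * untouched ContentStream, even when the Content-Type says
     * multipart -- e.g. a handler that indexes raw multipart MIME
     * email wants the whole message as a single entity.
     */
    public static boolean treatBodyAsSingleStream(String contentType,
                                                  Map<String, String> params) {
        // Hypothetical client override: force raw-body handling.
        if ("raw".equals(params.get("stream.type"))) {
            return true;
        }
        // Otherwise fall back on the header-based default.
        return contentType == null || !contentType.startsWith("multipart/");
    }
}
```

This keeps a single URL and a single default parser, while still letting the client (which "knows what it's trying to send") opt out of automatic multipart parsing.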