Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On Fri, 2007-01-12 at 15:41 -0500, Yonik Seeley wrote:

On 1/10/07, Chris Hostetter [EMAIL PROTECTED] wrote:

The one hitch i think to the notion that updates and queries map cleanly with something like this...

   SolrRequestHandler = SolrUpdateHandler
   SolrQueryRequest = SolrUpdateRequest
   SolrQueryResponse = SolrUpdateResponse (possibly the same class)
   QueryResponseWriter = UpdateResponseWriter (possibly the same class)

...is that with queries, the input tends to be fairly simple. very generic code can be run by the query Servlet to get all of the input params and build the SolrQueryRequest ... but with updates this isn't quite as simple. there's the two issues i spoke of in my earlier mail which should be independently configurable:

  1) where does the stream of update data come from? is it in the raw POST body? is it in a POSTed multi-part MIME part? is it a remote resource referenced by URL?
  2) how should the raw binary stream of update data be parsed? is it XML? (in the current update format) is it a CSV file? is it a PDF?

...#2 can be what the SolrUpdateHandler interface is all about -- when hitting the update url you specify a ut (update type) that determines that logic ... but it should be independent of #1

Right, you're getting at issues of why I haven't committed my CSV handler yet. It currently handles reading a local file (this is more like an SQL update handler... only a reference to the data is passed). But I also wanted to be able to handle a POST of the data, or even a file upload from a browser. Then I realized that this should be generic... the same should also apply to XML updates, and potential future update formats like JSON.

I do not see the problem here. One just needs to add a couple of lines in the upload servlet and change the csv plugin to read from an input stream (not a local file). See
https://issues.apache.org/jira/secure/attachment/12347425/solar-85.with.file.upload.diff

...
+boolean isMultipart = ServletFileUpload
+    .isMultipartContent(new ServletRequestContext(request));
...
+if (isMultipart) {
+    // Create a new file upload handler
...
+    commandReader = new BufferedReader(new InputStreamReader(stream));

Now instead of

+core.update(commandReader, responseWriter);

one would use the update handler for the format defined in the request (format=json):

UpdateHandler handler = core.lookupUpdateHandler(format);
handler.update(commandReader, responseWriter);

Or am I missing something?

salu2
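To make the last suggestion concrete, here is a minimal sketch of a format-keyed handler lookup. Everything in it is an assumption for illustration: the UpdateHandler interface, the UpdateHandlerRegistry class (standing in for the core.lookupUpdateHandler(format) call proposed above) and the toy LineCountingHandler are hypothetical and not part of Solr.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plugin contract: parse one update format from a character stream. */
interface UpdateHandler {
    void update(Reader commands, Writer response) throws IOException;
}

/** Hypothetical registry standing in for core.lookupUpdateHandler(format). */
class UpdateHandlerRegistry {
    private final Map<String, UpdateHandler> handlers = new HashMap<String, UpdateHandler>();

    void register(String format, UpdateHandler handler) {
        handlers.put(format, handler);
    }

    UpdateHandler lookupUpdateHandler(String format) {
        UpdateHandler handler = handlers.get(format);
        if (handler == null) {
            throw new IllegalArgumentException("No update handler for format: " + format);
        }
        return handler;
    }
}

/** Toy handler: counts non-empty lines as a stand-in for real CSV parsing. */
class LineCountingHandler implements UpdateHandler {
    public void update(Reader commands, Writer response) throws IOException {
        BufferedReader reader = new BufferedReader(commands);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().length() > 0) {
                count++;
            }
        }
        response.write("rows=" + count);
    }
}
```

With something like this in place, the servlet only has to build the Reader (from the POST body or a multipart item) and pass it to whichever handler is registered for the requested format.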
[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient
[ https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464709 ]

Bertrand Delacretaz commented on SOLR-86:
-----------------------------------------

I like the idea of a very simple update only client.

It's probably simple enough to create two versions, one using HttpClient and one with no dependencies apart from the JDK? I agree with Hoss that the post.sh replacement should use the latter.

IMHO it's good to show the use of HttpClient for people who're going to base more complex clients on it, and a no-dependencies client is useful for simple cases.

Maybe (thinking out loud here) both clients could implement a common SolrUpdateClientInterface, and update+search clients would implement a SolrSearchInterface as well.

[PATCH] standalone updater cli based on httpClient
--------------------------------------------------

         Key: SOLR-86
         URL: https://issues.apache.org/jira/browse/SOLR-86
     Project: Solr
  Issue Type: New Feature
  Components: update
    Reporter: Thorsten Scherler
 Attachments: simple-post-using-urlconnection-approach.patch, solr-86.diff, solr-86.diff

We need a cross platform replacement for the post.sh. The attached code is a direct replacement of the post.sh since it is actually doing the same exact thing.

In the future one can extend the CLI with other features like auto commit, etc.

Right now the code assumes that SOLR-85 is applied since we are using the servlet of this issue to actually do the update.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient
[ https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464737 ]

Thorsten Scherler commented on SOLR-86:
---------------------------------------

Hi Hoss,

I had a look at your version and it is good as gold.

I personally prefer the httpClient version since the method is smaller, but Bertrand and yourself are right: the price of a dependency jar for a simple replacement is ATM too high.

The only thing that I would add is directory support:

...
+ if (srcFile.exists()) {
+     if (srcFile.isDirectory()) {
+         File[] fileSet = srcFile.listFiles();
+         for (int i = 0; i < fileSet.length; i++) {
+             File file = fileSet[i];
+             tool.postFile(file, out);
+         }
+     } else {
+         tool.postFile(srcFile, out);
+     }
+     System.out.println();
+ } else {
+     System.err.println(srcFile + " does not exist");
+ }

I agree with using your patch as the official replacement of the post.sh.

I further agree with Bertrand that we may include the httpClient patch as a base demonstration for more complex client apps.
[jira] Created: (SOLR-109) variable substitution in lucene query params
variable substitution in lucene query params
--------------------------------------------

         Key: SOLR-109
         URL: https://issues.apache.org/jira/browse/SOLR-109
     Project: Solr
  Issue Type: New Feature
    Reporter: Thorsten Scherler

Allowing variable substitution in the lucene query params seems pretty slick ... a more general solution might be to modify the SolrQueryParser directly to have a new void setParamVariables(SolrParams p) method.

http://marc.theaimsgroup.com/?t=11671237641r=1w=2
Re: switch to native locks by default?
Chris Hostetter wrote:

: Ah, I hadn't realized that they might not be supported everywhere... I

I'm just trusting the javadoc for NativeFSLockFactory ... i have no idea if it's accurate or not.

Hi! I had added the caveat about native locks based on my dicey experience getting them working over NFS (NFS locking was not turned on by default in my setup; and, frustratingly, it would take ~35 seconds for a timeout to tell me this).

I don't have any specific evidence that other OS/filesystems are problematic but then again I haven't done much research to understand overall portability of Java's native lock interface. It would not surprise me if other OS/filesystems had issues.

I was hoping by getting the NativeFSLockFactory out there that it would get some healthy testing first and then we could use that feedback to decide whether benefits outweigh the risks of making it the default. It's not clear how many people have actually tested it at this point, though!

: The current locking can also guard against mistakes though (multiple
: instances of Solr trying to write to the same dir, someone opening a
: Luke index on it, etc).

right ... but it's only useful if all of the potential clients are using the same locking mechanism ... right now it's only safe to do any of those things if all the apps use SimpleFSLockFactory.

all the more reason to make the factory and the lockDir configurable in Solr i guess.

Mike
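For readers unfamiliar with what "native locks" means at the JDK level, the following is a rough sketch built on java.nio's FileChannel.tryLock(). It is not the code of Lucene's NativeFSLockFactory; the NativeLockSketch class and its method names are made up for illustration, and behaviour on network filesystems such as NFS depends on the OS and mount configuration, as described above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical helper around an OS-level file lock (not Lucene's implementation). */
class NativeLockSketch {
    private final File lockFile;
    private RandomAccessFile raf;
    private FileLock lock;

    NativeLockSketch(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns true if this object now holds the lock, false if it is held elsewhere. */
    synchronized boolean obtain() throws IOException {
        if (lock != null) {
            return true; // already held by this object
        }
        RandomAccessFile file = new RandomAccessFile(lockFile, "rw");
        FileLock acquired = null;
        try {
            // tryLock() returns null when another process holds the lock.
            acquired = file.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // Another channel inside this JVM already holds the lock.
        } finally {
            if (acquired == null) {
                file.close();
            }
        }
        if (acquired == null) {
            return false;
        }
        raf = file;
        lock = acquired;
        return true;
    }

    /** Releases the lock and closes the underlying file, if held. */
    synchronized void release() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
            raf.close();
            raf = null;
        }
    }
}
```

A lock of this kind only protects against other writers that use the same mechanism, which is the point made above about all clients needing to agree on one lock factory.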
Re: svn commit: r496274 - /incubator/solr/trunk/src/java/org/apache/solr/core/Config.java
: Log: SolrConfig says 'system property solr.solr.home not set' in the
: log, when using default Solr home

this seems like an odd thing to call out in the log ... it implies the system property should be set, but using JNDI to set the solr.home is just as valid a way to specify where things live

: -log.info("Solr home defaulted to '" + instanceDir + "'");
: +log.info("Solr home defaulted to '" + instanceDir + "' (system property " + prop + " not set)");

-Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: The most important issue is to nail down the external HTTP interface.

I'm not sure if i agree with that statement .. i would think that figuring out the model or how updates should be handled in a generic way, what all of the Plugin types are, and what their APIs should be is the most important issue -- once we have those issues settled we could always write a new SolrServlet2 that made the URL structure work any way we want.

-Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
:    SolrRequestHandler = SolrUpdateHandler
:    SolrQueryRequest = SolrUpdateRequest
:    SolrQueryResponse = SolrUpdateResponse (possibly the same class)
:    QueryResponseWriter = UpdateResponseWriter (possibly the same class)
:
: Is there any reason the plugin system needs a different RequestObject
: for Query vs Update?

as i said: only to the extent that Updates tend to have streams of data that queries don't need (as far as i can imagine)

: SolrRequest would be the current SolrQueryRequest augmented with the
: HTTP method type and a way to get the raw post stream.

the raw POST stream may not be where the data is though -- consider the file upload case, or the reading from a local file case, or the reading from a list of remote URLs specified in params.

: I'm not sure of the nitty gritty, but it should be as close to
: HttpServletRequest as possible. If possible, I think handlers should
: choose how to handle the stream.
:
: If it is a remote resource, I think it's the handler's job to open the stream.

i disagree ... it should be possible to create micro-plugins (I think i called them UpdateSource instances in my original suggestion) that know about getting streams in various ways, but don't care what format of data is found on those streams -- that would be left for the (Update)RequestHandler (which wouldn't need to know where the data came from)

a JDBC/SQL updater would probably be a very special case -- where the format and the stream are inherently related -- in which case a No-Op UpdateSource could be used that didn't provide any stream, and the JdbcUpdateRequestHandler would manage its JDBC streams directly.

: Likewise I don't see anything in QueryResponseWriter that should tie
: it to 'Query.' Could it just be ResponseWriter?

probably -- as i said, both it and SolrQueryResponse could probably be reused, the only hitch is that their names might be confusing (we could always refactor all of their guts into super classes, and deprecate the existing classes)

: While we are at it... is there any reason (for or against) exposing
: other parts of the HttpServletRequest to SolrRequestHandlers?

the biggest one is Unit testing -- giving plugins very simple APIs that don't require a lot of knowledge about external APIs makes it much easier to test them.

it also helps make it possible for us to future-proof plugins. other messages in this thread have discussed the possibility of changing the URL structure, supporting more RESTful URLs and things like that ... if we currently exposed lots of info from the HttpServletRequest in the SolrQueryRequest, then making changes like that in a backwards compatible way would be nearly impossible. As it stands, we can write a new Servlet that deals with input *completely* differently from the current URL structure, and be 99% certain that existing plugins will continue to work.

: While it is not the focus of solr, someone (including me!) may want to
: implement some more complex authentication scheme - Perhaps setting a
: field on each document saying who added it and from what IP.
:
: stuff to consider: cookies, headers, remoteUser, remoteHost...

all of that could conceivably be done by changing the servlet to add that info into the SolrParams.

-Hoss
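The split argued for above (one plugin type that knows where the bytes come from, another that knows how to parse them) could look roughly like the sketch below. This is only an illustration under stated assumptions: the UpdateSource name comes from the message, but its method signature, the "body" and "file" parameter names, and the StreamUpdateHandler interface are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical micro-plugin: knows where the bytes come from, not what they mean. */
interface UpdateSource {
    InputStream open(Map<String, String> params) throws IOException;
}

/** Data passed inline in a (hypothetical) "body" parameter. */
class InlineBodySource implements UpdateSource {
    public InputStream open(Map<String, String> params) throws IOException {
        String body = params.get("body");
        if (body == null) {
            throw new IOException("missing 'body' parameter");
        }
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}

/** Data read from a local file named by a (hypothetical) "file" parameter. */
class LocalFileSource implements UpdateSource {
    public InputStream open(Map<String, String> params) throws IOException {
        String path = params.get("file");
        if (path == null) {
            throw new IOException("missing 'file' parameter");
        }
        return new FileInputStream(path);
    }
}

/** Hypothetical format plugin: parses a stream without knowing its origin. */
interface StreamUpdateHandler {
    /** Returns the number of records seen on the stream. */
    int handle(InputStream in) throws IOException;
}

/** Toy format: one record per newline-terminated line. */
class NewlineRecordHandler implements StreamUpdateHandler {
    public int handle(InputStream in) throws IOException {
        int records = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                records++;
            }
        }
        return records;
    }
}
```

Because the handler only sees an InputStream, either source can be paired with it, and the handler can be unit tested with an in-memory stream and no servlet container.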
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/15/07, Chris Hostetter [EMAIL PROTECTED] wrote:

: The most important issue is to nail down the external HTTP interface.

I'm not sure if i agree with that statement .. i would think that figuring out the model or how updates should be handled in a generic way, what all of the Plugin types are, and what their APIs should be is the most important issue -- once we have those issues settled we could always write a new SolrServlet2 that made the URL structure work any way we want.

-Hoss

I hate to inundate you with more code, but it seems like the best way to describe a possible interface.

//---

interface ContentStream
{
  String getName();
  String getContentType();
  InputStream getStream();
}

interface SolrParams
{
  String getParam( String name );
  String[] getParams( String name );
}

//-

interface SolrRequest
{
  SolrParams getParams();
  ContentStream[] getContentStreams(); // Iterator?
  long getStartTime();
}

interface SolrResponse
{
  int getStatus(); // ???
  NamedList getProps(); // ???
}

//-

interface SolrRequestProcessor
{
  SolrResponse process( SolrRequest req );
  SolrResponseWriter getWriter( SolrRequest req ); // default
}

interface SolrResponseWriter
{
  void write(Writer writer, SolrRequest request, SolrResponse response);
  String getContentType(SolrRequest request, SolrResponse response);
}

//-

Then a servlet (or filter) could be in charge of parsing URL/params into a request. It would pick a Processor and send the output to a writer. If someone wanted a custom URL scheme, they would override the servlet/filter.

Perhaps SolrRequest should have an object for solrCore. It would be better if it does not need to go to the static SolrCore.getUpdateHandler().

I am proposing ContentStream[] getContentStreams() because it would be simpler than an iterator.
In the case of multipart upload, if you offered an API closer to: http://jakarta.apache.org/commons/fileupload/streaming.html You would not have any parameters until after you read each Item and convert the form fields to parameters. Thoughts?
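As a small illustration of the ContentStream idea, here is a sketch of an in-memory implementation. The ContentStream interface is copied from the proposal above; the StringContentStream class is hypothetical and exists only to show that a request could carry several such streams without the processor knowing how they arrived.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Interface as proposed in the message above. */
interface ContentStream {
    String getName();
    String getContentType();
    InputStream getStream();
}

/** Hypothetical in-memory implementation, e.g. for unit tests or inline bodies. */
class StringContentStream implements ContentStream {
    private final String name;
    private final String contentType;
    private final byte[] bytes;

    StringContentStream(String name, String contentType, String body) {
        this.name = name;
        this.contentType = contentType;
        this.bytes = body.getBytes(StandardCharsets.UTF_8);
    }

    public String getName() {
        return name;
    }

    public String getContentType() {
        return contentType;
    }

    /** Each call returns a fresh stream positioned at the start of the body. */
    public InputStream getStream() {
        return new ByteArrayInputStream(bytes);
    }
}
```

An array of these is easy to hand to a processor, but it assumes every stream is available up front; with a streaming multipart API the items arrive one at a time, which is the tension raised above.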
[jira] Commented: (SOLR-105) Duck typing for Document/Field plus to/from solr conversions for field names.
[ https://issues.apache.org/jira/browse/SOLR-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465016 ]

Erik Hatcher commented on SOLR-105:
-----------------------------------

Let's discuss this further. I'm not quite on board with document round tripping just yet, as I think we need more of a Lucene Hits-like concept on the Ruby side to navigate the results in an iterator/Enumerable type fashion. Keep in mind that what comes back from Solr may or may not be the full document that was added originally, due to the fields not being requested or the schema not being configured to store them. Having partial documents on the Ruby side seems awkward from a user perspective and does not adhere to the principle of least surprise.

Duck typing for Document/Field plus to/from solr conversions for field names.
-----------------------------------------------------------------------------

         Key: SOLR-105
         URL: https://issues.apache.org/jira/browse/SOLR-105
     Project: Solr
  Issue Type: Improvement
  Components: clients - ruby - flare
 Environment: Darwin rocket 8.8.1 Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT 2006; root:xnu-792.13.8.obj~1/RELEASE_I386 i386 i386
    Reporter: William Groppe
 Attachments: doc_and_field_roundtrip.patch

Hey Erik,

Take a close look at this patch, I've extended Ed's code quite a bit. You may want to hold off applying this until we all discuss it. But on the plus side, it has 100% test coverage, and allows round trips of full documents to Solr.

Will
[jira] Commented: (SOLR-107) Iterable NamedList with java5 generics
[ https://issues.apache.org/jira/browse/SOLR-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465047 ]

Hoss Man commented on SOLR-107:
-------------------------------

i only briefly skimmed the patch, but a couple quick questions came to mind...

1) instead of creating a new NameValuePair<T> interface, couldn't named list just implement Iterable<Map.Entry<String,T>> ?

2) for this bit of code...

@@ -183,7 +185,7 @@
     Iterator iter = eset.iterator();
     while (iter.hasNext()) {
       Map.Entry entry = (Map.Entry)iter.next();
-      add(entry.getKey().toString(), entry.getValue());
+      add(entry.getKey().toString(), (T)entry.getValue());
     }
     return args.size()>0;
   }

...that's in addAll(Map) right? ... if we're genericizing NamedList with respect to T, then shouldn't the method sig change to addAll(Map<?,T>) ... which would eliminate the need for the cast right?

3) there's an addAll(NamedList) too isn't there? .. shouldn't that method change to addAll(NamedList<T>) as well?

(I think all of those would still work in the current code base using the generics default of Object for unspecified templates)

Iterable NamedList with java5 generics
--------------------------------------

         Key: SOLR-107
         URL: https://issues.apache.org/jira/browse/SOLR-107
     Project: Solr
  Issue Type: Improvement
    Reporter: Ryan McKinley
    Priority: Trivial
 Attachments: IterableNamedList.patch

Iterators and generics are nice! this patch adds both to NamedList.java
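The three suggestions in the comment can be sketched together as follows. This is a simplified, hypothetical stand-in (SimpleNamedList), not Solr's actual NamedList and not the attached patch; it only shows that iterating as Map.Entry values and typing addAll with the value parameter removes the unchecked cast.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical simplified stand-in for NamedList: ordered, duplicate names allowed. */
class SimpleNamedList<T> implements Iterable<Map.Entry<String, T>> {
    private final List<Map.Entry<String, T>> entries = new ArrayList<Map.Entry<String, T>>();

    void add(String name, T value) {
        entries.add(new AbstractMap.SimpleImmutableEntry<String, T>(name, value));
    }

    /** Typed on the value, so no (T) cast is needed when copying entries. */
    boolean addAll(Map<?, T> args) {
        for (Map.Entry<?, T> entry : args.entrySet()) {
            add(entry.getKey().toString(), entry.getValue());
        }
        return args.size() > 0;
    }

    boolean addAll(SimpleNamedList<T> other) {
        for (Map.Entry<String, T> entry : other) {
            add(entry.getKey(), entry.getValue());
        }
        return other.size() > 0;
    }

    int size() {
        return entries.size();
    }

    /** Read-only iteration in insertion order. */
    public Iterator<Map.Entry<String, T>> iterator() {
        return Collections.unmodifiableList(entries).iterator();
    }
}
```

Callers can then write a plain for-each loop over the list and read getKey() and getValue() from each entry.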
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/16/07, Chris Hostetter [EMAIL PROTECTED] wrote:

interface SolrRequestParser {
  SolrRequest process( HttpServletRequest req );
}

(the trick being that the servlet would need to parse the st info out of the URL (either from the path or from the QueryString) directly without using any of the HttpServletRequest.getParameter*() methods...

I haven't followed all of the discussion, but wouldn't it be easier to use the request path, instead of parameters, to select these RequestParsers?

i.e. solr/update/pdf-parser, solr/update/hssf-parser, solr/update/my-custom-parser, etc.

-Bertrand
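A path-based selection like the one suggested could be as small as the sketch below, which pulls the parser name out of the request path without touching the request parameters. The class and method names (RequestPathSketch, parserNameFromPath) and the "/solr/update/" prefix used in the usage note are assumptions for illustration only.

```java
/** Hypothetical helper: choose a request parser name from the URL path. */
class RequestPathSketch {
    /**
     * Returns the first path segment after the given prefix,
     * or null if the path does not start with the prefix or has no segment after it.
     */
    static String parserNameFromPath(String path, String prefix) {
        if (path == null || !path.startsWith(prefix)) {
            return null;
        }
        String rest = path.substring(prefix.length());
        int slash = rest.indexOf('/');
        String name = slash >= 0 ? rest.substring(0, slash) : rest;
        return name.length() == 0 ? null : name;
    }
}
```

For example, parserNameFromPath("/solr/update/pdf-parser", "/solr/update/") yields "pdf-parser", and a servlet could use that string to look up the matching parser before any parameter or body parsing happens.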