Re: Solr SRW Service
Thanks for the responses, a couple of follow-ups.

> Why do you need Axis for this?

Well, you certainly don't for the SRU implementation, but for SRW I'd just say that (in all the SRW implementations I've done so far) it's a case of the right tool for the right job. Of course we can hand-craft the codecs and parse/produce the XML by hand. However, the SRU/SRW community comes from a background of interoperability as a sacrosanct requirement. Given that background, having something parse WSDL and produce your codecs for you gives people (me) a warm fuzzy feeling when it comes to WS-I compliance. It also makes the release process much easier when it comes to upgrading the protocol version: just pop a new WSDL in the build tree and compile. Of course there are other reasons too, but that's a starter for 10 :)

> Solr has some pluggable capability, detailed here:

Ah OK, thanks for that. I've taken a quick look and I'm trying to figure out how we might be able to expose extra features, like the ability to request that results be returned in different schemas. I'll keep at it tho and check back if I have any questions.

Cheers, Ian.

On Mon, 2006-11-20 at 16:35 -0500, Erik Hatcher wrote:
> On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:
> > Hiya all... I'm interested in the possibility of contributing an SRW/SRU web services interface/module to Solr (see http://www.loc.gov/standards/sru/). SRW/SRU is the web service definition which is often used alongside or instead of the more traditional Z39.50 protocol for cross/meta searching. A Solr SRW/SRU interface would enable meta-search engines to transparently include Solr repository search results by only configuring the base URL of the service. I've already got much of the code to do what's needed (i.e., CQL to Lucene query rewriters and code to generate the right stubs using Axis, etc.). Actually, I might be up for creating a Z3950 module too if anyone is interested?
>
> Why do you need Axis for this?
>
> > So my first question really would be... Is anyone out there already working on such a beast? If so, do you need any help? Seems pointless to create a second add-on. I've searched the lists (not in any great depth tho) and can't see any references to SRW/Z3950. Assuming nobody is, I've got some follow-up questions about the best way to package up what might be add-on modules... is this list the right place to ask?
>
> Solr has some pluggable capability, detailed here:
> http://wiki.apache.org/solr/SolrPlugins
>
> You can simply create your code, which I presume would entail a SolrRequestHandler and a QueryResponseWriter, and distribute it as a JAR that others could just drop in and run with it.
>
> Erik
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
On 11/20/06 7:22 PM, Fuad Efendi wrote:
> This is just a sample...
>
> 1. What is an Error?
> 2. What is a Mistake?
> 3. What is an application bug?
> 4. What is a 'system crash'?

These are not HTTP concepts. The request on a URI can succeed or fail or result in other codes. Mistakes and crashes are outside of the HTTP protocol.

> Of course, XML-over-HTTP engine is not the same as HTML-over-HTTP... However... Walter noticed 'crawling'... I can't imagine a company which will put SOLR as a front-end accessible to crawlers... (To crawl an indexing service instead of source documents!?)

XML-over-HTTP is exactly the same as HTML-over-HTTP. In HTML, we could return detailed error information in a meta tag. No difference.

If something is on HTTP, a good crawler can find it. All it takes is one link, probably to the admin URL. Once found, that crawler will happily pound on errors returned by 200.

XSLT support means you could build the search UI natively on Solr, so that might happen.

Even without a crawler, we must work with caches and load balancers. I will be using Solr with a load balancer in production. If Solr is a broken HTTP server, we will have to build something else.

> I am sure that mixing XML-based interface with HTTP status codes is not an attractive 'architecture', we should separate concerns and leave HTTP code handling to a servlet container as much as possible...

We don't need to use HTTP response codes deep in Solr, but we do need to separate bad parameters, retryable errors, non-retryable errors, and so on. We can call them whatever we want internally, but we need to report them properly over HTTP.

wunder
--
Walter Underwood
Search Guru, Netflix
Re: Phonetic Token Filter
: 2. Should we have a Jira issue first?

this wiki should have all the info you need...

http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
On 11/20/06, Walter Underwood wrote:
> Even without a crawler, we must work with caches and load balancers. I will be using Solr with a load balancer in production. If Solr is a broken HTTP server, we will have to build something else.

Agree. Every instance of Solr in CNET that serves websites is behind a load balancer. I don't know the config details of the loadbalancers though, except that part of it is the LB checking for the existence of a server-enabled file. That allows administrators to remove the file and still bring up a Solr instance w/o live traffic hitting it. Solr does nothing with this file except display enabled or disabled.

From solrconfig.xml:

    <!-- configure a healthcheck file for servers behind a loadbalancer
    <healthcheck type="file">server-enabled</healthcheck>
    -->

-Yonik
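For illustration, the file-based check described above could be sketched roughly as follows. This is only a minimal sketch, not Solr's actual code: the class name `HealthcheckFileSketch` and the method `status` are hypothetical, and the marker file is a temp file standing in for `server-enabled`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a file-based health check; not taken from Solr. */
final class HealthcheckFileSketch {

    /** Reports "enabled" while the marker file exists, "disabled" otherwise. */
    static String status(Path marker) {
        return Files.exists(marker) ? "enabled" : "disabled";
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the server-enabled marker file.
        Path marker = Files.createTempFile("server-enabled", ".tmp");
        System.out.println(status(marker));

        // An administrator removing the file takes the instance out of rotation.
        Files.delete(marker);
        System.out.println(status(marker));
    }
}
```

The load balancer only needs to probe for the file, so an instance can be started and warmed up before the file is put back.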
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
On 11/20/06, Chris Hostetter wrote:
> : Wow, I had completely forgotten that SolrException contained an HTTP
> : status code.
>
> Hmmm... actually, the javadocs for SolrException are a little vague on the meaning of code and there are at least a few places where it's set to a value that is not a legal HTTP status code...

None of these cases actually bubble back to an HTTP response code. Schema parsing is done at startup, and the update servlet always returns 200 (with error in the XML response).

Perhaps the update servlet should use HTTP error codes as well.

-Yonik

> ./src/java/org/apache/solr/schema/IndexSchema.java: throw new SolrException(1,"Schema Parsing Failed",e,false);
> ./src/java/org/apache/solr/schema/IndexSchema.java: throw new SolrException(1,"analyzer without class or tokenizer filter list");
> ./src/java/org/apache/solr/schema/IndexSchema.java: throw new SolrException(1,"TokenizerFactory must be specified first in analyzer");
> ./src/java/org/apache/solr/schema/IndexSchema.java: throw new SolrException(1,"undefined field "+fieldName);
> ./src/java/org/apache/solr/update/DirectUpdateHandler.java: if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
> ./src/java/org/apache/solr/update/DirectUpdateHandler.java: if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
> ./src/java/org/apache/solr/update/UpdateHandler.java: throw new SolrException(1,"error parsing event listevers", e, false);
> ./src/java/org/apache/solr/update/UpdateHandler.java: throw new SolrException(1,"error parsing event listeners", e, false);
Re: Phonetic Token Filter
On 11/21/06, Walter Underwood wrote:
> ...It is worth a try. Most implementations of Double Metaphone are well-commented, so you could change it for other languages...

Ok, I'll see if I find some time to test that, thanks for the clarifications!

-Bertrand
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
On 11/20/06 5:51 PM, Yonik Seeley wrote:
> Now that I think about it though, one nice change would be to get rid of the long stack trace for 400 exceptions... it's not needed, right?

That is correct. A client error (400) should not be reported with a server stack trace.

--wunder
Phonetic Token Filter
I've written a simple phonetic token filter (and factory) based on the Double Metaphone implementation in Jakarta Codecs to contribute. Three questions:

1. Does this sound like a generally useful addition?
2. Should we have a Jira issue first?
3. This adds a dependency on the codecs jar. How do we add that to the distro?

The code is very simple, but I need to learn the contribution process and build some tests, so this won't happen in one day.

wunder
--
Walter Underwood
Search Guru, Netflix
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
: /solr/select?q= is a tricky case. Three options:

...there's kind of a chicken/egg problem with this discussion ... the egg being "what should the HTTP response look like in an 'error' situation", the chicken being "what is the internal API to allow a RequestHandler to denote an 'error' situation" ... talking about specific cases only gets us so far since those cases may not be errors in all RequestHandlers.

the problem gets even more complicated when you try to answer the question: what should Solr do if an OutputWriter encounters an error? ... we can't generate a valid JSON response denoting an error if the JSONOutputWriter is failing :)

It might be wise to discuss the API/pseudo code for dealing with errors in RequestHandlers and OutputWriters and then think about what kinds of responses those would generate, rather than worrying too much about the exact HTTP status codes first ... a big question to start off with would be: should the RequestHandler know about HTTP status codes and be allowed to set them explicitly, or should that level of detail be abstracted away?

-Hoss
Re: Phonetic Token Filter
On 11/21/06, Walter Underwood wrote:
> I've written a simple phonetic token filter (and factory) based on the Double Metaphone implementation in Jakarta Codecs to contribute. Three questions:
>
> 1. Does this sound like a generally useful addition?

Definitely useful. If it's generally applicable enough and lightweight enough then it should go in the core. Otherwise it could go in contrib (which we don't really have yet, but we will when the need arises). This sounds like it should probably go in the core.

> 2. Should we have a Jira issue first?

Yes, please.

> 3. This adds a dependency on the codecs jar. How do we add that to the distro?

It would go in the lib directory if it ends up in Solr proper.

-Yonik
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
One way to think about this is to assume caches, proxies, and load balancing in the HTTP path, then think about their behavior. A 500 response may make the load balancer drop this server from the pool, for example. A 200 OK can be cached, so temporary errors shouldn't be sent with that code.

On 11/20/06 10:51 AM, Chris Hostetter wrote:
> ...there's kind of a chicken/egg problem with this discussion ... the egg being "what should the HTTP response look like in an 'error' situation", the chicken being "what is the internal API to allow a RequestHandler to denote an 'error' situation" ... talking about specific cases only gets us so far since those cases may not be errors in all RequestHandlers.

We can get most of the benefit with a few kinds of errors: 400, 403, 404, 500, and 503. Roughly:

  400 - error in the request, fix it and try again
  403 - forbidden, don't try again
  404 - not found, don't try again unless you think it is there now
  500 - server error, don't try again
  503 - server error, try again

These can be mapped from internal error types.

> the problem gets even more complicated when you try to answer the question: what should Solr do if an OutputWriter encounters an error? ... we can't generate a valid JSON response denoting an error if the JSONOutputWriter is failing :)

Write the response to a string before sending the headers. This can be slower than writing the response out as it is computed, but the response codes can be accurate. Also, it allows optimal buffering, so it might scale better.

If you really want to handle failure in an error response, write that to a string and if that fails, send a hard-coded string.

wunder
--
Walter Underwood
Search Guru, Netflix
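The two ideas above (a small set of internal error kinds mapped to status codes, and rendering the body to a string before committing to a status) might look something like the sketch below. It is only an illustration under stated assumptions: `ErrorKind`, `BufferedResponseSketch`, `BodyWriter`, `Rendered`, and `LAST_RESORT` are hypothetical names, not Solr APIs, and any writer failure is simply treated as a 500.

```java
/** Hypothetical internal error kinds; none of these names come from Solr. */
enum ErrorKind {
    BAD_REQUEST(400),   // error in the request, fix it and try again
    FORBIDDEN(403),     // don't try again
    NOT_FOUND(404),     // don't try again unless it may be there now
    SERVER_ERROR(500),  // server error, don't try again
    UNAVAILABLE(503);   // server error, try again

    final int httpStatus;

    ErrorKind(int httpStatus) {
        this.httpStatus = httpStatus;
    }
}

final class BufferedResponseSketch {

    /** Hypothetical stand-in for an output writer that may fail while rendering. */
    interface BodyWriter {
        void write(StringBuilder out) throws Exception;
    }

    /** A fully rendered response; the status is chosen only after the body exists. */
    static final class Rendered {
        final int status;
        final String body;

        Rendered(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Hard-coded fallback used when even the error body cannot be rendered. */
    static final String LAST_RESORT = "internal error";

    static Rendered render(BodyWriter normal, BodyWriter errorWriter) {
        try {
            StringBuilder out = new StringBuilder();
            normal.write(out);                       // nothing has been sent yet
            return new Rendered(200, out.toString());
        } catch (Exception failure) {
            try {
                StringBuilder err = new StringBuilder();
                errorWriter.write(err);              // partial normal output is discarded
                return new Rendered(ErrorKind.SERVER_ERROR.httpStatus, err.toString());
            } catch (Exception secondFailure) {
                return new Rendered(ErrorKind.SERVER_ERROR.httpStatus, LAST_RESORT);
            }
        }
    }

    public static void main(String[] args) {
        Rendered r = render(out -> out.append("<response/>"), out -> out.append("<error/>"));
        System.out.println(r.status + " " + r.body);
    }
}
```

Because the body is built in memory first, a writer that fails halfway never leaks partial output under a 200 status; the trade-off is the extra buffering mentioned above.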
RE: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
> On the update side of things, I think it would be nice if one could check the HTTP status code and if it's OK (200), don't bother XML parsing the body.

Do you mean 304 'Not Modified'? Agreed, we should handle it in SOLR (it is not SOAP indeed!); we should handle 'last modified', 'expiration', etc. The HTTP specs, as pointed out by Hoss, allow 4xx codes to be used with user-defined entities.

There is some HTTP stuff which we need to use anyway, but we should not use HTTP codes in the core Java parts of an application. Some code is currently tightly coupled with such stuff as

    SC_BAD_REQUEST
    SC_OK
    SC_NOT_FOUND

This is part of JEE, and the existing design looks slightly outdated; we need to decouple such 'nice' stuff:

    } catch (SolrException e) {
        sendErr(e.code(), SolrException.toStr(e), request, response);
    }

We even _catch_ an Exception, and _rethrow_ it as 400/404 (this is also an 'Exception', but in a different language).

> > 1. What is an Error?
> > 2. What is a Mistake?
> > 3. What is an application bug?
> > 4. What is a 'system crash'?
>
> These are not HTTP concepts. The request on a URI can succeed or fail or result in other codes. Mistakes and crashes are outside of the HTTP protocol.

Yes, I tried to mention very generic concepts and to think about 'Exceptions' in Java SE, EE, SOLR, JSON, XML, HTTP. We are always extending java.lang.Exception without any thinking, just following patterns from thousands of guides.

Please have a look at
http://www.mindview.net/Etc/Discussions/CheckedExceptions
and the following discussion:
http://www.bruceeckel.com/Etc/Discussions/UnCheckedExceptionComments

Some authors suggest using unchecked exceptions. The try-catch-finally code written in so many books is suitable only for very small applications (usually small samples from the books)...

Thanks
Re: Solr SRW Service
Right, I was questioning the use of Axis for SRU, not for SRW - sorry I didn't make that clear.

Erik

On Nov 21, 2006, at 2:27 AM, Ian Ibbotson wrote:
> Thanks for the responses, a couple of follow-ups.
>
> > Why do you need Axis for this?
>
> Well, you certainly don't for the SRU implementation, but for SRW I'd just say that (in all the SRW implementations I've done so far) it's a case of the right tool for the right job. Of course we can hand-craft the codecs and parse/produce the XML by hand. However, the SRU/SRW community comes from a background of interoperability as a sacrosanct requirement. Given that background, having something parse WSDL and produce your codecs for you gives people (me) a warm fuzzy feeling when it comes to WS-I compliance. It also makes the release process much easier when it comes to upgrading the protocol version: just pop a new WSDL in the build tree and compile. Of course there are other reasons too, but that's a starter for 10 :)
>
> > Solr has some pluggable capability, detailed here:
>
> Ah OK, thanks for that. I've taken a quick look and I'm trying to figure out how we might be able to expose extra features, like the ability to request that results be returned in different schemas. I'll keep at it tho and check back if I have any questions.
>
> Cheers, Ian.
>
> On Mon, 2006-11-20 at 16:35 -0500, Erik Hatcher wrote:
> > On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:
> > > Hiya all... I'm interested in the possibility of contributing an SRW/SRU web services interface/module to Solr (see http://www.loc.gov/standards/sru/). SRW/SRU is the web service definition which is often used alongside or instead of the more traditional Z39.50 protocol for cross/meta searching. A Solr SRW/SRU interface would enable meta-search engines to transparently include Solr repository search results by only configuring the base URL of the service. I've already got much of the code to do what's needed (i.e., CQL to Lucene query rewriters and code to generate the right stubs using Axis, etc.). Actually, I might be up for creating a Z3950 module too if anyone is interested?
> >
> > Why do you need Axis for this?
> >
> > > So my first question really would be... Is anyone out there already working on such a beast? If so, do you need any help? Seems pointless to create a second add-on. I've searched the lists (not in any great depth tho) and can't see any references to SRW/Z3950. Assuming nobody is, I've got some follow-up questions about the best way to package up what might be add-on modules... is this list the right place to ask?
> >
> > Solr has some pluggable capability, detailed here:
> > http://wiki.apache.org/solr/SolrPlugins
> >
> > You can simply create your code, which I presume would entail a SolrRequestHandler and a QueryResponseWriter, and distribute it as a JAR that others could just drop in and run with it.
> >
> > Erik
Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30
On 11/20/06, Fuad Efendi wrote:
> Here, we are passing 'Empty Query' error message with a full stack trace as an entity body of HTTP 404 response.

It's actually returning 400:

$ curl -i http://localhost:8983/solr/select/
HTTP/1.1 400 Bad Request
Date: Tue, 21 Nov 2006 03:56:34 GMT
Server: Jetty/5.1.11RC0 (Windows XP/5.1 x86 java/1.5.0_09
Content-Type: text/plain; charset=UTF-8
Content-Length: 1377

org.apache.solr.core.SolrException: Missing queryString
        at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:105)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:587)
        at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)

> Imagine that instead of 'Incorrect ZIP Code' we will see Java stack trace in some web-sites...

As an aside, as I pointed out in an earlier message, it's debatable if we should include a stack trace for user errors (as opposed to server errors). I guess it depends if it ever helps with debugging or not.

Anyway, the Solr interface isn't meant as a user GUI. It's a back-end system like a database.

> I am sure that mixing XML-based interface with HTTP status codes is not an attractive 'architecture', we should separate concerns and leave HTTP code handling to a servlet container as much as possible...

That gets further away from REST. Not that Solr is purely REST, but it's not web-services either... it's about being practical.

On the update side of things, I think it would be nice if one could check the HTTP status code and if it's OK (200), don't bother XML parsing the body.

-Yonik
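On the client side, the "check the status code and skip XML parsing on 200" idea might be sketched like this. It is a minimal illustration only, and it assumes the update servlet were changed to report failures with a non-200 status; `UpdateStatusCheckSketch`, `errorDetail`, and the `parseError` callback are hypothetical names, not part of Solr.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical client-side helper; not part of any Solr client API. */
final class UpdateStatusCheckSketch {

    /**
     * Returns empty when the status alone says the update succeeded, so the
     * body is never parsed; otherwise returns whatever the supplied parser
     * extracts from the error body.
     */
    static Optional<String> errorDetail(int status, String body,
                                        Function<String, String> parseError) {
        if (status == 200) {
            return Optional.empty();               // success: skip parsing entirely
        }
        return Optional.of(parseError.apply(body)); // failure: parse for details
    }

    public static void main(String[] args) {
        Function<String, String> parser = b -> "parsed: " + b;
        System.out.println(errorDetail(200, "<result/>", parser).isPresent());
        System.out.println(errorDetail(400, "Missing queryString", parser).get());
    }
}
```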
Re: Solr SRW Service
On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:
> So my first question really would be... Is anyone out there already working on such a beast? If so, do you need any help? Seems pointless to create a second add-on. I've searched the lists (not in any great depth tho) and can't see any references to SRW/Z3950. Assuming nobody is, I've got some follow-up questions about the best way to package up what might be add-on modules... is this list the right place to ask?

I'm not working on it, but I know that a lot of people in the library technology community would find this to be very useful indeed. The Extensible Text Framework [1] from the California Digital Library is similar to Solr in that it provides a wrapper around Lucene, and it apparently has some experimental SRW/SRU support [2]. It might be worthwhile chatting with them.

//Ed

[1] http://www.cdlib.org/inside/projects/xtf/
[2] http://xtf.sourceforge.net/WebDocs/HTML/XTF_Experimental_Features/XTFExperimental.html
Re: Solr SRW Service
On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:
> Hiya all... I'm interested in the possibility of contributing an SRW/SRU web services interface/module to Solr (see http://www.loc.gov/standards/sru/). SRW/SRU is the web service definition which is often used alongside or instead of the more traditional Z39.50 protocol for cross/meta searching. A Solr SRW/SRU interface would enable meta-search engines to transparently include Solr repository search results by only configuring the base URL of the service. I've already got much of the code to do what's needed (i.e., CQL to Lucene query rewriters and code to generate the right stubs using Axis, etc.). Actually, I might be up for creating a Z3950 module too if anyone is interested?

Why do you need Axis for this?

> So my first question really would be... Is anyone out there already working on such a beast? If so, do you need any help? Seems pointless to create a second add-on. I've searched the lists (not in any great depth tho) and can't see any references to SRW/Z3950. Assuming nobody is, I've got some follow-up questions about the best way to package up what might be add-on modules... is this list the right place to ask?

Solr has some pluggable capability, detailed here:
http://wiki.apache.org/solr/SolrPlugins

You can simply create your code, which I presume would entail a SolrRequestHandler and a QueryResponseWriter, and distribute it as a JAR that others could just drop in and run with it.

Erik
XML vs. JSON, Python, Ruby
SOLR is a Web-Application with a well-defined XML-based API:
- indexing service
- asynchronous; no need for 'real time' (content has a well-defined TTL); can use HTTP caching for increased performance
- provides native support for XSL

The question: do we really need to maintain JSON/Ruby as a ServletOutput? We can focus on the 'Public XML API' only, and provide samples of XSL-to-JSON, XML-to-WML, etc...