XML vs. JSON, Python, Ruby

2006-11-21 Thread Fuad Efendi
SOLR is a Web-Application with a well-defined XML-based API:
- indexing service
- asynchronous; no need for 'real time' (content has well-defined TTL); can
use HTTP Caching for increased performance
- provides native support for XSL

The question: do we really need to maintain JSON/Ruby as a ServletOutput? We
could focus on the 'Public XML API' only, and provide XSL samples for
XML-to-JSON, XML-to-WML, and so on...
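A rough sketch of the XSL-based XML-to-JSON sample idea, using only the JDK's built-in javax.xml.transform API. The response shape (response/result/doc with `<str name="...">` fields) is a simplified assumption loosely modeled on Solr's XML output, the class name is hypothetical, and the stylesheet does no JSON escaping, so treat it as an illustration rather than a usable writer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslToJson {
    // Simplified XSLT 1.0 stylesheet: each result/doc becomes a JSON object whose
    // keys are the name attributes of its <str> children. Values are NOT escaped,
    // so quotes or backslashes in the data would produce invalid JSON.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/response'>"
        + "<xsl:text>[</xsl:text>"
        + "<xsl:for-each select='result/doc'>"
        + "<xsl:if test='position() &gt; 1'>,</xsl:if>"
        + "<xsl:text>{</xsl:text>"
        + "<xsl:for-each select='str'>"
        + "<xsl:if test='position() &gt; 1'>,</xsl:if>"
        + "<xsl:text>\"</xsl:text><xsl:value-of select='@name'/>"
        + "<xsl:text>\":\"</xsl:text><xsl:value-of select='.'/><xsl:text>\"</xsl:text>"
        + "</xsl:for-each>"
        + "<xsl:text>}</xsl:text>"
        + "</xsl:for-each>"
        + "<xsl:text>]</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML response and returns the text output. */
    static String toJson(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response><result numFound='2' start='0'>"
            + "<doc><str name='id'>1</str><str name='title'>Solr</str></doc>"
            + "<doc><str name='id'>2</str></doc>"
            + "</result></response>";
        System.out.println(toJson(xml)); // prints [{"id":"1","title":"Solr"},{"id":"2"}]
    }
}
```

The point of the design is that the server only ever speaks XML; any JSON (or WML) variant lives in a stylesheet outside the core.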



RE: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Fuad Efendi
>turning a plain-text stack trace into a XML
>or JSON stack trace doesn't seem like a big win.

Some errors have business meaning.
An 'XML/JSON stack trace' IS a stack trace only if it is not related to a
business rule violation. If it is a business rule violation, it is neither a
'stack trace' nor an HTTP error code.

P.S.
Do we explicitly send stack traces with SOLR-defined HTTP error codes for
OutOfMemoryError and NullPointerException?



Re: Solr SRW Service

2006-11-21 Thread Erik Hatcher


On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:

Hiya all...

I'm interested in the possibility of contributing an SRW/SRU web services
interface/module to solr (see http://www.loc.gov/standards/sru/).
SRW/SRU is the web service definition which is often used alongside or
instead of the more traditional Z39.50 protocol for cross/meta
searching. A solr SRW/SRU interface would enable meta-search engines to
transparently include solr repository search results by only configuring
the base URL of the service. I've already got much of the code to do what's
needed (i.e., CQL to Lucene query rewriters and code to generate
the right stubs using axis etc). Actually, I might be up for creating a
z3950 module too if anyone is interested?



Why do you need Axis for this?


So my first question really would be... Is anyone out there already
working on such a beast? If so, do you need any help? It seems pointless to
create a second add-on. I've searched the lists (not in any great depth
tho) and can't see any references to SRW/Z39.50. Assuming nobody is, I've
got some follow-up questions about the best way to package up what might
be add-on modules. Is this list the right place to ask?


Solr has some pluggable capability, detailed here:



You can simply create your code, which I presume would entail a  
SolrRequestHandler and a QueryResponseWriter, and distribute it as a  
JAR that others could just drop in and run with it.


Erik



Re: Solr SRW Service

2006-11-21 Thread Edward Summers

On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:

So my first question really would be... Is anyone out there already
working on such a beast? If so, do you need any help? It seems pointless to
create a second add-on. I've searched the lists (not in any great depth
tho) and can't see any references to SRW/Z39.50. Assuming nobody is, I've
got some follow-up questions about the best way to package up what might
be add-on modules. Is this list the right place to ask?


I'm not working on it, but I know that a lot of people in the library  
technology community would find this to be very useful indeed.


The Extensible Text Framework [1] from the California Digital Library  
is similar to solr in that it provides a wrapper around lucene, and  
it has some experimental srw/sru support apparently [2]. It might be  
worthwhile chatting with them.


//Ed

[1] http://www.cdlib.org/inside/projects/xtf/
[2] http://xtf.sourceforge.net/WebDocs/HTML/XTF_Experimental_Features/XTFExperimental.html


Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Fuad Efendi <[EMAIL PROTECTED]> wrote:

Here, we are passing an 'Empty Query' error message with a full stack trace as
the entity body of an HTTP 404 response.


It's actually returning 400:

$ curl -i http://localhost:8983/solr/select/
HTTP/1.1 400 Bad Request
Date: Tue, 21 Nov 2006 03:56:34 GMT
Server: Jetty/5.1.11RC0 (Windows XP/5.1 x86 java/1.5.0_09
Content-Type: text/plain; charset=UTF-8
Content-Length: 1377

org.apache.solr.core.SolrException: Missing queryString
   at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:105)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:587)
   at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)


Imagine that instead of 'Incorrect ZIP Code' we see a Java stack trace on
some web sites...


As an aside, as I pointed out in an earlier message, it's debatable whether
we should include a stack trace for user errors (as opposed to server
errors).  I guess it depends on whether it ever helps with debugging.

Anyway, the Solr interface isn't meant as a user GUI.  It's a back-end
system like a database.


I am sure that mixing an XML-based interface with HTTP status codes is not an
attractive 'architecture'; we should separate concerns and leave HTTP code
handling to the servlet container as much as possible...


That gets further away from REST. Not that Solr is purely REST, but
it's not web-services either... it's about being practical.

On the update side of things, I think it would be nice if one could
check the HTTP status code and if it's OK (200), don't bother XML
parsing the body.

-Yonik
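On the client side, the idea of trusting the status code might look roughly like the sketch below: a 200 is accepted without any parsing, and only other codes are parsed (with the JDK DOM parser) to extract a message. The class name and the `<error>` element are hypothetical; they are not the format the update servlet actually returns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.helpers.DefaultHandler;

final class UpdateStatusCheck {
    /** Returns null when the update succeeded, otherwise a short error description. */
    static String errorFor(int status, String body) {
        if (status == 200) {
            return null; // fast path: trust the status code, skip XML parsing entirely
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors instead of printing
            Document doc = builder.parse(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            NodeList errors = doc.getElementsByTagName("error"); // hypothetical element name
            if (errors.getLength() > 0) {
                return "HTTP " + status + ": " + errors.item(0).getTextContent();
            }
        } catch (Exception e) {
            // Body was not XML (for example a plain-text stack trace); report the status only.
        }
        return "HTTP " + status;
    }

    public static void main(String[] args) {
        System.out.println(errorFor(200, "ignored"));
        System.out.println(errorFor(400, "<response><error>missing id</error></response>"));
        System.out.println(errorFor(500, "java.lang.NullPointerException"));
    }
}
```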


Re: Solr SRW Service

2006-11-21 Thread Erik Hatcher
Right, I was questioning the use of Axis for SRU, not for SRW - sorry  
I didn't make that clear.


Erik


On Nov 21, 2006, at 2:27 AM, Ian Ibbotson wrote:


Thanks for the responses, couple of follow-ups


Why do you need Axis for this?


Well, you certainly don't for the SRU implementation, but for SRW I'd
just say that (in all the SRW implementations I've done so far) it's a
case of the right tool for the right job. Of course we can hand-craft
the codecs and parse/produce the XML by hand. However, the SRU/SRW
community comes from a background of interoperability as a sacrosanct
requirement. Given that background, having something parse WSDL and
produce your codecs for you gives people (me) a warm fuzzy feeling when
it comes to WS-I compliance. It also makes the release process much
easier when it comes to upgrading the protocol version: just pop a new
WSDL in the build tree and compile. Of course there are other reasons
too, but that's a starter for 10 :)


Solr has some pluggable capability, detailed here:


Ah OK, thanks for that. I've taken a quick look and I'm trying to figure
out how we might be able to expose extra features, like the ability to
request that results be returned in different schemas. I'll keep at it tho
and check back if I have any questions.

Cheers,
Ian.






Re: Solr SRW Service

2006-11-21 Thread Chris Hostetter
: > got some follow-up questions about the best way to package up what
: > might
: > be add-on modules.. is this list the right place to ask?

This list is definitely the right place to start, and as Erik mentioned,
this wiki is the first place to look if you are interested in making an
add-on...

:   
:
: You can simply create your code, which I presume would entail a
: SolrRequestHandler and a QueryResponseWriter, and distribute it as a
: JAR that others could just drop in and run with it.

The one hitch here is that the choice of RequestHandler and
QueryResponseWriter are driven by URL params, and my skimming of the SRU
URL is that SRU has some very specific requirements on what the URL params
can/should be ... one way to deal with that might be to write a new
Servlet ... but another approach might be to tell people "if you want to
use Solr as an SRU service, you must register SruRequestHandler with the
name "standard" and you must register the SruResponseWriter with the name
"standard" (since those are the defaults Solr uses when the qt and wt
params aren't specified) and then http://localhost:8983/solr/select will
return your "SRU Explain" page, and be the base URL for all SRU requests.
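The fallback described above can be sketched as a lookup that uses the name "standard" whenever qt or wt is absent. Only the qt/wt parameter names and the "standard" default come from the message; the registry class and the plugin class names are hypothetical placeholders, not Solr's actual SolrCore code.

```java
import java.util.HashMap;
import java.util.Map;

final class DefaultPluginLookup {
    static final String DEFAULT_NAME = "standard"; // used when qt / wt are absent

    private final Map<String, String> handlers = new HashMap<>();
    private final Map<String, String> writers = new HashMap<>();

    void registerHandler(String name, String className) { handlers.put(name, className); }
    void registerWriter(String name, String className) { writers.put(name, className); }

    String handlerFor(Map<String, String> params) { return lookup(handlers, params.get("qt")); }
    String writerFor(Map<String, String> params) { return lookup(writers, params.get("wt")); }

    private static String lookup(Map<String, String> registry, String requested) {
        String name = (requested == null || requested.isEmpty()) ? DEFAULT_NAME : requested;
        String className = registry.get(name);
        if (className == null) {
            throw new IllegalArgumentException("no plugin registered as '" + name + "'");
        }
        return className;
    }

    public static void main(String[] args) {
        DefaultPluginLookup lookup = new DefaultPluginLookup();
        lookup.registerHandler("standard", "SruRequestHandler");
        lookup.registerWriter("standard", "SruResponseWriter");
        // A plain SRU request carries no qt/wt, so both fall back to "standard".
        Map<String, String> sru = new HashMap<>();
        sru.put("operation", "searchRetrieve");
        sru.put("query", "dc.title=solr");
        System.out.println(lookup.handlerFor(sru) + " / " + lookup.writerFor(sru));
    }
}
```

The trade-off is that an SRU-configured instance gives up "standard" for ordinary Solr queries, which then need an explicit qt.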

...but I digress: the point is you can probably implement everything
necessary for SRU using a custom request handler that does all your query
parsing and a custom response writer that formats the results
appropriately -- once you have those, it's just a question of how
exactly you need to meet the requirements for query params.
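The query-parameter side might be handled by translating SRU searchRetrieve parameters into Solr's q, start and rows. This sketch assumes SRU's query, startRecord (1-based) and maximumRecords parameters; the default of 10 records and the class name are assumptions, and the CQL-to-Lucene step is left as a pluggable hook where a real rewriter (such as the one Ian mentions) would go.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class SruToSolrParams {
    /**
     * Translates SRU searchRetrieve parameters into Solr's q/start/rows.
     * cqlToLucene is a hook for a real CQL rewriter (hypothetical here).
     */
    static Map<String, String> translate(Map<String, String> sru, UnaryOperator<String> cqlToLucene) {
        String query = sru.get("query");
        if (query == null || query.isEmpty()) {
            throw new IllegalArgumentException("searchRetrieve requires a query parameter");
        }
        int startRecord = intParam(sru, "startRecord", 1, 1);        // SRU positions are 1-based
        int maximumRecords = intParam(sru, "maximumRecords", 10, 0); // default of 10 is an assumption
        Map<String, String> solr = new LinkedHashMap<>();
        solr.put("q", cqlToLucene.apply(query));
        solr.put("start", Integer.toString(startRecord - 1));        // Solr's start is a 0-based offset
        solr.put("rows", Integer.toString(maximumRecords));
        return solr;
    }

    private static int intParam(Map<String, String> params, String name, int dflt, int min) {
        String raw = params.get(name);
        if (raw == null) {
            return dflt;
        }
        int value = Integer.parseInt(raw);
        if (value < min) {
            throw new IllegalArgumentException(name + " must be >= " + min);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> sru = new HashMap<>();
        sru.put("operation", "searchRetrieve");
        sru.put("query", "title=solr");
        sru.put("startRecord", "11");
        sru.put("maximumRecords", "5");
        // Toy rewriter for the demo only: turns "title=solr" into "title:solr".
        System.out.println(translate(sru, q -> q.replace('=', ':')));
    }
}
```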

Ian: I'm not sure how familiar you are with the internals of Solr (or
Lucene for that matter) but once you've got the basics of the example app
and the tutorial down, take a look at the SolrPlugins wiki and the guts of
the StandardRequestHandler and it should (hopefully) be clear how you
could go about implementing your own SruRequestHandler.


-Hoss



RE: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Fuad Efendi

>On the update side of things, I think it would be nice if one could
>check the HTTP status code and if it's OK (200), don't bother XML
>parsing the body.

Do you mean 304 'Not Modified'? I agree, we should handle it in SOLR (it is
not SOAP, after all!); we should handle 'last modified', 'expiration', etc.
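A minimal sketch of the 304 'Not Modified' check: compare the client's If-Modified-Since header with the index's last-modified time. What counts as "last modified" (for example, the time of the last commit) is an assumption here, and the class name is hypothetical; the header parsing uses java.time's RFC 1123 formatter.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

final class ConditionalGet {
    /** Returns 304 if the client's cached copy is still current, otherwise 200. */
    static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200;
        }
        try {
            Instant since = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            // HTTP dates have one-second resolution, so compare whole seconds only.
            if (!lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since)) {
                return 304;
            }
        } catch (DateTimeParseException e) {
            // An invalid date is ignored, which turns this into a normal GET.
        }
        return 200;
    }

    /** Formats a Last-Modified header value. */
    static String httpDate(Instant t) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(t.atZone(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        Instant lastCommit = Instant.parse("2006-11-21T03:56:34.500Z");
        String header = httpDate(lastCommit);
        System.out.println(header + " -> " + statusFor(header, lastCommit));
    }
}
```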

The HTTP specs, as pointed out by Hoss, allow 4xx codes to be sent with
user-defined entities.

There is some HTTP stuff which we need to use anyway, but we should not use
HTTP codes in the core Java parts of an application. Some code is currently
tightly coupled with such stuff as
SC_BAD_REQUEST
SC_OK
SC_NOT_FOUND

This is part of JEE, and the existing design looks slightly outdated; we need
to decouple such 'nice' stuff:
} catch (SolrException e) {
  sendErr(e.code(), SolrException.toStr(e), request, response);
} 

We even _catch_ an Exception and _rethrow_ it as a 400/404 (which is also an
'Exception', just in a different language).


>> 1. What is an Error?
>> 2. What is a Mistake?
>> 3. What is an application bug?
>> 4. What is a 'system crash'?

>These are not HTTP concepts. The request on a URI can succeed or fail
>or result in other codes. Mistakes and crashes are outside of the HTTP
>protocol.

Yes, I tried to mention very generic concepts and to think about
'Exceptions' in Java SE, EE, SOLR, JSON, XML, and HTTP. We always extend
java.lang.Exception without thinking, just following patterns from
thousands of guides.

Please, have a look at 
http://www.mindview.net/Etc/Discussions/CheckedExceptions
And following discussion:
http://www.bruceeckel.com/Etc/Discussions/UnCheckedExceptionComments


Some authors suggest using unchecked exceptions. The try-catch-finally code
shown in so many books is suitable only for very small applications
(usually the small samples from those books)...

Thanks



Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
One way to think about this is to assume caches, proxies, and load balancing
in the HTTP path, then think about their behavior. A 500 response may make
the load balancer drop this server from the pool, for example. A 200 OK
can be cached, so temporary errors shouldn't be sent with that code.

On 11/20/06 10:51 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:
> 
> ...there's kind of a chicken/egg problem with this discussion ... the egg
> being "what should the HTTP response look like in an 'error' situation"
> the chicken being "what is the internal API to allow a RequestHandler to
> denote an 'error' situation" ... talking about specific cases only gets us
> so far since those cases may not be errors in all RequestHandlers.

We can get most of the benefit with a few kinds of errors: 400, 403, 404,
500, and 503. Roughly:

400 - error in the request, fix it and try again
403 - forbidden, don't try again
404 - not found, don't try again unless you think it is there now
500 - server error, don't try again
503 - server error, try again

These can be mapped from internal error types.
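One reading of "mapped from internal error types" is an internal enum that carries only retry semantics, with the HTTP status chosen in exactly one place at the servlet edge. The enum and its names below are a hypothetical sketch, not part of Solr's SolrException.

```java
final class ErrorMapping {
    /** Internal error kinds: request handlers use these and never see HTTP codes. */
    enum ErrorKind {
        BAD_REQUEST(false),  // error in the request: fix it, then try again
        FORBIDDEN(false),    // don't try again
        NOT_FOUND(false),    // don't retry unless you think it exists now
        SERVER_ERROR(false), // server-side failure: retrying won't help
        UNAVAILABLE(true);   // temporary condition: retry later

        /** Whether resending the identical request may succeed later. */
        final boolean retryable;

        ErrorKind(boolean retryable) { this.retryable = retryable; }
    }

    /** The single place, at the servlet edge, where kinds become HTTP status codes. */
    static int httpStatus(ErrorKind kind) {
        switch (kind) {
            case BAD_REQUEST: return 400;
            case FORBIDDEN:   return 403;
            case NOT_FOUND:   return 404;
            case UNAVAILABLE: return 503;
            default:          return 500; // SERVER_ERROR and anything unforeseen
        }
    }

    public static void main(String[] args) {
        for (ErrorKind kind : ErrorKind.values()) {
            System.out.println(kind + " -> " + httpStatus(kind) + (kind.retryable ? " (retry)" : ""));
        }
    }
}
```

Keeping the switch in one class means a load balancer's view (500 vs. 503) can change without touching any request handler.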

> the problem gets even more complicated when you try to answer the
> question: what should Solr do if an OutputWriter encounters an error? ...
> we can't generate a valid JSON response denoting an error if the
> JSONOutputWriter is failing :)

Write the response to a string before sending the headers. This can be
slower than writing the response out as it is computed, but the response
codes can be accurate. Also, it allows optimal buffering, so it might
scale better.

If you really want to handle failure in an error response, write that
to a string and if that fails, send a hard-coded string.
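The render-to-a-string-first approach can be sketched as follows. Because nothing is sent until rendering completes, a failing writer can still be answered with a 500 and a separately rendered error body, with a hard-coded string as the last resort. The ResponseRenderer interface is a hypothetical stand-in for an output writer, and using 500 for every writer failure is a simplifying assumption.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class BufferedResponse {
    /** Hypothetical stand-in for a response writer. */
    interface ResponseRenderer {
        void render(Writer out) throws IOException;
    }

    static final String HARD_CODED_ERROR = "Internal error while writing the error response";

    final int status;
    final String body;

    private BufferedResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }

    /** Renders fully into memory first, so the status code can still change on failure. */
    static BufferedResponse build(ResponseRenderer normal, ResponseRenderer onError) {
        try {
            return new BufferedResponse(200, renderToString(normal));
        } catch (IOException | RuntimeException e) {
            // Partial output from the failed writer is simply discarded.
            try {
                return new BufferedResponse(500, renderToString(onError));
            } catch (IOException | RuntimeException e2) {
                return new BufferedResponse(500, HARD_CODED_ERROR);
            }
        }
    }

    private static String renderToString(ResponseRenderer renderer) throws IOException {
        StringWriter buffer = new StringWriter();
        renderer.render(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        BufferedResponse r = build(
            out -> { out.write("{\"partial\":"); throw new IllegalStateException("writer bug"); },
            out -> out.write("{\"error\":\"response writer failed\"}"));
        System.out.println(r.status + " " + r.body);
    }
}
```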

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Phonetic Token Filter

2006-11-21 Thread Yonik Seeley

On 11/21/06, Walter Underwood <[EMAIL PROTECTED]> wrote:

I've written a simple phonetic token filter (and factory) based
on the Double Metaphone implementation in Jakarta Codecs to
contribute. Three questions:

1. Does this sound like a generally useful addition?


Definitely useful.
If it's generally applicable enough and lightweight enough then it
should go in the core.  Otherwise it could go in contrib (which we
don't really have yet, but we will when the need arises).

This sounds like it should probably go in the core.


2. Should we have a Jira issue first?


Yes, please.


3. This adds a dependency on the codecs jar. How do we add that
to the distro?


It would go in the lib directory if it ends up in Solr proper.

-Yonik


Solr SRW Service

2006-11-21 Thread Ian Ibbotson
Hiya all...

I'm interested in the possibility of contributing an SRW/SRU web services
interface/module to solr (see http://www.loc.gov/standards/sru/).
SRW/SRU is the web service definition which is often used alongside or
instead of the more traditional Z39.50 protocol for cross/meta
searching. A solr SRW/SRU interface would enable meta-search engines to
transparently include solr repository search results by only configuring
the base URL of the service. I've already got much of the code to do what's
needed (i.e., CQL to Lucene query rewriters and code to generate
the right stubs using axis etc). Actually, I might be up for creating a
z3950 module too if anyone is interested?

So my first question really would be... Is anyone out there already
working on such a beast? If so, do you need any help? It seems pointless to
create a second add-on. I've searched the lists (not in any great depth
tho) and can't see any references to SRW/Z39.50. Assuming nobody is, I've
got some follow-up questions about the best way to package up what might
be add-on modules. Is this list the right place to ask?

Cheers for your time,

e.



Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Chris Hostetter

: "/solr/select?q=" is a tricky case. Three options:

...there's kind of a chicken/egg problem with this discussion ... the egg
being "what should the HTTP response look like in an 'error' situation"
the chicken being "what is the internal API to allow a RequestHandler to
denote an 'error' situation" ... talking about specific cases only gets us
so far since those cases may not be errors in all RequestHandlers.

the problem gets even more complicated when you try to answer the
question: what should Solr do if an OutputWriter encounters an error? ...
we can't generate a valid JSON response denoting an error if the
JSONOutputWriter is failing :)

It might be wise to discuss the API/pseudo code for dealing with errors in
RequestHandlers and OutputWriters and then think about what kinds of
responses those would generate rather than worrying too much about the
exact HTTP status codes first ... a big question to start off with would
be: should the RequestHandler know about HTTP status codes and be allowed
to set them explicitly, or should that level of detail be abstracted away?


-Hoss



Phonetic Token Filter

2006-11-21 Thread Walter Underwood
I've written a simple phonetic token filter (and factory) based
on the Double Metaphone implementation in Jakarta Codecs to
contribute. Three questions:

1. Does this sound like a generally useful addition?

2. Should we have a Jira issue first?

3. This adds a dependency on the codecs jar. How do we add that
to the distro?

The code is very simple, but I need to learn the contribution
process and build some tests, so this won't happen in one day.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
On 11/20/06 5:51 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> Now that I think about it though, one nice change would be to get rid
> of the long stack trace for 400 exceptions... it's not needed, right?

That is correct. A client error (400) should not be reported with a
server stack trace. --wunder
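A small sketch of the rule that client errors should not carry a server stack trace: 4xx bodies get the message only, 5xx bodies the full trace. The helper name is hypothetical; the example message is taken from the curl output earlier in this thread.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class ErrorBody {
    /** Builds an error response body: message only for 4xx, full trace otherwise. */
    static String format(int status, Throwable error) {
        if (status >= 400 && status < 500) {
            // Client error: the caller must fix the request; server internals don't help.
            String msg = error.getMessage();
            return msg != null ? msg : error.getClass().getName();
        }
        // Server error (5xx): the stack trace may help whoever debugs the server.
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace));
        return trace.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalArgumentException("Missing queryString");
        System.out.println(format(400, e)); // Missing queryString
        System.out.println(format(500, e)); // full stack trace
    }
}
```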



Re: Phonetic Token Filter

2006-11-21 Thread Bertrand Delacretaz

On 11/21/06, Walter Underwood <[EMAIL PROTECTED]> wrote:

...It is worth a try. Most implementations of Double Metaphone are
well-commented, so you could change it for other languages...


Ok, I'll see if I find some time to test that, thanks for the clarifications!
-Bertrand


Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: Wow, I had completely forgotten that SolrException contained an HTTP
: status code.

Hmm... actually, the javadocs for SolrException are a little vague on
the meaning of "code" and there are at least a few places where it's set
to a value that is not a legal HTTP status code...


None of these cases actually bubble back to an HTTP response code.
Schema parsing is done at startup, and the update servlet always
returns 200 (with error in the XML response).

Perhaps the update servlet should use HTTP error codes as well.

-Yonik




[jira] Resolved: (SOLR-71) New support for "Date Math" when adding/querying date fields

2006-11-21 Thread Hoss Man (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-71?page=all ]

Hoss Man resolved SOLR-71.
--

Resolution: Fixed

committed with some small modifications:

1) got rid of the unneeded synchronized that Yonik pointed out
2) improved the javadocs a bit
3) added mention of DateMath in example schema.xml
4) added an example of a "baked in" Date Math query to the example 
solrconfig.xml

> New support for "Date Math" when adding/querying date fields
> ---
>
> Key: SOLR-71
> URL: http://issues.apache.org/jira/browse/SOLR-71
> Project: Solr
>  Issue Type: New Feature
>  Components: update, search
>Reporter: Hoss Man
> Assigned To: Hoss Man
> Attachments: DateMath.patch
>
>
> New utility class and changes to DateField to support syntax like the 
> following...
>   startDate:[* TO NOW]
>   startDate:[* TO NOW/DAY+1DAY]
>   expirationDate:[NOW/DAY TO *]
>   reviewDate:[NOW/DAY-1YEAR TO NOW/DAY]
>   validDate:[NOW/MONTH TO NOW/MONTH+1MONTH-1MILLISECOND]
> ...where + and - mean what you think, and "/UNIT" rounds down to the nearest 
> UNIT.  The motivation for this being that date range queries like these are 
> useful for filters, but being date sensitive can't currently be "baked in" to 
> a config as default params.
> A nice side effect of the implementation is that "timestamp" fields can be 
> set when a document is added by using...
>NOW
> ...and Solr will compute the value when adding the document ... if we add 
> default values to the schema.xml even that won't be necessary.
> Comments?  
> (I'd be particularly grateful if smarter people than I would sanity check my 
> use of ThreadLocal for managing the DateFormat in DateField ... I've never 
> used ThreadLocal before.  Any general comments on the syntax would also be 
> appreciated: This left-to-right syntax seemed more intuitive to write (and 
> easier to parse) than some of the other syntaxes I'd considered)
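To make the left-to-right syntax concrete, here is a much-simplified evaluator for expressions like NOW/DAY+1DAY, combined with the per-thread SimpleDateFormat pattern the issue asks about (SimpleDateFormat is not thread-safe, so ThreadLocal gives each thread its own instance). It is an illustrative sketch, not the DateMath code committed for SOLR-71: the class name, the UTC time zone and the singular-only unit names are assumptions.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleDateMath {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");
    // Unit names accepted by this sketch (singular forms only).
    private static final Map<String, Integer> UNITS = new HashMap<>();
    static {
        UNITS.put("YEAR", Calendar.YEAR);
        UNITS.put("MONTH", Calendar.MONTH);
        UNITS.put("DAY", Calendar.DAY_OF_MONTH);
        UNITS.put("HOUR", Calendar.HOUR_OF_DAY);
        UNITS.put("MINUTE", Calendar.MINUTE);
        UNITS.put("SECOND", Calendar.SECOND);
        UNITS.put("MILLISECOND", Calendar.MILLISECOND);
    }
    // Fields below YEAR, largest first; rounding to a unit resets every field after it.
    private static final int[] SMALLER_FIELDS = {
        Calendar.MONTH, Calendar.DAY_OF_MONTH, Calendar.HOUR_OF_DAY,
        Calendar.MINUTE, Calendar.SECOND, Calendar.MILLISECOND };
    // One operation: "/UNIT" (round down) or "+nUNIT" / "-nUNIT" (add / subtract).
    private static final Pattern OP = Pattern.compile("/([A-Z]+)|([+-])(\\d+)([A-Z]+)");
    // SimpleDateFormat is not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<DateFormat> ISO_FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        f.setTimeZone(UTC);
        return f;
    });

    /** Evaluates expressions like "NOW/DAY+1DAY" left to right, relative to now. */
    static Date eval(Date now, String expr) {
        if (!expr.startsWith("NOW")) {
            throw new IllegalArgumentException("expression must start with NOW: " + expr);
        }
        Calendar cal = new GregorianCalendar(UTC, Locale.ROOT);
        cal.setTime(now);
        Matcher m = OP.matcher(expr);
        int pos = "NOW".length();
        while (pos < expr.length()) {
            m.region(pos, expr.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad date math at offset " + pos + ": " + expr);
            }
            if (m.group(1) != null) {
                roundDown(cal, unit(m.group(1)));
            } else {
                int amount = Integer.parseInt(m.group(3));
                cal.add(unit(m.group(4)), m.group(2).equals("+") ? amount : -amount);
            }
            pos = m.end();
        }
        return cal.getTime();
    }

    static String format(Date d) {
        return ISO_FORMAT.get().format(d);
    }

    private static int unit(String name) {
        Integer field = UNITS.get(name);
        if (field == null) {
            throw new IllegalArgumentException("unknown unit: " + name);
        }
        return field;
    }

    private static void roundDown(Calendar cal, int unitField) {
        boolean reset = unitField == Calendar.YEAR; // YEAR resets everything below it
        for (int f : SMALLER_FIELDS) {
            if (reset) {
                cal.set(f, f == Calendar.DAY_OF_MONTH ? 1 : 0); // JANUARY is 0, days start at 1
            }
            if (f == unitField) {
                reset = true;
            }
        }
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (String expr : new String[] {"NOW", "NOW/DAY+1DAY", "NOW/MONTH+1MONTH-1MILLISECOND"}) {
            System.out.println(expr + " = " + format(eval(now, expr)));
        }
    }
}
```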

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Solr SRW Service

2006-11-21 Thread Edward Summers

On Nov 21, 2006, at 4:32 AM, Erik Hatcher wrote:
Right, I was questioning the use of Axis for SRU, not for SRW -  
sorry I didn't make that clear.


Honestly, SRW isn't that interesting to me. I imagine most folks  
would be happy to have SRU. SRW itself is being deprecated in favor  
of SRU anyhow.


Perhaps simply an extension to handle SRU would be a good place to  
start?


//Ed


[jira] Created: (SOLR-72) specify max buffered docs memory for IndexWriter in solrconfig.xml

2006-11-21 Thread Yonik Seeley (JIRA)
specify max buffered docs memory for IndexWriter in solrconfig.xml
--

 Key: SOLR-72
 URL: http://issues.apache.org/jira/browse/SOLR-72
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Priority: Minor


Take advantage of this: 
https://issues.apache.org/jira/browse/LUCENE-709






Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Walter Underwood <[EMAIL PROTECTED]> wrote:

Even without a crawler, we must work with caches and load balancers.
I will be using Solr with a load balancer in production. If Solr is
a broken HTTP server, we will have to build something else.


Agree.  Every instance of Solr in CNET that serves websites is behind
a load balancer.
I don't know the config details of the load balancers though, except
that part of it is the LB checking for the existence of a
server-enabled file.  That allows administrators to remove the file
and still bring up a Solr instance w/o live traffic hitting it.

Solr does nothing with this file except display "enabled" or "disabled".
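The load-balancer convention can be sketched as a health check that reports 200 while the marker file exists and 503 (temporarily unavailable) once an administrator removes it. The file name and class are hypothetical placeholders, since the actual solrconfig.xml setting is not shown here; as noted above, Solr itself only displays "enabled" or "disabled".

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class EnabledCheck {
    /** 200 while the marker file exists; 503 (temporarily unavailable) once it is removed. */
    static int healthStatus(Path enabledFile) {
        return Files.exists(enabledFile) ? 200 : 503;
    }

    public static void main(String[] args) {
        // "server-enabled" is a hypothetical default; pass the real path as an argument.
        Path marker = Paths.get(args.length > 0 ? args[0] : "server-enabled");
        int status = healthStatus(marker);
        System.out.println(status + (status == 200 ? " enabled" : " disabled"));
    }
}
```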

From solrconfig.xml:

   

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Chris Hostetter

: Wow, I had completely forgotten that SolrException contained an HTTP
: status code.

Hmm... actually, the javadocs for SolrException are a little vague on
the meaning of "code" and there are at least a few places where it's set
to a value that is not a legal HTTP status code...

./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"Schema Parsing Failed",e,false);
./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"analyzer without class or tokenizer & filter list");
./src/java/org/apache/solr/schema/IndexSchema.java:   throw new SolrException(1,"TokenizerFactory must be specified first in analyzer");
./src/java/org/apache/solr/schema/IndexSchema.java:throw new SolrException(1,"undefined field "+fieldName);
./src/java/org/apache/solr/update/DirectUpdateHandler.java:if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
./src/java/org/apache/solr/update/DirectUpdateHandler.java:if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
./src/java/org/apache/solr/update/UpdateHandler.java:  throw new SolrException(1,"error parsing event listevers", e, false);
./src/java/org/apache/solr/update/UpdateHandler.java:  throw new SolrException(1,"error parsing event listeners", e, false);

...plus one instance in DateField i'm about to change for SOLR-71.




-Hoss



Re: Phonetic Token Filter

2006-11-21 Thread Chris Hostetter

: > 2. Should we have a Jira issue first?

this wiki should have all the info you need...

http://wiki.apache.org/solr/HowToContribute



-Hoss



Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
On 11/20/06 7:22 PM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote:
> This is just a sample...
> 
> 1. What is an Error?
> 2. What is a Mistake?
> 3. What is an application bug?
> 4. What is a 'system crash'?

These are not HTTP concepts. The request on a URI can succeed or fail
or result in other codes. Mistakes and crashes are outside of the HTTP
protocol.

> Of course, an XML-over-HTTP engine is not the same as HTML-over-HTTP...
> However... Walter noticed 'crawling'... I can't imagine a company which will
> put SOLR as a front-end accessible to crawlers... (To crawl an indexing
> service instead of source documents!?)

XML-over-HTTP is exactly the same as HTML-over-HTTP. In HTML, we
could return detailed error information in a meta tag. No difference.

If something is on HTTP, a good crawler can find it. All it takes is
one link, probably to the admin URL. Once found, that crawler will
happily pound on errors returned by 200.

XSLT support means you could build the search UI natively on Solr,
so that might happen.

Even without a crawler, we must work with caches and load balancers.
I will be using Solr with a load balancer in production. If Solr is
a broken HTTP server, we will have to build something else.

> I am sure that mixing an XML-based interface with HTTP status codes is not an
> attractive 'architecture'; we should separate concerns and leave HTTP code
> handling to the servlet container as much as possible...

We don't need to use HTTP response codes deep in Solr, but we do need
to separate bad parameters, retryable errors, non-retryable errors, and
so on. We can call them whatever we want internally, but we need to
report them properly over HTTP.

wunder
-- 
Walter Underwood
Search Guru, Netflix

 



Re: Solr SRW Service

2006-11-21 Thread Ian Ibbotson
Thanks for the responses, couple of follow-ups

> Why do you need Axis for this?

Well, you certainly don't for the SRU implementation, but for SRW I'd
just say that (in all the SRW implementations I've done so far) it's a
case of the right tool for the right job. Of course we can hand-craft
the codecs and parse/produce the XML by hand. However, the SRU/SRW
community comes from a background of interoperability as a sacrosanct
requirement. Given that background, having something parse WSDL and
produce your codecs for you gives people (me) a warm fuzzy feeling when
it comes to WS-I compliance. It also makes the release process much
easier when it comes to upgrading the protocol version: just pop a new
WSDL in the build tree and compile. Of course there are other reasons
too, but that's a starter for 10 :)

> Solr has some pluggable capability, detailed here:

Ah OK, thanks for that. I've taken a quick look and I'm trying to figure
out how we might be able to expose extra features, like the ability to
request that results be returned in different schemas. I'll keep at it tho
and check back if I have any questions.

Cheers,
Ian.


On Mon, 2006-11-20 at 16:35 -0500, Erik Hatcher wrote:
> On Nov 20, 2006, at 2:15 PM, Ian Ibbotson wrote:
> > Hiya all...
> >
> > I'm interested in the possibility of contributing an SRW/SRU web services
> > interface/module to solr (see http://www.loc.gov/standards/sru/).
> > SRW/SRU is the web service definition which is often used alongside or
> > instead of the more traditional Z39.50 protocol for cross/meta
> > searching. A solr SRW/SRU interface would enable meta-search engines to
> > transparently include solr repository search results by only configuring
> > the base URL of the service. I've already got much of the code to do what's
> > needed (i.e., CQL to Lucene query rewriters and code to generate
> > the right stubs using axis etc). Actually, I might be up for creating a
> > z3950 module too if anyone is interested?
> 
> 
> Why do you need Axis for this?
> 
> > So my first question really would be... Is anyone out there already
> > working on such a beast? If so, do you need any help? It seems pointless to
> > create a second add-on. I've searched the lists (not in any great depth
> > tho) and can't see any references to SRW/Z39.50. Assuming nobody is, I've
> > got some follow-up questions about the best way to package up what might
> > be add-on modules. Is this list the right place to ask?
> 
> Solr has some pluggable capability, detailed here:
> 
>   
> 
> You can simply create your code, which I presume would entail a  
> SolrRequestHandler and a QueryResponseWriter, and distribute it as a  
> JAR that others could just drop in and run with it.
> 
>   Erik
>