Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-22 Thread Walter Underwood
On 11/20/06 5:51 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 : If you really want to handle failure in an error response, write that
 : to a string and if that fails, send a hard-coded string.
 
 Hmmm... i could definitely get on board an idea like that.
 
 I took pains to make things streamable.. I'd hate to discard that.
 How do other servers handle streaming back a response and hitting an error?

You found the design tradeoff! We can stream the results or we can
give reliable error codes for errors that happen during result processing.
We can't do both. Ultraseek does streaming, but we were generating
HTML, so we could print reasonable errors in-line.

Streaming is very useful for HTML pages, because it allows the first
pixels to be painted as soon as possible. It isn't as important on the
back end, unless someone has gone to the considerable trouble of making
their entire front-end able to stream the back-end results to HTML.

If we aren't calling Writer.flush occasionally, then the streaming is
just filling up a buffer smoothly. The client won't see anything until
TCP decides to send it.
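
A minimal sketch of the buffering point, assuming a plain BufferedWriter over a byte sink that stands in for the socket. The class and method names are illustrative and are not Solr code. It reports how many bytes have reached the sink before and after flush() is called:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    /** Writes one record and returns {bytes in sink before flush, bytes in sink after flush}. */
    static int[] bytesBeforeAndAfterFlush(String record) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket
        Writer out = new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        try {
            out.write(record);
            int before = sink.size(); // a small record is still held in the writer's buffer
            out.flush();
            int after = sink.size();  // flush() pushes the buffered characters down to the sink
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = bytesBeforeAndAfterFlush("<doc>1</doc>");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

For a record much smaller than the writer's buffer, nothing reaches the sink until flush() runs, which is why streaming without occasional flushes only fills a buffer.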

Does Lucene access fetch information from disk while we iterate
through the search results? If that happens a few times, then
streaming might make a difference. If it is mostly CPU-bound,
then streaming probably doesn't help.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-22 Thread Yonik Seeley

On 11/22/06, Walter Underwood [EMAIL PROTECTED] wrote:

 I took pains to make things streamable.. I'd hate to discard that.
 How do other servers handle streaming back a response and hitting an error?

Does Lucene access fetch information from disk while we iterate
through the search results?


Yes.

Originally, all the documents were retrieved up-front, and the
response writer didn't even have access to the IndexReader.  After
seeing some users ask for some fields of *all* the documents in an
index on a different search product, I decided I'd better add
streamability to avoid OOM errors. A secondary consideration was
improving latency of the first document to the client when there are a
large number to be returned.

So Solr currently only records the ids (the internal integer lucene
docid) and optionally scores for documents to be returned.  During
response writing, the document for each id is read (which may involve
going to disk) right before it is written to the output stream.
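
The approach described above can be sketched roughly as follows. This is not the Solr response writer: the loader function is a hypothetical stand-in for reading a stored document by its internal docid, the element names are made up, and document bodies are assumed to be already escaped.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.IntFunction;

class StreamingDocWriter {
    /**
     * Writes one doc element per id, loading each document only when it is
     * about to be written, so memory use does not grow with the result size.
     */
    static void writeDocs(int[] docIds, IntFunction<String> loader, Writer out) {
        try {
            out.write("<result>");
            for (int id : docIds) {
                String body = loader.apply(id); // may go to disk, and may fail mid-stream
                out.write("<doc id=\"" + id + "\">" + body + "</doc>");
            }
            out.write("</result>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        writeDocs(new int[] {3, 7}, id -> "title-" + id, out);
        System.out.println(out);
    }
}
```

If the loader fails on a later id, the earlier documents have already been written, which is the tradeoff discussed in this thread: the status line has gone out before the error is known.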

-Yonik


Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Walter Underwood
On 11/20/06 7:22 PM, Fuad Efendi [EMAIL PROTECTED] wrote:
 This is just a sample...
 
 1. What is an Error?
 2. What is a Mistake?
 3. What is an application bug?
 4. What is a 'system crash'?

These are not HTTP concepts. The request on a URI can succeed or fail
or result in other codes. Mistakes and crashes are outside of the HTTP
protocol.

 Of course, XML-over-HTTP engine is not the same as HTML-over-HTTP...
 However... Walter noticed 'crawling'... I can't imagine a company which will
 put SOLR as a front-end accessible to crawlers... (To crawl an indexing
 service instead of source documents!?)

XML-over-HTTP is exactly the same as HTML-over-HTTP. In HTML, we
could return detailed error information in a meta tag. No difference.

If something is on HTTP, a good crawler can find it. All it takes is
one link, probably to the admin URL. Once found, that crawler will
happily pound on errors returned by 200.

XSLT support means you could build the search UI natively on Solr,
so that might happen.

Even without a crawler, we must work with caches and load balancers.
I will be using Solr with a load balancer in production. If Solr is
a broken HTTP server, we will have to build something else.

 I am sure that mixing an XML-based interface with HTTP status codes is not an
 attractive 'architecture', we should separate concerns and leave HTTP code
 handling to a servlet container as much as possible...

We don't need to use HTTP response codes deep in Solr, but we do need
to separate bad parameters, retryable errors, non-retryable errors, and
so on. We can call them what ever we want internally, but we need to
report them properly over HTTP.

wunder
-- 
Walter Underwood
Search Guru, Netflix

 



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Walter Underwood [EMAIL PROTECTED] wrote:

Even without a crawler, we must work with caches and load balancers.
I will be using Solr with a load balancer in production. If Solr is
a broken HTTP server, we will have to build something else.


Agree.  Every instance of Solr in CNET that serves websites is behind
a load balancer.
I don't know the config details of the loadbalancers though, except
that part of it is the LB checking for the existence of a
server-enabled file.  That allows administrators to remove the file
and still bring up a Solr instance w/o live traffic hitting it.

Solr does nothing with this file except display enabled or disabled.

From solrconfig.xml:

   <!-- configure a healthcheck file for servers behind a loadbalancer
   <healthcheck type="file">server-enabled</healthcheck>
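
The load-balancer check amounts to testing whether that file exists. A minimal sketch, assuming the marker file is named server-enabled as in the config above; the class name and the file location are illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HealthcheckFile {
    /** Returns "enabled" if the marker file exists, otherwise "disabled". */
    static String status(Path markerFile) {
        return Files.exists(markerFile) ? "enabled" : "disabled";
    }

    public static void main(String[] args) {
        // Relative path is an assumption; a real deployment would configure the location.
        System.out.println(status(Paths.get("server-enabled")));
    }
}
```

Removing the file flips the result to "disabled" without stopping the server, which is what lets an administrator bring an instance up with no live traffic.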

-Yonik


Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Chris Hostetter [EMAIL PROTECTED] wrote:

: Wow, i had completely forgotten that SolrException contained an HTTP
: status code.

Hmmm... actually, the javadocs for SolrException are a little vague on
the meaning of code and there are at least a few places where it's set
to a value that is not a legal HTTP status code...


None of these cases actually bubble back to an HTTP response code.
Schema parsing is done at startup, and the update servlet always
returns 200 (with error in the XML response).

Perhaps the update servlet should use HTTP error codes as well.

-Yonik


./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"Schema Parsing Failed",e,false);
./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"analyzer without class or tokenizer & filter list");
./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"TokenizerFactory must be specified first in analyzer");
./src/java/org/apache/solr/schema/IndexSchema.java:  throw new SolrException(1,"undefined field "+fieldName);
./src/java/org/apache/solr/update/DirectUpdateHandler.java:  if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
./src/java/org/apache/solr/update/DirectUpdateHandler.java:  if (idField == null) throw new SolrException(2,"Operation requires schema to have a unique key field");
./src/java/org/apache/solr/update/UpdateHandler.java:  throw new SolrException(1,"error parsing event listevers", e, false);
./src/java/org/apache/solr/update/UpdateHandler.java:  throw new SolrException(1,"error parsing event listeners", e, false);


Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Walter Underwood
On 11/20/06 5:51 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 Now that I think about it though, one nice change would be to get rid
 of the long stack trace for 400 exceptions... it's not needed, right?

That is correct. A client error (400) should not be reported with a
server stack trace. --wunder
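
A rough sketch of that rule, assuming the error body is plain text as in the current output. The class and method names are illustrative and are not the Solr servlet code:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class ErrorBody {
    /** Builds a plain-text error body; the stack trace is included only for 5xx codes. */
    static String format(int httpStatus, Throwable error) {
        if (httpStatus >= 500) {
            StringWriter trace = new StringWriter();
            error.printStackTrace(new PrintWriter(trace, true));
            return trace.toString();
        }
        // Client errors (4xx): the message alone tells the caller what to fix.
        return String.valueOf(error.getMessage());
    }

    public static void main(String[] args) {
        System.out.println(format(400, new IllegalArgumentException("Missing queryString")));
    }
}
```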



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Chris Hostetter

: /solr/select?q= is a tricky case. Three options:

...there's kind of a chicken/egg problem with this discussion ... the egg
being what should the HTTP response look like in an 'error' situation
the chicken being what is the internal API to allow a RequestHandler to
denote an 'error' situation ... talking about specific cases only gets us
so far since those cases may not be errors in all RequestHandlers.

the problem gets even more complicated when you try to answer the
question: what should Solr do if an OutputWriter encounters an error? ...
we can't generate a valid JSON response denoting an error if the
JSONOutputWriter is failing :)

It might be wise to discuss the API/pseudo code for dealing with errors in
RequestHandlers and OutputWriters and then think about what kinds of
responses those would generate rather than worrying too much about the
exact HTTP status codes first ... a big question to start off with would
be: should the RequestHandler know about HTTP status codes and be allowed
to set them explicitly, or should that level of detail be abstracted away?


-Hoss



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Walter Underwood
One way to think about this is to assume caches, proxies, and load balancing
in the HTTP path, then think about their behavior. A 500 response may make
the load balancer drop this server from the pool, for example. A 200 OK
can be cached, so temporary errors shouldn't be sent with that code.

On 11/20/06 10:51 AM, Chris Hostetter [EMAIL PROTECTED] wrote:
 
 ...there's kind of a chicken/egg problem with this discussion ... the egg
 being what should the HTTP response look like in an 'error' situation
 the chicken being what is the internal API to allow a RequestHandler to
 denote an 'error' situation ... talking about specific cases only gets us
 so far since those cases may not be errors in all RequestHandlers.

We can get most of the benefit with a few kinds of errors: 400, 403, 404,
500, and 503. Roughly:

400 - error in the request, fix it and try again
403 - forbidden, don't try again
404 - not found, don't try again unless you think it is there now
500 - server error, don't try again
503 - server error, try again

These can be mapped from internal error types.
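
One possible shape for that mapping, as a sketch only. The enum and its constant names are hypothetical and do not exist in Solr; the status codes and the retry guidance follow the list above, with "retry" meaning that resending the same request unchanged may succeed:

```java
enum ErrorKind {
    BAD_PARAMETERS(400, false), // fix the request first, then try again
    FORBIDDEN(403, false),
    NOT_FOUND(404, false),
    NON_RETRYABLE(500, false),
    RETRYABLE(503, true);       // the only case where an unchanged retry makes sense

    private final int httpStatus;
    private final boolean retryUnchanged;

    ErrorKind(int httpStatus, boolean retryUnchanged) {
        this.httpStatus = httpStatus;
        this.retryUnchanged = retryUnchanged;
    }

    int httpStatus() { return httpStatus; }

    boolean retryUnchanged() { return retryUnchanged; }

    public static void main(String[] args) {
        for (ErrorKind kind : values()) {
            System.out.println(kind + " -> " + kind.httpStatus() + ", retry unchanged: " + kind.retryUnchanged());
        }
    }
}
```

With something like this, core code throws or returns an ErrorKind and only the servlet layer translates it into a status code.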

 the problem gets even more complicated when you try to answer the
 question: what should Solr do if an OutputWriter encounters an error? ...
 we can't generate a valid JSON response denoting an error if the
 JSONOutputWriter is failing :)

Write the response to a string before sending the headers. This can be
slower than writing the response out as it is computed, but the response
codes can be accurate. Also, it allows optimal buffering, so it might
scale better.

If you really want to handle failure in an error response, write that
to a string and if that fails, send a hard-coded string.
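
A sketch of that fallback chain, under stated assumptions: the response writer and the error writer are modelled as plain functions, every writer failure is reported as 500, and all names here are illustrative rather than Solr APIs.

```java
import java.util.function.Function;
import java.util.function.Supplier;

class BufferedResponder {
    /** Hard-coded fallback used when even the error body cannot be produced. */
    static final String LAST_RESORT = "500 Internal Server Error";

    /** A fully rendered response: nothing is sent until both fields are known. */
    static final class Rendered {
        final int status;
        final String body;

        Rendered(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static Rendered render(Supplier<String> responseWriter,
                           Function<RuntimeException, String> errorWriter) {
        try {
            return new Rendered(200, responseWriter.get());           // whole body buffered first
        } catch (RuntimeException writerFailure) {
            try {
                return new Rendered(500, errorWriter.apply(writerFailure));
            } catch (RuntimeException errorWriterFailure) {
                return new Rendered(500, LAST_RESORT);                // cannot fail
            }
        }
    }

    public static void main(String[] args) {
        Rendered r = render(() -> { throw new IllegalStateException("writer broke"); },
                            e -> "<error>" + e.getMessage() + "</error>");
        System.out.println(r.status + " " + r.body);
    }
}
```

The cost is that the whole body sits in memory before the first byte is sent, which is exactly what streaming was introduced to avoid.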

wunder
-- 
Walter Underwood
Search Guru, Netflix




RE: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Fuad Efendi

On the update side of things, I think it would be nice if one could
check the HTTP status code and if it's OK (200), don't bother XML
parsing the body.

Do you mean 304 'Not Modified'? Agree, we should handle it in SOLR (it is
not SOAP indeed!); we should handle 'last modified', 'expiration' etc. 

The HTTP spec, as Hoss pointed out, allows 4xx codes to be used with
user-defined entities.

There is some HTTP stuff which we need to use anyway, but we should not use
HTTP codes in the core Java parts of an application. Some code is currently
tightly coupled with such stuff as 
SC_BAD_REQUEST
SC_OK 
SC_NOT_FOUND 

This is part of JEE, and the existing design looks slightly outdated: we need to
decouple such 'nice' stuff:
} catch (SolrException e) {
  sendErr(e.code(), SolrException.toStr(e), request, response);
} 

We even _catch_ an Exception, and _rethrow_ it as 400/404 (this is also
'Exception', but in a different language)


 1. What is an Error?
 2. What is a Mistake?
 3. What is an application bug?
 4. What is a 'system crash'?

These are not HTTP concepts. The request on a URI can succeed or fail
or result in other codes. Mistakes and crashes are outside of the HTTP
protocol.

Yes, I tried to mention very generic concepts and to think about
'Exceptions' in Java SE, EE, SOLR, JSON, XML, HTTP. We are always extending
java.lang.Exception without any thinking, just following patterns from
thousands of guides. 

Please, have a look at 
http://www.mindview.net/Etc/Discussions/CheckedExceptions
And following discussion:
http://www.bruceeckel.com/Etc/Discussions/UnCheckedExceptionComments


Some authors suggest using unchecked exceptions. The try-catch-finally code
shown in so many books is suitable only for very small applications
(usually small samples from a book)...

Thanks



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-21 Thread Yonik Seeley

On 11/20/06, Fuad Efendi [EMAIL PROTECTED] wrote:

Here, we are passing 'Empty Query' error message with a full stack trace as
an entity body of HTTP 404 response.


It's actually returning 400:

$ curl -i http://localhost:8983/solr/select/
HTTP/1.1 400 Bad Request
Date: Tue, 21 Nov 2006 03:56:34 GMT
Server: Jetty/5.1.11RC0 (Windows XP/5.1 x86 java/1.5.0_09
Content-Type: text/plain; charset=UTF-8
Content-Length: 1377

org.apache.solr.core.SolrException: Missing queryString
   at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:105)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:587)
   at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)


Imagine that instead of 'Incorrect ZIP Code' we will see Java stack trace in
some web-sites...


As an aside, as I pointed out in an earlier message, it's debatable if
we should include a stack trace for user errors (as opposed to server
errors).  I guess it depends if it ever helps with debugging or not.

Anyway, the Solr interface isn't meant as a user GUI.  It's a back-end
system like a database.


I am sure that mixing an XML-based interface with HTTP status codes is not an
attractive 'architecture', we should separate concerns and leave HTTP code
handling to a servlet container as much as possible...


That gets further away from REST. Not that Solr is purely REST, but
it's not web-services either... it's about being practical.

On the update side of things, I think it would be nice if one could
check the HTTP status code and if it's OK (200), don't bother XML
parsing the body.
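
From the client's side that could look roughly like the sketch below. The class is hypothetical, and the body reader is passed in as a function so the example stays independent of any particular HTTP library:

```java
import java.util.function.Supplier;

class UpdateStatusCheck {
    /** Returns null on success; otherwise the error text taken from the response body. */
    static String errorOrNull(int httpStatus, Supplier<String> bodyReader) {
        if (httpStatus == 200) {
            return null;         // success: the body is never read or parsed
        }
        return bodyReader.get(); // failure: only now pay for reading the body
    }

    public static void main(String[] args) {
        System.out.println(errorOrNull(200, () -> "never read"));
        System.out.println(errorOrNull(400, () -> "Missing queryString"));
    }
}
```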

-Yonik


RE: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-19 Thread Chris Hostetter

: In my simplistic Cocoon-based form, end-user gets HTTP 400 when Cocoon tries
: to query http://localhost:8983/solr/select/?q= (empty query; but I have
: workaround in query.js JavaScript); obviously this is not a bug in SOLR
: neither it is an unrecoverable system error nor any kind of HTTP errors...

playing devil's advocate here: that could certainly be construed as a case
where a 404 Not Found is a valid response.

: The transport is 'XML over HTTP', it could even be 'XML over SNMP' and we

Agreed, but as long as the XML contains detailed well structured info
about the error (which Solr doesn't currently do) we could also leverage
the protocol's error conventions as well -- provided the protocol allows
for both error codes and response bodies (which HTTP does) ... plainly
speaking: we *could* send back a well formed error document in the
format requested describing the specific problem with the request with a
non 200 HTTP status code.
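
As an illustration of such an error document, here is a small sketch for the XML case. The element and attribute names are invented for the example and are not an existing Solr response format; the status code passed in is the one that would also be set on the HTTP response.

```java
class XmlErrorDocument {
    /** Escapes the characters that would break XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders a small well-formed error document carrying the same code as the HTTP status. */
    static String render(int httpStatus, String message) {
        return "<response><error code=\"" + httpStatus + "\">" + escape(message) + "</error></response>";
    }

    public static void main(String[] args) {
        System.out.println(render(400, "Missing queryString"));
    }
}
```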





-Hoss



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-17 Thread Fuad Efendi

I was lucky today, I created simple 'flowscript' for Cocoon 2.1.9, and it
works fine...

We should probably separate business-related end-user errors (such as when
user submits empty query) and make it XML-like (instead of HTTP 400)

I want to play a little with 'Faceted Browsing' and Cocoon.

Thanks,

here are basic files for Cocoon:

query.js
=
function main() {
  var query = cocoon.request.get("query");
  if (query == null || query == "") {
    cocoon.sendPage("query");
  } else {
    cocoon.sendPage("aggregate", {query : query} );
  }
}


query.jx
=
<?xml version="1.0"?>
<page xmlns:jx="http://apache.org/cocoon/templates/jx/1.0">
  <title>Query JX</title>
  <content>
    <form method="get" action="">
      <input type="text" name="query" value="${query}" />
      <input type="submit" value="Search"/>
    </form>
  </content>
</page>



sitemap.xmap

<?xml version="1.0" encoding="UTF-8"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <map:flow language="javascript">
    <map:script src="query.js"/>
  </map:flow>

  <map:pipelines>

    <map:pipeline>

      <map:match pattern="">
        <map:call function="main"/>
      </map:match>

      <map:match pattern="query">
        <map:generate type="jx" src="query.jx"/>
        <map:serialize type="xhtml"/>
      </map:match>

      <map:match pattern="response">
        <map:generate src="http://localhost:8983/solr/select/?q={request-param:query}"/>
        <map:transform src="context://samples/solr/example.xsl" />
        <map:serialize type="html"/>
      </map:match>

      <map:match pattern="aggregate">
        <map:aggregate element="page">
          <map:part src="cocoon:/query" element="query"/>
          <map:part src="cocoon:/response" element="response"/>
        </map:aggregate>
        <map:transform src="context://samples/common/style/xsl/html/simple-page2html.xsl" />
        <map:serialize/>
      </map:match>

    </map:pipeline>

  </map:pipelines>

</map:sitemap>


context://samples/solr/example.xsl - copied from SOLR distribution
-- 
View this message in context: 
http://www.nabble.com/Cocoon-2.1.9-vs.-SOLR-20---SOLR-30-tf2639621.html#a7412487
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-17 Thread Walter Underwood
On 11/17/06 2:50 PM, Fuad Efendi [EMAIL PROTECTED] wrote:

 We should probably separate business-related end-user errors (such as when
 user submits empty query) and make it XML-like (instead of HTTP 400)

Speaking as a former web spider maintainer, it is very important to keep
the HTTP response codes accurate. Never return an error with a 200.

If we want more info, return an entity (body) with the 400 response.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-17 Thread Chris Hostetter

:  We should probably separate business-related end-user errors (such as when
:  user submits empty query) and make it XML-like (instead of HTTP 400)
:
: Speaking as a former web spider maintainer, it is very important to keep
: the HTTP response codes accurate. Never return an error with a 200.
:
: If we want more info, return an entity (body) with the 400 response.

Surely there is value in distinguishing between errors that occur in
Solr at the HTTP abstraction layer and logical errors that occur in the
request handler?

it seems like there would definitely be some merit in using the HTTP
response code only to indicate Solr's ability to successfully interpret
the URL and hand it off to a RequestHandler and an OutputWriter -- some
other mechanism would probably make sense for the RequestHandler to
indicate it could not perform its function.

For example, these urls should probably cause Solr to generate a 4xx HTTP
status...
http://host:/FOOBALOO
http://host:/select
...and any URL could conceivably generate a 500 status if the
schema.xml or solrconfig.xml can't be parsed, or if the index dir can't be
opened; but if a valid request comes in for something like this...
http://host:/select?qt=bob&id=foo
...and there is a RequestHandler named bob but bob's logic is to lookup
a document whose uniqueKey is the id param, and then foreach value of some
field in that doc, execute another search and return all of the results --
what should bob do if the id param is specified (as in the example URL)
but no document with that ID can be found?

right now, the only means bob has to indicate an error is either to put
an arbitrary error code Object in the SolrResponse, or to generate an
exception, which results in either a 4xx or a 5xx error code (i forget
which at the moment) ... but has an error really occurred at the HTTP
layer? ... it seems to me like this is the kind of case where the HTTP
status code should be 200, but the body of the HTTP response should
include some other data indicating a high level failure -- and that should
be included in a standard way that is independent of which RequestHandler
triggered it.

(abstractly i'm thinking along the same lines as why good web based
applications don't generate 404 error pages when you fill out a form
improperly, they generate 200 pages, but the HTML tells you there was a
problem -- since we control the output format, we can say that it will
contain a special indicator of success/failure -- but it's not the same
thing as the HTTP status.)






-Hoss



RE: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-17 Thread Fuad Efendi
:  We should probably separate business-related end-user errors (such as when
:  user submits empty query) and make it XML-like (instead of HTTP 400)

I want to make it more clear... we need to separate errors (end-user
mistakes) from application bugs (exceptions) and from fatal errors (HTTP
Transport, TCP/IP, CPU, etc.)

In my simplistic Cocoon-based form, end-user gets HTTP 400 when Cocoon tries
to query http://localhost:8983/solr/select/?q= (empty query; but I have
workaround in query.js JavaScript); obviously this is not a bug in SOLR
neither it is an unrecoverable system error nor any kind of HTTP errors...
End-user (in this case Cocoon's Pipeline) should be notified about incorrect
request parameter (empty parameter) via standardised transport protocol
(XML...)

The transport is 'XML over HTTP', it could even be 'XML over SNMP' and we
should not use HTTP 400 in case of incorrect request (protocol?) parameters...

Some useful info at http://mindview.net/Etc/Discussions/CheckedExceptions
(Bruce Eckel, Thinking in Java)

Thanks,




RE: Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-17 Thread Fuad Efendi
Interesting...

I'd suggest returning 200 with an XML response in case of any SOLR-related
problem (and leaving some of the job to Tomcat). 

Of course, if our crawler does not use 'If-modified-since' (such HTTP should be
handled by 'front-end'/httpd/firewall/proxy before reaching JEE/SOLR
container)... We are mixing different transport layers... 

RMI-IIOP does not use HTTP codes to pass Java Exceptions to remote client...

Speaking as a former web spider maintainer, it is very important to keep
the HTTP response codes accurate. Never return an error with a 200.
If we want more info, return an entity (body) with the 400 response.




Cocoon-2.1.9 vs. SOLR-20 SOLR-30

2006-11-15 Thread Fuad Efendi

Guys,
I spent 4 days trying to create the simplest possible web page that sends a query to
SOLR and shows the results via the GET method; a single page should contain a form
and a response from SOLR... of course I could modify the basic XSL provided by
SOLR and avoid using Cocoon... finally I was forced to browse the JIRA
database for unresolved bugs in Cocoon, and found some messages about bad
performance of the CInclude Transformer... the simplest task, but I need to do many
transformations to add dynamic...
Using pure JSP I could have done the same within an hour, and spent the saved
time on Facets.

My votes for SOLR-20 and SOLR-30 (Java client for SOLR)
+2

And, HttpClient is preferable (ask guys from Nutch)

Cocoon is postponed, although it is very rich (DOJO-based AJAX,
pipelines,...)

Thanks