+ should we use HttpUnit for testing?
Yes! This would be a great addition to our test suite.
My dream is that the test suite can start Tomcat serving some test content, crawl that site, index it, and then search it over HTTP: a true end-to-end test.
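Until HttpUnit and a Tomcat harness are wired in, here is a rough sketch of the HTTP round-trip part in plain Java. It rests on a few assumptions. The JDK's built-in com.sun.net.httpserver.HttpServer stands in for Tomcat. The page is a fixed string. The class and method names (EndToEndSketch, roundTrip) are made up for illustration. A real end-to-end test would crawl and index the served content, then fetch the search results page instead of the raw test page.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Sketch of an HTTP round-trip check; the JDK's HttpServer stands in for Tomcat. */
class EndToEndSketch {

    /** Starts a server on a free local port that serves one fixed page at /test.html. */
    static HttpServer startTestServer(String page) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        byte[] body = page.getBytes(StandardCharsets.UTF_8);
        server.createContext("/test.html", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Fetches a URL over HTTP and returns the response body as a string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Serves a page, fetches it back over HTTP, and checks that the body survived intact. */
    static boolean roundTrip(String page) {
        try {
            HttpServer server = startTestServer(page);
            try {
                int port = server.getAddress().getPort();
                return page.equals(fetch("http://127.0.0.1:" + port + "/test.html"));
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, roundTrip("<html><body>nutch test</body></html>") starts a server on a free port, fetches the page back over HTTP, and returns true if the body matches.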
+ XML + XSLT for search-form and result page generation.
That would be great. All of the JSP tags have XML-compatible versions, so it is possible to have XSLT generate JSP pages.
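Here is a minimal sketch of the idea. It assumes a made-up XML description of the search form (a searchForm element with field children) and an illustrative stylesheet; neither is part of Nutch. It uses only javax.xml.transform from the JDK and emits a JSP document in XML syntax, rooted at jsp:root.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical generator: turns a small XML form description into a JSP document. */
class SearchFormGenerator {

    // Illustrative stylesheet: one text input per <field>, wrapped in jsp:root (JSP 1.2 XML syntax).
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:jsp=\"http://java.sun.com/JSP/Page\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/searchForm\">"
        + "<jsp:root version=\"1.2\">"
        + "<form action=\"{@action}\" method=\"get\">"
        + "<xsl:for-each select=\"field\">"
        + "<input type=\"text\" name=\"{@name}\"/>"
        + "</xsl:for-each>"
        + "<input type=\"submit\" value=\"Search\"/>"
        + "</form>"
        + "</jsp:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the form description and returns the JSP document text. */
    static String generateJsp(String formXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(formXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

The generated text could be written out as a JSP document at build time, so the container still compiles and serves it as ordinary JSP.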
+ caching of HTML snippets generated at startup by the XML-XSLT process (a kind of tiny template engine)
+ cache everything you can cache
+ use servlets instead of JSP, since JSP uses the slow PrintWriter while servlets use the faster OutputStream
+ do not use tag libs, since they are slow (if we no longer use JSP, we won't have this problem anyway)
These sound like premature optimizations.
I've benchmarked simple JSP pages at well over 100 pages/second. I do not think JSP performance is presently a bottleneck for Nutch. The web UI is an area where we should emphasize ease of alteration over performance. JSP pages are easier for non-hackers to edit than servlets. Less code, even if it's slower, is probably best here.
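If we do end up caching XSLT-generated snippets as proposed, the cache itself can stay small. The following is a sketch with a hypothetical SnippetCache class. Its renderer function stands in for the XML/XSLT step; here the renderer is any String-to-String function.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache: renders each named snippet once, then serves the stored string. */
class SnippetCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> renderer;

    SnippetCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    /** Returns the snippet, invoking the renderer only on the first request for a name. */
    String get(String name) {
        return cache.computeIfAbsent(name, renderer);
    }

    /** Pre-renders snippets (e.g. at startup) so no request pays the rendering cost. */
    void warmUp(String... names) {
        for (String name : names) {
            get(name);
        }
    }
}
```

Calling warmUp(...) from an init method at startup would do the rendering once; later get(...) calls just return the stored string.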
+ do not log directly in the search or result "pages"; use a "log queue" that holds a stack of log statements and writes them independently of output generation.
Is logging a performance bottleneck? Output is redirected to a file, and the OS buffers writes, so there shouldn't be delays when logging. In any case, if we want to optimize this, then we should optimize the log Handler that we use, not the web app, no?
http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/Handler.html
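One way to optimize the Handler rather than the web app is to make it asynchronous. The request thread only enqueues the LogRecord, and a background thread does the writing. This is an unmeasured sketch. AsyncHandler is a hypothetical name, and the delegate can be any existing Handler, such as a FileHandler.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

/** Hypothetical Handler: callers only enqueue; a daemon thread forwards records to the delegate. */
class AsyncHandler extends Handler {
    private final BlockingQueue<LogRecord> queue = new LinkedBlockingQueue<>();
    private final Handler delegate;
    private final Thread worker;
    private volatile boolean closed;

    AsyncHandler(Handler delegate) {
        this.delegate = delegate;
        this.worker = new Thread(this::drain, "async-log");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public void publish(LogRecord record) {
        // Cheap on the request thread: a level/filter check plus a queue insert.
        if (!closed && isLoggable(record)) {
            queue.offer(record);
        }
    }

    private void drain() {
        try {
            while (true) {
                delegate.publish(queue.take());
            }
        } catch (InterruptedException e) {
            // close() interrupts us: write out whatever is still queued, in order.
            LogRecord r;
            while ((r = queue.poll()) != null) {
                delegate.publish(r);
            }
        }
    }

    @Override
    public void flush() {
        delegate.flush();
    }

    @Override
    public void close() {
        closed = true;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        delegate.close();
    }
}
```

To use it, attach it to the root logger, e.g. Logger.getLogger("").addHandler(new AsyncHandler(new FileHandler("nutch.log"))). In this sketch, a record published at the same moment close() runs can be dropped; a production version would need to handle that.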
+ no more sessions, or do we need them somewhere that I have overlooked?
I'm not sure what you mean here.
+ use GZIPOutputStream where possible, but make it configurable
That would be good. This can be done without changing anything else using servlet filters, no?
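The filter itself needs the servlet API (a javax.servlet.Filter plus a response wrapper), so it is not shown here. This sketch covers only the two pieces inside it. The first decides, from configuration and the Accept-Encoding header, whether to compress. The second wraps the output stream. CompressionSupport and the enabled flag are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for a gzip servlet filter. */
class CompressionSupport {

    /**
     * True if compression is switched on in our config and the client's
     * Accept-Encoding header lists gzip with a non-zero q-value.
     */
    static boolean shouldGzip(boolean enabled, String acceptEncoding) {
        if (!enabled || acceptEncoding == null) return false;
        for (String coding : acceptEncoding.split(",")) {
            String[] parts = coding.split(";");
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // malformed q-value: treat as "not acceptable"
                    }
                }
            }
            if (parts[0].trim().equalsIgnoreCase("gzip") && q > 0) return true;
        }
        return false;
    }

    /**
     * Wraps the raw response stream when gzip is chosen. The caller must also
     * set "Content-Encoding: gzip" and close() the returned stream so the
     * gzip trailer gets written.
     */
    static OutputStream wrap(OutputStream raw, boolean gzip) {
        if (!gzip) return raw;
        try {
            return new GZIPOutputStream(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```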
+ flush data in sections
How does this help? I've always thought it was best to flush once at the end. But I suppose you could flush the header first, or something...
+ set getLastModified to server startup time
This is so the browser will cache more things?
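If that is the goal, the mechanism is the conditional GET. The server sends a Last-Modified header, and the browser later sends that date back as If-Modified-Since. The server can then answer 304 Not Modified without regenerating the page. HttpServlet already performs this comparison when getLastModified(HttpServletRequest) is overridden. The sketch below reimplements the same check in plain Java so it runs without a container. StartupLastModified is a hypothetical name, and using the startup time is only valid if the served content does not change while the server runs.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

/** Hypothetical helper: treats server startup time as the Last-Modified time of every page. */
class StartupLastModified {
    // HTTP dates have one-second resolution, so drop the milliseconds.
    static final long STARTUP_MILLIS = System.currentTimeMillis() / 1000 * 1000;

    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    /** Value for the Last-Modified response header. */
    static String lastModifiedHeader(long lastModifiedMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(lastModifiedMillis).atZone(ZoneOffset.UTC));
    }

    /** 304 if the client's cached copy is at least as new as lastModified, else 200. */
    static int status(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) return 200;
        try {
            long since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant().toEpochMilli();
            return since >= lastModifiedMillis / 1000 * 1000 ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200; // unparseable header: send the full page
        }
    }
}
```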
+ make it possible to load static content from another server, for example an Apache installation (make absolute paths possible)
Sounds good.
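Making this configurable could be as small as one base-URL property that every static link goes through. This sketch assumes a hypothetical property named static.base.url, where an empty value means the web app serves its own static files. StaticUrls is likewise a hypothetical helper.

```java
import java.util.Properties;

/** Hypothetical helper: builds links to static files, optionally on a separate server. */
class StaticUrls {
    // Hypothetical property name; empty or missing means "serve from this web app".
    static final String PROPERTY = "static.base.url";

    private final String base;

    StaticUrls(Properties config) {
        String b = config.getProperty(PROPERTY, "").trim();
        // Strip a trailing slash so joining never produces "//".
        this.base = b.endsWith("/") ? b.substring(0, b.length() - 1) : b;
    }

    /** Absolute URL on the static server if one is configured, else the path unchanged. */
    String url(String path) {
        if (base.isEmpty()) return path;
        String p = path.startsWith("/") ? path.substring(1) : path;
        return base + "/" + p;
    }
}
```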
Doug
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
