I recently figured out how to get OSCache working with Nutch, so I thought I would share the details with the list to save others some trouble in the future. For anyone who doesn't know what OSCache is, here is the description from the website: "OSCache is a caching solution that includes a JSP tag library and set of classes to perform fine grained dynamic caching of JSP content, servlet responses or arbitrary objects. It provides both in memory and persistent on disk caches, and can allow your site to have graceful error tolerance (eg if an error occurs like your db goes down, you can serve the cached content so people can still surf the site almost without knowing)."

OSCache offers two ways to cache data: a JSP tag that you wrap around the specific parts of your JSP source you want cached, and a servlet filter called "CacheFilter" that caches the entire output of each JSP page. With Nutch, only the second method works (the tag approach runs into a flushed include inside the tag, as shown in my original message below), and it is most likely what you want anyway.

Here are step-by-step installation instructions for Nutch:

1. Download the package from http://www.opensymphony.com/oscache/download.action.
2. Extract the package and place the "oscache-*.jar" file in your Nutch package (ROOT.war) under the "/WEB-INF/lib" directory.
3. From the OSCache package, edit the oscache.properties file with your specific settings and place it in your Nutch package under the "/WEB-INF/classes" directory.
4. Add the OSCache-specific entries to your Nutch "/WEB-INF/web.xml" file. The minimum required entries are below, with comments:

    <filter>
      <filter-name>CacheFilter</filter-name>
      <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>
      <!-- This loads the OSCache CacheFilter upon deployment. -->
      <init-param>
        <param-name>time</param-name>
        <param-value>600</param-value>
      </init-param>
      <!-- This is the maximum time, in seconds, that any page will be cached. -->
    </filter>

    <filter-mapping>
      <filter-name>CacheFilter</filter-name>
      <url-pattern>*.jsp</url-pattern>
    </filter-mapping>
    <!-- This defines which pages the CacheFilter should cache; this setting caches all JSP files. You can change it to, for example, /search.jsp to cache only search results and nothing else. -->

5. Redeploy your application with the changes, and you're done.

I don't have any benchmark figures, but this can be a useful way to speed up searches if you run a moderately busy search engine.

In my case, I have Google Ads on all my search result pages. When a visitor requests a page, client-side JavaScript sends its address to Google, and Google quickly requests that exact page to analyze it for relevancy. As a result, every results page gets at least two requests within about a second of each other. With the CacheFilter in place, the second (and any later) request for the same page is served from the cache, as long as it arrives within the configured cache time. This reduces load and leaves more query capacity for other users.

OSCache's memory usage depends on the maximum number of pages you choose to keep in memory; it can also store cached pages on disk.

I hope this helps anyone setting up OSCache in the future. Enjoy!
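To make the caching behaviour described above a bit more concrete (a page kept for "time" seconds, a repeat request for the same URL answered without running the search again, and memory bounded by how many pages you keep), here is a minimal Java sketch. It is not OSCache's implementation or API: the PageCache class, its get method, the cache key format (request URI plus query string) and the least-recently-used eviction are all my own assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch only: NOT OSCache's implementation or API. It mimics the
// behaviour described above: pages kept for a fixed time, bounded in number.
final class PageCache {
    // A rendered page and the clock reading (ms) at which it was stored.
    private static final class CachedPage {
        final String body;
        final long storedAtMillis;

        CachedPage(String body, long storedAtMillis) {
            this.body = body;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long timeToLiveMillis;
    private final LongSupplier clock;
    private final Map<String, CachedPage> pages;
    private int renders = 0;

    // timeSeconds plays the role of the filter's "time" parameter;
    // capacity is an assumed limit on how many pages stay in memory.
    PageCache(long timeSeconds, final int capacity, LongSupplier clock) {
        this.timeToLiveMillis = timeSeconds * 1000L;
        this.clock = clock;
        // An access-ordered LinkedHashMap drops the least recently used page
        // once more than `capacity` pages are stored.
        this.pages = new LinkedHashMap<String, CachedPage>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedPage> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached body for `key` if it is still fresh; otherwise calls
    // `render` (standing in for running search.jsp), stores and returns the result.
    synchronized String get(String key, Supplier<String> render) {
        long now = clock.getAsLong();
        CachedPage page = pages.get(key);
        if (page != null && now - page.storedAtMillis < timeToLiveMillis) {
            return page.body; // hit: the search is not run again
        }
        String body = render.get(); // miss or expired entry
        renders++;
        pages.put(key, new CachedPage(body, now));
        return body;
    }

    synchronized int renderCount() {
        return renders;
    }
}

class PageCacheDemo {
    public static void main(String[] args) {
        final long[] fakeNow = {0L}; // manual clock keeps the demo deterministic
        PageCache cache = new PageCache(600, 1000, () -> fakeNow[0]);
        String key = "/search.jsp?query=http"; // assumed key: URI plus query string

        cache.get(key, () -> "results page"); // visitor's request: rendered
        fakeNow[0] += 1_000L;                  // Google's request about 1 s later
        cache.get(key, () -> "results page"); // served from the cache
        System.out.println(cache.renderCount()); // prints 1

        fakeNow[0] += 600_000L;               // beyond the 600-second "time"
        cache.get(key, () -> "results page"); // expired, rendered again
        System.out.println(cache.renderCount()); // prints 2
    }
}
```

The access-ordered LinkedHashMap is just a compact way to get a size-bounded cache in a sketch; OSCache's own eviction and disk persistence are set up through oscache.properties and are not reproduced here.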
----- Original Message ----
From: Sean Dean <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Wednesday, December 27, 2006 1:25:54 AM
Subject: Nutch and OSCache

I'm wondering if anyone is running OSCache with Nutch. I've followed their tutorial, and it seems there is an issue when wrapping any custom tag around a flushed include, which the JSP specification does not allow. I guess there is one in the Nutch JSP code that is stopping me?

Looking at the log output, Nutch runs fine, and the error occurs during page generation.

2006-12-27 00:59:34,427 INFO NutchBean - query: http
2006-12-27 00:59:34,427 INFO NutchBean - lang:
2006-12-27 00:59:34,453 INFO NutchBean - searching for 20 raw hits
2006-12-27 00:59:49,837 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2006-12-27 00:59:50,941 INFO NutchBean - total hits: 5116128
2006-12-27 00:59:50,951 WARN [jsp] - Servlet.service() for servlet jsp threw exception
java.io.IOException: Illegal to flush within a custom tag
    at javax.servlet.jsp.tagext.BodyContent.flush(BodyContent.java:79)
    at org.apache.jsp.search_jsp._jspService(search_jsp.java:416)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:334)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at com.opensymphony.oscache.web.filter.CacheFilter.doFilter(CacheFilter.java:161)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
    at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
    at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
    at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)

I know this isn't really a Nutch issue per se, but if anyone is running it without problems, any tips would be greatly appreciated.

Thanks,
Sean