I recently figured out how to get OSCache working with Nutch, so I thought it 
might be useful to share the details with the list and save others some 
trouble in the future.
 
For anyone who doesn't know what OSCache is, here is the description from its 
website:
 
"OSCache is a caching solution that includes a JSP tag library and set of 
classes to perform fine grained dynamic caching of JSP content, servlet 
responses or arbitrary objects. It provides both in memory and persistent on 
disk caches, and can allow your site to have graceful error tolerance (eg if an 
error occurs like your db goes down, you can serve the cached content so people 
can still surf the site almost without knowing)."
 
OSCache offers two ways to cache data. With the first, you place a specific 
tag around the code in your JSP source that you wish to cache. The second, 
called "CacheFilter", caches the entire JSP output.
 
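When using Nutch, the second method is the only one that works (wrapping the 
cache tag around Nutch's JSP code fails with "Illegal to flush within a custom 
tag", as described in my original message below), and it is most likely what 
you want anyway. Here is a step-by-step installation setup for Nutch: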
 
1. Download the package from 
http://www.opensymphony.com/oscache/download.action.
 
2. Extract the package and place the "oscache-*.jar" file in your Nutch package 
(ROOT.war) under the "/WEB-INF/lib" directory.

3. From the OSCache package, edit the oscache.properties file with your 
specific settings and place it in your Nutch package under the 
"/WEB-INF/classes" directory.
 
4. Add the OSCache-specific entries to the Nutch "/WEB-INF/web.xml" file. The 
minimum required entries are below, with comments:
 
<filter>
    <filter-name>CacheFilter</filter-name>
    <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>

    <!-- This loads the OSCache CacheFilter upon deployment. -->

    <init-param>
        <param-name>time</param-name>
        <param-value>600</param-value>
    </init-param>

    <!-- This is the maximum time, in seconds, that any page stays cached. -->

</filter>
 
<filter-mapping>
    <filter-name>CacheFilter</filter-name>
    <url-pattern>*.jsp</url-pattern>
</filter-mapping>
 
<!-- This defines which pages the CacheFilter should cache. With this setting 
we are telling it to cache all JSP files. You can change the pattern to, for 
example, /search.jsp to cache only search results and nothing else. -->
 
5. That's it. Now you just need to re-deploy your application with the changes.
 
I don't have any benchmark information available, but this can be a useful way 
to speed up searches if you operate a semi-busy search engine.
 
As for myself, I have Google Ads on all my search result pages. The way Google 
Ads operates is that each requested page is reported to Google via client-side 
JavaScript, and Google quickly sends its own request for that exact page to 
analyze it for relevancy. This means at least two requests for the same page 
arrive within about a second. OSCache serves the second request (and any 
further ones) from its cache. This reduces load and frees up query capacity 
for other users at the same time.
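
To make the effect concrete, here is a minimal Java sketch of time-based page 
caching. It is not OSCache's code, only an illustration of the idea: the class 
TtlPageCache, its get method and the injected clock are hypothetical names 
chosen for this example, and the 600 second lifetime simply mirrors the "time" 
parameter from the web.xml above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical illustration of time-based page caching; not OSCache's implementation. */
final class TtlPageCache {

    /** A rendered page plus the time it was stored. */
    private static final class Entry {
        final String body;
        final long storedAtMillis;

        Entry(String body, long storedAtMillis) {
            this.body = body;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the expiry logic can be tested

    TtlPageCache(long ttlSeconds, LongSupplier clock) {
        this.ttlMillis = ttlSeconds * 1000L;
        this.clock = clock;
    }

    /**
     * Returns the cached page for the key, calling the renderer only when the
     * page is missing or older than the configured lifetime. Two simultaneous
     * misses may both render; this sketch does not try to prevent that.
     */
    String get(String key, Supplier<String> renderer) {
        long now = clock.getAsLong();
        Entry cached = entries.get(key);
        if (cached != null && now - cached.storedAtMillis < ttlMillis) {
            return cached.body; // fresh enough: no search is executed
        }
        String body = renderer.get(); // the expensive part, e.g. running the query
        entries.put(key, new Entry(body, now));
        return body;
    }

    public static void main(String[] args) {
        TtlPageCache cache = new TtlPageCache(600, System::currentTimeMillis);
        int[] renders = {0};
        Supplier<String> search = () -> "results #" + (++renders[0]);

        // The visitor's request and the follow-up request for the same URL.
        cache.get("/search.jsp?query=http", search);
        cache.get("/search.jsp?query=http", search);

        System.out.println("renders: " + renders[0]); // prints "renders: 1"
    }
}
```

The second lookup never reaches the renderer, which is the saving described 
above: the follow-up request costs a map lookup instead of a search.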
 
OSCache's memory usage will depend on the maximum number of pages you wish to 
store in memory; it can also store them on disk. I hope this helps anyone 
wishing to do an OSCache setup in the future.
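
As a rough picture of how a page limit bounds memory, here is a small Java 
sketch. It assumes a least-recently-used eviction policy, which may or may not 
match how your OSCache instance is configured, and LruPageStore is a 
hypothetical name used only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a capacity-bounded page store; not OSCache's implementation. */
final class LruPageStore extends LinkedHashMap<String, String> {
    private static final long serialVersionUID = 1L;

    private final int maxPages;

    LruPageStore(int maxPages) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxPages = maxPages;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > maxPages;
    }

    public static void main(String[] args) {
        LruPageStore store = new LruPageStore(2);
        store.put("a", "page a");
        store.put("b", "page b");
        store.get("a");           // "a" becomes the most recently used page
        store.put("c", "page c"); // over capacity: "b" is evicted
        System.out.println(store.keySet()); // prints "[a, c]"
    }
}
```

With a limit like this, memory use is bounded by roughly the page limit times 
the average page size, which is why the limit is the setting to tune.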
 
Enjoy!
 
----- Original Message ----
From: Sean Dean <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Wednesday, December 27, 2006 1:25:54 AM
Subject: Nutch and OSCache


I'm wondering if anyone is running OSCache with Nutch?

I've followed their tutorial, and it seems there is an issue when wrapping any 
custom tag around a flushed include, which is in line with the JSP 
specification. I guess there is one in the Nutch JSP code stopping me?

Looking at the log output, Nutch runs fine and the point of error is the page 
generation.

2006-12-27 00:59:34,427 INFO  NutchBean - query: http
2006-12-27 00:59:34,427 INFO  NutchBean - lang:
2006-12-27 00:59:34,453 INFO  NutchBean - searching for 20 raw hits
2006-12-27 00:59:49,837 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2006-12-27 00:59:50,941 INFO  NutchBean - total hits: 5116128
2006-12-27 00:59:50,951 WARN  [jsp] - Servlet.service() for servlet jsp threw exception
java.io.IOException: Illegal to flush within a custom tag
        at javax.servlet.jsp.tagext.BodyContent.flush(BodyContent.java:79)
        at org.apache.jsp.search_jsp._jspService(search_jsp.java:416)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:334)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at com.opensymphony.oscache.web.filter.CacheFilter.doFilter(CacheFilter.java:161)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)

I know this isn't really a Nutch issue per se, but if anyone is running it 
without problems, any tips would be greatly appreciated.

Thanks,

Sean
