Problems Installing
Hi there...

I am trying to get nutch running. Have done a trial indexing run successfully etc... Now I'm running into issues that may be more Tomcat related than Nutch:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException
    org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
    org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.Request, org.apache.catalina.Response, org.apache.catalina.ValveContext) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Processor.process(java.io.InputStream, java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
    org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
    java.lang.Thread.run() (/usr/lib/libgcj.so.6.0.0)

root cause

java.lang.NullPointerException
    org.apache.nutch.searcher.NutchBean.init(java.io.File, java.io.File) (Unknown Source)
    org.apache.nutch.searcher.NutchBean.NutchBean(java.io.File) (Unknown Source)
    org.apache.nutch.searcher.NutchBean.NutchBean() (Unknown Source)
    org.apache.nutch.searcher.NutchBean.get(javax.servlet.ServletContext) (Unknown Source)
    org.apache.jsp.search_jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (Unknown Source)
    org.apache.jasper.runtime.HttpJspBase.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-runtime-5.0.30.jar.so)
    javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
    org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.Request, org.apache.catalina.Response, org.apache.catalina.ValveContext) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Processor.process(java.io.InputStream, java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
    org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
    java.lang.Thread.run() (/usr/lib/libgcj.so.6.0.0)

note The full stack trace of the root cause is available in the Apache Tomcat/5.0 logs.
RE: Problems Installing
Thanks for the reply...

I re-did what you mentioned below. It re-installed just fine (I'm running Fedora Core 4 and installed with yum using rpm's). Even when I rename it, I must now access it via http://www.myserver..:8080/root or else I get a 404 not found...

When I try to do a search I get the same error.

Any other thoughts? :)

Paul

-----Original Message-----
From: Dan Morrill [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 2:17 PM
To: nutch-user@lucene.apache.org
Subject: RE: Problems Installing

Did you:

1. remove the root.war from tomcat?
2. rename nutch.war to root.war and dump that into webapps under tomcat?
3. did it install ok (can you see the exploded pages under webapps root)?

Just checking, this is how I fixed the same issue under windows.

r/d

-----Original Message-----
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 11:00 AM
To: nutch-user@lucene.apache.org
Subject: Problems Installing

Hi there... I am trying to get nutch running. Have done a trial indexing run successfully etc... Now I'm running into issues that may be more Tomcat related than Nutch:

HTTP Status 500 - type Exception report message description The server encountered an internal error () that prevented it from fulfilling this request.
RE: Tomcat Problem
Where would I check that? I can check the JSP file by copying the nutch--.war file back over to the webroot and watching it expand etc... But I'm confused and new to Tomcat stuff.

-----Original Message-----
From: Babu, KameshNarayana (GE, Research, consultant) [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 11:51 PM
To: nutch-user@lucene.apache.org
Subject: RE: Tomcat Problem

Hey,

Check the classpath and your JSP file.

Regards
Kamesh
RE: Tomcat Problem
/javamail/mailapi-1.3.1.jar
cp=/var/lib/tomcat5/common/lib/naming-factory.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/libgcj-4.0.2.jar
cp=/var/lib/tomcat5/common/lib/naming-common.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/libgcj-4.0.2.jar
cp=/usr/share/java/ant-launcher-1.6.2.jar
cp=/usr/share/java/jasper5-runtime-5.0.30.jar
cp=/var/lib/tomcat5/common/lib/naming-resources.jar
cp=/usr/share/java/jakarta-commons-dbcp-1.2.1.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/jakarta-commons-collections-3.1.jar
cp=/var/lib/tomcat5/common/lib/naming-java.jar
cp=/usr/share/java/jakarta-commons-logging-api-1.0.4.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/jspapi-5.0.30.jar
cp=/usr/share/java/servletapi5-5.0.30.jar
cp=/usr/share/java/jaf-1.0.2.jar
cp=/usr/lib/jvm/java/lib/tools.jar
cp=/usr/share/tomcat5/bin/bootstrap.jar
cp=/usr/share/java/commons-logging-api.jar
cp=/usr/share/java/mx4j/mx4j.jar
work dir=/usr/share/tomcat5/work/Catalina/localhost/nutch
extension dir=/usr/share/java/ext
srcDir=/usr/share/tomcat5/work/Catalina/localhost/nutch
compilerTargetVM=1.3
compilerSourceVM=1.3
include=org/apache/jsp/index_jsp.java
3-Apr-06 1:57:34 AM org.apache.jasper.compiler.Compiler generateClass(java.lang.String[])
SEVERE: Error compiling file: /usr/share/tomcat5/work/Catalina/localhost/nutch//org/apache/jsp/index_jsp.java [javac] Compiling 1 source file

-----Original Message-----
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 4:25 AM
To: nutch-user@lucene.apache.org
Subject: Tomcat Problem

Sorry if this is slightly off-topic but I'm just trying to get Nutch running for testing... I *think* this is Tomcat related:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: Unable to compile class for JSP
    org.apache.jasper.compiler.DefaultErrorHandler.javacError(java.lang.String, java.lang.Exception) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.compiler.ErrorDispatcher.javacError(java.lang.String, java.lang.Exception) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.compiler.Compiler.generateClass(java.lang.String[]) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.compiler.Compiler.compile(boolean, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.compiler.Compiler.compile(boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.compiler.Compiler.compile() (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.JspCompilationContext.compile() (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
    javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
    org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.Request, org.apache.catalina.Response, org.apache.catalina.ValveContext) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Processor.process(java.io.InputStream, java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
    org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/tmp/libtomcat-util-5.0.30.jar.soj8ryts.so)
    org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/tmp
Nutch 500 Error
Hi there...

I was having a number of problems with my install, mainly because I'm not used to Tomcat and/or Nutch etc...

Anyways, I am running Fedora 4 and was told that the packages are a bad idea to use, so I uninstalled all of my java/tomcat rpm's and installed new binaries today from the source sites (Sun Java / Apache Tomcat).

Things are looking better I think, but when I try to run a search I get this:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

root cause

java.lang.NullPointerException
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:82)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:72)
    org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:112)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

The search page comes up fine. I figure this must be something simple, I hope... I read on the Nutch site about utf-8 and added it to the configuration of my Tomcat5 server...

Any other ideas?

Thanks again,
Paul
RE: Nutch 500 Error
Thanks.. Tried that... Same error:

HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

root cause

java.lang.NullPointerException
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:82)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:72)
    org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:112)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 06, 2006 3:30 AM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch 500 Error

My guess is you have to override the searcher.dir property in nutch-site.xml and have it point to your crawl dir.

Rgrds, Thomas
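
One reason the searcher.dir suggestion matters: a relative directory value (for example ".") is resolved by Java against the working directory of the JVM, which for a webapp is wherever Tomcat happened to be started. The sketch below is not Nutch code and the class name SearcherDirResolve is made up for illustration; whether a given Nutch version treats searcher.dir exactly this way is an assumption worth checking against its source.

```java
import java.io.File;

public class SearcherDirResolve {
    // Resolve a directory value the way java.io.File does: an absolute path
    // is kept as-is, a relative one is taken against the JVM working
    // directory (the "user.dir" system property).
    static File resolve(String searcherDir) {
        return new File(searcherDir).getAbsoluteFile();
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the searcher.dir value as the first argument.
        String value = args.length > 0 ? args[0] : ".";
        System.out.println("user.dir     = " + System.getProperty("user.dir"));
        System.out.println("searcher.dir = " + value + " -> " + resolve(value));
    }
}
```

If the printed absolute path is not the directory that holds the crawl output, setting an absolute path for searcher.dir in nutch-site.xml removes the dependency on where Tomcat was started.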
RE: Nutch 500 Error
Thanks, I was doing the java command wrong...

Back to my original problem - I re-ran through the entire tutorial to ensure I was doing it right and it seems proper.

How do I tell Nutch where to look specifically in the code for the segments and indexes in case it is in the wrong place?

All the best,
Paul

-----Original Message-----
From: sudhendra seshachala [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 06, 2006 12:02 PM
To: nutch-user@lucene.apache.org
Subject: RE: Nutch 500 Error

It should be java -version I think.

Sudhi Seshachala
http://sudhilogs.blogspot.com/

Paul Stewart [EMAIL PROTECTED] wrote:

Thanks for the reply... I apologize as I'm very new to the Java world...:)

I am running the following:

Fedora Core 4
Apache Tomcat 5.5.16 (binary download from Tomcat site installed to /usr/local/tomcat5)
jre1.5.0_06 (binary download from Sun site to /usr/java/jre1.5.0_06)

Weird though - when I try to do a java -v I get this now:

[EMAIL PROTECTED] jre1.5.0_06]# export JAVA_HOME=/usr/java/jre1.5.0_06/
[EMAIL PROTECTED] jre1.5.0_06]# /usr/java/jre1.5.0_06/bin/java -v
Unrecognized option: -v
Could not create the Java virtual machine.

Is this my actual problem possibly? Or is this the wrong Java version to be running? When I downloaded 1.4.x, tomcat told me it didn't support anything but 1.5.x.

Thanks again for your patience...
Paul

-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 06, 2006 7:16 AM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch 500 Error

What version are you on?

If you trace the NullPointerException back to the code, the NutchBean.init method is where it expects to find the index and segments, so either they're missing (did you follow the tutorial and merge your segment indexes?) or it is looking in the wrong place. That's what I think.

Rgrds, Thomas
RE: Nutch 500 Error
I may have found my problem but I'm not sure...

Here's my /usr/local/nutch directory:

drwxr-xr-x  2 root root     4096 Apr  2 13:06 bin
-rw-rw-r--  1 root root    15567 Mar 31 13:40 build.xml
-rw-rw-r--  1 root root    19814 Mar 31 13:40 CHANGES.txt
drwxr-xr-x  2 root root     4096 Apr 12 08:47 conf
drwxr-xr-x  2 root root     4096 Apr  8 17:22 crawl
drwxr-xr-x  3 root root     4096 Apr  8 17:29 db
-rw-rw-r--  1 root root     1845 Mar 31 13:40 default.properties
drwxr-xr-x 19 root root     4096 Mar 31 13:40 docs
drwxr-xr-x  2 root root     4096 Apr  2 11:15 lib
-rw-rw-r--  1 root root      615 Mar 31 13:40 LICENSE.txt
-rw-rw-r--  1 root root   755034 Mar 31 13:40 nutch-0.7.2.jar
-rw-rw-r--  1 root root 15806453 Mar 31 13:40 nutch-0.7.2.war
drwxr-xr-x 26 root root     4096 Mar 31 13:40 plugins
-rw-rw-r--  1 root root      403 Mar 31 13:40 README.txt
drwxr-xr-x  4 root root     4096 Apr  8 17:28 segments
drwxr-xr-x 11 root root     4096 Mar 31 13:40 src
-rw-r--r--  1 root root       65 Apr  8 17:28 urls

My crawl directory is empty.

My db directory is this:

[EMAIL PROTECTED] db]# ls -l
total 4
-rw-r--r-- 1 root root    0 Apr  8 17:29 dbreadlock
-rw-r--r-- 1 root root    0 Apr  8 17:29 dbwritelock
drwxr-xr-x 6 root root 4096 Apr  8 17:29 webdb

Inside webdb is this:

drwxr-xr-x 2 root root 4096 Apr  8 17:29 linksByMD5
drwxr-xr-x 2 root root 4096 Apr  8 17:29 linksByURL
drwxr-xr-x 2 root root 4096 Apr  8 17:29 pagesByMD5
drwxr-xr-x 2 root root 4096 Apr  8 17:29 pagesByURL
-rw-r--r-- 1 root root   17 Apr  8 17:29 stats

Back in my /usr/local/nutch/segments directory I have:

drwxr-xr-x 8 root root 4096 Apr  8 17:30 20060408172630
drwxr-xr-x 8 root root 4096 Apr  8 17:30 20060408172823

So something must be wrong then if I use /usr/local/nutch as my searcher.dir, right? Even though I'm sure I followed the tutorial, I'm obviously missing something...?

Thanks again.

Paul

-----Original Message-----
From: sudhendra seshachala [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 11, 2006 10:45 PM
To: nutch-user@lucene.apache.org
Subject: RE: Nutch 500 Error

Check the nutch-default.xml; there should be a property searcher.dir. Provide the path for the index folder. Better still, copy the property node, paste it in nutch-site.xml, and provide the path for the index folder there.

For example, if the index folder is stored as

home/nutch/crawl
 - crawldb
 - segments
 - index
 - indexes

point searcher.dir to home/nutch/crawl.

Hope this helps.

Thanks
Sudhi
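
The layout Sudhi describes can be checked before restarting Tomcat. The following is a minimal sketch, assuming the directory named by searcher.dir should contain segments plus either index or indexes, as in his example. The class SearcherDirCheck and its messages are hypothetical and not part of Nutch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SearcherDirCheck {
    // Returns a list of problems found under dir; an empty list means the
    // directory has the sub-directories named in the example layout.
    static List<String> problems(File dir) {
        List<String> out = new ArrayList<String>();
        if (!dir.isDirectory()) {
            out.add("not a directory: " + dir);
            return out;
        }
        boolean hasIndex = new File(dir, "index").isDirectory()
                || new File(dir, "indexes").isDirectory();
        if (!hasIndex) {
            out.add("no index/ or indexes/ under " + dir);
        }
        // File.list() returns null when the directory does not exist.
        String[] segments = new File(dir, "segments").list();
        if (segments == null) {
            out.add("no segments/ under " + dir);
        } else if (segments.length == 0) {
            out.add("segments/ is empty under " + dir);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical usage: java SearcherDirCheck /usr/local/nutch
        File dir = new File(args.length > 0 ? args[0] : ".");
        List<String> found = problems(dir);
        if (found.isEmpty()) {
            System.out.println("layout looks plausible: " + dir.getAbsolutePath());
        }
        for (String p : found) {
            System.out.println("problem: " + p);
        }
    }
}
```

Run against a directory that has segments but neither index nor indexes, it reports only the missing index, which narrows the question to whether the indexing step of the tutorial produced output and where.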
Hardware Planning
Hi folks...

I have read the archives and am looking for input specific to my estimated requirements:

Want to index about 100 million public webpages. Space and bandwidth are not a problem - coming up with the right hardware and keeping the cost down is my goal. I would estimate only 1-2 searches per second, at least during the first hardware phase.

With that in mind I'm trying to figure out whether to use a couple of larger Dell servers or a bunch of small single CPU, 1 Gig RAM, 160 GB hard drive type of machines.

Can anyone share what they are using for hardware for about 100 million webpages and their search result times etc? Real-world experience is important to me, and so is being able to scale.

Thanks,
Paul

The information transmitted is intended only for the person or entity to which it is addressed and contains confidential and/or privileged material. If you received this in error, please contact the sender immediately and then destroy this transmission, including all attachments, without copying, distributing or disclosing same. Thank you.
RE: Hardware Planning
Thanks very much for the details... I appreciate it...

I'd be happy with the 500ms range on *average* but totally understand your point about searches piling up.

So you're suggesting about 20 million pages per box - each box with 4 drives, dual CPU and 4 gig RAM?

I guess what I don't totally understand is which servers need lots of RAM and which ones need all the storage etc. I was thinking of some low end boxes (2 gig RAM, 160 Gig HD, single low end processor) for storage and a couple of heftier boxes (dual cpu, 4 Gig RAM, 500 GB hard drives) - is this way off track? What needs RAM and what needs storage in the components of Nutch?

Thanks again,
Paul

-----Original Message-----
From: Ken Krugler [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 29, 2007 11:13 AM
To: nutch-user@lucene.apache.org
Subject: RE: Hardware Planning

Hi Paul,

Leaving aside the hardware requirements for the crawl...

The main issue with what you need to achieve is the nature of your index. If you're using the results of a standard Nutch web crawl, then search times of 500ms shouldn't be a problem. But you actually want something more in the range of, say, 200ms average, as otherwise you can quickly run into the overlapping search problem... once a search doesn't complete in time before another search starts running, both searches take longer, which increases the odds that the third search happens before the previous search(es) have completed. So the performance can quickly deteriorate under a load that's only slightly higher than your target case.

However, getting 200ms times isn't hard either, as long as the hardware is reasonable and the index size isn't huge.

In our experience, using more, cheaper boxes is the way to go. For web crawl data, I would probably go with two 10M page indexes per box, where the Lucene index goes on a smaller, faster drive and the page contents go on a bigger, slower drive. So then you'd have two faster drives and two slower drives per box, and use a dual CPU with dual cores. And 4GB of RAM, so each JVM gets 1.5GB with some breathing room for the OS.

Which means you'd need about five of these servers for 100M pages... unless you want replication for reliability, which means 10 servers.

-- Ken

No, not familiar with that yet - can you send out any URL's?

My question is really whether you're better to try for one or two big boxes or a series of small boxes - also looking for anyone who has 100 million pages in their index and a description of their hardware as a reference point...

Thanks!
Paul

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of VK
Sent: Wednesday, November 28, 2007 9:53 PM
To: nutch-user@lucene.apache.org
Subject: Re: Hardware Planning

Have you considered EC2 + S3? Also Rightscale has some interesting solutions, which I am currently evaluating.

--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
If you can't find it, you can't fix it
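
Ken's sizing can be written out as plain arithmetic. The sketch below uses only numbers from this thread (100M pages, two 10M-page indexes per box, 1-2 searches per second, 200ms versus 500ms per search). The utilization figure (arrival rate times average search time) is a standard simplification that treats one box as a single serial server; it is added here for illustration and is not something Ken stated. The class and method names are hypothetical.

```java
public class CapacitySketch {
    // Number of boxes needed to hold totalPages, times the number of copies
    // kept for reliability (copies = 1 means no replication).
    static long serversNeeded(long totalPages, long pagesPerIndex,
                              int indexesPerBox, int copies) {
        long pagesPerBox = pagesPerIndex * indexesPerBox;
        long boxes = (totalPages + pagesPerBox - 1) / pagesPerBox; // round up
        return boxes * copies;
    }

    // Fraction of time a serial server is busy: arrival rate x service time.
    // A value near 1.0 leaves no headroom, so searches start to overlap.
    static double utilization(double searchesPerSecond, double secondsPerSearch) {
        return searchesPerSecond * secondsPerSearch;
    }

    public static void main(String[] args) {
        System.out.println("servers, no replication: "
                + serversNeeded(100000000L, 10000000L, 2, 1));
        System.out.println("servers, two copies:     "
                + serversNeeded(100000000L, 10000000L, 2, 2));
        System.out.println("busy fraction at 2/s, 500ms: " + utilization(2, 0.5));
        System.out.println("busy fraction at 2/s, 200ms: " + utilization(2, 0.2));
    }
}
```

With the thread's numbers this gives 5 servers, or 10 with replication, matching Ken's estimate. At 2 searches per second a 500ms average keeps a serial server busy all the time, while a 200ms average leaves more than half of it idle, which is one way to read his point about overlapping searches.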
New Installation - Problems - Error 500
Hi folks...

Just installing a new server for Nutch - testing at this point... Ran a crawl with no problems but can't do a search without getting an Error 500.

CentOS5.1, Tomcat5.5.20, Java SDK 1.5.0_14

The last time I installed Nutch I ran into a similar issue and it had to do with a config setting and Java. Looking for input please ;)

Thanks,
Paul

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: org.apache.hadoop.util.ReflectionUtils
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)

root cause

javax.servlet.ServletException: org.apache.hadoop.util.ReflectionUtils
    org.apache.jasper.runtime.PageContextImpl.doHandlePageException(jasper5-runtime-5.5.23.jar.so)
    org.apache.jasper.runtime.PageContextImpl.handlePageException(jasper5-runtime-5.5.23.jar.so)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:777)
    org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)

root cause

java.lang.NoClassDefFoundError: org.apache.hadoop.util.ReflectionUtils
    java.lang.Class.initializeClass(libgcj.so.7rh)
    org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
    org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
    org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
    org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:106)
    org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)

root cause

java.lang.ClassNotFoundException: java.util.concurrent.ConcurrentHashMap
    org.apache.catalina.loader.WebappClassLoader.loadClass(catalina-5.5.23.jar.so)
    org.apache.catalina.loader.WebappClassLoader.loadClass(catalina-5.5.23.jar.so)
    java.lang.Class.forName(libgcj.so.7rh)
    java.lang.Class.initializeClass(libgcj.so.7rh)
    org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
    org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
    org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
    org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
    org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:106)
    org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
    org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
    javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)

The information transmitted is intended only for the person or entity to which it is addressed and contains confidential and/or privileged material. If you received this in error, please contact the sender immediately and then destroy this transmission, including all attachments, without copying, distributing or disclosing same. Thank you.
RE: New Installation - Problems - Error 500
Thanks.. my apologies, as I'm new to Java (to complicate matters). When I check the tomcat.conf file I can't find a place to specify the Java version. When I do a search, there are multiple versions installed:

/usr/bin/java
/usr/share/java
/usr/include/c++/4.1.1/gnu/java
/usr/include/c++/4.1.1/java
/usr/java
/usr/java/jdk1.5.0_14/bin/java
/usr/java/jdk1.5.0_14/jre/bin/java
/usr/lib/jvm-exports/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/bin/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java
/usr/lib/jvm/java
/usr/lib/java
/etc/alternatives/java
/etc/java
/var/lib/alternatives/java

Appreciate it,
Paul

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 11:30 AM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Paul Stewart wrote:
> java.lang.NoClassDefFoundError: org.apache.hadoop.util.ReflectionUtils
> java.lang.Class.initializeClass(libgcj.so.7rh)

This is not coming from Sun JDK - it's coming from GCJ. Check which version of Java is used by Tomcat.

> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
> org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
> org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
> org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
> org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
> org.apache.jsp.search_jsp._jspService(search_jsp.java:106)

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
RE: New Installation - Problems - Error 500
Thanks for the reply... java -version shows this:

java version 1.4.2
gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)

I used all pre-built packages hoping that they would do the trick ;) I updated the tomcat startup script with the proper JAVA_HOME and now I get:

[EMAIL PROTECTED] rc3.d]# /etc/rc3.d/K20tomcat5 start
Starting tomcat5: /usr/bin/rebuild-jar-repository: error: Could not find jdbc-stdext Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Could not find jndi Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
/usr/bin/rebuild-jar-repository: error: Could not find jaas Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
[ OK ]

I know it's not Nutch specific so appreciate the patience here...

Paul

-----Original Message-----
From: Martin Kuen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 12:16 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Hi,

if you type java -version in your shell, it will print the Java version you are using. I assume the output will refer to gcj, not to the sun-jdk. You should change your environment variables or create the necessary ones. Open a shell and, in your Tomcat installation's root directory, try:

export JAVA_HOME=/usr/java/jdk1.5.0_14
bin/startup.sh

Then have a look at what Tomcat prints on the command line after you start it. The startup script will print where it assumes JAVA_HOME is located.

Did you use some prebuilt packages for CentOS or did you install it manually using the distribution from Apache?

Best Regards,
Martin
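For readers following along, the check Martin describes (is the default java actually GCJ/gij rather than the Sun JDK?) can be sketched in a few lines of Python. This is only an illustration: the function names describe_jvm and java_version_output are made up for this note, and the test is a plain substring match on the two markers ("gij", "libgcj") that appear in Paul's output above.

```python
import subprocess


def describe_jvm(version_output: str) -> str:
    """Classify `java -version` text as "gcj" or "other" (hypothetical helper).

    Looks only for the markers seen in this thread: "gij" and "libgcj".
    """
    text = version_output.lower()
    if "gij" in text or "libgcj" in text:
        return "gcj"
    return "other"


def java_version_output(java_bin: str = "java") -> str:
    """Run `<java_bin> -version` and return whatever it printed.

    The Sun JDK prints its version banner on stderr, so both streams are kept.
    """
    result = subprocess.run([java_bin, "-version"], capture_output=True, text=True)
    return result.stderr + result.stdout


# The output Paul reported in the message above.
sample = """java version 1.4.2
gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)"""

print(describe_jvm(sample))
```

To test a specific binary from Paul's list, something like describe_jvm(java_version_output("/usr/java/jdk1.5.0_14/bin/java")) should work, assuming that path exists on the machine.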
RE: New Installation - Problems - Error 500
Thanks to everyone for their help... I installed apache-tomcat by hand tonight and I have Nutch up and running now...

Just a few questions if you don't mind:

In Tomcat, I have webapps/nutch-0.9 as the directory, making the URL http://www.blahblah.com:8080/nutch-0.9. I want it in the root URL - if I move the files up I just get a blank page, even after restarting Tomcat?

Also, the port is 8080 - where is the config setting to change this to a specific IP binding and port 80? This server has regular Apache running already but it's bound to other IPs etc.

Finally, where do I customize the look/feel for the webpages presented in Nutch?

Again, thanks everyone! Hopefully my future questions will be more challenging ;)

Paul

-----Original Message-----
From: Martin Kuen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 2:16 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Hi,

On Jan 29, 2008 7:14 PM, Paul Stewart [EMAIL PROTECTED] wrote:
> Thanks for the reply... Java -version shows this:
> java version 1.4.2
> gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)

I just had a closer look at your stacktrace and your gij version. It's version 1.4.2 and it doesn't find a class called java.util.concurrent.ConcurrentHashMap. The java.util.concurrent package was added with Java 1.5. Actually I am wondering why gij loads the classes (it should not). Sun's JRE would not even load the classes (bytecode version incompatibility).

> I used all pre-built packages hoping that they would do the trick ;) I
> updated the tomcat startup script with the proper JAVA_HOME and now I get:
> [EMAIL PROTECTED] rc3.d]# /etc/rc3.d/K20tomcat5 start
> Starting tomcat5: /usr/bin/rebuild-jar-repository: error: Could not find jdbc-stdext Java extension for this JVM
> /usr/bin/rebuild-jar-repository: error: Could not find jndi Java extension for this JVM
> /usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
> /usr/bin/rebuild-jar-repository: error: Could not find jaas Java extension for this JVM
> /usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
> [ OK ]

prebuilt packages... This script is CentOS specific (I don't know CentOS). It seems as if this script somehow depends on gij. Btw., without additional libs Tomcat 5.5 doesn't even run with pre-1.5 VMs, so this prebuilt version is somewhat tweaked anyway. I recommend that you download the distribution from Apache and try it with that dist (and jdk-1.5). I cannot give you any advice on how to configure this for this kind of package. However, running Tomcat using the Apache dist is rather simple. Just unzip it, set JAVA_HOME and execute bin/startup.sh.

> I know it's not Nutch specific so appreciate the patience here...
> Paul

Best Regards,
Martin
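As a side note on the "bytecode version" point Martin raises: every .class file starts with the magic number 0xCAFEBABE followed by a 2-byte minor and a 2-byte major version, and major 48 corresponds to Java 1.4 while 49 corresponds to Java 1.5. A rough Python sketch for reading that header follows. The helper names are made up for this note, and whether the Nutch/Hadoop jars in this thread were actually compiled for 1.5 is an assumption, not something the thread confirms.

```python
import struct
from typing import Tuple

# Class-file major versions for the two releases discussed in this thread.
MAJOR_TO_JAVA = {48: "1.4", 49: "1.5"}


def class_file_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) read from the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def target_release(data: bytes) -> str:
    """Map the major version to a Java release name, if it is one we list."""
    major, _ = class_file_version(data)
    return MAJOR_TO_JAVA.get(major, "unknown (major %d)" % major)


# Synthetic 8-byte header standing in for a class compiled for Java 1.5.
header = struct.pack(">IHH", 0xCAFEBABE, 0, 49)
print(target_release(header))  # prints "1.5"
```

On a real installation one would unzip a jar and pass the first bytes of any .class file inside it, e.g. open(path, "rb").read(8).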
RE: New Installation - Problems - Error 500
That's wonderful - what a great list! You guys respond very quickly...

Now I gotta get back to reading the docs as I'm sure most of what I just asked is already in there... ;)

Best!
Paul

-----Original Message-----
From: John Mendenhall [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 10:58 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

> Just a few questions if you don't mind:
> In Tomcat, I have webapps/nutch-0.9 as the directory making the URL
> http://www.blahblah.com:8080/nutch-0.9 I want it in the root URL - if I
> move the files up I just get a blank page even after restarting Tomcat?
> Also, the port is 8080 - where is the config setting to change this to a
> specific IP binding and port 80? This server has regular Apache running
> already but it's bound to other IP's etc

To put it in the root URL, stop Tomcat, remove the ROOT directory, and rename the nutch-0.9 directory to ROOT.

Change the port in the tomcat/conf/server.xml file. You need to understand which ones to change to make sure it still works, since there are some levels of redirection.

> Finally, where do I customize the look/feel for the webpages presented
> in Nutch?

If you are making the changes at the source, the files are in the nutch/src/web directory. You'll need to look into the style directory for the basic XSLT style pages which are used to build the pages. The pages directory has the main pages. There are also headers and footers in the include directory. And the JSP pages are in the jsp directory. Simple, huh?

If you want to just modify what is already in the Tomcat directory, the files are located in the webapps/ROOT directory in various subdirectories, assuming you renamed it to ROOT.

I hope that helps.

JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia internet services
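To make John's server.xml pointer a little more concrete: the HTTP listener is a Connector element, and Tomcat connectors accept a port attribute and an address attribute for binding to one IP. The Python sketch below rewrites a trimmed-down, hypothetical server.xml held in a string. The sample file content, the function name rebind_http_connector and the address 192.0.2.10 are all placeholders invented for this note, not taken from Paul's machine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed server.xml; a real one has many more elements.
SAMPLE = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" redirectPort="8443" />
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
  </Service>
</Server>"""


def rebind_http_connector(xml_text, old_port, new_port, address):
    """Set port and address on every Connector currently listening on old_port."""
    root = ET.fromstring(xml_text)
    changed = 0
    for connector in root.iter("Connector"):
        if connector.get("port") == str(old_port):
            connector.set("port", str(new_port))
            connector.set("address", address)
            changed += 1
    if changed == 0:
        raise ValueError("no Connector found on port %s" % old_port)
    return ET.tostring(root, encoding="unicode")


# Move the HTTP connector from 8080 to port 80 on a single (placeholder) IP.
print(rebind_http_connector(SAMPLE, 8080, 80, "192.0.2.10"))
```

Two caveats: ElementTree drops XML comments when it parses, so on a real server.xml it is probably safer to make the same two attribute changes by hand; and on Linux a process normally needs root privileges (or some equivalent arrangement) to bind to a port below 1024. The redirectPort attributes are presumably the "levels of redirection" John mentions and are left untouched here.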
Stats?
Hi folks...

Is there a way to retrieve stats from Nutch - meaning how many webpages are indexed, how many are still to be indexed, etc.? When I was working with AspSeek and Mnogosearch in the past I could run a command to see stats.

Thanks again,
Paul
Limiting Crawl Time
Hi folks...

What is the best way to, say, limit crawling to perhaps 3-4 hours per day? Is there a way to do this?

Right now, I have a crawl depth of 6 and a maximum per site of 100. I thought this would limit things pretty low, but during some test crawls my last crawl took 2.5 days to complete:

Statistics for CrawlDb: crawl/crawldb
TOTAL urls: 1566612
retry 0: 1549310
retry 1: 12814
retry 2: 1601
retry 3: 2887
min score: 0.0
avg score: 0.037
max score: 429.15
status 1 (db_unfetched): 1021400
status 2 (db_fetched): 446907
status 3 (db_gone): 74420
status 4 (db_redir_temp): 13861
status 5 (db_redir_perm): 10024
CrawlDb statistics: done

What I would like to do is crawl for 3-4 hours per day at most to gradually fill the index. Thoughts?

Thanks very much,
Paul
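The CrawlDb statistics Paul pasted are easy to summarise with a short script, which also touches on his earlier "Stats?" question. The sketch below parses only the status lines from the figures in his message; the function name parse_status_counts is made up for this note. The output format looks like what `bin/nutch readdb crawl/crawldb -stats` prints, though the thread itself does not say which command produced it.

```python
import re

# Status lines copied from the CrawlDb statistics in the message above.
STATS = """TOTAL urls: 1566612
status 1 (db_unfetched): 1021400
status 2 (db_fetched): 446907
status 3 (db_gone): 74420
status 4 (db_redir_temp): 13861
status 5 (db_redir_perm): 10024"""


def parse_status_counts(text):
    """Return {status_name: count} for every 'status N (name): count' entry."""
    pattern = re.compile(r"status\s+\d+\s+\((\w+)\):\s*(\d+)")
    return {name: int(count) for name, count in pattern.findall(text)}


counts = parse_status_counts(STATS)
total = sum(counts.values())

# Print each status with its share of all known URLs.
for name, count in counts.items():
    print("%-16s %9d %5.1f%%" % (name, count, 100.0 * count / total))
```

With these numbers the five status counts add up to the reported TOTAL of 1566612, and roughly 65% of the known URLs are still unfetched, which gives a sense of how much work remains after the 2.5-day crawl.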
RE: Limiting Crawl Time
Thanks - perhaps I misunderstand the depth and topN commands..

My understanding of the depth command is that Nutch will only go X deep in the URL's to find websites - if I change that depth later, does that mean it will go deeper at a later point in time? I thought it would continue ignoring URL's at that depth once it was told a higher depth? In other words, if I run a crawl with a depth of 2, then a week later run a depth of 4, and then perhaps a couple of weeks later run a depth of 6, will that work?

Finally, the topN command - does that mean to only select the 1000 best URL's for this *particular* crawl, but in the *next* crawl pick another 1000 to match?

I guess on both of these commands I was under the impression that large chunks of websites would never get crawled no matter how many times I went back to crawl them?

Thanks very much for the clarification...

Paul

-----Original Message-----
From: Susam Pal [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 05, 2008 10:36 PM
To: nutch-user@lucene.apache.org
Subject: Re: Limiting Crawl Time

Did you try specifying a topN value? -depth 3 -topN 1000 should be close to what you want.