Problems Installing

2006-04-02 Thread Paul Stewart
Hi there...

I am trying to get Nutch running... I have done a trial indexing run
successfully, etc.

Now I'm running into issues that may be more Tomcat related than Nutch:

HTTP Status 500 - 




type Exception report

message 

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception 

org.apache.jasper.JasperException

org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.Request, org.apache.catalina.Response, org.apache.catalina.ValveContext) (/usr/lib/libcatalina-5.0.30.jar.so)
org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)
org.apache.coyote.http11.Http11Processor.process(java.io.InputStream, java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
java.lang.Thread.run() (/usr/lib/libgcj.so.6.0.0)


root cause 

java.lang.NullPointerException
org.apache.nutch.searcher.NutchBean.init(java.io.File, java.io.File) (Unknown Source)
org.apache.nutch.searcher.NutchBean.NutchBean(java.io.File) (Unknown Source)
org.apache.nutch.searcher.NutchBean.NutchBean() (Unknown Source)
org.apache.nutch.searcher.NutchBean.get(javax.servlet.ServletContext) (Unknown Source)
org.apache.jsp.search_jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (Unknown Source)
org.apache.jasper.runtime.HttpJspBase.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-runtime-5.0.30.jar.so)
javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)
javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)
org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.Request, org.apache.catalina.Response, org.apache.catalina.ValveContext) (/usr/lib/libcatalina-5.0.30.jar.so)
org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Request, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)
org.apache.coyote.http11.Http11Processor.process(java.io.InputStream, java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/libtomcat-http11-5.0.30.jar.so)
org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/tmp/libtomcat-util-5.0.30.jar.socuf3wu.so)
java.lang.Thread.run() (/usr/lib/libgcj.so.6.0.0)


note The full stack trace of the root cause is available in the Apache
Tomcat/5.0 logs.


RE: Problems Installing

2006-04-02 Thread Paul Stewart
Thanks for the reply...

I redid what you mentioned below... It reinstalled just fine (I'm
running Fedora Core 4 and installed with yum using RPMs).

Even when I rename it, I must now access it via
http://www.myserver..:8080/root

Otherwise I get a 404 Not Found...

When I try to do a search I get the same error.

Any other thoughts? :)

Paul

-Original Message-
From: Dan Morrill [mailto:[EMAIL PROTECTED] 
Sent: Sunday, April 02, 2006 2:17 PM
To: nutch-user@lucene.apache.org
Subject: RE: Problems Installing

Did you:

1. remove the root.war from tomcat?
2. rename nutch.war to root.war and dump that into webapps under tomcat?
3. did it install ok (can you see the exploded pages under webapps root)?

Just checking, this is how I fixed the same issue under windows. 
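
On a case-sensitive file system the name of the war file matters: Tomcat serves the web application named ROOT (upper case) at the empty context path, while a file named root.war is deployed under /root. A minimal sketch of steps 1 and 2 with that in mind, assuming Tomcat is stopped while the files are swapped; the function name and both arguments are hypothetical and not taken from the thread:

```shell
# deploy_as_root WAR WEBAPPS_DIR
# Replaces Tomcat's default ROOT application with the given war file.
# Both arguments are illustrative; pass the real war file and the real
# webapps directory of the Tomcat installation.
deploy_as_root() {
    war=$1
    webapps=$2
    # Step 1: remove the stock ROOT application, exploded or packed.
    rm -rf "$webapps/ROOT" "$webapps/ROOT.war"
    # Step 2: copy the Nutch war in under the name Tomcat maps to "/".
    cp "$war" "$webapps/ROOT.war"
}
```

It could be called as `deploy_as_root nutch.war "$CATALINA_HOME/webapps"`; after Tomcat starts again, an exploded ROOT directory should appear next to ROOT.war (step 3).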

r/d

-Original Message-
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 11:00 AM
To: nutch-user@lucene.apache.org
Subject: Problems Installing


RE: Tomcat Problem

2006-04-02 Thread Paul Stewart
Where would I check that?  I can check the JSP file by copying the
nutch--.war file back over to the webroot and watching it expand, etc. But
I'm confused and new to Tomcat.

-Original Message-
From: Babu, KameshNarayana (GE, Research, consultant)
[mailto:[EMAIL PROTECTED] 
Sent: Sunday, April 02, 2006 11:51 PM
To: nutch-user@lucene.apache.org
Subject: RE: Tomcat Problem

Hey,
Check the classpath and your JSP file.
Regards
Kamesh

-Original Message-
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 4:25 AM
To: nutch-user@lucene.apache.org
Subject: Tomcat Problem


Sorry if this is slightly off-topic but I'm just trying to get Nutch
running for testing...

I *think* this is Tomcat related:

HTTP Status 500 - 




type Exception report

message 

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception 

org.apache.jasper.JasperException: Unable to compile class for JSP



RE: Tomcat Problem

2006-04-03 Thread Paul Stewart
/javamail/mailapi-1.3.1.jar
cp=/var/lib/tomcat5/common/lib/naming-factory.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/libgcj-4.0.2.jar
cp=/var/lib/tomcat5/common/lib/naming-common.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/libgcj-4.0.2.jar
cp=/usr/share/java/ant-launcher-1.6.2.jar
cp=/usr/share/java/jasper5-runtime-5.0.30.jar
cp=/var/lib/tomcat5/common/lib/naming-resources.jar
cp=/usr/share/java/jakarta-commons-dbcp-1.2.1.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/jakarta-commons-collections-3.1.jar
cp=/var/lib/tomcat5/common/lib/naming-java.jar
cp=/usr/share/java/jakarta-commons-logging-api-1.0.4.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/javamail/providers-1.3.1.jar
cp=/usr/share/java/jspapi-5.0.30.jar
cp=/usr/share/java/servletapi5-5.0.30.jar
cp=/usr/share/java/jaf-1.0.2.jar
cp=/usr/lib/jvm/java/lib/tools.jar
cp=/usr/share/tomcat5/bin/bootstrap.jar
cp=/usr/share/java/commons-logging-api.jar
cp=/usr/share/java/mx4j/mx4j.jar
work dir=/usr/share/tomcat5/work/Catalina/localhost/nutch
extension dir=/usr/share/java/ext
srcDir=/usr/share/tomcat5/work/Catalina/localhost/nutch
   compilerTargetVM=1.3
   compilerSourceVM=1.3
include=org/apache/jsp/index_jsp.java

3-Apr-06 1:57:34 AM org.apache.jasper.compiler.Compiler generateClass(java.lang.String[])
SEVERE: Error compiling file: /usr/share/tomcat5/work/Catalina/localhost/nutch//org/apache/jsp/index_jsp.java
[javac] Compiling 1 source file

-Original Message-
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 4:25 AM
To: nutch-user@lucene.apache.org
Subject: Tomcat Problem


Sorry if this is slightly off-topic but I'm just trying to get Nutch
running for testing...

I *think* this is Tomcat related:

HTTP Status 500 - 




type Exception report

message 

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception 

org.apache.jasper.JasperException: Unable to compile class for JSP

org.apache.jasper.compiler.DefaultErrorHandler.javacError(java.lang.Stri
ng, java.lang.Exception) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)

org.apache.jasper.compiler.ErrorDispatcher.javacError(java.lang.String,
java.lang.Exception) (/usr/lib/libjasper5-compiler-5.0.30.jar.so)

org.apache.jasper.compiler.Compiler.generateClass(java.lang.String[])
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.compiler.Compiler.compile(boolean, boolean)
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.compiler.Compiler.compile(boolean)
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.compiler.Compiler.compile()
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)
org.apache.jasper.JspCompilationContext.compile()
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)

org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.H
ttpServletRequest, javax.servlet.http.HttpServletResponse, boolean)
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)

org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.H
ttpServletRequest, javax.servlet.http.HttpServletResponse,
java.lang.String, java.lang.Throwable, boolean)
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)

org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServ
letRequest, javax.servlet.http.HttpServletResponse)
(/usr/lib/libjasper5-compiler-5.0.30.jar.so)

javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest,
javax.servlet.ServletResponse) (/usr/lib/libservletapi5-5.0.30.jar.so)

org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.R
equest, org.apache.catalina.Response, org.apache.catalina.ValveContext)
(/usr/lib/libcatalina-5.0.30.jar.so)

org.apache.coyote.tomcat5.CoyoteAdapter.service(org.apache.coyote.Reques
t, org.apache.coyote.Response) (/usr/lib/libcatalina-5.0.30.jar.so)

org.apache.coyote.http11.Http11Processor.process(java.io.InputStream,
java.io.OutputStream) (/usr/lib/libtomcat-http11-5.0.30.jar.so)

org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processC
onnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[])
(/usr/lib/libtomcat-http11-5.0.30.jar.so)

org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[])
(/tmp/libtomcat-util-5.0.30.jar.soj8ryts.so)
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
(/tmp

Nutch 500 Error

2006-04-05 Thread Paul Stewart
Hi there...

I was having a number of problems with my install, mainly because I'm
not used to Tomcat and/or Nutch etc...

Anyways, I am running Fedora 4 and was told that the packages are a bad
idea to use, so I uninstalled all of my Java/Tomcat RPMs and installed new
binaries today from the source sites (Sun Java / Apache Tomcat).

Things are looking better I think but when I try to run a search I get
this:

HTTP Status 500 - 




type Exception report

message 

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception 

org.apache.jasper.JasperException

org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


root cause

java.lang.NullPointerException
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:82)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:72)
org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
org.apache.jsp.search_jsp._jspService(search_jsp.java:112)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


The search page comes up fine... I figure this must be something simple,
I hope. I read on the Nutch site about UTF-8 and added it to the
configuration of my Tomcat5 server. Any other ideas?

Thanks again,

Paul


RE: Nutch 500 Error

2006-04-06 Thread Paul Stewart
Thanks.. Tried that ... Same error 

HTTP Status 500 - 




type Exception report

message 

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception 

org.apache.jasper.JasperException

org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


root cause

java.lang.NullPointerException
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:82)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:72)
org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
org.apache.jsp.search_jsp._jspService(search_jsp.java:112)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

-Original Message-
From: TDLN [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 06, 2006 3:30 AM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch 500 Error

My guess is you have to override the searcher.dir property in
nutch-site.xml and have it point to your crawl dir.

Rgrds, Thomas
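
A minimal sketch of such an override, assuming the usual name/value property layout of the Nutch configuration files. The function name is hypothetical and the directory passed to it is only an example; the printed node would go inside the existing root element of the nutch-site.xml that the web application loads:

```shell
# searcher_dir_property DIR
# Prints a <property> node that sets searcher.dir to DIR.
# DIR is whatever directory holds the crawl output (an assumption here).
searcher_dir_property() {
    printf '<property>\n  <name>searcher.dir</name>\n  <value>%s</value>\n</property>\n' "$1"
}
```

For example, `searcher_dir_property /usr/local/nutch` prints a node whose value is /usr/local/nutch. Tomcat needs a restart before the web application picks up the change.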





RE: Nutch 500 Error

2006-04-11 Thread Paul Stewart
Thanks... I was doing the java command wrong.

Back to my original problem - I re-ran through the entire tutorial to
ensure I was doing it right and it seems proper... How do I tell Nutch
where to look specifically in the code for the segments and indexes in
case it is in the wrong place?

All the best,
Paul
 

-Original Message-
From: sudhendra seshachala [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 06, 2006 12:02 PM
To: nutch-user@lucene.apache.org
Subject: RE: Nutch 500 Error

It should be java -version, I think.
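
A small sketch of the corrected check, assuming JAVA_HOME points at the JRE that Tomcat is meant to use. The helper name is hypothetical; it only builds the path of the java binary, so a trailing slash in JAVA_HOME does no harm:

```shell
# java_cmd
# Prints the java binary to run: $JAVA_HOME/bin/java when JAVA_HOME is set
# (a trailing slash is tolerated), otherwise plain "java" from the PATH.
java_cmd() {
    if [ -n "${JAVA_HOME:-}" ]; then
        printf '%s/bin/java\n' "${JAVA_HOME%/}"
    else
        printf 'java\n'
    fi
}
```

Running `"$(java_cmd)" -version` then prints the version of that JVM, where `-v` is rejected as an unrecognized option.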

Paul Stewart [EMAIL PROTECTED] wrote:

Thanks for the reply... I apologize as I'm very new to the Java
world... :)

I am running the following:

Fedora Core 4
Apache Tomcat 5.5.16 (binary download from Tomcat site installed to
/usr/local/tomcat5)
jre1.5.0_06 (binary download from Sun site to /usr/java/jre1.5.0_06)

Weird though - when I try to do a java -v I get this now:

[EMAIL PROTECTED] jre1.5.0_06]# export JAVA_HOME=/usr/java/jre1.5.0_06/
[EMAIL PROTECTED] jre1.5.0_06]# /usr/java/jre1.5.0_06/bin/java -v
Unrecognized option: -v
Could not create the Java virtual machine.

Is this my actual problem possibly? Or is this the wrong Java version to
be running? When I downloaded 1.4.x tomcat told me it didn't support
anything but 1.5.x 

Thanks again for your patience...
Paul


-Original Message-
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 06, 2006 7:16 AM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch 500 Error

What version are you on? If you trace the NullPointerException back to
the code, the NutchBean.init method is where it expects to find the
index and segments, so either they're missing (did you follow the
tutorial and merge your segment indexes?) or it is looking in the wrong
place. That's what I think.

Rgrds, Thomas
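
One way to test the "missing or wrong place" theory is to look at the directory the searcher is pointed at. The sketch below assumes a layout with a segments directory plus an index or indexes directory; the exact layout depends on the Nutch version, and the function name is hypothetical:

```shell
# check_searcher_dir DIR
# Reports whether DIR has the pieces the searcher is assumed to need:
# a "segments" directory plus an "index" or "indexes" directory.
# Returns 0 when the layout looks complete, 1 otherwise.
check_searcher_dir() {
    dir=$1
    status=0
    if [ ! -d "$dir/segments" ]; then
        echo "missing: $dir/segments"
        status=1
    fi
    if [ ! -d "$dir/index" ] && [ ! -d "$dir/indexes" ]; then
        echo "missing: $dir/index or $dir/indexes"
        status=1
    fi
    return $status
}
```

The check should be run as the user Tomcat runs as, since a directory that exists but is unreadable for that user would look the same to the web application as a missing one.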




  Sudhi Seshachala
  http://sudhilogs.blogspot.com/
   





RE: Nutch 500 Error

2006-04-12 Thread Paul Stewart
I may have found my problem, but I'm not sure.

Here's my /usr/local/nutch directory:

drwxr-xr-x   2 root root     4096 Apr  2 13:06 bin
-rw-rw-r--   1 root root    15567 Mar 31 13:40 build.xml
-rw-rw-r--   1 root root    19814 Mar 31 13:40 CHANGES.txt
drwxr-xr-x   2 root root     4096 Apr 12 08:47 conf
drwxr-xr-x   2 root root     4096 Apr  8 17:22 crawl
drwxr-xr-x   3 root root     4096 Apr  8 17:29 db
-rw-rw-r--   1 root root     1845 Mar 31 13:40 default.properties
drwxr-xr-x  19 root root     4096 Mar 31 13:40 docs
drwxr-xr-x   2 root root     4096 Apr  2 11:15 lib
-rw-rw-r--   1 root root      615 Mar 31 13:40 LICENSE.txt
-rw-rw-r--   1 root root   755034 Mar 31 13:40 nutch-0.7.2.jar
-rw-rw-r--   1 root root 15806453 Mar 31 13:40 nutch-0.7.2.war
drwxr-xr-x  26 root root     4096 Mar 31 13:40 plugins
-rw-rw-r--   1 root root      403 Mar 31 13:40 README.txt
drwxr-xr-x   4 root root     4096 Apr  8 17:28 segments
drwxr-xr-x  11 root root     4096 Mar 31 13:40 src
-rw-r--r--   1 root root       65 Apr  8 17:28 urls

My crawl directory is empty

My db directory is this:

[EMAIL PROTECTED] db]# ls -l
total 4
-rw-r--r--  1 root root    0 Apr  8 17:29 dbreadlock
-rw-r--r--  1 root root    0 Apr  8 17:29 dbwritelock
drwxr-xr-x  6 root root 4096 Apr  8 17:29 webdb

Inside webdb is this:

drwxr-xr-x  2 root root 4096 Apr  8 17:29 linksByMD5
drwxr-xr-x  2 root root 4096 Apr  8 17:29 linksByURL
drwxr-xr-x  2 root root 4096 Apr  8 17:29 pagesByMD5
drwxr-xr-x  2 root root 4096 Apr  8 17:29 pagesByURL
-rw-r--r--  1 root root   17 Apr  8 17:29 stats

Back in my /usr/local/nutch/segments directory I have:

drwxr-xr-x  8 root root 4096 Apr  8 17:30 20060408172630
drwxr-xr-x  8 root root 4096 Apr  8 17:30 20060408172823

So something must be wrong then if I use /usr/local/nutch as my
searcher.dir right?  Even though I'm sure I followed the tutorial I'm
obviously missing something...?
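
The listing shows two segments but no top-level index directory, so one thing worth checking is whether each segment was actually indexed. A sketch, assuming that indexing a segment leaves an index subdirectory inside it (this should be verified against the tutorial for the installed Nutch version); the function name is hypothetical:

```shell
# segments_without_index SEGMENTS_DIR
# Prints each segment directory that has no "index" subdirectory.
# The "index" name is an assumption about where per-segment indexes live.
segments_without_index() {
    for seg in "$1"/*/; do
        [ -d "$seg" ] || continue    # the glob matched nothing
        seg=${seg%/}                 # drop the trailing slash
        [ -d "$seg/index" ] || echo "$seg"
    done
}
```

Called as `segments_without_index /usr/local/nutch/segments`, it prints nothing when every segment has an index subdirectory.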

Thanks again.

Paul
 

-Original Message-
From: sudhendra seshachala [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 10:45 PM
To: nutch-user@lucene.apache.org
Subject: RE: Nutch 500 Error

Check nutch-default.xml;
there should be a property named searcher.dir.
Provide the path for the index folder.
Better still, copy the property node, paste it into nutch-site.xml,
and provide the path for the index folder there.
For example,
if the index folder is stored as
home/nutch/crawl
- crawldb
- segments
- index
- indexes

point searcher.dir to home/nutch/crawl.
Hope this helps.
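
To confirm which value is actually in effect, the property can be read back from the configuration file. A rough sketch, assuming the name and value elements sit on separate, adjacent lines as in the stock configuration files; the function name is hypothetical, and this is plain text matching rather than real XML parsing:

```shell
# configured_searcher_dir FILE
# Prints the value of the searcher.dir property found in FILE, if any.
# Assumes <name> and <value> are on separate lines close to each other.
configured_searcher_dir() {
    grep -A 2 '<name>searcher\.dir</name>' "$1" |
        sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
        head -n 1
}
```

Running it against both nutch-default.xml and nutch-site.xml shows whether the override in nutch-site.xml is present at all; an empty result means the property was not found in that file.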

Thanks
Sudhi

 



Hardware Planning

2007-11-28 Thread Paul Stewart
Hi folks...

I have read the archives and am looking for input specific to my estimated
requirements:

Want to index about 100 million public webpages.  Space and bandwidth
are not a problem - coming up with the right hardware and keeping the
cost down is my goal.

I would estimate only 1-2 searches per second at least during the first
hardware phase.

With that in mind I'm trying to figure out whether to use a couple of
larger Dell servers or a bunch of small single CPU, 1 Gig RAM, 160 GB
hard drive type of machines

Can anyone share what they are using for hardware for about 100 million
webpages, and their search result times etc.?  Real-world experience is
important to me, and so is being able to scale.

Thanks,

Paul









The information transmitted is intended only for the person or entity to which 
it is addressed and contains confidential and/or privileged material. If you 
received this in error, please contact the sender immediately and then destroy 
this transmission, including all attachments, without copying, distributing or 
disclosing same. Thank you.


RE: Hardware Planning

2007-11-29 Thread Paul Stewart
Thanks very much for the details... I appreciate it...

I'd be happy with the 500ms range on *average* but totally understand
your point about searches piling up

So you're suggesting about 20 million pages per box - each box with 4
drives, dual CPU and 4 gig RAM?

I guess what I don't totally understand is what servers need lots of RAM
and which ones need all the storage etc for sure.  I was thinking of
some low end boxes (2 gig RAM, 160 Gig HD, single low end processor) for
storage and a couple of heftier boxes (dual cpu, 4 Gig RAM, 500 GB hard
drives)  - is this way off track?

What needs RAM and what needs storage in the components of Nutch?

Thanks again,

Paul


-Original Message-
From: Ken Krugler [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 29, 2007 11:13 AM
To: nutch-user@lucene.apache.org
Subject: RE: Hardware Planning

Hi Paul,

Leaving aside the hardware requirements for the crawl...

The main issue with what you need to achieve is the nature of
your index. If you're using the results of a standard Nutch web
crawl, then search times under 500ms shouldn't be a problem.

But you actually want something more in the range of say 200ms
average, as otherwise you can quickly run into the overlapping search
problem...once a search doesn't complete in time before another
search starts running, both searches take longer, which increases the
odds that the third search happens before the previous search(es)
have completed. So the performance can quickly deteriorate under a
load that's only slightly higher than your target case.
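
A rough worked example of why the margin matters, assuming the simplest possible model in which a server handles one search at a time (a simplification, not something stated in the thread). The busy fraction is the query rate times the average search time; the function name is hypothetical:

```shell
# utilization_pct QPS AVG_MS
# Prints how busy (in percent) a server that runs one search at a time is,
# given QPS searches per second taking AVG_MS milliseconds each:
#   QPS * AVG_MS / 1000 seconds of work per second, times 100.
utilization_pct() {
    echo $(( $1 * $2 / 10 ))
}
```

At 2 searches per second, `utilization_pct 2 500` prints 100, meaning no slack at all and any burst makes searches overlap, while `utilization_pct 2 200` prints 40, leaving room to absorb bursts.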

However getting 200ms time isn't hard either, as long as the hardware
is reasonable and the index size isn't huge.

In our experience, using more, cheaper boxes is the way to go. For
web crawl data, I would probably go with two 10M page indexes per
box, where the Lucene index goes on a smaller, faster drive and the
page contents go on a bigger, slower drive. So then you'd have two
faster drives and two slower drives per box, and use a dual CPU with
dual cores. And 4GB of RAM, so each JVM gets 1.5GB with some
breathing room for the OS.

Which means you'd need about five of these servers for 100M
pages...unless you want replication for reliability, which means 10
servers.
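
The sizing arithmetic above can be written down as a small helper for trying other page counts. This is only a sketch of the calculation; the function name is hypothetical and page counts are given in millions:

```shell
# servers_needed TOTAL_MILLION_PAGES MILLION_PAGES_PER_BOX COPIES
# Prints the number of boxes needed: total pages divided by pages per box,
# rounded up, multiplied by the number of full copies of the index.
servers_needed() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $3 ))
}
```

With two 10M page indexes per box, `servers_needed 100 20 1` prints 5, and `servers_needed 100 20 2` prints 10 for the replicated setup.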

-- Ken

No, not familiar with that yet - can you send out any URLs?

My question is really whether you're better off with one or two big
boxes or a series of small boxes - also looking for anyone who has 100
million pages in their index and a description of their hardware as a
reference point...

Thanks!

Paul


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of VK
Sent: Wednesday, November 28, 2007 9:53 PM
To: nutch-user@lucene.apache.org
Subject: Re: Hardware Planning

Have you considered EC2 + S3?

Also Rightscale has some interesting solutions, which I am currently
evaluating.

On Nov 28, 2007 9:38 PM, Paul Stewart [EMAIL PROTECTED]
wrote:

  Hi folks...

  I have read the archives and looking for input specific to my estimated
  requirements:

  Want to index about 100 million public webpages.  Space and bandwidth
  are not a problem - coming up with the right hardware and keeping the
  cost down is my goal.

  I would estimate only 1-2 searches per second at least during the first
  hardware phase.

  With that in mind I'm trying to figure out whether to use a couple of
  larger Dell servers or a bunch of small single CPU, 1 Gig RAM, 160 GB
  hard drive type of machines

  Anyone share what they are using for hardware for about 100 million
  webpages and their search result times etc??  Realworld is important to
  me and being able to scale is important

  Thanks,

  Paul











--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
If you can't find it, you can't fix it







New Installation - Problems - Error 500

2008-01-29 Thread Paul Stewart
Hi folks...

Just installing a new server for Nutch - testing at this point...

Ran a crawl with no problems but can't do a search without getting an
Error 500.

CentOS 5.1, Tomcat 5.5.20, Java SDK 1.5.0_14

The last time I installed Nutch I ran into a similar issue, and it had to
do with a config setting and Java... Looking for input please ;)

Thanks,

Paul

type Exception report

message

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception

org.apache.jasper.JasperException: org.apache.hadoop.util.ReflectionUtils
org.apache.jasper.servlet.JspServletWrapper.handleJspException(jasper5-compiler-5.5.23.jar.so)
org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)


root cause

javax.servlet.ServletException: org.apache.hadoop.util.ReflectionUtils
org.apache.jasper.runtime.PageContextImpl.doHandlePageException(jasper5-runtime-5.5.23.jar.so)
org.apache.jasper.runtime.PageContextImpl.handlePageException(jasper5-runtime-5.5.23.jar.so)
org.apache.jsp.search_jsp._jspService(search_jsp.java:777)
org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar.so)
javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)
org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)
org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5.23.jar.so)
org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar.so)
javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)


root cause

java.lang.NoClassDefFoundError: org.apache.hadoop.util.ReflectionUtils
java.lang.Class.initializeClass(libgcj.so.7rh)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
org.apache.jsp.search_jsp._jspService(search_jsp.java:106)
org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar.so)
javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.jar.so)
org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5.23.jar.so)

org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5
.23.jar.so)

org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar
.so)

javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.ja
r.so)


root cause

java.lang.ClassNotFoundException: java.util.concurrent.ConcurrentHashMap

org.apache.catalina.loader.WebappClassLoader.loadClass(catalina-5.5.23.j
ar.so)

org.apache.catalina.loader.WebappClassLoader.loadClass(catalina-5.5.23.j
ar.so)
java.lang.Class.forName(libgcj.so.7rh)
java.lang.Class.initializeClass(libgcj.so.7rh)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
org.apache.jsp.search_jsp._jspService(search_jsp.java:106)

org.apache.jasper.runtime.HttpJspBase.service(jasper5-runtime-5.5.23.jar
.so)

javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.ja
r.so)

org.apache.jasper.servlet.JspServletWrapper.service(jasper5-compiler-5.5
.23.jar.so)

org.apache.jasper.servlet.JspServlet.serviceJspFile(jasper5-compiler-5.5
.23.jar.so)

org.apache.jasper.servlet.JspServlet.service(jasper5-compiler-5.5.23.jar
.so)

javax.servlet.http.HttpServlet.service(tomcat5-servlet-2.4-api-5.5.23.ja
r.so)








RE: New Installation - Problems - Error 500

2008-01-29 Thread Paul Stewart
Thanks.. my apologies, as I'm new to Java (to complicate matters).

When I check the tomcat.conf file I can't find a place to specify which Java
to use.  When I do a search, there are multiple versions installed:

/usr/bin/java
/usr/share/java
/usr/include/c++/4.1.1/gnu/java
/usr/include/c++/4.1.1/java
/usr/java
/usr/java/jdk1.5.0_14/bin/java
/usr/java/jdk1.5.0_14/jre/bin/java
/usr/lib/jvm-exports/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/bin/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java
/usr/lib/jvm/java
/usr/lib/java
/etc/alternatives/java
/etc/java
/var/lib/alternatives/java

Appreciate it,

Paul


-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 29, 2008 11:30 AM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Paul Stewart wrote:

 
 java.lang.NoClassDefFoundError: org.apache.hadoop.util.ReflectionUtils
   java.lang.Class.initializeClass(libgcj.so.7rh)

This is not coming from Sun JDK - it's coming from GCJ. Check which 
version of Java is used by Tomcat.

   org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
   org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
   org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
   org.apache.nutch.searcher.NutchBean.init(NutchBean.java:95)
   org.apache.nutch.searcher.NutchBean.init(NutchBean.java:84)
   org.apache.nutch.searcher.NutchBean.get(NutchBean.java:71)
   org.apache.jsp.search_jsp._jspService(search_jsp.java:106)



-- 
Best regards,
Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



 





RE: New Installation - Problems - Error 500

2008-01-29 Thread Paul Stewart
Thanks for the reply...

Java -version shows this:

java version 1.4.2
gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)

I used all pre-built packages hoping that they would do the trick ;)

I updated the tomcat startup script with the proper JAVA_HOME and now I
get:

[EMAIL PROTECTED] rc3.d]# /etc/rc3.d/K20tomcat5 start
Starting tomcat5: /usr/bin/rebuild-jar-repository: error: Could not find jdbc-stdext Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Could not find jndi Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
/usr/bin/rebuild-jar-repository: error: Could not find jaas Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
   [  OK  ]

I know it's not Nutch specific so appreciate the patience here...

Paul


-Original Message-
From: Martin Kuen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 12:16 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Hi,

if you type java -version in your shell, the shell will output the Java
version you are using. I assume the output will refer to gcj, not to the
Sun JDK. You should change your environment variables or create the
necessary ones.
Open a shell and in your Tomcat installation's root directory try:
export JAVA_HOME=/usr/java/jdk1.5.0_14
bin/startup.sh

You can check Tomcat's output on the command line after you start
it. The startup script will print where it assumes JAVA_HOME is located.
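
A minimal sketch of that check, assuming a POSIX shell. The helper name
classify_jvm is made up for illustration and is not part of Tomcat or Nutch;
it only looks for the gij/libgcj markers that show up in the java -version
output quoted in this thread.

```shell
# classify_jvm is an illustrative helper (an assumption, not a Tomcat or Nutch tool).
# It prints "gcj" when the given `java -version` text mentions gij/libgcj, else "other".
classify_jvm() {
    case "$1" in
        *gij*|*libgcj*) echo gcj ;;
        *)              echo other ;;
    esac
}

# Report which java binary the PATH resolves to, if there is one.
if command -v java >/dev/null 2>&1; then
    version_text=$(java -version 2>&1 || true)
    echo "$(command -v java): $(classify_jvm "$version_text")"
fi
```

The same helper can be pointed at the output of "$JAVA_HOME/bin/java" -version
to see what a given JAVA_HOME would select.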

Did you use some prebuilt packages for CentOS or did you install it manually
using the distribution from Apache?

Best Regards,

Martin





RE: New Installation - Problems - Error 500

2008-01-29 Thread Paul Stewart
Thanks to everyone for their help... I installed apache-tomcat by hand
tonight and I have Nutch up and running now...

Just a few questions if you don't mind:

In Tomcat, I have webapps/nutch-0.9 as the directory making the URL
http://www.blahblah.com:8080/nutch-0.9

I want it in the root URL - if I move the files up I just get a blank
page even after restarting Tomcat?  Also, the port is 8080 - where
is the config setting to change this to a specific IP binding and port
80?  This server has regular Apache running already but it's bound to
other IP's etc

Finally, where do I customize the look/feel for the webpages presented
in Nutch?

Again, thanks everyone!  Hopefully my future questions will be more
challenging ;)

Paul


-Original Message-
From: Martin Kuen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 2:16 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

Hi,

On Jan 29, 2008 7:14 PM, Paul Stewart [EMAIL PROTECTED] wrote:

 Thanks for the reply...

 Java -version shows this:

 java version 1.4.2
 gij (GNU libgcj) version 4.1.2 20070626 (Red Hat 4.1.2-14)


I just had a closer look at your stacktrace and your gij version. It's
version 1.4.2 and it doesn't find a class called
java.util.concurrent.ConcurrentHashMap. The java.util.concurrent package
was added with Java 1.5.

Actually I am wondering why gij loads the classes (it should not). Sun's JRE
would not even load the classes (bytecode version incompatibility).
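
The version reasoning above can be sketched as a small shell check. The helper
name has_util_concurrent is hypothetical; it only compares a version string
against 1.5 and does not inspect any JVM.

```shell
# has_util_concurrent is a hypothetical helper: given a Java version string such as
# "1.4.2" or "1.5.0_14", it prints "yes" if that release is 1.5 or later, else "no".
has_util_concurrent() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # second numeric component
    if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }; then
        echo yes
    else
        echo no
    fi
}

has_util_concurrent 1.4.2      # the gij version reported in this thread: prints "no"
has_util_concurrent 1.5.0_14   # the Sun JDK under /usr/java: prints "yes"
```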



 I used all pre-built packages hoping that they would do the trick ;)

 I updated the tomcat startup script with the proper JAVA_HOME and now I
 get:

 [EMAIL PROTECTED] rc3.d]# /etc/rc3.d/K20tomcat5 start
 Starting tomcat5: /usr/bin/rebuild-jar-repository: error: Could not find jdbc-stdext Java extension for this JVM
 /usr/bin/rebuild-jar-repository: error: Could not find jndi Java extension for this JVM
 /usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
 /usr/bin/rebuild-jar-repository: error: Could not find jaas Java extension for this JVM
 /usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
   [  OK  ]

Prebuilt packages... This script is CentOS-specific (I don't know
CentOS). It seems as if this script somehow depends on gij. By the way,
without additional libs Tomcat 5.5 doesn't even run with pre-1.5 VMs, so
this prebuilt version is somewhat tweaked anyway. I recommend that you
download the distribution from Apache and try it with that (and JDK 1.5). I
cannot give you any advice on how to configure this kind of package.
However, running Tomcat using the Apache distribution is rather simple:
just unzip it, set JAVA_HOME and execute bin/startup.sh.


 I know it's not Nutch specific so appreciate the patience here...

 Paul


Best Regards,

Martin





RE: New Installation - Problems - Error 500

2008-01-30 Thread Paul Stewart
That's wonderful - what a great list!  You guys respond very quickly...

Now I gotta get back to reading the docs as I'm sure most of what I just
asked is already in there...;)

Best!

Paul


-Original Message-
From: John Mendenhall [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 29, 2008 10:58 PM
To: nutch-user@lucene.apache.org
Subject: Re: New Installation - Problems - Error 500

 Just a few questions if you don't mind:

 In Tomcat, I have webapps/nutch-0.9 as the directory making the URL
 http://www.blahblah.com:8080/nutch-0.9

 I want it in the root URL - if I move the files up I just get a blank
 page even after restarting Tomcat?  Also, the port is 8080 - where
 is the config setting to change this to a specific IP binding and port
 80?  This server has regular Apache running already but it's bound to
 other IP's etc

To put it in the root URL, stop tomcat, remove the
ROOT directory, and rename the nutch-0.9 directory
to ROOT.
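
A rough sketch of that swap, to be run only while Tomcat is stopped. The
helper name promote_to_root and the ROOT.orig backup location are assumptions
made for illustration; the sketch moves the old ROOT out of webapps instead of
deleting it.

```shell
# promote_to_root is a hypothetical helper, not a Tomcat command.
# $1 = path to Tomcat's webapps directory, $2 = name of the webapp to promote.
promote_to_root() {
    webapps=$1
    app=$2
    [ -d "$webapps/$app" ] || { echo "no such webapp: $webapps/$app" >&2; return 1; }
    if [ -d "$webapps/ROOT" ]; then
        # Keep the old ROOT next to webapps (assumed backup name) rather than removing it.
        mv "$webapps/ROOT" "$webapps/../ROOT.orig" || return 1
    fi
    mv "$webapps/$app" "$webapps/ROOT"
}

# Example call (paths are placeholders):
#   promote_to_root /path/to/tomcat/webapps nutch-0.9
```

If a nutch-0.9.war file is still sitting in webapps, Tomcat will probably
unpack it again on the next start, so it may need to be moved away as well.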

Change the port in the tomcat/conf/server.xml
file.  You need to understand which ones to change
to make sure it still works since there are some
levels of redirection.
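
Before editing, it can help to list which ports server.xml currently defines.
This is only a sketch: list_connector_ports is a made-up helper name, and it
assumes each Connector element carries its port attribute on the same line as
the opening tag.

```shell
# list_connector_ports is a hypothetical helper. It prints the port of every
# <Connector ...> element whose port attribute is on the same line as the tag.
list_connector_ports() {
    sed -n 's/.*<Connector[^>]* port="\([0-9][0-9]*\)".*/\1/p' "$1"
}

# Example call (path is a placeholder):
#   list_connector_ports /path/to/tomcat/conf/server.xml
```

The HTTP connector is the one to move from 8080 to 80. Tomcat's Connector
element also accepts an address attribute for binding to a single IP, and
binding to port 80 normally requires root privileges.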

 Finally, where do I customize the look/feel for the webpages presented
 in Nutch?

If you are making the changes at the source,
the files are in the nutch/src/web directory.
You'll need to look into the style directory for
the basic XSLT style pages which are used to
build the pages.  The pages directory has the
main pages.  There are also headers and footers
in the include directory.  And, the jsp pages
are in the jsp directory.  Simple, huh?

If you want to just modify what is already in
the tomcat directory, they are located in the
webapps/ROOT directory in various directories,
assuming you renamed it to ROOT.

I hope that helps.

JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services








Stats?

2008-01-31 Thread Paul Stewart
Hi folks...



Is there a way to retrieve stats from Nutch - meaning how many webpages
are indexed, to be indexed etc??



When I was working with AspSeek and Mnogosearch in the past I could run
a command to see stats 



Thanks again,



Paul









Limiting Crawl Time

2008-02-05 Thread Paul Stewart
Hi folks...

What is the best way to, say, limit crawling to perhaps 3-4 hours per day?
Is there a way to do this?

Right now, I have a crawl depth of 6 and maximum per site of 100.  I
thought this would limit things pretty low but during some test crawls,
my last crawl took 2.5 days to complete:

Statistics for CrawlDb: crawl/crawldb
TOTAL urls:                 1566612
retry 0:                    1549310
retry 1:                    12814
retry 2:                    1601
retry 3:                    2887
min score:                  0.0
avg score:                  0.037
max score:                  429.15
status 1 (db_unfetched):    1021400
status 2 (db_fetched):      446907
status 3 (db_gone):         74420
status 4 (db_redir_temp):   13861
status 5 (db_redir_perm):   10024
CrawlDb statistics: done
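
Statistics of this shape are what bin/nutch readdb crawl/crawldb -stats
prints. As a rough sketch, the fetched and unfetched shares can be summarized
with awk; crawldb_summary is a hypothetical helper name, and piping readdb
into it assumes the statistics reach standard output on your setup.

```shell
# crawldb_summary is a hypothetical helper that reads `readdb -stats` style text on
# stdin and prints how much of the crawldb has been fetched so far.
crawldb_summary() {
    awk -F: '
        /TOTAL urls/   { total = $NF + 0 }
        /db_fetched/   { fetched = $NF + 0 }
        /db_unfetched/ { unfetched = $NF + 0 }
        END {
            if (total > 0)
                printf "fetched %d of %d (%.1f%%), unfetched %d (%.1f%%)\n",
                       fetched, total, 100 * fetched / total,
                       unfetched, 100 * unfetched / total
        }'
}

# Figures copied from the statistics quoted above.
printf '%s\n' 'TOTAL urls: 1566612' \
              'status 1 (db_unfetched):1021400' \
              'status 2 (db_fetched):  446907' | crawldb_summary
```

For the numbers above this prints
"fetched 446907 of 1566612 (28.5%), unfetched 1021400 (65.2%)".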


What I would like to do is crawl for 3-4 hours per day at most to
gradually fill the index. Thoughts?

Thanks very much,

Paul









RE: Limiting Crawl Time

2008-02-06 Thread Paul Stewart
Thanks - perhaps I misunderstand the depth and topN commands..

My understanding of the depth command is that Nutch will only go X deep
in the URLs to find websites - if I change that depth later, does that mean
it will go deeper at a later point in time?  I thought it would continue
ignoring URLs at that depth once it was told a higher depth?  In other
words, if I run a crawl with a depth of 2 and then a week later run a
depth of 4, and then perhaps a couple of weeks later run a depth of 6,
will that work?

Finally, the topN command - does that mean to only select the 1000
best URLs in this *particular* crawl but in the *next* crawl pick
another 1000 to match?

I guess on both of these commands I was under the impression that large
chunks of websites would never get crawled no matter how many times I
went back to crawl it?

Thanks very much for the clarification...

Paul


-Original Message-
From: Susam Pal [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 05, 2008 10:36 PM
To: nutch-user@lucene.apache.org
Subject: Re: Limiting Crawl Time

Did you try specifying a topN value? -depth 3 -topN 1000 should be
close to what you want.
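
One way to keep each day's run small is to drive the individual Nutch steps
(generate, fetch, updatedb) from a script and run a single bounded round per
day, for example from cron. The sketch below assumes the crawl/crawldb and
crawl/segments layout; NUTCH, CRAWL_DIR, TOPN and crawl_round are illustrative
names, not Nutch settings. Note that -topN caps the number of URLs per round,
not the wall-clock time, and that re-indexing after the round is not shown.

```shell
# Illustrative defaults (assumptions); override them in the environment as needed.
NUTCH=${NUTCH:-bin/nutch}
CRAWL_DIR=${CRAWL_DIR:-crawl}
TOPN=${TOPN:-1000}

# One bounded crawl round: select at most TOPN URLs, fetch them, update the crawldb.
crawl_round() {
    "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN" || return 1
    # Segment directories are named by timestamp, so the last one listed is the newest.
    segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)
    "$NUTCH" fetch "$segment" || return 1
    "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment"
}

# Example: call crawl_round once per scheduled run.
```

Each round works from whatever the crawldb still lists as unfetched, so
repeated rounds should gradually cover more of the URL set.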



--
Sent from Gmail for mobile | mobile.google.com





