problems with nutch clustering

Mohamed Imran K R Wed, 22 Aug 2007 03:01:21 -0700

Hi,

We are trying to build a Nutch cluster for the natural language
processing department of our research centre. We are deploying a search
engine for evaluation on Tamil (a South Indian language). The search engine
works really well on a single machine with all the customizations the
department has made, but we are facing some small issues with clustering.
The error I got sounds familiar, but it is vexing. I am using Nutch 0.9 and
jdk1.5.0_12. I followed this tutorial,
http://wiki.apache.org/nutch/NutchHadoopTutorial, and it worked for a single
system, but when running the same setup on a cluster I got this error from
the web interface for the slave machine:


Map output lost, rescheduling: getMapOutput(task_0001_m_000001_0,1) failed :
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:380)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1643)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

The above is the error message printed on the slave console, after which
the Nutch processes running on the slave just idle along (I checked with
the tasktracker and it is at 0%).
I did change a couple of lines in log4j.properties:
hadoop.log.dir=.
hadoop.log.file=hadoop.log
Other than these, I am running a default 0.9 release.
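
As far as I can tell, the top two frames of the trace only say that
DataInputStream.readLong() ran out of input before it had read 8 bytes,
which would suggest that whatever the servlet was reading was empty or
truncated (that part is my guess, I have not checked the Hadoop source).
A minimal sketch that reproduces the same exception outside Hadoop; the
class and method names are my own and are not part of Nutch or Hadoop:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    // Returns true if reading one long from the given bytes ends in an
    // EOFException, i.e. fewer than 8 bytes were available.
    static boolean hitsEof(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            in.readLong(); // needs 8 bytes; uses readFully internally
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("0 bytes: EOF = " + hitsEof(new byte[0]));
        System.out.println("3 bytes: EOF = " + hitsEof(new byte[3]));
        System.out.println("8 bytes: EOF = " + hitsEof(new byte[8]));
    }
}
```

This does not show why the input on the slave is short, only that the
exception itself is what a short read looks like.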
Looking forward to your help in solving this issue and having a nutch of a
time.
BTW, is there an IRC channel for Nutch?
-- 
Regards
Mohamed Imran K R
AU-KBC Research Centre
