Hi, I am trying to configure Nutch and Hadoop on two nodes, but while trying to fetch I am getting the exception below. (I sometimes get the same exception while injecting a new seed.)
2009-10-06 14:56:51,609 WARN mapred.ReduceTask - java.io.FileNotFoundException: http://127.0.0.1:50060/mapOutput?job=job_200910061454_0001&map=attempt_200910061454_0001_m_000000_0&reduce=3
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1345)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1339)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:993)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1293)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1231)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1144)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1084)
Caused by: java.io.FileNotFoundException: http://127.0.0.1:50060/mapOutput?job=job_200910061454_0001&map

On the tasktracker side the corresponding error is:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200910061454_0001/attempt_200910061454_0001_m_000000_0/output/file.out.index in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2840)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

After that, hadoop.log keeps printing messages like:

2009-10-06 15:56:43,918 WARN mapred.ReduceTask - attempt_200910061538_0005_r_000001_0 adding host 127.0.0.1 to penalty box, next contact in 150 seconds
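As far as I understand it, the reduce task is simply doing an HTTP GET against the tasktracker's mapOutput servlet on port 50060, and the servlet then looks for the map attempt's file.out.index under mapred.local.dir. Just to make sure I am reading the error right, here is a rough sketch of the request the copier seems to be making (the host and the job/attempt IDs are placeholders taken from the log, not something I run myself):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class MapOutputFetchSketch {
    public static void main(String[] args) throws Exception {
        // Same shape of URL as in the exception above; job, map attempt and
        // reduce partition are placeholder values.
        URL url = new URL("http://127.0.0.1:50060/mapOutput"
                + "?job=job_200910061454_0001"
                + "&map=attempt_200910061454_0001_m_000000_0"
                + "&reduce=3");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // If the tasktracker cannot find
        // taskTracker/jobcache/<job>/<attempt>/output/file.out.index
        // under any of its mapred.local.dir directories, this GET fails with
        // a FileNotFoundException like the one in my log.
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            System.out.println("Fetched " + total + " bytes of map output");
        } finally {
            conn.disconnect();
        }
    }
}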
Here is my hadoop-site.xml content:

<property>
  <name>fs.default.name</name>
  <value>hdfs://crawler1.mydomain.com:9000/</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>crawler1.mydomain.com:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>define mapred.map tasks to be number of slave hosts</description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>define mapred.reduce tasks to be number of slave hosts</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp</value>
  <description>A base for other temporary directories</description>
</property>

netstat -antp shows a program listening on port 50060:

tcp   0   0   0.0.0.0:50090   0.0.0.0:*   LISTEN   12855/java
tcp   0   0   0.0.0.0:50060   0.0.0.0:*   LISTEN   13014/java
tcp   0   0   0.0.0.0:50030   0.0.0.0:*   LISTEN   12923/java
tcp   0   0   0.0.0.0:50010   0.0.0.0:*   LISTEN   12765/java
tcp   0   0   0.0.0.0:50075   0.0.0.0:*   LISTEN   12765/java

masters:
crawler1.mydomain.com

slaves:
crawler1.mydomain.com
crawler2.mydomain.com

Everything works perfectly with the single-machine configuration. I am using Nutch 1.0. Any pointers?

Thanks.
- Bhavin
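P.S. What puzzles me is that the fetch URL in the log points at 127.0.0.1 rather than the other node's real address, even though netstat shows the tasktracker listening on 0.0.0.0:50060, so I am wondering whether hostname resolution plays a role. The small throwaway check I had in mind (just a sketch, using the hostnames from my masters/slaves files) is:

import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // Print what each crawler hostname resolves to on this machine.
        // If one of them comes back as 127.0.0.1 (e.g. via /etc/hosts), I
        // suspect that could explain why the map output URL uses 127.0.0.1.
        String[] hosts = {"crawler1.mydomain.com", "crawler2.mydomain.com"};
        for (String host : hosts) {
            System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
        }
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("local hostname -> " + local.getHostName()
                + " / " + local.getHostAddress());
    }
}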