Hello John and fellow coders,
Is there any resolution for this port 50010 connection error? I am really
struggling to get the multi-node environment working. I believe I have
followed all the steps on the wiki. I am using Nutch 0.9.
Thanks!
08-03-05 13:01:08,876 WARN dfs.DataNode - Failed to transfer blk_-1407334809134504262 to /9.2.209.4:50010
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:995)
    at java.lang.Thread.run(Thread.java:619)
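
In case it helps anyone else seeing the same timeout: the trace above only
says that a plain TCP connect to port 50010 on the other node timed out, so
one thing that can be checked outside of Hadoop is whether that port is
reachable at all from each machine. Below is a minimal sketch of such a
check. The class name PortCheck, the method canConnect, and the 5 second
timeout are my own illustrative choices and are not part of Nutch or
Hadoop; the port 50010 is simply the one taken from the log line above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Nutch or Hadoop.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Port taken from the log line above; adjust if your setup differs.
        int port = 50010;
        // Pass the host names or IP addresses of the other nodes as arguments.
        for (String host : args) {
            boolean ok = canConnect(host, port, 5000);
            System.out.println(host + ":" + port + " -> "
                    + (ok ? "reachable" : "NOT reachable"));
        }
    }
}
```

Running it on each node against the addresses of all the other nodes
(java PortCheck followed by the host names) should show whether the timeout
is a general network or firewall problem or something specific to the
DataNode process. It does not tell you why the port is unreachable.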
On Fri, Jan 11, 2008 at 12:57 AM, John Mendenhall <[EMAIL PROTECTED]>
wrote:
> Hello,
>
> I am currently running Nutch 0.9.
> I am running on 4 nodes; one of them is the master
> and also acts as a slave.
>
> I am running the nutch crawl command.
> Everything runs fine until it gets to the dedup
> command. The output from the command is as follows:
>
> -----
> Dedup: starting
> Dedup: adding indexes in: /var/nutch/crawl/indexes
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>     at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
> -----
>
> Can anyone please point me in the direction of getting this
> to work? I have excerpts of the interesting logs below. I
> have spent hours reading the few posts I could find on these
> errors. Many of those posts suggest that some of the messages
> are innocuous, since they are only logged at the WARN level.
>
> I did turn on the debug for log4j for the dedup process, so I
> could see if I could find anything else amiss. However, I
> was unable to determine the cause of the problem.
>
> Everything worked fine when we ran on a single machine,
> with everything set to local and no distributed file system.
>
> Thank you in advance for any assistance or pointers you can
> provide.
>
> The namenode log on the master has the following errors
> which occurred at approximately the same time:
>
> -----
> 2008-01-10 18:28:03,358 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/nutch/crawl/indexes/part-00012 because it does not exist
> 2008-01-10 18:28:07,145 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/nutch/crawl/indexes/part-00011 because it does not exist
> 2008-01-10 18:28:10,562 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/nutch/crawl/indexes/part-00015 because it does not exist
> 2008-01-10 18:28:12,616 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/nutch/crawl/indexes/part-00013 because it does not exist
> 2008-01-10 18:28:13,955 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/nutch/crawl/indexes/part-00014 because it does not exist
> 2008-01-10 18:28:16,526 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/mapred/system/job_0018 because it does not exist
> 2008-01-10 18:28:22,028 WARN fs.FSNamesystem - Not able to place enough replicas, still in need of 1
> 2008-01-10 18:28:22,114 WARN fs.FSNamesystem - Not able to place enough replicas, still in need of 1
> 2008-01-10 18:28:22,207 WARN fs.FSNamesystem - Not able to place enough replicas, still in need of 1
> 2008-01-10 18:29:16,724 WARN dfs.StateChange - DIR* FSDirectory.unprotectedDelete: failed to remove /var/mapred/system/job_0019 because it does not exist
> -----
>
> The datanode log on the master has the following errors
> which occurred at approximately the same time:
>
> -----
> 2008-01-10 18:28:29,742 WARN dfs.DataNode - Failed to transfer blk_-2596562194274011404 to /76.250.98.171:50010
> java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1020)
>     at java.lang.Thread.run(Thread.java:619)
> 2008-01-10 18:28:31,412 WARN dfs.DataNode - Failed to transfer blk_-2596562194274011404 to /76.250.98.171:50010
> java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1020)
>     at java.lang.Thread.run(Thread.java:619)
> -----
>
> The jobtracker, tasktracker, and secondarynamenode logs appear to be
> normal.
>
> The hadoop.log file contains the following interesting entries:
> (I have filtered out the thousands of debug ipc calls and results.)
>
> -----
> 2008-01-10 18:28:18,233 INFO indexer.DeleteDuplicates - Dedup: starting
> 2008-01-10 18:28:18,234 DEBUG conf.Configuration - java.io.IOException:
> config(config)
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :102)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:77)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:88)
> at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:27)
> at org.apache.nutch.indexer.DeleteDuplicates.dedup(
> DeleteDuplicates.java:418)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> 2008-01-10 18:28:18,367 INFO indexer.DeleteDuplicates - Dedup: adding
> indexes in: /var/nutch/crawl/indexes
> 2008-01-10 18:28:18,382 DEBUG mapred.JobClient - default FileSystem:
> hdfs://sunset2:50000
> 2008-01-10 18:28:21,672 INFO mapred.InputFormatBase - Total input paths
> to process : 16
> 2008-01-10 18:28:21,674 DEBUG mapred.JobClient - Creating splits at
> hdfs://sunset2:50000/var/mapred/system/submit_qb31lw/job.split
> 2008-01-10 18:28:24,145 INFO mapred.JobClient - Running job: job_0019
> 2008-01-10 18:28:25,156 INFO mapred.JobClient - map 0% reduce 0%
> 2008-01-10 18:28:33,267 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:33,304 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:33,516 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:33,553 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:35,485 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:35,657 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:36,858 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000004_0
> 2008-01-10 18:28:37,406 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000000_0
> 2008-01-10 18:28:38,133 WARN mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java
> :113)
> at
> org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(
> DeleteDuplicates.java:176)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1445)
> 2008-01-10 18:28:38,787 DEBUG mapred.MapTask - opened spill0.out
> 2008-01-10 18:28:39,335 INFO mapred.JobClient - map 6% reduce 0%
> 2008-01-10 18:28:41,142 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:41,179 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:41,358 INFO mapred.JobClient - Task Id :
> task_0019_m_000001_0, Status : FAILED
> 2008-01-10 18:28:41,494 INFO mapred.JobClient - Task Id :
> task_0019_m_000004_0, Status : FAILED
> 2008-01-10 18:28:42,738 INFO mapred.JobClient - Task Id :
> task_0019_m_000005_0, Status : FAILED
> 2008-01-10 18:28:42,757 INFO mapred.JobClient - Task Id :
> task_0019_m_000002_0, Status : FAILED
> 2008-01-10 18:28:43,338 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:43,716 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:43,758 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:44,494 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000007_0
> 2008-01-10 18:28:44,798 INFO mapred.JobClient - Task Id :
> task_0019_m_000006_0, Status : FAILED
> 2008-01-10 18:28:45,749 WARN mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java
> :113)
> at
> org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(
> DeleteDuplicates.java:176)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1445)
> 2008-01-10 18:28:45,912 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:47,047 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000001_1
> 2008-01-10 18:28:48,253 WARN mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java
> :113)
> at
> org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(
> DeleteDuplicates.java:176)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1445)
> 2008-01-10 18:28:49,879 INFO mapred.JobClient - Task Id :
> task_0019_m_000007_0, Status : FAILED
> 2008-01-10 18:28:50,908 INFO mapred.JobClient - Task Id :
> task_0019_m_000008_0, Status : FAILED
> 2008-01-10 18:28:50,920 INFO mapred.JobClient - Task Id :
> task_0019_m_000004_1, Status : FAILED
> 2008-01-10 18:28:50,949 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:50,986 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:51,938 INFO mapred.JobClient - Task Id :
> task_0019_m_000001_1, Status : FAILED
> 2008-01-10 18:28:52,969 INFO mapred.JobClient - Task Id :
> task_0019_m_000005_1, Status : FAILED
> 2008-01-10 18:28:53,123 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:53,713 DEBUG mapred.TaskTracker - Child starting
> 2008-01-10 18:28:53,753 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1425)
>
> 2008-01-10 18:28:54,009 INFO mapred.JobClient - Task Id :
> task_0019_m_000009_0, Status : FAILED
> 2008-01-10 18:28:54,317 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000006_1
> 2008-01-10 18:28:55,614 WARN mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java
> :113)
> at
> org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(
> DeleteDuplicates.java:176)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1445)
> 2008-01-10 18:28:55,960 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:107)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:99)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1435)
>
> 2008-01-10 18:28:57,080 DEBUG mapred.MapTask - Started thread: Sort
> progress reporter for task task_0019_m_000008_1
> 2008-01-10 18:28:58,067 INFO mapred.JobClient - Task Id :
> task_0019_m_000003_0, Status : FAILED
> 2008-01-10 18:28:58,303 WARN mapred.TaskTracker - Error running child
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java
> :113)
> at
> org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(
> DeleteDuplicates.java:176)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
> :1445)
> 2008-01-10 18:28:59,087 INFO mapred.JobClient - Task Id :
> task_0019_m_000007_1, Status : FAILED
> 2008-01-10 18:28:59,099 INFO mapred.JobClient - Task Id :
> task_0019_m_000006_1, Status : FAILED
> 2008-01-10 18:28:59,112 INFO mapred.JobClient - Task Id :
> task_0019_m_000002_1, Status : FAILED
> 2008-01-10 18:29:02,157 INFO mapred.JobClient - Task Id :
> task_0019_m_000008_1, Status : FAILED
> 2008-01-10 18:29:02,168 INFO mapred.JobClient - Task Id :
> task_0019_m_000001_2, Status : FAILED
> 2008-01-10 18:29:08,247 INFO mapred.JobClient - Task Id :
> task_0019_m_000004_2, Status : FAILED
> 2008-01-10 18:29:17,365 INFO mapred.JobClient - map 100% reduce 100%
> 2008-01-10 18:29:17,367 INFO mapred.JobClient - Task Id :
> task_0019_m_000001_3, Status : FAILED
> 2008-01-10 18:29:20,870 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:910)
>
> 2008-01-10 18:29:25,582 DEBUG conf.Configuration - java.io.IOException:
> config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java
> :93)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:910)
> -----
>
> If you need me to post log excerpts from the other slaves, please
> let me know and I'll put them up.
>
> Thanks!
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>