Re: nutch 0.9, multiple nodes, dedup error and Failed to transfer blk_-1407334809134504262

Developer Developer Wed, 05 Mar 2008 10:34:30 -0800

Thanks, John.

I found the solution for the port 50010 connection error.

It was just a firewall issue on the slave machine. When I tested with the
firewall turned off, it worked.
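
For anyone hitting the same error, a quick way to see whether a firewall is
in the way is to try opening a TCP connection from the master to port 50010
on each slave. Below is a minimal sketch of such a check, not something
taken from the Nutch or Hadoop tools. The function name port_is_open and the
host name "slave1" are made up for illustration; 50010 is the port from the
error in this thread.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    A refused connection, an unreachable or unresolvable host, and a
    timeout (a common symptom of a firewall silently dropping packets)
    are all reported as False.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror.
        return False


# Hypothetical usage, run from the master node:
#   port_is_open("slave1", 50010)
```

If this returns False while the datanode process is running on the slave,
the firewall on that slave is a likely suspect.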

Thanks!

On Wed, Mar 5, 2008 at 1:31 PM, John Mendenhall <[EMAIL PROTECTED]> wrote:

> On Wed, 05 Mar 2008, Developer Developer wrote:
>
> > Hello John and fellow coders,
> >
> > Is there any resolution for this port 50010 connection error? I am
> > really struggling to get the multiple-node environment working. I
> > believe I have followed all the steps on the wiki. I am using nutch 0.9.
> > Thanks!
> >
> > On Fri, Jan 11, 2008 at 12:57 AM, John Mendenhall <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hello,
> > >
> > > I am running nutch 0.9 currently.
> > > I am running on 4 nodes; one of them is the master
> > > and also acts as a slave.
> > >
> > > I am running the nutch crawl command.
> > > Everything runs fine until it gets to the dedup
> > > command.  The output from the command is as follows:
> > >
> > > -----
> > > Dedup: starting
> > > Dedup: adding indexes in: /var/nutch/crawl/indexes
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
> > >        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
> > >        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
> > > -----
> > >
> > > ...
> > >
> > > The hadoop.log file contains the following interesting entries:
> > > (I have filtered out the thousands of debug ipc calls and results.)
> > >
> > > -----
> > > 2008-01-10 18:28:18,233 INFO  indexer.DeleteDuplicates - Dedup: starting
> > > 2008-01-10 18:28:18,234 DEBUG conf.Configuration - java.io.IOException: config(config)
> > >        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:102)
> > >        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:77)
> > >        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:88)
> > >        at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:27)
> > >        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:418)
> > >        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
> > >
> > > 2008-01-10 18:28:18,367 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: /var/nutch/crawl/indexes
> > > 2008-01-10 18:28:18,382 DEBUG mapred.JobClient - default FileSystem: hdfs://sunset2:50000
> > > 2008-01-10 18:28:21,672 INFO  mapred.InputFormatBase - Total input paths to process : 16
> > > 2008-01-10 18:28:21,674 DEBUG mapred.JobClient - Creating splits at hdfs://sunset2:50000/var/mapred/system/submit_qb31lw/job.split
> > > 2008-01-10 18:28:24,145 INFO  mapred.JobClient - Running job: job_0019
> > > 2008-01-10 18:28:25,156 INFO  mapred.JobClient -  map 0% reduce 0%
> > > 2008-01-10 18:28:33,267 DEBUG mapred.TaskTracker - Child starting
> > > 2008-01-10 18:28:33,304 DEBUG conf.Configuration - java.io.IOException: config()
> > >        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:93)
> > >        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:58)
> > >        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1425)
>
> The solution to the delete duplicates problem is described
> at the following link:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg06705.html
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>
