I have the same problem, and it is killing me. I have tried all
sorts of configurations and tricks.

I have 3 machines; all three are datanodes and one is also the jobtracker. It
successfully fetches 300,000 pages, but when I try to fetch more than that
by injecting more pages in the first cycle, it always crashes at
the end of the reduce step of the fetch:

060129 142220  reduce 95%
060129 142347  reduce 96%
060129 143401  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:347)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:381)




This is what happened on one of the tasktrackers:

060129 172145 task_r_ca2dxi 0.8677622% reduce > reduce
060129 172146 task_r_ca2dxi 0.868171% reduce > reduce
060129 173149 Task task_r_ca2dxi timed out.  Killing.
060129 173149 Server connection on port 50050 from 164.67.195.26: exiting
060129 173149 task_r_ca2dxi Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
060129 173153 task_m_bikodi done; removing files.

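The silence before the kill can be measured straight from those lines. This is only a minimal sketch, assuming the leading fields are a yyMMdd HHmmss timestamp; the helper names are my own and not part of Nutch:

```python
from datetime import datetime

# Tasktracker lines copied from the log excerpt above.
LOG = """\
060129 172145 task_r_ca2dxi 0.8677622% reduce > reduce
060129 172146 task_r_ca2dxi 0.868171% reduce > reduce
060129 173149 Task task_r_ca2dxi timed out.  Killing.
"""


def parse_ts(line):
    # Assumption: the first 13 characters are "yyMMdd HHmmss".
    return datetime.strptime(line[:13], "%y%m%d %H%M%S")


def silence_before_kill(log, task_id):
    """Seconds between the task's last progress line and its timeout line."""
    last_progress = None
    for line in log.splitlines():
        if task_id not in line:
            continue
        if "timed out" in line:
            if last_progress is None:
                return None
            return (parse_ts(line) - last_progress).total_seconds()
        # Any other line for this task counts as a progress report.
        last_progress = parse_ts(line)
    return None


gap = silence_before_kill(LOG, "task_r_ca2dxi")
print(gap)
```

On the lines above this gives 603 seconds, so the reduce task reported nothing for about ten minutes before it was killed.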

Any suggestions?

Thanks, Mike







On 1/29/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Sounds like your tasktracker wasn't able to connect to your
> jobtracker any more.
> Are you sure the jobtracker is still running and the tasktracker can still
> reach the jobtracker box under the same hostname?
>
> On 28.01.2006 at 21:21, Rafit Izhak_Ratzin wrote:
>
> > Hi,
> >
> > I ran the mapreduce starting with 10 URLs through the sixth cycle, where
> > it fetched 400K pages, and everything was fine.
> >
> > 060127 001055 TOTAL urls:       1877326
> > 060127 001055 avg score:        1.099
> > 060127 001055 max score:        1666.305
> > 060127 001055 min score:        1.0
> > 060127 001055 retry 0:  1865721
> > 060127 001055 retry 1:  10887
> > 060127 001055 retry 2:  621
> > 060127 001055 retry 3:  92
> > 060127 001055 retry 4:  4
> > 060127 001055 retry 5:  1
> > 060127 001055 status 1 (DB_unfetched):  1477634
> > 060127 001055 status 2 (DB_fetched):    374736
> > 060127 001055 status 3 (DB_gone):       24956
> >
> > Then I tried another scenario starting with 80K URLs. The first
> > cycle was OK, but the second cycle, where it was supposed to fetch 800K,
> > failed after 100% reduce.
> > I ran it with three machines: 1 namenode and 2 datanodes.
> >
> > One of my datanodes had the following exception:
> > 060128 083726 task_r_alfaaq Child Error
> > java.io.IOException: Task process exit with nonzero status.
> >        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
> >        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
> > Which appeared more than once.
> >
> > The other datanode had the following exception:
> > 060128 142626 Lost connection to JobTracker [server.name/i.i.i.i:50020]. ex=java.lang.reflect.UndeclaredThrowableException  Retrying...
> >
> > Any idea?
> >
> > Thanks,
> > Rafit
> >
> >
> >
>
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
>
>
>
>
