I forgot to mention the namenode log file gives me thousands of these:

060129 155553 Zero targets found, forbidden1.size=2allowSameHostTargets=false forbidden2.size()=0
060129 155553 Zero targets found, forbidden1.size=2allowSameHostTargets=false forbidden2.size()=0

Thanks, Mike

On 1/29/06, Mike Smith <[EMAIL PROTECTED]> wrote:
>
> I do have the same problem and this problem is killing. I have tried all
> sorts of configurations and tricks.
>
> I have 3 machines, all three are datanodes and 1 is jobtracker. It
> successfully fetches 300,000 pages, but when I try to fetch more than that
> by injecting more pages at the first cycle it always crashes at
> the end of the fetching reduce step:
>
> 060129 142220 reduce 95%
> 060129 142347 reduce 96%
> 060129 143401 reduce 100%
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:347)
>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:381)
>
> This has happened at one of the tasktrackers:
>
> 060129 172145 task_r_ca2dxi 0.8677622% reduce > reduce
> 060129 172146 task_r_ca2dxi 0.868171% reduce > reduce
> 060129 173149 Task task_r_ca2dxi timed out. Killing.
> 060129 173149 Server connection on port 50050 from 164.67.195.26: exiting
> 060129 173149 task_r_ca2dxi Child Error
> java.io.IOException: Task process exit with nonzero status.
>     at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
>     at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
> 060129 173153 task_m_bikodi done; removing files.
>
> Any suggestion?
>
> Thanks, Mike
>
> On 1/29/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >
> > Sounds like your tasktracker wasn't able to connect to your
> > jobtracker anymore.
> > Are you sure the jobtracker still runs and the tasktracker can still
> > access the jobtracker box under the same hostname?
> >
> > On 28.01.2006 at 21:21, Rafit Izhak_Ratzin wrote:
> >
> > > Hi,
> > >
> > > I ran the mapreduce starting with 10 URLs into the sixth cycle where
> > > it fetched 400K pages and everything was fine.
> > >
> > > 060127 001055 TOTAL urls: 1877326
> > > 060127 001055 avg score: 1.099
> > > 060127 001055 max score: 1666.305
> > > 060127 001055 min score: 1.0
> > > 060127 001055 retry 0: 1865721
> > > 060127 001055 retry 1: 10887
> > > 060127 001055 retry 2: 621
> > > 060127 001055 retry 3: 92
> > > 060127 001055 retry 4: 4
> > > 060127 001055 retry 5: 1
> > > 060127 001055 status 1 (DB_unfetched): 1477634
> > > 060127 001055 status 2 (DB_fetched): 374736
> > > 060127 001055 status 3 (DB_gone): 24956
> > >
> > > Then I tried another scenario starting with 80K urls, and the first
> > > cycle was OK, but the second cycle, where it was supposed to fetch 800K,
> > > failed after 100% reduce.
> > > I ran it with three machines: 1 name node and 2 datanodes.
> > >
> > > One of my datanodes has the following exception:
> > > 060128 083726 task_r_alfaaq Child Error
> > > java.io.IOException: Task process exit with nonzero status.
> > >     at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
> > >     at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
> > > which appeared more than once.
> > >
> > > The other datanode had the following exception:
> > > 060128 142626 Lost connection to JobTracker [server.name/i.i.i.i:50020].
> > > ex=java.lang.reflect.UndeclaredThrowableException Retrying...
> > >
> > > Any idea?
> > >
> > > Thanks,
> > > Rafit
> > >
> >
> > ---------------------------------------------------------------
> > company: http://www.media-style.com
> > forum: http://www.text-mining.org
> > blog: http://www.find23.net
