Maybe the hard disks are full?
Try:
bin/nutch ndfs -report
Nutch generates some temporary data during processing.
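
Alongside the NDFS report, free space on the local disks of each node can be checked directly. The following is only a minimal sketch in Python, not part of Nutch: the helper names `free_fraction` and `nearly_full`, the 5% threshold, and the path "." are all assumptions, and the path should be replaced by whatever directories your nodes actually use for NDFS and mapred temporary data.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


def nearly_full(path, threshold=0.05):
    """Return True when less than `threshold` of the filesystem is free (threshold is an assumption)."""
    return free_fraction(path) < threshold


if __name__ == "__main__":
    # "." is a placeholder; substitute the local data and temp directories of the node.
    for path in ["."]:
        state = "NEARLY FULL" if nearly_full(path) else "ok"
        print(f"{path}: {free_fraction(path):.1%} free ({state})")
```

Running this on every datanode and tasktracker box shows quickly whether one of them is running out of room for temporary data.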

On 30.01.2006 at 00:54, Mike Smith wrote:

I forgot to mention the namenode log file gives me thousands of these:

060129 155553 Zero targets found, forbidden1.size=2allowSameHostTargets=false forbidden2.size()=0
060129 155553 Zero targets found, forbidden1.size=2allowSameHostTargets=false forbidden2.size()=0

Thanks, Mike


On 1/29/06, Mike Smith <[EMAIL PROTECTED]> wrote:

I have the same problem, and it is killing me. I have tried all
sorts of configurations and tricks.

I have 3 machines; all three are datanodes and one is the jobtracker. It
successfully fetches 300,000 pages, but when I try to fetch more than that
by injecting more pages in the first cycle, it always crashes at
the end of the fetching reduce step:

060129 142220  reduce 95%
060129 142347  reduce 96%
060129 143401  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:347)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:381)




This has happened at one of the tasktrackers:

060129 172145 task_r_ca2dxi 0.8677622% reduce > reduce
060129 172146 task_r_ca2dxi 0.868171% reduce > reduce
060129 173149 Task task_r_ca2dxi timed out.  Killing.
060129 173149 Server connection on port 50050 from 164.67.195.26: exiting
060129 173149 task_r_ca2dxi Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
060129 173153 task_m_bikodi done; removing files.
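
To see how long the reduce task was silent before it was killed, the timestamps in the log above can be compared. This is a minimal sketch in Python that assumes the two leading fields of each line are a date and time in yyMMdd HHmmss form (which matches the dates of these mails); the helper names `parse_stamp` and `seconds_between` are made up for illustration.

```python
from datetime import datetime


def parse_stamp(line):
    """Parse the leading 'yyMMdd HHmmss' fields of a log line (format is an assumption)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, "%y%m%d %H%M%S")


def seconds_between(earlier_line, later_line):
    """Return the number of seconds elapsed between two log lines."""
    return (parse_stamp(later_line) - parse_stamp(earlier_line)).total_seconds()


# The last progress report and the kill message, copied from the log above.
last_progress = "060129 172146 task_r_ca2dxi 0.868171% reduce > reduce"
killed = "060129 173149 Task task_r_ca2dxi timed out.  Killing."

print(seconds_between(last_progress, killed))  # 603.0
```

Under that reading of the timestamps, the task reported no progress for about ten minutes before the tasktracker killed it.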


Any suggestions?

Thanks, Mike







On 1/29/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:

Sounds like your tasktracker wasn't able to connect to your
jobtracker anymore.
Are you sure the jobtracker is still running and the tasktracker can still
reach the jobtracker box under the same hostname?

On 28.01.2006 at 21:21, Rafit Izhak_Ratzin wrote:

Hi,

I ran the mapreduce starting with 10 URLs through the sixth cycle, where
it fetched 400K pages, and everything was fine.

060127 001055 TOTAL urls:       1877326
060127 001055 avg score:        1.099
060127 001055 max score:        1666.305
060127 001055 min score:        1.0
060127 001055 retry 0:  1865721
060127 001055 retry 1:  10887
060127 001055 retry 2:  621
060127 001055 retry 3:  92
060127 001055 retry 4:  4
060127 001055 retry 5:  1
060127 001055 status 1 (DB_unfetched):  1477634
060127 001055 status 2 (DB_fetched):    374736
060127 001055 status 3 (DB_gone):       24956

Then I tried another scenario starting with 80K URLs. The first
cycle was OK, but the second cycle, where it was supposed to fetch 800K,
failed after 100% reduce.
I ran it with three machines: 1 namenode and 2 datanodes.

One of my datanodes has the following exception:
060128 083726 task_r_alfaaq Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
This appeared more than once.

The other datanode had the following exception:
060128 142626 Lost connection to JobTracker [server.name/i.i.i.i:50020]. ex=java.lang.reflect.UndeclaredThrowableException Retrying...

Any idea?

Thanks,
Rafit




---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:             http://www.find23.net







