Hi,

I am getting a weird error from the DbUpdaterJob in Nutch 2.x.
I am crawling these two links:

http://www.amazon.com/Degree-Antiperspirant-Deodorant-Extreme-Blast/dp/B001ET769Y
http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
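For reference, I run roughly the standard Nutch 2.x command sequence; the seed directory, -topN value, and batch flags below are just illustrative and depend on the version/setup:

    bin/nutch inject urls/
    bin/nutch generate -topN 50
    bin/nutch fetch -all
    bin/nutch parse -all
    bin/nutch updatedb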

All my jobs run fine, but when I run the updatedb job I get this error:

Exception in thread "main" java.lang.RuntimeException: job failed:
name=update-table, jobid=job_local482736560_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
    at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
    at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)

And the Hadoop log file says:

2013-06-17 21:51:41,478 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-06-17 21:51:41,479 WARN  mapred.LocalJobRunner - job_local384125843_0001
java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:559)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
    at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:52)
    at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:183)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
    at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)

If I crawl a simple page like www.google.nl, everything works fine, including the updatedb job.

Any clues on how to debug this issue? What could be the reason for it?

Thanks.
Tony.
