On 19.12.2011 13:20, Markus Jelsma wrote:
Are you sure this is NUTCH-1084? You write about both readdb and readseg, but
they are different. Does readseg throw the exception?

https://issues.apache.org/jira/browse/NUTCH-1084

Actually, it is exactly the same exception. I have already commented on the bug report that it also occurs when using "readseg".

ReadSeg:
nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readseg -list uniall/segs/20111219111925
11/12/19 13:20:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/19 13:20:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/12/19 13:20:56 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
        at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
        at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:471)
        at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

ReadDB:
nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readdb uniall/crawldb -url "http://www.uni-kassel.de/uni"
11/12/19 13:23:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/19 13:23:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/12/19 13:23:46 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
        at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
        at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
        at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
        at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
        at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

On Monday 19 December 2011 12:50:52 Marek Bachmann wrote:
Hello,

I am still fighting with this exception:
"java.io.IOException: can't find class:
org.apache.nutch.protocol.ProtocolStatus because
org.apache.nutch.protocol.ProtocolStatus"

whenever I try to run

*) readdb -url xzy
*) readseg -list seg
*) readseg -get -dir segs xyz

I know this is a known major bug. The only workaround I am aware of is to
copy the segments to a local directory, but that really annoys me since
it is very time-consuming.

Has anyone found another workaround for this problem?

Thank you all in advance
