On 19.12.2011 13:20, Markus Jelsma wrote:
Are you sure this is NUTCH-1084? You write about both readdb and readseg, but
they are different tools. Does readseg throw the exception as well?
https://issues.apache.org/jira/browse/NUTCH-1084
Actually, it is exactly the same exception. I have already commented on the
bug report that it also occurs when using "readseg".
ReadSeg:
nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readseg -list uniall/segs/20111219111925
11/12/19 13:20:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/19 13:20:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/12/19 13:20:56 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
    at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
    at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
    at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:471)
    at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
    at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
ReadDB:
nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readdb uniall/crawldb -url "http://www.uni-kassel.de/uni"
11/12/19 13:23:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/19 13:23:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/12/19 13:23:46 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
    at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
    at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
    at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
    at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
    at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
On Monday 19 December 2011 12:50:52 Marek Bachmann wrote:
Hello,
I am still fighting with this exception:
"java.io.IOException: can't find class:
org.apache.nutch.protocol.ProtocolStatus because
org.apache.nutch.protocol.ProtocolStatus"
It occurs whenever I try to run
*) readdb -url xzy
*) readseg -list seg
*) readseg -get -dir segs xyz
I know this is a known major bug. The only workaround I am aware of is to
copy the segments to a local directory, but this is really annoying because
it is very time-consuming.
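For reference, a minimal sketch of that copy workaround. The segment path is the one from the readseg call in this thread; the local scratch directory and the runtime/local/bin location are assumptions for illustration, as is the idea of running the local-mode script against the copy. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the copy-to-local workaround. Every path below is a placeholder.
SEGMENT_HDFS="${SEGMENT_HDFS:-uniall/segs/20111219111925}"   # segment on HDFS
LOCAL_DIR="${LOCAL_DIR:-/tmp/nutch-local-segs}"              # assumed local scratch directory
NUTCH_LOCAL_BIN="${NUTCH_LOCAL_BIN:-/nutch/nutch14/runtime/local/bin}"  # assumed local-mode runtime

# By default the commands are only printed; run with RUN= (empty) to execute them.
RUN="${RUN-echo}"

copy_and_list() {
  seg_name=$(basename "$SEGMENT_HDFS")
  $RUN mkdir -p "$LOCAL_DIR"
  # Pull the segment out of HDFS onto the local filesystem.
  $RUN hadoop fs -copyToLocal "$SEGMENT_HDFS" "$LOCAL_DIR/$seg_name"
  # Read the local copy with the local-mode script instead of the deploy one.
  $RUN "$NUTCH_LOCAL_BIN/nutch" readseg -list "$LOCAL_DIR/$seg_name"
}

copy_and_list
```

The copy step is what makes this slow for large segments, since the whole segment has to leave HDFS before it can be read.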
Has anyone found another workaround for this problem?
Thank you all in advance