Ah, I didn't see that. It makes sense anyway, since there are CrawlDatum objects there too, but I'm not sure whether they contain metadata fields there, which was a suspect of the problem.
I've tried debugging the issue once but failed to solve it.

On Monday 19 December 2011 13:26:52 Marek Bachmann wrote:
> On 19.12.2011 13:20, Markus Jelsma wrote:
> > Are you sure this is NUTCH-1084? You write about both readdb and
> > readseg, but they are different. Does readseg throw the exception?
> >
> > https://issues.apache.org/jira/browse/NUTCH-1084
>
> Actually it is exactly the same exception. I already commented on the
> bug report that it also occurs when using "readseg".
>
> ReadSeg:
> nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readseg -list uniall/segs/20111219111925
> 11/12/19 13:20:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 11/12/19 13:20:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 11/12/19 13:20:56 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
>         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>         at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
>         at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:471)
>         at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
>         at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ReadDB:
> nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readdb uniall/crawldb -url "http://www.uni-kassel.de/uni"
> 11/12/19 13:23:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 11/12/19 13:23:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 11/12/19 13:23:46 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
>         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>         at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
>         at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
>         at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
>         at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> > On Monday 19 December 2011 12:50:52 Marek Bachmann wrote:
> >> Hello,
> >>
> >> I am still fighting with this Exception
> >> "java.io.IOException: can't find class:
> >> org.apache.nutch.protocol.ProtocolStatus because
> >> org.apache.nutch.protocol.ProtocolStatus"
> >>
> >> whenever I try to run
> >>
> >> *) readdb -url xzy
> >> *) readseg -list seg
> >> *) readseg -get -dir segs xyz
> >>
> >> I know that this is a known major bug. The only solution I am aware
> >> of is to copy the segments to a local dir. But this is really
> >> annoying me, since it is very time-consuming.
> >>
> >> Has anyone found another workaround for this problem?
> >>
> >> Thank you all in advance

-- 
Markus Jelsma - CTO - Openindex
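For reference, the copy-to-local workaround described in the quoted mail might be sketched roughly like this. The segment name and local directory are examples only, not taken from anyone's actual setup, and the `file://` URI is an assumption about how to force the reader onto the local filesystem:

```shell
# Pull one segment out of HDFS onto the local filesystem
hadoop fs -copyToLocal uniall/segs/20111219111925 /tmp/segs/20111219111925

# Read the local copy; pointing the tool at a file:// path keeps Hadoop
# on the local filesystem, which is the workaround the thread refers to
bin/nutch readseg -list file:///tmp/segs/20111219111925
```

As the original poster notes, this is slow for large crawls because every segment has to be copied before it can be inspected.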

