Ah, I didn't see that. It makes sense anyway, since there are CrawlDatum 
objects there too, but I'm not sure whether they contain metadata fields 
there, which is what I suspect is causing the problem.
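For what it's worth, the strangely repetitive wording of the message ("can't find class: X because X") comes from how Hadoop wraps the ClassNotFoundException, whose own message is just the class name. Below is a simplified, self-contained sketch of that wrapping; it is not the actual AbstractMapWritable code, just an illustration of why the message looks the way it does when the class isn't on the reading JVM's classpath:

```java
import java.io.IOException;

// Simplified sketch (NOT the real Hadoop code) of how
// AbstractMapWritable.readFields resolves a value class by name.
// When Class.forName fails -- e.g. the job jar with the Nutch classes
// is not on the task's classpath -- the ClassNotFoundException message
// (which is simply the class name) is appended after "because",
// producing the oddly repetitive error seen in the traces below.
public class ClassResolveSketch {
    public static Class<?> resolve(String className) throws IOException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IOException("can't find class: " + className
                    + " because " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            resolve("org.apache.nutch.protocol.ProtocolStatus");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the exception is really a classloading problem at deserialization time, not corrupt data as such.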

I've tried debugging the issue once but failed to solve it.

On Monday 19 December 2011 13:26:52 Marek Bachmann wrote:
> Am 19.12.2011 13:20, schrieb Markus Jelsma:
> > Are you sure this is NUTCH-1084? You write about both readdb and
> > readseg, but they are different. Does readseg throw the exception?
> > 
> > https://issues.apache.org/jira/browse/NUTCH-1084
> 
> Actually it is exactly the same exception. I already commented on the
> bug report that it also occurs when using "readseg".
> 
> ReadSeg:
> nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readseg -list
> uniall/segs/20111219111925
> 11/12/19 13:20:56 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 11/12/19 13:20:56 INFO zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 11/12/19 13:20:56 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.io.IOException: can't find class:
> org.apache.nutch.protocol.ProtocolStatus because
> org.apache.nutch.protocol.ProtocolStatus
>          at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
>          at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
>          at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>          at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>          at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>          at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
>          at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:471)
>          at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
>          at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> ReadDB:
> nutch@hrz-pc318:/nutch/nutch14/runtime/deploy/bin$ ./nutch readdb
> uniall/crawldb -url "http://www.uni-kassel.de/uni"
> 11/12/19 13:23:46 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 11/12/19 13:23:46 INFO zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 
> 11/12/19 13:23:46 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.io.IOException: can't find class:
> org.apache.nutch.protocol.ProtocolStatus because
> org.apache.nutch.protocol.ProtocolStatus
>          at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
>          at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
>          at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>          at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>          at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
>          at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
>          at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
>          at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
>          at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> > On Monday 19 December 2011 12:50:52 Marek Bachmann wrote:
> >> Hello,
> >> 
> >> I am still fighting with this exception
> >> "java.io.IOException: can't find class:
> >> org.apache.nutch.protocol.ProtocolStatus because
> >> org.apache.nutch.protocol.ProtocolStatus"
> >> 
> >> whenever I try to run
> >> 
> >> *) readdb -url xzy
> >> *) readseg -list seg
> >> *) readseg -get -dir segs xyz
> >> 
> >> I know this is a known major bug. The only workaround I am aware of
> >> is to copy the segments to a local directory, but that really annoys
> >> me since it is very time-consuming.
> >> 
> >> Has anyone found another workaround for this problem?
> >> 
> >> Thank you all in advance

-- 
Markus Jelsma - CTO - Openindex
