I want to find out what the crawldb knows about some specific URLs. According
to the Nutch wiki, I should use nutch readdb with the -url option. But when I
run a command like the following, I get nasty "can't find class" exceptions.


$NUTCH_HOME/runtime/deploy/bin/nutch readdb /crawls/popular/data/crawldb -url http://fabulous.com/

The error message is:

Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
        at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:212)
        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:317)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2256)
        at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:680)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:99)
        at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:465)
        at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:472)
        at org.apache.nutch.crawl.CrawlDbReader.run(CrawlDbReader.java:717)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:736)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)



I get this exception for every URL that is actually in the crawldb. If I
specify a URL that is not in the crawldb, I get a more understandable message
instead. Also, nutch readdb -stats works reliably.
How can I make readdb -url work?
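One check that might narrow this down is to look inside the deploy-mode job file for the class the error complains about. A .job file is an ordinary zip archive, so `unzip -l` can list its entries. The sketch below assumes the job file lives in runtime/deploy and matches apache-nutch-*.job; that name pattern is a guess about the build layout, so adjust it to your build. The helper names class_to_entry and job_has_class are hypothetical, made up for this sketch. If the class is in the job file but readdb -url still cannot load it, my guess is that the problem lies in how the class is looked up at runtime rather than in the packaging.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Java class name to the entry name it would have
# inside a jar/job archive, e.g.
#   org.apache.nutch.protocol.ProtocolStatus -> org/apache/nutch/protocol/ProtocolStatus.class
class_to_entry() {
  printf '%s.class\n' "${1//.//}"
}

# Hypothetical helper: succeed (exit 0) if the archive lists an entry for the class.
# Requires `unzip`; a .job file is a zip archive, so `unzip -l` can list it.
job_has_class() {
  local job="$1" class="$2"
  unzip -l "$job" | grep -qF "$(class_to_entry "$class")"
}

# Usage sketch (the apache-nutch-*.job name pattern is an assumption):
# for job in "$NUTCH_HOME"/runtime/deploy/apache-nutch-*.job; do
#   if job_has_class "$job" org.apache.nutch.protocol.ProtocolStatus; then
#     echo "$job contains ProtocolStatus"
#   else
#     echo "$job does NOT contain ProtocolStatus"
#   fi
# done
```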
