OK, I changed the setting "ndfs.replication" to "1" in nutch-default.xml
and the warning is gone.
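
For reference, the change amounts to a property entry along these lines. This is only a sketch of the usual Nutch property format: the name and value are the ones mentioned above, and the description element that normally accompanies each property is left out.

```xml
<!-- Sketch of the edited entry; only name and value are taken from this mail. -->
<property>
  <name>ndfs.replication</name>
  <!-- 1 = keep a single copy of each block, so one data node is enough -->
  <value>1</value>
</property>
```

The same entry could presumably be placed in nutch-site.xml as an override instead of editing the defaults, but I have only tried it in nutch-default.xml.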

The next step in the "quick tutorial for nutch 0.8 and later" is the command:
"bin/nutch org.apache.nutch.crawl.DmozParser netscape-content.rdf.u8 > urls.txt"
The output is:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/crawl/DmozParser

Yes, it looks like the DmozParser class is not available.

What does a urls.txt file have to look like to get started with crawling?
Is it just a simple text file with one URL per line?

Bernd

Stefan Groschupf wrote:
This is just a warning. It means you have fewer data nodes than the number of block replicas configured. Nutch tries to replicate a block (as a backup) but cannot find another data node.
Change the configuration or add a data node.

