OK, in nutch-default.xml I changed the setting "ndfs.replication" to "1"
and the warning is gone.
The next step in the "quick tutorial for nutch 0.8 and later" is the command:
"bin/nutch org.apache.nutch.crawl.DmozParser netscape-content.rdf.u8 > urls.txt"
Output is:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/crawl/DmozParser
Yes, looks like DmozParser is not available.
What does a urls.txt file need to look like to get started with crawling?
Is it just a simple text file with one URL per line?
Bernd
Stefan Groschupf wrote:
This is just a warning and means you have fewer data nodes than the
configured number of block replicas.
Nutch tries to duplicate a block (backup) but does not find another
data node.
Change the configuration or add a data node.
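
For a single-node setup, the configuration change mentioned above might look roughly like the following. This is only a sketch: it assumes the usual Nutch property format, and it assumes the override goes into conf/nutch-site.xml (the conventional place for local settings) rather than being edited directly in nutch-default.xml. The property name "ndfs.replication" and the value "1" are taken from this thread; the description text is illustrative.

```xml
<!-- Sketch of a local override, assumed to live in conf/nutch-site.xml -->
<property>
  <name>ndfs.replication</name>
  <!-- One copy per block, matching a setup with a single data node -->
  <value>1</value>
  <description>Illustrative: number of copies kept of each block.</description>
</property>
```

With only one data node running, a value above 1 cannot be satisfied, which is what triggers the warning.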