Thank you Andrzej,

> This should not happen - are you sure that your hadoop config files are
> consistent across the cluster, especially the FS related properties?
Definitely. The script I wrote to manage the crawling/indexing process performs an scp copy of the whole conf directory to all active slaves before crawling starts.

> When you start the job make sure that the classpath that your command
> uses pulls in the right hadoop config, one that correctly defines the
> filesystem.

I suppose it does. I configured the filesystem-related parameters in conf/hadoop-site.xml. Most of the Nutch subprocesses run fine on DFS; only "invertlinks" and perhaps one other (it could be "parse", but I'm not able to verify right now) run into trouble when they cannot find a local copy of the DFS data, so I can't figure out where to look for a missing configuration.

> Also, check for the presence of multiple copies of hadoop config files
> on your classpath - be aware that some of them may be inside jars, e.g.
> in your job jar.

I'm using a default configuration for Hadoop and have never touched any jar file. I also tried adding hadoop-site.xml to the nutch-1.0.job archive; still no luck. Same error:

LinkDb: java.io.IOException: No input paths specified in job

Not sure where to look for further configuration... Hints?

S

> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

