On 2010-10-28 12:30, Claudio Martella wrote:
> Hello list,
>
> I have a hadoop cluster where I'd like to run nutch for crawling my
> repositories. I'm currently running cloudera's hadoop
> 0.20.2+737-1~lenny-cdh3b3 and nutch1.1 or nutch1.2. When I try to run:
>
> $ hadoop jar build/nutch-1.2.job org.apache.nutch.crawl.Crawl
> /crawls/urls/ -depth 15 -dir /crawls/ -solr http://searchserver:8080/solr/
> about the x point ... URLNormalizer not found i read on some old
> archives that it could be due to jar format discrepancies between
> nutch's .job and what the hadoop cluster is expecting. Do you have any idea?

This error indicates that plugin.folders or plugin.includes is set to an incorrect value, so Nutch cannot find its plugins. Could you please check the job.xml file that the JobTracker creates for this job (accessible via the Hadoop web UI) and see what values these properties have?

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
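If you save job.xml from the web UI, the two properties can also be pulled out with a few lines of code. The following is a minimal sketch, not a Nutch or Hadoop tool: the class name `JobXmlPluginCheck` is hypothetical, and it assumes job.xml uses the usual Hadoop `<configuration>/<property>/<name>/<value>` layout. It relies only on the JDK's DOM parser, so no Hadoop jars are needed to run it.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: prints the plugin-related properties from a saved job.xml.
public class JobXmlPluginCheck {

    // Properties that control where Nutch looks for plugins and which ones it loads.
    static final String[] KEYS = {"plugin.folders", "plugin.includes"};

    // Parses a Hadoop-style configuration file and returns the value of each
    // requested key, or null if the key is absent. If a name repeats, the last
    // value wins (a simplification; "final" flags are ignored).
    static Map<String, String> readProperties(InputStream in, String... keys) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        Map<String, String> all = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() == 0) {
                continue; // malformed entry without a <name>
            }
            String name = names.item(0).getTextContent().trim();
            String value = values.getLength() > 0 ? values.item(0).getTextContent().trim() : "";
            all.put(name, value);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys) {
            result.put(key, all.get(key));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JobXmlPluginCheck /path/to/job.xml
        try (InputStream in = new FileInputStream(args[0])) {
            readProperties(in, KEYS).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

Run it as `java JobXmlPluginCheck job.xml`. A missing (`null`) value, a plugin.folders path that does not exist on the task nodes, or a plugin.includes pattern that matches none of the `urlnormalizer-*` plugins would be consistent with the "URLNormalizer not found" error.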

