Somehow the default configuration defined in nutch-default.xml is not taken into account when you run the crawler on Hadoop.

A few things you can try:

1) Configure nutch-site.xml and provide the necessary properties there.
2) Also check the plugin.includes property; it should list the plugins your crawl needs, and plugin.folders should point to the correct plugins directory (give the absolute path).
3) Finally, keep all the Nutch configuration files either in the $HADOOP_HOME/conf folder, or add their path to the HADOOP_CLASSPATH variable in hadoop-env.sh.

Rough sketches of 2) and 3) are below.
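For 2), something like this in nutch-site.xml (the plugin list and the path here are only illustrative; match the list to the plugins you actually use, and point the folder at your own install):

    <configuration>
      <property>
        <name>plugin.folders</name>
        <value>/opt/nutch-2.0/runtime/local/plugins</value>
      </property>
      <property>
        <name>plugin.includes</name>
        <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-basic|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
      </property>
    </configuration>

For 3), something along these lines in hadoop-env.sh (again, the path is just an example; use wherever your Nutch conf directory lives):

    export HADOOP_CLASSPATH=/opt/nutch-2.0/runtime/local/conf:$HADOOP_CLASSPATH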
Regards,
Som

On Wed, Jul 18, 2012 at 7:43 AM, 许春玲 <[email protected]> wrote:
> Hi,
>
> When I run the crawler of Nutch 2.0 with the command:
>
> hadoop jar /opt/nutch-2.0/runtime/deploy/apache-nutch-2.0.job
> org.apache.nutch.crawl.Crawler urls -dir output00 -depth 3 -topN 5 -threads 80
>
> I get error output like:
>
> 12/07/18 09:13:32 INFO mapred.JobClient: Task Id :
> attempt_201207101015_0091_m_000000_2, Status : FAILED
> java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>
> But the URL regexes in conf/regex-urlfilter.txt are correct:
>
> +^http://([a-z0-9]*\.)*apache.org
> +^http://([a-z0-9]*\.)*sina.com.cn
>
> So, what should I do?
>
> Thanks.
>
> Ring

