Hi Hemant,

Clearly that ":java.lang.NullPointerExcep" portion of the URLs is bogus. Maybe there are more telling details in your logs.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Hemant Bist <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, June 14, 2008 2:03:14 AM
> Subject: Re: problem running nutch from eclipse 3.2 in ubuntu hardy.
>
> Hi Otis,
> Thanks for the reply.
>
> I should have mentioned that as well. I have a seed file in the urls directory
> that contains two URLs. I have tried running the Nutch tutorial on single
> and multiple machines (using trunk code) and it's working fine for me.
> The following portion of hadoop.log shows that Nutch is at least picking up
> the URLs.
>
> 2008-06-13 22:29:35,100 WARN regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> 2008-06-13 22:29:35,101 WARN crawl.Injector - Skipping http://lucene.apache.org/:java.lang.NullPointerException
> 2008-06-13 22:29:35,101 WARN crawl.Injector - Skipping http://shopping.yahoo.com/:java.lang.NullPointerException
>
> HB
>
> On Fri, Jun 13, 2008 at 10:55 PM, Otis Gospodnetic wrote:
>
> > Hi,
> >
> > You didn't mention URL injection, which makes me think you didn't inject
> > any seed URLs to crawl. I also suggest figuring out how to run Nutch
> > "normally", "from the command-line", before introducing additional variables
> > and complexities, such as running Nutch from an IDE.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > > From: Hemant Bist
> > > To: [email protected]
> > > Sent: Saturday, June 14, 2008 1:47:21 AM
> > > Subject: problem running nutch from eclipse 3.2 in ubuntu hardy.
> > >
> > > Hi,
> > > I am trying to build and run Nutch from trunk in Eclipse 3.2 on Ubuntu
> > > Hardy. I am unable to get it to crawl any site after compiling it. As far as
> > > I can tell, there is something wrong in my configuration, but I can't figure
> > > out what it is!
> > >
> > > I am following [http://wiki.apache.org/nutch/RunNutchInEclipse0.9]
> > > and have included conf in .classpath, and modified nutch-defaults.xml for
> > > plugin.folders and http.agent.name.
> > >
> > > The final warning message I get is [complete hadoop.log is attached]:
> > > WARN crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
> > > and some of the earlier warning messages are:
> > > WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> > > 2008-06-13 22:29:34,978 WARN regex.RegexURLNormalizer - Can't load the default config file! /nutch/home/work/nutch/trunk/conf/regex-normalize.xml
> > > 2008-06-13 22:29:34,990 WARN suffix.SuffixURLFilter - Missing urlfilter.suffix.file, all URLs will be rejected!
> > > 2008-06-13 22:29:34,994 FATAL api.RegexURLFilterBase - Can't find resource: crawl-urlfilter.txt
> > > 2008-06-13 22:29:34,995 FATAL api.RegexURLFilterBase - Can't find resource: automaton-urlfilter.txt
> > >
> > > I would appreciate any pointers in debugging this.
> > >
> > > Thanks,
> > > HB
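
The "Can't find resource" lines quoted above suggest the filter files are not visible on the classpath of the Eclipse launch. One quick way to test that guess is to ask the class loader directly. The following is a minimal sketch, not Nutch code: the class name ClasspathCheck and its check method are hypothetical, and it assumes the files are looked up as classpath resources, as the log wording implies. The three file names are taken from the log messages in this thread.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Nutch or Hadoop.
public class ClasspathCheck {

    // Returns, for each resource name, whether the given class loader can locate it.
    static Map<String, Boolean> check(ClassLoader loader, String... names) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String name : names) {
            URL url = loader.getResource(name); // null when the resource is not visible
            found.put(name, url != null);
        }
        return found;
    }

    public static void main(String[] args) {
        // Prefer the thread context loader; fall back to the system loader if unset.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // File names copied from the warnings in hadoop.log.
        Map<String, Boolean> result = check(loader,
                "regex-normalize.xml",
                "crawl-urlfilter.txt",
                "automaton-urlfilter.txt");
        for (Map.Entry<String, Boolean> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + (e.getValue() ? "found" : "NOT on classpath"));
        }
    }
}
```

Running this with the same Eclipse run configuration as the crawl should show whether the conf directory is really on the runtime classpath. If the files are reported as missing here too, the problem is most likely in the project's build path rather than in the seed list.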
