Hi Hemant,

Clearly the ":java.lang.NullPointerException" suffix on those URLs is bogus.
Maybe there are more telling details elsewhere in your logs.
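
One thing that might be worth checking, as a rough sketch only: the earlier
warnings quoted below say that regex-normalize.xml, crawl-urlfilter.txt and
automaton-urlfilter.txt could not be found. Assuming (this is a guess, not
something the log confirms) that the conf directory is not really on the
runtime classpath of the Eclipse launch, a small standalone class like the
following would show which of those files resolve. The class name
ConfResourceCheck and the locate helper are made up for illustration and are
not part of Nutch; only the standard ClassLoader.getResource call is used.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfResourceCheck {
    // Resource names taken from the warnings in the quoted hadoop.log.
    static final String[] RESOURCES = {
        "regex-normalize.xml",
        "crawl-urlfilter.txt",
        "automaton-urlfilter.txt"
    };

    // Maps each name to the URL it resolves to on the classpath,
    // or to null when the class loader cannot find it.
    static Map<String, URL> locate(ClassLoader loader, String... names) {
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : names) {
            found.put(name, loader.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (Map.Entry<String, URL> e : locate(loader, RESOURCES).entrySet()) {
            String where = (e.getValue() == null) ? "NOT FOUND" : e.getValue().toString();
            System.out.println(e.getKey() + " -> " + where);
        }
    }
}
```

Run it from the same Eclipse project and launch configuration as the crawl.
If every entry prints NOT FOUND, the conf directory is probably not on the
classpath that the crawl actually sees.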

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Hemant Bist <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, June 14, 2008 2:03:14 AM
> Subject: Re: problem running nutch from eclipse 3.2 in ubuntu hardy.
> 
> Hi Otis,
> Thanks for the reply.
> 
> I should have mentioned that as well. I have a seed file in urls directory
> that contains two urls.  I have tried running the nutch tutorial on single
> and multiple machines (using trunk code) and it's working fine for me.
> Following portion from Hadoop.log shows that nutch is at least picking up
> the urls.
> 
> 2008-06-13 22:29:35,100 WARN  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> 2008-06-13 22:29:35,101 WARN  crawl.Injector - Skipping http://lucene.apache.org/:java.lang.NullPointerException
> 2008-06-13 22:29:35,101 WARN  crawl.Injector - Skipping http://shopping.yahoo.com/:java.lang.NullPointerException
> 
> HB
> 
> On Fri, Jun 13, 2008 at 10:55 PM, Otis Gospodnetic 
> wrote:
> 
> > Hi,
> >
> > You didn't mention URL injection, which makes me think you didn't inject
> > any seed URLs to crawl.  I also suggest figuring out how to run Nutch
> > "normally", "from the command-line", before introducing additional variables
> > and complexities, such as running Nutch from an IDE.
> >
> >  Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> > From: Hemant Bist 
> > To: [email protected]
> > Sent: Saturday, June 14, 2008 1:47:21 AM
> > Subject: problem running nutch from eclipse 3.2 in ubuntu hardy.
> >
> > Hi,
> > I am trying to build and run nutch from trunk in eclipse 3.2 in Ubuntu
> > hardy. I am unable to get it to crawl any site after compiling it.  As far as
> > I can tell, there is something wrong in my configuration but I can't figure
> > out what it is!
> >
> > I am following [http://wiki.apache.org/nutch/RunNutchInEclipse0.9]
> > and have included conf in .classpath, and modified nutch-defaults.xml for
> > plugin.folders and http.agent.name.
> >
> >
> > I get the final warning message as [complete hadoop.log is attached]
> > WARN  crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
> > and
> > some of the earlier warning messages are
> >  WARN  mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> > 2008-06-13 22:29:34,978 WARN  regex.RegexURLNormalizer - Can't load the default config file! /nutch/home/work/nutch/trunk/conf/regex-normalize.xml
> > 2008-06-13 22:29:34,990 WARN  suffix.SuffixURLFilter - Missing urlfilter.suffix.file, all URLs will be rejected!
> > 2008-06-13 22:29:34,994 FATAL api.RegexURLFilterBase - Can't find resource: crawl-urlfilter.txt
> > 2008-06-13 22:29:34,995 FATAL api.RegexURLFilterBase - Can't find resource: automaton-urlfilter.txt
> >
> >
> >
> > I would appreciate any pointers in debugging this.
> >
> > Thanks,
> > HB
> >
