I was able to fix the problem from my last email and run through the
whole-web crawl commands from the nutch-0.8.x tutorial.

But the resulting crawl folder is still very small, and my last search, for
"apache", returned the "hit 0" message.  Something is still wrong.

Please give me some feedback.

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com
-----Original Message-----
From: Tsengtan A Shuy [mailto:[EMAIL PROTECTED] 
Sent: Saturday, July 14, 2007 12:11 PM
To: [EMAIL PROTECTED]
Subject: inject command fail on whole-web run

I am running the "bin/nutch inject crawl/crawldb dmoz" command on Ubuntu,
following the nutch-0.8.x tutorial, but I get the following error
message:

2007-07-14 11:38:35,238 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_ij0atx
java.lang.NoClassDefFoundError: dk/brics/automaton/RunAutomaton
        at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter$Rule.<init>(AutomatonURLFilter.java:89)
        at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter.createRule(AutomatonURLFilter.java:70)
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRulesFile(RegexURLFilterBase.java:191)
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.setConf(RegexURLFilterBase.java:140)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:153)
        at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:53)
        at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:56)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:125)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Injector.main(Injector.java:164)
[EMAIL PROTECTED]:~/nutch-0.8.1$ 
What is wrong with my Ubuntu environment?
Please help!
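
A NoClassDefFoundError like the one in the trace above means the JVM could
not load dk/brics/automaton/RunAutomaton at run time, so one thing worth
checking is whether any jar under the Nutch directory actually contains that
class. What follows is only a minimal sketch of such a check, assuming JDK 8
or later. The class name JarClassFinder and the method findJarsContaining
are hypothetical helpers written for illustration and are not part of Nutch
or Hadoop. The directory to scan is an assumption as well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch): lists the jars under a directory
// that contain a given class.
public class JarClassFinder {

    /** Returns every .jar file under root that has an entry for className. */
    static List<Path> findJarsContaining(Path root, String className) throws IOException {
        // dk.brics.automaton.RunAutomaton -> dk/brics/automaton/RunAutomaton.class
        String entryName = className.replace('.', '/') + ".class";

        // Collect all regular files ending in .jar below root.
        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .collect(Collectors.toList());
        }

        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                if (jarFile.getEntry(entryName) != null) {
                    hits.add(jar);
                }
            } catch (IOException e) {
                // Unreadable or corrupt jar: skip it and keep scanning.
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions; pass the Nutch directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String className = args.length > 1 ? args[1] : "dk.brics.automaton.RunAutomaton";

        List<Path> hits = findJarsContaining(root, className);
        if (hits.isEmpty()) {
            System.out.println("No jar under " + root + " contains " + className);
        } else {
            for (Path hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

After compiling with javac, it could be run as "java JarClassFinder
~/nutch-0.8.1". If no jar is listed, the class is missing from the
installation. If a jar is listed, the class exists on disk and the cause is
more likely in how that jar is picked up at run time.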



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
