I was getting the following error at the command line...

java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
Exception in thread "main"

I looked in the hadoop.log and found...

java.lang.RuntimeException: Invalid first character:
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.setConf(RegexURLFilterBase.java:144)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:153)
        at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:52)
        at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:56)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:125)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
Caused by: java.io.IOException: Invalid first character:
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRulesFile(RegexURLFilterBase.java:186)
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.setConf(RegexURLFilterBase.java:140)
        ... 8 more

I checked my crawl-urlfilter.txt file and found that it contained lines
consisting only of spaces.

Once I removed those extraneous spaces, the error no longer occurred.
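
In case it helps anyone else, a check like this can be scripted. The following is only a rough sketch and is not part of Nutch: the function name find_whitespace_only_lines is made up for illustration, and it reports just the one condition that bit me here (lines that are non-empty but consist entirely of spaces or tabs). It does not validate the rules themselves.

```python
def find_whitespace_only_lines(text):
    """Return 1-based numbers of lines that are non-empty but all whitespace.

    Hypothetical helper for illustration; it is not a Nutch API.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Truly empty lines are skipped; only lines made up of
        # spaces/tabs are reported.
        if line and not line.strip():
            bad.append(number)
    return bad
```

To use it, read the contents of crawl-urlfilter.txt into a string, pass that string to the function, and then clear out the lines whose numbers it returns.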

Sincerely,
Fred

><><><><><><><><><><><><><><><><><><
   Fred Tyre
   Information Services
   Heartland Communications, Inc.
   515-574-2147
   [EMAIL PROTECTED]
><><><><><><><><><><><><><><><><><><



