Yes, the warnings you've shown are fine; that is the old mapred API. It's OK. The log is now stating that you've got 1 URL injected. Can you check the db? Either check the contents directly or dump/read them with the readdb tool. Please remember that somewhere in the tutorial you reference, it says the absolute path to the plugins folder needs to be changed. This is your problem here. InjectorJob doesn't require plugins to work; however, when your indexing plugins are called you are in trouble. You need to sort this out.
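To inspect the db, readdb in 2.x is backed by WebTableReader; if I remember the options correctly, bin/nutch readdb -stats (or -dump <out_dir>) will show the injected record. As for plugin.folders, here is a minimal sketch of one way to pin it when launching the jobs from Eclipse. plugin.folders and NutchConfiguration are real; the class name and the path are placeholders for your own setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.util.NutchConfiguration;

    public class EclipseRunner {
      public static void main(String[] args) throws Exception {
        // Inside Eclipse the working directory is the project root, so
        // the default relative "./plugins" is never found (see the
        // PluginRepository WARN in the logs below). Point Nutch at an
        // absolute path instead; this is equivalent to setting
        // plugin.folders in conf/nutch-site.xml.
        Configuration conf = NutchConfiguration.create();
        conf.set("plugin.folders", "/absolute/path/to/nutch/build/plugins");
        // ... then hand conf to whatever job you launch from Eclipse.
      }
    }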
On Saturday, July 20, 2013, Rui Gao <gaorui...@163.com> wrote:
> I am following this article: http://wiki.apache.org/nutch/RunNutchInEclipse. My environment is Windows XP + Cygwin + Eclipse.
> I think the top several WARN logs are not the blocker. (The plugin.folders setting contains an additional folder; after I remove it, the job still fails.) We can compare it with the logs from InjectorJob, which runs successfully:
>
> 2013-07-21 12:45:01,968 INFO crawl.InjectorJob - InjectorJob: starting at 2013-07-21 12:45:01
> 2013-07-21 12:45:01,968 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: urls/dev
> 2013-07-21 12:45:02,921 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
> 2013-07-21 12:45:02,968 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2013-07-21 12:45:02,984 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2013-07-21 12:45:03,015 WARN snappy.LoadSnappy - Snappy native library not loaded
> 2013-07-21 12:45:03,328 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
> 2013-07-21 12:45:03,437 WARN plugin.PluginRepository - Plugins: directory not found: ./plugins
> 2013-07-21 12:45:03,484 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> 2013-07-21 12:45:03,625 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - Injector: finished at 2013-07-21 12:45:04, elapsed: 00:00:02
>
> At 2013-07-21 12:36:29, "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com> wrote:
>> Please read the exception trace. You are running on Hadoop? You need to ensure that your plugins.directory points to the right path. There is also a mention of a missing job file. Please ensure that your nutch job file is on the Hadoop jobtracker classpath.
>> hth
>>
>> On Saturday, July 20, 2013, Rui Gao <gaorui...@163.com> wrote:
>>> Hi Lewis,
>>>
>>> I tried downgrading gora-core to 0.2.1. Then I could run InjectorJob with both hsql and mysql, but the Crawler job still fails. Here's the log:
>>>
>>> 2013-07-21 12:23:41,156 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
>>> 2013-07-21 12:23:41,203 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2013-07-21 12:23:41,234 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>> 2013-07-21 12:23:41,265 WARN snappy.LoadSnappy - Snappy native library not loaded
>>> 2013-07-21 12:23:41,578 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
>>> 2013-07-21 12:23:41,718 WARN plugin.PluginRepository - Plugins: directory not found: ./plugins
>>> 2013-07-21 12:23:41,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
>>> 2013-07-21 12:23:41,937 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
>>> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
>>> 2013-07-21 12:23:42,468 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2013-07-21 12:23:42,593 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>> 2013-07-21 12:23:42,796 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
>>> 2013-07-21 12:23:43,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
>>> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
>>> 2013-07-21 12:23:43,093 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
>>> 2013-07-21 12:23:43,234 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
>>> 2013-07-21 12:23:43,250 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>> 2013-07-21 12:23:43,250 WARN mapred.LocalJobRunner - job_local1378002997_0002
>>> java.lang.NullPointerException
>>>     at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>>>     at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>
>>> I don't know if this is the right direction to continue in, but anyway, hopefully my experience can help others.
>>>
>>> Regards,
>>> Rui
>>>
>>> At 2013-07-20 23:07:41, "Rui Gao" <>
>>
>> --
>> *Lewis*

--
*Lewis*
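For anyone landing on this thread with the same NullPointerException: GeneratorReducer.setup() builds an Avro Utf8 from a value read out of the job configuration, and Utf8's String constructor dereferences its argument, so a null value dies exactly at Utf8.<init>(Utf8.java:37) as in the trace above. My reading is that the generate job's batch id never made it into the configuration the reducer runs with, which is a typical symptom of launching the jobs from Eclipse with a hand-assembled conf. A minimal sketch of the failing pattern; the property name generate.batch.id is my assumption about what setup() reads, not verified against the Nutch 2.x source:

    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.conf.Configuration;

    public class BatchIdNpe {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed property name; the point is only that the lookup
        // returns null because the generate job never set it.
        String batchId = conf.get("generate.batch.id");
        // Utf8(String) dereferences its argument, so a null batch id
        // throws NPE here, matching Utf8.<init>(Utf8.java:37).
        Utf8 id = new Utf8(batchId);
      }
    }

The "No job jar file set" warning in the logs is a separate matter: it is harmless in a local Eclipse run, but on a real cluster the job jar has to be set (e.g. with Job#setJarByClass) or the nutch job artifact put on the jobtracker classpath, as Lewis says above.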