I checked the DB; the URL is already in the DB. The plugin property is configured like this:

<property>
  <name>plugin.folders</name>
  <value>./src/plugin,./plugins</value>
  <description>Directories where nutch plugins are located. Each
  element may be a relative or absolute path. If absolute, it is used
  as is. If relative, it is searched for on the classpath.</description>
</property>
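The resolution rule in that property description can be sketched as a standalone check. This is illustrative code, not Nutch's actual PluginRepository implementation; the class and method names are made up for debugging the "directory not found" warnings below:

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class PluginFolderCheck {
    // Illustrative helper, not part of Nutch: applies the plugin.folders
    // rule from the property description -- absolute entries are used
    // as-is, relative entries are searched for on the classpath.
    static List<String> resolve(String pluginFolders) {
        List<String> found = new ArrayList<>();
        for (String entry : pluginFolders.split(",")) {
            String name = entry.trim();
            File f = new File(name);
            if (f.isAbsolute()) {
                if (f.isDirectory()) {
                    found.add(f.getPath());
                }
            } else {
                // Strip a leading "./" so the classpath lookup sees a plain name.
                String resource = name.startsWith("./") ? name.substring(2) : name;
                URL url = PluginFolderCheck.class.getClassLoader().getResource(resource);
                if (url != null) {
                    found.add(url.getPath());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // An entry that resolves to nothing here corresponds to warnings
        // like "Plugins: directory not found: ./plugins" in the logs.
        System.out.println(resolve("./src/plugin,./plugins"));
    }
}
```

Running this from the Eclipse working directory shows which entries actually resolve; if neither does, the indexing plugins will not load even though InjectorJob succeeds.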
I guess the plugin property is configured properly, because when I change it to another value, it complains that the plugins could not be found.

At 2013-07-21 13:48:33, "Lewis John Mcgibbney" <[email protected]> wrote:
>Yes, the warns which you've shown now are fine; this is the old mapred API.
>It's OK.
>It's now stating that you've got 1 URL injected. Can you check the db?
>Either check contents or dump/read them with the readdb tool?
>Please remember that somewhere in the tutorial you reference, the absolute
>path to the plugins folder needs to be changed. This is your problem here.
>InjectorJob doesn't require plugins to work... however when your indexing
>plugins are called you are in trouble. You need to sort this out.
>
>On Saturday, July 20, 2013, Rui Gao <[email protected]> wrote:
>> I am following this article http://wiki.apache.org/nutch/RunNutchInEclipse.
>> My environment is Windows XP + cygwin + eclipse.
>> I think the top several WARN logs are not the blocker. (The plugin.folders
>> contains an additional folder; after I remove it the job still fails.)
>> We can compare it with the logs from InjectorJob, which runs successfully:
>>
>> 2013-07-21 12:45:01,968 INFO crawl.InjectorJob - InjectorJob: starting at 2013-07-21 12:45:01
>> 2013-07-21 12:45:01,968 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: urls/dev
>> 2013-07-21 12:45:02,921 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
>> 2013-07-21 12:45:02,968 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2013-07-21 12:45:02,984 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 2013-07-21 12:45:03,015 WARN snappy.LoadSnappy - Snappy native library not loaded
>> 2013-07-21 12:45:03,328 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
>> 2013-07-21 12:45:03,437 WARN plugin.PluginRepository - Plugins: directory not found: ./plugins
>> 2013-07-21 12:45:03,484 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
>> 2013-07-21 12:45:03,625 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
>> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
>> 2013-07-21 12:45:04,218 INFO crawl.InjectorJob - Injector: finished at 2013-07-21 12:45:04, elapsed: 00:00:02
>>
>> At 2013-07-21 12:36:29, "Lewis John Mcgibbney" <[email protected]> wrote:
>>> Please read the exception trace. You are running on Hadoop? You need to
>>> ensure that your plugins.directory points to the right path. There is also
>>> a mention of a missing job file. Please ensure that your nutch job file is
>>> on the Hadoop jobtracker classpath.
>>> hth
>>>
>>> On Saturday, July 20, 2013, Rui Gao <[email protected]> wrote:
>>>> Hi Lewis,
>>>>
>>>> I tried to downgrade gora-core to 0.2.1. Then I could run InjectorJob
>>>> with both hsql and mysql, but the Crawler job still fails. Here's the log:
>>>> 2013-07-21 12:23:41,156 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
>>>> 2013-07-21 12:23:41,203 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 2013-07-21 12:23:41,234 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>>> 2013-07-21 12:23:41,265 WARN snappy.LoadSnappy - Snappy native library not loaded
>>>> 2013-07-21 12:23:41,578 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
>>>> 2013-07-21 12:23:41,718 WARN plugin.PluginRepository - Plugins: directory not found: ./plugins
>>>> 2013-07-21 12:23:41,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
>>>> 2013-07-21 12:23:41,937 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>>> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
>>>> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
>>>> 2013-07-21 12:23:42,468 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>>> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
>>>> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
>>>> 2013-07-21 12:23:42,593 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>>> 2013-07-21 12:23:42,796 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
>>>> 2013-07-21 12:23:43,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>>> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
>>>> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
>>>> 2013-07-21 12:23:43,093 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
>>>> 2013-07-21 12:23:43,234 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
>>>> 2013-07-21 12:23:43,250 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>>> 2013-07-21 12:23:43,250 WARN mapred.LocalJobRunner - job_local1378002997_0002
>>>> java.lang.NullPointerException
>>>>     at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>>>>     at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>>
>>>> I don't know if this is the right direction I should continue with. But
>>>> anyway, hopefully my experience could help others.
>>>>
>>>> Regards,
>>>> Rui
>
>--
>*Lewis*
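The trace above shows `new Utf8(null)` failing inside GeneratorReducer.setup(), i.e. a string read during reducer setup (apparently a batch id from the job configuration) was null because nothing had set it. The sketch below is illustrative, not Nutch's actual code, and the key name is hypothetical; it shows the general defensive pattern of failing fast with a clear message instead of an NPE deep inside Avro:

```java
import java.util.Map;

public class BatchIdGuard {
    // Hypothetical key name for illustration; not necessarily what
    // GeneratorReducer actually reads.
    static final String BATCH_ID_KEY = "generate.batch.id";

    static String requireBatchId(Map<String, String> conf) {
        String id = conf.get(BATCH_ID_KEY);
        if (id == null) {
            // Surface the real problem (missing generate-phase state)
            // instead of letting new Utf8(null) throw an opaque NPE.
            throw new IllegalStateException(
                BATCH_ID_KEY + " is not set -- did the generate step "
                + "complete and record its batch id before this job ran?");
        }
        return id;
    }
}
```

If the guard fires, the fix is upstream: the generate step never handed its batch id to the failing job, which matches running the Crawler before the generate state is in place.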

