Yes, the WARN messages you've shown now are fine; they come from the old
mapred API. That's OK.
The log now states that 1 URL was injected. Can you check the db?
Either inspect the contents directly or dump/read them with the readdb tool.
Please remember that somewhere in the tutorial you reference, the absolute
path to the plugins folder needs to be changed. This is your problem here.
InjectorJob doesn't require plugins to work. However, when your indexing
plugins are called you will be in trouble. You need to sort this out.
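
As a rough way to confirm which plugin.folders entries actually resolve on
disk (your log shows "directory not found: ./plugins"), something like the
sketch below might help. It assumes the usual Hadoop-style
<configuration>/<property> layout and a comma-separated plugin.folders value.
The SAMPLE string and the /hypothetical/... path are made up for illustration
and are not taken from your setup; point it at your own config text instead.

```python
import os
import xml.etree.ElementTree as ET


def plugin_folders(conf_xml):
    """Return the directories listed in the plugin.folders property."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.folders":
            value = prop.findtext("value", "")
            # Assumption: entries are comma-separated, whitespace is ignored.
            return [p.strip() for p in value.split(",") if p.strip()]
    return []


def missing_folders(folders, base_dir="."):
    """Return the entries that do not resolve to an existing directory."""
    missing = []
    for folder in folders:
        # Relative entries are resolved against base_dir (the working dir).
        path = folder if os.path.isabs(folder) else os.path.join(base_dir, folder)
        if not os.path.isdir(path):
            missing.append(folder)
    return missing


# Hypothetical config fragment, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>plugin.folders</name>
    <value>./plugins, /hypothetical/path/to/nutch/plugins</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    folders = plugin_folders(SAMPLE)
    print("configured:", folders)
    print("missing:", missing_folders(folders))
```

Any entry reported as missing is a candidate for being replaced with an
absolute path, since a relative entry such as ./plugins depends on the
working directory the job is launched from.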

On Saturday, July 20, 2013, Rui Gao <> wrote:
> I am following this article
My environment is windows XP + cygwin + eclipse.
> I think the first several WARN logs are not the blocker. (The
plugin.folders contained an additional folder; after I removed it the job still
fails.) We can compare it with the logs from InjectorJob which runs
> 2013-07-21 12:45:01,968 INFO  crawl.InjectorJob - InjectorJob: starting
at 2013-07-21 12:45:01
> 2013-07-21 12:45:01,968 INFO  crawl.InjectorJob - InjectorJob: Injecting
urlDir: urls/dev
> 2013-07-21 12:45:02,921 INFO  crawl.InjectorJob - InjectorJob: Using
class as the Gora storage class.
> 2013-07-21 12:45:02,968 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
> 2013-07-21 12:45:02,984 WARN  mapred.JobClient - No job jar file set.
 User classes may not be found. See JobConf(Class) or
> 2013-07-21 12:45:03,015 WARN  snappy.LoadSnappy - Snappy native library
not loaded
> 2013-07-21 12:45:03,328 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
> 2013-07-21 12:45:03,437 WARN  plugin.PluginRepository - Plugins:
directory not found: ./plugins
> 2013-07-21 12:45:03,484 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'inject', using default
> 2013-07-21 12:45:03,625 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
> 2013-07-21 12:45:04,218 INFO  crawl.InjectorJob - InjectorJob: total
number of urls rejected by filters: 0
> 2013-07-21 12:45:04,218 INFO  crawl.InjectorJob - InjectorJob: total
number of urls injected after normalization and filtering: 1
> 2013-07-21 12:45:04,218 INFO  crawl.InjectorJob - Injector: finished at
2013-07-21 12:45:04, elapsed: 00:00:02
> At 2013-07-21 12:36:29,"Lewis John Mcgibbney" <>
>>Please read the exception trace. You are running on Hadoop? You need to
>>ensure that your points to the right path. There is also
>>a mention of a missing job file. Please ensure that your nutch job file is
>>on the Hadoop jobtracker classpath.
>>On Saturday, July 20, 2013, Rui Gao <> wrote:
>>> Hi Lewis,
>>> I tried downgrading gora-core to 0.2.1. Then I could run InjectorJob
>>with both hsql and mysql. But the Crawler job still fails. Here's the log:
>>> 2013-07-21 12:23:41,156 INFO  crawl.InjectorJob - InjectorJob: Using
>>class as the Gora storage class.
>>> 2013-07-21 12:23:41,203 WARN  util.NativeCodeLoader - Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>> 2013-07-21 12:23:41,234 WARN  mapred.JobClient - No job jar file set.
>> User classes may not be found. See JobConf(Class) or
>>> 2013-07-21 12:23:41,265 WARN  snappy.LoadSnappy - Snappy native library
>>not loaded
>>> 2013-07-21 12:23:41,578 INFO  mapreduce.GoraRecordWriter -
>>gora.buffer.write.limit = 10000
>>> 2013-07-21 12:23:41,718 WARN  plugin.PluginRepository - Plugins:
>>directory not found: ./plugins
>>> 2013-07-21 12:23:41,765 INFO  regex.RegexURLNormalizer - can't find
>>for scope 'inject', using default
>>> 2013-07-21 12:23:41,937 WARN  mapred.FileOutputCommitter - Output path
>>null in cleanup
>>> 2013-07-21 12:23:42,468 INFO  crawl.InjectorJob - InjectorJob: total
>>number of urls rejected by filters: 0
>>> 2013-07-21 12:23:42,468 INFO  crawl.InjectorJob - InjectorJob: total
>>number of urls injected after normalization and filtering: 1
>>> 2013-07-21 12:23:42,468 INFO  crawl.FetchScheduleFactory - Using
>>FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2013-07-21 12:23:42,468 INFO  crawl.AbstractFetchSchedule -
>>> 2013-07-21 12:23:42,468 INFO  crawl.AbstractFetchSchedule -
>>> 2013-07-21 12:23:42,593 WARN  mapred.JobClient - No job jar file set.
>> User classes may not be found. See JobConf(Class) or
>>> 2013-07-21 12:23:42,796 INFO  mapreduce.GoraRecordReader -
>> = 10000
>>> 2013-07-21 12:23:43,062 INFO  crawl.FetchScheduleFactory - Using
>>FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
>>> 2013-07-21 12:23:43,062 INFO  crawl.AbstractFetchSchedule -
>>> 2013-07-21 12:23:43,062 INFO  crawl.AbstractFetchSchedule -
>>> 2013-07-21 12:23:43,093 INFO  regex.RegexURLNormalizer - can't find
>>for scope 'generate_host_count', using default
>>> 2013-07-21 12:23:43,234 INFO  mapreduce.GoraRecordWriter -
>>gora.buffer.write.limit = 10000
>>> 2013-07-21 12:23:43,250 WARN  mapred.FileOutputCommitter - Output path
>>null in cleanup
>>> 2013-07-21 12:23:43,250 WARN  mapred.LocalJobRunner -
>>> java.lang.NullPointerException
>>>     at org.apache.avro.util.Utf8.<init>(
>>>     at
>>>     at
>>>     at
>>>     at
>>>     at
>>> I don't know if this is the right direction to continue in. But
>>anyway, hopefully my experience can help others.
>>> Regards,
>>> Rui
>>> At 2013-07-20 23:07:41,"Rui Gao" <>*Lewis*

