Please read the exception trace. Are you running on Hadoop? You need to make sure that the plugin.folders property points to the right path; your log shows "Plugins: directory not found: ./plugins". There is also a warning about a missing job jar ("No job jar file set"). Please make sure that your Nutch job file is on the Hadoop JobTracker classpath. hth
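For what it's worth, a minimal sketch of the override, assuming you set it in conf/nutch-site.xml inside the existing <configuration> element. plugin.folders is the property Nutch reads to locate its plugins; the path in the value is only a placeholder I made up, so substitute the absolute location of the plugins directory in your own checkout or build.

```xml
<!-- conf/nutch-site.xml (fragment): tell Nutch where its plugins live.
     The value below is a hypothetical placeholder, not a real path. -->
<property>
  <name>plugin.folders</name>
  <value>/path/to/your/nutch/plugins</value>
</property>
```

An absolute path avoids the "./plugins" lookup that fails in your log; whether that is the only thing wrong with the Generator step is a separate question.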
On Saturday, July 20, 2013, Rui Gao <[email protected]> wrote:
> Hi Lewis,
>
> I tried to downgrade gora-core to 0.2.1. then, I could run InjectorJob with both hsql and mysql. But the Crawler job still fail. here's the log:
> 2013-07-21 12:23:41,156 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
> 2013-07-21 12:23:41,203 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2013-07-21 12:23:41,234 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2013-07-21 12:23:41,265 WARN snappy.LoadSnappy - Snappy native library not loaded
> 2013-07-21 12:23:41,578 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
> 2013-07-21 12:23:41,718 WARN plugin.PluginRepository - Plugins: directory not found: ./plugins
> 2013-07-21 12:23:41,765 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
> 2013-07-21 12:23:41,937 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
> 2013-07-21 12:23:42,468 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
> 2013-07-21 12:23:42,468 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
> 2013-07-21 12:23:42,468 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
> 2013-07-21 12:23:42,593 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2013-07-21 12:23:42,796 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
> 2013-07-21 12:23:43,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
> 2013-07-21 12:23:43,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
> 2013-07-21 12:23:43,093 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
> 2013-07-21 12:23:43,234 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
> 2013-07-21 12:23:43,250 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-07-21 12:23:43,250 WARN mapred.LocalJobRunner - job_local1378002997_0002
> java.lang.NullPointerException
>     at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>     at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>
> I don't know if this is the right direction I should continue with. But any way, hopefully my experience could help others.
>
> Regards,
> Rui
>
> At 2013-07-20 23:07:41,"Rui Gao" <[email protected]> wrote:
>>Hi Lewis,
>>
>>Thanks for your answer.
>>So, what direction will Nutch go? Will it co-operate with relationship database or will it only work on non-relationship database like hbase?
>>I remember when 2.2.1 has been released, I checked the release note, it says some bugs related with mysql has been fixed. That's why I try to integrate it with mysql or hsql. And also, in the wiki, there's a link talking about how to integrate nutch with mysql: http://nlp.solutions.asia/?p=362
>>
>>Do you have any suggestion?
>>
>>Thanks.
>>
>>Best Regards,
>>Rui
>>
>>At 2013-07-11 03:53:12,"Lewis John Mcgibbney" <[email protected]> wrote:
>>>Hi Rui,
>>>This should not work.
>>>The SqlStore module and support for it is now deprecated within Apache Gora.
>>>If you would like to downgrade to use Nutch 2.1, then you can use older
>>>Gora artifacts but this is not recommended.
>>>Thanks
>>>Lewis
>>>
>>>On Sun, Jul 7, 2013 at 12:36 AM, Rui Gao <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have set up eclipse environment according to the WIKI. Here's some
>>>> something I did before I run the inject job:
>>>> 1. I use SqlStore as storage class
>>>> 2. I started HSql database which contains the table 'webpage'.
>>>> 3. I added 1 URL in seed.txt.
>>>> Then I run the inject job. It seems the job is finished successfully. But
>>>> I there's no change be made to my HSql database. Any thought about this?
>>>> Here's the log:
>>>> InjectorJob: starting at 2013-07-07 15:28:42
>>>> InjectorJob: Injecting urlDir: urls/dev
>>>> InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora
>>>> storage class.
>>>> InjectorJob: total number of urls rejected by filters: 0
>>>> InjectorJob: total number of urls injected after normalization and
>>>> filtering: 1
>>>> Injector: finished at 2013-07-07 15:28:44, elapsed: 00:00:02
>>>>
>>>> Best Regards,
>>>> Rui
>>>>
>>>
>>>--
>>>*Lewis*
>

--
*Lewis*

