Hello,

I am testing Nutch 2.0 with MySQL storage and a single seed URL. The updatedb
command appears to do nothing: it does not add the outlinks to the table as new
URLs, and I do not see any error messages in hadoop.log. Here are the log
entries, with the plugin load info omitted:

 INFO  crawl.DbUpdaterJob - DbUpdaterJob: starting
2012-07-24 10:53:46,142 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-07-24 10:53:46,979 INFO  mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2012-07-24 10:53:49,801 INFO  mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2012-07-24 10:53:49,806 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2012-07-24 10:53:49,807 INFO  crawl.AbstractFetchSchedule - defaultInterval=25920000
2012-07-24 10:53:49,807 INFO  crawl.AbstractFetchSchedule - maxInterval=25920000
2012-07-24 10:53:52,741 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-07-24 10:53:53,584 INFO  crawl.DbUpdaterJob - DbUpdaterJob: done

Also, I noticed that updatedb has a crawlId option. Where does its value come from?

Btw, updatedb with no arguments works fine when HBase is chosen for storage.

Thanks.
Alex.