Could you post the job counters?

On Tue, Jul 24, 2012 at 8:14 PM, <[email protected]> wrote:
> Hello,
>
> I am testing nutch-2.0 with mysql storage with 1 url. I see that the updatedb
> command does not do anything. It does not add outlinks to the table as new
> urls, and I do not see any error messages in hadoop.log. Here are the log
> entries, without the plugin load info:
>
> INFO crawl.DbUpdaterJob - DbUpdaterJob: starting
> 2012-07-24 10:53:46,142 WARN util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-07-24 10:53:46,979 INFO mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 10000
> 2012-07-24 10:53:49,801 INFO mapreduce.GoraRecordWriter -
> gora.buffer.write.limit = 10000
> 2012-07-24 10:53:49,806 INFO crawl.FetchScheduleFactory - Using
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> defaultInterval=25920000
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> maxInterval=25920000
> 2012-07-24 10:53:52,741 WARN mapred.FileOutputCommitter - Output path is
> null in cleanup
> 2012-07-24 10:53:53,584 INFO crawl.DbUpdaterJob - DbUpdaterJob: done
>
> Also, I noticed that there is a crawlId option to it. Where does its value
> come from?
>
> Btw, updatedb with no arguments works fine if HBase is chosen for storage.
>
> Thanks.
> Alex.

