Could you post the job counters?

On Tue, Jul 24, 2012 at 8:14 PM, <[email protected]> wrote:
> Hello,
>
> I am testing nutch-2.0 with mysql storage with 1 url. I see that the updatedb
> command does not do anything. It does not add outlinks to the table as new
> urls, and I do not see any error messages in hadoop.log. Here are the log
> entries, without the plugin load info:
>
> INFO crawl.DbUpdaterJob - DbUpdaterJob: starting
> 2012-07-24 10:53:46,142 WARN util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-07-24 10:53:46,979 INFO mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 10000
> 2012-07-24 10:53:49,801 INFO mapreduce.GoraRecordWriter -
> gora.buffer.write.limit = 10000
> 2012-07-24 10:53:49,806 INFO crawl.FetchScheduleFactory - Using
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> defaultInterval=25920000
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> maxInterval=25920000
> 2012-07-24 10:53:52,741 WARN mapred.FileOutputCommitter - Output path is
> null in cleanup
> 2012-07-24 10:53:53,584 INFO crawl.DbUpdaterJob - DbUpdaterJob: done
>
> Also, I noticed that there is a crawlId option to it. Where does its value
> come from?
>
> Btw, updatedb with no arguments works fine if HBase is chosen for storage.
>
> Thanks.
> Alex.

