Not sure if I understood correctly.
I printed the job counters with
Counters c = currentJob.getCounters();
System.out.println(c.toString());
With MySQL, the output is:
DbUpdaterJob: starting
Counters: 20
DbUpdaterJob: starting
counter name=Counters: 20
FileSystemCounters
FILE_BYTES_READ=878298
FILE_BYTES_WRITTEN=992362
Map-Reduce Framework
Combine input records=0
Combine output records=0
Total committed heap usage (bytes)=260177920
CPU time spent (ms)=0
Map input records=1
Map output bytes=193
Map output materialized bytes=202
Map output records=1
Physical memory (bytes) snapshot=0
Reduce input groups=1
Reduce input records=1
Reduce output records=1
Reduce shuffle bytes=0
Spilled Records=2
SPLIT_RAW_BYTES=962
Virtual memory (bytes) snapshot=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
DbUpdaterJob: done
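In case it helps to compare this against the HBase run, here is a minimal sketch that parses a counter dump like the one above into a map, so the two runs can be compared by counter name. The class CounterDump and its parse method are my own hypothetical helper, not part of Nutch or Hadoop, and it assumes the dump has one "name=value" pair per line as printed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop.
public class CounterDump {

    // Parses "name=value" lines from a counter dump.
    // Lines without '=' (group headers, log lines) and lines whose
    // value is not a plain integer are skipped.
    public static Map<String, Long> parse(String dump) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // Value is not numeric (e.g. "Counters: 20"); ignore the line.
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // A few lines copied from the dump above.
        String dump = String.join("\n",
                "Map-Reduce Framework",
                "Map input records=1",
                "Map output records=1",
                "Reduce input records=1",
                "Reduce output records=1");
        Map<String, Long> c = parse(dump);
        System.out.println("map in=" + c.get("Map input records")
                + ", reduce out=" + c.get("Reduce output records"));
    }
}
```

Running parse on the dumps from both storage backends and comparing the maps should show which counters differ between the two runs.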
Thanks.
Alex.
-----Original Message-----
From: Ferdy Galema <[email protected]>
To: user <[email protected]>
Sent: Wed, Jul 25, 2012 12:13 am
Subject: Re: updatedb in nutch-2.0 with mysql
Could you post the job counters?
On Tue, Jul 24, 2012 at 8:14 PM, <[email protected]> wrote:
> Hello,
>
>
>
> I am testing nutch-2.0 with MySQL storage, using a single URL. The updatedb
> command does not appear to do anything: it does not add outlinks to the table
> as new URLs, and I do not see any error messages in hadoop.log. Here are the
> log entries, without the plugin load info:
>
> INFO crawl.DbUpdaterJob - DbUpdaterJob: starting
> 2012-07-24 10:53:46,142 WARN util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-07-24 10:53:46,979 INFO mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 10000
> 2012-07-24 10:53:49,801 INFO mapreduce.GoraRecordWriter -
> gora.buffer.write.limit = 10000
> 2012-07-24 10:53:49,806 INFO crawl.FetchScheduleFactory - Using
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> defaultInterval=25920000
> 2012-07-24 10:53:49,807 INFO crawl.AbstractFetchSchedule -
> maxInterval=25920000
> 2012-07-24 10:53:52,741 WARN mapred.FileOutputCommitter - Output path is
> null in cleanup
> 2012-07-24 10:53:53,584 INFO crawl.DbUpdaterJob - DbUpdaterJob: done
>
> Also, I noticed that updatedb has a crawlId option. Where does its value come
> from?
>
> By the way, updatedb with no arguments works fine if HBase is chosen for storage.
>
> Thanks.
> Alex.