Re: updatedb in nutch-2.0 with mysql

alxsss Wed, 25 Jul 2012 13:03:07 -0700

I am not sure I understood correctly.
I printed the counters with:
Counters c = currentJob.getCounters();
System.out.println(c.toString());


With MySQL:
 
DbUpdaterJob: starting
Counters: 20
DbUpdaterJob: starting
counter name=Counters: 20
        FileSystemCounters
           FILE_BYTES_READ=878298
           FILE_BYTES_WRITTEN=992362
        Map-Reduce Framework
           Combine input records=0
           Combine output records=0
           Total committed heap usage (bytes)=260177920
           CPU time spent (ms)=0
           Map input records=1
           Map output bytes=193
           Map output materialized bytes=202
           Map output records=1
           Physical memory (bytes) snapshot=0
           Reduce input groups=1
           Reduce input records=1
           Reduce output records=1
           Reduce shuffle bytes=0
           Spilled Records=2
           SPLIT_RAW_BYTES=962
           Virtual memory (bytes) snapshot=0
        File Input Format Counters
           Bytes Read=0
        File Output Format Counters
           Bytes Written=0
DbUpdaterJob: done
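
To make the dump above easier to compare against a run with HBase, here is a minimal sketch that pulls the name=value pairs out of the printed text using plain Java. CounterDump and parse are hypothetical names of my own, not part of Nutch or Hadoop, and the sketch assumes each counter is printed on its own line as name=value, as it is above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
class CounterDump {

    // Parses lines of the form "name=value" from a printed counter dump.
    // Group headings and log lines without a numeric value are skipped.
    static Map<String, Long> parse(String dump) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // group heading or plain log line
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // Not a counter line, e.g. "counter name=Counters: 20".
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // A few lines copied from the dump above.
        String dump = String.join("\n",
            "        Map-Reduce Framework",
            "           Map input records=1",
            "           Map output records=1",
            "           Reduce input records=1",
            "           Reduce output records=1");
        Map<String, Long> c = parse(dump);
        System.out.println("records read by the map phase:      " + c.get("Map input records"));
        System.out.println("records emitted by the map phase:   " + c.get("Map output records"));
        System.out.println("records written by the reduce phase: " + c.get("Reduce output records"));
    }
}
```

Running the same thing over the counters from an HBase run should show at a glance whether the map phase emits more records there than the single one it emits here.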


Thanks.
Alex.



-----Original Message-----
From: Ferdy Galema <[email protected]>
To: user <[email protected]>
Sent: Wed, Jul 25, 2012 12:13 am
Subject: Re: updatedb in nutch-2.0 with mysql


Could you post the job counters?

On Tue, Jul 24, 2012 at 8:14 PM, <[email protected]> wrote:

>
> Hello,
>
> I am testing nutch-2.0 with MySQL storage and a single URL. The updatedb
> command does not appear to do anything: it does not add outlinks to the table
> as new URLs, and I see no error messages in hadoop.log. Here are the log
> entries, without the plugin load info:
>
>  INFO  crawl.DbUpdaterJob - DbUpdaterJob: starting
> 2012-07-24 10:53:46,142 WARN  util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-07-24 10:53:46,979 INFO  mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 10000
> 2012-07-24 10:53:49,801 INFO  mapreduce.GoraRecordWriter -
> gora.buffer.write.limit = 10000
> 2012-07-24 10:53:49,806 INFO  crawl.FetchScheduleFactory - Using
> FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2012-07-24 10:53:49,807 INFO  crawl.AbstractFetchSchedule -
> defaultInterval=25920000
> 2012-07-24 10:53:49,807 INFO  crawl.AbstractFetchSchedule -
> maxInterval=25920000
> 2012-07-24 10:53:52,741 WARN  mapred.FileOutputCommitter - Output path is
> null in cleanup
> 2012-07-24 10:53:53,584 INFO  crawl.DbUpdaterJob - DbUpdaterJob: done
>
> Also, I noticed that it has a crawlId option. Where does its value come
> from?
>
> Btw, updatedb with no arguments works fine if HBase is chosen for storage.
>
> Thanks.
> Alex.
