I forgot to add one detail: the segment I'm trying to run updatedb on has 1.3 million URLs fetched
and 1.08 million URLs parsed.
Any help with this would be appreciated.
On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran wrote:
> Hi everyone,
>
> I'm using Nutch 1.0. The fetch completed successfully and I'm now on the
> updatedb step, which is taking a very long time. I don't know why it's
> taking this long. I have a new machine with a quad-core processor and
> 8 GB of RAM.
>
> I believe this system has plenty of processing power, so I don't think
> CPU is the problem here. I noticed that the updatedb process is using up
> almost all of the RAM (close to 7.7 GB), and the computer has become
> really slow.
>
> The updatedb process has been running continuously for the last 19 days,
> showing the message "Merging segment data into db." Does anyone know why
> it's taking so long? Is there any configuration setting I can change to
> speed up the updatedb process?
>
> Thanks in advance for any help...
> Mathan
>
> r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb
> Using configuration below
> /opt//jdk1.6.0_16
> Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...)
> [-force] [-normalize] [-filter] [-noAdditions]
> crawldb CrawlDb to update
> -dir segments parent directory containing all segments to update from
> seg1 seg2 ... list of segment names to update from
> -force force update even if CrawlDb appears to be locked
> (CAUTION advised)
> -normalize use URLNormalizer on urls in CrawlDb and
> segment (usually not needed)
> -filter use URLFilters on urls in CrawlDb and segment
> -noAdditions only update already existing URLs, don't add
> any newly discovered URLs
> r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb Crawl/db/
> Crawl/segments/200909
> 20090906232208/ 20090909074026/ 20090909101115/ 20090909124554/
> 20090914115913/ 20090915141615/
> r...@trweb10:/opt/tsweb/nutch-1.0# bin/nutch updatedb Crawl/db/
> Crawl/segments/20090915141615/ -force
> Using configuration below
> conf_tamilsweb
> /opt/tsweb/jdk1.6.0_16
> CrawlDb update: starting
> CrawlDb update: db: Crawl/db
> CrawlDb update: segments: [Crawl/segments/20090915141615]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: false
> CrawlDb update: URL filtering: false
> CrawlDb update: Merging segment data into db.
>
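The usage text printed in the session offers two ways to name the segments for updatedb: `-dir` with a parent directory, or an explicit list of segment paths. As a minimal sketch, the hypothetical shell helper below (`build_updatedb_cmd` is an illustrative name, not part of Nutch) only prints the command for each form rather than running it. The `Crawl/db` and `Crawl/segments` paths are taken from the session and are assumptions about your layout.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of Nutch): prints the updatedb
# command line instead of executing it, so the two forms can be compared.
# Usage: build_updatedb_cmd <crawldb> <segments_dir> [segment_name ...]
build_updatedb_cmd() {
  crawldb=$1       # CrawlDb to update, e.g. Crawl/db
  segments_dir=$2  # parent directory of all segments, e.g. Crawl/segments
  shift 2
  if [ "$#" -eq 0 ]; then
    # No segment names given: use -dir to update from every segment
    # under the parent directory.
    echo "bin/nutch updatedb $crawldb -dir $segments_dir"
  else
    # Explicit segment names: pass each one as <segments_dir>/<name>.
    cmd="bin/nutch updatedb $crawldb"
    for seg in "$@"; do
      cmd="$cmd $segments_dir/$seg"
    done
    echo "$cmd"
  fi
}
```

For example, `build_updatedb_cmd Crawl/db Crawl/segments` prints the `-dir` form, which updates from every segment under `Crawl/segments`, while `build_updatedb_cmd Crawl/db Crawl/segments 20090915141615` prints the single-segment form used in the session.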