Could you dump a stack trace of the process? That would give an idea of where it is stuck. How large is your crawlDB?
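For example, something along these lines should work (assuming a Sun/Oracle JDK is on the box; the PID below is just a placeholder):

  jps -l                              # find the PID of the updatedb process
  jstack <pid> > updatedb-stack.txt   # dump its thread stacks to a file

(or kill -QUIT <pid>, which prints the thread dump to the JVM's stdout). For the crawlDB size, something like bin/nutch readdb Crawl/db/ -stats or a plain du -sh Crawl/db/ would give an idea.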
Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com

2009/11/2 Kalaimathan Mahenthiran <matha...@gmail.com>

> I forgot to add this detail...
>
> The segment I'm trying to run updatedb on has 1.3 million URLs fetched
> and 1.08 million URLs parsed.
>
> Any help related to this would be appreciated...
>
>
> On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran
> <matha...@gmail.com> wrote:
> > hi everyone
> >
> > I'm using Nutch 1.0. I have fetched successfully and am currently on the
> > updatedb step. The updatedb is taking very long and I don't know why.
> > I have a new machine with a quad-core processor and 8 GB of RAM.
> >
> > I believe this system is really good in terms of processing power, so I
> > don't think processing power is the problem here. I noticed that almost
> > all of the RAM is getting used up, close to 7.7 GB, by the updatedb
> > process, and the computer is becoming really slow.
> >
> > The updatedb process has been running for the last 19 days continuously
> > with the message "Merging segment data into db". Does anyone know why
> > it's taking so long? Is there any configuration setting I can change to
> > increase the speed of the updatedb process?
> >
> > Thanks in advance for any help...
> > Mathan
> >
> > r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb
> > Using configuration below
> > /opt/****/jdk1.6.0_16
> > Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...)
> >        [-force] [-normalize] [-filter] [-noAdditions]
> >   crawldb          CrawlDb to update
> >   -dir segments    parent directory containing all segments to update from
> >   seg1 seg2 ...    list of segment names to update from
> >   -force           force update even if CrawlDb appears to be locked
> >                    (CAUTION advised)
> >   -normalize       use URLNormalizer on urls in CrawlDb and segment
> >                    (usually not needed)
> >   -filter          use URLFilters on urls in CrawlDb and segment
> >   -noAdditions     only update already existing URLs, don't add
> >                    any newly discovered URLs
> > r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb Crawl/db/ Crawl/segments/200909
> > 20090906232208/  20090909074026/  20090909101115/  20090909124554/
> > 20090914115913/  20090915141615/
> > r...@trweb10:/opt/tsweb/nutch-1.0# bin/nutch updatedb Crawl/db/
> > Crawl/segments/20090915141615/ -force
> > Using configuration below
> > conf_tamilsweb
> > /opt/tsweb/jdk1.6.0_16
> > CrawlDb update: starting
> > CrawlDb update: db: Crawl/db
> > CrawlDb update: segments: [Crawl/segments/20090915141615]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: false
> > CrawlDb update: URL filtering: false
> > CrawlDb update: Merging segment data into db.
> >
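On the configuration question: nothing in this thread confirms the cause, but one knob that is sometimes relevant when updatedb runs as a single local JVM is the NUTCH_HEAPSIZE environment variable that the Nutch 1.0 bin/nutch script reads (maximum heap, in MB); on a Hadoop cluster the per-task heap would instead come from mapred.child.java.opts. The value below is only illustrative:

  # local (single-JVM) runs: raise the heap available to the updatedb job, in MB
  export NUTCH_HEAPSIZE=4000
  bin/nutch updatedb Crawl/db/ Crawl/segments/20090915141615/ -force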