Could you dump a stack trace of the process? That would give an idea of
where it is stuck. How large is your crawlDB?
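
For example, something along these lines should do it (assuming a Sun JDK 6
and a local, non-distributed run; the paths are just the ones from your
output below, and the PID is whatever jps reports for the updatedb job):

  jps -l                              # find the PID of the CrawlDb update job
  jstack <pid> > updatedb-stack.txt   # dump the thread stacks of that JVM
  du -sh Crawl/db                     # on-disk size of the crawlDB
  bin/nutch readdb Crawl/db -stats    # URL counts per status in the crawlDB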

Julien
-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2009/11/2 Kalaimathan Mahenthiran <matha...@gmail.com>

> I forgot to add this detail...
>
> The segment I'm trying to run updatedb on has 1.3 million URLs fetched
> and 1.08 million URLs parsed.
>
> Any help related to this would be appreciated...
>
>
> On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran
> <matha...@gmail.com> wrote:
> > hi everyone
> >
> > I'm using Nutch 1.0. I fetched successfully and am currently on the
> > updatedb step. I'm running updatedb and it's taking very long, and I
> > don't know why. I have a new machine with a quad-core processor and
> > 8 GB of RAM.
> >
> > I believe this system is really good in terms of processing power, so I
> > don't think that is the problem here. I noticed that nearly all the RAM
> > (close to 7.7 GB) is being used up by the updatedb process, and the
> > computer is becoming really slow.
> >
> > The updatedb process has been running continually for the last 19 days
> > with the message "Merging segment data into db." Does anyone know why
> > it's taking so long? Is there any configuration setting I can change to
> > increase the speed of the updatedb process?
> >
> > Thanks in advance for any help...
> > Mathan
> >
> > r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb
> > Using configuration below
> > /opt/****/jdk1.6.0_16
> > Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...)
> > [-force] [-normalize] [-filter] [-noAdditions]
> >        crawldb CrawlDb to update
> >        -dir segments   parent directory containing all segments to update
> from
> >        seg1 seg2 ...   list of segment names to update from
> >        -force  force update even if CrawlDb appears to be locked
> > (CAUTION advised)
> >        -normalize      use URLNormalizer on urls in CrawlDb and
> > segment (usually not needed)
> >        -filter use URLFilters on urls in CrawlDb and segment
> >        -noAdditions    only update already existing URLs, don't add
> > any newly discovered URLs
> > r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb Crawl/db/
> Crawl/segments/200909
> > 20090906232208/ 20090909074026/ 20090909101115/ 20090909124554/
> > 20090914115913/ 20090915141615/
> > r...@trweb10:/opt/tsweb/nutch-1.0# bin/nutch updatedb Crawl/db/
> > Crawl/segments/20090915141615/ -force
> > Using configuration below
> > conf_tamilsweb
> > /opt/tsweb/jdk1.6.0_16
> > CrawlDb update: starting
> > CrawlDb update: db: Crawl/db
> > CrawlDb update: segments: [Crawl/segments/20090915141615]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: false
> > CrawlDb update: URL filtering: false
> > CrawlDb update: Merging segment data into db.
> >
>

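On the configuration question: the knob that usually matters for a local
Nutch 1.0 run is the heap of the JVM that bin/nutch starts, set via the
NUTCH_HEAPSIZE environment variable (in MB). A minimal sketch, with the
value picked purely as an example for an 8 GB machine:

  export NUTCH_HEAPSIZE=4000   # heap (in MB) for the JVM launched by bin/nutch
  bin/nutch updatedb Crawl/db Crawl/segments/20090915141615

If you are running on a real Hadoop cluster rather than locally, it is the
child task heap (mapred.child.java.opts in conf/hadoop-site.xml) that you
would raise instead. If the job still swaps, the crawlDB itself may simply
be very large, which is why its size (readdb -stats above) is worth checking.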