Re: updatedb is taking a long long time

2009-11-01 Thread Kalaimathan Mahenthiran
I forgot to add the detail...

The segment I'm trying to run updatedb on has 1.3 million URLs fetched
and 1.08 million URLs parsed.

Any help related to this would be appreciated...


On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran
 wrote:
> hi everyone
>
> I'm using Nutch 1.0. I have fetched successfully and am currently on
> the updatedb step. It's taking very long, and I don't know why. I have
> a new machine with a quad-core processor and 8 GB of RAM.
>
> I believe this system is more than adequate in terms of processing
> power, so I don't think that is the problem. I noticed that almost all
> of the RAM (close to 7.7 GB) is being used by the updatedb process,
> and the machine has become very slow.
>
> The updatedb process has been running continually for the last 19 days
> with the message "Merging segment data into db." Does anyone know why
> it is taking so long? Is there any configuration setting I can change
> to speed up the updatedb process?
>
> Thanks in advance for any help...
> Mathan
>
> r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb
> Using configuration below
> /opt//jdk1.6.0_16
> Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...)
> [-force] [-normalize] [-filter] [-noAdditions]
>        crawldb         CrawlDb to update
>        -dir segments   parent directory containing all segments to update from
>        seg1 seg2 ...   list of segment names to update from
>        -force          force update even if CrawlDb appears to be
> locked (CAUTION advised)
>        -normalize      use URLNormalizer on urls in CrawlDb and
> segment (usually not needed)
>        -filter         use URLFilters on urls in CrawlDb and segment
>        -noAdditions    only update already existing URLs, don't add
> any newly discovered URLs
> r...@trweb10:/opt/nutch-1.0# bin/nutch updatedb Crawl/db/ 
> Crawl/segments/200909
> 20090906232208/ 20090909074026/ 20090909101115/ 20090909124554/
> 20090914115913/ 20090915141615/
> r...@trweb10:/opt/tsweb/nutch-1.0# bin/nutch updatedb Crawl/db/
> Crawl/segments/20090915141615/ -force
> Using configuration below
> conf_tamilsweb
> /opt/tsweb/jdk1.6.0_16
> CrawlDb update: starting
> CrawlDb update: db: Crawl/db
> CrawlDb update: segments: [Crawl/segments/20090915141615]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: false
> CrawlDb update: URL filtering: false
> CrawlDb update: Merging segment data into db.
>
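[Editor's note: for readers hitting the same symptom, Nutch 1.0 runs updatedb as a local Hadoop job inside a single JVM, and that JVM's heap is sized by the bin/nutch launcher script. A minimal configuration sketch, assuming a default local (non-distributed) setup; the heap value below is illustrative and was not suggested in the original thread:]

```shell
# bin/nutch honors the NUTCH_HEAPSIZE environment variable (in MB) when
# building the -Xmx flag; the default heap is modest, so a merge over
# ~1.3M URLs can spend most of its time in GC and appear to hang.
export NUTCH_HEAPSIZE=4000   # illustrative value; tune to available RAM
bin/nutch updatedb Crawl/db/ Crawl/segments/20090915141615/
```

If the process still shows "Merging segment data into db." for days, checking logs/hadoop.log for progress counters can help distinguish a slow-but-advancing job from a thrashing one.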

