Hi,
I have tried your suggestion of lowering db.max.outlinks.per.page to a
smaller number. I could not reparse the segment, as the segment was
already parsed... I tried modifying some other variables, such as the
Java heap memory and the mapreduce child opts values... modifying these
values triggered some exceptions...
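For reference, those two knobs are usually adjusted roughly as below; the numbers are purely illustrative and not taken from this thread:

    # bin/nutch reads NUTCH_HEAPSIZE (in MB) and turns it into the JVM -Xmx
    export NUTCH_HEAPSIZE=4000

    # heap for Hadoop map/reduce child tasks, in conf/hadoop-site.xml
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2000m</value>
    </property>

If I remember correctly, in local mode the tasks run inside the JVM started by bin/nutch, so NUTCH_HEAPSIZE is the setting that actually matters there.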
OK. Try reparsing and set a lower value for *db.max.outlinks.per.page*. I am
pretty sure that you are running out of memory because of the inlinks, which
are stored in RAM.
Applying the patch NUTCH-702 would also help.
I have modified the CrawlDBReducer and added another parameter *db.fetch.links.m...
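For anyone following along, lowering that property is a plain override in conf/nutch-site.xml; the value 25 below is only an example, not a recommendation from this thread:

    <!-- cap the number of outlinks kept per page (nutch-default.xml ships 100) -->
    <property>
      <name>db.max.outlinks.per.page</name>
      <value>25</value>
    </property>

The limit is applied while a segment is parsed, which is why re-parsing with the lower value is suggested above.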
I can see that it's running out of RAM because... before starting the
updatedb process I have approximately 7.7 GB free on the system, and once
it has been running for some time the free RAM comes down to ~48
bytes...
It's definitely clogging all the RAM space...
I specified the heap size to be 9 GB... in t...
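As an aside, that kind of memory drop can be watched with ordinary system tools while the job runs, nothing Nutch-specific; for example:

    # overall free memory, sampled every 5 seconds
    watch -n 5 free -m

    # or just the java process running the job (PID from jps)
    top -p <pid>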
OK. What heap size did you specify for this job? Could it be that you are
running out of RAM and GCing a lot? Still, it should not take THAT long.
Can you see some variations in the stack traces, or are they always pointing
at the same things?
The operations on the metadata take an awful lot of time, wh...
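A generic way to check the GC theory, not something prescribed in this thread, is to sample the collector while updatedb runs:

    # find the PID of the running job, then sample heap/GC activity every 5 s
    jps -l
    jstat -gcutil <pid> 5000

    # alternatively, add verbose GC logging to the JVM options for the job,
    # e.g. -verbose:gc -XX:+PrintGCDetails

If the old generation stays near 100% and full collections dominate the jstat output, the job is effectively thrashing in GC rather than making progress.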
I have a lot of space left in /tmp. I don't have a separate partition
for /tmp... I have a folder called /tmp... There is a lot of space
left, close to 1.3 terabytes...
        1.4T   55G  1.3T   5% /
tmpfs   3.8G     0  3.8G   0% /lib/init/rw
varrun  3.8G ...
Hi again
> I know the process is not stuck... and the process is running, because I
> turned on the hadoop logs and I can see logs being written to them...
> I'm not sure how to check whether the task is completely stuck or not...
>
Run jps to identify the process id, then run *jstack id* several times to see...
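Concretely, that amounts to something like the following; the output file names are just examples:

    # find the process id of the running Nutch/Hadoop job
    jps -l

    # dump its stack a few times, some seconds apart, and compare the dumps
    jstack <pid> > stack-1.txt
    jstack <pid> > stack-2.txt

If the dumps keep showing the same frames, that is a good hint about where the time is going.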
Thanks for all the replies...
Okay, I think there seems to be some issue too...
I'm running Nutch out of the box, using the Nutch 1.0 release... I'm
running this in "local" mode...
The number of reduce tasks is the default configured by Nutch...
The db size is approximately 860 MB...
I know the process is not stuck...
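For what it is worth, "local" mode here just means Hadoop's LocalJobRunner, i.e. the default job tracker setting below; if I remember correctly it runs everything in one JVM with a single reduce task, so the reduce-task count has little effect in this setup:

    <!-- conf/hadoop-site.xml: the default, everything runs in one local JVM -->
    <property>
      <name>mapred.job.tracker</name>
      <value>local</value>
    </property>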
Kalaimathan Mahenthiran wrote:
I forgot to add the detail...
The segment I'm trying to do updatedb on has 1.3 million URLs fetched
and 1.08 million URLs parsed...
Any help related to this would be appreciated...
On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran wrote:
hi everyone
I'...
Could you dump a stack trace of the process? That would give an idea of
where it is stuck. How large is your crawlDB?
Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com
2009/11/2 Kalaimathan Mahenthiran
> I forgot to add the detail...
>
> The segment I'm trying to do updatedb on has 1.3...
I forgot to add the detail...
The segment I'm trying to do updatedb on has 1.3 million URLs fetched
and 1.08 million URLs parsed...
Any help related to this would be appreciated...
On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran wrote:
> hi everyone
>
> I'm using nutch 1.0. I have fet...
hi everyone
I'm using Nutch 1.0. I have fetched successfully and am currently on the
updatedb step. I'm running updatedb and it's taking so long. I don't
know why it's taking this long. I have a new machine with a quad-core
processor and 8 GB of RAM.
I believe this system is really good in terms of pro...
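For context, updatedb here is the standard CrawlDb update step, invoked roughly like this; the crawl/ paths and the segment name are only illustrative:

    # merge the fetched and parsed segment back into the crawldb
    bin/nutch updatedb crawl/crawldb crawl/segments/20091101235300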