MoD wrote:
Julien,
I did try with 2048M per task child, but no luck: I still have two reduce tasks that don't go through.
Is it somehow related to the number of reduce tasks?
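A heap of that size is typically set via mapred.child.java.opts in hadoop-site.xml; a minimal sketch, assuming the old-style property names used elsewhere in this thread:

    <!-- hadoop-site.xml: heap size for each map/reduce child JVM -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>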
On this cluster I have 4 servers:
- dual Xeon dual core (8 cores)
- 8 GB RAM
- 4 disks
I set mapred.reduce.tasks and mapred.map.tasks to 16,
because: 4 servers with 4 disks each. (What do you think?)
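For reference, a sketch of those settings as they would appear in hadoop-site.xml, with the values quoted above:

    <property>
      <name>mapred.map.tasks</name>
      <value>16</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>16</value>
    </property>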
Maybe this job is just too big for my cluster; would adding reduce tasks
subdivide the problem into smaller reduces?
Actually I think not, because I guess all input keys for the same domain go to the same reduce task?
So my two last reduce tasks are the biggest domains of my DB?
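From what I understand, Hadoop's default partitioner sends every record with a given key to a single reducer, so adding reducers rebalances keys across tasks but cannot split one oversized key. Paraphrasing org.apache.hadoop.mapred.lib.HashPartitioner:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
      public void configure(JobConf job) {}
      public int getPartition(K2 key, V2 value, int numReduceTasks) {
        // same key => same partition, no matter how many reducers run
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }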
This is likely caused by a large number of inlinks for certain URLs:
the updatedb reduce collects this list in memory, and this sometimes
leads to memory exhaustion. Please try limiting the max. number of
inlinks per URL (see nutch-default.xml for details).
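For example, a lower cap could be set in nutch-site.xml; a sketch, assuming the db.max.inlinks property name (check nutch-default.xml in your version for the exact name and default):

    <!-- assumed property name; see nutch-default.xml for your version -->
    <property>
      <name>db.max.inlinks</name>
      <value>1000</value>
    </property>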
--
Best regards,
Andrzej Bialecki <><
  ___. ___ ___ ___ _ _   __________________________________
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  || |   Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com