Nutch updatedb Crash

2009-08-16 Thread MoD
Hi, during the CrawlDb map-reduce job, the reduce workers fail one by one with: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.concurrent.ConcurrentHashMap$HashEntry.newArray(ConcurrentHashMap.java:205) at ...

Re: Nutch updatedb Crash

2009-08-16 Thread Julien Nioche
Hi, the reducing step of updatedb does indeed require quite a lot of memory. See https://issues.apache.org/jira/browse/NUTCH-702 for a discussion on this subject. BTW, you'll have to specify the parameter mapred.child.java.opts in your conf/hadoop-site.xml so that the value is sent to the Hadoop ...
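For reference, a minimal sketch of what that property can look like in conf/hadoop-site.xml (the -Xmx value below is illustrative, not taken from the thread):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>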

Re: Nutch updatedb Crash

2009-08-16 Thread MoD
Julien, I did try with 2048M per task child, no luck; I still have two reduces that don't go through. Is it somehow related to the number of reduces? On this cluster I have 4 servers: dual Xeon dual core (8 cores), 8 GB RAM, 4 disks. I did set mapred.reduce.tasks and mapred.map.tasks to 16.
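As a rough memory budget (not spelled out in the thread): with the usual default of two concurrent map slots and two concurrent reduce slots per TaskTracker, four children at -Xmx2048m can claim a node's entire 8 GB, leaving nothing for the OS, DataNode and TaskTracker. A hedged sketch of the relevant conf/hadoop-site.xml entries with illustrative values; the tasktracker slot limit is an assumption, not something mentioned in the thread:

  <property>
    <name>mapred.map.tasks</name>
    <value>16</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>16</value>
  </property>
  <property>
    <!-- illustrative: limits how many reduce children run concurrently on one node -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>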

Re: Nutch updatedb Crash

2009-08-16 Thread Andrzej Bialecki
MoD wrote: Julien, I did try with 2048M per task child, no luck; I still have two reduces that don't go through. Is it somehow related to the number of reduces? On this cluster I have 4 servers: dual Xeon dual core (8 cores), 8 GB RAM, 4 disks. I did set mapred.reduce.tasks and ...

Re: Nutch updatedb Crash

2009-08-16 Thread MoD
Fixed, thanks. On Sun, Aug 16, 2009 at 8:38 PM, Andrzej Bialecki <a...@getopt.org> wrote: MoD wrote: Julien, I did try with 2048M per task child, no luck; I still have two reduces that don't go through. Is it somehow related to the number of reduces? On this cluster I have 4 servers: ...