Re: Specific fetch list based on url status or score

2009-08-16 Thread MilleBii
Just back from a good holiday. I will check, but I did see options that trigger this functionality. 2009/8/3 Otis Gospodnetic ogjunk-nu...@yahoo.com Hi, see this: http://markmail.org/message/znbu5khl7qbkvhkm (I didn't double-check CHANGES.txt to see if this made it into 1.0.) Otis --

Nutch updatedb Crash

2009-08-16 Thread MoD
Hi, during the CrawlDb MapReduce job, the reduce workers fail one by one with: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.concurrent.ConcurrentHashMap$HashEntry.newArray(ConcurrentHashMap.java:205) at

Re: Nutch updatedb Crash

2009-08-16 Thread Julien Nioche
Hi, the reduce step of updatedb does indeed require quite a lot of memory. See https://issues.apache.org/jira/browse/NUTCH-702 for a discussion on this subject. BTW, you'll have to specify the parameter mapred.child.java.opts in your conf/hadoop-site.xml so that the value is sent to the hadoop
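For reference, a minimal conf/hadoop-site.xml fragment setting the property Julien mentions might look like the sketch below. The -Xmx2048m value is only an illustration (it matches the 2048M per task child that MoD reports trying later in the thread); choose a heap size that actually fits the RAM on each node, given how many tasks it runs concurrently.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- JVM options passed to each map/reduce child task. -->
  <!-- Illustrative value only: a 2048 MB maximum heap per task JVM. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>
```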

Re: Nutch updatedb Crash

2009-08-16 Thread MoD
Julien, I tried 2048M per task child, with no luck: I still have two reduce tasks that don't go through. Could it be related to the number of reduces? On this cluster I have 4 servers: - dual Xeon dual core (8 cores) - 8 GB RAM - 4 disks. I set mapred.reduce.tasks and mapred.map.tasks to 16.
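Expressed as configuration, the task counts MoD describes would look roughly like this, assuming they are set in the same conf/hadoop-site.xml as mapred.child.java.opts. The fragment only restates the values from the message and is not a tuning recommendation. Note that mapred.map.tasks is just a hint to Hadoop; the actual number of map tasks is driven by the input splits.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Number of reduce tasks per job, as described in the message above. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>16</value>
  </property>
  <!-- Suggested number of map tasks; Hadoop treats this only as a hint. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>16</value>
  </property>
</configuration>
```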

Re: Nutch updatedb Crash

2009-08-16 Thread Andrzej Bialecki
MoD wrote: Julien, I tried 2048M per task child, with no luck: I still have two reduce tasks that don't go through. Could it be related to the number of reduces? On this cluster I have 4 servers: - dual Xeon dual core (8 cores) - 8 GB RAM - 4 disks. I set mapred.reduce.tasks and

Re: Nutch updatedb Crash

2009-08-16 Thread MoD
Fixed, thanks. On Sun, Aug 16, 2009 at 8:38 PM, Andrzej Bialecki a...@getopt.org wrote:

Which versions?

2009-08-16 Thread Paul Tomblin
Which versions of Lucene, Nutch and Solr work together? I've discovered that the Nutch trunk and the Solr trunk use wildly different versions of the Lucene jars, and it's causing me problems. -- http://www.linkedin.com/in/paultomblin