That looks really interesting. I think it would help especially when running many jobs concurrently with fetching. It looks like this was introduced in Hadoop 0.19, so we should have it in our 1.0 release of Nutch.

Dennis

Otis Gospodnetic wrote:
Slides 17 & 18 give a glimpse into this scheduler: http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/dhruba_apachecon2008.pdf


Oh, and I see the JIRA issue contains a patch for 0.18.1 (possibly applicable
to 0.18.2 as well).

But I'm really curious whether others think this would work for, and help
with, Nutch generate/fetch/parse/etc. operations.

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: Nutch User List <[email protected]>
Sent: Thursday, November 20, 2008 3:51:31 PM
Subject: Hadoop's new fair sharing job scheduler

Hi,

Just noticed Hadoop's new fair sharing job scheduler
(https://issues.apache.org/jira/browse/HADOOP-3746). It seems to be in 0.19,
which I think Nutch is not on yet... but still:

- is this something that would benefit Nutch?

The last time I used Nutch, I remember having to be careful about mostly
sequential job runs and having to pay close attention to the maximum number
of map/reduce tasks, etc. in order to make full use of the cluster. I wonder
if the above would make that easier, less manual, or more efficient?


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
