Re: Hadoop's new fair sharing job scheduler

Dennis Kubes Wed, 26 Nov 2008 04:42:14 -0800

That looks really interesting. I think it would especially help when running many jobs concurrently with fetching. Looks like this is introduced in Hadoop 0.19 so we should have it in our 1.0 release for Nutch.


Dennis


Otis Gospodnetic wrote:

Slides 17 & 18 give a glimpse into this scheduler: http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/dhruba_apachecon2008.pdf



Oh, and I see the JIRA issue contains a patch for 0.18.1 (applicable to 0.18.2 
possibly).

But I'm really curious if others think this would work for and help with Nutch 
generate/fetch/parse/etc. operations.

Otis
--

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: Nutch User List <[email protected]>
Sent: Thursday, November 20, 2008 3:51:31 PM
Subject: Hadoop's new fair sharing job scheduler

Hi,

Just noticed Hadoop's new fair sharing job scheduler 
(https://issues.apache.org/jira/browse/HADOOP-3746).
It seems to be in 0.19, which I think Nutch is not on yet... but still:

- is this something that would benefit Nutch?

The last time I used Nutch I remember having to be careful about mostly 
sequential job runs and having to pay close attention to the maximum number 
of map/reduce tasks, etc. in order to maximize cluster utilization, and I 
wonder if the above would make that easier, less manual, or more efficient?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
