That looks really interesting. I think it would especially help when
running many jobs concurrently with fetching. It looks like this was
introduced in Hadoop 0.19, so we should have it in our 1.0 release of Nutch.
Dennis
Otis Gospodnetic wrote:
Slides 17 & 18 give a glimpse into this scheduler,
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/dhruba_apachecon2008.pdf
Oh, and I see the JIRA issue contains a patch for 0.18.1 (applicable to 0.18.2
possibly).
But I'm really curious if others think this would work for and help with Nutch
generate/fetch/parse/etc. operations.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
________________________________
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: Nutch User List <[email protected]>
Sent: Thursday, November 20, 2008 3:51:31 PM
Subject: Hadoop's new fair sharing job scheduler
Hi,
Just noticed Hadoop's new fair sharing job scheduler
(https://issues.apache.org/jira/browse/HADOOP-3746). It seems to be in 0.19,
which I think Nutch is not on yet... but still:
- is this something that would benefit Nutch?
The last time I used Nutch, I remember having to be careful about running
jobs mostly sequentially and paying close attention to the maximum number of
map/reduce tasks, etc., in order to maximize cluster utilization. I wonder if
the above would make that easier, less manual, or more efficient?
Thanks,
Otis