[
https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659024#comment-13659024
]
Carlo Curino commented on YARN-666:
-----------------------------------
Hi Vinod, I will give you some numbers but bare in mind that these results are
very initial, based only on a handful of runs on a 9 or 10 machine cluster, and
without serious tuning of terasort.
The idea of the solution is for maps to write their output directly into HDFS
(e.g., with replication turned down to 1). Reducers will be started only when
maps complete and stream-merge straight out of HDFS (bypassing much of the
partial merging logic).
Key limitations of what we have for now:
1) if a map output is lost, all reducers will have to wait for it to be re-run
2) we have lots of dfsclients open, this might become a problem for HDFS if you
have too many maps per node.
We initially tried this as a way to make checkpointing cheaper (no need to save
any state other than last-processed key), and we were just hoping for it not
too be too much worse than regular shuffle. The surprise I mentioned above was
that we actually observe a surprisingly substantial speed up on a simple sort
job (on 9 nodes): 25% at 64GB scale and 31% at 1TB scale.
This seems to indicate that the penalty of reading through HDFS is actually
trumped by the benefits of doing a stream-merge (where data never touch disk on
the reduce side, other than for reducer output). Probably this is reducing
seeks, and using the drives from which we read and we write more efficiently.
You can imagine to get similar benefits by adding restartability to the http
client (and the buffering done by HDFS client, which was likely to be
beneficial in our test). More sophisticated versions of these could also
dynamically decide whether to stream merge from a certain map or whether to
copy the data (if for example they are small to fit in memory).
Bottomline, I don't think we should read to much out these results (again very
initial), other than using HDFS for intermediate data layer is not completely
infeasible.
> [Umbrella] Support rolling upgrades in YARN
> -------------------------------------------
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Siddharth Seth
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
>
> Jira to track changes required in YARN to allow rolling upgrades, including
> documentation and possible upgrade routes.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira