[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464330#comment-15464330 ] Brahma Reddy Battula commented on YARN-666:

Sorry for coming in late. I feel it would be good to document this, as has been done for HDFS?

> [Umbrella] Support rolling upgrades in YARN
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: graceful, rolling upgrade
> Affects Versions: 2.0.4-alpha
> Reporter: Siddharth Seth
> Fix For: 2.6.0
>
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
> Jira to track changes required in YARN to allow rolling upgrades, including
> documentation and possible upgrade routes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993506#comment-13993506 ] Junping Du commented on YARN-666:

Linking two related JIRAs: work-preserving RM restart and work-preserving NM restart.
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686102#comment-13686102 ] Siddharth Seth commented on YARN-666:

TBD: handling of enum fields such as AMCommand and NodeAction. This may be possible by forcing defaults when a new value needs to be added; alternatively, define a new enum that is used by newer clients.
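One way to read the "forcing defaults" option: an older reader that receives an enum value it does not recognize falls back to a designated default instead of failing. The Python sketch below assumes integer wire values; `NodeActionV1`, its members and numbers, and `DEFAULT_ACTION` are illustrative stand-ins, not YARN's actual protocol definitions.

```python
from enum import Enum


class NodeActionV1(Enum):
    """Enum as known to an older (v1) reader; members and values are illustrative."""
    NORMAL = 0
    RESYNC = 1
    SHUTDOWN = 2


# Hypothetical default that an old reader falls back to for unknown wire values.
DEFAULT_ACTION = NodeActionV1.NORMAL


def decode_action(wire_value: int) -> NodeActionV1:
    """Decode an integer wire value, mapping unknown values to DEFAULT_ACTION.

    A newer sender may emit a value added after v1 (say, 3). The v1 reader
    cannot represent it, so it degrades to the default instead of raising.
    """
    try:
        return NodeActionV1(wire_value)
    except ValueError:
        return DEFAULT_ACTION


if __name__ == "__main__":
    print(decode_action(2))  # a value v1 knows about
    print(decode_action(3))  # a value added later falls back to the default
```

The other option mentioned, a separate enum read only by newer clients, avoids this lossy fallback at the cost of keeping two fields in sync.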
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659686#comment-13659686 ] Lohit Vijayarenu commented on YARN-666:

This looks good. A few minor points/JIRAs covering metrics, reporting, and UI page updates when different versions of YARN daemons are running should also be included. As Karthik already mentioned, it would be very useful if this followed HDFS-2983; that will help people who manage clusters and perform rolling upgrades on them.

Another question regarding draining of NodeManagers: do we have a concept of blacklisting NodeManagers today? The reason I ask is that if we know we can afford to kill running apps on a NodeManager but do not want new jobs to be scheduled on it, one could potentially use blacklisting.
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659709#comment-13659709 ] Vinod Kumar Vavilapalli commented on YARN-666:

[~curino], thanks for the update, interesting stuff. I think we should pursue this route and do some experiments. It is much, much easier to do these experiments in 2.x given the YARN and MR separation. Maybe there's already a ticket for this. Would it be possible to put up your changes, however 'hacky' they might be?

[~lohit], we have per-node health-check monitoring, which blocks bad nodes. There isn't any other concept of blacklisting NMs today; that is the reason for the proposal to add a decommission.
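To make the difference between blocking a node and draining it concrete: in both cases the node stops receiving new containers, but a drain also lets running containers finish before the node is taken out for upgrade. The sketch below is a hypothetical model, not YARN's scheduler code; `NodeState`, `Node`, and `schedulable_nodes` are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class NodeState(Enum):
    RUNNING = auto()
    UNHEALTHY = auto()        # e.g. flagged by the per-node health check
    DECOMMISSIONING = auto()  # draining: no new containers, running ones finish
    DECOMMISSIONED = auto()   # empty; safe to stop and upgrade


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.RUNNING
    running_containers: set = field(default_factory=set)

    def start_drain(self) -> None:
        """Stop offering the node new work; existing containers keep running."""
        if self.state is NodeState.RUNNING:
            self.state = NodeState.DECOMMISSIONING
            self._finish_drain_if_idle()

    def container_finished(self, container_id: str) -> None:
        self.running_containers.discard(container_id)
        self._finish_drain_if_idle()

    def _finish_drain_if_idle(self) -> None:
        # A draining node with nothing left running can be taken down.
        if self.state is NodeState.DECOMMISSIONING and not self.running_containers:
            self.state = NodeState.DECOMMISSIONED


def schedulable_nodes(nodes):
    """Only RUNNING nodes are offered new containers; all other states are skipped."""
    return [n for n in nodes if n.state is NodeState.RUNNING]
```

The blacklisting variant Lohit describes (kill running apps, accept nothing new) would amount to skipping the wait and killing everything in `running_containers` right away.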
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660040#comment-13660040 ] Carlo Curino commented on YARN-666:

Vinod, I completely agree that the YARN/MR separation makes hacking around this much simpler. As soon as we are done polishing/publishing the rest of checkpointing/preemption, we will rebase this code and post what we have. We are also happy to socialize this, both development and experiments. For us this was a step towards cheaper checkpointing (an HDFS-based shuffle is almost stateless for checkpoint purposes), but the performance wins are clearly interesting, and there are quite a few variations you can think of (e.g., a hybrid strategy using both streaming and localized data; fun stuff). By the way, some of the refactorings we propose in MAPREDUCE-5192 and MAPREDUCE-5194 are (aside from their use in checkpointing) useful towards this.
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659024#comment-13659024 ] Carlo Curino commented on YARN-666:

Hi Vinod, I will give you some numbers, but bear in mind that these results are very preliminary, based only on a handful of runs on a 9- or 10-machine cluster, and without serious tuning of terasort.

The idea of the solution is for maps to write their output directly into HDFS (e.g., with replication turned down to 1). Reducers are started only when the maps complete, and they stream-merge straight out of HDFS (bypassing much of the partial-merging logic). Key limitations of what we have for now: 1) if a map output is lost, all reducers have to wait for it to be re-run; 2) we have lots of DFSClients open, which might become a problem for HDFS if you have too many maps per node.

We initially tried this as a way to make checkpointing cheaper (no need to save any state other than the last-processed key), and we were just hoping for it not to be much worse than the regular shuffle. The surprise I mentioned earlier was that we actually observe a substantial speedup on a simple sort job (on 9 nodes): 25% at 64GB scale and 31% at 1TB scale. This seems to indicate that the penalty of reading through HDFS is outweighed by the benefits of doing a stream-merge (where data never touches disk on the reduce side, other than for the reducer output). Probably this reduces seeks and uses the drives we read from and write to more efficiently. You could imagine getting similar benefits by adding restartability to the HTTP client (plus buffering like that done by the HDFS client, which was likely beneficial in our test). More sophisticated versions could also dynamically decide whether to stream-merge from a certain map or to copy its data (for example, if it is small enough to fit in memory).
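The reduce-side stream-merge described here is essentially a k-way merge over map outputs that are each already sorted by key: records are pulled lazily from every input and emitted in global key order, without first materializing partial merges on local disk. A minimal sketch using Python's `heapq.merge`, assuming in-memory lists stand in for the per-map HDFS files (the prototype's own code was not posted in this thread):

```python
import heapq
from itertools import groupby
from operator import itemgetter


def stream_merge(map_outputs):
    """Lazily merge per-map outputs, each a sorted iterable of (key, value).

    heapq.merge keeps one pending record per input, so memory is bounded
    by the number of map outputs rather than by their total size.
    """
    return heapq.merge(*map_outputs, key=itemgetter(0))


def reduce_grouped(map_outputs, reducer):
    """Group the merged stream by key and apply reducer(key, values)."""
    merged = stream_merge(map_outputs)
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, reducer(key, (value for _, value in group))


if __name__ == "__main__":
    # Stand-ins for three map outputs read from HDFS, each sorted by key.
    m1 = [("a", 1), ("c", 1)]
    m2 = [("a", 2), ("b", 5)]
    m3 = [("c", 4)]
    for key, total in reduce_grouped([m1, m2, m3], lambda k, vs: sum(vs)):
        print(key, total)
```

Because the merge needs a record from every input before it can emit the smallest key, one missing map output stalls the whole merge, which is consistent with limitation 1) in the comment.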
Bottom line, I don't think we should read too much into these results (again, very preliminary), other than that using HDFS as the intermediate data layer is not completely infeasible.
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657337#comment-13657337 ] Carlo Curino commented on YARN-666:

This seems a very important problem (and a very hard one too). Just to toss one more idea around: I think that an HDFS-based shuffle (we are playing around with it, and performance is much better than expected) could simplify some of the problems, as we could piggyback on the DataNode decommissioning mechanics to migrate intermediate data off a node being decommissioned. And (a bit obvious) preemption could be a good tool to make the draining fast without wasting work (the administrative scenarios we mentioned during the conversation in YARN-45).
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657864#comment-13657864 ] Vinod Kumar Vavilapalli commented on YARN-666:

bq. Just to toss one more idea around: I think that an HDFS-based shuffle (we are playing around with it and performance are much better than expected)

Carlo, it would be great if you could share some numbers :)
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656331#comment-13656331 ] Siddharth Seth commented on YARN-666:

bq. Steps to upgrade a YARN cluster: do you think it would make sense to upgrade the NMs first before upgrading the RM. If something goes wrong (hopefully not), users can fall-back to the older version.

This really depends. There are situations that involve only an NM bug fix; in such cases, the RM doesn't even need to be upgraded or restarted. It also depends on whether new APIs are being added to the RM that upgraded NMs may use.

bq. Considerations (Upgrading the MR runtime): Until YARN/MR go into separate projects and release cycles, upgrading YARN alone (say 2.1.0 to 2.1.2) shouldn't affect the clients (MR) - no?

This depends on individual deployments. Sites may choose to deploy YARN/MR in a way that lets them be upgraded independently. The same example: MR 2.1.2, which contains AM/MR runtime fixes, running against YARN 2.1.0. That's one of the main goals of MR being user-land code. Until work-preserving restart is implemented, there should be a way to upgrade MR without affecting the cluster.

bq. I am assuming the version check will be similar to the one in HDFS-2983.

We can definitely learn from that, if we want to support more specific versions than just the ones on individual protocols. I don't think YARN has any version checks at the moment, other than the ones performed on API versions by the RPC layer.
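On the version-check question: the HDFS-2983 approach, roughly, is to relax an exact build-version match into "accept the peer if its version is at least a configured minimum." A sketch of the same idea for NM registration, assuming dotted numeric versions; `check_nm_registration` and its minimum-version parameter are hypothetical, not existing YARN APIs or configuration keys.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.0.4-alpha' into (2, 0, 4); anything after the first '-' is ignored."""
    release = version.split("-", 1)[0]
    return tuple(int(piece) for piece in release.split("."))


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing components numerically (so 2.1.10 > 2.1.2)."""
    va, vb = parse_version(a), parse_version(b)
    width = max(len(va), len(vb))
    # Pad with zeros so that "2.1" and "2.1.0" compare as equal.
    va += (0,) * (width - len(va))
    vb += (0,) * (width - len(vb))
    return (va > vb) - (va < vb)


def check_nm_registration(nm_version: str, min_supported_nm_version: str) -> bool:
    """Hypothetical RM-side check: accept NMs at or above a minimum version."""
    return compare_versions(nm_version, min_supported_nm_version) >= 0
```

Comparing components numerically matters here: as plain strings, "2.1.10" sorts before "2.1.2".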
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655036#comment-13655036 ] Hitesh Shah commented on YARN-666:

+1 to getting this built out. As they say, the devil is in the details.
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655062#comment-13655062 ] Karthik Kambatla commented on YARN-666:

Sid, thanks for creating this. Excited. I just went over the design doc (which, BTW, is well articulated) and have the following comments:
# Steps to upgrade a YARN cluster: do you think it would make sense to upgrade the NMs before upgrading the RM? If something goes wrong (hopefully not), users can fall back to the older version.
# Considerations (Upgrading the MR runtime): until YARN/MR go into separate projects and release cycles, upgrading YARN alone (say, 2.1.0 to 2.1.2) shouldn't affect the clients (MR), no?
# Looks like we need to come up with an appropriate policy for YARN data formats in HADOOP-9517.
# I am assuming the version check will be similar to the one in HDFS-2983.
# Big +1 to drain decommission.
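The ordering question in the first point (NMs before the RM for easy fallback, versus Sid's reply that an NM-only fix may not need an RM restart and that new RM APIs used by upgraded NMs favor the RM going first) can be captured as a small planning function. This is a hypothetical sketch: `plan_rolling_upgrade`, its parameters, and the step strings are illustrative and do not correspond to any YARN tool.

```python
def plan_rolling_upgrade(nodemanagers, batch_size=2, upgrade_rm=True, rm_first=False):
    """Return an ordered list of human-readable upgrade steps.

    NMs are upgraded in small batches (drain, stop, upgrade, restart) so most
    of the cluster keeps serving work. The RM step is optional because an
    NM-only fix may not touch the RM; rm_first covers the case where upgraded
    NMs depend on new RM APIs.
    """
    nm_steps = []
    for start in range(0, len(nodemanagers), batch_size):
        batch = nodemanagers[start:start + batch_size]
        for action in ("drain", "stop", "upgrade", "restart"):
            nm_steps.append(f"{action} {','.join(batch)}")
    rm_steps = ["upgrade rm"] if upgrade_rm else []
    return rm_steps + nm_steps if rm_first else nm_steps + rm_steps


if __name__ == "__main__":
    for step in plan_rolling_upgrade(["nm1", "nm2", "nm3"]):
        print(step)
```

Keeping the batch small trades total upgrade time for the fraction of cluster capacity that is offline at any moment.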