[jira] [Created] (MAPREDUCE-5387) Implement Signal.TERM on Windows
Ivan Mitic created MAPREDUCE-5387:
-------------------------------------

Summary: Implement Signal.TERM on Windows
Key: MAPREDUCE-5387
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0, 1-win, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic

Signal.TERM is currently not supported by Hadoop on the Windows platform. Tracking Jira for the problem.

A couple of things to keep in mind:
- Support for process groups (JobObjects on Windows)
- Solution should work for both Java and other streaming Hadoop apps

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5387) Implement Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708079#comment-13708079 ]

Ivan Mitic commented on MAPREDUCE-5387:
---------------------------------------

Copy-pasting [~cnauroth]'s comment from MAPREDUCE-5330:

{quote}
I came across similar issues while working on the YARN nodemanager changes for Windows. Bikas, I agree that this logic doesn't exactly match the meaning of SIGTERM. To match SIGTERM, we really need a way for one process to signal another process with some graceful shutdown message, and a way for the other process to trigger custom code when it receives that message. Unfortunately, I'm not aware of anything in the Windows API that provides an exact match. Therefore, the logic in this patch seems to be the closest approximation that's feasible right now.

To elaborate on this, TerminateProcess immediately kills the target process, and there is no way for that process to trap the call and run custom clean-up code.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714(v=vs.85).aspx

This is very different from Unix signals, which allow the target process to install signal handlers to respond gracefully to things like SIGTERM.

There also seems to be some support for programmatically sending Ctrl-C to a process and installing a custom handler to respond to it. This would be SetConsoleCtrlHandler and GenerateConsoleCtrlEvent. I've heard anecdotally that this can be used to create a rough approximation of Unix signals, but I haven't tried it myself.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx

Aside from that, the only other option seems to be for Windows applications to roll their own custom IPC protocol (i.e. one process sends another a custom graceful shutdown message over a named pipe).

It might be worth pursuing one of these solutions in the long term for absolute correctness, but these approaches will require a lot more coding and testing.
{quote}
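The last option in the quoted comment, a custom graceful-shutdown message sent over an IPC channel, could look roughly like the Java sketch below. It is an illustration only, not code from any patch on this issue. It uses a loopback TCP socket as a portable stand-in for a Windows named pipe, and the class name GracefulShutdownListener, the SHUTDOWN/OK tokens, and the single-connection protocol are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a worker that runs clean-up code when asked to shut down. */
final class GracefulShutdownListener implements AutoCloseable {
    static final String SHUTDOWN = "SHUTDOWN"; // hypothetical protocol token
    static final String ACK = "OK";            // hypothetical acknowledgement

    private final ServerSocket server;

    GracefulShutdownListener(Runnable cleanup) throws IOException {
        // Assumption: loopback TCP stands in for a named pipe. Port 0 = any free port.
        server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> serve(cleanup), "shutdown-listener");
        t.setDaemon(true);
        t.start();
    }

    int port() {
        return server.getLocalPort();
    }

    /** Accepts a single control connection and handles one message on it. */
    private void serve(Runnable cleanup) {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            if (SHUTDOWN.equals(in.readLine())) {
                cleanup.run();    // plays the role of a SIGTERM handler
                out.println(ACK); // tell the controller that clean-up finished
            }
        } catch (IOException e) {
            // Listener closed or peer went away; nothing further to do in this sketch.
        }
    }

    /** Controller side: request shutdown and wait for the acknowledgement. */
    static boolean requestShutdown(int port) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(SHUTDOWN);
            return ACK.equals(in.readLine());
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}
```

A controller that gets no acknowledgement within some timeout would presumably still fall back to a forceful kill. The sketch leaves that out, along with process groups and non-Java streaming tasks, both of which the issue lists as requirements.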
[jira] [Commented] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708081#comment-13708081 ]

Ivan Mitic commented on MAPREDUCE-5330:
---------------------------------------

Chris, Bikas, Xi, I filed a new Jira, MAPREDUCE-5387, to investigate possible ways to implement Signal.TERM on Windows. I spent some time investigating this a while ago and will try to come up with a proposal in the near term. Chris' summary above gives a good overview of some possible options (I copied it into the new Jira).

JVM manager should not forcefully kill the process on Signal.TERM on Windows
-----------------------------------------------------------------------------

Key: MAPREDUCE-5330
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1-win
Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Fix For: 1-win
Attachments: MAPREDUCE-5330.patch

In MapReduce, we sometimes kill a task's JVM before it naturally shuts down if we want to launch other tasks (look in JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map task process is in the middle of doing some cleanup/finalization after the task is done, it might be interrupted/killed without being given a chance to finish.

In Microsoft's Hadoop Service, after a Map/Reduce task is done and while file systems are being closed in a special shutdown hook, we typically upload storage (ASV in our context) usage metrics to Microsoft Azure Tables. If the kill happens at that point, these metrics are lost. The impact is that for many MR jobs we don't see accurate metrics reported most of the time.
[jira] [Updated] (MAPREDUCE-5387) Implement Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5387:
----------------------------------

Issue Type: Improvement (was: Bug)
[jira] [Commented] (MAPREDUCE-5384) Races in DelegationTokenRenewal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708159#comment-13708159 ]

Matt Foley commented on MAPREDUCE-5384:
---------------------------------------

We're in the final stage of producing 1.2.1-rc. Moving this new issue to targetVersion 1.3.0.

Races in DelegationTokenRenewal
-------------------------------

Key: MAPREDUCE-5384
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.2.0, 1.1.2, 1.2.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: mr-5384-0.patch, mr-5384-1.patch

There are a couple of races in DelegationTokenRenewal. One of them was addressed by MAPREDUCE-4860, which introduced a deadlock while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, since MAPREDUCE-4860 is already shipped in a release.

Races to fix:
# TimerTask#cancel() disallows future invocations of run(), but doesn't abort an already scheduled/started run().
# In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only cancels that TimerTask instance. However, it has no effect on any other TimerTasks created for that token.
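The first race listed above can be reproduced with plain java.util.Timer, independent of Hadoop. The following is a minimal sketch that only demonstrates the behaviour and is not taken from the attached patches. The class name CancelRaceDemo and its method are hypothetical, and the latches exist only to force the interleaving in which cancel() is called while run() is already executing.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo: cancel() does not abort a run() that has already started. */
final class CancelRaceDemo {

    /** Returns true if run() completed its work although cancel() was called mid-run. */
    static boolean runCompletesDespiteCancel() throws InterruptedException {
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch cancelCalled = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(1);
        final AtomicBoolean workDone = new AtomicBoolean(false);

        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                started.countDown();
                try {
                    // Hold run() open until the caller has invoked cancel().
                    cancelCalled.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Stand-in for the real side effect (e.g. renewing a token).
                workDone.set(true);
                finished.countDown();
            }
        };

        Timer timer = new Timer("cancel-race-demo", true);
        try {
            timer.schedule(task, 0);
            if (!started.await(5, TimeUnit.SECONDS)) {
                return false; // the task never started; nothing was demonstrated
            }
            task.cancel();            // prevents future runs only
            cancelCalled.countDown(); // let the in-flight run() continue
            return finished.await(5, TimeUnit.SECONDS) && workDone.get();
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("run() completed after cancel(): " + runCompletesDespiteCancel());
    }
}
```

Any fix therefore has to make the work inside run() notice the cancellation itself. How to do that without reintroducing the deadlock attributed to MAPREDUCE-4860 is what this issue tracks.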
[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-5384:
---------------------------------

Target Version/s: 1.3.0 (was: 1.2.1)
[jira] [Updated] (MAPREDUCE-4567) Fix failing TestJobKillAndFail in branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-4567:
---------------------------------

Target Version/s: 1.3.0 (was: 1.2.1)

Fix failing TestJobKillAndFail in branch-1
------------------------------------------

Key: MAPREDUCE-4567
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4567
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.2.0
Reporter: Tom White
Assignee: Tom White
Attachments: MAPREDUCE-4567.patch

The failure was introduced in MAPREDUCE-4488.
[jira] [Commented] (MAPREDUCE-4567) Fix failing TestJobKillAndFail in branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708160#comment-13708160 ]

Matt Foley commented on MAPREDUCE-4567:
---------------------------------------

Targeting 1.3.0, like MAPREDUCE-4488.
[jira] [Commented] (MAPREDUCE-4838) Add extra info to JH files
[ https://issues.apache.org/jira/browse/MAPREDUCE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708171#comment-13708171 ]

Matt Foley commented on MAPREDUCE-4838:
---------------------------------------

Confirmed committed to both branch-1.2 and branch-1. Updated CHANGES.txt to be consistent.

Add extra info to JH files
--------------------------

Key: MAPREDUCE-4838
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4838
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Zhijie Shen
Fix For: 2.0.3-alpha, 1.2.1
Attachments: MAPREDUCE-4838_1.patch, MAPREDUCE-4838_2.patch, MAPREDUCE-4838_3.patch, MAPREDUCE-4838_4.patch, MAPREDUCE-4838_5.patch, MAPREDUCE-4838-branch-1_1.patch, MAPREDUCE-4838.patch, TestRumenJobTraces.patch

It will be useful to add more task-info to JH for analytics.