[jira] [Created] (MAPREDUCE-5387) Implement Signal.TERM on Windows

2013-07-14 Thread Ivan Mitic (JIRA)
Ivan Mitic created MAPREDUCE-5387:
-

 Summary: Implement Signal.TERM on Windows
 Key: MAPREDUCE-5387
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 1-win, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic


Signal.TERM is currently not supported by Hadoop on the Windows platform. 
This is a tracking Jira for the problem. 

A couple of things to keep in mind:
 - Support for process groups (JobObjects on Windows)
 - The solution should work for both Java and other streaming Hadoop apps

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5387) Implement Signal.TERM on Windows

2013-07-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708079#comment-13708079
 ] 

Ivan Mitic commented on MAPREDUCE-5387:
---

Copy-pasting [~cnauroth]'s comment from MAPREDUCE-5330:

{quote}
I came across similar issues while working on the YARN nodemanager changes for 
Windows. Bikas, I agree that this logic doesn't exactly match the meaning of 
SIGTERM. To match SIGTERM, we really need a way for one process to signal 
another process with some graceful shutdown message, and a way for the other 
process to trigger custom code when it receives that message. Unfortunately, 
I'm not aware of anything in the Windows API that provides an exact match. 
Therefore, the logic in this patch seems to be the closest approximation that's 
feasible right now.

To elaborate on this, TerminateProcess immediately kills the target process, 
and there is no way for that process to trap the call and run custom clean-up 
code.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714(v=vs.85).aspx

This is much different from Unix signals, which allow the target process to 
install signal handlers to respond gracefully to things like SIGTERM.

There also seems to be some support for programmatically sending Ctrl-C to a 
process and installing a custom handler to respond to it. The relevant calls 
are SetConsoleCtrlHandler and GenerateConsoleCtrlEvent. I've heard anecdotally 
that this can be used to create a rough approximation of Unix signals, but I 
haven't tried it myself.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx

Aside from that, the only other option seems to be for Windows applications to 
roll their own custom IPC protocol (i.e. one process sends another a custom 
graceful shutdown message over a named pipe).

It might be worth pursuing one of these solutions in the long term for absolute 
correctness, but these approaches will require a lot more coding and testing.
{quote}
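
For Java tasks, the closest built-in analogue to a Unix SIGTERM handler is a 
JVM shutdown hook. The sketch below is illustrative only: GracefulShutdown and 
install are hypothetical names, not Hadoop code. According to the 
Runtime.addShutdownHook Javadoc, hooks run on a user interrupt such as Ctrl-C, 
but they are skipped when the process is ended with TerminateProcess on 
Windows, which is why a forceful kill loses any cleanup work.

```java
/** Hypothetical helper (not part of Hadoop): runs cleanup code at JVM shutdown. */
final class GracefulShutdown {
    private GracefulShutdown() {}

    /** Wraps the cleanup in a thread, registers it as a shutdown hook, and returns it. */
    static Thread install(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        install(() -> System.out.println("cleanup ran"));
        System.out.println("working; press Ctrl-C to trigger the hook");
        // The hook also runs when main returns normally after the sleep.
        Thread.sleep(60_000);
    }
}
```

A hook like this covers only the receiving side. The sending side still needs 
some way to request shutdown without calling TerminateProcess, which is the 
open question in this Jira.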



[jira] [Commented] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows

2013-07-14 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708081#comment-13708081
 ] 

Ivan Mitic commented on MAPREDUCE-5330:
---

Chris, Bikas, Xi, I filed a new Jira, MAPREDUCE-5387, to investigate possible 
ways to implement Signal.TERM on Windows. I already spent some time 
investigating this a while ago and will try to come up with a proposal in the 
near term. Chris's summary above gives a good overview of some possible 
options (I copied it into the new Jira). 

 JVM manager should not forcefully kill the process on Signal.TERM on Windows
 

 Key: MAPREDUCE-5330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5330.patch


 In MapReduce, we sometimes kill a task's JVM before it naturally shuts down 
 if we want to launch other tasks (look in 
 JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map 
 task process is in the middle of doing some cleanup/finalization after the 
 task is done, it might be interrupted/killed without being given a chance to 
 finish. In Microsoft's Hadoop Service, after a Map/Reduce task is done and 
 while file systems are being closed in a special shutdown hook, we typically 
 upload storage (ASV in our context) usage metrics to Microsoft Azure Tables. 
 If this kill happens, these metrics are lost. The impact is that for many MR 
 jobs we don't see accurate metrics reported most of the time.
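
 One way to give the task JVM a chance to finish its cleanup is to ask it to 
 exit first and force the kill only after a grace period. The following is a 
 minimal sketch, assuming Java 8 or later (Process.waitFor with a timeout and 
 Process.destroyForcibly). ChildStopper and the grace period are hypothetical; 
 this is not the reapJvm code or the attached patch. The Javadoc leaves it 
 implementation dependent whether destroy() terminates the process gracefully, 
 so on Windows the first step may need a different mechanism.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not the JvmManager code): stop a child politely, then by force. */
final class ChildStopper {
    private ChildStopper() {}

    /**
     * Asks the child to exit, waits up to graceMillis, and force-kills it only
     * if it is still alive. Returns true if the child exited within the grace period.
     */
    static boolean stop(Process child, long graceMillis) throws InterruptedException {
        child.destroy(); // termination request; how graceful it is depends on the platform
        if (child.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        child.destroyForcibly(); // last resort: cleanup code in the child will not run
        child.waitFor();
        return false;
    }
}
```

 The return value lets the caller log how often the forceful path is taken, 
 which is a rough indicator of how many tasks lose their shutdown-hook work.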



[jira] [Updated] (MAPREDUCE-5387) Implement Signal.TERM on Windows

2013-07-14 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated MAPREDUCE-5387:
--

Issue Type: Improvement  (was: Bug)



[jira] [Commented] (MAPREDUCE-5384) Races in DelegationTokenRenewal

2013-07-14 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708159#comment-13708159
 ] 

Matt Foley commented on MAPREDUCE-5384:
---

We're in the final stage of producing 1.2.1-rc.  Moving this new issue to 
targetVersion 1.3.0.

 Races in DelegationTokenRenewal
 ---

 Key: MAPREDUCE-5384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.0, 1.1.2, 1.2.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: mr-5384-0.patch, mr-5384-1.patch


 There are a couple of races in DelegationTokenRenewal. 
 One of them was addressed by MAPREDUCE-4860, which introduced a deadlock 
 while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, 
 since MAPREDUCE-4860 is already shipped in a release.
 Races to fix:
 # TimerTask#cancel() disallows future invocations of run(), but doesn't abort 
 an already scheduled/started run().
 # In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only 
 cancels that TimerTask instance. However, it has no effect on any other 
 TimerTasks created for that token. 
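
 The first race above can be reproduced in isolation. The sketch below is 
 illustrative only and is not the DelegationTokenRenewal code or either 
 attached patch: CancelRaceDemo is a hypothetical class, and latches are used 
 to force the interleaving in which cancel() is called after run() has started. 
 It shows that the in-flight run() still finishes its work.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: TimerTask#cancel() does not stop a run() that has already started. */
final class CancelRaceDemo {
    private CancelRaceDemo() {}

    /** Returns true if run() finished even though cancel() was called while it was executing. */
    static boolean runCompletesDespiteCancel() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch cancelled = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);

        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                started.countDown();
                try {
                    cancelled.await(); // hold run() open until cancel() has been called
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                finished.countDown(); // stands in for the token renewal work
            }
        };

        Timer timer = new Timer(true); // daemon timer thread
        timer.schedule(task, 0);
        started.await();
        task.cancel();           // too late: this execution is already in progress
        cancelled.countDown();
        boolean completed = finished.await(5, TimeUnit.SECONDS);
        timer.cancel();
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("run() completed after cancel(): " + runCompletesDespiteCancel());
    }
}
```

 The second race is separate: cancelling one TimerTask instance says nothing 
 about other TimerTask objects scheduled for the same token, so any fix has to 
 track cancellation per token rather than per task instance.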



[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal

2013-07-14 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-5384:
--

Target Version/s: 1.3.0  (was: 1.2.1)



[jira] [Updated] (MAPREDUCE-4567) Fix failing TestJobKillAndFail in branch-1

2013-07-14 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-4567:
--

Target Version/s: 1.3.0  (was: 1.2.1)

 Fix failing TestJobKillAndFail in branch-1
 --

 Key: MAPREDUCE-4567
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4567
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4567.patch


 This was introduced in MAPREDUCE-4488.



[jira] [Commented] (MAPREDUCE-4567) Fix failing TestJobKillAndFail in branch-1

2013-07-14 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708160#comment-13708160
 ] 

Matt Foley commented on MAPREDUCE-4567:
---

Targeting 1.3.0, like MAPREDUCE-4488.



[jira] [Commented] (MAPREDUCE-4838) Add extra info to JH files

2013-07-14 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708171#comment-13708171
 ] 

Matt Foley commented on MAPREDUCE-4838:
---

Confirmed committed to both branch-1.2 and branch-1.
Updated CHANGES.txt to be consistent.

 Add extra info to JH files
 --

 Key: MAPREDUCE-4838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4838
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Zhijie Shen
 Fix For: 2.0.3-alpha, 1.2.1

 Attachments: MAPREDUCE-4838_1.patch, MAPREDUCE-4838_2.patch, 
 MAPREDUCE-4838_3.patch, MAPREDUCE-4838_4.patch, MAPREDUCE-4838_5.patch, 
 MAPREDUCE-4838-branch-1_1.patch, MAPREDUCE-4838.patch, 
 TestRumenJobTraces.patch


 It will be useful to add more task-info to JH for analytics.
