[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-05-24 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated YARN-493:
-

Fix Version/s: 2.0.5-beta  (was: 3.0.0)

I merged the patch to branch-2.

 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.0.5-beta

 Attachments: YARN-493.1.patch, YARN-493.2.patch, YARN-493.3.patch, 
 YARN-493.4.patch


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-04-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-493:
---

Attachment: YARN-493.4.patch

Attaching rebased patch and resubmitting to Jenkins.

 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-493.1.patch, YARN-493.2.patch, YARN-493.3.patch, 
 YARN-493.4.patch


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.



[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-04-04 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-493:
---

Attachment: YARN-493.3.patch

Here is a new patch that renames the new {{Shell}} methods to 
{{appendScriptExtension}}.

Regarding trying to use {{Shell#getRunScriptCommand}} in the badSymlink test, I 
have not been able to get this to work.  The test depends on very specific 
quoting, and the conversion to absolute path inside 
{{Shell#getRunScriptCommand}} (required by other callers) interferes with this.
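For illustration, here is a minimal sketch of what helpers like these might look like.  It is not the patch's code: the class name, the explicit {{windows}} parameter, the ".cmd"/".sh" extensions, and the "cmd /c" and "/bin/bash" launchers are assumptions.  The only detail taken from the comment above is that the run-script helper converts the script to an absolute path, which is the step that gets in the way of the quoting-sensitive symlink test.

```java
import java.io.File;

/** Hypothetical stand-in for the Shell helpers discussed above; not the Hadoop source. */
final class ScriptCommandSketch {
    private ScriptCommandSketch() {}

    /** Appends the platform's script extension (assumed: ".cmd" on Windows, ".sh" elsewhere). */
    static String appendScriptExtension(String basename, boolean windows) {
        return basename + (windows ? ".cmd" : ".sh");
    }

    /**
     * Builds the argument vector for running a script.  The path is made
     * absolute first, so a caller that needs its own quoting or a relative
     * path cannot get it from this helper.
     */
    static String[] getRunScriptCommand(File script, boolean windows) {
        String absolutePath = script.getAbsolutePath();
        return windows
            ? new String[] {"cmd", "/c", absolutePath}
            : new String[] {"/bin/bash", absolutePath};
    }
}
```

A test would call {{appendScriptExtension("start", windows)}} to name the script and pass the resulting file to {{getRunScriptCommand}}, rather than hard-coding bash.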

 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-493.1.patch, YARN-493.2.patch, YARN-493.3.patch


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.



[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-03-27 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-493:
---

Attachment: YARN-493.2.patch

{quote}
would make sense to expose common Shell/Test utilities that would abstract out 
the following 2 patterns
{quote}

Good idea, Ivan.  Here is version 2 of the patch, which adds a few more helper 
methods to {{Shell}} to assist with this.  I've intentionally left one 
occurrence of this pattern untouched in 
{{TestContainerLaunch#testSpecialCharSymlinks}} because of a very specific need 
for internal quoting and escaping in the arguments.

 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-493.1.patch, YARN-493.2.patch


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.



[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-03-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-493:
---

Attachment: YARN-493.1.patch

This patch addresses the bugs that I found.  I've verified that the tests pass 
on Mac (does not have setsid), Ubuntu (does have setsid), and Windows.  Here is 
an explanation of the changes:

# Discussion on YARN-359 concluded that we should refactor 
{{getCheckProcessIsAliveCommand}} and {{getSignalKillCommand}} from 
{{ContainerExecutor}} back to {{Shell}}.  I'm taking the opportunity to do it 
now while we're working on this code.  {{isSetsidSupported}} used to return 
true for Windows, on the rationale that the flag really means "are process 
groups supported".  This didn't work out in practice, because too much logic 
is specific to using setsid.  As a result, the calls to winutils were 
prepending a '-' character to the job ID, which is incorrect.
# winutils task kill had been terminating the job with exit code 1, but some 
of the YARN code depends on seeing a Unix-style exit code from signalled child 
processes, which is 128 + signal.  (See {{ContainerLaunch#call}}.)  The Windows 
{{TerminateJobObject}} API is most analogous to a kill signal, so I've changed 
task.c to use 128 + 9 = 137.
# {{TestNodeManagerShutdown}}, {{TestContainerManager}}, and 
{{TestContainerLaunch}} were using bash scripts and signals for testing.  I 
wrote alternatives for Windows that use cmd and winutils.  Note that there is 
no equivalent to bash's ability to trap a signal, so on Windows, the assertions 
need to check for process existence instead.
# Some test working directories have been shortened by switching from 
{{Class#getName}} to {{Class#getSimpleName}}, similar to several prior patches.
# {{TestContainerManager}} had been requesting memory in bytes, but the API 
actually uses megabytes.  I'm guessing that the API changed from bytes to MB at 
some point, but we forgot to update this test.  This caused a very interesting 
problem.  {{ContainerImpl#LaunchTransition}} would apply a conversion from 
MB to bytes, which would cause an overflow to exactly 0.  Then, 
{{ContainersMonitorImpl#isProcessTreeOverLimit}} would see that the container 
uses > 0 MB and decide to kill it.  This is a race condition that would cause 
the test to fail unpredictably on Windows.  I hadn't seen the problem on Mac or 
Ubuntu, where it seems we were just getting lucky.  I've changed the test code 
to use MB.
# {{TestContainerLaunch#setNewEnvironmentHack}} uses reflection to modify the 
environment during the test.  I needed to update this code to handle different 
internal JDK class structure when running on Windows.
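The overflow to exactly 0 in the {{TestContainerManager}} item can be reproduced with plain arithmetic.  The sketch below rests on two assumptions that are not stated in this thread: that the conversion multiplies a megabyte count by 1024 * 1024 with the first multiplication done in int arithmetic, and that the test passed a byte count such as 512 * 1024 * 1024 where megabytes were expected.  The class and method names are hypothetical.

```java
/** Hypothetical reproduction of the overflow; not the NodeManager source. */
final class MemoryOverflowSketch {
    private MemoryOverflowSketch() {}

    /**
     * The first multiplication is evaluated in int arithmetic and wraps
     * around before the result is widened to long by the 1024L factor.
     */
    static long toBytesOverflowing(int memoryMb) {
        return memoryMb * 1024 * 1024L;
    }

    /** Widening to long before the first multiplication avoids the wrap-around. */
    static long toBytesSafe(int memoryMb) {
        return memoryMb * 1024L * 1024L;
    }
}
```

With an input of 512 * 1024 * 1024 (2^29), the int product 2^29 * 2^10 = 2^39 keeps only its low 32 bits, which are all zero, so the computed limit is 0 bytes.  A correct input of 512 converts to 536870912 bytes with either method.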


 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-493.1.patch


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.



[jira] [Updated] (YARN-493) NodeManager job control logic flaws on Windows

2013-03-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-493:
---

Description: Both product and test code contain some platform-specific 
assumptions, such as availability of bash for executing a command in a 
container and signals to check existence of a process and terminate it.  (was: 
The tests contain some platform-specific assumptions, such as availability of 
bash for executing a command in a container and signals to check existence of a 
process and terminate it.)
Summary: NodeManager job control logic flaws on Windows  (was: 
TestContainerManager fails on Windows)

I'm expanding the scope of this jira to cover some flaws I've discovered in 
NodeManager's job control logic on Windows:

# Windows was erroneously flagged as supporting setsid, which caused a '-' 
character to be prepended to the job ID passed to winutils.
# The exit code from a job terminated by winutils task kill differed from what 
the YARN Java code expects, so it couldn't tell the difference between a 
killed container and a container that had exited with failure.
# Multiple tests were relying on bash scripts and signals for launching and 
controlling containers.
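The first two flaws can be sketched in a few lines.  This is an illustration under stated assumptions, not the NodeManager code: the class and method names are hypothetical, the signal numbers 9 and 15 are the conventional SIGKILL and SIGTERM values, and the exit code convention is the Unix shell one of 128 + signal number.

```java
/** Hypothetical illustration of the job control conventions; not the YARN source. */
final class JobControlSketch {
    static final int SIGKILL = 9;
    static final int SIGTERM = 15;

    private JobControlSketch() {}

    /** Unix shells report a child killed by a signal as 128 + signal number. */
    static int exitCodeForSignal(int signal) {
        return 128 + signal;
    }

    /** True only for the two signal-style exit codes; anything else counts as an ordinary exit. */
    static boolean wasKilled(int exitCode) {
        return exitCode == exitCodeForSignal(SIGKILL)
            || exitCode == exitCodeForSignal(SIGTERM);
    }

    /**
     * With setsid, a '-' prefix addresses the whole process group.  A Windows
     * job ID is not a process group ID and must be passed through unchanged.
     */
    static String killTarget(String id, boolean setsidSupported) {
        return setsidSupported ? "-" + id : id;
    }
}
```

Under this convention, a kill that exits with code 1 is indistinguishable from an ordinary failure, while 137 (128 + 9) is recognized as a kill.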

I have a patch in progress.  With the expanded scope, the patch will fix the 
following tests on Windows: {{TestContainerLaunch}}, {{TestContainerManager}}, 
and {{TestNodeManagerShutdown}}.


 NodeManager job control logic flaws on Windows
 --

 Key: YARN-493
 URL: https://issues.apache.org/jira/browse/YARN-493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0


 Both product and test code contain some platform-specific assumptions, such 
 as availability of bash for executing a command in a container and signals to 
 check existence of a process and terminate it.
