[
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated YARN-4467:
-----------------------------
Target Version/s: 2.8.0
Priority: Blocker (was: Major)
+1, kicked Jenkins again to get a fresh run. I also marked this as a Blocker
for 2.8. Once a public member of a Public class ships in a release, we cannot
remove it without risking a break in backwards compatibility. Fortunately this
public member hasn't been released yet, so we have a chance to fix it cleanly.
> Shell.checkIsBashSupported swallowed an interrupted exception
> -------------------------------------------------------------
>
> Key: YARN-4467
> URL: https://issues.apache.org/jira/browse/YARN-4467
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Blocker
> Labels: oct16-easy, shell, supportability
> Attachments: HADOOP-12652.001.patch, YARN-4467.001.patch
>
>
> Edit: moved this JIRA from HADOOP to YARN, as Shell.checkIsBashSupported() is
> used only in YARN.
> Shell.checkIsBashSupported() runs a bash shell command to verify that the
> system supports bash. However, its error message is misleading, and the logic
> should be updated.
> If the shell command throws an IOException, that does not imply that bash
> failed to run. If the shell command process is interrupted, the internal
> logic throws an InterruptedIOException, which is a subclass of IOException.
> {code:title=Shell.checkIsBashSupported|borderStyle=solid}
> ShellCommandExecutor shexec;
> boolean supported = true;
> try {
>   String[] args = {"bash", "-c", "echo 1000"};
>   shexec = new ShellCommandExecutor(args);
>   shexec.execute();
> } catch (IOException ioe) {
>   LOG.warn("Bash is not supported by the OS", ioe);
>   supported = false;
> }
> {code}
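To illustrate the point about InterruptedIOException being caught by the IOException handler, here is a minimal, self-contained sketch. It does not use the Hadoop Shell API: the Command interface, the Result enum, and the probe() method are hypothetical names invented for this example. It only shows one common way to tell an interrupt apart from a genuine "bash is missing" failure.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class InterruptAwareProbe {
    /** Hypothetical stand-in for a shell command; not the Hadoop API. */
    interface Command {
        void execute() throws IOException;
    }

    /** Hypothetical result type used only in this sketch. */
    enum Result { SUPPORTED, NOT_SUPPORTED, INTERRUPTED }

    static Result probe(Command cmd) {
        try {
            cmd.execute();
            return Result.SUPPORTED;
        } catch (InterruptedIOException iioe) {
            // The subclass must be caught before IOException. Restore the
            // interrupt flag so callers can still observe the interrupt.
            Thread.currentThread().interrupt();
            return Result.INTERRUPTED;
        } catch (IOException ioe) {
            // Only a non-interrupt I/O failure is treated as "not supported".
            return Result.NOT_SUPPORTED;
        }
    }

    public static void main(String[] args) {
        Result interrupted = probe(() -> {
            throw new InterruptedIOException("simulated interrupt");
        });
        // Thread.interrupted() reads and clears the flag set by probe().
        boolean flagWasSet = Thread.interrupted();
        System.out.println(interrupted + ", interrupt flag was set: " + flagWasSet);

        Result failed = probe(() -> {
            throw new IOException("simulated launch failure");
        });
        System.out.println(failed);
    }
}
```

With a single catch (IOException) block, both simulated failures above would be reported the same way, which is the behavior the description objects to.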
> An example appeared in a recent Jenkins job:
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/
> The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a
> thread, waits 1 second, and then interrupts the thread, expecting it to
> terminate. However, Shell.checkIsBashSupported swallowed the interrupt, so
> the test failed.
> {noformat}
> 2015-12-16 21:31:53,797 WARN util.Shell (Shell.java:checkIsBashSupported(718)) - Bash is not supported by the OS
> java.io.InterruptedIOException: java.lang.InterruptedException
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
> at org.apache.hadoop.util.Shell.<clinit>(Shell.java:705)
> at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
> at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
> at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
> at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
> at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
> at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
> at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
> at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
> at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
> at org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
> Caused by: java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
> ... 15 more
> {noformat}
> The original design is not desirable, as it swallows a potential interrupt,
> causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail.
> Unfortunately, the method is called from Shell's static initializer to set a
> static member variable, and Java does not allow a static initializer to throw
> a checked exception. We should remove the static member variable, so that the
> method can throw the interrupt exception. The node manager should call the
> static method instead of using the static member variable.
> This fix has an associated benefit: tests could run faster, because Hadoop
> will no longer need to spawn a bash process whenever the Shell class is
> initialized, which happens quite often because its static members are used to
> check what operating system Hadoop is running on.