[jira] [Comment Edited] (HBASE-19954) ShutdownHook should check whether shutdown hook is tracked by ShutdownHookManager

2018-02-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369611#comment-16369611
 ] 

Ted Yu edited comment on HBASE-19954 at 2/20/18 12:23 AM:
--

Patch v2 adds an audience annotation for the ShutdownHookManager class.

hasShutdownHook() is exercised by TestBlockReorder when run against hadoop3.

If a specific scenario is needed to test hasShutdownHook(), let me know.


> ShutdownHook should check whether shutdown hook is tracked by 
> ShutdownHookManager
> -
>
> Key: HBASE-19954
> URL: https://issues.apache.org/jira/browse/HBASE-19954
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: 19954.v1.txt, 19954.v2.txt
>
>
> Currently ShutdownHook#suppressHdfsShutdownHook() does the following:
> {code}
>   synchronized (fsShutdownHooks) {
>     boolean isFSCacheDisabled = fs.getConf().getBoolean("fs.hdfs.impl.disable.cache", false);
>     if (!isFSCacheDisabled && !fsShutdownHooks.containsKey(hdfsClientFinalizer)
>         && !ShutdownHookManager.deleteShutdownHook(hdfsClientFinalizer)) {
> {code}
> There is no check that ShutdownHookManager still tracks the shutdown hook, 
> which can lead to a RuntimeException (as observed in the hadoop3 Jenkins 
> job).
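
A minimal sketch of the missing check, for illustration only. HookRegistry and SuppressSketch are hypothetical stand-ins written for this example; they are not the Hadoop or HBase ShutdownHookManager, and this is not the attached patch. The idea is to ask whether the hook is still tracked before treating a failed removal as an error.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a shutdown hook manager; not the Hadoop or HBase class. */
final class HookRegistry {
  private final Set<Runnable> hooks = new HashSet<>();

  synchronized void addShutdownHook(Runnable hook) {
    hooks.add(hook);
  }

  synchronized boolean hasShutdownHook(Runnable hook) {
    return hooks.contains(hook);
  }

  /** Returns true only if the hook was tracked and has now been removed. */
  synchronized boolean deleteShutdownHook(Runnable hook) {
    return hooks.remove(hook);
  }
}

/** Hypothetical helper showing the guarded removal. */
final class SuppressSketch {
  /**
   * Returns true if the hook was removed and false if it was not tracked.
   * Throws only when a hook that was tracked could not be removed.
   * Callers are assumed to hold whatever lock serializes hook changes.
   */
  static boolean suppress(HookRegistry registry, Runnable hook) {
    if (!registry.hasShutdownHook(hook)) {
      return false; // not tracked, so there is nothing to suppress
    }
    if (!registry.deleteShutdownHook(hook)) {
      throw new RuntimeException("Failed suppression of fs shutdown hook: " + hook);
    }
    return true;
  }
}
```

With this shape, a hook that has already disappeared from the registry is skipped instead of raising the exception.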



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-19954) ShutdownHook should check whether shutdown hook is tracked by ShutdownHookManager

2018-02-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368420#comment-16368420
 ] 

Ted Yu edited comment on HBASE-19954 at 2/18/18 1:54 AM:
-

Did some debugging by installing a hadoop3 build of hadoop-common, with 
additional logging, into the local maven repo.
{code}
2018-02-17 16:14:14,573 INFO  [Time-limited test] util.ShutdownHookManager(286): clearing hooks
2018-02-17 16:14:14,588 INFO  [Time-limited test] hbase.HBaseTestingUtility(1114): Minicluster is down
2018-02-17 16:14:14,627 INFO  [Time-limited test] hbase.ResourceChecker(172): after: fs.TestBlockReorder#testBlockLocationReorder Thread=110 (was 8)
{code}
Note that the above is from the first test in TestBlockReorder, where the 
{{hooks}} Set of hadoop ShutdownHookManager was cleared (first line).
The 'Failed suppression' exception happened in the second subtest, where the 
FileSystem$Cache$ClientFinalizer instance was no longer in the Set.
I dumped the contents of the {{hooks}} Set at the time of the exception and saw 
fsdataset.impl.BlockPoolSlice instances but no ClientFinalizer instance. 

After poking around hadoop ShutdownHookManager, I don't see a bug.
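
The sequence described above can be modelled with a small sketch. This is a simplification under stated assumptions: ClearedHooksScenario and its HOOKS set are hypothetical stand-ins for the {{hooks}} Set, and nothing here is Hadoop or HBase code. It only shows why an unguarded removal fails once the set has been cleared between subtests.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the observed sequence; not Hadoop or HBase code. */
final class ClearedHooksScenario {
  /** Stand-in for the manager's hooks Set. */
  private static final Set<Runnable> HOOKS = new HashSet<>();

  /** Unguarded removal: throws whenever the hook is not in the set. */
  static void suppressUnguarded(Runnable hook) {
    if (!HOOKS.remove(hook)) {
      throw new RuntimeException("Failed suppression of fs shutdown hook: " + hook);
    }
  }

  /** Returns true if suppression fails after the set was cleared. */
  static boolean secondAttemptFailsAfterClear() {
    Runnable clientFinalizer = () -> {};
    HOOKS.add(clientFinalizer);
    HOOKS.clear();        // first test: "clearing hooks"
    HOOKS.add(() -> {});  // other hooks get registered later
    try {
      suppressUnguarded(clientFinalizer); // second subtest
      return false;
    } catch (RuntimeException e) {
      return true;        // the finalizer is no longer tracked
    }
  }
}
```

In this model the set is not empty at the time of the failure; it just no longer contains the finalizer, which matches the dump described above.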




[jira] [Comment Edited] (HBASE-19954) ShutdownHook should check whether shutdown hook is tracked by ShutdownHookManager

2018-02-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368134#comment-16368134
 ] 

stack edited comment on HBASE-19954 at 2/17/18 7:55 AM:


You don't say why it happens. Why do we get this exception in hadoop3 and not 
in hadoop2? Why is the hook removed earlier, or not installed? My concern is 
that this patch just papers over a more substantial issue: a hook not being 
installed.

Looking at the patch, there is refactoring but no test. A test seems easy 
enough to add. Compound checks like the one in ShutdownHook.java are easy to 
get wrong.

Needs an audience annotation.

I tried it and it seems to address the exception below.

I see this exception when TestBlockReorder fails against hadoop3. The exception 
is:

{code}
java.lang.RuntimeException: Failed suppression of fs shutdown hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@771d03a8
  at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:207)
  at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:85)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:927)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:187)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:133)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:171)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:360)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1942)
  at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:168)
  at java.lang.Thread.run(Thread.java:745)
{code}

