[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990398#comment-15990398
 ] 

Hudson commented on HBASE-17287:


ABORTED: Integrated in Jenkins build HBase-HBASE-14614 #190 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/190/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
f159557eded160680e623b966350ea3442b5f35a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterWalManager.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java


> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Priority: Blocker
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt
>
>
> We have seen an issue whereby, if HDFS is unstable and the HBase master's 
> HDFS client is unable to stabilize before 
> {{dfs.client.failover.max.attempts}} is exhausted, the master's filesystem 
> object closes. This seems to result in an HBase master which will continue to 
> run (process and znode exist) but no meaningful work can be done (e.g. 
> assigning meta).
>
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>   at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}
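
The failure mode described above (a handler that keeps retrying against a filesystem client that is already closed) can be illustrated with a small, self-contained sketch. This is not the committed patch and it does not use any HBase or Hadoop API: `Abortable`, `FsTask`, `isFilesystemClosed` and `runOrAbort` are hypothetical names made up for illustration, and matching on the "Filesystem closed" message is an assumption taken only from the log excerpt. The idea is simply that a closed filesystem is treated as fatal and triggers an abort, while other I/O errors are retried.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

class FilesystemClosedAbortSketch {
    /** Hypothetical stand-in for the master's abort hook. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical unit of work that touches the filesystem, e.g. log splitting. */
    interface FsTask {
        void run() throws IOException;
    }

    /** Walks the cause chain looking for the "Filesystem closed" IOException from the log. */
    static boolean isFilesystemClosed(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException && "Filesystem closed".equals(c.getMessage())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the task up to maxAttempts times and returns true on success.
     * A closed filesystem cannot recover by retrying, so it aborts immediately
     * instead of leaving the process alive but unable to do work.
     */
    static boolean runOrAbort(FsTask task, Abortable master, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.run();
                return true;
            } catch (IOException e) {
                if (isFilesystemClosed(e)) {
                    master.abort("filesystem closed; shutting down", e);
                    return false;
                }
                // Any other IOException is assumed transient here and is retried.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicBoolean aborted = new AtomicBoolean(false);
        Abortable master = (why, cause) -> aborted.set(true);
        // Mirrors the shape of the logged exception: a retryable wrapper
        // whose root cause is "Filesystem closed".
        FsTask splitLogs = () -> {
            throw new IOException("failed log splitting, will retry",
                    new IOException("Filesystem closed"));
        };
        boolean ok = runOrAbort(splitLogs, master, 3);
        System.out.println("ok=" + ok + " aborted=" + aborted.get());
    }
}
```

Running `main` prints `ok=false aborted=true`: the first attempt hits the closed filesystem and the sketch aborts rather than retrying.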



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947545#comment-15947545
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #143 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/143/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
8c6608f5a92e5bbd3ecc45c7cc7e28a4f05251e7)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java


> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Priority: Blocker
> Fix For: 1.4.0, 1.3.1, 1.1.9, 2.0, 1.2.6
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947539#comment-15947539
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #134 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/134/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
8c6608f5a92e5bbd3ecc45c7cc7e28a4f05251e7)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947523#comment-15947523
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #119 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/119/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
2d79b7d5a508c2175312487db3e93d838e063ec2)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947482#comment-15947482
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #115 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/115/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
2d79b7d5a508c2175312487db3e93d838e063ec2)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947434#comment-15947434
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1939 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1939/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
1fdb97ce840c10f2b2e4aa4a5c636ae98bfc1c33)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947419#comment-15947419
 ] 

Hudson commented on HBASE-17287:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1855 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1855/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
1fdb97ce840c10f2b2e4aa4a5c636ae98bfc1c33)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947249#comment-15947249
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #853 (See 
[https://builds.apache.org/job/HBase-1.2-IT/853/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
2d79b7d5a508c2175312487db3e93d838e063ec2)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947239#comment-15947239
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #16 (See 
[https://builds.apache.org/job/HBase-1.3-IT/16/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
8c6608f5a92e5bbd3ecc45c7cc7e28a4f05251e7)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947175#comment-15947175
 ] 

Hudson commented on HBASE-17287:


FAILURE: Integrated in Jenkins build HBase-1.4 #683 (See 
[https://builds.apache.org/job/HBase-1.4/683/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
d0139a8777663aef92e5f1003e5c4682a442bfce)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java


> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Priority: Blocker
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947145#comment-15947145
 ] 

Sean Busbey commented on HBASE-17287:
-

Please make sure this has a release note.



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947144#comment-15947144
 ] 

Sean Busbey commented on HBASE-17287:
-

A master that keeps itself as master but stops doing work should be a blocker, 
so I changed the priority.

Does this issue impact earlier 1.1.z - 1.3.z releases?



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946740#comment-15946740
 ] 

Hudson commented on HBASE-17287:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2760 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2760/])
HBASE-17287 Master becomes a zombie if filesystem object closes (tedyu: rev 
f159557eded160680e623b966350ea3442b5f35a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterWalManager.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSafemodeBringsDownMaster.java


> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-28 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946243#comment-15946243
 ] 

Enis Soztutar commented on HBASE-17287:
---

I've already +1'ed the patch above, no? 


> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Clay B.
>Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt
>
>
> We have seen an issue whereby if the HDFS is unstable and the HBase master's 
> HDFS client is unable to stabilize before 
> {{dfs.client.failover.max.attempts}} then the master's filesystem object 
> closes. This seems to result in an HBase master which will continue to run 
> (process and znode exists) but no meaningful work can be done (e.g. assigning 
> meta).
>
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>   at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}
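
The stack trace quoted above ends in an IOException whose message is "Filesystem closed", wrapped inside the "failed log splitting" IOException. As a rough illustration only, the sketch below walks a cause chain looking for that message. It is not the HBASE-17287 patch: the names FilesystemClosedCheck and isFilesystemClosed are hypothetical, and matching on message text is an assumption made here only because the log shows that exact string.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of HBase or of the HBASE-17287 patch. */
final class FilesystemClosedCheck {
    private FilesystemClosedCheck() {}

    /**
     * Returns true if any throwable in the cause chain is an IOException
     * whose message is exactly "Filesystem closed" (the text seen in the log).
     * The depth limit guards against a cyclic cause chain.
     */
    static boolean isFilesystemClosed(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur instanceof IOException && "Filesystem closed".equals(cur.getMessage())) {
                return true;
            }
        }
        return false;
    }
}
```

A caller that retries on IOException could use a check like this to stop retrying and abort instead, since a closed filesystem object will not recover on its own. Comparing message strings is brittle, so this is a sketch of the idea rather than a recommended detection method.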





[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946218#comment-15946218
 ] 

Ted Yu commented on HBASE-17287:


[~enis] [~clayb]:
Do you have any more review comments?

Thanks



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945753#comment-15945753
 ] 

Ted Yu commented on HBASE-17287:


bq. with the default setup (master hosting meta table and regionserver) that 
master abort is not causing the daemon to go down

There shouldn't be a problem in the above scenario.



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945725#comment-15945725
 ] 

Ted Yu commented on HBASE-17287:


bq. The timeout of 60 secs

When I ran the new test, it took ~17 seconds. A 60-second timeout should be long 
enough even on a slow machine.



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-28 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945696#comment-15945696
 ] 

Enis Soztutar commented on HBASE-17287:
---

bq. In patch v5, before starting the mini cluster, I set config for master not 
to host meta region.
OK, that makes sense. I've checked the test again and it looks good. I think the 
60-second timeout is aggressive, though. Let's bump it to 3 minutes; on Jenkins, 
tests can run very slowly, which causes flakiness. 

On the other issue: with the default setup (master hosting the meta table and a 
regionserver), is there a problem where a master abort does not bring the daemon 
down? If we are good there, +1 for the patch. 



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944508#comment-15944508
 ] 

Hadoop QA commented on HBASE-17287:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 39s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 48s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 99m 47s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 139m 12s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860781/17287.master.v5.txt |
| JIRA Issue | HBASE-17287 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux edddeaed97e9 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 4b62a52 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6242/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6242/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944353#comment-15944353
 ] 

Ted Yu commented on HBASE-17287:


bq. Why is it inside TestCreateTableProcedure

I created TestSafemodeBringsDownMaster for the new test.

bq. Why are we aborting the regionserver?

The region server being aborted hosts meta and has unflushed edits. Aborting it 
simulates the scenario Clay reported.

> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
>  Issue Type: Bug
>  Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 
> 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 
> 17287.master.v4.txt, 17287.v2.txt


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944334#comment-15944334
 ] 

Enis Soztutar commented on HBASE-17287:
---

Thanks, Ted, for the test. Why is it inside TestCreateTableProcedure? I don't 
think it belongs there. Consider TestMasterFailover, TestMasterFileSystem, or 
something similar. 
Why are we aborting the regionserver? Is that the one running inside the 
master? If a master abort does not cause the regionserver to abort, then 
the issue is not fixed. 





[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944319#comment-15944319
 ] 

Hadoop QA commented on HBASE-17287:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 28s {color} | {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} branch-1.1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} branch-1.1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} branch-1.1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 47s {color} | {color:red} hbase-server in branch-1.1 has 80 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 32s {color} | {color:red} hbase-server in branch-1.1 failed with JDK v1.8.0_121. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} branch-1.1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 49s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_121. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 43s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 117m 27s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8012383 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860754/17287.branch-1.1.v4.txt |
| JIRA Issue | HBASE-17287 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux ffefe1b62b78 

[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944188#comment-15944188
 ] 

Hadoop QA commented on HBASE-17287:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 3s {color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} branch-1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s {color} | {color:red} hbase-server in branch-1 has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} branch-1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 4s {color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 15m 55s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 33s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 118m 16s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestReplicasClient |
\\
\\
|| Subsystem || Report/Notes ||
| 

[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943945#comment-15943945
 ] 

Enis Soztutar commented on HBASE-17287:
---

bq. How do we assert that master comes down due to the added check ?
Just assert that the master is not running (use Waiter.waitFor() and check 
MiniHBaseCluster.getLiveMasterThreads(), or something similar). There is no need 
to check whether the master aborted because of the added check. 
bq. How about adding test in another issue ?
Realistically, that never happens. 
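
The approach suggested above amounts to polling until no master is live, bounded by a timeout. A minimal, self-contained sketch of that pattern follows. It does not call the HBase test utilities: WaitForSketch.waitFor and the polled condition are hypothetical stand-ins for Waiter.waitFor() and a check on MiniHBaseCluster.getLiveMasterThreads(), and their signatures here are assumptions, not the HBase API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a test wait utility; not the HBase Waiter API. */
final class WaitForSketch {
    private WaitForSketch() {}

    /**
     * Polls the condition every intervalMs until it returns true or timeoutMs elapses.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, the condition would be something like "the list of live master threads is empty", and the test would assert that waitFor returned true. Because the method returns as soon as the condition holds, a generous timeout only costs time when the test fails.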

> Master becomes a zombie if filesystem object closes
> ---
>
> Key: HBASE-17287
> URL: https://issues.apache.org/jira/browse/HBASE-17287
> Project: HBase
>  Issue Type: Bug
>  Components: master
> Reporter: Clay B.
> Assignee: Ted Yu
> Fix For: 1.4.0, 2.0
>
> Attachments: 17287.branch-1.v3.txt, 17287.master.v2.txt, 
> 17287.master.v3.txt, 17287.v2.txt


[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943922#comment-15943922
 ] 

Ted Yu commented on HBASE-17287:


I ran the two failed tests reported above locally and they passed.



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943910#comment-15943910
 ] 

Ted Yu commented on HBASE-17287:


How do we assert that the master comes down because of the added check?

Differences in how the master branch and branch-1 handle meta server shutdown 
may also make the test harder to write.

How about adding the test in a separate issue?



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943787#comment-15943787
 ] 

Hadoop QA commented on HBASE-17287:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| 0 | patch | 0m 4s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. |
| 0 | shelldocs | 0m 4s | Shelldocs was not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 1m 56s | branch-1 passed |
| +1 | compile | 0m 33s | branch-1 passed with JDK v1.8.0_121 |
| +1 | compile | 0m 36s | branch-1 passed with JDK v1.7.0_80 |
| +1 | checkstyle | 0m 57s | branch-1 passed |
| +1 | mvneclipse | 0m 18s | branch-1 passed |
| -1 | findbugs | 1m 57s | hbase-server in branch-1 has 2 extant Findbugs warnings. |
| +1 | javadoc | 0m 27s | branch-1 passed with JDK v1.8.0_121 |
| +1 | javadoc | 0m 34s | branch-1 passed with JDK v1.7.0_80 |
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 32s | the patch passed with JDK v1.8.0_121 |
| +1 | javac | 0m 32s | the patch passed |
| +1 | compile | 0m 35s | the patch passed with JDK v1.7.0_80 |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 56s | the patch passed |
| +1 | mvneclipse | 0m 18s | the patch passed |
| +1 | shellcheck | 0m 4s | There were no new shellcheck issues. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 15m 11s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 16s | the patch passed |
| +1 | findbugs | 2m 14s | the patch passed |
| +1 | javadoc | 0m 29s | the patch passed with JDK v1.8.0_121 |
| +1 | javadoc | 0m 36s | the patch passed with JDK v1.7.0_80 |
| -1 | unit | 88m 10s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 118m 51s | |
\\
\\
|| 

[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943768#comment-15943768
 ] 

Enis Soztutar commented on HBASE-17287:
---

bq. Once the meta server was killed, I observed the following in master log
Sounds good. Is there an easy way to unit test this? Maybe start a mini cluster 
with HDFS, use the HDFS admin to put the NameNode in safe mode, and wait until 
the master aborts?



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943637#comment-15943637
 ] 

Ted Yu commented on HBASE-17287:


Performed the above procedure on a 1.1 cluster patched with 17287.branch-1.v3.txt.
Once the meta server was killed, I observed the following in the master log:
{code}
2017-03-27 16:52:01,080 FATAL [MASTER_SERVER_OPERATIONS-cn013:16000-1] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2017-03-27 16:52:01,080 FATAL [MASTER_SERVER_OPERATIONS-cn013:16000-1] master.HMaster: Shutting down HBase cluster: file system not available
java.io.IOException: File system is in safemode, it can't be written now
    at org.apache.hadoop.hbase.util.FSUtils.checkDfsSafeMode(FSUtils.java:561)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkFileSystem(MasterFileSystem.java:202)
    at org.apache.hadoop.hbase.master.MasterFileSystem.getLogDirs(MasterFileSystem.java:372)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:425)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:402)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:319)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
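
The control flow visible in this trace (a file system check fails during log splitting and the master aborts instead of retrying) can be modelled in isolation. The sketch below is a simplified illustration only, not the code from the attached patch: `SplitLogGuard`, `FsProbe` and the local `Abortable` interface are hypothetical stand-ins, and only the abort message is copied from the log above.

```java
import java.io.IOException;

/** Hypothetical model of "check the file system, abort if it is unavailable". */
final class SplitLogGuard {

  /** Stand-in for a file system availability check (assumed, not an HBase type). */
  interface FsProbe {
    void check() throws IOException;
  }

  /** Stand-in for whatever component can shut the master down. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  private SplitLogGuard() {}

  /**
   * Runs the probe once. If it throws, the abort hook is invoked with the
   * failure and false is returned, so callers stop instead of retrying.
   */
  static boolean checkFileSystem(FsProbe probe, Abortable master) {
    try {
      probe.check();
      return true;
    } catch (IOException e) {
      master.abort("Shutting down HBase cluster: file system not available", e);
      return false;
    }
  }
}
```

A caller on the log-splitting path would stop retrying as soon as `checkFileSystem` returns false, which is the behaviour the FATAL lines above show.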



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941874#comment-15941874
 ] 

Ted Yu commented on HBASE-17287:


The test scenario would be:

* accumulate some WAL edits
* bring namenode to safe mode
* verify that master comes down when splitting WAL fails

Planning to do the above next week.
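
The last step of the scenario above ("verify that master comes down") amounts to polling a condition until a timeout expires. A minimal sketch of such a polling helper follows, assuming nothing about the eventual test: `AbortWaiter` and `waitFor` are hypothetical names, and the condition that would ask the mini cluster whether the master has stopped is deliberately left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for "wait until the master has aborted". */
final class AbortWaiter {

  private AbortWaiter() {}

  /**
   * Polls the condition every intervalMs milliseconds until it returns true
   * or timeoutMs milliseconds have elapsed. Returns true if the condition
   * held, false on timeout or if the waiting thread was interrupted.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test built on this would accumulate WAL edits, put the NameNode into safe mode, kill the server, and then assert that `waitFor` returns true for a "master is no longer running" condition.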



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-24 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941380#comment-15941380
 ] 

Enis Soztutar commented on HBASE-17287:
---

v3 patch seems fine. Were you able to test it? 



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940679#comment-15940679
 ] 

Ted Yu commented on HBASE-17287:


[~devaraj] [~clayb]:
What do you think of the latest patch?



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939444#comment-15939444
 ] 

Hadoop QA commented on HBASE-17287:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 3s | Docker mode activated. |
| 0 | patch | 0m 10s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 4m 7s | master passed |
| +1 | compile | 0m 34s | master passed |
| +1 | checkstyle | 0m 41s | master passed |
| +1 | mvneclipse | 0m 12s | master passed |
| +1 | findbugs | 1m 41s | master passed |
| +1 | javadoc | 0m 26s | master passed |
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 26m 18s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | findbugs | 1m 44s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
| +1 | unit | 99m 6s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 141m 8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860230/17287.master.v3.txt |
| JIRA Issue | HBASE-17287 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4c315fa47d51 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / f1c1f25 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6209/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6209/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939153#comment-15939153
 ] 

Hadoop QA commented on HBASE-17287:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| 0 | patch | 0m 4s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 9s | master passed |
| +1 | compile | 0m 34s | master passed |
| +1 | checkstyle | 0m 46s | master passed |
| +1 | mvneclipse | 0m 20s | master passed |
| +1 | findbugs | 1m 35s | master passed |
| +1 | javadoc | 0m 33s | master passed |
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 33s | the patch passed |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 41s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 25m 48s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | findbugs | 1m 43s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
| +1 | unit | 102m 35s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 143m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860192/17287.master.v2.txt |
| JIRA Issue | HBASE-17287 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux ba94894d1a06 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / f1c1f25 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6207/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6207/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939141#comment-15939141
 ] 

Ted Yu commented on HBASE-17287:


Unfortunately, there is no dedicated subclass of IOException that expresses the 
closing of the filesystem.

See the following HDFS tests, which look for "Filesystem closed":

http://pastebin.com/m1Ax5E2H
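
For readers following along, a message-based check of this kind might look like the sketch below. It is an illustration under stated assumptions, not the patch attached to this issue: `FilesystemClosedCheck` and `isFilesystemClosed` are hypothetical names, and the only detail taken from the thread is the "Filesystem closed" message text. Walking the cause chain matters because, in the reported log, that exception appears as the `Caused by:` of a wrapping IOException.

```java
import java.io.IOException;

/** Hypothetical helper; matches on the exception message, which is brittle. */
final class FilesystemClosedCheck {

  /** Message text quoted in this thread. */
  static final String FS_CLOSED_MSG = "Filesystem closed";

  /** Upper bound on the cause chain walk, to guard against cyclic causes. */
  private static final int MAX_DEPTH = 32;

  private FilesystemClosedCheck() {}

  /**
   * Returns true if t, or any exception in its cause chain, is an IOException
   * whose message is exactly FS_CLOSED_MSG.
   */
  static boolean isFilesystemClosed(Throwable t) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause(), depth++) {
      if (cur instanceof IOException && FS_CLOSED_MSG.equals(cur.getMessage())) {
        return true;
      }
    }
    return false;
  }
}
```

Because the match depends on a message string rather than an exception type, any change to that wording upstream would silently defeat the check.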



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2017-03-23 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939131#comment-15939131
 ] 

Devaraj Das commented on HBASE-17287:
-

The approach seems brittle: it relies on string checks against exception 
messages. Is there a better way to address this?



[jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes

2016-12-09 Thread Clay B. (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736853#comment-15736853
 ] 

Clay B. commented on HBASE-17287:
-

For reference, as an operator, I would expect the Master to do only one of two 
things at this point:
* retry opening the filesystem indefinitely (though this is harder to reason 
about)
* most simply, exit if it is unable to make forward progress, as opposed to 
lingering around as a zombie

In this case, the HDFS instability was caused by all of the HDFS NameNodes 
being unavailable.
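
As an illustration of the second option (exit rather than linger), here is a rough sketch under stated assumptions. It is not how the Master is implemented: FsOperation, Abortable, runOrAbort and the maxAttempts parameter are all hypothetical names introduced for this example, and the abort callback stands in for whatever shutdown hook the server actually provides.

```java
import java.io.IOException;

class ForwardProgressGuard {
    /** Hypothetical stand-in for a filesystem-backed step such as log splitting. */
    interface FsOperation {
        void run() throws IOException;
    }

    /** Hypothetical stand-in for the server's shutdown hook. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Runs the operation up to maxAttempts times. If every attempt fails,
     * asks the server to abort instead of retrying forever.
     * Returns true if the operation succeeded, false if abort was requested.
     */
    static boolean runOrAbort(FsOperation op, int maxAttempts, Abortable server) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return true;
            } catch (IOException e) {
                last = e; // remember the most recent failure for the abort report
            }
        }
        server.abort("no forward progress after " + maxAttempts + " attempts", last);
        return false;
    }
}
```

The first option, retrying indefinitely, would correspond to dropping the attempt bound, which is what makes it harder to reason about: nothing ever reports that the process has stopped doing useful work.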
