[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2018-03-01 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381862#comment-16381862]

Hudson commented on HBASE-14317:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4669 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4669/])
HBASE-20107 Add a test case for HBASE-14317 (Zephyr Guo) (tedyu: rev d7adc58e5203567b8083160d45f85f9986e272cd)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java


> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14317.branch-1.txt, 14317.branch-1.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt, 
> 14317v11.txt, 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll the log because we can't append (see HDFS-8960), but we get stuck.
> See the attached thread dump and associated log. What is interesting is that
> the syncers are waiting to take syncs to run, and at the same time we want to
> flush, so we are waiting on a safe point; but there seems to be nothing in our
> ring buffer. Did we go to roll the log without adding a safe-point sync to clear
> out the ring buffer?
> Needs a bit of study. Try to reproduce.
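The hang described above comes down to the roller waiting on a safe point that nothing will ever release. The sketch below is a minimal, hypothetical Java model of that handshake. It is not FSHLog's actual code, and `SafePointSketch`, `consumerReachedSafePoint` and `awaitSafePoint` are illustrative names only.

The model shows two things. An unbounded wait hangs if the safe-point marker is never enqueued for the consumer to process. A bounded wait reports the failure instead of hanging. This sketch does not claim that the committed fix works this way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a "safe point" handshake between a log
 * roller and the thread that drains the WAL ring buffer. Not HBase code.
 */
class SafePointSketch {
  // Counted down by the consumer once it has drained everything queued ahead
  // of the roll, i.e. once it has processed the safe-point marker.
  private final CountDownLatch safePointAttained = new CountDownLatch(1);

  /** Called by the consumer thread when it processes the safe-point marker. */
  void consumerReachedSafePoint() {
    safePointAttained.countDown();
  }

  /**
   * Called by the roller. A plain await() would block forever if the marker
   * that releases it was never put on the ring buffer; bounding the wait
   * returns false instead, so the caller can fail loudly.
   */
  boolean awaitSafePoint(long timeoutMs) throws InterruptedException {
    return safePointAttained.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    // Stuck case: no marker is published, so nobody ever counts the latch down.
    SafePointSketch stuck = new SafePointSketch();
    System.out.println("stuck roll attained safe point: " + stuck.awaitSafePoint(100));

    // Healthy case: a consumer thread reaches the safe point and releases the roller.
    SafePointSketch healthy = new SafePointSketch();
    Thread consumer = new Thread(healthy::consumerReachedSafePoint);
    consumer.start();
    System.out.println("healthy roll attained safe point: " + healthy.awaitSafePoint(5_000));
    consumer.join();
  }
}
```

Running `main` prints `false` for the stuck case after about 100 ms, then `true` for the healthy case. Returning a boolean rather than throwing keeps the sketch small. A real WAL would more likely turn the timeout into an exception that aborts or retries the roll.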



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-23 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905268#comment-14905268]

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-1.1 #677 (See [https://builds.apache.org/job/HBase-1.1/677/])
Revert "HBASE-14373 Backport parent 'HBASE-14317 Stuck FSHLog' issue to 1.1 and 1.0" (stack: rev 5b0f30d5f4dc71286ac8c6d8ed8dbc6b4f816c28)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConsistencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
HBASE-14374 Backport parent 'HBASE-14317 Stuck FSHLog' issue to 1.1 (stack: rev 0bf97bac2ed564994a0bcda5f1993260bf0b448f)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConsistencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-22 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903723#comment-14903723]

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-1.1 #676 (See [https://builds.apache.org/job/HBase-1.1/676/])
HBASE-14373 Backport parent 'HBASE-14317 Stuck FSHLog' issue to 1.1 and 1.0 (stack: rev 2966e2744a5597a8066f265a49d7528307bcb5f4)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConsistencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-07 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733357#comment-14733357]

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-1.2 #154 (See [https://builds.apache.org/job/HBase-1.2/154/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev 990e3698a7ca7e95894150a2905ba4271eb371e9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-07 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733445#comment-14733445]

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-1.3-IT #136 (See [https://builds.apache.org/job/HBase-1.3-IT/136/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev bbafb47f7271449d46b46569ca9f0cb227b44c6e)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-07 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1474#comment-1474]

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-1.3 #152 (See [https://builds.apache.org/job/HBase-1.3/152/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev bbafb47f7271449d46b46569ca9f0cb227b44c6e)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-07 Thread Hudson (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733403#comment-14733403]

Hudson commented on HBASE-14317:


SUCCESS: Integrated in HBase-1.2-IT #130 (See [https://builds.apache.org/job/HBase-1.2-IT/130/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev 990e3698a7ca7e95894150a2905ba4271eb371e9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-06 Thread stack (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733266#comment-14733266]

stack commented on HBASE-14317:
---

Looking at recent 1.2 builds before this patch went in, it looks like the tests 
cited above are already problematic:


kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/151/jdk=latest1.7,label=Hadoop/consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.client.TestHCM
Hanging test : org.apache.hadoop.hbase.regionserver.TestHRegion
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController
Hanging test : org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer
Printing Failing tests
Failing test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint

or

kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/150/jdk=latest1.7,label=Hadoop/consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
Failing test : org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss
Failing test : org.apache.hadoop.hbase.replication.TestReplicationEndpoint


1.2 builds have been failing for a while. Will be back to fix the failures.
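findHangingTests.py reports each zombie as a `Hanging test : <class>` line. To see which zombies from the patch's precommit run were already hanging before the patch went in, one can intersect the two lists.

The sketch below is a minimal Java version of that check. It assumes the report text has already been unwrapped to one test per line. `HangingTestOverlap`, `parse` and `overlap` are hypothetical names and are not part of dev-support. The sample data is copied from the 1.2 build #151 output above and from PreCommit-HBASE-Build #15446.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which hanging tests appear in two findHangingTests.py reports? */
class HangingTestOverlap {

  // Collects class names from lines shaped like "Hanging test : <class>".
  // Assumes each test sits on a single line (no email wrapping).
  static Set<String> parse(String report, String prefix) {
    Set<String> tests = new TreeSet<>(); // sorted, for stable output
    for (String line : report.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.startsWith(prefix)) {
        tests.add(trimmed.substring(prefix.length()).trim());
      }
    }
    return tests;
  }

  // Tests hanging in both runs; if they hang before the patch too,
  // the patch is unlikely to be what introduced them.
  static Set<String> overlap(Set<String> a, Set<String> b) {
    Set<String> common = new TreeSet<>(a);
    common.retainAll(b);
    return common;
  }

  public static void main(String[] args) {
    String prePatch = String.join("\n",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2",
        "Hanging test : org.apache.hadoop.hbase.client.TestHCM",
        "Hanging test : org.apache.hadoop.hbase.regionserver.TestHRegion",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController",
        "Hanging test : org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer");
    String preCommit = String.join("\n",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination",
        "Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2",
        "Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL");
    String prefix = "Hanging test : ";
    overlap(parse(prePatch, prefix), parse(preCommit, prefix)).forEach(System.out::println);
  }
}
```

With this data the program prints `org.apache.hadoop.hbase.security.access.TestAccessController2` and then `org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization`. Both of these zombies already hang in the pre-patch build.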




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-06 Thread stack (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733241#comment-14733241]

stack commented on HBASE-14317:
---

This failed run has these zombies:

kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15446//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence


Some overlap.

I'm just going to commit this fat patch and then work on these seemingly 
unrelated zombies.



> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.branch-1.txt, 14317.branch-1.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt, 
> 14317v11.txt, 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll the log because we can't append (see HDFS-8960), but we get stuck.
> See the attached thread dump and associated log. What is interesting is that
> the syncers are waiting to take syncs to run, and at the same time we want to
> flush, so we are waiting on a safe point; but there seems to be nothing in our
> ring buffer. Did we go to roll the log without adding a safe-point sync to clear
> out the ring buffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-06 Thread Hadoop QA (JIRA)


[https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733171#comment-14733171]

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754410/timeouts.branch-1.txt
  against branch-1 branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754410

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:

{color:red}-1 core zombie tests{color}.  There are 6 zombie test(s):
at org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL.testScanForUserWithFewerLabelAuthsThanLabelsInScanAuthorizations(TestVisibilityLabelsWithACL.java:121)
at org.apache.hadoop.hbase.security.access.TestScanEarlyTermination.testEarlyScanTermination(TestScanEarlyTermination.java:148)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15446//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15446//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15446//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15446//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732242#comment-14732242
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754360/14317.branch-1.v2.txt
  against branch-1 branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754360

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:

{color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):
at org.apache.hadoop.hbase.security.access.TestAccessController.testCheckPermissions(TestAccessController.java:1632)
at org.apache.openjpa.persistence.datacache.TestClearableScheduler.testBasic(TestClearableScheduler.java:76)
at org.apache.openjpa.persistence.test.AbstractPersistenceTestCase.runTest(AbstractPersistenceTestCase.java:578)
at org.apache.openjpa.persistence.test.AbstractPersistenceTestCase.runBare(AbstractPersistenceTestCase.java:565)
at org.apache.openjpa.persistence.test.AbstractPersistenceTestCase.runBare(AbstractPersistenceTestCase.java:541)
at org.apache.openjpa.persistence.test.AbstractPersistenceTestCase.run(AbstractPersistenceTestCase.java:206)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15444//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15444//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15444//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15444//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-06 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732447#comment-14732447
 ] 

stack commented on HBASE-14317:
---

It says these HBase tests are zombies:

Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessControlFilter



[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731825#comment-14731825
 ] 

stack commented on HBASE-14317:
---

Velocity errors again.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731830#comment-14731830
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754317/14317.branch-1.v2.txt
  against branch-1 branch at commit a11f5c55b4d247c3ac0950398624383ec38e6f1b.
  ATTACHMENT ID: 12754317

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15432//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732084#comment-14732084
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754333/14317.branch-1.v2.txt
  against branch-1 branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754333

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15439//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15439//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15439//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15439//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732022#comment-14732022
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754328/14317.branch-1.v2.txt
  against branch-1 branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754328

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15434//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732025#comment-14732025
 ] 

Sean Busbey commented on HBASE-14317:
-

Are these all on H2? Can we just blacklist it?

-- 
Sean




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732040#comment-14732040
 ] 

stack commented on HBASE-14317:
---

I was just looking, and yeah, they seem to be on H2. Let's blacklist it.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732043#comment-14732043
 ] 

stack commented on HBASE-14317:
---

Looks like you did it, [~busbey]. Thanks. Let me retry the patch.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732145#comment-14732145
 ] 

stack commented on HBASE-14317:
---

Hanging tests are:

Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.client.TestHCM
Hanging test : org.apache.hadoop.hbase.security.access.TestCellACLs
Printing Failing tests
Failing test : org.apache.hadoop.hbase.util.TestHBaseFsck

... these seem unrelated to the change. Let me retry.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732177#comment-14732177
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754352/14317.branch-1.v2.txt
  against branch-1 branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754352

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at org.apache.hadoop.hbase.client.TestHCM.testConnectionClose(TestHCM.java:398)
at 
org.apache.hadoop.hbase.client.TestHCM.testConnectionCloseAllowsInterrupt(TestHCM.java:281)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15442//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15442//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15442//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15442//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732181#comment-14732181
 ] 

stack commented on HBASE-14317:
---

Hanging tests this time are:
kalashnikov:hbase.git.commit stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15442//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.client.TestHCM
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessControlFilter
Printing Failing tests

There are some items in common. Let me retry and look at what happens locally. 
I've been trying to run all tests locally too. The last two runs seem to be 
missing TestWithDisabledAuthorization, TestAccessController2, and 
TestAccessControlFilter. The other tests all pass for me. Let me rerun and try 
these tests locally too.
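For reference, the kind of check that `findHangingTests.py` reports can be sketched as follows. This is a hypothetical Java re-implementation, not the script itself. It assumes Surefire-style console lines (`Running <class>` when a test class starts, `Tests run: ... - in <class>` when it finishes) and treats any class that started but never finished as hanging. `HangingTestFinder` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: list test classes that started but never reported a result. */
class HangingTestFinder {
  // Assumed Surefire line formats; the real script may match differently.
  private static final Pattern STARTED = Pattern.compile("^Running (\\S+)");
  private static final Pattern FINISHED = Pattern.compile("Tests run: .* - in (\\S+)");

  static List<String> findHanging(List<String> consoleLines) {
    Set<String> running = new LinkedHashSet<>(); // keeps start order for readable output
    for (String line : consoleLines) {
      Matcher started = STARTED.matcher(line);
      if (started.find()) {
        running.add(started.group(1));
        continue;
      }
      Matcher finished = FINISHED.matcher(line);
      if (finished.find()) {
        running.remove(finished.group(1)); // class completed, so it is not hanging
      }
    }
    return new ArrayList<>(running);
  }
}
```

Under these assumptions, a class such as `TestHCM` would be reported if the build was killed before its `Tests run:` summary line was printed.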

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730700#comment-14730700
 ] 

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-TRUNK #6780 (See 
[https://builds.apache.org/job/HBase-TRUNK/6780/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL; addendum2 -- 
found a fix testing the branch-1 patch (stack: rev 
ec4d719f1927576d3de321c2e380e4c4acd099db)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730963#comment-14730963
 ] 

Sean Busbey commented on HBASE-14317:
-

They were both on H2, so the second failure is probably just the stored 
dependency from the first (or an earlier) failure. On the plus side, this means 
the next job to run on H2 will also fail, and then I'll have the LICENSE file.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730962#comment-14730962
 ] 

Sean Busbey commented on HBASE-14317:
-

Dang it. We don't save the generated LICENSE/NOTICE files, so my new debug code 
to see the bad dependency doesn't help. I updated the precommit job to start 
saving them. :/

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730959#comment-14730959
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754211/14317.branch-1.txt
  against branch-1 branch at commit ec4d719f1927576d3de321c2e380e4c4acd099db.
  ATTACHMENT ID: 12754211

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15416//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731824#comment-14731824
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754316/14317.branch-1.v2.txt
  against branch-1 branch at commit a11f5c55b4d247c3ac0950398624383ec38e6f1b.
  ATTACHMENT ID: 12754316

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15430//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731807#comment-14731807
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754313/14317.branch-1.v2.txt
  against branch-1 branch at commit 2969093b5b39cb950d8710cfffa7e55484d40259.
  ATTACHMENT ID: 12754313

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15427//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731820#comment-14731820
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754314/14317.branch-1.v2.txt
  against branch-1 branch at commit 2969093b5b39cb950d8710cfffa7e55484d40259.
  ATTACHMENT ID: 12754314

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15429//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730452#comment-14730452
 ] 

stack commented on HBASE-14317:
---

Testing branch-1, I found this little hole. I committed the fix as an addendum 
to the master branch. It is included in the patch I've posted for branch-1.

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
index 5708c30..c421f5c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
@@ -878,8 +878,19 @@ public class FSHLog implements WAL {
         // Let the writer thread go regardless, whether error or not.
         if (zigzagLatch != null) {
           zigzagLatch.releaseSafePoint();
-          // It will be null if we failed our wait on safe point above.
-          if (syncFuture != null) blockOnSync(syncFuture);
+          // syncFuture will be null if we failed our wait on safe point above. Otherwise, if
+          // latch was obtained successfully, the sync we threw in either trigger the latch or it
+          // got stamped with an exception because the WAL was damaged and we could not sync. Now
+          // the write pipeline has been opened up again by releasing the safe point, process the
+          // syncFuture we got above. This is probably a noop but it may be stale exception from
+          // when old WAL was in place. Catch it if so.
+          if (syncFuture != null) {
+            try {
+              blockOnSync(syncFuture);
+            } catch (IOException ioe) {
+              if (LOG.isTraceEnabled()) LOG.trace("Stale sync exception", ioe);
+            }
+          }
         }
       } finally {
         scope.close();
{code}
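The pattern this addendum adds can be sketched in isolation: release the safe point first, then drain the sync that was thrown in, and treat any exception on it as stale. This is a minimal sketch that assumes a `CompletableFuture<Long>` stands in for FSHLog's `SyncFuture`. `StaleSyncHandling` and `drainSafePointSync` are hypothetical names, and the `blockOnSync` here only mimics the role of the real method. A stderr print replaces the TRACE-level log call.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical sketch of the addendum's pattern; not the HBase implementation. */
class StaleSyncHandling {
  /** Blocks until the sync completes and rethrows its failure as an IOException. */
  static void blockOnSync(CompletableFuture<Long> syncFuture) throws IOException {
    try {
      syncFuture.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted waiting on sync", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException(cause);
    }
  }

  /**
   * Called after the safe point has been released and the new writer is in place. An exception
   * stamped on the sync by the old, damaged WAL is stale, so it is logged instead of failing the
   * roll. Returns true if a stale exception was swallowed.
   */
  static boolean drainSafePointSync(CompletableFuture<Long> syncFuture) {
    if (syncFuture == null) {
      return false; // we never attained the safe point, so there is nothing to drain
    }
    try {
      blockOnSync(syncFuture);
      return false;
    } catch (IOException ioe) {
      System.err.println("Stale sync exception: " + ioe.getMessage()); // FSHLog logs at TRACE
      return true;
    }
  }
}
```

The design point the sketch illustrates is ordering. The sync is waited on only after `releaseSafePoint()`, so a failure from the old writer cannot block or abort the roll that replaced it.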

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730453#comment-14730453
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754161/14317.branch-1.txt
  against branch-1 branch at commit 54717a6314ef6673f7607091e5f77321c202d49f.
  ATTACHMENT ID: 12754161

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15412//console

This message is automatically generated.

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729538#comment-14729538
 ] 

stack commented on HBASE-14317:
---

Ran on a small cluster (1B ITBLL with monkeys; confirmed all data is there). 
Checked logs. No hangs and no complaints related to this patch, just the usual 
complaints about slow HDFS, including stuff like this:

2015-09-02 23:56:52,790 WARN  [regionserver/c2023.halxg.cloudera.com/10.20.84.29:16020.logRoller] hdfs.DFSClient: Slow waitForAckedSeqno took 2577ms (threshold=20ms)

Also DFS client complaints and exceptions, but nothing from the RS or related 
to the WAL.

Looking at the failed test: on the one hand, the lease on all WALs was just 
robbed out from under the cluster. Let me make sure the failure is due to the 
stricter semantic and not some other byproduct. On the other hand, looking at 
it, we should be able to ride over the HDFS restart. Will be back.


> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729578#comment-14729578
 ] 

stack commented on HBASE-14317:
---

bq. Brilliant!

Smile. This is how it was working. I just broke it by not allowing 
syncs-after-failed-appends. Sorry if I gave the wrong impression. Smile.
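To make the distinction concrete, here is a toy model contrasting the old lenient behaviour (a sync may go in and succeed after a failed append) with the stricter rule (nothing succeeds once the WAL has failed). ToyWal and its methods are hypothetical illustrations, not HBase's FSHLog API.

```java
import java.io.IOException;

// Hypothetical toy model; NOT HBase's FSHLog. Contrasts the old lenient behaviour
// (a sync may follow a failed append) with the stricter rule (refuse until replaced).
class ToyWal {
  private final boolean strict;     // true = no appends/syncs after a failure
  private IOException firstFailure; // first append failure seen by this WAL instance

  ToyWal(boolean strict) {
    this.strict = strict;
  }

  // Simulated append; failNow injects an HDFS-style error such as a bad datanode.
  void append(boolean failNow) throws IOException {
    if (strict && firstFailure != null) {
      throw new IOException("WAL already failed; replace it first", firstFailure);
    }
    if (failNow) {
      IOException e = new IOException("append failed");
      if (firstFailure == null) {
        firstFailure = e;
      }
      throw e;
    }
  }

  // Lenient: the earlier failure is ignored and the sync "succeeds".
  // Strict: the sync is refused, so nobody carries on as if the append worked.
  void sync() throws IOException {
    if (strict && firstFailure != null) {
      throw new IOException("cannot sync: an earlier append failed", firstFailure);
    }
  }
}
```

With strict = true, a sync after a failed append throws, which is why riding over an HDFS restart stops working until the WAL is replaced.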

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729572#comment-14729572
 ] 

Nick Dimiduk commented on HBASE-14317:
--

bq. Looking at it, we should be able to ride over the HDFS restart.

Brilliant!

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728723#comment-14728723
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753923/14317v13.txt
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753923

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730287#comment-14730287
 ] 

Elliott Clark commented on HBASE-14317:
---

+1 still stands. The extra code clean ups are nice.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730301#comment-14730301
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754134/14317v15.txt
  against master branch at commit 2481b7f76fa7e4f2b120f8dc96004790b357e569.
  ATTACHMENT ID: 12754134

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730227#comment-14730227
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754104/14317v14.txt
  against master branch at commit 5152ac0e208fd5f720734fb2abf3fae07b39c7e2.
  ATTACHMENT ID: 12754104

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1839 checkstyle errors (more than the master's current 1838 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730360#comment-14730360
 ] 

Hudson commented on HBASE-14317:


FAILURE: Integrated in HBase-TRUNK #6778 (See 
[https://builds.apache.org/job/HBase-TRUNK/6778/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev 661faf6fe0833726d7ce7ad44a829eba3f8e3e45)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL; addendum (stack: rev 54717a6314ef6673f7607091e5f77321c202d49f)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java


> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730243#comment-14730243
 ] 

stack commented on HBASE-14317:
---

[~eclark] Your +1 still stand?

Changes since then: comments in the code on the new, stiffer semantic; comments 
noting that our latch in getSequenceId is a bad idea (because there is no 
registry we can go to in order to cancel all outstanding waiters on abort, log 
roll, etc.); wrapping all sync and append exceptions in a DamagedWALException so 
it is clear where they originated and what they imply; and a few small test 
fixes. I posted this last patch to RB. Will fix checkstyle.
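A minimal sketch of the exception-wrapping idea: every append or sync failure is rethrown as one WAL-specific type that keeps the original cause, so callers know where it came from and that the WAL is now unusable. RawWriter, WrappingWal and ToyDamagedWALException are hypothetical stand-ins; HBase's real DamagedWALException lives in org.apache.hadoop.hbase.regionserver.wal (per the commit file list) and its constructors may differ.

```java
import java.io.IOException;

// Hypothetical stand-in for HBase's DamagedWALException; the real class may differ.
class ToyDamagedWALException extends IOException {
  ToyDamagedWALException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Hypothetical low-level writer whose calls can fail with raw IOExceptions.
interface RawWriter {
  void append(byte[] entry) throws IOException;

  void sync() throws IOException;
}

// Wraps every append/sync failure so callers see one WAL-specific exception type
// with the original IOException preserved as the cause.
final class WrappingWal {
  private final RawWriter writer;

  WrappingWal(RawWriter writer) {
    this.writer = writer;
  }

  void append(byte[] entry) throws ToyDamagedWALException {
    try {
      writer.append(entry);
    } catch (IOException e) {
      throw new ToyDamagedWALException("append failed; WAL must be replaced", e);
    }
  }

  void sync() throws ToyDamagedWALException {
    try {
      writer.sync();
    } catch (IOException e) {
      throw new ToyDamagedWALException("sync failed; WAL must be replaced", e);
    }
  }
}
```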

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726977#comment-14726977
 ] 

stack commented on HBASE-14317:
---

Failed because of this:

Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1]



> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 
> 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726822#comment-14726822
 ] 

Elliott Clark commented on HBASE-14317:
---

bq. TestHRegion.testFlushMarkersWALFail
That looks related?

Everything else looks good to me.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726855#comment-14726855
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753695/14317v9.txt
  against master branch at commit f8dd99d7380e5eafae62a9f0c526ba24f98eb2e5.
  ATTACHMENT ID: 12753695

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15387//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 
> 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727661#comment-14727661
 ] 

stack commented on HBASE-14317:
---

testFlushMarkersWALFail failure was interesting. The Mockito matcher was 
failing for me; fields were null. Undid the Mockito matcher for this test.

The second issue has to do with sloppy semantics. Previously, an append could 
throw an exception and then a sync could go in and succeed. You could then 
carry on using the WAL as though no exception had been thrown.

This patch hardens our semantic such that once a WAL throws an exception, no 
new appends or syncs will succeed, not until you replace the WAL. For 
testFlushMarkersWALFail, because there is no log rolling thread running, we'd 
just hang, making no progress, because the WAL had gone bad. I added forced log 
rolling after each test step. Also fixed weird stuff like this:

{code}
 } catch (IOException ioe) {
-  LOG.warn("Unexpected exception while wal.sync(), ignoring. Exception: "
-  + StringUtils.stringifyException(ioe));
+  wal.abortCacheFlush(this.getRegionInfo().getEncodedNameAsBytes());
+  throw ioe;
 }
{code}

See how we used to ignore a failed sync and just log it at WARN.
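The pattern in that diff, sketched outside HBase: when the sync of a flush fails, undo the flush bookkeeping and rethrow instead of logging at WARN and carrying on. FlushWal and FlushStep are hypothetical; only the abortCacheFlush name is taken from the diff, and its signature here is an assumption.

```java
import java.io.IOException;

// Hypothetical slice of a WAL as seen by a memstore flush; not HBase's WAL interface.
interface FlushWal {
  void sync() throws IOException;

  void abortCacheFlush(byte[] encodedRegionName);
}

final class FlushStep {
  private FlushStep() {}

  // Old code logged the failed sync at WARN and carried on. This sketch instead
  // undoes the flush bookkeeping and rethrows, so the caller sees the failure.
  static void syncFlushMarker(FlushWal wal, byte[] encodedRegionName) throws IOException {
    try {
      wal.sync();
    } catch (IOException ioe) {
      wal.abortCacheFlush(encodedRegionName);
      throw ioe;
    }
  }
}
```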

One implication of the hardened semantic is that the dodgy trick of getting a 
sequenceid by adding an 'empty append' now fails if the WAL is bad. A log roll 
fixes it. I've been seeing some of this in tests, and the fix is to add a log 
roll (in a server the log rolling thread is running; not so in tests of 
regions only).
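A toy illustration of that recovery path: once the WAL is marked broken, the empty-append route to a sequence id fails until a roll swaps in a fresh writer. RollableToyWal and its methods are hypothetical, and the assumption that sequence ids keep increasing across a roll belongs to this sketch, not to the patch.

```java
import java.io.IOException;

// Hypothetical toy WAL: once broken, it refuses work until rollWriter() swaps in
// a fresh writer. Not HBase's FSHLog or LogRoller.
final class RollableToyWal {
  private long nextSequenceId = 1; // assumed to keep increasing across rolls
  private boolean broken;
  private int generation;          // bumps on every roll, standing in for a new WAL file

  // Simulates an append or sync failure that damages the current writer.
  void markBroken() {
    broken = true;
  }

  // Stand-in for obtaining a sequence id via an "empty append".
  long emptyAppendForSequenceId() throws IOException {
    if (broken) {
      throw new IOException("WAL is damaged; roll it before appending");
    }
    return nextSequenceId++;
  }

  // Replaces the damaged writer; in a server a log-roller thread would call this,
  // while a region-only test has to call it itself.
  void rollWriter() {
    generation++;
    broken = false;
  }

  int generation() {
    return generation;
  }
}
```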

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728450#comment-14728450
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753885/14317v12.txt
  against master branch at commit a5261b6f44f338e3f4bd46fb29bed2c30e223bd4.
  ATTACHMENT ID: 12753885

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestFailedAppendAndSync

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile.testLosingFileAfterScannerInit(TestCorruptedRegionStoreFile.java:173)
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnPipelineRestart(TestLogRolling.java:490)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15398//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15398//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15398//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15398//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728214#comment-14728214
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753849/14317v11.txt
  against master branch at commit a5261b6f44f338e3f4bd46fb29bed2c30e223bd4.
  ATTACHMENT ID: 12753849

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15396//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15396//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15396//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15396//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727186#comment-14727186
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753719/14317v10.txt
  against master branch at commit f8dd99d7380e5eafae62a9f0c526ba24f98eb2e5.
  ATTACHMENT ID: 12753719

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushMarkersWALFail(TestHRegion.java:1073)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15390//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15390//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15390//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15390//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-02 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727193#comment-14727193
 ] 

Sean Busbey commented on HBASE-14317:
-

{quote}
Failed because of this:

Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
{quote}

That's HBASE-14337. If I could get a review on v2 there, we can figure out 
which dependency is getting corrupted.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725940#comment-14725940
 ] 

Elliott Clark commented on HBASE-14317:
---

bq. I think we should be able to have more finesse than what is here where we stamp out everything
That would be nice, though I wasn't sure how to do it since we don't know for 
sure in what order the subsequent syncs will come in. If I just stamp on one 
sync, it's possible that some other thread takes that sync and fails while the 
failed append thread creates a new sync.

bq. I think a sync could come in even after we've made all our noise stamping on everything
We fail the syncs on the append thread, so the only new syncs that can come in 
will arrive after this single-threaded method completes. Anything racing with 
this method can succeed without changing correctness.




> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725888#comment-14725888
 ] 

stack commented on HBASE-14317:
---

Trying this out as a fix for the hang: we can fall into the wait on the 
zigzagLatch even though all outstanding appends and syncs are failing and will 
never complete (so they will never push the synced sequence number up past the 
current sequence id).

{code}
@@ -1792,9 +1797,10 @@ public class FSHLog implements WAL {
   // If here, another thread is waiting on us to get to safe point.  Don't leave it hanging.
   try {
     // Wait on outstanding syncers; wait for them to finish syncing (unless we've been
-    // shutdown or unless our latch has been thrown because we have been aborted).
+    // shutdown or unless our latch has been thrown because we have been aborted or unless
+    // this WAL is broken and we can't get a sync/append to complete).
     while (!this.shutdown && this.zigzagLatch.isCocked() &&
-        highestSyncedSequence.get() < currentSequence) {
+        highestSyncedSequence.get() < currentSequence && this.syncFuturesCount > 0) {
{code}
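
To see why the extra clause matters, here is the loop condition from the diff pulled out into a pure function. This is a sketch: the parameter names mirror the fields in the diff, but the class name `SafePointWait` and modelling `zigzagLatch.isCocked()` as a plain boolean are assumptions for illustration.

```java
/**
 * Sketch of the safe-point wait condition from the diff, as a pure function.
 * Parameter names mirror the fields in the diff; treating the zigzag latch
 * state as a plain boolean is an assumption made for illustration.
 */
public final class SafePointWait {
  private SafePointWait() {}

  /** Returns true while the waiting thread should keep waiting. */
  public static boolean keepWaiting(boolean shutdown, boolean latchCocked,
      long highestSyncedSequence, long currentSequence, int syncFuturesCount) {
    return !shutdown
        && latchCocked
        && highestSyncedSequence < currentSequence
        // Added clause: in this model, with no sync futures outstanding nothing
        // will ever advance highestSyncedSequence, so waiting would hang forever.
        && syncFuturesCount > 0;
  }

  public static void main(String[] args) {
    // Broken WAL: no syncs outstanding and the sequence never caught up.
    System.out.println(keepWaiting(false, true, 10L, 12L, 0)); // false
    // Healthy WAL: syncs still in flight, so keep waiting for them.
    System.out.println(keepWaiting(false, true, 10L, 12L, 3)); // true
  }
}
```

Without the last clause the first call would return true forever, which is the hang described above.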

I'll be back on the highly unlikely but possible case where an append fails but 
a sync does not (a sync may be ongoing at the time of the append and may 
'finish' 'successfully' after the append has failed, so... let me see).

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724844#comment-14724844
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753464/repro.txt
  against master branch at commit 498c1845ab7b01710955153c27501fdc7492849d.
  ATTACHMENT ID: 12753464

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15378//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724870#comment-14724870
 ] 

stack commented on HBASE-14317:
---

Patch is for 1.2. Will make a patch for master when I have a fix.

[~eclark] I took a look at your patch. I get now what you mean: a sync after a 
failed append should always fail. Agree. Let's fix that too. I think we should 
be able to have more finesse than what is here, where we stamp out everything 
(smile). I think a sync could come in even after we've made all our noise 
stamping on everything (let me do the server mocks the way you have them in the 
patch too... and integrate your test).
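
A hypothetical sketch of the "sync after a failed append should always fail" rule, assuming appends and syncs are handled in order on one consumer thread (as Elliott describes for the append thread). `StickyFailure` and its methods are illustrative names, not HBase code; the failure stays sticky until a roll.

```java
import java.io.IOException;

/**
 * Hypothetical sketch, not HBase code: appends and syncs are handled in order
 * on one consumer thread. Once an append fails, every later sync fails too,
 * because the edit it would cover may never have reached the log.
 */
public class StickyFailure {
  private IOException appendFailure; // null until an append fails

  /** Called on the consumer thread for each append. */
  public void onAppend(boolean succeeded) {
    if (!succeeded && appendFailure == null) {
      appendFailure = new IOException("append failed; WAL needs a roll");
    }
  }

  /** Called on the consumer thread for each sync; returns null on success. */
  public IOException onSync() {
    return appendFailure;
  }

  /** Rolling onto a new writer clears the sticky failure. */
  public void onRoll() {
    appendFailure = null;
  }

  public static void main(String[] args) {
    StickyFailure wal = new StickyFailure();
    wal.onAppend(false);
    wal.onAppend(true);                       // a later append "succeeds"...
    System.out.println(wal.onSync() != null); // ...but the sync still fails: true
    wal.onRoll();
    System.out.println(wal.onSync() == null); // true after the roll
  }
}
```

Because both methods run on the one thread, a sync that arrives after the failed append is always handled after `onAppend(false)` and so always sees the failure; no stamping of individual sync futures is needed in this model.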

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726684#comment-14726684
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12753621/14317v5.branch-1.2.txt
  against branch-1.2 branch at commit 5bb36f1594f7bc70d245ffa475e1393964c496b0.
  ATTACHMENT ID: 12753621

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 4 zombie test(s):   
at 
org.apache.hadoop.hdfs.server.namenode.TestStorageRestore.testDfsAdminCmd(TestStorageRestore.java:260)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushMarkersWALFail(TestHRegion.java:1337)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15386//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15386//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15386//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15386//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726394#comment-14726394
 ] 

stack commented on HBASE-14317:
---

https://reviews.apache.org/r/38024/

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-31 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724013#comment-14724013
 ] 

stack commented on HBASE-14317:
---

This is the new bit in your patch:

{code}
for (int i = 0; i < syncFutures.length; i++) {
  if (syncFutures[i] != null) {
    this.syncFutures[i].done(sequence, e);
  }
}
{code}

... running through all possible sync futures, i.e. as many slots as there are 
handlers, rather than only the counted ones? Are you thinking we've just put a 
sync future in but have not yet updated the count of futures? Is that possible, 
given it's a single thread doing both the syncFutures addition and the count 
increment?
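
For comparison, a sketch of the two ways to fail the outstanding sync futures: the full-array scan from the patch and a loop bounded by the count. `FailSyncs` and this minimal `SyncFuture` are hypothetical stand-ins, not HBase classes; the one-slot-per-handler sizing follows the comment above.

```java
/**
 * Hypothetical sketch of two ways to fail the outstanding sync futures.
 * FailSyncs and this minimal SyncFuture are stand-ins, not HBase classes.
 */
public class FailSyncs {
  /** Minimal stand-in for a sync future: records how it was completed. */
  public static final class SyncFuture {
    public long doneSequence = -1;
    public Throwable error;

    public void done(long sequence, Throwable t) {
      this.doneSequence = sequence;
      this.error = t;
    }
  }

  private final SyncFuture[] syncFutures;
  private int syncFuturesCount = 0;

  public FailSyncs(int handlerCount) {
    // One slot per handler: each handler has at most one sync outstanding.
    this.syncFutures = new SyncFuture[handlerCount];
  }

  /** The same single thread stores the future and bumps the count. */
  public void add(SyncFuture f) {
    syncFutures[syncFuturesCount++] = f;
  }

  /** Patch style: scan every slot in the array, skipping empty ones. */
  public void failAllByScan(long sequence, Throwable e) {
    for (int i = 0; i < syncFutures.length; i++) {
      if (syncFutures[i] != null) {
        syncFutures[i].done(sequence, e);
      }
    }
  }

  /** Count style: only slots [0, syncFuturesCount) can hold a future. */
  public void failAllByCount(long sequence, Throwable e) {
    for (int i = 0; i < syncFuturesCount; i++) {
      syncFutures[i].done(sequence, e);
    }
  }
}
```

When one thread does both the store and the increment, the two loops touch the same futures as long as slots are nulled when a batch is handed off. If slots are not cleared, the full scan would also re-complete stale futures from an earlier batch, which argues for the count-bounded loop. Whether FSHLog clears its slots is not shown here.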

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, HBASE-14317.patch, 
> [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, 
> append-only-test.patch, raw.php, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724103#comment-14724103
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12753362/append-only-test.patch
  against master branch at commit 498c1845ab7b01710955153c27501fdc7492849d.
  ATTACHMENT ID: 12753362

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15358//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15358//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15358//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15358//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317-v1.patch, HBASE-14317.patch, 
> [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, 
> append-only-test.patch, raw.php, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.




[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724304#comment-14724304
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753396/HBASE-14317-v4.patch
  against master branch at commit 498c1845ab7b01710955153c27501fdc7492849d.
  ATTACHMENT ID: 12753396

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15369//testReport/
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15369//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15369//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-31 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724342#comment-14724342
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753371/HBASE-14317-v1.patch
  against master branch at commit 498c1845ab7b01710955153c27501fdc7492849d.
  ATTACHMENT ID: 12753371

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestProcessBasedCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15360//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15360//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15360//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15360//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721134#comment-14721134
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753138/14317.test.txt
  against master branch at commit 4256128fa248b31c0482bdfc2510011771f84037.
  ATTACHMENT ID: 12753138

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15324//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721199#comment-14721199
 ] 

Elliott Clark commented on HBASE-14317:
---

{code}
15/08/29 11:02:00 FATAL wal.FSHLog: Waited too long in attainSafePoint. Waiting to get to seqId=36243253 However we are only at seqId=36243210 after waiting 6
{code}

Just had this happen. Here's the log if it helps. There were way more than just 
one seqid that was stuck.
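Reading the two seqIds in the FATAL line above, and assuming (the log alone doesn't prove it) that every id in the gap belongs to an append whose sync never came home, the roller was short by

$$36243253 - 36243210 = 43$$

sequence ids, which fits with more than one seqid being stuck.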


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721233#comment-14721233
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753148/san_dump.txt
  against master branch at commit 4256128fa248b31c0482bdfc2510011771f84037.
  ATTACHMENT ID: 12753148

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
or dev-support patch that doesn't require tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15327//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721223#comment-14721223
 ] 

stack commented on HBASE-14317:
---

The ringbuffer processor is blocked waiting on outstanding syncs to come in:

{code}
regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1 #140 prio=5 os_prio=0 tid=0x7fbf5cc61800 nid=0xb2 in Object.wait() [0x7fbf3a115000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:2024)
    - locked 0x000548756b60 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1999)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1910)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

All processing of the ringbuffer is held up until we attain the safe point, i.e. 
until all syncers come home. (This is by design: we are trying to roll the log, 
so no more edits are allowed in.) The same 'hang' shows up over in HBASE-13974; 
look at its jstack1.txt. The 'fix' over in HBASE-13974 releases threads that are 
waiting on their sequenceid to come home; those threads sit in the ring buffer 
behind the current point of processing/blockage. It looks like the blockage 
itself would persist after the HBASE-13974 timeout 'fires'. The [~eclark] patch 
attached here, where we time out the root block, would be a better workaround 
IMO until we have a proper fix.

Still trying to manufacture the block 'naturally'.
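To make the 'by design' part concrete, here is a minimal Java sketch of a roller/consumer safe-point handshake. It is a simplification under assumptions, not FSHLog's code: the class and method names (`SafePointHandshake`, `attainAndPark`, `waitForSafePoint`, `release`) are hypothetical, and the string "syncers home" stands in for all outstanding syncs completing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical, simplified safe-point handshake; names are illustrative and
 * are not FSHLog's API. The log roller waits for the ring-buffer consumer to
 * reach a safe point; the consumer then parks, admitting no new edits, until
 * the roller releases it after the roll.
 */
public class SafePointHandshake {
  private final CountDownLatch attained = new CountDownLatch(1);
  private final CountDownLatch released = new CountDownLatch(1);

  /** Consumer side: call only once all outstanding syncs have come home. */
  void attainAndPark() {
    attained.countDown();
    awaitQuietly(released); // consumer processes nothing while parked
  }

  /** Roller side: blocks until the consumer reaches the safe point. */
  void waitForSafePoint() {
    awaitQuietly(attained);
  }

  /** Roller side: lets the consumer resume once the new WAL is in place. */
  void release() {
    released.countDown();
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  /** Runs one roll and returns the order in which things happened. */
  static List<String> demo() {
    List<String> events = Collections.synchronizedList(new ArrayList<>());
    SafePointHandshake handshake = new SafePointHandshake();
    Thread consumer = new Thread(() -> {
      events.add("syncers home"); // stand-in for outstanding syncs completing
      handshake.attainAndPark();
      events.add("consumer resumed");
    });
    consumer.start();
    handshake.waitForSafePoint();
    events.add("wal rolled"); // happens while the consumer is parked
    handshake.release();
    try {
      consumer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [syncers home, wal rolled, consumer resumed]
  }
}
```

If a sync never comes home, the consumer never reaches `attainAndPark()` and `waitForSafePoint()` blocks forever, which is the shape of the hang in the thread dump above.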


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721201#comment-14721201
 ] 

stack commented on HBASE-14317:
---

You have HBASE-13971 [~eclark]


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-29 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721234#comment-14721234
 ] 

Elliott Clark commented on HBASE-14317:
---

The SyncRunners are waiting on new work to process. 
The log roller is waiting for us to get to the sequence number.

It looks like somewhere between getting the sequence numbers assigned and 
queueing the syncs, something is erroring out and we're not handling it 
correctly.
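A hypothetical Java sketch of the failure pattern suspected here, not FSHLog's code (`LostSyncSketch`, `append` and its boolean flags are made up for illustration): a sequence id is handed out, an exception fires before the sync is queued, and unless the error path completes the pending future, whatever waits on that id is stuck.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration of the suspected bug pattern; names are
 * illustrative, not FSHLog's. A sequence id is assigned, then something
 * throws before the matching sync is queued.
 */
public class LostSyncSketch {
  private final AtomicLong nextSeq = new AtomicLong(1);
  private final Map<Long, CompletableFuture<Long>> pending = new ConcurrentHashMap<>();

  CompletableFuture<Long> append(boolean failBeforeSyncQueued, boolean failPendingOnError) {
    long seq = nextSeq.getAndIncrement(); // sequence id assigned first
    CompletableFuture<Long> syncDone = new CompletableFuture<>();
    pending.put(seq, syncDone);
    try {
      if (failBeforeSyncQueued) {
        throw new IllegalStateException("simulated error before the sync is queued");
      }
      // Stand-in for queueing the sync and having it succeed.
      pending.remove(seq).complete(seq);
    } catch (IllegalStateException e) {
      if (failPendingOnError) {
        // Handled path: wake the waiter with the failure instead of leaving it hanging.
        pending.remove(seq).completeExceptionally(e);
      }
      // Unhandled path: syncDone stays incomplete and seq stays in pending.
    }
    return syncDone;
  }

  /** Number of sequence ids nobody will ever complete. */
  int stuckCount() {
    return pending.size();
  }

  public static void main(String[] args) {
    LostSyncSketch wal = new LostSyncSketch();
    System.out.println(wal.append(false, false).join()); // 1
    System.out.println(wal.append(true, false).isDone()); // false: stuck forever
    System.out.println(wal.append(true, true).isCompletedExceptionally()); // true
    System.out.println(wal.stuckCount()); // 1
  }
}
```

The point being illustrated is that every path out of the append, including the error path, has to complete whatever a waiter is blocked on.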


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717381#comment-14717381
 ] 

stack commented on HBASE-14317:
---

Is there anything in the logs showing how we might have skipped out in an 
unorthodox manner? In the original case there was no progress because the 
dfsclient was stuck... no timeout. Perhaps that's the case here, but more likely 
we exited where we shouldn't have. [~eclark]


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717409#comment-14717409
 ] 

Elliott Clark commented on HBASE-14317:
---

Around the same time that is happening on the regionserver I see this on the 
datanode:

{code}
15/08/27 02:19:24 ERROR datanode.DataNode: hbase4537:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.210.81.27:45576 dst: /10.210.81.27:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:730)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:745)
{code}


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717302#comment-14717302
 ] 

Elliott Clark commented on HBASE-14317:
---

Just saw something very like this too. Flushes wait to get a committed seq and 
then fail.
The append thread is just stuck.

{code}
Thread 125 (regionserver/hbase4537.frc3.facebook.com/10.210.81.27:16020.append-pool1-t1):
  State: TIMED_WAITING
  Blocked count: 239951
  Waited count: 37873297
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:460)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:1786)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1761)
    org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1672)
    com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
{code}


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717335#comment-14717335
 ] 

Elliott Clark commented on HBASE-14317:
---

The easiest fix that I see is to time out the wait in attainSafePoint. 
However, it would be nice to know how that got lost.
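A minimal sketch of the timeout idea, assuming a monitor-guarded "highest synced sequence id" like the one the roller waits on. `TimedSafePointWait`, `markSynced` and `awaitSynced` are hypothetical names, and returning `false` (rather than throwing or aborting) is an assumption about what the caller wants.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of bounding the safe-point wait; names are illustrative,
 * not FSHLog's. Instead of waiting forever for the highest synced sequence id
 * to reach the target, the waiter gives up after timeoutMs.
 */
public class TimedSafePointWait {
  private final Object lock = new Object();
  private long highestSynced; // guarded by lock

  /** Called when a sync completes up to and including seq. */
  void markSynced(long seq) {
    synchronized (lock) {
      if (seq > highestSynced) {
        highestSynced = seq;
        lock.notifyAll();
      }
    }
  }

  /** Returns true once highestSynced >= target, false on timeout or interrupt. */
  boolean awaitSynced(long target, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (lock) {
      while (highestSynced < target) {
        long remainingNs = deadline - System.nanoTime();
        if (remainingNs <= 0) {
          return false; // give up instead of blocking the ring buffer forever
        }
        try {
          TimeUnit.NANOSECONDS.timedWait(lock, remainingNs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    TimedSafePointWait w = new TimedSafePointWait();
    w.markSynced(10);
    System.out.println(w.awaitSynced(10, 50)); // true: already there
    System.out.println(w.awaitSynced(11, 50)); // false: nothing ever syncs seq 11
  }
}
```

Measuring the deadline with `System.nanoTime()` keeps the timeout immune to wall-clock jumps; what the caller should do on `false` (fail the roll, abort the server) is not settled by this sketch.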


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717332#comment-14717332
 ] 

Elliott Clark commented on HBASE-14317:
---

Nope, I'm on 2.6.0.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717330#comment-14717330
 ] 

stack commented on HBASE-14317:
---

Are you seeing the companion no-pipeline-recovery issue, HDFS-8960? Are you 
running hadoop 2.7.1? [~eclark]


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717400#comment-14717400
 ] 

Elliott Clark commented on HBASE-14317:
---

About an hour before I know that everything was locked up, I see this:

{code}
15/08/27 02:53:18 WARN wal.FSHLog: Could not append. Requesting close of wal
java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.210.81.27:60319 remote=/10.210.81.27:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1142)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1112)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
15/08/27 02:53:18 ERROR wal.FSHLog: Error syncing, request close of wal
java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.210.81.27:60319 remote=/10.210.81.27:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1142)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1112)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
15/08/27 02:53:18 WARN wal.FSHLog: Could not append. Requesting close of wal
java.net.SocketTimeoutException: 7 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.210.81.27:60319 remote=/10.210.81.27:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1142)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1112)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
{code}

The append and sync errors repeat several times.





[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717769#comment-14717769
 ] 

stack commented on HBASE-14317:
---

That patch doesn't look bad to me. Usually we return immediately out of the
wait on the safe point, so we won't be making all those currentTimeMillis calls
all the time, only when we are actually waiting on a safe point.
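To make the cost argument concrete, here is a minimal sketch, assuming a hypothetical simplified latch (`SafePointLatch` is illustrative only; it is neither the FSHLog class nor the attached patch). The usual case returns straight out of the wait without touching the clock, and `System.currentTimeMillis()` is read only on the slow path, when a caller really has to wait for the safe point. The bounded wait returns `false` instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified safe-point latch for illustration; not HBase's FSHLog code.
class SafePointLatch {
  private final CountDownLatch attained = new CountDownLatch(1);

  // Called by the (hypothetical) ring-buffer consumer once it reaches the safe point.
  void safePointAttained() {
    attained.countDown();
  }

  // Waits at most timeoutMs for the safe point; returns true if it was attained.
  boolean waitSafePoint(long timeoutMs) {
    // Fast path: the safe point is usually already there, so no clock calls at all.
    if (attained.getCount() == 0) {
      return true;
    }
    // Slow path: only now do we read the clock to enforce a deadline.
    long deadline = System.currentTimeMillis() + timeoutMs;
    long remaining = timeoutMs;
    try {
      while (remaining > 0) {
        if (attained.await(remaining, TimeUnit.MILLISECONDS)) {
          return true;
        }
        remaining = deadline - System.currentTimeMillis();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for the caller
    }
    // Timed out (or interrupted): report failure rather than hanging forever.
    return false;
  }
}
```

Returning `false` leaves the decision (for example, abandoning a roll) to the caller instead of blocking it indefinitely.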

 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
 -

 Key: HBASE-14317
 URL: https://issues.apache.org/jira/browse/HBASE-14317
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0, 1.1.1
Reporter: stack
Priority: Critical
 Attachments: HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead 
 DN - Pastebin.com.html, raw.php, subset.of.rs.log


 hbase-1.1.1 and hadoop-2.7.1
 We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
 See attached thread dump and associated log. What is interesting is that 
 syncers are waiting to take syncs to run and at same time we want to flush so 
 we are waiting on a safe point but there seems to be nothing in our ring 
 buffer; did we go to roll log and not add safe point sync to clear out 
 ringbuffer?
 Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717946#comment-14717946
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12752894/HBASE-14317.patch
  against master branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d.
  ATTACHMENT ID: 12752894

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot

 {color:red}-1 core zombie tests{color}.  There are 8 zombie test(s):
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testTableWithCFNameStartWithUnderScore(TestLoadIncrementalHFiles.java:530)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testNonHfileFolder(TestLoadIncrementalHFiles.java:344)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testNonHfileFolderWithUnmatchedFamilyName(TestLoadIncrementalHFiles.java:312)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testNonexistentColumnFamilyLoad(TestLoadIncrementalHFiles.java:298)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingHFileSplit(TestLoadIncrementalHFiles.java:193)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingHFileSplitRowColBloom(TestLoadIncrementalHFiles.java:189)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingRowColBloom(TestLoadIncrementalHFiles.java:140)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingRowBloom(TestLoadIncrementalHFiles.java:128)
at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testWithoutAnExistingTableAndCreateTableSetToNo(TestLoadIncrementalHFiles.java:515)
at org.apache.hadoop.hbase.security.access.TestAccessController2.testCreateWithCorrectOwner(TestAccessController2.java:177)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15310//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15310//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15310//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15310//console

This message is automatically generated.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718051#comment-14718051
 ] 

stack commented on HBASE-14317:
---

This is from the log attached to the original complaint:

{code}
2015-08-23 07:22:26,060 FATAL [regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.24.32.16:10110, 172.24.32.13:10110], original=[172.24.32.16:10110, 172.24.32.13:10110]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
{code}

It looks like yours, in that the complaint is that we cannot append.

If I manufacture a failed append, I can get a hang. The cause is this logic in
the finally block of HRegion#doMiniBatchMutation, and probably in every other
place where we do the append/sync dance. At the end of step 5 we do the WAL
append, and if we get an IOE (which is what you pasted, and what we have in the
original complaint's log), we go to the finally:

{code}
} finally {
  // if the wal sync was unsuccessful, remove keys from memstore
  if (doRollBackMemstore) {
    rollbackMemstore(memstoreCells);
  }
  if (w != null) {
    mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
  }
  ...
{code}

The rollback of edits is fine, but w is not null in the above, so we go on to
complete the insert in mvcc. In there, we ask the walKey for its sequenceid,
which is assigned AFTER we append... only the append failed. So
we wait...
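A minimal, self-contained model of that ordering problem, assuming hypothetical classes (`SketchWalKey` and `AppendFinallySketch` are illustrative, not HBase's `WALKey` or `HRegion`). The sequence id is published only after a successful append, so anything in the finally step that needs it after a failed append never gets an answer. The sketch bounds the wait at 100 ms purely so the failure is observable; per the comment above, the real code just waits.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified WAL key; not HBase's WALKey class.
class SketchWalKey {
  private final CountDownLatch assigned = new CountDownLatch(1);
  private volatile long sequenceId = -1;

  // Only invoked after a successful WAL append.
  void assignSequenceId(long id) {
    sequenceId = id;
    assigned.countDown();
  }

  // Returns the sequence id, or -1 if none was assigned within timeoutMs.
  long awaitSequenceId(long timeoutMs) {
    try {
      if (assigned.await(timeoutMs, TimeUnit.MILLISECONDS)) {
        return sequenceId;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return -1;
  }
}

class AppendFinallySketch {
  // Stand-in for the WAL append: assigns a sequence id only if the write succeeds.
  static void append(SketchWalKey key, boolean diskOk, long nextId) throws IOException {
    if (!diskOk) {
      throw new IOException("Could not append");
    }
    key.assignSequenceId(nextId);
  }

  // Mirrors the append-then-finally flow; returns what the finally step observes.
  static long writeThenFinish(boolean diskOk) {
    SketchWalKey key = new SketchWalKey();
    long seen;
    try {
      append(key, diskOk, 42L);
    } catch (IOException ioe) {
      // Append failed: nothing will ever call assignSequenceId for this key.
    } finally {
      // Stand-in for completing the mvcc insert, which needs the key's sequence id.
      // Unbounded, this wait is the hang; bounded here so the sketch terminates.
      seen = key.awaitSequenceId(100);
    }
    return seen;
  }
}
```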

Let me look a bit more.

I think your patch would break a wait on the safe point, but I am not sure it
would unblock all threads. Let me try to manufacture safe point waiters too.
Will be back.








[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-27 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717810#comment-14717810
 ] 

Elliott Clark commented on HBASE-14317:
---

Yeah, the actual patch isn't too bad. I just hope it doesn't obscure the
reason why we're getting stuck.


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-08-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715328#comment-14715328
 ] 

stack commented on HBASE-14317:
---

Is the concurrent closing of regions that are waiting on the safe point:

{code}
RS_CLOSE_REGION-r12s16:9104-1 #33639 prio=5 os_prio=0 tid=0x7fbf546fe000 nid=0x563 in Object.wait() [0x7fbf38107000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
- locked 0x00056baa4888 (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
- locked 0x00056baa4888 (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
- locked 0x00056baaf928 (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

... and then the FATAL roll of logs happening at the same time the issue? Need to dig.
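The close thread in that dump is parked in `Object.wait()` on the region's write-state monitor. A minimal sketch of that wait/notify shape, assuming a hypothetical `SketchWriteState` (not `HRegion$WriteState`, and with a timeout the dumped code may not have): a close waits while a flush is in progress and is released only when the flush finishes and calls `notifyAll()`. If the flush is itself stuck behind the WAL safe point, nothing ever notifies the closer.

```java
// Hypothetical, simplified model of a region's write state; not HRegion$WriteState.
class SketchWriteState {
  private boolean flushing;

  synchronized void startFlush() {
    flushing = true;
  }

  // A flush that completes clears the flag and wakes any waiting closer.
  synchronized void flushDone() {
    flushing = false;
    notifyAll();
  }

  // Close-side wait: true once no flush is in progress, false if timeoutMs elapses first.
  // An unbounded wait() here would block forever if the flush never finishes.
  synchronized boolean waitForFlushes(long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (flushing) { // loop guards against spurious wakeups
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false;
      }
      try {
        wait(remaining); // releases the monitor, like Object.wait() in the dump
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```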
