[jira] [Commented] (HBASE-20157) WAL file might get broken

2018-03-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390886#comment-16390886
 ] 

Yu Li commented on HBASE-20157:
---

bq. I mean the problem reported here is already fixed through HBASE-16824
Good to know, marking this as a duplicate of HBASE-16824. Thanks, [~gzh1992n].

> WAL file might get broken
> -
>
>                 Key: HBASE-20157
>                 URL: https://issues.apache.org/jira/browse/HBASE-20157
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 1.1.0
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Major
>             Fix For: 2.0.0
>
>
> A WAL file can get corrupted because of the bug tracked in HBASE-16824.
> When Writer.close() and Writer.sync() are called at the same time, an HDFS
> bug (HDFS-13243) is triggered. When that happens, the last block in the
> WAL gets broken (the NN marks it as a CorruptBlock).
> I am reporting this scenario here to help others who run into the same
> problem. (HBASE-16824 has already been fixed, though.)
> {panel:title=RS log}
> 2018-02-05 07:58:54,212 INFO 
> [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller]
>  hdfs.DFSClient: Could not complete 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  retrying...
> 2018-02-05 07:59:00,612 INFO 
> [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller]
>  hdfs.DFSClient: Could not complete 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  retrying...
> {panel}
> {panel:title=NN log}
> 2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> fsync: 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  for DFSClient_NONMAPREDUCE_1109936977_1
> 2018-02-05 07:58:48,011 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW],
>  
> ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]}
>  is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in 
> file 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> 2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.221:50010 by 
> hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.218:50010 by 
> hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.220:50010 by 
> hb-j5e517al6xib80rkb-003.hbase.rds.aliyuncs.com/10.0.0.220 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,511 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW],
>  
> ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]}
>  is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in 
> file 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> {panel}
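
The race described in the quoted issue (Writer.close() overlapping with Writer.sync()) can be illustrated with a small guard. This is only a sketch of one way to keep the two calls from overlapping. It is not the HBASE-16824 patch and not the HDFS-13243 fix. SimpleWalWriter, GuardedWalWriter and CountingWalWriter are hypothetical names made up for the example; they are not HBase or HDFS classes.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a WAL writer; not the HBase interface. */
interface SimpleWalWriter {
    void sync() throws IOException;
    void close() throws IOException;
}

/** Serializes sync() and close() so the two calls can never overlap. */
final class GuardedWalWriter implements SimpleWalWriter {
    private final SimpleWalWriter delegate;
    private final Object lock = new Object();
    private boolean closed = false; // guarded by lock

    GuardedWalWriter(SimpleWalWriter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void sync() throws IOException {
        synchronized (lock) {
            if (closed) {
                // Reject instead of touching a stream that is already closed.
                throw new IOException("sync() called on a closed writer");
            }
            delegate.sync();
        }
    }

    @Override
    public void close() throws IOException {
        synchronized (lock) {
            if (closed) {
                return; // a second close() is a no-op
            }
            closed = true;
            delegate.close();
        }
    }
}

/** Test double that counts the calls reaching the underlying writer. */
final class CountingWalWriter implements SimpleWalWriter {
    final AtomicInteger syncs = new AtomicInteger();
    final AtomicInteger closes = new AtomicInteger();

    @Override public void sync() { syncs.incrementAndGet(); }
    @Override public void close() { closes.incrementAndGet(); }
}

final class GuardedWalWriterDemo {
    public static void main(String[] args) throws IOException {
        CountingWalWriter raw = new CountingWalWriter();
        GuardedWalWriter writer = new GuardedWalWriter(raw);
        writer.sync();
        writer.close();
        try {
            writer.sync(); // arrives after close(), so it is rejected
        } catch (IOException expected) {
            System.out.println("late sync rejected: " + expected.getMessage());
        }
        System.out.println("syncs=" + raw.syncs.get() + " closes=" + raw.closes.get());
    }
}
```

With a single lock, a sync() either finishes before close() starts or is rejected afterwards, so the underlying stream never sees both at once. The cost is that a slow sync() delays the log roll, which is a trade-off the sketch does not try to solve.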



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20157) WAL file might get broken

2018-03-07 Thread Zephyr Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390876#comment-16390876
 ] 

Zephyr Guo commented on HBASE-20157:


[~carp84]
I mean the problem reported here is already fixed through HBASE-16824.



