[ https://issues.apache.org/jira/browse/HDFS-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564989#comment-15564989 ]

Chernishev Aleksandr edited comment on HDFS-10992 at 10/11/16 9:20 AM:
-----------------------------------------------------------------------

The same bug occurs on another version: 2.7.1.2.3.0.0-2557.


was (Author: cany):
another version -  2.7.1.2.3.0.0-2557

> file is under construction but no leases found
> ----------------------------------------------
>
>                 Key: HDFS-10992
>                 URL: https://issues.apache.org/jira/browse/HDFS-10992
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>         Environment: hortonworks 2.3 build 2557. 10 Datanodes , 2 NameNode in 
> auto failover
>            Reporter: Chernishev Aleksandr
>
> On HDFS, after writing a modest number of files (at least 1000, each 
> 150 MB - 1.6 GB in size), we found 13 damaged files with an incomplete last 
> block.
> hadoop fsck /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 
> -openforwrite -files -blocks -locations
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> Connecting to namenode via 
> http://hadoop-hdfs:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2Ffiles%2Fload_tarifer-zf-4_20160902165521521.csv
> FSCK started by hdfs (auth:SIMPLE) from /10.0.0.178 for path 
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25 
> MSK 2016
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7 
> block(s), OPENFORWRITE:  MISSING 1 blocks of total size 115289753 B
> 0. BP-1552885336-10.0.0.178-1446159880991:blk_1084952841_17798971 
> len=134217728 repl=4 
> [DatanodeInfoWithStorage[10.0.0.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
> 1. BP-1552885336-10.0.0.178-1446159880991:blk_1084952850_17799207 
> len=134217728 repl=3 
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
> 2. BP-1552885336-10.0.0.178-1446159880991:blk_1084952857_17799489 
> len=134217728 repl=3 
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 3. BP-1552885336-10.0.0.178-1446159880991:blk_1084952866_17799725 
> len=134217728 repl=3 
> [DatanodeInfoWithStorage[10.0.0.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
> 4. BP-1552885336-10.0.0.178-1446159880991:blk_1084952872_17799891 
> len=134217728 repl=4 
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 5. BP-1552885336-10.0.0.178-1446159880991:blk_1084952880_17800120 
> len=134217728 repl=3 
> [DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK],
>  
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
> 6. 
> BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316{UCState=COMMITTED,
>  truncateBlock=null, primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.0.0.188:50010|RBW],
>  
> ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.0.0.184:50010|RBW],
>  
> ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.0.0.185:50010|RBW]]}
>  len=115289753 MISSING!
> Status: CORRUPT
>  Total size:  920596121 B
>  Total dirs:  0
>  Total files: 1
>  Total symlinks:              0
>  Total blocks (validated):    7 (avg. block size 131513731 B)
>   ********************************
>   UNDER MIN REPL'D BLOCKS:    1 (14.285714 %)
>   dfs.namenode.replication.min:       1
>   CORRUPT FILES:      1
>   MISSING BLOCKS:     1
>   MISSING SIZE:               115289753 B
>   ********************************
>  Minimally replicated blocks: 6 (85.71429 %)
>  Over-replicated blocks:      2 (28.571428 %)
>  Under-replicated blocks:     0 (0.0 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  3
>  Average block replication:   2.857143
>  Corrupt blocks:              0
>  Missing replicas:            0 (0.0 %)
>  Number of data-nodes:                10
>  Number of racks:             1
> FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds
> The filesystem under path 
> '/hadoop/files/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT
> The file is UNDER_RECOVERY: the NameNode thinks the last block is in 
> COMMITTED state, while the DataNode thinks the block is in RBW state, so 
> recovery is not executed. The last block file and its meta file exist in the 
> 'rbw' directory on the DataNode:
> -rw-r--r-- 1 hdfs hdfs 115289753 Sep  2 16:56 
> /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887
> -rw-r--r-- 1 hdfs hdfs    900711 Sep  2 16:56 
> /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887_17800316.meta
> Lease recover tool said:
> hdfs debug recoverLease -path 
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> recoverLease got exception: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to RECOVER_LEASE 
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv for 
> DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because the file is under 
> construction but no leases found.
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
>       at 
> org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
>       at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
>       at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
> Giving up on recoverLease for 
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv after 1 try.
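The log above ends with `recoverLease` giving up after a single try. As a hedged aside (a sketch of a retry wrapper, not a fix for the underlying bug reported here): `hdfs debug recoverLease -path <path>` also accepts its own `-retries <n>` option, and when recovery fails only transiently, looping over the command can help. The function name and the `RECOVER_CMD` / `RETRY_DELAY` variables below are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a retry wrapper around lease recovery. Only
# 'hdfs debug recoverLease -path <path>' is a real CLI invocation;
# everything else here is an illustrative assumption.
recover_with_retries() {
  local path="$1" attempts="${2:-10}"
  # RECOVER_CMD is overridable so the loop can be exercised without a cluster.
  local cmd="${RECOVER_CMD:-hdfs debug recoverLease -path}"
  local i
  for i in $(seq 1 "$attempts"); do
    # Try one lease recovery; stop as soon as it succeeds.
    if $cmd "$path"; then
      echo "lease recovered on attempt $i"
      return 0
    fi
    # Back off briefly before the next attempt.
    sleep "${RETRY_DELAY:-5}"
  done
  echo "giving up after $attempts attempts" >&2
  return 1
}
```

For example, `recover_with_retries /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 5`. Note that in the failure mode described in this issue (the file is under construction but the NameNode holds no lease at all), retries alone may never succeed; the wrapper only covers transient failures.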



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
