[ https://issues.apache.org/jira/browse/HDFS-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564989#comment-15564989 ]
Chernishev Aleksandr edited comment on HDFS-10992 at 10/11/16 9:20 AM:
-----------------------------------------------------------------------

Another version, 2.7.1.2.3.0.0-2557, has the same bug.

was (Author: cany):
another version - 2.7.1.2.3.0.0-2557

> file is under construction but no leases found
> ----------------------------------------------
>
> Key: HDFS-10992
> URL: https://issues.apache.org/jira/browse/HDFS-10992
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.1
> Environment: Hortonworks 2.3 build 2557. 10 DataNodes, 2 NameNodes in auto failover
> Reporter: Chernishev Aleksandr
>
> After writing a small number of files (at least 1000), each 150 MB to 1.6 GB in size, to HDFS, we found 13 damaged files with an incomplete last block.
>
> hadoop fsck /hadoop/files/load_tarifer-zf-4_20160902165521521.csv -openforwrite -files -blocks -locations
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> Connecting to namenode via http://hadoop-hdfs:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2Ffiles%2Fload_tarifer-zf-4_20160902165521521.csv
> FSCK started by hdfs (auth:SIMPLE) from /10.0.0.178 for path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25 MSK 2016
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7 block(s), OPENFORWRITE: MISSING 1 blocks of total size 115289753 B
> 0. BP-1552885336-10.0.0.178-1446159880991:blk_1084952841_17798971 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
> 1. BP-1552885336-10.0.0.178-1446159880991:blk_1084952850_17799207 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK], DatanodeInfoWithStorage[10.0.0.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK], DatanodeInfoWithStorage[10.0.0.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
> 2. BP-1552885336-10.0.0.178-1446159880991:blk_1084952857_17799489 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 3. BP-1552885336-10.0.0.178-1446159880991:blk_1084952866_17799725 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK], DatanodeInfoWithStorage[10.0.0.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
> 4. BP-1552885336-10.0.0.178-1446159880991:blk_1084952872_17799891 len=134217728 repl=4 [DatanodeInfoWithStorage[10.0.0.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK], DatanodeInfoWithStorage[10.0.0.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 5. BP-1552885336-10.0.0.78-1446159880991:blk_1084952880_17800120 len=134217728 repl=3 [DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK], DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK], DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
> 6. BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.0.0.188:50010|RBW], ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.0.0.184:50010|RBW], ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.0.0.185:50010|RBW]]} len=115289753 MISSING!
>
> Status: CORRUPT
> Total size: 920596121 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 7 (avg. block size 131513731 B)
> ********************************
> UNDER MIN REPL'D BLOCKS: 1 (14.285714 %)
> dfs.namenode.replication.min: 1
> CORRUPT FILES: 1
> MISSING BLOCKS: 1
> MISSING SIZE: 115289753 B
> ********************************
> Minimally replicated blocks: 6 (85.71429 %)
> Over-replicated blocks: 2 (28.571428 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 2.857143
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 10
> Number of racks: 1
> FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds
>
> The filesystem under path '/hadoop/files/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT
>
> The file is UNDER_RECOVERY. The NameNode thinks the last block is in the COMMITTED state, while the DataNodes think the block is in the RBW state. Recovery is not executed. The last block file and its meta file exist in the 'rbw' directory on the DataNode:
> -rw-r--r-- 1 hdfs hdfs 115289753 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887
> -rw-r--r-- 1 hdfs hdfs 900711 Sep 2 16:56 /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887_17800316.meta
>
> The lease recovery tool reported:
> hdfs debug recoverLease -path /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> recoverLease got exception:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to RECOVER_LEASE /hadoop/files/load_tarifer-zf-4_20160902165521521.csv for DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because the file is under construction but no leases found.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
>   at org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
>   at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
>   at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
> Giving up on recoverLease for /hadoop/files/load_tarifer-zf-4_20160902165521521.csv after 1 try.
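
Since 13 files were affected, it may help to scan saved fsck output for the same signature (a last block with a UCState and a MISSING! marker) instead of reading each report by hand. The following is a minimal sketch, not part of HDFS: the names `BLOCK_RE` and `find_stuck_blocks` are hypothetical, and the regular expression is inferred only from the fsck lines quoted in this report, assuming each block entry sits on one unwrapped line. Other Hadoop versions may print a different layout.

```python
import re

# Pattern inferred from the fsck output quoted in this report (an assumption,
# not an official format): "<index>. <pool>:<block>[{UCState=...}] len=<n> ..."
BLOCK_RE = re.compile(
    r"^\s*(?P<index>\d+)\.\s+"
    r"(?P<pool>BP-[\w.\-]+):(?P<block>blk_\d+_\d+)"
    r"(?:\{UCState=(?P<uc_state>\w+)[^}]*\})?"
    r".*?\blen=(?P<length>\d+)"
    r"(?P<missing>.*\bMISSING!)?"
)

# Two block lines copied from the report: one finalized block, one stuck block.
SAMPLE = "\n".join([
    "> 5. BP-1552885336-10.0.0.78-1446159880991:blk_1084952880_17800120 "
    "len=134217728 repl=3 "
    "[DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK], "
    "DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK], "
    "DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]",
    "> 6. BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316"
    "{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, "
    "replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.0.0.188:50010|RBW], "
    "ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.0.0.184:50010|RBW], "
    "ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.0.0.185:50010|RBW]]} "
    "len=115289753 MISSING!",
])


def find_stuck_blocks(fsck_output):
    """Return block entries that carry a UCState or a MISSING! marker."""
    stuck = []
    for raw in fsck_output.splitlines():
        # Drop the "> " mail-quote prefix if the text was copied from an email.
        line = raw.lstrip("> \t")
        match = BLOCK_RE.match(line)
        if not match:
            continue
        if match.group("uc_state") or match.group("missing"):
            stuck.append({
                "block": match.group("block"),
                "uc_state": match.group("uc_state"),
                "length": int(match.group("length")),
                "missing": match.group("missing") is not None,
                # Replica states as reported by the NameNode, e.g. "RBW".
                "replica_states": re.findall(r"\|(\w+)\]", line),
            })
    return stuck


if __name__ == "__main__":
    for entry in find_stuck_blocks(SAMPLE):
        print(entry)
```

Run against the full fsck output of each suspect file, this would list only the blocks that show the COMMITTED versus RBW mismatch described in the report. It only reads text and does not attempt any recovery.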