Fei Hui created HDFS-9530:
-----------------------------

             Summary: Huge Non-DFS Used in Hadoop 2.6.2 & 2.7.1
                 Key: HDFS-9530
                 URL: https://issues.apache.org/jira/browse/HDFS-9530
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Fei Hui


I ran a Hive job, and the errors are as follows:
===============================================================================
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"k":"1","v":1}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"k":"1","v":1}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.000017_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:787)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
        ... 9 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.000017_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

        at org.apache.hadoop.ipc.Client.call(Client.java:1469)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)


I think there is a bug in HDFS.
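
The "could only be replicated to 0 nodes instead of minReplication" message generally means the NameNode could not find any acceptable datanode for the new block; since all three datanodes are live and none are excluded, the suspect is the available space the NameNode believes each datanode has. To watch those numbers while the job runs, a small probe like the one below prints the same per-datanode figures shown in the dfsadmin reports further down. This is only a minimal sketch, assuming the cluster configuration files are on the classpath; the class name is just for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeSpaceProbe {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at the HDFS namenode (core-site.xml on the classpath).
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      // Same fields that dfsadmin -report prints for each live datanode.
      System.out.printf("%-12s capacity=%d dfsUsed=%d remaining=%d nonDfsUsed=%d%n",
          dn.getHostName(), dn.getCapacity(), dn.getDfsUsed(),
          dn.getRemaining(), dn.getNonDfsUsed());
    }
    dfs.close();
  }
}

Running it repeatedly (e.g. via hadoop jar) while the Hive job executes should confirm the numbers in the reports below.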
===============================================================================
Here is the relevant config:
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>
        file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2
    </value>
  </property>
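
Each datanode's Configured Capacity in the reports below (~74.74 GB) corresponds to the sum of these four ~20 GB mounts. As a quick, datanode-independent sanity check of what the local filesystems report for those directories, a throwaway probe along these lines can be run on a worker. It is a sketch only, using plain java.io.File rather than the datanode's own volume accounting (which, as far as I know, is df-based and also subtracts any dfs.datanode.du.reserved setting):

import java.io.File;

public class DataDirSpaceCheck {
  public static void main(String[] args) {
    // The mounts from dfs.datanode.data.dir above.
    String[] dirs = {"/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"};
    long total = 0, usable = 0;
    for (String d : dirs) {
      File f = new File(d);
      total  += f.getTotalSpace();   // roughly df's "1K-blocks" column, in bytes
      usable += f.getUsableSpace();  // roughly df's "Available" column, in bytes
      System.out.printf("%-12s total=%d usable=%d%n",
          d, f.getTotalSpace(), f.getUsableSpace());
    }
    System.out.printf("sum          total=%d usable=%d%n", total, usable);
  }
}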

Here is the dfsadmin report:

[hadoop@worker-1 ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 240769253376 (224.23 GB)
Present Capacity: 238604832768 (222.22 GB)
DFS Remaining: 215772954624 (200.95 GB)
DFS Used: 22831878144 (21.26 GB)
DFS Used%: 9.57%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.117.60.59:50010 (worker-2)
Hostname: worker-2
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 7190958080 (6.70 GB)
Non DFS Used: 721473536 (688.05 MB)
DFS Remaining: 72343986176 (67.38 GB)
DFS Used%: 8.96%
DFS Remaining%: 90.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 09 15:55:02 CST 2015


Name: 10.168.156.0:50010 (worker-3)
Hostname: worker-3
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 7219073024 (6.72 GB)
Non DFS Used: 721473536 (688.05 MB)
DFS Remaining: 72315871232 (67.35 GB)
DFS Used%: 9.00%
DFS Remaining%: 90.11%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 09 15:55:03 CST 2015


Name: 10.117.15.38:50010 (worker-1)
Hostname: worker-1
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 8421847040 (7.84 GB)
Non DFS Used: 721473536 (688.05 MB)
DFS Remaining: 71113097216 (66.23 GB)
DFS Used%: 10.49%
DFS Remaining%: 88.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Dec 09 15:55:03 CST 2015

================================================================================

While the Hive job is running, dfsadmin reports the following:

[hadoop@worker-1 ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 240769253376 (224.23 GB)
Present Capacity: 108266011136 (100.83 GB)
DFS Remaining: 80078416384 (74.58 GB)
DFS Used: 28187594752 (26.25 GB)
DFS Used%: 26.04%
Under replicated blocks: 7
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.117.60.59:50010 (worker-2)
Hostname: worker-2
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 9015627776 (8.40 GB)
Non DFS Used: 44303742464 (41.26 GB)
DFS Remaining: 26937047552 (25.09 GB)
DFS Used%: 11.23%
DFS Remaining%: 33.56%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 693
Last contact: Wed Dec 09 15:37:35 CST 2015


Name: 10.168.156.0:50010 (worker-3)
Hostname: worker-3
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 9163116544 (8.53 GB)
Non DFS Used: 47895897600 (44.61 GB)
DFS Remaining: 23197403648 (21.60 GB)
DFS Used%: 11.42%
DFS Remaining%: 28.90%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 750
Last contact: Wed Dec 09 15:37:36 CST 2015


Name: 10.117.15.38:50010 (worker-1)
Hostname: worker-1
Decommission Status : Normal
Configured Capacity: 80256417792 (74.74 GB)
DFS Used: 10008850432 (9.32 GB)
Non DFS Used: 40303602176 (37.54 GB)
DFS Remaining: 29943965184 (27.89 GB)
DFS Used%: 12.47%
DFS Remaining%: 37.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 632
Last contact: Wed Dec 09 15:37:36 CST 2015
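
Note that the numbers in both reports satisfy Non DFS Used = Configured Capacity - DFS Used - DFS Remaining exactly, so Non DFS Used looks like a derived value rather than something measured on disk, and the huge jump is purely a consequence of DFS Remaining shrinking while the job runs. A quick check with the worker-1 figures from the two reports (the other two nodes work out the same way):

public class NonDfsUsedCheck {
  public static void main(String[] args) {
    // Worker-1 figures copied from the two dfsadmin reports above.
    long capacity = 80256417792L;

    // Report with no job running.
    long dfsUsedIdle = 8421847040L, remainingIdle = 71113097216L;
    System.out.println(capacity - dfsUsedIdle - remainingIdle);   // 721473536   (688.05 MB)

    // Report taken while the Hive job is running.
    long dfsUsedBusy = 10008850432L, remainingBusy = 29943965184L;
    System.out.println(capacity - dfsUsedBusy - remainingBusy);   // 40303602176 (37.54 GB)
  }
}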

=========================================================================
However, the df output on worker-1 is as follows:
[hadoop@worker-1 ~]$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1      20641404 4229676  15363204  22% /
tmpfs            8165456       0   8165456   0% /dev/shm
/dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
/dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
/dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
/dev/xvde       20642428 2909764  16684088  15% /mnt/disk1


The df output conflicts with the dfsadmin report.
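
To quantify the conflict: summing the df columns for the four DFS mounts on worker-1 gives only about 10 GB actually used and about 64 GB still available, versus the ~37.5 GB of Non DFS Used and only ~27.9 GB of DFS Remaining reported while the job runs. A small tally, with the values copied from the df output above:

public class DfTally {
  public static void main(String[] args) {
    // df figures for /mnt/disk1 .. /mnt/disk4 on worker-1 (1K blocks).
    long[] usedKb      = {2909764, 2445852, 2596920, 2692228};
    long[] availableKb = {16684088, 17148000, 16996932, 16901624};

    long used = 0, avail = 0;
    for (long u : usedKb)      used  += u;
    for (long a : availableKb) avail += a;

    // ~10.2 GB actually used and ~64.6 GB still free on the mounts,
    // versus ~37.5 GB "Non DFS Used" and ~27.9 GB "DFS Remaining"
    // in the dfsadmin report taken while the job runs.
    System.out.printf("used=%d KB (%.1f GB), available=%d KB (%.1f GB)%n",
        used, used / 1024.0 / 1024.0, avail, avail / 1024.0 / 1024.0);
  }
}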


Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
