[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281533#comment-17281533 ] Renukaprasad C commented on HDFS-14735: --- [~talexey] Thanks for reporting the issue, [~zhangchen] Thanks for replay & detailed clarification. We also face the similar issue which exists for some time and recovered automatically. But from the available INFO logs, we couldnt find which DN has issue. We can enhance the logging for better traceability. Can we print the log in INFO level? Sometimes issue not reproducible or enabling DEBUG log lead to huge log. We can add an event based thread which print the INFO log on node addition/deletion. What is your opinion? > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910897#comment-16910897 ] Chen Zhang commented on HDFS-14735: --- Sorry for delayed response, [~talexey], you can grep the keyword "is not chosen" in NameNode's log, it will tell you why the nodes can't be allocated. > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910854#comment-16910854 ] Tatyana Alexeyev commented on HDFS-14735: - Hello Chen, I have enabled the debugger... What is the next step? Thanks, tanya > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908812#comment-16908812 ] Tatyana Alexeyev commented on HDFS-14735: - There are some error in datanode log file: 2019-08-16 03:52:05,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (BP-56322450-10.6.14.101-1565836229790 heartbeating to ip-10-6-14-101.us-east-2.compute.internal/10.6.14.101:8020): DatanodeRegistration(10.6.13.226:50010, datanodeUuid=4a8a4e5b-604d-4a8d-96b7-246ccf4d9baf, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-39f69814-6f95-4195-8272-37b6d2166de4;nsid=950102054;c=1565836229790) Starting thread to transfer BP-56322450-10.6.14.101-1565836229790:blk_1073973783_232959 to 10.6.14.73:50010 10.6.13.248:50010 2019-08-16 03:52:05,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (BP-56322450-10.6.14.101-1565836229790 heartbeating to ip-10-6-14-101.us-east-2.compute.internal/10.6.14.101:8020): DatanodeRegistration(10.6.13.226:50010, datanodeUuid=4a8a4e5b-604d-4a8d-96b7-246ccf4d9baf, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-39f69814-6f95-4195-8272-37b6d2166de4;nsid=950102054;c=1565836229790) Starting thread to transfer BP-56322450-10.6.14.101-1565836229790:blk_1073973798_232974 to 10.6.13.248:50010 10.6.14.73:50010 2019-08-16 03:52:05,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode ([org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@32ac48b6|mailto:org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@32ac48b6]): DataTransfer, at ip-10-6-13-226.us-east-2.compute.internal:50010: Transmitted BP-56322450-10.6.14.101-1565836229790:blk_1073973727_232903 (numBytes=60530) to /10.6.13.248:50010 2019-08-16 03:52:05,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode ([org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@57370a18|mailto:org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@57370a18]): DataTransfer, at ip-10-6-13-226.us-east-2.compute.internal:50010: Transmitted BP-56322450-10.6.14.101-1565836229790:blk_1073973701_232877 (numBytes=36427) to /10.6.14.73:50010 2019-08-16 03:52:05,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode ([org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@766e0d79|mailto:org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@766e0d79]): DataTransfer, at ip-10-6-13-226.us-east-2.compute.internal:50010: Transmitted BP-56322450-10.6.14.101-1565836229790:blk_1073973741_232917 (numBytes=36427) to /10.6.14.73:50010 2019-08-16 03:52:05,697 WARN org.apache.hadoop.hdfs.server.datanode.DataNode (BP-56322450-10.6.14.101-1565836229790 heartbeating to ip-10-6-14-101.us-east-2.compute.internal/10.6.14.101:8020): *Can't replicate block BP-56322450-10.6.14.101-1565836229790:blk_1073973750_232926 because on-disk length 402868 is shorter than NameNode recorded length 9223372036854775807* ** There are some messages related to the replication in the Namenode log: 2019-08-16 04:00:00,141 INFO org.apache.hadoop.hdfs.StateChange (IPC Server handler 30 on 8020): DIR* completeFile: /tmp/hadoop-yarn/staging/sphdadm/.staging/job_1565836275738_5267/job.xml is closed by DFSClient_NONMAPREDUCE_-1718547537_1 2019-08-16 04:00:00,152 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory (IPC Server handler 1 on 8020): *Increasing replication from 1 to 4 for /tmp/hadoop-yarn/staging/sphdadm/.staging/job_1565836275738_5268/libjars/parquet-encoding-1.6.0.jar* ** > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908531#comment-16908531 ] Tatyana Alexeyev commented on HDFS-14735: - Can you please let me know how to enable debugger? > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908454#comment-16908454 ] Tatyana Alexeyev commented on HDFS-14735: - bash-4.2$ hdfs dfsadmin -report Configured Capacity: 1630750064640 (1.48 TB) Present Capacity: 1591627264267 (1.45 TB) DFS Remaining: 1398788173824 (1.27 TB) DFS Used: 192839090443 (179.60 GB) DFS Used%: 12.12% Under replicated blocks: 318 Blocks with corrupt replicas: 0 Missing blocks: 0 Missing blocks (with replication factor 1): 0 Pending deletion blocks: 48 - Live datanodes (3): Name: 10.6.13.226:50010 (ip-10-6-13-226.us-east-2.compute.internal) Hostname: ip-10-6-13-226.us-east-2.compute.internal Decommission Status : Normal Configured Capacity: 543583354880 (506.25 GB) DFS Used: 62489766754 (58.20 GB) Non DFS Used: 13059818654 (12.16 GB) DFS Remaining: 468033769472 (435.89 GB) DFS Used%: 11.50% DFS Remaining%: 86.10% Configured Cache Capacity: 0 (0 B) Cache Used: 0 (0 B) Cache Remaining: 0 (0 B) Cache Used%: 100.00% Cache Remaining%: 0.00% Xceivers: 3 Last contact: Thu Aug 15 20:33:25 UTC 2019 Name: 10.6.13.248:50010 (ip-10-6-13-248.us-east-2.compute.internal) Hostname: ip-10-6-13-248.us-east-2.compute.internal Decommission Status : Normal Configured Capacity: 543583354880 (506.25 GB) DFS Used: 66380404936 (61.82 GB) Non DFS Used: 13126735672 (12.23 GB) DFS Remaining: 464076214272 (432.20 GB) DFS Used%: 12.21% DFS Remaining%: 85.37% Configured Cache Capacity: 0 (0 B) Cache Used: 0 (0 B) Cache Remaining: 0 (0 B) Cache Used%: 100.00% Cache Remaining%: 0.00% Xceivers: 3 Last contact: Thu Aug 15 20:33:25 UTC 2019 Name: 10.6.14.73:50010 (ip-10-6-14-73.us-east-2.compute.internal) Hostname: ip-10-6-14-73.us-east-2.compute.internal Decommission Status : Normal Configured Capacity: 543583354880 (506.25 GB) DFS Used: 63968918753 (59.58 GB) Non DFS Used: 12936246047 (12.05 GB) DFS Remaining: 466678190080 (434.63 GB) DFS Used%: 11.77% DFS Remaining%: 85.85% Configured Cache Capacity: 0 (0 B) Cache Used: 0 (0 B) Cache Remaining: 0 (0 B) Cache Used%: 100.00% Cache Remaining%: 0.00% Xceivers: 4 Last contact: Thu Aug 15 20:33:25 UTC 2019 > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908453#comment-16908453 ] Tatyana Alexeyev commented on HDFS-14735: - Can you please let me know how to enable debugger? > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908452#comment-16908452 ] Tatyana Alexeyev commented on HDFS-14735: - I have enought free space: [root@ip-10-6-14-101 hadoop]# hdfs dfsadmin -report Configured Capacity: 1630750064640 (1.48 TB) Present Capacity: 1594786349372 (1.45 TB) DFS Remaining: 1401552084992 (1.27 TB) DFS Used: 193234264380 (179.96 GB) DFS Used%: 12.12% Under replicated blocks: 613 Blocks with corrupt replicas: 0 Missing blocks: 0 Missing blocks (with replication factor 1): 0 Pending deletion blocks: 134 - > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907776#comment-16907776 ] Chen Zhang commented on HDFS-14735: --- How much spaces left in your cluster? You can enable the debug log to check why the allocation failed > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14735) File could only be replicated to 0 nodes instead of minReplication (=1)
[ https://issues.apache.org/jira/browse/HDFS-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907747#comment-16907747 ] Tatyana Alexeyev commented on HDFS-14735: - This error happens intermitently during the Sqoop and Pig operations... > File could only be replicated to 0 nodes instead of minReplication (=1) > --- > > Key: HDFS-14735 > URL: https://issues.apache.org/jira/browse/HDFS-14735 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tatyana Alexeyev >Priority: Major > > Hello I have intermitent error when running my EMR Hadoop Cluster: > "Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/sphdadm/_sqoop/00501bd7b05e4182b5006b9d51 > bafb7f_f405b2f3/_temporary/1/_temporary/attempt_1565136887564_20057_m_00_0/part-m-0.snappy > could only be replicated to 0 nodes instead of minReplication (=1). There > are 5 datanode(s) running and no node(s) are excluded in this operation." > I am running Hadoop version > sphdadm@ip-10-6-15-108 hadoop]$ hadoop version > Hadoop 2.8.5-amzn-4 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org