Hi Omprakash,

Your description suggests the DataNodes cannot send timely reports (heartbeats) to 
the NameNode. You can check this by looking for ‘stale’ DataNodes in the NN web UI 
while the problem is occurring. A few ideas:


  *   Try increasing the NameNode RPC handler count a bit (e.g., set 
dfs.namenode.handler.count to 20 in hdfs-site.xml; the default is 10).
  *   Enable the NameNode service RPC port (dfs.namenode.servicerpc-address), so 
that DataNode and ZKFC traffic no longer competes with client RPCs. This requires 
downtime and re-formatting the ZKFC znode.
  *   Search for JvmPauseMonitor messages in your service logs. If you see any, 
try increasing the JVM heap for that service.
  *   Enable DEBUG logging as suggested in the log message below. Rough config 
sketches for each of these items follow the excerpt:

2017-06-21 12:11:30,626 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
org.apache.hadoop.net.NetworkTopology
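
As a rough sketch of the first two items, the hdfs-site.xml additions would look 
something like the following. The handler count of 20 and port 8040 are placeholder 
values to tune for your cluster; in an HA setup the service RPC address is set per 
NameNode:

<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>
<property>
  <!-- dedicated port for DataNode/ZKFC RPC; pick any free port -->
  <name>dfs.namenode.servicerpc-address.hdfsCluster.nn1</name>
  <value>node1:8040</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.hdfsCluster.nn2</name>
  <value>node2:8040</value>
</property>

After adding the service RPC addresses you would stop HDFS, run "hdfs zkfc 
-formatZK" on one NameNode to re-create the ZKFC znode, and then start everything 
back up.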
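
For the JvmPauseMonitor item, heap is raised in hadoop-env.sh. A minimal sketch, 
assuming Hadoop 2.x and that the nodes have memory to spare (the -Xmx values are 
only examples):

# etc/hadoop/hadoop-env.sh
export HADOOP_NAMENODE_OPTS="-Xmx2g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xmx1g ${HADOOP_DATANODE_OPTS}"

Since every file and block is tracked in NameNode memory, pumping in hundreds of 
thousands of tiny files makes the NameNode heap the first thing to watch.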
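
For the DEBUG logging item, you can flip the level at runtime through the 
NameNode's HTTP port (50070 in your config; run it against whichever NameNode is 
currently active, and it reverts on restart), or persistently in log4j.properties 
on the NameNodes, e.g.:

hadoop daemonlog -setlevel node1:50070 \
  org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG

# etc/hadoop/log4j.properties
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG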


From: omprakash <ompraka...@cdac.in>
Date: Wednesday, June 21, 2017 at 9:23 PM
To: 'Ravi Prakash' <ravihad...@gmail.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

Hi Ravi,

Pasting my core-site and hdfs-site configurations below. I have kept the 
configuration for my cluster to the bare minimum. The cluster started fine and I 
was able to put a couple of hundred thousand files on HDFS, but when I checked the 
logs there were errors/exceptions. After a restart of the datanodes they work well 
for a few thousand files, but then the same problem appears again. No idea what is 
wrong.

PS: I am pumping 1 file per second into HDFS, each approximately 1 KB in size.

I thought it might be due to a space quota on the datanodes, but here is the output 
of hdfs dfsadmin -report. It looks fine to me (see the note on checking quotas 
after the report below).

$ hdfs dfsadmin -report

Configured Capacity: 42005069824 (39.12 GB)
Present Capacity: 38085839568 (35.47 GB)
DFS Remaining: 34949058560 (32.55 GB)
DFS Used: 3136781008 (2.92 GB)
DFS Used%: 8.24%
Under replicated blocks: 141863
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.9.174:50010 (node5)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1764211024 (1.64 GB)
Non DFS Used: 811509424 (773.92 MB)
DFS Remaining: 17067913216 (15.90 GB)
DFS Used%: 8.40%
DFS Remaining%: 81.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Wed Jun 21 14:38:17 IST 2017


Name: 192.168.9.225:50010 (node4)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1372569984 (1.28 GB)
Non DFS Used: 658353792 (627.86 MB)
DFS Remaining: 17881145344 (16.65 GB)
DFS Used%: 6.54%
DFS Remaining%: 85.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jun 21 14:38:19 IST 2017
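
As an aside on quotas: dfsadmin -report only shows raw capacity, while HDFS 
name/space quotas are set per directory. A quick way to check them (using 
/user/hadoop from the logs just as an example path):

$ hdfs dfs -count -q /user/hadoop
(columns: name quota, remaining name quota, space quota, remaining space quota, 
directory count, file count, content size, path; "none"/"inf" means no quota is 
set)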

core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hdfsCluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
</property>
</configuration>

hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>hdfsCluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.hdfsCluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
  <value>node22:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
</configuration>


From: Ravi Prakash [mailto:ravihad...@gmail.com]
Sent: 22 June 2017 02:38
To: omprakash <ompraka...@cdac.in>
Cc: user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

Hi Omprakash!
What is your default replication set to? What kind of disks do your datanodes 
have? Were you able to start a cluster with a simple configuration before you 
started tuning it?
HDFS tries to create the default number of replicas for a block on different 
datanodes. The Namenode tries to give the client a list of datanodes that it can 
write replicas of the block to. If the Namenode is not able to construct a list 
with an adequate number of datanodes, you will see the message you are seeing. 
This may mean that datanodes are unhealthy (failed disks), full (disks have no 
more space), being decommissioned (HDFS will not write replicas on decommissioning 
datanodes), or misconfigured (I'd suggest turning on storage policies only after a 
simple configuration works).
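
A quick way to check the first few of these from the shell (the path below is just 
an example, taken from your logs):

# effective default replication
hdfs getconf -confKey dfs.replication
# datanode health, remaining space, last contact
hdfs dfsadmin -report
# block placement of the files under the problem directory
hdfs fsck /user/hadoop -files -blocks -locations | less
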
When a client that was writing a file is killed (e.g. if you killed your MR job), 
the Namenode will try to recover the file after some time (when the hard lease 
limit expires). In your case the Namenode is also not able to find enough 
datanodes to recover those files.
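
If some of those half-written files stay stuck once the datanodes are healthy 
again, and your release has the hdfs debug subcommand (Hadoop 2.7+), you can also 
trigger lease recovery by hand. A minimal sketch, using one of the paths from your 
log:

hdfs debug recoverLease -path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -retries 3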

HTH
Ravi





On Tue, Jun 20, 2017 at 11:50 PM, omprakash <ompraka...@cdac.in> wrote:
Hi,

I am receiving lots of warning messages in the namenode logs on the ACTIVE NN in 
my HA Hadoop setup. Below are the logs:

“2017-06-21 12:11:26,523 WARN 
org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
replicas: expected size is 1 but only 0 storage types can be selected 
(replication=2, selected=[], unavailable=[DISK], removed=[DISK], 
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], 
replicationFallbacks=[ARCHIVE]})
2017-06-21 12:11:26,523 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 1 to reach 2 
(unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
newBlock=true) All required storage types are unavailable:  
unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocate blk_1073894332_153508, 
replicas=192.168.9.174:50010 for /36962._COPYING_
2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: /36962._COPYING_ is closed by DFSClient_NONMAPREDUCE_146762699_1
2017-06-21 12:11:30,626 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
org.apache.hadoop.net.NetworkTopology
2017-06-21 12:11:30,626 WARN 
org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
replicas: expected size is 1 but only 0 storage types can be selected 
(replication=2, selected=[], unavailable=[DISK], removed=[DISK], 
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], 
replicationFallbacks=[ARCHIVE]})”

I am also encountering exceptions in the active namenode related to the 
LeaseManager:

2017-06-21 12:13:16,706 INFO 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired hard 
limit
2017-06-21 12:13:16,706 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1], 
src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.internalReleaseLease: Failed to release lease for file 
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
 Committed blocks are waiting to be minimally replicated. Try again later.
2017-06-21 12:13:16,706 ERROR 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
 in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending 
creates: 1]
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
NameSystem.internalReleaseLease: Failed to release lease for file 
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
 Committed blocks are waiting to be minimally replicated. Try again later.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)
        at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)
        at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)
        at java.lang.Thread.run(Thread.java:745)

I have checked the two datanodes. Both are running and have enough space for 
new data.

PS: I have 2 namenodes and 2 datanodes in a Hadoop HA setup. HA is set up using 
the Quorum Journal Manager and ZooKeeper.

Any idea why these errors?

Regards
Omprakash Paliwal
HPC-Medical and Bioinformatics Applications Group
Centre for Development of Advanced Computing (C-DAC)
Pune University campus,
PUNE-411007
Maharashtra, India
email: ompraka...@cdac.in
Contact: +91-20-25704231


-------------------------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
-------------------------------------------------------------------------------------------------------------------------------

