Hi Omprakash! This is *not* OK. Please go through the datanode logs of the inactive datanode and figure out why it's inactive. If you set dfs.replication to 2, at least that many datanodes (and ideally a LOT more datanodes) should be active and participating in the cluster.
Do you have the hdfs-site.xml you posted to the mailing list on all the nodes (including the Namenode)? Was the file containing block *blk_1074074104_337394* created when you had the cluster misconfigured with dfs.replication=3? You can determine which file the block belongs to using this command: *hdfs fsck -blockId blk_1074074104*. Once you have the file, you can set its replication using *hdfs dfs -setrep 2 <Filename>*. I'm guessing that you probably have a lot of files with this replication, in which case you should set it on / (this would overwrite the replication on all the files). If the data on this cluster is important, I would be very worried about the condition it's in. HTH Ravi On Mon, Jun 26, 2017 at 11:22 PM, omprakash <ompraka...@cdac.in> wrote: > Hi all, > > I started the HDFS in DEBUG mode. After examining the logs I found the below > entries, which say that the required replication factor is 3 (as against the > specified *dfs.replication=2*). > > *DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add: > blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added > to neededReplications at priority level 0* > > *P.S.: I have 1 datanode active out of 2.* > > I can also see from the Namenode UI that the number of under-replicated blocks > is growing. > > Any idea? Or is this OK? > > regards > > *From:* omprakash [mailto:ompraka...@cdac.in] > *Sent:* 23 June 2017 11:02 > *To:* 'Ravi Prakash' <ravihad...@gmail.com>; 'Arpit Agarwal' < > aagar...@hortonworks.com> > *Cc:* 'user' <user@hadoop.apache.org> > *Subject:* RE: Lots of warning messages and exception in namenode logs > > Hi Arpit, > > I will enable the settings as suggested and will post the results. > > I am just curious about setting the *Namenode RPC service port*. As I have > checked the *hdfs-site.xml* properties, *dfs.namenode.rpc-address* is > already set, which will also be the default for the RPC service port. 
Does specifying any other port have an advantage over the default one? > > Regarding the JvmPauseMonitor error, there are 5-6 instances of this error in the > namenode logs. Here is one of them. > > How do I identify the right heap size in such cases, as I have 4 GB of RAM on the > namenode VM? > > *@Ravi* Since the file sizes are very small, I have only configured a > VM with 20 GB of space. The additional disk is a simple SATA disk, not an SSD. > > As I can see from the Namenode UI, more than 50% of the blocks are under- > replicated. I now have 400K blocks, out of which 200K are under-replicated. > > I will post the results again after changing the value of > *dfs.namenode.replication.work.multiplier.per.iteration* > > Thanks > > Om Prakash > > *From:* Ravi Prakash [mailto:ravihad...@gmail.com] > *Sent:* 22 June 2017 23:04 > *To:* Arpit Agarwal <aagar...@hortonworks.com> > *Cc:* omprakash <ompraka...@cdac.in>; user <user@hadoop.apache.org> > > *Subject:* Re: Lots of warning messages and exception in namenode logs > > Hi Omprakash! > > How big are your disks? Just 20 GB? Just out of curiosity, are these SSDs? > > In addition to Arpit's reply, I'm also concerned about the number of > under-replicated blocks you have: Under replicated blocks: 141863 > > When there are fewer replicas of a block than there are supposed to be > (in your case, e.g. when there's 1 replica when there ought to be 2), the > namenode will order the datanodes to create more replicas. The rate at > which it does this is controlled by > dfs.namenode.replication.work.multiplier.per.iteration. Given that you have > only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds. > So it will take quite a while to re-replicate all the blocks. > > Also, please know that you want files to be much bigger than 1 KB. Ideally > you'd have a couple of blocks (a block is 128 MB by default) for each file. 
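As a quick sanity check of the re-replication rate described above, here is a rough back-of-the-envelope sketch in plain shell. The 3-second scheduling interval, the default multiplier of 2, and the 200K-block backlog are taken from this thread; the result is only an order-of-magnitude estimate, since the actual rate also depends on heartbeats and datanode load:

```shell
# Rough estimate of how long the under-replication backlog takes to clear.
# Values are from this thread; the interval and multiplier are assumptions.
datanodes=2
multiplier=2      # dfs.namenode.replication.work.multiplier.per.iteration (default)
interval_s=3      # approximate scheduling interval, per the description above
backlog=200000    # under-replicated blocks reported in the Namenode UI

blocks_per_round=$((datanodes * multiplier))                      # 4 blocks/round
rounds=$(( (backlog + blocks_per_round - 1) / blocks_per_round )) # ceiling division
total_s=$((rounds * interval_s))
echo "~$((total_s / 3600)) hours to clear $backlog under-replicated blocks"
```

At roughly 4 blocks every 3 seconds, 200K blocks works out to well over a day, which is why bumping the multiplier (or adding datanodes) matters here.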
You should > append to existing files rather than creating new ones when files are this small. > > Please do let us know how things turn out. > > Cheers, > > Ravi > > On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagar...@hortonworks.com> > wrote: > > Hi Omprakash, > > Your description suggests the DataNodes cannot send timely reports to the > NameNode. You can check this by looking for ‘stale’ DataNodes in the NN web > UI when this situation is occurring. A few ideas: > > - Try increasing the NameNode RPC handler count a bit (set > dfs.namenode.handler.count to 20 in hdfs-site.xml). > - Enable the NameNode service RPC port. This requires downtime and > reformatting the ZKFC znode. > - Search for JvmPauseMonitor messages in your service logs. If you see > any, try increasing the JVM heap for that service. > - Enable debug logging as suggested here: > > *2017-06-21 12:11:30,626 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed > to place enough replicas, still in need of 1 to reach 2 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=true) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and > org.apache.hadoop.net.NetworkTopology* > > *From: *omprakash <ompraka...@cdac.in> > *Date: *Wednesday, June 21, 2017 at 9:23 PM > *To: *'Ravi Prakash' <ravihad...@gmail.com> > *Cc: *'user' <user@hadoop.apache.org> > *Subject: *RE: Lots of warning messages and exception in namenode logs > > Hi Ravi, > > Pasting below my core-site and hdfs-site configurations. I have kept a bare- > minimal configuration for my cluster. The cluster started fine and I was > able to put a couple of hundred thousand files on HDFS, but when I checked the logs > there were errors/exceptions. After restarting the datanodes, they work well for a > few thousand files, but then the same problem occurs again. 
No idea what is wrong. > > *PS: I am pumping 1 file per second into HDFS, each approximately 1 KB in size.* > > I thought it may be due to a space quota on the datanodes, but here is the output > of *hdfs dfsadmin -report*. Looks fine to me. > > $ hdfs dfsadmin -report > > Configured Capacity: 42005069824 (39.12 GB) > > Present Capacity: 38085839568 (35.47 GB) > > DFS Remaining: 34949058560 (32.55 GB) > > DFS Used: 3136781008 (2.92 GB) > > DFS Used%: 8.24% > > Under replicated blocks: 141863 > > Blocks with corrupt replicas: 0 > > Missing blocks: 0 > > Missing blocks (with replication factor 1): 0 > > Pending deletion blocks: 0 > > ------------------------------------------------- > > Live datanodes (2): > > Name: 192.168.9.174:50010 (node5) > > Hostname: node5 > > Decommission Status : Normal > > Configured Capacity: 21002534912 (19.56 GB) > > DFS Used: 1764211024 (1.64 GB) > > Non DFS Used: 811509424 (773.92 MB) > > DFS Remaining: 17067913216 (15.90 GB) > > DFS Used%: 8.40% > > DFS Remaining%: 81.27% > > Configured Cache Capacity: 0 (0 B) > > Cache Used: 0 (0 B) > > Cache Remaining: 0 (0 B) > > Cache Used%: 100.00% > > Cache Remaining%: 0.00% > > Xceivers: 2 > > Last contact: Wed Jun 21 14:38:17 IST 2017 > > Name: 192.168.9.225:50010 (node4) > > Hostname: node5 > > Decommission Status : Normal > > Configured Capacity: 21002534912 (19.56 GB) > > DFS Used: 1372569984 (1.28 GB) > > Non DFS Used: 658353792 (627.86 MB) > > DFS Remaining: 17881145344 (16.65 GB) > > DFS Used%: 6.54% > > DFS Remaining%: 85.14% > > Configured Cache Capacity: 0 (0 B) > > Cache Used: 0 (0 B) > > Cache Remaining: 0 (0 B) > > Cache Used%: 100.00% > > Cache Remaining%: 0.00% > > Xceivers: 1 > > Last contact: Wed Jun 21 14:38:19 IST 2017 > > *core-site.xml* > > <?xml version="1.0" encoding="UTF-8"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > <configuration> > > <property> > > <name>fs.defaultFS</name> > > 
<value>hdfs://hdfsCluster</value> > > </property> > > <property> > > <name>dfs.journalnode.edits.dir</name> > > <value>/mnt/hadoopData/hadoop/journal/node/local/data</value> > > </property> > > </configuration> > > > > *hdfs-site.xml* > > <?xml version="1.0" encoding="UTF-8"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > <configuration> > > *<property>* > > *<name>dfs.replication</name>* > > *<value>2</value>* > > *</property>* > > <property> > > <name>dfs.name.dir</name> > > <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value> > > </property> > > <property> > > <name>dfs.data.dir</name> > > <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value> > > </property> > > <property> > > <name>dfs.nameservices</name> > > <value>hdfsCluster</value> > > </property> > > <property> > > <name>dfs.ha.namenodes.hdfsCluster</name> > > <value>nn1,nn2</value> > > </property> > > > > <property> > > <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name> > > <value>node1:8020</value> > > </property> > > <property> > > <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name> > > <value>node22:8020</value> > > </property> > > > > <property> > > <name>dfs.namenode.http-address.hdfsCluster.nn1</name> > > <value>node1:50070</value> > > </property> > > <property> > > <name>dfs.namenode.http-address.hdfsCluster.nn2</name> > > <value>node2:50070</value> > > </property> > > > > <property> > > <name>dfs.namenode.shared.edits.dir</name> > > <value>qjournal://node1:8485;node2:8485;node3:8485;node4: > 8485;node5:8485/hdfsCluster</value> > > </property> > > <property> > > <name>dfs.client.failover.proxy.provider.hdfsCluster</name> > > <value>org.apache.hadoop.hdfs.server.namenode.ha. 
ConfiguredFailoverProxyProvider</value> > > </property> > > <property> > > <name>ha.zookeeper.quorum</name> > > <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value> > > </property> > > <property> > > <name>dfs.ha.fencing.methods</name> > > <value>sshfence</value> > > </property> > > <property> > > <name>dfs.ha.fencing.ssh.private-key-files</name> > > <value>/home/hadoop/.ssh/id_rsa</value> > > </property> > > <property> > > <name>dfs.ha.automatic-failover.enabled</name> > > <value>true</value> > > </property> > > </configuration> > > *From:* Ravi Prakash [mailto:ravihad...@gmail.com] > *Sent:* 22 June 2017 02:38 > *To:* omprakash <ompraka...@cdac.in> > *Cc:* user <user@hadoop.apache.org> > *Subject:* Re: Lots of warning messages and exception in namenode logs > > Hi Omprakash! > > What is your default replication set to? What kind of disks do your > datanodes have? Were you able to start a cluster with a simple > configuration before you started tuning it? > > HDFS tries to create the default number of replicas of a block on > different datanodes. The Namenode tries to give the client a list of datanodes that > it can write replicas of the block to. If the Namenode is not able > to construct a list with an adequate number of datanodes, you will see the > message you are seeing. This may mean that datanodes are unhealthy (failed > disks), full (disks have no more space), being decommissioned (HDFS will > not write replicas on decommissioning datanodes), or misconfigured (I'd > suggest turning on storage classes only after a simple configuration works). > > When a client that was trying to write a file is killed (e.g. if you > killed your MR job), after some time (when the hard limit expires) the Namenode > will try to recover the file. In your case the namenode is also not able to > find enough datanodes for recovering the files. 
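To make that failure mode concrete, here is a toy shell sketch (plain shell, not HDFS code) of why the Namenode comes up short when it can only pick from two datanodes and one of them is unusable. The node names and states are made up for illustration:

```shell
# The Namenode must pick `replication` distinct usable datanodes for each
# new block. With only two datanodes and one of them unusable, the candidate
# list comes up short, which is what the placement warnings report.
replication=2
datanodes='node4:healthy node5:unhealthy'   # name:state pairs, illustrative only

chosen=0
for dn in $datanodes; do
  # count only the nodes the Namenode could actually write to
  [ "${dn#*:}" = healthy ] && chosen=$((chosen + 1))
done

if [ "$chosen" -lt "$replication" ]; then
  echo "still in need of $((replication - chosen)) to reach $replication"
fi
```

With replication=2 and a single healthy node this prints a message of the same shape as the BlockPlacementPolicy warnings in this thread, which is why getting the second datanode healthy (or lowering replication) clears them.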
> > HTH > > Ravi > > On Tue, Jun 20, 2017 at 11:50 PM, omprakash <ompraka...@cdac.in> wrote: > > Hi, > > I am receiving lots of *warning messages in the namenode logs* on the ACTIVE NN > in my *HA Hadoop setup*. Below are the logs: > > *“2017-06-21 12:11:26,523 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 1 but only 0 storage types can be selected > (replication=2, selected=[], unavailable=[DISK], removed=[DISK], > policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], > replicationFallbacks=[ARCHIVE]})* > > *2017-06-21 12:11:26,523 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed > to place enough replicas, still in need of 1 to reach 2 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=true) All required storage types are unavailable: > unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}* > > *2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_* > > *2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR* > completeFile: /36962._COPYING_ is closed by > DFSClient_NONMAPREDUCE_146762699_1* > > *2017-06-21 12:11:30,626 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed > to place enough replicas, still in need of 1 to reach 2 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=true) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and > org.apache.hadoop.net.NetworkTopology* > > *2017-06-21 12:11:30,626 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 1 but only 0 storage types can be selected > (replication=2, selected=[], unavailable=[DISK], removed=[DISK], > policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], > replicationFallbacks=[ARCHIVE]})”* > > I am also encountering exceptions in the active namenode related to the > LeaseManager: > > *2017-06-21 12:13:16,706 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: > DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired > hard limit* > > *2017-06-21 12:13:16,706 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. > Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1], > src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79* > > *2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: Failed to release lease for file > /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. > Committed blocks are waiting to be minimally replicated. Try again later.* > > *2017-06-21 12:13:16,706 ERROR > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the > path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 > in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_409197282_362092, > pending creates: 1]* > > *org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* > NameSystem.internalReleaseLease: Failed to release lease for file > /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. > Committed blocks are waiting to be minimally replicated. 
Try again later.* > > * at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)* > > * at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)* > > * at > org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)* > > * at java.lang.Thread.run(Thread.java:745)* > > I have checked the two datanodes. Both are running and have enough space > for new data. > > *PS: I have 2 Namenodes and 2 datanodes in a Hadoop HA setup. The HA is > set up using the Quorum Journal Manager and ZooKeeper.* > > Any idea why these errors occur? > > *Regards* > > *Omprakash Paliwal* > > HPC-Medical and Bioinformatics Applications Group > > Centre for Development of Advanced Computing (C-DAC) > > Pune University campus, > > PUNE-411007 > > Maharashtra, India > > email:ompraka...@cdac.in > > Contact : +91-20-25704231 > > > ------------------------------------------------------------ > ------------------------------------------------------------------- > [ C-DAC is on Social-Media too. Kindly follow us at: > Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ] > > This e-mail is for the sole use of the intended recipient(s) and may > contain confidential and privileged information. If you are not the > intended recipient, please contact the sender by reply e-mail and destroy > all copies and the original message. Any unauthorized review, use, > disclosure, dissemination, forwarding, printing or copying of this email > is strictly prohibited and appropriate legal action will be taken. > ------------------------------------------------------------ > ------------------------------------------------------------------- >