Hi Sam,

Monitoring disks and other server-related activities can be handled easily by Nagios.
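Not an official Nagios plugin, but a minimal sketch of the kind of disk check Nagios would run against a DataNode host (the path and the percentage thresholds are illustrative assumptions; Nagios interprets exit codes 0/1/2 as OK/WARNING/CRITICAL):

```python
# Minimal Nagios-style disk-free check (illustrative sketch, not an
# official plugin). Exit code convention: 0=OK, 1=WARNING, 2=CRITICAL.
import shutil
import sys


def classify(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios exit code."""
    if free_pct < crit_pct:
        return 2  # CRITICAL
    if free_pct < warn_pct:
        return 1  # WARNING
    return 0      # OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, message) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    code = classify(free_pct, warn_pct, crit_pct)
    label = {0: "OK", 1: "WARNING", 2: "CRITICAL"}[code]
    return code, "%s - %.1f%% free on %s" % (label, free_pct, path)


if __name__ == "__main__":
    code, msg = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(msg)
    sys.exit(code)
```

Pointing this at each DataNode's data directory would have surfaced the full-disk condition before writes started failing.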
On Mon, Oct 20, 2014 at 11:58 AM, Dhiraj Kamble <[email protected]> wrote:

> Formatting the NameNode will cause data loss – in effect you will lose all
> your data on the DataNodes (rather, access to the data on the DataNodes).
> The NameNode will have no idea where your data (files) is stored. I don't
> think that's what you're looking for.
>
> I am wondering why there isn't any log information on the DataNode for a
> full disk. What version of Hadoop are you using, and what is your
> configuration (Single Node, Single Node Pseudo-Distributed, or Cluster)?
>
> Regards,
> Dhiraj
>
> *From:* sam liu [mailto:[email protected]]
> *Sent:* Monday, October 20, 2014 11:51 AM
> *To:* [email protected]
> *Subject:* Re: Can add a regular check in DataNode on free disk space?
>
> Hi unmesha,
>
> Thanks for your response, but I am not clear what effect the above
> operations will have on the Hadoop cluster. Could you please explain in
> more detail?
>
> 2014-10-19 21:37 GMT-07:00 unmesha sreeveni <[email protected]>:
>
> 1. Stop all Hadoop daemons.
> 2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name.
> 3. Format the NameNode.
> 4. Start all Hadoop daemons.
>
> On Mon, Oct 20, 2014 at 8:26 AM, sam liu <[email protected]> wrote:
>
> Hi Experts and Developers,
>
> At present, if a DataNode has no free disk space, we cannot see this bad
> situation anywhere, including in the DataNode log. At the same time, under
> this situation the HDFS write operation will fail and return the error
> message below. However, from the error message the user cannot tell that
> the root cause is that the only DataNode has run out of disk space, and he
> also cannot find any useful hint in the DataNode log. So I believe it
> would be better if we added a regular check in the DataNode on free disk
> space, which would write a WARNING or ERROR message to the DataNode log if
> that DataNode runs out of space. What's your opinion?
> Error Msg:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated
> to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running
> and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>
> Thanks!
>
> --
> *Thanks & Regards*
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/

--
Nitin Pawar
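The check sam proposes could be sketched roughly as below. This is an illustrative Python sketch, not actual DataNode code (the real DataNode is Java, and the byte thresholds and directory names here are hypothetical; note that HDFS already has the `dfs.datanode.du.reserved` setting to reserve space per volume, though it does not produce the explicit log message sam is asking for):

```python
# Illustrative sketch of a periodic DataNode free-space check that logs a
# WARNING or ERROR when a data volume runs low on space. Not real HDFS
# code; thresholds and paths are hypothetical.
import logging
import shutil
import time

log = logging.getLogger("datanode.diskcheck")


def check_free_space(data_dirs, warn_bytes, error_bytes):
    """Classify each data directory as OK/WARN/ERROR and log accordingly."""
    results = {}
    for d in data_dirs:
        free = shutil.disk_usage(d).free
        if free <= error_bytes:
            log.error("Volume %s is out of space (%d bytes free)", d, free)
            results[d] = "ERROR"
        elif free <= warn_bytes:
            log.warning("Volume %s is low on space (%d bytes free)", d, free)
            results[d] = "WARN"
        else:
            results[d] = "OK"
    return results


def run_periodically(data_dirs, warn_bytes, error_bytes, interval_sec=60):
    """Run the check in a loop, e.g. from a daemon thread."""
    while True:
        check_free_space(data_dirs, warn_bytes, error_bytes)
        time.sleep(interval_sec)
```

With a check like this in place, the DataNode log would name the full volume directly, instead of the client seeing only the NameNode-side "could only be replicated to 0 nodes" error quoted above.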
