Hi Demai,

conf = new Configuration()

will create a new Configuration object and only add the properties from core-default.xml and core-site.xml to the conf object. This is essentially a brand-new Configuration, not the same one that the daemons in the Hadoop cluster use. I think what you are really asking is whether you can get the Configuration object that a daemon in your live cluster (e.g. the datanode) is using. I am not sure whether the datanode or any other Hadoop daemon exposes such an API. I would in fact be tempted to get this information from the configuration management daemon instead - in your case Cloudera Manager. But I am not sure if CM exposes that API either. You could probably find out on the Cloudera mailing list.
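Just to illustrate the distinction, here is a rough, untested sketch. The class name ConfProbe is made up, and the hdfs-site.xml path is simply the CM-managed one you mentioned for your datanode, so adjust it for your node:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfProbe {
        public static void main(String[] args) {
            // new Configuration() only picks up core-default.xml and core-site.xml
            // from the client classpath, so HDFS-specific keys are usually absent.
            Configuration conf = new Configuration();
            // dfs.data.dir is the deprecated pre-2.x name for the same setting.
            System.out.println("from defaults: " + conf.get("dfs.datanode.data.dir")); // likely null

            // Explicitly load the hdfs-site.xml that the datanode process itself reads.
            // This only helps if the property is actually set in that file.
            conf.addResource(new Path(
                "/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml"));
            System.out.println("from daemon config: " + conf.get("dfs.datanode.data.dir"));
        }
    }

Even if this does pick up the value, you are still reading a file on disk, not asking the running daemon for its in-memory configuration.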
HTH,
Bhooshan

On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <[email protected]> wrote:

> hi, Bhooshan,
>
> thanks for your kind response. I ran the code on one of the data nodes of my cluster, with only one hadoop daemon running. I believe my java client code connects to the cluster correctly, as I am able to retrieve fileStatus, list files under a particular hdfs path, and similar things... However, you are right that the daemon process uses the hdfs-site.xml under another folder for cloudera: /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>
> about "retrieving the info from a live cluster", I would like to get the information beyond the configuration files (that is, beyond the .xml files). Since I am able to use:
>
> conf = new Configuration()
>
> to connect to hdfs and do other operations, shouldn't I be able to retrieve the configuration variables?
>
> Thanks
>
> Demai
>
>
> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <[email protected]> wrote:
>
>> Hi Demai,
>>
>> When you read a property from the conf object, it will only have a value if the conf object contains that property.
>>
>> In your case, you created the conf object as new Configuration() -- this adds core-default.xml and core-site.xml.
>>
>> Then you added site xmls (hdfs-site.xml and core-site.xml) from specific locations. If none of these files define dfs.data.dir, then you will get NULL. This is expected behavior.
>>
>> What do you mean by retrieving the info from a live cluster? Even for processes like the datanode, namenode etc., the source of truth for these properties is hdfs-site.xml. It is loaded from a specific location when you start these services.
>>
>> Question: where are you running the above code? Is it on a node which has other hadoop daemons as well?
>>
>> My guess is that the path you are referring to (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where these config properties are defined. Since this is a CDH cluster, you would probably be best served by asking on the CDH mailing list as to where the right path to these files is.
>>
>> HTH,
>> Bhooshan
>>
>>
>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <[email protected]> wrote:
>>
>>> hi, experts,
>>>
>>> I am trying to get the local filesystem directory of the data node. My cluster is using CDH5.x (hadoop 2.3) and the default configuration, so the datanode directory is under file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
>>>
>>> My code is something like:
>>>
>>> conf = new Configuration()
>>>
>>> // test both with and without the following two lines
>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>
>>> // I also tried get("dfs.datanode.data.dir"), which also returns NULL
>>> String dnDir = conf.get("dfs.data.dir"); // returns NULL
>>>
>>> It looks like get only looks at the configuration files instead of retrieving the info from the live cluster?
>>>
>>> Many thanks for your help in advance.
>>>
>>> Demai
>>>
>>
>> --
>> Bhooshan
>>
>

--
Bhooshan
