A question about how the Configuration object is built. I have a Hadoop remote client and a Hadoop cluster. When a client submits an MR job, the Configuration object is built from the XML files on the Hadoop cluster nodes, basically core-site.xml, mapred-site.xml and yarn-site.xml on the ResourceManager node. Am I correct?
TIA,
Susheel Kumar

On 9/9/14, Bhooshan Mogal wrote:
> Hi Demai,
>
> conf = new Configuration()
>
> will create a new Configuration object and only add the properties from
> core-default.xml and core-site.xml to the conf object.
>
> This is basically a new configuration object, not the same one that the
> daemons in the Hadoop cluster use.
>
> I think what you are trying to ask is whether you can get the Configuration
> object that a daemon in your live cluster (e.g. the datanode) is using. I am
> not sure whether the datanode or any other daemon on a Hadoop cluster exposes
> such an API.
>
> I would in fact be tempted to get this information from the configuration
> management daemon instead, in your case Cloudera Manager. But I am not
> sure whether CM exposes that API either. You could probably find out on the
> Cloudera mailing list.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni wrote:
>
>> hi, Bhooshan,
>>
>> thanks for your kind response. I run the code on one of the data nodes of
>> my cluster, with only one Hadoop daemon running. I believe my Java client
>> code connects to the cluster correctly, as I am able to retrieve
>> fileStatus, list files under a particular HDFS path, and do similar
>> things... However, you are right that the daemon process uses the
>> hdfs-site.xml under another folder for Cloudera:
>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>
>> About "retrieving the info from a live cluster": I would like to get the
>> information beyond the configuration files (that is, beyond the .xml
>> files). Since I am able to use
>> conf = new Configuration()
>> to connect to HDFS and do other operations, shouldn't I be able to
>> retrieve the configuration variables?
>>
>> Thanks
>>
>> Demai
>>
>>
>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal wrote:
>>
>>> Hi Demai,
>>>
>>> When you read a property from the conf object, it will only have a value
>>> if the conf object contains that property.
>>>
>>> In your case, you created the conf object as new Configuration(), which
>>> adds core-default.xml and core-site.xml.
>>>
>>> Then you added site XMLs (hdfs-site.xml and core-site.xml) from specific
>>> locations. If none of these files defines dfs.data.dir, then you will
>>> get NULL. This is expected behavior.
>>>
>>> What do you mean by retrieving the info from a live cluster? Even for
>>> processes like the datanode, namenode etc., the source of truth for these
>>> properties is hdfs-site.xml. It is loaded from a specific location when
>>> you start these services.
>>>
>>> Question: Where are you running the above code? Is it on a node which
>>> has other Hadoop daemons as well?
>>>
>>> My guess is that the path you are referring to
>>> (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where
>>> these config properties are defined. Since this is a CDH cluster, you
>>> would probably be best served by asking on the CDH mailing list where
>>> the right path to these files is.
>>>
>>>
>>> HTH,
>>> Bhooshan
>>>
>>>
>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni wrote:
>>>
>>>> hi, experts,
>>>>
>>>> I am trying to get the local filesystem directory of the data node. My
>>>> cluster is using CDH5.x (Hadoop 2.3) and the default configuration, so
>>>> the datanode directory is file:///dfs/dn. I didn't specify the value in
>>>> hdfs-site.xml.
>>>>
>>>> My code is something like:
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> Configuration conf = new Configuration();
>>>>
>>>> // test both with and without the following two lines
>>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>
>>>> // I also tried get("dfs.datanode.data.dir"), which also returns null
>>>> String dnDir = conf.get("dfs.data.dir"); // returns null
>>>>
>>>> It looks like get() only looks at the configuration files instead of
>>>> retrieving the info from the live cluster?
>>>>
>>>> Many thanks for your help in advance.
>>>>
>>>> Demai
>>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>
> --
> Bhooshan
>
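A minimal sketch of Bhooshan's point that Configuration.get() only sees the resources a given Configuration object has loaded. It does not contact any daemon. The class name DataDirSource and its describe() helper are hypothetical and not from the thread.

The sketch assumes the following:

* Hadoop 2.x client jars are on the classpath, including hadoop-hdfs, which ships hdfs-default.xml.
* It uses Configuration.getPropertySources(), which Hadoop marks as unstable.
* It queries dfs.datanode.data.dir, the Hadoop 2 name for the deprecated dfs.data.dir.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: shows where a property's value comes from in differently built Configurations. */
public class DataDirSource {

    /** Returns "key = value from [sources]"; value is null and sources are [] when no loaded resource sets key. */
    static String describe(Configuration conf, String key) {
        String value = conf.get(key);                     // null if no loaded resource defines key
        String[] sources = conf.getPropertySources(key);  // null if key is unset
        return key + " = " + value + " from "
                + (sources == null ? "[]" : Arrays.toString(sources));
    }

    public static void main(String[] args) {
        String key = "dfs.datanode.data.dir";

        // 1. Only core-default.xml and core-site.xml: the HDFS key is normally absent,
        //    unless HdfsConfiguration was already loaded in this JVM (it registers
        //    hdfs-default.xml and hdfs-site.xml as default resources).
        Configuration plain = new Configuration();
        System.out.println(plain);                 // toString() lists the loaded resources
        System.out.println(describe(plain, key));

        // 2. Add hdfs-default.xml from the classpath: this yields the Apache default,
        //    not the value a cluster manager wrote for the running DataNode.
        Configuration withDefaults = new Configuration();
        withDefaults.addResource("hdfs-default.xml");
        System.out.println(describe(withDefaults, key));

        // 3. Load the hdfs-site.xml the DataNode was actually started with,
        //    passed as the first command-line argument.
        if (args.length > 0) {
            Configuration daemon = new Configuration();
            daemon.addResource(new Path(args[0]));
            System.out.println(describe(daemon, key));
        }
    }
}
```

Run it with the Hadoop classpath set, for example `java DataDirSource /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml`. That process directory is the one reported in the thread and is specific to that cluster. Reading it may require the hdfs user's permissions.

In Hadoop 2.x's default quiet mode, a resource path that does not exist is skipped without an error, so the key then simply reads as null. Printing a Configuration object lists the resources it loaded. This is also a quick way to check which XML files a given Configuration was built from on the machine where it was created.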
