Susheel actually brought up a good point. Once the client code connects to the cluster, is there a way to get the real cluster configuration variables/values instead of relying on the .xml files on the client side?
Demai

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay wrote:
> One doubt on building the Configuration object.
>
> I have a Hadoop remote client and a Hadoop cluster.
> When a client submits an MR job, the Configuration object is built
> from the Hadoop cluster node XML files, basically the ResourceManager
> node's core-site.xml, mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml to the conf object.
> >
> > This is basically a new configuration object, not the same one that the
> > daemons in the Hadoop cluster use.
> >
> > I think what you are trying to ask is whether you can get the Configuration
> > object that a daemon in your live cluster (e.g. a datanode) is using. I am
> > not sure if the datanode or any other daemon on a Hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead, in your case Cloudera Manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> > HTH,
> > Bhooshan
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response. I run the code on one of the data nodes of
> >> my cluster, with only one Hadoop daemon running. I believe my Java client
> >> code connects to the cluster correctly, as I am able to retrieve fileStatus,
> >> list files under a particular HDFS path, and similar things.
> >> However, you are right that the daemon process uses the hdfs-site.xml
> >> under another folder for Cloudera:
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> About "retrieving the info from a live cluster": I would like to get the
> >> information beyond the configuration files (that is, beyond the .xml files).
> >> Since I am able to use
> >> conf = new Configuration()
> >> to connect to HDFS and do other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a value
> >>> if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration(), which
> >>> adds core-default.xml and core-site.xml.
> >>>
> >>> Then you added site XMLs (hdfs-site.xml and core-site.xml) from specific
> >>> locations. If none of these files defines dfs.data.dir, then you will
> >>> get null. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like the datanode, namenode, etc., the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has other Hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to
> >>> (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where
> >>> these config properties are defined. Since this is a CDH cluster, you would
> >>> probably be best served by asking on the CDH mailing list where the right
> >>> path to these files is.
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of the data node.
> >>>> My cluster is using CDH 5.x (Hadoop 2.3) and the default configuration,
> >>>> so the datanode data is under file:///dfs/dn. I didn't specify the value
> >>>> in hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> Configuration conf = new Configuration();
> >>>>
> >>>> // tested both with and without the following two lines
> >>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also returns null
> >>>> String dnDir = conf.get("dfs.data.dir"); // returns null
> >>>>
> >>>> It looks like get() only looks at the configuration files instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>
> >>> --
> >>> Bhooshan
> >
> > --
> > Bhooshan
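On the question of reading the values a running daemon actually uses (Bhooshan was unsure whether any daemon exposes this): the embedded web server of Hadoop 2.x daemons such as the NameNode and DataNode also serves the daemon's in-memory Configuration as XML at the /conf path, e.g. http://datanode-host:50075/conf with the default dfs.datanode.http.address port (datanode-host is a placeholder). Below is a minimal sketch that fetches and parses that XML with plain JDK classes. It assumes the endpoint is reachable without Kerberos/SPNEGO authentication, and LiveConf, fetchLiveConf and parseConf are hypothetical names, not part of any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads the configuration a running Hadoop daemon reports at /conf.
public class LiveConf {

    // Parses <configuration><property><name/><value/>...</property></configuration>
    // into name -> value pairs; other child tags such as <source> or <final> are ignored.
    static Map<String, String> parseConf(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name.trim(), childText(property, "value"));
            }
        }
        return props;
    }

    // Convenience overload for XML already held in a String; wraps parse errors as unchecked.
    static Map<String, String> parseConf(String xml) {
        try {
            return parseConf(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a Hadoop configuration XML document", e);
        }
    }

    // Text of the first descendant element with the given tag, or null if it is absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    // Fetches the daemon's /conf page over plain HTTP (no Kerberos/SPNEGO handling).
    static Map<String, String> fetchLiveConf(String confUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) URI.create(confUrl).toURL().openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(5_000);
        try (InputStream in = conn.getInputStream()) {
            return parseConf(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default: a DataNode web UI on this host at the Hadoop 2.x default port.
        String url = args.length > 0 ? args[0] : "http://localhost:50075/conf";
        Map<String, String> live = fetchLiveConf(url);
        System.out.println("dfs.datanode.data.dir = " + live.get("dfs.datanode.data.dir"));
    }
}
```

Run as `java LiveConf http://datanode-host:50075/conf`; it should print the data directory the DataNode was actually started with (on CDH, whatever Cloudera Manager wrote into the process directory's hdfs-site.xml). The map holds raw property names, so deprecated aliases such as dfs.data.dir are not resolved; look up dfs.datanode.data.dir instead. If Hadoop's client jars are on the classpath, passing the same URL to Configuration.addResource(URL) on a new Configuration(false) may be a shorter route to the same values, though that has not been verified here against CDH 5.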
