Elek, Marton created HADOOP-16064:
-------------------------------------

             Summary: Load configuration values from external sources
                 Key: HADOOP-16064
                 URL: https://issues.apache.org/jira/browse/HADOOP-16064
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Elek, Marton


This is a proposal to improve Configuration.java so that it can load configuration from 
external sources (Kubernetes ConfigMap, external HTTP request, any cluster 
manager like Ambari, etc.)

I will attach a patch to illustrate the proposed solution, but please comment 
on the concept first: the patch is just a PoC and is not fully implemented.

*Goals:*
 * Load the configuration files (core-site.xml/hdfs-site.xml/...) from 
external locations instead of the classpath (the classpath remains the default)
 * Make the configuration loading extensible
 * Do it in a backward-compatible way with minimal change to the existing 
Configuration.java

*Use-cases:*

 1.) Load configuration from the namenode ([http://namenode:9878/conf]). With 
this approach only the namenode needs to be configured; other components require 
only the URL of the namenode.

 2.) Read configuration directly from a Kubernetes ConfigMap (or Mesos)

 3.) Read configuration from any external cluster management tool (such as Apache 
Ambari or an equivalent)

 4.) As of now, in the Hadoop Docker images we transform environment variables 
(such as HDFS-SITE.XML_fs.defaultFs) to configuration XML files with the help 
of a Python script. With the proposed implementation it would be possible to 
read the configuration directly from the system environment variables.

*Problem:*

The existing Configuration.java can already read configuration from multiple sources, 
but most of the time it is used to load predefined resource names ("core-site.xml" 
and "hdfs-site.xml") without an explicit location. In this case the files 
are loaded from the classpath.

I propose to add an option to define the default location of 
core-site.xml and hdfs-site.xml (any configuration resource which is referenced by its 
name) so that they can be loaded from external sources instead of the classpath.

Loading from an external source requires both an implementation and its own 
configuration (where the external configs are). We can't use the regular 
configuration to configure the config loader (a chicken-and-egg problem).

Instead, I propose to use a new environment variable: HADOOP_CONF_SOURCE

The environment variable could contain a URL, where the scheme of the URL 
selects the config source and all the other parts configure the access to 
the resource.

Examples:

HADOOP_CONF_SOURCE=hadoop-http://namenode:9878/conf

HADOOP_CONF_SOURCE=env://prefix

HADOOP_CONF_SOURCE=k8s://config-map-name
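A minimal sketch of how the variable could be interpreted, assuming it is parsed with the standard java.net.URI class. The class name ConfSourceLocator is hypothetical. An unset or blank variable yields an empty result, so the caller keeps the classpath default; otherwise the scheme (hadoop-http, env, k8s in the examples above) is what would pick the implementation.

{code:java}
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper that interprets the HADOOP_CONF_SOURCE variable. */
final class ConfSourceLocator {

  static final String ENV_NAME = "HADOOP_CONF_SOURCE";

  private ConfSourceLocator() {
  }

  /**
   * Parses the variable value. Empty means "not set": keep loading
   * core-site.xml, hdfs-site.xml, ... from the classpath.
   * URI.create throws IllegalArgumentException for malformed values.
   */
  static Optional<URI> locate(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(URI.create(value.trim()));
  }

  public static void main(String[] args) {
    Optional<URI> source = locate(System.getenv(ENV_NAME));
    if (!source.isPresent()) {
      System.out.println("No external source, using the classpath");
      return;
    }
    URI uri = source.get();
    // The scheme selects the implementation; the rest configures access.
    System.out.println("scheme=" + uri.getScheme()
        + " authority=" + uri.getAuthority() + " path=" + uri.getPath());
  }
}
{code}

Note that "hadoop-http" is a legal URI scheme (letters, digits, "+", "-" and "." are allowed), so all three examples parse without any custom syntax.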

The ConfigurationSource interface can be as simple as:
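{code:java}
package org.apache.hadoop.conf;

import java.io.IOException;
import java.net.URI;
import java.util.List;

/**
 * Interface to load Hadoop configuration from a custom location.
 */
public interface ConfigurationSource {

  /**
   * Called once with the configured source URL.
   *
   * @param uri the value of HADOOP_CONF_SOURCE; its scheme selects the
   *            implementation, the remaining parts configure the access
   * @throws IOException if the source cannot be initialized
   */
  void initialize(URI uri) throws IOException;

  /**
   * Called to load a specific configuration resource.
   *
   * @param name name of the configuration resource (e.g. hdfs-site.xml)
   * @return list of loaded configuration keys and values
   *         (ParsedItem: the key/value holder used by Configuration.java)
   */
  List<ParsedItem> readConfiguration(String name);

}{code}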
The right implementation can be chosen based on the scheme of the URI, using the 
Java Service Provider Interface (SPI) mechanism 
(META-INF/services/org.apache.hadoop.conf.ConfigurationSource).
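The scheme-based lookup could look roughly like the sketch below. It assumes a hypothetical scheme() method on the interface (the proposed ConfigurationSource does not have one), and it simplifies readConfiguration to return a Map instead of List<ParsedItem> so that the example compiles on its own. SchemeAwareConfigurationSource and ConfigurationSourceSelector are illustrative names; java.util.ServiceLoader is the standard JDK entry point for the SPI mechanism.

{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical variant of the proposed interface, with scheme() added. */
interface SchemeAwareConfigurationSource {

  /** URI scheme handled by this implementation, e.g. "env" or "k8s". */
  String scheme();

  void initialize(URI uri) throws IOException;

  /** Simplified: key/value pairs instead of List<ParsedItem>. */
  Map<String, String> readConfiguration(String name);
}

final class ConfigurationSourceSelector {

  private ConfigurationSourceSelector() {
  }

  /** Returns the first candidate whose scheme matches, initialized with the URI. */
  static SchemeAwareConfigurationSource select(
      Iterable<? extends SchemeAwareConfigurationSource> candidates, URI uri)
      throws IOException {
    for (SchemeAwareConfigurationSource candidate : candidates) {
      if (candidate.scheme().equalsIgnoreCase(uri.getScheme())) {
        candidate.initialize(uri);
        return candidate;
      }
    }
    throw new IOException(
        "No configuration source registered for scheme: " + uri.getScheme());
  }

  /** Discovers implementations listed under META-INF/services on the classpath. */
  static SchemeAwareConfigurationSource load(URI uri) throws IOException {
    return select(ServiceLoader.load(SchemeAwareConfigurationSource.class), uri);
  }
}
{code}

Keeping select() separate from load() means the matching logic can be exercised with a plain list, while production code relies on ServiceLoader discovering the providers registered in the services file.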

This requires only minimal modification of Configuration.java (see the 
attached patch as an example).

The patch contains two example implementations:

*hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*

This can load configuration from environment variables based on a naming 
convention (e.g. HDFS-SITE.XML_hdfs.dfs.key=value).
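A sketch of that naming convention, assuming the prefix is the upper-cased resource name followed by an underscore and everything after it is the configuration key, kept as-is. This is an illustration, not the code of Env.java; EnvNamingConvention is a hypothetical name.

{code:java}
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical reader for variables named <RESOURCE NAME IN UPPER CASE>_<key>,
 * e.g. HDFS-SITE.XML_dfs.replication=1 for the resource hdfs-site.xml.
 */
final class EnvNamingConvention {

  private EnvNamingConvention() {
  }

  static Map<String, String> read(Map<String, String> env, String resourceName) {
    String prefix = resourceName.toUpperCase(Locale.ROOT) + "_";
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, String> entry : env.entrySet()) {
      String name = entry.getKey();
      if (name.startsWith(prefix) && name.length() > prefix.length()) {
        // Everything after the prefix is the configuration key, case preserved.
        result.put(name.substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }
}
{code}

In a real process it would be called as EnvNamingConvention.read(System.getenv(), "hdfs-site.xml"); taking the map as a parameter keeps it testable without touching the environment.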

*hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*

This implementation can load the configuration from the /conf servlet of any 
Hadoop component.
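A sketch of the parsing side of such a loader, assuming the servlet returns Hadoop's usual XML layout (a <configuration> element containing <property> elements with <name> and <value> children). ConfServletReader is a hypothetical name, and http://namenode:9878/conf is the URL from use case 1. Extra children such as <final> or <source> are ignored.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the XML served by a Hadoop /conf endpoint. */
final class ConfServletReader {

  private ConfServletReader() {
  }

  /** Fetches and parses e.g. http://namenode:9878/conf. */
  static Map<String, String> fetch(URL url) throws IOException {
    try (InputStream in = url.openStream()) {
      return parse(in);
    }
  }

  /** Extracts name/value pairs from <property> elements, in document order. */
  static Map<String, String> parse(InputStream in) throws IOException {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Remote input: refuse DOCTYPE declarations (guards against XXE).
      factory.setFeature(
          "http://apache.org/xml/features/disallow-doctype-decl", true);
      Document doc = factory.newDocumentBuilder().parse(in);
      Map<String, String> result = new LinkedHashMap<>();
      NodeList properties = doc.getElementsByTagName("property");
      for (int i = 0; i < properties.getLength(); i++) {
        Element property = (Element) properties.item(i);
        String name = childText(property, "name");
        if (name != null) {
          result.put(name.trim(), childText(property, "value"));
        }
      }
      return result;
    } catch (ParserConfigurationException | SAXException e) {
      throw new IOException("Cannot parse configuration XML", e);
    }
  }

  /** Text of the first descendant with the given tag, or null if absent. */
  private static String childText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }
}
{code}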

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
