[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths

2020-02-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9315:
---
Fix Version/s: 2.20.0

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple 
> paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Assignee: Claudio Venturini
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In certain Hadoop deployments the {{HADOOP_CONF_DIR}} environment variable 
> can contain multiple paths. For example, when running {{spark-submit}}, 
> Cloudera 6.3 sets it as follows:
> {{HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/conf/yarn-conf:/etc/hive/conf}}
> Currently the class {{HadoopFileSystemOptions}} reads the content of the 
> variable but treats it as a single path. When it contains multiple paths, 
> Beam cannot configure Hadoop properly, and so HDFS can't be accessed. 
> At the moment, the only workarounds that I'm aware of are:
>  - Override the {{HADOOP_CONF_DIR}} set by Cloudera for the Spark service, 
> but I think this could cause problems with some other tools (maybe when 
> using Hive from Spark, because I think that Spark wouldn't be able to find 
> the Hive config).
>  - Pass the HDFS configuration using the {{--hdfsConfiguration}} option, but 
> this is inconvenient when there are many settings, and they would not be 
> updated automatically when the cluster is reconfigured in Cloudera Manager.
> In my opinion, to fix this the {{HadoopFileSystemOptions}} class should split 
> the content of the {{HADOOP_CONF_DIR}} environment variable on the colon 
> (":") to detect every path it contains.
> I have already fixed this, and all tests on class {{HadoopFileSystemOptions}} 
> pass successfully. I'm preparing a pull request.
>  
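
For illustration only, the splitting proposed in the description could be
sketched as below. This is not the code from the pull request: the class
name ConfDirSplitter and both method names are hypothetical, and the file
names core-site.xml and hdfs-site.xml are an assumption about which files
would be looked up in each directory. The sketch uses only the JDK and does
not touch Beam or Hadoop classes.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: shows one way to treat
// HADOOP_CONF_DIR as a colon-separated list instead of a single path.
public class ConfDirSplitter {

    /** Splits the value on ':' and drops blank entries (e.g. from "a::b"). */
    static List<String> splitConfDirs(String value) {
        List<String> dirs = new ArrayList<>();
        if (value == null) {
            return dirs; // variable not set: nothing to scan
        }
        for (String part : value.split(":")) {
            String dir = part.trim();
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    /** Builds the candidate config file paths for every directory found. */
    static List<String> candidateConfigFiles(String value) {
        List<String> files = new ArrayList<>();
        for (String dir : splitConfDirs(value)) {
            // Assumed file names; adjust to whatever the loader expects.
            files.add(Paths.get(dir, "core-site.xml").toString());
            files.add(Paths.get(dir, "hdfs-site.xml").toString());
        }
        return files;
    }

    public static void main(String[] args) {
        String value = System.getenv("HADOOP_CONF_DIR");
        for (String file : candidateConfigFiles(value)) {
            System.out.println(file);
        }
    }
}
```

With the Cloudera value quoted above, splitConfDirs would return two
directories, the yarn-conf one and /etc/hive/conf, and each would then be
checked separately for configuration files.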



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths

2020-02-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9315:
---
Issue Type: Improvement  (was: Bug)

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple 
> paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Assignee: Claudio Venturini
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h





[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths

2020-02-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9315:
---
Status: Open  (was: Triage Needed)

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple 
> paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Bug
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h


