[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
[ https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9315:
---
    Fix Version/s: 2.20.0

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Assignee: Claudio Venturini
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In certain Hadoop deployments the {{HADOOP_CONF_DIR}} environment variable can contain multiple paths. For example, when running {{spark-submit}}, Cloudera 6.3 sets it as follows:
> {{HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/conf/yarn-conf:/etc/hive/conf}}
> Currently the {{HadoopFileSystemOptions}} class reads the content of the variable but treats it as a single path. When the variable contains multiple paths, Beam cannot configure Hadoop properly, so HDFS cannot be accessed. At the moment, the only workarounds I'm aware of are:
> - Override the {{HADOOP_CONF_DIR}} set by Cloudera for the Spark service. However, I think this could cause problems with other tools (for example, when using Hive from Spark, Spark might no longer be able to find the Hive configuration).
> - Pass HDFS configurations with the {{--hdfsConfiguration}} option. This is inconvenient when there are many settings to pass, and they would not be updated automatically when the cluster is reconfigured in Cloudera Manager.
> In my opinion, to fix this the {{HadoopFileSystemOptions}} class should split the content of the {{HADOOP_CONF_DIR}} environment variable on the colon (":") to detect all the paths it contains.
> I have already fixed this, and all tests for the {{HadoopFileSystemOptions}} class pass. I'm preparing a pull request.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
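The colon-splitting fix proposed in the description can be sketched in plain Java. This is a minimal, dependency-free sketch under stated assumptions, not the actual Beam patch: the class `HadoopConfDirSplitter` and its methods are hypothetical names, and it assumes each directory is searched for the standard Hadoop client files `core-site.xml` and `hdfs-site.xml`. A real implementation would then add each file it finds as a resource on a Hadoop `Configuration` (for example with `Configuration.addResource(Path)`), which is omitted here to avoid the Hadoop dependency.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Beam's API) sketching the proposed fix: treat
 * HADOOP_CONF_DIR as a list of directories rather than a single path.
 */
public class HadoopConfDirSplitter {

  // Hadoop client site files assumed to be looked up in each directory.
  private static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml"};

  /** Splits a HADOOP_CONF_DIR-style value on the platform path separator (":" on Unix). */
  static List<String> splitConfDirs(String value) {
    List<String> dirs = new ArrayList<>();
    if (value == null) {
      return dirs;
    }
    // Pattern.quote makes sure the separator is matched literally, not as a regex.
    for (String part : value.split(Pattern.quote(File.pathSeparator))) {
      String dir = part.trim();
      // Skip empty entries, e.g. those produced by a leading ":" or by "::".
      if (!dir.isEmpty()) {
        dirs.add(dir);
      }
    }
    return dirs;
  }

  /** Returns every site file that exists in any of the given directories, in order. */
  static List<File> findSiteFiles(List<String> dirs) {
    List<File> found = new ArrayList<>();
    for (String dir : dirs) {
      for (String name : SITE_FILES) {
        File candidate = new File(dir, name);
        if (candidate.isFile()) {
          found.add(candidate);
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Fall back to the Cloudera example value when the variable is not set.
    String value = System.getenv("HADOOP_CONF_DIR");
    if (value == null) {
      value = "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/conf/yarn-conf"
          + ":/etc/hive/conf";
    }
    List<String> dirs = splitConfDirs(value);
    System.out.println("Configuration directories: " + dirs);
    System.out.println("Site files found: " + findSiteFiles(dirs));
  }
}
```

On Unix, `File.pathSeparator` is ":", which matches the colon proposed in the issue. Using it instead of a hard-coded ":" is a design choice of this sketch: it avoids splitting Windows paths such as `C:\conf` at the drive letter.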
[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
[ https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9315:
---
    Issue Type: Improvement (was: Bug)

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Assignee: Claudio Venturini
> Priority: Major
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Updated] (BEAM-9315) HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
[ https://issues.apache.org/jira/browse/BEAM-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9315:
---
    Status: Open (was: Triage Needed)

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple paths
> ---
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Bug
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Priority: Major
>
> Time Spent: 10m
> Remaining Estimate: 0h