[ https://issues.apache.org/jira/browse/SPARK-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-13912:
-----------------------------
    Target Version/s: 2.0.0

> spark.hadoop.* configurations are not applied for Parquet Data Frame Readers
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13912
>                 URL: https://issues.apache.org/jira/browse/SPARK-13912
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Matt Cheah
>
> I populated a SparkConf object passed to a SparkContext with some
> spark.hadoop.* configurations, expecting them to be applied to the backing
> Hadoop file reading whenever I read from my DFS. However, when I was running
> some jobs, I noticed that these configurations were not being applied
> to DataFrame reads made through sqlContext.read().parquet().
> I looked in the codebase and noticed that SqlNewHadoopRDD uses neither the
> SparkConf nor the SparkContext's Hadoop configuration to set up the Hadoop
> reading; instead, it uses SparkHadoopUtil.get.conf. That Hadoop configuration
> object does not include the Hadoop configurations set on the SparkContext.
> In general, we have a discrepancy in how Hadoop configurations are resolved:
> when reading raw RDDs via e.g. SparkContext.textFile() we take the Hadoop
> configuration from the SparkContext, but for DataFrames we use
> SparkHadoopUtil.get.conf.
> We should probably use the SparkContext's Hadoop configuration for DataFrames
> as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
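
The discrepancy described above can be sketched roughly as follows. This is a minimal illustrative reproduction against the Spark 1.6.x API, not code from the ticket; the filesystem URI, Parquet path, and app name are hypothetical placeholders chosen for the example.

```scala
// Sketch of the SPARK-13912 discrepancy (Spark 1.6.x API).
// Assumptions: the fs.defaultFS value and the Parquet path are illustrative only.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object Spark13912Repro {
  def main(args: Array[String]): Unit = {
    // spark.hadoop.* keys on the SparkConf are expected to be copied into
    // the Hadoop Configuration used for all file reads.
    val conf = new SparkConf()
      .setAppName("spark-13912-repro")
      .setMaster("local[2]")
      .set("spark.hadoop.fs.defaultFS", "hdfs://example-namenode:8020") // hypothetical value

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Raw RDD path: sc.hadoopConfiguration picks up the spark.hadoop.* entry,
    // so e.g. sc.textFile(...) would see the override.
    println(sc.hadoopConfiguration.get("fs.defaultFS"))

    // DataFrame path: in 1.6.1, SqlNewHadoopRDD builds its Hadoop configuration
    // from SparkHadoopUtil.get.conf rather than from the SparkContext, so the
    // spark.hadoop.* override above is silently ignored for this read.
    val df = sqlContext.read.parquet("/data/example.parquet") // hypothetical path
    df.show()

    sc.stop()
  }
}
```

The proposed fix in the description amounts to making the DataFrame code path derive its Hadoop configuration from the SparkContext, the same way the raw-RDD path already does.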