After a quick glance, this seems to be a bug in Spark SQL. Do you mind creating a JIRA for it? Then I can start to fix it.
Thanks,
Hao

From: Jerry Lam [mailto:chiling...@gmail.com]
Sent: Wednesday, October 28, 2015 3:13 AM
To: Marcelo Vanzin
Cc: user@spark.apache.org
Subject: Re: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

Hi Marcelo,

I tried setting the properties before instantiating the Spark context via SparkConf, and that works fine. The original code read the Hadoop configuration from hdfs-site.xml, which also works fine. Can I therefore conclude that sparkContext.hadoopConfiguration.set("key", "value") does not propagate to all SQL jobs within the same SparkContext? I haven't tried it with Spark Core, so I cannot tell. Is there a workaround, given that this seems to be broken? I need to do this programmatically after the SparkContext is instantiated, not before...

Best Regards,

Jerry

On Tue, Oct 27, 2015 at 2:30 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
If setting the values in SparkConf works, there's probably some bug in
the SQL code, e.g. creating a new Configuration object instead of using
the one in SparkContext. But I'm not really familiar with that code.

On Tue, Oct 27, 2015 at 11:22 AM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Marcelo,
>
> Thanks for the advice. I understand that we can set the configuration
> before creating the SparkContext. My question is that
> SparkContext.hadoopConfiguration.set("key", "value") doesn't seem to
> propagate to all subsequent SQLContext jobs. Note that, as I mentioned,
> I can load the parquet file but I cannot perform a count on it because
> of an AmazonClientException. That means the credentials are used while
> loading the parquet file but not while processing it. How can this
> happen?
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:
>> > Has anyone experienced issues setting Hadoop configuration after the
>> > SparkContext is initialized? I'm using Spark 1.5.1.
>> >
>> > I'm trying to use s3a, which requires the access and secret keys to
>> > be set in the Hadoop configuration. I tried to set the properties in
>> > the Hadoop configuration from the SparkContext:
>> >
>> > sc.hadoopConfiguration.set("fs.s3a.access.key", AWSAccessKeyId)
>> > sc.hadoopConfiguration.set("fs.s3a.secret.key", AWSSecretKey)
>>
>> Try setting "spark.hadoop.fs.s3a.access.key" and
>> "spark.hadoop.fs.s3a.secret.key" in your SparkConf before creating the
>> SparkContext.
>>
>> --
>> Marcelo

--
Marcelo
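[Editor's note] A minimal sketch of the workaround Marcelo suggests, assuming Spark 1.5.x with the hadoop-aws s3a connector on the classpath; the application name, the AWS_* environment variable names, and the bucket path are placeholders for illustration, not anything from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Spark copies every "spark.hadoop.*" entry from SparkConf into the
// Hadoop Configuration it creates, so credentials set here, before the
// SparkContext exists, are visible to all jobs, including Spark SQL.
// The environment variable names below are assumptions for this sketch.
val conf = new SparkConf()
  .setAppName("s3a-credentials-example") // placeholder app name
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Mutating sc.hadoopConfiguration after this point is the approach the
// thread reports as not propagating to Spark SQL jobs.
// "s3a://some-bucket/some/path" is a placeholder path.
val df = sqlContext.read.parquet("s3a://some-bucket/some/path")
println(df.count())

Note that this only helps when the credentials are known before the context is created; Jerry's requirement of setting them afterwards is exactly the case the thread identifies as broken in Spark SQL.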