After a quick glance, this seems to be a bug in Spark SQL. Do you mind creating a JIRA for it? Then I can start to fix it.
Thanks,
Hao

From: Jerry Lam [mailto:chiling...@gmail.com]
Sent: Wednesday, October 28, 2015 3:13 AM
To: Marcelo Vanzin
Cc: user@spark.apache.org
Subject: Re: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

Hi Marcelo,

I tried setting the properties before instantiating the Spark context via SparkConf, and that works fine. The original code read the Hadoop configuration from hdfs-site.xml, which also works fine. Can I therefore conclude that sparkContext.hadoopConfiguration.set("key", "value") does not propagate to all SQL jobs within the same SparkContext? I haven't tried it with Spark Core, so I cannot tell. Is there a workaround, given that this seems to be broken? I need to do this programmatically after the SparkContext is instantiated, not before...

Best Regards,

Jerry

On Tue, Oct 27, 2015 at 2:30 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
If setting the values in SparkConf works, there's probably some bug in
the SQL code, e.g. creating a new Configuration object instead of using
the one in SparkContext. But I'm not really familiar with that code.

On Tue, Oct 27, 2015 at 11:22 AM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Marcelo,
>
> Thanks for the advice. I understand that we can set the configuration
> before creating the SparkContext. My question is that
> SparkContext.hadoopConfiguration.set("key", "value") doesn't seem to
> propagate to all subsequent SQLContext jobs. Note that, as I mentioned,
> I can load the parquet file but I cannot perform a count on it because
> of an AmazonClientException. That means the credentials are used while
> loading the parquet file but not while processing it. How can this
> happen?
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:
>> > Has anyone experienced issues setting Hadoop configuration after the
>> > SparkContext is initialized? I'm using Spark 1.5.1.
>> >
>> > I'm trying to use s3a, which requires the access and secret keys to
>> > be set in the Hadoop configuration. I tried to set the properties in
>> > the Hadoop configuration from the SparkContext:
>> >
>> > sc.hadoopConfiguration.set("fs.s3a.access.key", AWSAccessKeyId)
>> > sc.hadoopConfiguration.set("fs.s3a.secret.key", AWSSecretKey)
>>
>> Try setting "spark.hadoop.fs.s3a.access.key" and
>> "spark.hadoop.fs.s3a.secret.key" in your SparkConf before creating the
>> SparkContext.
>>
>> --
>> Marcelo

--
Marcelo
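[Editor's note] A minimal sketch of the workaround Marcelo suggests, assuming Spark 1.5.x with the hadoop-aws s3a connector on the classpath; the application name, the AWS_* environment variable names, and the bucket path are placeholders for illustration, not anything from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Spark copies every "spark.hadoop.*" entry from SparkConf into the
// Hadoop Configuration it creates, so credentials set here, before the
// SparkContext exists, are visible to all jobs, including Spark SQL.
// The environment variable names below are assumptions for this sketch.
val conf = new SparkConf()
  .setAppName("s3a-credentials-example") // placeholder app name
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Mutating sc.hadoopConfiguration after this point is the approach the
// thread reports as not propagating to Spark SQL jobs.
// "s3a://some-bucket/some/path" is a placeholder path.
val df = sqlContext.read.parquet("s3a://some-bucket/some/path")
println(df.count())

Note that this only helps when the credentials are known before the context is created; Jerry's requirement of setting them afterwards is exactly the case the thread identifies as broken in Spark SQL.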