If you are using RDDs, you can use saveAsHadoopFile or saveAsNewAPIHadoopFile and pass in a conf that overrides the keys you need. For example, you can do:
val saveConf = new Configuration(sc.hadoopConfiguration)
// configure saveConf with overridden s3 config
rdd.saveAsNewAPIHadoopFile(..., conf = saveConf)

Regards,
Mridul

On Wed, Oct 12, 2016 at 2:49 AM, Aseem Bansal <asmbans...@gmail.com> wrote:
> Hi
>
> I want to read a CSV from one bucket, do some processing and write to a
> different bucket. I know how to set S3 credentials using
>
> jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", YOUR_ACCESS_KEY)
> jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", YOUR_SECRET_KEY)
>
> But the problem is that Spark is lazy. So if I do the following:
>
> set credentials 1
> read input csv
> do some processing
> set credentials 2
> write result csv
>
> then there is a chance that, due to laziness, the program may try to use
> credentials 2 while reading the input csv.
>
> A solution is to cache the result csv, but if there is not enough storage
> it is possible that the csv will be re-read. So how do I handle this
> situation?
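
For completeness, a minimal end-to-end sketch of this approach, assuming an existing SparkContext sc. The bucket names, credential placeholders, and the processLine function are hypothetical, and NullWritable/Text with TextOutputFormat are just one way to write the lines back out:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Credentials for the input bucket go on the shared configuration;
// the lazy read picks these up whenever it actually runs.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", INPUT_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", INPUT_SECRET_KEY)

val input = sc.textFile("s3n://input-bucket/data.csv")
val result = input.map(processLine) // some processing (hypothetical)

// Copy the configuration and override only the output bucket's keys,
// so the shared config that the read depends on is never mutated.
val saveConf = new Configuration(sc.hadoopConfiguration)
saveConf.set("fs.s3n.awsAccessKeyId", OUTPUT_ACCESS_KEY)
saveConf.set("fs.s3n.awsSecretAccessKey", OUTPUT_SECRET_KEY)

result
  .map(line => (NullWritable.get(), new Text(line)))
  .saveAsNewAPIHadoopFile(
    "s3n://output-bucket/result",
    classOf[NullWritable],
    classOf[Text],
    classOf[TextOutputFormat[NullWritable, Text]],
    saveConf)

Because the write uses its own Configuration copy, the credentials set for the input bucket stay intact on sc.hadoopConfiguration, regardless of when the lazy read is actually triggered.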