Re: Spark on EMR with S3 example (Python)

2015-07-15 Thread Sujit Pal
> Do I still need to provide the keys?
>
> Thank you,
>
> *From:* Sujit Pal [mailto:sujitatgt...@gmail.com]
> *Sent:* Tuesday, July 14, 2015 3:14 PM
> *To:* Pagliari, Roberto
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark on EMR with S3 example (Python)

Re: Spark on EMR with S3 example (Python)

2015-07-14 Thread Akhil Das
> I just wanted to access public datasets on Amazon. Do I still need to
> provide the keys?
>
> Thank you,
>
> *From:* Sujit Pal [mailto:sujitatgt...@gmail.com]
> *Sent:* Tuesday, July 14, 2015 3:14 PM
> *To:* Pagliari, Roberto
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark on EMR with S3 example (Python)

RE: Spark on EMR with S3 example (Python)

2015-07-14 Thread Pagliari, Roberto
Hi Sujit,

I just wanted to access public datasets on Amazon. Do I still need to provide the keys?

Thank you,

*From:* Sujit Pal [mailto:sujitatgt...@gmail.com]
*Sent:* Tuesday, July 14, 2015 3:14 PM
*To:* Pagliari, Roberto
*Cc:* user@spark.apache.org
*Subject:* Re: Spark on EMR with S3 example (Python)

Re: Spark on EMR with S3 example (Python)

2015-07-14 Thread Sujit Pal
Hi Roberto,

I have written PySpark code that reads from private S3 buckets; it should be similar for public S3 buckets as well. You need to set the AWS access and secret keys on the SparkContext, and then you can access the S3 folders and files with their s3n:// paths. Something like this:
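(A minimal sketch of that pattern, assuming the s3n Hadoop configuration properties fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey; the key values, bucket name, and path below are placeholders.)

from pyspark import SparkContext

sc = SparkContext(appName="s3-example")

# Pass the AWS credentials to the underlying Hadoop configuration
# so that s3n:// paths can be resolved.
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

# Read a file (or a folder of files) from the bucket via its s3n:// path.
rdd = sc.textFile("s3n://your-bucket/path/to/data")
print(rdd.count())

Going through sc._jsc.hadoopConfiguration() is the usual way to reach the Hadoop configuration from PySpark, since the Python API does not expose it directly.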