Can you elaborate a bit more on your approach using S3 notifications? Just curious; I'm dealing with a similar issue right now that might benefit from this.

On 09 Apr 2016 9:25 AM, "Nezih Yigitbasi" <nyigitb...@netflix.com> wrote:
> While it is doable in Spark, S3 also supports notifications:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
>
> On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande <nlaucha...@gmail.com> wrote:
>
>> Hi Benjamin,
>>
>> I have done it. The critical configuration items are the ones below:
>>
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
>>     "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
>>     AccessKeyId)
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey",
>>     AWSSecretAccessKey)
>>
>>   val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
>>
>> This code will probe the bucket for newly created S3 files every batch
>> interval.
>>
>> Thanks,
>> Natu
>>
>> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Has anyone monitored an S3 bucket or directory using Spark Streaming and
>>> pulled any new files to process? If so, can you provide basic Scala coding
>>> help on this?
>>>
>>> Thanks,
>>> Ben
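[Editor's note: Natu's snippet assumes an existing StreamingContext (`ssc`). A minimal self-contained sketch might look like the following; the app name, bucket path, and 60-second batch interval are placeholder assumptions, and credentials are read from environment variables rather than hard-coded.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3FileMonitor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3FileMonitor")
    // Each 60-second batch lists the directory and picks up files that
    // appeared since the previous batch.
    val ssc = new StreamingContext(conf, Seconds(60))

    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Hypothetical bucket/folder; only files newly created in the current
    // batch window are ingested.
    val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
    inputS3Stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"New lines this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `textFileStream` only sees files whose modification time falls within the current batch window, so files uploaded while the application is down are missed; that gap is one motivation for the notification-based approach discussed above.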
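[Editor's note: on the notifications approach Nezih links to, one common pattern (not shown in this thread) is to configure the bucket to publish `s3:ObjectCreated:*` events to an SQS queue and poll that queue instead of repeatedly listing the bucket. A rough sketch with the AWS SDK for Java v1 follows; the queue name is hypothetical, and the event parsing is left as a stub.]

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import scala.collection.JavaConverters._

object S3EventPoller {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    // Hypothetical queue configured as the target of the bucket's
    // s3:ObjectCreated:* notification.
    val queueUrl = sqs.getQueueUrl("s3-new-files-queue").getQueueUrl

    while (true) {
      val request = new ReceiveMessageRequest(queueUrl)
        .withWaitTimeSeconds(20) // long polling to reduce empty receives
        .withMaxNumberOfMessages(10)
      for (msg <- sqs.receiveMessage(request).getMessages.asScala) {
        // The message body is an S3 event JSON document containing the
        // bucket name and object key; parse it here and hand the key to
        // your processing job (e.g. read it with sc.textFile).
        println(msg.getBody)
        sqs.deleteMessage(queueUrl, msg.getReceiptHandle)
      }
    }
  }
}
```

Compared with `textFileStream`, this avoids per-batch directory listings and does not lose events while the consumer is down, since unread notifications stay on the queue.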