Can you elaborate a bit more on your approach using S3 notifications? Just curious; I'm dealing with a similar issue right now that might benefit from this.

On 09 Apr 2016 9:25 AM, "Nezih Yigitbasi" <nyigitb...@netflix.com> wrote:
> While it is doable in Spark, S3 also supports notifications:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
>
> On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande <nlaucha...@gmail.com> wrote:
>
>> Hi Benjamin,
>>
>> I have done it. The critical configuration items are the ones below:
>>
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
>>     "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
>>     AccessKeyId)
>>   ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey",
>>     AWSSecretAccessKey)
>>
>>   val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
>>
>> This code will probe the bucket for newly created S3 files every batch
>> interval.
>>
>> Thanks,
>> Natu
>>
>> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Has anyone monitored an S3 bucket or directory using Spark Streaming and
>>> pulled any new files to process? If so, can you provide basic Scala coding
>>> help on this?
>>>
>>> Thanks,
>>> Ben
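[Editor's note: Natu's snippet assumes an existing StreamingContext (`ssc`). A minimal self-contained sketch might look like the following; the app name, bucket path, and 60-second batch interval are placeholder assumptions, and credentials are read from environment variables rather than hard-coded.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3FileMonitor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3FileMonitor")
    // Each 60-second batch lists the directory and picks up files that
    // appeared since the previous batch.
    val ssc = new StreamingContext(conf, Seconds(60))

    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Hypothetical bucket/folder; only files newly created in the current
    // batch window are ingested.
    val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
    inputS3Stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"New lines this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `textFileStream` only sees files whose modification time falls within the current batch window, so files uploaded while the application is down are missed; that gap is one motivation for the notification-based approach discussed above.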
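[Editor's note: on the notifications approach Nezih links to, one common pattern (not shown in this thread) is to configure the bucket to publish `s3:ObjectCreated:*` events to an SQS queue and poll that queue instead of repeatedly listing the bucket. A rough sketch with the AWS SDK for Java v1 follows; the queue name is hypothetical, and the event parsing is left as a stub.]

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import scala.collection.JavaConverters._

object S3EventPoller {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    // Hypothetical queue configured as the target of the bucket's
    // s3:ObjectCreated:* notification.
    val queueUrl = sqs.getQueueUrl("s3-new-files-queue").getQueueUrl

    while (true) {
      val request = new ReceiveMessageRequest(queueUrl)
        .withWaitTimeSeconds(20) // long polling to reduce empty receives
        .withMaxNumberOfMessages(10)
      for (msg <- sqs.receiveMessage(request).getMessages.asScala) {
        // The message body is an S3 event JSON document containing the
        // bucket name and object key; parse it here and hand the key to
        // your processing job (e.g. read it with sc.textFile).
        println(msg.getBody)
        sqs.deleteMessage(queueUrl, msg.getReceiptHandle)
      }
    }
  }
}
```

Compared with `textFileStream`, this avoids per-batch directory listings and does not lose events while the consumer is down, since unread notifications stay on the queue.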