Re: S3 Source support in Flink

Yuval Itzchakov Thu, 28 Oct 2021 22:56:46 -0700

Hi Abhishek,

You can use `readFileStream`, which is available directly in the DataStream
API. With that method you will still pay for a ListObjects call on each
monitoring iteration.
If you want a source that does not rely on listing, you can implement a
custom SQS source (there is currently no official one) and use
Amazon S3 Event Notifications to ship object events from S3 to SQS:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html
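
As a rough sketch of what such a custom source would do with each SQS
message, the snippet below pulls the object paths out of one notification
body. It assumes S3 delivers to SQS directly (not wrapped by SNS) and that
the body follows the S3 event notification JSON structure; the function
name `object_paths` is hypothetical, and the polling and Flink source
wiring are left out.

```python
import json
from typing import List
from urllib.parse import unquote_plus


def object_paths(message_body: str) -> List[str]:
    """Return s3:// paths for the objects created in one SQS message body."""
    event = json.loads(message_body)
    paths = []
    # The s3:TestEvent message sent when the notification is configured
    # has no "Records" field, so it yields an empty list.
    for record in event.get("Records", []):
        # Skip anything that is not an object creation (e.g. deletions).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths
```

The resulting paths are what the source would hand out as splits, so no
ListObjects call is needed to discover new files.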




On Fri, Oct 29, 2021 at 3:34 AM Abhishek SP <abhisheksp1...@gmail.com>
wrote:

> Hello,
>
> I see S3 supported as a Sink through StreamingFileSink
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/streamfile_sink/>
> but do not see a source equivalent, i.e. a StreamingFileSource.
>
> *Questions:*
> 1. What is the current recommendation for using S3 as a continuous source
> for a Flink streaming application?
> 2. If we have to implement a custom continuous S3 source, how would one
> implement the SplitEnumerator, given that the S3 ListObjects API can become
> expensive as the bucket grows?
>
> Thanks in advance
>
> Best,
> Abhishek
>


-- 
Best Regards,
Yuval Itzchakov.
