Tzu-Li (Gordon) Tai created FLINK-3211:
------------------------------------------

             Summary: Add AWS Kinesis streaming connector
                 Key: FLINK-3211
                 URL: https://issues.apache.org/jira/browse/FLINK-3211
             Project: Flink
          Issue Type: New Feature
          Components: Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai
             Fix For: 1.0.0


AWS Kinesis is a widely adopted message queue among AWS users, much like a 
cloud-service version of Apache Kafka. Support for AWS Kinesis would be a great 
addition to the handful of streaming connectors Flink offers to external 
systems, and would help reach out to the AWS community.

A first look at the AWS KCL (Kinesis Client Library) shows that it already 
supports reading a stream starting from a specific offset (a "record sequence 
number" in Kinesis terminology). For external checkpointing, KCL is designed 
to use AWS DynamoDB to checkpoint application state: each partition's 
progress (a "shard" in Kinesis terminology) corresponds to a single row in the 
KCL-managed DynamoDB table.
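To make that layout concrete, here is a minimal in-memory model of the 
one-row-per-shard checkpoint table (plain Java; this is not the actual KCL or 
DynamoDB API, and the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of KCL's DynamoDB checkpoint table:
// one row per shard, keyed by shard id, storing the last processed
// record sequence number. The real table is created and managed by KCL.
public class ShardCheckpointTable {
    private final Map<String, String> rows = new HashMap<>();

    // Record that a shard has progressed up to the given sequence number.
    public void checkpoint(String shardId, String sequenceNumber) {
        rows.put(shardId, sequenceNumber);
    }

    // Look up where to resume reading a shard after a restart;
    // returns null if the shard has never been checkpointed.
    public String lastCheckpointed(String shardId) {
        return rows.get(shardId);
    }
}
```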

So, implementing the AWS Kinesis connector will closely resemble the work 
done on the Kafka connector, with a few tweaks as follows (I'm mainly just 
rewording [~StephanEwen]'s original description [1]):

1. Determine the mapping from KCL shard workers to Flink source tasks. KCL 
already offers a worker task per shard, so we will need a mapping much like 
the Kafka partition-to-task mapping in [2].

2. Let the Flink connector also maintain a local copy of the application 
state, accessed via the KCL API, for Flink's distributed snapshot 
checkpointing.

3. On failure, restart the KCL from the record sequence number of the last 
Flink checkpoint. However, when the KCL restarts after a failure, it is 
designed to consult the external DynamoDB table. We need a further look at 
how to handle this so that the Flink checkpoint and the external checkpoint 
in DynamoDB stay properly in sync.
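A rough sketch of how steps 1 and 2 could fit together (plain Java, 
independent of the actual Flink and KCL APIs; all names here are 
hypothetical): each parallel source subtask picks up shards by a modulo 
rule, much like the Kafka connector's partition assignment, and tracks 
per-shard sequence numbers locally so they can be emitted as snapshot state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of shard-to-subtask assignment and local
// checkpoint state for a Kinesis source; not actual Flink/KCL API.
public class KinesisSourceSketch {

    // Step 1: deterministic modulo assignment of shards to parallel
    // source subtasks, analogous to the Kafka partition mapping in [2].
    public static List<Integer> assignShards(int numShards, int subtaskIndex, int parallelism) {
        List<Integer> assigned = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            if (shard % parallelism == subtaskIndex) {
                assigned.add(shard);
            }
        }
        return assigned;
    }

    // Step 2: local copy of per-shard progress, updated as records are
    // read, and handed to Flink's distributed snapshot on checkpoint.
    private final Map<Integer, String> sequenceNumbers = new HashMap<>();

    public void recordProcessed(int shard, String sequenceNumber) {
        sequenceNumbers.put(shard, sequenceNumber);
    }

    // Step 3 would restore this map on recovery and restart the KCL
    // readers from the stored sequence numbers.
    public Map<Integer, String> snapshotState() {
        return new HashMap<>(sequenceNumbers);
    }
}
```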

Most of the details regarding KCL's state checkpointing, sharding, shard 
workers, and failure recovery can be found here [3].

As for the Kinesis sink connector, it should be fairly straightforward and 
almost, if not completely, identical to the Kafka sink.
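For the sink, the core per-record work is just serializing each element into 
the partition key and payload that a Kinesis put-record call expects. A 
minimal sketch (plain Java; the names are illustrative, and a real sink 
would hand these values to the AWS SDK):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the per-record serialization a Kinesis sink would perform
// before invoking the AWS SDK; names here are illustrative only.
public class KinesisSinkRecord {
    public final String partitionKey; // determines the target shard
    public final ByteBuffer data;     // record payload

    public KinesisSinkRecord(String partitionKey, ByteBuffer data) {
        this.partitionKey = partitionKey;
        this.data = data;
    }

    // Serialize a string element; a real sink would use a pluggable
    // serialization schema, as the Kafka sink does.
    public static KinesisSinkRecord fromString(String key, String element) {
        return new KinesisSinkRecord(key, ByteBuffer.wrap(element.getBytes(StandardCharsets.UTF_8)));
    }
}
```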

References:
[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
[2] http://data-artisans.com/kafka-flink-a-practical-how-to/
[3] http://docs.aws.amazon.com/kinesis/latest/dev/advanced-consumers.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
