Hi All, I have implemented Kinesis Connector for Structured Streaming. The code is available is at https://github.com/qubole/kinesis-sql.
Open Source Jira for the same is SPARK-18165 <https://issues.apache.org/jira/browse/SPARK-18165> Design details are mentioned here - https://docs.google.com/presentation/d/ 1eSX7WiZxkjRy5ZDXAKRUDIKWqhP3ob2A1L1lat2QIXY/edit#slide=id.g34f0efcb5a_2_76 I wrote and ran unit tests to validate the implementation. Apart from that, I started 3 node spark clusters to read from Kinesis and write to S3 in parquet format and ran the query for more than a day. List of Features implemented - Re-sharding logic to add/remove new/closed shards - Executors fetch records from Kinesis as part of the incremental job execution (instead of having a receiver model where few threads are responsible for reading from Kinesis) - Various configuration to have fine-grain control depending upon your requirements I have developed and tested the connector against Spark 2.2.x and will migrate it to DataSource V2 APIs in some time. Can anyone help me in reviewing the design/implementation? I would love it to be part of Spark Distribution. - Vikram