Hi All,
I've been working on a pull request [1] to allow Spark to read from Kinesis
starting at a specific timestamp. I have iterated on the patch with the help
of other contributors, and we think that it's in a good state now.

This patch would save hours of crash recovery time for Spark when reading
from Kinesis. Unlike Kafka, Kinesis suffers from throttling issues, so
starting from a given timestamp would reduce the amount of data requested
from Kinesis during recovery.

I would love to hear some thoughts from the committers and see if I can
work on any improvements.

1. https://github.com/apache/spark/pull/18029

Best Regards,
Yash
