Hi,
You could also use this receiver:
https://github.com/dibbhatt/kafka-spark-consumer
It is also available on spark-packages:
https://spark-packages.org/package/dibbhatt/kafka-spark-consumer
With it you do not need to enable the WAL, and you can still recover from a
driver failure with no data loss. You can re
IIUC, your scenario is quite similar to what ReliableKafkaReceiver currently
does. You should send the ack to the upstream source only after the data has
been persisted to the WAL. Because data receiving and data processing are
asynchronous, there is still a chance that data is lost if you send the ack
before the WAL write completes.
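
To make the ordering concrete, here is a toy simulation in plain Python. It
does not use any Spark API: ReplayableSource, receive, run and the list
standing in for the WAL are all hypothetical names invented for this sketch.
It assumes the upstream source replays every record that has not been acked.

```python
class ReplayableSource:
    """Hypothetical upstream source that re-delivers anything not yet acked."""

    def __init__(self, records):
        self._pending = list(records)

    def poll(self):
        # Return the oldest unacknowledged record without removing it.
        return self._pending[0] if self._pending else None

    def ack(self, record):
        # After an ack the source forgets the record and will never replay it.
        self._pending.remove(record)


def receive(source, wal, ack_before_wal, crash_on=None):
    """Drain the source into the WAL (a plain list in this toy model).

    If crash_on is given, the receiver "dies" on that record just before
    the WAL write would happen.
    """
    while True:
        record = source.poll()
        if record is None:
            return
        if ack_before_wal:
            source.ack(record)   # unsafe order: ack first
        if record == crash_on:
            return               # simulated crash before the WAL write
        wal.append(record)       # stands in for a durable WAL write
        if not ack_before_wal:
            source.ack(record)   # safe order: ack only after the write


def run(ack_before_wal):
    source, wal = ReplayableSource(["a", "b", "c"]), []
    receive(source, wal, ack_before_wal, crash_on="b")  # first attempt crashes
    receive(source, wal, ack_before_wal)                # restarted receiver
    return wal


print(run(ack_before_wal=False))  # ['a', 'b', 'c']
print(run(ack_before_wal=True))   # ['a', 'c']
```

With the ack sent after the write, the record that was in flight during the
crash is replayed and ends up in the WAL. With the ack sent first, the source
has already forgotten "b" when the receiver dies, so it is lost.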
Hi All,
I am using a receiver-based approach. I understand that the Spark Streaming
APIs convert the data received from the receiver into blocks, and that these
in-memory blocks are also stored in the WAL if one enables it. My
upstream source, which is not Kafka, can also replay, by which I mean if