I would like to have a Spark Streaming *SQS Receiver* which deletes SQS
messages only *after* they have been successfully stored on S3.
For this, a *Custom Receiver* can be implemented with the semantics of a
Reliable Receiver: the store(multiple-records) call blocks until the given
records have been stored and replicated inside Spark
<https://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliability>.
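For context, here is a minimal sketch of what I have in mind, assuming the
AWS SDK for Java v1; the class name, queue URL, and receive loop are
illustrative, not an existing implementation:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest

// Hypothetical reliable SQS receiver: delete from SQS only after a
// blocking store() has succeeded.
class ReliableSqsReceiver(queueUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

  def onStart(): Unit = {
    // onStart() must not block; run the receive loop on its own thread.
    new Thread("SQS Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {
    // Nothing to do: the receive loop checks isStopped().
  }

  private def receive(): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    while (!isStopped()) {
      val messages = sqs.receiveMessage(
        new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10)
      ).getMessages.asScala
      if (messages.nonEmpty) {
        // Reliable, blocking store of multiple records: returns only
        // after the records are stored and replicated inside Spark.
        val bodies = ArrayBuffer[String]() ++ messages.map(_.getBody)
        store(bodies)
        // Only now should it be safe to delete the messages from SQS.
        messages.foreach(m => sqs.deleteMessage(queueUrl, m.getReceiptHandle))
      }
    }
  }
}
```

The open question below is whether the point marked "safe to delete" really
is safe once write-ahead logs are involved.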
If *write-ahead logs* are enabled, all data received from a receiver is
written to a write-ahead log in the configured checkpoint directory
<https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing>.
The checkpoint directory can be pointed to S3.
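Concretely, the setup I mean is roughly the following (the application name
and bucket path are placeholders; spark.streaming.receiver.writeAheadLog.enable
is the actual configuration key from the Spark documentation):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("ReliableSqsApp") // placeholder name
  // Write all received data to the write-ahead log.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// Checkpoint (and thus WAL) directory on S3; the bucket is a placeholder.
ssc.checkpoint("s3n://my-bucket/spark-checkpoints")
```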
After the store(multiple-records) blocking call finishes, are the records
already stored in the checkpoint directory (and thus can be safely deleted
from SQS)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reliable-SQS-Receiver-for-Spark-Streaming-tp23302.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
