I would like to have a Spark Streaming *SQS Receiver* which deletes SQS messages only *after* they have been successfully stored on S3.

For this, a *Custom Receiver* can be implemented with the semantics of a Reliable Receiver: the store(multiple-records) call blocks until the given records have been stored and replicated inside Spark <https://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliability>. If *write-ahead logs* are enabled, all data received from a receiver is additionally written to a write-ahead log in the configured checkpoint directory <https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing>, and the checkpoint directory can be pointed at S3.

My question: after the blocking store(multiple-records) call returns, are the records already persisted in the checkpoint directory (and can they therefore be safely deleted from SQS)?
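To make the intended semantics concrete, here is a minimal sketch of the receiver's core loop. This is an assumption about the design, not working Spark code: in a real implementation the class would extend org.apache.spark.streaming.receiver.Receiver<String> and use the AWS SDK's SQS client; both are stubbed here (all names below are hypothetical) so the control flow is runnable on its own. The point it illustrates is that deleteMessageBatch is only called after the blocking store(...) returns.

```java
import java.util.*;

// Hedged sketch of a reliable SQS receiver's receive loop.
// Stubs stand in for Spark's Receiver.store(...) and the AWS SQS client.
class ReliableSqsReceiverSketch {
    static List<String> storedRecords = new ArrayList<>();   // what "Spark" has stored
    static List<String> deletedHandles = new ArrayList<>();  // what was deleted from "SQS"
    static Deque<String[]> queue = new ArrayDeque<>();       // fake queue: {body, receiptHandle}

    // Stub for Receiver.store(multiple-records): in Spark this call blocks
    // until the records are replicated (and, with the WAL enabled, logged).
    static void store(List<String> records) {
        storedRecords.addAll(records);
    }

    // Stub for an SQS receive call: returns up to 10 {body, receiptHandle} pairs.
    static List<String[]> receiveMessageBatch() {
        List<String[]> batch = new ArrayList<>();
        while (!queue.isEmpty() && batch.size() < 10) batch.add(queue.poll());
        return batch;
    }

    // Stub for an SQS batch-delete call.
    static void deleteMessageBatch(List<String> handles) {
        deletedHandles.addAll(handles);
    }

    // Core loop: delete from SQS only AFTER the blocking store(...) returns.
    static void receiveLoop() {
        while (!queue.isEmpty()) {
            List<String[]> msgs = receiveMessageBatch();
            List<String> bodies = new ArrayList<>();
            List<String> handles = new ArrayList<>();
            for (String[] m : msgs) { bodies.add(m[0]); handles.add(m[1]); }
            store(bodies);               // blocks until Spark holds the data
            deleteMessageBatch(handles); // safe only once store has returned
        }
    }

    static void runScenario() {
        storedRecords.clear();
        deletedHandles.clear();
        queue.clear();
        queue.add(new String[]{"msg-1", "receipt-1"});
        queue.add(new String[]{"msg-2", "receipt-2"});
        receiveLoop();
    }

    public static void main(String[] args) {
        runScenario();
        System.out.println("stored:  " + storedRecords);
        System.out.println("deleted: " + deletedHandles);
    }
}
```

Whether this ordering is actually sufficient is exactly the question above: it is only safe if the returned store(...) call guarantees the records are durably persisted (not merely replicated in memory).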
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reliable-SQS-Receiver-for-Spark-Streaming-tp23302.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.