I am referring to the back pressure implementation here.

On Fri, Oct 9, 2015 at 8:30 AM, pushkar priyadarshi <priyadarshi.push...@gmail.com> wrote:
> Spark 1.5's Kafka direct stream, I think, does not store messages; rather, it
> fetches messages as and when they are consumed in the pipeline. That would
> prevent you from having data loss.
>
> On Fri, Oct 9, 2015 at 7:34 AM, bitborn <andrew.clark...@ave81.com> wrote:
>
>> Hi all,
>>
>> My company is using Spark Streaming and the Kafka APIs to process an event
>> stream. We've got most of our application written, but are stuck on "at
>> least once" processing.
>>
>> I created a demo to show roughly what we're doing here:
>> https://github.com/bitborn/resilient-kafka-streaming-in-spark
>>
>> The problem we're having is that when the application experiences an
>> exception (network issue, out of memory, etc.) it will drop the batch it's
>> processing. The ideal behavior is that it will process each event "at least
>> once", even if that means processing it more than once. Whether this
>> happens via checkpointing, the WAL, or Kafka offsets is irrelevant, as long
>> as we don't drop data. :)
>>
>> A couple of things we've tried:
>> - Using the Kafka direct stream API (via Cody Koeninger's
>>   https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/IdempotentExample.scala)
>> - Using checkpointing with both the low-level and high-level APIs
>> - Enabling the write ahead log
>>
>> I've included a log here:
>> https://github.com/bitborn/resilient-kafka-streaming-in-spark/blob/master/spark.log
>> but I'm afraid it doesn't reveal much.
>>
>> The fact that others seem to be able to get this working properly suggests
>> we're missing some magic configuration or are possibly executing it in a
>> way that won't support the desired behavior.
>>
>> I'd really appreciate some pointers!
>>
>> Thanks much,
>> Andrew Clarkson
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-at-least-once-semantics-tp24995.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
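For the back pressure angle specifically: in Spark 1.5 the new rate-control mechanism is switched on through configuration rather than code. A minimal sketch of the relevant settings (the rate value is illustrative, not a recommendation):

```properties
# Enable Spark 1.5's dynamic rate control for streaming
spark.streaming.backpressure.enabled=true

# Optional hard cap per Kafka partition (records/sec) for the direct stream
spark.streaming.kafka.maxRatePerPartition=1000
```

Back pressure bounds how fast batches are ingested; it does not by itself give at-least-once delivery, which still depends on when offsets are committed.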
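On the at-least-once question: with the direct stream, the usual pattern is to process a batch first and commit/checkpoint its offsets only afterwards, and to make downstream writes idempotent so replays are harmless. A self-contained sketch of that idea (plain Scala, no Spark or Kafka dependencies; every name here is illustrative, not part of any Spark API):

```scala
import scala.collection.mutable

// Simulates the direct-stream pattern: offsets are committed only AFTER a
// batch is fully processed, so a crash replays the batch (duplicates) but
// never drops it.
object AtLeastOnceDemo {
  val log: Vector[String] = Vector("e0", "e1", "e2", "e3", "e4") // a topic-partition
  var committedOffset: Int = 0                          // last committed offset
  val processedCount = mutable.Map.empty[String, Int]   // how often each event was handled

  // Read a batch from the committed offset; advance the offset only on success.
  def processBatch(batchSize: Int, crashBeforeCommit: Boolean): Unit = {
    val batch = log.slice(committedOffset, committedOffset + batchSize)
    batch.foreach(e => processedCount(e) = processedCount.getOrElse(e, 0) + 1)
    if (!crashBeforeCommit)
      committedOffset += batch.size // commit AFTER processing: at-least-once
  }

  def main(args: Array[String]): Unit = {
    processBatch(3, crashBeforeCommit = true)  // work done, but commit was lost
    processBatch(3, crashBeforeCommit = false) // replays e0..e2: duplicates, no loss
    processBatch(2, crashBeforeCommit = false) // e3, e4
    println(processedCount.toList.sortBy(_._1))
    // → List((e0,2), (e1,2), (e2,2), (e3,1), (e4,1))
  }
}
```

Every event is seen at least once; the counts show which events were replayed after the simulated crash. A real sink would absorb those duplicates, e.g. via keyed upserts, which is what makes commit-after-processing safe.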