We've been using the new DirectKafkaInputDStream to implement an exactly-once
processing solution that tracks the provided offset ranges within the same
transaction that persists our data results. When an exception is thrown
within the processing loop and the configured number of retries are
Cody Koeninger-2 wrote
What's your schema for the offset table, and what's the definition of
writeOffset?
The schema is the same as the one in your post: topic | partition | offset
The writeOffset is nearly identical:
def writeOffset(osr: OffsetRange)(implicit session: DBSession): Unit = {
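The body of writeOffset is cut off above, but the shape of the idea (per the exactly-once pattern being discussed) is a conditional, compare-and-set update of the stored offset that fails the transaction on a mismatch. Below is a minimal plain-Scala sketch of that logic only, with a mutable Map standing in for the txn_offsets table and no real DBSession; names like `offsets` and the topic/partition values are illustrative assumptions, not the poster's actual code.

```scala
import scala.collection.mutable

// Stand-in for Kafka's OffsetRange (topic, partition, fromOffset, untilOffset).
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// (topic, partition) -> last committed offset; stands in for the offsets table.
val offsets = mutable.Map(("events", 0) -> 100L)

def writeOffset(osr: OffsetRange): Unit = {
  val key = (osr.topic, osr.partition)
  // Only advance if the stored offset equals this batch's fromOffset; otherwise a
  // replayed or concurrent batch already moved it, so abort (roll back) the transaction.
  if (offsets.get(key).contains(osr.fromOffset)) offsets(key) = osr.untilOffset
  else throw new IllegalStateException(s"offset mismatch for $key")
}

writeOffset(OffsetRange("events", 0, 100L, 150L))
println(offsets(("events", 0)))  // 150
```

In the real version this update and the write of the batch's results happen in the same database transaction, which is what makes the processing effectively exactly-once.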
We're a group of experienced backend developers who are fairly new to Spark
Streaming (and Scala) and very interested in using the new (in 1.3)
DirectKafkaInputDStream impl as part of the metrics reporting service we're
building.
Our flow involves reading in metric events, lightly modifying some
Cody Koeninger-2 wrote
In fact, you're using the 2-arg form of reduceByKey to shrink it down to
1 partition
reduceByKey(sumFunc, 1)
But you started with 4 kafka partitions? So they're definitely no longer
1:1
True. I added the second arg because we were seeing multiple threads