In general, see the material linked from https://github.com/koeninger/kafka-exactly-once if you want a better understanding of the direct stream.
For spark-streaming-kafka-0-8, the direct stream doesn't really care about the consumer group, since it uses the simple consumer. The 0.10 version uses the new Kafka consumer, so the consumer group does matter.

In either case, splitting events across old and new versions of the job is not what I would want. I'd suggest making sure that your outputs are idempotent or transactional, and that the new app has a different consumer group (for the versions where it matters). Start up the new app, make sure it is running (even if it errors due to transactional safeguards), then shut down the old app.

On Tue, Sep 6, 2016 at 3:51 PM, Mariano Semelman <mariano.semel...@despegar.com> wrote:

> Hello everybody,
>
> I am trying to understand how Kafka Direct Stream works. I'm interested
> in having a production-ready Spark Streaming application that consumes a
> Kafka topic, but I need to guarantee there is (almost) no downtime,
> especially during deploys (and submits) of new versions. What seems to be
> the best solution is to deploy and submit the new version without
> shutting down the previous one, wait for the new application to start
> consuming events, and then shut down the previous one.
>
> What I would expect is that the events get distributed between the two
> applications in a balanced fashion, using the consumer group id and
> split by the partition key that I've previously set on my Kafka
> producer. However, I don't see that Kafka Direct Stream supports this
> functionality.
>
> I've achieved this with the receiver-based approach (btw, I've used
> "kafka" for the "offsets.storage" Kafka property [2]). However, this
> approach comes with the technical difficulties named in the
> documentation [1] (i.e. exactly-once semantics).
>
> Anyway, not even this approach seems very failsafe. Does anyone know a
> way to safely deploy new versions of a streaming application of this
> kind without downtime?
> Thanks in advance,
>
> Mariano
>
> [1] http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> [2] http://kafka.apache.org/documentation.html#oldconsumerconfigs
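The suggestion above (give each deployed version of the app its own consumer group, so old and new jobs don't interfere with each other's committed offsets) can be sketched for the 0.10 direct stream roughly as follows. This is a minimal sketch, not a drop-in deployment recipe: it assumes a running `StreamingContext` named `ssc`, and the broker address, topic name, and group id are placeholders you would substitute for your own.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",           // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  // Bump the group id per deployed version (e.g. "-v2") so the new job
  // tracks its own offsets instead of colliding with the old one.
  "group.id" -> "my-streaming-app-v2",
  "auto.offset.reset" -> "latest",
  // Commit offsets yourself (after output succeeds) rather than on a timer.
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("my-topic") // placeholder topic

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
```

With this in place, the rollover described above is: submit the new app under the new group id, confirm it is processing (or blocking on its transactional safeguards), then stop the old one. Because both versions may briefly process the same records, the output writes themselves still need to be idempotent or transactional.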