I like the idea of the KafkaRDD and a Spark partition/split per Kafka
partition. That is a good use of the SimpleConsumer.
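For readers following along, that design can be sketched roughly as below. This is an illustrative sketch, not the actual spark-kafka code: the class and field names are made up, and it assumes the Kafka 0.8-era SimpleConsumer API. The key idea is that each Spark partition is pinned to one Kafka (topic, partition) with a fixed offset range, so the batch read is deterministic and repeatable.

```scala
// Illustrative sketch (not the actual spark-kafka implementation): an RDD
// whose partitions map 1:1 to Kafka partitions, each read with its own
// SimpleConsumer over a fixed offset range.
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

// One Spark split per Kafka (topic, partition), with fixed start/stop
// offsets decided up front by the driver.
case class KafkaSplit(index: Int, host: String, port: Int,
                      topic: String, partition: Int,
                      startOffset: Long, stopOffset: Long) extends Partition

class SimpleKafkaRDD(sc: SparkContext, splits: Seq[KafkaSplit])
    extends RDD[Array[Byte]](sc, Nil) {

  override def getPartitions: Array[Partition] = splits.toArray

  override def compute(split: Partition, context: TaskContext): Iterator[Array[Byte]] = {
    val s = split.asInstanceOf[KafkaSplit]
    // each task talks directly to the leader broker for its partition
    val consumer = new SimpleConsumer(s.host, s.port, 10000, 64 * 1024, "spark-kafka-sketch")
    try {
      // single fetch for brevity; a real reader loops until stopOffset
      val request = new FetchRequestBuilder()
        .addFetch(s.topic, s.partition, s.startOffset, 1024 * 1024)
        .build()
      consumer.fetch(request)
        .messageSet(s.topic, s.partition)
        .iterator
        .takeWhile(_.offset < s.stopOffset)
        .map { mao =>
          val payload = mao.message.payload
          val bytes = new Array[Byte](payload.remaining)
          payload.get(bytes)
          bytes
        }
        .toList.iterator // materialize before the consumer is closed
    } finally consumer.close()
  }
}
```

Because the offset ranges are fixed when the RDD is defined, a failed task can be re-run and will read exactly the same data, which is what makes this usable for batch jobs.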

I can see a few different strategies for the commitOffsets and
partitionOwnership.

What use case are you committing your offsets for?

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Sun, Dec 14, 2014 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> hello all,
> we at tresata wrote a library to provide batch integration between
> spark and kafka. it supports:
> * distributed write of an rdd to kafka
> * distributed read of an rdd from kafka
>
> our main use cases are (in lambda architecture speak):
> * periodic appends to the immutable master dataset on hdfs from kafka using
> spark
> * make non-streaming data available in kafka with periodic data drops from
> hdfs using spark. this is to facilitate merging the speed and batch layers
> in spark-streaming
> * distributed writes from spark-streaming
>
> see here:
> https://github.com/tresata/spark-kafka
>
> best,
> koert
>
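The first use case above (periodic appends from kafka to the immutable master dataset on hdfs) might look roughly like the following in a batch job. Everything here is an illustrative sketch, not the tresata spark-kafka API: `readKafkaBatch` is a made-up stand-in for the library's distributed read, and the hdfs path layout is hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Made-up stand-in for the library's distributed kafka read: given a topic
// and per-partition (start, stop) offset ranges, return payloads as an RDD.
def readKafkaBatch(sc: SparkContext, topic: String,
                   offsetRanges: Map[Int, (Long, Long)]): RDD[Array[Byte]] = ???

// Periodic batch append to the immutable master dataset on hdfs: each run
// reads a fixed offset range (so the job is repeatable on failure) and
// writes a new immutable directory keyed by run id.
def appendRun(sc: SparkContext, runId: String,
              offsetRanges: Map[Int, (Long, Long)]): Unit = {
  val batch: RDD[Array[Byte]] = readKafkaBatch(sc, "events", offsetRanges)
  batch
    .map(bytes => new String(bytes, "UTF-8"))
    .saveAsTextFile(s"hdfs:///data/master/events/run=$runId")
}
```

In lambda-architecture terms, the run id and offset range together make each append idempotent: re-running a failed job overwrites the same immutable directory with the same data.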
