Re: spark kafka batch integration

2014-12-15 Thread Koert Kuipers
gwen, i thought about it a little more and i feel pretty confident i can make it so that it's deterministic in case of node failure. will push that change out after the holidays.
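A minimal sketch of how reads can be made deterministic across task retries: compute the per-partition offset ranges once on the driver, so a task that is re-run after a node failure fetches exactly the same slice of the log. The names below (OffsetRange, PinnedKafkaRDD, the fetch parameter) are illustrative, not the actual tresata API.

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // illustrative only: a kafka offset range pinned at RDD-definition time
    case class OffsetRange(topic: String, partition: Int, from: Long, until: Long)

    class KafkaPartition(val index: Int, val range: OffsetRange) extends Partition

    // each spark partition maps to one fixed [from, until) kafka range; a retried
    // task re-fetches the same range, so the RDD's contents are deterministic
    class PinnedKafkaRDD(
        sc: SparkContext,
        ranges: Seq[OffsetRange],
        fetch: OffsetRange => Iterator[Array[Byte]]) // e.g. built on SimpleConsumer
      extends RDD[Array[Byte]](sc, Nil) {

      override protected def getPartitions: Array[Partition] =
        ranges.zipWithIndex.map { case (r, i) => new KafkaPartition(i, r) }.toArray

      override def compute(split: Partition, context: TaskContext): Iterator[Array[Byte]] =
        fetch(split.asInstanceOf[KafkaPartition].range)
    }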

spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hello all, we at tresata wrote a library to provide batch integration between spark and kafka. it supports:
* distributed write of rdd to kafka
* distributed read of rdd from kafka
our main use cases are (in lambda architecture speak):
* periodic appends to the immutable master dataset on hdfs
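For the distributed-write half, a sketch of the usual pattern against the Kafka 0.8 producer API of the time (the function name writeToKafka is hypothetical, not necessarily the library's own): each task opens its own producer and publishes its partition of the RDD, so the write runs in parallel across the cluster.

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
    import org.apache.spark.rdd.RDD

    // every executor task writes its own partition of the RDD to kafka
    def writeToKafka(rdd: RDD[String], brokerList: String, topic: String): Unit =
      rdd.foreachPartition { records =>
        val props = new Properties()
        props.put("metadata.broker.list", brokerList)
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))
        try records.foreach(r => producer.send(new KeyedMessage[String, String](topic, r)))
        finally producer.close()
      }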

Re: spark kafka batch integration

2014-12-14 Thread Joe Stein
I like the idea of the KafkaRDD and a Spark partition/split per Kafka partition. That is a good use of the SimpleConsumer. I can see a few different strategies for the commitOffsets and partitionOwnership. What use case are you committing your offsets for?
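A sketch of what one task might do under that scheme, reading a single pinned offset range from the partition's leader broker with the 0.8 SimpleConsumer (the helper name and parameters are illustrative):

    import kafka.api.FetchRequestBuilder
    import kafka.consumer.SimpleConsumer

    // one spark task reads one kafka partition's [from, until) range
    def readRange(leaderHost: String, leaderPort: Int, topic: String,
                  partition: Int, from: Long, until: Long): Seq[Array[Byte]] = {
      val consumer = new SimpleConsumer(leaderHost, leaderPort, 10000, 64 * 1024, "spark-kafka-task")
      try {
        val request = new FetchRequestBuilder()
          .clientId("spark-kafka-task")
          .addFetch(topic, partition, from, 1024 * 1024)
          .build()
        // a real task would loop, re-fetching until `until` is reached;
        // a single fetch is shown here for brevity
        consumer.fetch(request).messageSet(topic, partition)
          .iterator
          .takeWhile(_.offset < until)
          .map { mao =>
            val payload = mao.message.payload
            val bytes = new Array[Byte](payload.remaining())
            payload.get(bytes)
            bytes
          }.toSeq
      } finally consumer.close()
    }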

Re: spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hey gwen, no immediate plans to contribute it to spark but of course we are open to this. given spark's pull request backlog my suspicion is that the spark community prefers a user library at this point. if you lose a node the task will restart. and since each task reads until the end of a kafka partition
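The subtlety is where "the end" gets resolved. If the log-end offset is fetched once on the driver when the RDD is defined and shipped to the tasks, a retried task rereads the same range; if each task asked the broker for the latest offset itself, a retry that runs after new data has arrived would see a different end and yield different results. A sketch against the 0.8 offset API (the helper name is hypothetical):

    import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
    import kafka.common.TopicAndPartition
    import kafka.consumer.SimpleConsumer

    // ask the partition leader for its current log-end offset. call this once
    // on the driver and pin the result; calling it inside a task would make
    // retries non-deterministic, since more data may arrive between attempts.
    def latestOffset(consumer: SimpleConsumer, topic: String, partition: Int): Long = {
      val tap = TopicAndPartition(topic, partition)
      val request = OffsetRequest(
        Map(tap -> PartitionOffsetRequestInfo(OffsetRequest.LatestTime, maxNumOffsets = 1)))
      consumer.getOffsetsBefore(request).partitionErrorAndOffsets(tap).offsets.head
    }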