Hi,

This has been running in production at many organizations that have adopted this consumer as an alternative option. The consumer works with Spark 1.3.1.

It has been running in production at Pearson for some time. It is part of Spark Packages, and you can see how to include it in your mvn or sbt build here:
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

This consumer also comes with a built-in PID controller to handle back-pressure, which you can use even if you are on Spark 1.3.1.

Regards,
Dibyendu
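As a minimal sketch of what the sbt side of that could look like (the group id, artifact id, and version placeholder below are assumptions; the spark-packages page linked above lists the exact coordinates and resolver to use):

    // build.sbt -- hypothetical coordinates; confirm them on the spark-packages page above
    libraryDependencies += "dibbhatt" % "kafka-spark-consumer" % "<latest-version>"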
On Thu, Sep 10, 2015 at 5:48 PM, Krzysztof Zarzycki <k.zarzy...@gmail.com> wrote:

> Thanks Akhil, this seems like an interesting option to consider.
> Do you know if the package is production-ready? Do you use it in
> production?
>
> And do you know if it works with Spark 1.3.1 as well? The README mentions
> that the package on spark-packages.org is built with Spark 1.4.1.
>
> Anyway, it seems that core Spark Streaming does not support my case? Or
> can anyone instruct me on how to do it? Let's say that I'm even fine
> (though not happy about it) with using the KafkaCluster private class that
> is included in Spark, for manually managing ZK offsets. Has someone done
> it before? Does someone have public code examples of manually managing ZK
> offsets?
>
> Thanks,
> Krzysztof
>
> 2015-09-10 12:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
>> This consumer pretty much covers all the scenarios you listed:
>> github.com/dibbhatt/kafka-spark-consumer. Give it a try.
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Sep 10, 2015 at 3:32 PM, Krzysztof Zarzycki
>> <k.zarzy...@gmail.com> wrote:
>>
>>> Hi there,
>>> I have a problem fulfilling all my needs when using Spark Streaming
>>> with Kafka. Let me enumerate my requirements:
>>> 1. I want at-least-once/exactly-once processing.
>>> 2. I want my application to be tolerant of failures and of simple
>>> stops. The Kafka offsets need to be tracked between restarts.
>>> 3. I want to be able to upgrade the code of my application without
>>> losing Kafka offsets.
>>>
>>> Now, what my requirements imply according to my knowledge:
>>> 1. implies using the new Kafka DirectStream.
>>> 2. implies using checkpointing; the Kafka DirectStream will write
>>> offsets to the checkpoint as well.
>>> 3. implies that checkpoints can't be used between controlled restarts,
>>> so I need to install a shutdown hook that calls
>>> ssc.stop(stopGracefully=true) (here is a description of how:
>>> https://metabroadcast.com/blog/stop-your-spark-streaming-application-gracefully).
>>>
>>> Now my problems are:
>>> 1. If I cannot use checkpoints between code upgrades, does it mean that
>>> Spark does not help me at all with keeping my Kafka offsets? Does it
>>> mean that I have to implement my own storing to / initialization of
>>> offsets from ZooKeeper?
>>> 2. When I set up the shutdown hook and any of my executors throws an
>>> exception, the application does not fail but gets stuck in the running
>>> state. Is that because stopGracefully deadlocks on exceptions? How can I
>>> overcome this problem? Maybe I can avoid setting a shutdown hook and
>>> there is another way to stop the application gracefully?
>>>
>>> 3. If I somehow overcome 2., is it enough to just stop my app
>>> gracefully to be able to upgrade the code and not lose Kafka offsets?
>>>
>>> Thank you a lot for your answers,
>>> Krzysztof Zarzycki
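For the "public code examples of manually managing ZK offsets" question above, here is a minimal Scala sketch of one way this could look on Spark 1.3.1: KafkaUtils.createDirectStream started from explicitly supplied offsets, HasOffsetRanges to read each batch's offsets, and a shutdown hook that stops the context gracefully. The readOffsetsFromZk/writeOffsetsToZk helpers are hypothetical placeholders for whatever ZooKeeper client you use (e.g. Curator); this is a sketch of the pattern under those assumptions, not a tested implementation, and it does not address the hang-on-executor-exception issue described above.

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

    object ManualZkOffsetsSketch {

      // Hypothetical helpers: implement with your ZooKeeper client of choice.
      def readOffsetsFromZk(): Map[TopicAndPartition, Long] = ???
      def writeOffsetsToZk(ranges: Seq[OffsetRange]): Unit = ???

      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("manual-zk-offsets"), Seconds(10))
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

        // Start the direct stream from offsets we stored ourselves, so a code
        // upgrade (which invalidates checkpoints) does not lose our position.
        val fromOffsets = readOffsetsFromZk()
        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder, (String, String)](
          ssc, kafkaParams, fromOffsets,
          (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

        stream.foreachRDD { rdd =>
          // Capture the offset ranges on the direct stream's RDD, before any shuffle.
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

          // ... write the batch out (idempotently or transactionally for exactly-once) ...

          // Only record the offsets once the output for this batch has succeeded.
          writeOffsetsToZk(offsetRanges)
        }

        // Try to finish in-flight batches before the JVM exits on a controlled stop.
        sys.addShutdownHook {
          ssc.stop(stopSparkContext = true, stopGracefully = true)
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }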