Hi Cody, thank you for pointing out that "sub-millisecond processing" is an exaggerated term :D I simply got excited releasing this project; it should read: "millisecond stream processing at the Spark level".
Highly appreciate the info about the latest Kafka consumer. I would need to get up to speed on the most recent improvements and new features of Kafka itself. I think that with the features of Spark's latest Kafka 0.10 integration, Maelstrom's only remaining upside would be its simple, developer-friendly APIs. I'll play around with the Spark 2.0 kafka-0-10 KafkaRDD to see if this is feasible.

On Wed, Aug 24, 2016 at 10:46 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Yes, spark-streaming-kafka-0-10 uses the new consumer. Besides
> pre-fetching messages, the big reason for that is that security
> features are only available with the new consumer.
>
> The Kafka project is at release 0.10.0.1 now; they think most of the
> issues with the new consumer have been ironed out. You can track the
> progress as to when they'll remove the "beta" label at
> https://issues.apache.org/jira/browse/KAFKA-3283
>
> As far as I know, Kafka in general can't achieve sub-millisecond
> end-to-end stream processing, so my guess is you need to be more
> specific about your terms there.
>
> I promise I'm not trying to start a pissing contest :) just wanted to
> check if you were aware of the current state of the other consumers.
> Collaboration is always welcome.
>
> On Tue, Aug 23, 2016 at 10:18 PM, Jeoffrey Lim <jeoffr...@gmail.com> wrote:
> > Apologies, I was not aware that Spark 2.0 now has Kafka consumer
> > caching/pooling. What I have checked is the latest Kafka consumer,
> > and I believe it is still in beta quality.
> >
> > https://kafka.apache.org/documentation.html#newconsumerconfigs
> >
> >> Since 0.9.0.0 we have been working on a replacement for our existing
> >> simple and high-level consumers. The code is considered beta quality.
> >
> > Not sure about this: does the Spark 2.0 Kafka 0.10 integration already
> > use this one? Is it now stable? With this caching feature in Spark 2.0,
> > could it achieve sub-millisecond stream processing now?
> >
> > Maelstrom still uses the old Kafka Simple Consumer. This library was made
> > open source so that I could continue working on it for future updates and
> > improvements, e.g. when the latest Kafka consumer gets a stable release.
> >
> > We have been using Maelstrom's "caching concept" for a long time now, as
> > receiver-based Spark Kafka integration does not work for us. There were
> > thoughts about using the Direct Kafka APIs; however, Maelstrom has very
> > simple APIs and just "simply works" even under unstable scenarios (e.g.
> > advertised hostname failures on EMR).
> >
> > I believe Maelstrom will work even on Spark 1.3 and Kafka 0.8.2.1 (and
> > of course with the latest Kafka 0.10 as well).
> >
> > On Wed, Aug 24, 2016 at 9:49 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Were you aware that the spark 2.0 / kafka 0.10 integration also reuses
> >> kafka consumer instances on the executors?
> >>
> >> On Tue, Aug 23, 2016 at 3:19 PM, Jeoffrey Lim <jeoffr...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I have released the first version of a new Kafka integration with Spark
> >> > that we use in the company I work for: open sourced and named Maelstrom.
> >> >
> >> > It is unique compared to other solutions out there, as it reuses the
> >> > Kafka consumer connection to achieve sub-millisecond latency.
> >> >
> >> > This library has been running stable in a production environment and
> >> > has proven resilient to numerous production issues.
> >> >
> >> > Please check out the project's page on GitHub:
> >> >
> >> > https://github.com/jeoffreylim/maelstrom
> >> >
> >> > Contributors welcome!
> >> >
> >> > Cheers!
> >> >
> >> > Jeoffrey Lim
> >> >
> >> > P.S. I am also looking for a job opportunity, please look me up at
> >> > LinkedIn
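
For anyone following the thread, the Spark 2.0 / Kafka 0.10 direct stream being discussed looks roughly like the sketch below. This is a minimal illustration, not Maelstrom code: the broker address, group id, and topic name are placeholders, and it assumes the spark-streaming-kafka-0-10 artifact is on the classpath.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-010-direct-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Standard new-consumer configs; the broker address and group id
    // here are placeholders for this sketch.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // PreferConsistent spreads partitions evenly across executors; the
    // integration caches Kafka consumer instances on the executors,
    // which is the consumer reuse Cody mentions above.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("example-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      rdd.foreach { record: ConsumerRecord[String, String] =>
        println(s"${record.topic}/${record.partition} @${record.offset}: ${record.value}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `enable.auto.commit` is set to false because, with the direct stream, offsets are normally tracked by Spark (or committed explicitly) rather than auto-committed by the consumer.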