subject:"Maelstrom\: Kafka integration with Spark"

Re: Maelstrom: Kafka integration with Spark

2016-08-24 Thread Jeoffrey Lim

To clarify my earlier statement: I will continue working on Maelstrom as an alternative to the official Spark integration with Kafka and keep the KafkaRDDs + Consumers as they are until I find the official Spark Kafka integration more stable and resilient to Kafka broker issues/failures (reason I have infinite retr

Re: Maelstrom: Kafka integration with Spark

2016-08-24 Thread Jeoffrey Lim

Hi Cody, thank you for pointing out the sub-millisecond processing claim; it is an "exaggerated" term :D I simply got excited releasing this project. It should be: "millisecond stream processing at the Spark level". I highly appreciate the info about the latest Kafka consumer. Would need to get up to speed about

Re: Maelstrom: Kafka integration with Spark

2016-08-24 Thread Cody Koeninger

Yes, spark-streaming-kafka-0-10 uses the new consumer. Besides pre-fetching messages, the big reason for that is that security features are only available with the new consumer. The Kafka project is at release 0.10.0.1 now; they think most of the issues with the new consumer have been ironed out

Re: Maelstrom: Kafka integration with Spark

2016-08-23 Thread Jeoffrey Lim

Apologies, I was not aware that Spark 2.0 has Kafka Consumer caching/pooling now. What I have checked is the latest Kafka Consumer, and I believe it is still of beta quality. https://kafka.apache.org/documentation.html#newconsumerconfigs > Since 0.9.0.0 we have been working on a replacement for o

Re: Maelstrom: Kafka integration with Spark

2016-08-23 Thread Cody Koeninger

Were you aware that the Spark 2.0 / Kafka 0.10 integration also reuses Kafka consumer instances on the executors? On Tue, Aug 23, 2016 at 3:19 PM, Jeoffrey Lim wrote: > Hi, > > I have released the first version of a new Kafka integration with Spark > that we use in the company I work for: open so

Maelstrom: Kafka integration with Spark

2016-08-23 Thread Jeoffrey Lim

Hi, I have released the first version of a new Kafka integration with Spark that we use in the company I work for: open sourced and named Maelstrom. It is unique compared to other solutions out there as it reuses the Kafka Consumer connection to achieve sub-millisecond latency. This library has


6 matches
