To clarify my earlier statement: I will continue working on Maelstrom
as an alternative to the official Spark-Kafka integration and keep
the KafkaRDDs + Consumers as they are, until I find the official
Spark-Kafka integration more stable and resilient to Kafka broker
issues/failures (the reason I have infinite retries).
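(For illustration only, here is a minimal sketch of what an "infinite retry"
wrapper around a Kafka call could look like; the names, backoff values and
logging are my own assumptions, not Maelstrom's actual code.)

    object RetryForever {
      // Keep retrying the given Kafka operation until it succeeds, backing off
      // between attempts so transient broker failures do not kill the job.
      def retryForever[T](op: => T,
                          initialDelayMs: Long = 1000L,
                          maxDelayMs: Long = 30000L): T = {
        var delay = initialDelayMs
        while (true) {
          try {
            return op
          } catch {
            case e: Exception =>
              System.err.println(s"Kafka call failed (${e.getMessage}); retrying in ${delay} ms")
              Thread.sleep(delay)
              delay = math.min(delay * 2, maxDelayMs)  // bounded exponential backoff
          }
        }
        throw new IllegalStateException("unreachable")
      }
    }

Usage would be something like RetryForever.retryForever(consumer.poll(500)).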
Hi Cody, thank you for pointing out "sub-millisecond processing"; it is
an exaggerated term :D I simply got excited releasing this project. It
should be: "millisecond stream processing at the Spark level".
I highly appreciate the info about the latest Kafka consumer. I would
need to get up to speed on it.
Yes, spark-streaming-kafka-0-10 uses the new consumer. Besides
pre-fetching messages, the big reason for that is that security
features are only available with the new consumer.
The Kafka project is at release 0.10.0.1 now; they think most of the
issues with the new consumer have been ironed out.
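For readers following along, a minimal sketch of the spark-streaming-kafka-0-10
direct stream API with the new consumer; the broker address, topic, group id
and SASL_SSL entries are placeholders, and the security settings are only
shown because they are the kind of option the old consumer cannot use.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-0-10-demo"), Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9093",            // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "demo-group",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean),
      // Security options like these are only honoured by the new (0.9+) consumer:
      "security.protocol"       -> "SASL_SSL",
      "ssl.truststore.location" -> "/path/to/truststore.jks"  // placeholder path
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("demo-topic"), kafkaParams))

    stream.map(r => (r.key, r.value)).print()
    ssc.start()
    ssc.awaitTermination()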
Apologies, I was not aware that Spark 2.0 has Kafka Consumer
caching/pooling now.
What I had checked was the latest Kafka consumer, and I believe it is
still considered beta quality.
https://kafka.apache.org/documentation.html#newconsumerconfigs
> Since 0.9.0.0 we have been working on a replacement for our existing
> simple and high-level consumers. The code is considered beta quality.
Were you aware that the Spark 2.0 / Kafka 0.10 integration also reuses
Kafka consumer instances on the executors?
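For reference, a sketch of how that cache is tuned: as I recall from the
Spark 2.0 docs, consumers are cached per executor (keyed by group id and
topic-partition) with a default maximum of 64; the property name below is
taken from those docs and the value is just an example.

    val conf = new org.apache.spark.SparkConf()
      .setAppName("consumer-cache-tuning")
      // Raise the per-executor consumer cache if executors read many partitions.
      .set("spark.streaming.kafka.consumer.cache.maxCapacity", "128")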
On Tue, Aug 23, 2016 at 3:19 PM, Jeoffrey Lim wrote:
> Hi,
>
> I have released the first version of a new Kafka integration with Spark
> that we use in the company I work for: open sourced and named Maelstrom.
Hi,
I have released the first version of a new Kafka integration with Spark
that we use in the company I work for: open sourced and named Maelstrom.
It is unique compared to other solutions out there, as it reuses the
Kafka Consumer connection to achieve sub-millisecond latency.
This library has
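(To make the "reuses the Kafka Consumer connection" idea concrete, here is a
hypothetical sketch of per-partition consumer reuse; it is not Maelstrom's
actual implementation, just an illustration of keeping long-lived consumers
instead of reconnecting on every fetch.)

    import java.util.{Collections, Properties}
    import scala.collection.concurrent.TrieMap
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    object ConsumerPool {
      // One long-lived consumer per topic-partition; reusing it avoids the
      // connect/metadata overhead of building a new consumer for every fetch.
      private val pool = TrieMap.empty[TopicPartition, KafkaConsumer[Array[Byte], Array[Byte]]]

      // props must contain bootstrap.servers and byte-array deserializers.
      // Note: KafkaConsumer is not thread-safe, so each cached instance must
      // be used from one thread at a time.
      def consumerFor(tp: TopicPartition, props: Properties): KafkaConsumer[Array[Byte], Array[Byte]] =
        pool.getOrElseUpdate(tp, {
          val c = new KafkaConsumer[Array[Byte], Array[Byte]](props)
          c.assign(Collections.singletonList(tp))  // manual partition assignment
          c
        })
    }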