In the receiver-based Kafka streaming model, given that each receiver
starts as a long-running task, one can rely on a certain degree of data
locality based on the Kafka partitioning: data published on a given
topic/partition will land on the same Spark Streaming receiver node until
the receiver dies and needs to be restarted somewhere else.
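
For reference, this is roughly the receiver-based setup I have in mind
(a minimal sketch against the 0.8 connector; the ZooKeeper host, consumer
group, and topic name are placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val ssc = new StreamingContext(
    new SparkConf().setAppName("receiver-demo"), Seconds(5))

  // A long-running receiver task consumes the topic's partitions and
  // stores blocks on the executor it runs on, so data for those
  // partitions stays local to that node until the receiver is restarted.
  val receiverStream = KafkaUtils.createStream(
    ssc,
    "zk-host:2181",        // placeholder ZooKeeper quorum
    "my-consumer-group",   // placeholder consumer group id
    Map("my-topic" -> 1))  // topic -> number of receiver threads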

As I understand it, the direct Kafka streaming model just computes offsets
and relays the work to a KafkaRDD. How does the execution locality compare
to the receiver-based approach?
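
That is, something like this (again a sketch; the broker list and topic
are placeholders), where no receiver is involved:

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val ssc = new StreamingContext(
    new SparkConf().setAppName("direct-demo"), Seconds(5))

  // Each batch, the driver computes an offset range per Kafka
  // topic/partition; the resulting KafkaRDD partitions then read those
  // ranges directly from the brokers on whichever executors run them.
  val directStream = KafkaUtils
    .createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "broker-1:9092"),  // placeholder brokers
      Set("my-topic"))                                 // placeholder topic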

thanks, Gerard.
