In the receiver-based Kafka streaming model, given that each receiver starts as a long-running task, one can rely on a certain degree of data locality derived from the Kafka partitioning: data published to a given topic/partition will land on the same Spark Streaming receiver node until the receiver dies and needs to be restarted somewhere else.
As I understand it, the direct Kafka streaming model just computes the offset ranges on the driver and relays the actual reading to a KafkaRDD. How does the execution locality compare to the receiver-based approach?
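For reference, here is a minimal sketch of the two setups I am comparing (assuming the Spark 1.3-era spark-streaming-kafka API with Kafka 0.8; broker/ZooKeeper addresses and the "events" topic are just placeholders):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaLocalityQuestion {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-locality-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based: a long-running receiver task consumes the topic, so
    // records for a given topic/partition land on the node hosting that receiver.
    val receiverStream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-group", Map("events" -> 1))

    // Direct: the driver computes offset ranges per batch and each KafkaRDD
    // partition reads its range straight from the Kafka brokers.
    val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "broker1:9092,broker2:9092"),
      Set("events"))

    receiverStream.count().print()
    directStream.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Thanks, Gerard.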