Hello.

I am using Spark 1.5.2 and the Kafka direct stream API (KafkaUtils.createDirectStream)
to load data from Kafka. We process around 200K messages/second without issue on a
cluster where the Kafka and Spark nodes are colocated (on the same switch). However,
when the Kafka brokers are further away on the network (even just a couple of router
hops), throughput drops significantly, causing job delays.

Is this typical? Have others encountered similar issues? Is there any Kafka
configuration that might mitigate this?
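One back-of-the-envelope check that may be relevant (an assumption, not a diagnosis): a single TCP connection cannot sustain more than its receive window divided by the round-trip time, so a fixed-size window that is ample on one switch can become the bottleneck after a few hops. The sketch below assumes a 64 KiB window (believed to be the default of the Kafka 0.8 consumer setting socket.receive.buffer.bytes) and uses hypothetical RTTs; it is not a measurement of this cluster. The function names are illustrative only.

```python
# Bandwidth-delay product (BDP) sketch.
# Assumption: one TCP connection's throughput is bounded by
# receive_window / RTT, so per-connection throughput falls as RTT grows
# unless the window grows with it.


def max_throughput_bytes_per_sec(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput for a fixed window."""
    return window_bytes / rtt_seconds


def required_window_bytes(target_bytes_per_sec: float, rtt_seconds: float) -> float:
    """Window (the BDP) needed to sustain a target throughput at a given RTT."""
    return target_bytes_per_sec * rtt_seconds


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KiB socket receive buffer

    # Hypothetical RTTs: same switch vs. a couple of router hops away.
    for rtt_ms in (0.1, 0.5, 2.0):
        cap = max_throughput_bytes_per_sec(window, rtt_ms / 1000)
        print(f"RTT {rtt_ms} ms: <= {cap / 1e6:.1f} MB/s per connection")

    # Hypothetical load: 200K msgs/s at an assumed 500 bytes/message.
    target = 200_000 * 500
    needed = required_window_bytes(target, 2.0 / 1000)
    print(f"Window needed at 2 ms RTT on one connection: {needed / 1024:.0f} KiB")
```

If this is the cause, the cap applies per connection; as far as I understand, the direct stream fetches each Kafka partition over its own consumer connection, so more partitions or a larger socket.receive.buffer.bytes in the kafkaParams passed to the stream would be the natural knobs to test. Both points are assumptions worth verifying against the Spark and Kafka documentation.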

Regards,

Bryan Jeffrey
