Hello. I am using Spark 1.5.2 and the Kafka direct stream API to load data from Kafka. We process around 200K messages/second without issue in a cluster where the Kafka and Spark nodes are co-located (same switch). However, when the Kafka broker is further away (even a couple of router hops), throughput decreases significantly, causing job delays.
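
One possible explanation, offered only as a rough sketch: if each consumer connection can have at most one receive buffer of data in flight per round trip, then throughput per connection is bounded by buffer size divided by round-trip time. The buffer size and round-trip times below are assumed for illustration, not measured on our cluster.

```python
# Back-of-the-envelope bound on per-connection throughput when a
# fixed-size receive buffer has to cover a full network round trip.

def max_throughput_bytes_per_sec(buffer_bytes: int, rtt_seconds: float) -> float:
    """At most one buffer's worth of data can be in flight per round trip."""
    return buffer_bytes / rtt_seconds

# Assumed buffer size (hypothetical value, not taken from our configuration).
BUFFER_BYTES = 64 * 1024

# Hypothetical round-trip times: same switch vs. progressively more distant brokers.
SCENARIOS = [("same switch", 0.0002), ("few router hops", 0.002), ("remote site", 0.02)]

for label, rtt in SCENARIOS:
    mb_per_sec = max_throughput_bytes_per_sec(BUFFER_BYTES, rtt) / 1e6
    print(f"{label}: RTT {rtt * 1000:.1f} ms -> at most {mb_per_sec:.1f} MB/s per connection")
```

If this is what is happening, a larger consumer socket buffer (for example `socket.receive.buffer.bytes` in the Kafka parameters passed to the direct stream) might help, but I have not verified that.
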
Is this typical? Have others encountered similar issues? Is there any Kafka configuration that might mitigate this issue?

Regards,
Bryan Jeffrey