Hello Spark users,

We are setting up our first batch of Spark Streaming pipelines, and I am running into an issue which I am not sure how to resolve, but which seems like it should be fairly trivial.
I am using the receiver-mode Kafka consumer that comes with Spark, running in standalone mode. I've set up two receivers, which are consuming from a 4-broker, 4-partition Kafka topic (a rough sketch of the setup is at the end of this message). If you look at the image below, you will see that *even though I have two receivers, only one of them ever consumes data at a time*. I believe this to be my current bottleneck for scaling. What am I missing?

[image: Inline image 1]

To me, the order in which events are consumed is not important; I just want to optimize for maximum throughput.

Thanks in advance for any help or tips!

Jorge
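P.S. For reference, here is a minimal sketch of how I am creating the receivers. It assumes the Spark 1.x receiver-based KafkaUtils.createStream API; the ZooKeeper quorum, topic name, consumer group, and batch interval are placeholders rather than my actual values.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-receiver-sketch")
    val ssc = new StreamingContext(conf, Seconds(5)) // placeholder batch interval

    // Two receivers consuming the same topic. "zk1:2181", "my-consumer-group",
    // and "my-topic" are placeholders. The Int in the topic map is the number
    // of consumer threads inside each receiver, not a partition count.
    val streams = (1 to 2).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "my-consumer-group", Map("my-topic" -> 2))
    }

    // Union the per-receiver DStreams so downstream stages see a single stream.
    val unioned = ssc.union(streams)
    unioned.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The union is only there so that downstream processing sees one DStream while ingestion is spread across the two receivers; the receivers themselves are what I expect to consume in parallel.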