Hello Spark users,

We are setting up our first batch of Spark Streaming pipelines, and I am
running into an issue that I am not sure how to resolve, but which seems
like it should be fairly trivial.

I am using the receiver-mode Kafka consumer that comes with Spark, running
in standalone mode.  I've set up two receivers, which are consuming from a
4-broker, 4-partition Kafka topic.
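For reference, the setup looks roughly like this (a minimal sketch; the
ZooKeeper quorum, consumer group, topic name, and batch interval are
placeholders, not our real values):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val sparkConf = new SparkConf().setAppName("kafka-receiver-test")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Two receiver streams, one consumer thread each.
    // "zk-host:2181", "my-group", and "my-topic" are placeholders.
    val streams = (1 to 2).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "my-group",
        Map("my-topic" -> 1))
    }

    // Union the receiver streams so downstream processing sees one DStream.
    val unioned = ssc.union(streams)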

If you look at the image below, you will see that even though I have two
receivers, *only one of them ever consumes data at a time*.  I believe
this to be my current bottleneck for scaling.

What am I missing?

The order in which events are consumed is not important to me; I just want
to optimize for maximum throughput.
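Since ordering doesn't matter for us, one thing I've been considering is
repartitioning the unioned stream before the heavy processing, so the work
spreads across more tasks (continuing the sketch above; the partition
count of 8 is an arbitrary guess, not a tuned value):

    // Shuffle records into more partitions so the expensive stage runs
    // in parallel across executor cores; record order is lost, which is
    // fine in our case.
    val widened = unioned.repartition(8)

    widened.foreachRDD { rdd =>
      // Downstream processing goes here; count() is just a stand-in.
      println(s"batch size: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()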


[image: Inline image 1]

Thanks in advance for any help or tips!

Jorge
