Maybe this is a pretty esoteric implementation, but I'm seeing some bad
behavior with backpressure plus multiple Kafka streams / direct streams.

Here's the scenario:
We consume one Kafka topic using the reliable receiver (4 receivers, with the
results unioned). In the same app, we consume another Kafka topic using a
direct stream.

This may seem strange, but it's necessary in my application to work around
another problem: the max rate is set globally in SparkConf. IMO it would be
more flexible if we could set the max rate for each stream independently.
Since the direct stream uses a different config parameter for its max rate
than the receiver-based stream, we get the desired result.
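
For readers unfamiliar with the two settings, here is a minimal sketch of the
workaround in spark-defaults.conf form. The property names are the standard
Spark Streaming ones; the values are illustrative placeholders, not the ones
used in the application described here.

```properties
# Applies to every receiver-based stream: records/sec per receiver (placeholder value)
spark.streaming.receiver.maxRate            1000
# Applies only to Kafka direct streams: records/sec per Kafka partition (placeholder value)
spark.streaming.kafka.maxRatePerPartition   500
```

Because the two stream types read different properties, each effectively gets
its own limit even though both are set application-wide.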

A bit hacky I know.

Anyway, we recently turned on backpressure. It works as expected for the
receiver-based stream. The direct stream starts out at the max rate (as
expected) on the first batch, then ratchets down its consumption until it is
eventually consuming 1 record / second / partition.
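
As a point of reference, backpressure is switched on with a single property,
sketched below in the same spark-defaults.conf form. This assumes the standard
setting was used; the post does not say how it was enabled.

```properties
# Let Spark Streaming adjust ingestion rates dynamically based on batch processing
spark.streaming.backpressure.enabled   true
```

Per the Spark configuration docs, the dynamically chosen rate is upper-bounded
by spark.streaming.receiver.maxRate and
spark.streaming.kafka.maxRatePerPartition when those are set, which matches the
first batch running at the max rate.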

This happens even though there's no scheduling delay, and the
receiver-based stream does not appear to be throttled.

Anyone ever see anything like this?

Thanks!

Jeff Nadler
Aerohive Networks
