Github user akonopko commented on a diff in the pull request:
https://github.com/apache/spark/pull/19431#discussion_r166606906
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -126,7 +129,10 @@ private[spark] class DirectKafkaInputDStream[K, V](
protected[streaming] def maxMessagesPerPartition(
offsets: Map[TopicPartition, Long]): Option[Map[TopicPartition, Long]]
= {
- val estimatedRateLimit = rateController.map(_.getLatestRate())
+ val estimatedRateLimit = rateController.map(x => {
+ val lr = x.getLatestRate()
+ if (lr > 0) lr else initialRate
--- End diff --
If the cluster were so heavily loaded by other processes that Spark Streaming
processed 0 events, the rate estimator would report a rate of 0 and a large
backlog could accumulate. Without this fix, the system would then have a high
chance of being overwhelmed by that backlog.
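The fallback in the patch can be sketched in isolation like this (a minimal,
standalone illustration, not Spark's actual `RateController` API; the names
`effectiveRate`, `latestRate`, and `initialRate` here are hypothetical
stand-ins for `rateController.getLatestRate()` and the configured initial
rate):

```scala
// Minimal sketch of the fallback logic from the diff above: if the latest
// estimated rate is non-positive (e.g. the stream processed 0 events while
// the cluster was saturated), fall back to the configured initial rate
// instead of letting the limit collapse to 0.
object RateFallback {
  def effectiveRate(latestRate: Long, initialRate: Long): Long =
    if (latestRate > 0) latestRate else initialRate
}
```

With this guard, a transient zero estimate no longer pins the per-partition
message limit at 0, so the stream can keep draining the backlog at the
initial rate once capacity returns.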
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]