Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/17774
  
    @arzt It's entirely possible to have batch times less than a second, and 
I'm not sure I agree that the absolute number of messages allowable for a 
partition should ever be zero.
    
    So to put this another way, right now effectiveRateLimitPerPartition is a 
Map[TopicPartition, Long], which matches the return value of the function 
maxMessagesPerPartition.
    
    You want to change effectiveRateLimitPerPartition to a 
Map[TopicPartition, Double], which is probably a good idea and should fix the 
bug where a very small rate limit gets treated as no limit at all.
    
    But it still needs to be converted to Map[TopicPartition, Long] before 
returning.  Calling .toLong is probably not the right thing to do there, 
because 0.99 will get truncated to 0.  
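    
    For example (hypothetical values, just to illustrate truncation versus rounding up):
    
        val limit: Double = 0.99
        val truncated = limit.toLong            // 0 -- the partition would fetch nothing
        val roundedUp = math.ceil(limit).toLong // 1 -- the partition can still make progress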
    
    I think one message per partition per batch is the minimum reasonable rate 
limit; otherwise particular partitions may not make progress within a batch.  The 
relative lag calculation might take care of that in future batches, but it still 
seems questionable, even if it's a corner case.
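    
    To sketch what I mean for the conversion back to Map[TopicPartition, Long] 
(the method name and rounding choice here are illustrative, not a prescription):
    
        import org.apache.kafka.common.TopicPartition
    
        // Round each per-partition limit up and clamp it to at least one
        // message, so no partition is starved within a batch.
        def toLongLimits(
            limits: Map[TopicPartition, Double]): Map[TopicPartition, Long] =
          limits.map { case (tp, limit) =>
            tp -> math.max(1L, math.ceil(limit).toLong)
          }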

