[ 
https://issues.apache.org/jira/browse/SPARK-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-6691:
-------------------------------
    Issue Type: Improvement  (was: New Feature)

> Abstract and add a dynamic RateLimiter for Spark Streaming
> ----------------------------------------------------------
>
>                 Key: SPARK-6691
>                 URL: https://issues.apache.org/jira/browse/SPARK-6691
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Saisai Shao
>
> Flow control (or rate control) for input data is very important in a
> streaming system, and especially for Spark Streaming, which must stay stable
> and up to date. An unexpected flood of incoming data, or an ingestion rate
> that exceeds the computation power of the cluster, makes the system unstable
> and increases the delay. Because of Spark Streaming’s job generation and
> processing pattern, this delay accumulates and introduces unacceptable
> exceptions.
> ----
> Currently, Spark Streaming’s receiver-based input stream has a RateLimiter
> in BlockGenerator that controls the ingestion rate of input data, but the
> current implementation has several limitations:
> # The max ingestion rate is set by the user through configuration beforehand,
> but the user may lack the experience to choose an appropriate value before
> the application is running.
> # This configuration is fixed for the lifetime of the application, which
> means you need to plan for the worst-case scenario when choosing a
> value.
> # Input streams such as DirectKafkaInputStream need to maintain a separate
> solution to achieve the same functionality.
> # The lack of slow-start control makes it easy for the whole system to become
> trapped in large processing and scheduling delays at the very beginning.
> ----
> So here we propose a new dynamic RateLimiter, as well as a new interface for
> the RateLimiter, to improve the whole system's stability. The targets are:
> * Dynamically adjust the ingestion rate according to the processing rate of
> previously finished jobs.
> * Offer a uniform solution not only for receiver-based input streams, but
> also for direct streams like DirectKafkaInputStream and new ones.
> * Use a slow-start rate to control network congestion when a job is started.
> * Provide a pluggable framework to make extensions easier to maintain.
> ----
> Here is the design doc 
> (https://docs.google.com/document/d/1lqJDkOYDh_9hRLQRwqvBXcbLScWPmMa7MlG8J_TE93w/edit?usp=sharing)
>  and working branch 
> (https://github.com/jerryshao/apache-spark/tree/dynamic-rate-limiter).
> Any comments would be greatly appreciated.
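
To make the targets in the quoted description more concrete, here is a minimal, Spark-independent sketch in Python of a pluggable rate limiter whose limit follows the processing rate of finished jobs and ramps up gradually at startup. It is only an illustration under stated assumptions: the class names, method names, and the `growth`, `headroom`, and `min_rate` parameters are hypothetical and are not taken from the design doc, the working branch, or any Spark API.

```python
from abc import ABC, abstractmethod


class RateLimiter(ABC):
    """Hypothetical pluggable interface shared by any kind of input stream."""

    @abstractmethod
    def current_rate(self) -> float:
        """Return the current maximum ingestion rate in records per second."""

    @abstractmethod
    def on_batch_completed(self, num_records: int, processing_seconds: float) -> None:
        """Feed back the statistics of a finished job."""


class FixedRateLimiter(RateLimiter):
    """Static limit chosen up front; feedback from finished jobs is ignored."""

    def __init__(self, max_rate: float) -> None:
        self._rate = max_rate

    def current_rate(self) -> float:
        return self._rate

    def on_batch_completed(self, num_records: int, processing_seconds: float) -> None:
        pass  # the limit never changes during the application's lifetime


class DynamicRateLimiter(RateLimiter):
    """Illustrative dynamic limit (assumed policy, not the proposed algorithm).

    The limit tracks a fraction (headroom) of the observed processing rate,
    but may grow by at most a factor of `growth` per finished batch, which
    acts as a simple slow start.
    """

    def __init__(self, initial_rate: float, max_rate: float,
                 growth: float = 2.0, headroom: float = 0.9,
                 min_rate: float = 1.0) -> None:
        self.max_rate = max_rate
        self.growth = growth
        self.headroom = headroom
        self.min_rate = min_rate
        self._rate = max(min_rate, min(initial_rate, max_rate))

    def current_rate(self) -> float:
        return self._rate

    def on_batch_completed(self, num_records: int, processing_seconds: float) -> None:
        if num_records <= 0 or processing_seconds <= 0:
            return  # an empty batch says nothing about processing capacity
        # Rate the cluster demonstrably sustained, with a safety margin.
        sustainable = self.headroom * num_records / processing_seconds
        # Slow start: never grow faster than `growth` times the current limit.
        ramp_limit = self._rate * self.growth
        self._rate = max(self.min_rate,
                         min(sustainable, ramp_limit, self.max_rate))
```

With this policy the limit rises step by step while jobs finish quickly, and drops in a single step to the sustainable rate as soon as a job shows that processing has fallen behind. A real implementation would also need to smooth noisy measurements and account for scheduling delay, which this sketch leaves out.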



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
