[jira] [Assigned] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-18838: --- Assignee: Marcelo Vanzin > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Marcelo Vanzin > Fix For: 2.3.0 > > Attachments: perfResults.pdf, SparkListernerComputeTime.xlsx > > > Currently we are observing the issue of very high event processing delay in > driver's `ListenerBus` for large jobs with many tasks. Many critical > component of the scheduler like `ExecutorAllocationManager`, > `HeartbeatReceiver` depend on the `ListenerBus` events and this delay might > hurt the job performance significantly or even fail the job. For example, a > significant delay in receiving the `SparkListenerTaskStart` might cause > `ExecutorAllocationManager` manager to mistakenly remove an executor which is > not idle. > The problem is that the event processor in `ListenerBus` is a single thread > which loops through all the Listeners for each event and processes each event > synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > This single threaded processor often becomes the bottleneck for large jobs. > Also, if one of the Listener is very slow, all the listeners will pay the > price of delay incurred by the slow listener. In addition to that a slow > listener can cause events to be dropped from the event queue which might be > fatal to the job. > To solve the above problems, we propose to get rid of the event queue and the > single threaded event processor. Instead each listener will have its own > dedicate single threaded executor service . When ever an event is posted, it > will be submitted to executor service of all the listeners. The Single > threaded executor service will guarantee in order processing of the events > per listener. The queue used for the executor service will be bounded to > guarantee we do not grow the memory indefinitely. The downside of this > approach is separate event queue per listener will increase the driver memory > footprint. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18838: Assignee: (was: Apache Spark) > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia > > Currently we are observing the issue of very high event processing delay in > driver's `ListenerBus` for large jobs with many tasks. Many critical > component of the scheduler like `ExecutorAllocationManager`, > `HeartbeatReceiver` depend on the `ListenerBus` events and these delay is > causing job failure. For example, a significant delay in receiving the > `SparkListenerTaskStart` might cause `ExecutorAllocationManager` manager to > remove an executor which is not idle. The event processor in `ListenerBus` > is a single thread which loops through all the Listeners for each event and > processes each event synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > > The single threaded processor often becomes the bottleneck for large jobs. > In addition to that, if one of the Listener is very slow, all the listeners > will pay the price of delay incurred by the slow listener. > To solve the above problems, we plan to have a per listener single threaded > executor service and separate event queue. That way we are not bottlenecked > by the single threaded event processor and also critical listeners will not > be penalized by the slow listeners. The downside of this approach is separate > event queue per listener will increase the driver memory footprint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18838: Assignee: Apache Spark > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Apache Spark > > Currently we are observing the issue of very high event processing delay in > driver's `ListenerBus` for large jobs with many tasks. Many critical > component of the scheduler like `ExecutorAllocationManager`, > `HeartbeatReceiver` depend on the `ListenerBus` events and these delay is > causing job failure. For example, a significant delay in receiving the > `SparkListenerTaskStart` might cause `ExecutorAllocationManager` manager to > remove an executor which is not idle. The event processor in `ListenerBus` > is a single thread which loops through all the Listeners for each event and > processes each event synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > > The single threaded processor often becomes the bottleneck for large jobs. > In addition to that, if one of the Listener is very slow, all the listeners > will pay the price of delay incurred by the slow listener. > To solve the above problems, we plan to have a per listener single threaded > executor service and separate event queue. That way we are not bottlenecked > by the single threaded event processor and also critical listeners will not > be penalized by the slow listeners. The downside of this approach is separate > event queue per listener will increase the driver memory footprint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org