[jira] [Commented] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances

2019-03-18 Thread Xiaoju Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794932#comment-16794932
 ] 

Xiaoju Wu commented on SPARK-25837:
---

Did you verify this fix with the reproduction case above? I tried it and found 
that the issue is still there: the cleanup still backs up, although less than 
in the version without this fix.

> Web UI does not respect spark.ui.retainedJobs in some instances
> ---
>
> Key: SPARK-25837
> URL: https://issues.apache.org/jira/browse/SPARK-25837
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 2.3.1
> Environment: Reproduction Environment:
> Spark 2.3.1
> Dataproc 1.3-deb9
> 1x master 4 vCPUs, 15 GB
> 2x workers 4 vCPUs, 15 GB
>  
> Reporter: Patrick Brown
> Assignee: Patrick Brown
> Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
> Attachments: Screen Shot 2018-10-23 at 4.40.51 PM (1).png
>
>
> Expected Behavior: Web UI only displays 1 completed job and remains 
> responsive.
> Actual Behavior: Both during job execution and for a considerable time after 
> all jobs complete, the UI retains many completed jobs, which limits its 
> responsiveness.
>  
> To reproduce:
>  
>  > spark-shell --conf spark.ui.retainedJobs=1
>   
>  scala> import scala.concurrent._
>  scala> import scala.concurrent.ExecutionContext.Implicits.global
>  scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until i).collect.length) } }
>   
>  
>  
> The attached screenshot shows the state of the web UI after running the repro 
> code. The UI is displaying some 43k completed jobs and takes a long time to 
> load. After a few minutes of inactivity this clears out; however, in an 
> application that continues to submit jobs every once in a while, the issue 
> persists.
>  
> The issue seems to appear when running multiple jobs at once, as well as in 
> sequence for a while, and may also have something to do with high master 
> CPU usage (thus the collect in the repro code). My rough guess is that 
> whatever manages clearing out completed jobs gets overwhelmed (during the 
> repro, htop on the master reported almost full CPU usage across all 4 cores).
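
The reporter's guess that the cleanup "gets overwhelmed" can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark code: the object `CleanupBacklogModel`, the `simulate` function, and all of its parameters are hypothetical. It assumes that a fixed number of jobs complete per tick, that the cleaner has a fixed work budget per tick, and that removing one job costs more as the store grows.

```scala
object CleanupBacklogModel {
  // Hypothetical toy model (not Spark's implementation).
  // ticks        : number of simulated time steps
  // arrivals     : completed jobs added to the store per tick
  // budget       : units of cleanup work available per tick
  // retained     : target number of completed jobs to keep
  // costPerEntry : extra cost of one removal per stored job (assumed linear)
  // Returns the number of stored jobs at the end of each tick.
  def simulate(
      ticks: Int,
      arrivals: Int,
      budget: Int,
      retained: Int,
      costPerEntry: Double): Vector[Int] = {
    var stored = 0
    val history = Vector.newBuilder[Int]
    for (_ <- 0 until ticks) {
      stored += arrivals
      var remaining = budget.toDouble
      var canWork = true
      // Remove jobs until the target is met or the budget runs out.
      while (canWork && stored > retained) {
        val cost = 1.0 + stored * costPerEntry
        if (cost <= remaining) {
          remaining -= cost
          stored -= 1
        } else {
          canWork = false
        }
      }
      history += stored
    }
    history.result()
  }
}
```

With `costPerEntry = 0.0` and a budget larger than the arrival rate, the store stays at the retained limit. With a positive `costPerEntry` and the same budget, each removal gets more expensive as the store grows, so the store can keep growing once it falls behind.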



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances

2018-10-29 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667707#comment-16667707
 ] 

Apache Spark commented on SPARK-25837:
--

User 'patrickbrownsync' has created a pull request for this issue:
https://github.com/apache/spark/pull/22883




[jira] [Commented] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances

2018-10-29 Thread Patrick Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667646#comment-16667646
 ] 

Patrick Brown commented on SPARK-25837:
---

The fundamental problem seems to be in the `cleanupStages` method of 
`AppStatusListener`.

 

Using the repro code above it appears that sometimes (not always) stages and 
tasks get slightly backed up. When this occurs the iteration through tasks 
starts taking longer and longer:

 

```
val tasks = kvstore.view(classOf[TaskDataWrapper])
  .index("stage")
  .first(key)
  .last(key)
  .asScala
```

 

This seems to be because, for each stage, we then iterate through all the 
tasks (of which there can be ~400k in this repro code). That iteration can go 
from taking ~10ms before the backup to ~300ms afterwards because of the large 
number of tasks. This causes a feedback loop in which the `cleanupStages` 
method cannot keep up.
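
To make the cost difference concrete, here is a minimal sketch comparing a full scan over all stored tasks with a range lookup on a sorted key. It is a toy model only: `StageTaskIndex`, `Task`, and the helper functions are hypothetical names, and the sketch uses a plain Scala `TreeMap` rather than Spark's `KVStore`, so it says nothing about how Spark's store is implemented or how the fix works.

```scala
import scala.collection.immutable.TreeMap

object StageTaskIndex {
  // Hypothetical stand-in for a stored task record; Spark's TaskDataWrapper
  // carries many more fields.
  final case class Task(stageId: Int, taskId: Long)

  // Tasks keyed by (stageId, taskId). The default tuple Ordering sorts by
  // stage first, so all tasks of one stage are adjacent in the map.
  def buildIndex(tasks: Seq[Task]): TreeMap[(Int, Long), Task] =
    TreeMap(tasks.map(t => (t.stageId, t.taskId) -> t): _*)

  // Full scan: visits every stored task, so the cost grows with the total
  // number of tasks across all stages.
  def tasksByScan(index: TreeMap[(Int, Long), Task], stageId: Int): Seq[Task] =
    index.valuesIterator.filter(_.stageId == stageId).toSeq

  // Range lookup: only walks keys from (stageId, Long.MinValue) up to, but
  // not including, (stageId + 1, Long.MinValue).
  // Assumes stageId < Int.MaxValue so that stageId + 1 does not overflow.
  def tasksByRange(index: TreeMap[(Int, Long), Task], stageId: Int): Seq[Task] =
    index.range((stageId, Long.MinValue), (stageId + 1, Long.MinValue)).values.toSeq
}
```

Both functions return the same tasks for a stage. The difference is that the scan does work proportional to everything in the store, which is the pattern that becomes slow once hundreds of thousands of tasks have accumulated.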

 




[jira] [Commented] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances

2018-10-25 Thread Patrick Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664051#comment-16664051
 ] 

Patrick Brown commented on SPARK-25837:
---

I would be interested and happy to tackle this, if it's an issue that the 
community agrees should be addressed.
