Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Patrick Brown
Done:

https://issues.apache.org/jira/browse/SPARK-25837

On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin  wrote:

> Ah that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
>  wrote:
> >
> > On my production application I am running ~200 jobs at once, but
> continue to submit jobs in this manner for sometimes ~1 hour.
> >
> > The reproduction code above generally only has 4 ish jobs running at
> once, and as you can see runs through 50k jobs in this manner.
> >
> > I guess I should clarify my above statement, the issue seems to appear
> when running multiple jobs at once as well as in sequence for a while and
> may as well have something to do with high master CPU usage (thus the
> collect in the code). My rough guess would be whatever is managing clearing
> out completed jobs gets overwhelmed (my master was a 4 core machine while
> running this, and htop reported almost full CPU usage across all 4 cores).
> >
> > The attached screenshot shows the state of the webui after running the
> repro code, you can see the ui is displaying some 43k completed jobs (takes
> a long time to load) after a few minutes of inactivity this will clear out,
> however as my production application continues to submit jobs every once in
> a while, the issue persists.
> >
> > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin 
> wrote:
> >>
> >> When you say many jobs at once, what ballpark are you talking about?
> >>
> >> The code in 2.3+ does try to keep data about all running jobs and
> >> stages regardless of the limit. If you're running into issues because
> >> of that we may have to look again at whether that's the right thing to
> >> do.
> >> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
> >>  wrote:
> >> >
> >> > I believe I may be able to reproduce this now, it seems like it may
> be something to do with many jobs at once:
> >> >
> >> > Spark 2.3.1
> >> >
> >> > > spark-shell --conf spark.ui.retainedJobs=1
> >> >
> >> > scala> import scala.concurrent._
> >> > scala> import scala.concurrent.ExecutionContext.Implicits.global
> >> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0
> until i).collect.length) } }
> >> >
> >> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin 
> wrote:
> >> >>
> >> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> >> >> single stage (+ the tasks related to that single stage), same thing
> in
> >> >> memory (checked with jvisualvm).
> >> >>
> >> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >> >> >
> >> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >> >> >  wrote:
> >> >> > > I recently upgraded to spark 2.3.1 I have had these same
> settings in my spark submit script, which worked on 2.0.2, and according to
> the documentation appear to not have changed:
> >> >> > >
> >> >> > > spark.ui.retainedTasks=1
> >> >> > > spark.ui.retainedStages=1
> >> >> > > spark.ui.retainedJobs=1
> >> >> >
> >> >> > I tried that locally on the current master and it seems to be
> working.
> >> >> > I don't have 2.3 easily in front of me right now, but will take a
> look
> >> >> > Monday.
> >> >> >
> >> >> > --
> >> >> > Marcelo
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?

Thanks
On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
 wrote:
>
> On my production application I am running ~200 jobs at once, but continue to 
> submit jobs in this manner for sometimes ~1 hour.
>
> The reproduction code above generally only has 4 ish jobs running at once, 
> and as you can see runs through 50k jobs in this manner.
>
> I guess I should clarify my above statement, the issue seems to appear when 
> running multiple jobs at once as well as in sequence for a while and may as 
> well have something to do with high master CPU usage (thus the collect in the 
> code). My rough guess would be whatever is managing clearing out completed 
> jobs gets overwhelmed (my master was a 4 core machine while running this, and 
> htop reported almost full CPU usage across all 4 cores).
>
> The attached screenshot shows the state of the webui after running the repro 
> code, you can see the ui is displaying some 43k completed jobs (takes a long 
> time to load) after a few minutes of inactivity this will clear out, however 
> as my production application continues to submit jobs every once in a while, 
> the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin  wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to keep data about all running jobs and
>> stages regardless of the limit. If you're running into issues because
>> of that we may have to look again at whether that's the right thing to
>> do.
>> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
>>  wrote:
>> >
>> > I believe I may be able to reproduce this now, it seems like it may be 
>> > something to do with many jobs at once:
>> >
>> > Spark 2.3.1
>> >
>> > > spark-shell --conf spark.ui.retainedJobs=1
>> >
>> > scala> import scala.concurrent._
>> > scala> import scala.concurrent.ExecutionContext.Implicits.global
>> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
>> > i).collect.length) } }
>> >
>> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  
>> > wrote:
>> >>
>> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> >> single stage (+ the tasks related to that single stage), same thing in
>> >> memory (checked with jvisualvm).
>> >>
>> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  
>> >> wrote:
>> >> >
>> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >> >  wrote:
>> >> > > I recently upgraded to spark 2.3.1 I have had these same settings in 
>> >> > > my spark submit script, which worked on 2.0.2, and according to the 
>> >> > > documentation appear to not have changed:
>> >> > >
>> >> > > spark.ui.retainedTasks=1
>> >> > > spark.ui.retainedStages=1
>> >> > > spark.ui.retainedJobs=1
>> >> >
>> >> > I tried that locally on the current master and it seems to be working.
>> >> > I don't have 2.3 easily in front of me right now, but will take a look
>> >> > Monday.
>> >> >
>> >> > --
>> >> > Marcelo
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
When you say many jobs at once, what ballpark are you talking about?

The code in 2.3+ does try to keep data about all running jobs and
stages regardless of the limit. If you're running into issues because
of that we may have to look again at whether that's the right thing to
do.
On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
 wrote:
>
> I believe I may be able to reproduce this now, it seems like it may be 
> something to do with many jobs at once:
>
> Spark 2.3.1
>
> > spark-shell --conf spark.ui.retainedJobs=1
>
> scala> import scala.concurrent._
> scala> import scala.concurrent.ExecutionContext.Implicits.global
> scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
> i).collect.length) } }
>
> On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:
>>
>> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> single stage (+ the tasks related to that single stage), same thing in
>> memory (checked with jvisualvm).
>>
>> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>> >
>> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >  wrote:
>> > > I recently upgraded to spark 2.3.1 I have had these same settings in my 
>> > > spark submit script, which worked on 2.0.2, and according to the 
>> > > documentation appear to not have changed:
>> > >
>> > > spark.ui.retainedTasks=1
>> > > spark.ui.retainedStages=1
>> > > spark.ui.retainedJobs=1
>> >
>> > I tried that locally on the current master and it seems to be working.
>> > I don't have 2.3 easily in front of me right now, but will take a look
>> > Monday.
>> >
>> > --
>> > Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-23 Thread Patrick Brown
I believe I may be able to reproduce this now, it seems like it may be
something to do with many jobs at once:

Spark 2.3.1

> spark-shell --conf spark.ui.retainedJobs=1

scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until
i).collect.length) } }

On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:

> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> single stage (+ the tasks related to that single stage), same thing in
> memory (checked with jvisualvm).
>
> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >
> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >  wrote:
> > > I recently upgraded to spark 2.3.1 I have had these same settings in
> my spark submit script, which worked on 2.0.2, and according to the
> documentation appear to not have changed:
> > >
> > > spark.ui.retainedTasks=1
> > > spark.ui.retainedStages=1
> > > spark.ui.retainedJobs=1
> >
> > I tried that locally on the current master and it seems to be working.
> > I don't have 2.3 easily in front of me right now, but will take a look
> > Monday.
> >
> > --
> > Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-22 Thread Marcelo Vanzin
Just tried on 2.3.2 and worked fine for me. UI had a single job and a
single stage (+ the tasks related to that single stage), same thing in
memory (checked with jvisualvm).

On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>
> On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>  wrote:
> > I recently upgraded to spark 2.3.1 I have had these same settings in my 
> > spark submit script, which worked on 2.0.2, and according to the 
> > documentation appear to not have changed:
> >
> > spark.ui.retainedTasks=1
> > spark.ui.retainedStages=1
> > spark.ui.retainedJobs=1
>
> I tried that locally on the current master and it seems to be working.
> I don't have 2.3 easily in front of me right now, but will take a look
> Monday.
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Marcelo Vanzin
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
 wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark 
> submit script, which worked on 2.0.2, and according to the documentation 
> appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
> spark.ui.retainedJobs=1

I tried that locally on the current master and it seems to be working.
I don't have 2.3 easily in front of me right now, but will take a look
Monday.

-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Shing Hing Man
 I have the same problem when I upgrade my application from Spark 2.2.1 to 
Spark 2.3.2 and run in Yarn client mode.
Also I noticed that in my Spark driver,  org.apache.spark.status.TaskDataWrapper
could take up more than 2G of memory. 

Shing


On Tuesday, 16 October 2018, 17:34:02 GMT+1, Patrick Brown 
 wrote:  
 
 I recently upgraded to spark 2.3.1 I have had these same settings in my spark 
submit script, which worked on 2.0.2, and according to the documentation appear 
to not have changed:
spark.ui.retainedTasks=1spark.ui.retainedStages=1spark.ui.retainedJobs=1
However in 2.3.1 the UI doesn't seem to respect this, it still retains a huge 
number of jobs:



Is this a known issue? Any ideas?  
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

[Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-16 Thread Patrick Brown
I recently upgraded to spark 2.3.1 I have had these same settings in my
spark submit script, which worked on 2.0.2, and according to the
documentation appear to not have changed:

spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1

However in 2.3.1 the UI doesn't seem to respect this, it still retains a
huge number of jobs:

[image: Screen Shot 2018-10-16 at 10.31.50 AM.png]


Is this a known issue? Any ideas?