Re: Spark UI Storage Memory

2020-12-07 Thread Amit Sharma
Any suggestions, please?

Thanks
Amit

On Fri, Dec 4, 2020 at 2:27 PM Amit Sharma  wrote:

> Is there any memory leak in Spark 2.3.3, as mentioned in the Jira below?
> https://issues.apache.org/jira/browse/SPARK-29055.
>
> Please let me know how to solve it.
>
> Thanks
> Amit
>
> On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma  wrote:
>
>> Can someone help me with this, please?
>>
>>
>> Thanks
>> Amit
>>
>> On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:
>>
>>> Hi, I have a Spark Streaming job. When I check the Executors tab,
>>> there is a Storage Memory column. It displays used memory / total memory.
>>> What is "used memory" -- is it memory currently in use, or memory used so
>>> far? How would I know how much memory is unused at a given point in time?
>>>
>>>
>>> Thanks
>>> Amit
>>>
>>
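For context, the Executors tab's "Storage Memory" column reports the memory currently used for cached (storage) blocks against the total memory available for storage, so the "used" figure reflects the present moment rather than a historical high-water mark. A minimal sketch for checking the same numbers from inside a running application (assuming a live SparkContext named sc; getExecutorMemoryStatus is a developer API, and the REST endpoint /api/v1/applications/<app-id>/executors exposes similar fields):

// Per block manager: (max memory available for caching, memory remaining for caching).
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remainingMem)) =>
  val usedMem = maxMem - remainingMem
  println(f"$executor%-30s used=${usedMem / 1e6}%.1f MB, free=${remainingMem / 1e6}%.1f MB, total=${maxMem / 1e6}%.1f MB")
}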



Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Is there any memory leak in Spark 2.3.3, as mentioned in the Jira below?
https://issues.apache.org/jira/browse/SPARK-29055.

Please let me know how to solve it.

Thanks
Amit

On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma  wrote:

> Can someone help me with this, please?
>
>
> Thanks
> Amit
>
> On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:
>
>> Hi, I have a Spark Streaming job. When I check the Executors tab,
>> there is a Storage Memory column. It displays used memory / total memory.
>> What is "used memory" -- is it memory currently in use, or memory used so
>> far? How would I know how much memory is unused at a given point in time?
>>
>>
>> Thanks
>> Amit
>>
>


Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Can someone help me with this, please?


Thanks
Amit

On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:

> Hi, I have a Spark Streaming job. When I check the Executors tab,
> there is a Storage Memory column. It displays used memory / total memory.
> What is "used memory" -- is it memory currently in use, or memory used so
> far? How would I know how much memory is unused at a given point in time?
>
>
> Thanks
> Amit
>


Re: Spark UI

2020-07-20 Thread ArtemisDev
Thanks Xiao for the info.  I was looking for this, too.  This page 
wasn't linked from anywhere on the main doc page (Overview) or any of 
the pull-down menus.  Someone should remind the doc team to update the 
table of contents on the Overview page.


-- ND

On 7/19/20 10:30 PM, Xiao Li wrote:
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc 
for Spark UI.


Xiao

On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu
<ramesh.biexp...@gmail.com> wrote:


Hi,

I'm looking for a tutorial/video/material which explains the
content of the various tabs in the Spark Web UI.
Can someone direct me to the relevant info?

Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




--



Re: Spark UI

2020-07-19 Thread Piyush Acharya
https://www.youtube.com/watch?v=YgQgJceojJY  (Xiao's video )





On Mon, Jul 20, 2020 at 8:03 AM Xiao Li  wrote:

> https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
> for Spark UI.
>
> Xiao
>
> On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu 
> wrote:
>
>> Hi,
>>
>> I'm looking for a tutorial/video/material which explains the content of
>> the various tabs in the Spark Web UI.
>> Can someone direct me to the relevant info?
>>
>> Thanks
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
> --
> 
>


Re: Spark UI

2020-07-19 Thread Xiao Li
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
for Spark UI.

Xiao

On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu 
wrote:

> Hi,
>
> I'm looking for a tutorial/video/material which explains the content of
> the various tabs in the Spark Web UI.
> Can someone direct me to the relevant info?
>
> Thanks
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

-- 



Re: Spark UI History server on Kubernetes

2019-01-23 Thread Li Gao
In addition to what Rao mentioned, if you are using cloud blob storage such
as AWS S3, you can specify your history location to be an S3 location such
as:  `s3://mybucket/path/to/history`
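For illustration, the corresponding configuration would look roughly like this (the bucket and path are the placeholder from the example above; on non-EMR Hadoop builds the s3a:// scheme plus the hadoop-aws jars are typically what gets used instead):

spark.eventLog.enabled           true
spark.eventLog.dir               s3://mybucket/path/to/history
spark.history.fs.logDirectory    s3://mybucket/path/to/history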


On Wed, Jan 23, 2019 at 12:55 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:

> Hi Lakshman,
>
>
>
> We’ve set these two properties to bring up the Spark history server:
>
>
>
> spark.history.fs.logDirectory 
>
> spark.history.ui.port 
>
>
>
> We’re writing the logs to HDFS. In order to write the logs, we’re setting the
> following properties when submitting the Spark job:
>
> spark.eventLog.enabled true
>
> spark.eventLog.dir 
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Wednesday, January 23, 2019 1:55 PM
> *To:* Rao, Abhishek (Nokia - IN/Bangalore) 
> *Subject:* Re: Spark UI History server on Kubernetes
>
>
>
> HI Abhishek,
>
>
>
> Thank you for your response. Could you please let me know the properties
> you configured for bringing up History Server and its UI.
>
>
>
> Also, are you writing the logs to any directory on persistent storage, if
> yes, could you let me know the changes you did in Spark to write logs to
> that directory. Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>
>
> On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi,
>
>
>
> We’ve set up the spark-history service (based on Spark 2.4) on K8S. The UI works
> perfectly fine when running on NodePort. We’re facing some issues when it is
> exposed via ingress.
>
> Please let us know what kind of inputs do you need?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Tuesday, January 22, 2019 6:02 PM
> *To:* user@spark.apache.org
> *Subject:* Spark UI History server on Kubernetes
>
>
>
> Hello,
>
>
>
> We are running Spark 2.4 on a Kubernetes cluster and are able to access the
> Spark UI using "kubectl port-forward".
>
>
>
> However, this Spark UI only covers the currently running Spark application;
> we would like to keep the 'completed' Spark application logs as well.
> Could someone help us set up the 'Spark History server' on Kubernetes? Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>


RE: Spark UI History server on Kubernetes

2019-01-23 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Lakshman,

We’ve set these two properties to bring up the Spark history server:

spark.history.fs.logDirectory 
spark.history.ui.port 

We’re writing the logs to HDFS. In order to write the logs, we’re setting the
following properties when submitting the Spark job:
spark.eventLog.enabled true
spark.eventLog.dir 

Thanks and Regards,
Abhishek
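For illustration, a fuller sketch of the setup described above; the paths, namenode address, and port are placeholders, not values from this thread:

# spark-defaults.conf (or --conf flags on spark-submit)
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode:8020/spark-events
spark.history.fs.logDirectory    hdfs://namenode:8020/spark-events
spark.history.ui.port            18080

# then start the history server, which reads the spark.history.* settings
$SPARK_HOME/sbin/start-history-server.sh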

From: Battini Lakshman 
Sent: Wednesday, January 23, 2019 1:55 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) 
Subject: Re: Spark UI History server on Kubernetes

HI Abhishek,

Thank you for your response. Could you please let me know the properties you 
configured for bringing up History Server and its UI.

Also, are you writing the logs to any directory on persistent storage, if yes, 
could you let me know the changes you did in Spark to write logs to that 
directory. Thanks!

Best Regards,
Lakshman Battini.

On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore)
<abhishek@nokia.com> wrote:
Hi,

We’ve set up the spark-history service (based on Spark 2.4) on K8S. The UI works
perfectly fine when running on NodePort. We’re facing some issues when it is
exposed via ingress.
Please let us know what kind of inputs do you need?

Thanks and Regards,
Abhishek

From: Battini Lakshman <battini.laksh...@gmail.com>
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on a Kubernetes cluster and are able to access the
Spark UI using "kubectl port-forward".

However, this Spark UI only covers the currently running Spark application; we
would like to keep the 'completed' Spark application logs as well. Could
someone help us set up the 'Spark History server' on Kubernetes? Thanks!

Best Regards,
Lakshman Battini.


RE: Spark UI History server on Kubernetes

2019-01-22 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi,

We’ve set up the spark-history service (based on Spark 2.4) on K8S. The UI works
perfectly fine when running on NodePort. We’re facing some issues when it is
exposed via ingress.
Please let us know what kind of inputs do you need?

Thanks and Regards,
Abhishek

From: Battini Lakshman 
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on a Kubernetes cluster and are able to access the
Spark UI using "kubectl port-forward".

However, this Spark UI only covers the currently running Spark application; we
would like to keep the 'completed' Spark application logs as well. Could
someone help us set up the 'Spark History server' on Kubernetes? Thanks!

Best Regards,
Lakshman Battini.


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Patrick Brown
Done:

https://issues.apache.org/jira/browse/SPARK-25837

On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin  wrote:

> Ah that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
>  wrote:
> >
> > On my production application I am running ~200 jobs at once, but
> continue to submit jobs in this manner for sometimes ~1 hour.
> >
> > The reproduction code above generally only has 4 ish jobs running at
> once, and as you can see runs through 50k jobs in this manner.
> >
> > I guess I should clarify my above statement, the issue seems to appear
> when running multiple jobs at once as well as in sequence for a while and
> may as well have something to do with high master CPU usage (thus the
> collect in the code). My rough guess would be whatever is managing clearing
> out completed jobs gets overwhelmed (my master was a 4 core machine while
> running this, and htop reported almost full CPU usage across all 4 cores).
> >
> > The attached screenshot shows the state of the webui after running the
> repro code, you can see the ui is displaying some 43k completed jobs (takes
> a long time to load) after a few minutes of inactivity this will clear out,
> however as my production application continues to submit jobs every once in
> a while, the issue persists.
> >
> > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin 
> wrote:
> >>
> >> When you say many jobs at once, what ballpark are you talking about?
> >>
> >> The code in 2.3+ does try to keep data about all running jobs and
> >> stages regardless of the limit. If you're running into issues because
> >> of that we may have to look again at whether that's the right thing to
> >> do.
> >> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
> >>  wrote:
> >> >
> >> > I believe I may be able to reproduce this now, it seems like it may
> be something to do with many jobs at once:
> >> >
> >> > Spark 2.3.1
> >> >
> >> > > spark-shell --conf spark.ui.retainedJobs=1
> >> >
> >> > scala> import scala.concurrent._
> >> > scala> import scala.concurrent.ExecutionContext.Implicits.global
> >> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0
> until i).collect.length) } }
> >> >
> >> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin 
> wrote:
> >> >>
> >> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> >> >> single stage (+ the tasks related to that single stage), same thing
> in
> >> >> memory (checked with jvisualvm).
> >> >>
> >> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >> >> >
> >> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >> >> >  wrote:
> >> >> > > I recently upgraded to spark 2.3.1 I have had these same
> settings in my spark submit script, which worked on 2.0.2, and according to
> the documentation appear to not have changed:
> >> >> > >
> >> >> > > spark.ui.retainedTasks=1
> >> >> > > spark.ui.retainedStages=1
> >> >> > > spark.ui.retainedJobs=1
> >> >> >
> >> >> > I tried that locally on the current master and it seems to be
> working.
> >> >> > I don't have 2.3 easily in front of me right now, but will take a
> look
> >> >> > Monday.
> >> >> >
> >> >> > --
> >> >> > Marcelo
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?

Thanks
On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
 wrote:
>
> On my production application I am running ~200 jobs at once, but continue to 
> submit jobs in this manner for sometimes ~1 hour.
>
> The reproduction code above generally only has 4 ish jobs running at once, 
> and as you can see runs through 50k jobs in this manner.
>
> I guess I should clarify my above statement, the issue seems to appear when 
> running multiple jobs at once as well as in sequence for a while and may as 
> well have something to do with high master CPU usage (thus the collect in the 
> code). My rough guess would be whatever is managing clearing out completed 
> jobs gets overwhelmed (my master was a 4 core machine while running this, and 
> htop reported almost full CPU usage across all 4 cores).
>
> The attached screenshot shows the state of the webui after running the repro 
> code, you can see the ui is displaying some 43k completed jobs (takes a long 
> time to load) after a few minutes of inactivity this will clear out, however 
> as my production application continues to submit jobs every once in a while, 
> the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin  wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to keep data about all running jobs and
>> stages regardless of the limit. If you're running into issues because
>> of that we may have to look again at whether that's the right thing to
>> do.
>> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
>>  wrote:
>> >
>> > I believe I may be able to reproduce this now, it seems like it may be 
>> > something to do with many jobs at once:
>> >
>> > Spark 2.3.1
>> >
>> > > spark-shell --conf spark.ui.retainedJobs=1
>> >
>> > scala> import scala.concurrent._
>> > scala> import scala.concurrent.ExecutionContext.Implicits.global
>> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
>> > i).collect.length) } }
>> >
>> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  
>> > wrote:
>> >>
>> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> >> single stage (+ the tasks related to that single stage), same thing in
>> >> memory (checked with jvisualvm).
>> >>
>> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  
>> >> wrote:
>> >> >
>> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >> >  wrote:
>> >> > > I recently upgraded to spark 2.3.1 I have had these same settings in 
>> >> > > my spark submit script, which worked on 2.0.2, and according to the 
>> >> > > documentation appear to not have changed:
>> >> > >
>> >> > > spark.ui.retainedTasks=1
>> >> > > spark.ui.retainedStages=1
>> >> > > spark.ui.retainedJobs=1
>> >> >
>> >> > I tried that locally on the current master and it seems to be working.
>> >> > I don't have 2.3 easily in front of me right now, but will take a look
>> >> > Monday.
>> >> >
>> >> > --
>> >> > Marcelo
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
When you say many jobs at once, what ballpark are you talking about?

The code in 2.3+ does try to keep data about all running jobs and
stages regardless of the limit. If you're running into issues because
of that we may have to look again at whether that's the right thing to
do.
On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
 wrote:
>
> I believe I may be able to reproduce this now, it seems like it may be 
> something to do with many jobs at once:
>
> Spark 2.3.1
>
> > spark-shell --conf spark.ui.retainedJobs=1
>
> scala> import scala.concurrent._
> scala> import scala.concurrent.ExecutionContext.Implicits.global
> scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
> i).collect.length) } }
>
> On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:
>>
>> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> single stage (+ the tasks related to that single stage), same thing in
>> memory (checked with jvisualvm).
>>
>> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>> >
>> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >  wrote:
>> > > I recently upgraded to spark 2.3.1 I have had these same settings in my 
>> > > spark submit script, which worked on 2.0.2, and according to the 
>> > > documentation appear to not have changed:
>> > >
>> > > spark.ui.retainedTasks=1
>> > > spark.ui.retainedStages=1
>> > > spark.ui.retainedJobs=1
>> >
>> > I tried that locally on the current master and it seems to be working.
>> > I don't have 2.3 easily in front of me right now, but will take a look
>> > Monday.
>> >
>> > --
>> > Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-23 Thread Patrick Brown
I believe I may be able to reproduce this now, it seems like it may be
something to do with many jobs at once:

Spark 2.3.1

> spark-shell --conf spark.ui.retainedJobs=1

scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until
i).collect.length) } }
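For reference, a self-contained sketch of the same repro outside the shell. This is an approximation only: it assumes a local master, and the 50,000 figure comes from Patrick's follow-up ("runs through 50k jobs"), since the loop bound shown above appears truncated.

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import org.apache.spark.sql.SparkSession

object RetainedJobsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("retained-jobs-repro")
      .config("spark.ui.retainedJobs", "1")
      .config("spark.ui.retainedStages", "1")
      .config("spark.ui.retainedTasks", "1")
      .getOrCreate()
    val sc = spark.sparkContext

    // Submit many small jobs concurrently (the global execution context bounds
    // how many run at once), then check the Jobs page of the UI on port 4040.
    val futures = (0 until 50000).map { i =>
      Future { sc.parallelize(0 until i).collect().length }
    }
    futures.foreach(f => Await.ready(f, 10.minutes))
    spark.stop()
  }
}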

On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:

> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> single stage (+ the tasks related to that single stage), same thing in
> memory (checked with jvisualvm).
>
> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >
> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >  wrote:
> > > I recently upgraded to spark 2.3.1 I have had these same settings in
> my spark submit script, which worked on 2.0.2, and according to the
> documentation appear to not have changed:
> > >
> > > spark.ui.retainedTasks=1
> > > spark.ui.retainedStages=1
> > > spark.ui.retainedJobs=1
> >
> > I tried that locally on the current master and it seems to be working.
> > I don't have 2.3 easily in front of me right now, but will take a look
> > Monday.
> >
> > --
> > Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-22 Thread Marcelo Vanzin
Just tried on 2.3.2 and worked fine for me. UI had a single job and a
single stage (+ the tasks related to that single stage), same thing in
memory (checked with jvisualvm).

On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>
> On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>  wrote:
> > I recently upgraded to spark 2.3.1 I have had these same settings in my 
> > spark submit script, which worked on 2.0.2, and according to the 
> > documentation appear to not have changed:
> >
> > spark.ui.retainedTasks=1
> > spark.ui.retainedStages=1
> > spark.ui.retainedJobs=1
>
> I tried that locally on the current master and it seems to be working.
> I don't have 2.3 easily in front of me right now, but will take a look
> Monday.
>
> --
> Marcelo



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Marcelo Vanzin
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
 wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark 
> submit script, which worked on 2.0.2, and according to the documentation 
> appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
> spark.ui.retainedJobs=1

I tried that locally on the current master and it seems to be working.
I don't have 2.3 easily in front of me right now, but will take a look
Monday.

-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Shing Hing Man
 I have the same problem when I upgraded my application from Spark 2.2.1 to
Spark 2.3.2, running in YARN client mode.
Also, I noticed that in my Spark driver, org.apache.spark.status.TaskDataWrapper
could take up more than 2 GB of memory.

Shing


On Tuesday, 16 October 2018, 17:34:02 GMT+1, Patrick Brown wrote:

I recently upgraded to Spark 2.3.1. I have had these same settings in my spark-submit
script, which worked on 2.0.2, and according to the documentation they appear
not to have changed:

spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1

However, in 2.3.1 the UI doesn't seem to respect this; it still retains a huge
number of jobs:

Is this a known issue? Any ideas?
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Spark UI Source Code

2018-05-09 Thread Marcelo Vanzin
(-dev)

The KVStore API is private to Spark, it's not really meant to be used
by others. You're free to try, and there's a lot of javadocs on the
different interfaces, but it's not a general purpose database, so
you'll need to figure out things like that by yourself.

On Tue, May 8, 2018 at 9:53 PM, Anshi Shrivastava
<anshi.shrivast...@exadatum.com> wrote:
> Hi Marcelo, Dev,
>
> Thanks for your response.
> I have used SparkListeners to fetch the metrics (the public REST API uses
> the same) but to monitor these metrics over time, I have to persist them
> (using KVStore library of spark).  Is there a way to fetch data from this
> KVStore (which uses levelDb for storage) and filter it on basis on
> timestamp?
>
> Thanks,
> Anshi
>
> On Mon, May 7, 2018 at 9:51 PM, Marcelo Vanzin [via Apache Spark User List]
> <ml+s1001560n32114...@n3.nabble.com> wrote:
>>
>> On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
>> <[hidden email]> wrote:
>> > I've found a KVStore wrapper which stores all the metrics in a LevelDb
>> > store. This KVStore wrapper is available as a spark-dependency but we
>> > cannot
>> > access the metrics directly from spark since they are all private.
>>
>> I'm not sure what it is you're trying to do exactly, but there's a
>> public REST API that exposes all the data Spark keeps about
>> applications. There's also a programmatic status tracker
>> (SparkContext.statusTracker) that's easier to use from within the
>> running Spark app, but has a lot less info.
>>
>> > Can we use this store to store our own metrics?
>>
>> No.
>>
>> > Also can we retrieve these metrics based on timestamp?
>>
>> Only if the REST API has that feature, don't remember off the top of my
>> head.
>>
>>
>> --
>> Marcelo
>>
>> ---------
>> To unsubscribe e-mail: [hidden email]
>>
>>
>>
>> 
>
>
>
>



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
 wrote:
> I've found a KVStore wrapper which stores all the metrics in a LevelDb
> store. This KVStore wrapper is available as a spark-dependency but we cannot
> access the metrics directly from spark since they are all private.

I'm not sure what it is you're trying to do exactly, but there's a
public REST API that exposes all the data Spark keeps about
applications. There's also a programmatic status tracker
(SparkContext.statusTracker) that's easier to use from within the
running Spark app, but has a lot less info.

> Can we use this store to store our own metrics?

No.

> Also can we retrieve these metrics based on timestamp?

Only if the REST API has that feature, don't remember off the top of my head.
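To make the two options above concrete, a small sketch of the programmatic status tracker (assuming a live SparkContext named sc). The REST API under /api/v1/applications/<app-id>/ (jobs, stages, executors) exposes richer data, including submission and completion timestamps:

val tracker = sc.statusTracker
tracker.getActiveJobIds().foreach { jobId =>
  tracker.getJobInfo(jobId).foreach { info =>
    println(s"job $jobId: status=${info.status}, stages=${info.stageIds.mkString(",")}")
  }
}
tracker.getExecutorInfos.foreach { e =>
  println(s"executor ${e.host}:${e.port} cachedBytes=${e.cacheSize}")
}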


-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo,

Thanks for your suggestions.
The thing is that my Spark UI is the only thing that is crashing - not
the app. In fact, the app does end up completing successfully.
That's why I'm a bit confused by this issue.
I'll still try out some of your suggestions.
Thanks and Regards,
Saatvik Shah


On Tue, Jul 18, 2017 at 9:59 AM, Riccardo Ferrari 
wrote:

> The reason you get "connection refused" when connecting to the application
> UI (port 4040) is that your app gets stopped, and thus the application UI
> stops as well. To inspect your executor logs after the fact, you might find
> the Spark History server useful
>
> (for standalone mode).
>
> Personally, I collect the logs from my worker nodes. They generally sit
> under $SPARK_HOME/work// (for standalone).
> There you can find exceptions and messages from the executors assigned to
> your app.
>
> Now, about your app crashing: it might be useful to check whether it is sized
> correctly. The issue you linked sounds relevant, but I would give
> some sanity checks a try first. I have solved many issues just by sizing the
> app properly: check memory size, CPU allocation, and so on.
>
> Best,
>
> On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah 
> wrote:
>
>> Hi Riccardo,
>>
>> Yes, Thanks for suggesting I do that.
>>
>> [Stage 1:==>   (12750 + 40)
>> / 15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
>> Dropping SparkListenerEvent because no remaining room in event queue. This
>> likely means one of the SparkListeners is too slow and cannot keep up with
>> the rate at which tasks are being started by the scheduler.
>> 17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
>> [Stage 1:> (13320 + 41)
>> / 15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
>> [Stage 1:==>   (13867 + 40)
>> / 15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
>> [Stage 1:===>  (14277 + 40)
>> / 15000]17/07/18 13:25:10 INFO 
>> org.spark_project.jetty.server.ServerConnector:
>> Stopped ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
>> 17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
>> SparkListenerBus has already stopped! Dropping event
>> SparkListenerExecutorMetricsUpdate(4,WrappedArray())
>> And similar WARN/INFO messages continue occurring.
>>
>> When I try to access the UI, I get:
>>
>> Problem accessing /proxy/application_1500380353993_0001/. Reason:
>>
>> Connection to http://10.142.0.17:4040 refused
>>
>> Caused by:
>>
>> org.apache.http.conn.HttpHostConnectException: Connection to 
>> http://10.142.0.17:4040 refused
>>  at 
>> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>>  at 
>> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>>  at 
>> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>>  at 
>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>  at 
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
>>  at 
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
>>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>
>>
>>
>> I noticed this issue talks about something similar and I guess is
>> related: https://issues.apache.org/jira/browse/SPARK-18838.
>>
>> On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari 
>> wrote:
>>
>>> Hi,
>>>  can you share more details. do you have any exceptions from the driver?
>>> or executors?
>>>
>>> best,
>>>
>>> On Jul 18, 2017 02:49, "saatvikshah1994" 
>>> wrote:
>>>
 Hi,

 I have a pyspark App which when provided a huge amount of data as input
 throws the error explained here sometimes:
 https://stackoverflow.com/questions/32340639/unable-to-under
 stand-error-sparklistenerbus-has-already-stopped-dropping-event.
 All my code is running inside the 

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
The reason you get "connection refused" when connecting to the application UI
(port 4040) is that your app gets stopped, and thus the application UI stops
as well. To inspect your executor logs after the fact, you might find the
Spark History server useful

(for standalone mode).

Personally, I collect the logs from my worker nodes. They generally sit
under $SPARK_HOME/work// (for standalone).
There you can find exceptions and messages from the executors assigned to
your app.

Now, about your app crashing: it might be useful to check whether it is sized
correctly. The issue you linked sounds relevant, but I would give
some sanity checks a try first. I have solved many issues just by sizing the
app properly: check memory size, CPU allocation, and so on.

Best,
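As a concrete pointer for the standalone-mode locations described above (the paths below are the defaults of a typical install, assumptions rather than values from this thread):

# executor stdout/stderr for an application, on each worker node
$SPARK_HOME/work/<app-id>/<executor-id>/stdout
$SPARK_HOME/work/<app-id>/<executor-id>/stderr

# history server for finished applications (port 18080 by default), provided
# apps were run with spark.eventLog.enabled=true and a shared spark.eventLog.dir
$SPARK_HOME/sbin/start-history-server.sh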

On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah 
wrote:

> Hi Riccardo,
>
> Yes, Thanks for suggesting I do that.
>
> [Stage 1:==>   (12750 + 40) /
> 15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
> Dropping SparkListenerEvent because no remaining room in event queue. This
> likely means one of the SparkListeners is too slow and cannot keep up with
> the rate at which tasks are being started by the scheduler.
> 17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
> [Stage 1:> (13320 + 41) /
> 15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
> [Stage 1:==>   (13867 + 40) /
> 15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
> [Stage 1:===>  (14277 + 40) /
> 15000]17/07/18 13:25:10 INFO org.spark_project.jetty.server.ServerConnector:
> Stopped ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
> 17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
> SparkListenerBus has already stopped! Dropping event
> SparkListenerExecutorMetricsUpdate(4,WrappedArray())
> And similar WARN/INFO messages continue occurring.
>
> When I try to access the UI, I get:
>
> Problem accessing /proxy/application_1500380353993_0001/. Reason:
>
> Connection to http://10.142.0.17:4040 refused
>
> Caused by:
>
> org.apache.http.conn.HttpHostConnectException: Connection to 
> http://10.142.0.17:4040 refused
>   at 
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>   at 
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>   at 
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
>   at 
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>
>
>
> I noticed this issue talks about something similar and I guess is related:
> https://issues.apache.org/jira/browse/SPARK-18838.
>
> On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari 
> wrote:
>
>> Hi,
>>  can you share more details. do you have any exceptions from the driver?
>> or executors?
>>
>> best,
>>
>> On Jul 18, 2017 02:49, "saatvikshah1994" 
>> wrote:
>>
>>> Hi,
>>>
>>> I have a pyspark App which when provided a huge amount of data as input
>>> throws the error explained here sometimes:
>>> https://stackoverflow.com/questions/32340639/unable-to-under
>>> stand-error-sparklistenerbus-has-already-stopped-dropping-event.
>>> All my code is running inside the main function, and the only slightly
>>> peculiar thing I am doing in this app is using a custom PySpark ML
>>> Transformer(Modified from
>>> https://stackoverflow.com/questions/32331848/create-a-custom
>>> -transformer-in-pyspark-ml).
>>> Could this be the issue? How can I debug why this is happening?
>>>
>>>
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.
>>> 1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
>>> Sent from the Apache Spark User 

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo,

Yes, Thanks for suggesting I do that.

[Stage 1:==>   (12750 + 40) /
15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
Dropping SparkListenerEvent because no remaining room in event queue. This
likely means one of the SparkListeners is too slow and cannot keep up with
the rate at which tasks are being started by the scheduler.
17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped
1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
[Stage 1:> (13320 + 41) /
15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
[Stage 1:==>   (13867 + 40) /
15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
[Stage 1:===>  (14277 + 40) /
15000]17/07/18 13:25:10 INFO
org.spark_project.jetty.server.ServerConnector: Stopped
ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
SparkListenerBus has already stopped! Dropping event
SparkListenerExecutorMetricsUpdate(4,WrappedArray())
And similar WARN/INFO messages continue occurring.

When I try to access the UI, I get:

Problem accessing /proxy/application_1500380353993_0001/. Reason:

Connection to http://10.142.0.17:4040 refused

Caused by:

org.apache.http.conn.HttpHostConnectException: Connection to
http://10.142.0.17:4040 refused
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)



I noticed this issue talks about something similar and I guess is related:
https://issues.apache.org/jira/browse/SPARK-18838.

On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari 
wrote:

> Hi,
>  can you share more details. do you have any exceptions from the driver?
> or executors?
>
> best,
>
> On Jul 18, 2017 02:49, "saatvikshah1994" 
> wrote:
>
>> Hi,
>>
>> I have a pyspark App which when provided a huge amount of data as input
>> throws the error explained here sometimes:
>> https://stackoverflow.com/questions/32340639/unable-to-under
>> stand-error-sparklistenerbus-has-already-stopped-dropping-event.
>> All my code is running inside the main function, and the only slightly
>> peculiar thing I am doing in this app is using a custom PySpark ML
>> Transformer(Modified from
>> https://stackoverflow.com/questions/32331848/create-a-custom
>> -transformer-in-pyspark-ml).
>> Could this be the issue? How can I debug why this is happening?
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


-- 
*Saatvik Shah,*
*Masters in the School of Computer Science,*
*Carnegie Mellon University,*
*LinkedIn , Website
*


Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
Hi,
 can you share more details. do you have any exceptions from the driver? or
executors?

best,

On Jul 18, 2017 02:49, "saatvikshah1994"  wrote:

> Hi,
>
> I have a pyspark App which when provided a huge amount of data as input
> throws the error explained here sometimes:
> https://stackoverflow.com/questions/32340639/unable-to-understand-error-
> sparklistenerbus-has-already-stopped-dropping-event.
> All my code is running inside the main function, and the only slightly
> peculiar thing I am doing in this app is using a custom PySpark ML
> Transformer(Modified from
> https://stackoverflow.com/questions/32331848/create-a-
> custom-transformer-in-pyspark-ml).
> Could this be the issue? How can I debug why this is happening?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Spark UI shows Jobs are processing, but the files are already written to S3

2017-05-19 Thread Miles Crawford
Could I be experiencing the same thing?

https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0

On Wed, Nov 16, 2016 at 10:37 AM, Shreya Agarwal 
wrote:

> I think that is a bug. I have seen that a lot especially with long running
> jobs where Spark skips a lot of stages because it has pre-computed results.
> And some of these are never marked as completed, even though in reality
> they are. I figured this out because I was using the interactive shell
> (spark-shell) and the shell came up to a prompt indicating the job had
> finished even though there were a lot of Active jobs and tasks according to
> the UI. And my output is correct.
>
>
>
> Is there a JIRA item tracking this?
>
>
>
> *From:* Kuchekar [mailto:kuchekar.nil...@gmail.com]
> *Sent:* Wednesday, November 16, 2016 10:00 AM
> *To:* spark users 
> *Subject:* Spark UI shows Jobs are processing, but the files are already
> written to S3
>
>
>
> Hi,
>
>
>
>  I am running a Spark job which saves the computed data (massive
> data) to S3. On the Spark UI I see that some jobs are still active, but there
> is no activity in the logs. Also, on S3 all the data has been written
> (verified each bucket --> it has a _SUCCESS file).
>
>
>
> Am I missing something?
>
>
>
> Thanks.
>
> Kuchekar, Nilesh
>


RE: Spark UI not coming up in EMR

2017-01-11 Thread Saurabh Malviya (samalviy)
Any clue on this?

Jobs are running fine, but I am not able to access the Spark UI in EMR (YARN).

Where can I see statistics such as the number of events per second and rows processed
for streaming in the log files (if the UI is not working)?

-Saurabh

From: Saurabh Malviya (samalviy)
Sent: Monday, January 09, 2017 10:59 AM
To: user@spark.apache.org
Subject: Spark UI not coming up in EMR

The Spark web UI for detailed monitoring of streaming jobs stops rendering after 2
weeks; it keeps looping trying to fetch the page. Is there any clue how I can get that
page, or logs where I can see how many events are coming into Spark for each interval?

-Saurabh


RE: Spark UI shows Jobs are processing, but the files are already written to S3

2016-11-16 Thread Shreya Agarwal
I think that is a bug. I have seen that a lot especially with long running jobs 
where Spark skips a lot of stages because it has pre-computed results. And some 
of these are never marked as completed, even though in reality they are. I 
figured this out because I was using the interactive shell (spark-shell) and 
the shell came up to a prompt indicating the job had finished even though there 
were a lot of Active jobs and tasks according to the UI. And my output is 
correct.

Is there a JIRA item tracking this?

From: Kuchekar [mailto:kuchekar.nil...@gmail.com]
Sent: Wednesday, November 16, 2016 10:00 AM
To: spark users 
Subject: Spark UI shows Jobs are processing, but the files are already written 
to S3

Hi,

 I am running a Spark job which saves the computed data (massive data) to
S3. On the Spark UI I see that some jobs are still active, but there is no activity in
the logs. Also, on S3 all the data has been written (verified each bucket --> it has a
_SUCCESS file).

Am I missing something?

Thanks.
Kuchekar, Nilesh


Re: Spark UI error spark 2.0.1 hadoop 2.6

2016-10-27 Thread gpatcham
I was able to fix it by adding servlet 3.0 to the classpath.
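For anyone hitting the same error, the missing dependency would look roughly like this in an sbt build (an assumption about how the fix was applied; the thread only says servlet 3.0 was added to the classpath):

libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"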



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-error-spark-2-0-1-hadoop-2-6-tp27970p27971.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark UI shows finished when job had an error

2016-06-17 Thread Mich Talebzadeh
The Spark GUI runs by default on port 4040, and if a job crashes (assuming you meant
there was an issue with spark-submit), then the GUI will disconnect.

The GUI is not there for diagnostics; it reports statistics. My
inclination would be to look at the YARN log files (assuming you are using
YARN as your resource manager) or at the output from spark-submit that you
piped to a file.

HTH
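For reference, a sketch of pulling the aggregated YARN container logs after the application finishes (the application id below is a placeholder):

yarn logs -applicationId application_1466170000000_0001 > app.log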

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 17 June 2016 at 14:49, Sumona Routh  wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However, the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Re: Spark UI shows finished when job had an error

2016-06-17 Thread Gourav Sengupta
Hi,

Can you please see the query plan (in case you are using a query)?

There is a very high chance that the query was broken into multiple steps
and only a subsequent step failed.


Regards,
Gourav Sengupta

On Fri, Jun 17, 2016 at 2:49 PM, Sumona Routh  wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However, the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Re: Spark UI shows finished when job had an error

2016-06-17 Thread Jacek Laskowski
Hi,

How do you access Cassandra? Could that connector not have sent a
SparkListenerEvent to inform about failure?

Jacek
On 17 Jun 2016 3:50 p.m., "Sumona Routh"  wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However, the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Nirav Patel
I think it does, because users don't see their application logic
and flow the way Spark's internals do. Of course we follow general guidelines
for performance, but we shouldn't really have to care how exactly Spark decides
to execute the DAG; the Spark scheduler or core can keep changing over time to
optimize it. So optimizing from the user's perspective means looking at which
transformations they are using and what they are doing inside those
transformations. If users had some transparency from the framework on how those
transformations utilize resources over time, or where they are failing,
we could optimize them better. That way we stay focused on our application
logic rather than on what the framework is doing underneath.

About the solution: doesn't the Spark driver (SparkContext + event listener) have
knowledge of every job, task set, and task and their current state? The Spark UI
can relate a job to a stage to a task, so why not a stage to a transformation?

Again, my real point is to assess this as a requirement from the users' and
stakeholders' perspective, regardless of the technical challenge.

Thanks
Nirav

On Wed, May 25, 2016 at 8:04 PM, Mark Hamstra 
wrote:

> But when you talk about optimizing the DAG, it really doesn't make sense
> to also talk about transformation steps as separate entities.  The
> DAGScheduler knows about Jobs, Stages, TaskSets and Tasks.  The
> TaskScheduler knows about TaskSets ad Tasks.  Neither of them understands
> the transformation steps that you used to define your RDD -- at least not
> as separable, distinct steps.  To give the kind of
> transformation-step-oriented information that you want would require parts
> of Spark that don't currently concern themselves at all with RDD
> transformation steps to start tracking them and how they map to Jobs,
> Stages, TaskSets and Tasks -- and when you start talking about Datasets and
> Spark SQL, you then needing to start talking about tracking and mapping
> concepts like Plans, Schemas and Queries.  It would introduce significant
> new complexity.
>
> On Wed, May 25, 2016 at 6:59 PM, Nirav Patel 
> wrote:
>
>> Hi Mark,
>>
>> I might have said stage instead of step in my last statement "UI just
>> says Collect failed but in fact it could be any stage in that lazy chain of
>> evaluation."
>>
>> Anyway, even you agree that this visibility into the underlying steps won't
>> be available, which does pose difficulties in terms of troubleshooting as
>> well as optimization at the step level. I think users will have a hard time
>> without this. It's great that the Spark community is working on different
>> levels of internal optimization, but it's also important to give users enough
>> visibility to debug issues and resolve bottlenecks.
>> There is also no visibility into how Spark utilizes shuffle memory space
>> vs. user memory space vs. cache space. It's a separate topic, though. If
>> everything works magically as a black box then it's fine, but when you
>> have a large number of people on this site complaining about OOM and shuffle
>> errors all the time, you need to start providing some transparency to
>> address that.
>>
>> Thanks
>>
>>
>> On Wed, May 25, 2016 at 6:41 PM, Mark Hamstra 
>> wrote:
>>
>>> You appear to be misunderstanding the nature of a Stage.  Individual
>>> transformation steps such as `map` do not define the boundaries of Stages.
>>> Rather, a sequence of transformations in which there is only a
>>> NarrowDependency between each of the transformations will be pipelined into
>>> a single Stage.  It is only when there is a ShuffleDependency that a new
>>> Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
>>> With whole stage code gen in Spark 2.0, there will be even less opportunity
>>> to treat individual transformations within a sequence of narrow
>>> dependencies as though they were discrete, separable entities.  The Failed
>>> Stages portion of the Web UI will tell you which Stage in a Job failed, and
>>> the accompanying error log message will generally also give you some idea
>>> of which Tasks failed and why.  Tracing the error back further and at a
>>> different level of abstraction to lay blame on a particular transformation
>>> wouldn't be particularly easy.
>>>
>>> On Wed, May 25, 2016 at 5:28 PM, Nirav Patel 
>>> wrote:
>>>
 It's great that spark scheduler does optimized DAG processing and only
 does lazy eval when some action is performed or shuffle dependency is
 encountered. Sometime it goes further after shuffle dep before executing
 anything. e.g. if there are map steps after shuffle then it doesn't stop at
 shuffle to execute anything but goes to that next map steps until it finds
 a reason(spark action) to execute. As a result stage that spark is running
 can be internally series of (map -> shuffle -> map -> map -> collect) and
 spark UI just shows its currently running 'collect' stage. SO  if job fails
 at that point 

Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Mark Hamstra
But when you talk about optimizing the DAG, it really doesn't make sense to
also talk about transformation steps as separate entities.  The
DAGScheduler knows about Jobs, Stages, TaskSets and Tasks.  The
TaskScheduler knows about TaskSets ad Tasks.  Neither of them understands
the transformation steps that you used to define your RDD -- at least not
as separable, distinct steps.  To give the kind of
transformation-step-oriented information that you want would require parts
of Spark that don't currently concern themselves at all with RDD
transformation steps to start tracking them and how they map to Jobs,
Stages, TaskSets and Tasks -- and when you start talking about Datasets and
Spark SQL, you then needing to start talking about tracking and mapping
concepts like Plans, Schemas and Queries.  It would introduce significant
new complexity.
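To make the pipelining point concrete, a small sketch (assuming a live SparkContext named sc). Narrow transformations are pipelined into one stage, and only the shuffle introduces a new stage boundary, which RDD.toDebugString makes visible through its indentation:

val counts = sc.parallelize(Seq("a", "b", "a", "c"))
  .map(w => (w, 1))                   // narrow: pipelined
  .filter(_._1.nonEmpty)              // narrow: same stage
  .reduceByKey(_ + _)                 // shuffle dependency: new stage boundary
  .map { case (w, n) => s"$w=$n" }    // narrow: pipelined into the post-shuffle stage
println(counts.toDebugString)         // indented lineage shows the stage split at the shuffle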

On Wed, May 25, 2016 at 6:59 PM, Nirav Patel  wrote:

> Hi Mark,
>
> I might have said stage instead of step in my last statement "UI just
> says Collect failed but in fact it could be any stage in that lazy chain of
> evaluation."
>
> Anyway, even you agree that this visibility into the underlying steps won't be
> available, which does pose difficulties in terms of troubleshooting as well
> as optimization at the step level. I think users will have a hard time without
> this. It's great that the Spark community is working on different levels of
> internal optimization, but it's also important to give users enough visibility
> to debug issues and resolve bottlenecks.
> There is also no visibility into how Spark utilizes shuffle memory space
> vs. user memory space vs. cache space. It's a separate topic, though. If
> everything works magically as a black box then it's fine, but when you
> have a large number of people on this site complaining about OOM and shuffle
> errors all the time, you need to start providing some transparency to
> address that.
>
> Thanks
>
>
> On Wed, May 25, 2016 at 6:41 PM, Mark Hamstra 
> wrote:
>
>> You appear to be misunderstanding the nature of a Stage.  Individual
>> transformation steps such as `map` do not define the boundaries of Stages.
>> Rather, a sequence of transformations in which there is only a
>> NarrowDependency between each of the transformations will be pipelined into
>> a single Stage.  It is only when there is a ShuffleDependency that a new
>> Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
>> With whole stage code gen in Spark 2.0, there will be even less opportunity
>> to treat individual transformations within a sequence of narrow
>> dependencies as though they were discrete, separable entities.  The Failed
>> Stages portion of the Web UI will tell you which Stage in a Job failed, and
>> the accompanying error log message will generally also give you some idea
>> of which Tasks failed and why.  Tracing the error back further and at a
>> different level of abstraction to lay blame on a particular transformation
>> wouldn't be particularly easy.
>>
>> On Wed, May 25, 2016 at 5:28 PM, Nirav Patel 
>> wrote:
>>
>>> It's great that spark scheduler does optimized DAG processing and only
>>> does lazy eval when some action is performed or shuffle dependency is
>>> encountered. Sometime it goes further after shuffle dep before executing
>>> anything. e.g. if there are map steps after shuffle then it doesn't stop at
>>> shuffle to execute anything but goes to that next map steps until it finds
>>> a reason(spark action) to execute. As a result stage that spark is running
>>> can be internally series of (map -> shuffle -> map -> map -> collect) and
>>> spark UI just shows its currently running 'collect' stage. SO  if job fails
>>> at that point spark UI just says Collect failed but in fact it could be any
>>> stage in that lazy chain of evaluation. Looking at executor logs gives some
>>> insights but that's not always straightforward.
>>> Correct me if I am wrong here but I think we need more visibility into
>>> what's happening underneath so we can easily troubleshoot as well as
>>> optimize our DAG.
>>>
>>> THanks
>>>
>>>
>>>
>>
>>
>>
>
>
>
>


Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Nirav Patel
Hi Mark,

I might have said stage instead of step in my last statement "UI just says
Collect failed but in fact it could be any stage in that lazy chain of
evaluation."

Anyway, even you agree that this visibility into the underlying steps won't be
available, which does pose difficulties in terms of troubleshooting as well
as optimization at the step level. I think users will have a hard time without
this. It's great that the Spark community is working on different levels of
internal optimization, but it's also important to give users enough visibility
to debug issues and resolve bottlenecks.
There is also no visibility into how Spark utilizes shuffle memory space vs.
user memory space vs. cache space. It's a separate topic, though. If
everything works magically as a black box then it's fine, but when you
have a large number of people on this site complaining about OOM and shuffle
errors all the time, you need to start providing some transparency to address
that.

Thanks


On Wed, May 25, 2016 at 6:41 PM, Mark Hamstra 
wrote:

> You appear to be misunderstanding the nature of a Stage.  Individual
> transformation steps such as `map` do not define the boundaries of Stages.
> Rather, a sequence of transformations in which there is only a
> NarrowDependency between each of the transformations will be pipelined into
> a single Stage.  It is only when there is a ShuffleDependency that a new
> Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
> With whole stage code gen in Spark 2.0, there will be even less opportunity
> to treat individual transformations within a sequence of narrow
> dependencies as though they were discrete, separable entities.  The Failed
> Stages portion of the Web UI will tell you which Stage in a Job failed, and
> the accompanying error log message will generally also give you some idea
> of which Tasks failed and why.  Tracing the error back further and at a
> different level of abstraction to lay blame on a particular transformation
> wouldn't be particularly easy.
>
> On Wed, May 25, 2016 at 5:28 PM, Nirav Patel 
> wrote:
>
>> It's great that spark scheduler does optimized DAG processing and only
>> does lazy eval when some action is performed or shuffle dependency is
>> encountered. Sometime it goes further after shuffle dep before executing
>> anything. e.g. if there are map steps after shuffle then it doesn't stop at
>> shuffle to execute anything but goes to that next map steps until it finds
>> a reason(spark action) to execute. As a result stage that spark is running
>> can be internally series of (map -> shuffle -> map -> map -> collect) and
>> spark UI just shows its currently running 'collect' stage. SO  if job fails
>> at that point spark UI just says Collect failed but in fact it could be any
>> stage in that lazy chain of evaluation. Looking at executor logs gives some
>> insights but that's not always straightforward.
>> Correct me if I am wrong here but I think we need more visibility into
>> what's happening underneath so we can easily troubleshoot as well as
>> optimize our DAG.
>>
>> THanks
>>
>>
>>
>
>
>




Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Mark Hamstra
You appear to be misunderstanding the nature of a Stage.  Individual
transformation steps such as `map` do not define the boundaries of Stages.
Rather, a sequence of transformations in which there is only a
NarrowDependency between each of the transformations will be pipelined into
a single Stage.  It is only when there is a ShuffleDependency that a new
Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
With whole stage code gen in Spark 2.0, there will be even less opportunity
to treat individual transformations within a sequence of narrow
dependencies as though they were discrete, separable entities.  The Failed
Stages portion of the Web UI will tell you which Stage in a Job failed, and
the accompanying error log message will generally also give you some idea
of which Tasks failed and why.  Tracing the error back further and at a
different level of abstraction to lay blame on a particular transformation
wouldn't be particularly easy.
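
To make the stage boundary point concrete, here is a small sketch (not
from the original thread; the app name and input path are placeholders)
of a job whose narrow transformations are pipelined into two stages
separated by a single shuffle:

import org.apache.spark.{SparkConf, SparkContext}

object StageBoundaryDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StageBoundaryDemo"))
    val counts = sc.textFile("hdfs:///tmp/input.txt")  // placeholder path
      .flatMap(_.split("\\s+"))                        // narrow: pipelined
      .map(word => (word, 1))                          // narrow: same stage
      .reduceByKey(_ + _)                              // ShuffleDependency: a new stage starts here
      .map { case (word, n) => s"$word,$n" }           // narrow: pipelined into the post-shuffle stage
    counts.collect()  // one job, two stages; failures are reported per stage, not per transformation
    sc.stop()
  }
}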

On Wed, May 25, 2016 at 5:28 PM, Nirav Patel  wrote:

> It's great that spark scheduler does optimized DAG processing and only
> does lazy eval when some action is performed or shuffle dependency is
> encountered. Sometime it goes further after shuffle dep before executing
> anything. e.g. if there are map steps after shuffle then it doesn't stop at
> shuffle to execute anything but goes to that next map steps until it finds
> a reason(spark action) to execute. As a result stage that spark is running
> can be internally series of (map -> shuffle -> map -> map -> collect) and
> spark UI just shows its currently running 'collect' stage. SO  if job fails
> at that point spark UI just says Collect failed but in fact it could be any
> stage in that lazy chain of evaluation. Looking at executor logs gives some
> insights but that's not always straightforward.
> Correct me if I am wrong here but I think we need more visibility into
> what's happening underneath so we can easily troubleshoot as well as
> optimize our DAG.
>
> THanks
>
>
>


Re: Spark UI Completed Jobs

2016-03-15 Thread Prabhu Joseph
Thanks Mark and Jeff

On Wed, Mar 16, 2016 at 7:11 AM, Mark Hamstra 
wrote:

> Looks to me like the one remaining Stage would execute 19788 Task if all
> of those Tasks succeeded on the first try; but because of retries, 19841
> Tasks were actually executed.  Meanwhile, there were 41405 Tasks in the the
> 163 Stages that were skipped.
>
> I think -- but the Spark UI's accounting may not be 100% accurate and bug
> free.
>
> On Tue, Mar 15, 2016 at 6:34 PM, Prabhu Joseph  > wrote:
>
>> Okay, so out of 164 stages, is 163 are skipped. And how 41405 tasks are
>> skipped if the total is only 19788.
>>
>> On Wed, Mar 16, 2016 at 6:31 AM, Mark Hamstra 
>> wrote:
>>
>>> It's not just if the RDD is explicitly cached, but also if the map
>>> outputs for stages have been materialized into shuffle files and are still
>>> accessible through the map output tracker.  Because of that, explicitly
>>> caching RDD actions often gains you little or nothing, since even without a
>>> call to cache() or persist() the prior computation will largely be reused
>>> and stages will show up as skipped -- i.e. no need to recompute that stage.
>>>
>>> On Tue, Mar 15, 2016 at 5:50 PM, Jeff Zhang  wrote:
>>>
 If RDD is cached, this RDD is only computed once and the stages for
 computing this RDD in the following jobs are skipped.


 On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph <
 prabhujose.ga...@gmail.com> wrote:

> Hi All,
>
>
> Spark UI Completed Jobs section shows below information, what is the
> skipped value shown for Stages and Tasks below.
>
> Job_ID: 11
> Description: count
> Submitted: 2016/03/14 15:35:32
> Duration: 1.4 min
> Stages (Succeeded/Total): 164/164 (163 skipped)
> Tasks (for all stages, Succeeded/Total): 19841/19788 (41405 skipped)
> Thanks,
> Prabhu Joseph
>



 --
 Best Regards

 Jeff Zhang

>>>
>>>
>>
>


Re: Spark UI Completed Jobs

2016-03-15 Thread Prabhu Joseph
Okay, so out of 164 stages, 163 are skipped. And how are 41405 tasks
skipped if the total is only 19788?

On Wed, Mar 16, 2016 at 6:31 AM, Mark Hamstra 
wrote:

> It's not just if the RDD is explicitly cached, but also if the map outputs
> for stages have been materialized into shuffle files and are still
> accessible through the map output tracker.  Because of that, explicitly
> caching RDD actions often gains you little or nothing, since even without a
> call to cache() or persist() the prior computation will largely be reused
> and stages will show up as skipped -- i.e. no need to recompute that stage.
>
> On Tue, Mar 15, 2016 at 5:50 PM, Jeff Zhang  wrote:
>
>> If RDD is cached, this RDD is only computed once and the stages for
>> computing this RDD in the following jobs are skipped.
>>
>>
>> On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph <
>> prabhujose.ga...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>>
>>> Spark UI Completed Jobs section shows below information, what is the
>>> skipped value shown for Stages and Tasks below.
>>>
>>> Job_ID: 11
>>> Description: count
>>> Submitted: 2016/03/14 15:35:32
>>> Duration: 1.4 min
>>> Stages (Succeeded/Total): 164/164 (163 skipped)
>>> Tasks (for all stages, Succeeded/Total): 19841/19788 (41405 skipped)
>>> Thanks,
>>> Prabhu Joseph
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>


Re: Spark UI Completed Jobs

2016-03-15 Thread Mark Hamstra
It's not just if the RDD is explicitly cached, but also if the map outputs
for stages have been materialized into shuffle files and are still
accessible through the map output tracker.  Because of that, explicitly
caching RDD actions often gains you little or nothing, since even without a
call to cache() or persist() the prior computation will largely be reused
and stages will show up as skipped -- i.e. no need to recompute that stage.
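
To illustrate the shuffle-file reuse described above, here is a sketch
(not from the thread; the app name and input path are placeholders) in
which the second job reports its shuffle-map stage as skipped even though
the RDD was never explicitly cached:

import org.apache.spark.{SparkConf, SparkContext}

object SkippedStagesDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SkippedStagesDemo"))
    val counts = sc.textFile("hdfs:///tmp/input.txt")  // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                              // shuffle boundary
    counts.count()  // job 1: runs both the shuffle-map stage and the result stage
    counts.take(5)  // job 2: the shuffle-map stage shows up as "skipped", because its
                    // map outputs are still available through the map output tracker
    sc.stop()
  }
}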

On Tue, Mar 15, 2016 at 5:50 PM, Jeff Zhang  wrote:

> If RDD is cached, this RDD is only computed once and the stages for
> computing this RDD in the following jobs are skipped.
>
>
> On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph  > wrote:
>
>> Hi All,
>>
>>
>> Spark UI Completed Jobs section shows below information, what is the
>> skipped value shown for Stages and Tasks below.
>>
>> Job_ID: 11
>> Description: count
>> Submitted: 2016/03/14 15:35:32
>> Duration: 1.4 min
>> Stages (Succeeded/Total): 164/164 (163 skipped)
>> Tasks (for all stages, Succeeded/Total): 19841/19788 (41405 skipped)
>> Thanks,
>> Prabhu Joseph
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Spark UI standalone "crashes" after an application finishes

2016-03-01 Thread Gourav Sengupta
Hi Teng,

I was not asking the question; I was describing what to expect from the
Spark UI depending on how you start the Spark application.

Thanks and Regards,
Gourav

On Tue, Mar 1, 2016 at 8:30 PM, Teng Qiu  wrote:

> as Gourav said, the application UI on port 4040 will no more available
> after your spark app finished. you should go to spark master's UI
> (port 8080), and take a look "completed applications"...
>
> refer to doc: http://spark.apache.org/docs/latest/monitoring.html
> read the first "note that" :)
>
> 2016-03-01 21:13 GMT+01:00 Gourav Sengupta :
> > Hi,
> >
> > in case you are submitting your SPARK jobs then the UI is only available
> > when the job is running.
> >
> > Else if you are starting a SPARK cluster in standalone mode or HADOOP or
> > etc, then the SPARK UI remains alive.
> >
> > The other way to keep the SPARK UI alive is to use the Jupyter notebook
> for
> > Python or Scala (see Apache Toree) or use Zeppelin.
> >
> >
> > Regards,
> > Gourav Sengupta
> >
> > On Mon, Feb 29, 2016 at 11:48 PM, Sumona Routh 
> wrote:
> >>
> >> Hi there,
> >> I've been doing some performance tuning of our Spark application, which
> is
> >> using Spark 1.2.1 standalone. I have been using the spark metrics to
> graph
> >> out details as I run the jobs, as well as the UI to review the tasks and
> >> stages.
> >>
> >> I notice that after my application completes, or is near completion, the
> >> UI "crashes." I get a Connection Refused response. Sometimes, the page
> >> eventually recovers and will load again, but sometimes I end up having
> to
> >> restart the Spark master to get it back. When I look at my graphs on the
> >> app, the memory consumption (of driver, executors, and what I believe
> to be
> >> the daemon (spark.jvm.total.used)) appears to be healthy. Monitoring the
> >> master machine itself, memory and CPU appear healthy as well.
> >>
> >> Has anyone else seen this issue? Are there logs for the UI itself, and
> >> where might I find those?
> >>
> >> Thanks!
> >> Sumona
> >
> >
>


Re: Spark UI standalone "crashes" after an application finishes

2016-03-01 Thread Teng Qiu
As Gourav said, the application UI on port 4040 will no longer be
available after your Spark app finishes. You should go to the Spark
master's UI (port 8080) and take a look at "completed applications"...

Refer to the doc: http://spark.apache.org/docs/latest/monitoring.html
and read the first "note that" :)

2016-03-01 21:13 GMT+01:00 Gourav Sengupta :
> Hi,
>
> in case you are submitting your SPARK jobs then the UI is only available
> when the job is running.
>
> Else if you are starting a SPARK cluster in standalone mode or HADOOP or
> etc, then the SPARK UI remains alive.
>
> The other way to keep the SPARK UI alive is to use the Jupyter notebook for
> Python or Scala (see Apache Toree) or use Zeppelin.
>
>
> Regards,
> Gourav Sengupta
>
> On Mon, Feb 29, 2016 at 11:48 PM, Sumona Routh  wrote:
>>
>> Hi there,
>> I've been doing some performance tuning of our Spark application, which is
>> using Spark 1.2.1 standalone. I have been using the spark metrics to graph
>> out details as I run the jobs, as well as the UI to review the tasks and
>> stages.
>>
>> I notice that after my application completes, or is near completion, the
>> UI "crashes." I get a Connection Refused response. Sometimes, the page
>> eventually recovers and will load again, but sometimes I end up having to
>> restart the Spark master to get it back. When I look at my graphs on the
>> app, the memory consumption (of driver, executors, and what I believe to be
>> the daemon (spark.jvm.total.used)) appears to be healthy. Monitoring the
>> master machine itself, memory and CPU appear healthy as well.
>>
>> Has anyone else seen this issue? Are there logs for the UI itself, and
>> where might I find those?
>>
>> Thanks!
>> Sumona
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark UI standalone "crashes" after an application finishes

2016-03-01 Thread Gourav Sengupta
Hi,

If you are submitting your Spark jobs, then the application UI is only
available while the job is running.

If instead you are running a Spark cluster in standalone mode, on Hadoop,
etc., then the Spark master UI remains alive.

Another way to keep a Spark UI alive is to use a Jupyter notebook for
Python or Scala (see Apache Toree), or to use Zeppelin.


Regards,
Gourav Sengupta

On Mon, Feb 29, 2016 at 11:48 PM, Sumona Routh  wrote:

> Hi there,
> I've been doing some performance tuning of our Spark application, which is
> using Spark 1.2.1 standalone. I have been using the spark metrics to graph
> out details as I run the jobs, as well as the UI to review the tasks and
> stages.
>
> I notice that after my application completes, or is near completion, the
> UI "crashes." I get a Connection Refused response. Sometimes, the page
> eventually recovers and will load again, but sometimes I end up having to
> restart the Spark master to get it back. When I look at my graphs on the
> app, the memory consumption (of driver, executors, and what I believe to be
> the daemon (spark.jvm.total.used)) appears to be healthy. Monitoring the
> master machine itself, memory and CPU appear healthy as well.
>
> Has anyone else seen this issue? Are there logs for the UI itself, and
> where might I find those?
>
> Thanks!
> Sumona
>


Re: Spark UI standalone "crashes" after an application finishes

2016-03-01 Thread Sumona Routh
Thanks Shixiong!
To clarify for others, yes, I was speaking of the UI at port 4040, and I do
have event logging enabled, so I can review jobs after the fact. We hope to
upgrade our version of Spark soon, so I'll write back if that resolves it.

Sumona

On Mon, Feb 29, 2016 at 8:27 PM Sea <261810...@qq.com> wrote:

> Hi, Sumona:
>   It's a bug in Spark old version, In spark 1.6.0, it is fixed.
>   After the application complete, spark master will load event log to
> memory, and it is sync because of actor. If the event log is big, spark
> master will hang a long time, and you can not submit any applications, if
> your master memory is to small, you master will die!
>   The solution in spark 1.6 is not very good, the operation is async
> <https://www.baidu.com/link?url=x_WhMZLHfNnhHGknDAZ8Ssl9f7YlEQAvUgpLAGz6cI045umWecBzzh0ho-QkCr2nKnHOPJxIX5_n_zXe51k8z9hVuw4svP6dqWF0JrjabAa==be50a4160f49000256d50b7b>,
> and so you still need to set a big java heap for master.
>
>
>
> -- Original Message --
> *From:* "Shixiong(Ryan) Zhu";<shixi...@databricks.com>;
> *Sent:* Tuesday, March 1, 2016, 8:02 AM
> *To:* "Sumona Routh"<sumos...@gmail.com>;
> *Cc:* "user@spark.apache.org"<user@spark.apache.org>;
> *Subject:* Re: Spark UI standalone "crashes" after an application finishes
>
> Do you mean you cannot access Master UI after your application completes?
> Could you check the master log?
>
> On Mon, Feb 29, 2016 at 3:48 PM, Sumona Routh <sumos...@gmail.com> wrote:
>
>> Hi there,
>> I've been doing some performance tuning of our Spark application, which
>> is using Spark 1.2.1 standalone. I have been using the spark metrics to
>> graph out details as I run the jobs, as well as the UI to review the tasks
>> and stages.
>>
>> I notice that after my application completes, or is near completion, the
>> UI "crashes." I get a Connection Refused response. Sometimes, the page
>> eventually recovers and will load again, but sometimes I end up having to
>> restart the Spark master to get it back. When I look at my graphs on the
>> app, the memory consumption (of driver, executors, and what I believe to be
>> the daemon (spark.jvm.total.used)) appears to be healthy. Monitoring the
>> master machine itself, memory and CPU appear healthy as well.
>>
>> Has anyone else seen this issue? Are there logs for the UI itself, and
>> where might I find those?
>>
>> Thanks!
>> Sumona
>>
>
>


RE: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Mohammed Guller
I believe the OP is referring to the application UI on port 4040.

The application UI on port 4040 is available only while the application is running. 
As per the documentation:
To view the web UI after the fact, set spark.eventLog.enabled to true before 
starting the application. This configures Spark to log Spark events that encode 
the information displayed in the UI to persisted storage.
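
For illustration, a minimal sketch of those documented settings (the app
name and the event log directory are placeholders, and it assumes the
history server is pointed at the same directory):

import org.apache.spark.{SparkConf, SparkContext}

object EventLogDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("EventLogDemo")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-event-logs")  // placeholder directory
    val sc = new SparkContext(conf)
    // ... application code ...
    sc.stop()
  }
}

The same settings can also be passed on the command line, e.g.
spark-submit --conf spark.eventLog.enabled=true --conf
spark.eventLog.dir=hdfs:///spark-event-logs ...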

Mohammed
Author: Big Data Analytics with 
Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Shixiong(Ryan) Zhu [mailto:shixi...@databricks.com]
Sent: Monday, February 29, 2016 4:03 PM
To: Sumona Routh
Cc: user@spark.apache.org
Subject: Re: Spark UI standalone "crashes" after an application finishes

Do you mean you cannot access Master UI after your application completes? Could 
you check the master log?

On Mon, Feb 29, 2016 at 3:48 PM, Sumona Routh 
<sumos...@gmail.com<mailto:sumos...@gmail.com>> wrote:
Hi there,
I've been doing some performance tuning of our Spark application, which is 
using Spark 1.2.1 standalone. I have been using the spark metrics to graph out 
details as I run the jobs, as well as the UI to review the tasks and stages.
I notice that after my application completes, or is near completion, the UI 
"crashes." I get a Connection Refused response. Sometimes, the page eventually 
recovers and will load again, but sometimes I end up having to restart the 
Spark master to get it back. When I look at my graphs on the app, the memory 
consumption (of driver, executors, and what I believe to be the daemon 
(spark.jvm.total.used)) appears to be healthy. Monitoring the master machine 
itself, memory and CPU appear healthy as well.
Has anyone else seen this issue? Are there logs for the UI itself, and where 
might I find those?
Thanks!
Sumona



Re: Spark UI standalone "crashes" after an application finishes

2016-02-29 Thread Shixiong(Ryan) Zhu
Do you mean you cannot access Master UI after your application completes?
Could you check the master log?

On Mon, Feb 29, 2016 at 3:48 PM, Sumona Routh  wrote:

> Hi there,
> I've been doing some performance tuning of our Spark application, which is
> using Spark 1.2.1 standalone. I have been using the spark metrics to graph
> out details as I run the jobs, as well as the UI to review the tasks and
> stages.
>
> I notice that after my application completes, or is near completion, the
> UI "crashes." I get a Connection Refused response. Sometimes, the page
> eventually recovers and will load again, but sometimes I end up having to
> restart the Spark master to get it back. When I look at my graphs on the
> app, the memory consumption (of driver, executors, and what I believe to be
> the daemon (spark.jvm.total.used)) appears to be healthy. Monitoring the
> master machine itself, memory and CPU appear healthy as well.
>
> Has anyone else seen this issue? Are there logs for the UI itself, and
> where might I find those?
>
> Thanks!
> Sumona
>


Re: Spark UI documentaton needed

2016-02-22 Thread nsalian
Hi Ajay,

Feel free to open a JIRA with the fields that you think are missing and what
kind of documentation you wish to see.

It would be best to have it in a JIRA to actually track and triage your
suggestions.

Thank you.



-
Neelesh S. Salian
Cloudera
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-documentaton-needed-tp26300p26301.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark ui security

2016-01-07 Thread Ted Yu
According to https://spark.apache.org/docs/latest/security.html#web-ui ,
web UI is covered.

FYI

On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> hi community,
>
> do I understand correctly that spark.ui.filters property sets up filters
> only for jobui interface? is it any way to protect spark web ui in the same
> *way?*
>


Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
Can I do it without Kerberos and Hadoop?
Ideally using filters, as for the job UI.

On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:

> you can refer more on https://searchcode.com/codesearch/view/97658783/
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
> 
> spark.authenticate = true
> spark.ui.acls.enable = true
> spark.ui.view.acls = user1,user2
> spark.ui.filters = 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter
> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
> 
> 
> 
> 
> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev 
>  wrote:
> I’m afraid I missed where this property must be specified? I added it to 
> spark-xxx.conf which is basically configurable per job, so I assume to 
> protect WebUI the different place must be used, isn’t it?
> 
> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
> 
>> According to https://spark.apache.org/docs/latest/security.html#web-ui , web 
>> UI is covered.
>> 
>> FYI
>> 
>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev 
>>  wrote:
>> hi community,
>> 
>> do I understand correctly that spark.ui.filters property sets up filters 
>> only for jobui interface? is it any way to protect spark web ui in the same 
>> way?
>> 
> 
> 



Re: spark ui security

2016-01-07 Thread Ted Yu
Without kerberos you don't have true security.

Cheers

On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> can I do it without kerberos and hadoop?
> ideally using filters as for job UI
>
> On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:
>
> you can refer more on https://searchcode.com/codesearch/view/97658783/
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
>
> spark.authenticate = true
> spark.ui.acls.enable = true
> spark.ui.view.acls = user1,user2
> spark.ui.filters =
> org.apache.hadoop.security.authentication.server.AuthenticationFilter
>
> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN
> ,kerberos.keytab=/some/keytab"
>
>
>
>
> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> I’m afraid I missed where this property must be specified? I added it to
>> spark-xxx.conf which is basically configurable per job, so I assume to
>> protect WebUI the different place must be used, isn’t it?
>>
>> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
>>
>> According to https://spark.apache.org/docs/latest/security.html#web-ui ,
>> web UI is covered.
>>
>> FYI
>>
>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>>> hi community,
>>>
>>> do I understand correctly that spark.ui.filters property sets up
>>> filters only for jobui interface? is it any way to protect spark web ui in
>>> the same *way?*
>>>
>>
>>
>>
>
>


Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
I know, but I only need to hide/protect the web UI, at least via the
servlet/filter API.
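
For reference, a filter-only approach along the lines being asked about
could look roughly like the sketch below. This is not from the thread:
the class name, the user/password init parameters and the values in the
configuration lines are all illustrative, the spark.<filter class>.params
form follows the snippet quoted earlier, and java.util.Base64 assumes
Java 8. The class implements javax.servlet.Filter and must be on the
driver's classpath:

import java.util.Base64
import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

class BasicAuthFilter extends Filter {
  private var expected: String = _

  override def init(conf: FilterConfig): Unit = {
    // "user" and "password" are hypothetical init parameters.
    val credentials = conf.getInitParameter("user") + ":" + conf.getInitParameter("password")
    expected = "Basic " + Base64.getEncoder.encodeToString(credentials.getBytes("UTF-8"))
  }

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request = req.asInstanceOf[HttpServletRequest]
    val response = res.asInstanceOf[HttpServletResponse]
    if (expected == request.getHeader("Authorization")) {
      chain.doFilter(req, res)  // credentials match: let the UI request through
    } else {
      response.setHeader("WWW-Authenticate", "Basic realm=\"Spark UI\"")
      response.sendError(HttpServletResponse.SC_UNAUTHORIZED)
    }
  }

  override def destroy(): Unit = {}
}

It would then be wired in with something like (class name and credentials
are placeholders):

spark.ui.filters = com.example.BasicAuthFilter
spark.com.example.BasicAuthFilter.params = "user=admin,password=secret"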

On Jan 7, 2016, at 4:59 PM, Ted Yu  wrote:

> Without kerberos you don't have true security.
> 
> Cheers
> 
> On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev 
>  wrote:
> can I do it without kerberos and hadoop?
> ideally using filters as for job UI
> 
> On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:
> 
>> you can refer more on https://searchcode.com/codesearch/view/97658783/
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
>> 
>> spark.authenticate = true
>> spark.ui.acls.enable = true
>> spark.ui.view.acls = user1,user2
>> spark.ui.filters = 
>> org.apache.hadoop.security.authentication.server.AuthenticationFilter
>> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
>> 
>> 
>> 
>> 
>> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev 
>>  wrote:
>> I’m afraid I missed where this property must be specified? I added it to 
>> spark-xxx.conf which is basically configurable per job, so I assume to 
>> protect WebUI the different place must be used, isn’t it?
>> 
>> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
>> 
>>> According to https://spark.apache.org/docs/latest/security.html#web-ui , 
>>> web UI is covered.
>>> 
>>> FYI
>>> 
>>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev 
>>>  wrote:
>>> hi community,
>>> 
>>> do I understand correctly that spark.ui.filters property sets up filters 
>>> only for jobui interface? is it any way to protect spark web ui in the same 
>>> way?
>>> 
>> 
>> 
> 
> 



Re: Spark UI - Streaming Tab

2015-12-04 Thread PhuDuc Nguyen
I believe the "Streaming" tab is dynamic - it appears once you have a
streaming job running, not when the cluster is simply up. It does not
depend on 1.6 and has been in there since at least 1.0.

HTH,
Duc
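
For anyone hitting this, a minimal streaming job (a sketch; the app name,
host, port and batch interval are placeholders) that should make the
Streaming tab appear on the driver UI at port 4040 while it runs:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingTabDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingTabDemo")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
    lines.count().print()                                // any output operation will do
    ssc.start()           // the Streaming tab appears once the context is started
    ssc.awaitTermination()
  }
}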

On Fri, Dec 4, 2015 at 7:28 AM, patcharee  wrote:

> Hi,
>
> We tried to get the streaming tab interface on Spark UI -
> https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html
>
> Tested on version 1.5.1, 1.6.0-snapshot, but no such interface for
> streaming applications at all. Any suggestions? Do we need to configure the
> history UI somehow to get such interface?
>
> Thanks,
> Patcharee
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark UI - Streaming Tab

2015-12-04 Thread patcharee

I ran streaming jobs, but no streaming tab appeared for those jobs.

Patcharee


On 04. des. 2015 18:12, PhuDuc Nguyen wrote:
I believe the "Streaming" tab is dynamic - it appears once you have a 
streaming job running, not when the cluster is simply up. It does not 
depend on 1.6 and has been in there since at least 1.0.


HTH,
Duc

On Fri, Dec 4, 2015 at 7:28 AM, patcharee > wrote:


Hi,

We tried to get the streaming tab interface on Spark UI -

https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html

Tested on version 1.5.1, 1.6.0-snapshot, but no such interface for
streaming applications at all. Any suggestions? Do we need to
configure the history UI somehow to get such interface?

Thanks,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

For additional commands, e-mail: user-h...@spark.apache.org







Re: Spark UI - Streaming Tab

2015-12-04 Thread Josh Rosen
The Streaming tab is only supported in the live UI, not in the History
Server.

On Fri, Dec 4, 2015 at 9:31 AM, patcharee  wrote:

> I ran streaming jobs, but no streaming tab appeared for those jobs.
>
> Patcharee
>
>
>
> On 04. des. 2015 18:12, PhuDuc Nguyen wrote:
>
> I believe the "Streaming" tab is dynamic - it appears once you have a
> streaming job running, not when the cluster is simply up. It does not
> depend on 1.6 and has been in there since at least 1.0.
>
> HTH,
> Duc
>
> On Fri, Dec 4, 2015 at 7:28 AM, patcharee 
> wrote:
>
>> Hi,
>>
>> We tried to get the streaming tab interface on Spark UI -
>> 
>> https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html
>>
>> Tested on version 1.5.1, 1.6.0-snapshot, but no such interface for
>> streaming applications at all. Any suggestions? Do we need to configure the
>> history UI somehow to get such interface?
>>
>> Thanks,
>> Patcharee
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>


Re: Spark UI consuming lots of memory

2015-10-27 Thread Patrick McGloin
Hi Nicholas,

I think you are right that this relates to the issue in SPARK-11126; I'm
seeing it as well.

Did you find any workaround? Looking at the pull request for the fix, it
doesn't look possible.

Best regards,
Patrick

On 15 October 2015 at 19:40, Nicholas Pritchard <
nicholas.pritch...@falkonry.com> wrote:

> Thanks for your help, most likely this is the memory leak you are fixing
> in https://issues.apache.org/jira/browse/SPARK-11126.
> -Nick
>
> On Mon, Oct 12, 2015 at 9:00 PM, Shixiong Zhu  wrote:
>
>> In addition, you cannot turn off JobListener and SQLListener now...
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-10-13 11:59 GMT+08:00 Shixiong Zhu :
>>
>>> Is your query very complicated? Could you provide the output of
>>> `explain` your query that consumes an excessive amount of memory? If this
>>> is a small query, there may be a bug that leaks memory in SQLListener.
>>>
>>> Best Regards,
>>> Shixiong Zhu
>>>
>>> 2015-10-13 11:44 GMT+08:00 Nicholas Pritchard <
>>> nicholas.pritch...@falkonry.com>:
>>>
 As an update, I did try disabling the ui with "spark.ui.enabled=false",
 but the JobListener and SQLListener still consume a lot of memory, leading
 to OOM error. Has anyone encountered this before? Is the only solution just
 to increase the driver heap size?

 Thanks,
 Nick

 On Mon, Oct 12, 2015 at 8:42 PM, Nicholas Pritchard <
 nicholas.pritch...@falkonry.com> wrote:

> I set those configurations by passing to spark-submit script:
> "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
> that these configurations are being passed correctly because they are
> listed in the environments tab and also by counting the number of
> job/stages that are listed. The "spark.sql.ui.retainedExecutions=0"
> only applies to the number of "completed" executions; there will always be
> a "running" execution. For some reason, I have one execution that consumes
> an excessive amount of memory.
>
> Actually, I am not interested in the SQL UI, as I find the Job/Stages
> UI to have sufficient information. I am also using Spark Standalone 
> cluster
> manager so have not had to use the history server.
>
>
> On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu 
> wrote:
>
>> Could you show how did you set the configurations? You need to set
>> these configurations before creating SparkContext and SQLContext.
>>
>> Moreover, the history sever doesn't support SQL UI. So
>> "spark.eventLog.enabled=true" doesn't work now.
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-10-13 2:01 GMT+08:00 pnpritchard <
>> nicholas.pritch...@falkonry.com>:
>>
>>> Hi,
>>>
>>> In my application, the Spark UI is consuming a lot of memory,
>>> especially the
>>> SQL tab. I have set the following configurations to reduce the memory
>>> consumption:
>>> - spark.ui.retainedJobs=20
>>> - spark.ui.retainedStages=40
>>> - spark.sql.ui.retainedExecutions=0
>>>
>>> However, I still get OOM errors in the driver process with the
>>> default 1GB
>>> heap size. The following link is a screen shot of a heap dump report,
>>> showing the SQLListener instance having a retained size of 600MB.
>>>
>>> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>>>
>>> Rather than just increasing the allotted heap size, does anyone have
>>> any
>>> other ideas? Is it possible to disable the SQL tab specifically? I
>>> also
>>> thought about serving the UI from disk rather than memory with
>>> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has
>>> anyone tried
>>> this before?
>>>
>>> Thanks,
>>> Nick
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
>>> Sent from the Apache Spark User List mailing list archive at
>>> Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>

>>>
>>
>


Re: Spark UI consuming lots of memory

2015-10-15 Thread Nicholas Pritchard
Thanks for your help, most likely this is the memory leak you are fixing in
https://issues.apache.org/jira/browse/SPARK-11126.
-Nick

On Mon, Oct 12, 2015 at 9:00 PM, Shixiong Zhu  wrote:

> In addition, you cannot turn off JobListener and SQLListener now...
>
> Best Regards,
> Shixiong Zhu
>
> 2015-10-13 11:59 GMT+08:00 Shixiong Zhu :
>
>> Is your query very complicated? Could you provide the output of `explain`
>> your query that consumes an excessive amount of memory? If this is a small
>> query, there may be a bug that leaks memory in SQLListener.
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-10-13 11:44 GMT+08:00 Nicholas Pritchard <
>> nicholas.pritch...@falkonry.com>:
>>
>>> As an update, I did try disabling the ui with "spark.ui.enabled=false",
>>> but the JobListener and SQLListener still consume a lot of memory, leading
>>> to OOM error. Has anyone encountered this before? Is the only solution just
>>> to increase the driver heap size?
>>>
>>> Thanks,
>>> Nick
>>>
>>> On Mon, Oct 12, 2015 at 8:42 PM, Nicholas Pritchard <
>>> nicholas.pritch...@falkonry.com> wrote:
>>>
 I set those configurations by passing to spark-submit script:
 "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
 that these configurations are being passed correctly because they are
 listed in the environments tab and also by counting the number of
 job/stages that are listed. The "spark.sql.ui.retainedExecutions=0"
 only applies to the number of "completed" executions; there will always be
 a "running" execution. For some reason, I have one execution that consumes
 an excessive amount of memory.

 Actually, I am not interested in the SQL UI, as I find the Job/Stages
 UI to have sufficient information. I am also using Spark Standalone cluster
 manager so have not had to use the history server.


 On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu 
 wrote:

> Could you show how did you set the configurations? You need to set
> these configurations before creating SparkContext and SQLContext.
>
> Moreover, the history sever doesn't support SQL UI. So
> "spark.eventLog.enabled=true" doesn't work now.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-10-13 2:01 GMT+08:00 pnpritchard  >:
>
>> Hi,
>>
>> In my application, the Spark UI is consuming a lot of memory,
>> especially the
>> SQL tab. I have set the following configurations to reduce the memory
>> consumption:
>> - spark.ui.retainedJobs=20
>> - spark.ui.retainedStages=40
>> - spark.sql.ui.retainedExecutions=0
>>
>> However, I still get OOM errors in the driver process with the
>> default 1GB
>> heap size. The following link is a screen shot of a heap dump report,
>> showing the SQLListener instance having a retained size of 600MB.
>>
>> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>>
>> Rather than just increasing the allotted heap size, does anyone have
>> any
>> other ideas? Is it possible to disable the SQL tab specifically? I
>> also
>> thought about serving the UI from disk rather than memory with
>> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has
>> anyone tried
>> this before?
>>
>> Thanks,
>> Nick
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
>> Sent from the Apache Spark User List mailing list archive at
>> Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>

>>>
>>
>


Re: Spark UI consuming lots of memory

2015-10-12 Thread Nicholas Pritchard
As an update, I did try disabling the UI with "spark.ui.enabled=false",
but the JobListener and SQLListener still consume a lot of memory, leading
to an OOM error. Has anyone encountered this before? Is the only solution
just to increase the driver heap size?

Thanks,
Nick

On Mon, Oct 12, 2015 at 8:42 PM, Nicholas Pritchard <
nicholas.pritch...@falkonry.com> wrote:

> I set those configurations by passing to spark-submit script:
> "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
> that these configurations are being passed correctly because they are
> listed in the environments tab and also by counting the number of
> job/stages that are listed. The "spark.sql.ui.retainedExecutions=0" only
> applies to the number of "completed" executions; there will always be a
> "running" execution. For some reason, I have one execution that consumes an
> excessive amount of memory.
>
> Actually, I am not interested in the SQL UI, as I find the Job/Stages UI
> to have sufficient information. I am also using Spark Standalone cluster
> manager so have not had to use the history server.
>
>
> On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu  wrote:
>
>> Could you show how did you set the configurations? You need to set these
>> configurations before creating SparkContext and SQLContext.
>>
>> Moreover, the history sever doesn't support SQL UI. So
>> "spark.eventLog.enabled=true" doesn't work now.
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-10-13 2:01 GMT+08:00 pnpritchard :
>>
>>> Hi,
>>>
>>> In my application, the Spark UI is consuming a lot of memory, especially
>>> the
>>> SQL tab. I have set the following configurations to reduce the memory
>>> consumption:
>>> - spark.ui.retainedJobs=20
>>> - spark.ui.retainedStages=40
>>> - spark.sql.ui.retainedExecutions=0
>>>
>>> However, I still get OOM errors in the driver process with the default
>>> 1GB
>>> heap size. The following link is a screen shot of a heap dump report,
>>> showing the SQLListener instance having a retained size of 600MB.
>>>
>>> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>>>
>>> Rather than just increasing the allotted heap size, does anyone have any
>>> other ideas? Is it possible to disable the SQL tab specifically? I also
>>> thought about serving the UI from disk rather than memory with
>>> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has anyone
>>> tried
>>> this before?
>>>
>>> Thanks,
>>> Nick
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>


Re: Spark UI consuming lots of memory

2015-10-12 Thread Nicholas Pritchard
I set those configurations by passing them to the spark-submit script:
"bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
that these configurations are being passed correctly because they are
listed in the environments tab and also by counting the number of
job/stages that are listed. The "spark.sql.ui.retainedExecutions=0" only
applies to the number of "completed" executions; there will always be a
"running" execution. For some reason, I have one execution that consumes an
excessive amount of memory.

Actually, I am not interested in the SQL UI, as I find the Job/Stages UI to
have sufficient information. I am also using Spark Standalone cluster
manager so have not had to use the history server.


On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu  wrote:

> Could you show how did you set the configurations? You need to set these
> configurations before creating SparkContext and SQLContext.
>
> Moreover, the history sever doesn't support SQL UI. So
> "spark.eventLog.enabled=true" doesn't work now.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-10-13 2:01 GMT+08:00 pnpritchard :
>
>> Hi,
>>
>> In my application, the Spark UI is consuming a lot of memory, especially
>> the
>> SQL tab. I have set the following configurations to reduce the memory
>> consumption:
>> - spark.ui.retainedJobs=20
>> - spark.ui.retainedStages=40
>> - spark.sql.ui.retainedExecutions=0
>>
>> However, I still get OOM errors in the driver process with the default 1GB
>> heap size. The following link is a screen shot of a heap dump report,
>> showing the SQLListener instance having a retained size of 600MB.
>>
>> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>>
>> Rather than just increasing the allotted heap size, does anyone have any
>> other ideas? Is it possible to disable the SQL tab specifically? I also
>> thought about serving the UI from disk rather than memory with
>> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has anyone
>> tried
>> this before?
>>
>> Thanks,
>> Nick
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Spark UI consuming lots of memory

2015-10-12 Thread Shixiong Zhu
In addition, you cannot turn off JobListener and SQLListener now...

Best Regards,
Shixiong Zhu

2015-10-13 11:59 GMT+08:00 Shixiong Zhu :

> Is your query very complicated? Could you provide the output of `explain`
> your query that consumes an excessive amount of memory? If this is a small
> query, there may be a bug that leaks memory in SQLListener.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-10-13 11:44 GMT+08:00 Nicholas Pritchard <
> nicholas.pritch...@falkonry.com>:
>
>> As an update, I did try disabling the ui with "spark.ui.enabled=false",
>> but the JobListener and SQLListener still consume a lot of memory, leading
>> to OOM error. Has anyone encountered this before? Is the only solution just
>> to increase the driver heap size?
>>
>> Thanks,
>> Nick
>>
>> On Mon, Oct 12, 2015 at 8:42 PM, Nicholas Pritchard <
>> nicholas.pritch...@falkonry.com> wrote:
>>
>>> I set those configurations by passing to spark-submit script:
>>> "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
>>> that these configurations are being passed correctly because they are
>>> listed in the environments tab and also by counting the number of
>>> job/stages that are listed. The "spark.sql.ui.retainedExecutions=0"
>>> only applies to the number of "completed" executions; there will always be
>>> a "running" execution. For some reason, I have one execution that consumes
>>> an excessive amount of memory.
>>>
>>> Actually, I am not interested in the SQL UI, as I find the Job/Stages UI
>>> to have sufficient information. I am also using Spark Standalone cluster
>>> manager so have not had to use the history server.
>>>
>>>
>>> On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu  wrote:
>>>
 Could you show how did you set the configurations? You need to set
 these configurations before creating SparkContext and SQLContext.

 Moreover, the history sever doesn't support SQL UI. So
 "spark.eventLog.enabled=true" doesn't work now.

 Best Regards,
 Shixiong Zhu

 2015-10-13 2:01 GMT+08:00 pnpritchard 
 :

> Hi,
>
> In my application, the Spark UI is consuming a lot of memory,
> especially the
> SQL tab. I have set the following configurations to reduce the memory
> consumption:
> - spark.ui.retainedJobs=20
> - spark.ui.retainedStages=40
> - spark.sql.ui.retainedExecutions=0
>
> However, I still get OOM errors in the driver process with the default
> 1GB
> heap size. The following link is a screen shot of a heap dump report,
> showing the SQLListener instance having a retained size of 600MB.
>
> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>
> Rather than just increasing the allotted heap size, does anyone have
> any
> other ideas? Is it possible to disable the SQL tab specifically? I also
> thought about serving the UI from disk rather than memory with
> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has anyone
> tried
> this before?
>
> Thanks,
> Nick
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
> Sent from the Apache Spark User List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

>>>
>>
>


Re: Spark UI consuming lots of memory

2015-10-12 Thread Shixiong Zhu
Could you show how you set the configurations? You need to set these
configurations before creating the SparkContext and SQLContext.

Moreover, the history server doesn't support the SQL UI, so
"spark.eventLog.enabled=true" doesn't help for that right now.

Best Regards,
Shixiong Zhu
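
A sketch of what this looks like in the driver program (the app name is a
placeholder; the values are the ones mentioned earlier in the thread),
with the limits set on the SparkConf before either context is created:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object UiRetentionDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("UiRetentionDemo")
      .set("spark.ui.retainedJobs", "20")
      .set("spark.ui.retainedStages", "40")
      .set("spark.sql.ui.retainedExecutions", "0")
    val sc = new SparkContext(conf)       // the UI listeners pick up the limits set above
    val sqlContext = new SQLContext(sc)
    // ... application code ...
    sc.stop()
  }
}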

2015-10-13 2:01 GMT+08:00 pnpritchard :

> Hi,
>
> In my application, the Spark UI is consuming a lot of memory, especially
> the
> SQL tab. I have set the following configurations to reduce the memory
> consumption:
> - spark.ui.retainedJobs=20
> - spark.ui.retainedStages=40
> - spark.sql.ui.retainedExecutions=0
>
> However, I still get OOM errors in the driver process with the default 1GB
> heap size. The following link is a screen shot of a heap dump report,
> showing the SQLListener instance having a retained size of 600MB.
>
> https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png
>
> Rather than just increasing the allotted heap size, does anyone have any
> other ideas? Is it possible to disable the SQL tab specifically? I also
> thought about serving the UI from disk rather than memory with
> "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has anyone
> tried
> this before?
>
> Thanks,
> Nick
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark UI consuming lots of memory

2015-10-12 Thread Shixiong Zhu
Is your query very complicated? Could you provide the output of `explain`
for the query that consumes an excessive amount of memory? If it is a
small query, there may be a bug that leaks memory in SQLListener.

Best Regards,
Shixiong Zhu
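
For anyone unsure what is being asked for here, `explain` can be called
directly on the DataFrame behind the query; a sketch (the app name, input
path and columns are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ExplainDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ExplainDemo"))
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.json("hdfs:///tmp/events.json")  // placeholder input
    val counts = df.groupBy("level").count()                  // placeholder query
    counts.explain(true)  // prints the extended logical and physical plans to stdout
    sc.stop()
  }
}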

2015-10-13 11:44 GMT+08:00 Nicholas Pritchard <
nicholas.pritch...@falkonry.com>:

> As an update, I did try disabling the ui with "spark.ui.enabled=false",
> but the JobListener and SQLListener still consume a lot of memory, leading
> to OOM error. Has anyone encountered this before? Is the only solution just
> to increase the driver heap size?
>
> Thanks,
> Nick
>
> On Mon, Oct 12, 2015 at 8:42 PM, Nicholas Pritchard <
> nicholas.pritch...@falkonry.com> wrote:
>
>> I set those configurations by passing to spark-submit script:
>> "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified
>> that these configurations are being passed correctly because they are
>> listed in the environments tab and also by counting the number of
>> job/stages that are listed. The "spark.sql.ui.retainedExecutions=0" only
>> applies to the number of "completed" executions; there will always be a
>> "running" execution. For some reason, I have one execution that consumes an
>> excessive amount of memory.
>>
>> Actually, I am not interested in the SQL UI, as I find the Job/Stages UI
>> to have sufficient information. I am also using Spark Standalone cluster
>> manager so have not had to use the history server.
>>
>>
>> On Mon, Oct 12, 2015 at 8:17 PM, Shixiong Zhu  wrote:
>>
>>> Could you show how did you set the configurations? You need to set these
>>> configurations before creating SparkContext and SQLContext.
>>>
>>> Moreover, the history sever doesn't support SQL UI. So
>>> "spark.eventLog.enabled=true" doesn't work now.
>>>
>>> Best Regards,
>>> Shixiong Zhu
>>>
>>> 2015-10-13 2:01 GMT+08:00 pnpritchard :
>>>
 Hi,

 In my application, the Spark UI is consuming a lot of memory,
 especially the
 SQL tab. I have set the following configurations to reduce the memory
 consumption:
 - spark.ui.retainedJobs=20
 - spark.ui.retainedStages=40
 - spark.sql.ui.retainedExecutions=0

 However, I still get OOM errors in the driver process with the default
 1GB
 heap size. The following link is a screen shot of a heap dump report,
 showing the SQLListener instance having a retained size of 600MB.

 https://cloud.githubusercontent.com/assets/5124612/10404379/20fbdcfc-6e87-11e5-9415-27e25193a25c.png

 Rather than just increasing the allotted heap size, does anyone have any
 other ideas? Is it possible to disable the SQL tab specifically? I also
 thought about serving the UI from disk rather than memory with
 "spark.eventLog.enabled=true" and "spark.ui.enabled=false". Has anyone
 tried
 this before?

 Thanks,
 Nick



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-consuming-lots-of-memory-tp25033.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


>>>
>>
>


Re: Spark UI tunneling

2015-03-23 Thread Sergey Gerasimov
Akhil,

that's what I did.

The problem is that the web server probably tried to forward my request
to another address that is accessible only locally.



 On 23 March 2015, at 11:12, Akhil Das ak...@sigmoidanalytics.com wrote:
 
 Did you try ssh -L 4040:127.0.0.1:4040 user@host
 
 Thanks
 Best Regards
 
 On Mon, Mar 23, 2015 at 1:12 PM, sergunok ser...@gmail.com wrote:
 Is it a way to tunnel Spark UI?
 
 I tried to tunnel client-node:4040  but my browser was redirected from
 localhost to some cluster locally visible domain name..
 
 Maybe there is some startup option to encourage Spark UI be fully
 accessiable just through single endpoint (address:port)?
 
 Serg.
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-tunneling-tp22184.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 


Re: Spark UI tunneling

2015-03-23 Thread Akhil Das
Oh in that case you could try adding the hostname in your /etc/hosts under
your localhost. Also make sure there is a request going to another host by
inspecting the network calls:

[image: Inline image 1]

Thanks
Best Regards

On Mon, Mar 23, 2015 at 1:55 PM, Sergey Gerasimov ser...@gmail.com wrote:

 Akhil,

 that's what I did.

 The problem is that probably web server tried to forward my request to
 another address accessible locally only.



 On 23 March 2015, at 11:12, Akhil Das ak...@sigmoidanalytics.com wrote:

 Did you try ssh -L 4040:127.0.0.1:4040 user@host

 Thanks
 Best Regards

 On Mon, Mar 23, 2015 at 1:12 PM, sergunok ser...@gmail.com wrote:

 Is it a way to tunnel Spark UI?

 I tried to tunnel client-node:4040  but my browser was redirected from
 localhost to some cluster locally visible domain name..

 Maybe there is some startup option to encourage Spark UI be fully
 accessiable just through single endpoint (address:port)?

 Serg.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-tunneling-tp22184.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: Spark UI tunneling

2015-03-23 Thread Akhil Das
Did you try ssh -L 4040:127.0.0.1:4040 user@host

Thanks
Best Regards

On Mon, Mar 23, 2015 at 1:12 PM, sergunok ser...@gmail.com wrote:

 Is it a way to tunnel Spark UI?

 I tried to tunnel client-node:4040  but my browser was redirected from
 localhost to some cluster locally visible domain name..

 Maybe there is some startup option to encourage Spark UI be fully
 accessiable just through single endpoint (address:port)?

 Serg.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-tunneling-tp22184.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
That's the RM's RPC port, not the web UI port. (See Ted's e-mail -
normally web UI is on 8088.)

On Mon, Mar 2, 2015 at 4:14 PM, Anupama Joshi anupama.jo...@gmail.com wrote:
 Hi Marcelo,
 Thanks for the quick reply.
 I have a EMR cluster and I am running the spark-submit on the master node in
 the cluster.
 When I start the spark-submit , I see
 15/03/02 23:48:33 INFO client.RMProxy: Connecting to ResourceManager at
 /172.31.43.254:9022
 But If I try that URL or the use the external DNS
 ec2-52-10-234-111.us-west-2.compute.amazonaws.com:9022
 it does not work
 What am I missing here ?
 Thanks a lot for the help
 -AJ


 On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com wrote:

 What are you calling masternode? In yarn-cluster mode, the driver
 is running somewhere in your cluster, not on the machine where you run
 spark-submit.

 The easiest way to get to the Spark UI when using Yarn is to use the
 Yarn RM's web UI. That will give you a link to the application's UI
 regardless of whether it's running on client or cluster mode.

 On Mon, Mar 2, 2015 at 3:39 PM, Anupama Joshi anupama.jo...@gmail.com
 wrote:
  Hi ,
 
   When I run my application with --master yarn-cluster or --master yarn
  --deploy-mode cluster , I can not  the spark UI at the  location --
  masternode:4040Even if I am running the job , I can not see teh SPARK
  UI.
  When I run with --master yarn --deploy-mode client  -- I see the Spark
  UI
  but I cannot see my job  running.
 
  When I run spark-submit with --master local[*] , I see the spark UI , my
  job
  everything (Thats great)
 
  Do I need to do some settings to see the UI?
 
  Thanks
 
  -AJ
 
 
 
 
 
 



 --
 Marcelo





-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Ted Yu
Default RM Web UI port is 8088 (configurable
through yarn.resourcemanager.webapp.address)

Cheers

On Mon, Mar 2, 2015 at 4:14 PM, Anupama Joshi anupama.jo...@gmail.com
wrote:

 Hi Marcelo,
 Thanks for the quick reply.
 I have an EMR cluster and I am running spark-submit on the master node
 in the cluster.
 When I start spark-submit, I see
 15/03/02 23:48:33 INFO client.RMProxy: Connecting to ResourceManager at /
 172.31.43.254:9022
 But if I try that URL, or use the external DNS
 ec2-52-10-234-111.us-west-2.compute.amazonaws.com:9022,
 it does not work.
 What am I missing here?
 Thanks a lot for the help
 -AJ


 On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 What are you calling masternode? In yarn-cluster mode, the driver
 is running somewhere in your cluster, not on the machine where you run
 spark-submit.

 The easiest way to get to the Spark UI when using Yarn is to use the
 Yarn RM's web UI. That will give you a link to the application's UI
 regardless of whether it's running on client or cluster mode.

 On Mon, Mar 2, 2015 at 3:39 PM, Anupama Joshi anupama.jo...@gmail.com
 wrote:
  Hi ,
 
   When I run my application with --master yarn-cluster or --master yarn
  --deploy-mode cluster, I cannot see the Spark UI at masternode:4040, even
  while the job is running.
  When I run with --master yarn --deploy-mode client, I see the Spark UI
  but I cannot see my job running.
 
  When I run spark-submit with --master local[*], I see the Spark UI, my job,
  everything (that's great).
 
  Do I need to do some settings to see the UI?
 
  Thanks
 
  -AJ
 
 
 
 
 
 



 --
 Marcelo





Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
What are you calling masternode? In yarn-cluster mode, the driver
is running somewhere in your cluster, not on the machine where you run
spark-submit.

The easiest way to get to the Spark UI when using Yarn is to use the
Yarn RM's web UI. That will give you a link to the application's UI
regardless of whether it's running on client or cluster mode.

On Mon, Mar 2, 2015 at 3:39 PM, Anupama Joshi anupama.jo...@gmail.com wrote:
 Hi ,

  When I run my application with --master yarn-cluster or --master yarn
 --deploy-mode cluster, I cannot see the Spark UI at masternode:4040, even
 while the job is running.
 When I run with --master yarn --deploy-mode client, I see the Spark UI
 but I cannot see my job running.

 When I run spark-submit with --master local[*], I see the Spark UI, my job,
 everything (that's great).

 Do I need to do some settings to see the UI?

 Thanks

 -AJ









-- 
Marcelo




Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Anupama Joshi
Hi Marcelo,
Thanks for the quick reply.
I have an EMR cluster and I am running spark-submit on the master node
in the cluster.
When I start spark-submit, I see
15/03/02 23:48:33 INFO client.RMProxy: Connecting to ResourceManager at /
172.31.43.254:9022
But if I try that URL, or use the external DNS
ec2-52-10-234-111.us-west-2.compute.amazonaws.com:9022,
it does not work.
What am I missing here?
Thanks a lot for the help
-AJ


On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com wrote:

 What are you calling masternode? In yarn-cluster mode, the driver
 is running somewhere in your cluster, not on the machine where you run
 spark-submit.

 The easiest way to get to the Spark UI when using Yarn is to use the
 Yarn RM's web UI. That will give you a link to the application's UI
 regardless of whether it's running on client or cluster mode.

 On Mon, Mar 2, 2015 at 3:39 PM, Anupama Joshi anupama.jo...@gmail.com
 wrote:
  Hi ,
 
   When I run my application with --master yarn-cluster or --master yarn
  --deploy-mode cluster, I cannot see the Spark UI at masternode:4040, even
  while the job is running.
  When I run with --master yarn --deploy-mode client, I see the Spark UI
  but I cannot see my job running.
 
  When I run spark-submit with --master local[*], I see the Spark UI, my job,
  everything (that's great).
 
  Do I need to do some settings to see the UI?
 
  Thanks
 
  -AJ
 
 
 
 
 
 



 --
 Marcelo



Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
That does not look like the RM UI. Please check your configuration for
the port (see Ted's e-mail).

On Mon, Mar 2, 2015 at 4:45 PM, Anupama Joshi anupama.jo...@gmail.com wrote:
 Hi ,
  Port 8088 does not show me anything (cannot connect),
 whereas ec2-52-10-234-111.us-west-2.compute.amazonaws.com:9026 shows
 me all the applications.
 Do I have to do anything for port 8088, or is whatever I am seeing on
 port 9026 good? A screenshot is attached.
 Thanks
 AJ

 On Mon, Mar 2, 2015 at 4:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

 That's the RM's RPC port, not the web UI port. (See Ted's e-mail -
 normally web UI is on 8088.)

 On Mon, Mar 2, 2015 at 4:14 PM, Anupama Joshi anupama.jo...@gmail.com
 wrote:
  Hi Marcelo,
  Thanks for the quick reply.
  I have an EMR cluster and I am running spark-submit on the master node
  in
  the cluster.
  When I start spark-submit, I see
  15/03/02 23:48:33 INFO client.RMProxy: Connecting to ResourceManager at
  /172.31.43.254:9022
  But if I try that URL, or use the external DNS
  ec2-52-10-234-111.us-west-2.compute.amazonaws.com:9022,
  it does not work.
  What am I missing here?
  Thanks a lot for the help
  -AJ
 
 
  On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com
  wrote:
 
  What are you calling masternode? In yarn-cluster mode, the driver
  is running somewhere in your cluster, not on the machine where you run
  spark-submit.
 
  The easiest way to get to the Spark UI when using Yarn is to use the
  Yarn RM's web UI. That will give you a link to the application's UI
  regardless of whether it's running on client or cluster mode.
 
  On Mon, Mar 2, 2015 at 3:39 PM, Anupama Joshi anupama.jo...@gmail.com
  wrote:
   Hi ,
  
    When I run my application with --master yarn-cluster or --master yarn
   --deploy-mode cluster, I cannot see the Spark UI at masternode:4040, even
   while the job is running.
   When I run with --master yarn --deploy-mode client, I see the Spark UI
   but I cannot see my job running.
  
   When I run spark-submit with --master local[*], I see the Spark UI, my
   job, everything (that's great).
  
   Do I need to do some settings to see the UI?
  
   Thanks
  
   -AJ
  
  
  
  
  
  
 
 
 
  --
  Marcelo
 
 



 --
 Marcelo





-- 
Marcelo




Re: Spark UI and Spark Version on Google Compute Engine

2015-01-17 Thread Matei Zaharia
Unfortunately we don't have anything to do with Spark on GCE, so I'd suggest 
asking in the GCE support forum. You could also try launching a Spark cluster 
by hand on the nodes there. Sigmoid Analytics published a package for this here: 
http://spark-packages.org/package/9

Matei

 On Jan 17, 2015, at 4:47 PM, Soumya Simanta soumya.sima...@gmail.com wrote:
 
 I'm deploying Spark using the Click to Deploy Hadoop - Install Apache 
 Spark on Google Compute Engine.
 
 I can run Spark jobs on the REPL and read data from Google storage. However, 
 I'm not sure how to access the Spark UI in this deployment. Can anyone help? 
 
 Also, it deploys Spark 1.1. Is there an easy way to bump it to Spark 1.2? 
 
 Thanks
 -Soumya
 
 





Re: spark ui redirecting to port 8100

2014-10-21 Thread Sameer Farooqui
Hi Sadhan,

Which port are you specifically trying to redirect? The driver program has
a web UI, typically on port 4040, and the Spark standalone Master serves its
web UI on port 8080 (7077 is the Master's RPC port, not a web UI).

Which setting did you update in which file to make this change?

And finally, which version of Spark are you on?

Sameer F.
Client Services @ Databricks

On Tue, Oct 21, 2014 at 3:29 PM, sadhan sadhan.s...@gmail.com wrote:

 I set the Spark port to a different one and the connection seems
 successful,
 but I get a 302 redirect to /proxy on port 8100. Nothing is listening on that
 port either.







Re: SPARK UI - Details post job processing

2014-09-26 Thread Matt Narrell
Yes, I’m running Hadoop’s Timeline server that does this for the YARN/Hadoop 
logs (and works very nicely btw).  Are you saying I can do the same for the 
SparkUI as well?  Also, where do I set these Spark configurations since this 
will be executed inside a YARN container?  On the “client” machine via 
spark-env.sh?  Do I pass these as command line arguments to spark-submit?  Do I 
set them explicitly on my SparkConf?

Thanks in advance.

mn

On Sep 25, 2014, at 9:13 PM, Andrew Ash and...@andrewash.com wrote:

 Matt you should be able to set an HDFS path so you'll get logs written to a 
 unified place instead of to local disk on a random box on the cluster.
 
 On Thu, Sep 25, 2014 at 1:38 PM, Matt Narrell matt.narr...@gmail.com wrote:
 How does this work with a cluster manager like YARN?
 
 mn
 
 On Sep 25, 2014, at 2:23 PM, Andrew Or and...@databricks.com wrote:
 
 Hi Harsha,
 
 You can turn on `spark.eventLog.enabled` as documented here: 
 http://spark.apache.org/docs/latest/monitoring.html. Then, if you are 
 running standalone mode, you can access the finished SparkUI through the 
 Master UI. Otherwise, you can start a HistoryServer to display finished UIs.
 
 -Andrew
 
 2014-09-25 12:55 GMT-07:00 Harsha HN 99harsha.h@gmail.com:
 Hi,
 
 The details laid out in the Spark UI for a job in progress are really interesting 
 and very useful. 
 But they vanish once the job is done. 
 Is there a way to get the job details after processing? 
 
 I am looking for the Spark UI data, not the standard input, output and error info.
 
 Thanks,
 Harsha
 
 
 



Re: SPARK UI - Details post job processing

2014-09-26 Thread Chester @work
I am working on a PR that allows one to send the same Spark listener event 
messages back to the application in yarn-cluster mode. 

So far I have put this function in our application; our UI receives and 
displays the same Spark job event messages, such as progress, job start, 
job completed, etc.

Essentially, it establishes a communication channel: you can send progress, 
messages, and detailed exceptions from the Spark job inside YARN to your 
application, and on the application side you can display them, log them, or 
make use of them in other ways. 

You can also send messages to the running Spark job via the channel. 

I will clean up the code and send the PR soon.
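
For anyone curious, a rough sketch of the general SparkListener pattern this
builds on (this is not the actual PR; the class name and messages below are
made up for illustration):

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Illustrative listener that forwards job lifecycle events to application code.
class ForwardingListener(report: String => Unit) extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    report(s"job ${jobStart.jobId} started")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    report(s"job ${jobEnd.jobId} finished with result ${jobEnd.jobResult}")
}

// Register it on the driver; 'report' could write to a socket, a queue, etc.
// sc.addSparkListener(new ForwardingListener(msg => println(msg)))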

Chester
Alpine Data Lab

Sent from my iPhone

 On Sep 26, 2014, at 7:38 AM, Matt Narrell matt.narr...@gmail.com wrote:
 
 Yes, I’m running Hadoop’s Timeline server that does this for the YARN/Hadoop 
 logs (and works very nicely btw).  Are you saying I can do the same for the 
 SparkUI as well?  Also, where do I set these Spark configurations since this 
 will be executed inside a YARN container?  On the “client” machine via 
 spark-env.sh?  Do I pass these as command line arguments to spark-submit?  Do 
 I set them explicitly on my SparkConf?
 
 Thanks in advance.
 
 mn
 
 On Sep 25, 2014, at 9:13 PM, Andrew Ash and...@andrewash.com wrote:
 
 Matt you should be able to set an HDFS path so you'll get logs written to a 
 unified place instead of to local disk on a random box on the cluster.
 
 On Thu, Sep 25, 2014 at 1:38 PM, Matt Narrell matt.narr...@gmail.com 
 wrote:
 How does this work with a cluster manager like YARN?
 
 mn
 
 On Sep 25, 2014, at 2:23 PM, Andrew Or and...@databricks.com wrote:
 
 Hi Harsha,
 
 You can turn on `spark.eventLog.enabled` as documented here: 
 http://spark.apache.org/docs/latest/monitoring.html. Then, if you are 
 running standalone mode, you can access the finished SparkUI through the 
 Master UI. Otherwise, you can start a HistoryServer to display finished 
 UIs.
 
 -Andrew
 
 2014-09-25 12:55 GMT-07:00 Harsha HN 99harsha.h@gmail.com:
 Hi,
 
  The details laid out in the Spark UI for a job in progress are really 
  interesting and very useful. 
  But they vanish once the job is done. 
  Is there a way to get the job details after processing? 
  
  I am looking for the Spark UI data, not the standard input, output and error info.
 
 Thanks,
 Harsha
 


Re: SPARK UI - Details post job processing

2014-09-25 Thread Andrew Ash
Matt you should be able to set an HDFS path so you'll get logs written to a
unified place instead of to local disk on a random box on the cluster.
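
A minimal sketch of what that looks like from the driver side (the HDFS path
below is just a placeholder; the same properties can equally be passed to
spark-submit via --conf):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: write event logs to HDFS so the finished UI can be rebuilt by the
// standalone Master UI or a history server after the application ends.
val conf = new SparkConf()
  .setAppName("event-log-example")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///user/spark/applicationHistory") // placeholder path
val sc = new SparkContext(conf)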

On Thu, Sep 25, 2014 at 1:38 PM, Matt Narrell matt.narr...@gmail.com
wrote:

 How does this work with a cluster manager like YARN?

 mn

 On Sep 25, 2014, at 2:23 PM, Andrew Or and...@databricks.com wrote:

 Hi Harsha,

 You can turn on `spark.eventLog.enabled` as documented here:
 http://spark.apache.org/docs/latest/monitoring.html. Then, if you are
 running standalone mode, you can access the finished SparkUI through the
 Master UI. Otherwise, you can start a HistoryServer to display finished UIs.

 -Andrew

 2014-09-25 12:55 GMT-07:00 Harsha HN 99harsha.h@gmail.com:

 Hi,

 The details laid out in the Spark UI for a job in progress are really
 interesting and very useful.
 But they vanish once the job is done.
 Is there a way to get the job details after processing?

 I am looking for the Spark UI data, not the standard input, output and error info.

 Thanks,
 Harsha






Re: spark ui on yarn

2014-07-13 Thread Koert Kuipers
my yarn environment does have less memory for the executors.

i am checking whether the RDDs are cached by calling sc.getRDDStorageInfo, which
shows an RDD as fully cached in memory, yet it does not show up in the UI.
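
For reference, roughly what that check looks like (assuming an existing
SparkContext sc; the input path is a placeholder):

// Sketch of the check described above, using the Scala API.
val data = sc.textFile("hdfs:///some/input").cache() // placeholder input path
data.count()                                         // materialize so partitions get cached

sc.getRDDStorageInfo.filter(_.isCached).foreach { info =>
  println(s"RDD ${info.id} (${info.name}): " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory")
}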


On Sun, Jul 13, 2014 at 1:49 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 The UI code is the same in both, but one possibility is that your
 executors were given less memory on YARN. Can you check that? Or otherwise,
 how do you know that some RDDs were cached?

 Matei

 On Jul 12, 2014, at 4:12 PM, Koert Kuipers ko...@tresata.com wrote:

 hey shuo,
 so far all stage links work fine for me.

 i did some more testing, and it seems kind of random what shows up on the
 gui and what does not. some partially cached RDDs make it to the GUI, while
 some fully cached ones do not. I have not been able to detect a pattern.

 is the codebase for the gui different in standalone than in yarn-client
 mode?


 On Sat, Jul 12, 2014 at 3:34 AM, Shuo Xiang shuoxiang...@gmail.com
 wrote:

 Hi Koert,
   Just curious, did you find any information like CANNOT FIND ADDRESS
 after clicking into some stage? I've seen similar problems due to loss of
 executors.

 Best,



 On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers ko...@tresata.com wrote:

 I just tested a long-lived application (that we normally run in
 standalone mode) on YARN in client mode.

 It looks to me like cached RDDs are missing in the Storage tab of the UI.

 Accessing the RDD storage information via the SparkContext shows the RDDs
 as fully cached, but they are missing from the Storage page.

 spark 1.0.0







Re: spark ui on yarn

2014-07-12 Thread Shuo Xiang
Hi Koert,
  Just curious, did you find any information like CANNOT FIND ADDRESS
after clicking into some stage? I've seen similar problems due to loss of
executors.

Best,



On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers ko...@tresata.com wrote:

 I just tested a long-lived application (that we normally run in standalone
 mode) on YARN in client mode.

 It looks to me like cached RDDs are missing in the Storage tab of the UI.

 Accessing the RDD storage information via the SparkContext shows the RDDs as
 fully cached, but they are missing from the Storage page.

 spark 1.0.0



Re: spark ui on yarn

2014-07-12 Thread Koert Kuipers
hey shuo,
so far all stage links work fine for me.

i did some more testing, and it seems kind of random what shows up on the
gui and what does not. some partially cached RDDs make it to the GUI, while
some fully cached ones do not. I have not been able to detect a pattern.

is the codebase for the gui different in standalone than in yarn-client
mode?


On Sat, Jul 12, 2014 at 3:34 AM, Shuo Xiang shuoxiang...@gmail.com wrote:

 Hi Koert,
    Just curious, did you find any information like CANNOT FIND ADDRESS
  after clicking into some stage? I've seen similar problems due to loss of
  executors.

 Best,



 On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers ko...@tresata.com wrote:

  I just tested a long-lived application (that we normally run in
  standalone mode) on YARN in client mode.

  It looks to me like cached RDDs are missing in the Storage tab of the UI.

  Accessing the RDD storage information via the SparkContext shows the RDDs as
  fully cached, but they are missing from the Storage page.

 spark 1.0.0





Re: spark ui on yarn

2014-07-12 Thread Matei Zaharia
The UI code is the same in both, but one possibility is that your executors 
were given less memory on YARN. Can you check that? Or otherwise, how do you 
know that some RDDs were cached?
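
One quick way to compare, as a rough sketch assuming an existing SparkContext
sc, is to print what the executors report back to the driver in both setups:

// Sketch: per-executor storage memory as seen by the driver.
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remainingMem)) =>
  println(s"$executor: max storage memory ${maxMem / (1024 * 1024)} MB, " +
    s"free ${remainingMem / (1024 * 1024)} MB")
}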

Matei

On Jul 12, 2014, at 4:12 PM, Koert Kuipers ko...@tresata.com wrote:

 hey shuo,
 so far all stage links work fine for me.
 
 i did some more testing, and it seems kind of random what shows up on the gui 
 and what does not. some partially cached RDDs make it to the GUI, while some 
 fully cached ones do not. I have not been able to detect a pattern.
 
 is the codebase for the gui different in standalone than in yarn-client mode? 
 
 
 On Sat, Jul 12, 2014 at 3:34 AM, Shuo Xiang shuoxiang...@gmail.com wrote:
 Hi Koert,
  Just curious, did you find any information like CANNOT FIND ADDRESS after 
 clicking into some stage? I've seen similar problems due to loss of executors.
 
 Best,
 
 
 
 On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers ko...@tresata.com wrote:
 I just tested a long-lived application (that we normally run in standalone 
 mode) on YARN in client mode.
 
 It looks to me like cached RDDs are missing in the Storage tab of the UI.
 
 Accessing the RDD storage information via the SparkContext shows the RDDs as 
 fully cached, but they are missing from the Storage page.
 
 spark 1.0.0