Re: Help in understanding Exchange in Spark UI

2024-06-20 Thread Mich Talebzadeh
OK, I gave an answer on StackOverflow. Happy reading.

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy>, Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

View my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, to quote Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>, "one test result is
worth one-thousand expert opinions".


On Thu, 20 Jun 2024 at 17:15, Dhruv Singla  wrote:

> Hey Team
>
> I've posted a question on StackOverflow. The link is -
> https://stackoverflow.com/questions/78644118/understanding-exchange-in-spark-ui
>
> I haven't got any responses yet. If possible could you please look into
> it? If you need me to write the question in the mailing list, I can do that
> as well.
>
> Thanks & Regards
> Dhruv
>


Help in understanding Exchange in Spark UI

2024-06-20 Thread Dhruv Singla
Hey Team

I've posted a question on StackOverflow. The link is -
https://stackoverflow.com/questions/78644118/understanding-exchange-in-spark-ui

I haven't got any responses yet. If possible could you please look into it?
If you need me to write the question in the mailing list, I can do that as
well.

Thanks & Regards
Dhruv
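
For readers landing on this thread: an Exchange node in the SQL tab of the Spark UI corresponds to a shuffle. A minimal spark-shell sketch that produces one (assuming a spark-shell session, so `spark` and `$` are already in scope; column names are arbitrary):

```scala
// Any wide transformation (here a groupBy aggregation) forces a shuffle, which
// appears as "Exchange hashpartitioning(...)" in the physical plan and as an
// Exchange node in the SQL tab of the Spark UI.
val df  = spark.range(0, 1000).withColumn("key", $"id" % 10)
val agg = df.groupBy("key").count()

agg.explain()   // the plan should contain an Exchange below the final HashAggregate
agg.collect()   // running an action makes the query (and its Exchange metrics) show up in the UI
```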


Spark-UI stages and other tabs not accessible in standalone mode when reverse-proxy is enabled

2024-03-19 Thread sharad mishra
Hi Team,
We're encountering an issue with Spark UI.
I've documented the details here:
https://issues.apache.org/jira/browse/SPARK-47232
When reverse proxy is enabled in the master and worker configOptions, we're
not able to access the different tabs available in the Spark UI, e.g. stages,
environment, storage, etc.

We're deploying spark through bitnami helm chart :
https://github.com/bitnami/charts/tree/main/bitnami/spark

Name and Version

bitnami/spark - 6.0.0

What steps will reproduce the bug?

Kubernetes Version: 1.25
Spark: 3.4.2
Helm chart: 6.0.0

Steps to reproduce:
After installing the chart, the Spark cluster (master and worker) UI is
available at:

https://spark.staging.abc.com/

We are able to access the running application by clicking on the application
ID under the Running Applications link.

We can access the Spark UI by clicking Application Detail UI; we are taken to
the Jobs tab.


URL looks like:
https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/

When we click any of the tabs in the Spark UI, e.g. stages or environment, it
takes us back to the Spark cluster UI page. We noticed that the endpoint
changes to

https://spark.staging.abc.com/stages/

instead of

https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/



Are you using any custom parameters or values?

Configurations set in values.yaml
```
master:
  configOptions:
    -Dspark.ui.reverseProxy=true
    -Dspark.ui.reverseProxyUrl=https://spark.staging.abc.com

worker:
  configOptions:
    -Dspark.ui.reverseProxy=true
    -Dspark.ui.reverseProxyUrl=https://spark.staging.abc.com

service:
  type: ClusterIP
  ports:
    http: 8080
    https: 443
    cluster: 7077

ingress:
  enabled: true
  pathType: ImplementationSpecific
  apiVersion: ""
  hostname: spark.staging.abc.com
  ingressClassName: "staging"
  path: /
```



What is the expected behavior?

Expected behaviour is that when I click on the stages tab, instead of taking me to
https://spark.staging.abc.com/stages/
it should take me to the following URL:
https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/

What do you see instead?

The current behaviour is that it takes me to the URL
https://spark.staging.abc.com/stages/, which shows the Spark cluster UI with
master and worker details.

would appreciate any help on this, thanks.

Best,
Sharad


Spark-UI stages and other tabs not accessible in standalone mode when reverse-proxy is enabled

2024-03-08 Thread sharad mishra
Hi Team,
We're encountering an issue with Spark UI.
When reverse proxy is enabled in the master and worker configOptions, we're
not able to access the different tabs available in the Spark UI, e.g. stages,
environment, storage, etc.

We're deploying spark through bitnami helm chart :
https://github.com/bitnami/charts/tree/main/bitnami/spark

Name and Version

bitnami/spark - 6.0.0

What steps will reproduce the bug?

Kubernetes Version: 1.25
Spark: 3.4.2
Helm chart: 6.0.0

Steps to reproduce:
After installing the chart, the Spark cluster (master and worker) UI is
available at:

https://spark.staging.abc.com/

We are able to access the running application by clicking on the application
ID under the Running Applications link.

We can access the Spark UI by clicking Application Detail UI; we are taken to
the Jobs tab.


URL looks like:
https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/

When we click any of the tabs in the Spark UI, e.g. stages or environment, it
takes us back to the Spark cluster UI page. We noticed that the endpoint
changes to

https://spark.staging.abc.com/stages/

instead of

https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/



Are you using any custom parameters or values?

Configurations set in values.yaml
```
master:
  configOptions:
    -Dspark.ui.reverseProxy=true
    -Dspark.ui.reverseProxyUrl=https://spark.staging.abc.com

worker:
  configOptions:
    -Dspark.ui.reverseProxy=true
    -Dspark.ui.reverseProxyUrl=https://spark.staging.abc.com

service:
  type: ClusterIP
  ports:
    http: 8080
    https: 443
    cluster: 7077

ingress:
  enabled: true
  pathType: ImplementationSpecific
  apiVersion: ""
  hostname: spark.staging.abc.com
  ingressClassName: "staging"
  path: /
```



What is the expected behavior?

Expected behaviour is that when I click on the stages tab, instead of taking me to
https://spark.staging.abc.com/stages/
it should take me to the following URL:
https://spark.staging.abc.com/proxy/app-20240208103209-0030/stages/

What do you see instead?

The current behaviour is that it takes me to the URL
https://spark.staging.abc.com/stages/, which shows the Spark cluster UI with
master and worker details.

Best,
Sharad


Spark UI - Bug Executors tab when using proxy port

2023-07-06 Thread Bruno Pistone
Hello everyone,

I'm really sorry to use this mailing list, but it seems impossible to report a
strange behaviour that is happening with the Spark UI. I'm also sending the
link to the Stack Overflow question here:
https://stackoverflow.com/questions/76632692/spark-ui-executors-tab-its-empty

I'm trying to run the Spark UI on a web server. I need to configure a specific
port for running the UI and a redirect URL. I'm setting the following OPTS:

```
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=${LOCAL_PATH_LOGS}
-Dspark.history.ui.port=18080
-Dspark.eventLog.enabled=true
-Dspark.ui.proxyRedirectUri=${SERVER_URL}"

./start-history-server.sh
```

What is happening: the UI is accessible through the URL
https://${SERVER_URL}/proxy/18080

When I select an application and click on the "Executors" tab, it remains
empty. Looking at the API calls made by the UI, I see that the call to
"/allexecutors" returns 404.

Instead of calling
https://${SERVER_URL}/proxy/18080/api/v1/applications/${APP_ID}/allexecutors
I see that the URL called is
https://${SERVER_URL}/proxy/18080/api/v1/applications/18080/allexecutors

It seems that the appId is not correctly identified. Can you please provide a
solution for this, or an estimated date for fixing the error?

Thank you,

CVE-2023-32007: Apache Spark: Shell command injection via Spark UI

2023-05-02 Thread Arnout Engelen
Severity: important

Affected versions:

- Apache Spark 3.1.1 before 3.2.2

Description:

** UNSUPPORTED WHEN ASSIGNED ** The Apache Spark UI offers the possibility to 
enable ACLs via the configuration option spark.acls.enable. With an 
authentication filter, this checks whether a user has access permissions to 
view or modify the application. If ACLs are enabled, a code path in 
HttpSecurityFilter can allow someone to perform impersonation by providing an 
arbitrary user name. A malicious user might then be able to reach a permission 
check function that will ultimately build a Unix shell command based on their 
input, and execute it. This will result in arbitrary shell command execution as 
the user Spark is currently running as. This issue was disclosed earlier as 
CVE-2022-33891, but incorrectly claimed version 3.1.3 (which has since gone 
EOL) would not be affected.

NOTE: This vulnerability only affects products that are no longer supported by 
the maintainer.

Users are recommended to upgrade to a supported version of Apache Spark, such 
as version 3.4.0.

Credit:

Sven Krewitt, Flashpoint (reporter)

References:

https://www.cve.org/CVERecord?id=CVE-2022-33891
https://spark.apache.org/security.html
https://spark.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-32007





Re: RDD block has negative value in Spark UI

2022-12-07 Thread Stelios Philippou
Already a known minor issue.

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-10141

On Wed, 7 Dec 2022, 15:09 K B M Kaala Subhikshan, <
kbmkaalasubhiks...@gmail.com> wrote:

> Could you explain why the RDD block has a negative value?
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Filtering by job group in the Spark UI / API

2022-08-18 Thread Yeachan Park
Hi All,

Is there a way to filter all the jobs in the history server UI / in Spark's
API based on the job group to which each job belongs?

Ideally we would like to supply a particular job group and only see the
jobs associated with that job group in the UI.

Thanks,
Yeachan


CVE-2022-33891: Apache Spark shell command injection vulnerability via Spark UI

2022-07-17 Thread Sean Owen
Severity: important

Description:

The Apache Spark UI offers the possibility to enable ACLs via the
configuration option spark.acls.enable. With an authentication filter, this
checks whether a user has access permissions to view or modify the
application. If ACLs are enabled, a code path in HttpSecurityFilter can
allow someone to perform impersonation by providing an arbitrary user name.
A malicious user might then be able to reach a permission check function
that will ultimately build a Unix shell command based on their input, and
execute it. This will result in arbitrary shell command execution as the
user Spark is currently running as. This affects Apache Spark versions
3.0.3 and earlier, versions 3.1.1 to 3.1.2, and versions 3.2.0 to 3.2.1.

This issue is being tracked as SPARK-38992

Mitigation:

Upgrade to supported Apache Spark maintenance release 3.1.3, 3.2.2, or
3.3.0 or later

Credit:

 Kostya Kortchinsky (Databricks)


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Yes, it should be possible, any interest to work on this together? Need
more hands to add more features here :)

On Tue, May 17, 2022 at 2:06 PM Holden Karau  wrote:

> Could we make it do the same sort of history server fallback approach?
>
> On Tue, May 17, 2022 at 10:41 PM bo yang  wrote:
>
>> It is like Web Application Proxy in YARN (
>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
>> to provide easy access for Spark UI when the Spark application is running.
>>
>> When running Spark on Kubernetes with S3, there is no YARN. The reverse
>> proxy here is to behave like that Web Application Proxy. It will
>> simplify settings to access Spark UI on Kubernetes.
>>
>>
>> On Mon, May 16, 2022 at 11:46 PM wilson  wrote:
>>
>>> what's the advantage of using reverse proxy for spark UI?
>>>
>>> Thanks
>>>
>>> On Tue, May 17, 2022 at 1:47 PM bo yang  wrote:
>>>
>>>> Hi Spark Folks,
>>>>
>>>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>>>> together with
>>>> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to
>>>> share here in case other people have similar need.
>>>>
>>>> The reverse proxy code is here:
>>>> https://github.com/datapunchorg/spark-ui-reverse-proxy
>>>>
>>>> Let me know if anyone wants to use or would like to contribute.
>>>>
>>>> Thanks,
>>>> Bo
>>>>
>>>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Could we make it do the same sort of history server fallback approach?

On Tue, May 17, 2022 at 10:41 PM bo yang  wrote:

> It is like Web Application Proxy in YARN (
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
> to provide easy access for Spark UI when the Spark application is running.
>
> When running Spark on Kubernetes with S3, there is no YARN. The reverse
> proxy here is to behave like that Web Application Proxy. It will
> simplify settings to access Spark UI on Kubernetes.
>
>
> On Mon, May 16, 2022 at 11:46 PM wilson  wrote:
>
>> what's the advantage of using reverse proxy for spark UI?
>>
>> Thanks
>>
>> On Tue, May 17, 2022 at 1:47 PM bo yang  wrote:
>>
>>> Hi Spark Folks,
>>>
>>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>>> together with
>>> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to
>>> share here in case other people have similar need.
>>>
>>> The reverse proxy code is here:
>>> https://github.com/datapunchorg/spark-ui-reverse-proxy
>>>
>>> Let me know if anyone wants to use or would like to contribute.
>>>
>>> Thanks,
>>> Bo
>>>
>>> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
It is like Web Application Proxy in YARN (
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
to provide easy access for Spark UI when the Spark application is running.

When running Spark on Kubernetes with S3, there is no YARN. The reverse
proxy here is to behave like that Web Application Proxy. It will
simplify settings to access Spark UI on Kubernetes.


On Mon, May 16, 2022 at 11:46 PM wilson  wrote:

> what's the advantage of using reverse proxy for spark UI?
>
> Thanks
>
> On Tue, May 17, 2022 at 1:47 PM bo yang  wrote:
>
>> Hi Spark Folks,
>>
>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>> together with
>> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to
>> share here in case other people have similar need.
>>
>> The reverse proxy code is here:
>> https://github.com/datapunchorg/spark-ui-reverse-proxy
>>
>> Let me know if anyone wants to use or would like to contribute.
>>
>> Thanks,
>> Bo
>>
>>


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Thanks Holden :)

On Mon, May 16, 2022 at 11:12 PM Holden Karau  wrote:

> Oh that’s rad 
>
> On Tue, May 17, 2022 at 7:47 AM bo yang  wrote:
>
>> Hi Spark Folks,
>>
>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>> together with
>> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to
>> share here in case other people have similar need.
>>
>> The reverse proxy code is here:
>> https://github.com/datapunchorg/spark-ui-reverse-proxy
>>
>> Let me know if anyone wants to use or would like to contribute.
>>
>> Thanks,
>> Bo
>>
>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread wilson
what's the advantage of using reverse proxy for spark UI?

Thanks

On Tue, May 17, 2022 at 1:47 PM bo yang  wrote:

> Hi Spark Folks,
>
> I built a web reverse proxy to access Spark UI on Kubernetes (working
> together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
> Want to share here in case other people have similar need.
>
> The reverse proxy code is here:
> https://github.com/datapunchorg/spark-ui-reverse-proxy
>
> Let me know if anyone wants to use or would like to contribute.
>
> Thanks,
> Bo
>
>


Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Oh that’s rad 

On Tue, May 17, 2022 at 7:47 AM bo yang  wrote:

> Hi Spark Folks,
>
> I built a web reverse proxy to access Spark UI on Kubernetes (working
> together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
> Want to share here in case other people have similar need.
>
> The reverse proxy code is here:
> https://github.com/datapunchorg/spark-ui-reverse-proxy
>
> Let me know if anyone wants to use or would like to contribute.
>
> Thanks,
> Bo
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Reverse proxy for Spark UI on Kubernetes

2022-05-16 Thread bo yang
Hi Spark Folks,

I built a web reverse proxy to access the Spark UI on Kubernetes (working
together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
I want to share it here in case other people have a similar need.

The reverse proxy code is here:
https://github.com/datapunchorg/spark-ui-reverse-proxy

Let me know if anyone wants to use or would like to contribute.

Thanks,
Bo


Display AggregateExec modes in Spark UI

2021-11-18 Thread 万昆
Hello,

Currently the Spark UI does not show HashAggregateExec modes. Could we add the
aggregate modes to the SparkPlan display? I think it would be helpful when we
analyze a very complicated SparkPlan.
Am I right?


For example:
SELECT key2, sum(value2) as sum_value2
FROM ( SELECT id % 1 as key2, id as value2 FROM range(1, 1000) ) as skewData2
GROUP BY key2
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- *(2) HashAggregate(keys=[key2#286L], functions=[sum(value2#287L)], modes=(Final), output=[key2#286L, sum_value2#288L])
   +- CustomShuffleReader coalesced
      +- ShuffleQueryStage 0
         +- Exchange hashpartitioning(key2#286L, 5), ENSURE_REQUIREMENTS, [id=#112]
            +- *(1) HashAggregate(keys=[key2#286L], functions=[partial_sum(value2#287L)], modes=(Partial), output=[key2#286L, sum#294L])
               +- *(1) Project [(id#289L % 1) AS key2#286L, id#289L AS value2#287L]
                  +- *(1) Range (1, 1000, step=1, splits=2)
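
A plan like the one above can be reproduced and inspected from spark-shell roughly as follows (a sketch, assuming Spark 3.x with adaptive query execution enabled; plan details and ids will differ by version):

```scala
// Run the example query and print its physical plan, the same text that backs
// the SQL tab in the UI. The partial/final HashAggregate pair is what the
// proposed mode annotation would label.
val q = spark.sql("""
  SELECT key2, sum(value2) AS sum_value2
  FROM (SELECT id % 1 AS key2, id AS value2 FROM range(1, 1000)) AS skewData2
  GROUP BY key2
""")

q.explain()
q.collect()
```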

spark UI not showing correct file and line numbers

2021-10-05 Thread Rochel Wasserman
Hi,

I am experiencing performance issues in one of my PySpark applications. When I
look at the Spark UI, the file and line number of each entry is listed as .
I would like to use the information in the Spark UI for debugging, but without
knowing the correct file and line number for the information I'm seeing, it
does not help much.

Are others having this issue as well? What can be done to resolve it?

Thanks!

Rochel




Re: Spark UI Storage Memory

2020-12-07 Thread Amit Sharma
any suggestion please.

Thanks
Amit

On Fri, Dec 4, 2020 at 2:27 PM Amit Sharma  wrote:

> Is there any memory leak in spark 2.3.3 version as mentioned in below
> Jira.
> https://issues.apache.org/jira/browse/SPARK-29055.
>
> Please let me know how to solve it.
>
> Thanks
> Amit
>
> On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma  wrote:
>
>> Can someone help me on this please.
>>
>>
>> Thanks
>> Amit
>>
>> On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:
>>
> >>> Hi, I have a Spark streaming job. When I check the Executors tab,
> >>> there is a Storage Memory column. It displays used memory / total memory.
> >>> What is used memory? Is it memory currently in use, or memory used so far? How would
> >>> I know how much memory is unused at a given point in time?
>>>
>>>
>>> Thanks
>>> Amit
>>>
>>


RE: Spark UI Storage Memory

2020-12-04 Thread Jack Yang
unsubsribe


Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Is there any memory leak in spark 2.3.3 version as mentioned in below Jira.
https://issues.apache.org/jira/browse/SPARK-29055.

Please let me know how to solve it.

Thanks
Amit

On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma  wrote:

> Can someone help me on this please.
>
>
> Thanks
> Amit
>
> On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:
>
>> Hi, I have a Spark streaming job. When I check the Executors tab,
>> there is a Storage Memory column. It displays used memory / total memory.
>> What is used memory? Is it memory currently in use, or memory used so far? How would
>> I know how much memory is unused at a given point in time?
>>
>>
>> Thanks
>> Amit
>>
>


Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Can someone help me on this please.


Thanks
Amit

On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma  wrote:

> Hi, I have a Spark streaming job. When I check the Executors tab,
> there is a Storage Memory column. It displays used memory / total memory.
> What is used memory? Is it memory currently in use, or memory used so far? How would
> I know how much memory is unused at a given point in time?
>
>
> Thanks
> Amit
>


Spark UI Storage Memory

2020-12-02 Thread Amit Sharma
Hi, I have a Spark streaming job. When I check the Executors tab,
there is a Storage Memory column. It displays used memory / total memory.
What is used memory? Is it memory currently in use, or memory used so far? How would
I know how much memory is unused at a given point in time?


Thanks
Amit
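
One way to answer the "how much is unused right now" part programmatically is to ask the driver directly; a rough sketch (assuming a live SparkContext `sc`, e.g. in spark-shell):

```scala
// getExecutorMemoryStatus maps each block manager (driver and executors) to
// (max memory available for caching, memory currently remaining for caching).
sc.getExecutorMemoryStatus.foreach { case (blockManager, (maxMem, remainingMem)) =>
  val usedMB = (maxMem - remainingMem) / 1024 / 1024
  val freeMB = remainingMem / 1024 / 1024
  println(s"$blockManager used=$usedMB MB free=$freeMB MB")
}
```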


spark UI storage tab

2020-11-11 Thread Amit Sharma
Hi, I have a few questions, as below:

1. The Spark UI Storage tab displays 'Storage Level', 'Size in Memory' and
'Size on Disk'. It shows RDD ID 16 with a memory usage of 76 MB, and I am not
sure why this does not go back to 0 once a Spark streaming request is
completed. I am caching some RDDs inside a method and uncaching them.

2. Similarly, the Executors tab displays 'Storage Memory' used and available.
Does "used" mean currently in use, or memory used on that executor at some
point in time (maximum memory used so far)?




Thanks
Amit
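
Regarding question 1, a cached RDD only disappears from the Storage tab once it is actually unpersisted (or evicted); a small sketch of checking that from the driver (assuming a live SparkContext `sc`):

```scala
// Cache an RDD, confirm it is tracked, then drop it again.
val rdd = sc.parallelize(1 to 1000000).cache()
rdd.count()   // materialise the cache

// Everything currently marked persistent, keyed by RDD id (what the Storage tab lists).
sc.getPersistentRDDs.foreach { case (id, r) => println(s"RDD $id: ${r.getStorageLevel}") }

// blocking = true waits until the blocks are really dropped, so the entry
// should be gone from the Storage tab afterwards.
rdd.unpersist(blocking = true)
println(sc.getPersistentRDDs.keys)
```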


Re: Spark UI

2020-07-20 Thread ArtemisDev
Thanks Xiao for the info.  I was looking for this, too.  This page 
wasn't linked from anywhere on the main doc page (Overview) or any of 
the pull-down menus.  Someone should remind the doc team to update the 
table of contents on the Overview page.


-- ND

On 7/19/20 10:30 PM, Xiao Li wrote:
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc 
for Spark UI.


Xiao

On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu 
mailto:ramesh.biexp...@gmail.com>> wrote:


Hi,

I'm looking for a tutorial/video/material which explains the
content of
various tabes in SPARK WEB UI.
Can some one direct me with the relevant info.

Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
<mailto:user-unsubscr...@spark.apache.org>



--
<https://databricks.com/sparkaisummit/north-america>


Re: Spark UI

2020-07-19 Thread Piyush Acharya
https://www.youtube.com/watch?v=YgQgJceojJY  (Xiao's video )





On Mon, Jul 20, 2020 at 8:03 AM Xiao Li  wrote:

> https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
> for Spark UI.
>
> Xiao
>
> On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu 
> wrote:
>
>> Hi,
>>
>> I'm looking for a tutorial/video/material which explains the content of
>> various tabes in SPARK WEB UI.
>> Can some one direct me with the relevant info.
>>
>> Thanks
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
> --
> <https://databricks.com/sparkaisummit/north-america>
>


Re: Spark UI

2020-07-19 Thread Xiao Li
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
for Spark UI.

Xiao

On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu 
wrote:

> Hi,
>
> I'm looking for a tutorial/video/material which explains the content of
> various tabes in SPARK WEB UI.
> Can some one direct me with the relevant info.
>
> Thanks
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

-- 
<https://databricks.com/sparkaisummit/north-america>


Spark UI

2020-07-19 Thread venkatadevarapu
Hi,

I'm looking for a tutorial/video/material which explains the content of
the various tabs in the Spark web UI.
Can someone direct me to the relevant info?

Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/




Using Spark UI with Running Spark on Hadoop Yarn

2020-07-13 Thread ArtemisDev
Is there any way to make the Spark process visible via the Spark UI when
running Spark 3.0 on a Hadoop YARN cluster?  The Spark documentation
talks about replacing the Spark UI with the Spark history server, but
doesn't give much detail.  Therefore I would assume it is still possible
to use the Spark UI when running Spark on a Hadoop YARN cluster.  Is this
correct?  Does the Spark history server have the same user functions as
the Spark UI?


But how could this be possible (the possibility of using the Spark UI) if
the Spark master isn't active, when all the job scheduling and
resource allocation tasks are handled by YARN?


Thanks!

-- ND
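
On the event-log side of this question, completed applications only become visible in the history server if event logging is switched on when the job is submitted. A rough sketch of the application-side settings (the HDFS path and host below are placeholders; the history server must read the same directory via spark.history.fs.logDirectory):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: these settings are more commonly put in spark-defaults.conf or
// passed with --conf on spark-submit.
val spark = SparkSession.builder()
  .appName("event-log-example")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///spark-logs")               // placeholder; must exist and be writable
  .config("spark.yarn.historyServer.address", "historyhost:18080")  // placeholder; lets the YARN RM UI link to the Spark history UI
  .getOrCreate()
```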





Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Zahid Rahman
http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options

says
"Note that this information is only available for the duration of the
application by default. To view the web UI after the fact, set
spark.eventLog.enabled to true before starting the application. This
configures Spark to log Spark events that encode the information displayed
in the UI to persisted storage."

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Thu, 9 Apr 2020 at 00:50, Ruijing Li  wrote:

> Thanks Zahid, Yes I am using history server to see previous UIs.
>
>  However, my question still remains on viewing old thread dumps, as I
> cannot see them on the old completed spark UIs, only when spark context is
> running.
>
> On Wed, Apr 8, 2020 at 4:01 PM Zahid Rahman  wrote:
>
>> Spark UI is only available while SparkContext is running.
>>
>> However  You can get to the Spark UI after your application  completes or
>> crashes.
>>
>> To do this Spark includes a tool called the Spark History Server that
>> allows you to reconstruct the Spark UI.
>>
>> You can find up to date information on how to use this tool in the spark
>> documentation https://spark.apache.org/docs/latest/monitoring.html
>>
>>
>>
>>
>>
>> On Wed, 8 Apr 2020, 23:47 Ruijing Li,  wrote:
>>
>>> Hi all,
>>>
>>> As stated in title, currently when I view the spark UI of a completed
>>> spark job, I see there are thread dump links in the executor tab, but
>>> clicking on them does nothing. Is it possible to see the thread dumps
>>> somehow even if the job finishes? On spark 2.4.5.
>>>
>>> Thanks.
>>> --
>>> Cheers,
>>> Ruijing Li
>>>
>> --
> Cheers,
> Ruijing Li
>


Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Ruijing Li
Thanks Zahid, Yes I am using history server to see previous UIs.

 However, my question still remains on viewing old thread dumps, as I
cannot see them on the old completed spark UIs, only when spark context is
running.

On Wed, Apr 8, 2020 at 4:01 PM Zahid Rahman  wrote:

> Spark UI is only available while SparkContext is running.
>
> However  You can get to the Spark UI after your application  completes or
> crashes.
>
> To do this Spark includes a tool called the Spark History Server that
> allows you to reconstruct the Spark UI.
>
> You can find up to date information on how to use this tool in the spark
> documentation https://spark.apache.org/docs/latest/monitoring.html
>
>
>
>
>
> On Wed, 8 Apr 2020, 23:47 Ruijing Li,  wrote:
>
>> Hi all,
>>
>> As stated in title, currently when I view the spark UI of a completed
>> spark job, I see there are thread dump links in the executor tab, but
>> clicking on them does nothing. Is it possible to see the thread dumps
>> somehow even if the job finishes? On spark 2.4.5.
>>
>> Thanks.
>> --
>> Cheers,
>> Ruijing Li
>>
> --
Cheers,
Ruijing Li


Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Zahid Rahman
Spark UI is only available while SparkContext is running.

However  You can get to the Spark UI after your application  completes or
crashes.

To do this Spark includes a tool called the Spark History Server that
allows you to reconstruct the Spark UI.

You can find up to date information on how to use this tool in the spark
documentation https://spark.apache.org/docs/latest/monitoring.html





On Wed, 8 Apr 2020, 23:47 Ruijing Li,  wrote:

> Hi all,
>
> As stated in title, currently when I view the spark UI of a completed
> spark job, I see there are thread dump links in the executor tab, but
> clicking on them does nothing. Is it possible to see the thread dumps
> somehow even if the job finishes? On spark 2.4.5.
>
> Thanks.
> --
> Cheers,
> Ruijing Li
>


Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Ruijing Li
Hi all,

As stated in title, currently when I view the spark UI of a completed spark
job, I see there are thread dump links in the executor tab, but clicking on
them does nothing. Is it possible to see the thread dumps somehow even if
the job finishes? On spark 2.4.5.

Thanks.
-- 
Cheers,
Ruijing Li


Re: Spark UI History server on Kubernetes

2019-01-23 Thread Li Gao
In addition to what Rao mentioned, if you are using cloud blob storage such
as AWS S3, you can specify your history location to be an S3 location such
as:  `s3://mybucket/path/to/history`


On Wed, Jan 23, 2019 at 12:55 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:

> Hi Lakshman,
>
>
>
> We’ve set these 2 properties to bringup spark history server
>
>
>
> spark.history.fs.logDirectory 
>
> spark.history.ui.port 
>
>
>
> We’re writing the logs to HDFS. In order to write logs, we’re setting
> following properties while submitting the spark job
>
> spark.eventLog.enabled true
>
> spark.eventLog.dir 
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Wednesday, January 23, 2019 1:55 PM
> *To:* Rao, Abhishek (Nokia - IN/Bangalore) 
> *Subject:* Re: Spark UI History server on Kubernetes
>
>
>
> HI Abhishek,
>
>
>
> Thank you for your response. Could you please let me know the properties
> you configured for bringing up History Server and its UI.
>
>
>
> Also, are you writing the logs to any directory on persistent storage, if
> yes, could you let me know the changes you did in Spark to write logs to
> that directory. Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>
>
> On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi,
>
>
>
> We’ve setup spark-history service (based on spark 2.4) on K8S. UI works
> perfectly fine when running on NodePort. We’re facing some issues when on
> ingress.
>
> Please let us know what kind of inputs do you need?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Tuesday, January 22, 2019 6:02 PM
> *To:* user@spark.apache.org
> *Subject:* Spark UI History server on Kubernetes
>
>
>
> Hello,
>
>
>
> We are running Spark 2.4 on Kubernetes cluster, able to access the Spark
> UI using "kubectl port-forward".
>
>
>
> However, this spark UI contains currently running Spark application logs,
> we would like to maintain the 'completed' spark application logs as well.
> Could someone help us to setup 'Spark History server' on Kubernetes. Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>


RE: Spark UI History server on Kubernetes

2019-01-23 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Lakshman,

We’ve set these 2 properties to bring up the Spark history server:

spark.history.fs.logDirectory 
spark.history.ui.port 

We’re writing the logs to HDFS. In order to write logs, we’re setting following 
properties while submitting the spark job
spark.eventLog.enabled true
spark.eventLog.dir 

Thanks and Regards,
Abhishek

From: Battini Lakshman 
Sent: Wednesday, January 23, 2019 1:55 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) 
Subject: Re: Spark UI History server on Kubernetes

HI Abhishek,

Thank you for your response. Could you please let me know the properties you 
configured for bringing up History Server and its UI.

Also, are you writing the logs to any directory on persistent storage, if yes, 
could you let me know the changes you did in Spark to write logs to that 
directory. Thanks!

Best Regards,
Lakshman Battini.

On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore) 
mailto:abhishek@nokia.com>> wrote:
Hi,

We’ve setup spark-history service (based on spark 2.4) on K8S. UI works 
perfectly fine when running on NodePort. We’re facing some issues when on 
ingress.
Please let us know what kind of inputs do you need?

Thanks and Regards,
Abhishek

From: Battini Lakshman 
mailto:battini.laksh...@gmail.com>>
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI 
using "kubectl port-forward".

However, this spark UI contains currently running Spark application logs, we 
would like to maintain the 'completed' spark application logs as well. Could 
someone help us to setup 'Spark History server' on Kubernetes. Thanks!

Best Regards,
Lakshman Battini.


RE: Spark UI History server on Kubernetes

2019-01-22 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi,

We’ve setup spark-history service (based on spark 2.4) on K8S. UI works 
perfectly fine when running on NodePort. We’re facing some issues when on 
ingress.
Please let us know what kind of inputs do you need?

Thanks and Regards,
Abhishek

From: Battini Lakshman 
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI 
using "kubectl port-forward".

However, this spark UI contains currently running Spark application logs, we 
would like to maintain the 'completed' spark application logs as well. Could 
someone help us to setup 'Spark History server' on Kubernetes. Thanks!

Best Regards,
Lakshman Battini.


Spark UI History server on Kubernetes

2019-01-22 Thread Battini Lakshman
Hello,

We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI
using "kubectl port-forward".

However, this spark UI contains currently running Spark application logs,
we would like to maintain the 'completed' spark application logs as well.
Could someone help us to setup 'Spark History server' on Kubernetes. Thanks!

Best Regards,
Lakshman Battini.


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Patrick Brown
Done:

https://issues.apache.org/jira/browse/SPARK-25837

On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin  wrote:

> Ah that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
>  wrote:
> >
> > On my production application I am running ~200 jobs at once, but
> continue to submit jobs in this manner for sometimes ~1 hour.
> >
> > The reproduction code above generally only has 4 ish jobs running at
> once, and as you can see runs through 50k jobs in this manner.
> >
> > I guess I should clarify my above statement, the issue seems to appear
> when running multiple jobs at once as well as in sequence for a while and
> may as well have something to do with high master CPU usage (thus the
> collect in the code). My rough guess would be whatever is managing clearing
> out completed jobs gets overwhelmed (my master was a 4 core machine while
> running this, and htop reported almost full CPU usage across all 4 cores).
> >
> > The attached screenshot shows the state of the webui after running the
> repro code, you can see the ui is displaying some 43k completed jobs (takes
> a long time to load) after a few minutes of inactivity this will clear out,
> however as my production application continues to submit jobs every once in
> a while, the issue persists.
> >
> > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin 
> wrote:
> >>
> >> When you say many jobs at once, what ballpark are you talking about?
> >>
> >> The code in 2.3+ does try to keep data about all running jobs and
> >> stages regardless of the limit. If you're running into issues because
> >> of that we may have to look again at whether that's the right thing to
> >> do.
> >> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
> >>  wrote:
> >> >
> >> > I believe I may be able to reproduce this now, it seems like it may
> be something to do with many jobs at once:
> >> >
> >> > Spark 2.3.1
> >> >
> >> > > spark-shell --conf spark.ui.retainedJobs=1
> >> >
> >> > scala> import scala.concurrent._
> >> > scala> import scala.concurrent.ExecutionContext.Implicits.global
> >> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0
> until i).collect.length) } }
> >> >
> >> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin 
> wrote:
> >> >>
> >> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> >> >> single stage (+ the tasks related to that single stage), same thing
> in
> >> >> memory (checked with jvisualvm).
> >> >>
> >> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >> >> >
> >> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >> >> >  wrote:
> >> >> > > I recently upgraded to spark 2.3.1 I have had these same
> settings in my spark submit script, which worked on 2.0.2, and according to
> the documentation appear to not have changed:
> >> >> > >
> >> >> > > spark.ui.retainedTasks=1
> >> >> > > spark.ui.retainedStages=1
> >> >> > > spark.ui.retainedJobs=1
> >> >> >
> >> >> > I tried that locally on the current master and it seems to be
> working.
> >> >> > I don't have 2.3 easily in front of me right now, but will take a
> look
> >> >> > Monday.
> >> >> >
> >> >> > --
> >> >> > Marcelo
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?

Thanks
On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
 wrote:
>
> On my production application I am running ~200 jobs at once, but continue to 
> submit jobs in this manner for sometimes ~1 hour.
>
> The reproduction code above generally only has 4 ish jobs running at once, 
> and as you can see runs through 50k jobs in this manner.
>
> I guess I should clarify my above statement, the issue seems to appear when 
> running multiple jobs at once as well as in sequence for a while and may as 
> well have something to do with high master CPU usage (thus the collect in the 
> code). My rough guess would be whatever is managing clearing out completed 
> jobs gets overwhelmed (my master was a 4 core machine while running this, and 
> htop reported almost full CPU usage across all 4 cores).
>
> The attached screenshot shows the state of the webui after running the repro 
> code, you can see the ui is displaying some 43k completed jobs (takes a long 
> time to load) after a few minutes of inactivity this will clear out, however 
> as my production application continues to submit jobs every once in a while, 
> the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin  wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to keep data about all running jobs and
>> stages regardless of the limit. If you're running into issues because
>> of that we may have to look again at whether that's the right thing to
>> do.
>> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
>>  wrote:
>> >
>> > I believe I may be able to reproduce this now, it seems like it may be 
>> > something to do with many jobs at once:
>> >
>> > Spark 2.3.1
>> >
>> > > spark-shell --conf spark.ui.retainedJobs=1
>> >
>> > scala> import scala.concurrent._
>> > scala> import scala.concurrent.ExecutionContext.Implicits.global
>> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
>> > i).collect.length) } }
>> >
>> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  
>> > wrote:
>> >>
>> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> >> single stage (+ the tasks related to that single stage), same thing in
>> >> memory (checked with jvisualvm).
>> >>
>> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  
>> >> wrote:
>> >> >
>> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >> >  wrote:
>> >> > > I recently upgraded to spark 2.3.1 I have had these same settings in 
>> >> > > my spark submit script, which worked on 2.0.2, and according to the 
>> >> > > documentation appear to not have changed:
>> >> > >
>> >> > > spark.ui.retainedTasks=1
>> >> > > spark.ui.retainedStages=1
>> >> > > spark.ui.retainedJobs=1
>> >> >
>> >> > I tried that locally on the current master and it seems to be working.
>> >> > I don't have 2.3 easily in front of me right now, but will take a look
>> >> > Monday.
>> >> >
>> >> > --
>> >> > Marcelo
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo




Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
When you say many jobs at once, what ballpark are you talking about?

The code in 2.3+ does try to keep data about all running jobs and
stages regardless of the limit. If you're running into issues because
of that we may have to look again at whether that's the right thing to
do.
On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
 wrote:
>
> I believe I may be able to reproduce this now, it seems like it may be 
> something to do with many jobs at once:
>
> Spark 2.3.1
>
> > spark-shell --conf spark.ui.retainedJobs=1
>
> scala> import scala.concurrent._
> scala> import scala.concurrent.ExecutionContext.Implicits.global
> scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until 
> i).collect.length) } }
>
> On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:
>>
>> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> single stage (+ the tasks related to that single stage), same thing in
>> memory (checked with jvisualvm).
>>
>> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>> >
>> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >  wrote:
>> > > I recently upgraded to spark 2.3.1 I have had these same settings in my 
>> > > spark submit script, which worked on 2.0.2, and according to the 
>> > > documentation appear to not have changed:
>> > >
>> > > spark.ui.retainedTasks=1
>> > > spark.ui.retainedStages=1
>> > > spark.ui.retainedJobs=1
>> >
>> > I tried that locally on the current master and it seems to be working.
>> > I don't have 2.3 easily in front of me right now, but will take a look
>> > Monday.
>> >
>> > --
>> > Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo




Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-23 Thread Patrick Brown
I believe I may be able to reproduce this now, it seems like it may be
something to do with many jobs at once:

Spark 2.3.1

> spark-shell --conf spark.ui.retainedJobs=1

scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until
i).collect.length) } }

On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  wrote:

> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> single stage (+ the tasks related to that single stage), same thing in
> memory (checked with jvisualvm).
>
> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >
> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >  wrote:
> > > I recently upgraded to spark 2.3.1 I have had these same settings in
> my spark submit script, which worked on 2.0.2, and according to the
> documentation appear to not have changed:
> > >
> > > spark.ui.retainedTasks=1
> > > spark.ui.retainedStages=1
> > > spark.ui.retainedJobs=1
> >
> > I tried that locally on the current master and it seems to be working.
> > I don't have 2.3 easily in front of me right now, but will take a look
> > Monday.
> >
> > --
> > Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-22 Thread Marcelo Vanzin
Just tried on 2.3.2 and worked fine for me. UI had a single job and a
single stage (+ the tasks related to that single stage), same thing in
memory (checked with jvisualvm).

On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  wrote:
>
> On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>  wrote:
> > I recently upgraded to spark 2.3.1 I have had these same settings in my 
> > spark submit script, which worked on 2.0.2, and according to the 
> > documentation appear to not have changed:
> >
> > spark.ui.retainedTasks=1
> > spark.ui.retainedStages=1
> > spark.ui.retainedJobs=1
>
> I tried that locally on the current master and it seems to be working.
> I don't have 2.3 easily in front of me right now, but will take a look
> Monday.
>
> --
> Marcelo



-- 
Marcelo




Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Marcelo Vanzin
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
 wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark 
> submit script, which worked on 2.0.2, and according to the documentation 
> appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
> spark.ui.retainedJobs=1

I tried that locally on the current master and it seems to be working.
I don't have 2.3 easily in front of me right now, but will take a look
Monday.

-- 
Marcelo




Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Shing Hing Man
 I have the same problem when I upgrade my application from Spark 2.2.1 to 
Spark 2.3.2 and run in Yarn client mode.
Also I noticed that in my Spark driver,  org.apache.spark.status.TaskDataWrapper
could take up more than 2G of memory. 

Shing


On Tuesday, 16 October 2018, 17:34:02 GMT+1, Patrick Brown 
 wrote:  
 
 I recently upgraded to spark 2.3.1 I have had these same settings in my spark 
submit script, which worked on 2.0.2, and according to the documentation appear 
to not have changed:
spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1
However in 2.3.1 the UI doesn't seem to respect this, it still retains a huge 
number of jobs:



Is this a known issue? Any ideas?  

[Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-16 Thread Patrick Brown
I recently upgraded to Spark 2.3.1. I have had these same settings in my
spark-submit script, which worked on 2.0.2, and which according to the
documentation appear to not have changed:

spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1

However in 2.3.1 the UI doesn't seem to respect this, it still retains a
huge number of jobs:

[image: Screen Shot 2018-10-16 at 10.31.50 AM.png]


Is this a known issue? Any ideas?


[Spark UI] find driver for an application

2018-09-24 Thread bsikander
Hello,
I am having some trouble using the Spark Master UI to figure out some basic
information; the process is too tedious.
I am using Spark 2.2.1 with Spark standalone.

- In cluster mode, how to figure out which driver is related to which
application?
- In supervise mode, how to track the restarts? How many times it was
restarted, the app id of all the applications after restart and VM IP where
it was running.

Any help would be much appreciated.




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
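
One partial shortcut, rather than clicking through the master UI: the standalone master also serves its state as JSON, which lists applications and drivers in one place (a sketch; `spark-master:8080` is a placeholder, and the exact fields may vary by version):

```scala
import scala.io.Source

// Fetch the standalone master's cluster state as JSON and print it; the payload
// includes the applications and (in recent versions) the drivers the master knows about.
val masterUi = "http://spark-master:8080"   // placeholder host
println(Source.fromURL(s"$masterUi/json").mkString)
```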




Understanding Event Timeline of Spark UI

2018-06-15 Thread Aakash Basu
Hi,

I've a job running which shows the Event Timeline as follows. I am trying
to understand the gaps between these single lines; they seem to be parallel but
not immediately sequential with the other stages.

Any other insight from this? And what is the cluster doing during these
gaps?




Thanks,
Aakash.


Re: Spark UI Source Code

2018-05-09 Thread Marcelo Vanzin
(-dev)

The KVStore API is private to Spark, it's not really meant to be used
by others. You're free to try, and there's a lot of javadocs on the
different interfaces, but it's not a general purpose database, so
you'll need to figure out things like that by yourself.

On Tue, May 8, 2018 at 9:53 PM, Anshi Shrivastava
<anshi.shrivast...@exadatum.com> wrote:
> Hi Marcelo, Dev,
>
> Thanks for your response.
> I have used SparkListeners to fetch the metrics (the public REST API uses
> the same) but to monitor these metrics over time, I have to persist them
> (using KVStore library of spark).  Is there a way to fetch data from this
> KVStore (which uses levelDb for storage) and filter it on basis on
> timestamp?
>
> Thanks,
> Anshi
>
> On Mon, May 7, 2018 at 9:51 PM, Marcelo Vanzin [via Apache Spark User List]
> <ml+s1001560n32114...@n3.nabble.com> wrote:
>>
>> On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
>> <[hidden email]> wrote:
>> > I've found a KVStore wrapper which stores all the metrics in a LevelDb
>> > store. This KVStore wrapper is available as a spark-dependency but we
>> > cannot
>> > access the metrics directly from spark since they are all private.
>>
>> I'm not sure what it is you're trying to do exactly, but there's a
>> public REST API that exposes all the data Spark keeps about
>> applications. There's also a programmatic status tracker
>> (SparkContext.statusTracker) that's easier to use from within the
>> running Spark app, but has a lot less info.
>>
>> > Can we use this store to store our own metrics?
>>
>> No.
>>
>> > Also can we retrieve these metrics based on timestamp?
>>
>> Only if the REST API has that feature, don't remember off the top of my
>> head.
>>
>>
>> --
>> Marcelo
>>
>> -----
>> To unsubscribe e-mail: [hidden email]
>>
>>
>>
>> 
>> If you reply to this email, your message will be added to the discussion
>> below:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/Re-Spark-UI-Source-Code-tp32114.html
>> To start a new topic under Apache Spark User List, email
>> ml+s1001560n1...@n3.nabble.com
>> To unsubscribe from Apache Spark User List, click here.
>> NAML
>
>
>
>
> DISCLAIMER:
> All the content in email is intended for the recipient and not to be
> published elsewhere without Exadatum consent. And attachments shall be send
> only if required and with ownership of the sender. This message contains
> confidential information and is intended only for the individual named. If
> you are not the named addressee, you should not disseminate, distribute or
> copy this email. Please notify the sender immediately by email if you have
> received this email by mistake and delete this email from your system. Email
> transmission cannot be guaranteed to be secure or error-free, as information
> could be intercepted, corrupted, lost, destroyed, arrive late or incomplete,
> or contain viruses. The sender, therefore, does not accept liability for any
> errors or omissions in the contents of this message which arise as a result
> of email transmission. If verification is required, please request a
> hard-copy version.



-- 
Marcelo




Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
 wrote:
> I've found a KVStore wrapper which stores all the metrics in a LevelDb
> store. This KVStore wrapper is available as a spark-dependency but we cannot
> access the metrics directly from spark since they are all private.

I'm not sure what it is you're trying to do exactly, but there's a
public REST API that exposes all the data Spark keeps about
applications. There's also a programmatic status tracker
(SparkContext.statusTracker) that's easier to use from within the
running Spark app, but has a lot less info.

> Can we use this store to store our own metrics?

No.

> Also can we retrieve these metrics based on timestamp?

Only if the REST API has that feature, don't remember off the top of my head.


-- 
Marcelo
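
To make the status tracker suggestion concrete, a small sketch of polling it from inside a running application (assuming a live SparkContext `sc`); it exposes much less than the REST API, but the calls are public:

```scala
// SparkStatusTracker: low-overhead, public API for job/stage/executor status.
val tracker = sc.statusTracker

println(s"active jobs:   ${tracker.getActiveJobIds.toList}")
println(s"active stages: ${tracker.getActiveStageIds.toList}")

// Per-executor snapshot: host:port, cached data size, number of running tasks.
tracker.getExecutorInfos.foreach { e =>
  println(s"${e.host}:${e.port} runningTasks=${e.numRunningTasks} cacheSize=${e.cacheSize}")
}
```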




Re: What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-27 Thread kant kodali
For example, in this blog post
<https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-apache-spark-streaming-applications.html>,
looking at figure 1 and figure 2, I wonder what I need to do to see those
graphs in Spark 2.3.0?

On Mon, Mar 26, 2018 at 7:10 AM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> I am using spark 2.3.0 and I wondering what do I need to set to see the
> number of records and processing time for each batch in SPARK UI? The
> default UI doesn't seem to show this.
>
> Thanks@
>


What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-26 Thread kant kodali
Hi All,

I am using Spark 2.3.0 and I am wondering what I need to set to see the
number of records and processing time for each batch in the Spark UI? The
default UI doesn't seem to show this.

Thanks!
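
If this is Structured Streaming (where the 2.3.0 UI has no dedicated streaming tab), the per-batch input rows and durations shown in those figures can at least be captured with a StreamingQueryListener; a sketch, assuming an existing SparkSession `spark`:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Logs the input row count and trigger duration of every micro-batch.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    println(s"batch=${p.batchId} rows=${p.numInputRows} " +
      s"triggerMs=${p.durationMs.get("triggerExecution")}")
  }
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})
```

For DStream (StreamingContext) jobs, the Streaming tab shown in that blog post should still appear automatically in 2.3.0 while the streaming context is running.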


Spark UI Streaming batch time interval does not match batch interval

2018-03-12 Thread Jordan Pilat
Hello,

I am running a streaming app on Spark 2.1.2.  The batch interval is set to 
5000ms, and when I go to the "Streaming" tab in the Spark UI, it correctly 
reports a 5-second batch interval, but the list of batches below only shows one
batch every two minutes (i.e. the batch time for each batch is two minutes after
the prior batch).

Keep in mind, I am not referring to delay or processing time -- only when each 
batch starts.  What could account for this discrepancy?




How to create security filter for Spark UI in Spark on YARN

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment*:
AWS EMR, yarn cluster.

*Description*:
I am trying to use a Java filter to protect access to the Spark UI by
using the property spark.ui.filters; the problem is that when Spark is
running in YARN mode, that property is always overridden by Hadoop
with the filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter:

spark.ui.filters: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter

And these properties are automatically added:

spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS: ip-x-x-x-226.eu-west-1.compute.internal
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES: http://ip-x-x-x-226.eu-west-1.compute.internal:20888/proxy/application_x_

Any suggestion on how to add a Java security filter so Hadoop does not
override it, or maybe how to configure the security from the Hadoop side?

Thanks.
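
For reference, spark.ui.filters takes a comma-separated list of javax.servlet.Filter class names, with parameters passed as spark.<filter class>.param.<name>=<value>. A minimal filter might look like the sketch below (the class name, header and parameter are made up for illustration); whether YARN appends its AmIpFilter to, or replaces, user-supplied filters depends on the deployment, so this only shows the generic part.

```scala
package com.example.ui  // hypothetical package

import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

// Enabled with, for example:
//   --conf spark.ui.filters=com.example.ui.SecretFilter
//   --conf spark.com.example.ui.SecretFilter.param.secret=changeme
class SecretFilter extends Filter {
  private var secret: String = _

  override def init(config: FilterConfig): Unit = {
    secret = config.getInitParameter("secret")
  }

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request  = req.asInstanceOf[HttpServletRequest]
    val response = res.asInstanceOf[HttpServletResponse]
    if (secret != null && secret == request.getHeader("X-Ui-Secret")) {
      chain.doFilter(req, res)  // authorised: continue to the Spark UI
    } else {
      response.sendError(HttpServletResponse.SC_FORBIDDEN, "missing or wrong secret")
    }
  }

  override def destroy(): Unit = ()
}
```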


Spark UI stdout/stderr links point to executors internal address

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment:*

AWS EMR, yarn cluster.



*Description:*

In the Spark UI, in the Environment and Executors tabs, the stdout and
stderr links point to the internal addresses of the executors. This would imply
exposing the executors so that the links can be accessed. Shouldn't those links
point to the master and be handled internally, with the master serving as a proxy
for these files, instead of exposing the internal machines?



I have tried setting SPARK_PUBLIC_DNS and SPARK_LOCAL_IP variables so they
contain the master ip address. I also tried with these properties:
spark.yarn.appMasterEnv.SPARK_LOCAL_IP and
spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS but it does not seem to work.


Any suggestion?
​


Spark UI port

2017-09-10 Thread Sunil Kalyanpur
Hello all,

I am running PySpark Job (v2.0.2) with checkpoint enabled in Mesos cluster
and am using Marathon for orchestration.

When the job is restarted using Marathon, the Spark UI does not get started
at the port specified by Marathon. Instead, it picks the port from the
checkpoint.

Is there a way to make the Spark job use the port assigned by Marathon
instead of picking up the configuration from the checkpoint?

-- 
Thanks,
Sunil


Spark UI to use Marathon assigned port

2017-09-07 Thread Sunil Kalyanpur
Hello all,

I am running PySpark Job (v2.0.2) with checkpoint enabled in Mesos cluster
and am using Marathon for orchestration.

When the job is restarted using Marathon, the Spark UI does not get started
at the port specified by Marathon. Instead, it picks the port from the
checkpoint.

Is there a way to make the Spark job use the port assigned by Marathon
instead of picking up the configuration from the checkpoint?

Please let me know if you need any information.

Thank you,
Sunil


Re: Kill Spark Streaming JOB from Spark UI or Yarn

2017-08-27 Thread Matei Zaharia
The batches should all have the same application ID, so use that one. You can 
also find the application in the YARN UI to terminate it from there.
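For reference, the usual CLI route is "yarn application -list" to find that application id, followed by "yarn application -kill <application id>".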

Matei

> On Aug 27, 2017, at 10:27 AM, KhajaAsmath Mohammed  
> wrote:
> 
> Hi,
> 
> I am new to spark streaming and not able to find an option to kill it after 
> starting spark streaming context.
> 
> The Streaming tab doesn't have an option to kill it.
> 
> The Jobs tab doesn't have an option to kill it either.
> 
> 
> 
> If scheduled on YARN, how do I kill it if spark-submit is running in the 
> background, as I will not have a way to find the YARN application id? Do 
> the batches have separate YARN application ids or the same one?
> 
> Thanks,
> Asmath


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo,

Thanks for your suggestions.
The thing is that my Spark UI is the one thing that is crashing - and not
the app. In fact the app does end up completing successfully.
That's why I'm a bit confused by this issue?
I'll still try out some of your suggestions.
Thanks and Regards,
Saatvik Shah


On Tue, Jul 18, 2017 at 9:59 AM, Riccardo Ferrari <ferra...@gmail.com>
wrote:

> The reason you get connection refused when connecting to the application
> UI (port 4040) is because your app gets stopped, so the application UI
> stops as well. To inspect your executors' logs after the fact you might find
> the Spark History server useful
> <https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact>
> (for standalone mode).
>
> Personally I collect the logs from my worker nodes. They generally sit
> under the $SPARK_HOME/work// (for standalone).
> There you can find exceptions and messages from the executors assigned to
> your app.
>
> Now, about your app crashing, it might be useful to check whether it is
> sized correctly. The issue you linked sounds relevant; however, I would give
> some sanity checks a try. I have solved many issues just by sizing the app
> properly: I would first check memory size, cpu allocations and so on.
>
> Best,
>
> On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah <saatvikshah1...@gmail.com>
> wrote:
>
>> Hi Riccardo,
>>
>> Yes, Thanks for suggesting I do that.
>>
>> [Stage 1:==>   (12750 + 40)
>> / 15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
>> Dropping SparkListenerEvent because no remaining room in event queue. This
>> likely means one of the SparkListeners is too slow and cannot keep up with
>> the rate at which tasks are being started by the scheduler.
>> 17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
>> [Stage 1:> (13320 + 41)
>> / 15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
>> [Stage 1:==>   (13867 + 40)
>> / 15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
>> Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
>> [Stage 1:===>  (14277 + 40)
>> / 15000]17/07/18 13:25:10 INFO 
>> org.spark_project.jetty.server.ServerConnector:
>> Stopped ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
>> 17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
>> SparkListenerBus has already stopped! Dropping event
>> SparkListenerExecutorMetricsUpdate(4,WrappedArray())
>> And similar WARN/INFO messages continue occurring.
>>
>> When I try to access the UI, I get:
>>
>> Problem accessing /proxy/application_1500380353993_0001/. Reason:
>>
>> Connection to http://10.142.0.17:4040 refused
>>
>> Caused by:
>>
>> org.apache.http.conn.HttpHostConnectException: Connection to 
>> http://10.142.0.17:4040 refused
>>  at 
>> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>>  at 
>> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>>  at 
>> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>>  at 
>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>  at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>  at 
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
>>  at 
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
>>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>
>>
>>
>> I noticed this issue talks about something similar and I guess is
>> related: https://issues.apache.org/jira/browse/SPARK-18838.
>>
>> On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari <ferra...@gmail.com>
>> wrote:
>>
>>> Hi,
>>&

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
The reason you get connection refused when connecting to the application UI
(port 4040) is because your app gets stopped, so the application UI stops
as well. To inspect your executors' logs after the fact you might find
the Spark History server useful
<https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact>
(for standalone mode).

Personally I collect the logs from my worker nodes. They generally sit
under the $SPARK_HOME/work// (for standalone).
There you can find exceptions and messages from the executors assigned to
your app.

Now, about your app crashing, it might be useful to check whether it is sized
correctly. The issue you linked sounds relevant; however, I would give
some sanity checks a try. I have solved many issues just by sizing the app
properly: I would first check memory size, cpu allocations and so on.
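A minimal sketch of the event-log settings the History Server route relies on; the directory below is an assumption and must match spark.history.fs.logDirectory on the history server side:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")              // write events while the app is running
  .set("spark.eventLog.dir", "hdfs:///spark-events")  // assumed path; the history server reads the same directory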

Best,

On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah <saatvikshah1...@gmail.com>
wrote:

> Hi Riccardo,
>
> Yes, Thanks for suggesting I do that.
>
> [Stage 1:==>   (12750 + 40) /
> 15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
> Dropping SparkListenerEvent because no remaining room in event queue. This
> likely means one of the SparkListeners is too slow and cannot keep up with
> the rate at which tasks are being started by the scheduler.
> 17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
> [Stage 1:> (13320 + 41) /
> 15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
> [Stage 1:==>   (13867 + 40) /
> 15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
> Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
> [Stage 1:===>  (14277 + 40) /
> 15000]17/07/18 13:25:10 INFO org.spark_project.jetty.server.ServerConnector:
> Stopped ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
> 17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
> SparkListenerBus has already stopped! Dropping event
> SparkListenerExecutorMetricsUpdate(4,WrappedArray())
> And similar WARN/INFO messages continue occurring.
>
> When I try to access the UI, I get:
>
> Problem accessing /proxy/application_1500380353993_0001/. Reason:
>
> Connection to http://10.142.0.17:4040 refused
>
> Caused by:
>
> org.apache.http.conn.HttpHostConnectException: Connection to 
> http://10.142.0.17:4040 refused
>   at 
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>   at 
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>   at 
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
>   at 
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>
>
>
> I noticed this issue talks about something similar and I guess is related:
> https://issues.apache.org/jira/browse/SPARK-18838.
>
> On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari <ferra...@gmail.com>
> wrote:
>
>> Hi,
>>  Can you share more details? Do you have any exceptions from the driver
>> or executors?
>>
>> best,
>>
>> On Jul 18, 2017 02:49, "saatvikshah1994" <saatvikshah1...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a PySpark app which, when provided a huge amount of data as input,
>>> sometimes throws the error explained here:
>>> https://stackoverflow.com/questions/32340639/unable-to-under
>>> stand-error-sparklistenerbus-has-already-stopped-dropping-event.
>>> All my code is running inside the main function, and the only slightly
>>> peculiar thing I am doing in this app is using a custom PySpark ML
>>> Transformer(M

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo,

Yes, Thanks for suggesting I do that.

[Stage 1:==>   (12750 + 40) /
15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
Dropping SparkListenerEvent because no remaining room in event queue. This
likely means one of the SparkListeners is too slow and cannot keep up with
the rate at which tasks are being started by the scheduler.
17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped
1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
[Stage 1:> (13320 + 41) /
15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
[Stage 1:==>   (13867 + 40) /
15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
[Stage 1:===>  (14277 + 40) /
15000]17/07/18 13:25:10 INFO
org.spark_project.jetty.server.ServerConnector: Stopped
ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
SparkListenerBus has already stopped! Dropping event
SparkListenerExecutorMetricsUpdate(4,WrappedArray())
And similar WARN/INFO messages continue occurring.

When I try to access the UI, I get:

Problem accessing /proxy/application_1500380353993_0001/. Reason:

Connection to http://10.142.0.17:4040 refused

Caused by:

org.apache.http.conn.HttpHostConnectException: Connection to
http://10.142.0.17:4040 refused
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)



I noticed this issue talks about something similar and I guess is related:
https://issues.apache.org/jira/browse/SPARK-18838.
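One commonly suggested mitigation for the dropped-event warnings above is a larger listener-bus queue; a hedged sketch (the property was spark.scheduler.listenerbus.eventqueue.size up to Spark 2.2 and became spark.scheduler.listenerbus.eventqueue.capacity in 2.3, and the value below is only an example against a default of 10000):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000")  // example value; property name varies by Spark version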

On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari <ferra...@gmail.com>
wrote:

> Hi,
>  Can you share more details? Do you have any exceptions from the driver
> or executors?
>
> best,
>
> On Jul 18, 2017 02:49, "saatvikshah1994" <saatvikshah1...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a PySpark app which, when provided a huge amount of data as input,
>> sometimes throws the error explained here:
>> https://stackoverflow.com/questions/32340639/unable-to-under
>> stand-error-sparklistenerbus-has-already-stopped-dropping-event.
>> All my code is running inside the main function, and the only slightly
>> peculiar thing I am doing in this app is using a custom PySpark ML
>> Transformer(Modified from
>> https://stackoverflow.com/questions/32331848/create-a-custom
>> -transformer-in-pyspark-ml).
>> Could this be the issue? How can I debug why this is happening?
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


-- 
*Saatvik Shah,*
*Masters in the School of Computer Science,*
*Carnegie Mellon University,*
*LinkedIn <https://www.linkedin.com/in/saatvikshah/>, Website
<https://saatvikshah1994.github.io/>*


Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
Hi,
 Can you share more details? Do you have any exceptions from the driver or
executors?

best,

On Jul 18, 2017 02:49, "saatvikshah1994" <saatvikshah1...@gmail.com> wrote:

> Hi,
>
> I have a PySpark app which, when provided a huge amount of data as input,
> sometimes throws the error explained here:
> https://stackoverflow.com/questions/32340639/unable-to-understand-error-
> sparklistenerbus-has-already-stopped-dropping-event.
> All my code is running inside the main function, and the only slightly
> peculiar thing I am doing in this app is using a custom PySpark ML
> Transformer(Modified from
> https://stackoverflow.com/questions/32331848/create-a-
> custom-transformer-in-pyspark-ml).
> Could this be the issue? How can I debug why this is happening?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Spark UI crashes on Large Workloads

2017-07-17 Thread saatvikshah1994
Hi,

I have a PySpark app which, when provided a huge amount of data as input,
sometimes throws the error explained here:
https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event.
All my code is running inside the main function, and the only slightly
peculiar thing I am doing in this app is using a custom PySpark ML
Transformer(Modified from
https://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml).
Could this be the issue? How can I debug why this is happening?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



proxy on spark UI

2017-06-27 Thread Soheila S.
Hi all,
I am using Hadoop 2.6.5 and spark 2.1.0 and run a job using spark-submit
and master is set to "yarn". When Spark starts, I can load the Spark UI page
using port 4040 but no job is shown on the page. After the following logs
(registering the application master on YARN), the Spark UI is not accessible
anymore, even from the tracking UI (ApplicationMaster) in the cluster UI.

The URL (http://z401:4040) is redirected to a new one (
http://z401:8088/proxy/application_1498135277395_0009) and cannot be
reached.

Any idea?

Thanks a lot in advance.

17/06/23 12:35:45 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint:
ApplicationMaster registered as NettyRpcEndpointRef(null)

17/06/23 12:35:45 INFO cluster.YarnClientSchedulerBackend: Add WebUI
Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,
Map(PROXY_HOSTS -> z401, PROXY_URI_BASES ->
http://z401:8088/proxy/application_1498135277395_0009),
/proxy/application_1498135277395_0009

17/06/23 12:35:45 INFO ui.JettyUtils: Adding filter:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter

17/06/23 12:35:45 INFO yarn.Client: Application report for
application_1498135277395_0009 (state: RUNNING)


Re: Spark UI shows Jobs are processing, but the files are already written to S3

2017-05-19 Thread Miles Crawford
Could I be experiencing the same thing?

https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0

On Wed, Nov 16, 2016 at 10:37 AM, Shreya Agarwal <shrey...@microsoft.com>
wrote:

> I think that is a bug. I have seen that a lot especially with long running
> jobs where Spark skips a lot of stages because it has pre-computed results.
> And some of these are never marked as completed, even though in reality
> they are. I figured this out because I was using the interactive shell
> (spark-shell) and the shell came up to a prompt indicating the job had
> finished even though there were a lot of Active jobs and tasks according to
> the UI. And my output is correct.
>
>
>
> Is there a JIRA item tracking this?
>
>
>
> *From:* Kuchekar [mailto:kuchekar.nil...@gmail.com]
> *Sent:* Wednesday, November 16, 2016 10:00 AM
> *To:* spark users <user@spark.apache.org>
> *Subject:* Spark UI shows Jobs are processing, but the files are already
> written to S3
>
>
>
> Hi,
>
>
>
> I am running a Spark job which saves the computed data (massive
> data) to S3. On the Spark UI I see that some jobs are active, but no
> activity in the logs. Also, on S3 all the data has been written (verified each
> bucket --> it has a _SUCCESS file)
>
>
>
> Am I missing something?
>
>
>
> Thanks.
>
> Kuchekar, Nilesh
>


RE: Spark UI not coming up in EMR

2017-01-11 Thread Saurabh Malviya (samalviy)
Any clue on this?

Jobs are running fine, but I am not able to access the Spark UI in EMR on YARN.

Where can I see statistics such as the number of events per second and rows processed 
for streaming in the log files (if the UI is not working)?

-Saurabh

From: Saurabh Malviya (samalviy)
Sent: Monday, January 09, 2017 10:59 AM
To: user@spark.apache.org
Subject: Spark UI not coming up in EMR

The Spark web UI for detailed monitoring of streaming jobs stops rendering after 2 
weeks. It keeps looping while trying to fetch the page. Is there any way I can get that 
page, or logs where I can see how many events are coming into Spark for each interval?

-Saurabh


Spark UI not coming up in EMR

2017-01-09 Thread Saurabh Malviya (samalviy)
The Spark web UI for detailed monitoring of streaming jobs stops rendering after 2 
weeks. It keeps looping while trying to fetch the page. Is there any way I can get that 
page, or logs where I can see how many events are coming into Spark for each interval?

-Saurabh


Spark UI - Puzzling “Input Size / Records” in Stage Details

2017-01-08 Thread Appu K
I was trying something basic to understand tasks, stages and shuffles a bit
better in Spark. The dataset is 256 MB.

Tried this in zeppelin

val tmpDF = spark.read
  .option("header", "true")
  .option("delimiter", ",")
  .option("inferSchema", "true")
  .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")
tmpDF.count

This kicked off 4 jobs -

   - 3 jobs for the first statement and
   - 1 job with 2 stages for tmpDF.count

The last stage of the job that corresponded to the count statement has some
puzzling data that I'm unable to explain.

   1. The Stage details section says "Input Size / Records: 186.6 MB / 720",
      while Aggregated Metrics by Executor says "Input Size / Records" is
      "186.6 MB / 5371292" (Stage Details UI).

   2. In the tasks list, one particular server,
      ip-x-x-x-60.eu-west-1.compute.internal, has 4 tasks with "0.0 B / 457130" as
      the value for "Input Size / Records" (Task Details UI).

I initially thought this was some local disk cache or something to do with
EMRFS. However, once I cached the dataframe and took the count again, it
showed "16.8 MB / 46" for all 16 tasks corresponding to the 16
partitions.

Any links/pointers to understand this a bit better would be highly helpful


RE: Spark UI shows Jobs are processing, but the files are already written to S3

2016-11-16 Thread Shreya Agarwal
I think that is a bug. I have seen that a lot especially with long running jobs 
where Spark skips a lot of stages because it has pre-computed results. And some 
of these are never marked as completed, even though in reality they are. I 
figured this out because I was using the interactive shell (spark-shell) and 
the shell came up to a prompt indicating the job had finished even though there 
were a lot of Active jobs and tasks according to the UI. And my output is 
correct.

Is there a JIRA item tracking this?

From: Kuchekar [mailto:kuchekar.nil...@gmail.com]
Sent: Wednesday, November 16, 2016 10:00 AM
To: spark users <user@spark.apache.org>
Subject: Spark UI shows Jobs are processing, but the files are already written 
to S3

Hi,

 I am running a spark job, which saves the computed data (massive data) to 
S3. On  the Spark Ui I see the some jobs are active, but no activity in the 
logs. Also on S3 all the data has be written (verified each bucket --> it has 
_SUCCESS file)

Am I missing something?

Thanks.
Kuchekar, Nilesh


Spark UI shows Jobs are processing, but the files are already written to S3

2016-11-16 Thread Kuchekar
Hi,

 I am running a spark job, which saves the computed data (massive data)
to S3. On  the Spark Ui I see the some jobs are active, but no activity in
the logs. Also on S3 all the data has be written (verified each bucket -->
it has _SUCCESS file)

Am I missing something?

Thanks.
Kuchekar, Nilesh


How to interpret the Time Line on "Details for Stage" Spark UI page

2016-11-09 Thread Xiaoye Sun
Hi,

I am using Spark 1.6.1, and I am looking at the Event Timeline on "Details
for Stage" Spark UI web page in detail.

I found that the "scheduler delay" on event timeline is somehow
misrepresented. I want to confirm if my understanding is correct.

Here is the detailed description:
In Spark's code, I found that the definition of "SCHEDULER_DELAY" is that
"scheduler delay includes time to ship the task from the scheduler to the
executor, and time to send the task result from the executor to the
scheduler. If scheduler delay is large, consider decreasing the size of
tasks or decreasing the size of task results"

My interpretation of the definition is that the scheduler delay has two
components. The first component happens at the beginning of a task when
scheduler assigns task executable to the executor; The second component
happens at the end of a task when the scheduler collects the results from
the executor.

However, on the event timeline figure, there is only one section for the
scheduler delay at the beginning of each task, whose length represents the
SUM of these two components. This means that the following "Task
Deserialization Time", "Shuffle Read Time", "Executor Computing Time",
etc, should have started earlier on this event timeline figure.


Best,
Xiaoye


Re: Spark UI error spark 2.0.1 hadoop 2.6

2016-10-27 Thread gpatcham
I was able to fix it by adding the Servlet 3.0 API to the classpath.
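For anyone hitting the same NoSuchMethodError, a hedged sbt sketch of pulling the Servlet 3.x API onto the classpath (the coordinates are the common javax.servlet ones; adjust the version and scope to your build):

// build.sbt fragment
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1" % "provided"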



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-error-spark-2-0-1-hadoop-2-6-tp27970p27971.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark UI error spark 2.0.1 hadoop 2.6

2016-10-27 Thread gpatcham
Hi,

I'm running spark-shell in yarn-client mode; the SparkContext started and I am
able to run commands.

But the UI is not coming up and I see the errors below in the spark shell:

20:51:20 WARN servlet.ServletHandler: 
javax.servlet.ServletException: Could not determine the proxy server for
redirection
at
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.findRedirectUrl(AmIpFilter.java:183)
at
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:139)
at
org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:479)
at
org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.spark_project.jetty.server.Server.handle(Server.java:499)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:724)
16/10/27 20:51:20 WARN server.HttpChannel: /
java.lang.NoSuchMethodError:
javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
at
org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:484)
at
org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.spark_project.jetty.server.Server.handle(Server.java:499)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:724)
16/10/27 20:51:20 WARN thread.QueuedThreadPool: 
java.lang.NoSuchMethodError:
javax.servlet.http.HttpServletResponse.getStatus()I
at
org.spark_project.jetty.server.handler.ErrorHandler.handle(ErrorHandler.java:112)
at org.spark_project.jetty.server.Response.sendError(Response.java:597)
at
org.spark_project.jetty.server.HttpChannel.handleException(HttpChannel.java:487)
at
org.spark_project.jetty.server.HttpConnection$HttpChannelOverHttp.handleException(HttpConnection.java:594)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:387)
at
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:724)
16/10/27 20:51:20 WARN thread.QueuedThreadPool: Unexpected thread death:
org.spark_project.jetty.util.thread.QueuedThreadPool$3@10d268b in
SparkUI{STARTED,8<=8<=200,i=2,q=0}


Let me know if I'm missing something.

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-error-spark-2-0-1-hadoop-2-6-tp27970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-09-05 Thread Mich Talebzadeh
Hi Sivakumaran

Thanks for your very useful research. Apologies, I have been very busy. Let me
read through and come back.

Regards


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 September 2016 at 10:21, Sivakumaran S <siva.kuma...@icloud.com> wrote:

> Hi Mich,
>
> Here is a summary of what I have so far understood (May require
> correction):
>
> *Facts*.
> 1. Spark has an internal web (http) server based on Jetty (
> http://www.eclipse.org/jetty/) that serves a ‘read-only’ UI on port 4040.
> Additional SCs are displayed in sequentially increased port numbers (4041,
> 4042 and so on)
> 2. Spark has a ReST API for job submissions.
> 3. The current UI has seven tabs: Jobs, Stages, Storage, Environment,
> Executors, Streaming and SQL.
> 4. Some tabs have further links pointing to greater depth of information
> about the job.
> 5. Changing Spark source code is not an option for building a dashboard
> because most users use the compiled binaries or custom solutions provided
> by cloud providers.
> 6. SparkListener can be used to monitor only from within the job
> submitted and is therefore job-specific.
>
> *Assumptions*
> 1. There is no API (ReST or otherwise) for retrieving statistics or
> information about current jobs.
> 2. There is no way of automatically refreshing the information in these
> tabs.
>
> *Proposed Solution to build a dashboard*
> 1. HTTP is stateless. Every call to the web server gives the current
> snapshot of information about the Spark job.
> 2. Python can be used to write this software.
> 3. So this proposed software sits between the Spark Server and the
> browser. It repetitively (say, every 5 seconds) gets data (the HTML pages)
>  from port 4040 and after parsing the required information from these HTML
> pages, updates the dashboard using Websocket. A light weight web server
> framework will also be required for hosting the dashboard.
> 4. Identifying what information is most important to be displayed (Some
> experienced Spark users can provide inputs) is important.
>
>
>
> If the folks at Spark provide a ReST API for retrieving important
> information about running jobs, this would be far more simpler.
>
> Regards,
>
> Sivakumaran S
>
>
>
> On 27-Aug-2016, at 3:59 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks Nguyen for the link.
>
> I installed Super Refresh as an add-on to Chrome. By default the refresh is
> stopped until you set it to x seconds. However, the issue we have is that
> the Spark UI comes with 6+ tabs and you have to repeat the process for each tab.
>
> However, that messes things up. For example, if I choose to refresh the
> "Executors" tab every 2 seconds and then decide to refresh the "Stages" tab,
> there is a race condition whereby you are thrown back to the last refreshed
> page, which is not really what one wants.
>
> Ideally one wants the Spark UI page identified by host:port to be the
> driver and every other tab underneath, say host:port/Stages, to be refreshed
> once we open that tab and stay there. If I go back to, say, the SQL tab, I'd like
> to see that refreshed every n seconds.
>
> I hope this makes sense.
>
>
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 August 2016 at 15:01, nguyen duc Tuan <newvalu...@gmail.com> wrote:
>
>> The simplest solution that I found: using a browser extension which does
>> that for you :D. For example, if you are using Chrome, you can use this
>> extension: https://chrome.google.com/webstore/detail/easy-auto-refresh/
>> aabcgdmkeabbnleenpncegpcngjpnjkc/related?hl=en
>> An other way, but a bit mor

Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread Mich Talebzadeh
Thanks Nguyen for the link.

I installed Super Refresh as an add-on to Chrome. By default the refresh is
stopped until you set it to x seconds. However, the issue we have is that
the Spark UI comes with 6+ tabs and you have to repeat the process for each tab.

However, that messes things up. For example, if I choose to refresh the
"Executors" tab every 2 seconds and then decide to refresh the "Stages" tab,
there is a race condition whereby you are thrown back to the last refreshed
page, which is not really what one wants.

Ideally one wants the Spark UI page identified by host:port to be the
driver and every other tab underneath, say host:port/Stages, to be refreshed
once we open that tab and stay there. If I go back to, say, the SQL tab, I'd like
to see that refreshed every n seconds.

I hope this makes sense.



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 August 2016 at 15:01, nguyen duc Tuan <newvalu...@gmail.com> wrote:

> The simplest solution that I found: using a browser extension which does
> that for you :D. For example, if you are using Chrome, you can use this
> extension: https://chrome.google.com/webstore/detail/easy-auto-refresh/
> aabcgdmkeabbnleenpncegpcngjpnjkc/related?hl=en
> Another way, but a bit more manual, using JavaScript: starting from a
> window, you create a child window with your target url. The parent
> window will refresh that child window for you. Due to the same-origin
> policy, you should set the parent url to the same url as your target url. Try
> this in your web console:
> wi = window.open("your target url")
> var timeInMinis = 2000
> setInterval(function(){ wi.location.reload();}, timeInMinis)
> Hope this helps.
>
> 2016-08-27 20:17 GMT+07:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>
>> Hi All,
>>
>> GitHub project SparkUIDashboard created here
>> <https://github.com/search?utf8=%E2%9C%93=sparkuidashboard=simplesearch>
>>
>>
>>
>>
>> [image: Inline images 2]
>> Let us put the show on the road :)
>>
>> Cheers
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 27 August 2016 at 12:53, Sivakumaran S <siva.kuma...@icloud.com>
>> wrote:
>>
>>> Hi Mich,
>>>
>>> Unlikely that we can use Zeppelin for dynamic, real time update
>>> visualisation. It makes nice, static visuals.
>>>
>>> I was thinking more on the lines of http://dashingdemo.herokuap
>>> p.com/sample
>>>
>>> The library is http://dashing.io
>>>
>>> There are more widgets that can be used https://github.com/Shopif
>>> y/dashing/wiki/Additional-Widgets
>>>
>>> The Spark UI is functional, but I am looking forward to some aesthetics
>>> and high level picture of the process. Using Websockets, the dashboard can
>>> be updated real time without the need of refreshing the page.
>>>
>>> Regards,
>>>
>>> Sivakumaran S
>>>
>>>
>>> On 27-Aug-2016, at 10:10 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Thanks Sivakumaran
>>>
>>> I don't think we can use Zeppelin for this purpose. It is not a real
>>> time dashboard, nor can it be. I use it much like Tableau, with added
>>> Scala programming.
>>>
>>> Does anyone know of open source real time dashboards?
>>>
>>>
>>> Cheers
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * 
>>>

Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread nguyen duc Tuan
The simplest solution that I found: using a browser extension which does
that for you :D. For example, if you are using Chrome, you can use this
extension:
https://chrome.google.com/webstore/detail/easy-auto-refresh/aabcgdmkeabbnleenpncegpcngjpnjkc/related?hl=en
Another way, but a bit more manual, using JavaScript: starting from a
window, you create a child window with your target url. The parent
window will refresh that child window for you. Due to the same-origin
policy, you should set the parent url to the same url as your target url. Try
this in your web console:
wi = window.open("your target url")
var timeInMinis = 2000
setInterval(function(){ wi.location.reload();}, timeInMinis)
Hope this helps.

2016-08-27 20:17 GMT+07:00 Mich Talebzadeh <mich.talebza...@gmail.com>:

> Hi All,
>
> GitHub project SparkUIDashboard created here
> <https://github.com/search?utf8=%E2%9C%93=sparkuidashboard=simplesearch>
>
>
>
>
> [image: Inline images 2]
> Let us put the show on the road :)
>
> Cheers
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 August 2016 at 12:53, Sivakumaran S <siva.kuma...@icloud.com> wrote:
>
>> Hi Mich,
>>
>> Unlikely that we can use Zeppelin for dynamic, real time update
>> visualisation. It makes nice, static visuals.
>>
>> I was thinking more on the lines of http://dashingdemo.herokuap
>> p.com/sample
>>
>> The library is http://dashing.io
>>
>> There are more widgets that can be used https://github.com/Shopif
>> y/dashing/wiki/Additional-Widgets
>>
>> The Spark UI is functional, but I am looking forward to some aesthetics
>> and high level picture of the process. Using Websockets, the dashboard can
>> be updated real time without the need of refreshing the page.
>>
>> Regards,
>>
>> Sivakumaran S
>>
>>
>> On 27-Aug-2016, at 10:10 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Thanks Sivakumaran
>>
>> I don't think we can use Zeppelin for this purpose. It is not a real time
>> dashboard, nor can it be. I use it much like Tableau, with added Scala
>> programming.
>>
>> Does anyone know of open source real time dashboards?
>>
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 27 August 2016 at 09:42, Sivakumaran S <siva.kuma...@icloud.com>
>> wrote:
>>
>>> I would love to participate in developing a dashboard of some sort in
>>> lieu (or at least complement it)  of Spark UI .
>>>
>>> Regards,
>>>
>>> Sivakumaran S
>>>
>>> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Are we actually looking for a real time dashboard of some sort for Spark
>>> UI interface?
>>>
>>> After all one can think a real time dashboard can do this!
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>

Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread Mich Talebzadeh
Hi All,

GitHub project SparkUIDashboard created here
<https://github.com/search?utf8=%E2%9C%93=sparkuidashboard=simplesearch>




[image: Inline images 2]
Let us put the show on the road :)

Cheers


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 August 2016 at 12:53, Sivakumaran S <siva.kuma...@icloud.com> wrote:

> Hi Mich,
>
> Unlikely that we can use Zeppelin for dynamic, real time update
> visualisation. It makes nice, static visuals.
>
> I was thinking more on the lines of http://dashingdemo.
> herokuapp.com/sample
>
> The library is http://dashing.io
>
> There are more widgets that can be used https://github.com/
> Shopify/dashing/wiki/Additional-Widgets
>
> The Spark UI is functional, but I am looking forward to some aesthetics
> and high level picture of the process. Using Websockets, the dashboard can
> be updated real time without the need of refreshing the page.
>
> Regards,
>
> Sivakumaran S
>
>
> On 27-Aug-2016, at 10:10 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks Sivakumaran
>
> I don't think we can use Zeppelin for this purpose. It is not a real time
> dashboard, nor can it be. I use it much like Tableau, with added Scala
> programming.
>
> Does anyone know of open source real time dashboards?
>
>
> Cheers
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 August 2016 at 09:42, Sivakumaran S <siva.kuma...@icloud.com> wrote:
>
>> I would love to participate in developing a dashboard of some sort in
>> lieu (or at least complement it)  of Spark UI .
>>
>> Regards,
>>
>> Sivakumaran S
>>
>> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Are we actually looking for a real time dashboard of some sort for Spark
>> UI interface?
>>
>> After all one can think a real time dashboard can do this!
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 26 August 2016 at 23:38, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Thanks Jacek,
>>
>> I will have a look. I think it is long overdue.
>>
>> I mean we try to micro batch and stream everything below seconds but when
>> it comes to help  monitor basics we are still miles behind :(
>>
>> Cheers,
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary da

Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread Jacek Laskowski
Hi,

There's no better way to start a project than...github it :-) Create a new
project, clone it and do dzieła! (= go ahead in Polish).

Jacek

On 27 Aug 2016 10:42 a.m., "Sivakumaran S" <siva.kuma...@icloud.com> wrote:

> I would love to participate in developing a dashboard of some sort in lieu
> (or at least complement it)  of Spark UI .
>
> Regards,
>
> Sivakumaran S
>
> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Are we actually looking for a real time dashboard of some sort for Spark UI
> interface?
>
> After all one can think a real time dashboard can do this!
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 26 August 2016 at 23:38, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks Jacek,
>
> I will have a look. I think it is long overdue.
>
> I mean we try to micro batch and stream everything below seconds but when
> it comes to help  monitor basics we are still miles behind :(
>
> Cheers,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 26 August 2016 at 23:21, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Mich,
>
> I don't think so. There is support for a UI page refresh but I haven't
> seen it in use.
>
> See StreamingPage [1] where it schedules refresh every 5 secs, i.e.
> Some(5000). In SparkUIUtils.headerSparkPage [2] there is
> refreshInterval but it's not used in any place in Spark.
>
> Time to file a JIRA issue?
>
> What about REST API and httpie updating regularly [3]? Perhaps Metrics
> with ConsoleSink [4]?
>
> [1] https://github.com/apache/spark/blob/master/streaming/src/ma
> in/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
> [2] https://github.com/apache/spark/blob/master/core/src/main/sc
> ala/org/apache/spark/ui/UIUtils.scala#L202
> [3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
> [4] http://spark.apache.org/docs/latest/monitoring.html#metrics
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Aug 25, 2016 at 11:55 AM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> > Hi,
> >
> > This may be already there.
> >
> > A spark job opens up a UI on port specified by --conf
> "spark.ui.port=${SP}"
> > that defaults to 4040.
> >
> > However, on UI one needs to refresh the page to see the progress.
> >
> > Can this be polled so it is refreshed automatically
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJ
> d6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> The
> > author will in no case be liable for any monetary damages arising from
> such
> > loss, damage or destruction.
> >
> >
>
>
>
>
>


Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread Mich Talebzadeh
Thanks Sivakumaran

I don't think we can use Zeppelin for this purpose. It is not a real time
dashboard, nor can it be. I use it much like Tableau, with added Scala
programming.

Does anyone know of open source real time dashboards?


Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 August 2016 at 09:42, Sivakumaran S <siva.kuma...@icloud.com> wrote:

> I would love to participate in developing a dashboard of some sort in lieu
> (or at least complement it)  of Spark UI .
>
> Regards,
>
> Sivakumaran S
>
> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Are we actually looking for a real time dashboard of some sort for Spark UI
> interface?
>
> After all one can think a real time dashboard can do this!
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 26 August 2016 at 23:38, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks Jacek,
>
> I will have a look. I think it is long overdue.
>
> I mean we try to micro batch and stream everything below seconds but when
> it comes to help  monitor basics we are still miles behind :(
>
> Cheers,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 26 August 2016 at 23:21, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Mich,
>
> I don't think so. There is support for a UI page refresh but I haven't
> seen it in use.
>
> See StreamingPage [1] where it schedules refresh every 5 secs, i.e.
> Some(5000). In SparkUIUtils.headerSparkPage [2] there is
> refreshInterval but it's not used in any place in Spark.
>
> Time to file a JIRA issue?
>
> What about REST API and httpie updating regularly [3]? Perhaps Metrics
> with ConsoleSink [4]?
>
> [1] https://github.com/apache/spark/blob/master/streaming/src/ma
> in/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
> [2] https://github.com/apache/spark/blob/master/core/src/main/sc
> ala/org/apache/spark/ui/UIUtils.scala#L202
> [3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
> [4] http://spark.apache.org/docs/latest/monitoring.html#metrics
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Aug 25, 2016 at 11:55 AM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> > Hi,
> >
> > This may be already there.
> >
> > A spark job opens up a UI on port specified by --conf
> "spark.ui.port=${SP}"
> > that defaults to 4040.
> >
> > However, on UI one needs to refresh the page to see the progress.
> >
> > Can this be polled so it is refreshed automatically
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJ
> d6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> The
> > author will in no case be liable for any monetary damages arising from
> such
> > loss, damage or destruction.
> >
> >
>
>
>
>
>


Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-27 Thread Mich Talebzadeh
Are we actually looking for a real-time dashboard of some sort for the Spark
UI?

After all, one would think a real-time dashboard could do this!

HTH



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 26 August 2016 at 23:38, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Jacek,
>
> I will have a look. I think it is long overdue.
>
> I mean we try to micro batch and stream everything below seconds but when
> it comes to help  monitor basics we are still miles behind :(
>
> Cheers,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 26 August 2016 at 23:21, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi Mich,
>>
>> I don't think so. There is support for a UI page refresh but I haven't
>> seen it in use.
>>
>> See StreamingPage [1] where it schedules refresh every 5 secs, i.e.
>> Some(5000). In SparkUIUtils.headerSparkPage [2] there is
>> refreshInterval but it's not used in any place in Spark.
>>
>> Time to fill an JIRA issue?
>>
>> What about REST API and httpie updating regularly [3]? Perhaps Metrics
>> with ConsoleSink [4]?
>>
>> [1] https://github.com/apache/spark/blob/master/streaming/src/
>> main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
>> [2] https://github.com/apache/spark/blob/master/core/src/main/
>> scala/org/apache/spark/ui/UIUtils.scala#L202
>> [3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
>> [4] http://spark.apache.org/docs/latest/monitoring.html#metrics
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Thu, Aug 25, 2016 at 11:55 AM, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>> > Hi,
>> >
>> > This may be already there.
>> >
>> > A spark job opens up a UI on port specified by --conf
>> "spark.ui.port=${SP}"
>> > that defaults to 4040.
>> >
>> > However, on UI one needs to refresh the page to see the progress.
>> >
>> > Can this be polled so it is refreshed automatically
>> >
>> > Thanks
>> >
>> >
>> > Dr Mich Talebzadeh
>> >
>> >
>> >
>> > LinkedIn
>> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJ
>> d6zP6AcPCCdOABUrV8Pw
>> >
>> >
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> >
>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>> > loss, damage or destruction of data or any other property which may
>> arise
>> > from relying on this email's technical content is explicitly
>> disclaimed. The
>> > author will in no case be liable for any monetary damages arising from
>> such
>> > loss, damage or destruction.
>> >
>> >
>>
>
>


Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-26 Thread Mich Talebzadeh
Thanks Jacek,

I will have a look. I think it is long overdue.

I mean we try to micro-batch and stream everything at sub-second latencies, but
when it comes to helping monitor the basics we are still miles behind :(

Cheers,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 26 August 2016 at 23:21, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Mich,
>
> I don't think so. There is support for a UI page refresh but I haven't
> seen it in use.
>
> See StreamingPage [1] where it schedules refresh every 5 secs, i.e.
> Some(5000). In SparkUIUtils.headerSparkPage [2] there is
> refreshInterval but it's not used in any place in Spark.
>
> Time to fill an JIRA issue?
>
> What about REST API and httpie updating regularly [3]? Perhaps Metrics
> with ConsoleSink [4]?
>
> [1] https://github.com/apache/spark/blob/master/streaming/
> src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
> [2] https://github.com/apache/spark/blob/master/core/src/
> main/scala/org/apache/spark/ui/UIUtils.scala#L202
> [3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
> [4] http://spark.apache.org/docs/latest/monitoring.html#metrics
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Aug 25, 2016 at 11:55 AM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> > Hi,
> >
> > This may be already there.
> >
> > A spark job opens up a UI on port specified by --conf
> "spark.ui.port=${SP}"
> > that defaults to 4040.
> >
> > However, on UI one needs to refresh the page to see the progress.
> >
> > Can this be polled so it is refreshed automatically
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> The
> > author will in no case be liable for any monetary damages arising from
> such
> > loss, damage or destruction.
> >
> >
>


Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-26 Thread Jacek Laskowski
Hi Mich,

I don't think so. There is support for a UI page refresh but I haven't
seen it in use.

See StreamingPage [1], where it schedules a refresh every 5 secs, i.e.
Some(5000). In SparkUIUtils.headerSparkPage [2] there is a
refreshInterval parameter, but it's not used anywhere in Spark.

Time to file a JIRA issue?

What about the REST API polled regularly with httpie [3]? Perhaps Metrics
with ConsoleSink [4]?

[1] 
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
[2] 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L202
[3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
[4] http://spark.apache.org/docs/latest/monitoring.html#metrics
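
One way to act on the REST API suggestion in [3] is a small polling loop. A
minimal sketch, assuming the application UI is reachable on the default
http://localhost:4040 and using only the Scala standard library:

// Poll the Spark UI's REST API every 5 seconds and dump the raw JSON.
// Assumes the default UI address; adjust the URL if spark.ui.port is overridden.
object PollSparkRestApi {
  def main(args: Array[String]): Unit = {
    val url = "http://localhost:4040/api/v1/applications"
    while (true) {
      val json = scala.io.Source.fromURL(url).mkString
      println(s"[${System.currentTimeMillis()}] $json")
      Thread.sleep(5000)
    }
  }
}

The same loop can point at any endpoint under /api/v1 (stages, executors, etc.)
if you want something closer to an auto-refreshing view.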

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Aug 25, 2016 at 11:55 AM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
> Hi,
>
> This may be already there.
>
> A spark job opens up a UI on port specified by --conf "spark.ui.port=${SP}"
> that defaults to 4040.
>
> However, on UI one needs to refresh the page to see the progress.
>
> Can this be polled so it is refreshed automatically
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-08-25 Thread Marek Wiewiorka
Hi you can take a look at:
https://github.com/hammerlab/spree

It's a bit outdated, but it may still be possible to use it with a more recent
Spark version.

M.

2016-08-25 11:55 GMT+02:00 Mich Talebzadeh :

> Hi,
>
> This may be already there.
>
> A spark job opens up a UI on port specified by --conf
> "spark.ui.port=${SP}"  that defaults to 4040.
>
> However, on UI one needs to refresh the page to see the progress.
>
> Can this be polled so it is refreshed automatically
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Is there anyway Spark UI is set to poll and refreshes itself

2016-08-25 Thread Mich Talebzadeh
Hi,

This may be already there.

A Spark job opens up a UI on the port specified by --conf
"spark.ui.port=${SP}", which defaults to 4040.

However, in the UI one needs to refresh the page to see the progress.

Can this be polled so that it refreshes automatically?

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: How to give name to Spark jobs shown in Spark UI

2016-07-27 Thread unk1102
Thanks Rahul, but I think you didn't read the question properly. I have one
main Spark job which I name using the approach you described. As part of the
main Spark job I create multiple threads, which essentially become child Spark
jobs, and those jobs have no direct way of being named.
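
A minimal sketch of one possible workaround, assuming the public
SparkContext.setJobGroup API (its local properties are thread-local, so each
thread can carry its own label into the UI):

import org.apache.spark.{SparkConf, SparkContext}

object ChildJobNaming {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("mainJob"))
    val data = sc.parallelize(1 to 1000)

    // Each thread tags the jobs it submits; the description shows up on the Jobs page.
    val worker = new Thread {
      override def run(): Unit = {
        sc.setJobGroup("xyz-group", "xyz_saveAsTextFile")
        data.map(_ * 2).saveAsTextFile("/tmp/xyz_output")  // hypothetical output path
      }
    }
    worker.start()

    sc.setJobGroup("abc-group", "abc_count")
    println(data.count())  // listed under the "abc_count" description in the UI

    worker.join()
    sc.stop()
  }
}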

On Jul 27, 2016 11:17, "rahulkumar-aws [via Apache Spark User List]" <
ml-node+s1001560n27414...@n3.nabble.com> wrote:

> You can set name in SparkConf() or if You are using Spark submit set
> --name flag
>
> *val sparkconf = new SparkConf()*
> * .setMaster("local[4]")*
> * .setAppName("saveFileJob")*
> *val sc = new SparkContext(sparkconf)*
>
>
> or spark-submit :
>
> *./bin/spark-submit --name "FileSaveJob" --master local[4]  fileSaver.jar*
>
>
>
>
> On Mon, Jul 25, 2016 at 9:46 PM, neil90 [via Apache Spark User List] <[hidden
> email] <http:///user/SendEmail.jtp?type=node=27414=0>> wrote:
>
>> As far as I know you can give a name to the SparkContext. I recommend
>> using a cluster monitoring tool like Ganglia to determine were its slow in
>> your spark jobs.
>>
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27406.html
>> To start a new topic under Apache Spark User List, email [hidden email]
>> <http:///user/SendEmail.jtp?type=node=27414=1>
>> To unsubscribe from Apache Spark User List, click here.
>> NAML
>> <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
> Software Developer Sigmoid (SigmoidAnalytics), India
>
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27414.html
> To unsubscribe from How to give name to Spark jobs shown in Spark UI, click
> here
> <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code=27400=dW1lc2gua2FjaGFAZ21haWwuY29tfDI3NDAwfDEwMTUyMzU4ODk=>
> .
> NAML
> <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27415.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to give name to Spark jobs shown in Spark UI

2016-07-26 Thread rahulkumar-aws
You can set the name in SparkConf(), or if you are using spark-submit, set the
--name flag:

val sparkconf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("saveFileJob")
val sc = new SparkContext(sparkconf)


or spark-submit:

./bin/spark-submit --name "FileSaveJob" --master local[4] fileSaver.jar




On Mon, Jul 25, 2016 at 9:46 PM, neil90 [via Apache Spark User List] <
ml-node+s1001560n27406...@n3.nabble.com> wrote:

> As far as I know you can give a name to the SparkContext. I recommend
> using a cluster monitoring tool like Ganglia to determine were its slow in
> your spark jobs.
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27406.html
> To start a new topic under Apache Spark User List, email
> ml-node+s1001560n1...@n3.nabble.com
> To unsubscribe from Apache Spark User List, click here
> <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code=1=cmFodWxrdW1hci5hd3NAZ21haWwuY29tfDF8LTEzNTczMzg4MjQ=>
> .
> NAML
> <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




-
Software Developer
Sigmoid (SigmoidAnalytics), India

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27414.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark Jobs not getting shown in Spark UI browser

2016-07-26 Thread Prashant verma
Hi All,
 I have recently started using Spark 1.6.2 for running my Spark jobs. But now
my jobs are not getting shown in the Spark browser UI, even though the job is
running fine, which I can see in the shell output.

Any suggestions.

Thanks,
Prashant Verma


Re: How to give name to Spark jobs shown in Spark UI

2016-07-23 Thread Andrew Ehrlich
As far as I know, the best you can do is refer to the Actions by line number.

> On Jul 23, 2016, at 8:47 AM, unk1102 <umesh.ka...@gmail.com> wrote:
> 
> Hi I have multiple child spark jobs run at a time. Is there any way to name
> these child spark jobs so I can identify slow running ones. For e. g.
> xyz_saveAsTextFile(),  abc_saveAsTextFile() etc please guide. Thanks in
> advance. 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



How to give name to Spark jobs shown in Spark UI

2016-07-23 Thread unk1102
Hi, I have multiple child Spark jobs running at a time. Is there any way to
name these child Spark jobs so I can identify slow-running ones? For e.g.
xyz_saveAsTextFile(), abc_saveAsTextFile(), etc. Please guide. Thanks in
advance. 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
Ok, so those line numbers in our DAG don't refer to our code. Is there any
way to display (or calculate) line numbers that refer to code we actually
wrote, or is that only possible in Scala Spark?
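
At least on the Scala side there is a public SparkContext.setCallSite(shortCallSite)
that overrides the call-site label Spark attaches to jobs and stages, so the UI
can show your own step names instead of PythonRDD.scala line numbers; whether
the same hook is exposed in PySpark is not something this sketch claims. A
minimal Scala illustration:

import org.apache.spark.{SparkConf, SparkContext}

object CallSiteDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("callSiteDemo"))
    val numbers = sc.parallelize(1 to 100)

    // Label the next action with our own name instead of the auto-detected call site.
    sc.setCallSite("parse CTR records")
    println(numbers.map(_ * 2).sum())

    sc.clearCallSite()  // back to automatic call-site detection
    sc.stop()
  }
}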

On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski  wrote:

> Hi,
>
> My little understanding of Python-Spark bridge is that at some point
> the python code communicates over the wire with Spark's backbone that
> includes PythonRDD [1].
>
> When the CallSite can't be computed, it's null:-1 to denote "nothing
> could be referred to".
>
> [1]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Jul 21, 2016 at 8:36 PM, C. Josephson  wrote:
> >> It's called a CallSite that shows where the line comes from. You can see
> >> the code yourself given the python file and the line number.
> >
> >
> > But that's what I don't understand. Which python file? We spark submit
> one
> > file called ctr_parsing.py, but it only has 150 lines. So what is
> > MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py
> imports a
> > number of support functions we wrote, but how do we know which python
> file
> > to look at?
> >
> > Furthermore, what on earth is null:-1 referring to?
>



-- 
Colleen Josephson
Engineering Researcher
Uhana, Inc.


Re: Understanding Spark UI DAGs

2016-07-21 Thread RK Aduri
That -1 is coming from here:

PythonRDD.writeIteratorToStream(inputIterator, dataOut)
dataOut.writeInt(SpecialLengths.END_OF_DATA_SECTION)  // val END_OF_DATA_SECTION = -1
dataOut.writeInt(SpecialLengths.END_OF_STREAM)
dataOut.flush()

> On Jul 21, 2016, at 12:24 PM, Jacek Laskowski  wrote:
> 
> Hi,
> 
> My little understanding of Python-Spark bridge is that at some point
> the python code communicates over the wire with Spark's backbone that
> includes PythonRDD [1].
> 
> When the CallSite can't be computed, it's null:-1 to denote "nothing
> could be referred to".
> 
> [1] 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
> On Thu, Jul 21, 2016 at 8:36 PM, C. Josephson  wrote:
>>> It's called a CallSite that shows where the line comes from. You can see
>>> the code yourself given the python file and the line number.
>> 
>> 
>> But that's what I don't understand. Which python file? We spark submit one
>> file called ctr_parsing.py, but it only has 150 lines. So what is
>> MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a
>> number of support functions we wrote, but how do we know which python file
>> to look at?
>> 
>> Furthermore, what on earth is null:-1 referring to?
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-- 
Collective[i] dramatically improves sales and marketing performance using 
technology, applications and a revolutionary network designed to provide 
next generation analytics and decision-support directly to business users. 
Our goal is to maximize human potential and minimize mistakes. In most 
cases, the results are astounding. We cannot, however, stop emails from 
sometimes being sent to the wrong person. If you are not the intended 
recipient, please notify us by replying to this email's sender and deleting 
it (and any attachments) permanently from your system. If you are, please 
respect the confidentiality of this communication's contents.


Re: Understanding Spark UI DAGs

2016-07-21 Thread Jacek Laskowski
Hi,

My little understanding of Python-Spark bridge is that at some point
the python code communicates over the wire with Spark's backbone that
includes PythonRDD [1].

When the CallSite can't be computed, it's null:-1 to denote "nothing
could be referred to".

[1] 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Jul 21, 2016 at 8:36 PM, C. Josephson  wrote:
>> It's called a CallSite that shows where the line comes from. You can see
>> the code yourself given the python file and the line number.
>
>
> But that's what I don't understand. Which python file? We spark submit one
> file called ctr_parsing.py, but it only has 150 lines. So what is
> MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a
> number of support functions we wrote, but how do we know which python file
> to look at?
>
> Furthermore, what on earth is null:-1 referring to?

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
>
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.
>

But that's what I don't understand. Which python file? We spark submit one
file called ctr_parsing.py, but it only has 150 lines. So what is
MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a
number of support functions we wrote, but how do we know which python file
to look at?

Furthermore, what on earth is null:-1 referring to?


Re: Understanding Spark UI DAGs

2016-07-21 Thread Jacek Laskowski
On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson  wrote:

> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?
>

Every RDD has its identifier (as id attribute) within a SparkContext (which
is the broadest scope an RDD can belong to). In this case, it means you've
already created 806 RDDs (counting from 0).


> 2.) What code is "RDD at PythonRDD.scala:43" referring to? Is there any
> way to tie this back to lines of code we've written in pyspark?
>

It's called a CallSite that shows where the line comes from. You can see
the code yourself given the python file and the line number.

Jacek


spark UI what does storage memory x/y mean

2016-07-11 Thread Andy Davidson
My streaming app is running into problems: it seems to slow down over time.
How should I interpret the storage memory column? I wonder if I have a GC
problem. Any idea how I can get GC stats?

Thanks

Andy

Executors (3)
* Memory: 9.4 GB Used (1533.4 MB Total)
* Disk: 0.0 B Used
Executor ID: 0
  Address: ip-172-31-23-202.us-west-1.compute.internal:52456
  RDD Blocks: 2860 | Storage Memory: 4.7 GB / 511.1 MB | Disk Used: 0.0 B
  Tasks (active / failed / complete / total): 0 / 401 / 349579 / 349980 | Task Time: 5.37 h
  Input: 72.9 GB | Shuffle Read: 84.0 B | Shuffle Write: 5.9 MB
  Logs: stdout, stderr | Thread Dump

Executor ID: 1
  Address: ip-172-31-23-200.us-west-1.compute.internal:51609
  RDD Blocks: 2854 | Storage Memory: 4.6 GB / 511.1 MB | Disk Used: 0.0 B
  Tasks (active / failed / complete / total): 0 / 411 / 349365 / 349776 | Task Time: 5.42 h
  Input: 72.6 GB | Shuffle Read: 142.0 B | Shuffle Write: 5.9 MB
  Logs: stdout, stderr | Thread Dump

Executor ID: driver
  Address: 172.31.23.203:48018
  RDD Blocks: 0 | Storage Memory: 0.0 B / 511.1 MB | Disk Used: 0.0 B
  Task Time: 0 ms | Input: 0.0 B | Shuffle Read: 0.0 B | Shuffle Write: 0.0 B
  Thread Dump




Re: Spark UI shows finished when job had an error

2016-06-17 Thread Mich Talebzadeh
The Spark GUI runs by default on port 4040, and if a job crashes (assuming you
meant there was an issue with spark-submit), then the GUI will disconnect.

The GUI is not there for diagnostics; it reports statistics. My inclination
would be to look at the YARN log files (assuming you are using YARN as your
resource manager) or at the output from spark-submit that you piped to a
file.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 June 2016 at 14:49, Sumona Routh <sumos...@gmail.com> wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However ,the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Re: Spark UI shows finished when job had an error

2016-06-17 Thread Gourav Sengupta
Hi,

Can you please see the query plan (in case you are using a query)?

There is a very high chance that the query was broken into multiple steps
and only a subsequent step failed.


Regards,
Gourav Sengupta

On Fri, Jun 17, 2016 at 2:49 PM, Sumona Routh <sumos...@gmail.com> wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However ,the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Re: Spark UI shows finished when job had an error

2016-06-17 Thread Jacek Laskowski
Hi,

How do you access Cassandra? Could that connector not have sent a
SparkListenerEvent to inform about failure?

Jacek
On 17 Jun 2016 3:50 p.m., "Sumona Routh" <sumos...@gmail.com> wrote:

> Hi there,
> Our Spark job had an error (specifically the Cassandra table definition
> did not match what was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However ,the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark job?
>
> We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
> shortly.
>
> Thanks!
> Sumona
>


Spark UI shows finished when job had an error

2016-06-17 Thread Sumona Routh
Hi there,
Our Spark job had an error (specifically, the Cassandra table definition did
not match what was in Cassandra), which threw an exception that was logged to
our spark-submit log.
However, the UI never showed any failed stage or job. It appeared as if the
job finished without error, which is not correct.

We are trying to define our monitoring for our scheduled jobs, and we
intended to use the Spark UI to catch issues. Can we explain why the UI
would not report an exception like this? Is there a better approach we
should use for tracking failures in a Spark job?

We are currently on 1.2 standalone, however we do intend to upgrade to 1.6
shortly.

Thanks!
Sumona
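
One possible belt-and-braces approach for the monitoring question — a sketch
only, not a guaranteed fix for the Cassandra connector case — is to register a
SparkListener and log any stage that completes with a failure reason,
independently of what the UI shows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Logs every stage that finishes with a failure reason.
class StageFailureLogger extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    info.failureReason.foreach { reason =>
      System.err.println(s"Stage ${info.stageId} (${info.name}) failed: $reason")
    }
  }
}

object MonitoredJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("monitoredJob"))
    sc.addSparkListener(new StageFailureLogger)  // register before running any jobs
    // ... job logic goes here ...
    sc.stop()
  }
}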


Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Nirav Patel
I think it does, because the user doesn't see their application logic and
flow the way Spark internals do. Of course we follow general guidelines for
performance, but we shouldn't really have to care how exactly Spark decides to
execute the DAG; the Spark scheduler or core can keep changing over time to
optimize it. So optimizing from the user's perspective means looking at which
transformations they are using and what they are doing inside those
transformations. If users had some transparency from the framework on how
those transformations are utilizing resources over time, or where they are
failing, we could optimize better. That way we stay focused on our application
logic rather than on what the framework is doing underneath.

About the solution: doesn't the Spark driver (SparkContext + event listener)
have knowledge of every job, task set and task and their current state? The
Spark UI can relate a job to a stage to a task, so why not a stage to a
transformation?

Again, my real point is to assess this as a requirement from the users' and
stakeholders' perspective, regardless of the technical challenge.

Thanks
Nirav

On Wed, May 25, 2016 at 8:04 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> But when you talk about optimizing the DAG, it really doesn't make sense
> to also talk about transformation steps as separate entities.  The
> DAGScheduler knows about Jobs, Stages, TaskSets and Tasks.  The
> TaskScheduler knows about TaskSets ad Tasks.  Neither of them understands
> the transformation steps that you used to define your RDD -- at least not
> as separable, distinct steps.  To give the kind of
> transformation-step-oriented information that you want would require parts
> of Spark that don't currently concern themselves at all with RDD
> transformation steps to start tracking them and how they map to Jobs,
> Stages, TaskSets and Tasks -- and when you start talking about Datasets and
> Spark SQL, you then needing to start talking about tracking and mapping
> concepts like Plans, Schemas and Queries.  It would introduce significant
> new complexity.
>
> On Wed, May 25, 2016 at 6:59 PM, Nirav Patel <npa...@xactlycorp.com>
> wrote:
>
>> Hi Mark,
>>
>> I might have said stage instead of step in my last statement "UI just
>> says Collect failed but in fact it could be any stage in that lazy chain of
>> evaluation."
>>
>> Anyways even you agree that this visibility of underlaying steps wont't
>> be available. which does pose difficulties in terms of troubleshooting as
>> well as optimizations at step level. I think users will have hard time
>> without this. Its great that spark community working on different levels of
>> internal optimizations but its also important to give enough visibility
>> to users to enable them to debug issues and resolve bottleneck.
>> There is also no visibility into how spark utilizes shuffle memory space
>> vs user memory space vs cache space. It's a separate topic though. If
>> everything is working magically as a black box then it's fine but when you
>> have large number of people on this site complaining about  OOM and shuffle
>> error all the time you need to start providing some transparency to
>> address that.
>>
>> Thanks
>>
>>
>> On Wed, May 25, 2016 at 6:41 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> You appear to be misunderstanding the nature of a Stage.  Individual
>>> transformation steps such as `map` do not define the boundaries of Stages.
>>> Rather, a sequence of transformations in which there is only a
>>> NarrowDependency between each of the transformations will be pipelined into
>>> a single Stage.  It is only when there is a ShuffleDependency that a new
>>> Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
>>> With whole stage code gen in Spark 2.0, there will be even less opportunity
>>> to treat individual transformations within a sequence of narrow
>>> dependencies as though they were discrete, separable entities.  The Failed
>>> Stages portion of the Web UI will tell you which Stage in a Job failed, and
>>> the accompanying error log message will generally also give you some idea
>>> of which Tasks failed and why.  Tracing the error back further and at a
>>> different level of abstraction to lay blame on a particular transformation
>>> wouldn't be particularly easy.
>>>
>>> On Wed, May 25, 2016 at 5:28 PM, Nirav Patel <npa...@xactlycorp.com>
>>> wrote:
>>>
>>>> It's great that spark scheduler does optimized DAG processing and only
>>>> does lazy eval when some action is performed or shuffle dependency is
>>>> encountered. Sometime it goes further after shuffle dep bef

Re: Spark UI doesn't give visibility on which stage job actually failed (due to lazy eval nature)

2016-05-25 Thread Mark Hamstra
But when you talk about optimizing the DAG, it really doesn't make sense to
also talk about transformation steps as separate entities.  The
DAGScheduler knows about Jobs, Stages, TaskSets and Tasks.  The
TaskScheduler knows about TaskSets and Tasks.  Neither of them understands
the transformation steps that you used to define your RDD -- at least not
as separable, distinct steps.  To give the kind of
transformation-step-oriented information that you want would require parts
of Spark that don't currently concern themselves at all with RDD
transformation steps to start tracking them and how they map to Jobs,
Stages, TaskSets and Tasks -- and when you start talking about Datasets and
Spark SQL, you then need to start talking about tracking and mapping
concepts like Plans, Schemas and Queries.  It would introduce significant
new complexity.
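
To make the stage-boundary point concrete — a sketch, not something from the
thread itself — RDD.toDebugString prints the lineage with extra indentation at
each shuffle boundary, which is roughly where the pipelined stages split:

import org.apache.spark.{SparkConf, SparkContext}

object StageBoundaryDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("stageBoundaryDemo"))
    val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))

    // map has a narrow dependency and is pipelined into the shuffle-map stage;
    // reduceByKey introduces a ShuffleDependency and therefore a new stage.
    val counts = words
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .filter(_._2 > 1)

    // Indentation levels in the lineage mark the shuffle (stage) boundaries.
    println(counts.toDebugString)

    counts.collect().foreach(println)  // "collect at ..." is how the UI names the job
    sc.stop()
  }
}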

On Wed, May 25, 2016 at 6:59 PM, Nirav Patel <npa...@xactlycorp.com> wrote:

> Hi Mark,
>
> I might have said stage instead of step in my last statement "UI just
> says Collect failed but in fact it could be any stage in that lazy chain of
> evaluation."
>
> Anyways even you agree that this visibility of underlaying steps wont't be
> available. which does pose difficulties in terms of troubleshooting as well
> as optimizations at step level. I think users will have hard time without
> this. Its great that spark community working on different levels of
> internal optimizations but its also important to give enough visibility
> to users to enable them to debug issues and resolve bottleneck.
> There is also no visibility into how spark utilizes shuffle memory space
> vs user memory space vs cache space. It's a separate topic though. If
> everything is working magically as a black box then it's fine but when you
> have large number of people on this site complaining about  OOM and shuffle
> error all the time you need to start providing some transparency to
> address that.
>
> Thanks
>
>
> On Wed, May 25, 2016 at 6:41 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> You appear to be misunderstanding the nature of a Stage.  Individual
>> transformation steps such as `map` do not define the boundaries of Stages.
>> Rather, a sequence of transformations in which there is only a
>> NarrowDependency between each of the transformations will be pipelined into
>> a single Stage.  It is only when there is a ShuffleDependency that a new
>> Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
>> With whole stage code gen in Spark 2.0, there will be even less opportunity
>> to treat individual transformations within a sequence of narrow
>> dependencies as though they were discrete, separable entities.  The Failed
>> Stages portion of the Web UI will tell you which Stage in a Job failed, and
>> the accompanying error log message will generally also give you some idea
>> of which Tasks failed and why.  Tracing the error back further and at a
>> different level of abstraction to lay blame on a particular transformation
>> wouldn't be particularly easy.
>>
>> On Wed, May 25, 2016 at 5:28 PM, Nirav Patel <npa...@xactlycorp.com>
>> wrote:
>>
>>> It's great that spark scheduler does optimized DAG processing and only
>>> does lazy eval when some action is performed or shuffle dependency is
>>> encountered. Sometime it goes further after shuffle dep before executing
>>> anything. e.g. if there are map steps after shuffle then it doesn't stop at
>>> shuffle to execute anything but goes to that next map steps until it finds
>>> a reason(spark action) to execute. As a result stage that spark is running
>>> can be internally series of (map -> shuffle -> map -> map -> collect) and
>>> spark UI just shows its currently running 'collect' stage. SO  if job fails
>>> at that point spark UI just says Collect failed but in fact it could be any
>>> stage in that lazy chain of evaluation. Looking at executor logs gives some
>>> insights but that's not always straightforward.
>>> Correct me if I am wrong here but I think we need more visibility into
>>> what's happening underneath so we can easily troubleshoot as well as
>>> optimize our DAG.
>>>
>>> THanks
>>>
>>>
>>>
>>> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>>>
>>> <https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn]
>>> <https://www.linkedin.com/company/xactly-corporation>  [image: Twitter]
>>> <https://twitter.com/Xactly>  [image: Facebook]
>>> <https://www.facebook.com/XactlyCorp>  [image: YouTube]
>>> <http://www.youtube.com/xactlycorporation>
>>
>>
>>
>
>
>
> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>
> <https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn]
> <https://www.linkedin.com/company/xactly-corporation>  [image: Twitter]
> <https://twitter.com/Xactly>  [image: Facebook]
> <https://www.facebook.com/XactlyCorp>  [image: YouTube]
> <http://www.youtube.com/xactlycorporation>
>

