[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem

2018-02-20 Thread Gil Vernik (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370089#comment-16370089
 ] 

Gil Vernik commented on SPARK-10912:


This patch needs to be generic and should not hard-code "s3a"; instead it should 
take the scheme names from configuration. That way other connectors will benefit 
from this patch as well. [~srowen] how does this sound?

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
> Key: SPARK-10912
> URL: https://issues.apache.org/jira/browse/SPARK-10912
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.5.0
>Reporter: Yongjia Wang
>Priority: Minor
> Attachments: s3a_metrics.patch
>
>
> org.apache.spark.executor.ExecutorSource has 2 filesystem metrics: 
> "hdfs" and "file". I started using S3 as the persistent storage with a Spark 
> standalone cluster in EC2, and S3 read/write metrics do not appear anywhere. 
> The 'file' metric appears to cover only the driver reading local files. It 
> would also be nice to report shuffle read/write metrics, which would help with 
> optimization.
> I think these 2 things (S3 and shuffle) are very useful and would cover all 
> the missing information about Spark IO, especially for an S3 setup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem

2018-02-10 Thread Harel Ben Attia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359810#comment-16359810
 ] 

Harel Ben Attia commented on SPARK-10912:
-

We would really be glad to see this happen as well, without needing to change 
Spark's source code.

Also, externalizing the array to a configuration property in metrics.properties 
would be best (or auto-supporting every FileSystem scheme in use, obviously, but 
that might require bigger changes to the registration logic, so it's not 
necessary).
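For concreteness, the externalized property could look something like the fragment below. The key name and its placement in metrics.properties are assumptions; no such setting exists in Spark today, which is the point of this request:

```
# Hypothetical entry in metrics.properties (name and location assumed):
# comma-separated FileSystem schemes to expose as executor.filesystem.* metrics
executor.filesystem.schemes=file,hdfs,s3a,gs
```

Auto-discovering schemes from FileSystem.getAllStatistics would avoid even this, at the cost of registering gauges lazily as schemes first appear.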




[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem

2016-11-05 Thread Yongjia Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638836#comment-15638836
 ] 

Yongjia Wang commented on SPARK-10912:
--

s3a and hdfs are different "schemes" in Hadoop's FileSystem.Statistics.
I think it is Spark's responsibility to choose what to report, and currently 
only "hdfs" and "file" are reported.
I have been building Spark with the attached s3a_metrics.patch in order to get 
the s3a metrics reported. I'm not sure whether there is a way to report s3a 
metrics through configuration alone (without changing Spark's source, as was 
done in the attached patch file).
Now I need to add GoogleHadoopFileSystem's "gs" metrics as well; please advise 
on the best approach.
Thank you.
