[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem
[ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370089#comment-16370089 ]

Gil Vernik commented on SPARK-10912:
------------------------------------

This patch needs to be generic and should not hard-code "s3a"; it should take the value from configuration instead. That way other connectors will benefit from this patch as well (see the sketch after the quoted description below). [~srowen] how does this sound?

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>            Priority: Minor
>         Attachments: s3a_metrics.patch
>
> org.apache.spark.executor.ExecutorSource has 2 filesystem metrics: "hdfs" and "file". I started using s3 as the persistent storage with a Spark standalone cluster in EC2, and s3 read/write metrics do not appear anywhere. The "file" metric appears to cover only the driver reading local files; it would be nice to also report shuffle read/write metrics, so they can help with optimization.
> I think these 2 things (s3 and shuffle) are very useful and would cover all the missing information about Spark IO, especially for an s3 setup.
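A minimal sketch of such configuration-driven registration, assuming a hypothetical "spark.executor.filesystemSchemes" key (the key name and the ConfigurableFsMetrics wrapper are invented here for illustration; registerFileSystemStat mirrors the helper ExecutorSource already uses, reproduced so the sketch is self-contained):

import scala.collection.JavaConverters._

import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.SparkConf

class ConfigurableFsMetrics(conf: SparkConf, metricRegistry: MetricRegistry) {

  // Look up the accumulated Hadoop statistics for a given URI scheme.
  private def fileStats(scheme: String): Option[FileSystem.Statistics] =
    FileSystem.getAllStatistics.asScala.find(_.getScheme == scheme)

  // Register one gauge per statistic, falling back to the default value
  // when the filesystem for this scheme has not been touched yet.
  private def registerFileSystemStat[T](
      scheme: String,
      name: String,
      f: FileSystem.Statistics => T,
      defaultValue: T): Unit = {
    metricRegistry.register(MetricRegistry.name("filesystem", scheme, name), new Gauge[T] {
      override def getValue: T = fileStats(scheme).map(f).getOrElse(defaultValue)
    })
  }

  // "spark.executor.filesystemSchemes" is a hypothetical key used only for
  // illustration; the default matches what Spark reports today.
  conf.get("spark.executor.filesystemSchemes", "hdfs,file")
    .split(",").map(_.trim).filter(_.nonEmpty)
    .foreach { scheme =>
      registerFileSystemStat(scheme, "read_bytes", _.getBytesRead, 0L)
      registerFileSystemStat(scheme, "write_bytes", _.getBytesWritten, 0L)
      registerFileSystemStat(scheme, "read_ops", _.getReadOps, 0)
      registerFileSystemStat(scheme, "write_ops", _.getWriteOps, 0)
    }
}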
[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem
[ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359810#comment-16359810 ]

Harel Ben Attia commented on SPARK-10912:
-----------------------------------------

We would really be glad to see this happen as well, without the need to change Spark's source code. Also, externalizing the scheme list to a configuration property in metrics.properties would be best (or auto-supporting each FileSystem scheme in use, obviously, but that might require bigger changes to the registration logic, so it isn't strictly necessary).
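Auto-supporting every scheme in use could, in principle, start from a sketch like the one below: enumerate whatever Hadoop FileSystem statistics already exist instead of iterating over a fixed list. The caveat (and probably the "bigger change" to the registration logic alluded to above) is that statistics for a scheme only appear after the first access to that filesystem, so gauges registered at executor startup could miss schemes touched later:

import scala.collection.JavaConverters._
import org.apache.hadoop.fs.FileSystem

// Discover every scheme that already has accumulated statistics; each
// discovered scheme could then be fed to the registerFileSystemStat
// helper shown in the earlier sketch.
val schemesInUse: Seq[String] =
  FileSystem.getAllStatistics.asScala.map(_.getScheme).distinct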
[jira] [Commented] (SPARK-10912) Improve Spark metrics executor.filesystem
[ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638836#comment-15638836 ]

Yongjia Wang commented on SPARK-10912:
--------------------------------------

s3a and hdfs are different "schemes" in Hadoop's FileSystem.Statistics. I think it is Spark's responsibility to choose what to report, and currently only "hdfs" and "file" are reported. I have been building Spark with the attached s3a_metrics.patch in order to get the s3a metrics reported. I'm not sure whether there is a way to report s3a metrics through configuration alone (without changing Spark source, as the attached patch does). Now I need to add GoogleHadoopFileSystem's "gs" metrics as well; please advise on the best approach. Thank you.
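The attached patch is not reproduced in this thread, but a build-time change of this kind plausibly amounts to extra hard-coded registrations in org.apache.spark.executor.ExecutorSource next to the existing "hdfs" and "file" ones. A hedged sketch of what covering both "s3a" and "gs" might look like, reusing the registerFileSystemStat helper from the first sketch above:

// Sketch only: repeat the same per-scheme block ExecutorSource already
// uses for "hdfs" and "file" for each additional connector scheme.
for (scheme <- Seq("s3a", "gs")) {
  registerFileSystemStat(scheme, "read_bytes", _.getBytesRead, 0L)
  registerFileSystemStat(scheme, "largeRead_ops", _.getLargeReadOps, 0)
  registerFileSystemStat(scheme, "write_bytes", _.getBytesWritten, 0L)
  registerFileSystemStat(scheme, "read_ops", _.getReadOps, 0)
  registerFileSystemStat(scheme, "write_ops", _.getWriteOps, 0)
}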