[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313020#comment-14313020 ]

Patrick Wendell commented on SPARK-5647:

Isn't it just possible to get the file path in the case of file output format, and then read the size of that file? The main challenge I see is how quickly that size becomes visible to the HDFS client.

In general I think it's worth doing because a lot of people still use older versions of the Spark HDFS client, for instance people based on AWS who primarily read from S3 and don't keep up to date with the newest Hadoop APIs.

> Output metrics do not show up for older hadoop versions (< 2.5)
> ---
>
> Key: SPARK-5647
> URL: https://issues.apache.org/jira/browse/SPARK-5647
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Kostas Sakellis
>
> Need to add output metrics for hadoop < 2.5.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312841#comment-14312841 ]

Kay Ousterhout commented on SPARK-5647:
---

Cool thanks [~sandyr]...mostly I was curious because I've done this for my own purposes by recompiling HDFS with these metrics exposed, and was just wondering if there was something simpler.
[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312834#comment-14312834 ]

Sandy Ryza commented on SPARK-5647:
---

Yeah, we would need to check the final file size. But my opinion is that this isn't worth the effort.
[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312818#comment-14312818 ]

Kostas Sakellis commented on SPARK-5647:

I'm not sure if this is possible with older hadoop. Need to do some investigation. We could possibly check the final file size after it has been written? [~sandyr] had some ideas when I talked to him about it.
[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312811#comment-14312811 ]

Kay Ousterhout commented on SPARK-5647:
---

Is this possible? I thought Hadoop didn't add thread-level stats until 2.5 (see the comment here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L156) -- is there a different way you were thinking of adding the output bytes?