[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718368#comment-16718368 ] ASF GitHub Bot commented on SPARK-26193: asfgit closed pull request #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
index 2a8d1dd995e27..35664ff515d4b 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
@@ -92,7 +92,7 @@ private[spark] class ShuffleMapTask(
       threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
     } else 0L
 
-    dep.shuffleWriterProcessor.writeProcess(rdd, dep, partitionId, context, partition)
+    dep.shuffleWriterProcessor.write(rdd, dep, partitionId, context, partition)
   }
 
   override def preferredLocations: Seq[TaskLocation] = preferredLocs
diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
index f5213157a9a85..5b0c7e9f2b0b4 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
@@ -41,7 +41,7 @@ private[spark] class ShuffleWriteProcessor extends Serializable with Logging {
    * get from [[ShuffleManager]] and triggers rdd compute, finally return the [[MapStatus]] for
    * this task.
    */
-  def writeProcess(
+  def write(
       rdd: RDD[_],
       dep: ShuffleDependency[_, _, _],
       partitionId: Int,
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
index 9b05faaed0459..079ff25fcb67e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
@@ -22,7 +22,7 @@ import java.util.Arrays
 import org.apache.spark._
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.execution.metric.{SQLMetric, SQLShuffleMetricsReporter}
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLShuffleReadMetricsReporter}
 
 /**
  * The [[Partition]] used by [[ShuffledRowRDD]]. A post-shuffle partition
@@ -157,9 +157,9 @@ class ShuffledRowRDD(
   override def compute(split: Partition, context: TaskContext): Iterator[InternalRow] = {
     val shuffledRowPartition = split.asInstanceOf[ShuffledRowRDDPartition]
     val tempMetrics = context.taskMetrics().createTempShuffleReadMetrics()
-    // `SQLShuffleMetricsReporter` will update its own metrics for SQL exchange operator,
+    // `SQLShuffleReadMetricsReporter` will update its own metrics for SQL exchange operator,
     // as well as the `tempMetrics` for basic shuffle metrics.
-    val sqlMetricsReporter = new SQLShuffleMetricsReporter(tempMetrics, metrics)
+    val sqlMetricsReporter = new SQLShuffleReadMetricsReporter(tempMetrics, metrics)
     // The range of pre-shuffle partitions that we are fetching at here is
     // [startPreShufflePartitionIndex, endPreShufflePartitionIndex - 1].
     val reader =
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
index 0c2020572e721..da7b0c6f43fbc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
@@ -31,7 +31,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, BoundReference, Uns
 import org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics, SQLShuffleMetricsReporter, SQLShuffleWriteMetricsReporter}
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics, SQLShuffleReadMetricsReporter, SQLShuffleWriteMetricsReporter}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.util.MutablePair
@@ -50,7 +50,7 @@ case class ShuffleExchangeExec(
   private lazy val writeMetrics =
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718254#comment-16718254 ] ASF GitHub Bot commented on SPARK-26193: rxin commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446413488 LGTM too This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement shuffle write metrics in SQL > -- > > Key: SPARK-26193 > URL: https://issues.apache.org/jira/browse/SPARK-26193 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717571#comment-16717571 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446284279 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99971/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717573#comment-16717573 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446284258 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717575#comment-16717575 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446284279 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99971/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717559#comment-16717559 ] ASF GitHub Bot commented on SPARK-26193: SparkQA commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446283476 **[Test build #99971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99971/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717588#comment-16717588 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446285210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99972/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717587#comment-16717587 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446285205 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717584#comment-16717584 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446285205 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717585#comment-16717585 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446285210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99972/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717576#comment-16717576 ] ASF GitHub Bot commented on SPARK-26193: SparkQA removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191510 **[Test build #99972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99972/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af).
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717570#comment-16717570 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446284258 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717572#comment-16717572 ] ASF GitHub Bot commented on SPARK-26193: SparkQA commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446284495 **[Test build #99972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99972/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717567#comment-16717567 ] ASF GitHub Bot commented on SPARK-26193: SparkQA removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446189996 **[Test build #99971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99971/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af).
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717064#comment-16717064 ] ASF GitHub Bot commented on SPARK-26193: SparkQA commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191510 **[Test build #99972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99972/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af).
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717062#comment-16717062 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191027 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5972/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717061#comment-16717061 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191023 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717058#comment-16717058 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191023 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717059#comment-16717059 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446191027 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5972/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717047#comment-16717047 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446190094 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5971/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717048#comment-16717048 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446190084 Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717049#comment-16717049 ] ASF GitHub Bot commented on SPARK-26193: AmplabJenkins removed a comment on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes URL: https://github.com/apache/spark/pull/23286#issuecomment-446190094 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5971/ Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717046#comment-16717046 ]

ASF GitHub Bot commented on SPARK-26193:

AmplabJenkins commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes
URL: https://github.com/apache/spark/pull/23286#issuecomment-446190084

Merged build finished. Test PASSed.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717044#comment-16717044 ]

ASF GitHub Bot commented on SPARK-26193:

SparkQA commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes
URL: https://github.com/apache/spark/pull/23286#issuecomment-446189996

**[Test build #99971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99971/testReport)** for PR 23286 at commit [`8d6dac9`](https://github.com/apache/spark/commit/8d6dac92d8787dbd7f9b576545a6fbe6999861af).
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717043#comment-16717043 ]

ASF GitHub Bot commented on SPARK-26193:

cloud-fan commented on issue #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes
URL: https://github.com/apache/spark/pull/23286#issuecomment-446189829

ok to test
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717042#comment-16717042 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking commented on issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#issuecomment-446189428

Thanks all for your review.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717040#comment-16717040 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r239312090

##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
     val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions
     testSparkPlanMetrics(df, 1, Map(
       2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))),
+      1L -> (("Exchange", Map(
+        "shuffle records written" -> 2L,
+        "records read" -> 2L,
+        "local blocks fetched" -> 2L,

Review comment:
Copy, the display text will be done in another pr. Done in #23286.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717039#comment-16717039 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r239995006

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
##
@@ -38,13 +38,21 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
   private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
+  private lazy val writeMetrics =
+    SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
+  private lazy val readMetrics =
+    SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)

Review comment:
Yea, that was done and then reverted in https://github.com/apache/spark/pull/23207/commits/7d104ebe854effb3d8ceb63cd408b9749cee1a8a; I will move it to a separate PR after this one. Done in #23286.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717037#comment-16717037 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240587659

##
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.MapStatus
+
+/**
+ * The interface for customizing shuffle write process. The driver create a ShuffleWriteProcessor
+ * and put it into [[ShuffleDependency]], and executors use it in each ShuffleMapTask.
+ */
+private[spark] class ShuffleWriteProcessor extends Serializable with Logging {
+
+  /**
+   * Create a [[ShuffleWriteMetricsReporter]] from the task context. As the reporter is a
+   * per-row operator, here need a careful consideration on performance.
+   */
+  protected def createMetricsReporter(context: TaskContext): ShuffleWriteMetricsReporter = {
+    context.taskMetrics().shuffleWriteMetrics
+  }
+
+  /**
+   * The write process for particular partition, it controls the life circle of [[ShuffleWriter]]
+   * get from [[ShuffleManager]] and triggers rdd compute, finally return the [[MapStatus]] for
+   * this task.
+   */
+  def writeProcess(

Review comment:
Copy, will change this to `write`. Done in #23286.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717034#comment-16717034 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240587720

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
##
@@ -95,3 +96,57 @@ private[spark] object SQLShuffleMetricsReporter {
     FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"),
     RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators.
+ * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ */
+private[spark] class SQLShuffleWriteMetricsReporter(

Review comment:
Thanks, will do it in follow up pr. Done in #23286.
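The doc comment quoted in this hunk describes a reporter that updates another reporter in addition to the SQL metrics of the current plan. The following is only a minimal, self-contained sketch of that wrapping idea and not the code from the pull request: `WriteMetricsReporter`, `TaskWriteMetrics`, `Counter`, `SqlWriteMetricsReporter` and the two map keys are hypothetical stand-ins chosen so the example runs without Spark on the classpath.

```scala
// Hypothetical stand-in for Spark's ShuffleWriteMetricsReporter (assumption, not the real trait).
trait WriteMetricsReporter {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

// Plays the role of the task-level shuffle write metrics.
final class TaskWriteMetrics extends WriteMetricsReporter {
  private var bytes = 0L
  private var records = 0L
  override def incBytesWritten(v: Long): Unit = bytes += v
  override def incRecordsWritten(v: Long): Unit = records += v
  def bytesWritten: Long = bytes
  def recordsWritten: Long = records
}

// Simple accumulator standing in for a SQL metric shown on an operator.
final class Counter {
  private var total = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

// Wraps another reporter: every update goes to the wrapped reporter and to the
// operator-level counters, so both views stay in sync.
final class SqlWriteMetricsReporter(
    underlying: WriteMetricsReporter,
    metrics: Map[String, Counter]) extends WriteMetricsReporter {
  // Resolve the counters once, because the inc* methods may be called per record.
  private val bytesMetric = metrics("shuffleBytesWritten")
  private val recordsMetric = metrics("shuffleRecordsWritten")

  override def incBytesWritten(v: Long): Unit = {
    underlying.incBytesWritten(v)
    bytesMetric.add(v)
  }

  override def incRecordsWritten(v: Long): Unit = {
    underlying.incRecordsWritten(v)
    recordsMetric.add(v)
  }
}
```

Looking up the counters in the constructor rather than inside each `inc*` call keeps the per-record path free of map lookups, which is one way to respect the performance concern raised for per-row reporters in this thread.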
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717030#comment-16717030 ]

ASF GitHub Bot commented on SPARK-26193:

xuanyuanking opened a new pull request #23286: [SPARK-26193][SQL][Follow Up] Read metrics rename and display text changes
URL: https://github.com/apache/spark/pull/23286

## What changes were proposed in this pull request?

Follow-up PR for #23207, including the following changes:

- Rename `SQLShuffleMetricsReporter` to `SQLShuffleReadMetricsReporter` so that it matches the write-side naming.
- Change the read-side display text for naming consistency.
- Rename the function in `ShuffleWriteProcessor` (`writeProcess` becomes `write`).
- Delete `private[spark]` in the execution package.

## How was this patch tested?

Existing tests.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716230#comment-16716230 ]

ASF GitHub Bot commented on SPARK-26193:

HyukjinKwon commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240478048

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
##
@@ -95,3 +96,57 @@ private[spark] object SQLShuffleMetricsReporter {
     FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"),
     RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators.
+ * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ */
+private[spark] class SQLShuffleWriteMetricsReporter(

Review comment:
`execution` is already a private package. We don't need to do `private[spark]`.
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715619#comment-16715619 ]

ASF GitHub Bot commented on SPARK-26193:

rxin commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240394461

##
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.MapStatus
+
+/**
+ * The interface for customizing shuffle write process. The driver create a ShuffleWriteProcessor
+ * and put it into [[ShuffleDependency]], and executors use it in each ShuffleMapTask.
+ */
+private[spark] class ShuffleWriteProcessor extends Serializable with Logging {
+
+  /**
+   * Create a [[ShuffleWriteMetricsReporter]] from the task context. As the reporter is a
+   * per-row operator, here need a careful consideration on performance.
+   */
+  protected def createMetricsReporter(context: TaskContext): ShuffleWriteMetricsReporter = {
+    context.taskMetrics().shuffleWriteMetrics
+  }
+
+  /**
+   * The write process for particular partition, it controls the life circle of [[ShuffleWriter]]
+   * get from [[ShuffleManager]] and triggers rdd compute, finally return the [[MapStatus]] for
+   * this task.
+   */
+  def writeProcess(

Review comment:
a nit: it's weird to call this "writeProcess". Maybe just "write", or just "process".
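The quoted class exposes a protected `createMetricsReporter` hook next to the method under discussion, which suggests that subclasses customize only the reporter while the write flow stays shared. As a rough illustration of that extension point (an assumption-laden sketch, not Spark's implementation), the types `MetricsReporter`, `CountingReporter`, `TaskCtx`, `WriteProcessor` and `TeeProcessor` below are all hypothetical, and the method uses the shorter name `write` proposed in this review.

```scala
// Hypothetical minimal reporter interface (assumption; Spark's real trait has more methods).
trait MetricsReporter {
  def incRecordsWritten(n: Long): Unit
}

final class CountingReporter extends MetricsReporter {
  private var count = 0L
  override def incRecordsWritten(n: Long): Unit = count += n
  def records: Long = count
}

// Hypothetical stand-in for the task context that owns the task-level metrics.
final class TaskCtx {
  val writeMetrics = new CountingReporter
}

// Base processor: the write flow is fixed, the reporter comes from an overridable hook.
class WriteProcessor {
  protected def createMetricsReporter(ctx: TaskCtx): MetricsReporter = ctx.writeMetrics

  def write[T](records: Iterator[T], ctx: TaskCtx): Long = {
    // The reporter is created once per task, not once per record.
    val reporter = createMetricsReporter(ctx)
    var written = 0L
    records.foreach { _ =>
      reporter.incRecordsWritten(1L)
      written += 1L
    }
    written
  }
}

// A subclass that overrides only the hook, forwarding every update to an extra reporter as well.
final class TeeProcessor(extra: MetricsReporter) extends WriteProcessor {
  override protected def createMetricsReporter(ctx: TaskCtx): MetricsReporter = {
    val base = super.createMetricsReporter(ctx)
    new MetricsReporter {
      override def incRecordsWritten(n: Long): Unit = {
        base.incRecordsWritten(n)
        extra.incRecordsWritten(n)
      }
    }
  }
}
```

Calling `new TeeProcessor(extra).write(rows, ctx)` in this sketch updates both the task-level reporter and `extra`, without the subclass having to touch the write loop itself.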
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707211#comment-16707211 ]

Apache Spark commented on SPARK-26193:

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/23207
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706892#comment-16706892 ]

Yuanjian Li commented on SPARK-26193:

Yes, adding them in ShuffleExchangeExec gives a clearer implementation; I shouldn't change SparkPlan just for the display issue. Also updated the document and demo.
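The point agreed on in this exchange is to keep the write-side metrics on the same exchange operator that already carries the read-side metrics, rather than changing the base plan class. A minimal sketch of that shape follows, assuming hypothetical names throughout (`ExchangeMetricsSketch`, `Metric`, `ExchangeNode`, and the map keys); only the three display strings are taken from the test hunk quoted elsewhere in this thread.

```scala
object ExchangeMetricsSketch {
  // Hypothetical stand-in for a SQL metric: a display text plus a mutable value.
  final class Metric(val displayText: String) {
    var value: Long = 0L
  }

  // Read-side metrics; the keys are illustrative.
  def createShuffleReadMetrics(): Map[String, Metric] = Map(
    "recordsRead" -> new Metric("records read"),
    "localBlocksFetched" -> new Metric("local blocks fetched"))

  // Write-side metrics; the key is illustrative.
  def createShuffleWriteMetrics(): Map[String, Metric] = Map(
    "shuffleRecordsWritten" -> new Metric("shuffle records written"))

  // One operator exposes both sides through a single merged map, so a UI that
  // renders `metrics` needs no special case for shuffle writes.
  final class ExchangeNode {
    lazy val writeMetrics: Map[String, Metric] = createShuffleWriteMetrics()
    lazy val readMetrics: Map[String, Metric] = createShuffleReadMetrics()
    lazy val metrics: Map[String, Metric] = readMetrics ++ writeMetrics
  }
}
```

Because the merged map shares the same `Metric` instances as the two side-specific maps, an update made through `writeMetrics` is visible through `metrics` as well.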
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706368#comment-16706368 ]

Reynold Xin commented on SPARK-26193:

Can we simplify it and add those metrics only to the same exchange operator as the read side?
[jira] [Commented] (SPARK-26193) Implement shuffle write metrics in SQL
[ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706359#comment-16706359 ]

Yuanjian Li commented on SPARK-26193:

cc [~smilegator], [~cloud_fan] and [~rxin]: because the writer side of the shuffle metrics needs more changes than the reader side, I have added a sketch design and demo doc to this JIRA. I'll submit a PR soon, once you think the implementation described in the doc is OK. Thanks :)