[jira] [Issue Comment Deleted] (SPARK-26193) Implement shuffle write metrics in SQL

Reynold Xin (JIRA) Mon, 10 Dec 2018 13:57:41 -0800


     [ https://issues.apache.org/jira/browse/SPARK-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reynold Xin updated SPARK-26193:
--------------------------------
    Comment: was deleted

(was: rxin commented on a change in pull request #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQL
URL: https://github.com/apache/spark/pull/23207#discussion_r240394461
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala
 ##########
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.MapStatus
+
+/**
+ * The interface for customizing shuffle write process. The driver create a ShuffleWriteProcessor
+ * and put it into [[ShuffleDependency]], and executors use it in each ShuffleMapTask.
+ */
+private[spark] class ShuffleWriteProcessor extends Serializable with Logging {
+
+  /**
+   * Create a [[ShuffleWriteMetricsReporter]] from the task context. As the reporter is a
+   * per-row operator, here need a careful consideration on performance.
+   */
+  protected def createMetricsReporter(context: TaskContext): ShuffleWriteMetricsReporter = {
+    context.taskMetrics().shuffleWriteMetrics
+  }
+
+  /**
+   * The write process for particular partition, it controls the life circle of [[ShuffleWriter]]
+   * get from [[ShuffleManager]] and triggers rdd compute, finally return the [[MapStatus]] for
+   * this task.
+   */
+  def writeProcess(
 
 Review comment:
   a nit: it's weird to call this "writeProcess". Maybe just "write", or just "process".
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
)

> Implement shuffle write metrics in SQL
> --------------------------------------
>
>                 Key: SPARK-26193
>                 URL: https://issues.apache.org/jira/browse/SPARK-26193
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

