[GitHub] [spark] ekoifman commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

GitBox Thu, 15 Jul 2021 17:03:45 -0700


ekoifman commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r670878574




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##########
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
     val desc = if (isLocalReader) {
       "local"
     } else if (hasCoalescedPartition && hasSkewedPartition) {
-      "coalesced and skewed"
+      s"$coalescedDetail and $skewedDetail"
     } else if (hasCoalescedPartition) {
-      "coalesced"
+      coalescedDetail
     } else if (hasSkewedPartition) {
-      "skewed"
+      skewedDetail
     } else {
       ""
     }
     Iterator(desc)
   }
+  private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1
+  /**
+   * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]]
+   */
+  private def coalesceRange(spec: ShufflePartitionSpec) = spec match {
+    case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
+    case _ => 0
+  }
+
+  /* This is left as documentation
+   * Is it worth reporting this?  For example, if we have
+   * MapOutputStatistics 0,0,0,72,0
+   * MapOutputStatistics 0,0,0,138,138
+   * with target partition size 10, we'll have
+   * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partitions 0,1,2 are dropped
+   * Another example, (target size 10)
+   * MapOutputStatistics 0,3,0,2,7
+   * MapOutputStatistics 0,2,0,2,7
+   * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partition 2 is included
+   * We could figure out dropped partitions but doesn't seem that useful.
+   */
+  private def numDroppedPartitions = 0
+
+  private def numCoalescedPartitions = partitionSpecs.count(isCoalesced)
+
+  /**
+   * partitions that will be combined with others (as opposed to taken as is, split, dropped)
+   */
+  private def numPartitionsToCoalesce = partitionSpecs.filter(isCoalesced)
+    .foldLeft(0)((c, s) => c + coalesceRange(s))
+
+  /**
+   * total splits of all skewed partitions
+   */
+  private def skewedPartitionSplits = partitionSpecs.collect {
+    case p: PartialReducerPartitionSpec => p
+  }
 
-  def hasCoalescedPartition: Boolean =
-    partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])

Review comment:
       This was done via https://github.com/apache/spark/pull/32872
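
For readers following the hunk above, here is a minimal sketch of how the counting helpers behave on the two examples given in the block comment. The case classes are simplified, hypothetical stand-ins for Spark's partition specs (the real ones live in Spark and may carry more fields), and the object name CoalesceMetricsSketch is made up for illustration. The helper bodies mirror the diff, with the fold written as map/sum.

```scala
// Simplified, hypothetical stand-ins for Spark's partition spec classes.
sealed trait ShufflePartitionSpec
final case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
final case class PartialReducerPartitionSpec(
    reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
  extends ShufflePartitionSpec

// Hypothetical container; in the PR these are private members of the exec node.
object CoalesceMetricsSketch {
  // Number of pre-shuffle partitions covered by a spec; 0 for non-coalesced specs.
  def coalesceRange(spec: ShufflePartitionSpec): Int = spec match {
    case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
    case _ => 0
  }

  // A spec counts as coalesced only if it covers more than one partition.
  def isCoalesced(spec: ShufflePartitionSpec): Boolean = coalesceRange(spec) > 1

  // How many output partitions are the result of coalescing.
  def numCoalescedPartitions(specs: Seq[ShufflePartitionSpec]): Int =
    specs.count(isCoalesced)

  // How many pre-shuffle partitions were merged into coalesced outputs.
  def numPartitionsToCoalesce(specs: Seq[ShufflePartitionSpec]): Int =
    specs.filter(isCoalesced).map(coalesceRange).sum
}

// First example from the comment: every spec covers a single partition.
val firstExample: Seq[ShufflePartitionSpec] =
  Seq(CoalescedPartitionSpec(3, 4), CoalescedPartitionSpec(4, 5))

// Second example from the comment: the first spec covers partitions 1, 2 and 3.
val secondExample: Seq[ShufflePartitionSpec] =
  Seq(CoalescedPartitionSpec(1, 4), CoalescedPartitionSpec(4, 5))
```

Under these assumptions, the first example reports no coalesced partitions, because each CoalescedPartitionSpec spans exactly one reducer index, while the second reports one coalesced partition built from three pre-shuffle partitions.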



