yabola commented on code in PR #39687:
URL: https://github.com/apache/spark/pull/39687#discussion_r1089601759


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala:
##########
@@ -132,11 +130,12 @@ case class BatchScanExec(
 
       outputPartitioning match {
         case p: KeyGroupedPartitioning =>
-          val partitionMapping = finalPartitions.map(s =>
-            s.head.asInstanceOf[HasPartitionKey].partitionKey() -> s).toMap
+          val partitionMapping = finalPartitions
+            .map(s => InternalRowComparableWrapper(s.head, p.expressions) -> s).toMap
           finalPartitions = p.partitionValues.map { partValue =>
             // Use empty partition for those partition values that are not present
-            partitionMapping.getOrElse(partValue, Seq.empty)

Review Comment:
   The `Map` lookup relies on the key's `equals` method, so the key is changed to `InternalRowComparableWrapper`.
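
   To make the `equals`/`hashCode` point concrete, here is a minimal, self-contained sketch in plain Scala. `RawRow` and `ComparableKey` are hypothetical stand-ins, not Spark's `InternalRow` and `InternalRowComparableWrapper`; the point is only that a `Map` keyed by a type without value-based equality misses lookups, while a wrapper key that defines `equals`/`hashCode` over the row values hits them:

```scala
// Minimal illustration (hypothetical types, not Spark's actual classes).
object MapKeyEqualitySketch {
  // Stand-in for a row type that does NOT override equals/hashCode.
  final class RawRow(val values: Array[Any])

  // Stand-in for a comparable wrapper: equality and hashing are defined
  // on the row's values, so it works as a Map key.
  final class ComparableKey(val row: RawRow) {
    override def equals(other: Any): Boolean = other match {
      case that: ComparableKey => row.values.sameElements(that.row.values)
      case _ => false
    }
    override def hashCode(): Int = row.values.toSeq.hashCode()
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(new RawRow(Array(1, "a")), new RawRow(Array(2, "b")))

    // Keyed by RawRow: a lookup with an equal-valued but distinct instance misses.
    val byRawRow = partitions.map(r => r -> r.values.mkString(",")).toMap
    println(byRawRow.getOrElse(new RawRow(Array(1, "a")), "MISS"))                     // MISS

    // Keyed by the wrapper: the same lookup now hits.
    val byWrapper = partitions.map(r => new ComparableKey(r) -> r.values.mkString(",")).toMap
    println(byWrapper.getOrElse(new ComparableKey(new RawRow(Array(1, "a"))), "MISS")) // 1,a
  }
}
```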



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala:
##########
@@ -541,13 +542,31 @@ class BufferedRows(val key: Seq[Any] = Seq.empty) extends WriterCommitMessage
 
   def keyString(): String = key.toArray.mkString("/")
 
-  override def partitionKey(): InternalRow = {
-    InternalRow.fromSeq(key)
-  }
+  override def partitionKey(): InternalRow = PartitionInternalRow(key.toArray)

Review Comment:
   `GenericInternalRow` already implements the `equals` and `hashCode` methods; change this to use `PartitionInternalRow`.
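
   For illustration, a minimal sketch of a partition-key row whose equality is defined by its key values. `PartitionKeyRow` is a hypothetical name, not Spark's `PartitionInternalRow`; it only shows the `equals`/`hashCode` contract such a row needs so it can serve as a hash key:

```scala
// Hypothetical sketch, not Spark's PartitionInternalRow: a partition-key row
// whose equals/hashCode are defined by the key values it carries.
final class PartitionKeyRow(val keys: Array[Any]) {
  override def equals(other: Any): Boolean = other match {
    case that: PartitionKeyRow => keys.sameElements(that.keys)
    case _ => false
  }
  override def hashCode(): Int = keys.toSeq.hashCode()
  override def toString: String = keys.mkString("[", ", ", "]")
}

object PartitionKeyRowSketch {
  def main(args: Array[String]): Unit = {
    val a = new PartitionKeyRow(Array(2023, "us"))
    val b = new PartitionKeyRow(Array(2023, "us"))
    println(a == b)             // true: value-based equality
    println(Set(a).contains(b)) // true: usable as a hash-based key
  }
}
```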



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala:
##########
@@ -198,12 +198,9 @@ case class EnsureRequirements(
                 // Check if the two children are partition keys compatible. If so, find the
                 // common set of partition values, and adjust the plan accordingly.
                 if (leftSpec.areKeysCompatible(rightSpec)) {
-                  val leftPartValues = leftSpec.partitioning.partitionValues
-                  val rightPartValues = rightSpec.partitioning.partitionValues
-
-                  val mergedPartValues = Utils.mergeOrdered(
-                    Seq(leftPartValues, rightPartValues))(leftSpec.ordering).toSeq.distinct
-

Review Comment:
   The merge operation also relies on the `equals` method, so it is changed to use `InternalRowComparableWrapper` as well.
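
   A self-contained sketch of the same pitfall on the merge path (again with hypothetical `RawRow`/`ComparableKey` types rather than Spark's): de-duplicating merged partition values with `.distinct` also depends on value equality, so rows without `equals`/`hashCode` leave duplicates behind, while the wrapped keys collapse them:

```scala
// Hypothetical types; illustrates that merge + distinct needs value equality.
object MergeDistinctSketch {
  final class RawRow(val values: Array[Any]) {
    override def toString: String = values.mkString("[", ",", "]")
  }
  final class ComparableKey(val row: RawRow) {
    override def equals(other: Any): Boolean = other match {
      case that: ComparableKey => row.values.sameElements(that.row.values)
      case _ => false
    }
    override def hashCode(): Int = row.values.toSeq.hashCode()
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq(new RawRow(Array(1)), new RawRow(Array(2)))
    val right = Seq(new RawRow(Array(2)), new RawRow(Array(3)))

    // Reference equality: the two Array(2) rows are considered different.
    println((left ++ right).distinct.size)                           // 4

    // Value equality via the wrapper: the shared partition value is merged.
    println((left ++ right).map(new ComparableKey(_)).distinct.size) // 3
  }
}
```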



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala:
##########
@@ -677,9 +678,6 @@ case class KeyGroupedShuffleSpec(
     }
   }
 
-  lazy val ordering: Ordering[InternalRow] =
-    RowOrdering.createNaturalAscendingOrdering(partitioning.expressions.map(_.dataType))
-

Review Comment:
   To centralize the comparison logic, `InternalRowComparableWrapper` is used uniformly here, replacing the per-spec `ordering`.
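
   As a rough illustration of the centralization idea (hypothetical names, not Spark's API): once equality and ordering live next to the wrapper itself, call sites like the merge above no longer need their own `lazy val ordering`:

```scala
// Hypothetical sketch of centralizing comparison logic: the wrapper's companion
// owns both equality and a (simplified) ordering, so each call site stops
// defining its own per-spec ordering.
object CentralizedComparisonSketch {
  final class Key(val values: Seq[Int]) {
    override def equals(other: Any): Boolean = other match {
      case that: Key => values == that.values
      case _ => false
    }
    override def hashCode(): Int = values.hashCode()
    override def toString: String = values.mkString("[", ", ", "]")
  }

  object Key {
    // Single shared ordering (simplified to the first key column); real code
    // would order over all key columns and their data types.
    implicit val ordering: Ordering[Key] =
      Ordering.by((k: Key) => k.values.headOption.getOrElse(0))
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq(new Key(Seq(3)), new Key(Seq(1)))
    val right = Seq(new Key(Seq(2)), new Key(Seq(1)))
    // Merge, de-duplicate, and sort with the one shared definition.
    println((left ++ right).distinct.sorted) // List([1], [2], [3])
  }
}
```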



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

