szehon-ho commented on code in PR #53097:
URL: https://github.com/apache/spark/pull/53097#discussion_r2535622366
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/InternalRowComparableWrapper.scala:
##########
@@ -112,3 +112,65 @@ object InternalRowComparableWrapper {
result.toSeq
}
}
+
+/**
+ * Effectively the same as [[InternalRowComparableWrapper]], but using a
precomputed `ordering`
+ * and `structType` to avoid the cache lookup for each row.
+ */
+class BoundInternalRowComparableWrapper(
+ val row: InternalRow,
+ val dataTypes: Seq[DataType],
+ val ordering: BaseOrdering,
+ val structType: StructType) {
+
+ override def hashCode(): Int = Murmur3HashFunction.hash(
+ row,
+ structType,
+ 42L,
+ isCollationAware = true,
+ // legacyCollationAwareHashing only matters when isCollationAware is false.
+ legacyCollationAwareHashing = false).toInt
+
+ override def equals(other: Any): Boolean = {
+ if (!other.isInstanceOf[BoundInternalRowComparableWrapper]) {
+ return false
+ }
+ val otherWrapper = other.asInstanceOf[BoundInternalRowComparableWrapper]
+ if (!otherWrapper.dataTypes.equals(this.dataTypes)) {
+ return false
+ }
+ ordering.compare(row, otherWrapper.row) == 0
+ }
+}
+
+object BoundInternalRowComparableWrapper {
+ /** Compute the schema and row ordering for a given list of data types. */
+ def getStructTypeAndOrdering(dataTypes: Seq[DataType]): (StructType,
BaseOrdering) =
+ StructType(dataTypes.map(t => StructField("f", t))) ->
+ RowOrdering.createNaturalAscendingOrdering(dataTypes)
+
+ def mergePartitions(
Review Comment:
It looks like the only caller of this method in the original class (`InternalRowComparableWrapper`) is the `InternalRowComparableWrapper` benchmark. Should we just migrate that over and deprecate this method?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/InternalRowComparableWrapper.scala:
##########
@@ -112,3 +112,65 @@ object InternalRowComparableWrapper {
result.toSeq
}
}
+
+/**
+ * Effectively the same as [[InternalRowComparableWrapper]], but using a
precomputed `ordering`
+ * and `structType` to avoid the cache lookup for each row.
+ */
+class BoundInternalRowComparableWrapper(
Review Comment:
Also, I feel this class should go in its own file.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/InternalRowComparableWrapper.scala:
##########
@@ -112,3 +112,65 @@ object InternalRowComparableWrapper {
result.toSeq
}
}
+
+/**
+ * Effectively the same as [[InternalRowComparableWrapper]], but using a
precomputed `ordering`
+ * and `structType` to avoid the cache lookup for each row.
+ */
+class BoundInternalRowComparableWrapper(
Review Comment:
As there are no checks now that `structType`/`ordering` are actually derived from `dataTypes`, it seems a bit dangerous not to include them in hash/equals. Should we do that? Alternatively, we could keep the binding invariant by introducing a factory and making this class's constructor private, i.e.
```
class BoundInternalRowComparableWrapperFactory(dataTypes: Seq[DataType]) {
  private val (structType, ordering) = getStructTypeAndOrdering(dataTypes)
  def newBoundInternalRowComparableWrapper(row: InternalRow): BoundInternalRowComparableWrapper =
    new BoundInternalRowComparableWrapper(row, dataTypes, ordering, structType)
}
```
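For reference, the factory idea above can be sketched without any Spark dependencies. Every name below is invented for illustration: `Row` stands in for `InternalRow`, `Ordering[Row]` for `BaseOrdering`, and the `private[FactorySketch]` constructor plays the role of the proposed private constructor, so a wrapper can only be created through a factory that precomputed the ordering once.

```scala
object FactorySketch {
  // Stand-in for InternalRow; a real row would carry typed fields.
  type Row = Seq[Int]

  // Wrapper whose constructor is visible only inside this object, so a
  // caller can never pair a row with a mismatched ordering.
  final class BoundWrapper private[FactorySketch] (val row: Row, ordering: Ordering[Row]) {
    // Seq hashCode is structural, so equal rows hash equally.
    override def hashCode(): Int = row.hashCode()
    override def equals(other: Any): Boolean = other match {
      case that: BoundWrapper => ordering.compare(row, that.row) == 0
      case _ => false
    }
  }

  // The factory does the per-schema work once (analogous to
  // getStructTypeAndOrdering); every wrapper it creates shares the result.
  class WrapperFactory {
    private val ordering: Ordering[Row] = new Ordering[Row] {
      // Element-wise comparison, falling back to length (lexicographic order).
      def compare(a: Row, b: Row): Int =
        a.zip(b).map { case (x, y) => x.compare(y) }
          .find(_ != 0)
          .getOrElse(a.length.compare(b.length))
    }
    def wrap(row: Row): BoundWrapper = new BoundWrapper(row, ordering)
  }
}
```

Usage: `new FactorySketch.WrapperFactory` once per schema, then `factory.wrap(row)` per row; wrapped rows can be used as hash-map or set keys with ordering-consistent equality.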
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]