Re: [PR] [SPARK-57103][SQL][TEST][FOLLOWUP] Add test coverage for max_by/min_by over nanosecond-precision timestamp types [spark]

via GitHub Sun, 21 Jun 2026 05:59:54 -0700


stevomitric commented on code in PR #56612:
URL: https://github.com/apache/spark/pull/56612#discussion_r3448451858



##########
sql/core/src/test/scala/org/apache/spark/sql/TimestampNanosFunctionsSuiteBase.scala:
##########
@@ -450,6 +450,89 @@ abstract class TimestampNanosFunctionsSuiteBase extends SharedSparkSession {
     }
   }
 
+  // ===== max_by / min_by over nanosecond-precision timestamps (SPARK-56822) =====
+  // `MaxBy`/`MinBy` gate only on the ordering expression's orderability
+  // (`MaxMinBy.checkInputDataTypes` -> `TypeUtils.checkForOrderingExpr`), which the nanosecond
+  // types pass (SPARK-57103); the value expression is unrestricted and `dataType = valueExpr
+  // .dataType`, so a nanosecond *value* is returned with its precision preserved. No change to the
+  // aggregates is needed -- these tests lock in both the nanos-as-value and nanos-as-ordering paths.
+
+  test("SPARK-57103: max_by/min_by return a nanosecond value and preserve its precision") {
+    Seq(7, 8, 9).foreach { p =>
+      // Value columns are nanos; the ordering column is a plain int key (max at k=3, min at k=1).
+      // The sub-microsecond parts are multiples of 100ns, so they are exact at every p in [7, 9]
+      // (no flooring) yet still non-zero -- proving the nanos value survives, not truncated to micros.
+      val schema = new StructType()
+        .add("ntz", TimestampNTZNanosType(p))
+        .add("ltz", TimestampLTZNanosType(p))
+        .add("k", IntegerType)
+      val data = Seq(
+        Row(LocalDateTime.parse("2020-01-01T00:00:00.000000100"),
+          Instant.parse("2020-01-01T00:00:00.000000100Z"), 1),
+        Row(LocalDateTime.parse("2020-01-01T00:00:00.000000900"),
+          Instant.parse("2020-01-01T00:00:00.000000900Z"), 3),
+        Row(LocalDateTime.parse("2020-01-01T00:00:00.000000500"),
+          Instant.parse("2020-01-01T00:00:00.000000500Z"), 2))
+      val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
+      val res = df.selectExpr(
+        "max_by(ntz, k)", "min_by(ntz, k)", "max_by(ltz, k)", "min_by(ltz, k)")

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


