Re: [PR] [SPARK-53401] Enable Direct Passthrough Partitioning in the DataFrame API [spark]

via GitHub Fri, 29 Aug 2025 02:28:48 -0700


cloud-fan commented on code in PR #52153:
URL: https://github.com/apache/spark/pull/52153#discussion_r2309667801



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2785,6 +2785,41 @@ class DataFrameSuite extends QueryTest
     val df1 = df.select("a").orderBy("b").orderBy("all")
     checkAnswer(df1, Seq(Row(1), Row(4)))
   }
+
+  test("SPARK-53401: repartitionById - should partition rows to the specified " +
+    "partition ID") {
+    val numPartitions = 10
+    val df = spark.range(100).withColumn("p_id", col("id") % numPartitions)
+
+    val repartitioned = df.repartitionById(numPartitions, $"p_id")
+    val result = repartitioned.withColumn("actual_p_id", spark_partition_id())
+
+    assert(result.filter(col("p_id") =!= col("actual_p_id")).count() == 0)
+
+    assert(result.rdd.getNumPartitions == numPartitions)
+  }
+
+  test("SPARK-53401: repartitionById - should fail when partition ID is null") {
+    val df = spark.range(10).withColumn("p_id",
+      when(col("id") < 5, col("id")).otherwise(lit(null).cast("long"))
+    )
+    val repartitioned = df.repartitionById(5, $"p_id")
+
+    val e = intercept[SparkException] {
+      repartitioned.collect()
+    }
+    assert(e.getCause.isInstanceOf[IllegalArgumentException])

Review Comment:
   What's the actual error? If the error message is not clear, we should add an
   explicit null check, or simply treat null as partition ID 0.
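
   To make the two options concrete, here is a minimal sketch in plain Scala with
   no Spark dependency. `PartitionIdSketch`, `NullPolicy`, and `resolve` are
   hypothetical names used only for illustration, and the modulo mapping into
   range is an assumption, not necessarily what this PR does.

```scala
// Hypothetical helper, not part of Spark or of this PR. It only illustrates
// the two null-handling policies suggested in the review comment.
object PartitionIdSketch {
  sealed trait NullPolicy
  case object FailOnNull extends NullPolicy // explicit null check, clear message
  case object NullToZero extends NullPolicy // treat null as partition ID 0

  def resolve(raw: java.lang.Long, numPartitions: Int, policy: NullPolicy): Int = {
    require(numPartitions > 0, s"numPartitions must be positive, got $numPartitions")
    if (raw == null) {
      policy match {
        case FailOnNull =>
          // A descriptive message, so the user does not have to unwrap the cause.
          throw new IllegalArgumentException(
            "The partition ID expression must not evaluate to null")
        case NullToZero => 0
      }
    } else {
      // Assumption: IDs are mapped into [0, numPartitions) with a non-negative modulo.
      java.lang.Math.floorMod(raw.longValue, numPartitions.toLong).toInt
    }
  }
}
```

   With `FailOnNull`, the test could assert on the message text rather than only
   on the exception type. With `NullToZero`, the null test would instead check
   that rows with a null `p_id` land in partition 0.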



