zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314388163
##########
python/pyspark/sql/dataframe.py:
##########
@@ -1809,18 +1810,27 @@ def repartition(  # type: ignore[misc]

         Repartition the data into 10 partitions.

-        >>> df.repartition(10).rdd.getNumPartitions()
-        10
+        >>> df.repartition(10).explain()
+        == Physical Plan ==

Review Comment:
   I found that the current changes can reflect explanations like

   ```
   Repartition the data into 7 partitions by 'age' and 'name' columns.
   ```

   ```
   Repartition the data into 2 partitions by range in 'age' column.
   For example, the first partition can have ``(14, "Tom")``, and the second
   partition would have ``(16, "Bob")`` and ``(23, "Alice")``.
   ```

   whereas if we use the number of partitions in the examples (no matter whether via `rdd.getNumPartitions` or `spark_partition_id()`), they cannot convey such information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org