zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314388163
##########
python/pyspark/sql/dataframe.py:
##########
@@ -1809,18 +1810,27 @@ def repartition(  # type: ignore[misc]

         Repartition the data into 10 partitions.

-        >>> df.repartition(10).rdd.getNumPartitions()
-        10
+        >>> df.repartition(10).explain()
+        == Physical Plan ==

Review Comment:
   I found that the current changes can reflect explanations like

   ```
   Repartition the data into 7 partitions by 'age' and 'name' columns.
   ```

   ```
   Repartition the data into 2 partitions by range in 'age' column.
   For example, the first partition can have ``(14, "Tom")``, and the second
   partition would have ``(16, "Bob")`` and ``(23, "Alice")``.
   ```

   whereas if we use the number of partitions in the examples (no matter whether via `rdd.getNumPartitions` or `spark_partition_id()`), they cannot convey such information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org