[GitHub] [spark] bjornjorgensen commented on pull request #39098: [SPARK-41553][PS][PYTHON][CORE] Change `num_files` to `repartition`

GitBox Sat, 17 Dec 2022 06:11:59 -0800


bjornjorgensen commented on PR #39098:
URL: https://github.com/apache/spark/pull/39098#issuecomment-1356274657


   @dongjoon-hyun Thank you. 
   
   
   "And the num_files argument doesn't actually manage the number of files, but 
specifying the partition number."
   https://github.com/apache/spark/pull/33379
   
   But the documentation still says that `num_files` controls the number of files:
   
https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/python/pyspark/pandas/generic.py#L674
   
   "So, we should deprecate the num_files argument and encourage users to use 
DataFrame.spark.repartition API instead."
   https://github.com/apache/spark/pull/33379
   
   The use of the `DataFrame.spark.repartition` API has not been documented.
   
   So if we change back to `num_files`, we should:
   
   Change "The number of files can be controlled by `num_files`." to "The 
number of partitions can be controlled by `num_files`."
   
   Add a note that `DataFrame.spark.repartition` can be used instead of 
`num_files` to manage the number of files.
   
   And remove `if num_files is not None:`.
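
For reference, the `DataFrame.spark.repartition` route mentioned above could look roughly like the sketch below. The helper name `write_csv_with_partitions` and its parameters are hypothetical and only for illustration. It assumes `psdf` is a pandas-on-Spark DataFrame and that passing a path to `to_csv` makes Spark write part files into that directory.

```python
def write_csv_with_partitions(psdf, path, num_partitions):
    """Repartition a pandas-on-Spark DataFrame, then write it as CSV.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    # DataFrame.spark.repartition returns a new pandas-on-Spark DataFrame
    # with the requested number of partitions.
    repartitioned = psdf.spark.repartition(num_partitions)
    # Write to `path` without passing `num_files`; the layout of the output
    # then follows the partitioning chosen above.
    return repartitioned.to_csv(path)


# Example call (requires a working Spark installation):
#   import pyspark.pandas as ps
#   psdf = ps.DataFrame({"a": [1, 2, 3, 4]})
#   write_csv_with_partitions(psdf, "/tmp/out_csv", 2)
```

This keeps the partitioning step explicit at the call site, which is the behaviour that `num_files` actually provides according to the quoted PR.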


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

