bjornjorgensen commented on PR #39098: URL: https://github.com/apache/spark/pull/39098#issuecomment-1356274657
@dongjoon-hyun Thank you.

https://github.com/apache/spark/pull/33379 says: "And the num_files argument doesn't actually manage the number of files, but specifying the partition number."

But the documentation still describes `num_files` as controlling the number of files: https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/python/pyspark/pandas/generic.py#L674

The same PR also says: "So, we should deprecate the num_files argument and encourage users to use DataFrame.spark.repartition API instead." However, the use of the `DataFrame.spark.repartition` API has not been documented.

So if we change back to `num_files`, I suggest we:

1. Change "The number of files can be controlled by `num_files`." to "The number of partitions can be controlled by `num_files`."
2. Add a note that `DataFrame.spark.repartition` can be used instead of `num_files` to manage the number of files.
3. Remove `if num_files is not None:`
