(spark) branch master updated: [SPARK-47891][PYTHON][DOCS] Improve docstring of mapInPandas

gurwls223 Wed, 17 Apr 2024 17:48:04 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new a7f8ccef122a [SPARK-47891][PYTHON][DOCS] Improve docstring of mapInPandas
a7f8ccef122a is described below

commit a7f8ccef122a629559bae91e3847589c4cf1a46a
Author: Xinrong Meng <xinr...@apache.org>
AuthorDate: Thu Apr 18 09:47:47 2024 +0900

    [SPARK-47891][PYTHON][DOCS] Improve docstring of mapInPandas
    
    ### What changes were proposed in this pull request?
    Improve docstring of mapInPandas
    
    - "using a Python native function that takes and outputs a pandas DataFrame" is confusing because the function actually takes and outputs an iterator of pandas DataFrames.
    - "All columns are passed together as an iterator of pandas DataFrames" can easily mislead users into thinking the entire DataFrame is passed at once; "a batch of rows" is used instead.
    
    ### Why are the changes needed?
    More accurate and clear docstring.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Doc change only.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No.
    
    Closes #46108 from xinrong-meng/doc_mapInPandas.
    
    Authored-by: Xinrong Meng <xinr...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/pandas/map_ops.py | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/python/pyspark/sql/pandas/map_ops.py b/python/pyspark/sql/pandas/map_ops.py
index 82bcd58b0c0e..6d8bb7c779b7 100644
--- a/python/pyspark/sql/pandas/map_ops.py
+++ b/python/pyspark/sql/pandas/map_ops.py
@@ -30,7 +30,7 @@ if TYPE_CHECKING:
 
 class PandasMapOpsMixin:
     """
-    Min-in for pandas map operations. Currently, only :class:`DataFrame`
+    Mix-in for pandas map operations. Currently, only :class:`DataFrame`
     can use this class.
     """
 
@@ -43,16 +43,14 @@ class PandasMapOpsMixin:
     ) -> "DataFrame":
         """
         Maps an iterator of batches in the current :class:`DataFrame` using a Python native
-        function that takes and outputs a pandas DataFrame, and returns the result as a
-        :class:`DataFrame`.
+        function that is performed on pandas DataFrames both as input and output,
+        and returns the result as a :class:`DataFrame`.
 
-        The function should take an iterator of `pandas.DataFrame`\\s and return
-        another iterator of `pandas.DataFrame`\\s. All columns are passed
-        together as an iterator of `pandas.DataFrame`\\s to the function and the
-        returned iterator of `pandas.DataFrame`\\s are combined as a :class:`DataFrame`.
-        Each `pandas.DataFrame` size can be controlled by
-        `spark.sql.execution.arrow.maxRecordsPerBatch`. The size of the function's input and
-        output can be different.
+        This method applies the specified Python function to an iterator of
+        `pandas.DataFrame`\\s, each representing a batch of rows from the original DataFrame.
+        The returned iterator of `pandas.DataFrame`\\s are combined as a :class:`DataFrame`.
+        The size of the function's input and output can be different. Each `pandas.DataFrame`
+        size can be controlled by `spark.sql.execution.arrow.maxRecordsPerBatch`.
 
         .. versionadded:: 3.0.0
 
@@ -68,7 +66,8 @@ class PandasMapOpsMixin:
             the return type of the `func` in PySpark. The value can be either a
             :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
         barrier : bool, optional, default False
-            Use barrier mode execution.
+            Use barrier mode execution, ensuring that all Python workers in the stage will be
+            launched concurrently.
 
             .. versionadded: 3.5.0
 


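For readers skimming this commit, the iterator-in, iterator-out contract described in the updated docstring can be sketched without a Spark session. This is a minimal illustration, not part of the patch: the function name `keep_adults`, the column names, and the hand-built batches are all hypothetical, and the batches are simulated with plain pandas rather than produced by Spark.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Hypothetical mapInPandas-style function: consumes an iterator of
    pandas DataFrames (one per batch of rows) and yields DataFrames."""
    for pdf in batches:
        # Each output batch may hold fewer rows than its input batch.
        yield pdf[pdf["age"] >= 18]


# Simulated batches standing in for what Spark would feed the function.
batches = [
    pd.DataFrame({"id": [1, 2], "age": [21, 15]}),
    pd.DataFrame({"id": [3], "age": [30]}),
]

# Spark would combine the yielded batches into one DataFrame; pd.concat
# plays that role in this sketch.
result = pd.concat(keep_adults(iter(batches)), ignore_index=True)
print(result)
```

Assuming an existing Spark DataFrame `df` with matching columns, the same function would be passed as `df.mapInPandas(keep_adults, schema="id long, age long")`. The function never sees the whole DataFrame at once, only one batch at a time.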
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
