[GitHub] [spark] HyukjinKwon commented on a change in pull request #34007: [SPARK-36710][PYTHON] Support new typing syntax in function apply APIs in pandas API on Spark

GitBox Wed, 15 Sep 2021 13:53:11 -0700


HyukjinKwon commented on a change in pull request #34007:
URL: https://github.com/apache/spark/pull/34007#discussion_r709564361




##########
File path: python/pyspark/pandas/accessors.py
##########
@@ -230,28 +231,24 @@ def apply_batch(
 
             To avoid this, specify return type in ``func``, for instance, as below:
 
-            >>> def plus_one(x) -> ps.DataFrame[float, float]:
+            >>> def plus_one(x) -> ps.DataFrame[int, [float, float]]:
             ...     return x + 1
 
             If the return type is specified, the output column names become
             `c0, c1, c2 ... cn`. These names are positionally mapped to the returned
             DataFrame in ``func``.
 
-            To specify the column names, you can assign them in a pandas friendly style as below:
+            To specify the column names, you can assign them in a NumPy compound type style
+            as below:
 
-            >>> def plus_one(x) -> ps.DataFrame["a": float, "b": float]:
+            >>> def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]:

Review comment:
       ```suggestion
               >>> def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]]:
   ```
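
For context on the suggestion: the line in the diff is missing the closing `]` of the outer subscript, so the doctest would not parse. A minimal sketch that checks this with the standard-library `ast` module only (nothing is evaluated, so pyspark is not required; the `parses` helper and the two constants are illustrative names, not part of the PR):

```python
import ast

# The annotation as written in the diff (outer "[" is never closed).
ORIGINAL = (
    'def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]:\n'
    "    return x + 1\n"
)

# The annotation from the suggestion (one extra "]" closes the subscript).
SUGGESTED = (
    'def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]]:\n'
    "    return x + 1\n"
)


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


original_ok = parses(ORIGINAL)
suggested_ok = parses(SUGGESTED)
```

Here `original_ok` ends up `False` and `suggested_ok` ends up `True`, which matches the one-character fix proposed above.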

##########
File path: python/pyspark/pandas/accessors.py
##########
@@ -439,28 +447,24 @@ def transform_batch(
 
             To avoid this, specify return type in ``func``, for instance, as below:
 
-            >>> def plus_one(x) -> ps.DataFrame[float, float]:
+            >>> def plus_one(x) -> ps.DataFrame[int, [float, float]]:
             ...     return x + 1
 
             If the return type is specified, the output column names become
             `c0, c1, c2 ... cn`. These names are positionally mapped to the returned
             DataFrame in ``func``.
 
-            To specify the column names, you can assign them in a pandas friendly style as below:
+            To specify the column names, you can assign them in a NumPy compound type style
+            as below:
 
-            >>> def plus_one(x) -> ps.DataFrame['a': float, 'b': float]:
+            >>> def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]:

Review comment:
       ```suggestion
               >>> def plus_one(x) -> ps.DataFrame[("index", int), [("a", float), ("b", float)]]:
   ```
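
To illustrate what the corrected subscript looks like from Python's side, here is a rough sketch using a hypothetical stand-in class. `FakeDataFrame` is an assumption made for illustration only and is not how `ps.DataFrame` is implemented; it simply returns whatever Python passes to `__class_getitem__`:

```python
class FakeDataFrame:
    """Hypothetical stand-in for illustration; not pyspark's ps.DataFrame."""

    def __class_getitem__(cls, params):
        # Python passes everything between the outer brackets as one object.
        # With two comma-separated items, that object is a 2-tuple.
        return params


# Named form: (index name, index type), then a list of (column name, column type).
index_spec, column_specs = FakeDataFrame[("index", int), [("a", float), ("b", float)]]

# Unnamed form: index type, then a list of column types.
index_type, column_types = FakeDataFrame[int, [float, float]]
```

With the closing bracket in place, the subscript receives a two-element tuple: the index specification first, then the list of column specifications.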






