Re: [PR] [SPARK-49383][SQL][PYTHON][CONNECT] Support Transpose DataFrame API [spark]

via GitHub Tue, 27 Aug 2024 22:59:16 -0700


xinrong-meng commented on code in PR #47884:
URL: https://github.com/apache/spark/pull/47884#discussion_r1734037023



##########
python/pyspark/sql/dataframe.py:
##########
@@ -6323,6 +6323,71 @@ def toPandas(self) -> "PandasDataFrameLike":
         """
         ...
 
+    @dispatch_df_method
+    def transpose(self, indexColumn: Optional[Column] = None) -> "DataFrame":
+        """
+        Transpose a DataFrame such that the values in the specified index column become the new
+        columns of the DataFrame. If no index column is provided, the first column is used as
+        the default.
+
+        Please note:
+         - All columns except the index column must share a least common data type. Unless they
+         are the same data type, all columns are cast to the nearest common data type.
+         - The name of the column into which the original column names are transposed defaults
+         to "key".
+         - Non-"key" column names for the transposed table are ordered in ascending order.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        indexColumn : str, Column, optional
+            The single column that will be treated as the index for the transpose operation.This
+            column will be used to transform the DataFrame such that the values of the indexColumn
+            become the new columns in the transposed DataFrame. If not provided, the first column of
+            the DataFrame will be used as the default.
+
+        Returns
+        -------
+        :class:`DataFrame`
+            Transposed DataFrame.
+
+        Notes
+        -----
+        Supports Spark Connect.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame(
+        ...     [("A", 1, 2), ("B", 3, 4)],
+        ...     ["id", "val1", "val2"],
+        ... )
+        >>> df.show()
+        +---+----+----+
+        | id|val1|val2|
+        +---+----+----+
+        |  A|   1|   2|
+        |  B|   3|   4|
+        +---+----+----+
+
+        >>> df.transpose().show()
+        +----+---+---+
+        | key|  A|  B|

Review Comment:
   I'm afraid not. According to the API spec:
   
   > The name of the column into which the original column names are transposed defaults to "key"
   
   If "id" were reused, it could cause confusion because val1 and val2 are not "id"s. The term "key" is neutral and generic, and in my view more suitable here. Please let me know if you have any other concerns.
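
To illustrate the naming point, here is a minimal plain-Python sketch of the semantics the docstring describes. It is not Spark's implementation: `transpose_rows` and its parameters are hypothetical names introduced only for this example, and the cast to a common data type is left out. It shows that the first output column holds the original column names (val1, val2), which is why a neutral header such as "key" describes it better than "id".

```python
from typing import Any, List, Optional, Sequence, Tuple


def transpose_rows(
    columns: Sequence[str],
    rows: Sequence[Tuple[Any, ...]],
    index_column: Optional[str] = None,
    key_name: str = "key",
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Hypothetical plain-Python model of the documented transpose semantics."""
    # Default to the first column, as the docstring describes.
    index_pos = 0 if index_column is None else list(columns).index(index_column)

    # Sort rows by index value so the new column names come out in ascending order.
    ordered = sorted(rows, key=lambda row: row[index_pos])

    # Header: the neutral "key" column, followed by the index column's values.
    header = [key_name] + [str(row[index_pos]) for row in ordered]

    # Each remaining original column becomes one output row, labelled by its name.
    body = [
        (name,) + tuple(row[pos] for row in ordered)
        for pos, name in enumerate(columns)
        if pos != index_pos
    ]
    return header, body


header, body = transpose_rows(["id", "val1", "val2"], [("A", 1, 2), ("B", 3, 4)])
print(header)  # ['key', 'A', 'B']
print(body)    # [('val1', 1, 3), ('val2', 2, 4)]
```

The values under "key" are column names rather than identifiers, so reusing "id" as the header would mislabel them.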
   



##########
python/pyspark/sql/dataframe.py:
##########
@@ -6323,6 +6323,71 @@ def toPandas(self) -> "PandasDataFrameLike":
         """
         ...
 
+    @dispatch_df_method
+    def transpose(self, indexColumn: Optional[Column] = None) -> "DataFrame":
+        """
+        Transpose a DataFrame such that the values in the specified index column become the new
+        columns of the DataFrame. If no index column is provided, the first column is used as
+        the default.
+
+        Please note:
+         - All columns except the index column must share a least common data type. Unless they
+         are the same data type, all columns are cast to the nearest common data type.
+         - The name of the column into which the original column names are transposed defaults
+         to "key".
+         - Non-"key" column names for the transposed table are ordered in ascending order.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        indexColumn : str, Column, optional
+            The single column that will be treated as the index for the transpose operation.This
+            column will be used to transform the DataFrame such that the values of the indexColumn
+            become the new columns in the transposed DataFrame. If not provided, the first column of
+            the DataFrame will be used as the default.
+
+        Returns
+        -------
+        :class:`DataFrame`
+            Transposed DataFrame.
+
+        Notes
+        -----
+        Supports Spark Connect.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame(
+        ...     [("A", 1, 2), ("B", 3, 4)],
+        ...     ["id", "val1", "val2"],
+        ... )
+        >>> df.show()
+        +---+----+----+
+        | id|val1|val2|
+        +---+----+----+
+        |  A|   1|   2|
+        |  B|   3|   4|
+        +---+----+----+
+
+        >>> df.transpose().show()
+        +----+---+---+
+        | key|  A|  B|
+        +----+---+---+
+        |val1|  1|  3|
+        |val2|  2|  4|
+        +----+---+---+
+
+        >>> df.transpose(df.id).show()
+        +----+---+---+
+        | key|  A|  B|

Review Comment:
   Please see above.



