Re: [PR] [SPARK-49383][SQL][PYTHON][CONNECT] Support Transpose DataFrame API [spark]

via GitHub Wed, 28 Aug 2024 17:42:03 -0700


xinrong-meng commented on code in PR #47884:
URL: https://github.com/apache/spark/pull/47884#discussion_r1735431843



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -1780,6 +1788,89 @@ class Dataset[T] private[sql] (
   def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame =
     unpivot(ids, variableColumnName, valueColumnName)
 
+  /**
+   * Transpose a DataFrame such that the values in the specified index column become the new

Review Comment:
   Resolved for both Scala and Python, thanks!



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -1780,6 +1788,89 @@ class Dataset[T] private[sql] (
   def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame =
     unpivot(ids, variableColumnName, valueColumnName)
 
+  /**
+   * Transpose a DataFrame such that the values in the specified index column become the new
+   * columns of the DataFrame.
+   *
+   * Please note:
+   *   - All columns except the index column must share a least common data type. Unless they
+   *   are the same data type, all columns are cast to the nearest common data type.
+   *   - The name of the column into which the original column names are transposed defaults
+   *   to "key".
+   *   - null values in the index column are excluded from the column names for the
+   *   transposed table, which are ordered in ascending order.
+   *
+   * {{{
+   *   val df = Seq(("A", 1, 2), ("B", 3, 4)).toDF("id", "val1", "val2")
+   *   df.show()
+   *   // output:
+   *   // +---+----+----+
+   *   // | id|val1|val2|
+   *   // +---+----+----+
+   *   // |  A|   1|   2|
+   *   // |  B|   3|   4|
+   *   // +---+----+----+
+   *
+   *   df.transpose($"id").show()
+   *   // output:
+   *   // +----+---+---+
+   *   // | key|  A|  B|
+   *   // +----+---+---+
+   *   // |val1|  1|  3|
+   *   // |val2|  2|  4|
+   *   // +----+---+---+
+   *   // schema:
+   *   // root
+   *   //  |-- key: string (nullable = false)
+   *   //  |-- A: integer (nullable = true)
+   *   //  |-- B: integer (nullable = true)
+   *
+   *   df.transpose().show()
+   *   // output:
+   *   // +----+---+---+
+   *   // | key|  A|  B|
+   *   // +----+---+---+
+   *   // |val1|  1|  3|
+   *   // |val2|  2|  4|
+   *   // +----+---+---+
+   *   // schema:
+   *   // root
+   *   //  |-- key: string (nullable = false)
+   *   //  |-- A: integer (nullable = true)
+   *   //  |-- B: integer (nullable = true)
+   * }}}
+   *
+   * @param indexColumn
+   *   The single column that will be treated as the index for the transpose operation.This column

Review Comment:
   ditto.
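
   For readers following the thread, the behavior described in the doc comment above (null index values dropped, new columns in ascending order, original column names collected under "key") can be sketched in plain Python. This is only an illustration and not the PR's implementation: `transpose_rows` is a hypothetical helper, rows are modeled as dicts, the cast to a common data type is not modeled, and defaulting to the first column when no index is given is an assumption inferred from the `df.transpose()` example.

```python
from typing import Any, Dict, List, Optional


def transpose_rows(
    rows: List[Dict[str, Any]],
    index_column: Optional[str] = None,
    key_name: str = "key",
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented transpose semantics."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    # Assumption: with no index column given, the first column is used.
    if index_column is None:
        index_column = columns[0]
    value_columns = [c for c in columns if c != index_column]
    # Rows with a null index value are excluded; the rest are ordered
    # ascending by index value, which fixes the order of the new columns.
    kept = sorted(
        (r for r in rows if r[index_column] is not None),
        key=lambda r: r[index_column],
    )
    result = []
    for col in value_columns:
        # Each original column becomes one output row, named under key_name.
        out = {key_name: col}
        for r in kept:
            out[str(r[index_column])] = r[col]
        result.append(out)
    return result


rows = [
    {"id": "B", "val1": 3, "val2": 4},
    {"id": "A", "val1": 1, "val2": 2},
    {"id": None, "val1": 9, "val2": 9},  # dropped: null index value
]
result = transpose_rows(rows, "id")
for row in result:
    print(row)
```

   The input rows are deliberately unsorted and include a null index so that both notes from the doc comment are exercised.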



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


