[GitHub] [spark] zhengruifeng commented on a diff in pull request #42875: [SPARK-45119][PYTHON][DOCS] Refine docstring of inline

via GitHub Mon, 11 Sep 2023 19:00:09 -0700


zhengruifeng commented on code in PR #42875:
URL: https://github.com/apache/spark/pull/42875#discussion_r1322274339



##########
python/pyspark/sql/functions.py:
##########
@@ -12451,37 +12451,132 @@ def inline(col: "ColumnOrName") -> Column:
     """
     Explodes an array of structs into a table.
 
+    This function takes an input column containing an array of structs and returns a
+    new column where each struct in the array is exploded into a separate row.
+
     .. versionadded:: 3.4.0
 
     Parameters
     ----------
     col : :class:`~pyspark.sql.Column` or str
-        input column of values to explode.
+        Input column of values to explode.
 
     Returns
     -------
     :class:`~pyspark.sql.Column`
-        generator expression with the inline exploded result.
+        Generator expression with the inline exploded result.
 
     See Also
     --------
-    :meth:`explode`
-
-    Notes
-    -----
-    Supports Spark Connect.
+    :meth:`pyspark.functions.explode`
+    :meth:`pyspark.functions.inline_outer`
 
     Examples
     --------
+    Example 1: Using inline with a single struct array column
+
+    >>> import pyspark.sql.functions as sf
+    >>> from pyspark.sql import Row
+    >>> df = spark.createDataFrame([Row(structlist=[Row(a=1, b=2), Row(a=3, b=4)])])
+    >>> df.select(sf.inline(df.structlist)).show()
+    +---+---+
+    |  a|  b|
+    +---+---+
+    |  1|  2|
+    |  3|  4|
+    +---+---+
+
+    Example 2: Using inline with a column name
+
+    >>> import pyspark.sql.functions as sf
     >>> from pyspark.sql import Row
     >>> df = spark.createDataFrame([Row(structlist=[Row(a=1, b=2), Row(a=3, b=4)])])
-    >>> df.select(inline(df.structlist)).show()
+    >>> df.select(sf.inline("structlist")).show()
     +---+---+
     |  a|  b|
     +---+---+
     |  1|  2|
     |  3|  4|
     +---+---+
+
+    Example 3: Using inline with an alias
+
+    >>> import pyspark.sql.functions as sf
+    >>> from pyspark.sql import Row
+    >>> df = spark.createDataFrame([Row(structlist=[Row(a=1, b=2), Row(a=3, b=4)])])
+    >>> df.select(sf.inline("structlist").alias("c1", "c2")).show()
+    +---+---+
+    | c1| c2|
+    +---+---+
+    |  1|  2|
+    |  3|  4|
+    +---+---+
+
+    Example 4: Using inline with multiple struct array columns
+
+    >>> import pyspark.sql.functions as sf
+    >>> from pyspark.sql import Row
+    >>> df = spark.createDataFrame([
+    ...     Row(structlist1=[Row(a=1, b=2), Row(a=3, b=4)],
+    ...         structlist2=[Row(c=5, d=6), Row(c=7, d=8)])
+    ... ])
+    >>> df.select(sf.inline("structlist1"), "structlist2") \
+    ...     .select("a", "b", sf.inline("structlist2")).show()
+    +---+---+---+---+
+    |  a|  b|  c|  d|
+    +---+---+---+---+
+    |  1|  2|  5|  6|
+    |  1|  2|  7|  8|
+    |  3|  4|  5|  6|
+    |  3|  4|  7|  8|
+    +---+---+---+---+
+
+    Example 5: Using inline with a nested struct array column
+
+    >>> import pyspark.sql.functions as sf
+    >>> from pyspark.sql import Row
+    >>> df = spark.createDataFrame([
+    ...     Row(structlist=[Row(a=1, b=2, nested=[Row(c=3, d=4), Row(c=5, d=6)])])
+    ... ])
+    >>> df.select(sf.inline("structlist")).show()
+    +---+---+----------------+
+    |  a|  b|          nested|
+    +---+---+----------------+
+    |  1|  2|[{3, 4}, {5, 6}]|
+    +---+---+----------------+
+    >>> df.select(sf.inline("structlist")).select(sf.inline("nested")).show()
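
The row semantics shown in Examples 1 to 4 can be modeled in plain Python. This is a hypothetical sketch, not Spark's implementation: it assumes rows and structs are plain `dict`s, and `inline_rows`, `keep`, and `aliases` are illustrative names rather than PySpark APIs.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]  # hypothetical: both rows and structs are modeled as dicts


def inline_rows(
    rows: List[Row],
    col: str,
    keep: Sequence[str] = (),
    aliases: Optional[Sequence[str]] = None,
) -> List[Row]:
    """Model of inline: emit one output row per struct in row[col].

    The struct's fields become top-level columns (renamed positionally by
    `aliases`, like .alias("c1", "c2")), and the columns named in `keep`
    are copied from the input row, like extra arguments to select().
    """
    out: List[Row] = []
    for row in rows:
        # a null (None) or empty array yields no output rows for this input row
        for struct in row[col] or []:
            fields = dict(struct)
            if aliases is not None:
                fields = dict(zip(aliases, fields.values()))
            new_row = {name: row[name] for name in keep}
            new_row.update(fields)
            out.append(new_row)
    return out


# Examples 1 and 2: a single array<struct<a, b>> column
df = [{"structlist": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}]
print(inline_rows(df, "structlist"))
# [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]

# Example 3: positional aliases
print(inline_rows(df, "structlist", aliases=["c1", "c2"]))
# [{'c1': 1, 'c2': 2}, {'c1': 3, 'c2': 4}]

# Example 4: two chained inline calls pair every structlist1 element
# with every structlist2 element from the same input row
df4 = [{
    "structlist1": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
    "structlist2": [{"c": 5, "d": 6}, {"c": 7, "d": 8}],
}]
step1 = inline_rows(df4, "structlist1", keep=["structlist2"])
step2 = inline_rows(step1, "structlist2", keep=["a", "b"])
for r in step2:
    print(r)
# {'a': 1, 'b': 2, 'c': 5, 'd': 6}
# {'a': 1, 'b': 2, 'c': 7, 'd': 8}
# {'a': 3, 'b': 4, 'c': 5, 'd': 6}
# {'a': 3, 'b': 4, 'c': 7, 'd': 8}
```

The `or []` branch mirrors `inline` dropping rows whose array is null or empty; `inline_outer`, listed under See Also, instead produces nulls for each nested column in that case.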

Review Comment:
   ```
   In [14]: df.select("structlist").printSchema()
   root
    |-- structlist: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- a: long (nullable = true)
    |    |    |-- b: long (nullable = true)
    |    |    |-- nested: array (nullable = true)
    |    |    |    |-- element: struct (containsNull = true)
    |    |    |    |    |-- c: long (nullable = true)
    |    |    |    |    |-- d: long (nullable = true)
   
   
   In [15]: df.select("structlist.nested").printSchema()
   root
    |-- nested: array (nullable = true)
    |    |-- element: array (containsNull = true)
    |    |    |-- element: struct (containsNull = true)
    |    |    |    |-- c: long (nullable = true)
    |    |    |    |-- d: long (nullable = true)
   
   
   In [16]: df.select(df.structlist.nested).printSchema()
   root
    |-- structlist.nested: array (nullable = true)
    |    |-- element: array (containsNull = true)
    |    |    |-- element: struct (containsNull = true)
    |    |    |    |-- c: long (nullable = true)
    |    |    |    |-- d: long (nullable = true)
   ```
   
   Could something be wrong with column resolution here?
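
For comparison, the shapes printed in In [15] and In [16] match a simple model of field extraction over an array of structs: the field is pulled out of every element, so `nested` (itself an array of structs) comes back one array level deeper. A hypothetical pure-Python sketch of that, plus the chained `inline` from Example 5, assuming dict-based rows (`extract_field` and `inline_rows` are illustrative names, not PySpark APIs):

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical: both rows and structs are modeled as dicts


def extract_field(array_of_structs: List[Row], field: str) -> List[Any]:
    """Model of `structlist.nested` on array<struct>: one value per element."""
    return [s[field] for s in array_of_structs]


def inline_rows(rows: List[Row], col: str) -> List[Row]:
    """Model of inline: one output row per struct; struct fields become columns."""
    return [dict(struct) for row in rows for struct in (row[col] or [])]


row = {"structlist": [{"a": 1, "b": 2, "nested": [{"c": 3, "d": 4}, {"c": 5, "d": 6}]}]}

# Field access over the outer array keeps the outer nesting:
# array<struct<..., nested: array<struct>>> -> array<array<struct>>
nested = extract_field(row["structlist"], "nested")
print(nested)  # [[{'c': 3, 'd': 4}, {'c': 5, 'd': 6}]]

# Chaining inline (as in Example 5) unwraps one array level per call
step1 = inline_rows([row], "structlist")
step2 = inline_rows(step1, "nested")
print(step2)  # [{'c': 3, 'd': 4}, {'c': 5, 'd': 6}]
```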
   


