GitHub user pkuwm commented on a diff in the pull request:
https://github.com/apache/spark/pull/21386#discussion_r189729116
--- Diff: python/pyspark/sql/functions.py ---
@@ -2268,6 +2268,21 @@ def array_sort(col):
return Column(sc._jvm.functions.array_sort(_to_java_column(col)))
+@since(2.4)
+def shuffle(col):
+ """
+ Collection function: Generate a random permutation of the given array.
+
+ :param col: name of column or expression
+
+ >>> df = spark.createDataFrame([([2, 1, 3],),([2, 1, None, 3],),([1],),([],)], ['data'])
+ >>> df.select(shuffle(df.data).alias('r')).collect()
+ [Row(r=[1, 3, 2]), Row(r=[3, None, 1, 2]), Row(r=[1]), Row(r=[])]
--- End diff --
Cool. My bad. I wasn't familiar with this. I thought they were just doc-like
comments... Will fix it.
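
For context: `>>>` examples in a Python docstring are doctests, and they are executed and compared against the expected output rather than treated as comments. The sketch below assumes the concern was that a random permutation cannot reliably match a fixed expected result. `shuffle_list` and `run_doctests` are hypothetical stand-ins written in plain Python for illustration; they are not Spark APIs and not necessarily the fix that was applied in the pull request.

```python
import doctest
import random


def shuffle_list(values, seed=None):
    """Return a random permutation of ``values``.

    Hypothetical stand-in for the Spark function, for illustration only.

    Comparing a sorted copy keeps the example deterministic:

    >>> sorted(shuffle_list([2, 1, 3]))
    [1, 2, 3]

    An exact-order example is not reproducible, so it is skipped:

    >>> shuffle_list([2, 1, 3])  # doctest: +SKIP
    [1, 3, 2]
    """
    rng = random.Random(seed)
    result = list(values)  # copy so the caller's list is left untouched
    rng.shuffle(result)
    return result


def run_doctests():
    """Run the docstring examples above and return doctest's TestResults."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        shuffle_list.__doc__,            # docstring holding the examples
        {"shuffle_list": shuffle_list},  # names visible to the examples
        "shuffle_list",                  # test name used in failure reports
        None,                            # no source file
        0,                               # starting line number
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    print(run_doctests())
```

Either approach (asserting an order-independent property, or marking the example with `# doctest: +SKIP`) keeps the docstring readable without making the test run depend on a particular random order.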