GitHub user pkuwm commented on a diff in the pull request:
https://github.com/apache/spark/pull/21802#discussion_r203453412
--- Diff: python/pyspark/sql/functions.py ---
@@ -2382,6 +2382,20 @@ def array_sort(col):
return Column(sc._jvm.functions.array_sort(_to_java_column(col)))
+@since(2.4)
+def shuffle(col):
+ """
+ Collection function: Generates a random permutation of the given array.
+
+ .. note:: The function is non-deterministic because its results
depends on order of rows which
--- End diff --
Maybe this one would be better? "The function is non-deterministic because
it produces an unbiased permutation: every permutation is equally likely."
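
To make "unbiased" concrete, here is a minimal sketch in plain Python of a Fisher-Yates shuffle, which is one standard way to get a permutation where every ordering is equally likely. This is only an illustration, not a claim about how `shuffle` is implemented in this PR; the names `fisher_yates_shuffle` and `permutation_counts` are made up for the example.

```python
import random
from collections import Counter


def fisher_yates_shuffle(items, rng=None):
    """Return a new list with the elements of `items` in random order.

    Illustrative helper (hypothetical name), not Spark's implementation.
    """
    if rng is None:
        rng = random.Random()
    result = list(items)  # copy so the caller's sequence is left untouched
    # Walk from the last slot down to the second one, swapping each slot
    # with a uniformly chosen slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


def permutation_counts(items, trials, seed=0):
    """Count how often each ordering appears over `trials` shuffles."""
    rng = random.Random(seed)
    return Counter(
        tuple(fisher_yates_shuffle(items, rng)) for _ in range(trials)
    )


if __name__ == "__main__":
    # With 3 elements there are 6 orderings; each should appear
    # roughly trials / 6 times.
    for perm, count in sorted(permutation_counts([1, 2, 3], 6000).items()):
        print(perm, count)
```

Without a fixed seed, two calls on the same array will generally return different orderings, which is the sense in which the function is non-deterministic.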