Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21121#discussion_r183214167
--- Diff: python/pyspark/sql/functions.py ---
@@ -2191,6 +2191,24 @@ def reverse(col):
return Column(sc._jvm.functions.reverse(_to_java_column(col)))
+@since(2.4)
+def zip_with_index(col, indexFirst=False):
+ """
+ Collection function: transforms the input array by encapsulating elements into pairs
+ with indexes indicating the order.
+
+ :param col: name of column or expression
+
+ >>> df = spark.createDataFrame([([2, 5, 3],), ([],)], ['data'])
+ >>> df.select(zip_with_index(df.data).alias('r')).collect()
+ [Row(r=[[value=2, index=0], [value=5, index=1], [value=3, index=2]]), Row(r=[])]
+ >>> df.select(zip_with_index(df.data, indexFirst=True).alias('r')).collect()
+ [Row(r=[[index=0, value=2], [index=1, value=5], [index=2, value=3]]), Row(r=[])]
+ """
--- End diff --
nit: there's one more leading space here.
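
For reference, the behavior described in the docstring above can be sketched in plain Python. This is only an illustration of the documented semantics, not the PR's implementation: `zip_with_index_py` and its `index_first` argument are hypothetical names, and dicts stand in for the struct rows shown in the doctest output.

```python
def zip_with_index_py(values, index_first=False):
    """Hypothetical plain-Python analogue of the docstring's behavior.

    Pairs each element with its position; the field order follows
    index_first (mirroring the indexFirst parameter in the diff).
    """
    pairs = []
    for index, value in enumerate(values):
        if index_first:
            pairs.append({"index": index, "value": value})
        else:
            pairs.append({"value": value, "index": index})
    return pairs


# Same inputs as the doctest in the diff.
print(zip_with_index_py([2, 5, 3]))
print(zip_with_index_py([2, 5, 3], index_first=True))
print(zip_with_index_py([]))
```

An empty input yields an empty list, matching the `Row(r=[])` case in the doctest.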
---