Re: [PR] [SPARK-55229][PYTHON] Implement DataFrame.zipWithIndex in PySpark Classic [spark]

via GitHub Sat, 07 Feb 2026 00:32:52 -0800


zhengruifeng commented on code in PR #54195:
URL: https://github.com/apache/spark/pull/54195#discussion_r2777265932



##########
python/pyspark/sql/classic/dataframe.py:
##########
@@ -280,6 +281,11 @@ def explain(
     def exceptAll(self, other: ParentDataFrame) -> ParentDataFrame:
         return DataFrame(self._jdf.exceptAll(other._jdf), self.sparkSession)
 
+    def zipWithIndex(self, indexColName: str = "index") -> ParentDataFrame:
+        return self.select(
+            F.col("*"), InternalFunction.distributed_sequence_id().alias(indexColName)

Review Comment:
   This function is dedicated to PS (pandas API on Spark), and I am making it
   different from zipWithIndex on the underlying RDD cache.

   Basically, for methods in PySpark Classic, we directly invoke JVM methods
   via py4j.
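
A minimal sketch of the delegation pattern the comment describes, modeled on the `exceptAll` line in the diff above. It uses stand-in classes so it runs without Spark or a JVM. `FakeJvmDataset` and its `zipWithIndex` method are hypothetical: the source does not confirm that the JVM-side Dataset exposes such a method, so treat that name as an assumption.

```python
# Sketch only: stand-in classes so the pattern runs without Spark or a JVM.
# `zipWithIndex` on the JVM-side object is an ASSUMPTION, not a confirmed API.


class FakeJvmDataset:
    """Hypothetical stand-in for the py4j proxy stored in `DataFrame._jdf`."""

    def __init__(self, rows):
        self.rows = list(rows)

    def zipWithIndex(self, indexColName):
        # Assumed JVM-side method: append a 0-based position to each row.
        return FakeJvmDataset(
            {**row, indexColName: i} for i, row in enumerate(self.rows)
        )


class DataFrame:
    """Minimal stand-in for the Classic DataFrame wrapper."""

    def __init__(self, jdf, sparkSession):
        self._jdf = jdf
        self.sparkSession = sparkSession

    def zipWithIndex(self, indexColName="index"):
        # Same shape as exceptAll in the diff: call the JVM object,
        # then re-wrap the returned object in a Python DataFrame.
        return DataFrame(self._jdf.zipWithIndex(indexColName), self.sparkSession)


df = DataFrame(FakeJvmDataset([{"v": "a"}, {"v": "b"}]), sparkSession=None)
indexed = df.zipWithIndex("idx")
print(indexed._jdf.rows)
```

With this shape the Python method stays a thin wrapper and the indexing logic lives entirely on the JVM side, rather than being composed from Python column expressions.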



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

