Re: [PR] [SPARK-55228][SPARK-55230][SQL][CONNECT] Implement Dataset.zipWithIndex in Scala API [spark]

via GitHub Sun, 01 Feb 2026 19:09:15 -0800


cloud-fan commented on code in PR #54014:
URL: https://github.com/apache/spark/pull/54014#discussion_r2752362758



##########
sql/api/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -2010,6 +2010,33 @@ abstract class Dataset[T] extends Serializable {
    */
   def exceptAll(other: Dataset[T]): Dataset[T]
 
+  /**
+   * Returns a new [[Dataset]] by appending a column containing consecutive 0-based Long indices,
+   * similar to `RDD.zipWithIndex()`.
+   *
+   * The index column is appended as the last column of the resulting [[DataFrame]].
+   *
+   * @group typedrel
+   * @since 4.2.0
+   */
+  def zipWithIndex(): DataFrame = zipWithIndex("index")
+
+  /**
+   * Returns a new [[Dataset]] by appending a column containing consecutive 0-based Long indices,
+   * similar to `RDD.zipWithIndex()`.
+   *
+   * The index column is appended as the last column of the resulting [[DataFrame]].
+   *
+   * @param indexColName
+   *   The name of the index column to append. The dataset must not already contain a column with
+   *   this name.
+   * @throws AnalysisException
+   *   if a column with `indexColName` already exists in the schema.

Review Comment:
   This makes sense to me, and we can relax it later if needed.
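
   For readers following the thread, the contract described in the Scaladoc above
   (a 0-based Long index appended as the last column, and a rejected name collision)
   can be illustrated with a small Spark-free model. This is only a sketch under
   stated assumptions, not the PR's implementation: `Table` and `ZipWithIndexSketch`
   are hypothetical names, the collision is reported with `IllegalArgumentException`
   rather than `AnalysisException`, and column names are compared by exact match,
   which may differ from Spark's own name resolution.

```scala
// Hypothetical, Spark-free model of the documented contract.
// A table is a list of column names plus rows of untyped values.
final case class Table(columns: Vector[String], rows: Vector[Vector[Any]])

object ZipWithIndexSketch {
  // Appends a consecutive 0-based Long index as the LAST column.
  // Fails if `indexColName` is already present (exact-match comparison,
  // a simplification; the real API is documented to throw AnalysisException).
  def zipWithIndex(t: Table, indexColName: String = "index"): Table = {
    require(
      !t.columns.contains(indexColName),
      s"Column '$indexColName' already exists")
    val indexedRows = t.rows.zipWithIndex.map { case (row, i) => row :+ i.toLong }
    Table(t.columns :+ indexColName, indexedRows)
  }
}
```

   Calling the helper twice with the same column name fails on the second call,
   which is the strict behavior the comment suggests could be relaxed later if needed.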



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

