Below are two approaches to adding an id/index to an RDD and one approach to adding an index to a DataFrame.

Add an index column to an RDD

```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add index, then make the index the key in the map() transformation
// Results in RDD[(Long, String)]
val indexedRDD = dataRDD.zipWithIndex().map(pair => (pair._2, pair._1))
```

Add a unique id column to an RDD

```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add unique id, then make the id the key in the map() transformation
// Ids are unique but not necessarily consecutive
// Results in RDD[(Long, String)]
val indexedRDD = dataRDD.zipWithUniqueId().map(pair => (pair._2, pair._1))
indexedRDD.collect()
```

Add an index column to a DataFrame

Note: You could use a similar approach with a Dataset.

```scala
import spark.implicits._
import org.apache.spark.sql.functions.monotonically_increasing_id

// spark.read.textFile returns a Dataset[String] with a single "value" column
val dataDF = spark.read.textFile("./README.md")
// Ids are unique and increasing, but not guaranteed to be consecutive
val indexedDF = dataDF.withColumn("id", monotonically_increasing_id())
indexedDF.select($"id", $"value").show()
```

-----
Delixus.com - Spark Consulting
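
To make the difference between the two RDD approaches concrete, here is a small plain-Scala model of the ids each one assigns. It does not use Spark. The object `IdSchemes` and its two methods are hypothetical names made up for this illustration, and partitions are modelled as a `Seq[Seq[T]]`. The unique-id rule follows the Spark API documentation for `zipWithUniqueId`: items in the kth partition get ids k, n+k, 2n+k, ..., where n is the number of partitions. Treat it as a sketch of the numbering only, not of how Spark implements it.

```scala
// Plain-Scala model of the two id schemes; no Spark required.
// IdSchemes and both method names are hypothetical, for illustration only.
object IdSchemes {

  // Models zipWithIndex: consecutive indices 0, 1, 2, ... assigned in
  // partition order, then in item order within each partition.
  def zipWithIndexModel[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] =
    partitions.flatten.zipWithIndex.map { case (item, i) => (item, i.toLong) }

  // Models zipWithUniqueId: the i-th item (0-based) of partition k gets
  // id i * n + k, where n is the number of partitions.
  def zipWithUniqueIdModel[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] = {
    val n = partitions.length.toLong
    partitions.zipWithIndex.flatMap { case (items, k) =>
      items.zipWithIndex.map { case (item, i) => (item, i * n + k) }
    }
  }
}
```

For three partitions `Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f"))`, the index model yields ids 0 through 5 in order, while the unique-id model yields 0, 3, 6 for the first partition, 1 for the second, and 2, 5 for the third: every id is unique, but 4 is never used. If consecutive numbering matters, `zipWithIndex` is the one to reach for; `zipWithUniqueId` avoids the extra Spark job that `zipWithIndex` needs to count partition sizes when the RDD has more than one partition.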