There is a way: you can use org.apache.spark.sql.functions.monotonicallyIncreasingId (renamed monotonically_increasing_id in Spark 2.0). It will give each row of your DataFrame a unique ID.
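A minimal sketch of that approach (assuming Spark 2.x, where the function is exposed as monotonically_increasing_id; the column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("row-ids").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c").toDF("value")

// Assigns a 64-bit ID that is guaranteed unique within this DataFrame.
// Caveats: the IDs are increasing but not consecutive, and they are
// recomputed whenever the lineage is re-evaluated (e.g. after a task
// failure), so they are not stable across runs.
val withId = df.withColumn("id", monotonically_increasing_id())
withId.show()
```

Note the caveat in the comment: uniqueness holds within one evaluation of the DataFrame, which matters for the failure scenario discussed below.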
On Tue, Oct 18, 2016 at 10:36 AM, ayan guha <guha.a...@gmail.com> wrote:

Do you have a primary key or unique identifier in your data? Even multiple columns that could form a composite key? In other words, can your data contain two otherwise identical rows that need different unique IDs? Also, do you need a numeric ID? You may want to pursue a hashing algorithm such as the SHA family to convert a single unique column, or a composite of columns, into an ID.

On 18 Oct 2016 15:32, "Saurav Sinha" <sauravsinh...@gmail.com> wrote:

Can anyone help me out?

On Mon, Oct 17, 2016 at 7:27 PM, Saurav Sinha <sauravsinh...@gmail.com> wrote:

Hi,

I am in a situation where I want to generate a unique ID for each row. I have used monotonicallyIncreasingId, but it gives increasing values and starts generating from the beginning again if the job fails. I have two questions:

Q1. Does this method give me a unique ID even in a failure situation? I want to use that ID as my Solr ID.

Q2. If the answer to the previous question is no, is there a way to generate a UUID for each row that is unique and never re-evaluated? I have run into a situation where the UUID gets regenerated:

    val idUDF = udf(() => UUID.randomUUID().toString)
    val a = rawDataDf.withColumn("alarmUUID", idUDF())
    a.persist(StorageLevel.MEMORY_AND_DISK)
    rawDataDf.registerTempTable("rawAlarms")

I then do some joins, but further down I do something like the following (b is a transformation of a):

    sqlContext.sql("""select a.alarmUUID, b.alarmUUID
                      from a right outer join b
                      on a.alarmUUID = b.alarmUUID""")

It gives output such as:

    +--------------------+--------------------+
    |           alarmUUID|           alarmUUID|
    +--------------------+--------------------+
    |7d33a516-5532-410...|                null|
    |                null|2439d6db-16a2-44b...|
    +--------------------+--------------------+

--
Thanks and Regards,
Saurav Sinha
Contact: 9742879062

Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
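Following ayan's hashing suggestion, one way to get an ID that survives recomputation is to derive it deterministically from the row's own key columns rather than from execution order or a random UDF. A hedged sketch, assuming hypothetical columns key1 and key2 that together form a unique composite key:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, sha2}

val spark = SparkSession.builder().appName("stable-ids").getOrCreate()
import spark.implicits._

// Hypothetical data: key1 and key2 together uniquely identify a row.
val df = Seq(("a", 1), ("b", 2)).toDF("key1", "key2")

// sha2 over the concatenated key columns is a pure function of the data,
// so the ID comes out identical however many times the plan is
// re-evaluated -- unlike UUID.randomUUID() in a UDF, which produces new
// values on every recomputation (hence the non-matching join above).
val withId = df.withColumn(
  "alarmUUID",
  sha2(concat_ws("||", $"key1", $"key2"), 256))
```

The separator in concat_ws guards against two different composite keys concatenating to the same string (e.g. "ab" + "c" vs "a" + "bc"); a persisted UUID can still work, but only if the persisted data is never evicted and recomputed.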