Thanks for the response. I can use either row_number() or
monotonicallyIncreasingId to generate unique IDs, as in
https://hadoopist.wordpress.com/2016/05/24/generate-unique-ids-for-each-rows-in-a-spark-dataframe/

I'm looking for a Java example that replicates a single row n times, either by
appending a ROWNUM column generated as above or by using the explode function.
For example:

    ds.withColumn("ROWNUM", org.apache.spark.sql.functions.explode(columnEx));

Here columnEx needs to be of array type in order for explode to work. Any
suggestions are helpful.

Thanks

On Thu, Sep 28, 2017 at 7:21 PM, ayan guha <guha.a...@gmail.com> wrote:

> How about using row_number() for the primary key?
>
> SELECT row_number() OVER (), * FROM table
>
> On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar <kpra...@salesforce.com>
> wrote:
>
>> Hi,
>>
>> I'm trying to replicate a single row from a dataset n times and create a
>> new dataset from it. While replicating, I need one column's value to
>> change for each copy, since it will end up as the primary key when the
>> data is finally stored.
>>
>> I looked at the following reference:
>> https://stackoverflow.com/questions/40397740/replicate-spark-row-n-times
>>
>> import org.apache.spark.sql.functions._
>> val result = singleRowDF
>>   .withColumn("dummy", explode(array((1 until 100).map(lit): _*)))
>>   .selectExpr(singleRowDF.columns: _*)
>>
>> How can I create a column from an array of values in Java and pass it to
>> the explode function? Suggestions are helpful.
>>
>> Thanks
>> Kanagha
>
> --
> Best Regards,
> Ayan Guha
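For the archives, a minimal Java sketch of the explode-over-array approach
discussed above, assuming spark-sql is on the classpath. The dataset, class,
and column names (ReplicateRow, singleRowDS, ROWNUM, n = 5) are illustrative,
not from the thread:

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

public class ReplicateRow {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("replicate-row")
                .getOrCreate();

        // A one-row dataset standing in for the row to be replicated.
        Dataset<Row> singleRowDS = spark.sql("SELECT 'acct-1' AS name");

        int n = 5;
        // Build array(lit(1), lit(2), ..., lit(n)) -- the Java equivalent
        // of the Scala array((1 until n).map(lit): _*) from the thread.
        Column[] seq = new Column[n];
        for (int i = 0; i < n; i++) {
            seq[i] = lit(i + 1);
        }

        // explode turns the n-element array column into n rows, one per
        // element, so ROWNUM differs in each copy and can be combined with
        // the original key to form a distinct primary key per row.
        Dataset<Row> replicated = singleRowDS
                .withColumn("ROWNUM", explode(array(seq)));

        replicated.show();
        System.out.println(replicated.count()); // prints 5

        spark.stop();
    }
}
```

If a globally unique key is needed rather than a per-copy counter, the same
pipeline can append monotonically_increasing_id() afterwards, as in the blog
post linked above.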