Would monotonicallyIncreasingId
<https://github.com/apache/spark/blob/d4c7a7a3642a74ad40093c96c4bf45a62a470605/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L637>
work for you?
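A minimal sketch of how it could be used (assuming a Spark 1.4+ SQLContext with the spark-csv package on the classpath; the path and column name below are illustrative):

```scala
import org.apache.spark.sql.functions.monotonicallyIncreasingId

// Load a CSV via spark-csv; file name and options are placeholders.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("file1.csv")

// Adds a unique Long per row. Note the IDs are guaranteed unique and
// monotonically increasing, but not consecutive, since the partition ID
// is encoded in the upper bits.
val withId = df.withColumn("id", monotonicallyIncreasingId())
```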

Best,
Burak



On Tue, Jul 21, 2015 at 4:55 PM, Srikanth <srikanth...@gmail.com> wrote:

> Hello,
>
> I'm creating dataframes from three CSV files using spark-csv package. I
> want to add a unique ID for each row in dataframe.
> Not sure how withColumn() can be used to achieve this. I need a Long value,
> not a UUID.
>
> One option I found was to create a RDD and use zipWithUniqueId.
>
> sc.textFile(file).
>> zipWithUniqueId().
>> map { case (d, i) => i.toString + delimiter + d }.
>> map(_.split(delimiter)).
>> map(s => CaseClass(...)).
>> toDF().select("field1", "field2")
>
>
> It's a bit hacky. Is there an easier way to do this on dataframes and use
> spark-csv?
>
> Srikanth
>
