You can use the monotonically_increasing_id function to generate IDs that are
guaranteed to be unique (but not necessarily consecutive). For example:

df.withColumn("id", monotonically_increasing_id())

You don't mention which language you're using, but you'll need to import the
function from sql.functions (org.apache.spark.sql.functions in Scala/Java,
pyspark.sql.functions in Python).
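To see why the IDs are unique but not consecutive, here is a small pure-Python
model of the bit layout the Spark docs describe for the current implementation
of monotonically_increasing_id: partition index in the upper 31 bits, record
number within the partition in the lower 33 bits. The documented
zipWithUniqueId scheme (partition k gets k, n+k, 2n+k, ..., for n partitions)
is included for comparison. This is a sketch, not Spark code: the function
names are hypothetical and it assumes that documented layout, which may change.

```python
# Hypothetical pure-Python model; it does NOT call Spark.
# Assumes the layout documented for monotonically_increasing_id():
# partition index in the upper 31 bits, record number in the lower 33 bits.

def model_monotonic_ids(partitions):
    """Return one ID per record, given a list of partitions (lists of rows)."""
    ids = []
    for partition_index, rows in enumerate(partitions):
        for record_number, _ in enumerate(rows):
            ids.append((partition_index << 33) + record_number)
    return ids


def model_zip_with_unique_id(partitions):
    """Model RDD.zipWithUniqueId: partition k gets k, n+k, 2n+k, ..."""
    n = len(partitions)
    ids = []
    for k, rows in enumerate(partitions):
        for i, _ in enumerate(rows):
            ids.append(i * n + k)
    return ids


if __name__ == "__main__":
    # Three illustrative "bid" strings split across two partitions.
    partitions = [["bid-a", "bid-b"], ["bid-c"]]
    print(model_monotonic_ids(partitions))       # [0, 1, 8589934592]
    print(model_zip_with_unique_id(partitions))  # [0, 2, 1]
```

Note that the first record of the second partition already gets 8589934592,
which is larger than the 32-bit int maximum (2147483647). The column that
monotonically_increasing_id produces is a 64-bit long, so if you really need
an int id as in Tony's example, these values won't fit.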
Mike
> On Aug 5, 2016, at 9:11 AM, Tony Lane wrote:
>
> Ayan - basically, I have a dataset with the following structure, where the bid
> values are unique strings:
>
> bid: String
> val: integer
>
> I need a unique int value for each of these string bids to do some processing
> on the dataset, so the result would look like:
>
> id: int (unique integer ID for each bid)
> bid: String
> val: integer
>
>
>
> -Tony
>
>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha wrote:
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane wrote:
>>> I have rows with a structure like:
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifiers are unique, and I want to generate a unique long ID for each
>>> row and get a Row object back for further processing.
>>>
>>> I know I could use the zipWithUniqueId function on an RDD, but that would
>>> mean first converting to an RDD and then joining the RDD back to the dataset.
>>>
>>> What is the best way to do this?
>>>
>>> -Tony
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>