Hi Tony,

Would applying the built-in hash function to the bid column work for you?

hash(cols: Column*): Column
<https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$@hash(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.Column>

Calculates the hash code of given columns, and returns the result as an int
column.

Annotations: @varargs()
Since: 2.0
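A minimal sketch of what this could look like, assuming Spark 2.x or later with spark-sql on the classpath and a local SparkSession. The object name HashIdSketch, its helper withHashId and the sample bids/values are made up for illustration. Note that hash returns a 32-bit int, so two distinct bid strings can in principle map to the same id; the sketch checks for that on the data instead of assuming uniqueness.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, hash}

// Hypothetical helper object for illustration; not part of Spark.
object HashIdSketch {
  // Adds an Int "id" column computed with Spark's built-in hash() over "bid".
  // hash() is a 32-bit Murmur3-based hash, so distinct bids may collide.
  def withHashId(df: DataFrame): DataFrame =
    df.withColumn("id", hash(col("bid"))).select("id", "bid", "val")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hash-id-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the bid/val structure described in the thread.
    val ds = Seq(("b-001", 10), ("b-002", 20), ("b-003", 30)).toDF("bid", "val")
    val withId = withHashId(ds)
    withId.show()

    // Uniqueness is not guaranteed by hash(), so verify it on the actual data.
    val total = withId.count()
    val distinctIds = withId.select("id").distinct().count()
    require(distinctIds == total, s"hash collision: $total rows but $distinctIds distinct ids")

    spark.stop()
  }
}
```

The appeal of this approach is that it stays entirely in the DataFrame API (no RDD round trip, no join) and the id for a given bid is deterministic across runs.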

Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015
<https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>

<http://in.linkedin.com/in/sonalgoyal>



On Fri, Aug 5, 2016 at 7:41 PM, Tony Lane <tonylane....@gmail.com> wrote:

> Ayan - basically I have a dataset with the following structure, where the
> bid values are unique strings:
>
> bid: String
> val: integer
>
> I need a unique int value for each of these string bids to do some
> processing in the dataset, giving a structure like:
>
> id: int   (unique integer id for each bid)
> bid: String
> val: integer
>
>
>
> -Tony
>
> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>> wrote:
>>
>>> I have rows with a structure like:
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifiers are unique, and I want to generate a unique long id for
>>> each row and get a Row object back for further processing.
>>>
>>> I know I could use the zipWithUniqueId function on an RDD, but that would
>>> mean first converting to an RDD and then joining the RDD back to the dataset.
>>>
>>> What is the best way to do this?
>>>
>>> -Tony
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
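On the zipWithUniqueId route from the original question: a join back is not strictly required, since the zipped rows can be turned straight back into a DataFrame with an extended schema. The sketch below assumes Spark 2.x or later; the object name ZipIdSketch, its helper withUniqueId and the sample data are hypothetical. Per the RDD docs, zipWithUniqueId gives unique Long ids that are not consecutive (items in partition k get k, n+k, 2n+k, ... for n partitions) and does not trigger an extra Spark job; zipWithIndex gives consecutive ids but does.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper object for illustration; not part of Spark.
object ZipIdSketch {
  // Appends a Long "id" column using RDD.zipWithUniqueId and rebuilds the
  // DataFrame from the zipped rows, so no join back to the original is needed.
  def withUniqueId(spark: SparkSession, df: DataFrame): DataFrame = {
    val schema = StructType(df.schema.fields :+ StructField("id", LongType, nullable = false))
    val rows = df.rdd.zipWithUniqueId().map { case (row, id) =>
      Row.fromSeq(row.toSeq :+ id)
    }
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip-id-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the identifier/value structure from the question.
    val df = Seq(("id-a", 1), ("id-b", 2), ("id-c", 3)).toDF("identifier", "value")
    val withId = withUniqueId(spark, df)
    withId.show()

    // Each result is still a Row, now carrying the generated id as its last field.
    withId.collect().foreach { r => println(s"${r.getString(0)} -> ${r.getLong(2)}") }

    spark.stop()
  }
}
```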
