Hi Tony,

Would hash on the bid work for you?
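A rough sketch of how that could look, assuming Spark 2.0+ and a dataset with the bid/val columns described in the thread. The object name BidIds, the helper names, and the sample rows are made up for illustration. Note that hash returns a 32-bit int, so two different bid strings can in principle collide; if strict uniqueness matters, monotonically_increasing_id (also in org.apache.spark.sql.functions) gives unique, though not consecutive, long values without going through an RDD.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, hash, monotonically_increasing_id}

object BidIds {
  // Adds an int "id" column derived from the bid string.
  // Deterministic for a given bid, but not guaranteed collision-free.
  def withHashId(df: DataFrame): DataFrame =
    df.withColumn("id", hash(col("bid")))

  // Adds a long "id" column that is unique per row but not consecutive,
  // and not stable across runs or repartitioning.
  def withUniqueId(df: DataFrame): DataFrame =
    df.withColumn("id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bid-ids-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the (bid: String, val: Int) structure.
    val ds = Seq(("bid-a", 1), ("bid-b", 2), ("bid-c", 3)).toDF("bid", "val")

    withHashId(ds).show()
    withUniqueId(ds).show()

    spark.stop()
  }
}
```

The hash variant has the advantage that the same bid always maps to the same id, which helps if ids need to line up across datasets; the second variant trades that for guaranteed uniqueness within a single DataFrame.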
hash(cols: Column*): Column
<https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$@hash(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.Column>

Calculates the hash code of given columns, and returns the result as an int column.

Annotations: @varargs()
Since: 2.0

Column: <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Column.html>

Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015 <https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>
<http://in.linkedin.com/in/sonalgoyal>

On Fri, Aug 5, 2016 at 7:41 PM, Tony Lane <tonylane....@gmail.com> wrote:

> Ayan - basically I have a dataset with structure, where bid are unique
> string values
>
> bid: String
> val : integer
>
> I need unique int values for these string bid's to do some processing in
> the dataset
>
> like
>
> id:int (unique integer id for each bid)
> bid:String
> val:integer
>
> -Tony
>
> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>> wrote:
>>
>>> I have a row with structure like
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifiers are unique and I want to generate a unique long id for
>>> the data and get a row object back for further processing.
>>>
>>> I understand using the zipWithUniqueId function on RDD, but that would
>>> mean first converting to RDD and then joining back the RDD and dataset
>>>
>>> What is the best way to do this?
>>>
>>> -Tony
>>
>> --
>> Best Regards,
>> Ayan Guha