You can do something like:

df.collect().map { case Row(name: String, age1: Int, _: String, _: String, age2: Int, _: String) => (name, age1, age2) }

(Note the joined rows carry all six columns — name1, age1, town1, name2, age2, town2 — so the pattern has to account for all of them.)
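A fuller sketch, assuming Spark 1.3-era APIs (parquetFile, DataFrame join/select) and hypothetical file paths and column names — adjust to your actual schema. It is usually simpler to do the projection at the DataFrame level before ever touching Row:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical setup; "town1.parquet"/"town2.parquet" and the column
// names (name, age, town) are assumptions standing in for your files.
val sc = new SparkContext("local", "join-example")
val sqlContext = new SQLContext(sc)

val df1 = sqlContext.parquetFile("town1.parquet")
val df2 = sqlContext.parquetFile("town2.parquet")

// Join on name, then keep only (name, age1, age2) as columns,
// so there is no need to split strings or pattern-match six fields:
val joined = df1.join(df2, df1("name") === df2("name"))
val result = joined.select(df1("name"), df1("age").as("age1"), df2("age").as("age2"))

// If you do collect(), each Row now has exactly three columns
// and can be destructured directly:
val tuples = result.collect().map {
  case Row(name: String, age1: Int, age2: Int) => (name, age1, age2)
}
```

Selecting the columns before collect() also avoids pulling the unused town fields to the driver.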
On Tue, Mar 31, 2015 at 4:05 PM, roni <roni.epi...@gmail.com> wrote:
> I have 2 Parquet files with format e.g. name, age, town.
> I read them and then join them to get all the names which are in both
> towns. The resultant dataset is
>
> res4: Array[org.apache.spark.sql.Row] = Array([name1, age1,
> town1,name2,age2,town2]....)
>
> name1 and name2 are the same, as I am joining on them.
> Now I want to get to just the format (name, age1, age2).
>
> But I can't seem to manipulate the spark.sql.Row.
>
> Trying something like map(_.split(",")).map(a => (a(0),
> a(1).trim().toInt)) does not work.
>
> Can you suggest a way?
>
> Thanks
> -R