Hi Pedro,

In 1.6.1, you can do:

ds.groupBy(_.uid).count().map(_._1)

or

ds.groupBy(_.uid).count().select($"_1".as[String])

(The count() result is a Dataset of tuples, so its columns are named _1 and _2, per the schema in your REPL output.)
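For reference, here's a minimal self-contained sketch of both approaches in local mode (the case class Record and the app setup are just placeholders for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical row type standing in for your schema with a uid field.
case class Record(uid: String)

object GroupByCountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ds-groupby").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val ds = Seq(Record("a"), Record("a"), Record("b")).toDS()

    // Typed API: map over the (key, count) tuples. The function's
    // parameter type is known from the Dataset, so inference works.
    val keys1 = ds.groupBy(_.uid).count().map(_._1)

    // Untyped column API: select() takes Columns rather than lambdas,
    // which is why select(_._1) fails with "missing parameter type".
    // The tuple columns are named _1 and _2; re-type _1 back to String.
    val keys2 = ds.groupBy(_.uid).count().select($"_1".as[String])

    keys1.show()
    keys2.show()

    sc.stop()
  }
}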
It doesn't have exactly the same syntax as DataFrame.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset

It might be different in 2.0.

Xinh

On Fri, Jun 17, 2016 at 3:33 PM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:

> Hi All,
>
> I am working on using Datasets in 1.6.1 and eventually 2.0 when it's
> released.
>
> I am running the aggregate code below, where I have a dataset whose row
> has a field uid:
>
> ds.groupBy(_.uid).count()
> // res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2:
> bigint]
>
> This works as expected; however, attempts to run select statements after
> it fail:
>
> ds.groupBy(_.uid).count().select(_._1)
> // error: missing parameter type for expanded function ((x$2) => x$2._1)
> ds.groupBy(_.uid).count().select(_._1)
>
> I have tried several variants, but nothing seems to work. Below is the
> equivalent DataFrame code, which works as expected:
>
> df.groupBy("uid").count().select("uid")
>
> Thanks!
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience