Hi,

In 2.0, you can say:

  val ds = Seq[Tuple2[Int, Int]]((1, 0), (2, 0)).toDS
  ds.groupBy($"_1").count.select($"_1", $"count").show
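A self-contained version of that 2.0 snippet, runnable as a standalone app, might look like the sketch below; the SparkSession setup is an illustrative assumption, not part of the original reply (in spark-shell 2.0, `spark` and the implicits are already in scope):

  import org.apache.spark.sql.SparkSession

  // Assumed setup for a standalone run.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("groupby-count")
    .getOrCreate()
  import spark.implicits._

  val ds = Seq((1, 0), (2, 0)).toDS()

  // groupBy($"_1") is the untyped (DataFrame-style) API, so the grouped
  // count produces named columns (_1, count) that select() can reference.
  ds.groupBy($"_1").count().select($"_1", $"count").show()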
// maropu

On Sat, Jun 18, 2016 at 7:53 AM, Xinh Huynh <xinh.hu...@gmail.com> wrote:

> Hi Pedro,
>
> In 1.6.1, you can do:
>
>   ds.groupBy(_.uid).count().map(_._1)
>
> or
>
>   ds.groupBy(_.uid).count().select($"value".as[String])
>
> It doesn't have exactly the same syntax as for DataFrame:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
>
> It might be different in 2.0.
>
> Xinh
>
> On Fri, Jun 17, 2016 at 3:33 PM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am working on using Datasets in 1.6.1 and eventually in 2.0 when it's
>> released.
>>
>> I am running the aggregation code below on a Dataset whose rows have a
>> field uid:
>>
>>   ds.groupBy(_.uid).count()
>>   // res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2: bigint]
>>
>> This works as expected; however, attempts to run select statements
>> afterwards fail:
>>
>>   ds.groupBy(_.uid).count().select(_._1)
>>   // error: missing parameter type for expanded function ((x$2) => x$2._1)
>>
>> I have tried several variants, but nothing seems to work. Below is the
>> equivalent DataFrame code, which works as expected:
>>
>>   df.groupBy("uid").count().select("uid")
>>
>> Thanks!
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience

--
---
Takeshi Yamamuro
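For reference, a minimal sketch of the 1.6.1 workarounds quoted above, assuming ds is a Dataset of a case class with a uid field. The Record class and sample data are illustrative; note also that Xinh's reply uses $"value" for the key column while Pedro's printed schema shows _1, so the column name may vary by version:

  import sqlContext.implicits._  // pre-imported in spark-shell 1.6

  case class Record(uid: String)

  val ds = Seq(Record("a"), Record("b"), Record("a")).toDS()

  // counts is a typed Dataset[(String, Long)], not a DataFrame.
  val counts = ds.groupBy(_.uid).count()

  // Typed path: map over the (key, count) tuples to keep just the keys.
  val keys1 = counts.map(_._1)

  // Column path: Dataset.select takes a TypedColumn, not an anonymous
  // function, which is why select(_._1) does not compile.
  val keys2 = counts.select($"_1".as[String])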