PS.. forgot to mention this syntax works... but then you lose your group-by fields (which is honestly pretty weird; I'm not sure if this is as designed or a bug?)
>>> t2 = reviews.groupBy("stars").agg(count("stars").alias("count"))
>>> t2
DataFrame[count: bigint]

On Thu, Apr 16, 2015 at 9:32 PM, elliott cordo <elliottco...@gmail.com> wrote:

> FYI.. the problem is that the column names Spark generates are not able to be
> referenced within SQL or dataframe operations (ie. "SUM(cool_cnt#725)")..
> any idea how to alias these final aggregate columns..
>
> The syntax below doesn't make sense, but this is what I'd ideally want to do:
>
>     .agg({"cool_cnt":"sum".alias("cool_cnt"),"*":"count".alias("cnt")})
>
> On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo <elliottco...@gmail.com> wrote:
>
>> Hi Guys -
>>
>> Having trouble figuring out the semantics for using the alias function on
>> the final sum and count aggregations?
>>
>> >>> cool_summary = reviews.select(reviews.user_id,
>> cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"})
>>
>> >>> cool_summary
>> DataFrame[user_id: string, SUM(cool_cnt#725): double, COUNT(1): bigint]