Re: DataFrame Column Alias problem
However, this returns a single column, c, without showing the original col1.

On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha sriharsha@gmail.com wrote:

> df.groupBy($"col1").agg(count($"col1").as("c")).show
Re: DataFrame Column Alias problem
Despite the odd usage, it does the trick. Thanks, Reynold!

On Fri, May 22, 2015 at 2:47 PM Reynold Xin r...@databricks.com wrote:

> In 1.4 it actually shows col1 by default. In 1.3, you can add col1 to the output, i.e.
>
> df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()
Re: DataFrame Column Alias problem
In 1.4 it actually shows col1 by default. In 1.3, you can add col1 to the output, i.e.

df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()

On Thu, May 21, 2015 at 11:22 PM, SLiZn Liu sliznmail...@gmail.com wrote:

> However, this returns a single column, c, without showing the original col1.
DataFrame Column Alias problem
Hi Spark Users Group,

I’m doing groupBy operations on my DataFrame *df* as follows, to get the count for each value of col1:

df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should write it like this.

col1   COUNT(col1#347)
aaa    2
bbb    4
ccc    4
... and more...

I’d like to sort by the resulting count, with .sort("COUNT(col1#347)"), but the column name of the count result obviously cannot be known in advance. Intuitively, one might consider retrieving the column name by column index, in the fashion of R’s DataFrame, but Spark doesn’t support that.

I have Googled *spark agg alias* and so forth, and checked DataFrame.as in the Spark API; neither helped. Am I the only one who has ever got stuck on this issue, or is there something I have missed?

REGARDS,
Todd Leo
Re: DataFrame Column Alias problem
df.groupBy($"col1").agg(count($"col1").as("c")).show

On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu sliznmail...@gmail.com wrote:

> Hi Spark Users Group,
>
> I’m doing groupBy operations on my DataFrame *df* as follows, to get the count for each value of col1:
>
> df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should write it like this.
>
> col1   COUNT(col1#347)
> aaa    2
> bbb    4
> ccc    4
> ... and more...
>
> I’d like to sort by the resulting count, with .sort("COUNT(col1#347)"), but the column name of the count result obviously cannot be known in advance. Intuitively, one might consider retrieving the column name by column index, in the fashion of R’s DataFrame, but Spark doesn’t support that.
>
> I have Googled *spark agg alias* and so forth, and checked DataFrame.as in the Spark API; neither helped. Am I the only one who has ever got stuck on this issue, or is there something I have missed?
>
> REGARDS,
> Todd Leo
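
For anyone landing on this thread later, here is a minimal sketch that combines the aliasing suggested in the replies with the sort the original post asked for. It is only an illustration under stated assumptions: it uses the SparkSession entry point from Spark 2.x and later (the thread itself discusses 1.3 and 1.4), it refers to columns with col("...") rather than the $"..." shorthand so that no implicits are needed inside the helper, and the object name, the countSorted helper, and the sample data are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

object CountAliasSketch {
  // Hypothetical helper: count the rows per value of col1, name the result "c",
  // then sort by that alias (largest count first, ties broken by col1).
  def countSorted(df: DataFrame): DataFrame =
    df.groupBy(col("col1"))
      .agg(count(col("col1")).as("c"))
      .sort(col("c").desc, col("col1"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session; on 1.3/1.4 a SQLContext would be used instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-alias-sketch")
      .getOrCreate()
    import spark.implicits._ // needed only for toDF on the sample Seq

    // Made-up sample data with a single string column named col1.
    val df = Seq("aaa", "aaa", "bbb", "bbb", "bbb", "bbb", "ccc", "ccc", "ccc", "ccc").toDF("col1")

    countSorted(df).show()
    spark.stop()
  }
}
```

Because the aggregate is given the alias c, the sort can refer to it by a name that is known in advance, instead of the generated COUNT(col1#347) label. On 1.3, per Reynold's note, col1 would also have to be passed to agg explicitly to appear in the output.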