Re: DataFrame Column Alias problem

2015-05-22 Thread SLiZn Liu
However, this returns only the single column c, without showing the
original col1.

On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha sriharsha@gmail.com
wrote:

 df.groupBy($"col1").agg(count($"col1").as("c")).show






Re: DataFrame Column Alias problem

2015-05-22 Thread SLiZn Liu
Despite the odd usage, it does the trick. Thanks, Reynold!

On Fri, May 22, 2015 at 2:47 PM Reynold Xin r...@databricks.com wrote:

 In 1.4 it actually shows col1 by default.

 In 1.3, you can add col1 to the output, i.e.

 df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()








Re: DataFrame Column Alias problem

2015-05-22 Thread Reynold Xin
In 1.4 it actually shows col1 by default.

In 1.3, you can add col1 to the output, i.e.

df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()
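
A fuller sketch of the 1.3 pattern above, combined with the sort the
original question was after. Illustrative only: the spark-shell
context (sc) and the toy data are assumptions, not verbatim from the
thread.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.count

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext
import sqlContext.implicits._

// Toy stand-in for the DataFrame from the question.
val df = sc.parallelize(Seq(Tuple1("aaa"), Tuple1("aaa"), Tuple1("bbb"),
  Tuple1("ccc"))).toDF("col1")

// Keep col1 in the output and alias the count, so the sort key is
// known in advance rather than auto-generated.
df.groupBy($"col1")
  .agg($"col1", count($"col1").as("c"))
  .sort($"c".desc)
  .show()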


On Thu, May 21, 2015 at 11:22 PM, SLiZn Liu sliznmail...@gmail.com wrote:

 However, this returns only the single column c, without showing the
 original col1.






DataFrame Column Alias problem

2015-05-21 Thread SLiZn Liu
Hi Spark Users Group,

I’m doing a groupBy operation on my DataFrame *df* as follows, to get
a count for each value of col1:

 df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should write it like this.
col1   COUNT(col1#347)
aaa    2
bbb    4
ccc    4
...
and more...

I’d like to sort by the resulting count with .sort("COUNT(col1#347)"),
but the column name of the count result obviously cannot be known in
advance. Intuitively, one might consider acquiring the column name by
column index, as in R’s data frames, but Spark doesn’t support that. I
have Googled *spark agg alias* and so forth, and checked DataFrame.as
in the Spark API, but neither helped with this. Am I the only one who
has ever gotten stuck on this issue, or have I missed something?
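
For reference, a minimal runnable version of the snippet above. This
is only a sketch: the spark-shell context (sc) and the toy data are
assumptions for illustration, not the real dataset.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(Tuple1("aaa"), Tuple1("aaa"), Tuple1("bbb"),
  Tuple1("ccc"))).toDF("col1")

// The tuple form of agg picks the aggregate function by name; the
// result column gets an auto-generated name such as COUNT(col1#347).
df.groupBy("col1").agg("col1" -> "count").show()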

REGARDS,
Todd Leo


Re: DataFrame Column Alias problem

2015-05-21 Thread Ram Sriharsha
df.groupBy($"col1").agg(count($"col1").as("c")).show
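
With the alias in place, the count column can be used directly as the
sort key the original post asks about; without an alias, the generated
name can be read back from the result’s schema instead of being
hard-coded. A sketch, assuming df is a DataFrame with a string column
col1 and sqlContext.implicits._ is in scope (as in spark-shell):

import org.apache.spark.sql.functions.count

// Aliased: the sort key "c" is known in advance.
df.groupBy($"col1").agg(count($"col1").as("c")).sort($"c".desc).show()

// Unaliased: recover the generated name (e.g. COUNT(col1#347)) from
// the result's schema rather than guessing it.
val counted = df.groupBy("col1").agg("col1" -> "count")
val countCol = counted.columns.last // name of the aggregate column
counted.sort(counted(countCol).desc).show()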
