In 1.4, agg actually shows col1 in the output by default.
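
For example, on 1.4 the one-liner suggested below should already print both col1 and c:

df.groupBy($"col1").agg(count($"col1").as("c")).show()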

In 1.3, you can add "col1" to the output explicitly, e.g.

df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()
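
And to answer the original sorting question: once the count has an alias, you can sort on that alias directly. A minimal sketch, assuming the same df and the usual sqlContext.implicits._ / org.apache.spark.sql.functions._ imports:

df.groupBy($"col1").agg($"col1", count($"col1").as("c")).sort($"c".desc).show()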


On Thu, May 21, 2015 at 11:22 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:

> However, this returns only the single column c, without showing the
> original col1.
>
> On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha <sriharsha....@gmail.com>
> wrote:
>
>> df.groupBy($"col1").agg(count($"col1").as("c")).show
>>
>> On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu <sliznmail...@gmail.com>
>> wrote:
>>
>>> Hi Spark Users Group,
>>>
>>> I’m doing groupBy operations on my DataFrame *df* as follows, to get
>>> the count for each value of col1:
>>>
>>> > df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should write like this.
>>> col1   COUNT(col1#347)
>>> aaa    2
>>> bbb    4
>>> ccc    4
>>> ...
>>> and more...
>>>
>>> I’d like to sort by the resulting count, e.g. with
>>> .sort("COUNT(col1#347)"), but the generated column name of the count
>>> result obviously cannot be known in advance. Intuitively one might
>>> retrieve the column name by column index, as in R’s data.frame, but
>>> Spark doesn’t support that. I have Googled *spark agg alias* and so
>>> forth, and checked DataFrame.as in the Spark API, but neither helped.
>>> Am I the only one who has ever got stuck on this issue, or is there
>>> something I have missed?
>>>
>>> REGARDS,
>>> Todd Leo
>>>
>>
>>
