Re: Aggregate over a column: the proper way to do

sam smith Sun, 10 Apr 2022 15:52:17 -0700

Exactly: one row and two columns.

On Sat, Apr 9, 2022 at 17:44, Sean Owen <sro...@gmail.com> wrote:


> But it only has one row, right?
>
> On Sat, Apr 9, 2022, 10:06 AM sam smith <qustacksm2123...@gmail.com>
> wrote:
>
>> Yes, it returns the number of rows in the Dataset as a *long*. But in my
>> case the aggregation returns a table with two columns.
>>
>> On Fri, Apr 8, 2022 at 14:12, Sean Owen <sro...@gmail.com> wrote:
>>
>>> Dataset.count() returns one value directly?
>>>
>>> On Thu, Apr 7, 2022 at 11:25 PM sam smith <qustacksm2123...@gmail.com>
>>> wrote:
>>>
>>>> My bad, yes, of course! Still, I don't like the
>>>> .select("count(myCol)") part of my line. Is there any replacement for it?
>>>>
>>>> On Fri, Apr 8, 2022 at 06:13, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Just do an average then? Most of my point is that filtering to one
>>>>> group and then grouping is pointless.
>>>>>
>>>>> On Thu, Apr 7, 2022, 11:10 PM sam smith <qustacksm2123...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> What if I do avg instead of count?
>>>>>>
>>>>>> On Fri, Apr 8, 2022 at 05:32, Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>>> Wait, why groupBy at all? After the filter, only rows with myCol
>>>>>>> equal to your target are left, so there is only one group. Why not
>>>>>>> skip the grouping and just count after the filter?
>>>>>>>
>>>>>>> On Thu, Apr 7, 2022, 10:27 PM sam smith <qustacksm2123...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I want to aggregate a column by counting the number of rows that have
>>>>>>>> the value "myTargetValue" and return the result.
>>>>>>>> I am doing it as follows, in Java:
>>>>>>>>
>>>>>>>>> long result = dataset
>>>>>>>>>     .filter(dataset.col("myCol").equalTo("myTargetVal"))
>>>>>>>>>     .groupBy(col("myCol"))
>>>>>>>>>     .agg(count(dataset.col("myCol")))
>>>>>>>>>     .select("count(myCol)")
>>>>>>>>>     .first()
>>>>>>>>>     .getLong(0);
>>>>>>>>
>>>>>>>>
>>>>>>>> Is that the right way? If not, what is a more optimized way to do
>>>>>>>> that (still in Java)?
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>
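
The point made in the thread (once the rows are filtered to a single value of myCol, grouping by myCol adds nothing) can be illustrated with a small model. The sketch below is plain Java using streams, not Spark code, so it runs without a Spark dependency. The class name, the Row type, the numeric "score" column and the sample data are all hypothetical and exist only for illustration. The comments note the Spark calls discussed in the thread that each method stands in for.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

// Plain-Java model of the thread's argument; this is NOT Spark code.
class FilterThenCountSketch {

    // Hypothetical row: myCol is the column being matched,
    // score is an invented numeric column used for the avg variant.
    static final class Row {
        final String myCol;
        final double score;

        Row(String myCol, double score) {
            this.myCol = myCol;
            this.score = score;
        }
    }

    // Direct route: filter, then count.
    // Stands in for: dataset.filter(dataset.col("myCol").equalTo(target)).count()
    static long countAfterFilter(List<Row> rows, String target) {
        return rows.stream()
                .filter(r -> r.myCol.equals(target))
                .count();
    }

    // Route from the original question: filter, group by myCol, count per group,
    // then pick out the single surviving group.
    static long countViaGroupBy(List<Row> rows, String target) {
        Map<String, Long> counts = rows.stream()
                .filter(r -> r.myCol.equals(target))
                .collect(Collectors.groupingBy(r -> r.myCol, Collectors.counting()));
        // At most one key can exist here, because the filter removed all others.
        return counts.getOrDefault(target, 0L);
    }

    // avg variant: filter, then average a numeric column, again without grouping.
    // Returns an empty OptionalDouble when no row matches.
    static OptionalDouble averageAfterFilter(List<Row> rows, String target) {
        return rows.stream()
                .filter(r -> r.myCol.equals(target))
                .mapToDouble(r -> r.score)
                .average();
    }

    // Invented sample data.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("a", 1.0),
                new Row("b", 2.0),
                new Row("a", 3.0));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println(countAfterFilter(rows, "a"));
        System.out.println(countViaGroupBy(rows, "a"));
        System.out.println(averageAfterFilter(rows, "a"));
    }
}
```

Both counting routes give the same number for any target, which is why the grouping step is redundant. In Spark itself, the direct route also avoids the select("count(myCol)") step, since Dataset.count() already returns a long, as confirmed in the thread.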
