Please see if this works
-- count the array's elements into a map, e.g. map('cnt' -> 5)
SELECT aggregate(array(1, 2, 3, 4, 5),
                 map('cnt', 0),
                 (acc, x) -> map('cnt', acc['cnt'] + 1)) AS array_count
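In case it helps to see the semantics outside SQL: Spark's aggregate higher-order function is a left fold over the array. Here is a pure-Python sketch of the same computation (the function name spark_like_aggregate is just for illustration):

```python
from functools import reduce

# Sketch of Spark SQL's aggregate(array, zero, merge): a left fold.
def spark_like_aggregate(arr, zero, merge):
    return reduce(merge, arr, zero)

result = spark_like_aggregate(
    [1, 2, 3, 4, 5],
    {"cnt": 0},                              # map('cnt', 0)
    lambda acc, x: {"cnt": acc["cnt"] + 1},  # (acc, x) -> map('cnt', acc['cnt'] + 1)
)
print(result)  # {'cnt': 5}
```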
Thanks,
Vijay
On 2023/05/05 19:32:04 Yong Zhang wrote:
> Hi, This is on Spark 3.1 environment.
>
> For some reason, I can
In my view Spark is behaving as expected.
TL;DR: Every time a DataFrame is reused, branched, or forked, the sequence of
operations that produced it is evaluated again. Use cache() or persist() to avoid
this behavior, and unpersist() when the data is no longer required; Spark does
not unpersist automatically.
A couple of things: