RE: Can Spark SQL (not DataFrame or Dataset) aggregate array into map of element of count?

2023-05-10 Thread Vijay B
Please see if this works -- aggregate array into map of element count:

    SELECT aggregate(array(1,2,3,4,5), map('cnt', 0), (acc, x) -> map('cnt', acc.cnt + 1)) AS array_count

thanks
Vijay

On 2023/05/05 19:32:04 Yong Zhang wrote:
> Hi, This is on Spark 3.1 environment.
>
> For some reason, I can
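Spark SQL's aggregate(array, start, merge) higher-order function is a left fold over the array, which is why the lambda above ends with every element counted into the 'cnt' key. A minimal sketch of the same fold in plain Python (not Spark code), plus a variant that builds a map of each distinct element to its count, which is closer to what the original question asked for:

```python
from functools import reduce

arr = [1, 2, 3, 4, 5]

# Same shape as: aggregate(array(1,2,3,4,5), map('cnt',0),
#                          (acc,x) -> map('cnt', acc.cnt + 1))
# i.e. fold the array, incrementing a single counter.
total = reduce(lambda acc, x: {'cnt': acc['cnt'] + 1}, arr, {'cnt': 0})
# total == {'cnt': 5}

# Variant: fold into a map of element -> occurrence count.
counts = reduce(lambda acc, x: {**acc, x: acc.get(x, 0) + 1}, arr, {})
# counts == {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
```

The Spark equivalent of the variant would use map_concat/element_at inside the merge lambda; the fold structure is the same.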

Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-11 Thread Vijay B
In my view Spark is behaving as expected. TL;DR: every time a DataFrame is reused, branched, or forked, the sequence of operations that produces it is evaluated again. Use cache() or persist() to avoid this behavior, and unpersist when the data is no longer required; Spark does not unpersist automatically. Couple of things