Hi, this is on a Spark 3.1 environment. For some reason, I can ONLY do this in Spark SQL, not in either the Scala or PySpark environment.
I want to aggregate an array into a Map of element counts within that array, but in Spark SQL. I know there is an aggregate function available, `aggregate(expr, start, merge [, finish])`, but I want to know if this can be done in Spark SQL only, and:

* How to represent an empty Map as the "start" element above
* How to merge each element (of String type) into the Map (incrementing the count if the element already exists in the Map, or adding (element -> 1) as a new entry if it does not)

Like the following example -> https://docs.databricks.com/sql/language-manual/functions/aggregate.html

```sql
SELECT aggregate(
  array(1, 2, 3, 4),
  named_struct('sum', 0, 'cnt', 0),
  (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1),
  acc -> acc.sum / acc.cnt) AS avg
```

I am wondering about something like:

```sql
select aggregate(
  array('a', 'b', 'a'),
  map('', 0),
  (acc, x) -> ???,
  acc -> acc) as output
```

How do I write the logic after `(acc, x) ->` so that I can output a map of the count of each element in the array?

I know I can `explode`, then groupBy + count (see the sketch below), but since I have multiple array columns to transform, I would prefer a higher-order-function approach, in pure Spark SQL. Thanks
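For reference, the per-column workaround I mentioned looks roughly like this. The table name `my_table` and the `array<string>` column `arr_col` are hypothetical placeholders; this is the approach I would like to avoid repeating for every array column:

```sql
-- Explode the array into one row per element, then count per element.
-- Has to be repeated (or unioned) for each array column separately.
SELECT elem, count(*) AS cnt
FROM (
  SELECT explode(arr_col) AS elem
  FROM my_table
)
GROUP BY elem;
```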