Hi Spark Community,
I have implemented a custom Spark Aggregator (a subclass of
org.apache.spark.sql.expressions.Aggregator). Now I'm trying to use it in a
PySpark application, but for some reason I'm not able to trigger the
function. Here is what I'm doing; could someone help me take a look?
> should work.
> --
> Raghavendra
>
>
> On Sun, Apr 23, 2023 at 9:20 PM Thomas Wang wrote:
Hi Spark Community,
I'm trying to implement a custom Spark Aggregator (a subclass of
org.apache.spark.sql.expressions.Aggregator). Correct me if I'm wrong, but
I'm assuming I will be able to use it as an aggregation function, like SUM.
Here is what I'm trying to do: I have a column of ARRAY and I
Hi,
I have two tables t1 and t2. Both are bucketed and sorted on user_id into
32 buckets.
When I use a regular equi-join, Spark triggers the expected sort-merge
bucket join. Please see my code and the physical plan below.
from pyspark.sql import SparkSession
def