from:"Thomas Wang"

Use Spark Aggregator in PySpark

2023-04-23 Thread Thomas Wang

Hi Spark Community, I have implemented a custom Spark Aggregator (a subclass to org.apache.spark.sql.expressions.Aggregator). Now I'm trying to use it in a PySpark application, but for some reason, I'm not able to trigger the function. Here is what I'm doing, could someone help me take a look?

Re: Spark Aggregator with ARRAY input and ARRAY output

2023-04-23 Thread Thomas Wang

should work. > -- > Raghavendra > > > On Sun, Apr 23, 2023 at 9:20 PM Thomas Wang wrote: > >> Hi Spark Community, >> >> I'm trying to implement a custom Spark Aggregator (a subclass to >> org.apache.spark.sql.expressions.Aggregator). Correct me if I'm

Spark Aggregator with ARRAY input and ARRAY output

2023-04-23 Thread Thomas Wang

Hi Spark Community, I'm trying to implement a custom Spark Aggregator (a subclass to org.apache.spark.sql.expressions.Aggregator). Correct me if I'm wrong, but I'm assuming I will be able to use it as an aggregation function like SUM. What I'm trying to do is that I have a column of ARRAY and I

eqNullSafe breaks Sorted Merge Bucket Join?

2023-03-09 Thread Thomas Wang

Hi, I have two tables t1 and t2. Both are bucketed and sorted on user_id into 32 buckets. When I use a regular equal join, Spark triggers the expected Sorted Merge Bucket Join. Please see my code and the physical plan below. from pyspark.sql import SparkSession def

Use Spark Aggregator in PySpark

Re: Spark Aggregator with ARRAY input and ARRAY output

Spark Aggregator with ARRAY input and ARRAY output

eqNullSafe breaks Sorted Merge Bucket Join?

4 matches

Site Navigation

Mail list logo

Footer information