What I did:

I have two datasets I need to join. One of them does not change, so
I bucket it once and save it as a table. It looks something like this:

spark.table("profiles").write.bucketBy(500, "uid").saveAsTable("profiles_bkt")

Now I have another dataset that I bucket "online":

spark.sql(".....").createOrReplaceTempView("sessions")
spark.table("sessions").write.bucketBy(500, "uid").saveAsTable("sessions_bkt")

And then I run the simplest possible join:

SELECT profiles_bkt.profile,
       struct(sessions_bkt.*) AS session
FROM   sessions_bkt
       LEFT OUTER JOIN profiles_bkt
                    ON sessions_bkt.uid = profiles_bkt.uid
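The reason for bucketing both sides the same way (same column, same 500 buckets) can be sketched as a conceptual model in plain Python. This is not Spark's implementation: zlib.crc32 stands in for the Murmur3 hash Spark applies to the bucket column, and the function names and sample rows are hypothetical. Because a given uid always lands in the same bucket index on both sides, each bucket of sessions only needs to be matched against the bucket with the same index in profiles, so no rows have to move between buckets.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 500  # same bucket count used for both tables above


def bucket_of(uid: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash (assumption): Spark uses Murmur3 on the bucket column;
    # crc32 is used here only because it is deterministic across runs.
    return zlib.crc32(uid.encode("utf-8")) % num_buckets


def bucketize(rows, key="uid"):
    """Group rows into buckets by hashing the join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key])].append(row)
    return buckets


def bucketed_left_outer_join(sessions, profiles):
    """Left outer join sessions to profiles, one bucket pair at a time.

    Both sides are bucketed on uid with the same bucket count, so a
    matching profile can only be in the bucket with the same index.
    """
    s_buckets = bucketize(sessions)
    p_buckets = bucketize(profiles)
    out = []
    for b, s_rows in s_buckets.items():
        # Index only the profiles that fell into this same bucket.
        by_uid = defaultdict(list)
        for p in p_buckets.get(b, []):
            by_uid[p["uid"]].append(p)
        for s in s_rows:
            matches = by_uid.get(s["uid"])
            if matches:
                for p in matches:
                    out.append((p["profile"], s))
            else:
                # Left outer join: keep sessions with no matching profile.
                out.append((None, s))
    return out


# Hypothetical sample rows, for illustration only.
profiles = [{"uid": "u1", "profile": "alice"}, {"uid": "u2", "profile": "bob"}]
sessions = [{"uid": "u1", "sid": 1}, {"uid": "u3", "sid": 2}]
for profile, session in bucketed_left_outer_join(sessions, profiles):
    print(profile, session)
```

The result matches an ordinary left outer join; only the order of rows may differ, since rows come out grouped by bucket.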

What I sometimes get:

java.lang.AssertionError: assertion failed: There should be only one
distinct value of the number pre-shuffle partitions among registered
Exchange operator.

Any clue?
