[sparklyR] broadcast table for temporary table -> can you compute statistics for temporary table?

Joris Billen Wed, 23 Nov 2022 04:23:29 -0800

Hi,
I have a question about using the R API for Spark (sparklyr). We load some
data from Oracle (through JDBC) and register it as temporary tables in Spark.
I see a lot of shuffling in joins between large and small tables, so I
probably need to broadcast the small tables.
Normally auto-broadcasting happens for tables up to 10 MB
(spark.sql.autoBroadcastJoinThreshold), but Spark only knows whether a table
is small enough to broadcast based on its statistics. These are statistics
known to the Hive metastore, so I assume that for a temporary table
(registered from external files or, in this case, an Oracle table) there will
not be any statistics.
Is there any way to compute the statistics for a temporary table so that Spark
will know whether it should auto-broadcast?
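
To make the two options mentioned above concrete, here is a minimal sketch,
not a tested solution. It assumes sparklyr and dplyr are installed; the helper
names make_config and broadcast_join, the 50 MB value, and the arguments
large_tbl / small_tbl are hypothetical and only for illustration. Raising the
threshold can only help when Spark has a size estimate for the table, whereas
an explicit broadcast hint does not depend on statistics.

```r
# Sketch only: assumes sparklyr and dplyr are installed. `large_tbl` and
# `small_tbl` are hypothetical tbl_spark objects, e.g. from spark_read_jdbc().

# Option 1: raise the auto-broadcast threshold at connection time.
# The value is in bytes; Spark's default is 10485760 (10 MB).
make_config <- function(threshold_bytes = 50 * 1024^2) {
  config <- sparklyr::spark_config()
  config[["spark.sql.autoBroadcastJoinThreshold"]] <-
    as.character(as.integer(threshold_bytes))
  config
}

# Option 2: mark the small table for broadcast explicitly, so the join
# does not rely on Spark estimating the table's size.
broadcast_join <- function(large_tbl, small_tbl, by) {
  dplyr::inner_join(large_tbl, sparklyr::sdf_broadcast(small_tbl), by = by)
}

# Hypothetical usage (requires a running Spark installation):
#   sc     <- sparklyr::spark_connect(master = "local", config = make_config())
#   joined <- broadcast_join(large_tbl, small_tbl, by = "id")
```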



Thanks!
