I am joining two entities. One is ~0.5 TB and the other is ~16 GB.
Both are stored in Parquet. The "smaller" entity does not change, so I figured I'd pre-bucket it to improve performance.

What are the guidelines for choosing the best number of buckets for this? Does it depend solely on the overall size of the bucketed entity, or do I need to take the size of the unbucketed one into account? If so, how?