I am joining two entities.
One of them is ~0.5 TB; the other is ~16 GB.

Both are stored as Parquet.

Another relevant detail is that the smaller entity does not change,
so I figured I'd pre-bucket it to improve join performance.

* What are the guidelines for deciding the best number of buckets for this?
Does it depend solely on the overall size of the bucketed entity, or do I
need to take into account the size of the unbucketed one? If so, how?
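
To make the question concrete, here is the back-of-the-envelope arithmetic I have in mind. It is only a sketch: the helper name `candidate_bucket_count` and the 128 MiB-per-bucket target are my own assumptions, not an established guideline, and I am treating 0.5 TB as 512 GiB for simplicity.

```python
import math


def candidate_bucket_count(dataset_gib: float, target_bucket_mib: float = 128.0) -> int:
    """Return how many buckets are needed so each holds about target_bucket_mib.

    The 128 MiB default is an assumed target, not a recommendation.
    """
    if dataset_gib <= 0 or target_bucket_mib <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(dataset_gib * 1024 / target_bucket_mib)


# Sizing by the ~16 GB entity that I plan to pre-bucket.
small_side = candidate_bucket_count(16)

# Sizing by the ~0.5 TB entity that stays unbucketed.
large_side = candidate_bucket_count(512)

print(small_side, large_side)
```

Depending on which side drives the calculation, the two candidate counts differ by a factor of 32, which is why I am unsure which size should determine the bucket count.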
