I’ve asked a friend this question a couple of times and he didn’t know the
answer… so I thought I would try here.
Suppose we launch a job on a YARN cluster and we have configured the
containers to be 3GB in size.
What does that 3GB represent?
I mean, what happens if we end up using 2–3GB of off-heap storage via Tungsten?
What will Spark do?
Will it try to honor the container’s limit and throw an exception, or will it
allow my job to grab that amount of memory and exceed YARN’s expectations since
it’s off-heap?