I am trying to run TeraSort in Spark on a 7-node cluster with only 10 GB of
data, and the executors get lost with a "GC overhead limit exceeded" error.

This is what my cluster looks like -


   - *Alive Workers:* 7
   - *Cores in use:* 28 Total, 2 Used
   - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
   - *Applications:* 1 Running, 6 Completed
   - *Drivers:* 0 Running, 0 Completed
   - *Status:* ALIVE

Each worker has 8 cores and 4 GB of memory.

My question is: how do people running in production decide on these
properties? (I have put a rough sketch after the list to show where each one
plugs in.)

1) --num-executors
2) --executor-cores
3) --executor-memory
4) num of partitions
5) spark.default.parallelism
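
To make the question concrete, here is a minimal Scala sketch of where each of
these knobs plugs in. The numbers (7 executors, 2 cores, 3g, 56 partitions),
the input path, and the class name are placeholders I made up for
illustration, not values I have validated; spark.executor.instances in
particular maps to --num-executors mainly on YARN.

    import org.apache.spark.SparkContext._
    import org.apache.spark.{SparkConf, SparkContext}

    object TeraSortSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("terasort-sketch")
          .set("spark.executor.instances", "7")    // 1) --num-executors (mainly a YARN flag)
          .set("spark.executor.cores", "2")        // 2) --executor-cores
          .set("spark.executor.memory", "3g")      // 3) --executor-memory
          .set("spark.default.parallelism", "56")  // 5) parallelism used when no partition count is given

        val sc = new SparkContext(conf)

        // 4) number of input partitions: second argument to textFile.
        //    "hdfs:///terasort/input" is a made-up placeholder path.
        val lines = sc.textFile("hdfs:///terasort/input", 56)

        // TeraSort records are 100 bytes; the first 10 bytes are the key.
        val sorted = lines
          .map(line => (line.substring(0, 10), line.substring(10)))
          .sortByKey()

        println("sorted records: " + sorted.count())
        sc.stop()
      }
    }

The values above are only placeholders; what I would like to understand is the
reasoning people use to choose them for their cluster and data size.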

Thanks,
Kartik
