I am profiling TPC-H queries on Spark 2.0. I see a lot of
temporary object creation (sometimes as large as the data itself), which
is justified by the kind of processing Spark does. But from a production
perspective, is there a guideline on how much memory should be allocated
for processing a given size of, say, Parquet data? Also, has
anyone investigated the memory usage of individual SQL operators such as
Filter, group by, order by, Exchange, etc.?
Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>