Re: Memory usage by Spark jobs

2016-09-22 Thread Jörn Franke
You should also take into account that Spark has different options for representing data in memory, such as Java-serialized objects, Kryo-serialized objects, Tungsten (columnar, optionally compressed), etc. The footprint of the Tungsten format depends heavily on the underlying data and on sorting, especially if compressed. Then,
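[Editor's note: a minimal sketch, not from the original thread, of how these representations can be selected in Spark 2.0. The input path "lineitem.parquet" and app name are illustrative placeholders.]

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder()
      .appName("memory-representation-sketch")
      // Kryo serialization for RDD records (used by MEMORY_ONLY_SER, shuffles, etc.)
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Deserialized Java objects on the heap: fastest access, largest footprint.
    val asJavaObjects = spark.read.parquet("lineitem.parquet").rdd
    asJavaObjects.persist(StorageLevel.MEMORY_ONLY)
    asJavaObjects.count()

    // Serialized byte arrays (Kryo, per the config above): smaller,
    // but pays a CPU cost to deserialize on each access.
    val asSerialized = spark.read.parquet("lineitem.parquet").rdd
    asSerialized.persist(StorageLevel.MEMORY_ONLY_SER)
    asSerialized.count()

    // DataFrame cache: Tungsten's columnar format, optionally compressed;
    // as noted above, its size depends heavily on the data and its ordering.
    spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")
    val asColumnar = spark.read.parquet("lineitem.parquet")
    asColumnar.cache()
    asColumnar.count()
    // Compare the three footprints in the Storage tab of the Spark UI.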

Memory usage by Spark jobs

2016-09-22 Thread Hemant Bhanawat
I am working on profiling TPCH queries for Spark 2.0. I see a lot of temporary object creation (sometimes as large as the data size itself), which is justified given the kind of processing Spark does. But from a production perspective, is there a guideline on how much memory should be allocated for
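[Editor's note: for reference, the usual sizing knobs in Spark 2.0's unified memory manager look roughly like this. The values are placeholders, not a recommendation; as the question implies, actual sizing depends on the workload.]

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("tpch-profiling")               // hypothetical app name
      .config("spark.executor.memory", "8g")   // JVM heap per executor
      // Fraction of (heap - 300 MB) shared by execution and storage (default 0.6).
      .config("spark.memory.fraction", "0.6")
      // Portion of that region protected from eviction for cached data (default 0.5).
      .config("spark.memory.storageFraction", "0.5")
      // Optional off-heap space used by Tungsten, if enabled.
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "2g")
      .getOrCreate()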