Hi Yuming - I was running into the same issue with larger worker nodes a few weeks ago.
The way I managed to get around the high GC time, following the suggestion of some others here, was to break each worker node up into several smaller workers of around 10 GB each, dividing the node's cores among them accordingly. Keeping each JVM heap small like that keeps individual GC pauses short. The other thing that helped was switching to a more compact serialisation format, which keeps the number of objects on the heap low.

Hope that helps!

- nick

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-High-GC-time-tp23005p23030.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
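For concreteness, here is one way both suggestions might look in Spark standalone mode. The worker count, memory, and core values below are illustrative assumptions, not figures from this thread, so adjust them to your own hardware; Kryo is named here because it is Spark's usual recommendation for compact serialisation, not because the post above specifies it.

```shell
# conf/spark-env.sh -- split each large node into several small workers,
# so each worker JVM heap stays around 10 GB and GC pauses stay short.
# (Hypothetical values: tune instances/memory/cores to your node size.)
export SPARK_WORKER_INSTANCES=4    # e.g. 4 workers per physical node
export SPARK_WORKER_MEMORY=10g     # ~10 GB heap per worker
export SPARK_WORKER_CORES=4        # divide the node's cores among workers

# conf/spark-defaults.conf -- use Kryo for a more compact in-memory
# representation (fewer, smaller objects for the GC to trace):
#   spark.serializer    org.apache.spark.serializer.KryoSerializer
```

With a layout like this, a 40 GB / 16-core node becomes four 10 GB workers with four cores each instead of one large heap that the collector has to scan in full on every major GC.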