Hi Michael,

No Spark upgrade, but we've been changing some of our data pipelines, so the data volumes have probably grown a bit. Just in the last few weeks we've seen quite a few jobs needing a larger spark.driver.maxResultSize; some jobs have gone from fine with the 1GB default to needing 3GB. I'm wondering what besides a collect could cause this, as there's certainly no explicit collect() in the job.
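For context, the setting we keep having to bump is spark.driver.maxResultSize, set at submit time like this (3g here just mirrors the figure above):

```
spark-submit --conf spark.driver.maxResultSize=3g ...
```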

We run on Mesos, with Parquet source data. There's a broadcast of a small table early on which is joined in, then just a few aggregations, a select, a coalesce, and a spark-csv write. The executors go along nicely (as does the driver), and then we start to hit memory pressure on the driver during the output stage and the job grinds to a crawl (we eventually have to kill it and restart with more memory).
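For concreteness, the job is roughly this shape (a sketch only: the paths, table names, and column names are invented, and it assumes a Spark 1.x-style sqlContext with the spark-csv package):

```scala
import org.apache.spark.sql.functions.{broadcast, sum}

// Hypothetical paths/columns, just to show the pipeline shape.
val small = sqlContext.read.parquet("/data/small_table")
val big   = sqlContext.read.parquet("/data/big_table")

val out = big
  .join(broadcast(small), Seq("key"))          // broadcast join of the small table
  .groupBy("key")
  .agg(sum("value").as("total"))               // a few aggregations
  .select("key", "total")
  .coalesce(8)                                 // reduce the number of output files

out.write
  .format("com.databricks.spark.csv")          // spark-csv write
  .option("header", "true")
  .save("/data/out")
```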

