When you say "started seeing", do you mean after a Spark version upgrade? After running a new job?
On Mon, Sep 19, 2016 at 2:05 PM, Adrian Bridgett <adr...@opensignal.com> wrote:

> Hi,
>
> We've recently started seeing a huge increase in spark.driver.maxResultSize
> - we are now setting it to 3GB (and increasing our driver memory a lot, to
> 12GB or so). This is on v1.6.1 with the Mesos scheduler.
>
> All the docs I can find say that this is to do with .collect() being called
> on a large RDD (which isn't the case AFAIK - certainly nothing in the code),
> so it's rather puzzling me as to what's going on. I thought that the number
> of tasks might be a factor (about 14000 tasks in each of about a dozen
> stages). Adding a coalesce seemed to help, but now we are hitting the
> problem again after a few minor code tweaks.
>
> What else could be contributing to this? Thoughts I've had:
> - number of tasks
> - metrics?
> - um, a bit stuck!
>
> The code looks like this:
>
> df = ....
> df.persist()
> val rows = df.count()
>
> // actually we loop over this a few times
> val output = df.groupBy("id").agg(
>     avg($"score").as("avg_score"),
>     count($"id").as("rows")
>   ).
>   select(
>     $"id",
>     $"avg_score",
>     $"rows"
>   ).sort($"id")
> output.coalesce(1000).write.format("com.databricks.spark.csv").save("/tmp/...")
>
> Cheers for any help/pointers! There are a couple of memory leak tickets
> fixed in v1.6.2 that may affect the driver, so I may try an upgrade (the
> executors are fine).
>
> Adrian
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>

--
Michael Gummelt
Software Engineer
Mesosphere
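For reference, the limits discussed above are normally raised when the driver context is created. A minimal sketch follows; the app name is hypothetical, the 3GB value is taken from the thread, and spark.driver.memory (the 12GB figure) is usually passed via spark-submit --driver-memory rather than set in code, since it must take effect before the driver JVM starts.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Raise the serialized-results limit mentioned in the thread (3GB).
    // App name is a placeholder; driver memory is assumed to be set via
    // spark-submit --driver-memory 12g, not here.
    val conf = new SparkConf()
      .setAppName("maxresultsize-debug")
      .set("spark.driver.maxResultSize", "3g")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)  // Spark 1.6.x DataFrame entry point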