Re: very high maxresults setting (no collect())

2016-09-22 Thread Adrian Bridgett
Hi Michael, No Spark upgrade; we've been changing some of our data pipelines, so the data volumes have probably been getting a bit larger. Just in the last few weeks we've seen quite a few jobs needing a larger maxResultSize. Some jobs have gone from "fine with the 1GB default" to needing 3GB. Wondering

Re: very high maxresults setting (no collect())

2016-09-19 Thread Michael Gummelt
When you say "started seeing", do you mean after a Spark version upgrade? After running a new job?

On Mon, Sep 19, 2016 at 2:05 PM, Adrian Bridgett wrote:
> Hi,
>
> We've recently started seeing a huge increase in
> spark.driver.maxResultSize - we are starting to set it at 3GB (and increase
> ou

very high maxresults setting (no collect())

2016-09-19 Thread Adrian Bridgett
Hi, We've recently started seeing a huge increase in spark.driver.maxResultSize: we are starting to set it at 3GB (and increasing our driver memory substantially, to 12GB or so). This is on v1.6.1 with the Mesos scheduler. All the docs I can find say that this is to do with .collect() being called on a la
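For context, the Spark configuration docs describe spark.driver.maxResultSize (default 1g) as the limit on the total size of serialized results of all partitions for each Spark action (e.g. collect), with 0 meaning unlimited; jobs whose total exceeds it are aborted. Below is a minimal Python model of that check, not Spark's own code. `parse_size` and `exceeds_limit` are hypothetical helpers, and the size parsing is a simplified assumption (binary b/k/m/g/t suffixes only, bare numbers as bytes). It illustrates how an action over many partitions can cross the limit even when each partition returns only a modest result.

```python
import re

# Binary multipliers, assuming "1g" means 1 GiB as in Spark size strings.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Parse a simplified Spark-style size string such as '3g' or '512m' into bytes.

    Assumption: only b/k/m/g/t suffixes (optionally followed by 'b') are handled,
    and a bare number is treated as bytes.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([bkmgt]?)b?\s*", text.lower())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit or "b"]


def exceeds_limit(task_result_sizes, max_result_size="1g"):
    """Return True if the summed serialized results of one action exceed the limit.

    A limit of 0 means unlimited, mirroring the documented meaning of
    spark.driver.maxResultSize.
    """
    limit = parse_size(max_result_size)
    if limit == 0:
        return False
    return sum(task_result_sizes) > limit


if __name__ == "__main__":
    # Hypothetical workload: 20,000 partitions, each returning about 100 KiB.
    sizes = [100 * 1024] * 20000
    print(exceeds_limit(sizes, "1g"))  # True: about 1.9 GiB is over 1 GiB
    print(exceeds_limit(sizes, "3g"))  # False: about 1.9 GiB is under 3 GiB
```

With 20,000 partitions at roughly 100 KiB each, the total is about 1.9 GiB: over the 1 GiB default but under a 3 GiB setting, which is the kind of jump described in this thread. Raising the limit also means the driver has to hold that much result data, which is why driver memory tends to go up alongside it.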