As I said, the frame size should not affect the performance of transformations on RDDs, only of sending tasks to the workers and getting results back. In general, you want the Akka frame size to be as small as possible while still holding your largest task or result; as long as your application isn’t throwing an error because the frame size is too small, you’re fine. A bigger frame size just wastes space and allocates more buffer memory than needed. It doesn’t make the communication more efficient.
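For reference, here is a minimal sketch of how the setting might be raised, only if you actually hit a frame size error. It assumes Spark 0.8.x, where settings are read from Java system properties when the SparkContext is constructed and the value is given in MB; the value 32 and the name frameSizeMb are illustrative, not recommendations.

```scala
// Assumption: Spark 0.8.x reads configuration from Java system properties,
// and spark.akka.frameSize is expressed in megabytes.
val frameSizeMb = 32 // illustrative value; keep it as small as your largest task or result allows

// The property has to be set before the SparkContext is created.
System.setProperty("spark.akka.frameSize", frameSizeMb.toString)

// Create the context only after the property is set, for example:
// val sc = new org.apache.spark.SparkContext("local", "frame-size-example")
```

If the application runs without frame size errors at the current value, there is no reason to raise it.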
Matei

On Dec 8, 2013, at 12:57 PM, Shangyu Luo <[email protected]> wrote:

> I would like to know the maximum value for spark.akka.framesize, too and I am
> wondering if it will affect the performance of reduceByKey().
> Thanks!
>
>
> 2013/12/8 Matei Zaharia <[email protected]>
> Hey Matt,
>
> This setting shouldn’t really affect groupBy operations, because they don’t
> go through Akka. The frame size setting is for messages from the master to
> workers (specifically, sending out tasks), and for results that go directly
> from workers to the application (e.g. collect()). So it shouldn’t be a
> problem unless these are large. In Spark 0.8.1, results back to the master
> will be sent in a different way if they’re large, so the setting will only
> cover task sizes.
>
> Matei
>
> On Dec 7, 2013, at 10:20 PM, Matt Cheah <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I'm noticing like others that group-By operations with large sized groups
>> gives Spark some trouble. Increasing the spark.akka.frameSize property
>> alleviates it up to a point.
>>
>> I was wondering what the maximum setting for this value is. I've seen
>> previous e-mails talking about the ramifications of turning up this value,
>> but I was wondering what the actual maximum number that could be set for it
>> is. I'll benchmark the performance hit accordingly.
>>
>> Thanks!
>>
>> -Matt Cheah
>
> --
> --
>
> Shangyu, Luo
