Thank you for your input. Will test it out and share my findings.
On Thursday, July 14, 2016, CosminC <ciob...@adobe.com> wrote:

> Didn't have the time to investigate much further, but the one thing that
> popped out is that partitioning was no longer working on 1.6.1. This would
> definitely explain the 2x performance loss.
>
> Checking the 1.5.1 Spark logs for the same application showed that our
> partitioner was working correctly, and after the DStream / RDD creation a
> user session was only processed on a single machine. Running on top of
> 1.6.1, though, the session was processed on up to 4 machines in a 5-node
> cluster (including the driver), with a lot of redundant operations. We use
> a custom but very simple partitioner which extends HashPartitioner. It
> partitions on a case class which has a single string parameter.
>
> Speculative execution is off by default, and we never enabled it,
> so it's not that.
>
> Right now we're postponing any Spark upgrade, and we'll probably try to
> upgrade directly to Spark 2.0, hoping the partitioning issue is no longer
> present there.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Severe-Spark-Streaming-performance-degradation-after-upgrading-to-1-6-1-tp27056p27334.html
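For reference while testing, here is a minimal sketch of the kind of setup described above: a case class key with a single String field and a partitioner extending HashPartitioner. The SessionKey / SessionPartitioner names are only illustrative assumptions, not the original code:

    import org.apache.spark.HashPartitioner

    // Illustrative key type: a case class with a single String field
    // (name and field are assumptions for the sketch).
    case class SessionKey(id: String)

    // Simple partitioner extending HashPartitioner: route each record by the
    // hash of the key's string field so that all records for one session
    // should land in the same partition.
    class SessionPartitioner(partitions: Int) extends HashPartitioner(partitions) {
      override def getPartition(key: Any): Int = key match {
        case SessionKey(id) => super.getPartition(id)
        case other          => super.getPartition(other)
      }
    }

    // Example usage on a keyed RDD, e.g.:
    //   rdd.partitionBy(new SessionPartitioner(numPartitions))

With something like this in place, comparing the partition assignments logged under 1.5.1 vs 1.6.1 for the same keys should show whether records for a single session really end up spread across multiple executors.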