Hi, any input on the issue below would be welcome. We are running Spark with the external shuffle service on a Mesos cluster (1.5.1).
After upgrading our production workload to Spark 2.3, we started to see OOM failures in the external shuffle services (one runs on each node). Has anybody experienced the same problem?

Any pointers to the relevant code would be helpful. I know that work was done on the external shuffle service in 2.3, but from reading the PRs I can't pinpoint which change is causing these OOMs.

Unfortunately, there is no test case to reproduce this, and even with 2.3 the OOM failures only start after 2+ days of production load.

Igor

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/