Hi all, I'm using Spark 2.0.0 to train a model with 10 million+ parameters on about 500 GB of data. I use treeAggregate to aggregate the gradient. With depth = 2 or 3 it works, and depth = 3 is faster, so I set depth = 4 hoping for even better performance, but now some executors hit OOM during the shuffle phase. Why does this happen? With a deeper tree, each executor should aggregate fewer records and therefore use less memory, so I don't understand where the OOM comes from. Can someone help?
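For context, here is a rough Python simulation of how Spark 2.0's `RDD.treeAggregate` picks the intermediate partition counts, mirroring (to the best of my reading) the `scale` computation and reduction loop in that version's source. The input partition count of 1000 is a made-up example, not my actual job:

```python
import math

def tree_aggregate_levels(num_partitions, depth):
    """Approximate the partition count after each intermediate shuffle
    level of Spark 2.0's RDD.treeAggregate (a sketch of its logic, not
    the real implementation)."""
    # Spark computes scale = max(ceil(numPartitions ** (1/depth)), 2)
    scale = max(math.ceil(num_partitions ** (1.0 / depth)), 2)
    levels = []
    # Spark stops adding levels once another one wouldn't help:
    # while numPartitions > scale + ceil(numPartitions / scale)
    while num_partitions > scale + math.ceil(num_partitions / scale):
        num_partitions //= scale  # integer division, as in Spark
        levels.append(num_partitions)
    return levels

# Hypothetical example: an RDD with 1000 input partitions.
for d in (2, 3, 4):
    print("depth =", d, "->", tree_aggregate_levels(1000, d))
```

With 1000 partitions this prints one intermediate level for depth = 2, two for depth = 3, and three for depth = 4, so a deeper tree adds extra shuffle rounds whose tasks each combine several large gradient vectors.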