Well, I guess your problem is quite unbalanced, and since information gain is the splitting criterion, the algorithm probably stops after very few splits.
A workaround is resampling: build many training datasets, e.g. randomly take 50% of the positives and the same number of negatives (or, say, double => 6000 positives and 12000 negatives), and build a tree on each sample. Doing this many times gives many models (agents), which you then combine into an ensemble: let all the models vote, similar to a random forest but over differently balanced samples.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Decision-Tree-not-getting-built-for-5-or-more-levels-maxDepth-5-and-the-one-built-for-3-levelsy-tp7401p7405.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
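The resample-then-vote idea above can be sketched in plain Python; this is only an illustration, using a trivial threshold stump in place of MLlib's DecisionTree, and the function names, parameters, and data shape (a list of `(feature, label)` pairs) are all made up for the example:

```python
import random
from collections import Counter

def balanced_samples(positives, negatives, n_models, pos_frac=0.5, neg_ratio=2):
    """Draw one rebalanced training set per model: a random 50% of the
    positives plus neg_ratio times as many randomly chosen negatives."""
    for _ in range(n_models):
        pos = random.sample(positives, int(len(positives) * pos_frac))
        neg = random.sample(negatives, min(len(negatives), neg_ratio * len(pos)))
        yield pos + neg

def train_stump(sample):
    """Stand-in for training a real decision tree: a one-feature
    threshold stump placed halfway between the class means."""
    pos_xs = [x for x, y in sample if y == 1]
    neg_xs = [x for x, y in sample if y == 0]
    thr = (sum(pos_xs) / len(pos_xs) + sum(neg_xs) / len(neg_xs)) / 2
    return lambda x: 1 if x >= thr else 0

def ensemble_predict(models, x):
    """Majority vote over all models, like a random forest."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic unbalanced data: few positives, many negatives.
random.seed(0)
positives = [(random.gauss(1.0, 0.2), 1) for _ in range(60)]
negatives = [(random.gauss(0.0, 0.2), 0) for _ in range(600)]
models = [train_stump(s) for s in balanced_samples(positives, negatives, 9)]
```

Each model only ever sees a roughly balanced sample, so the split criterion is not swamped by the majority class; the vote then averages out the sampling noise.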