Well, I guess your problem is quite unbalanced, and because information gain is used as the splitting criterion, the algorithm probably stops after very few splits.

A workaround is resampling (oversampling the minority class relative to the base rate):

Build many training datasets like this:

randomly take 50% of the positives, and from the negatives the same amount, or
let's say double that

=> 6000 positives and 12000 negatives

Build a tree on each such dataset.

Do this many times => many models

Then combine them into an ensemble model, meaning all the models vote.

In a way this is similar to a random forest, but the sampling scheme is completely different.
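The recipe above could be sketched roughly like this (plain Python, not MLlib API; the `pos_frac` and `neg_ratio` parameters and the stub classifiers are illustrative assumptions):

```python
import random
from collections import Counter

def balanced_sample(positives, negatives, pos_frac=0.5, neg_ratio=2.0, rng=None):
    """Draw pos_frac of the positives and neg_ratio times as many negatives.

    With 12000 positives, pos_frac=0.5 and neg_ratio=2.0 this yields the
    6000 positives / 12000 negatives split described above.
    """
    rng = rng or random.Random(0)
    pos = rng.sample(positives, int(len(positives) * pos_frac))
    neg = rng.sample(negatives, min(len(negatives), int(len(pos) * neg_ratio)))
    return pos + neg

def majority_vote(models, x):
    """Each model is any callable x -> label; the ensemble returns the
    most common prediction across all models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

In practice you would train one decision tree per balanced sample and pass the resulting predictors to `majority_vote`; here stub classifiers stand in for the trained trees.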



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Decision-Tree-not-getting-built-for-5-or-more-levels-maxDepth-5-and-the-one-built-for-3-levelsy-tp7401p7405.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
