To clarify: you have 2.3M samples. How many features? How many active features on average per sample? And with 7k classes, is this multiclass or multilabel?
Have you tried limiting the depth of the trees in the forest? Have you tried embedding your feature space into a lower-dimensional vector space (pre-trained embeddings, hashing, LDA, PCA or random projection)?
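In case it helps, here is a rough sketch of what I mean by combining both ideas: a sparse random projection followed by a depth-limited forest. The data is a small random stand-in, and the sizes, `n_components=64` and `max_depth=8` are placeholder assumptions on my side, not recommendations for your problem.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import SparseRandomProjection

rng = np.random.RandomState(0)

# Toy stand-in for the real data (sizes are illustrative assumptions only).
n_samples, n_features, n_classes = 2000, 5000, 20
X = sparse.random(n_samples, n_features, density=0.01,
                  format="csr", random_state=rng)
y = rng.randint(n_classes, size=n_samples)

# Step 1: project the sparse features into a much smaller space.
projection = SparseRandomProjection(n_components=64, random_state=0)

# Step 2: cap tree depth and leaf size so each tree stays small.
forest = RandomForestClassifier(n_estimators=20, max_depth=8,
                                min_samples_leaf=5, n_jobs=-1,
                                random_state=0)

model = make_pipeline(projection, forest)
model.fit(X, y)

# Inspect how large the fitted trees actually are.
max_depth_seen = max(tree.get_depth() for tree in forest.estimators_)
total_nodes = sum(tree.tree_.node_count for tree in forest.estimators_)
print("deepest tree:", max_depth_seen, "total nodes:", total_nodes)
```

The total node count is a reasonable proxy for the memory footprint of the forest, so comparing it with and without `max_depth` should tell you quickly whether depth is what blows up memory in your case.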
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn