You can control the size of your random forest by adjusting the n_estimators, min_samples_split and even max_depth parameters (see the documentation for more details).
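For illustration, here is a quick sketch of the size / accuracy trade-off (the dataset and the exact parameter values are made up, not from the original message): a constrained forest with fewer, shallower trees serializes to a much smaller object than an unconstrained one.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# An unconstrained forest: many deep trees, large in RAM.
big = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A constrained forest: fewer, shallower trees.
small = RandomForestClassifier(
    n_estimators=20, max_depth=8, min_samples_split=10, random_state=0
).fit(X, y)

# Use the pickled size as a rough proxy for the in-memory footprint.
big_size = len(pickle.dumps(big))
small_size = len(pickle.dumps(small))
print(small_size < big_size)  # the constrained forest is smaller
```

You would then cross-validate both models to check how much accuracy, if any, the smaller forest gives up.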
It's up to you to find parameter values that match your constraints in
terms of accuracy vs. model size in RAM and prediction speed.

To get slightly faster dumping and loading you can do:

    from sklearn.externals import joblib

then save the model with:

    joblib.dump(rf, filename)

and later:

    model = joblib.load(filename, mmap_mode='r')

Using the mmap_mode argument makes it possible to share memory if you
have several Python processes that need to load the same model on the
same Linux / POSIX server (e.g. several Celery offline workers, or
gunicorn + Flask HTTP workers computing predictions concurrently).

Also, for regression or classification with a small number of tasks you
might want to try GradientBoostingRegressor / GradientBoostingClassifier
instead of RF: you might get smaller models for similar predictive
accuracy as the RF models. Have a look at these slides for tricks to
adjust the gradient boosting parameters:
http://orbi.ulg.ac.be/handle/2268/163521

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general