You can control the size of your random forest by adjusting the
parameters n_estimators, min_samples_split and even max_depth (read
the documentation for more details).

It's up to you to find parameter values that match your constraints on
accuracy vs model size in RAM and prediction speed.
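As a minimal sketch (the parameter values here are illustrative, not
recommendations -- tune them against your own data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each of these parameters shrinks the trees, and hence the model:
rf = RandomForestClassifier(
    n_estimators=50,       # fewer trees -> smaller model, faster predictions
    max_depth=10,          # shallower trees -> fewer nodes to store
    min_samples_split=10,  # stop splitting small nodes earlier
)
rf.fit(X, y)
```

You can then cross-validate accuracy and pickle the model to measure its
size on disk for each candidate setting.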

To get slightly faster dumping and loading you can do:

from sklearn.externals import joblib

then save the model with:

joblib.dump(rf, filename)

Then later:

model = joblib.load(filename, mmap_mode='r')

Using the mmap_mode argument makes it possible to share memory if you
have several Python processes that need to load the same model on the
same Linux / POSIX server (e.g. several Celery offline workers, or a
gunicorn + flask HTTP service computing predictions concurrently).


Also, for regression or for classification with a small number of tasks
(output targets), you might want to try GradientBoostingRegressor /
GradientBoostingClassifier instead of RF: you can often get smaller
models with predictive accuracy similar to the RF models. Have a look at
these slides for tips on tuning the Gradient Boosting parameters:

http://orbi.ulg.ac.be/handle/2268/163521
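A minimal sketch of the boosted alternative (values are illustrative --
boosted trees are typically much shallower than random forest trees,
which is why the serialized model often ends up smaller):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Shallow trees (max_depth=3 is the default) combined with a modest
# learning rate; more, weaker trees instead of the RF's deep ones.
gbc = GradientBoostingClassifier(
    n_estimators=100,
    max_depth=3,
    learning_rate=0.1,
).fit(X, y)
```

Comparing the pickled sizes of the fitted RF and GBRT models on your own
data is the quickest way to see whether the trade-off pays off for you.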

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general