I'm pickling a random forest model (128 estimators, trained on 50k
examples) and the resulting .pkl file is on the order of 200 MB.
Is that expected? The whole dataset is only about 400 KB...

Here's the code that reproduces it:

import pickle
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=128)
# Tiny synthetic data: 50k rows, 3 small integer features.
clf.fit([[i % 6, i % 7, i % 8] for i in range(50000)],
        [i % 5 > 0 for i in range(50000)])
with open("test.pkl", "wb") as f:  # close the handle once the dump finishes
    pickle.dump(clf, f)
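
For what it's worth, here is a quick diagnostic I'd run after the fit to
see how much of that size is the trees themselves (a minimal sketch:
estimators_ and tree_.node_count are fitted attributes, while the
~100 bytes/node figure is only my rough assumption about per-node storage):

# Rough size diagnostic: a pickled forest is dominated by its tree
# arrays (child indices, thresholds, impurities, per-class values).
total_nodes = sum(est.tree_.node_count for est in clf.estimators_)
print("total nodes across the forest:", total_nodes)
# Assuming very roughly ~100 bytes of stored fields per node:
print("estimated tree payload:", total_nodes * 100, "bytes")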

Regards,
Dmitry