Re: [Scikit-learn-general] indices_ on DecisionTreeRegressor

Andreas Mueller Thu, 29 Oct 2015 09:33:40 -0700

Hi Mike.

This has been fixed in the development version (and in the release candidate, which will be released imminently).


Best,
Andy


On 10/27/2015 05:59 PM, Michael Albert wrote:

Greetings!

When pickling a fitted random forest, the storage requirements seem disproportionately large. The space usage appears to be dominated by the indices_ property on the decision trees (DecisionTreeRegressor or DecisionTreeClassifier instances) in estimators_.
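One way to check which attribute dominates is to pickle each instance attribute of a single tree separately and compare the sizes. This is a minimal sketch, not part of scikit-learn: `attribute_pickle_sizes` and the `Toy` class are hypothetical names used for illustration, and the helper assumes each attribute can be pickled on its own (any that cannot are skipped).

```python
import pickle


def attribute_pickle_sizes(obj):
    """Hypothetical helper: map each instance attribute of obj to the size in
    bytes of its pickled value, largest first."""
    sizes = {}
    for name, value in vars(obj).items():
        try:
            sizes[name] = len(pickle.dumps(value))
        except Exception:
            # Skip attributes that cannot be pickled in isolation
            continue
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


class Toy:
    """Hypothetical stand-in object with one small and one large attribute."""

    def __init__(self):
        self.small = 1
        self.big = bytes(10_000)


sizes = attribute_pickle_sizes(Toy())
print(sizes)
```

On a fitted forest one would call it as attribute_pickle_sizes(clf.estimators_[0]); on a version that still stores indices_, that attribute would be expected near the top of the result if the observation above holds.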

What are these needed for?

It seems that one can still make predictions after deleting them, and save a lot of space.

Sample code and output below.

Thanks!
-Mike

-----Sample Code-----
#!/usr/bin/env python

# cf. http://scikit-learn-general.narkive.com/yJjAn9P2/pickled-random-forest-file-size-by-design


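import pickle

import sklearn.ensemble

N = 500000
toPredict = [[i % 6, i % 7, i % 8] for i in range(1000)]

# Fit a 128-tree forest on N synthetic samples
clf = sklearn.ensemble.RandomForestClassifier(n_estimators=128)
clf.fit(X=[[i % 6, i % 7, i % 8] for i in range(N)],
        y=[i % 5 > 0 for i in range(N)])

# Size of the pickled forest with all fitted attributes intact
size1 = len(pickle.dumps(clf))
print("size1 = " + str(size1))

predict1 = clf.predict(toPredict)

# Drop the per-tree indices_ attribute; guarded because versions
# that include the fix may not store it at all
for x in clf.estimators_:
    if hasattr(x, "indices_"):
        del x.indices_

size2 = len(pickle.dumps(clf))
print("size2 = " + str(size2))

predict2 = clf.predict(toPredict)

# Count test points whose prediction changed after the deletion
tot = (predict1 != predict2).sum()
print("error = " + str(tot))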

-----sample output------
size1 = 67145826
size2 = 3137874
error = 0
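The two sizes in the sample output allow a quick back-of-the-envelope check of where the space went. The sketch below only does arithmetic on the reported numbers; reading the result as "about one byte per training sample per tree", for instance a boolean mask over the N samples, is an interpretation and not something the output itself proves.

```python
# Figures copied from the sample code and sample output above
N = 500_000
n_estimators = 128
size1 = 67145826  # pickled size with indices_
size2 = 3137874   # pickled size after deleting indices_

saved = size1 - size2            # bytes removed by deleting indices_
per_tree = saved / n_estimators  # bytes saved per tree
per_sample = per_tree / N        # bytes saved per training sample, per tree
reduction = 1 - size2 / size1    # fraction of the original pickle removed

print(f"saved      = {saved} bytes")
print(f"per tree   = {per_tree:.1f} bytes")
print(f"per sample = {per_sample:.3f} bytes")
print(f"reduction  = {reduction:.1%}")
```

This works out to 64,007,952 bytes saved, about 500,062 bytes per tree, roughly 1.0 byte per training sample per tree, and a 95.3% smaller pickle.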




------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
