Dear All,

I am using RandomForest on a data set which has fewer than 20 features but about 400000 lines. The point is that even if I work on a subset of about 30000 lines to train my model, when I save it using pickle I get a large file on the order of several hundred MB (see the snippet at the end of the email). I can then later load the model by doing the following:
In [8]: pkl_file = open("rf_wallmart_holidays.txt", "rb")
In [9]: clf = pickle.load(pkl_file)
In [10]: pkl_file.close()

However, I am concerned that when I use the whole dataset I will get a model size on the order of several GB, and I wonder whether I will be able to load it via pickle as I do above. I am just wondering if I am making any gross mistake (I have never used pickle in the past). Any suggestions about efficient ways to store/read the models developed with sklearn are appreciated.

Regards

Lorenzo

################################################################################
import pickle
from sklearn.ensemble import RandomForestRegressor

clf = RandomForestRegressor(n_estimators=150,
                            # compute_importances=True,
                            n_jobs=2, verbose=3)
sales = train.Weekly_Sales
my_cols = set(train.columns)
my_cols.remove("Weekly_Sales")
my_cols = list(my_cols)
clf.fit(train[my_cols], sales)
# Open in binary mode ('wb') and close the file so the pickle is flushed.
f = open('rf_wallmart_non_holidays.txt', 'wb')
pickle.dump(clf, f)
f.close()
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
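P.S. For reference, the save/load round trip described above can be sketched with the standard library alone; wrapping the pickle in gzip often shrinks it considerably, since forest pickles compress well. The `model` object below is a stand-in placeholder (an assumption for illustration, not the fitted estimator); with the real fitted RandomForestRegressor the calls are identical.

```python
import gzip
import pickle

# Stand-in for the fitted estimator; the same pattern applies unchanged
# to a fitted RandomForestRegressor (this dict is only for illustration).
model = {"n_estimators": 150, "feature_importances": list(range(1000))}

# Save: open in binary mode; gzip compression can shrink the file
# substantially compared with a raw pickle.
with gzip.open("model.pkl.gz", "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load: binary mode again, then unpickle.
with gzip.open("model.pkl.gz", "rb") as f:
    restored = pickle.load(f)

# The restored object is equal to what was saved.
assert restored == model
```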