On 06/13/2012 10:52 AM, Olivier Grisel wrote:
> 2012/6/13 Emanuele Olivetti<[email protected]>:
>> Hi,
>>
>> You can use gzip.open() instead of open() to add compression and to
>> (possibly) decrease the file size a lot - at least it did to me in a
>> similar example:
>>
>> import gzip
>> pickle.dump(clf, gzip.open("test.pkl", 'wb'),
>>             protocol=pickle.HIGHEST_PROTOCOL)
>>
>> # To retrieve:
>> clf = pickle.load(gzip.open("test.pkl"))
> Note that joblib can do this by passing a compression level to `dump`
> as explained by @pprett and @mblondel. joblib pickler is smarter
> (faster) than the default python pickler at serializing large
> numerical arrays too.
>
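To compare the two approaches quoted above, here is a minimal sketch that uses only the standard library. It pickles a small random matrix (a stand-in, far smaller than 5000x5000) and compresses the pickled bytes once through gzip and once through zlib directly. The helper name `pickled_sizes` and the compression level 3 are assumptions made for illustration. The joblib call itself would be `joblib.dump(obj, filename, compress=3)`; it is not run here.

```python
import gzip
import io
import pickle
import random
import zlib


def pickled_sizes(obj, level=3):
    """Return (raw, gzip, zlib) byte counts for the pickled form of obj."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    # pickle + gzip, as in the quoted snippet (written to memory, not disk).
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as fh:
        fh.write(raw)
    gz_size = len(buf.getvalue())

    # pickle + zlib at the same compression level.
    z_size = len(zlib.compress(raw, level))
    return len(raw), gz_size, z_size


random.seed(0)
# Hypothetical test data: a 200x200 list of random floats.
matrix = [[random.random() for _ in range(200)] for _ in range(200)]
raw_size, gz_size, z_size = pickled_sizes(matrix)
print(raw_size, gz_size, z_size)
```

gzip and zlib both produce DEFLATE streams and differ mainly in their header and trailer, so the two compressed sizes should come out close to each other on the same pickled payload.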
I ran some preliminary tests with a 5000x5000 random matrix and observed
more or less the same results as with pickle + gzip. Since joblib uses
pickle + zlib + pickle.HIGHEST_PROTOCOL, that is not a big surprise.

Are there settings in which joblib.dump() is expected to provide larger
gains?

Of course, the joblib.dump() solution has a much more concise syntax than
pickle + gzip, which is a welcome plus.

Best,

Emanuele
