On 06/13/2012 10:52 AM, Olivier Grisel wrote:
> 2012/6/13 Emanuele Olivetti<[email protected]>:
>> Hi,
>>
>> You can use gzip.open() instead of open() to add compression and (possibly)
>> decrease the file size a lot; at least it did for me in a similar example:
>>
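>> import gzip
>> import pickle
>>
>> # Write the classifier to a gzip-compressed pickle file.
>> with gzip.open("test.pkl", "wb") as f:
>>     pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)
>>
>> # To retrieve:
>> with gzip.open("test.pkl", "rb") as f:
>>     clf = pickle.load(f)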
> Note that joblib can do this by passing a compression level to `dump`,
> as explained by @pprett and @mblondel. The joblib pickler is also smarter
> (faster) than the default Python pickler at serializing large
> numerical arrays.
>

I ran some preliminary tests with a 5000x5000 random matrix
and observed more or less the same results as with pickle + gzip.
I see that joblib uses pickle + zlib + pickle.HIGHEST_PROTOCOL,
so this is not a big surprise. Are there settings in which
joblib.dump() is expected to provide larger gains?
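
To illustrate why the two approaches end up so close, here is a minimal
sketch using only the standard library. It does not call joblib itself:
it compresses the same pickle once with gzip and once with zlib, on the
assumption that this mirrors the pickle + zlib combination described
above. The list of random floats is a small stand-in for the 5000x5000
matrix, and the compression level (6) is an arbitrary choice of mine.

```python
import gzip
import pickle
import random
import zlib

random.seed(0)

# Stand-in for a large numerical array: 100,000 random floats.
# (Assumption: the real test used a 5000x5000 random matrix.)
data = [random.random() for _ in range(100_000)]

# Serialize once, with the highest available pickle protocol.
raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Use the same DEFLATE level for both so the comparison is fair.
LEVEL = 6
gz_bytes = gzip.compress(raw, compresslevel=LEVEL)
zl_bytes = zlib.compress(raw, LEVEL)

print("raw pickle :", len(raw), "bytes")
print("pickle+gzip:", len(gz_bytes), "bytes")
print("pickle+zlib:", len(zl_bytes), "bytes")

# Round-trip check: decompress and unpickle the zlib variant.
restored = pickle.loads(zlib.decompress(zl_bytes))
assert restored == data
```

Both gzip and zlib wrap the same DEFLATE algorithm, so at equal
compression levels the sizes should differ by little more than the
container overhead.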

Of course, the joblib.dump solution has a much more concise syntax
than pickle + gzip, which is a welcome plus.

Best,

Emanuele


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
