> Obviously the fine-tuning that I did is not needed for the
> scikit's storage of the datasets, but in general fast dump/load of Python
> objects is useful for scientific computing and big data (think caching or
> message-passing parallel computing).
If you want to experiment with more options, you might also play with blosc (http://blosc.pytables.org/trac). The compression ratio is not as good as with heavier-weight algorithms, but it is really zippy. I ended up using it as my compressor of choice, since I was willing to sacrifice a bit of disk space in exchange for faster loading. Some old, crude benchmarks -- not carefully measured -- on pickled MNIST are at http://groups.google.com/group/theano-users/msg/4bbccbd4a7e8c2ed.

-josh

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
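
For anyone who wants to try the pickle-then-compress idea, here is a rough sketch, not the code behind the benchmarks above. It assumes the optional python-blosc package and falls back to zlib from the standard library when blosc is not installed. The helper names dumps_compressed and loads_compressed are made up for illustration, and typesize=8 is just a guess that suits float64-heavy data.

```python
import pickle
import zlib

try:
    import blosc  # optional third-party package (python-blosc)
except ImportError:
    blosc = None  # fall back to zlib below


def dumps_compressed(obj, use_blosc=True):
    """Pickle obj, then compress the bytes.

    Returns (codec_name, payload) so the reader knows how to decompress.
    Hypothetical helper, not part of any library.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if use_blosc and blosc is not None:
        # typesize=8 is an assumption; tune it to the element size of your data.
        return "blosc", blosc.compress(raw, typesize=8)
    # Low zlib level: trade some disk space for speed, in the same spirit.
    return "zlib", zlib.compress(raw, 1)


def loads_compressed(codec, payload):
    """Inverse of dumps_compressed."""
    if codec == "blosc":
        raw = blosc.decompress(payload)
    else:
        raw = zlib.decompress(payload)
    return pickle.loads(raw)


# Round trip on a small stand-in object.
data = {"pixels": list(range(1000)), "label": 7}
codec, blob = dumps_compressed(data)
restored = loads_compressed(codec, blob)
```

Note that blosc compresses a single in-memory buffer with a size limit (about 2 GB), so very large pickles would need to be split into chunks first. Time it on your own data before settling on a codec.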
