> Obviously the fine-tuning that I did is not needed for the
> scikit's storage of the datasets, but in general fast dump/load of Python
> objects is useful for scientific computing and big data (think caching or
> message passing parallel computing).

If you want to experiment with more options, you might also play with
blosc (http://blosc.pytables.org/trac). Its compression ratio is not
as good as that of heavier-weight algorithms, but it is really zippy. I ended
up using it as my compressor of choice, since I was willing to
sacrifice a bit of disk space in exchange for faster loading.
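
For anyone who wants to try that trade-off, here is a minimal sketch of a
compressed pickle dump/load pair with a pluggable compressor. The helper
names (dump_compressed, load_compressed) are hypothetical, and the default
shown is the standard library's zlib at its fastest level, used only as a
stand-in so the snippet runs without extra dependencies. It is not the setup
I benchmarked.

```python
import pickle
import zlib


def fast_zlib(data):
    # zlib level 1: fastest setting, weakest compression ratio.
    return zlib.compress(data, 1)


def dump_compressed(obj, path, compress=fast_zlib):
    """Pickle obj, compress the resulting bytes, and write them to path."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as f:
        f.write(compress(payload))


def load_compressed(path, decompress=zlib.decompress):
    """Read path, decompress its bytes, and unpickle the object."""
    with open(path, "rb") as f:
        return pickle.loads(decompress(f.read()))
```

Assuming the python-blosc package is installed, passing
compress=blosc.compress and decompress=blosc.decompress should swap blosc in
without touching the rest of the code. Keep in mind that you should only
unpickle files you trust.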

Some old, crude benchmarks -- not carefully measured -- on pickled
MNIST are at http://groups.google.com/group/theano-users/msg/4bbccbd4a7e8c2ed.

-josh

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
