Hi list,

This message is not terribly informative. I just want to share my current
successes with joblib compression.

I am a bit frustrated at the fact that the LFW cache takes 400M on my
disk, for something that I never use. The disk space in the LFW cache is
taken up by two major contributors:

 * The 'lfw_funneled' directory, with the jpeg images: 289M

 * The joblib directory, used to store precomputed extraction of the
   images: 197M

I spent quite a while trying to play tricks in the code to load and
compress intermediate data structures, in order to avoid having thousands
of jpegs stored in the lfw_funneled directory for no good reason, but
couldn't really find a good compromise. The best I can get to is to use
tar followed by bzip2, which gets me down to 231M.

I tried my current development version of joblib, with compression
activated. This brings down the size of the joblib directory to 79M.
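
For readers who want to try the idea, here is a minimal sketch using the
`compress` argument of `joblib.dump` as found in released joblib versions.
It is not the development code discussed in this message, and the `data`
dictionary and the `dumped_size` helper are made up for illustration. The
stand-in data is highly repetitive, so the size reduction it shows is far
better than what real image arrays would give.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for cached dataset content: 200 blank 64x64 "images".
# Real image data is much less repetitive and will compress less well.
data = {"images": [bytes(64 * 64) for _ in range(200)],
        "target": list(range(200))}


def dumped_size(obj, compress):
    """Dump obj with joblib into a temporary directory.

    Returns the on-disk size in bytes and the object loaded back from disk.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.pkl")
        joblib.dump(obj, path, compress=compress)
        size = os.path.getsize(path)
        reloaded = joblib.load(path)
    return size, reloaded


raw_size, raw_copy = dumped_size(data, compress=0)      # no compression
small_size, small_copy = dumped_size(data, compress=3)  # moderate compression
print("uncompressed: %d bytes, compressed: %d bytes" % (raw_size, small_size))
```

Loading needs no extra argument: `joblib.load` detects the compression by
itself.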

Of course there is a price to pay in speed:

* With compressed joblib:

  In [2]: %timeit d = datasets.fetch_lfw_people()
  1 loops, best of 3: 2.49 s per loop

  In [3]: %timeit d = datasets.fetch_lfw_pairs()
  1 loops, best of 3: 822 ms per loop

* With joblib and no compression:

  In [2]: %timeit d = datasets.fetch_lfw_people()
  100 loops, best of 3: 2.64 ms per loop

  In [3]: %timeit d = datasets.fetch_lfw_pairs()
  100 loops, best of 3: 3.44 ms per loop

* Without joblib caching:

  In [2]: %timeit d = datasets.fetch_lfw_people()
  1 loops, best of 3: 84.9 s per loop

  In [3]: %timeit d = datasets.fetch_lfw_pairs()
  1 loops, best of 3: 26.1 s per loop
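
The same kind of tradeoff can be reproduced outside of joblib. The sketch
below is only a stand-in that uses the standard library (pickle plus zlib),
not what joblib does internally, and the `payload` data is invented for the
example. It compares the time to load a pickled object with and without a
decompression step; the numbers will of course differ from the LFW timings
above.

```python
import pickle
import timeit
import zlib

# Hypothetical, repetitive payload; real image data compresses less well.
payload = [list(range(256)) for _ in range(500)]

raw = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
packed = zlib.compress(raw, 3)


def load_raw():
    return pickle.loads(raw)


def load_packed():
    # Decompression is the extra cost paid on every load.
    return pickle.loads(zlib.decompress(packed))


# Best of 3 repeats of 10 loads each, reported per load.
t_raw = min(timeit.repeat(load_raw, number=10, repeat=3)) / 10
t_packed = min(timeit.repeat(load_packed, number=10, repeat=3)) / 10

print("size: %d -> %d bytes" % (len(raw), len(packed)))
print("load: %.2f ms raw, %.2f ms compressed" % (t_raw * 1e3, t_packed * 1e3))
```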

I think that the new joblib has a useful compression/speed tradeoff :) I
need to iron it out a bit more and release it, and then we can
systematically use it in the dataset loaders (note that it will not beat
domain-specific compressed data standards, e.g. for images or music).

Gael
