On Wed, Dec 28, 2011 at 05:21:39PM +0100, Alexandre Gramfort wrote:
> thanks Gael for the christmas present :)

I just couldn't help playing more. I have pushed a new update that makes
it possible to control the compression level and, in general, to achieve
better trade-offs between speed and compression (a short usage sketch
follows the benchmark tables below). Here are benchmarks on my computer
(a 3.5-year-old Dell laptop, Intel Core 2 Duo with 2 GB of RAM):

 Olivetti       old code  ,  write   2.01s,  read 0.197s,  disk   3M
                compress 0,  write   0.26s,  read 0.024s,  disk  12M
                compress 1,  write   0.74s,  read 0.176s,  disk   4M
                compress 3,  write   1.03s,  read 0.164s,  disk   3M
                compress 6,  write   2.03s,  read 0.156s,  disk   3M
                compress 9,  write   2.16s,  read 0.158s,  disk   3M
                mmap      ,  write   0.89s,  read 0.003s,  disk  12M

  20news        old code  ,  write   4.23s,  read 0.435s,  disk   9M
                compress 0,  write   0.59s,  read 0.118s,  disk  23M
                compress 1,  write   1.80s,  read 0.415s,  disk  10M
                compress 3,  write   1.83s,  read 0.401s,  disk   9M
                compress 6,  write   2.91s,  read 0.397s,  disk   8M
                compress 9,  write   3.92s,  read 0.402s,  disk   8M
                mmap      ,  write   0.57s,  read 0.112s,  disk  23M

  LFW pairs     old code  ,  write  12.84s,  read 0.799s,  disk  18M
                compress 0,  write   2.24s,  read 0.080s,  disk  48M
                compress 1,  write   3.11s,  read 0.790s,  disk  21M
                compress 3,  write   4.80s,  read 0.687s,  disk  18M
                compress 6,  write  10.71s,  read 0.725s,  disk  18M
                compress 9,  write  55.39s,  read 0.666s,  disk  17M
                mmap      ,  write   2.14s,  read 0.003s,  disk  48M

  Species       old code  ,  write   7.57s,  read 0.986s,  disk   6M
                compress 0,  write   4.31s,  read 0.167s,  disk 103M
                compress 1,  write   1.61s,  read 0.468s,  disk   4M
                compress 3,  write   2.19s,  read 0.457s,  disk   3M
                compress 6,  write   2.13s,  read 0.444s,  disk   3M
                compress 9,  write   4.65s,  read 0.443s,  disk   2M
                mmap      ,  write   4.99s,  read 0.007s,  disk 103M

  LFW people    old code  ,  write  40.93s,  read 2.490s,  disk  60M
                compress 0,  write   6.39s,  read 0.231s,  disk 147M
                compress 1,  write   9.87s,  read 2.629s,  disk  66M
                compress 3,  write  16.86s,  read 2.380s,  disk  59M
                compress 6,  write  35.20s,  read 2.483s,  disk  60M
                compress 9,  write 188.15s,  read 2.300s,  disk  56M
                mmap      ,  write   6.35s,  read 0.003s,  disk 147M

Big LFW people  old code not available
                compress 0,  write  22.86s, read   0.819s, disk 441M
                compress 1,  write  39.20s, read  20.898s, disk 199M
                compress 3,  write  53.81s, read  15.821s, disk 179M
                compress 6,  write 110.72s, read  13.421s, disk 179M
                compress 9,  write 526.09s, read  11.922s, disk 170M
                mmap      ,  write  21.54s, read   0.040s, disk 441M

As with any benchmarks, caveat emptor!

The take-home message seems to be that, in general, compress=3 gives a
reasonable trade-off between dump/load time and disk space. Not
compressing is always faster, even for loading, and on non-compressed
data, memmapping kicks ass.
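
For reference, here is a minimal sketch of what the two options above
look like (the data and file names are just placeholders):

    import numpy as np
    import joblib

    data = {'X': np.random.random((1000, 100))}   # any picklable object

    # Compressed dump: compress=3 is the speed/size sweet spot above
    joblib.dump(data, 'data.pkl', compress=3)
    data_back = joblib.load('data.pkl')

    # Uncompressed dump: bigger files, but the arrays can be memmapped
    joblib.dump(data, 'data_raw.pkl')
    data_mmap = joblib.load('data_raw.pkl', mmap_mode='r')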

I used the scikit's datasets for benchmarking because performance depends
a lot on the entropy of the data, and thus I needed real-world use cases.
Obviously the fine-tuning that I did is not needed for the scikit's
storage of the datasets, but in general fast dump/load of Python objects
is useful for scientific computing and big data (think caching or
message-passing parallel computing).
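
As an aside, the caching use case above is what joblib.Memory builds on;
here is a minimal sketch (the cache directory and the function are made
up for illustration):

    import numpy as np
    import joblib

    # Hypothetical cache directory, just for illustration
    mem = joblib.Memory('./joblib_cache')

    @mem.cache
    def expensive_transform(x):
        # Stand-in for a costly computation worth caching on disk
        return x ** 2 + x

    a = np.random.random((5000, 500))
    r1 = expensive_transform(a)  # computed, result dumped to disk
    r2 = expensive_transform(a)  # loaded back from the on-disk cache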

Some notes on the datasets: 

 * 20news is mostly not arrays; it is a useful benchmark of fairly
   general Python objects.

 * Big LFW people is LFW people 4 times bigger. I created it because it
   takes ~450M, and with the various memory duplications (one due to the
   benchmarking code, another due to compression) it is pretty much the
   upper limit of what I can compress on my computer. It gives a good
   indication of RAM-limited performance.

I am attaching the benchmarking code I used. It plays ugly tricks to try
to flush the disk cache. Performance trade-offs will depend on the
relative speed of the CPU and the disk. I'd love it if other people could
try it on their computer (with the github/0.5.X version of joblib).
Warning: it will take a while to run! Once I have more insight, I'll make
a 0.6 joblib release, and a blog post with pretty graphs. Give me input
for this :)

 Gael

PS: a general lesson that I relearned during this process is that
optimization work is tricky and full of surprises, and that doing good
benchmarks takes a while, but is always worth the effort.
import os
import time
import shutil

import numpy as np
# scikit-learn datasets: real-world data for the benchmarks
from sklearn import datasets

import joblib
from joblib.disk import disk_used


def kill_disk_cache():
    # Push ~160M of random data through the disk to evict the OS cache
    np.random.random(int(2e7)).tofile('tmp')


def timeit(func, *args, **kwargs):
    """Time func over 5 runs, flushing the disk cache before each run,
    and return the mean after discarding the fastest and slowest runs."""
    times = list()
    for _ in range(5):
        kill_disk_cache()
        t0 = time.time()
        func(*args, **kwargs)
        times.append(time.time() - t0)
    times.sort()
    return np.mean(times[1:-1])


def bench_dump(dataset, name='', compress_levels=(1, 0, 3, 6, 9)):
    """Benchmark joblib.dump/joblib.load on `dataset` for several
    compression levels, then uncompressed with memmapped loading."""
    time_write = list()
    time_read = list()
    du = list()
    for compress in compress_levels:
        if os.path.exists('out'):
            shutil.rmtree('out')
        os.mkdir('out')
        time_write.append(
            timeit(joblib.dump, dataset, 'out/test.pkl', compress=compress))
        du.append(disk_used('out')/1024.)
        time_read.append(
            timeit(joblib.load, 'out/test.pkl'))
        print '% 15s, compress %i,  write % 6.2fs, read % 7.3fs, disk % 5.1fM' % (
                    name, compress, time_write[-1], time_read[-1], du[-1])
    # Uncompressed dump, loaded back with memory mapping
    if os.path.exists('out'):
        shutil.rmtree('out')
    os.mkdir('out')
    time_write.append(
        timeit(joblib.dump, dataset, 'out/test.pkl'))
    time_read.append(
        timeit(joblib.load, 'out/test.pkl', mmap_mode='r'))
    du.append(disk_used('out')/1024.)
    print '% 15s, mmap      ,  write % 6.2fs, read % 7.3fs, disk % 5.1fM' % (
                    name, time_write[-1], time_read[-1], du[-1])

    return '% 10s | %s' % (name, ' | '.join('% 6.2fs/% 7.3fs, % 5.1fM' %
                    (t_w, t_r, d)
                    for t_w, t_r, d in zip(time_write, time_read, du)))

#d = datasets.fetch_olivetti_faces()
#bench_dump(d, 'Olivetti')
#print 80*'-'
#d = datasets.fetch_20newsgroups()
#bench_dump(d, '20news')
#print 80*'-'
#d = datasets.fetch_lfw_pairs()
#bench_dump(d, 'lfw_pairs')
#print 80*'-'
#d = datasets.fetch_species_distributions()
#bench_dump(d, 'Species')
d = datasets.fetch_lfw_people()
print 80*'-'
#bench_dump(d, 'LFW people')
# 'Big LFW people': enlarge the dataset by stacking copies of the data array
d.data = np.r_[d.data, d.data, d.data]
print 80*'-'
bench_dump(d, 'big people')
