Francesc Alted:
Hi braingateway,

A Thursday 14 October 2010 00:45:05 braingateway escrigué:
Hi everyone,


I used to work with numpy.memmap; the speed was roughly OK for me,
but I always needed to save the corresponding metadata (such as
variable names, variable shapes, experiment descriptions, etc.) into a
separate file, which is a very bad approach when I have lots of data
files and change their names from time to time. I have heard about a
lot of amazing characteristics of PyTables recently. It sounds like a
perfect match for my application: it is based on HDF5, can be
compressed by Blosc, and has even faster I/O speed than numpy.memmap.
So I decided to shift my project to PyTables. When I tried the
official benchmark code (poly.py), it seemed OK; at least without
compression the I/O speed is faster than numpy.memmap. However, when
I tried to dig a little bit deeper, I ran into problems immediately.

Mmh, you rather meant *performance* problems probably :-)

I did several
different experiments to get familiar with the performance
characteristics of PyTables. First, I tried to just read data chunks
(smaller than (1E+6,24)) into RAM from a random location in a larger
data file containing (3E+6,24) random float64 numbers, about 549 MB.
For each reading operation, I obtained the average speed from 10
experiments. It took numpy.memmap 56 ms to read a 1E+6-long single
column, and 73 ms to read a (1E+6,24) data chunk. PyTables (with
chunkshape (65536, 24), complib = None) scored 1470 ms for (1E+6,)
and 257 ms for (1E+6,24). The standard deviations of all the results
are always very low, which suggests the performance is stable.

I've been reading your code, and you are accessing your data column-wise, instead of row-wise. In the C world (and hence Python, NumPy, PyTables...) you want to make sure that you access data by row, not column, to get maximum performance. For an explanation of why, see:

https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf

and specifically slides 23 and 31.
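A minimal sketch of what "by row, not column" means, using plain NumPy (not the poster's benchmark code): with C (row-major) order, a row slice is one contiguous block of memory, while a column slice strides across the whole array.

```python
import numpy as np

# C (row-major) layout: elements of one row are adjacent in memory;
# elements of one column are shape[1]*itemsize bytes apart.
a = np.zeros((24, 1000), dtype='float64')

row = a[0, :]   # one contiguous block of 1000 * 8 bytes
col = a[:, 0]   # 24 reads, each 8000 bytes apart

print(row.flags['C_CONTIGUOUS'])   # True
print(col.flags['C_CONTIGUOUS'])   # False
```

The same logic applies on disk: a read that follows the storage order turns into one sequential I/O request instead of many scattered ones.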

Surprisingly, PyTables is 3 times slower than numpy.memmap. I
thought PyTables would show better, or at least the same, performance
as numpy.memmap when I need to stream data to the disk with some
calculation involved. So for the next test, I used the same expr as
the official benchmark code (poly.py) to operate on the entire array
and streamed the result onto disk. On average, numpy.memmap+numexpr
took 1.5 s to finish the calculation, but PyTables took 9.0 s. Then I
started to think this might be because I used the wrong chunkshape
for PyTables. So I did all the tests again with chunkshape = None,
which lets PyTables decide its optimized chunkshape (1365, 24). The
results are actually much worse than with the bigger chunkshape,
except for reading (1E+6,) data into RAM, which took 225 ms compared
to 1470 ms with the bigger chunkshape. It took 358 ms to read a chunk
of size (1E+6,24) into RAM, and 14 s to finish the expr calculation.
In all the tests, PyTables used far less RAM (<300 MB) than
numpy.memmap (around 1 GB).

PyTables should not use as much as 300 MB for your problem. You are probably looking at virtual memory; you should measure the amount of *resident* memory instead.

I am almost sure I did something wrong to make
PyTables so slow, so if you could give me some hints, I would
highly appreciate your assistance. I have attached my test code and
results.

Another thing about your "performance problems" when using compression: you are running your benchmarks with completely random data, and in that case compression is rather useless. Make sure that you use real data for your benchmarks. If it is compressible, things might change a lot.
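A quick stdlib-only illustration of that point (zlib stands in for Blosc here; the effect is the same for any general-purpose compressor):

```python
import os
import zlib

one_mb = 1 << 20

# Completely random bytes: a compressor can do essentially nothing
# (output is typically a touch *larger* than the input).
random_data = os.urandom(one_mb)

# Highly redundant bytes: a stand-in for compressible real-world data.
smooth_data = b'\x00' * one_mb

print(len(zlib.compress(random_data)))  # barely shrinks, may even grow
print(len(zlib.compress(smooth_data)))  # collapses to roughly 1 KB
```

So a benchmark filled with npy.random.randn output tells you the worst case for compression, not the typical one.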

BTW, in order to make your messages more readable, it would help if you can make a proper use of paragraphing. You know, trying to read a big paragraph with 40 lines is not exactly easy.

Cheers,


Sorry about the super big paragraph! Thanks a lot for your detailed response! I was aware it is pointless to compress pure random data, so I did not mention the compression rate at all in my post. Unfortunately, the dynamic range of my data is very large and it is very "random"-like. Blosc only reduces the file size of my real dataset by 10%, so I am not a fan of the compression feature.

I am really confused about the dimension order. I cannot see the freedom to choose between column-major and row-major, because HDF5 is row-major. For example, say I have N different sensors and each sensor generates 1E9 samples/s. The fixed-length (fastest) dimension should always store the N samples from the sensor network, so time always has to be the column. In most cases we want to access data from all sensors during a certain period of time; in some cases we only want to access data from just one or two sensors. So I think it is correct to make a row store the data from all sensors at the same time point. In my opinion, for almost all kinds of real-world data, the slowest dimension should always represent time. Probably I should invert the dimension order when I load the data into RAM.
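The trade-off between the two layouts can be seen directly in NumPy (a sketch with made-up sizes, not the benchmark code): whichever axis comes last in C order is the contiguous one, so the layout should match the dominant access pattern.

```python
import numpy as np

N, T = 24, 100000                    # sensors, time samples (illustrative)

sensors_by_time = np.zeros((N, T))   # the (24, 3e6)-style layout above
time_by_sensors = np.zeros((T, N))   # time as the slowest dimension

# "all sensors over a time window": contiguous only in the second layout
window1 = sensors_by_time[:, 1000:2000]   # N strided pieces
window2 = time_by_sensors[1000:2000, :]   # one contiguous block
print(window1.flags['C_CONTIGUOUS'])      # False
print(window2.flags['C_CONTIGUOUS'])      # True

# "one sensor's whole trace": the opposite holds
print(sensors_by_time[3, :].flags['C_CONTIGUOUS'])   # True
print(time_by_sensors[:, 3].flags['C_CONTIGUOUS'])   # False
```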

Even though I did invert the dimension order, the speed did not improve for accessing all channels, but it did improve a lot for accessing data from only one sensor, for both memmap and PyTables. However, PyTables is still much slower than memmap:

Read a 24 x 1e6 data chunk at a random position:

Memmap: 128 ms (without shifting dimension order: 81 ms)
Pytables (automatic chunkshape = (1, 32768)): 327 ms (without shifting dimension order: 358 ms)
Pytables (chunkshape = (24, 65536)): 270 ms (without shifting dimension order: 255 ms)
Pytables (chunkshape = (1, 65535)): 328 ms

Calculate expr on the whole array:

Memmap: 1.4~1.8 s (without shifting dimension order: 1.4~1.6 s)
Pytables (automatic chunkshape = (1, 32768)): 9.4 s (without shifting dimension order: 14 s)
Pytables (chunkshape = (24, 65536)): 16 s (without shifting dimension order: 9 s)
Pytables (chunkshape = (1, 65535)): 13 s

Should I change some default parameters, such as the buffer size, to improve the performance?

By the way, after I changed the shape to (24, 3e6), pytables.Expr raised an error: The error was --> <type 'exceptions.AttributeError'>: 'Expr' object has no attribute 'BUFFERTIMES'.

I think this is because expression.py has not been updated for the new 'BUFFER_TIMES' parameter? So I added:
from tables.parameters import BUFFER_TIMES
and changed self.BUFFERTIMES to BUFFER_TIMES.

I hope this is correct.

Thanks a lot

LittleBigBrain

#######################################################################
# This script compares I/O speed and the speed of the computation of
# a polynomial for two backends (numpy.memmap+numexpr and tables.Expr)
#
# Author: Little Big Brain
# Date: 2010-10-14
#######################################################################

import numpy as npy
import tables as pytbs
import numexpr as nep
import time
import os.path
import sys
expr = ".25*a**3 + .75*a**2 - 1.5*a - 2"

def CreateH5file(h5srcname,shape=None,dtype=None,chunkshape=None,srcArrayName=None):
    atom=pytbs.Atom.from_dtype(npy.dtype(dtype))
    h5file = pytbs.openFile(h5srcname, mode = "w")
    ca = h5file.createCArray(\
        h5file.root,srcArrayName,\
        atom,shape,\
        chunkshape=chunkshape)
    print '#='*20
    print 'pytable hdf5 file info:\r\n',h5file
    print 'pytable Array info:\r\n',ca
    print '#='*20
    t0=time.time()
    for i in range(0,10):
        #output random floating number into array
        ca[:,i*shape[1]/10:(i+1)*shape[1]/10]=\
                    npy.random.randn(shape[0],shape[1]/10)
    h5file.close()
    t1=time.time()
    
    return t1-t0,os.path.getsize(h5srcname)

def CreateArrayMap(h5srcname,npmapname=None,srcArrayName=None):
    #Copy data from pytable h5 file to generate numpy.memmap
    try:
        h5src = pytbs.openFile(\
            h5srcname,\
            mode = "r")
        
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              h5srcname
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        raise RuntimeError, "Please check Source File Name"
    try:
        
        srcArray=h5src.getNode('/'+srcArrayName)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s/%s" % \
              (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (type, value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    if os.path.isfile(npmapname):
        h5src.close()
        raise RuntimeError, '\''+\
              npmapname+'\'' +\
              ' already exists. Please Check Destination File Name'
    try:
        npmap1=npy.memmap(npmapname,dtype=srcArray.dtype.name,\
                          mode='w+',shape=srcArray.shape)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              npmapname
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        raise RuntimeError, "Please check Destination File Name"
    try:
        # use the source array's own shape instead of relying on a
        # module-level 'shape' variable (which this function never received)
        shape=srcArray.shape
        t0=time.time()
        #Copying data into Memmap
        for i in range(0,10):
            npmap1[:shape[0],i*shape[1]/10:(i+1)*shape[1]/10]=\
                         srcArray[:shape[0],i*shape[1]/10:(i+1)*shape[1]/10]
            npmap1.flush()
        t1=time.time()
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems copy %s/%s into %s" % \
              (h5srcname, srcArrayName,npmapname)
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        npmap1.close()
        del npmap1
        raise RuntimeError, "Cannot Finish Copy, have enough space?"
    h5src.close()
    npmap1.close()
    del npmap1
    return t1-t0, os.path.getsize(npmapname)

def RandReadNpMap(npmapname,shape=None,dtype='float64',\
                  readlen=100,roundPerLen=10):
    #read same chunks at random position
    if not os.path.isfile(npmapname):
        raise RuntimeError, '\''+\
              npmapname+'\'' +\
              ' does not exist. Please Check Destination File Name'
    try:
        npmap1=npy.memmap(npmapname,dtype=dtype,\
                          mode='r',shape=shape)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              npmapname
        print "The error was --> %s: %s" % (type, value)
        
        raise RuntimeError, "Please check Destination File Name"
    try:
        a1_st=npy.zeros((len(readlen),roundPerLen,2),dtype='uint32')
        dt_list=npy.zeros((len(readlen),roundPerLen,3),dtype='float64')
        #print readlen
        for i in range(0,len(readlen)):
##            print 'read %d rows' % readlen[i]
            for k in range(0,roundPerLen):
                #print 'start point = ', a1_st[i]
                #print 'reading length = ', readlen[i]
                t0=time.time()
                a=npy.zeros((shape[0],readlen[i]),dtype=dtype)
                #print 'debug info:', 'a shape',a.shape
                t1=time.time()
                dt_list[i,k,0]=t1-t0

                a1_st[i,k,0]=npy.random.randint(0,shape[1]-readlen.max()-1)
                t0=time.time()
                a[0,:]=npmap1[0,a1_st[i,k,0]:(readlen[i]+a1_st[i,k,0])]
                t1=time.time()
                dt_list[i,k,1]=t1-t0

                a1_st[i,k,1]=npy.random.randint(0,shape[1]-readlen.max()-1)
                t0=time.time()
                a[:,:]=npmap1[:shape[0],a1_st[i,k,1]:(readlen[i]+a1_st[i,k,1])]
                t1=time.time()
                dt_list[i,k,2]=t1-t0
##            print 'create a: %f ms, read 1st row:%f ms, read all:%f ms' \
##                  % tuple(dt_list[i,:,:].mean(axis=0)*1e3)
            #print 'debug info:','dt_list ms', dt_list*1e3

    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems retrieve data from %s" % \
              npmapname
        print "The error was --> %s: %s" % (type, value)
        npmap1.close()
        del npmap1
        raise RuntimeError, "Cannot Finish Copy, have enough space?"
    npmap1.close()
    del npmap1
    return dt_list

def RandReadTables(h5srcname,\
                   srcArrayName=None,\
                   readlen=100,roundPerLen=10):
    
    if not os.path.isfile(h5srcname):
        raise RuntimeError, '\''+\
              h5srcname+'\'' +\
              ' does not exist. Please Check Destination File Name'
    try:
        h5src=pytbs.openFile(h5srcname,mode='r')
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              h5srcname
        print "The error was --> %s: %s" % (type, value)
        
        raise RuntimeError, "Please check Destination File Name"
    try:
        
        srcArray=h5src.getNode('/'+srcArrayName)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s/%s" % \
              (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (type, value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    shape=srcArray.shape
    atom=srcArray.atom
    dstchunkshape=srcArray.chunkshape
    
    try:
        a1_st=npy.random.randint(0,shape[1]-readlen.max()-1,\
                                 (len(readlen),roundPerLen,2))
        dt_list=npy.zeros((len(readlen),roundPerLen,3),dtype='float64')
        for i in range(0,len(readlen)):
##            print 'read %d rows' % readlen[i]
            for k in range(0,roundPerLen):
##                print 'start point = ', a1_st[i]
##                print 'reading length = ', readlen[i]
                t0=time.time()
                a=npy.zeros((shape[0],readlen[i]),dtype=srcArray.atom.dtype)
                #print 'debug info:', 'a shape',a.shape
                t1=time.time()
                dt_list[i,k,0]=t1-t0

                t0=time.time()
                a[0,:]=srcArray[0,a1_st[i,k,0]:(readlen[i]+a1_st[i,k,0])]
                t1=time.time()
                dt_list[i,k,1]=t1-t0

                t0=time.time()
                a[:,:]=srcArray[:shape[0],a1_st[i,k,1]:(readlen[i]+a1_st[i,k,1])]
                t1=time.time()
                dt_list[i,k,2]=t1-t0
##            print 'create array %f ms, reading average time =%f ms for 1 row, %f ms for all rows'\
##                  % tuple(dt_list[i,:,:].mean(axis=0)*1e3)
        

    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems compute data from %s" % \
              h5srcname
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        
        raise RuntimeError, "Cannot Finish Copy, have enough space?"
    h5src.close()
    
    return dt_list

def SerialRWNpMap(npmapname,dstmapname=None,shape=None,\
                  dtype='float64',\
                  readlen=100,roundPerLen=10):
    #Read chunk into Memory then do calculation then write to disk
    if not os.path.isfile(npmapname):
        raise RuntimeError, '\''+\
              npmapname+'\'' +\
              ' does not exist. Please Check Destination File Name'
    try:
        npmap1=npy.memmap(npmapname,dtype=dtype,\
                          mode='r',shape=shape)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              npmapname
        print "The error was --> %s: %s" % (type, value)
        raise RuntimeError, "Please check Destination File Name"
    try:
        #create result memmap
        npmap2=npy.memmap(dstmapname,dtype=dtype,\
                          mode='w+',shape=shape)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              dstmapname
        print "The error was --> %s: %s" % (type, value)
        raise RuntimeError, "Please check Destination File Name"
    try:
        dt_list=npy.zeros((len(readlen),roundPerLen),dtype='float64')
        for i in range(0,len(readlen)):
##            print 'read %d rows' % readlen[i]
            a=npy.zeros((shape[0],readlen[i]),dtype=dtype)
            for k in range(0,roundPerLen):
                t0=time.time()
                for ik in range(0,shape[1]/readlen[i]):
                    a[:,:]=npmap1[:shape[0],ik*readlen[i]:(ik+1)*readlen[i]]
                    npmap2[:shape[0],ik*readlen[i]:(ik+1)*readlen[i]]=nep.evaluate(expr)
                t1=time.time()
                dt_list[i,k]=t1-t0
##                print 'elaspe ',(t1-t0)*1e3,'ms'
            
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems retrieve data from %s" % \
              npmapname
        print "The error was --> %s: %s" % (type, value)
        npmap1.close()
        del npmap1
        raise RuntimeError, "Cannot Finish Copy, have enough space?"
    npmap1.close()
    del npmap1
    return dt_list

def SerialRWTables(h5srcname,h5dstname=None,\
                   srcArrayName=None,\
                   dstArrayName=None,\
                   filters=None,\
                   roundPerLen=10):
    
    if not os.path.isfile(h5srcname):
        raise RuntimeError, '\''+\
              h5srcname+'\'' +\
              ' does not exist. Please Check Destination File Name'
    try:
        h5src=pytbs.openFile(h5srcname,mode='r')
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              h5srcname
        print "The error was --> %s: %s" % (type, value)
        
        raise RuntimeError, "Please check Destination File Name"
    try:
        
        srcArray=h5src.getNode('/'+srcArrayName)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s/%s" % \
              (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (type, value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    shape=srcArray.shape
    atom=srcArray.atom
    filters=filters
    dstchunkshape=srcArray.chunkshape
        
    if not dstArrayName:
        dstArrayName=srcArrayName

    try:
        h5dst = pytbs.openFile(\
            h5dstname,\
            mode = "w")
    
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems open %s" % \
              h5dstname
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        h5dst.close()
        raise RuntimeError, "Please check Destination File Name"
    try:
        dstArray = h5dst.createCArray(\
            h5dst.root,dstArrayName,\
            atom,shape,filters=filters,\
            chunkshape=dstchunkshape)
    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems create %s/%s" % \
              (h5dstname, dstArrayName)
        print "The error was --> %s: %s" % (type, value)
        print "The destination file looks like:\n", h5dst
        h5src.close()
        h5dst.close()
        raise RuntimeError, "Please check Destination Array parameters"
    print '#='*20
    print 'pytable hdf5 file info:\r\n',h5dst
    print 'pytable Array info:\r\n',dstArray
    print '#='*20
    
    try:
        dt_list=npy.zeros((roundPerLen,),dtype='float64')
        print srcArray.chunkshape
        a=srcArray
        for k in range(0,roundPerLen):
            t0=time.time()
            exprtbs = pytbs.Expr(expr)
            exprtbs.setOutput(dstArray)
            exprtbs.eval()
            t1=time.time()
            dt_list[k]=t1-t0
##            print 'read-compute-write:%f ms' \
##                  % (dt_list[k]*1e3)
            
            #print 'debug info:','dt_list ms', dt_list*1e3
##        print 'time elapse %f ms' %(dt_list.mean()*1e3)

    except:
        (type, value, traceback) = sys.exc_info()
        print "Problems compute data from %s to %s" % \
              (h5srcname,h5dstname)
        print "The error was --> %s: %s" % (type, value)
        h5src.close()
        h5dst.close()
        raise RuntimeError, "Cannot Finish Copy, have enough space?"
    h5src.close()
    h5dst.close()
    return dt_list
if __name__ == '__main__':
    version='v2Cm-'
    pytbs.print_versions()
    shape=(24,3000000)  # use an int size; 3e6 is a float and floats are bad slice indices
    chunkshape=None
    clib='blosc'
    clevel=3
    srcArrayName='TestArray'
    filters = pytbs.Filters(complib=clib, complevel=clevel)
    readlen=10**npy.arange(2,7,dtype='uint32')
    roundPerLen=10 # run 10 times for each operation, to get average elaspe time
    testFolder='G:/PyTables_vs_Memmap'
    h5srcname=testFolder+'/'+version+'src-'+str(chunkshape)+'-noncomp'+'.h5'
    srcmapname=testFolder+'/'+version+'src-'+'.memmap'
    dstmapname=testFolder+'/'+version+'dst-'+'.memmap'
    h5dstname1=testFolder+'/'+version+'dst-'+str(chunkshape)+'-noncomp'+'.h5'
    h5dstname2=testFolder+'/'+version+'dst-'+str(chunkshape)+\
                '-'+clib+'('+str(clevel)+')'+'.h5'
    createTimes=npy.zeros((2,1))
    fileSize=npy.zeros((2,1))
    createTimes[0],fileSize[0]=CreateH5file(\
        h5srcname,shape=shape,chunkshape=chunkshape\
        ,srcArrayName=srcArrayName,dtype='float64')
    createTimes[1],fileSize[1]=CreateArrayMap(\
        h5srcname,npmapname=srcmapname\
        ,srcArrayName=srcArrayName)
    dt_list0=RandReadNpMap(srcmapname,\
                    shape=shape,dtype='float64',\
                    readlen=readlen,roundPerLen=roundPerLen)
    dt_list1=RandReadTables(h5srcname,\
                    srcArrayName=srcArrayName,\
                    readlen=readlen,roundPerLen=roundPerLen)
    dt_list2=SerialRWNpMap(srcmapname,dstmapname=dstmapname,\
                    shape=shape,dtype='float64',\
                    readlen=readlen,roundPerLen=roundPerLen)
    dt_list3=SerialRWTables(h5srcname,h5dstname=h5dstname1,\
                    srcArrayName=srcArrayName,\
                    roundPerLen=roundPerLen)
    dt_list4=SerialRWTables(h5srcname,h5dstname=h5dstname2,\
                    srcArrayName=srcArrayName,\
                    filters=filters,\
                    roundPerLen=roundPerLen)
    
    save_var_list=['dt_list'+str(i) for i in range(0,5)]
    save_var_list=save_var_list+\
                   ['createTimes','fileSize',\
                    'h5srcname','srcmapname',\
                    'dstmapname','h5dstname1',\
                    'h5dstname2','shape',\
                    'chunkshape','clib',
                    'clevel','srcArrayName',\
                    'filters','readlen',\
                    'roundPerLen']
    save_var_dict={}
    for i in save_var_list:
        save_var_dict[i]=vars()[i]
##    param_file=__file__.rsplit(os.path.sep,1)[1]
##    param_file=param_file.rsplit('.',1)[0]
    param_file='pyIOspeedCompareAll-'+version+time.strftime('%Y%m%d%H%M')
    npy.save(testFolder+'/'+param_file,save_var_dict)

    print '#='*20
    print 'Source Pytables HDF5 file: %s \r\n%d MB generated in %d s'\
          % (h5srcname,fileSize[0]/(1024**2),createTimes[0])
    print 'Source MemMap file: %s \r\n%d MB generated in %d s'\
          % (srcmapname,fileSize[1]/(1024**2),createTimes[1])
    print '#*'*10
    print '#*'*5, 'Randomly read some chunks into RAM by numpy.memmap'
    for i in range(0,len(readlen)):
        print ('Randomly reading %d x %d:\r\n'\
    +'%f ms for numpy array creating,' \
    +'%f ms for reading 1st row, %f for reading all') \
              % ((shape[0],readlen[i])+\
                 tuple(dt_list0[i,:,:].mean(axis=0)*1e3))
    print '#*'*5, 'Randomly read some chunks into RAM by pytables'
    for i in range(0,len(readlen)):
        print ('Randomly reading %d x %d:\r\n'\
    +'%f ms for numpy array creating,' \
    +'%f ms for reading 1st row, %f for reading all') \
              % ((shape[0],readlen[i])+\
                 tuple(dt_list1[i,:,:].mean(axis=0)*1e3))
    print '#*'*10
    print '#*'*5,'Compute polynomal function:',expr,'\r\n'
    print '#*'*5,'and stream to HDD by numpy.memmap'
    print 'Result MemMap file: %s \r\n%d MB'\
          % (dstmapname,os.path.getsize(dstmapname)/(1024**2))
    for i in range(0,len(readlen)):
        print ('reading chunkshape %dx%d:\r\n'+ \
            '%f ms for reading and computing the whole array') \
            % (shape[0],readlen[i],\
                dt_list2[i,:].mean()*1e3)
    print '#*'*10
    print '#*'*5,'Compute polynomal function:',expr,'\r\n'
    print '#*'*5,'and stream to HDD by pytalbes.Expr'
    print 'Result HDF5 file: %s \r\n%d MB'\
          % (h5dstname1,os.path.getsize(h5dstname1)/(1024**2))
    print ('%f ms for reading and computing the whole array') \
        % (dt_list3.mean()*1e3)
    print '#*'*10
    print '#*'*5,'Compute polynomal function:',expr,'\r\n'
    print '#*'*5,'and stream to HDD by pytalbes.Expr'
    print 'Result HDF5 file: %s \r\n%d MB'\
          % (h5dstname2,os.path.getsize(h5dstname2)/(1024**2))
    print ('%f ms for reading and computing the whole array') \
        % (dt_list4.mean()*1e3)
 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.2
HDF5 version:      1.8.5
NumPy version:     1.4.1
Numexpr version:   1.3.1 (using VML/MKL 10.2.5)
Zlib version:      1.2.3 (in Python interpreter)
Blosc version:     1.0 (2010-07-01)
Python version:    2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit 
(AMD64)]
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v2Cm-src-None-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:16:47 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v2Cm-dst-None-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:19:26 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 32768)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v2Cm-dst-None-blosc(3).h5 (File) ''
Last modif.: 'Thu Oct 14 19:21:00 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''

pytable Array info:/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 32768)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v2Cm-src-None-noncomp.h5 552 
MB generated in 17 s
Source MemMap file: G:/PyTables_vs_Memmap/v2Cm-src-.memmap 549 MB generated in 
20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100:0.000000 ms for numpy array creating,0.000000 ms for 
reading 1st row, 0.000000 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for 
reading 1st row, 0.000000 for reading all
Randomly reading 24 x 10000:1.600003 ms for numpy array creating,0.000000 ms 
for reading 1st row, 1.500010 for reading all
Randomly reading 24 x 100000:7.800007 ms for numpy array creating,0.000000 ms 
for reading 1st row, 14.099979 for reading all
Randomly reading 24 x 1000000:78.099990 ms for numpy array creating,6.299996 ms 
for reading 1st row, 128.100014 for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100:0.000000 ms for numpy array creating,1.600003 ms for 
reading 1st row, 6.199980 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for 
reading 1st row, 4.700017 for reading all
Randomly reading 24 x 10000:0.000000 ms for numpy array creating,3.099990 ms 
for reading 1st row, 7.800007 for reading all
Randomly reading 24 x 100000:4.600000 ms for numpy array creating,3.000021 ms 
for reading 1st row, 40.799975 for reading all
Randomly reading 24 x 1000000:93.699980 ms for numpy array creating,18.900013 
ms for reading 1st row, 326.500010 for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomal function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v2Cm-dst-.memmap 549 MB
reading chunkshape 24x100:3798.399997 ms for reading and computing the whole 
array
reading chunkshape 24x1000:1409.399986 ms for reading and computing the whole 
array
reading chunkshape 24x10000:1845.300007 ms for reading and computing the whole 
array
reading chunkshape 24x100000:1796.900010 ms for reading and computing the whole 
array
reading chunkshape 24x1000000:1790.599990 ms for reading and computing the 
whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomal function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytalbes.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v2Cm-dst-None-noncomp.h5 552 MB
9401.500010 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomal function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytalbes.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v2Cm-dst-None-blosc(3).h5 549 MB
12813.999987 ms for reading and computing the whole array
 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.2
HDF5 version:      1.8.5
NumPy version:     1.4.1
Numexpr version:   1.3.1 (using VML/MKL 10.2.5)
Zlib version:      1.2.3 (in Python interpreter)
Blosc version:     1.0 (2010-07-01)
Python version:    2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit 
(AMD64)]
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1bCm-src-(1, 65535)-noncomp.h5 
(File) ''
Last modif.: 'Thu Oct 14 20:31:23 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-noncomp.h5 
(File) ''
Last modif.: 'Thu Oct 14 20:34:05 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 65535)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-blosc(3).h5 
(File) ''
Last modif.: 'Thu Oct 14 20:36:18 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''

pytable Array info:/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 65535)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v1bCm-src-(1, 
65535)-noncomp.h5 552 MB generated in 19 s
Source MemMap file: G:/PyTables_vs_Memmap/v1bCm-src-.memmap 549 MB generated in 
20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100:0.000000 ms for numpy array creating,0.000000 ms for 
reading 1st row, 1.500010 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for 
reading 1st row, 0.000000 for reading all
Randomly reading 24 x 10000:0.000000 ms for numpy array creating,0.000000 ms 
for reading 1st row, 1.599979 for reading all
Randomly reading 24 x 100000:7.700014 ms for numpy array creating,0.000000 ms 
for reading 1st row, 15.700006 for reading all
Randomly reading 24 x 1000000:106.399989 ms for numpy array creating,1.500010 
ms for reading 1st row, 126.499987 for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100:0.000000 ms for numpy array creating,1.500010 ms for reading 1st row, 9.399986 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 9.400010 for reading all
Randomly reading 24 x 10000:0.000000 ms for numpy array creating,1.600003 ms for reading 1st row, 13.999987 for reading all
Randomly reading 24 x 100000:15.499997 ms for numpy array creating,0.000000 ms for reading 1st row, 36.100006 for reading all
Randomly reading 24 x 1000000:96.999979 ms for numpy array creating,21.700025 ms for reading 1st row, 328.199983 for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v1bCm-dst-.memmap 549 MB
reading chunkshape 24x100:3832.800007 ms for reading and computing the whole array
reading chunkshape 24x1000:1415.700006 ms for reading and computing the whole array
reading chunkshape 24x10000:1865.599990 ms for reading and computing the whole array
reading chunkshape 24x100000:1835.899997 ms for reading and computing the whole array
reading chunkshape 24x1000000:1785.900021 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-noncomp.h5 552 MB
13342.199993 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-blosc(3).h5 549 MB
13706.299996 ms for reading and computing the whole array
>>> 
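The slow single-column reads above are what you would expect from C (row-major) layout, as the slides Francesc links explain. A minimal pure-Python sketch of the offset arithmetic (no PyTables required; the function name is purely illustrative) shows why one column of a (24, 3000000) array is so much more scattered on disk than one row:

```python
# Row-major (C-order) offset arithmetic for a 2-D array of shape (nrows, ncols).
# Elements of one row are contiguous; elements of one column are ncols apart,
# so reading a full column makes nrows widely scattered accesses (and, with a
# chunkshape like (1, 65535), forces a separate chunk read per row touched).

def c_order_offset(row, col, ncols):
    """Flat (element) offset of item (row, col) in a row-major array."""
    return row * ncols + col

nrows, ncols = 24, 3_000_000

# Neighbours within a row are adjacent: stride 1.
assert c_order_offset(0, 1, ncols) - c_order_offset(0, 0, ncols) == 1

# Neighbours within a column are a whole row apart: stride 3_000_000.
assert c_order_offset(1, 0, ncols) - c_order_offset(0, 0, ncols) == ncols
```

With float64 data that column stride is 24 MB between consecutive elements, which defeats both the OS read-ahead and the chunk cache.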
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.2
HDF5 version:      1.8.5
NumPy version:     1.4.1
Numexpr version:   1.3.1 (using VML/MKL 10.2.5)
Zlib version:      1.2.3 (in Python interpreter)
Blosc version:     1.0 (2010-07-01)
Python version:    2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1Cm-src-(24, 65536)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:46:08 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:48:48 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''

pytable Array info:/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(24, 65536)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-blosc(3).h5 (File) ''
Last modif.: 'Thu Oct 14 19:51:26 2010'
Object Tree: 
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''

pytable Array info:/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(24, 65536)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v1Cm-src-(24, 65536)-noncomp.h5 552 MB generated in 20 s
Source MemMap file: G:/PyTables_vs_Memmap/v1Cm-src-.memmap 549 MB generated in 20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 0.000000 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 0.000000 for reading all
Randomly reading 24 x 10000:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 1.600003 for reading all
Randomly reading 24 x 100000:7.700014 ms for numpy array creating,3.200006 ms for reading 1st row, 12.499976 for reading all
Randomly reading 24 x 1000000:92.299962 ms for numpy array creating,7.800007 ms for reading 1st row, 129.600024 for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 1.499987 for reading all
Randomly reading 24 x 1000:0.000000 ms for numpy array creating,0.000000 ms for reading 1st row, 0.000000 for reading all
Randomly reading 24 x 10000:1.500010 ms for numpy array creating,1.600003 ms for reading 1st row, 1.600003 for reading all
Randomly reading 24 x 100000:12.400007 ms for numpy array creating,0.000000 ms for reading 1st row, 26.699972 for reading all
Randomly reading 24 x 1000000:95.199966 ms for numpy array creating,15.500045 ms for reading 1st row, 270.499992 for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v1Cm-dst-.memmap 549 MB
reading chunkshape 24x100:3796.900010 ms for reading and computing the whole array
reading chunkshape 24x1000:1387.500000 ms for reading and computing the whole array
reading chunkshape 24x10000:1854.600000 ms for reading and computing the whole array
reading chunkshape 24x100000:1782.800007 ms for reading and computing the whole array
reading chunkshape 24x1000000:1778.100014 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-noncomp.h5 552 MB
15798.500013 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2 
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-blosc(3).h5 642 MB
177037.500000 ms for reading and computing the whole array
>>> 
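For reference, the pytables.Expr timings above come from a streamed, block-by-block evaluation of the polynomial over the on-disk array. A pure-Python sketch of that blocked-evaluation pattern (runnable without PyTables; `blocked_poly` and the block size are illustrative, not part of any API):

```python
# Blocked, out-of-core-style evaluation of .25*a**3 + .75*a**2 - 1.5*a - 2:
# process the input in fixed-size chunks, the way tables.Expr streams through
# a disk-based array instead of loading it all into RAM at once.

def poly(x):
    return .25 * x**3 + .75 * x**2 - 1.5 * x - 2

def blocked_poly(data, blocksize):
    out = []
    for start in range(0, len(data), blocksize):
        block = data[start:start + blocksize]   # "read" one chunk
        out.extend(poly(x) for x in block)      # compute on that chunk only
    return out

data = [float(i) for i in range(10)]
# Blocked evaluation matches evaluating everything in one pass.
assert blocked_poly(data, blocksize=4) == [poly(x) for x in data]
```

In PyTables 2.2 itself the equivalent is a tables.Expr built from the expression string, with its output directed to a CArray so results stream straight back to disk; performance then hinges on the chunkshape matching the access pattern, as Francesc's slides discuss.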
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
