Francesc Alted:
Hi braingateway,
On Thursday 14 October 2010 00:45:05, braingateway wrote:
Hi everyone,
I used to work with numpy.memmap, and its speed was roughly OK for me,
but I always had to save the corresponding metadata (variable names,
variable shapes, experiment descriptions, etc.) in a separate file,
which is a very bad approach when I have lots of data files and rename
them from time to time.
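To make the metadata point concrete: what I hope to get from HDF5 is
metadata attached to the array node itself, so it can never be separated
from the data. A rough sketch of what I have in mind (file and attribute
names are made up for illustration):

import tables as pytbs

h5file = pytbs.openFile('session12.h5', mode='w')
ca = h5file.createCArray(h5file.root, 'TestArray',
                         pytbs.Float64Atom(), (24, 3000000))
# the metadata lives in the same file, attached to the node itself,
# so renaming or moving the file cannot orphan it
ca.attrs.description = 'sensor array, session 12'
ca.attrs.sample_rate = 1e9                  # samples/s per sensor
ca.attrs.sensor_names = ['ch%02d' % i for i in range(24)]
h5file.close()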
I have heard many amazing things about PyTables recently, and it sounds
like a perfect match for my application: it is based on HDF5, can be
compressed with Blosc, and promises even faster I/O than numpy.memmap.
So I decided to move my project to PyTables. When I tried the official
benchmark code (poly.py) everything seemed OK; at least without
compression the I/O speed was faster than numpy.memmap. However, when I
tried to dig a little deeper, I ran into problems immediately.
Mmh, you probably meant *performance* problems :-)
I did several different experiments to get familiar with the performance
characteristics of PyTables. First, I just read data chunks (smaller
than (1e6, 24)) into RAM from random locations in a larger data file
containing (3e6, 24) random float64 numbers, about 549 MB. For each read
operation I averaged the speed over 10 runs. numpy.memmap took 56 ms to
read a single column of length 1e6, and 73 ms to read a (1e6, 24) chunk.
PyTables (with chunkshape (65536, 24) and complib=None) scored 1470 ms
for (1e6,) and 257 ms for (1e6, 24). The standard deviations of all the
results were always very low, which suggests the performance is stable.
I've been reading your code, and you are accessing your data
column-wise instead of row-wise. In the C world (and hence Python,
NumPy, PyTables...) you want to make sure that you access data by row,
not by column, to get maximum performance. For an explanation of why,
see:
https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf
and specifically slides 23 and 31.
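You can see the effect with nothing but NumPy in memory; a tiny sketch
(array size made up) of why scanning along the last, contiguous axis is
the cheap direction in C order:

import numpy as npy
import time

a = npy.random.randn(4096, 4096)   # C order: each row is contiguous in memory

t0 = time.time()
for i in range(4096):
    a[i, :].sum()                  # row-wise: sequential memory traversal
print 'by row   : %.0f ms' % ((time.time() - t0) * 1e3)

t0 = time.time()
for j in range(4096):
    a[:, j].sum()                  # column-wise: one big stride per element
print 'by column: %.0f ms' % ((time.time() - t0) * 1e3)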
Surprisingly, PyTables is 3 times slower than numpy.memmap. I thought
PyTables might show better, or at least the same, performance as
numpy.memmap when I need to stream data to disk with some calculation
involved. So for the next test I used the same expression as the
official benchmark code (poly.py) to operate on the entire array and
stream the result to disk. On average, numpy.memmap+numexpr took 1.5 s
to finish the calculation, but PyTables took 9.0 s. Then I started to
think this might be because I had used the wrong chunkshape for
PyTables, so I ran all the tests again with chunkshape=None, which lets
PyTables choose its own optimized chunkshape, (1365, 24). The results
were actually much worse than with the bigger chunkshape, except for
reading (1e6,) data into RAM, which took 225 ms compared to 1470 ms.
Reading a (1e6, 24) chunk into RAM took 358 ms, and the expr calculation
took 14 s. In all the tests, PyTables used far less RAM (<300 MB) than
numpy.memmap (around 1 GB).
PyTables should not use as much as 300 MB for your problem. You are
probably looking at virtual memory; check the amount of *resident*
memory instead.
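One way to measure that from inside the script is the third-party psutil
package (an assumption on my part, not something your benchmark uses;
the method is get_memory_info() in the psutil of this era, memory_info()
in later versions):

import os
import psutil   # third-party; not part of the attached benchmark

p = psutil.Process(os.getpid())
# rss is the resident set size: pages actually held in RAM.  The virtual
# size also counts mapped-but-untouched file pages, which is why a
# memmap-based script can appear to "use" 1 GB while touching far less.
print 'resident: %.0f MB' % (p.get_memory_info().rss / float(1024 ** 2))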
I am almost sure there is something I did wrong that makes PyTables so
slow, so if you could give me some hints I would highly appreciate your
assistance. I have attached my test code and results.
Another thing about your "performance problems" when using compression:
you are running your benchmarks with completely random data, and in that
case compression is rather useless. Make sure you use real data for your
benchmarks. If it is compressible, things might change a lot.
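A quick way to check whether your real data compresses at all: write a
real chunk and an equally-sized random chunk through the same Blosc
filter and compare the file sizes. A sketch (file names and sizes made
up):

import os
import numpy as npy
import tables as pytbs

def blosc_size(data, fname):
    # write `data` through a Blosc filter and report the file size on disk
    f = pytbs.openFile(fname, 'w',
                       filters=pytbs.Filters(complib='blosc', complevel=3))
    f.createCArray(f.root, 'a', pytbs.Float64Atom(), data.shape)[:] = data
    f.close()
    return os.path.getsize(fname)

rand = npy.random.randn(24, 100000)     # incompressible by construction
real = npy.cumsum(rand, axis=1)         # correlated; usually compresses better
print 'random data: %d bytes' % blosc_size(rand, 'rand.h5')
print 'smooth data: %d bytes' % blosc_size(real, 'smooth.h5')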
BTW, in order to make your messages more readable, it would help if you
made proper use of paragraphs. You know, trying to read a single 40-line
paragraph is not exactly easy.
Cheers,
Sorry about the super big paragraph! And thanks a lot for your detailed
response!
I was aware that it is pointless to compress pure random data, which is
why I did not mention compression ratios at all in my post.
Unfortunately, the dynamic range of my data is very large and it looks
very "random"; Blosc only shaves about 10% off the file size of my real
dataset, so I am not a big fan of the compression feature.
I am really confused about the dimension order. I cannot see any freedom
to choose between column-major and row-major, because HDF5 is row-major.
For example, if I have N different sensors, each generating 1e9
samples/s, the fixed-length (fastest) dimension should always hold the N
samples from the sensor network, so time has to be the other, growing
dimension. In most cases we want to access data from all sensors during
a certain period of time; only sometimes do we want data from just one
or two sensors. So I think it is correct to make each row store the data
from all sensors at the same time point. In my opinion, for almost all
kinds of real-world data the slowest dimension should represent time.
Probably I should invert the dimension order when I load the data into
RAM.
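To make explicit the two layouts I am comparing, this is roughly how the
chunkshape interacts with the (sensor, time) order; a sketch with my
benchmark's sizes:

import tables as pytbs

atom = pytbs.Float64Atom()
f = pytbs.openFile('layout.h5', mode='w')
# time along the last axis, each chunk spanning all 24 sensors: reading
# "all sensors over a time window" touches the minimum number of chunks
f.createCArray(f.root, 'all_sensors', atom, (24, 3000000),
               chunkshape=(24, 65536))
# one sensor per chunk: reading a single sensor's trace touches only that
# sensor's chunks, at the cost of slower "all sensors" reads
f.createCArray(f.root, 'per_sensor', atom, (24, 3000000),
               chunkshape=(1, 65536))
f.close()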
Even though I did invert the dimension order, the speed did not improve
for accessing all channels, though it did improve a lot for accessing
data from a single sensor, for both memmap and PyTables. However,
PyTables is still much slower than memmap:
Read a (24, 1e6) data chunk at a random position:
  memmap: 128 ms (without shifted dimension order: 81 ms)
  PyTables (automatic chunkshape = (1, 32768)): 327 ms (without shifted dimension order: 358 ms)
  PyTables (chunkshape = (24, 65536)): 270 ms (without shifted dimension order: 255 ms)
  PyTables (chunkshape = (1, 65535)): 328 ms
Calculate expr on the whole array:
  memmap: 1.4~1.8 s (without shifted dimension order: 1.4~1.6 s)
  PyTables (automatic chunkshape = (1, 32768)): 9.4 s (without shifted dimension order: 14 s)
  PyTables (chunkshape = (24, 65536)): 16 s (without shifted dimension order: 9 s)
  PyTables (chunkshape = (1, 65535)): 13 s
Should I change some default parameters, such as the buffer sizes, to
improve the performance?
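For instance, is something like this the intended way to enlarge the I/O
buffer (assuming PyTables 2.2 accepts the names from tables/parameters.py
as keyword overrides to openFile)?

import tables as pytbs

# IO_BUFFER_SIZE and BUFFER_TIMES are names from tables/parameters.py;
# whether larger values actually help here is exactly my question
h5src = pytbs.openFile('v2Cm-src-None-noncomp.h5', mode='r',
                       IO_BUFFER_SIZE=16 * 1024 ** 2)  # instead of the 1 MB default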
By the way, after I changed the shape to (24, 3e6), pytables.Expr raised
an error:
The error was --> <type 'exceptions.AttributeError'>: 'Expr' object has
no attribute 'BUFFERTIMES'.
I think this is because expression.py has not been updated for the new
'BUFFER_TIMES' parameter? So I added
from tables.parameters import BUFFER_TIMES
and changed self.BUFFERTIMES to BUFFER_TIMES.
I hope this is correct.
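In code, the workaround amounts to this (reconstructed from my
description above, not a proper patch against the 2.2 sources):

# at the top of tables/expression.py
from tables.parameters import BUFFER_TIMES

# ...and in the buffer-size computation, replace the failing attribute
# lookup  self.BUFFERTIMES  with the module-level constant  BUFFER_TIMES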
Thanks a lot
LittleBigBrain
#######################################################################
# This script compares I/O speed and the speed of computing a
# polynomial for different backends (numpy.memmap+numexpr and
# tables.Expr).
#
# Author: Little Big Brain
# Date: 2010-10-14
#######################################################################
import numpy as npy
import tables as pytbs
import numexpr as nep
import time
import os.path
import sys

expr = ".25*a**3 + .75*a**2 - 1.5*a - 2"
def CreateH5file(h5srcname, shape=None, dtype=None,
                 chunkshape=None, srcArrayName=None):
    atom = pytbs.Atom.from_dtype(npy.dtype(dtype))
    h5file = pytbs.openFile(h5srcname, mode="w")
    ca = h5file.createCArray(h5file.root, srcArrayName, atom, shape,
                             chunkshape=chunkshape)
    print '#=' * 20
    print 'pytable hdf5 file info:\r\n', h5file
    print 'pytable Array info:\r\n', ca
    print '#=' * 20
    t0 = time.time()
    # fill the array with random floating-point numbers, one slab at a time
    for i in range(0, 10):
        ca[:, i * shape[1] / 10:(i + 1) * shape[1] / 10] = \
            npy.random.randn(shape[0], shape[1] / 10)
    h5file.close()
    t1 = time.time()
    return t1 - t0, os.path.getsize(h5srcname)
def CreateArrayMap(h5srcname, npmapname=None, srcArrayName=None):
    # copy data from the PyTables HDF5 file into a fresh numpy.memmap
    try:
        h5src = pytbs.openFile(h5srcname, mode="r")
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % h5srcname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Source File Name"
    try:
        srcArray = h5src.getNode('/' + srcArrayName)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s/%s" % (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    if os.path.isfile(npmapname):
        h5src.close()
        raise RuntimeError, \
            "'%s' already exists. Please check Destination File Name" % npmapname
    try:
        npmap1 = npy.memmap(npmapname, dtype=srcArray.dtype.name,
                            mode='w+', shape=srcArray.shape)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % npmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        h5src.close()
        raise RuntimeError, "Please check Destination File Name"
    try:
        t0 = time.time()
        # copy the data into the memmap in ten slabs
        # (NB: `shape` is the module-level global set in __main__)
        for i in range(0, 10):
            npmap1[:shape[0], i * shape[1] / 10:(i + 1) * shape[1] / 10] = \
                srcArray[:shape[0], i * shape[1] / 10:(i + 1) * shape[1] / 10]
        npmap1.flush()
        t1 = time.time()
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems copying %s/%s into %s" % \
            (h5srcname, srcArrayName, npmapname)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        h5src.close()
        del npmap1   # deleting the memmap releases the file mapping
        raise RuntimeError, "Cannot finish copy; is there enough space?"
    h5src.close()
    del npmap1
    return t1 - t0, os.path.getsize(npmapname)
def RandReadNpMap(npmapname, shape=None, dtype='float64',
                  readlen=100, roundPerLen=10):
    # read equally-sized chunks from random positions
    if not os.path.isfile(npmapname):
        raise RuntimeError, \
            "'%s' does not exist. Please check Source File Name" % npmapname
    try:
        npmap1 = npy.memmap(npmapname, dtype=dtype, mode='r', shape=shape)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % npmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Source File Name"
    try:
        a1_st = npy.zeros((len(readlen), roundPerLen, 2), dtype='uint32')
        dt_list = npy.zeros((len(readlen), roundPerLen, 3), dtype='float64')
        for i in range(0, len(readlen)):
            for k in range(0, roundPerLen):
                t0 = time.time()
                a = npy.zeros((shape[0], readlen[i]), dtype=dtype)
                t1 = time.time()
                dt_list[i, k, 0] = t1 - t0    # target array creation
                a1_st[i, k, 0] = npy.random.randint(0, shape[1] - readlen.max() - 1)
                t0 = time.time()
                a[0, :] = npmap1[0, a1_st[i, k, 0]:(readlen[i] + a1_st[i, k, 0])]
                t1 = time.time()
                dt_list[i, k, 1] = t1 - t0    # first row only
                a1_st[i, k, 1] = npy.random.randint(0, shape[1] - readlen.max() - 1)
                t0 = time.time()
                a[:, :] = npmap1[:shape[0], a1_st[i, k, 1]:(readlen[i] + a1_st[i, k, 1])]
                t1 = time.time()
                dt_list[i, k, 2] = t1 - t0    # all rows
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems retrieving data from %s" % npmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        del npmap1
        raise RuntimeError, "Cannot finish read"
    del npmap1
    return dt_list
def RandReadTables(h5srcname, srcArrayName=None,
                   readlen=100, roundPerLen=10):
    if not os.path.isfile(h5srcname):
        raise RuntimeError, \
            "'%s' does not exist. Please check Source File Name" % h5srcname
    try:
        h5src = pytbs.openFile(h5srcname, mode='r')
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % h5srcname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Source File Name"
    try:
        srcArray = h5src.getNode('/' + srcArrayName)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s/%s" % (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    shape = srcArray.shape
    try:
        a1_st = npy.random.randint(0, shape[1] - readlen.max() - 1,
                                   (len(readlen), roundPerLen, 2))
        dt_list = npy.zeros((len(readlen), roundPerLen, 3), dtype='float64')
        for i in range(0, len(readlen)):
            for k in range(0, roundPerLen):
                t0 = time.time()
                a = npy.zeros((shape[0], readlen[i]), dtype=srcArray.atom.dtype)
                t1 = time.time()
                dt_list[i, k, 0] = t1 - t0    # target array creation
                t0 = time.time()
                a[0, :] = srcArray[0, a1_st[i, k, 0]:(readlen[i] + a1_st[i, k, 0])]
                t1 = time.time()
                dt_list[i, k, 1] = t1 - t0    # first row only
                t0 = time.time()
                a[:, :] = srcArray[:shape[0], a1_st[i, k, 1]:(readlen[i] + a1_st[i, k, 1])]
                t1 = time.time()
                dt_list[i, k, 2] = t1 - t0    # all rows
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems reading data from %s" % h5srcname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        h5src.close()
        raise RuntimeError, "Cannot finish read"
    h5src.close()
    return dt_list
def SerialRWNpMap(npmapname, dstmapname=None, shape=None, dtype='float64',
                  readlen=100, roundPerLen=10):
    # read a chunk into memory, evaluate the polynomial, write the result out
    if not os.path.isfile(npmapname):
        raise RuntimeError, \
            "'%s' does not exist. Please check Source File Name" % npmapname
    try:
        npmap1 = npy.memmap(npmapname, dtype=dtype, mode='r', shape=shape)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % npmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Source File Name"
    try:
        # create the result memmap
        npmap2 = npy.memmap(dstmapname, dtype=dtype, mode='w+', shape=shape)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % dstmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Destination File Name"
    try:
        dt_list = npy.zeros((len(readlen), roundPerLen), dtype='float64')
        for i in range(0, len(readlen)):
            a = npy.zeros((shape[0], readlen[i]), dtype=dtype)
            for k in range(0, roundPerLen):
                t0 = time.time()
                for ik in range(0, shape[1] / readlen[i]):
                    a[:, :] = npmap1[:shape[0], ik * readlen[i]:(ik + 1) * readlen[i]]
                    npmap2[:shape[0], ik * readlen[i]:(ik + 1) * readlen[i]] = \
                        nep.evaluate(expr)
                t1 = time.time()
                dt_list[i, k] = t1 - t0
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems computing data from %s" % npmapname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        del npmap1
        raise RuntimeError, "Cannot finish computation; is there enough space?"
    del npmap1
    return dt_list
def SerialRWTables(h5srcname, h5dstname=None, srcArrayName=None,
                   dstArrayName=None, filters=None, roundPerLen=10):
    if not os.path.isfile(h5srcname):
        raise RuntimeError, \
            "'%s' does not exist. Please check Source File Name" % h5srcname
    try:
        h5src = pytbs.openFile(h5srcname, mode='r')
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % h5srcname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        raise RuntimeError, "Please check Source File Name"
    try:
        srcArray = h5src.getNode('/' + srcArrayName)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s/%s" % (h5srcname, srcArrayName)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        print "The source file looks like:\n", h5src
        h5src.close()
        raise RuntimeError, "Please check Source Array Name"
    shape = srcArray.shape
    atom = srcArray.atom
    dstchunkshape = srcArray.chunkshape
    if not dstArrayName:
        dstArrayName = srcArrayName
    try:
        h5dst = pytbs.openFile(h5dstname, mode="w")
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems opening %s" % h5dstname
        print "The error was --> %s: %s" % (exc_type, exc_value)
        h5src.close()
        raise RuntimeError, "Please check Destination File Name"
    try:
        dstArray = h5dst.createCArray(h5dst.root, dstArrayName,
                                      atom, shape, filters=filters,
                                      chunkshape=dstchunkshape)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems creating %s/%s" % (h5dstname, dstArrayName)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        print "The destination file looks like:\n", h5dst
        h5src.close()
        h5dst.close()
        raise RuntimeError, "Please check Destination Array parameters"
    print '#=' * 20
    print 'pytable hdf5 file info:\r\n', h5dst
    print 'pytable Array info:\r\n', dstArray
    print '#=' * 20
    try:
        dt_list = npy.zeros((roundPerLen,), dtype='float64')
        print srcArray.chunkshape
        a = srcArray   # `a` is the variable name referenced in expr
        for k in range(0, roundPerLen):
            t0 = time.time()
            exprtbs = pytbs.Expr(expr)
            exprtbs.setOutput(dstArray)
            exprtbs.eval()
            t1 = time.time()
            dt_list[k] = t1 - t0
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        print "Problems computing data from %s to %s" % (h5srcname, h5dstname)
        print "The error was --> %s: %s" % (exc_type, exc_value)
        h5src.close()
        h5dst.close()
        raise RuntimeError, "Cannot finish computation; is there enough space?"
    h5src.close()
    h5dst.close()
    return dt_list
if __name__ == '__main__':
    version = 'v2Cm-'
    pytbs.print_versions()
    shape = (24, 3000000)   # 24 sensors x 3e6 samples
    chunkshape = None       # None: let PyTables pick the chunkshape
    clib = 'blosc'
    clevel = 3
    srcArrayName = 'TestArray'
    filters = pytbs.Filters(complib=clib, complevel=clevel)
    readlen = 10 ** npy.arange(2, 7, dtype='uint32')
    roundPerLen = 10   # run each operation 10 times to get the average elapsed time
    testFolder = 'G:/PyTables_vs_Memmap'
    h5srcname = testFolder + '/' + version + 'src-' + str(chunkshape) + '-noncomp' + '.h5'
    srcmapname = testFolder + '/' + version + 'src-' + '.memmap'
    dstmapname = testFolder + '/' + version + 'dst-' + '.memmap'
    h5dstname1 = testFolder + '/' + version + 'dst-' + str(chunkshape) + '-noncomp' + '.h5'
    h5dstname2 = testFolder + '/' + version + 'dst-' + str(chunkshape) + \
        '-' + clib + '(' + str(clevel) + ')' + '.h5'
    createTimes = npy.zeros((2, 1))
    fileSize = npy.zeros((2, 1))
    createTimes[0], fileSize[0] = CreateH5file(h5srcname, shape=shape,
                                               chunkshape=chunkshape,
                                               srcArrayName=srcArrayName,
                                               dtype='float64')
    createTimes[1], fileSize[1] = CreateArrayMap(h5srcname,
                                                 npmapname=srcmapname,
                                                 srcArrayName=srcArrayName)
    dt_list0 = RandReadNpMap(srcmapname, shape=shape, dtype='float64',
                             readlen=readlen, roundPerLen=roundPerLen)
    dt_list1 = RandReadTables(h5srcname, srcArrayName=srcArrayName,
                              readlen=readlen, roundPerLen=roundPerLen)
    dt_list2 = SerialRWNpMap(srcmapname, dstmapname=dstmapname,
                             shape=shape, dtype='float64',
                             readlen=readlen, roundPerLen=roundPerLen)
    dt_list3 = SerialRWTables(h5srcname, h5dstname=h5dstname1,
                              srcArrayName=srcArrayName,
                              roundPerLen=roundPerLen)
    dt_list4 = SerialRWTables(h5srcname, h5dstname=h5dstname2,
                              srcArrayName=srcArrayName,
                              filters=filters,
                              roundPerLen=roundPerLen)
    # save all timing results and parameters for later inspection
    save_var_list = ['dt_list' + str(i) for i in range(0, 5)]
    save_var_list = save_var_list + \
        ['createTimes', 'fileSize',
         'h5srcname', 'srcmapname',
         'dstmapname', 'h5dstname1',
         'h5dstname2', 'shape',
         'chunkshape', 'clib',
         'clevel', 'srcArrayName',
         'filters', 'readlen',
         'roundPerLen']
    save_var_dict = {}
    for i in save_var_list:
        save_var_dict[i] = vars()[i]
    param_file = 'pyIOspeedCompareAll-' + version + time.strftime('%Y%m%d%H%M')
    npy.save(testFolder + '/' + param_file, save_var_dict)
    print '#=' * 20
    print 'Source Pytables HDF5 file: %s, %d MB generated in %d s' \
        % (h5srcname, fileSize[0] / (1024 ** 2), createTimes[0])
    print 'Source MemMap file: %s, %d MB generated in %d s' \
        % (srcmapname, fileSize[1] / (1024 ** 2), createTimes[1])
    print '#*' * 10
    print '#*' * 5, 'Randomly read some chunks into RAM by numpy.memmap'
    for i in range(0, len(readlen)):
        print ('Randomly reading %d x %d: '
               '%f ms for numpy array creating, '
               '%f ms for reading 1st row, %f ms for reading all') \
            % ((shape[0], readlen[i]) + tuple(dt_list0[i, :, :].mean(axis=0) * 1e3))
    print '#*' * 5, 'Randomly read some chunks into RAM by pytables'
    for i in range(0, len(readlen)):
        print ('Randomly reading %d x %d: '
               '%f ms for numpy array creating, '
               '%f ms for reading 1st row, %f ms for reading all') \
            % ((shape[0], readlen[i]) + tuple(dt_list1[i, :, :].mean(axis=0) * 1e3))
    print '#*' * 10
    print '#*' * 5, 'Compute polynomial function:', expr
    print '#*' * 5, 'and stream to HDD by numpy.memmap'
    print 'Result MemMap file: %s, %d MB' \
        % (dstmapname, os.path.getsize(dstmapname) / (1024 ** 2))
    for i in range(0, len(readlen)):
        print ('reading chunkshape %dx%d: '
               '%f ms for reading and computing the whole array') \
            % (shape[0], readlen[i], dt_list2[i, :].mean() * 1e3)
    print '#*' * 10
    print '#*' * 5, 'Compute polynomial function:', expr
    print '#*' * 5, 'and stream to HDD by pytables.Expr'
    print 'Result HDF5 file: %s, %d MB' \
        % (h5dstname1, os.path.getsize(h5dstname1) / (1024 ** 2))
    print '%f ms for reading and computing the whole array' % (dt_list3.mean() * 1e3)
    print '#*' * 10
    print '#*' * 5, 'Compute polynomial function:', expr
    print '#*' * 5, 'and stream to HDD by pytables.Expr'
    print 'Result HDF5 file: %s, %d MB' \
        % (h5dstname2, os.path.getsize(h5dstname2) / (1024 ** 2))
    print '%f ms for reading and computing the whole array' % (dt_list4.mean() * 1e3)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 2.2
HDF5 version: 1.8.5
NumPy version: 1.4.1
Numexpr version: 1.3.1 (using VML/MKL 10.2.5)
Zlib version: 1.2.3 (in Python interpreter)
Blosc version: 1.0 (2010-07-01)
Python version: 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
Byte-ordering: little
Detected cores: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v2Cm-src-None-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:16:47 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v2Cm-dst-None-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:19:26 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 32768)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v2Cm-dst-None-blosc(3).h5 (File) ''
Last modif.: 'Thu Oct 14 19:21:00 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
pytable Array info:
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 32768)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v2Cm-src-None-noncomp.h5, 552 MB generated in 17 s
Source MemMap file: G:/PyTables_vs_Memmap/v2Cm-src-.memmap, 549 MB generated in 20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 10000: 1.600003 ms for numpy array creating, 0.000000 ms for reading 1st row, 1.500010 ms for reading all
Randomly reading 24 x 100000: 7.800007 ms for numpy array creating, 0.000000 ms for reading 1st row, 14.099979 ms for reading all
Randomly reading 24 x 1000000: 78.099990 ms for numpy array creating, 6.299996 ms for reading 1st row, 128.100014 ms for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 1.600003 ms for reading 1st row, 6.199980 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 4.700017 ms for reading all
Randomly reading 24 x 10000: 0.000000 ms for numpy array creating, 3.099990 ms for reading 1st row, 7.800007 ms for reading all
Randomly reading 24 x 100000: 4.600000 ms for numpy array creating, 3.000021 ms for reading 1st row, 40.799975 ms for reading all
Randomly reading 24 x 1000000: 93.699980 ms for numpy array creating, 18.900013 ms for reading 1st row, 326.500010 ms for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v2Cm-dst-.memmap, 549 MB
reading chunkshape 24x100: 3798.399997 ms for reading and computing the whole array
reading chunkshape 24x1000: 1409.399986 ms for reading and computing the whole array
reading chunkshape 24x10000: 1845.300007 ms for reading and computing the whole array
reading chunkshape 24x100000: 1796.900010 ms for reading and computing the whole array
reading chunkshape 24x1000000: 1790.599990 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v2Cm-dst-None-noncomp.h5, 552 MB
9401.500010 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v2Cm-dst-None-blosc(3).h5, 549 MB
12813.999987 ms for reading and computing the whole array
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 2.2
HDF5 version: 1.8.5
NumPy version: 1.4.1
Numexpr version: 1.3.1 (using VML/MKL 10.2.5)
Zlib version: 1.2.3 (in Python interpreter)
Blosc version: 1.0 (2010-07-01)
Python version: 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
Byte-ordering: little
Detected cores: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1bCm-src-(1, 65535)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 20:31:23 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 20:34:05 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 65535)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-blosc(3).h5 (File) ''
Last modif.: 'Thu Oct 14 20:36:18 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
pytable Array info:
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(1, 65535)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v1bCm-src-(1, 65535)-noncomp.h5, 552 MB generated in 19 s
Source MemMap file: G:/PyTables_vs_Memmap/v1bCm-src-.memmap, 549 MB generated in 20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 1.500010 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 10000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 1.599979 ms for reading all
Randomly reading 24 x 100000: 7.700014 ms for numpy array creating, 0.000000 ms for reading 1st row, 15.700006 ms for reading all
Randomly reading 24 x 1000000: 106.399989 ms for numpy array creating, 1.500010 ms for reading 1st row, 126.499987 ms for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 1.500010 ms for reading 1st row, 9.399986 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 9.400010 ms for reading all
Randomly reading 24 x 10000: 0.000000 ms for numpy array creating, 1.600003 ms for reading 1st row, 13.999987 ms for reading all
Randomly reading 24 x 100000: 15.499997 ms for numpy array creating, 0.000000 ms for reading 1st row, 36.100006 ms for reading all
Randomly reading 24 x 1000000: 96.999979 ms for numpy array creating, 21.700025 ms for reading 1st row, 328.199983 ms for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v1bCm-dst-.memmap, 549 MB
reading chunkshape 24x100: 3832.800007 ms for reading and computing the whole array
reading chunkshape 24x1000: 1415.700006 ms for reading and computing the whole array
reading chunkshape 24x10000: 1865.599990 ms for reading and computing the whole array
reading chunkshape 24x100000: 1835.899997 ms for reading and computing the whole array
reading chunkshape 24x1000000: 1785.900021 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-noncomp.h5, 552 MB
13342.199993 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1bCm-dst-(1, 65535)-blosc(3).h5, 549 MB
13706.299996 ms for reading and computing the whole array
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 2.2
HDF5 version: 1.8.5
NumPy version: 1.4.1
Numexpr version: 1.3.1 (using VML/MKL 10.2.5)
Zlib version: 1.2.3 (in Python interpreter)
Blosc version: 1.0 (2010-07-01)
Python version: 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
Byte-ordering: little
Detected cores: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1Cm-src-(24, 65536)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:46:08 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-noncomp.h5 (File) ''
Last modif.: 'Thu Oct 14 19:48:48 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000)) ''
pytable Array info:
/TestArray (CArray(24, 3000000)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(24, 65536)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
pytable hdf5 file info:
G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-blosc(3).h5 (File) ''
Last modif.: 'Thu Oct 14 19:51:26 2010'
Object Tree:
/ (RootGroup) ''
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
pytable Array info:
/TestArray (CArray(24, 3000000), shuffle, blosc(3)) ''
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
(24, 65536)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=
Source Pytables HDF5 file: G:/PyTables_vs_Memmap/v1Cm-src-(24, 65536)-noncomp.h5, 552 MB generated in 20 s
Source MemMap file: G:/PyTables_vs_Memmap/v1Cm-src-.memmap, 549 MB generated in 20 s
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Randomly read some chunks into RAM by numpy.memmap
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 10000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 1.600003 ms for reading all
Randomly reading 24 x 100000: 7.700014 ms for numpy array creating, 3.200006 ms for reading 1st row, 12.499976 ms for reading all
Randomly reading 24 x 1000000: 92.299962 ms for numpy array creating, 7.800007 ms for reading 1st row, 129.600024 ms for reading all
#*#*#*#*#* Randomly read some chunks into RAM by pytables
Randomly reading 24 x 100: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 1.499987 ms for reading all
Randomly reading 24 x 1000: 0.000000 ms for numpy array creating, 0.000000 ms for reading 1st row, 0.000000 ms for reading all
Randomly reading 24 x 10000: 1.500010 ms for numpy array creating, 1.600003 ms for reading 1st row, 1.600003 ms for reading all
Randomly reading 24 x 100000: 12.400007 ms for numpy array creating, 0.000000 ms for reading 1st row, 26.699972 ms for reading all
Randomly reading 24 x 1000000: 95.199966 ms for numpy array creating, 15.500045 ms for reading 1st row, 270.499992 ms for reading all
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by numpy.memmap
Result MemMap file: G:/PyTables_vs_Memmap/v1Cm-dst-.memmap, 549 MB
reading chunkshape 24x100: 3796.900010 ms for reading and computing the whole array
reading chunkshape 24x1000: 1387.500000 ms for reading and computing the whole array
reading chunkshape 24x10000: 1854.600000 ms for reading and computing the whole array
reading chunkshape 24x100000: 1782.800007 ms for reading and computing the whole array
reading chunkshape 24x1000000: 1778.100014 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-noncomp.h5, 552 MB
15798.500013 ms for reading and computing the whole array
#*#*#*#*#*#*#*#*#*#*
#*#*#*#*#* Compute polynomial function: .25*a**3 + .75*a**2 - 1.5*a - 2
#*#*#*#*#* and stream to HDD by pytables.Expr
Result HDF5 file: G:/PyTables_vs_Memmap/v1Cm-dst-(24, 65536)-blosc(3).h5, 642 MB
177037.500000 ms for reading and computing the whole array