On Monday 13 December 2010 15:08:03, Francesc Alted wrote:
> On Monday 13 December 2010 14:56:26, Dominik Szczerba wrote:
> > > But to know whether accessing columns is efficient for your
> > > case, I'd need more info on your datasets. Are they contiguous
> > > or chunked? If chunked, which chunkshape have you chosen?
> >
> > Both. Files saved from Matlab are uncompressed/contiguous; the ones
> > saved from my program are usually compressed/chunked, and the size
> > is around 1024^2/sizeof(type).
>
> Well, for PyTables (or any C application) and contiguous datasets,
> accessing data by columns is inefficient: the privileged direction
> for performance is rows.
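[Editor's note: a minimal NumPy sketch, not part of the original thread, illustrating why row access is privileged in C (row-major) order: a row is a contiguous view, while a column is strided by the full row length. The array shape below is arbitrary.]

```python
import numpy as np

# In C (row-major) order, the elements of one row sit next to each
# other in memory; the elements of one column are separated by a
# stride equal to the row length times the item size.
a = np.empty((4, 1000))

row = a[0]      # contiguous view of 1000 adjacent elements
col = a[:, 0]   # strided view: one element every 1000 * 8 bytes

print(row.flags['C_CONTIGUOUS'])                   # True
print(col.flags['C_CONTIGUOUS'])                   # False
print(col.strides[0] == a.shape[1] * a.itemsize)   # True
```

The same layout argument applies to a contiguous HDF5 dataset read from C: reading a row touches one run of bytes, reading a column touches one byte run per row.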
I was curious to see the difference in performance. Here are some timings:

>>> nptetra = np.empty((4, 4622544))
>>> f = tb.openFile("/tmp/t.h5", "w")
>>> tetra = f.createArray(f.root, "tetra", nptetra)
>>> %time [ tetra[:,i] for i in range(4622544) ]
CPU times: user 201.61 s, sys: 162.59 s, total: 364.20 s
Wall time: 367.91 s

Using the transposed version (i.e. accessing by rows):

>>> tetra2 = f.createArray(f.root, "tetra2", nptetra.transpose())
>>> %time [ tetra2[i] for i in range(4622544) ]
CPU times: user 163.78 s, sys: 0.48 s, total: 164.25 s
Wall time: 165.44 s   # more than 2x faster

But using the iterator is the fastest mode (the I/O is buffered):

>>> %time [ row for row in tetra2 ]
CPU times: user 26.21 s, sys: 0.38 s, total: 26.59 s
Wall time: 26.81 s

I'd say that for chunked datasets you can expect something similar.

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users