Thanks a lot for your insight.

Regards,
Dominik

On Mon, Dec 13, 2010 at 8:09 PM, Francesc Alted <fal...@pytables.org> wrote:

> On Monday 13 December 2010 15:08:03 Francesc Alted wrote:
> > On Monday 13 December 2010 14:56:26 Dominik Szczerba wrote:
> > > > But to know whether accessing columns is efficient in your
> > > > case, I'd need more info on your datasets.  Are they contiguous
> > > > or chunked? If chunked, which chunkshape have you chosen?
> > >
> > > Both. Files saved from MATLAB are uncompressed/contiguous, the ones
> > > saved from my program are usually compressed/chunked and the chunk
> > > size is around 1024^2/sizeof(type).
> >
> > Well, for PyTables (or any C application) and contiguous datasets,
> > accessing data by columns is inefficient: the favored direction
> > for performance is along rows.
>
> I was curious to see the difference in performance.  Here are some
> timings:
>
> >>> import numpy as np
> >>> import tables as tb
> >>> nptetra = np.empty((4, 4622544))
> >>> f = tb.openFile("/tmp/t.h5", "w")
> >>> tetra = f.createArray(f.root, "tetra", nptetra)
> >>> %time [ tetra[:,i] for i in range(4622544) ]
> CPU times: user 201.61 s, sys: 162.59 s, total: 364.20 s
> Wall time: 367.91 s
>
> Using the transposed version (i.e. accessing by rows):
>
> >>> tetra2 = f.createArray(f.root, "tetra2", nptetra.transpose())
> >>> %time [ tetra2[i] for i in range(4622544) ]
> CPU times: user 163.78 s, sys: 0.48 s, total: 164.25 s
> Wall time: 165.44 s   # more than 2x faster than column access
>
> But iterating over the array is the fastest mode (the I/O is buffered):
>
> >>> %time [ row for row in tetra2 ]
> CPU times: user 26.21 s, sys: 0.38 s, total: 26.59 s
> Wall time: 26.81 s
>
> I'd say that for chunked datasets you can expect something similar.
>
> --
> Francesc Alted
>
>
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
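To make the row-versus-column point in the quoted timings concrete, here is a minimal sketch of where the elements sit in a contiguous, row-major (C order) array. It assumes 8-byte float64 elements (the default dtype of np.empty) and ignores any HDF5 header offset; the helper name c_order_offsets is hypothetical and not part of PyTables or NumPy.

```python
def c_order_offsets(shape, itemsize, index):
    """Byte offsets of one row or one column in a row-major 2-D array.

    `index` is (row, None) to select a whole row,
    or (None, col) to select a whole column.
    Hypothetical helper, for illustration only.
    """
    nrows, ncols = shape
    row, col = index
    if col is None:
        # Elements of one row are adjacent in memory / on disk.
        return [(row * ncols + c) * itemsize for c in range(ncols)]
    # Elements of one column are a full row length apart.
    return [(r * ncols + col) * itemsize for r in range(nrows)]


ITEMSIZE = 8   # assumption: float64, the default dtype of np.empty
N = 4622544    # number of columns in the quoted example

# Column 0 of the (4, N) array: four elements, each one full row apart.
col_offsets = c_order_offsets((4, N), ITEMSIZE, (None, 0))

# Row 0 of the transposed (N, 4) array: four adjacent elements.
row_offsets = c_order_offsets((N, 4), ITEMSIZE, (0, None))

print(col_offsets)  # [0, 36980352, 73960704, 110941056]
print(row_offsets)  # [0, 8, 16, 24]
```

Under these assumptions, each tetra[:,i] touches four locations roughly 37 MB apart, while each tetra2[i] reads one 32-byte run, which is consistent with the much larger system time in the column-wise measurement.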
