On Thursday 24 June 2010 22:51:10 Felix Schlesinger wrote:
> Francesc Alted <faltet <at> pytables.org> writes:
> > On Wednesday 23 June 2010 22:17:23 Felix Schlesinger wrote:
> > > I am dealing with a large table. One of the columns of that table
> > > should hold a variable number of pairs of integers. In order to
> > > implement this I use a pytables table for the fixed columns and a
> > > separate VLArray of IntAtom(4,2) for the variable column.
> >
> > What is the approximate number of entries on each row?
> 
> Between 1 and 5.

Mmh, this is not very much.  Given that, I'd try to put that info in a 
multidimensional column of the same table.  The column can be defined as 
something like Int32Col(shape=(5,2)), that is, up to 5 pairs of 32-bit 
integers.  Then, another column (Int8Col) can hold the actual number of valid 
entries.  This takes a fixed 5x2x4+1=41 bytes per row, so the overhead from 
unused slots is bounded by that, which is not that much.
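
A minimal sketch of that fixed-size layout, written with the standard 
`struct` module rather than PyTables so that the byte arithmetic is easy to 
check.  The names `MAX_PAIRS`, `pack_row` and `unpack_row` are hypothetical 
and zero is an assumed padding value; a real table would declare the two 
columns with Int32Col(shape=(5,2)) and Int8Col instead.

```python
import struct

MAX_PAIRS = 5  # upper bound from the thread: 1 to 5 pairs per row

# '<' means little-endian with no alignment padding:
# 10 x int32 for the pair slots, then 1 x int8 for the valid-entry count.
ROW_FORMAT = "<%dib" % (MAX_PAIRS * 2)
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 5 * 2 * 4 + 1 bytes


def pack_row(pairs):
    """Pack up to MAX_PAIRS (a, b) pairs into one fixed-size, zero-padded record."""
    if len(pairs) > MAX_PAIRS:
        raise ValueError("too many pairs for one row")
    flat = [value for pair in pairs for value in pair]
    flat += [0] * (MAX_PAIRS * 2 - len(flat))  # fill the unused slots
    return struct.pack(ROW_FORMAT, *flat, len(pairs))


def unpack_row(record):
    """Return only the valid pairs, using the stored count to drop the padding."""
    *flat, count = struct.unpack(ROW_FORMAT, record)
    return [(flat[2 * i], flat[2 * i + 1]) for i in range(count)]
```

Reading a row back never needs a second lookup: the count column says how 
many of the five slots are meaningful.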

In addition, if you use compression, most of this overhead goes away very 
efficiently (try Blosc in the 2.2 series for maximum performance).  I think 
that would be a very fast approach and also the one with the simplest 
implementation.
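
To get a feeling for why the padding compresses well, here is a rough 
sketch that packs synthetic rows and compresses them.  It uses `zlib` from 
the standard library purely as a stand-in for Blosc, the rows are made up 
(one valid pair each), and `padded_record` is a hypothetical helper, so the 
sizes it prints say nothing about real data or about PyTables itself.

```python
import struct
import zlib

MAX_PAIRS = 5
ROW_FORMAT = "<%dib" % (MAX_PAIRS * 2)  # 10 x int32 slots + 1 x int8 count


def padded_record(pairs):
    """Pack the pairs into a fixed 41-byte record, zero-filling unused slots."""
    flat = [value for pair in pairs for value in pair]
    flat += [0] * (MAX_PAIRS * 2 - len(flat))
    return struct.pack(ROW_FORMAT, *flat, len(pairs))


# 10,000 synthetic rows, each with a single valid pair and four empty slots.
raw = b"".join(padded_record([(i, i + 1)]) for i in range(10000))
compressed = zlib.compress(raw)

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(compressed))
```

The long runs of zero bytes in the unused slots are exactly what a 
general-purpose compressor removes most cheaply.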

> > What about using a CArray for all the IntAtom(4,2) atoms, named say,
> > 'values' and an additional array (saved in a different CArray or even in
> > an attribute), say 'indices', for keeping track of the different 'row'
> > indices.
> 
> I am doing something like this. Many rows in the VLArray would have length
>  0. I am not creating those rows and instead putting the index of the
>  relevant VLArray row into each table row (or -1 if there is no VLArray
>  row). I could use an EArray and put start:stop into the table. Are appends
>  to an EArray faster than to a VLArray?

I'd say yes.  In addition, EArray supports compression of its data, while 
VLArray does not.

> But since I do not know the size in advance, I need some data structure to
> collect all the items before I can create a CArray. I can benchmark it with
> numpy.append or with simple lists.

Sorry.  Continue using an EArray if you want to go this route.  EArray has 
more or less the same performance as a CArray, and you don't need to know 
the number of entries in advance.
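
The start:stop bookkeeping can be sketched with plain Python lists standing 
in for the real containers.  Here `values` plays the role of the growing 
EArray of pairs and `rows` the role of the table; both names, and the 
helpers `append_row` and `pairs_for_row`, are hypothetical.

```python
values = []  # stand-in for the EArray: one flat, append-only sequence of pairs
rows = []    # stand-in for the table: one (start, stop) entry per row


def append_row(pairs):
    """Append a row's pairs to the flat store and record where they landed."""
    start = len(values)
    values.extend(pairs)
    rows.append((start, len(values)))


def pairs_for_row(index):
    """Fetch the pairs of one row by slicing the flat store."""
    start, stop = rows[index]
    return values[start:stop]


append_row([(1, 2), (3, 4)])
append_row([])            # a row without pairs: start == stop
append_row([(5, 6)])
```

With this layout an empty row needs no -1 sentinel, since start == stop 
already yields an empty slice, and nothing has to be collected in memory 
before writing because the flat store only ever grows at the end.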

Tell us how it goes,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users