As requested, I am replying with some feedback from timing tests of the
different solutions to the problem mentioned earlier. Each experiment was
run three times and the averages taken. The times also include the
overhead of reading the data, which is constant across all experiments
and therefore controlled for. I have listed my results below.

There may be issues related to the number of partitions in my kind of
data for certain structures, but after these experiments such factors
only reinforce my choice of design. Again, this is specific to my kind
of data (as you mentioned).

On Wed, Feb 06, 2008 at 05:33:38PM +0100, Ivan Vilata i Balaguer wrote:
> However, you may create a ``VLArray`` of ``ObjectAtom``, which will save
> every row as a pickled Python object.  

Experiment 1:  VLArray of Object Atoms

547 Records

Writing:    28.397 s  
Reading:     2.744 s
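
For reference, the per-row behaviour of the ObjectAtom route can be
sketched with just the stdlib: each appended object is pickled to bytes
on write and unpickled on read (this illustrates the storage model, not
the PyTables API itself).

```python
import pickle

# Stdlib sketch of what a VLArray of ObjectAtom does per row: each
# appended object is serialised with pickle on write and deserialised
# on read, so arbitrary variable-length structures round-trip intact.
rows = [[1, 2, 3], {"a": 1}, "text"]
stored = [pickle.dumps(r) for r in rows]        # write path
recovered = [pickle.loads(b) for b in stored]   # read path
```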

> Pickling into a fixed width field

This is not really possible given the nature of the data, since the
structure has no maximum length. A maximum can be calculated from the
current data, but it may need to change in the future, which I have read
is possible with PyTables tables. I am avoiding this approach for now.

> in a table (as you mention) or into a row in an enlargeable array are
> also possible solutions, but involve manual (un)pickling.

Experiment 2: Two EArrays

One for single chars (the pickled data)
One for the data offsets

I am not sure whether there is a better way of doing this; maybe someone
can inspire me.

Writing:    32.343s
Reading:     3.441s
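
The two-EArray scheme can be sketched with plain lists standing in for
the EArrays: a flat byte stream of pickled rows plus an offset array
marking where each pickled object begins (a stdlib sketch of the layout,
not the actual PyTables calls).

```python
import pickle

# Sketch of the two-EArray scheme: `data` stands in for the EArray of
# single chars holding concatenated pickled rows, and `offsets` for the
# EArray recording where each pickled object starts.
data = bytearray()
offsets = [0]

def write_row(obj):
    """Append one pickled object and record the new end offset."""
    data.extend(pickle.dumps(obj))
    offsets.append(len(data))

def read_row(i):
    """Slice out the i-th pickled object and unpickle it."""
    return pickle.loads(bytes(data[offsets[i]:offsets[i + 1]]))

write_row([1, 2, 3])
write_row({"x": 4})
```

Reading requires two lookups (offsets, then data), which matches the
extra bookkeeping this method needs compared with a single VLArray.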

> You may also use two ``VLArray`` nodes, one for the flat list of
> numbers
> and another one for the indexes where the list is splitted::
> 
>     vlarray1 = [                        vlarray2 = [
>       [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],    [0, 3, 5, 6],
>       [1, 2, 3, 4, 5, 7, 8, 9, 6],        [0, 2, 5],
>       [4, 5, 6, 7, 3, 2, 1],              [0, 2, 3],
>       [1, 2, 3, 4, 5, 6],                 [0, 3, 5],
>       [1, 4, 5, 7, 8, 2, 3],              [0, 1, 3],
>       ...                                 ...
>     ]                                   ]

Experiment 3:

This is essentially the same as Experiment 2 but more intuitive.

Writing:    27.394s
Reading:    20.755s
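
Under one reading of the quoted layout, each row of ``vlarray2`` holds
the start positions at which the matching ``vlarray1`` row is split,
with the final segment running to the end of the row. Reconstruction
would then look like this (my interpretation, sketched with plain
lists):

```python
# Rebuild the nested structure from one flat row and its split indexes,
# assuming idx lists segment start positions and the last segment runs
# to the end of the flat row.
def split_row(flat, idx):
    bounds = list(idx) + [len(flat)]
    return [flat[a:b] for a, b in zip(bounds, bounds[1:])]

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
idx = [0, 3, 5, 6]
segments = split_row(flat, idx)
```

This per-row reassembly across two arrays is the work that shows up in
the much slower read time above.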

Type 3 was consistently the fastest at writing, which is intuitive as it
does not involve pickling, whereas recreating the structure by accessing
two arrays slows reading considerably.

Type 2 is about as consistent as Type 1, with similar variation between
reading and writing, but it is a very convoluted way of doing things.

Type 1 is the most intuitive to program and use, and it happens to be
the fastest to read, which will be done more often than writing. It is
only slightly slower to write than Type 3.

I look forward to the inclusion of variable-length arrays,
variable-length strings, or pickled objects within the standard table,
as that would be the most suitable for my application. Until then, I
believe I will go with pickling into a variable-length array, with a
column in the table holding the index of each object within the VLArray.
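
The chosen design can be sketched as follows, with plain lists standing
in for the PyTables Table and VLArray (the names ``vl_index``,
``add_record``, and ``get_structure`` are illustrative, not from any
API):

```python
import pickle

# Sketch of the chosen design: the variable-length structure is pickled
# into a VLArray-like store, and the main table row keeps only the
# index of that entry.
vlarray = []   # stands in for a VLArray of ObjectAtom
table = []     # stands in for the main Table rows

def add_record(name, structure):
    """Pickle the structure and link the table row to it by index."""
    vlarray.append(pickle.dumps(structure))
    table.append({"name": name, "vl_index": len(vlarray) - 1})

def get_structure(row):
    """Follow the row's index into the VLArray and unpickle."""
    return pickle.loads(vlarray[row["vl_index"]])

add_record("a", [1, 2, 3])
add_record("b", [[4, 5], [6]])
```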

Thanks,

-- 
Hatem Nassrat BCSc.
FCS - Dalhousie University

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
