Hi Pierre,

On Sunday 25 May 2008, Pierre GM wrote:
> Folks,
>
> I need to store MaskedArrays in a HDF5 file, and retrieve them as
> such. I wrote a small subclass of Table (MaskedTable, cf a simplified
> version below) that overrides the __init__ and read methods, so that
> I can just pass a masked array, store it as a recarray and read it
> back to a MaskedArray, seamlessly.
>
> Well, it doesn't really work as expected: when I write a MaskedTable
> to a file, it is recognized as that subclass. When I close a file,
> reopen it and access the table, it reverts to a standard Table, and
> of course my tailored read method isn't accessed. The behavior is
> illustrated below. Obviously, I'm missing something: is there an
> attribute that I'm not setting that would let the file recognize that
> its tables are in fact MaskedTables?

Well, basically you missed a couple of things:

- You need to declare the `_c_classId` class variable in order to 
correctly register your new class.
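To illustrate why the class id matters, here is a toy model in plain Python. It is not PyTables' actual code: `CLASS_REGISTRY`, `register`, `write_class_attr` and `reopen_class` are hypothetical names, and it assumes (as the advice above implies) that the id saved with a node is what selects its class when the file is reopened. PyTables registers classes on its own; the sketch uses an explicit decorator instead. A subclass that does not declare its own id simply inherits the one from Table, so it comes back as a plain Table, which matches the behaviour you saw.

```python
# Toy model of a class-id registry (hypothetical names, not PyTables code).
# Assumption: the id saved with a node is what selects its class on reopen.
CLASS_REGISTRY = {}

def register(cls):
    """Remember cls under its own _c_classId."""
    CLASS_REGISTRY[cls._c_classId] = cls
    return cls

@register
class Table(object):
    _c_classId = 'TABLE'

@register
class MaskedTable(Table):
    _c_classId = 'MaskedTable'   # own id: comes back as MaskedTable

class PlainSubclass(Table):
    pass                         # inherits 'TABLE': comes back as Table

def write_class_attr(node):
    """Id stored alongside the node when it is created."""
    return type(node)._c_classId

def reopen_class(class_attr):
    """Class chosen when the file is opened again (Table if unknown, in this toy)."""
    return CLASS_REGISTRY.get(class_attr, Table)
```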

- You need to be able to reconstruct the `description` parameter in the 
constructor for the case where the table is being reopened (the constructor 
is then called with `description=None`).  Fortunately this is very easy to 
do, because the underlying Table does the dirty job.
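A minimal sketch of that create/reopen split, with a dict standing in for the HDF5 file (`FILE`, `BaseTable` and `ToyMaskedTable` are hypothetical names, not PyTables classes): the subclass only builds a description when it is given data, and otherwise passes `None` on so that the base class recovers what was stored.

```python
# Toy model only: a dict plays the role of the HDF5 file.
FILE = {}

class BaseTable(object):
    def __init__(self, name, description=None):
        if description is None:
            # Reopening: recover the stored description from the "file".
            self.description = FILE[name]
        else:
            # Creating: save the description under the node name.
            FILE[name] = self.description = description

class ToyMaskedTable(BaseTable):
    def __init__(self, name, maskedarray=None):
        description = None
        if maskedarray is not None:
            # Only convert when creating; input is (value, is_masked) pairs.
            description = [(value, bool(masked)) for value, masked in maskedarray]
        BaseTable.__init__(self, name, description)

created = ToyMaskedTable('random', [(0.5, 0), (0.9, 1)])
reopened = ToyMaskedTable('random')   # no data: description comes from FILE
```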

A final piece of advice: please try not to overwrite the system HDF5 attributes 
(normally set in UPPER case) unless you have a good reason to do so.  
In your case, I think it would be clearer to set a `shape` attribute 
rather than a `SHAPE` attribute that overwrites the shape of the underlying table 
(doing that can have bad side effects, for example if you try to 
append more data to the table).
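To see why a separate lowercase `shape` attribute is enough, here is a NumPy-only sketch (no PyTables involved; `to_records` and `from_records` are hypothetical helpers) of the same encoding used in the code of this message: the records stay one-dimensional, like the rows of the table, and only the saved shape is needed to rebuild the original array.

```python
import numpy as np
import numpy.ma as ma

def to_records(marr):
    """Flatten a masked array into 1-D (_data, _mask) records plus its shape."""
    records = np.empty(marr.size,
                       dtype=[('_data', marr.dtype), ('_mask', bool)])
    records['_data'] = marr.filled().ravel()
    records['_mask'] = ma.getmaskarray(marr).ravel()
    return records, marr.shape

def from_records(records, shape):
    """Rebuild the masked array from the records and the saved shape."""
    return ma.array(records['_data'], mask=records['_mask']).reshape(shape)

x = ma.array(np.arange(6.0).reshape(2, 3), mask=[[0, 1, 0], [0, 0, 1]])
records, shape = to_records(x)   # records has 6 rows; shape is (2, 3)
y = from_records(records, shape)
```

Because the table itself keeps its natural 1-D layout, appending rows still makes sense; only the user attribute describes how to view them.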

Here is a version of your code that works correctly:

############################
import numpy as np
import numpy.ma as ma

import tables
from tables import File, Table
from tables.file import _checkfilters
from tables.parameters import EXPECTED_ROWS_TABLE

class MaskedTable(Table):
    _c_classId = 'MaskedTable'
    def __init__(self, parentNode, name, description=None,
                 title="", filters=None,
                 expectedrows=EXPECTED_ROWS_TABLE,
                 chunkshape=None, byteorder=None, _log=True):
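        # `description` is None when an existing table is reopened from
        # the file; the base Table then rebuilds the description itself.
        reopening = description is None
        if not reopening:
            # Creating: turn the masked array into a flat record array
            # with one field for the data and one for the mask.
            maskedarray = description
            description = np.array(zip(maskedarray.filled().flat,
                                   ma.getmaskarray(maskedarray).flat),
                                   dtype=[('_data',maskedarray.dtype),
                                          ('_mask',bool)])
        Table.__init__(self, parentNode, name, 
                       description=description, title=title,
                       filters=filters,
                       expectedrows=expectedrows,
                       chunkshape=chunkshape, byteorder=byteorder,
                       _log=_log)
        if not reopening:
            # Keep the original shape in a lowercase user attribute,
            # leaving the system SHAPE attribute untouched.
            self.attrs.shape = maskedarray.shape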

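    def read(self, start=None, stop=None, step=None, field=None):
        data = Table.read(self, start=start, stop=stop, step=step,
                          field=field)
        # Rebuild the masked array and restore its original shape.
        # This assumes the whole table is read and `field` is None.
        newshape = self.attrs.shape
        return ma.array(data['_data'],
                        mask=data['_mask']).reshape(newshape)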


def createMaskedTable(self, where, name, maskedarray, title="",
                      filters=None, expectedrows=10000,
                      chunkshape=None, byteorder=None,
                      createparents=False):
    parentNode = self._getOrCreatePath(where, createparents)
    _checkfilters(filters)
    return MaskedTable(parentNode, name, maskedarray,
                       title=title, filters=filters, 
                       expectedrows=expectedrows,
                       chunkshape=chunkshape, byteorder=byteorder)
# Attach the helper to File so it can be called as h5file.createMaskedTable()
File.createMaskedTable = createMaskedTable


if __name__ == '__main__':
    x = ma.array(np.random.rand(100),mask=(np.random.rand(100) > 0.7))
    h5file = tables.openFile('tester.hdf5','w')
    mtab = h5file.createMaskedTable('/','random',x)
    h5file.flush()
    print type(mtab)
    print mtab.read()
    h5file.close()
    h5file = tables.openFile('tester.hdf5','r')
    mtab = h5file.root.random
    print type(mtab)
    print mtab.read()
###########################
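One limitation of the `read` override above: it always reshapes to the stored shape, so a partial read (using `start`, `stop` or `step`) will fail in `reshape`. A possible refinement, sketched here with plain NumPy and a hypothetical helper `records_to_masked`, is to reshape only when every row has been read and otherwise return a 1-D masked array:

```python
import numpy as np
import numpy.ma as ma

def records_to_masked(data, full_shape):
    """Build a masked array from '_data'/'_mask' records.

    Reshape to the stored shape only when all rows are present; a partial
    read (start/stop/step) cannot take the full shape, so it stays 1-D.
    """
    result = ma.array(data['_data'], mask=data['_mask'])
    if result.size == int(np.prod(full_shape)):
        return result.reshape(full_shape)
    return result
```

Inside `MaskedTable.read` this would replace the final `reshape` call, with `self.attrs.shape` passed as `full_shape`.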


Hope it helps,

PS: Your code is a very nice start towards supporting masked arrays in 
PyTables.  If you eventually end up with something more polished and 
tested, I'd be glad to add it to PyTables itself.

-- 
Francesc Altet
Freelance developer
Tel +34-964-282-249

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
