Hello,

A while ago I proposed an extension to
PyTables to support files that use Dimension Scales.
In the meantime I had to work on other projects,
so I didn't have time to respond. Now I have found
time to work on it again, so here is my response to
the last mail that we exchanged.

> Now, apart of some small technical details, the thing that I find the most
> 'arguable' is the fact that you have introduced a new dtype (typecode 'r')
> for NumPy (good trick, BTW).  Is that strictly necessary?  My worries is that
> NumPy would decide in the future to make use of the 'r' typecode, in which
> case, we would have a problem.  Would not it be possible to use plain python
> nested lists for this?  I find the latter preferable, but perhaps you have
> some use case for wanting a native NumPy typecode.

I thought about that (actually, I once even wrote a version like that, but you
mentioned that PyTables now supports record arrays as attributes).
The problem is that dimension scales use extremely complicated data
structures (a table in an attribute containing references). Doing all
that with standard Python data types is certainly possible, but the drawback
is that it is really hard to write a generic solution that works for a wide
range of use cases. I ended up with very complicated and very specific
code that could cope with not much more than Dimension Scales.
So, once we want to add support for other new data structures, we would
have to add new special-purpose code.

The NumPy reference solution, on the other hand, is very generic: it will be
easy to also introduce tables in datasets containing references or other stuff.

I don't see NumPy using the typecode 'r' for something else as a big
problem: there is nearly no one who introduces new NumPy datatypes,
so we can just ask the NumPy guys to reserve the 'r' for us. Since they don't get
many such requests (actually, I haven't found any), I guess they
won't object.

> Also, I don't see that you have made a proper implementation of 'Dimension
> Scales' as understood in:
>
> http://ftp.hdfgroup.org/HDF5/Tutor/h5dimscale.html
>
> but only support for HDF5 references (but I suppose that, with a little more
> of work we can be there...)

Well, I wrote that once in an email. On purpose I didn't support
dimension scales directly, but added support for the necessary data
structures in PyTables. This has several advantages: firstly, my code
runs without problems with HDF5 1.6, while dimension scales exist only
starting with 1.8. Supporting dimension scales directly would also have
meant that we create attributes that we cannot interpret, and that
show up as an unknown type only. Actually, PyTables at some point wasn't
able to open datasets containing dimension scales at all.

Once one has support for references and variable-length lists as
attributes, it is trivial to do that in Python.
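
A minimal sketch of the two structures involved, not the code from the
patch. It assumes the attribute layout described in the HDF5 dimension
scales specification: a DIMENSION_LIST attribute on the dataset (one
variable-length list of references per dimension) and a REFERENCE_LIST
attribute on each scale (a table of dataset/dimension pairs). Plain path
strings stand in for HDF5 object references, and the helper names are
hypothetical.

```python
import numpy as np

# Stand-in for an HDF5 object reference: the node path as a string.
# (Assumption for illustration only; a real implementation would store
# actual HDF5 references here.)
ref_list_dtype = np.dtype([("dataset", "U64"), ("dimension", "i4")])


def build_dimension_list(ndim, attachments):
    """Return one list of scale paths per dimension (DIMENSION_LIST-like)."""
    dims = [[] for _ in range(ndim)]
    for dim, scale_path in attachments:
        dims[dim].append(scale_path)
    return dims


def build_reference_list(scale_path, dataset_path, attachments):
    """Return a record array of back-references (REFERENCE_LIST-like)."""
    rows = [(dataset_path, dim)
            for dim, path in attachments if path == scale_path]
    return np.array(rows, dtype=ref_list_dtype)


# Hypothetical dataset "/temperature" with two dimensions and three scales.
attachments = [(0, "/time"), (1, "/lat"), (1, "/lat_bounds")]
dimension_list = build_dimension_list(2, attachments)
reference_list = build_reference_list("/lat", "/temperature", attachments)
```

The point of the sketch is that the variable-length list and the record
array are ordinary generic containers; only the reference type itself
needs dedicated support.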

> Finally, I miss some test units, but that should easy to solve.

Yep, I'm working on that, especially on some that create dimension scales...
(I have written some already, they're just unreadable to anyone but me...)

> If you agree to work with that, I'd like to open a new public branch in the
> PyTables repository and give you commit permissions there.  We can continue
> discussing the issues here so that other people can contribute with
> opinions, test units, docstrings or whatever.

That would be really cool.

Greetings,

Martin

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
