Hi V S P,

On Sunday 26 October 2008, V S P wrote:
> Hi,
>
> NetCDF4 introduces a new atomic type called string
> (and its underlying format is HDF5).
>
> However, I am having difficulty verifying that this functionality
> is available in all the pieces of software on the postprocessing
> side of things.
>
> I contacted the developer of the Python-netcdf4 package about
> string support, and here is what he said:
> "
> ... No, I don't expect to be adding that anytime soon.  The main
> problem is that there is no numpy array type corresponding to
> variable length strings.  You might check out pytables, it may be
> able to handle it.
> "
>
> So now I want to find out whether PyTables can deal with strings
> in netCDF4 files (I will then be using NumPy to postprocess the data
> as part of SageMath).

Strings are supported in PyTables as long as they are fixed-length.  If 
you need variable-length strings, you can fake them by using the 
provisions PyTables/NumPy have for returning fixed-length strings as 
variable-length ones.  For example:

In [1]: import tables

In [2]: f = tables.openFile("/tmp/file.h5", "w")

In [3]: a = f.createArray("/", "dstring", ["123", "123456789"])

In [4]: a
Out[4]:
/dstring (Array(2,)) ''
  atom := StringAtom(itemsize=9, shape=(), dflt='')
  maindim := 0
  flavor := 'python'
  byteorder := 'irrelevant'
  chunkshape := None

In [5]: a[0]
Out[5]: '123'

In [6]: a[1]
Out[6]: '123456789'

As you can see, you retrieve "variable"-length strings from the 
"dstring" dataset, even though they are stored as regular fixed-length 
strings in HDF5.
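To make those "provisions" a bit more concrete, here is a minimal pure-Python sketch (not PyTables or NumPy code) of how a fixed-length layout can hand back variable-length strings: each item is padded with NUL bytes up to a fixed item size, and the padding is stripped on access, which is what NumPy's 'S' dtype does when you index an element.  The helpers `to_fixed` and `item` are hypothetical names used only for illustration.  The item size defaults to the longest string, matching the `itemsize=9` PyTables picked above, and longer strings are truncated, as NumPy does when assigning into a fixed-width string array.

```python
# Hypothetical helpers illustrating fixed-length string storage.
# They are NOT part of PyTables or NumPy.

def to_fixed(strings, itemsize=None):
    """Pack strings into one NUL-padded buffer with a fixed item size.

    If itemsize is None, use the longest encoded string (1 for an empty
    list).  Longer strings are truncated to itemsize bytes.
    """
    encoded = [s.encode("ascii") for s in strings]
    if itemsize is None:
        itemsize = max((len(b) for b in encoded), default=1)
    # Truncate, then right-pad each item with NUL bytes to itemsize.
    buf = b"".join(b[:itemsize].ljust(itemsize, b"\0") for b in encoded)
    return buf, itemsize


def item(buf, itemsize, i):
    """Return element i with its trailing NUL padding stripped."""
    raw = buf[i * itemsize:(i + 1) * itemsize]
    return raw.rstrip(b"\0").decode("ascii")


buf, itemsize = to_fixed(["123", "123456789"])
print(itemsize)                # 9
print(len(buf))                # 18: both items occupy 9 bytes
print(item(buf, itemsize, 0))  # 123
print(item(buf, itemsize, 1))  # 123456789
```

One consequence of this design is that a string which genuinely ends in NUL bytes loses them on the way back, so the trick is best suited to ordinary text.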

The fixed-length string implementation in PyTables is similar to the 
VARCHAR type in relational databases in that you choose a maximum 
length (MAXLEN) for your strings.  This means that each string takes 
MAXLEN bytes, regardless of its actual length.  However, that extra 
space consumption can be minimized by using on-disk compression.
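A rough way to see why compression helps is to compress a NUL-padded buffer directly.  The sketch below assumes hypothetical data (10,000 short names stored with MAXLEN = 64) and runs Python's standard `zlib` module over the whole buffer at once.  HDF5 instead compresses chunk by chunk (in PyTables, through `tables.Filters` on chunked datasets), so real ratios will differ and depend on your data.

```python
import zlib

# Hypothetical data: many short strings stored with a generous MAXLEN.
MAXLEN = 64
names = [f"station-{i}" for i in range(10_000)]

# Fixed-length layout: every item takes MAXLEN bytes, padded with NULs.
padded = b"".join(n.encode("ascii").ljust(MAXLEN, b"\0") for n in names)

# Bytes actually used by the string contents (no padding).
payload = sum(len(n) for n in names)

# Compress the padded buffer; the long NUL runs compress very well.
compressed = zlib.compress(padded, 6)

print(len(padded))      # 640000 = 10000 * 64
print(payload)          # 118890
print(len(compressed))  # much smaller than len(padded)
```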

> Also (this is a separate question), do you plan to make Python 3.0
> support available this year?

I'd like to, but PyTables depends on NumPy, and the NumPy developers 
haven't announced plans for Python 3.0 support yet (in fact, compiling 
NumPy for Python 2.6 on Windows is not even supported yet).  As soon as 
NumPy adds support for Python 3.0, I'll start adding it to PyTables 
too.  That said, the PyTables trunk (as well as NumPy itself) already 
works flawlessly with Python 2.6 and, as you may know, with that done, 
supporting 3.0 should be rather easy.

Cheers,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
