Hi V S P,

On Sunday 26 October 2008, V S P wrote:
> Hi,
>
> NetCDF4 introduces a new atomic type called string
> (and also the underlying format is HDF5)
>
> however I am having difficulties verifying that that functionality
> is available in all the pieces of the software in postprocessing side
> of things.
>
> I contacted the developer for Python-netcdf4 package about
> string support and here is what he said:
> "
> ... No, I don't expect to be adding that anytime soon. The main
> problem is that there is no numpy array type corresponding to
> variable length strings. You might check out pytables, it may be
> able to handle it.
> "
>
> So now wanted to find out, if dealing with strings
> is something that pytables can do with netCDF4 (and then I will be
> using NumPy to postprocess the data as part of SageMath)

Strings are supported in PyTables as long as they are fixed length. If
you want to work with variable length strings, this can be emulated:
PyTables/NumPy can present fixed length strings as if they were
variable length ones. For example:

In [1]: import tables

In [2]: f = tables.openFile("/tmp/file.h5", "w")

In [3]: a = f.createArray("/", "dstring", ["123", "123456789"])

In [4]: a
Out[4]:
/dstring (Array(2,)) ''
  atom := StringAtom(itemsize=9, shape=(), dflt='')
  maindim := 0
  flavor := 'python'
  byteorder := 'irrelevant'
  chunkshape := None

In [5]: a[0]
Out[5]: '123'

In [6]: a[1]
Out[6]: '123456789'

As you can see, you retrieve "variable" length strings out of the
"dstring" dataset, even though they are saved as regular fixed length
ones in HDF5.

The fixed length string implementation in PyTables is similar to the
VARCHAR type in relational databases in that you choose a maximum
length (MAXLEN) for your type. This means that every string takes
MAXLEN bytes, whatever its actual length. However, that additional
space consumption can be minimized if you use on-disk compression.

> Also, (this is a separate question) -- is Python 3.0 support
> something that you plan to make available this year?

I'd like to, but PyTables depends on NumPy, and they haven't announced
plans for Python 3.0 support yet (in fact, even compiling NumPy for
Python 2.6 on Windows platforms is not supported yet). As soon as
NumPy adds support for Python 3.0, I'll start adding it to PyTables
too.

Having said this, PyTables trunk (as well as NumPy) already works
flawlessly against Python 2.6 and, as you may know, with that done the
support for 3.0 should be rather easy.
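
To make the MAXLEN point above a bit more concrete, here is a small sketch in plain NumPy (not PyTables itself). It assumes that zlib from the standard library is a reasonable stand-in for an on-disk compression filter, and the sample data is made up purely for illustration:

```python
import zlib

import numpy as np

# Hypothetical sample data: many short strings plus one long one.
values = [b"123", b"123456789"] + [b"ab"] * 1000

# NumPy picks a fixed width equal to the longest item (the MAXLEN analogue).
arr = np.array(values)        # dtype becomes 'S9'
maxlen = arr.dtype.itemsize   # bytes reserved for every string

# Items come back without the trailing padding, so they look variable length.
first, second = arr[0], arr[1]

# The raw buffer holds MAXLEN bytes per item, padding included.
raw = arr.tobytes()
packed = zlib.compress(raw)   # stand-in for an on-disk compression filter

print(arr.dtype, maxlen, first, second)
print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

The padding consists of null bytes, which is why it compresses so well; the actual savings will depend on your data and on the compressor you choose.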

Cheers,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users