Hi Scott,

I am also getting ready to store strings (only in netCDF4 format, which is built on HDF5, which PyTables supports; incidentally, netCDF4 supports compression for strings, as does HDF5).
I inquired about the same thing in a separate email and will post the reply below. Overall, what I am planning to do is figure out how to determine the maximum length of my strings per file at runtime (probably when creating the file), and then set the PyTables string size to that maximum (I have not implemented that yet, but that's what I am thinking).

Here is the reply I got to the same question (so that I will save Francesc some typing :-)

---------------------------------------------------------
Strings are supported in PyTables as long as they are fixed length. If you want to work with strings with variable length, this can be faked by using the provisions that PyTables/NumPy has to represent variable length strings coming from fixed length ones. For example:

In [1]: import tables

In [2]: f = tables.openFile("/tmp/file.h5", "w")

In [3]: a = f.createArray("/", "dstring", ["123", "123456789"])

In [4]: a
Out[4]:
/dstring (Array(2,)) ''
  atom := StringAtom(itemsize=9, shape=(), dflt='')
  maindim := 0
  flavor := 'python'
  byteorder := 'irrelevant'
  chunkshape := None

In [5]: a[0]
Out[5]: '123'

In [6]: a[1]
Out[6]: '123456789'

As you see, you are retrieving "variable" length strings out of the "dstring" dataset, even though they are saved as regular fixed length ones in HDF5.

Fixed length string implementation in PyTables is similar to VARCHAR type in relational databases in that you choose a maximum length (MAXLEN) for your types. This means that they take MAXLEN bytes for each string type. However, that additional space consumption can be minimized if you use on-disk compression.
----------------------------------------------------------------

Regards,
Vlad

On Sat, 8 Nov 2008 17:35:25 -0700, "Scott MacDonald" <[EMAIL PROTECTED]> said:
> I am trying to populate an HDF5 file using PyTables with data from SQL
> database. One of the columns of this table is defined as 'VARCHAR(MAX)'.
> I would like to be able to do something like:
>
> class NewsItems(tables.IsDescription):
>     item_id = tables.Int32Col()
>     isodatetime = tables.StringCol(26)
>     newstext = tables.StringCol(MAX)  ** obviously won't work
>
> In my database, the length of the strings in the varchar(max) column varies
> from ~2000 to ~60000. If I simply set the 'newstext' variable in the above
> class definition to be slightly larger than the maximum length then the file
> that I create is unacceptably large (over 6GB).
>
> I am a new user to PyTables, but I have read the documentation and have not
> been able to answer my own question yet. Any thoughts?
>
> Thanks in advance.
>
> Scott
--
  V S P
  [EMAIL PROTECTED]

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
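
A rough sketch of the plan described in the reply above: find the longest string at runtime, use that as the fixed width, and check how much of the padding compression wins back. This is only an illustration under assumptions. It uses plain Python and zlib as a stand-in for the HDF5 on-disk layout and compression filter, not PyTables itself; the helper names and the sample data are made up.

```python
import zlib


def max_encoded_length(strings, encoding="utf-8"):
    """Return the byte length of the longest string once encoded."""
    return max((len(s.encode(encoding)) for s in strings), default=0)


def pad_fixed_width(strings, width, encoding="utf-8"):
    """Encode each string and right-pad it with NUL bytes to `width` bytes."""
    padded = []
    for s in strings:
        raw = s.encode(encoding)
        if len(raw) > width:
            raise ValueError("string is longer than the chosen width")
        padded.append(raw.ljust(width, b"\x00"))
    return padded


# Hypothetical sample data: two short items and one long one.
news = ["short item", "a somewhat longer news item", "x" * 2000]

# Step 1: determine the maximum length at runtime. In PyTables this is the
# value that would be passed as the size of the string column.
width = max_encoded_length(news)

# Step 2: store every string at that fixed width (what a fixed-length
# string column does conceptually).
records = pad_fixed_width(news, width)
fixed_bytes = b"".join(records)

# Step 3: compress the padded block; runs of padding bytes compress well.
compressed = zlib.compress(fixed_bytes, 5)

print("fixed width per string:", width)
print("uncompressed fixed-width size:", len(fixed_bytes))
print("compressed size:", len(compressed))

# Reading back: stripping the trailing padding recovers the original string.
print(records[0].rstrip(b"\x00").decode("utf-8"))
```

How much compression helps on real news text will depend on the data; the long run of identical characters in the sample compresses far better than real text would.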