Re: [Pytables-users] Variable length string column?

V S P Sat, 08 Nov 2008 17:49:14 -0800

Hi Scott,
I am also getting ready to store strings
(only in netCDF4 format, which is built on HDF5, which PyTables supports;
incidentally, netCDF4 supports compression for strings, as does
HDF5).


I asked about the same thing in a separate email and
will post the reply below.

But overall, what I am planning to do is figure out
how to determine the maximum length of my strings per file
(probably when creating the file) at runtime, and then set
the PyTables string column to that maximum length
(I have not implemented that yet -- but that's what I am
thinking)
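
A minimal sketch of that plan, under some assumptions: the helper name
max_itemsize and the sample list news_texts are made up for
illustration, and the strings are assumed to be encoded as UTF-8 before
storage, so the length is measured in bytes rather than characters.

```python
def max_itemsize(strings, encoding="utf-8"):
    """Return the longest encoded length, in bytes, among the given strings.

    Returns 0 for an empty input.
    """
    return max((len(s.encode(encoding)) for s in strings), default=0)


# Hypothetical sample data standing in for the rows destined for one file.
news_texts = ["short item", "a somewhat longer news item", "naïve"]

# Scan the data once, before the table description is built.
itemsize = max_itemsize(news_texts)
print(itemsize)

# The result would then be passed as the column width when the table
# is described, in the same way the question uses tables.StringCol(26).
```

This needs one pass over the data before the file is created, which is
the price of picking the width at runtime instead of hard-coding it.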

Here is the reply I got to the same question
(so that I can save Francesc some typing :-)

---------------------------------------------------------

Strings are supported in PyTables as long as they are fixed length.  If 
you want to work with strings with variable length, this can be faked 
by using the provisions that PyTables/NumPy has to represent variable 
length strings coming from fixed length ones.  For example:

In [1]: import tables

In [2]: f = tables.openFile("/tmp/file.h5", "w")

In [3]: a = f.createArray("/", "dstring", ["123", "123456789"])

In [4]: a
Out[4]:
/dstring (Array(2,)) ''
  atom := StringAtom(itemsize=9, shape=(), dflt='')
  maindim := 0
  flavor := 'python'
  byteorder := 'irrelevant'
  chunkshape := None

In [5]: a[0]
Out[5]: '123'

In [6]: a[1]
Out[6]: '123456789'

As you see, you are retrieving "variable" length strings out of 
the "dstring" dataset, even though they are saved as regular fixed 
length ones in HDF5.

The fixed length string implementation in PyTables is similar to the
VARCHAR type in relational databases in that you choose a maximum
length (MAXLEN) for your strings.  This means that every string takes
MAXLEN bytes, however short it is.  However, that additional space
consumption can be minimized if you use on-disk compression.


----------------------------------------------------------------
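
To illustrate the last point of the reply, here is a rough sketch of why
compression recovers most of the padding.  It is not PyTables code: it
uses Python's standard zlib module directly, it assumes short strings
are padded with NUL bytes (as NumPy's fixed-width byte strings are), and
the names MAXLEN, pad and records are made up for the example.

```python
import zlib

# Hypothetical fixed width, chosen to match the upper bound in the question.
MAXLEN = 60000


def pad(text, width):
    """Encode text as UTF-8 and right-pad it with NUL bytes to a fixed width."""
    raw = text.encode("utf-8")
    if len(raw) > width:
        raise ValueError("string is longer than the fixed width")
    return raw.ljust(width, b"\x00")


# Hypothetical records, all much shorter than MAXLEN.
records = ["item %d: some news text" % i for i in range(100)]

# What a fixed-width column would hold: every record padded to MAXLEN bytes.
fixed = b"".join(pad(r, MAXLEN) for r in records)

# The bytes that actually carry information.
payload = sum(len(r.encode("utf-8")) for r in records)

# The padded block after compression.
compressed = zlib.compress(fixed)

print(len(fixed), payload, len(compressed))
```

The compressed size should come out far below the padded size, since
long runs of identical bytes compress extremely well.  Within PyTables
the same effect would come from enabling compression on the dataset
rather than calling zlib by hand.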

Regards,
Vlad




On Sat, 8 Nov 2008 17:35:25 -0700, "Scott MacDonald"
<[EMAIL PROTECTED]> said:
> I am trying to populate an HDF5 file using PyTables with data from SQL
> database.  One of the columns of this table is defined as 'VARCHAR(MAX)'.
> I would like to be able to do something like:
> 
> class NewsItems(tables.IsDescription):
>     item_id = tables.Int32Col()
>     isodatetime = tables.StringCol(26)
>     newstext = tables.StringCol(MAX)  ** obviously won't work
> 
> In my database, the length of the strings in the varchar(max) column
> varies from ~2000 to ~60000.  If I simply set the 'newstext' variable
> in the above class definition to be slightly larger than the maximum
> length then the file that I create is unacceptably large (over 6GB).
> 
> I am a new user to PyTables, but I have read the documentation and have
> not been able to answer my own question yet.  Any thoughts?
> 
> Thanks in advance.
> 
> Scott
-- 
  V S P
  [EMAIL PROTECTED]
