Re: [Pytables-users] issues modifing table

Francesc Alted Fri, 29 Jan 2010 03:52:38 -0800

Hello Patrick,

On Friday 29 January 2010 11:03:33 Patrick Kilian wrote:
> Hi all,
> 
> I like pytables a lot, but this morning it is driving me up the wall. I
> have a file with particles and I want to modify it. I basically want to
> split _some_ of the particles into smaller subparticles. The HDF5 file is
> generated by some other code (written in C++), but I don't want to use
> the terrible C++ interface for this simple task.
> 
> The definition of the table containing the particles looks like this:
> /table1 (Table(1049256L,), zlib(6)) 'table1'
>    description := {
>    "particle_id": Float64Col(shape=(), dflt=0.0, pos=0),
>    "x_ort": Float64Col(shape=(), dflt=0.0, pos=1),
>    "y_ort": Float64Col(shape=(), dflt=0.0, pos=2),
>    "z_ort": Float64Col(shape=(), dflt=0.0, pos=3),
>    "x_geschwindigkeit": Float64Col(shape=(), dflt=0.0, pos=4),
>    "y_geschwindigkeit": Float64Col(shape=(), dflt=0.0, pos=5),
>    "z_geschwindigkeit": Float64Col(shape=(), dflt=0.0, pos=6),
>    "Mue": Float64Col(shape=(), dflt=0.0, pos=7),
>    "Masse": Float64Col(shape=(), dflt=0.0, pos=8),
>    "Ladung": Float64Col(shape=(), dflt=0.0, pos=9)}
>    byteorder := 'little'
>    chunkshape := (10L,)
> 
> So my first idea was to use "for p in f.root.table:" to iterate over the
> table, check if the condition is true (basically |v| > 0.1 c) and if
> that is the case delete the particle using removeRows and appending new
> particles.


Caveat emptor: Table.removeRows() is a slow operation due to how HDF5 is 
designed.  In general, it is better to copy rows into separate tables and 
delete the old tables.  See below.

> This turned out to be a bad idea for two (or three) reasons. I would
> have to avoid splitting the particles further and further. A bit ugly
> but manageable.
> 
> Second problem: p contains particles, but removeRows wants row number.
> How do I find that out? Well. The documentation doesn't say.

It does: `Row.nrow`, as can be seen in:
http://www.pytables.org/docs/manual/ch04.html#RowClassDescr
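
As a minimal sketch of how `Row.nrow` can be used: collect the row numbers
while iterating, and delete only after the loop has finished.  This assumes
PyTables 3.x, where the calls are spelled `open_file`, `create_table` and
`remove_row` instead of the camelCase names used in this thread.  The
two-column `Particle` description, the file name and the threshold are
made up for the illustration.

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x (snake_case API)


class Particle(tables.IsDescription):
    # Hypothetical two-column stand-in for the real particle table
    particle_id = tables.Float64Col(pos=0)
    x_g = tables.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "nrow_demo.h5")
with tables.open_file(path, "w") as f:
    table = f.create_table(f.root, "table1", Particle)
    row = table.row
    for i in range(10):
        row["particle_id"] = i
        row["x_g"] = i * 10.0
        row.append()
    table.flush()

    # Row.nrow is the number of the row the iterator currently points at
    fast_rows = [r.nrow for r in table.where("x_g > 65")]

    # Delete only after the iteration is over, highest row number first,
    # so that the remaining row numbers stay valid
    for n in reversed(fast_rows):
        table.remove_row(n)
    remaining = table.nrows
```

Row removal stays slow for the reasons given above, so this is only
reasonable when a handful of rows has to go.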

> Turns out
> that this doesn't matter because:
> 
> Third problem: removing and appending in a table you are iterating over
> doesn't work. Ok, fair enough.

Yes.  You cannot change the length of a container while you are iterating 
over it, or you will get into trouble (this is a general programming 
principle, at least in Python).
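
The same effect can be seen with a plain Python list, without PyTables
at all.  The speeds below are invented values (in units of c) used only
for illustration:

```python
particles = [0.05, 0.20, 0.30, 0.08]  # hypothetical speeds in units of c

# Unsafe: removing items from the list that is being iterated over.
# After 0.20 is removed, 0.30 slides into its slot and is never visited.
unsafe = list(particles)
for v in unsafe:
    if v > 0.1:
        unsafe.remove(v)

# Safe: leave the original alone and build a new container instead.
slow = [v for v in particles if v <= 0.1]
```

Here `unsafe` still contains 0.30 after the loop, while `slow` holds only
the two slow particles.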

> So. Second idea. Create a second file, copy the particles which are slow
> enough, and add subparticles instead of the fast particles. Sounds good
> right? Well. Problem. I can't find any way to say "create a table just
> like this one over there, but without the couple of millions of rows".

There are several ways to accomplish this.  One is using `table1.description` 
as the description for your new table.
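
A small sketch of that approach, assuming PyTables 3.x naming
(`open_file`, `create_table`); the two-column `Particle` class and the
file names are placeholders, not the real layout:

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x (snake_case API)


class Particle(tables.IsDescription):
    # Hypothetical stand-in for the description written by the C++ code
    particle_id = tables.Float64Col(pos=0)
    x_g = tables.Float64Col(pos=1)


tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "input.h5")
dst_path = os.path.join(tmpdir, "output.h5")

with tables.open_file(src_path, "w") as src:
    table1 = src.create_table(src.root, "table1", Particle)
    table1.append([(1.0, 2.0), (3.0, 4.0)])
    table1.flush()

    with tables.open_file(dst_path, "w") as dst:
        # Reuse the existing description: same columns, but no rows
        empty = dst.create_table(dst.root, "table1", table1.description)
        same_columns = empty.colnames == table1.colnames
        empty_rows = empty.nrows
```

Because the description is read from the existing table, a column added
on the C++ side is picked up without touching the Python code.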

> And no, I don't want to write boilerplate code, defining the table
> format. And rewriting it every time a new column is added to the output
> of the C++ program.
> 
> Ok. So I thought "If I can't create an empty table, I can copy the file to
> a new name, and drop all the rows in the table. That gets me a nice and
> empty table I can fill." Turns out I can't:
> 
> "NotImplementedError: You are trying to delete all the rows in table
> "/table1". This is not supported right now due to limitations on the
> underlying HDF5 library. Sorry!"

As the error says, this is a limitation of the HDF5 1.6.x series.  Link 
PyTables against HDF5 1.8.x and you will get rid of this error.

> Great. Of course I could delete all but the first particle, append what
> I want to the mostly empty table and after I'm done remove that first
> particle. Messy, but at this point I don't care.
> 
> Except ... I can't. If I have the input file i and the output file o.
> Both with identical tables defined. And I do:
> 
> for p in i.root.table1:
>       o.root.table1.append(p)
> 
> it breaks with:
> 
>    File "/usr/lib/python2.5/site-packages/tables/table.py", line 1758,
> in append
>      "rows parameter cannot be converted into a recarray object
> compliant with table '%s'. The error was: <%s>" % (str(self), exc)
> ValueError: rows parameter cannot be converted into a recarray object
> compliant with table '/table1 (Table(1L,), zlib(6)) 'table1''. The error
> was: <objects of type ``Row`` are not supported in this context, sorry;
> supported objects are: NumPy array, record or scalar; homogeneous list
> or tuple, integer, float, complex or string>

Yeah, two problems here.  First, a `Row` is a data *accessor*, not a container.  
If you want to get the *contents* of the row, you should use `p[:]`.  Second, 
`p[:]` is a single row, so it is a scalar, but `Table.append()` wants an 
*array* (or list) of rows.  With this in mind, the following idiom:

for p in i.root.table1:
    o.root.table1.append([p[:]])

will do the trick.
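
For completeness, a self-contained sketch of that idiom, assuming
PyTables 3.x naming (`open_file`, `create_table`).  The reduced
`Particle` description, the file names and the values are invented for
the example:

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x (snake_case API)


class Particle(tables.IsDescription):
    # Hypothetical two-column stand-in for the real particle table
    particle_id = tables.Float64Col(pos=0)
    x_g = tables.Float64Col(pos=1)


tmpdir = tempfile.mkdtemp()
with tables.open_file(os.path.join(tmpdir, "i.h5"), "w") as i, \
     tables.open_file(os.path.join(tmpdir, "o.h5"), "w") as o:
    src = i.create_table(i.root, "table1", Particle)
    src.append([(0.0, 1.5), (1.0, 2.5), (2.0, 3.5)])
    src.flush()

    dst = o.create_table(o.root, "table1", src.description)
    for p in src:
        # p[:] is the content of one row; wrap it in a list for append()
        dst.append([p[:]])
    dst.flush()

    copied = dst.nrows
    x_values = dst.col("x_g").tolist()
```

When every row is wanted, passing the result of `src.read()` to a single
`append()` call should avoid the per-row overhead.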

> At that point I decided that pytables is fine for reading but just
> doesn't cut it for modifying tables. Which is a shame given its name.

I'm attaching a small script showing you how to do what you are trying to do 
in a 'pytablish' manner.  As you will see, it reads the original table from 
one file and splits it into two tables ('slow' and 'fast') in another file.

> Patrick "Using pytables ro for now" Kilian

Hope you can proceed with 'rw' mode soon ;-)

-- 
Francesc Alted

from tables import openFile, Float64Col

N = 50*1000  # the number of particles
cut = 300000*.1

# The definition of your particle
particle = {
    "particle_id": Float64Col(shape=(), dflt=0.0, pos=0),
    "x_ort": Float64Col(shape=(), dflt=0.0, pos=1),
    "y_ort": Float64Col(shape=(), dflt=0.0, pos=2),
    "z_ort": Float64Col(shape=(), dflt=0.0, pos=3),
    "x_g": Float64Col(shape=(), dflt=0.0, pos=4),
    "y_g": Float64Col(shape=(), dflt=0.0, pos=5),
    "z_g": Float64Col(shape=(), dflt=0.0, pos=6),
    "Mue": Float64Col(shape=(), dflt=0.0, pos=7),
    "Masse": Float64Col(shape=(), dflt=0.0, pos=8),
    "Ladung": Float64Col(shape=(), dflt=0.0, pos=9)}

# Create initial file and table
f = openFile("/tmp/initial.h5", "w")
table1 = f.createTable(f.root, 'table1', particle)

# Populate the table with some arbitrary values for speed
row = table1.row
for x in xrange(N):
    row["x_g"] = x
    row["y_g"] = x*2
    row["z_g"] = x/2
    row.append()
table1.flush()

# Create a new file for the split
f2 = openFile("/tmp/final.h5", "w")

# Now, split the table in two with 'slow' and 'fast' particles each
# Fast particles...
fast = f2.createTable(f2.root, 'fast', table1.description)
row_fast = fast.row
for row in table1.where("sqrt(x_g**2+y_g**2+z_g**2) > cut"):
    # Alternative that copies all columns at once: fast.append([row[:]])
    row_fast["x_g"] = row["x_g"]
    row_fast["y_g"] = row["y_g"]
    row_fast["z_g"] = row["z_g"]
    row_fast.append()
fast.flush()

# Slow particles...
slow = f2.createTable(f2.root, 'slow', table1.description)
row_slow = slow.row
for row in table1.where("sqrt(x_g**2+y_g**2+z_g**2) <= cut"):
    row_slow["x_g"] = row["x_g"]
    row_slow["y_g"] = row["y_g"]
    row_slow["z_g"] = row["z_g"]
    row_slow.append()
slow.flush()

f.close()
f2.close()
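
The script above copies only the three velocity columns.  A possible
variant that carries every column across and moves the matching rows in
bulk is sketched below.  It assumes PyTables 3.x naming (`open_file`,
`create_table`, `read_where`); the reduced `Particle` description, the
file names and the value of `cut` are chosen only for the example.

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x (snake_case API)


class Particle(tables.IsDescription):
    # Hypothetical reduced description: an id plus the velocity columns
    particle_id = tables.Float64Col(pos=0)
    x_g = tables.Float64Col(pos=1)
    y_g = tables.Float64Col(pos=2)
    z_g = tables.Float64Col(pos=3)


cut = 30.0  # illustrative threshold
tmpdir = tempfile.mkdtemp()
with tables.open_file(os.path.join(tmpdir, "initial.h5"), "w") as f, \
     tables.open_file(os.path.join(tmpdir, "final.h5"), "w") as f2:
    table1 = f.create_table(f.root, "table1", Particle)
    table1.append([(float(i), float(i), 0.0, 0.0) for i in range(100)])
    table1.flush()

    speed = "sqrt(x_g**2 + y_g**2 + z_g**2)"

    # read_where() returns the matching rows as one structured array,
    # with all columns, which append() accepts directly
    fast = f2.create_table(f2.root, "fast", table1.description)
    fast.append(table1.read_where(speed + " > cut", condvars={"cut": cut}))
    fast.flush()

    slow = f2.create_table(f2.root, "slow", table1.description)
    slow.append(table1.read_where(speed + " <= cut", condvars={"cut": cut}))
    slow.flush()

    n_fast, n_slow = fast.nrows, slow.nrows
```

`read_where()` loads all matching rows into memory at once, so for very
large selections the row-by-row loop may still be the better fit.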
