Hi, Francesc Alted

>> I like pytables a lot, but this morning it is driving me up the walls. I
>> have a file with particles and I want to modify it. I basically want to
>> split _some_ of the particles into smaller subparticles. The HDF5 file is
>> generated by some other code (written in C++), but I don't want to use
>> the terrible C++ interface for this simple task.
>>
>> So my first idea was to use "for p in f.root.table:" to iterate over the
>> table, check if the condition is true (basically |v| > 0.1 c) and if
>> that is the case delete the particle using removeRows and append new
>> particles.
> 
> Caveat emptor: Table.removeRows() is a slow operation due to how HDF5 is 
> designed.  In general, it is better to copy rows into separate tables and 
> delete the old tables.  See below.
Well, yes. But then again, deleting the one million rows takes about 10 
seconds on my system, which is fast enough.


>> This turned out to be a bad idea for two (or three) reasons. I would
>> have to avoid splitting the particles further and further. A bit ugly
>> but manageable.
>>
>> Second problem: p contains particles, but removeRows wants row number.
>> How do I find that out? Well. The documentation doesn't say.
> 
> It does: `Row.nrow`, as can be seen in:
> http://www.pytables.org/docs/manual/ch04.html#RowClassDescr
Ah. Thanks.
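
A minimal sketch of the in-place variant using `Row.nrow`, for the archive. It assumes the snake_case names of current PyTables releases (`open_file`, `remove_rows`) instead of the camelCase ones quoted in this thread, a table called `table1`, and hypothetical velocity columns `vx`, `vy`, `vz`; the real column names are not shown here. Row numbers are collected first and deleted afterwards from the end, so nothing is modified during iteration and earlier row numbers stay valid.

```python
import tables


def remove_fast_rows(path, cut, table_name="table1"):
    """Delete every row whose speed exceeds `cut`; return the deleted records.

    The column names vx, vy, vz are assumptions made for this sketch.
    """
    with tables.open_file(path, "a") as f:
        table = f.get_node(f.root, table_name)

        # First pass: only collect row numbers and contents, modify nothing.
        doomed = []
        for row in table:
            speed = (row["vx"] ** 2 + row["vy"] ** 2 + row["vz"] ** 2) ** 0.5
            if speed > cut:
                doomed.append((int(row.nrow), row[:]))

        # Second pass: delete from the end so earlier row numbers stay valid.
        for nrow, _ in reversed(doomed):
            table.remove_rows(nrow, nrow + 1)

        return [rec for _, rec in doomed]
```

The returned records could then be turned into subparticles and appended in a separate step, which also avoids splitting freshly added particles again.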


>> So. second idea. Create a second file, copy the particles which are slow
>> enough, and add subparticles instead of the fast particles. Sounds good
>> right? Well. Problem. I can't find any way to say "create a table just
>> like this one over there, but without the couple of millions of rows".
> 
> There are several ways to accomplish this.  One is using `table1.description` 
> as the description for your new table.
Now that I knew what to look for I found it in the documentation. It's 
quite well hidden though.


>> Ok. So I thought "If I can't create an empty table, I can copy the file to
>> a new name, and drop all the rows in the table. That gets me a nice and
>> empty table I can fill." Turns out I can't:
>>
>> "NotImplementedError: You are trying to delete all the rows in table
>> "/table1". This is not supported right now due to limitations on the
>> underlying HDF5 library. Sorry!"
> 
> As the error says, this is a limitation of the HDF5 1.6.x series.  Link 
> PyTables against HDF5 1.8.x and you will get rid of this error.
Problem is that the current C++ code relies on HDF5 1.6. But ok. Good to 
know that it will get better once we switch to HDF5 1.8.




>> Except ... I can't. If I have the input file i and the output file o.
>> Both with identical tables defined. And I do:
>>
>> for p in i.root.table1:
>>      o.root.table1.append(p)
>>
>> it breaks with:
>>
>>    File "/usr/lib/python2.5/site-packages/tables/table.py", line 1758,
>> in append
>>      "rows parameter cannot be converted into a recarray object
>> compliant with table '%s'. The error was: <%s>" % (str(self), exc)
>> ValueError: rows parameter cannot be converted into a recarray object
>> compliant with table '/table1 (Table(1L,), zlib(6)) 'table1''. The error
>> was: <objects of type ``Row`` are not supported in this context, sorry;
>> supported objects are: NumPy array, record or scalar; homogeneous list
>> or tuple, integer, float, complex or string>
> 
> Yeah, two problems here.  First, rows is a data *accessor*, not a container.  
> If you want to get the *contents* of the row, you should use `p[:]`.  Also, 
> `p[:]` is a single row, so it is a scalar, but `Table.append()` wants an 
> *array* (or list) of rows.  With this, the next idiom:
> 
> for p in i.root.table1:
>     o.root.table1.append([p[:]])
> 
> will do the trick.
These two paragraphs were immensely helpful. I'm still not sure where I 
could have found this in the documentation. But those two lines of code, 
along with your explanation, saved my day.
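
For the archive, a rough sketch of how the pieces above fit together: a new file, an empty table built from the old table's `description`, and row contents appended as `[p[:]]`. It assumes the snake_case API of current PyTables releases (`open_file`, `create_table`), hypothetical columns `vx`, `vy`, `vz` and `weight`, and a purely illustrative splitting rule (each fast particle becomes `n_sub` copies that share its weight). None of those names or the rule come from the real data file.

```python
import tables


def copy_and_split(src_path, dst_path, cut, n_sub=2, table_name="table1"):
    """Copy slow particles unchanged; replace each fast one by n_sub lighter copies.

    Column names and the equal-weight split are assumptions for this sketch.
    """
    with tables.open_file(src_path, "r") as src, \
            tables.open_file(dst_path, "w") as dst:
        table_in = src.get_node(src.root, table_name)
        # Reuse the source description to get an empty table with the same columns.
        table_out = dst.create_table(dst.root, table_name, table_in.description)
        w = table_in.colnames.index("weight")  # position of the assumed column

        for p in table_in:
            rec = p[:]  # contents of the row, not the Row accessor itself
            speed = (p["vx"] ** 2 + p["vy"] ** 2 + p["vz"] ** 2) ** 0.5
            if speed > cut:
                sub = list(rec)
                sub[w] = rec[w] / n_sub
                # append() wants a sequence of rows, hence the list.
                table_out.append([tuple(sub)] * n_sub)
            else:
                table_out.append([rec])

        return table_out.nrows
```

Appending one row at a time is simple but not fast; for large tables, collecting records in a list and appending them in batches would likely be quicker.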


>> At that point I decided that pytables is fine for reading but just
>> doesn't cut it for modifying tables. Which is a shame given its name.
>>
>> Patrick "Using pytables ro for now" Kilian
> 
> Hope you can proceed with 'rw' mode soon ;-)
Doing that now. Thanks again.


> Ooops!  I forgot the handy and efficient `Table.whereAppend()` method.  With 
> it, your problem is reduced to something like:
> 
> # Fast particles...
> fast = f2.createTable(f2.root, 'fast', table1.description)
> table1.whereAppend(fast, "sqrt(x_g**2+y_g**2+z_g**2) > cut")
> 
> # Slow particles...
> slow = f2.createTable(f2.root, 'slow', table1.description)
> table1.whereAppend(slow, "sqrt(x_g**2+y_g**2+z_g**2) <= cut")
This is not exactly what I need but quite elegant.



Patrick "a happy pytables user" Kilian

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
