Forwarding to list. Your email address doesn't appear to be registered. See http://sourceforge.net/mail/?group_id=63486 for more information.
Begin forwarded message:

> From: pytables-users-boun...@lists.sourceforge.net
> Date: September 13, 2011 7:51:22 AM GMT+02:00
> To: pytables-users-ow...@lists.sourceforge.net
> Subject: Auto-discard notification
>
> The attached message has been automatically discarded.
>
> From: "Christian Werner" <werner_christ...@gmx.net>
> Date: September 13, 2011 7:51:13 AM GMT+02:00
> To: pytables-users@lists.sourceforge.net
> Subject: Efficient way to do nested grouping of rows ?!?
>
> Hi list.
>
> I have been pulling my hair out all day trying to implement the following,
> but I cannot seem to see the light. I am not sure the subject line is all
> that descriptive, but here goes:
>
> Say I have a massive table (>10 million rows) stored in an HDF file.
> The simplified table structure looks something like this:
>
> id  year  area  repl  var1  var2  var3
>  1  1990  12.3     1   1.4  1.34  1.23
>  1  1991  12.3     1   0.9  0.3   1.3
>  1  1992  12.3     1   1.3  1.1   1.7
>  1  1990  12.3     2   1.5  1.4   1.3
>  1  1991  12.3     2   1.1  0.5   1.53
>  1  1992  12.3     2   1.6  1.3   1.8
>  1  1990  12.3     3   1.8  1.74  1.3
>  1  1991  12.3     3   1.4  0.5   1.43
>  1  1992  12.3     3   1.7  1.3   1.8
>  2  1990   4.5     1   3.3  3.6   2.3
>  2  1991   4.5     1   5.6  6.7   1.2
>  2  1992   4.5     1   6.5  4.5   3.3
>  2  1990   4.5     2   3.3  3.6   2.3
>  2  1991   4.5     2   5.6  6.7   1.2
>  2  1992   4.5     2   6.5  4.5   3.3
>  2  1990   4.5     3   3.3  3.6   2.3
>  2  1991   4.5     3   5.6  6.7   1.2
>  2  1992   4.5     3   6.5  4.5   3.3
>  3  1990  12.9     1   1.3  1.1   0.4
>  3  1991  12.9     1   3.2  3.4   5.6
>  3  1992  12.9     1   3.5  3.4   3.5
> ...
>
> The pytables table is sorted by field "id". Currently I use the following
> code:
>
> import itertools
> import numpy as np
>
> var = 'var1'
>
> def id_selector(row):
>     # Grouping key: consecutive rows with the same id form one group.
>     return row['id']
>
> # `table` is the open pytables table, already sorted by "id".
> for i, rows_grouped_by_id in itertools.groupby(table, id_selector):
>     # Average of var * area over all rows of this id.
>     x = np.average([r[var] * r['area'] for r in rows_grouped_by_id])
>     print(i, x)
>
> This works fine and dandy (and fast). However, how do I efficiently do
> another "nested" grouping? Say I want to average all replicates (col repl)
> of a given year (col year) within each and every id. I'm stumped. How
> would you do that? I feel I should use row indices, no? I cannot use the
> table's .where() construct to restrict the selection to a bunch of rows,
> can I?
>
> Thanks for any pointers.
>
> C
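
One possible way to do the nested grouping asked about above, offered as a sketch rather than a definitive answer. It keeps the outer itertools.groupby on "id" and, inside each id, accumulates per-year sums and counts in dictionaries. A second groupby on "year" would not work directly, because in the sample the years are interleaved across replicates within an id. The function name mean_by_id_and_year and the list of dicts standing in for the table are assumptions made for illustration; they are not part of the original post or of PyTables. With a real table, the idea would be to pass the table itself as `rows`, since each value is read out of the row as soon as it is visited.

```python
import itertools
from collections import defaultdict
from operator import itemgetter


def mean_by_id_and_year(rows, var):
    """Yield (id, year, mean of var * area over all replicates).

    Assumes `rows` is sorted by 'id' only; inside one id the years may
    appear in any order, so they are accumulated in dictionaries.
    """
    for id_, group in itertools.groupby(rows, key=itemgetter('id')):
        sums = defaultdict(float)   # year -> running sum of var * area
        counts = defaultdict(int)   # year -> number of replicates seen
        for r in group:
            year = int(r['year'])
            sums[year] += r[var] * r['area']
            counts[year] += 1
        for year in sorted(sums):
            yield id_, year, sums[year] / counts[year]


# Hypothetical stand-in for the table: a few rows taken from the sample.
COLUMNS = ('id', 'year', 'area', 'repl', 'var1')
SAMPLE = [
    (1, 1990, 12.3, 1, 1.4),
    (1, 1991, 12.3, 1, 0.9),
    (1, 1990, 12.3, 2, 1.5),
    (1, 1991, 12.3, 2, 1.1),
    (2, 1990, 4.5, 1, 3.3),
    (2, 1990, 4.5, 2, 3.3),
]
rows = [dict(zip(COLUMNS, rec)) for rec in SAMPLE]

results = list(mean_by_id_and_year(rows, 'var1'))
for id_, year, mean in results:
    print(id_, year, round(mean, 3))
```

Memory use stays bounded by the number of distinct years per id, not by the number of rows, so this should remain a single sequential pass over the table.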