On Tue, Sep 13, 2011 at 12:53 AM, PyTables Org <pytab...@googlemail.com> wrote:
> Forwarding to list. Your email address doesn't appear to be registered.
> See http://sourceforge.net/mail/?group_id=63486 for more information.
>
> Begin forwarded message:
>
> From: pytables-users-boun...@lists.sourceforge.net
> Date: September 13, 2011 7:51:22 AM GMT+02:00
> To: pytables-users-ow...@lists.sourceforge.net
> Subject: Auto-discard notification
>
> The attached message has been automatically discarded.
> From: "Christian Werner" <werner_christ...@gmx.net>
> Date: September 13, 2011 7:51:13 AM GMT+02:00
> To: pytables-users@lists.sourceforge.net
> Subject: Efficient way to do nested grouping of rows ?!?
>
> Hi list.
>
> I was pulling my hair out all day trying to implement the following but
> cannot seem to see the light. Not sure if the subject line was all that
> descriptive, but here goes:
>
> Say I have a massive table (more than 10 million rows) stored in an HDF5
> file. The simplified table structure looks something like this:
>
>   id  year  area  repl  var1  var2  var3
>    1  1990  12.3     1   1.4  1.34  1.23
>    1  1991  12.3     1   0.9   0.3   1.3
>    1  1992  12.3     1   1.3   1.1   1.7
>    1  1990  12.3     2   1.5   1.4   1.3
>    1  1991  12.3     2   1.1   0.5  1.53
>    1  1992  12.3     2   1.6   1.3   1.8
>    1  1990  12.3     3   1.8  1.74   1.3
>    1  1991  12.3     3   1.4   0.5  1.43
>    1  1992  12.3     3   1.7   1.3   1.8
>    2  1990   4.5     1   3.3   3.6   2.3
>    2  1991   4.5     1   5.6   6.7   1.2
>    2  1992   4.5     1   6.5   4.5   3.3
>    2  1990   4.5     2   3.3   3.6   2.3
>    2  1991   4.5     2   5.6   6.7   1.2
>    2  1992   4.5     2   6.5   4.5   3.3
>    2  1990   4.5     3   3.3   3.6   2.3
>    2  1991   4.5     3   5.6   6.7   1.2
>    2  1992   4.5     3   6.5   4.5   3.3
>    3  1990  12.9     1   1.3   1.1   0.4
>    3  1991  12.9     1   3.2   3.4   5.6
>    3  1992  12.9     1   3.5   3.4   3.5
>   ...
>
> The PyTables table is sorted by field "id".
> Currently I use the following code:
>
> import itertools
> import numpy as np
>
> # table is an open tables.Table, sorted by 'id'
> var = 'var1'
>
> def id_selector(row):
>     return row['id']
>
> for i, rows_grouped_by_id in itertools.groupby(table, id_selector):
>     x = np.average([r[var] * r['area'] for r in rows_grouped_by_id])
>     print(i, x)
>
> This works fine and dandy (and fast). However, how do I efficiently do
> another "nested" grouping? Say I want to average all replicates (column
> repl) of a given year (column year) within each and every id. I'm
> stumped. How would you do that? I feel I should use row indices, no? I
> can't use the table's .where() method on an already selected bunch of
> rows, can I?

I am pretty sure that .where() is exactly what you want to use. You could
construct conditions that select on both columns and then loop over the
matching rows. The following would get you the mean of var for one id and
one year (id_ and year stand for the values to select on; the parentheses
are needed because & binds more tightly than == in the condition):

    cond = "(id == {i}) & (year == {y})".format(i=id_, y=year)
    np.mean([row[var] for row in table.where(cond)])

On the other hand, you can probably keep using itertools.groupby by having
your selector function return a tuple:

    def selector(row):
        return (row['id'], row['year'])

This might also work, and would be faster if it did:

    def selector(row):
        # note the double index.
        return row[['id', 'year']]

Be Well
Anthony

> Thanks for any pointers.
>
> C
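A minimal sketch of the nested grouping in plain Python, assuming (as in the original post) that the table is sorted by id only. One caveat about the tuple-key suggestion: itertools.groupby only merges consecutive rows with equal keys, so with the row order shown above, where the years repeat once per replicate, a key of (id, year) would produce one group per row unless the table were sorted by id and then year. The sketch therefore uses groupby only for the outer id level and accumulates per-year sums in dictionaries for the inner level, which needs a single pass over the table. By contrast, one .where() call per (id, year) pair likely scans the whole table each time unless those columns are indexed. The function name mean_per_id_year and its var parameter are hypothetical; the averaged quantity mirrors the var * area product from the original loop.

```python
from itertools import groupby
from operator import itemgetter


def mean_per_id_year(rows, var="var1"):
    """Yield (id, year, mean of var * area over all replicates).

    Hypothetical helper. Assumes `rows` is sorted by 'id' only, as in the
    table above, so the years of one id may be interleaved across replicates.
    Each row only needs to support row['column'] lookups.
    """
    # Outer level: groupby is safe here because equal ids are consecutive.
    for id_, id_rows in groupby(rows, key=itemgetter("id")):
        sums = {}    # year -> running sum of var * area
        counts = {}  # year -> number of replicates seen for that year
        # Inner level: years are NOT consecutive, so accumulate in dicts
        # instead of calling groupby a second time.
        for r in id_rows:
            # Read the fields right away rather than storing `r`, since a
            # PyTables iterator may hand back the same Row object each time.
            year = r["year"]
            sums[year] = sums.get(year, 0.0) + r[var] * r["area"]
            counts[year] = counts.get(year, 0) + 1
        for year in sorted(sums):
            yield id_, year, sums[year] / counts[year]
```

Used as for id_, year, m in mean_per_id_year(table, 'var1'): print(id_, year, m), it should accept an open PyTables table directly, since its rows support row['id'] lookups, but treat that as an untested assumption; the sketch has only been exercised on lists of dicts.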
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users