[Pytables-users] Efficient way to do nested grouping of rows ?!?

PyTables Org Mon, 12 Sep 2011 22:54:32 -0700

Forwarding to list. Your email address doesn't appear to be registered. See 
http://sourceforge.net/mail/?group_id=63486 for more information.


Begin forwarded message:

> From: pytables-users-boun...@lists.sourceforge.net
> Date: September 13, 2011 7:51:22 AM GMT+02:00
> To: pytables-users-ow...@lists.sourceforge.net
> Subject: Auto-discard notification
> 
> The attached message has been automatically discarded.
> From: "Christian Werner" <werner_christ...@gmx.net>
> Date: September 13, 2011 7:51:13 AM GMT+02:00
> To: pytables-users@lists.sourceforge.net
> Subject: Efficient way to do nested grouping of rows ?!?
> 
> 
> Hi list.
> 
> I spent all day pulling my hair out trying to implement the following, but I
> can't seem to see the light. Not sure the subject line is all that
> descriptive, but here goes:
> 
> Say I have a massive table (>10 million rows) stored in an HDF5 file.
> The simplified table structure looks something like this:
> 
> id  year    area    repl    var1    var2    var3
> 1   1990    12.3    1       1.4     1.34    1.23
> 1   1991    12.3    1       0.9     0.3     1.3
> 1   1992    12.3    1       1.3     1.1     1.7
> 1   1990    12.3    2       1.5     1.4     1.3
> 1   1991    12.3    2       1.1     0.5     1.53
> 1   1992    12.3    2       1.6     1.3     1.8
> 1   1990    12.3    3       1.8     1.74    1.3
> 1   1991    12.3    3       1.4     0.5     1.43
> 1   1992    12.3    3       1.7     1.3     1.8
> 2   1990    4.5     1       3.3     3.6     2.3
> 2   1991    4.5     1       5.6     6.7     1.2
> 2   1992    4.5     1       6.5     4.5     3.3
> 2   1990    4.5     2       3.3     3.6     2.3
> 2   1991    4.5     2       5.6     6.7     1.2
> 2   1992    4.5     2       6.5     4.5     3.3
> 2   1990    4.5     3       3.3     3.6     2.3
> 2   1991    4.5     3       5.6     6.7     1.2
> 2   1992    4.5     3       6.5     4.5     3.3
> 3   1990    12.9    1       1.3     1.1     0.4
> 3   1991    12.9    1       3.2     3.4     5.6
> 3   1992    12.9    1       3.5     3.4     3.5
> ...
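Rows read from a PyTables table with `Table.read()` come back as a NumPy structured array, so the simplified layout above can be mirrored in memory for experiments. This is a minimal sketch. The column types are assumptions (integers for id, year and repl, floats for the rest), and `ROW_DTYPE` and `sample` are hypothetical names that hold only the first three example rows:

```python
import numpy as np

# Assumed column types for the simplified layout; the real file may differ.
ROW_DTYPE = np.dtype([
    ("id", np.int32),
    ("year", np.int32),
    ("area", np.float64),
    ("repl", np.int32),
    ("var1", np.float64),
    ("var2", np.float64),
    ("var3", np.float64),
])

# The first three rows of the example (id 1, replicate 1).
sample = np.array(
    [
        (1, 1990, 12.3, 1, 1.4, 1.34, 1.23),
        (1, 1991, 12.3, 1, 0.9, 0.3, 1.3),
        (1, 1992, 12.3, 1, 1.3, 1.1, 1.7),
    ],
    dtype=ROW_DTYPE,
)

print(sample["year"])     # a whole column as an array
print(sample[1]["var1"])  # one field of one row
```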
> 
> The PyTables table is sorted by the "id" field. Currently I use the following
> code:
> 
> import itertools
> import numpy as np
> 
> # `table` is an open tables.Table whose rows are sorted by 'id'
> var = 'var1'
> 
> def id_selector(row):
>     return row['id']
> 
> for i, rows_grouped_by_id in itertools.groupby(table, id_selector):
>     # mean of var * area over all rows of this id
>     x = np.average([r[var] * r['area'] for r in rows_grouped_by_id])
>     print(i, x)
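If the needed columns fit in memory, the same per-id numbers can be computed without a Python-level loop. This is a minimal sketch, assuming the columns have already been loaded as NumPy arrays (with PyTables, for example, via `table.col('id')`, `table.col('var1')` and `table.col('area')`). `per_id_mean` is a hypothetical helper name:

```python
import numpy as np


def per_id_mean(ids, values, area):
    """Mean of values * area for each id, computed on whole columns.

    Assumes `ids` is sorted, as the table is, so every id forms one
    contiguous run and np.unique's first-occurrence indices are run starts.
    """
    ids = np.asarray(ids)
    weighted = np.asarray(values, dtype=float) * np.asarray(area, dtype=float)
    unique_ids, starts = np.unique(ids, return_index=True)
    # Sum each run in one call, then divide by the run lengths.
    sums = np.add.reduceat(weighted, starts)
    counts = np.diff(np.append(starts, ids.size))
    return unique_ids, sums / counts


# Example with var1 from the first replicate of ids 1 and 2.
ids = [1, 1, 1, 2, 2, 2]
var1 = [1.4, 0.9, 1.3, 3.3, 5.6, 6.5]
area = [12.3, 12.3, 12.3, 4.5, 4.5, 4.5]
unique_ids, means = per_id_mean(ids, var1, area)
print(unique_ids, means)
```

The trade-off against the streaming `groupby` loop is memory. At 10 million rows, each float64 column is about 80 MB once loaded.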
> 
> 
> This works fine and dandy (and fast). However, how do I efficiently do a
> second, "nested" grouping? Say I want to average all replicates (column
> repl) of each year (column year) within every id. I'm stumped. How would
> you do that? I feel I should use row indices, no? I can't use the table's
> .where() method to restrict the query to a subset of rows, can I?
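One way to nest the grouping without extra passes over the table is to keep the outer `itertools.groupby` on id. Inside each id group, running sums per year can then be accumulated in a small dict, because in the example the rows of one id are ordered by replicate, not by year. This is a minimal sketch. It assumes the quantity to average is `var * area`, as in the loop above. `mean_over_replicates` is a hypothetical name, and plain dicts stand in for table rows:

```python
import itertools
from collections import defaultdict


def mean_over_replicates(rows, var):
    """Yield (id, year, mean of row[var] * row['area'] over replicates).

    Accepts any iterable of mapping-like rows sorted by 'id'. Values are
    read from each row as soon as it is reached, which should also suit
    iterators that reuse a single row object, as PyTables iteration does.
    """
    for id_, group in itertools.groupby(rows, key=lambda r: r["id"]):
        # year -> [running sum, count]; one small dict per id keeps memory
        # bounded no matter how large the whole table is.
        acc = defaultdict(lambda: [0.0, 0])
        for r in group:
            entry = acc[r["year"]]
            entry[0] += r[var] * r["area"]
            entry[1] += 1
        for year in sorted(acc):
            total, count = acc[year]
            yield id_, year, total / count


rows = [
    {"id": 1, "year": 1990, "area": 12.3, "repl": 1, "var1": 1.4},
    {"id": 1, "year": 1991, "area": 12.3, "repl": 1, "var1": 0.9},
    {"id": 1, "year": 1990, "area": 12.3, "repl": 2, "var1": 1.5},
    {"id": 1, "year": 1991, "area": 12.3, "repl": 2, "var1": 1.1},
    {"id": 2, "year": 1990, "area": 4.5, "repl": 1, "var1": 3.3},
    {"id": 2, "year": 1990, "area": 4.5, "repl": 2, "var1": 3.3},
]
for id_, year, mean in mean_over_replicates(rows, "var1"):
    print(id_, year, mean)
```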
> 
> Thanks for any pointers.
> 
> C    
> 
>
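The row-index idea also seems workable. Because the table is sorted by id, each id occupies one contiguous run of rows, and its start/stop indices can be computed once from the id column. With PyTables, each run could then be fetched as a structured array with `table.read(start=..., stop=...)`. `Table.where()` also accepts `start` and `stop` arguments, which can restrict a query to such a range. This is a minimal sketch under those assumptions. It operates on an in-memory structured array (`data`) in place of the table, and `id_runs` and `yearly_means` are hypothetical helper names:

```python
import numpy as np


def id_runs(ids):
    """Yield (id, start, stop) for each contiguous run in a sorted id array."""
    ids = np.asarray(ids)
    if ids.size == 0:
        return
    # Positions where the id changes mark the start of a new run.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(ids)) + 1))
    stops = np.append(starts[1:], ids.size)
    for start, stop in zip(starts, stops):
        yield ids[start], int(start), int(stop)


def yearly_means(block, var):
    """Mean of block[var] * block['area'] per year, over all replicates."""
    years, inverse = np.unique(block["year"], return_inverse=True)
    weighted = block[var] * block["area"]
    sums = np.bincount(inverse, weights=weighted)
    counts = np.bincount(inverse)
    return years, sums / counts


dtype = [("id", int), ("year", int), ("area", float), ("repl", int), ("var1", float)]
data = np.array([
    (1, 1990, 12.3, 1, 1.4), (1, 1991, 12.3, 1, 0.9),
    (1, 1990, 12.3, 2, 1.5), (1, 1991, 12.3, 2, 1.1),
    (2, 1990, 4.5, 1, 3.3), (2, 1990, 4.5, 2, 3.3),
], dtype=dtype)

for id_, start, stop in id_runs(data["id"]):
    # With PyTables this slice could instead be table.read(start=start, stop=stop).
    years, means = yearly_means(data[start:stop], "var1")
    print(id_, dict(zip(years.tolist(), means.tolist())))
```

Only the id column has to be fully in memory to find the runs. Each per-id block stays small.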


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
