On Tue, Sep 13, 2011 at 12:53 AM, PyTables Org <pytab...@googlemail.com> wrote:

> Forwarding to list. Your email address doesn't appear to be registered.
> See http://sourceforge.net/mail/?group_id=63486 for more information.
>
> Begin forwarded message:
>
> *From: *pytables-users-boun...@lists.sourceforge.net
> *Date: *September 13, 2011 7:51:22 AM GMT+02:00
> *To: *pytables-users-ow...@lists.sourceforge.net
> *Subject: **Auto-discard notification*
>
> The attached message has been automatically discarded.
> *From: *"Christian Werner" <werner_christ...@gmx.net>
> *Date: *September 13, 2011 7:51:13 AM GMT+02:00
> *To: *pytables-users@lists.sourceforge.net
> *Subject: **Efficient way to do nested grouping of rows ?!?*
>
>
> Hi list.
>
> I was pulling my hair out all day trying to implement the following but
> cannot seem to see the light. Not sure if the subject line was all that
> descriptive but here it goes:
>
> Say I have a massive table (>10 million rows) stored in an HDF file.
> The simplified table structure looks something like this:
>
> id  year    area    repl    var1    var2    var3
> 1   1990    12.3    1       1.4     1.34    1.23
> 1   1991    12.3    1       0.9     0.3     1.3
> 1   1992    12.3    1       1.3     1.1     1.7
> 1   1990    12.3    2       1.5     1.4     1.3
> 1   1991    12.3    2       1.1     0.5     1.53
> 1   1992    12.3    2       1.6     1.3     1.8
> 1   1990    12.3    3       1.8     1.74    1.3
> 1   1991    12.3    3       1.4     0.5     1.43
> 1   1992    12.3    3       1.7     1.3     1.8
> 2   1990    4.5     1       3.3     3.6     2.3
> 2   1991    4.5     1       5.6     6.7     1.2
> 2   1992    4.5     1       6.5     4.5     3.3
> 2   1990    4.5     2       3.3     3.6     2.3
> 2   1991    4.5     2       5.6     6.7     1.2
> 2   1992    4.5     2       6.5     4.5     3.3
> 2   1990    4.5     3       3.3     3.6     2.3
> 2   1991    4.5     3       5.6     6.7     1.2
> 2   1992    4.5     3       6.5     4.5     3.3
> 3   1990    12.9    1       1.3     1.1     0.4
> 3   1991    12.9    1       3.2     3.4     5.6
> 3   1992    12.9    1       3.5     3.4     3.5
> ...
>
> The pytables table is sorted by field "id". Currently I use the following
> code:
>
> import itertools
> import numpy as np
>
> var = 'var1'
>
> def id_selector(row):
>    return row['id']
>
> icnt = 0
> for i, rows_grouped_by_id in itertools.groupby(table, id_selector):
>    x = np.average([r[var] * r['area'] for r in rows_grouped_by_id])
>    print i, x
>
>
> This works fine and dandy (and fast). However, how do I efficiently do
> another "nested" grouping? Say, I might want
> to average all replicates (col repl) of a given year (col year) within each
> and every id. I'm stumped. How would you do that? I feel I should use
> row-indices, no? I cannot use the .where() construct of table to limit rows
> on a bunch of rows, can I?
>
>
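I am pretty sure that .where() is exactly what you want to use.  You could
construct expressions that select on both columns, and then loop over the
(id, year) combinations.  The following would get you the mean of var over
all replicates for one id (i) and one year (year).  Note the parentheses
around each comparison: in these expressions & binds more tightly than ==.

np.mean([row[var] for row in table.where(
    "(id == {i}) & (year == {y})".format(i=i, y=year))])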


On the other hand, you can probably keep using itertools.groupby by having
your selector function return a tuple:

def selector(row):
    return (row['id'], row['year'])
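
One caveat with the tuple key: itertools.groupby only merges adjacent rows
that share a key. In the sample table the rows within an id are ordered by
repl and then by year, so rows with the same (id, year) are not next to each
other and each would end up in its own group. A minimal sketch of one way
around that, assuming the table stays sorted by id only: group by id as
before and collect the values per year in a dict inside each group. The
list of dicts below is a hypothetical stand-in for the table, and
mean_by_id_and_year is an illustrative name, not part of PyTables.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for the PyTables table: sorted by id, and within
# each id ordered by repl, then year (same layout as the sample above).
rows = [
    {"id": 1, "year": 1990, "repl": 1, "var1": 1.4},
    {"id": 1, "year": 1991, "repl": 1, "var1": 0.9},
    {"id": 1, "year": 1990, "repl": 2, "var1": 1.5},
    {"id": 1, "year": 1991, "repl": 2, "var1": 1.1},
    {"id": 2, "year": 1990, "repl": 1, "var1": 3.3},
    {"id": 2, "year": 1991, "repl": 1, "var1": 5.6},
    {"id": 2, "year": 1990, "repl": 2, "var1": 3.3},
    {"id": 2, "year": 1991, "repl": 2, "var1": 5.6},
]


def mean_by_id_and_year(rows, var):
    """Return {(id, year): mean of `var` over all replicates}."""
    result = {}
    # Outer grouping relies on the rows being sorted by id.
    for i, rows_for_id in groupby(rows, key=itemgetter("id")):
        # Inner grouping uses a dict, so the order of years and
        # replicates within an id does not matter.
        values_by_year = defaultdict(list)
        for r in rows_for_id:
            # Copy the value out right away instead of keeping the row.
            values_by_year[r["year"]].append(r[var])
        for year, values in values_by_year.items():
            result[(i, year)] = sum(values) / len(values)
    return result


means = mean_by_id_and_year(rows, "var1")
for key in sorted(means):
    print(key, means[key])
```

This still makes a single pass over the table and only holds one id's worth
of values in memory at a time. If the table were instead sorted by id and
then year, the plain tuple selector above should be enough.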


This also might work, and would be faster if it did:

def selector(row):
    # note the double index.
    return row[['id', 'year']]


Be Well
Anthony


> Thanks for any pointers.
>
> C
>
>
>
>
>
> ------------------------------------------------------------------------------
> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
> Learn about the latest advances in developing for the
> BlackBerry® mobile platform with sessions, labs & more.
> See new tools and technologies. Register for BlackBerry® DevCon today!
> http://p.sf.net/sfu/rim-devcon-copy1
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>