Here's an alternative method that uses PyTables' built-in search
capabilities in place of the itertools library.

Using readWhere as shown below returns a NumPy ndarray of the rows that
match the query, which I think also answers your question #4. There are
similar methods, where and getWhereList, that return an iterator over the
matching rows and a list of the matching row indices, respectively. They
may be more appropriate depending on your use case.
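
    def iter5(tbl):
        # Unique key values; this reads the whole 'key' column into memory.
        keys = set(tbl.col('key'))
        for _key in keys:
            # The query picks up _key from the local namespace.
            rows = tbl.readWhere('key == _key')
            # In-place ascending sort; iterate rows[::-1] for descending.
            rows.sort(order=['value'])
            for row in rows:
                print(row['key'], row['value'])
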
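
On question #2, my best guess at why attempts 1 and 2 fail: iterating over a
table hands back the same Row object on every step and just repoints it at
the next record, so list(m) or sorted(m) ends up holding several references
to one object. The sketch below imitates that with a made-up FakeRow class
and iter_table generator (both hypothetical stand-ins, not PyTables API) so
it runs without an HDF5 file. It shows that copying the field values out of
the row before sorting avoids the problem.

```python
import itertools


class FakeRow(object):
    """Hypothetical stand-in for a table row: one object, refilled each step."""

    def __init__(self):
        self.data = {}

    def __getitem__(self, name):
        return self.data[name]


def iter_table(records):
    """Yield the *same* FakeRow for every record (assumed buffer-reuse behaviour)."""
    row = FakeRow()
    for key, value in records:
        row.data = {'key': key, 'value': value}
        yield row


# Same fake data as in the original post: keys 1-2, values 10-12.
records = [(j, i) for j in range(1, 3) for i in range(10, 13)]

# Broken: the list holds several references to one row object, so every
# entry shows whatever record the iterator was positioned on last.
broken = []
for _key, group in itertools.groupby(iter_table(records), lambda r: r['key']):
    for v in sorted(list(group), key=lambda r: -r['value']):
        broken.append((v['key'], v['value']))

# Fixed: copy the field values out of the row while it is still current,
# then sort the copies.
fixed = []
for _key, group in itertools.groupby(iter_table(records), lambda r: r['key']):
    copies = [(r['key'], r['value']) for r in group]
    fixed.extend(sorted(copies, key=lambda kv: -kv[1]))

print(broken)
print(fixed)
```

With a real table, I believe the equivalent copy is row[:] or
row.fetch_all_fields() inside the inner loop, but treat that as an
assumption to check against your PyTables version.
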
Hope this helps,
Josh
On Tue, Jun 28, 2011 at 4:51 PM, Geoffrey Zhu <zyzhu2...@gmail.com> wrote:
> Hi All,
>
> I am trying to iterate through records in a pytable. The records in
> the table are ordered by a key. I need to first divide the records
> into groups as defined by the key, then iterate through each group,
> and finally iterate through records in each group. The below code does
> exactly this:
>
> def iter0(tbl):
>     print "***Iter0 - Iterate records by subgroup****"
>     for k1, m in itertools.groupby(tbl,lambda x: x['key']):
>         for v in m:
>             print v['key'], v['value'], type(v)
>
> The complexity comes in when I try to iterate through records in each
> subgroup in a particular order, i.e., if I want to sort the records in
> each group and then iterate through them. Let me generate some fake
> data and then go through the four ways I tried. None of them are
> ideal.
>
>
> This code generates some fake data for our tests.
>
> hf = tables.openFile('sample.h5','w')
> # Generate some data
> class SampleRecord(tables.IsDescription):
>     key = tables.Int32Col()
>     value = tables.Int32Col()
>
>
> hf.createTable("/", "samples", SampleRecord, "samples")
> for j in range(1, 3):
>     for i in range(10,13):
>         row = hf.root.samples.row
>         row['key'] = j
>         row['value'] = i
>         row.append()
> hf.root.samples.flush()
> hf.flush()
>
> The first method I tried is as follows. This looks exactly like the
> previous code, but in the inner loop, I use "for v in
> sorted(m,key=lambda x: -x['value'])" instead of "for v in m."
>
> def iter1(tbl):
>     print
>     print "****Attempt 1**** - Iterate values by subgroup w/ records in subgroups sorted"
>     print "THIS DOES NOT WORK"
>     for k1, m in itertools.groupby(tbl,lambda x: x['key']):
>         for v in sorted(m,key=lambda x: -x['value']):
>             print v['key'], v['value'], type(v)
>
> However, this gives the wrong results, as follows. I don't know why
> it does not work.
>
> ****Attempt 1**** - Iterate values by subgroup w/ records in subgroups sorted
> THIS DOES NOT WORK
> 2 10 <type 'tables.tableExtension.Row'>
> 2 10 <type 'tables.tableExtension.Row'>
> 2 10 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
>
>
> The second method I tried is as follows. I try to copy what is in the
> inner iterator into a list and then sort the list.
>
> def iter2(tbl):
>     print
>     print "****Attempt 2**** - Iterate values by subgroup w/ records in subgroups sorted"
>     print "THIS DOES NOT WORK, EITHER"
>     for k1, m in itertools.groupby(tbl,lambda x: x['key']):
>         temp_list = list(m)
>         temp_list2 = sorted(temp_list, key=lambda x: -x['value'])
>         for v in temp_list2:
>             print v['key'], v['value'], type(v)
>
> This does not work either. The results are similar to the last one.
>
> ****Attempt 2**** - Iterate values by subgroup w/ records in subgroups sorted
> THIS DOES NOT WORK, EITHER
> 2 10 <type 'tables.tableExtension.Row'>
> 2 10 <type 'tables.tableExtension.Row'>
> 2 10 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
>
>
> The other two methods I tried are as follows. In these methods, I try
> to get the row index number from the inner iterator and then reference
> the records with these index numbers.
>
>
> def iter3(tbl):
>     print
>     print "****Attempt 3**** - Iterate values by subgroup w/ records in subgroups sorted"
>     print "THIS WORKs, BUT TERRIBLY SLOW!"
>
>     for k1, m in itertools.groupby(tbl,lambda x: x['key']):
>         rows = [x.nrow for x in m]
>         sorted_rows = sorted(rows, key = lambda x: -tbl[x]['value'])
>         for i in sorted_rows:
>             v = tbl[i]
>             print v['key'], v['value'], type(v)
>
> def iter4(tbl):
>     print
>     print "****Attempt 4**** - Iterate values by subgroup w/ records in subgroups sorted"
>     print "THIS WORKs, BUT TERRIBLY SLOW, TOO!"
>
>     for k1, m in itertools.groupby(tbl,lambda x: x['key']):
>         rows = [x.nrow for x in m]
>         sorted_rows = sorted(rows, key = lambda x: -tbl[x]['value'])
>         for v in tbl.itersequence(sorted_rows):
>             print v['key'], v['value'], type(v)
>
>
> These two methods seem to give the correct results, but they are
> terribly slow. They are about 10-20 times slower than the original
> iterator version.
>
>
> ****Attempt 3**** - Iterate values by subgroup w/ records in subgroups sorted
> THIS WORKs, BUT TERRIBLY SLOW!
> 1 12 <type 'numpy.void'>
> 1 11 <type 'numpy.void'>
> 1 10 <type 'numpy.void'>
> 2 12 <type 'numpy.void'>
> 2 11 <type 'numpy.void'>
> 2 10 <type 'numpy.void'>
>
> ****Attempt 4**** - Iterate values by subgroup w/ records in subgroups sorted
> THIS WORKs, BUT TERRIBLY SLOW, TOO!
> 1 12 <type 'tables.tableExtension.Row'>
> 1 11 <type 'tables.tableExtension.Row'>
> 1 10 <type 'tables.tableExtension.Row'>
> 2 12 <type 'tables.tableExtension.Row'>
> 2 11 <type 'tables.tableExtension.Row'>
> 2 10 <type 'tables.tableExtension.Row'>
>
>
>
> My questions are:
>
> 1. Is there any better way to do this?
> 2. Why do methods 1 and 2 fail?
> 3. In the last two methods, notice that the types of v are different.
> One is numpy.void and the other is 'tables.tableExtension.Row'. In
> this example, they are used the same way, but when there are nested
> structs, they are used differently -- with the former you will do
> v['foo']['bar'] and with the latter, you will do v['foo/bar']. Why is
> this the case?
> 4. If I want to copy part of the table into memory, what is the best
> way of doing this?
>
> Thanks,
> Geoffrey
>
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2d-c2
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>