I have many sorted tables of identical structure and would like to combine them 
into a single, sorted table. Python >= 2.6 offers heapq.merge(*iterables), but 
I cannot quite get it to work. Could somebody please tell me how to convert a 
list of Table instances into a list of iterators that heapq.merge() can use?

I'm particularly puzzled by the following (can be run after the code below). 
Note that t[0] is a Table object.

In [74]: for row in t[0]: print row
   ....:
(0, 100.0)
(3, 103.0)
(6, 106.0)
(9, 109.0)

In [75]: [row for row in t[0]]
Out[75]: [(9, 109.0), (9, 109.0), (9, 109.0), (9, 109.0)]

In [77]: a, b, c, d = [row for row in t[0]]

In [78]: a is b
Out[78]: True

In [79]: b is c
Out[79]: True


I thought that a list comprehension and an explicit for loop were mostly 
equivalent. However, it seems that the Row class is a slippery creature...
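
A minimal sketch of what may be going on, assuming the table iterator hands 
back the same mutable Row object on every step and only updates its contents 
(which "a is b" above suggests). The ReusedRow class and iter_rows() generator 
are hypothetical stand-ins written in plain Python, not part of PyTables.

```python
class ReusedRow(object):
    """Hypothetical stand-in for a table row: one mutable object per table."""

    def __init__(self):
        self.values = None

    def __repr__(self):
        return repr(self.values)


def iter_rows(records):
    """Yield the *same* ReusedRow object each time, updated in place."""
    row = ReusedRow()
    for rec in records:
        row.values = rec
        yield row


records = [(0, 100.0), (3, 103.0), (6, 106.0), (9, 109.0)]

# Inspecting each row inside the loop looks fine, because the row is
# examined before the iterator overwrites it with the next record.
seen_in_loop = [repr(row) for row in iter_rows(records)]

# Storing the row objects keeps four references to one object, which
# ends up holding the last record.
stored = [row for row in iter_rows(records)]

# Copying the values out at each step preserves the individual records.
copied = [tuple(row.values) for row in iter_rows(records)]

print(seen_in_loop)
print(stored)
print(copied)
```

Under that assumption the for loop and the list comprehension iterate 
identically; the difference is only whether each row is used immediately or 
kept for later.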

Any hints highly appreciated,
Jon Olav



"""Merge identically-structured HDF files using heapq.merge"""
import heapq
import tables as pt
import numpy as np

# Sample data in a Numpy record array
dtype = [("a", int), ("b", float)]
x = np.rec.fromarrays([np.arange(0, 10), np.arange(100, 110)], dtype=dtype)

f = pt.openFile("test.h5", "w")

# Create three tables
t = [f.createTable(f.root, "t%s" % i, x[:0]) for i in range(3)]

# Put records in alternate tables (% is the modulus operator)
# Not using "for row in x" because that generates tuples, not size-1 recarrays
for i in range(len(x)): # reversing the sort order won't work: [::-1]
    t[i % 3].append(x[i:i+1])

# This is what I hoped would work...it gives me the correct number of rows, 
# but just repeats the last record in each table.
[row for row in heapq.merge(*t)]
# [(9, 109.0),
#  (9, 109.0),
#  (9, 109.0),
#  (9, 109.0),
#  (7, 107.0),
#  (7, 107.0),
#  (7, 107.0),
#  (8, 108.0),
#  (8, 108.0),
#  (8, 108.0)]

# Row instances _are_ comparable, so I don't see why heapq.merge doesn't work.
for row0 in t[0]:
    break # Just sets row0 to first item in table t[0]

for row1 in t[1]:
    break # Sets row1 to first item in table t[1]

row0 < row1 # True
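
# A sketch of one possible workaround, assuming the problem is that each table
# reuses a single row object: wrap every table in a generator that copies the
# row's values into a tuple before heapq.merge stores it. The reused_rows() and
# copied_rows() helpers are hypothetical stand-ins in plain Python. With
# PyTables the copy step might be (row[:] for row in table), but that is an
# untested assumption here.

```python
import heapq


def reused_rows(records):
    """Hypothetical stand-in for iterating a table: one list, updated in place."""
    row = []
    for rec in records:
        row[:] = rec
        yield row


def copied_rows(table_iter):
    """Wrap a row iterator so that each yielded item is an independent tuple."""
    for row in table_iter:
        yield tuple(row)


# Ten records dealt round-robin into three sorted "tables", as in the script.
records = [(i, 100.0 + i) for i in range(10)]
tables = [records[k::3] for k in range(3)]

# heapq.merge holds on to one item per input while it works, so each item
# must be a snapshot rather than a shared, mutating object.
merged = list(heapq.merge(*[copied_rows(reused_rows(tbl)) for tbl in tables]))
print(merged)
```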



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
