On Thursday 02 October 2008, John Gu wrote:
Hello,
I am using numpy in conjunction with pyTables. The data that I read
in from pyTables seem to have the following dtype:
p = hdf5.root.myTable.read()
p.__class__
<type 'numpy.ndarray'>
p[0].__class__
<type 'numpy.void'>
p.dtype
dtype([('time', 'f4'), ('obs1', 'f4'), ('obs2', 'f8'), ('obs3',
'f4')])
p.shape
(61230,)
The manner in which I access a particular column is p['time'] or
p['obs1']. I have a couple of questions regarding this data
structure: 1) how do I restructure the array into a 61230 x 4 array
that can be indexed using [r,c] notation?
In your example, the table (record array in NumPy jargon) is
inhomogeneous (all fields are 'f4' except 'obs2', which is 'f8'). In
that case, you can obtain a homogeneous array by doing something like:
In [44]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f8')])
In [45]: b = numpy.array([(val['obs1'], val['obs2']) for val in a],
dtype='f4')
In [46]: b
Out[46]:
array([[ 1., 2.],
[ 3., 4.]], dtype=float32)
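For larger tables, the per-row list comprehension above can become slow. Here is a minimal sketch of a column-wise alternative, using only `numpy.column_stack` and `dtype.names`. The target type 'f4' follows the example above; keep in mind that casting the 'f8' field down to 'f4' loses precision:

```python
import numpy

# Same inhomogeneous table as above: one 'f4' field and one 'f8' field.
a = numpy.array([(1, 2), (3, 4)], dtype=[('obs1', 'f4'), ('obs2', 'f8')])

# Pull out each field as a 1-D column and stack the columns side by side.
# column_stack promotes to a common type ('f8' here), so cast explicitly.
b = numpy.column_stack([a[name] for name in a.dtype.names]).astype('f4')

print(b.shape)   # (2, 2)
print(b[1, 0])   # 3.0
```

After this, b can be indexed with [r,c] notation like any other 2-D array.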
If your table were homogeneous, there would be a simpler way:
In [41]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f4')])
In [42]: d = a.view(('f4',2))
In [43]: d
Out[43]:
array([[ 1., 2.],
[ 3., 4.]], dtype=float32)
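Note that a view does not copy the data. The small sketch below assumes a packed table whose fields are all 'f4'. It spells the view as `view('f4')` plus `reshape`, which should give the same result as the view above, and uses `numpy.shares_memory` to check that both arrays use the same buffer:

```python
import numpy

a = numpy.array([(1, 2), (3, 4)], dtype=[('obs1', 'f4'), ('obs2', 'f4')])

# Reinterpret the record bytes as plain float32 values, one row per record.
d = a.view('f4').reshape(len(a), -1)

print(numpy.shares_memory(a, d))  # True: no data was copied

# Writing through the 2-D view changes the original table as well.
d[0, 1] = 20.0
print(a['obs2'][0])               # 20.0
```

If you need an independent 2-D array, call d.copy() before modifying it.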
which is faster:
In [68]: timeit d = a.view(('f4',2))
10 loops, best of 3: 11.5 µs per loop
In [69]: timeit b=numpy.array([(val['obs1'], val['obs2']) for val in a],
dtype='f4')
1 loops, best of 3: 39.8 µs per loop
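With only two records the gap is small. To see how it behaves on a bigger table, a rough sketch with the standard `timeit` module follows. The table size `n` and the helper names `by_view` and `by_loop` are made up for illustration, the view is written with `view('f4')` and `reshape` (which should be equivalent to the view above for a packed all-'f4' table), and the timings will differ from machine to machine:

```python
import timeit
import numpy

# Hypothetical larger table: 1000 records with two 'f4' fields.
n = 1000
a = numpy.zeros(n, dtype=[('obs1', 'f4'), ('obs2', 'f4')])
a['obs1'] = numpy.arange(n)
a['obs2'] = numpy.arange(n) * 2

def by_view():
    # No copy: reinterpret the existing buffer as an (n, 2) float32 array.
    return a.view('f4').reshape(len(a), -1)

def by_loop():
    # Python-level loop over every record, building a new array.
    return numpy.array([(val['obs1'], val['obs2']) for val in a], dtype='f4')

# Both approaches should produce the same values.
assert numpy.array_equal(by_view(), by_loop())

t_view = timeit.timeit(by_view, number=10)
t_loop = timeit.timeit(by_loop, number=10)
print("view: %.6f s, loop: %.6f s" % (t_view, t_loop))
```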
2) What kind of dtype is
pyTables using? How do I create a similar array that can be indexed
by a named column? I tried various ways:
a = array([[1,2],[3,4]],
dtype=dtype([('obs1','f4'),('obs2','f4')]))
---------------------------------------------------------------------------
<type 'exceptions.TypeError'>             Traceback (most recent call last)
p:\AsiaDesk\johngu\projects\deltaForce\<ipython console> in <module>()
<type 'exceptions.TypeError'>: expected a readable buffer object
Yeah, the error message is too terse in this case. The record array
constructor needs to know where each record starts and ends, and this
is achieved by mapping tuples to records: each tuple becomes one
record. So, your example must be rewritten as:
In [70]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','f4'),
('obs2','f4')])
In [71]: a
Out[71]:
array([(1.0, 2.0), (3.0, 4.0)],
dtype=[('obs1', 'f4'), ('obs2', 'f4')])
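If the data already exist as a list of lists, the rows can be converted to tuples first. A minimal sketch, where the names `rows` and `table_dtype` are made up for illustration:

```python
import numpy

rows = [[1, 2], [3, 4]]                        # list of lists, as in the failing attempt
table_dtype = [('obs1', 'f4'), ('obs2', 'f4')]

# Each tuple marks one record, so NumPy knows where records start and end.
a = numpy.array([tuple(row) for row in rows], dtype=table_dtype)

print(a['obs1'])   # the 'obs1' column: values 1.0 and 3.0
print(a[1])        # the second record
```

The result can then be indexed by column name, just like the table read from PyTables.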
Have a look at:
http://www.scipy.org/RecordArrays
for more info on record arrays.
I did find some documentation about array type descriptors when
reading from files... it seems like these array types are specific to
arrays created when reading from some sort of file / buffer? Any
help is appreciated. Thanks!
I'm not sure what you are asking here. At any rate, it might be
useful to have a look at complex dtype examples in:
http://www.scipy.org/Numpy_Example_List#head-f9175c69cccd74b9e4ee92e2a060af27c7447b76
Hope that helps,
--
Francesc Alted