Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Christopher Barker

On 12/5/10 7:56 PM, Wai Yip Tung wrote:

I'm fairly new to numpy and I'm trying to figure out the right way to do
things. Continuing on my question about using recarray as a relation.


note that recarrays (or structured arrays, AFAIK, the difference is 
attribute access only -- I don't use recarrays) are far more static than 
a database table. So you may really want to use a database, or maybe 
pytables. Or maybe even just stick with lists.
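
For the "just stick with lists" route, a common pattern is to collect rows as plain Python tuples and convert them to a structured array once, right before the vectorized math. A minimal sketch, assuming the unit/price/amount layout from the question:

```python
import numpy as np

# Field layout assumed from the question: unit (int), price (float), amount (float).
dt = np.dtype([('unit', int), ('price', float), ('amount', float)])

rows = []                      # appending to a Python list is cheap (amortized O(1))
rows.append((1, 2.2, 0.0))
rows.append((3, 4.5, 0.0))
rows.append((1, 9.0, 9.0))

# Convert once; each tuple becomes one record of the structured array.
arr = np.array(rows, dtype=dt)
arr['amount'] = arr['unit'] * arr['price']
print(arr)
```

The rows need to be tuples: a list such as [1, 9.0, 9.0] is not read as one record, which is likely the same confusion behind the concatenate error further down.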


But if you are keeping things in memory, you should be able to do what you want.


In [339]: arr = np.array([
   ...:     (1, 2.2, 0.0),
   ...:     (3, 4.5, 0.0)
   ...:     ],
   ...:     dtype=[
   ...:         ('unit',int),
   ...:         ('price',float),
   ...:         ('amount',float),
   ...:     ]
   ...: )

In [340]: data = arr.view(np.recarray)


One of the most common things I want to do is to append rows to data.


numpy arrays do not naturally support appending, as you have discovered.


I think concatenate() might be the method.


yes.


But I get a problem:



In [342]: np.concatenate((data0,[1,9.0,9.0]))
---
TypeError Traceback (most recent call last)

c:\Python26\Lib\site-packages\numpy\<ipython console> in <module>()

TypeError: expected a readable buffer object


concatenate expects arrays to be joined. If you pass in something 
that can easily be turned into an array, it will work, but a sequence like 
[1, 9.0, 9.0] could be converted to several kinds of arrays, so numpy 
doesn't know which one you mean. So you need to construct the second 
array yourself, with the same dtype:


dt = arr.dtype  # the structured dtype of the existing array
a2 = np.array( [(3, 5.5, 3)], dtype=dt)  # a one-record array; note the tuple
arr = np.concatenate( (arr, a2) )
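
Putting the pieces together, here is a self-contained version of the call that failed in the question. It assumes the same three fields; the new record is written as a tuple and given the existing dtype before concatenating, and the recarray view is re-created afterwards because concatenate returns a new array rather than growing the old one:

```python
import numpy as np

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# The failing call passed [1, 9.0, 9.0]; instead build a one-record array
# from a tuple, using the dtype of the array we are appending to.
new_row = np.array([(1, 9.0, 9.0)], dtype=arr.dtype)
arr = np.concatenate((arr, new_row))    # returns a new, longer array

# concatenate does not grow the old array in place, so re-create the view.
data = arr.view(np.recarray)
print(data.unit, data.price)
```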


In [343]: data.amount = data.unit * data.price


yup
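
Since recarrays and structured arrays differ mainly in attribute access, here is a short sketch of both spellings for the column arithmetic above, and of what happens with the `data.discount_price = ...` line from the question. This reflects current NumPy behaviour as I understand it: assigning to a name that is not a field just sets an ordinary Python attribute on the object and does not add a column.

```python
import numpy as np

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])
data = arr.view(np.recarray)    # same memory, plus attribute access to fields

# Existing fields: attribute access and index access are two spellings of
# the same vectorized operation, and the view writes into arr's memory.
data.amount = data.unit * data.price
print(arr['amount'])
print(np.array_equal(data.amount, arr['unit'] * arr['price']))

# A name that is not a field does not create a column: it becomes an ordinary
# Python attribute on the recarray object, and the dtype is unchanged.
data.discount_price = data.price * 0.9
print(data.dtype.names)
```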


But sometimes I may need to add a new column that does not already exist,
e.g.:

In [344]: data.discount_price = data.price * 0.9


How can I add a new column?


you can't. what you need to do is create a new array with a new dtype 
that includes the new field.


The trick is that numpy only supports homogeneous arrays -- every item is 
the same data type. So when you create a struct array like above, numpy 
does not define it as a 2-d table, but rather as a 1-d array, each element 
of which is a structure.
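
A quick way to see the "1-d array of structures" point, using the array from the question:

```python
import numpy as np

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

print(arr.ndim, arr.shape)    # 1 (2,)  i.e. a 1-d array, not a 2x3 table
print(arr.dtype.names)        # ('unit', 'price', 'amount')
print(arr[0])                 # one element is one whole record
print(arr.dtype.itemsize)     # bytes per record, all fields packed together
```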


so you need to do something like:

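# create a new dtype with the extra field, and a new array of that dtype
dt = data.dtype
dt2 = np.dtype(dt.descr + [('discount_price', float)])
data2 = np.zeros(len(data), dtype=dt2)

# fill the array, one field at a time:
for field_name in dt.names:
    data2[field_name] = data[field_name]

# now some calculations:
data2['discount_price'] = data2['price'] * 0.9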

I don't know of a way to avoid that loop when filling the array.

Better yet -- anticipate your needs and create the array with all the 
fields you need in the first place.
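
If you do need to add fields after the fact, the copy-each-field loop above can be wrapped in a small helper. `add_field` is a hypothetical name, and the sketch assumes a simple, unpadded dtype so that `dtype.descr` can be extended directly:

```python
import numpy as np

def add_field(a, name, dtype):
    """Return a copy of structured array `a` with one extra, zero-filled field.

    Hypothetical helper; assumes a plain (unaligned, unpadded) structured dtype.
    """
    new_dt = np.dtype(a.dtype.descr + [(name, dtype)])
    out = np.zeros(a.shape, dtype=new_dt)
    for field_name in a.dtype.names:      # copy existing data field by field
        out[field_name] = a[field_name]
    return out

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])
arr2 = add_field(arr, 'discount_price', float)
arr2['discount_price'] = arr2['price'] * 0.9
print(arr2.dtype.names)
```

The original `arr` is left as it was; every call allocates and copies, which is fine for the occasional new column.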


You can see that ndarrays are pretty static -- struct arrays can be 
useful data storage, but are not very suitable when things are changing 
much.


You could write a class that wraps an ndarray and supports what you 
need better -- it could be a pretty useful general-purpose class, too. 
I've got one that handles the appending part, but nothing for adding new 
fields.


Here's appending with my class:

import accumulator  # accumulator.py, attached to this message

data3 = accumulator.accumulator(dtype = dt2)
data3.append((1, 2.2, 0.0, 0.0))
data3.append((3, 4.5, 0.0, 0.0))
data3.append((2, 1.2, 0.0, 0.0))
data3.append((5, 4.2, 0.0, 0.0))
print(repr(data3))

# convert to regular array for calculations:
data3 = np.array(data3)

# now some calculations:
data3['discount_price'] = data3['price'] * 0.9

You wouldn't have to convert to a regular array, except that I haven't 
written the code to support field access yet -- I don't think it would 
be too hard, though.


I've enclosed some test code, and my accumulator class, in case you find 
it useful.




-Chris








--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


struct_test.py
Description: application/python


accumulator.py
Description: application/python
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Benjamin Root
On Mon, Dec 6, 2010 at 12:26 PM, Christopher Barker
chris.bar...@noaa.gov wrote:

 How can I add a new column?

 you can't. what you need to do is create a new array with a new dtype that
 includes the new field.

numpy.lib.recfunctions has a method for easily adding new columns. Of
course, it really returns a new recarray rather than adding it to an
existing recarray. Appending records to such an array, however, is a
different story, and you have to do something like you demonstrated above.
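
Ben doesn't name the function; presumably it's `append_fields`, or its recarray-returning variant `rec_append_fields`, which Francesc uses later in this thread. A small sketch with the question's fields, showing that the result is a new array and the original is untouched:

```python
import numpy as np
import numpy.lib.recfunctions as nprf

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# rec_append_fields builds and returns a NEW recarray with the extra field;
# arr itself is left unchanged.
data2 = nprf.rec_append_fields(arr, 'discount_price', arr['price'] * 0.9)

print(type(data2).__name__)   # recarray
print(data2.discount_price)
print(arr.dtype.names)        # still ('unit', 'price', 'amount')
```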

Ben Root


Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Christopher Barker
On 12/6/10 11:00 AM, Benjamin Root wrote:

 numpy.lib.recfunctions has a method for easily adding new columns.

cool! There is a lot of other nifty-looking stuff in there too. The OP 
should really take a look.

And maybe an appending function is in order, too.
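
For what it's worth, recfunctions also has `stack_arrays`, which joins structured arrays record-wise. It still builds a new array on every call, so it doesn't fix the cost of appending one row at a time, but it saves building the dtype by hand. A sketch, assuming two arrays with identical fields:

```python
import numpy as np
import numpy.lib.recfunctions as nprf

dt = [('unit', int), ('price', float), ('amount', float)]
arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)], dtype=dt)
more = np.array([(1, 9.0, 9.0)], dtype=dt)

# stack_arrays joins the records of both arrays into a new array;
# usemask=False asks for a plain ndarray instead of a masked array.
stacked = nprf.stack_arrays((arr, more), usemask=False)
print(stacked['unit'])
```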

-Chris




Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Wai Yip Tung
Thank you for the quick response, and for Christopher's explanation of the
design background.

All my tables fit in memory. I want to explore the data interactively, and a
relational database does not provide me a lot of value.

I was rolling my own library before I came to numpy. Then I found numpy's
universal functions awesome; they really fit what I want to do. Now I just
need to find out how to add a row, which is easy in Python. It is OK if it
rebuilds the array when I add a column, which should happen infrequently.
But if adding a row builds a new array, this will lead to O(n^2) complexity.
In any case, I will explore the recfunctions.
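
A back-of-the-envelope count of the O(n^2) concern: if every append re-allocates an array one row longer, the total number of rows copied grows quadratically. The usual fix, swapped in here for comparison, is geometric over-allocation (Python's own list does a milder version of this); with doubling, the total copy work stays below 2n. The two functions are hypothetical; they only count copies and do not touch NumPy:

```python
def copies_one_at_a_time(n):
    """Rows copied if every append re-allocates an array exactly one row longer."""
    # The k-th append (k = 1..n) copies the k - 1 rows already stored:
    # 0 + 1 + ... + (n - 1) = n(n - 1)/2, i.e. O(n**2) in total.
    return n * (n - 1) // 2


def copies_with_doubling(n):
    """Rows copied if capacity doubles whenever the buffer is full."""
    capacity, size, copied = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copied += size       # move the existing rows into the bigger buffer
            capacity *= 2
        size += 1
    return copied


for n in (1000, 100000):
    print(n, copies_one_at_a_time(n), copies_with_doubling(n))
```

For n = 1000 that is 499500 copies versus 1023.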

Thank you

Wai Yip


 Check out numpy.lib.recfunctions

 I often have

 import numpy.lib.recfunctions as nprf

 Skipper



Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Christopher Barker
On 12/6/10 1:00 PM, Wai Yip Tung wrote:
 Thank you for the quick response and Christopher's explanation on the
 design background.

you're welcome.

 But if adding a row builds a new array, this will lead to O(n^2) complexity.

if you are adding a lot of rows one at a time, yes, you can have 
performance issues -- though re-allocating data is pretty fast, too -- 
maybe it won't matter.

If it does, consider the accumulator code I sent, or use it as 
inspiration to write your own.
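
For the "write your own" option, a minimal sketch of an append-friendly wrapper. `GrowableRecords` is a hypothetical class, not the accumulator attached earlier; it keeps a structured array with spare capacity and doubles it when full, so appends are amortized O(1):

```python
import numpy as np


class GrowableRecords:
    """Hypothetical append-friendly wrapper around a structured ndarray."""

    def __init__(self, dtype, capacity=16):
        self._buf = np.zeros(capacity, dtype=dtype)
        self._size = 0

    def append(self, record):
        if self._size == len(self._buf):
            # Full: allocate twice the space and copy once (amortized O(1)).
            bigger = np.zeros(max(1, 2 * len(self._buf)), dtype=self._buf.dtype)
            bigger[:self._size] = self._buf[:self._size]
            self._buf = bigger
        self._buf[self._size] = record     # a tuple fills one record
        self._size += 1

    def __len__(self):
        return self._size

    def __getitem__(self, key):
        # Field names and slices act on a view of the filled part only.
        return self._buf[:self._size][key]

    def to_array(self):
        return self._buf[:self._size].copy()


dt = [('unit', int), ('price', float), ('amount', float)]
rows = GrowableRecords(dt, capacity=2)
for rec in [(1, 2.2, 0.0), (3, 4.5, 0.0), (2, 1.2, 0.0), (5, 4.2, 0.0)]:
    rows.append(rec)
rows['amount'][:] = rows['unit'] * rows['price']   # writes through the view
print(len(rows), rows['amount'])
```

Because `rows['amount']` is a view into the buffer, field-wise math works without first converting to a regular array.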

If you do improve it, please send your improvements back to me.

-Chris




Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-06 Thread Francesc Alted
On Monday 06 December 2010 22:00:29 Wai Yip Tung wrote:
 But if adding a row builds a new array, this will lead to O(n^2) complexity.

If you want a container with a better complexity for adding columns
than O(n^2), you may want to have a look at the ctable object in the carray
package:

https://github.com/FrancescAlted/carray

carray is about providing compressed, in-memory data containers for both 
homogeneous (arrays) and heterogeneous data (structured arrays). Here 
is an example of use:

>>> import numpy as np
>>> import carray as ca
>>> NR = 1000*1000
>>> r = np.fromiter(((i,i*i) for i in xrange(NR)), dtype="i4,i8")
>>> new_field = np.arange(NR, dtype='f8')**3
>>> rc = ca.ctable(r)
>>> rc
ctable((1000000,), [('f0', 'i4'), ('f1', 'i8')])
  nbytes: 11.44 MB; cbytes: 1.71 MB; ratio: 6.70
[(0, 0), (1, 1), (2, 4), ..., (999997, 999994000009), (999998, 999996000004), (999999, 999998000001)]
>>> time rc.addcol(new_field, "f2")
CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s
Wall time: 0.03 s

that is, only 30 ms for appending a column.  This is basically the time 
to copy (and compress) the data (i.e. O(n)).  If you append an already 
compressed column, the cost of adding it is O(1):

>>> r = np.fromiter(((i,i*i) for i in xrange(NR)), dtype="i4,i8")
>>> rc = ca.ctable(r)
>>> cnew_field = ca.carray(np.arange(NR, dtype='f8')**3)
>>> time rc.addcol(cnew_field, "f2")
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
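
The reason adding a column can be O(1) is the column-oriented layout: each field lives in its own array, so a new column is just one more reference, while a structured array stores whole records and must rewrite every one of them to make room. `ColumnTable` below is a hypothetical toy illustrating that idea; it is not carray's implementation and leaves out compression entirely:

```python
import numpy as np


class ColumnTable:
    """Hypothetical column-oriented table: one 1-d array per field."""

    def __init__(self):
        self.columns = {}     # field name -> 1-d ndarray, in insertion order

    def __len__(self):
        return len(next(iter(self.columns.values()))) if self.columns else 0

    def addcol(self, name, values):
        values = np.asarray(values)        # no copy if it is already an ndarray
        if self.columns and len(values) != len(self):
            raise ValueError("new column has the wrong length")
        # Only a reference is stored: existing columns are not touched or copied.
        self.columns[name] = values


t = ColumnTable()
t.addcol('f0', np.arange(5))
t.addcol('f1', np.arange(5) ** 2)
f2 = np.arange(5, dtype='f8') ** 3
t.addcol('f2', f2)
print(t.columns['f2'] is f2)   # True: the new column was not copied
```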

On the other hand, using plain structured arrays is quite a bit more costly:

>>> import numpy.lib.recfunctions as nprf
>>> time r2 = nprf.rec_append_fields(r, 'f2', new_field, 'f8')
CPU times: user 0.34 s, sys: 0.02 s, total: 0.36 s
Wall time: 0.36 s

Appending data at the end of ctable objects is also very fast:

>>> timeit rc.append(row)
10 loops, best of 3: 13.1 µs per loop

Compare this with an append to a structured array:

>>> timeit np.concatenate((r2, row))
100 loops, best of 3: 6.84 ms per loop

Unfortunately you cannot do the full range of operations supported by 
structured arrays with ctables, and a ctable object is rather meant to 
be used as an efficient, compressed container for structures in memory:

>>> r2[2]
(2, 4, 8.0)
>>> rc[2]
(2, 4, 8.0)
>>> r2['f1']
array([0, 1, 4, ..., 1, 1, 1])
>>> rc['f1']
carray((1452223,), int64)  nbytes: 11.08 MB; cbytes: 1.62 MB; ratio: 6.85
  cparams := cparams(clevel=5, shuffle=True)
[0, 1, 4, ..., 1, 1, 1]

But still, you can do funny things like complex queries:

>>> [r for r in rc.getif("(f0<10)&(f2>4)", ["__nrow__", "f1"])]
[(2, 4),
 (3, 9),
 (4, 16),
 (5, 25),
 (6, 36),
 (7, 49),
 (8, 64),
 (9, 81),
 (1041112, 1)]

The queries are also very fast (both Numexpr and Blosc are used under 
the hood):

>>> timeit [r for r in rc.getif("(f0<10)&(f2>4)")]
10 loops, best of 3: 58.6 ms per loop
>>> timeit r2[(r2['f0']<10)&(r2['f2']>4)]
10 loops, best of 3: 28 ms per loop

So, queries on ctables are only 2x slower than using plain structured 
arrays  --of course, the secret goal is to make these sort of queries 
actually faster than using structured arrays :)
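
The plain-NumPy side of that comparison can be reproduced at a small scale without carray. A sketch assuming the same three fields (f0 = i, f1 = i**2, f2 = i**3) and only 100 rows; `np.nonzero` on the mask plays the role of ctable's `__nrow__`:

```python
import numpy as np

NR = 100  # small stand-in for the 1,000,000 rows used above
r2 = np.zeros(NR, dtype=[('f0', 'i4'), ('f1', 'i8'), ('f2', 'f8')])
r2['f0'] = np.arange(NR)
r2['f1'] = np.arange(NR) ** 2
r2['f2'] = np.arange(NR, dtype='f8') ** 3

# Same condition as the query above: combine per-field boolean masks with &.
mask = (r2['f0'] < 10) & (r2['f2'] > 4)
rows = np.nonzero(mask)[0]          # row numbers of the matching records
print(list(zip(rows.tolist(), r2['f1'][mask].tolist())))
```

The parentheses around each comparison are needed because `&` binds more tightly than `<` and `>`. The printed list is [(2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81)], matching the first rows of the ctable query above.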

I still need to finish the docs, but I plan to release carray 0.3 later 
this week.

Cheers,

-- 
Francesc Alted


[Numpy-discussion] Can I add rows and columns to recarray?

2010-12-05 Thread Wai Yip Tung
I'm fairly new to numpy and I'm trying to figure out the right way to do
things. Continuing my question about using recarray as a relation, I
have a recarray like this:


In [339]: arr = np.array([
   ...:     (1, 2.2, 0.0),
   ...:     (3, 4.5, 0.0)
   ...:     ],
   ...:     dtype=[
   ...:         ('unit',int),
   ...:         ('price',float),
   ...:         ('amount',float),
   ...:     ]
   ...: )

In [340]: data = arr.view(np.recarray)


One of the most common things I want to do is to append rows to data. I
think concatenate() might be the method, but I run into a problem:


In [342]: np.concatenate((data0,[1,9.0,9.0]))
---
TypeError Traceback (most recent call last)

c:\Python26\Lib\site-packages\numpy\<ipython console> in <module>()

TypeError: expected a readable buffer object



The other thing I want to do is to calculate column values. Right now
I can do great things like:



In [343]: data.amount = data.unit * data.price



But sometimes I may need to add a new column that does not already exist,
e.g.:


In [344]: data.discount_price = data.price * 0.9


How can I add a new column? I tried column_stack, but it gives a similar
TypeError. I figure I need to specify the type of the column first, but I
don't know how.

Thanks,

Wai Yip



Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-05 Thread Skipper Seabold
On Sun, Dec 5, 2010 at 10:56 PM, Wai Yip Tung tungwai...@yahoo.com wrote:
 How can I add a new column? I tried column_stack, but it gives a similar
 TypeError. I figure I need to specify the type of the column first, but I
 don't know how.


Check out numpy.lib.recfunctions

I often have

import numpy.lib.recfunctions as nprf

Skipper