[Numpy-discussion] ANN: SciPy 0.7.0

2009-02-11 Thread Jarrod Millman
I'm pleased to announce SciPy 0.7.0.  SciPy is a package of tools for
science and engineering for Python.  It includes modules for
statistics, optimization, integration, linear algebra, Fourier
transforms, signal and image processing, ODE solvers,
and more.

This release comes sixteen months after the 0.6.0 release and contains
many new features, numerous bug-fixes, improved test coverage, and
better documentation.  Please note that SciPy 0.7.0 requires Python
2.4 or greater (but not Python 3) and NumPy 1.2.0 or greater.

For information, please see the release notes:
https://sourceforge.net/project/shownotes.php?release_id=660191&group_id=27747

You can download the release from here:
https://sourceforge.net/project/showfiles.php?group_id=27747&package_id=19531&release_id=660191

Thank you to everybody who contributed to this release.

Enjoy,

Jarrod Millman


Re: [Numpy-discussion] numpy-izing a loop

2009-02-11 Thread Paul Rudin
Stéfan van der Walt ste...@sun.ac.za writes:

 2009/2/10 Stéfan van der Walt ste...@sun.ac.za:
 x = np.arange(dim)
 y = np.arange(dim)[:, None]
 z = np.arange(dim)[:, None, None]

 Do not operate heavy machinery or attempt broadcasting while tired or
 under the influence.  That order was incorrect:

 z = np.arange(dim)
 y = np.arange(dim)[:, None]
 x = np.arange(dim)[:, None, None]

Thanks to you both. I can confirm that the two functions below give the
same result - as far as my comprehensive testing of one example shows :)

def compute_voxels(depth_buffers):
    assert len(depth_buffers) == 6
    dim = depth_buffers[0].shape[0]
    znear, zfar, ynear, yfar, xnear, xfar = depth_buffers
    result = numpy.empty((dim, dim, dim), numpy.bool)
    for x in xrange(dim):
        for y in xrange(dim):
            for z in xrange(dim):
                result[x, y, z] = ((xnear[y, z] < xfar[y, z]) and
                                   (ynear[x, z] < yfar[x, z]) and
                                   (znear[x, y] < zfar[x, y]))
    return result

def compute_voxels2(depth_buffers):
    dim = depth_buffers[0].shape[0]
    znear, zfar, ynear, yfar, xnear, xfar = depth_buffers
    z = numpy.arange(dim)
    y = numpy.arange(dim)[:, None]
    x = numpy.arange(dim)[:, None, None]

    return ((xnear[y, z] < xfar[y, z]) &
            (ynear[x, z] < yfar[x, z]) &
            (znear[x, y] < zfar[x, y]))


All that remains is for me to verify that it's actually the result I want :)

(Well - I also need to learn to spot these things for myself, but I
guess the intuition comes with practice.)



Re: [Numpy-discussion] PEP: named axis

2009-02-11 Thread Lars Friedrich
Hello list,

I am not sure whether I understood everything in the discussion of the
named-axis idea for numpy arrays, since I am only a *user* of numpy. I
have never subclassed the numpy array class ;-)

However, I need to store meta-information for my arrays. I do this with
a stand-alone class named 'Wave' that stores its data in an
n-dimensional numpy array as a member. The meta-information I store
(using dicts and lists) is

* coordinateLabel per axis
* x0 per axis
* dx per axis

This concept is taken from the data structures in the commercial
software IGOR, which are also called 'Waves'. An example would be an
image I took with a microscope. The data would be 2d, say shape = (640,
480), holding the intensity information per pixel. x0 could then be
[-1e-6, -2e-6] and dx [100e-9, 100e-9], meaning that the image's pixel
index [0, 0] corresponds to a position of -1 micrometer / -2 micrometers
and the pixels have a spacing of 100 nanometers. coordinateLabels would
be ['x(m)', 'y(m)'].

If I have a movie, the data would be 3d with x0 = [-1e-6, -2e-6, 0], dx 
= [100e-9, 100e-9, 100e-3] and coordinateLabels = ['x(m)', 'y(m)', 
't(s)'] for a frame rate of 10 fps.
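
A minimal sketch of such a container might look like this (illustrative
only; the attribute names simply follow the description above):

import numpy

class Wave(object):
    """n-dimensional data plus per-axis scaling meta-data (sketch)."""

    def __init__(self, data, x0, dx, coordinateLabels):
        self.data = numpy.asarray(data)
        self.x0 = list(x0)                      # physical origin per axis
        self.dx = list(dx)                      # sample spacing per axis
        self.coordinateLabels = list(coordinateLabels)

    def axisValues(self, axis):
        """Physical coordinates of the samples along one axis: x0 + i*dx."""
        n = self.data.shape[axis]
        return self.x0[axis] + self.dx[axis] * numpy.arange(n)

# The microscope image described above:
image = Wave(numpy.zeros((640, 480)),
             x0=[-1e-6, -2e-6], dx=[100e-9, 100e-9],
             coordinateLabels=['x(m)', 'y(m)'])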

What I would like to say with this is the following (as a user...) :

* Meta-information is often necessary
* A string-label per axis is often not enough. Scaling is also important
* I like the idea of keeping the numpy array itself as basic as possible.
In my opinion, the meta-data management should be done by another (sub-?)
class. This way, numpy arrays stay simple enough for new users (as I was
roughly two years ago...).

I would be very interested in a class that *uses* numpy arrays to
provide a data structure for physical data with coordinate labels and
scaling.

Regards,
Lars Friedrich


-- 
Dipl.-Ing. Lars Friedrich

Bio- and Nano-Photonics
Department of Microsystems Engineering -- IMTEK
University of Freiburg
Georges-Köhler-Allee 102
D-79110 Freiburg
Germany

phone: +49-761-203-7531
fax:   +49-761-203-7537
room:  01 088
email: lfrie...@imtek.de


[Numpy-discussion] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-11 Thread David Cournapeau
Hi,

I started to set up a PPA for scipy on Launchpad, which makes it
possible to build Ubuntu packages for various
distributions/architectures. The link is here:

https://edge.launchpad.net/~scipy/+archive/ppa

So you just need to add one line to your /etc/apt/sources.list, and
you will get up-to-date numpy and scipy packages.

cheers,

David


Re: [Numpy-discussion] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-11 Thread David Cournapeau
On Wed, Feb 11, 2009 at 9:46 PM, David Cournapeau courn...@gmail.com wrote:
 Hi,

 I started to set up a PPA for scipy on launchpad, which enables to
 build ubuntu packages for various distributions/architectures. The
 link is there:

 https://edge.launchpad.net/~scipy/+archive/ppa

 So you just need to add one line to your /etc/apt/sources.list, and
 you will get uptodate numpy and scipy packages,

I forgot to mention that those packages closely follow the official
packages: once a distribution updates its package, it will
automatically supersede the one above. IOW, it can be seen as a
preview of the packages to come in Ubuntu/Debian.

David


[Numpy-discussion] ANN: Numexpr 1.2 released

2009-02-11 Thread Francesc Alted

 Announcing Numexpr 1.2


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.
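
As a minimal usage sketch (illustrative only, not taken from the release
notes):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Evaluated chunk by chunk by numexpr's virtual machine, avoiding the
# full-size temporaries that the plain NumPy expression would create.
c = ne.evaluate("3*a + 4*b")

# With VML support compiled in, several processors can be used in
# parallel (see below):
# ne.set_vml_num_threads(2)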

The main feature added in this version is the support of the Intel VML 
library (many thanks to Gregor Thalhammer for his nice work on this!).  
In addition, when the VML support is on, several processors can be used 
in parallel (see the new `set_vml_num_threads()` function).

When the VML support is on, the computation of transcendental functions
(like trigonometric, exponential, logarithmic, hyperbolic, power...)
can be accelerated considerably.  Typical speed-ups when using a
single core on contiguous arrays are around 3x, with peaks of 7.5x
(for the pow() function).  When using 2 cores the speed-ups are around
4x and 14x respectively.

In case you want to know more in detail what has changed in this
version, have a look at the release notes:

http://code.google.com/p/numexpr/wiki/ReleaseNotes


Where can I find Numexpr?
=========================

The project is hosted at Google Code at:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi


How does it work?
=================

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description of the package.


Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted


[Numpy-discussion] PyArray_SETITEM with object arrays in Cython

2009-02-11 Thread Wes McKinney
Hello,

I am writing some Cython code and have noted that the buffer interface
offers very little speedup for PyObject arrays. In trying to rewrite the
same code using the C API in Cython, I find I can't get PyArray_SETITEM to
work, in a call like:

PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)

where result is an ndarray of dtype object, and obj is a PyObject*.

Does anyone with experience with this have pointers to offer (no pun intended!)?


Thanks,
Wes


Re: [Numpy-discussion] PyArray_SETITEM with object arrays in Cython

2009-02-11 Thread Dag Sverre Seljebotn
Wes McKinney wrote:
 I am writing some Cython code and have noted that the buffer interface
 offers very little speedup for PyObject arrays. In trying to rewrite the
 same code using the C API in Cython, I find I can't get PyArray_SETITEM to
 work, in a call like:

 PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)

 where result is an ndarray of dtype object, and obj is a PyObject*.

Interesting. Whatever you end up doing, I'll make sure to integrate
whatever works faster into Cython.

I do doubt your results a bit though -- the buffer interface in Cython
increfs/decrefs the objects, but otherwise it should be completely raw
access, so using SETITEM shouldn't be faster except for one INCREF/DECREF
per object (i.e. still way faster than using Python).

Could you perhaps post your Cython code?

Dag Sverre



Re: [Numpy-discussion] PyArray_SETITEM with object arrays in Cython

2009-02-11 Thread Wes McKinney
I actually got it to work -- the function prototype in the pxi file was
wrong; it needed to be:

int PyArray_SETITEM(object obj, void* itemptr, object item)

This still doesn't explain why the buffer interface was slow.

The general problem here is an array indexed by dates or strings, for
example, that you want to conform to a new index. The arrays contain
floats most of the time but occasionally PyObjects. For some reason the
access and assignment are slow (this function can be made faster by a
factor of 50 with the C API macros, so clearly something is awry) -- let
me know if you see anything obviously wrong with this:

def reindexObject(ndarray[object, ndim=1] index,
                  ndarray[object, ndim=1] arr,
                  dict idxMap):
    '''
    Using the provided new index, a given array, and a mapping of
    index-value correspondences in the value array, return a new ndarray
    conforming to the new index.
    '''
    cdef object idx, value

    cdef int length = index.shape[0]
    cdef ndarray[object, ndim=1] result = np.empty(length, dtype=object)

    cdef int i = 0
    for i from 0 <= i < length:
        idx = index[i]
        if not PyDict_Contains(idxMap, idx):
            result[i] = None
            continue
        value = arr[idxMap[idx]]
        result[i] = value
    return result

On Wed, Feb 11, 2009 at 3:25 PM, Dag Sverre Seljebotn 
da...@student.matnat.uio.no wrote:

 Wes McKinney wrote:
  I am writing some Cython code and have noted that the buffer interface
  offers very little speedup for PyObject arrays. In trying to rewrite the
  same code using the C API in Cython, I find I can't get PyArray_SETITEM
 to
  work, in a call like:
 
  PyArray_SETITEM(result, <void *> iterresult.dataptr, obj)
 
  where result is an ndarray of dtype object, and obj is a PyObject*.

 Interesting. Whatever you end up doing, I'll make sure to integrate
 whatever works faster into Cython.

 I do doubt your results a bit though -- the buffer interface in Cython
 increfs/decrefs the objects, but otherwise it should be completely raw
 access, so using SETITEM shouldn't be faster except one INCREF/DECREF per
 object (i.e. still way faster than using Python).

 Could you perhaps post your Cython code?

 Dag Sverre



Re: [Numpy-discussion] PEP: named axis

2009-02-11 Thread Pauli Virtanen
Wed, 11 Feb 2009 22:21:30 +, Andrew Jaffe wrote:
[clip]
 Maybe I misunderstand the proposal, but, actually, I think this is
 completely the wrong semantics for axis=  anyway. axis= in numpy
 refers to what is also a dimension,  not a  column.

I think the proposal was to add the ability to refer to dimensions with 
names instead of numbers. This is separate from referring to entries in a 
dimension. (Addressing 'columns' by name is already provided by 
structured arrays.)
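
As a small illustration of that distinction with current NumPy
(illustrative only, not part of the proposal):

import numpy as np

# Dimensions are what axis= refers to; today they are addressed by number.
img = np.zeros((480, 640))
col_sums = img.sum(axis=0)        # reduce over the first dimension

# Entries ('columns') are already addressable by name via structured arrays.
rec = np.zeros(5, dtype=[('x', float), ('y', float)])
xs = rec['x']                     # field access, not an axis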

-- 
Pauli Virtanen



Re: [Numpy-discussion] PEP: named axis

2009-02-11 Thread Andrew Jaffe
Pauli Virtanen wrote:
 Wed, 11 Feb 2009 22:21:30 +, Andrew Jaffe wrote:
 [clip]
 Maybe I misunderstand the proposal, but, actually, I think this is
 completely the wrong semantics for axis=  anyway. axis= in numpy
 refers to what is also a dimension,  not a  column.
 
 I think the proposal was to add the ability to refer to dimensions with 
 names instead of numbers. This is separate from referring to entries in a 
 dimension. (Addressing 'columns' by name is already provided by 
 structured arrays.)
 

My bad -- I completely misread the proposal!

Nevermind...

Andrew


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-11 Thread Fernando Perez
On Wed, Feb 11, 2009 at 4:46 AM, David Cournapeau courn...@gmail.com wrote:
 Hi,

 I started to set up a PPA for scipy on launchpad, which enables to
 build ubuntu packages for various distributions/architectures. The
 link is there:

 https://edge.launchpad.net/~scipy/+archive/ppa

Cool, thanks.  Is it easy to also provide hardy packages, or does it
require a lot of work on your part?

Cheers,

f


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-11 Thread David Cournapeau
On Thu, Feb 12, 2009 at 8:11 AM, Fernando Perez fperez@gmail.com wrote:
 On Wed, Feb 11, 2009 at 4:46 AM, David Cournapeau courn...@gmail.com wrote:
 Hi,

 I started to set up a PPA for scipy on launchpad, which enables to
 build ubuntu packages for various distributions/architectures. The
 link is there:

 https://edge.launchpad.net/~scipy/+archive/ppa

 Cool, thanks.  Is it easy to provide also hardy packages, or does it
 require a lot of work on your part?

Unfortunately, it does require some work, because hardy uses g77
instead of gfortran, so the source package has to be different (once
hardy is done, all the ones below it would be easy, though). I am not
sure how to do that with a PPA (the docs are not great).

cheers,

David


Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread A B
On Tue, Feb 10, 2009 at 9:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt(sample_data.txt, dtype=string) but can't use
 'string' instead of S4 and all my data is read into strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.


 works for me but not sure i understand the problem, did you try
 setting the delimiter?


 import numpy as np
 from cStringIO import StringIO

 txt = StringIO("""\
 12|h|34.5|44.5
 14552|bbb|34.5|42.5""")

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}
 a = np.loadtxt(txt, dtype=dt, delimiter="|")
 print a.dtype

I had tried both with and without the delimiter. In any event, it just
worked for me as well. Not sure what I was missing before. Anyway,
thank you.


[Numpy-discussion] genloadtxt: dtype=None and unpack=True

2009-02-11 Thread Ryan May
Pierre,

I noticed that when using dtype=None with a heterogeneous set of data, trying
to use unpack=True to get the columns into separate arrays (instead of a
structured array) doesn't work.  I've attached a patch that, in the case of
dtype=None, unpacks the fields in the final array into a list of separate
arrays.  Does this seem like a good idea to you?

Here's a test case:

import numpy as np
from cStringIO import StringIO

s = '2,1950-02-27,35.55\n2,1951-02-19,35.27\n'
a, b, c = np.genfromtxt(StringIO(s), delimiter=',', unpack=True, missing=' ',
                        dtype=None)

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


genloadtxt_unpack_fields.diff
Description: Binary data


Re: [Numpy-discussion] genloadtxt: dtype=None and unpack=True

2009-02-11 Thread Pierre GM

On Feb 11, 2009, at 11:38 PM, Ryan May wrote:

 Pierre,

 I noticed that using dtype=None with a heterogeneous set of data,  
 trying to use unpack=True to get the columns into separate arrays  
 (instead of a structured array) doesn't work.  I've attached a patch  
 that, in the case of dtype=None, unpacks the fields in the final  
 array into a list of separate arrays.  Does this seem like a good  
 idea to you?

Nope, as it breaks consistency: depending on some input parameters,
you either get an array or a list. I think it's better to leave it as
it is, maybe adding an extra line in the doc noting that
unpack=True doesn't do anything for structured arrays.


[Numpy-discussion] FYI: New select-multiple-fields behavior

2009-02-11 Thread Travis E. Oliphant

Hi all,

As of r6358, I checked in the functionality to allow selection by 
multiple fields along with a couple of tests.

ary['field1', 'field3']  raises an error
ary[['field1', 'field3']] is the correct spelling and returns a copy of 
the data in those fields in a new array.
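
A small illustration of the new behaviour (illustrative only; the dtype
here is made up):

import numpy as np

ary = np.zeros(3, dtype=[('field1', 'f8'), ('field2', 'i4'), ('field3', 'f8')])

sub = ary[['field1', 'field3']]   # copy of just those two fields
sub['field1'][:] = 1.0            # modifies the copy, not ary

# ary['field1', 'field3'] would raise an error, as described above.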


-Travis



Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread A B
On Wed, Feb 11, 2009 at 6:27 PM, A B python6...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt(sample_data.txt, dtype=string) but can't use
 'string' instead of S4 and all my data is read into strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.


 works for me but not sure i understand the problem, did you try
 setting the delimiter?


 import numpy as np
 from cStringIO import StringIO

  txt = StringIO("""\
  12|h|34.5|44.5
  14552|bbb|34.5|42.5""")

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}
  a = np.loadtxt(txt, dtype=dt, delimiter="|")
 print a.dtype

 I had tried both with and without the delimiter. In any event, it just
 worked for me as well. Not sure what I was missing before. Anyway,
 thank you.


Actually, I was using two different machines and it appears that the
version of numpy available on Ubuntu is seriously out of date (1.0.4).
Wonder why ...
Version 1.2.1 on a RedHat box worked fine.


Re: [Numpy-discussion] FYI: New select-multiple-fields behavior

2009-02-11 Thread Stéfan van der Walt
Hi Travis

2009/2/12 Travis E. Oliphant oliph...@enthought.com:
 ary['field1', 'field3']  raises an error
 ary[['field1', 'field3']] is the correct spelling and returns a copy of
 the data in those fields in a new array.

Is there absolutely no way of returning the result as a view?

Regards
Stéfan


[Numpy-discussion] Outer join ?

2009-02-11 Thread A B
Hi,

I have the following data structure:

col1 | col2 | col3

20080101|key1|4
20080201|key1|6
20080301|key1|5
20080301|key2|3.4
20080601|key2|5.6

For each key in the second column, I would like to create an array
where for all unique values in the first column, there will be either
a value or zero if there is no data available. Like so:

# 20080101, 20080201, 20080301, 20080601

key1 -> 4, 6, 5, 0
key2 -> 0, 0, 3.4, 5.6

Ideally, the results would end up in a 2d array.

What's the most efficient way to accomplish this? Currently, I am
getting the unique col1 and col2 values into separate variables,
then looping through each unique value in col2:

a = loadtxt(...)

dates = unique(a[:]['col1'])
keys = unique(a[:]['col2'])

for key in keys:
    b = a[where(a[:]['col2'] == key)]
    ???

Thanks in advance.


Re: [Numpy-discussion] FYI: New select-multiple-fields behavior

2009-02-11 Thread Travis E. Oliphant
Stéfan van der Walt wrote:
 Hi Travis

 2009/2/12 Travis E. Oliphant oliph...@enthought.com:
   
 ary['field1', 'field3']  raises an error
 ary[['field1', 'field3']] is the correct spelling and returns a copy of
 the data in those fields in a new array.
 

 Is there absolutely no way of returning the result as a view?
   
Not that I can think of --- it does match advanced indexing semantics to 
have it be a copy.

-Travis

-- 

Travis Oliphant
Enthought, Inc.
(512) 536-1057 (office)
(512) 536-1059 (fax)
http://www.enthought.com
oliph...@enthought.com



Re: [Numpy-discussion] Outer join ?

2009-02-11 Thread Robert Kern
On Wed, Feb 11, 2009 at 23:24, A B python6...@gmail.com wrote:
 Hi,

 I have the following data structure:

 col1 | col2 | col3

 20080101|key1|4
 20080201|key1|6
 20080301|key1|5
 20080301|key2|3.4
 20080601|key2|5.6

 For each key in the second column, I would like to create an array
 where for all unique values in the first column, there will be either
 a value or zero if there is no data available. Like so:

 # 20080101, 20080201, 20080301, 20080601

 key1 -> 4, 6, 5, 0
 key2 -> 0, 0, 3.4, 5.6

 Ideally, the results would end up in a 2d array.

 What's the most efficient way to accomplish this? Currently, I am
 getting a list of uniq col1 and col2 values into separate variables,
 then looping through each unique value in col2

 a = loadtxt(...)

 dates = unique(a[:]['col1'])
 keys = unique(a[:]['col2'])

 for key in keys:
b = a[where(a[:]['col2'] == key)]
???

Take a look at setmember1d().
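
A rough sketch of how that suggestion could be applied here (illustrative
only; it assumes the structured array `a` from the loadtxt call above, with
rows sorted by date and at most one row per key/date pair):

import numpy as np

def pivot(a):
    # Rows: unique keys; columns: unique dates; missing entries stay 0.
    dates = np.unique(a['col1'])
    keys = np.unique(a['col2'])
    result = np.zeros((len(keys), len(dates)))
    for i, key in enumerate(keys):
        rows = a[a['col2'] == key]
        # Boolean mask over `dates` marking which dates occur for this key.
        mask = np.setmember1d(dates, rows['col1'])
        result[i, mask] = rows['col3']
    return result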

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] numscons/numpy.distutils bug related to MACOSX_DEPLOYMENT_TARGET

2009-02-11 Thread Stéfan van der Walt
2009/2/6 Brian Granger ellisonbg@gmail.com:
 Great, what is the best way of rolling this into numpy?

I've committed your patch.

Cheers
Stéfan


[Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
Hi,

This is relevant for anyone who would like to speed up array-based
code using threads.

I have a simple loop that I have implemented using Cython:

def backstep(np.ndarray opti, np.ndarray optf,
             int istart, int iend, double p, double q):
    cdef int j
    cdef double *pi
    cdef double *pf
    pi = <double *> opti.data
    pf = <double *> optf.data

    with nogil:
        for j in range(istart, iend):
            pf[j] = (p*pi[j+1] + q*pi[j])

I need to call this function *many* times, and each call cannot be
performed until the previous one has completed, as there are data
dependencies.  But I still want to parallelize a single call to this
function across multiple cores (notice that I am releasing the GIL
before I do the heavy lifting).

I want to break my loop over range(istart, iend) into pieces and have a
thread do each piece.  The arrays have sizes of 10^3 to 10^5 elements.
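
A bare-bones sketch of that decomposition (illustrative only, not code
from this thread; as the list below notes, creating threads on every call
is exactly the overhead that makes this naive version too slow):

import threading

def run_chunked(func, istart, iend, nthreads=2):
    # Split [istart, iend) into roughly equal chunks and run func(lo, hi)
    # on each chunk in its own thread.  A persistent thread pool would
    # avoid the per-call thread creation cost.
    step = max(1, (iend - istart + nthreads - 1) // nthreads)
    threads = []
    for lo in range(istart, iend, step):
        hi = min(lo + step, iend)
        t = threading.Thread(target=func, args=(lo, hi))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()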

Things I have tried:

* Use pthreads and create new threads for each call to my function.
This performed horribly due to the thread creation overhead.
* Use a simple threadpool implementation in Python.  This performed
horribly as well, even though I was not recreating threads on each call.
The reason in this case was the time spent waiting on locks in the
thread pool implementation (which is based on Queue and threading).
This is either a problem with the threading module or a fundamental
limitation of the pthreads library.
* My next step is to implement a thread pool using pthreads/Cython.
Assuming I do this right, this should be as fast as I can get using
pthreads.

The only tools that I know of that are *really* designed to handle
these types of fine-grained parallel problems are:

* Intel's TBB
* OpenMP
* Cilk++
* ???

These seem like pretty heavy solutions, though.

So, do people have thoughts about ways of effectively using threads
(not processes) for thin parallel loops over arrays?  This is relevant
to Numpy itself as it would be very nice if all ufuncs could release
the GIL and be run on multiple threads.

Cheers,

Brian


Re: [Numpy-discussion] numscons/numpy.distutils bug related to MACOSX_DEPLOYMENT_TARGET

2009-02-11 Thread Brian Granger
Thanks much!

Brian

On Wed, Feb 11, 2009 at 9:44 PM, Stéfan van der Walt ste...@sun.ac.za wrote:
 2009/2/6 Brian Granger ellisonbg@gmail.com:
 Great, what is the best way of rolling this into numpy?

 I've committed your patch.

 Cheers
 Stéfan


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Robert Kern
On Wed, Feb 11, 2009 at 23:46, Brian Granger ellisonbg@gmail.com wrote:
 Hi,

 This is relevant for anyone who would like to speed up array based
 codes using threads.

 I have a simple loop that I have implemented using Cython:

 def backstep(np.ndarray opti, np.ndarray optf,
              int istart, int iend, double p, double q):
     cdef int j
     cdef double *pi
     cdef double *pf
     pi = <double *> opti.data
     pf = <double *> optf.data

     with nogil:
         for j in range(istart, iend):
             pf[j] = (p*pi[j+1] + q*pi[j])

 I need to call this function *many* times and each time cannot be
 performed until the previous time is completely as there are data
 dependencies.  But, I still want to parallelize a single call to this
 function across multiple cores (notice that I am releasing the GIL
 before I do the heavy lifting).

 I want to break my loop range(istart,iend) into pieces and have a
 thread do each piece.  The arrays have sizes 10^3 to 10^5.

 Things I have tried:

 * Use pthreads and create new threads for each call to my function.
 This performed horribly due to the thread creation overhead.
 * Use a simple threadpool implementation in Python.  This performed
 horribly as well even though I was not recreating threads each call.
 The reason in this case was the time spent waiting on locks in the
 thread pool implementation (which is based on Queue and threading).
 This is either a problem with threading itself or a fundamental
 limitation of the pthreads library itself.
 * My next step is to implement a thread pool using pthreads/Cython.
 Assuming I do this right, this should be as fast as I can get using
 pthreads.

 The only tools that I know of that are *really* designed to handle
 these types of fine-grained parallel problems are:

 * Intel's TBB
 * OpenMP
 * Cilk++
 * ???

 This seem like pretty heavy solutions though.

From a programmer's perspective, it seems to me like OpenMP is a much
lighter-weight solution than pthreads.

 So, do people have thoughts about ways of effectively using threads
 (not processes) for thin parallel loops over arrays?  This is relevant
 to Numpy itself as it would be very nice if all ufuncs could release
 the GIL and be run on multiple threads.

Eric Jones tried to do this with pthreads in C some time ago. His work is here:

  http://svn.scipy.org/svn/numpy/branches/multicore/

The lock overhead usually makes it not worthwhile.

And there's still an open problem for determining whether two strided
arrays overlap in memory.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread Scott Sinclair
 2009/2/12 A B python6...@gmail.com:
 Actually, I was using two different machines and it appears that the
 version of numpy available on Ubuntu is seriously out of date (1.0.4).
 Wonder why ...

See the recent post here

http://projects.scipy.org/pipermail/numpy-discussion/2009-February/040252.html

Cheers,
Scott


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
 Eric Jones tried to do this with pthreads in C some time ago. His work is 
 here:

  http://svn.scipy.org/svn/numpy/branches/multicore/

 The lock overhead makes it usually not worthwhile.

I was under the impression that Eric's implementation didn't use a
thread pool.  Thus I thought the bottleneck was the thread creation
time in his implementation.  Eric, can you comment?

Cheers,

Brian


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-11 Thread Fernando Perez
On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote:

 Unfortunately, it does require some work, because hardy uses g77
 instead of gfortran, so the source package has to be different (once
 hardy is done, all the one below would be easy, though). I am not sure
 how to do that with PPA (the doc is not great).

OK, thanks for the info.  This is already very useful.

Cheers,

f


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Robert Kern
On Thu, Feb 12, 2009 at 00:03, Brian Granger ellisonbg@gmail.com wrote:
 Eric Jones tried to do this with pthreads in C some time ago. His work is 
 here:

  http://svn.scipy.org/svn/numpy/branches/multicore/

 The lock overhead makes it usually not worthwhile.

 I was under the impression that Eric's implementation didn't use a
 thread pool.  Thus I thought the bottleneck was the thread creation
 time for his implementation.  Eric can you comment?

It does use a thread pool. See PyUFunc_GenericFunction:

http://svn.scipy.org/svn/numpy/branches/multicore/numpy/core/src/ufuncobject.c

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Gael Varoquaux
On Wed, Feb 11, 2009 at 11:52:40PM -0600, Robert Kern wrote:
  This seem like pretty heavy solutions though.

 From a programmer's perspective, it seems to me like OpenMP is a much
 lighter-weight solution than pthreads.

From a programmer's perspective, yes -- because, IMHO, OpenMP is
implemented on top of pthreads. I have difficulty verifying this on the
web, but the documents I find hint at that.

Gaël


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Robert Kern wrote:

 Eric Jones tried to do this with pthreads in C some time ago. His work is 
 here:

   http://svn.scipy.org/svn/numpy/branches/multicore/

 The lock overhead makes it usually not worthwhile.
   

I am curious: would you know what would be different in numpy's case
compared to matlab's array model concerning locks?  Matlab, until
recently, only spread BLAS/LAPACK across multiple cores, but since
matlab 7.3 (or 7.4) it also uses multiple cores for mathematical
functions (cos, etc.).  So at least in matlab's model, it looks like it
can be useful.  I understand that numpy's model is more flexible (I
don't think strided arrays can overlap in matlab, for example, at least
not from what you can see of the public API).

cheers,

David


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Gael Varoquaux wrote:
 From a programmer's perspective, because, IMHO, openmp is implemented
 using pthreads. 

Since OpenMP also exists on Windows, I doubt that OpenMP is required to
use pthreads :)

On Linux, with gcc, using -fopenmp implies -pthread, so I guess it uses
pthreads (can you be more low-level than pthreads on a pthread-enabled
Linux in userland?)

cheers,

David


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
 I am curious: would you know what would be different in numpy's case
 compared to matlab array model concerning locks ? Matlab, up to
 recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3
 (or 7.4), it also uses multicore for mathematical functions (cos,
 etc...). So at least in matlab's model, it looks like it can be useful.

Good point.  Is it possible to tell at what array size it switches over
to using multiple threads?  Also, do you happen to know how Matlab is
doing this?

 I understand that numpy's model is more flexible (I don't think strided
 arrays can overlap in matlab for example, at least not from what you can
 see from the public API).

True, but I would be happy to just have a fast C-based thread pool
implementation I could use in low-level Cython-based loops.  When
performance *really* matters, ufuncs have a lot of other overhead that
is typically unacceptable.  But I could imagine that all that extra
logic kills the parallel scaling through Amdahl's law (the extra logic
is serial logic).

Cheers,

Brian

 cheers,

 David


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
 Good point.  Is it possible to tell what array size it switches over
 to using multiple threads?

 Yes.

 http://svn.scipy.org/svn/numpy/branches/multicore/numpy/core/threadapi.py

Sorry, I was curious about what Matlab does in this respect.  But,
this is very useful and I will look at it.

Cheers,

Brian


 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Brian Granger wrote:
 I am curious: would you know what would be different in numpy's case
 compared to matlab array model concerning locks ? Matlab, up to
 recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3
 (or 7.4), it also uses multicore for mathematical functions (cos,
 etc...). So at least in matlab's model, it looks like it can be useful.
 

 Good point.  Is it possible to tell what array size it switches over
 to using multiple threads?  Also, do you happen to iknow how Matlab is
 doing this?
   

No - I have never seen a deep explanation of the matlab model. The C API
is so small that it is hard to deduce anything from it (except that the
memory handling is not ref-counting-based; I don't know if that matters
for our discussion of speeding up ufuncs). I would guess that since two
arrays cannot share data (it is COW-based), lock handling may be easier
to deal with.  I am not really familiar with multi-threaded programming
(my only limited experience is with soft real-time programming for audio
processing, where the issues are totally different, since latency
matters as much as, if not more than, throughput).


 True, but I would be happy to just have a fast C based threadpool
 implentation I could use in low level Cython based loops.

Matlab has a parallel toolbox to do this kind of thing in matlab (I
don't know about C). I don't know anything about it, nor do I know if it
can be applied in any way to python/numpy's case:

http://www.mathworks.com/products/parallel-computing/

cheers,

David