Re: [Numpy-discussion] another discussion on numpy correlate (and convolution)

2013-02-22 Thread josef . pktd
On Thu, Feb 21, 2013 at 1:00 PM, Pierre Haessig
pierre.haes...@crans.org wrote:
 Hi everybody,

 (just coming from a discussion on the performance of Matplotlib's
 (x)corr function which uses np.correlate)

 There have already been many discussions on how to compute
 (cross-)correlations of time series in Python (like
 http://stackoverflow.com/questions/6991471/computing-cross-correlation-function).
 The discussion is spread between the various stakeholders (just to name
 some I have in mind: scipy, statsmodels, matplotlib, ...).

 There are basically two implementations: time-domain and
 frequency-domain (using fft + multiplication). My discussion is only
 about the time-domain implementation. The key use case which I feel is
 not well addressed today is computing the (cross-)correlation of two
 long time series on only a few lag points.

 For the time domain, one can either write one's own implementation or
 rely on numpy.correlate. The latter relies on the fast C implementation
 _pyarray_correlate() in multiarraymodule.c
 (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c#L1177)

 Now, the signature of this function is meant to accept one of three
 *computation modes* ('valid', 'same', 'full'). Those modes make a lot
 of sense when using this function in the context of convolving two
 signals (say an input array and an impulse response array). In the
 context of computing the (cross-)correlation of two long time series on
 only a few lag points, those computation modes are ill-suited and
 potentially lead to huge computational and memory waste
 (for example:
 https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py#L4332).



 For some time, I was thinking the solution was to write a dedicated
 function in one of the stakeholder modules (but which? statsmodels?),
 but I have now come to think numpy is the best place to put a change,
 and this would quickly benefit all stakeholders downstream.

 Indeed, I looked more carefully at the C _pyarray_correlate() function,
 and I've come to the conclusion that these three computation modes are
 an unnecessary restriction, because the actual computation relies on a
 triple for loop
 (https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c#L1177)
 whose boundaries are governed by two integers `n_left` and `n_right`
 instead of the three modes.

 Maybe I've misunderstood the inner workings of this function, but I have
 the feeling that the ability to set these two integers directly, instead
 of just the mode, would open up the possibility of computing the
 correlation on only a few lag points.

 I'm fully aware that changing the signature of such a core numpy
 function is out of the question, but I've got the feeling that a
 reasonably small code refactor might lift the long-standing problem of
 (cross-)correlation computation. The Python np.correlate would require
 two additional keywords (`n_left` and `n_right`) which would override
 the `mode` keyword. Only the C function would need some more care to
 keep good backward compatibility in case it's used externally.
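
 For illustration, here is a pure-NumPy sketch of the semantics I have
 in mind (the names `n_left`/`n_right` and the sign convention are mine,
 just for the discussion, not an existing API):

 import numpy as np

 def correlate_lags(x, y, n_left, n_right):
     # time-domain cross-correlation evaluated only at the lags
     # k = -n_left, ..., n_right
     n = min(len(x), len(y))
     lags = np.arange(-n_left, n_right + 1)
     out = np.empty(len(lags))
     for i, k in enumerate(lags):
         if k >= 0:
             out[i] = np.dot(x[k:n], y[:n - k])
         else:
             out[i] = np.dot(x[:n + k], y[-k:n])
     return lags, out

 With n_left = n_right = 0 this reduces to mode='valid' for equal-length
 inputs, and n_left = n_right = n - 1 recovers mode='full'; the point is
 that nothing forces those two extremes.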



 What do other people think?

I think it's a good idea. I didn't look at the implementation details.

In statsmodels we also use explicit loops, (x[lag:] * y[:-lag]).sum(0),
in places where we expect that in most cases we only need a few
correlations (10 or 20).
(The fft and correlate versions are mainly used where we expect or
need a large number, or the full number, of valid correlations.)
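
For example, something like this (just a sketch; the actual statsmodels
code differs in detail and handles demeaning and normalization
elsewhere):

import numpy as np

def acov_few_lags(x, nlags=20):
    # autocovariance at lags 0..nlags-1, looping over the lags only
    x = x - x.mean()
    n = len(x)
    return np.array([(x[lag:] * x[:n - lag]).sum() / n
                     for lag in range(nlags)])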

Josef


 Best,
 Pierre



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.einsum bug?

2013-02-22 Thread Jaakko Luttinen
Hi,

Is this a bug in numpy.einsum?

>>> np.einsum(3, [], 2, [], [])
ValueError: If 'op_axes' or 'itershape' is not NULL in the iterator
constructor, 'oa_ndim' must be greater than zero

I think it should return 6 (i.e., 3*2).
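
For what it's worth, spelling the same contraction with explicit
length-1 axes works, which makes me suspect the problem is specific to
the zero-dimensional code path (a quick check; I have not verified it
against every NumPy version):

import numpy as np

# np.einsum(3, [], 2, [], []) should contract over no axes at all,
# i.e. simply multiply the two scalars: 3 * 2 == 6.
print(np.einsum('i,i->', np.array([3]), np.array([2])))  # prints 6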

Regards,
Jaakko
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] another discussion on numpy correlate (and convolution)

2013-02-22 Thread Matthew Brett
Hi,

On Thu, Feb 21, 2013 at 10:00 AM, Pierre Haessig
pierre.haes...@crans.org wrote:
 [snip: Pierre's original message quoted in full]
From complete ignorance, do you think it is an option to allow an
(n_left, n_right) tuple as a value for 'mode'?
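
Dispatch-wise, maybe something like this at the Python level (a sketch
only, reusing the correlate_lags toy from earlier in the thread, not
the actual np.correlate code):

import numpy as np

def correlate_ext(a, v, mode='valid'):
    if isinstance(mode, tuple):         # explicit (n_left, n_right) window
        n_left, n_right = mode
        return correlate_lags(a, v, n_left, n_right)[1]
    return np.correlate(a, v, mode)     # existing string modes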

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread Chris Barker - NOAA Federal
On Thu, Feb 21, 2013 at 4:16 AM, Matyáš Novák lo...@centrum.cz wrote:
 You could also look into OpenBLAS, which is easier to build and
 generally faster than ATLAS. (But alas, not supported by NumPy/SciPY AFAIK.)

It looks like OpenBLAS is BSD-licensed, and thus compatible with numpy/scipy.

Is there a reason (other than someone having to do the work) it could
not be used as the standard BLAS for numpy?

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] another discussion on numpy correlate (and convolution)

2013-02-22 Thread Todd
We don't actually want to remove sensitive data, but this tutorial should
still allow us to remove a file totally and completely from git history. It
doesn't look that hard:

https://help.github.com/articles/remove-sensitive-data

It will require everyone to rebase, so if you want to do this it may be a
good idea to schedule it for a specific day, maybe 2-4 weeks in the future,
and warn people on the mailing list so nobody tries to commit anything
while the process is underway.
On Feb 22, 2013 5:40 PM, Matthew Brett matthew.br...@gmail.com wrote:

 [snip: Matthew Brett's message quoted in full]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread Dag Sverre Seljebotn
On 02/22/2013 05:52 PM, Chris Barker - NOAA Federal wrote:
 On Thu, Feb 21, 2013 at 4:16 AM, Matyáš Novák lo...@centrum.cz wrote:
 You could also look into OpenBLAS, which is easier to build and
 generally faster than ATLAS. (But alas, not supported by NumPy/SciPY AFAIK.)

 It looks like OpenBLAS is BSD-licensed, and thus compatible with numpy/scipy.

 Is there a reason (other than someone having to do the work) it could
 not be used as the standard BLAS for numpy?

This was discussed some weeks ago (I think the thread title contains
openblas). IIRC it was just that somebody needs to do the work, but don't
take my word for it.

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread Sergio Callegari
 
 from scipy.linalg.blas import fblas
 dgemm = fblas.dgemm._cpointer
 sgemm = fblas.sgemm._cpointer
 

OK, but this gives me a PyCObject. How do I make it a function pointer of the
correct type in cython?
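
(Not Cython proper, but as a sanity check from plain Python the raw
address can at least be extracted with ctypes; a sketch, Python 2 only
since it goes through the PyCObject API:)

import ctypes
from scipy.linalg.blas import fblas

# Declare the C-API function so ctypes returns the raw void* as an int.
PyCObject_AsVoidPtr = ctypes.pythonapi.PyCObject_AsVoidPtr
PyCObject_AsVoidPtr.restype = ctypes.c_void_p
PyCObject_AsVoidPtr.argtypes = [ctypes.py_object]

addr = PyCObject_AsVoidPtr(fblas.dgemm._cpointer)
print(hex(addr))
# Casting this address to a dgemm-typed function pointer then has to
# happen on the Cython/C side, as in the C snippet in the next message.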

Thanks again

Sergio

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread V. Armando Sole


On 22.02.2013 19:54, Sergio Callegari wrote:

 from scipy.linalg.blas import fblas
 dgemm = fblas.dgemm._cpointer
 sgemm = fblas.sgemm._cpointer


 OK, but this gives me a PyCObject. How do I make it a function 
 pointer of the
 correct type in cython?


I do not know how to do it in Cython; I coded it directly in C.

In my code I receive the pointer in input2. The relevant part is:

 PyObject *input1;
 PyObject *input2 = NULL;
 /* pointer to the dgemm function */
 void *gemm_pointer = NULL;

 /** statements **/
 if (!PyArg_ParseTuple(args, "OO", &input1, &input2))
 return NULL;

 if (input2 != NULL){
#if PY_MAJOR_VERSION >= 3
 if (PyCapsule_CheckExact(input2))
 gemm_pointer = PyCapsule_GetPointer(input2, NULL);
#else
 gemm_pointer = PyCObject_AsVoidPtr(input2);
#endif
 if (gemm_pointer != NULL)
 {
 /* release the GIL */
 Py_BEGIN_ALLOW_THREADS
 /* your function call here */
 Py_END_ALLOW_THREADS
 }
 }


Best regards,

Armando

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread David Cournapeau
On 22 Feb 2013 16:53, Chris Barker - NOAA Federal chris.bar...@noaa.gov
wrote:

 On Thu, Feb 21, 2013 at 4:16 AM, Matyáš Novák lo...@centrum.cz wrote:
  You could also look into OpenBLAS, which is easier to build and
  generally faster than ATLAS. (But alas, not supported by NumPy/SciPY
AFAIK.)

 It looks like OpenBLAS is BSD-licensed, and thus compatible with
 numpy/scipy.

 Is there a reason (other than someone having to do the work) it could
 not be used as the standard BLAS for numpy?

No reason, and it actually works quite nicely. Bento supports it, at least
on Mac/Linux.

David

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows, blas, atlas and dlls

2013-02-22 Thread Frédéric Bastien
I just read a web page on how to embed Python in an application [1].

They explain that we can keep the symbols exported even if we
statically link the BLAS library in scipy. This makes me think we could
just change how we compile the lib that links with BLAS, and we would be
able to reuse it for other projects!

But I haven't played much with this type of thing. Does someone have more
information? Do you think it would be useful?

Fred

[1] http://docs.python.org/2/extending/embedding.html#linking-requirements

On Fri, Feb 22, 2013 at 3:38 PM, David Cournapeau courn...@gmail.com wrote:

 On 22 Feb 2013 16:53, Chris Barker - NOAA Federal chris.bar...@noaa.gov
 wrote:

  Is there a reason (other than someone having to do the work) it could
  not be used as the standard BLAS for numpy?

 No reason, and it actually works quite nicely. Bento supports it, at least
 on Mac/Linux.

 David

 [snip: rest of quoted thread]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Smart way to do this?

2013-02-22 Thread santhu kumar
Hi all,

I don't want to run a loop for this, but it should be possible using
numpy's smart ways.

a = np.ones(30)
idx = np.array([2,3,2]) # there is a duplicate index of 2
a += 2

a
array([ 1.,  1.,  3.,  3.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.])


But if we do this:
for i in range(idx.shape[0]):
   a[idx[i]] += 2

 a
array([ 1.,  1.,  5.,  3.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.])

How to achieve the second result without looping??
Thanks
Santhosh
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Smart way to do this?

2013-02-22 Thread santhu kumar
Sorry, typo:

a = np.ones(30)
idx = np.array([2,3,2]) # there is a duplicate index of 2
a[idx] += 2

On Fri, Feb 22, 2013 at 8:35 PM, santhu kumar mesan...@gmail.com wrote:

 [snip: original message quoted in full]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Smart way to do this?

2013-02-22 Thread Brett Olsen
a = np.ones(30)
idx = np.array([2, 3, 2])
a += 2 * np.bincount(idx, minlength=len(a))
 a
array([ 1.,  1.,  5.,  3.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
1.,  1.,  1.,  1.])

As for speed:

def loop(a, idx):
for i in idx:
a[i] += 2

def count(a, idx):
a += 2 * np.bincount(idx, minlength=len(a))

%timeit loop(np.ones(30), np.array([2, 3, 2]))
10000 loops, best of 3: 19.9 us per loop

%timeit count(np.ones(30), np.array([2, 3, 2]))
100000 loops, best of 3: 19.2 us per loop

So no big difference here.  But go to larger systems and you'll see a huge
difference:

%timeit loop(np.ones(10000), np.random.randint(10000, size=100000))
1 loops, best of 3: 260 ms per loop

%timeit count(np.ones(10000), np.random.randint(10000, size=100000))
100 loops, best of 3: 3.03 ms per loop
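
(If the increments differ per index, np.bincount also takes a weights
argument; and newer NumPy has np.add.at for unbuffered in-place
accumulation. A sketch, assuming a NumPy version that provides it:)

import numpy as np

a = np.ones(30)
idx = np.array([2, 3, 2])
np.add.at(a, idx, 2)   # duplicates accumulate: a[2] gains 4, a[3] gains 2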

~Brett


On Fri, Feb 22, 2013 at 8:38 PM, santhu kumar mesan...@gmail.com wrote:

 [snip: earlier messages quoted in full]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion