Re: [Numpy-discussion] ufunc for sum of squared difference
Yeah, but it's not so obvious what's happening "under the hood". Consider this (with an old Win7 machine):

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)]
np.__version__
'1.11.1'

On Mon, Nov 14, 2016 at 10:38 AM, Jerome Kieffer <jerome.kief...@esrf.fr> wrote:

> On Fri, 11 Nov 2016 11:25:58 -0500 Matthew Harrigan <harrigan.matt...@gmail.com> wrote:
>
> > I started a ufunc to compute the sum of square differences here
> > <https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
> > It is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).
>
> Hi Matt,
>
> Using *blas* you already win a factor of two (maybe more, depending on your blas implementation):
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))" "np.sum(np.square(x-2.))"
> 10 loops, best of 3: 135 msec per loop
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))" "y=x-2.;np.dot(y,y)"
> 10 loops, best of 3: 70.2 msec per loop

x= np.linspace(0, 1, int(1e6))
timeit np.sum(np.square(x- 2.))
10 loops, best of 3: 23 ms per loop

y= x- 2.
timeit np.dot(y, y)
The slowest run took 18.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.78 ms per loop

timeit np.dot(y, y)
1000 loops, best of 3: 1.73 ms per loop

Best,
eat

> Cheers,
> --
> Jérôme Kieffer
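For readers skimming the thread, a minimal sketch of the BLAS trick wrapped into a function (my illustration, not the fused ufunc from the gist; the temporary y = x - c is still allocated, which is exactly what the ufunc avoids):

import numpy as np

def sum_squared_diff(x, c):
    # same result as np.sum(np.square(x - c)), but the
    # multiply-and-accumulate is delegated to BLAS via a dot product
    y = x - c          # one temporary array; a fused ufunc avoids even this
    return np.dot(y, y)

x = np.linspace(0, 1, int(1e6))
print(np.allclose(sum_squared_diff(x, 2.), np.sum(np.square(x - 2.))))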
Re: [Numpy-discussion] Characteristic of a Matrix.
Hi,

On Mon, Jan 5, 2015 at 8:40 PM, Colin J. Williams <cjwilliam...@gmail.com> wrote:

> One of the essential characteristics of a matrix is that it be rectangular. This is neither spelled out nor checked currently. The Doc description refers to a class:
>
> class numpy.matrix [source: http://github.com/numpy/numpy/blob/v1.9.1/numpy/matrixlib/defmatrix.py#L206]
> Returns a matrix from an array-like object, or from a string of data. A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power).
>
> This illustrates a failure, which is reported later in the calculation:
>
> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])
>
> Here 2 -6 is treated as an expression (evaluating to -4), so the last row has only two elements.

FWIW, here A2 is definitely rectangular, with shape== (1, 3) and dtype== object, i.e. its elements are just python lists.

> Wikipedia offers:
>
> In mathematics, a matrix (plural matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns (http://en.wikipedia.org/wiki/Matrix_%28mathematics%29).

(and in this context also python objects).

-eat

> The individual items in a matrix are called its elements or entries. An example of a matrix with 2 rows and 3 columns is
>
>     [  1  9  -13 ]
>     [ 20  5   -6 ]
>
> In the Numpy context, the symbols or expressions need to be evaluable.
>
> Colin W.
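On the NumPy of that era (newer versions refuse ragged input outright), a minimal sketch of a guard against this pitfall -- a hypothetical helper, not part of numpy:

import numpy as np

def strict_matrix(data):
    # hypothetical helper: refuse the silent object-array fallback
    m = np.matrix(data)
    if m.dtype == object:
        raise ValueError("input is not rectangular / numeric: %r" % (data,))
    return m

strict_matrix([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])   # fine
# strict_matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])  # raises ValueError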
Re: [Numpy-discussion] Characteristic of a Matrix.
On Mon, Jan 5, 2015 at 9:36 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Mon, Jan 5, 2015 at 7:18 PM, <josef.p...@gmail.com> wrote:
> > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > > I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently?
> >
> > I liked it better when this raised an exception, instead of creating a rectangular object array.
>
> Did it really used to raise an exception? Patches accepted :-) (#5303 is the relevant bug, like Warren points out. From the discussion there it doesn't look like np.array's handling of non-conformable lists has any defenders.)

+1 for 'object array [and matrix] construction should require explicitly specifying dtype= object'

-eat

> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 7:00 PM, alex <argri...@ncsu.edu> wrote:

> On Mon, Feb 10, 2014 at 11:27 AM, <josef.p...@gmail.com> wrote:
> > How do we calculate the diagonal of the hat matrix without using N by N matrices?
>
> Not sure if this was a rhetorical question or what, but this seems to work:
>
> leverages = np.square(scipy.linalg.qr(X, mode='economic')[0]).sum(axis=1)
>
> http://www4.ncsu.edu/~ipsen/ps/slides_CSE2013.pdf

Rhetorical or not, but FWIW I'd prefer to take the singular value decomposition (u, s, vt= svd(x)) and then, based on the singular values s, estimate a numerically feasible rank r. Thus the diagonal of such a hat matrix would be (u[:, :r]** 2).sum(1).

Regards,
-eat

> Sorry for off-topic...
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 9:08 PM, alex <argri...@ncsu.edu> wrote:

> On Mon, Feb 10, 2014 at 2:03 PM, eat <e.antero.ta...@gmail.com> wrote:
> > Rhetorical or not, but FWIW I'd prefer to take the singular value decomposition (u, s, vt= svd(x)) and then, based on the singular values s, estimate a numerically feasible rank r. Thus the diagonal of such a hat matrix would be (u[:, :r]** 2).sum(1).
>
> It's a small detail but you probably want svd(x, full_matrices=False) to avoid anything NxN.

Indeed. Thanks,
-eat
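Putting both remarks together, a hedged sketch of the whole computation (the tolerance rule is my assumption, not from the thread):

import numpy as np

def leverages(X, rtol=1e-12):
    # thin SVD: u is n-by-min(n, p), so nothing NxN is ever formed
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    r = int((s > rtol * s[0]).sum())    # numerically feasible rank (assumed tolerance rule)
    return (u[:, :r] ** 2).sum(axis=1)  # diagonal of the hat matrix

X = np.random.randn(1000, 5)
h = leverages(X)
print(h.shape, np.allclose(h.sum(), 5))  # trace of the hat matrix equals the rank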
Re: [Numpy-discussion] Creating an ndarray from an iterable over sequences
Hi,

On Tue, Jan 21, 2014 at 8:34 AM, Dr. Leo <fhaxbo...@googlemail.com> wrote:

> Hi,
>
> I would like to write something like:
>
> In [25]: iterable=((i, i**2) for i in range(10))
> In [26]: a=np.fromiter(iterable, int32)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-26-5bcc2e94dbca> in <module>()
> ----> 1 a=np.fromiter(iterable, int32)
> ValueError: setting an array element with a sequence.
>
> Is there an efficient way to do this?

Perhaps you could just utilize structured arrays (http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:

iterable= ((i, i**2) for i in range(10))
a= np.fromiter(iterable, [('a', int32), ('b', int32)], 10)
a.view(int32).reshape(-1, 2)
Out[]:
array([[ 0,  0],
       [ 1,  1],
       [ 2,  4],
       [ 3,  9],
       [ 4, 16],
       [ 5, 25],
       [ 6, 36],
       [ 7, 49],
       [ 8, 64],
       [ 9, 81]])

My 2 cents,
-eat

> Creating two 1-dimensional arrays first is costly as one has to iterate twice over the data. So the only way I see is creating an empty [10,2] array and filling it row by row. This is memory-efficient but slow. List comprehension is vice versa.
>
> If there is no solution, wouldn't it be possible to rewrite fromiter so as to accept sequences?
>
> Leo
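A hedged, reusable sketch of the same idea (the helper name and generic field names are mine, not from the thread), assuming every yielded tuple has the same fixed width:

import numpy as np

def fromiter2d(iterable, dtype, width, count=-1):
    # hypothetical helper: route fixed-width tuples through a structured
    # dtype, then view the result as an ordinary 2-D array
    rec = np.dtype([('f%d' % i, dtype) for i in range(width)])
    a = np.fromiter(iterable, rec, count)
    return a.view(dtype).reshape(-1, width)

print(fromiter2d(((i, i**2) for i in range(5)), np.int32, 2))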
Re: [Numpy-discussion] Possible conversion bug with record array
Hi,

FWIW, apparently a bug related to the dtype of np.eye(.).

On Wed, May 22, 2013 at 8:07 PM, Nicolas Rougier <nicolas.roug...@inria.fr> wrote:

> Hi all,
>
> I got a weird output from the following script:
>
> import numpy as np
>
> U = np.zeros(1, dtype=[('x', np.float32, (4,4))])
>
> U[0] = np.eye(4)
> print U[0]
> # output: ([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)
>
> U[0] = np.eye(4, dtype=np.float32)
> print U[0]
> # output: ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)
>
> The first output is obviously wrong. Can anyone confirm? (using numpy 1.7.1 on osx 10.8.3)

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.__version__
Out[]: '1.6.0'
In []: U= np.zeros(1, dtype= [('x', np.float32, (4, 4))])
In []: U[0]= np.eye(4)
In []: U
Out[]: array([([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)], dtype=[('x', 'f4', (4, 4))])
In []: U[0]= np.eye(4, dtype= np.float32)
In []: U
Out[]: array([([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)], dtype=[('x', 'f4', (4, 4))])

and

In []: sys.version
Out[]: '3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]'
In []: np.__version__
Out[]: '1.7.0rc1'
In []: U= np.zeros(1, dtype= [('x', np.float32, (4, 4))])
In []: U[0]= np.eye(4)
In []: U
Out[]: array([([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)], dtype=[('x', 'f4', (4, 4))])
In []: U[0]= np.eye(4, dtype= np.float32)
In []: U
Out[]: array([([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)], dtype=[('x', 'f4', (4, 4))])

My 2 cents,
-eat

> Nicolas
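Until running a fixed version, a hedged sketch of two workarounds (my suggestion, not from the thread): keep the dtypes matched explicitly, or assign through the named field, which is an ordinary float32 view so the float64-to-float32 cast happens element-wise:

import numpy as np

U = np.zeros(1, dtype=[('x', np.float32, (4, 4))])

# matches the field dtype exactly, sidestepping the buggy record-level cast
U[0] = np.eye(4, dtype=np.float32)

# assigns into the named field (a plain float32 view), casting element-wise
U['x'][0] = np.eye(4)

print(U['x'][0])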
Re: [Numpy-discussion] drawing the line (was: Adding .abs() method to the array object)
Huh,

On Tue, Feb 26, 2013 at 5:03 PM, <josef.p...@gmail.com> wrote:

> On Tue, Feb 26, 2013 at 9:39 AM, Benjamin Root <ben.r...@ou.edu> wrote:
> > On Mon, Feb 25, 2013 at 8:23 PM, Alan G Isaac <alan.is...@gmail.com> wrote:
> > > I'm hoping this discussion will return to the drawing the line question.
> > > http://stackoverflow.com/questions/8108688/in-python-when-should-i-use-a-function-instead-of-a-method
> > > Alan Isaac
> >
> > Proposed line: Reduction methods only. Discuss...
>
> arr.dot ?
>
> the 99 most common functions for which chaining looks nice.

Care to elaborate more for us less initiated?

Regards,
-eat

> Josef
> > Ben Root
Re: [Numpy-discussion] Shouldn't all in-place operations simply return self?
Hi,

On Fri, Jan 18, 2013 at 12:13 AM, Thouis (Ray) Jones <tho...@gmail.com> wrote:

> On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> > On Wed, Jan 16, 2013 at 5:11 PM, eat <e.antero.ta...@gmail.com> wrote:
> > > Hi,
> > > In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative for a trivial two-liner. I'm raising now the question: what if all in-place operations indeed could return self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html. But anyway, at least for me, it would be much more straightforward to implement simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) this way. What do you think?
> >
> > I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.
> > Chuck
>
> I'm -1 on breaking with Python convention without very good reasons.

As an example, I personally find the following behavior highly counter-intuitive:

In []: p, P= rand(3, 1), rand(3, 5)
In []: ((p- P)** 2).sum(0).argsort()
Out[]: array([2, 4, 1, 3, 0])
In []: ((p- P)** 2).sum(0).sort().diff()
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'diff'

Regards,
-eat

> Ray
[Numpy-discussion] Shouldn't all in-place operations simply return self?
Hi,

In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative for a trivial two-liner.

I'm raising now the question: what if all in-place operations indeed could return self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me, it would be much more straightforward to implement simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) if in-place operations returned self. What do you think?

-eat

P.S. FWIW, if this idea really gains momentum, obviously I'm volunteering to create a PR for it.
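To make the trade-off concrete, here is a minimal sketch of a hypothetical thin wrapper (entirely illustrative, not anything proposed in the thread) that provides the 'freight train' style on top of the current None-returning in-place methods:

import numpy as np

class Chain(object):
    # hypothetical wrapper: call ndarray methods, but always return self
    def __init__(self, a):
        self.a = np.asarray(a)

    def __getattr__(self, name):
        def method(*args, **kwargs):
            out = getattr(self.a, name)(*args, **kwargs)
            if out is not None:        # e.g. sum() returns a result...
                self.a = np.asarray(out)
            return self                # ...while sort()/fill() return None
        return method

    def value(self):
        return self.a

print(Chain([3, 1, 2]).sort().value())   # in-place sort, chained: [1 2 3]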
Re: [Numpy-discussion] argsort
Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen <madsip...@gmail.com> wrote:

> Hi,
>
> I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array:
>
> from numpy import *
> indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
> args = argsort(indices, axis=0)
> print indices[args]
>
> gives:
>
> [[[ 1 12]
>   [ 4  3]]
>  [[ 4  3]
>   [11  6]]
>  [[ 8  9]
>   [23  7]]
>  [[11  6]
>   [ 8  9]]
>  [[23  7]
>   [ 1 12]]]
>
> I thought this should produce a sorted version of the indices array. Any help is appreciated.

Perhaps these three different points of view will help you a little bit more to move on:

In []: x
Out[]:
array([[ 4,  3],
       [ 1, 12],
       [23,  7],
       [11,  6],
       [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
       [0, 3],
       [4, 2],
       [3, 4],
       [2, 1]])
In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
       [ 4,  3],
       [ 8,  9],
       [11,  6],
       [23,  7]])
In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
       [11,  6],
       [23,  7],
       [ 8,  9],
       [ 1, 12]])
In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
       [ 4,  6],
       [ 8,  7],
       [11,  9],
       [23, 12]])

-eat

> Best regards,
> Mads
>
> --
> Mads Ipsen
> Gåsebæksvej 7, 4. tv, DK-2500 Valby, Denmark
> phone: +45-29716388
> email: mads.ip...@gmail.com
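As a side note for readers on NumPy 1.15+ (long after this thread), the explicit index arithmetic above is exactly what np.take_along_axis packages up; a minimal sketch:

import numpy as np

x = np.array([[4, 3], [1, 12], [23, 7], [11, 6], [8, 9]])
ind = x.argsort(axis=0)
# per-column sort via the index array, without hand-built fancy indexing
print(np.take_along_axis(x, ind, axis=0))
# equivalently: np.sort(x, axis=0)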
Re: [Numpy-discussion] nan result from np.linalg.lstsq()
Hi,

On Mon, Oct 29, 2012 at 11:01 AM, Larry Paltrow <larry.palt...@gmail.com> wrote:

> np.isnan(data) is True
> False

Check with: np.all(~np.isnan(x))

My 2 cents,
-eat

> On Mon, Oct 29, 2012 at 1:50 AM, Pauli Virtanen <p...@iki.fi> wrote:
> > Larry Paltrow <larry.paltrow at gmail.com> writes:
> > > I'm having some trouble using the linalg.lstsq() function with certain data and trying to understand why. It's always returning nans when I feed it this numpy array of float64 data:
> > >
> > > data = df.close.values  # coming from a pandas dataframe
> > > type(data)
> > > numpy.ndarray
> >
> > Maybe you have NaNs in the array? (i.e. np.isnan(data) is True)
> >
> > --
> > Pauli Virtanen
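Why the original check was misleading, as a short illustration (example values are mine): np.isnan(data) returns a boolean array, which is never the singleton True object, so the `is` comparison is always False regardless of the data:

import numpy as np

data = np.array([1.0, np.nan, 3.0])
print(np.isnan(data))            # [False  True False] -- an array, not a bool
print(np.isnan(data) is True)    # False, even though NaNs are present
print(np.isnan(data).any())      # True: the check that was actually intended
print(np.all(~np.isnan(data)))   # False: eat's equivalent "all finite?" check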
Re: [Numpy-discussion] sum and prod
Hi,

On Sun, Sep 9, 2012 at 12:56 AM, nicky van foreest <vanfore...@gmail.com> wrote:

> Hi,
>
> I ran the following code:
>
> args = np.array([4,8])
> print np.sum( (arg > 0) for arg in args)
> print np.sum([(arg > 0) for arg in args])
> print np.prod( (arg > 0) for arg in args)
> print np.prod([(arg > 0) for arg in args])
>
> with this result:
>
> 2
> 2
> <generator object <genexpr> at 0x1c70410>
> 1
>
> Is the difference between prod and sum intentional? I would expect that numpy.prod would also work on a generator, just like numpy.sum.
>
> BTW: the last line does what I need: the product over the truth values of all elements of args. Is there perhaps a nicer (more concise) way to achieve this? Thanks.

Can't see why someone would write code like the above, but anyway:

In []: args = np.array([4,8])
In []: print np.sum( (arg > 0) for arg in args)
2
In []: print np.sum([(arg > 0) for arg in args])
2
In []: print np.prod( (arg > 0) for arg in args)
<generator object <genexpr> at 0x062BDA08>
In []: print np.prod([(arg > 0) for arg in args])
1
In []: print np.prod( (arg > 0) for arg in args).next()
True
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

My 2 cents,
-eat

> Nicky
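For the actual question at the end (the product over the truth values), a more concise, generator-free spelling, assuming the intent is "are all elements positive":

import numpy as np

args = np.array([4, 8])
print((args > 0).all())        # True: all elements positive
print(np.all(args > 0))        # same thing, functional spelling
print((args > 0).prod())       # 1: the literal product of the truth values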
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Hi,

On Fri, Aug 3, 2012 at 7:02 PM, Angus McMorland <amcm...@gmail.com> wrote:

> On 3 August 2012 11:18, Jim Vickroy <jim.vick...@noaa.gov> wrote:
> > Hello everyone,
> >
> > I'm trying to determine the 2 greatest values, in a 3-d array, along one axis. Here is an approach:
> >
> > # procedure to determine greatest 2 values for 3rd dimension of 3-d array ...
> > import numpy, numpy.ma
> > xcnt, ycnt, zcnt = 2,3,4  # actual case is (1024, 1024, 8)
> > p0 = numpy.empty ((xcnt,ycnt,zcnt))
> > for z in range (zcnt):
> >     p0[:,:,z] = z*z
> > zaxis = 2  # max values to be determined for 3rd axis
> > p0max = numpy.max (p0, axis=zaxis)  # max values for zaxis
> > maxindices = numpy.argmax (p0, axis=zaxis)  # indices of max values
> > p1 = p0.copy()  # work array to scan for 2nd highest values
> > j, i = numpy.meshgrid (numpy.arange (ycnt), numpy.arange (xcnt))
> > p1[i,j,maxindices] = numpy.NaN  # flag all max values
> > p1 = numpy.ma.masked_where (numpy.isnan (p1), p1)  # hide all max values
> > p1max = numpy.max (p1, axis=zaxis)  # 2nd highest values for zaxis
> > # additional code to analyze p0max and p1max goes here
> >
> > I would appreciate feedback on a simpler approach -- e.g., one that does not require masked arrays and/or use of magic values like NaN.
> > Thanks, -- jv
>
> Here's a way that only uses argsort and fancy indexing:
>
> a = np.random.randint(10, size=(3,3,3))
> print a
> [[[0 3 8]
>   [4 2 8]
>   [8 6 3]]
>  [[0 6 7]
>   [0 3 9]
>   [0 9 1]]
>  [[7 9 7]
>   [5 2 9]
>   [9 3 3]]]
>
> am = a.argsort(axis=2)
> maxs = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-1]]
> print maxs
> [[8 8 8]
>  [7 9 9]
>  [9 9 9]]
>
> seconds = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-2]]
> print seconds
> [[3 4 6]
>  [6 3 1]
>  [7 5 3]]
>
> And to double check:
>
> i, j = 0, 1
> l = a[i, j,:]
> print l
> [4 2 8]
> print np.max(a[i,j,:]), maxs[i,j]
> 8 8
> print l[np.argsort(l)][-2], seconds[i,j]
> 4 4
>
> Good luck.

Here the np.indices function may help a little bit, like:

In []: a= randint(10, size= (3, 2, 4))
In []: a
Out[]:
array([[[1, 9, 6, 6],
        [0, 3, 4, 2]],
       [[4, 2, 4, 4],
        [5, 9, 4, 4]],
       [[6, 1, 4, 3],
        [5, 4, 5, 5]]])
In []: ndx= indices(a.shape)
In []: # largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-1]
Out[]:
array([[6, 9, 6, 6],
       [5, 9, 5, 5]])
In []: # second largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-2]
Out[]:
array([[4, 2, 4, 4],
       [5, 4, 4, 4]])

My 2 cents,
-eat

> Angus.
> --
> AJC McMorland
> Post-doctoral research fellow
> Neurobiology, University of Pittsburgh
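On newer NumPy (1.8+, after this thread), np.partition gives the top two along an axis without a full sort; a minimal sketch (with kth = n-2, exactly one element lies above the partition index, so it must be the maximum):

import numpy as np

a = np.random.randint(10, size=(3, 2, 4))
top2 = np.partition(a, a.shape[0] - 2, axis=0)[-2:]
largest, second = top2[-1], top2[-2]
# sanity check against a full sort
s = np.sort(a, axis=0)
print(np.array_equal(largest, s[-1]), np.array_equal(second, s[-2]))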
Re: [Numpy-discussion] Reordering 2 dimensional array by column
Hi,

On Thu, Aug 2, 2012 at 3:43 PM, Nicole Stoffels <nicole.stoff...@forwind.de> wrote:

> Dear all,
>
> I have a two-dimensional array:
>
> a = array([[1,2,3],[0,2,1],[5,7,8]])
>
> I want to reorder it by the last column in descending order, so that I get:
>
> b = array([[5, 7, 8],[1, 2, 3],[0, 2, 1]])

Perhaps something along the lines of:

In []: a
Out[]:
array([[1, 2, 3],
       [0, 2, 1],
       [5, 7, 8]])
In []: ndx= a[:, 2].argsort()
In []: a[ndx[::-1], :]
Out[]:
array([[5, 7, 8],
       [1, 2, 3],
       [0, 2, 1]])

> What I did first is the following, which reorders the array in ascending order (I found that method on the internet):
>
> b = array(sorted(a, key=lambda new_entry: new_entry[2]))
> b = array([[0, 2, 1],[1, 2, 3],[5, 7, 8]])
>
> But I want it just the other way around. So I did the following afterwards, which results in an array only containing zeros:
>
> b_indices = b.argsort()
> b_matrix = b[b_indices[::-1]]
> new_b = b_matrix[len(b_matrix)-1]
>
> Is there an easy way to reorder it? Or is there at least a complicated way which produces the right output? I hope you can help me! Thanks!

My 2 cents,
-eat

> Best regards,
> Nicole
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 10:23 AM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/30 eat <e.antero.ta...@gmail.com>:
> > [... the original question and the proportions(.) answer snipped; they are quoted in full in the last message of this thread ...]
>
> Hi,
> thank you very much for your suggestions!
>
> do I understand it correctly, that I have to special-case the function for distance = 0 (which should return the matches themselves without recalculation)?

Yes.

> However, more importantly, I am getting a ValueError for some larger (but not completely unreasonable) distance:
>
> proportions(matches, distance= 8190)
> Traceback (most recent call last):
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 5:01 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/31 eat <e.antero.ta...@gmail.com>:
> > [... the earlier exchange snipped; it is quoted in full in the surrounding messages of this thread ...]
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 6:43 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Tue, Jul 31, 2012 at 2:23 PM, eat <e.antero.ta...@gmail.com> wrote:
> > Apparently ast(.) does not return a view of the original matches, rather a copy of size (n* (2* distance+ 1)), thus you may run out of memory.
>
> The problem isn't memory, it's that on 32-bit Python, np.prod(arr.shape) must be < 2**32 (or maybe 2**31 -- something like that). I think this is what the traceback is indicating. Normally you wouldn't be creating such arrays anyway because they would be too big to fit into memory, so this problem isn't observed, but when you're using stride_tricks then it's very easy to create arrays that use only a small amount of memory but that have very large shapes.

But in this specific case the .nbytes attribute indicates that a huge amount of memory is used. So I guess stride_tricks(.) is not returning a view.

> Solution: don't buy more memory, just use a 64-bit Python, where the limit is 2**64 (or 2**63 or whatever).

Regards,
-eat

> -n
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 7:30 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Tue, Jul 31, 2012 at 4:57 PM, eat <e.antero.ta...@gmail.com> wrote:
> > But in this specific case the .nbytes attribute indicates that a huge amount of memory is used. So I guess stride_tricks(.) is not returning a view.
>
> No, .nbytes is lying to you -- it just returns np.prod(arr.shape) * arr.dtype.itemsize. It isn't smart enough to realize that you have wacky strides that cause the same memory region to be referenced by many different array elements.

Aha, very good to know. Thanks,
-eat

> -n
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 7:20 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/31 eat <e.antero.ta...@gmail.com>:
> > [... the earlier exchange snipped; it is quoted in full in the surrounding messages of this thread ...]
Re: [Numpy-discussion] array slicing questions
Hi,

A partial answer to your questions:

On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> Hi all,
> I'd like to ask for some hints or advice regarding the usage of numpy.array and especially slicing. I only recently tried numpy and was impressed by the speedup in some parts of the code, hence I suspect that I might miss some other opportunities in this area.
>
> I currently use the following code for a simple visualisation of the search matches within the text; the arrays are generally much larger than the sample - the text size is generally hundreds of kilobytes up to a few MB - with an index position for each character.
>
> First there is a list of spans (obtained from the regex match objects); the respective character indices in between these slices should be set to 1:
>
> >>> import numpy
> >>> characters_matches = numpy.zeros(10)
> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >>> for start, stop in matches_spans:
> ...     characters_matches[start:stop] = 1
> ...
> >>> characters_matches
> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
>
> Is there maybe a way to achieve this in a numpy-only way - without the python loop? (I got the impression the powerful slicing capabilities could make it possible, but haven't found this kind of solution.)
>
> In the next piece of code all the character positions are evaluated with their neighbourhood and a kind of running proportion of the matched text parts is computed (the check_distance could generally be up to the order of half the text length, usually less):
>
> >>> check_distance = 1
> >>> floating_checks_proportions = []
> >>> for i in numpy.arange(len(characters_matches)):
> ...     lo = i - check_distance
> ...     if lo < 0:
> ...         lo = None
> ...     hi = i + check_distance + 1
> ...     checked_sublist = characters_matches[lo:hi]
> ...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
> ...     floating_checks_proportions.append(proportion)
> ...
> >>> floating_checks_proportions
> [0.0, 0.3333333333333333, 0.6666666666666666, 0.6666666666666666, 0.6666666666666666, 0.6666666666666666, 1.0, 1.0, 0.6666666666666666, 0.3333333333333333]

Define a function for proportions:

from numpy import r_
from numpy.lib.stride_tricks import as_strided as ast

def proportions(matches, distance= 1):
    cd, cd2p1, s= distance, 2* distance+ 1, matches.strides[0]
    # pad
    m= r_[[0.]* cd, matches, [0.]* cd]
    # create a suitable view
    m= ast(m, shape= (m.shape[0], cd2p1), strides= (s, s))
    # average
    return m[:-2* cd].sum(1)/ cd2p1

and use it like:

In []: matches
Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
In []: proportions(matches).round(2)
Out[]: array([ 0.  ,  0.33,  0.67,  0.67,  0.67,  0.67,  1.  ,  1.  ,  0.67,  0.33])
In []: proportions(matches, 5).round(2)
Out[]: array([ 0.27,  0.36,  0.45,  0.55,  0.55,  0.55,  0.55,  0.55,  0.45,  0.36])

> I'd like to ask about the possible better approaches, as it doesn't look very elegant to me, and I obviously don't know the implications or possible drawbacks of numpy arrays in some scenarios.
>
> The pattern "for i in range(len(...)):" is usually considered inadequate in python, but what should be used in this case as the indices are primarily needed? Is something to be gained or lost using (x)range or np.arange, as the python loop is (probably?) inevitable anyway?

Here np.arange(.) will create a new array, potentially wasting memory if it's not otherwise used. IMO nothing wrong with looping over xrange(.) (if you really need to loop ;).

> Is there some more elegant way to check for the underflowing lower bound lo to replace with None?
>
> Is it significant which container is used to collect the results of the computation in the python loop - i.e. python list or a numpy array? (Could possibly matplotlib cooperate better with either container?)
>
> And of course, are there maybe other things which should be made better/differently? (using Numpy 1.6.2, python 2.7.3, win XP)

My 2 cents,
-eat

> Thanks in advance for any hints or suggestions,
> regards,
> Vlastimil Brom
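As a stride-tricks-free alternative (my addition, not from the thread): the running proportion is just a moving average, which np.convolve computes without the huge-shape view that bites on 32-bit Python later in this thread:

import numpy as np

def proportions_conv(matches, distance=1):
    # moving average over a window of 2*distance+1, zero-padded at the ends
    w = 2 * distance + 1
    return np.convolve(matches, np.ones(w) / w, mode='same')

matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
print(proportions_conv(matches).round(2))
# [ 0.    0.33  0.67  0.67  0.67  0.67  1.    1.    0.67  0.33]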
Re: [Numpy-discussion] np.unique for one bi-dimensional array
Hi,

On Wed, Jul 25, 2012 at 1:09 AM, Giuseppe Amatulli <giuseppe.amatu...@gmail.com> wrote:

> Hi,
> I would like to identify unique pairs of numbers in two arrays, or in one bi-dimensional array, and count the observations:
>
> a_clean=array([4,4,5,4,4,4])
> b_clean=array([3,5,4,4,3,4])
>
> and obtain
> (4,3,2)
> (4,5,1)
> (5,4,1)
> (4,4,2)
>
> I solved it with two loops, but of course there will be a faster solution. Any idea? What about using np.unique for one bi-dimensional array?

According to the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) np.unique will flatten the input to a 1D array. However, perhaps something like the following lines will help you:

In []: lot= zip(a_clean, b_clean)
In []: lot
Out[]: [(4, 3), (4, 5), (5, 4), (4, 4), (4, 3), (4, 4)]
In []: [[x, lot.count(x)] for x in set(lot)]
Out[]: [[(4, 5), 1], [(5, 4), 1], [(4, 4), 2], [(4, 3), 2]]

My 2 cents,
-eat

> In bash I usually use the uniq command.
> thanks in advance
> Giuseppe
>
> --
> Giuseppe Amatulli
> Web: www.spatial-ecology.net
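For larger inputs, a vectorized sketch (my addition; the encoding trick assumes non-negative integers, and return_counts needs NumPy 1.9+): encode each pair into a single integer and let np.unique do the counting:

import numpy as np

a_clean = np.array([4, 4, 5, 4, 4, 4])
b_clean = np.array([3, 5, 4, 4, 3, 4])

# encode (a, b) -> a*base + b so each pair becomes one plain integer
base = b_clean.max() + 1
codes = a_clean * base + b_clean
uniq, counts = np.unique(codes, return_counts=True)

for code, n in zip(uniq, counts):
    print(code // base, code % base, n)
# (4, 3) 2; (4, 4) 2; (4, 5) 1; (5, 4) 1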
Re: [Numpy-discussion] Problems understanding histogram2d
Hi,

On Fri, Jul 20, 2012 at 6:42 PM, yogesh karpate <yogeshkarp...@gmail.com> wrote:

> I think since it's a joint histogram, you need to have an equal no. of data points and bins in both x and y.

Makes sense that the numbers of elements of the data points (x, y) must be equal. Perhaps documentation like http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.histogram2d.html could make this aspect clearer. Especially confusing is the signature "x : array_like, shape(N,)" and "y : array_like, shape(M,)", which may indicate that N != M could be a feasible case. It would be better to simply state that x and y must be one-dimensional and of equal length.

My 2 cents,
-eat

> On Fri, Jul 20, 2012 at 5:11 PM, Andreas Hilboll <li...@hilboll.de> wrote:
> > Hi,
> >
> > I have a problem using histogram2d:
> >
> > from numpy import linspace, histogram2d
> > bins_x = linspace(-180., 180., 360)
> > bins_y = linspace(-90., 90., 180)
> > data_x = linspace(-179.96875, 179.96875, 5760)
> > data_y = linspace(-89.96875, 89.96875, 2880)
> > histogram2d(data_x, data_y, (bins_x, bins_y))
> >
> > AttributeError: The dimension of bins must be equal to the dimension of the sample x.
> >
> > I would expect histogram2d to return a 2d array of shape (360,180), which is full of 256s. What am I missing here?
> >
> > Cheers, Andreas.
>
> --
> Regards
> Yogesh Karpate
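A hedged sketch of what the poster presumably wanted (my reconstruction of the intent, a full regular grid of samples): x and y must hold one coordinate per sample, so the two 1-D axes have to be expanded to per-cell pairs first:

import numpy as np

bins_x = np.linspace(-180., 180., 360)
bins_y = np.linspace(-90., 90., 180)
axis_x = np.linspace(-179.96875, 179.96875, 5760)
axis_y = np.linspace(-89.96875, 89.96875, 2880)

# one (x, y) coordinate pair per sample: both arguments get the same length
X, Y = np.meshgrid(axis_x, axis_y)
H, xe, ye = np.histogram2d(X.ravel(), Y.ravel(), bins=(bins_x, bins_y))
print(H.shape)  # (359, 179): n bin *edges* yield n-1 bins, not n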
Re: [Numpy-discussion] Good way to develop numpy as popular choice!
Hi,

On Fri, Jun 22, 2012 at 6:05 PM, Benjamin Root <ben.r...@ou.edu> wrote:

> On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant <tra...@continuum.io> wrote:
> > Accessing individual elements of NumPy arrays is slower than accessing individual elements of lists --- around 2.5x-3x slower. NumPy has to do more work to figure out what kind of indexing you are trying to do because of its flexibility. It also has to create the Python object to return. In contrast, the list approach already has the Python objects created and you are just returning pointers to them, and there is much less flexibility in the kinds of indexing you can do.
> >
> > Simple timings show that a.item(i,j) is about 2x slower than list element access (but faster than a[i,j], which is about 2.5x to 3x slower). The slowness of a.item is due to the need to create the Python object to return (there are just raw bytes there), so it gives some idea of the relative cost of each part of the slowness of a[i,j]. Also, math on the array scalars returned from NumPy will be slower than math on integers and floats --- because NumPy re-uses the ufunc machinery, which is not optimized at all for scalars.
> >
> > The take-away is that NumPy is built for doing vectorized operations on bytes of data. It is not optimized for doing element-by-element individual access. The right approach there is to just use lists (or use a version specialized for the kind of data in the lists that removes the boxing and unboxing).
> >
> > Here are my timings using IPython for NumPy indexing:
> >
> > 1-D:
> > In [2]: a = arange(100)
> > In [3]: %timeit [a.item(i) for i in xrange(100)]
> > 10000 loops, best of 3: 25.6 us per loop
> > In [4]: %timeit [a[i] for i in xrange(100)]
> > 10000 loops, best of 3: 31.8 us per loop
> > In [5]: al = a.tolist()
> > In [6]: %timeit [al[i] for i in xrange(100)]
> > 100000 loops, best of 3: 10.6 us per loop
> >
> > 2-D:
> > In [7]: a = arange(100).reshape(10,10)
> > In [8]: al = a.tolist()
> > In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
> > 100000 loops, best of 3: 18.6 us per loop
> > In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
> > 10000 loops, best of 3: 44.4 us per loop
> > In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
> > 10000 loops, best of 3: 34.2 us per loop
> >
> > -Travis
>
> However, what is the timing/memory cost of converting a large numpy array that already exists into a python list of lists? If all my processing before the munkres step is using NumPy, converting it into python lists has a cost. Also, your timings indicate only a ~2x slowdown, while the timing tests done by eat show an order-of-magnitude difference. I suspect there is great room for improvement before even starting to worry about the array access issues.

To create a list of lists from an array is quite fast, like:

In []: A= rand(500, 500)
In []: %timeit A.tolist()
100 loops, best of 3: 10.8 ms per loop

Regards,
-eat

> Cheers!
> Ben Root
Re: [Numpy-discussion] Good way to develop numpy as popular choice!
Heh,

On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern <robert.k...@gmail.com> wrote:

> On Thu, Jun 21, 2012 at 3:59 PM, bob tnur <bobtnu...@gmail.com> wrote:
> > Hi all numpy fun;) This question is already posted in stackoverflow by some people, I am just thinking that numpy python will do this with a trick;) I guess numpy will be everyone's choice as its popularity increases. The question is herein:
> > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2
>
> My numpy solution for this is just
>
> $ pip install munkres
>
> http://pypi.python.org/pypi/munkres
>
> --
> Robert Kern

munkres seems to be a pure python implementation ;-).

FWIW, there exist pure python implementations that outperform the munkres implementation by more than 200x already with a 100x100 random cost matrix, based on the shortest-path variant of the Hungarian algorithm (more details of the algorithms can be found, for example, at http://www.assignmentproblems.com/). Given how the assignment algorithms are (typically) described, it may actually be quite a tedious job to create more performant ones utilizing numpy arrays instead of lists of lists.

My 2 cents,
-eat
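For modern readers, SciPy has since gained a built-in solver for the underlying assignment problem (scipy.optimize.linear_sum_assignment, SciPy 0.17+), so neither munkres nor a hand-rolled shortest-path variant is needed; a minimal sketch:

import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.rand(100, 100)
row_ind, col_ind = linear_sum_assignment(cost)  # optimal assignment
print(cost[row_ind, col_ind].sum())             # minimal total cost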
Re: [Numpy-discussion] convert any non square matrix in to square matrix using numpy
Hi,

On Mon, Jun 18, 2012 at 6:55 PM, bob tnur <bobtnu...@gmail.com> wrote:

> Hi, how can I convert (by adding zeros) any non-square numpy matrix into a square matrix using numpy? Then how to find the minimum number in each row, except the zeros added (for making it square)? ;)

Perhaps something like this:

In []: def make_square(A):
   ..:     s= A.shape
   ..:     if s[0] < s[1]:
   ..:         return r_[A, zeros((s[1]- s[0], s[1]), dtype= A.dtype)]
   ..:     return c_[A, zeros((s[0], s[0]- s[1]), dtype= A.dtype)]
   ..:

In []: A= rand(4, 2)
In []: make_square(A)
Out[]:
array([[ 0.76109774,  0.42980812,  0.        ,  0.        ],
       [ 0.11810978,  0.59622975,  0.        ,  0.        ],
       [ 0.54991376,  0.29315485,  0.        ,  0.        ],
       [ 0.78182313,  0.3828001 ,  0.        ,  0.        ]])
In []: make_square(A.T)
Out[]:
array([[ 0.76109774,  0.11810978,  0.54991376,  0.78182313],
       [ 0.42980812,  0.59622975,  0.29315485,  0.3828001 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ]])

will help you.

My 2 cents,
-eat
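For the second part of the question (row minima ignoring the added zeros), a hedged sketch under the assumption, valid for rand(), that all genuine entries are strictly positive:

import numpy as np

A = np.random.rand(4, 2)
S = np.c_[A, np.zeros((4, 2))]   # the squared-up matrix from above

# option 1: the original width is known, so just slice the padding off
print(S[:, :2].min(axis=1))

# option 2: mask the padding when the original shape isn't at hand
# (assumes genuine entries are never exactly zero)
masked = np.where(S == 0, np.inf, S)
print(masked.min(axis=1))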
Re: [Numpy-discussion] fast access and normalizing of ndarray slices
Hi,

On Mon, Jun 4, 2012 at 12:44 AM, srean <srean.l...@gmail.com> wrote:

> Hi Wolfgang,
>
> I think you are looking for reduceat( ), in particular add.reduceat()

Indeed OP could utilize add.reduceat(...), like:

# tst.py
import numpy as np

def reduce(data, lengths):
    ind, ends= np.r_[lengths, lengths], lengths.cumsum()
    ind[::2], ind[1::2]= ends- lengths, ends
    return np.add.reduceat(np.r_[data, 0], ind)[::2]

def normalize(data, lengths):
    return data/ np.repeat(reduce(data, lengths), lengths)

def gen(par):
    lengths= np.random.randint(*par)
    return np.random.randn(lengths.sum()), lengths

if __name__ == '__main__':
    data= np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype= float)
    lengths= np.array([2, 4, 3])
    print reduce(data, lengths)
    print normalize(data, lengths).round(2)

Resulting:

In []: %run tst
[  3.  10.   6.]
[ 0.33  0.67  0.1   0.2   0.3   0.4   0.17  0.33  0.5 ]

Fast enough:

In []: data, lengths= gen([5, 15, 5e4])
In []: data.size
Out[]: 476028
In []: %timeit normalize(data, lengths)
10 loops, best of 3: 29.4 ms per loop

My 2 cents,
-eat

> -- srean
>
> On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf <wkerzend...@gmail.com> wrote:
> > Dear all,
> >
> > I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array). I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length. Example:
> >
> > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64]
> > start_pointer = [0, 2, 6]
> > length_data = [2, 4, 3]
> >
> > I now want to normalize each of the individual data sets. I wrote a simple for loop over the start_pointer and length data, grabbed the data, normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the cython way?
Re: [Numpy-discussion] why not zerodivision error?
Hi,

On Sun, May 20, 2012 at 10:21 AM, Chao YUE <chaoyue...@gmail.com> wrote:

> Dear all,
>
> could anybody give one sentence about this? Why in the loop do I not get a zerodivision error, but when I explicitly do this, I do get a zerodivision error? thanks.
>
> In [7]: for i in np.arange(-10,10):
>    ...:     print 1./i
>    ...:
> -0.1
> -0.111111111111
> -0.125
> -0.142857142857
> -0.166666666667
> -0.2
> -0.25
> -0.333333333333
> -0.5
> -1.0
> inf
> 1.0
> 0.5
> 0.333333333333
> 0.25
> 0.2
> 0.166666666667
> 0.142857142857
> 0.125
> 0.111111111111
>
> In [8]: 1/0.
> ---------------------------------------------------------------------------
> ZeroDivisionError                         Traceback (most recent call last)
> /mnt/f/data/DROUGTH/<ipython-input-8-7e0bf5b37da6> in <module>()
> ----> 1 1/0.
> ZeroDivisionError: float division by zero
>
> (In [9]: 1./0. and In [10]: 1./0 raise the same ZeroDivisionError.)

You may like to read more here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr

So, for your specific example:

In []: a= arange(-10, 10)
In []: 1./ a
Out[]:
array([-0.1       , -0.11111111, -0.125     , -0.14285714, -0.16666667,
       -0.2       , -0.25      , -0.33333333, -0.5       , -1.        ,
               inf,  1.        ,  0.5       ,  0.33333333,  0.25      ,
        0.2       ,  0.16666667,  0.14285714,  0.125     ,  0.11111111])
In []: seterr(divide= 'raise')
In []: 1./ a
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
FloatingPointError: divide by zero encountered in divide

My 2 cents,
-eat

> Chao
> --
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119, 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
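In one sentence: inside the loop the operands are numpy scalars, so division by zero follows numpy's error state (warn by default, yielding inf), whereas 1/0. is plain Python float division, which always raises. With np.errstate the behaviour can also be scoped locally instead of via the global seterr; a minimal sketch:

import numpy as np

a = np.arange(-10, 10)
print(1. / a)  # numpy: warns and produces inf at the zero

with np.errstate(divide='raise'):
    try:
        1. / a
    except FloatingPointError as e:
        print('raised:', e)
# outside the with-block, the default warn-and-inf behaviour is back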
Re: [Numpy-discussion] (no subject)
Hi,

On Fri, Apr 20, 2012 at 9:15 PM, Andre Martel <soucoupevola...@yahoo.com> wrote:

> What would be the best way to remove the maximum from a cube and collapse the remaining elements along the z-axis? For example, I want to reduce Cube to NewCube:
>
> >>> Cube
> array([[[ 13,   2,   3,  42],
>         [  5, 100,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[ 25,   4,  15,   1],
>         [ 17,  30,   9,  20],
>         [ 21,   2,  23,  24]],
>
>        [[  1,   2,  27,  28],
>         [ 29,  18,  31,  32],
>         [ -1,   3,  35,   4]]])
>
> NewCube
> array([[[ 13,   2,   3,   1],
>         [  5,  30,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[  1,   2,  15,  28],
>         [ 17,  18,   9,  20],
>         [ -1,   2,  23,   4]]])
>
> I tried with argmax() and then roll() and delete() but these all work on 1-D arrays only. Thanks.

Perhaps it would be more straightforward to process via 2D-arrays, like (with C bound to the Cube above):

In []: C_in= C.reshape(3, -1).copy()
In []: ndx= C_in.argmax(0)
In []: C_out= C_in[:2, :]
In []: C_out[:, ndx== 0]= C_in[1:, ndx== 0]
In []: C_out[1, ndx== 1]= C_in[2, ndx== 1]
In []: C_out.reshape(2, 3, 4)
Out[]:
array([[[13,  2,  3,  1],
        [ 5, 30,  7,  8],
        [ 9,  1, 11, 12]],

       [[ 1,  2, 15, 28],
        [17, 18,  9, 20],
        [-1,  2, 23,  4]]])

My 2 cents,
-eat
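A more general, order-preserving sketch (my addition, using np.moveaxis from newer NumPy) that drops the per-column maximum along any axis:

import numpy as np

def drop_max(C, axis=0):
    # move the reduction axis last so boolean selection stays contiguous
    Cm = np.moveaxis(C, axis, -1)
    keep = np.ones(Cm.shape, dtype=bool)
    # flag exactly one maximum per "column" along the (now last) axis
    idx = Cm.argmax(axis=-1)
    keep[tuple(np.indices(idx.shape)) + (idx,)] = False
    out = Cm[keep].reshape(Cm.shape[:-1] + (Cm.shape[-1] - 1,))
    return np.moveaxis(out, -1, axis)

C = np.array([[[13, 2, 3, 42], [5, 100, 7, 8], [9, 1, 11, 12]],
              [[25, 4, 15, 1], [17, 30, 9, 20], [21, 2, 23, 24]],
              [[1, 2, 27, 28], [29, 18, 31, 32], [-1, 3, 35, 4]]])
print(drop_max(C))  # reproduces NewCube above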
Re: [Numpy-discussion] problem with vectorized difference equation
Hi, On Sat, Apr 7, 2012 at 1:27 AM, Sameer Grover sameer.grove...@gmail.comwrote: On Saturday 07 April 2012 02:51 AM, Francesco Barale wrote: Hello Sameer, Thank you very much for your reply. My goal was to try to speed up the loop describing the accumulator. In the (excellent) book I was mentioning in my initial post I could find one example that seemed to match what I was trying to do. Basically, it is said that a loop of the following kind: n = size(u)-1 for i in xrange(1,n,1): u_new[i] = u[i-1] + u[i] + u[i+1] should be equivalent to: u[1:n] = u[0:n-1] + u[1:n] + u[i+1] This example is correct. What I was trying to point out was that the single expression y[1:n] = y[0:n-1] + u[1:n] will iterate over the array but will not accumulate. It will add y[0:n-1] to u[1:n] and assign the result to y[1:n]. For example, y = [1,2,3,4] u = [5,6,7,8] Then y[0:n-1] = [1,2,3] and u[1:n]=[6,7,8] The statement y[1:n] = y[0:n-1] + u[1:n] implies y[1:n] = [6+1,7+2,8+3] = [7,9,11] yielding y = [1,7,9,11] FWIIFO, if assignment in loop like this ever makes any sense (which I doubt), whereas the code: for i in range(0,n-1): y[i+1] = y[i] + u[i+1] then it can be captured in a function, like In []: def s0(y, u): ..: yn= y.copy() ..: for k in xrange(y.size- 1): ..: yn[k+ 1]= yn[k]+ u[k+ 1] ..: return yn ..: and now this function can be easily transformed to utilize cumsum, like In []: def s1(y, u): ..: un= u.copy() ..: un[0]= y[0] ..: return cumsum(un) ..: thus In []: y, u= rand(1e5), rand(1e5) In []: allclose(s0(y, u), s1(y, u)) Out[]: True and definitely this transformation will outperform a plain python loop In []: timeit s0(y, u) 10 loops, best of 3: 122 ms per loop In []: timeit s1(y, u) 100 loops, best of 3: 2.16 ms per loop In []: 122/ 2.16 Out[]: 56.48148148148148 My 2 cents, -eat will accumulate and give y = [1,7,14,22] Sameer Am I missing something? Regards, Francesco Sameer Grover wrote: On Saturday 07 April 2012 12:14 AM, francesco82 wrote: Hello everyone, After reading the very good post http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html and the book by H. P. Langtangen 'Python scripting for computational science' I was trying to speed up the execution of a loop on numpy arrays being used to describe a simple difference equation. The actual code I am working on involves some more complicated equations, but I am having the same exact behavior as described below. To test the improvement in speed I wrote the following in vect_try.py: #!/usr/bin/python import numpy as np import matplotlib.pyplot as plt dt = 0.02 #time step time = np.arange(0,2,dt) #time array u = np.sin(2*np.pi*time) #input signal array def vect_int(u,y): #vectorized function n = u.size y[1:n] = y[0:n-1] + u[1:n] return y def sc_int(u,y): #scalar function y = y + u return y def calc_vect(u, func=vect_int): out = np.zeros(u.size) for i in xrange(u.size): out = func(u,out) return out def calc_sc(u, func=sc_int): out = np.zeros(u.size) for i in xrange(u.size-1): out[i+1] = sc_int(u[i+1],out[i]) return out To verify the execution time I've used the timeit function in Ipython: import vect_try as vt timeit vt.calc_vect(vt.u) -- 1000 loops, best of 3: 494 us per loop timeit vt.calc_sc(vt.u) --1 loops, best of 3: 92.8 us per loop As you can see the scalar implementation looping one item at the time (calc_sc) is 494/92.8~5.3 times faster than the vectorized one (calc_vect). 
My problem consists in the fact that I need to iterate the execution of calc_vect in order for it to operate on all the elements of the input array. If I execute calc_vect only once, it will only operate on the first slice of the vectors, leaving the rest untouched. My understanding was that the vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate over the whole array, but that's not happening for me. Can anyone tell me what I am doing wrong? Thanks! Francesco 1. By vectorizing, we mean replacing a loop with a single expression. In your program, both the scalar and vector implementations (calc_vect and calc_sc) have a loop each. This isn't going to make anything faster. The vectorized implementation is just a convoluted way of achieving the same result and is slower. 2. The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the following loop for i in range(0,n-1): y[i+1] = y[i] + u[i+1] It is equivalent to something like z = np.zeros(n-1) for i in range(0,n-1): z[i] = y[i] + u[i+1] y[1:n] = z i.e., the RHS is computed in totality and then assigned to the LHS. This is how array operations work even when the same array appears on both sides of the assignment.
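To make the accumulator point concrete, a self-contained sketch of the loop and its cumsum equivalent (names s0/s1 as in the reply above):

import numpy as np

def s0(y, u):
    # plain loop accumulator: y[i+1] = y[i] + u[i+1]
    yn = y.copy()
    for k in range(y.size - 1):
        yn[k + 1] = yn[k] + u[k + 1]
    return yn

def s1(y, u):
    # vectorized equivalent: seed the first element, then cumulative sum
    un = u.copy()
    un[0] = y[0]
    return np.cumsum(un)

y, u = np.random.rand(100000), np.random.rand(100000)
assert np.allclose(s0(y, u), s1(y, u))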
Re: [Numpy-discussion] Numpy Memory Error with corrcoef
Hi, On Tue, Mar 27, 2012 at 12:12 PM, Nicole Stoffels nicole.stoff...@forwind.de wrote: ** Dear all, I get the following memory error while running my program: *Traceback (most recent call last): File /home/nistl/Software/Netzbetreiber/FLOW/src/MemoryError_Debug.py, line 9, in module correlation = corrcoef(data_records) File /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py, line 1992, in corrcoef c = cov(x, y, rowvar, bias) File /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py, line 1973, in cov return (dot(X, X.T.conj()) / fact).squeeze() MemoryError* Here an easy example how to reproduce the error: *#!/usr/bin/env python2.7 from pybinsel import open from numpy import * if __name__ == '__main__': data_records = random.random((459375, 24)) correlation = corrcoef(data_records) *My real data has the same dimension. Is this a size problem of the array or did I simply make a mistake in the application of corrcoef? I hope that you can help me! Thanks! As other ones has explained this approach yields an enormous matrix. However, if I have understood your problem correctly you could implement a helper class to iterate over all of your observations. Something like along the lines (although it will take hours? with your data size) to iterate over all correlations: A helper class for correlations between observations. import numpy as np class Correlations(object): def __init__(self, data): self.m= data.shape[0] # compatible with corrcoef self.scale= self.m- 1 self.data= data- np.mean(data, 1)[:, None] # but you may actually need to scale and translate data # more application speficic manner self.var= (self.data** 2.).sum(1)/ self.scale def obs_kth(self, k): c= np.dot(self.data, self.data[k])/ self.scale return c/ (self.var[k]* self.var)** .5 def obs_iterate(self): for k in xrange(self.m): yield self.obs_kth(k) if __name__ == '__main__': data= np.random.randn(5, 3) print np.corrcoef(data).round(3) print c= Correlations(data) print np.array([p for p in c.obs_iterate()]).round(3) My 2 cents, -eat Best regards, Nicole Stoffels -- Dipl.-Met. Nicole Stoffels Wind Power Forecasting and Simulation ForWind - Center for Wind Energy Research Institute of Physics Carl von Ossietzky University Oldenburg Ammerländer Heerstr. 136 D-26129 Oldenburg Tel: +49(0)441 798 - 5079 Fax: +49(0)441 798 - 5099 Web : www.ForWind.de Email: nicole.stoff...@forwind.de ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
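In the same spirit, a sketch that bounds memory by producing the correlation rows block by block (corr_rows_chunked is a hypothetical helper and the block size a tunable assumption):

import numpy as np

def corr_rows_chunked(data, chunk=1000):
    # rows of corrcoef(data), yielded in blocks instead of
    # materializing the full m-by-m correlation matrix at once
    d = data - data.mean(axis=1)[:, None]
    norm = np.sqrt((d ** 2).sum(axis=1))
    for start in range(0, d.shape[0], chunk):
        block = np.dot(d[start:start + chunk], d.T)
        yield block / np.outer(norm[start:start + chunk], norm)

Each yielded block is the corresponding chunk of rows of np.corrcoef(data).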
Re: [Numpy-discussion] Want to eliminate direct for-loop
Hi, On Sat, Feb 11, 2012 at 10:56 PM, Dinesh B Vadhia dineshbvad...@hotmail.com wrote: ** Could the following be written without the direct for-loop? import numpy # numpy vector r of any data type and length, eg. r = numpy.ones(25, dtype='int') # s is a list of values (of any data type), eg. s = [47, 27, 67] # c is a list of (variable length) lists where the sub-list elements are index values of r and len(s) = len(c), eg. c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21 ]] # for each element in each sub-list c, add corresponding s value to the index value in r, eg. for i, j in enumerate(c): r[j] += s[i] So, we get: r[[3, 6, 9]] += s[0] = 1 + 47 = 48 r[[6, 11, 19, 24]] += s[1] = 1 + 27 = 28 r[[4, 9, 11, 21]] += s[2] = 1 + 67 = 68 ie. r = array([ 1, 1, 1, 95, 68, 1, 122, 1, 1, 162, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) Thank-you! Could you describe more detailed manner about why you want to get rid of that loop? Performance wise? If so, do you have profiled what's the bottleneck? Please provide also a more detailed description of your problem, since now your current spec seems to yield: r= array([ 1, 1, 1, 48, 68, 1, 75, 1, 1, 115, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) My 2 cents, -eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
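For the record, later NumPy releases grew an unbuffered in-place operation, np.add.at, which does accumulate over repeated indices; a sketch with the data above:

import numpy as np

r = np.ones(25, dtype=int)
s = [47, 27, 67]
c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21]]

idx = np.concatenate(c)
vals = np.repeat(s, [len(ci) for ci in c])
np.add.at(r, idx, vals)   # repeated indices (6, 9, 11) all accumulate

This yields r[6] == 1 + 47 + 27 and r[9] == 1 + 47 + 67, i.e. the accumulating behaviour that plain fancy-indexed += does not provide.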
Re: [Numpy-discussion] numpy.arange() error?
Hi, On Thu, Feb 9, 2012 at 9:47 PM, Eric Firing efir...@hawaii.edu wrote: On 02/09/2012 09:20 AM, Drew Frank wrote: Eric Firingefiringat hawaii.edu writes: On 02/08/2012 09:31 PM, teomat wrote: Hi, Am I wrong or the numpy.arange() function is not correct 100%? Try to do this: In [7]: len(np.arange(3.1, 4.9, 0.1)) Out[7]: 18 In [8]: len(np.arange(8.1, 9.9, 0.1)) Out[8]: 19 I would expect the same result for each command. Not after more experience with the wonders of floating point! Nice-looking decimal numbers often have long, drawn-out, inexact floating point (base 2) representations. That leads to exactly this sort of problem. numpy.linspace is provided to help get around some of these surprises; or you can use an integer sequence and then scale and shift it. Eric All the best I also found this surprising -- not because I lack experience with floating point, but because I do have experience with MATLAB. In MATLAB, the corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit tolerance parameter used in the implmentation ( http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96 ). Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB behavior in this case -- even though it has a bit of a magic feel to it, I find it hard to imagine code that operates correctly given the Python semantics and incorrectly under MATLAB's. Thoughts? You raise a good point. Neither arange nor linspace provides a close equivalent to the nice behavior of the Matlab colon, even though that is often what one really wants. Adding this, either via an arange kwarg, a linspace kwarg, or a new function, seems like a good idea. Maybe this issue is raised also earlier, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? My 2 cents, eat Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
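A sketch of the integer-based recipe suggested above: derive the count from integers, then scale and shift (or hand linspace an explicit count):

import numpy as np

# compute the number of steps first, so no float-step surprises remain
n = round((4.9 - 3.1) / 0.1)        # 18 steps
x = 3.1 + 0.1 * np.arange(n + 1)    # 19 points, endpoint included
# equivalently: np.linspace(3.1, 4.9, n + 1)
len(x)                              # 19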
Re: [Numpy-discussion] avoiding loops when downsampling arrays
Hi This is elegant and very fast as well! On Tue, Feb 7, 2012 at 2:57 PM, Sturla Molden stu...@molden.no wrote: On 06.02.2012 22:27, Sturla Molden wrote: # Make a 4D view of this data, such that b[i,j] # is a 2D block with shape (4,4) (e.g. b[0,0] is # the same as a[:4, :4]). b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], a.strides[1])) Yes :-) Being used to Fortran (and also MATLAB), this is the kind of mapping it never occurs to me to think about. What else but NumPy is flexible enough to do this? :-) Actually, using as_strided is not needed. We can just reshape like this: (m,n) -> (m//4, 4, n//4, 4) and then use np.any along the two length-4 dimensions. m,n = data.shape cond = lamda x : (x >= t1) & (x <= t2) I guess you meant here cond = lambda x: (x >= t1) & (x <= t2) x = cond(data).reshape((m//4, 4, n//4, 4)) found = np.any(np.any(x, axis=1), axis=2) Regards, eat Sturla
Re: [Numpy-discussion] avoiding loops when downsampling arrays
Hi, On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) catherine.m.moro...@jpl.nasa.gov wrote: Hello, I have to write a code to downsample an array in a specific way, and I am hoping that somebody can tell me how to do this without the nested do-loops. Here is the problem statement: Segment a (MxN) array into 4x4 squares and set a flag if any of the pixels in that 4x4 square meet a certain condition. Here is the code that I want to rewrite avoiding loops:

shape_out = (data_in.shape[0]/4, data_in.shape[1]/4)
found = numpy.zeros(shape_out).astype(numpy.bool)
for i in xrange(0, shape_out[0]):
    for j in xrange(0, shape_out[1]):
        excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4]
        mask = numpy.where((excerpt >= t1) & (excerpt <= t2), True, False)
        if (numpy.any(mask)):
            found[i,j] = True

Thank you for any hints and education! Catherine

Following Warren's answer closely, a slight demonstration with code like this:

import numpy as np

def ds_0(data_in, t1= 1, t2= 4):
    shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4)
    found= np.zeros(shape_out).astype(np.bool)
    for i in xrange(0, shape_out[0]):
        for j in xrange(0, shape_out[1]):
            excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4]
            mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
            if (np.any(mask)):
                found[i, j]= True
    return found

# with stride_tricks you may cook up something like this:
from numpy.lib.stride_tricks import as_strided as ast

def _ss(dt, ds, s):
    return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s,
            'strides': (s[0]* dt[0], s[1]* dt[1])+ dt}

def _view(D, shape= (4, 4)):
    return ast(D, **_ss(D.strides, D.shape, shape))

def ds_1(data_in, t1= 1, t2= 4):
    excerpt= _view(data_in)
    mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
    return mask.sum(2).sum(2).astype(np.bool)

if __name__ == '__main__':
    from numpy.random import randint
    r= randint(777, size= (64, 288)); print r
    print np.allclose(ds_0(r), ds_1(r))

and when run, it will yield like:

In []: run dsa
[[ 60 470 521 ..., 147 435 295]
 [246 127 662 ..., 718 525 256]
 [354 384 205 ..., 225 364 239]
 ...,
 [277 428 201 ..., 460 282 433]
 [ 27 407 130 ..., 245 346 309]
 [649 157 153 ..., 316 613 570]]
True

and compared performance-wise:

In []: %timeit ds_0(r)
10 loops, best of 3: 56.3 ms per loop
In []: %timeit ds_1(r)
100 loops, best of 3: 2.17 ms per loop

My 2 cents, eat
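With newer NumPy, where reductions accept axis tuples, the reshape route needs no stride tricks at all; a sketch assuming both dimensions are divisible by 4:

import numpy as np

def ds_2(data_in, t1=1, t2=4):
    m, n = data_in.shape
    blocks = ((data_in >= t1) & (data_in <= t2)).reshape(m // 4, 4, n // 4, 4)
    return blocks.any(axis=(1, 3))   # any hit within each 4x4 block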
Re: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug?
On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Jan 28, 2012 at 11:15 AM, eat e.antero.ta...@gmail.com wrote: Hi, Short demonstration of the issue: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: from numpy.polynomial import Polynomial as Poly In []: def p_tst(c): ..: p= Poly(c) ..: r= p.roots() ..: return sort(abs(p(r))) ..: Now I would expect a result more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) be the case, but actually most result seems to be more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) In []: p_tst(randn(123))[-3:] Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) So, does this phenomena imply that - I'm testing with too high order polynomials (if so, does there exists a definite upper limit of polynomial order I'll not face this issue) or - it's just the 'nature' of computations with float values (if so, probably I should be able to tackle this regardless of the polynomial order) or - it's a nasty bug in class Polynomial It's a defect. You will get all the roots and the number will equal the degree. I haven't decided what the best way to deal with this is, but my thoughts have trended towards specifying an interval with the default being the domain. If you have other thoughts I'd be glad for the feedback. For the problem at hand, note first that you are specifying the coefficients, not the roots as was the case with poly1d. Second, as a rule of thumb, plain old polynomials will generally only be good for degree 22 due to being numerically ill conditioned. If you are really looking to use high degrees, Chebyshev or Legendre will work better, although you will probably need to explicitly specify the domain. If you want to specify the polynomial using roots, do Poly.fromroots(...). Third, for the high degrees you are probably screwed anyway for degree 123, since the accuracy of the root finding will be limited, especially for roots that can cluster, and any root that falls even a little bit outside the interval [-1,1] (the default domain) is going to evaluate to a big number simply because the polynomial is going to h*ll at a rate you wouldn't believe ;) For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things look good for degree 50, get a bit loose at degree 75 but can be fixed up with one iteration of Newton, and blow up at degree 100. I think that's pretty good, actually, doing better would require a lot more work. There are some zero finding algorithms out there that might do better if someone wants to give it a shot. 
In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) In [21]: sort(abs(p(p.roots()))) Out[21]: array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, 2.85411753e-21, 2.85411753e-21]) Thanks for the very informative feedback. I'll study those orthogonal polynomials in more detail. Regards, - eat Chuck
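For reference, the degree-50 experiment above as a self-contained sketch:

import numpy as np
from numpy.polynomial import Chebyshev as Cheb

p = Cheb.fromroots(np.linspace(-1, 1, 50))   # evenly spaced roots in [-1, 1]
r = p.roots()
print(np.sort(np.abs(p(r)))[-3:])            # residuals stay at rounding level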
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi, Oddly, numpy 1.6 seems to behave in a more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.74718 In []: d.mean(dtype= np.float32) Out[]: 3045.74718 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.74724 In []: d.mean(axis= 1).mean() Out[]: 3045.74724 Or do the results of the calculations depend more on the platform? My 2 cents, eat
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: ** I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]: 4034.16357421875 In [24]: np.version.full_version Out[24]: '2.0.0.dev-55472ca' But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: a = np.ones((1024,1024),dtype=np.float32) a.mean() 4000.0 np.version.full_version '1.6.1' This indeed looks very nasty, regardless of whether it is a version or platform related problem. -eat On Tue, 2012-01-24 at 17:12 -0600, eat wrote: Hi, Oddly, but numpy 1.6 seems to behave more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.74718 In []: d.mean(dtype= np.float32) Out[]: 3045.74718 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.74724 In []: d.mean(axis= 1).mean() Out[]: 3045.74724 Or does the results of calculations depend more on the platform? My 2 cents, eat -- -- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
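For reference, the fix implicit in the sessions above is to request a wider accumulator; a minimal sketch (newer NumPy also sums pairwise, which hides the drift in the default case):

import numpy as np

a = np.full((1024, 1024), 4000, dtype=np.float32)
a.mean(dtype=np.float64)   # accumulate in double precision: exactly 4000.0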
[Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seem to become overly complicated unless I'm able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort')). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Thanks, eat
Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, On Tue, Dec 20, 2011 at 2:33 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 6:27 PM, eat e.antero.ta...@gmail.com wrote: Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seems to be too over complicated unless I'm not able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort'). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Just to understand this: Is the return_index currently always the first occurrence or random? No, for current implementation it's not always the first occurrence returned. AFAIK, the only stable algorithm to provide this is ' mergesort' and that's why I'll like to have a keyword 'kind' to propagate down to then internals. I haven't found a use for return_index yet (but use return_inverse a lot), but if we are guaranteed to get the first instance, then this could be very useful. I think that 'return_inverse' will suffer of the same nondeterministic behavior as well. Thanks, eat Josef Thanks, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, On Tue, Dec 20, 2011 at 3:41 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 8:16 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Tue, Dec 20, 2011 at 2:33 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 6:27 PM, eat e.antero.ta...@gmail.com wrote: Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seems to be too over complicated unless I'm not able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort'). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Just to understand this: Is the return_index currently always the first occurrence or random? No, for current implementation it's not always the first occurrence returned. AFAIK, the only stable algorithm to provide this is ' mergesort' and that's why I'll like to have a keyword 'kind' to propagate down to then internals. Thanks, then I'm all in favor of mergesort. I haven't found a use for return_index yet (but use return_inverse a lot), but if we are guaranteed to get the first instance, then this could be very useful. I think that 'return_inverse' will suffer of the same nondeterministic behavior as well. I don't think so, because there is a unique mapping from observations to unique items. But (source code of 1.6.1) indicates that keywords 'return_inverse' are 'return_index' are related, indeed. Just my 2 cents eat Josef Thanks, eat Josef Thanks, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
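Meanwhile, the first-occurrence behaviour the keyword would give can be sketched on top of the pieces that already exist (unique_first_index is a hypothetical helper; the stable mergesort guarantees the earliest index survives):

import numpy as np

def unique_first_index(a):
    order = np.argsort(a, kind='mergesort')    # stable sort
    sa = a[order]
    flag = np.concatenate(([True], sa[1:] != sa[:-1]))
    return sa[flag], order[flag]               # values, first-occurrence indices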
Re: [Numpy-discussion] build numpy matrix out of smaller matrix
Hi, On Thu, Dec 1, 2011 at 6:52 PM, jonasr jonas.rueb...@web.de wrote: Hi, is there any possibility to define a numpy matrix, via a smaller given matrix, i.e. in matlab i can do this like a=[1 2 ; 3 4 ] A=[a a ; a a ] so that i finally get A=[ [1,2,1,2] [3,4,3,4] [1,2,1,2] [3,4,3,4]] i tried different things on numpy which didn't work any ideas ? Perhaps something like this: In []: a= np.array([[1, 2], [3, 4]]) In []: np.c_[[a, a], [a, a]] Out[]: array([[[1, 2, 1, 2], [3, 4, 3, 4]], [[1, 2, 1, 2], [3, 4, 3, 4]]]) Regards, eat thank you -- View this message in context: http://old.nabble.com/build-numpy-matrix-out-of-smaller-matrix-tp32896004p32896004.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] build numpy matrix out of smaller matrix
Oops, slightly incorrect answer, but anyway my intention was more along the lines: In []: a= np.array([[1, 2], [3, 4]]) In []: np.c_[[a, a], [a, a]].reshape(4, 4) Out[]: array([[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]) On Thu, Dec 1, 2011 at 8:16 PM, josef.p...@gmail.com wrote: On Thu, Dec 1, 2011 at 12:26 PM, Benjamin Root ben.r...@ou.edu wrote: On Thu, Dec 1, 2011 at 10:52 AM, jonasr jonas.rueb...@web.de wrote: Hi, is there any possibility to define a numpy matrix, via a smaller given matrix, i.e. in matlab i can do this like a=[1 2 ; 3 4 ] A=[a a ; a a ] so that i finally get A=[ [1,2,1,2] [3,4,3,4] [1,2,1,2] [3,4,3,4]] i tried different things on numpy which didn't work any ideas ? thank you numpy.tile() might be what you are looking for. or which is my favorite tile and repeat replacement a= np.array([[1, 2], [3, 4]]) np.kron(np.ones((2,2)), a) array([[ 1., 2., 1., 2.], [ 3., 4., 3., 4.], [ 1., 2., 1., 2.], [ 3., 4., 3., 4.]]) np.kron(a, np.ones((2,2))) array([[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]) But, of'course this is way more generic (and preferable) approach to utilize. eat Josef Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
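For reference, np.tile covers this layout directly, and newer NumPy releases add np.block for the MATLAB-like nested syntax:

import numpy as np

a = np.array([[1, 2], [3, 4]])
np.tile(a, (2, 2))           # [[1,2,1,2],[3,4,3,4],[1,2,1,2],[3,4,3,4]]
np.block([[a, a], [a, a]])   # same result, MATLAB-style nesting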
Re: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
Hi, On Mon, Oct 17, 2011 at 9:19 PM, T J tjhn...@gmail.com wrote: I recently put together a Cython example which uses the neighborhood iterator. It was trickier than I thought it would be, so I thought I'd share it with the community. The function takes a 1-dimensional array and returns a 2-dimensional array of neighborhoods in the original array. This is somewhat similar to the functionality provided by segment_axis (http://projects.scipy.org/numpy/ticket/901), but I believe this is slightly different in that the neighborhood can extend to the left of the current index (assuming circular boundaries). Keep in mind that this is just an example, and normal usage probably is not concerned with creating a new array. External link: http://codepad.org/czRIzXQl
--
import numpy as np
cimport numpy as np

cdef extern from "numpy/arrayobject.h":

    ctypedef extern class numpy.flatiter [object PyArrayIterObject]:
        cdef int nd_m1
        cdef np.npy_intp index, size
        cdef np.ndarray ao
        cdef char *dataptr

    # This isn't exposed to the Python API.
    # So we can't use the same approach we used to define flatiter
    ctypedef struct PyArrayNeighborhoodIterObject:
        int nd_m1
        np.npy_intp index, size
        np.PyArrayObject *ao    # note the change from np.ndarray
        char *dataptr

    object PyArray_NeighborhoodIterNew(flatiter it, np.npy_intp* bounds,
                                       int mode, np.ndarray fill_value)
    int PyArrayNeighborhoodIter_Next(PyArrayNeighborhoodIterObject *it)
    int PyArrayNeighborhoodIter_Reset(PyArrayNeighborhoodIterObject *it)

    object PyArray_IterNew(object arr)
    void PyArray_ITER_NEXT(flatiter it)
    np.npy_intp PyArray_SIZE(np.ndarray arr)

    cdef enum:
        NPY_NEIGHBORHOOD_ITER_ZERO_PADDING,
        NPY_NEIGHBORHOOD_ITER_ONE_PADDING,
        NPY_NEIGHBORHOOD_ITER_CONSTANT_PADDING,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING,
        NPY_NEIGHBORHOOD_ITER_MIRROR_PADDING

np.import_array()

def windowed(np.ndarray[np.int_t, ndim=1] arr, bounds):
    cdef flatiter iterx = <flatiter>PyArray_IterNew(<object>arr)
    cdef np.npy_intp size = PyArray_SIZE(arr)
    cdef np.npy_intp* boundsPtr = [bounds[0], bounds[1]]
    cdef int hoodSize = bounds[1] - bounds[0] + 1

    # Create the Python object and keep a reference to it
    cdef object niterx_ = PyArray_NeighborhoodIterNew(iterx, boundsPtr,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING, None)
    cdef PyArrayNeighborhoodIterObject *niterx = \
        <PyArrayNeighborhoodIterObject *>niterx_

    cdef int i, j
    cdef np.ndarray[np.int_t, ndim=2] hoods
    hoods = np.empty((arr.shape[0], hoodSize), dtype=np.int)
    for i in range(iterx.size):
        for j in range(niterx.size):
            hoods[i, j] = (<np.int_t *>niterx.dataptr)[0]
            PyArrayNeighborhoodIter_Next(niterx)
        PyArray_ITER_NEXT(iterx)
        PyArrayNeighborhoodIter_Reset(niterx)
    return hoods

def test():
    x = np.arange(10)
    print x
    print
    print windowed(x, [-1, 3])
    print
    print windowed(x, [-2, 2])
--
If you run test(), this is what you should see:

[0 1 2 3 4 5 6 7 8 9]

[[9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]
 [8 9 0 1 2]]

[[8 9 0 1 2]
 [9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]]

windowed(x, [0, 2]) is almost like segment_axis(x, 3, 2, end='wrap').
Just wondering what are the main benefits, of your approach, comparing to simple: In []: a= arange(5) In []: n= 10 In []: b= arange(n)[:, None] In []: mod(a+ roll(b, 1), n) Out[]: array([[9, 0, 1, 2, 3], [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [5, 6, 7, 8, 9], [6, 7, 8, 9, 0], [7, 8, 9, 0, 1], [8, 9, 0, 1, 2]]) In []: mod(a+ roll(b, 2), n) Out[]: array([[8, 9, 0, 1, 2], [9, 0, 1, 2, 3], [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [5, 6, 7, 8, 9], [6, 7, 8, 9, 0], [7, 8, 9, 0, 1]]) Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
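For comparison, a pure-NumPy sketch of the same circular windows with as_strided (windowed_np is a hypothetical helper; it assumes |bounds[0]| and bounds[1] are smaller than len(a)):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_np(a, bounds):
    lo, hi = bounds
    n, w = a.size, hi - lo + 1
    ext = np.concatenate((a, a, a))   # enough circular context on both sides
    s = ext.strides[0]
    # window i covers circular indices i+lo .. i+hi; start in the middle copy
    out = as_strided(ext[n + lo:], shape=(n, w), strides=(s, s))
    return out.copy()                 # detach the result from the strided view

# windowed_np(np.arange(10), [-1, 3]) matches the Cython output above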
Re: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array.
Hi On Mon, Aug 1, 2011 at 3:14 PM, Jeffrey Spencer jeffspenc...@gmail.comwrote: Depends where it is contained but another option is and I find it to typically be faster: B = zeros(A.shape) maximum(A,B,A) Since maximum(.) can handle broadcasting maximum(A, 0, A) will be even faster. -eat On 08/01/2011 07:31 PM, dileep kunjaai wrote: Dear sir, How can we fill a particular value in the place of number satisfying certain condition by another number in an array. Example: A=[[[ 9.42233087e-42 - 4.71116544e-42 0.e+00 ..., 1.48303127e+01 1.31524124e+01 1.14745111e+01] [ 3.91788793e+00 1.95894396e+00 0.e+00 ..., 1.78252487e+01 1.28667984e+01 7.90834856e+00] [ 7.83592510e+00 -3.91796255e+00 0.e+00 ..., 2.08202991e+01 1.25811749e+01 4.34205008e+00] ..., [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., 0.e+00 0.e+00 0.e+00] [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]] [[ 9.42233087e-42 -4.71116544e-42 0.e+00 ..., 8.48242474e+00 7.97146845e+00 7.46051216e+00] [ 5.16325808e+00 2.58162904e+00 0.e+00 ..., 8.47719383e+00 8.28024673e+00 8.08330059e+00] [ 1.03267126e+01 5.16335630e+00 0.e+00 ..., 8.47196198e+00 8.58903694e+00 8.70611191e+00] ..., [ 0.e+00 2.74500012e-01 5.4925e-01 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 1.37496844e-01 -2.74993688e-01 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]] [[ 9.42233087e-42 4.71116544e-42 0.e+00 ..., 1.18437748e+01 9.72778034e+00 7.61178637e+00] [ 2.96431869e-01 1.48215935e-01 0.e+00 ..., 1.64031239e+01 1.32768812e+01 1.01506386e+01] [ 5.92875004e-01 2.96437502e-01 0.e+00 ..., 2.09626484e+01 1.68261185e+01 1.26895866e+01] ..., [ 1.78188753e+00 -8.90943766e-01 0.e+00 ..., 0.e+00 1.2755e-03 2.5509e-03] [ 9.34620261e-01 -4.67310131e-01 0.e+00 ..., 0.e+00 6.38646539e-04 1.27729308e-03] [ 8.4339e-02 4.21500020e-02 0.e+00 ..., 0.e+00 0.e+00 0.e+00]]] A contain some negative value i want to change the negative numbers to '0'. I used 'masked_where', command but I failed. Please help me -- DILEEPKUMAR. R J R F, IIT DELHI ___ NumPy-Discussion mailing listNumPy-Discussion@scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
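The generic pattern for 'replace every element meeting a condition', as a sketch with illustrative data:

import numpy as np

A = np.array([[9.4e-42, -4.7e-42], [3.9, -3.9]])
B = np.where(A < 0, 0.0, A)   # non-mutating variant
A[A < 0] = 0.0                # in-place boolean-mask assignment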
Re: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class)
Hi, On Sun, Jul 31, 2011 at 7:36 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Jul 31, 2011 at 12:50 AM, Hans Meine me...@informatik.uni-hamburg.de wrote: Am 29.07.2011 um 20:23 schrieb Nathaniel Smith: Even so, surely this behavior should be consistent between base class ndarrays and subclasses? If returning 0d arrays is a good idea, then we should do it everywhere. If it's a bad idea, then we shouldn't do it at all...? Very well put. That's exactly the reason why I am insisting on this discussion, and why I believe that the behavior change is not intentional. Otherwise, ndarray and matrix should behave like my subclass. (BTW: I did not check masked_array yet.) (In reality, it sounds like this might be some mishap in the __array_wrap__ mechanism?) That's exactly my guess. (That could also explain why Mark did not see anything obvious in the code.) Maybe. There isn't a problem for plain old zero dimensional arrays. In [1]: a = array(1) In [2]: a.dtype Out[2]: dtype('int64') In [3]: reshape(a, (1,1), order='f') Out[3]: array([[1]]) FWIW: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: a= array(1) In []: a.reshape((1, 1), order= 'F').flags Out[]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In []: a.reshape((1, 1), order= 'C').flags Out[]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Seems to be slightly inconsistent, but does it really matter? -eat This on Linux 64 with latest master. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array vectorization in numpy
Hi, On Wed, Jul 20, 2011 at 2:42 AM, Chad Netzer chad.net...@gmail.com wrote: On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen p...@iki.fi wrote: k = m - 0.5 does here the same thing as k = np.empty_like(m) np.subtract(m, 0.5, out=k) The memory allocation (empty_like and the subsequent deallocation) costs essentially nothing, and there are no temporaries or copying in `subtract`. As verification: import timeit import numpy as np t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.53904647827148433 t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.54006035327911373 The trivial difference is expected as extra python parsing overhead, I think. Which leads me to apologize, since in my previous post I clearly meant to type m -= 0.5, not m =- 0.5, which is *quite* a different operation... Carlos, and Lutz, please take heed. :) In fact, as Lutz pointed out, that example was not at all what I intended to show anyway. So, just to demonstrate how it was wrong Perhaps slightly OT, but here is something very odd going on. I would expect the performance to be in totally different ballpark. t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.058299207687377931 More like: In []: %timeit m =- .5 1000 loops, best of 3: 35 ns per loop -eat t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.28192551136016847 t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.27014491558074949 t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.empty_like(m)') np.mean(t.repeat(repeat=10, number=1)) 0.54962997436523442 Perhaps the difference in the last two simply comes down to cache effects (having to iterate over two different large memory blocks, rather than one)? -Chad ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
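The 35 ns figure gives the game away; a small sketch making the rebinding explicit (array kept smaller than in the thread, same effect):

import numpy as np

m = np.ones((1024, 1024))
m =- 0.5          # parsed as m = -0.5: rebinds the name, touches no array
print(type(m))    # <class 'float'>, so no array operation ever ran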
Re: [Numpy-discussion] SVD does not converge
Hi, On Tue, Jun 28, 2011 at 7:43 PM, Lou Pecora lou_boog2...@yahoo.com wrote: -- *From:* santhu kumar mesan...@gmail.com *To:* numpy-discussion@scipy.org *Sent:* Tue, June 28, 2011 11:56:48 AM *Subject:* [Numpy-discussion] SVD does not converge Hello, I have a 380X5 matrix and when I am calculating pseudo-inverse of the matrix using pinv(numpy.linalg) I get the following error message: raise LinAlgError, 'SVD did not converge' numpy.linalg.linalg.LinAlgError: SVD did not converge I have looked in the list that it is a recurring issue but I was unable to find any solution. Can somebody please guide me on how to fix that issue? Thanks Santhosh == I had a similar problem (although I wasn't looking for the pseudo inverse). I found that squaring the matrix fixed the problem. But I'm guessing in your situation that would mean a 380 x 380 matrix (I hope I'm thinking about your case correctly). But it's worth trying since it's easy to do. With my rusty linear algebra: if one chooses to proceed with this 'squaring' avenue, wouldn't it then be more economical to base the calculations on a square 5x5 matrix? Something like: A_pinv= dot(A, pinv(dot(A.T, A))).T Instead of a 380x380 based matrix: A_pinv= dot(pinv(dot(A, A.T)), A).T My two cents - eat -- Lou Pecora, my views are my own. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
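A sketch of the 5x5 route, with the caveat that forming A.T A squares the condition number, so it only pays off when A is well conditioned and has full column rank:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((380, 5))
# pseudo-inverse via the small (5x5) normal-equations matrix
A_pinv = np.dot(A, np.linalg.pinv(np.dot(A.T, A))).T   # shape (5, 380)
assert np.allclose(A_pinv, np.linalg.pinv(A))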
Re: [Numpy-discussion] missing data discussion round 2
Hi, On Wed, Jun 29, 2011 at 1:40 AM, Jason Grout jason-s...@creativetrax.comwrote: On 6/28/11 5:20 PM, Matthew Brett wrote: Hi, On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smithn...@pobox.com wrote: ... (You might think, what difference does it make if you *can* unmask an item? Us missing data folks could just ignore this feature. But: whatever we end up implementing is something that I will have to explain over and over to different people, most of them not particularly sophisticated programmers. And there's just no sensible way to explain this idea that if you store some particular value, then it replaces the old value, but if you store NA, then the old value is still there. Ouch - yes. No question, that is difficult to explain. Well, I think the explanation might go like this: Ah, yes, well, that's because in fact numpy records missing values by using a 'mask'. So when you say `a[3] = np.NA', what you mean is, 'a._mask = np.ones(a.shape, np.dtype(bool); a._mask[3] = False` Is that fair? Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys the idea that the entry is still there, but we're just ignoring it. Of course, that goes against common convention, but it might be easier to explain. Somehow very similar approach how I always have treated the NaNs. (Thus postponing all the real (slightly dirty) work on to the imputation procedures). For me it has been sufficient to ignore what's the actual cause of NaNs. But I believe there exists plenty other much more sophisticated situations where this kind of simple treatment is not sufficient, at all. Anyway, even in the future it should still be possible to play nicely with these kind of simple scenarios. - eat Thanks, Jason ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] missing data discussion round 2
Hi, On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe mwwi...@gmail.com wrote: First I'd like to thank everyone for all the feedback you're providing, clearly this is an important topic to many people, and the discussion has helped clarify the ideas for me. I've renamed and updated the NEP, then placed it into the master NumPy repository so it has a more permanent home here: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst In the NEP, I've tried to address everything that was raised in the original thread and in Nathaniel's followup 'Concepts' thread. To deal with the issue of whether a mask is True or False for a missing value, I've removed the 'mask' attribute entirely, except for ufunc-like functions np.ismissing and np.isavail which return the two styles of masks. Here's a high level summary of how I'm thinking of the topic, and what I will implement: *Missing Data Abstraction* There appear to be two useful ways to think about missing data that are worth supporting. 1) Unknown yet existing data 2) Data that doesn't exist In 1), an NA value causes outputs to become NA except in a small number of exceptions such as boolean logic, and in 2), operations treat the data as if there were a smaller array without the NA values. *Temporarily Ignoring Data* * * In some cases, it is useful to flag data as NA temporarily, possibly in several different ways, for particular calculations or testing out different ways of throwing away outliers. This is independent of the missing data abstraction, still requiring a choice of 1) or 2) above. *Implementation Techniques* * * There are two mechanisms generally used to implement missing data abstractions, * * 1) An NA bit pattern 2) A mask I've described a design in the NEP which can include both techniques using the same interface. The mask approach is strictly more general than the NA bit pattern approach, except for a few things like the idea of supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP. My intention is to implement the mask-based design, and possibly also implement the NA bit pattern design, but if anything gets cut it will be the NA bit patterns. Thanks again for all your input so far, and thanks in advance for your suggestions for improving this new revision of the NEP. A very impressive PEP indeed. However, how would corner cases, like a = np.array([np.NA, np.NA], dtype='f8', masked=True) np.mean(a, skipna=True) np.mean(a) be handled? My concern here is that there always seems to be such corner cases which can only be handled with specific context knowledge. Thus producing 100% generic code to handle 'missing data' is not doable. Thanks, - eat -Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] missing data discussion round 2
On Mon, Jun 27, 2011 at 8:53 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Jun 27, 2011 at 12:44 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe mwwi...@gmail.com wrote: First I'd like to thank everyone for all the feedback you're providing, clearly this is an important topic to many people, and the discussion has helped clarify the ideas for me. I've renamed and updated the NEP, then placed it into the master NumPy repository so it has a more permanent home here: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst In the NEP, I've tried to address everything that was raised in the original thread and in Nathaniel's followup 'Concepts' thread. To deal with the issue of whether a mask is True or False for a missing value, I've removed the 'mask' attribute entirely, except for ufunc-like functions np.ismissing and np.isavail which return the two styles of masks. Here's a high level summary of how I'm thinking of the topic, and what I will implement: *Missing Data Abstraction* There appear to be two useful ways to think about missing data that are worth supporting. 1) Unknown yet existing data 2) Data that doesn't exist In 1), an NA value causes outputs to become NA except in a small number of exceptions such as boolean logic, and in 2), operations treat the data as if there were a smaller array without the NA values. *Temporarily Ignoring Data* * * In some cases, it is useful to flag data as NA temporarily, possibly in several different ways, for particular calculations or testing out different ways of throwing away outliers. This is independent of the missing data abstraction, still requiring a choice of 1) or 2) above. *Implementation Techniques* * * There are two mechanisms generally used to implement missing data abstractions, * * 1) An NA bit pattern 2) A mask I've described a design in the NEP which can include both techniques using the same interface. The mask approach is strictly more general than the NA bit pattern approach, except for a few things like the idea of supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP. My intention is to implement the mask-based design, and possibly also implement the NA bit pattern design, but if anything gets cut it will be the NA bit patterns. Thanks again for all your input so far, and thanks in advance for your suggestions for improving this new revision of the NEP. A very impressive PEP indeed. Hi, However, how would corner cases, like a = np.array([np.NA, np.NA], dtype='f8', masked=True) np.mean(a, skipna=True) This should be equivalent to removing all the NA values, then calling mean, like this: b = np.array([], dtype='f8') np.mean(b) /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374: RuntimeWarning: invalid value encountered in double_scalars return mean(axis, dtype, out) nan np.mean(a) This would return NA, since NA values are sitting in positions that would affect the output result. OK. be handled? My concern here is that there always seems to be such corner cases which can only be handled with specific context knowledge. Thus producing 100% generic code to handle 'missing data' is not doable. Working out the corner cases for the functions that are already in numpy seems tractable to me, how to or whether to support missing data is something the author of each new function will have to consider when missing data support is in NumPy, but I don't think we can do more than provide the mechanisms for people to use. Sure. 
I'll ride along with this and wait until I have something tangible to outperform the 'traditional' NaN handling. - eat
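For the record, the 'traditional' NaN treatment alluded to here, in later NumPy spelling (np.nanmean arrived in NumPy 1.8):

import numpy as np

a = np.array([np.nan, np.nan])
np.mean(a)      # propagates: nan, the analogue of returning NA
np.nanmean(a)   # skipna analogue: every value missing, warns and gives nan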
Re: [Numpy-discussion] argmax for top N elements
Hi, On Wed, Jun 22, 2011 at 6:30 PM, Alex Flint alex.fl...@gmail.com wrote: Is it possible to use argmax or something similar to find the locations of the largest N elements in a matrix? How bout argsort(.)?, Like: In []: a= arange(9) In []: a.argsort()[::-1][:3] Out[]: array([8, 7, 6]) My 2 cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
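For large inputs, later NumPy versions add argpartition, which avoids the full sort; a sketch for the top-N locations in a matrix:

import numpy as np

a = np.random.rand(100, 100)
N = 3
flat = a.ravel()
idx = np.argpartition(flat, -N)[-N:]          # top N, in arbitrary order
idx = idx[np.argsort(flat[idx])[::-1]]        # order them, largest first
rows, cols = np.unravel_index(idx, a.shape)   # back to 2-D locations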
Re: [Numpy-discussion] 3d ODR
Hi, On Thu, Jun 16, 2011 at 2:28 PM, Christian K. ckk...@hoc.net wrote: Hi, I need to fit a 3d surface to a point cloud. This sounds like a job for 3d orthogonal distance regression. Does anybody know of an implementation? How bout http://docs.scipy.org/doc/scipy/reference/odr.html My 2 cents, eat Regards, Christian K.
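A minimal sketch of 3-d orthogonal distance regression with scipy.odr, fitting an implicit sphere to a noisy point cloud (the quadric, the synthetic data and the starting values are illustrative assumptions):

import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
pts = rng.standard_normal((3, 200))
pts /= np.sqrt((pts ** 2).sum(0))               # points on the unit sphere
pts += 0.01 * rng.standard_normal(pts.shape)    # measurement noise

def sphere(beta, xyz):
    # implicit model: f(beta, xyz) = 0 on the fitted surface
    x0, y0, z0, r = beta
    return ((xyz[0] - x0) ** 2 + (xyz[1] - y0) ** 2
            + (xyz[2] - z0) ** 2 - r ** 2)

model = odr.Model(sphere, implicit=True)
fit = odr.ODR(odr.Data(pts, y=1), model, beta0=[0.1, 0.1, 0.1, 0.5]).run()
print(fit.beta)   # ~ [0, 0, 0, 1]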
Re: [Numpy-discussion] finding elements that match any in a set
On Sun, May 29, 2011 at 10:58 PM, Chris Barker chris.bar...@noaa.govwrote: On 5/28/2011 3:40 PM, Robert wrote: (myarray in mylist) turns into mylist.__contains__(myarray). Only the list object is ever checked for this method. There is no paired method myarray.__rcontains__(mylist) so there is nothing that numpy can override to make this operation do anything different from what lists normally do, however, numpy arrays should be able to override in be defining their own.__contains__ method, so you could do: something in an_array and get a useful, vectorized result. So I thought I'd see what currently happens when I try that: In [24]: a Out[24]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) In [25]: 3 in a Out[25]: True So the simple case works just like a list. But what If I want what the OP wants: In [26]: b Out[26]: array([3, 6, 4]) In [27]: b in a Out[27]: False OK, so the full b array is not in a, and it doesn't vectorize it, either. But: In [29]: a Out[29]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [30]: b in a Out[30]: True HUH? I'm not sure by what definition we would say that b is contained in a. but maybe.. In [41]: b Out[41]: array([ 4, 2, 345]) In [42]: b in a Out[42]: False so it's are all of the elements in b in a somewhere? but only for 2-d arrays? So what does it mean? FWIW, a short prelude on the theme seems quite promising, indeed: In []: A Out[]: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In []: [0, 1, 2] in A Out[]: True In []: [0, 3, 6] in A Out[]: True In []: [0, 4, 8] in A Out[]: True In []: [8, 4, 0] in A Out[]: True In []: [2, 4, 6] in A Out[]: True In []: [6, 4, 2] in A Out[]: True In []: [3, 1, 5] in A Out[]: True In [1061]: [3, 1, 4] in A Out[1061]: True But In []: [1, 2, 3] in A Out[]: False In []: [3, 2, 1] in A Out[]: True So, obviously the logic behind __contains__ is not so very straightforward. Perhaps just a bug? Regards, eat The docstring is not helpful: In [58]: np.ndarray.__contains__? Type: wrapper_descriptor Base Class: type 'wrapper_descriptor' String Form:slot wrapper '__contains__' of 'numpy.ndarray' objects Namespace: Interactive Docstring: x.__contains__(y) == y in x If nothing useful, maybe it could provide a vectorized version of in for this sort of use case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
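Meanwhile, the vectorized membership test the thread is really after already exists as np.in1d (renamed np.isin in later releases):

import numpy as np

a = np.arange(12)
b = np.array([3, 6, 4])
np.in1d(b, a)   # element-wise: array([ True,  True,  True])
np.in1d(a, b)   # which entries of a occur somewhere in b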
Re: [Numpy-discussion] Need to eliminate a nested loop
Hi, On Wed, May 11, 2011 at 8:28 AM, Elfnor elf...@gmail.com wrote: Hi The following code produces the desired result but has a slow triple loop iterating over the matrix multiplication. I'm sure it can be eliminated with a neat indexing trick but I can't figure out how. Any suggestions please? - import numpy #define domain of function x = numpy.linspace(-5,5,64) y = numpy.linspace(-5,5,64) z = numpy.linspace(-5,5,64) #calculate f at each point in domain a = 5.0 b = 3.0 c = 2.0 #ellipsoid E = numpy.array([[1/a**2, 0 , 0 , 0 ], [ 0 ,1/b**2 , 0 , 0 ], [ 0 , 0 ,1/c**2, 0 ], [ 0 , 0 , 0 , -1 ]]) f = numpy.zeros((x.size,y.size,z.size)) for i,xi in enumerate(x): for j,yj in enumerate(y): for k,zk in enumerate(z): X = numpy.array([xi,yj,zk,1]) f[i,j,k] = numpy.dot(numpy.dot(X.T,E),X) --- Something like this: n= 64 u= np.linspace(-5, 5, n)[None, :] u0= u.repeat(n** 2)[None, :] u1= u.repeat(n)[None, :].repeat(n, axis= 0).reshape(1, -1) u2= u.repeat(n** 2, axis= 0).reshape(1, -1) U= np.r_[u0, u1, u2, np.ones((1, n** 3))] f= (np.dot(E, U)* U).sum(0).reshape(n, n, n) Regards,eat Thanks Eleanor -- View this message in context: http://old.nabble.com/Need-to-eliminate-a-nested-loop-tp31591457p31591457.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
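With newer NumPy the same evaluation can be phrased directly as a quadratic form over the whole grid; a sketch using meshgrid, stack and einsum:

import numpy as np

x = y = z = np.linspace(-5, 5, 64)
a, b, c = 5.0, 3.0, 2.0
E = np.diag([1 / a ** 2, 1 / b ** 2, 1 / c ** 2, -1.0])   # the ellipsoid

X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
U = np.stack([X, Y, Z, np.ones_like(X)])            # homogeneous coordinates
f = np.einsum('i...,ij,j...->...', U, E, U)         # X^T E X at every grid point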
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Wed, Mar 30, 2011 at 1:44 PM, Éric Depagne e...@depagne.org wrote: Well I guess, for a slight performance improvement, you could create your own streamlined histogrammer. But, in order to better grasp your situation it would be beneficial to know how the counts and bounds are used later on. Just wondering if this kind massive histogramming could be somehow avoided totally. Indeed. Here's what I do. My images come from CCD, and as such, the zero level in the image is not the true zero level, but is the true zero + the background noise of each pixels. By doing the histogram, I plan on detecting what is the most common value per row. Once I have the most common value, I can derive the interval where most of the values are (the index of the largest occurence is easily obtained by sorting the counts, and I take a slice [index_max_count,index_max_count+1] in the second array given by the histogram). Then, I take the mean value of this interval and I assume it is the value of the bias for my row. I do this procedure both on the row and columns as a sanity check. And I know this procedure will not work if on any row/column there is a lot of signal and very little bias. I'll fix that afterwards ;-) Perhaps something along these lines will help you: from numpy import histogram, linspace, r_ def hist2(a, nob): bins= linspace(a.min(), a.max(), nob+ 1) d= linspace(0, bins[-1]* a.shape[0], a.shape[0])[:, None] b= (a+ d).ravel() bbins= (bins[:-1]+ d).ravel() bbins= r_[bbins, bbins[-1]+ 1] counts, _= histogram(b, bbins) return counts.reshape(-1, nob), bins It has two disadvantages 1) needs more memory and 2) global bins (which although should be quite straightforward to enhance if needed). Regards, eat Éric. Regards, eat Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
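A variant of the same offset idea that sidesteps assumptions about the sign of the data: digitize against the global edges once, offset the integer bin index per row, and count with a single bincount (a sketch, assuming a NumPy with bincount's minlength argument, 1.6+):

import numpy as np

def row_hist(a, nbins):
    # one histogram per row, shared global bins
    edges = np.linspace(a.min(), a.max(), nbins + 1)
    idx = np.digitize(a.ravel(), edges) - 1
    idx = np.clip(idx, 0, nbins - 1)  # fold a.max() into the last bin
    idx += np.repeat(np.arange(a.shape[0]), a.shape[1]) * nbins
    counts = np.bincount(idx, minlength=a.shape[0] * nbins)
    return counts.reshape(a.shape[0], nbins), edges

The column version is row_hist(data.T, nbins).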
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Wed, Mar 30, 2011 at 10:04 AM, Éric Depagne e...@depagne.org wrote: Hi. Sorry for not having been clearer. I'll explain a little bit. I have 4k x 4k images that I want to analyse. I turn them into numpy arrays so I have 4k x 4k np.array. My analysis starts with determining the bias level. To do that, I compute for each line, and then for each row, an histogram. So I compute 8000 histograms. Here is the code I've used sofar: for i in range(self.data.shape[0]): #Compute an histogram along the columns # Gets counts and bounds self.countsC[i], self.boundsC[i] = np.histogram(data[i], bins=self.bins) for i in range(self.data.shape[1]): # Do the same, along the rows. self.countsR[i], self.boundsR[i] = np.histogram(data[:,i], bins=self.bins) And data.shape is (4000,4000). If histogram had an axis parameter, I could avoid the loop and I guess it would be faster. Well I guess, for a slight performance improvement, you could create your own streamlined histogrammer. But, in order to better grasp your situation it would be beneficial to know how the counts and bounds are used later on. Just wondering if this kind massive histogramming could be somehow avoided totally. Regards, eat Éric. So it seems that you give your array directly to histogramdd (asking a 4000D histogram!). Surely that's not what you are trying to achieve. Can you elaborate more on your objectives? Perhaps some code (slow but working) to demonstrate the point. Regards, eat Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] random number generation
Hi, On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: If I want to generate a string of random bits with equal probability I run random.randint(0,2,size). What if I want a specific proportion of bits? In other words, each bit is 1 with probability p>1/2 and 0 with probability q=1-p? Would it be sufficient to: In []: bs= ones(1e6, dtype= int) In []: bs[randint(0, 1e6, 1e5)]= 0 In []: bs.sum()/ 1e6 Out[]: 0.904706 Regards, eat thanks ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] random number generation
On Tue, Mar 29, 2011 at 1:29 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: If I want to generate a string of random bits with equal probability I run random.randint(0,2,size). What if I want a specific proportion of bits? In other words, each bit is 1 with probability p>1/2 and 0 with probability q=1-p? Would it be sufficient to: In []: bs= ones(1e6, dtype= int) In []: bs[randint(0, 1e6, 1e5)]= 0 In []: bs.sum()/ 1e6 Out[]: 0.904706 Or: In []: bs= ones(1e6, dtype= int) In []: bs[rand(1e6) < 1./ 10]= 0 In []: bs.sum()/ 1e6 Out[]: 0.89983 Regards, eat thanks ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
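A note on the two sketches: the randint version zeroes duplicate positions, so the expected fraction of ones is (1 - 1/1e6)**1e5, about exp(-0.1) = 0.9048, which is exactly where the 0.904706 above lands; the rand() < p version is unbiased, hence 0.89983. numpy.random also has the Bernoulli draw built in:

import numpy as np

p = 0.9  # desired probability of a 1
bs = np.random.binomial(1, p, size=10**6)  # one Bernoulli(p) draw per bit
print(bs.mean())  # ~0.9
# equivalent: (np.random.rand(10**6) < p).astype(int)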
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Tue, Mar 29, 2011 at 4:29 PM, Éric Depagne e...@depagne.org wrote: Hi all. Sorry if this question has already been asked. I've searched the archive, but could not find anything related, so here is my question. I'm using np.histogram on a 4000x4000 array, each with 200 bins. I do that on both dimensions, meaning I compute 8000 histograms. It takes around 5 seconds (which is of course quite fast). I was wondering why np.histogram does not accept an axis parameter so that it could work directly on the array without me having to write a loop. Or maybe did I miss some parameters using np.histogram. FWIW, have you considered to use http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#numpy.histogramdd Regards, eat Thanks. Éric. Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Tue, Mar 29, 2011 at 5:13 PM, Éric Depagne e...@depagne.org wrote: FWIW, have you considered to use http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html# numpy.histogramdd Regards, eat I tried, but I get a /usr/lib/pymodules/python2.6/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights) 370 # Reshape is used so that overlarge arrays 371 # will raise an error. -- 372 hist = zeros(nbin, float).reshape(-1) 373 374 # Compute the sample indices in the flattened histogram matrix. ValueError: sequence too large; must be smaller than 32 so I suspect my array is too big for histogramdd So it seems that you give your array directly to histogramdd (asking a 4000D histogram!). Surely that's not what you are trying to achieve. Can you elaborate more on your objectives? Perhaps some code (slow but working) to demonstrate the point. Regards, eat Éric. -- Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output
Hi 2011/3/24 Dmitrey tm...@ukr.net from numpy import inf, array inf*0 nan (ok) array(inf) * 0.0 StdErr: Warning: invalid value encountered in multiply nan My cycled calculations yields this thousands times slowing computations and making text output completely non-readable. Would old= seterr(invalid= 'ignore') be sufficient for you? Regards, eat from numpy import __version__ __version__ '2.0.0.dev-1fe8136' D. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
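For reference, the suggestion in context: seterr returns the previous settings, so they can be restored once the noisy block is done.

import numpy as np

old = np.seterr(invalid='ignore')  # silence 'invalid value' warnings
np.array(np.inf) * 0.0             # nan, with nothing printed to stderr
np.seterr(**old)                   # restore the previous behaviour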
Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output
Hi On Thu, Mar 24, 2011 at 4:17 PM, Skipper Seabold jsseab...@gmail.comwrote: On Thu, Mar 24, 2011 at 9:45 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: 2011/3/24 Dmitrey tm...@ukr.net: Hi 2011/3/24 Dmitrey tm...@ukr.net from numpy import inf, array inf*0 nan (ok) array(inf) * 0.0 StdErr: Warning: invalid value encountered in multiply nan My cycled calculations yields this thousands times slowing computations and making text output completely non-readable. Would old= seterr(invalid= 'ignore') be sufficient for you? yes for me, but I'm not sure for all those users who use my soft. Maybe it will hide some bugs in their objective functions and nonlinear constraints in numerical optimization and nonlinear equation systems. If we do what you request in your message subject your users will have the same issue. You can wrap (parts of) your code in something like: olderr = seterr(invalid= 'ignore') do stuff seterr(**olderr) Also, as Robert pointed out to me before np.errstate is a context-manager for ignoring these warnings. I often wrap optimization code with it. I didn't realize that its context-manager, but that's really create! (Perhaps documents could reflect that.) Regards, eat In [3]: np.array(np.inf)*0. Warning: invalid value encountered in multiply Out[3]: nan In [4]: with np.errstate(all='ignore'): ...: np.array(np.inf)*0. ...: ...: Out[4]: nan In [5]: np.array(np.inf)*0. Warning: invalid value encountered in multiply Out[5]: nan Skipper ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] avoid a line...
Hi, On Thu, Mar 17, 2011 at 9:36 AM, dileep kunjaai dileepkunj...@gmail.comwrote: Dear sir, I am try to read a file of the following format, I want to avoid the first line and read the remaining as 'float' . Please help me... RainWindTempPrSal 0.11.10.020.2 0.2 0.50. 0. 0.4 0.8 0.55.51.50.5 1.5 3.50.51.55.0 2.6 5.14.13.22.3 1.5 4.40.91.52.2.3 You may use loadtxt(), with skiprows argument. (See more on http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html). Regards, eat -- DILEEPKUMAR. R J R F, IIT DELHI ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
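A sketch of the suggestion ('rain.txt' is a hypothetical file holding the table shown, header line included):

import numpy as np

# skiprows=1 jumps over the 'Rain Wind Temp Pr Sal' header
data = np.loadtxt('rain.txt', skiprows=1, dtype=float)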
Re: [Numpy-discussion] Norm of array of vectors
Hi, On Thu, Mar 17, 2011 at 10:44 AM, Andrey N. Sobolev inco...@list.ru wrote: Dear all, Sorry if that's a noob question, but anyway. I have several thousands of vectors stacked in 2d array. I'd like to get new array containing Euclidean norms of these vectors and get the vector with minimal norm. Is there more efficient way to do this than argmin(array([sqrt(dot(x,x)) for x in vec_array]))? Try argmin(sum(vec_array** 2, 0)** 0.5) Regards, eat Thanks in advance. Andrey. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
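One caveat on the reply above: the axis must match how the vectors are stacked. With the vectors as rows of vec_array, as the list comprehension in the question implies, the sum runs over axis=1; and since sqrt is monotonic it can be dropped under argmin. A sketch:

import numpy as np

vec_array = np.random.randn(5000, 3)     # vectors stacked as rows
i = (vec_array**2).sum(axis=1).argmin()  # sqrt not needed for the argmin
vmin = vec_array[i]                      # vector with the minimal norm
norms = np.sqrt((vec_array**2).sum(axis=1))  # the norms themselves, if wanted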
Re: [Numpy-discussion] view 1d array as overlapping segments?
Hi, On Mon, Mar 7, 2011 at 5:01 PM, Neal Becker ndbeck...@gmail.com wrote: reshape can view a 1d array as non-overlapping segments. Is there a convenient way to view a 1d array as a 2d array of overlapping segments? nonoverlapping: l: segment length k: overlap u is the 1d array v is a 2d array v[i] = u[l*i:(l+1)*i] overlapping: v[i] = u[l*i:(l+1)*i+k] In case actually v[i]= u[i* l: (i+ 1)* l+ k], then this may be useful from numpy.lib.stride_tricks import as_strided as ast def os(a, l, k= 0): shape, strides= (a.shape[0]- l+ 1, l+ k), a.strides* 2 return ast(a, shape= shape, strides= strides) if __name__ == '__main__': import numpy as np a= np.arange(7, dtype= np.int8) print os(a, 3) # [[0 1 2] # [1 2 3] # [2 3 4] # [3 4 5] # [4 5 6]] print os(a, 3, 2) # [[ 0 1 2 3 4] # [ 1 2 3 4 5] # [ 2 3 4 5 6] # [ 3 4 5 6 0] # last item garbage # [ 4 5 6 0 34]] # 2 last items garbage My two cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
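A defensively-shaped variant for the corrected definition v[i]= u[i*l: (i+1)*l+k], i.e. windows of length l+k starting at multiples of l (note os() above slides by one element instead), sized so that no window runs past the end and no garbage rows appear (a sketch):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def os_safe(a, l, k=0):
    # windows start at multiples of l and span l+k elements;
    # keep only those that fit entirely inside a
    nwin = (a.size - (l + k)) // l + 1
    s = a.strides[0]
    return as_strided(a, shape=(nwin, l + k), strides=(l * s, s))

print(os_safe(np.arange(10), 3, 2))
# [[0 1 2 3 4]
#  [3 4 5 6 7]]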
Re: [Numpy-discussion] Taking a large number of dot products at once
Hi, On Fri, Mar 4, 2011 at 8:19 AM, Daniel Hyams dhy...@gmail.com wrote: This is probably so easy, I'm embarrassed to ask it...but I've been casting around trying things to no avail for the last hour and a half, so here goes... I have a lot of dot products to take. The length-3 vectors that I want to dot are stacked in a 2D array like this: U = [u1 u2 u3] and V = [v1 v2 v3] So both of these arrays, are, say, 3x100 each. I just want to take the dot product of each of the corresponding vectors, so that the result is [u1.v1 u2.v2 u3.v3 ] which would be a 1x100 array in this case. Which function do I need to use? I thought tensordot() was the one, but I couldn't make it workpure user error I'm sure. No function needed for this case, just: In []: x= rand(3, 7) In []: y= rand(3, 7) In []: d= (x* y).sum(0) In [490]: d Out[490]: array([ 1.25404683, 0.19113117, 1.37267133, 0.74219888, 1.55296562, 0.15264303, 0.72039922]) In [493]: dot(x[:, 0].T, y[:, 0]) Out[493]: 1.2540468282421895 Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
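For the record, einsum (NumPy 1.6 and later) spells the same columnwise dot directly:

import numpy as np

x = np.random.rand(3, 100)
y = np.random.rand(3, 100)
d = np.einsum('ij,ij->j', x, y)  # same as (x * y).sum(0)
assert np.allclose(d, (x * y).sum(0))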
Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Hi, On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase seb.ha...@gmail.comwrote: Hi, I assume that someone here could maybe help me, and I'm hoping it's not too much off topic. I have 2 arrays of 2d point coordinates and would like to calculate all pairwise distances as fast as possible. Going from Python/Numpy to a (Swigged) C extension already gave me a 55x speedup. (.9ms vs. 50ms for arrays of length 329 and 340). With my very modest machine (Intel Dual CPU E2200, 2.2 Ghz) utilizing scipy.spatial.distance.pdist will take some 1.5 ms for such arrays. Even the simple linear algebra based full matrix calculations can be done less than 5 ms. My two cents, eat I'm using gcc on Linux. Now I'm wondering if I could go even faster !? My hope that the compiler might automagically do some SSE2 optimization got disappointed. Since I have a 4 core CPU I thought OpenMP might be an option; I never used that, and after some playing around I managed to get (only) 50% slowdown(!) :-( My code in short is this: (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i) ---Ccode -- void dists2d( double *a_ps, int nx1, int na, double *b_ps, int nx2, int nb, double *dist, int nx3, int ny3) throw (char*) { if(nx1 != 2) throw (char*) a must be of shape (n,2); if(nx2 != 2) throw (char*) b must be of shape (n,2); if(nx3 != nb || ny3 != na)throw (char*) c must be of shape (na,nb); double *dist_; int i, j; #pragma omp parallel private(dist_, j, i) { #pragma omp for nowait for(i=0;ina;i++) { //num_threads=omp_get_num_threads(); -- 4 dist_ = dist+i*nb; // dists_ is only introduced for OpenMP for(j=0;jnb;j++) { *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2]) + sq(a_ps[i*nx1+1] - b_ps[j*nx2+1]) ); } } } } ---/Ccode -- There is probably a simple mistake in this code - as I said I never used OpenMP before. It should be not too difficult to use OpenMP correctly here or - maybe better - is there a simple SSE(2,3,4) version that might be even better than OpenMP... !? I supposed, that I did not get the #pragma omp lines right - any idea ? Or is it in general not possible to speed this kind of code up using OpenMP !? Thanks, Sebastian Haase ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
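The two library routes eat alludes to, side by side (a sketch; the expansion route does more arithmetic but in BLAS-friendly memory order, which is the recurring point in this thread):

import numpy as np
from scipy.spatial.distance import cdist

a = np.random.rand(329, 2)
b = np.random.rand(340, 2)

d1 = cdist(a, b)  # optimized pairwise Euclidean distances

# 'full matrix' route: ||a - b||**2 = ||a||**2 + ||b||**2 - 2 a.b
sq = (a**2).sum(1)[:, None] + (b**2).sum(1) - 2 * np.dot(a, b.T)
d2 = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives from round-off
assert np.allclose(d1, d2)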
Re: [Numpy-discussion] odd performance of sum?
Hi Sturla, On Sat, Feb 12, 2011 at 5:38 PM, Sturla Molden stu...@molden.no wrote: Den 10.02.2011 16:29, skrev eat: One would expect sum to outperform dot with a clear marginal. Does there exixts any 'tricks' to increase the performance of sum? First of all, thanks for you still replying. Well, I'm still a little bit unsure how I should proceed with this discussion... I may have used bad wordings and created unneccessary mayhyem with my original question (:. Trust me, I'm only trying to discuss here with a positive criticism in my mind. Now, I'm not pretending to know what kind of a person a 'typical' numpy user is. But I'm assuming that there just exists more than me with roughly similar questions in their (our) minds and who wish to utilize numpy more 'pythonic; all batteries included' way. Ocassionally I (we) may ask really stupid questions, but please beare with us. Said that, I'm still very confident that (from a users point of view) there's some real substance on the issue I addressed. I see that others have ansvered already. The ufunc np.sum is not going going to beat np.dot. You are racing the heavy machinery of NumPy (array iterators, type chekcs, bound checks, etc.) against level-3 BLAS routine DGEMM, the most heavily optimized numerical kernel ever written. Fair enough. Also beware that computation is much cheaper than memory access. Sure, that's exactly where I expected the performance boost to emerge. Although DGEMM does more arithmetics, and even is O(N3) in that respect, it is always faster except for very sparse arrays. If you need fast loops, you can always write your own Fortran or C, and even insert OpenMP pragmas. That's a very important potential, but surely not all numpy users are expected to master that ;-) But don't expect that to beat optimized high-level BLAS kernels by any margin. The first chapters of Numerical Methods in Fortran 90 might be worth reading. It deals with several of these issues, including dimensional expansion, which is important for writing fast numerical code -- but not intuitively obvious. I expect this to be faster because it does less work is a fundamental misconception in numerical computing. Whatever cause less traffic on the memory BUS (the real bottleneck) will almost always be faster, regardless of the amount of work done by the CPU. And I'm totally aware of it, and actually it was exactly the original intended logic of my question: how bout if the sum could follow the steps of dot; then, since less instructions it must be bounded below of the execution time of dot. But as R. Kern gently pointed out allready it's not fruitfull enough avenue to proceed. And I'm able to live with that. Regards, eat A good advice is to use high-level BLAS whenever you can. The only exception, as mentioned, is when matrices get very sparse. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] odd performance of sum?
Hi, Observing the following performance: In []: m= 1e5 In []: n= 1e2 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 10 loops, best of 3: 38.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.1 ms per loop In []: m= 1e2 In []: n= 1e5 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 100 loops, best of 3: 18.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.2 ms per loop One would expect sum to outperform dot by a clear margin. Do there exist any 'tricks' to increase the performance of sum? In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' # installed binaries from http://python.org/ In []: np.version.version Out[]: '1.5.1' # installed binaries from http://scipy.org/ Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). Regards, eat On Thu, Feb 10, 2011 at 7:10 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Feb 10, 2011 at 8:29 AM, eat e.antero.ta...@gmail.com wrote: Hi, Observing following performance: In []: m= 1e5 In []: n= 1e2 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 10 loops, best of 3: 38.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.1 ms per loop In []: m= 1e2 In []: n= 1e5 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 100 loops, best of 3: 18.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.2 ms per loop One would expect sum to outperform dot with a clear marginal. Does there exixts any 'tricks' to increase the performance of sum? I'm not surprised, much depends on the version of ATLAS or MKL you are linked to. If you aren't linked to either and just using numpy's version then the results are a bit strange. With numpy development I get In [1]: m= 1e5 In [2]: n= 1e2 In [3]: o= ones(n) In [4]: M= randn(m, n) In [5]: timeit M.sum(1) 100 loops, best of 3: 19.2 ms per loop In [6]: timeit dot(M, o) 100 loops, best of 3: 15 ms per loop In [7]: m= 1e2 In [8]: n= 1e5 In [9]: o= ones(n) In [10]: M= randn(m, n) In [11]: timeit M.sum(1) 100 loops, best of 3: 17.4 ms per loop In [12]: timeit dot(M, o) 100 loops, best of 3: 14.2 ms per loop Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? Additionally, the number of instructions in your dot() probably isn't that many more than the sum(). The sum() is pretty dumb But does it need to be? and just does a linear accumulation using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few instructions for traversing the array in a generic manner. With fused multiply-adds, being able to assume contiguous data and ignore the numpy iterator overhead, and applying divide-and-conquer kernels to arrange sums, the optimized dot() implementations could have a comparable instruction count. Couldn't sum benefit with similar logic? If you were willing to spend that amount of developer time and code complexity to make platform-specific backends to sum() Actually I would, but I'm not competent at all in that detailed level (:, But I'm willing to spend more on my own time for example for testing, debugging, analysing various improvements and suggestions if such emerge. , you could make it go really fast, too. Typically, it's not all that important to make it worthwhile, though. One thing that might be worthwhile is to make implementations of sum() and cumsum() that avoid the ufunc machinery and do their iterations more quickly, at least for some common combinations of dtype and contiguity. Well I'm allready perplexd before reaching that 'ufunc machinery', it's actually anyway trivial (for us more mortal ;-) to figure out what's happening with sum on fromnumeric.py! Regards, eat -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Pauli, On Thu, Feb 10, 2011 at 8:31 PM, Pauli Virtanen p...@iki.fi wrote: Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote: [clip] One thing that might be worthwhile is to make implementations of sum() and cumsum() that avoid the ufunc machinery and do their iterations more quickly, at least for some common combinations of dtype and contiguity. I wonder what is the balance between the iterator overhead and the time taken in the reduction inner loop. This should be straightforward to benchmark. Apparently, some overhead decreased with the new iterators, since current Numpy master outperforms 1.5.1 by a factor of 2 for this benchmark: In [8]: %timeit M.sum(1) # Numpy 1.5.1 10 loops, best of 3: 85 ms per loop In [8]: %timeit M.sum(1) # Numpy master 10 loops, best of 3: 49.5 ms per loop I don't think this is explainable by the new memory layout optimizations, since M is C-contiguous. Perhaps there would be room for more optimization, even within the ufunc framework? I hope so. Please suggest if there's anything that I can do to further advance this. (My C skills are allready bit rusty, but at any higher level I'll try my best to contribute). Thanks, eat -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Robert, On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 14:29, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? It's technically feasible to accomplish, but as I mention later, it entails quite a large cost. Those optimized BLASes represent many man-years of effort Yes I acknowledge this. But didn't they then ignore them something simpler, like sum (but which actually could benefit exactly similiar optimizations). and cause substantial headaches for people building and installing numpy. I appreciate this. No doubt at all. However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. In the places where it is a significant bottleneck, an ad hoc implementation in C or Cython or even FORTRAN for just that application is pretty easy to write. But here I have to disagree; I'll think that at least I (if not even the majority of numpy users) don't like (nor I'm be capable/ or have enough time/ resources) go to dwell such details. I'm sorry but I'll have to restate that it's quite reasonable to expect that sum outperforms dot in any case. Lets now to start make such movements, which enables sum to outperform dot. You can gain speed by specializing to just your use case, e.g. contiguous data, summing down to one number, or summing along one axis of only 2D data, etc. There's usually no reason to try to generalize that implementation to put it back into numpy. Yes, I would really like to specialize into my case, but 'without going out the python realm.' Thanks, eat Additionally, the number of instructions in your dot() probably isn't that many more than the sum(). The sum() is pretty dumb But does it need to be? As I also allude to later in my email, no, but there are still costs involved. and just does a linear accumulation using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few instructions for traversing the array in a generic manner. With fused multiply-adds, being able to assume contiguous data and ignore the numpy iterator overhead, and applying divide-and-conquer kernels to arrange sums, the optimized dot() implementations could have a comparable instruction count. Couldn't sum benefit with similar logic? Etc. I'm not going to keep repeating myself. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi, On Fri, Feb 11, 2011 at 12:08 AM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 15:32, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 14:29, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? It's technically feasible to accomplish, but as I mention later, it entails quite a large cost. Those optimized BLASes represent many man-years of effort Yes I acknowledge this. But didn't they then ignore them something simpler, like sum (but which actually could benefit exactly similiar optimizations). Let's set aside the fact that the people who optimized the implementation of dot() (the authors of ATLAS or the MKL or whichever optimized BLAS library you linked to) are different from those who implemented sum() (the numpy devs). Let me repeat a reason why one would put a lot of effort into optimizing dot() but not sum(): However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. I don't know if I'm just not communicating very clearly, or if you just reply to individual statements before reading the whole email. You are communicating very well. It's me who's not communicating so well. and cause substantial headaches for people building and installing numpy. I appreciate this. No doubt at all. However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. In the places where it is a significant bottleneck, an ad hoc implementation in C or Cython or even FORTRAN for just that application is pretty easy to write. But here I have to disagree; I'll think that at least I (if not even the majority of numpy users) don't like (nor I'm be capable/ or have enough time/ resources) go to dwell such details. And you think we have the time and resources to do it for you? My intention was not to suggest anything like that. I'm sorry but I'll have to restate that it's quite reasonable to expect that sum outperforms dot in any case. You don't optimize a function just because you are capable of it. You optimize a function because it is taking up a significant portion of total runtime in your real application. Anything else is a waste of time. Lets now to start make such movements, which enables sum to outperform dot. Sorry, you don't get to volunteer anyone's time or set anyone's priorities but your own. There are some sensible things one could do to optimize sums or even general ufunc reductions, as outlined my Mark, Pauli and myself, but please stop using the accelerated-BLAS dot() as your benchmark. It's a completely inappropriate comparison. OK. 
Lets compare sum to itself: In []: M= randn(1e5, 1e2) In []: timeit M.sum(0) 10 loops, best of 3: 169 ms per loop In []: timeit M.sum(1) 10 loops, best of 3: 37.5 ms per loop In []: timeit M.sum() 10 loops, best of 3: 18.1 ms per loop In []: timeit sum(M.sum(0)) 10 loops, best of 3: 169 ms per loop In []: timeit sum(M.sum(1)) 10 loops, best of 3: 37.7 ms per loop In []: timeit M.T.sum(0) 10 loops, best of 3: 37.7 ms per loop In []: timeit M.T.sum(1) 10 loops, best of 3: 171 ms per loop In []: timeit M.T.sum() 1 loops, best of 3: 288 ms per loop In []: timeit sum(M.T.sum(0)) 10 loops, best of 3: 37.7 ms per loop In []: timeit sum(M.T.sum(1)) 10 loops, best of 3: 170 ms per loop In []: 288./ 18.1 Out[]: 15.91160220994475 You can gain speed by specializing to just your use case, e.g. contiguous data, summing down to one number, or summing along one axis of only 2D data, etc. There's usually no reason to try to generalize that implementation to put it back into numpy. Yes, I would really like to specialize into my case, but 'without going out the python realm.' The Bottleneck project is a good place for such things. It's a nice middle-ground for somewhat-specialized routines that are still pretty common but not general enough to be in numpy yet. http
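The 16x spread in the timings above tracks memory access order rather than instruction count: for a C-contiguous M, M.sum() and M.sum(1) stream through memory while M.T.sum() walks it with a large stride. Paying for one contiguous copy restores the speed (a sketch):

import numpy as np

M = np.random.randn(100000, 100)
Mt = np.ascontiguousarray(M.T)  # copy once; Mt.sum() then streams memory
# compare: timeit M.T.sum()  vs  timeit Mt.sum()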
Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?
Hi David, (I think the discussion has diverged, but never mind ;-) I didn't treat far as a vector (yet, but for sure it's doable). Anyway, before going into more details I'll suggest you'll study my new attachment. As far as I can see, you just have 'simple' polynomial evaluations, which I have refactored in such a way that you just need to provide the coefficient arrays. There is no need to write the polynomial evaluation explicitly in the code. My opinion is that the computational side (if your functions really are like shown so far) will be pretty straightforward. Then it boils down to how to handle the coefficient arrays reasonably (perhaps some kind of lightweight 'database' for them ;-). Please feel free to provide any more information. Regards, eat On Tue, Feb 1, 2011 at 10:20 PM, dpar...@chromalloy.com wrote: I'm not sure I need to dive into cython or C for this - performance is not an issue for my problem - I just want a flexible function that will accept scalars or arrays. Both Sebastian's and eat's suggestions show using indexing to handle the conditional statements in the original function. The problem I'm having implementing this is in getting the input arguments and outputs to a common array size. Here's how I can do this but it seems ugly: # t and far are function arguments which may be scalars or arrays # ag is the output array # need to make everything array with common length t = np.array(t, ndmin=1) # Convert t to an array far = np.array(far, ndmin=1) # Convert far to an array ag = t*far*np.nan # Make an output array of the proper length using broadcasting rules t = np.zeros_like(ag)+t # Expand t to the length of the output array far = np.zeros_like(ag)+far # Expand far to the length of the output array Now with all arrays the same length I can use indexing with logical statements: ag[far<0.005] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + \ 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 The resulting code looks like this: import numpy as np def air_gamma_dp(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio t = np.array(t, ndmin=1) far = np.array(far, ndmin=1) ag = t*far*np.nan t = np.zeros_like(ag)+t far = np.zeros_like(ag)+far far[(far<0.) | (far>0.069)] = np.nan t[(t < 379.) | (t > 4731.)] = np.nan ag[(far<0.005)] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 t[(t < 699.) | (t > 4731.)] = np.nan a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + 3.103507e-21 * far - 3.391308e-22 a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - 5.469399e-17 * far + 6.058125e-18 a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + 3.865306e-13 * far - 4.302534e-14 a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - 0.1392629 * far + 1.520583e-10 a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + 0.02688007 * far - 0.002651616 a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - 0.002676877 * far + 0.0001580424 a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * far + 1.372714 ag[far>=0.005] = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 return ag I was hoping there was a more elegant way to do this.
David Parker Chromalloy - TDAG From: John Salvatier jsalv...@u.washington.edu To: Discussion of Numerical Python numpy-discussion@scipy.org Date: 02/01/2011 02:29 PM Subject: Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs? Sent by: numpy-discussion-boun...@scipy.org -- Have you thought about using cython to work with the numpy C-API (http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI)? This will be fast and simple (you can mix and match Python and Cython). As for your specific issue: you can simply cast all the inputs to numpy arrays (using asarray, http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html) to deal with scalars. This will make sure they get broadcast correctly. On Tue, Feb 1, 2011 at 11:22 AM, dpar...@chromalloy.com wrote: Thanks for the advice. Using Sebastian's advice I was able to write a version that worked when the input arguments are both arrays with the same length. The code provided by eat works when t is an array
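Picking up the shape-handling question: broadcast_arrays does the common-shape expansion in one call (a sketch; dry_coeffs is a placeholder for the seven dry-air polynomial coefficients quoted above, highest power first, and the far >= 0.005 branch would fill its own mask the same way):

import numpy as np

def air_gamma_v(t, far, dry_coeffs):
    # dry_coeffs: placeholder argument, not part of the original code
    t, far = np.broadcast_arrays(np.atleast_1d(np.asarray(t, float)),
                                 np.atleast_1d(np.asarray(far, float)))
    ag = np.empty(t.shape)
    ag.fill(np.nan)
    dry = (far >= 0.) & (far < 0.005) & (t >= 379.) & (t <= 4731.)
    ag[dry] = np.polyval(dry_coeffs, t[dry])
    # ... the far >= 0.005 branch fills its own mask analogously ...
    return ag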
Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?
Hi, On Mon, Jan 31, 2011 at 5:15 PM, dpar...@chromalloy.com wrote: I have several functions like the example below that I would like to make compatible with array inputs. The problem is the conditional statements give a *ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()*. I can use numpy.vectorize, but if possible I'd prefer to rewrite the function. Does anyone have any advice on the best way to modify the code to accept array inputs? Thanks in advance for any assistance. If I understood your question correctly, then air_gamma could be coded as: def air_gamma_0(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio if far < 0.: return NAN elif far < 0.005: ag= air_gamma_1(t) ag[np.logical_or(t < 379., t > 4731.)]= NAN return ag elif far < 0.069: ag= air_gamma_2(t, far) ag[np.logical_or(t < 699., t > 4731.)]= NAN return ag else: return NAN Rest of the code is in the attachment. My two cents, eat NAN = float('nan') def air_gamma(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio if far < 0.: return NAN elif far < 0.005: if t < 379. or t > 4731.: return NAN else: air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 elif far < 0.069: if t < 699. or t > 4731.: return NAN else: a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + 3.103507e-21 * far - 3.391308e-22 a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - 5.469399e-17 * far + 6.058125e-18 a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + 3.865306e-13 * far - 4.302534e-14 a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - 0.1392629 * far + 1.520583e-10 a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + 0.02688007 * far - 0.002651616 a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - 0.002676877 * far + 0.0001580424 a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * far + 1.372714 air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 elif far >= 0.069: return NAN else: return NAN return air_gamma David Parker Chromalloy - TDAG ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion air_gamma.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Help in speeding up accumulation in a matrix
Hi, On Sat, Jan 29, 2011 at 11:01 PM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Hi all, First email to the list for me, I just want to say how grateful I am to have python+numpy+ipython etc... for my day to day needs. Great combination of software. Anyway, I've been having this bottleneck in one my algorithms that has been bugging me for quite a while. The objective is to speed this part up. I've been doing tons of optimization and parallel processing around that piece of code to get a decent run time. The problem is easy. You want to accumulate in a matrix, a weighted sum of other matrices. Let's call this function scale_and_add: def scale_and_add_re(R,w,Ms): (nb_add,mdim,j)=np.shape(Ms) for i in range(nb_add): R+=w[i]*Ms[i] return R This 'for' loop bugs me since I know this will slow things down. But the dimension of these things are so large that any attempt to vectorize this is slower and takes too much memory. I typically work with 1000 weights and matrices, matrices of dimension of several hundred. My current config is: In [2]: %timeit scale_and_add_re(R,w,Ms) 1 loops, best of 3: 392 ms per loop In [3]: w.shape Out[3]: (2000,) In [4]: Ms.shape Out[4]: (2000, 200, 200) I'd like to be able to double these dimensions. How this array Ms is created? Do you really need to have it in the memory as whole? Assuming it's created by (200, 200) 'chunks' at a time, then you could accumulate that right away to R. It still involves Python looping but that's not so much overhead. My 2 cents eat For instance I could use broadcasting by using a dot product %timeit dot(Ms.T,w) 1 loops, best of 3: 1.77 s per loop But this is i) slower ii) takes too much memory (btw, I'd really need an inplace dot-product in numpy to avoid the copy in memory, like the blas call in scipy.linalg. But that's for an other thread...) The matrices are squared and symmetric. I should be able to get something out of this, but I never found anything related to this in Numpy. I also tried a Cython reimplementation %timeit scale_and_add_reg(L1,w,Ms) 1 loops, best of 3: 393 ms per loop It brought nothing in speed. Here's the code @cython.boundscheck(False) def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): return _scale_and_add_reg(R,w,Ms) @cython.boundscheck(False) cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): cdef unsigned int mdim cdef int nb_add (nb_add,mdim,j)=np.shape(Ms) cdef unsigned int i for i from 0 = i nb_add: R+=w[i]*Ms[i] #for j in range(mdim): #for k in range(mdim): #R[j][k]+=w[i]*Ms[i][j][k] return 0 So here's my question if someone has time to answer it: Did I try anything possible? Should I move along and deal with this bottleneck? Or is there something else I didn't think of? Thanks for reading, keep up the great work! -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
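Two more candidates worth timing against the loop (a sketch with scaled-down shapes; tensordot sums straight over the first axis, avoiding the transposed copy that dot(Ms.T, w) implies, and chunking bounds the working set as suggested above):

import numpy as np

nb, m = 200, 50  # scaled down from (2000, 200, 200) just for the demo
w = np.random.rand(nb)
Ms = np.random.rand(nb, m, m)

R = np.tensordot(w, Ms, axes=(0, 0))  # sum_i w[i] * Ms[i]

# chunked accumulation, if Ms is produced or consumed piecewise:
R2 = np.zeros((m, m))
for idx in np.array_split(np.arange(nb), 16):
    R2 += np.tensordot(w[idx], Ms[idx], axes=(0, 0))
assert np.allclose(R, R2)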
[Numpy-discussion] tril, triu, document/implementation conflict
Hi, I just noticed a document/implementation conflict with tril and triu. According to the tril documentation it should return an array of the same shape and data-type as the one it is called with. But this is not the case, at least with dtype bool. The input shape is referred to as (M, N) in tril and triu, but as (N, M) in tri. Inconsistent? Also I'm not very happy with the performance; at least dtype bool can be accelerated as follows. In []: M= ones((2000, 3000), dtype= bool) In []: timeit triu(M) 10 loops, best of 3: 173 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 107 ms per loop In []: M= asarray(M, dtype= int) In []: timeit triu(M) 10 loops, best of 3: 160 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 163 ms per loop In []: M= asarray(M, dtype= float) In []: timeit triu(M) 10 loops, best of 3: 195 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 157 ms per loop I have attached a crude 'fix' in case someone is interested. Regards, eat twodim_base_fix.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
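The idea of the attached fix, in brief (a sketch of the approach, not the attachment itself): build the triangle mask directly and combine it without a float round-trip, which is what preserves the bool dtype and most of the speed.

import numpy as np

def triu_(m, k=0):
    # the upper triangle is where j - i >= k
    mask = (np.arange(m.shape[1]) - np.arange(m.shape[0])[:, None]) >= k
    if m.dtype == bool:
        return m & mask  # stays bool, no intermediate float array
    out = m.copy()
    out[~mask] = 0
    return out

assert (triu_(np.ones((4, 5), bool)) == np.triu(np.ones((4, 5), bool))).all()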
Re: [Numpy-discussion] tril, triu, document/implementation conflict
Hi, On Wed, Jan 26, 2011 at 2:35 PM, josef.p...@gmail.com wrote: On Wed, Jan 26, 2011 at 7:22 AM, eat e.antero.ta...@gmail.com wrote: Hi, I just noticed a document/ implementation conflict with tril and triu. According tril documentation it should return of same shape and data-type as called. But this is not the case at least with dtype bool. The input shape is referred as (M, N) in tril and triu, but as (N, M) in tri. Inconsistent? Any comments about the names for rows and cols. I prefer (M, N). Also I'm not very happy with the performance, at least dtype bool can be accelerated as follows. In []: M= ones((2000, 3000), dtype= bool) In []: timeit triu(M) 10 loops, best of 3: 173 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 107 ms per loop In []: M= asarray(M, dtype= int) In []: timeit triu(M) 10 loops, best of 3: 160 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 163 ms per loop In []: M= asarray(M, dtype= float) In []: timeit triu(M) 10 loops, best of 3: 195 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 157 ms per loop I have attached a crude 'fix' incase someone is interested. You could open a ticket for this. just one comment: I don't think this is readable, especially if we only look at the source of the function with np.source out= mul(ge(so(ar(m.shape[0]), ar(m.shape[1])), -k), m) from np.source(np.tri) with numpy 1.5.1 m = greater_equal(subtract.outer(arange(N), arange(M)),-k) I agree, thats why I called it crude. Before opening a ticket I'll try to figure out if there exists somewhere in numpy .astype functionality, but not copying if allready proper dtype. Also I'm afraid that I can't produce sufficient testing. Regards, eat Josef Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
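The convert-only-if-needed piece exists as asarray with a dtype, and newer NumPy (1.7+) also spells it through astype's copy flag (sketches):

import numpy as np

m = np.ones((3, 3), dtype=bool)
a = np.asarray(m, dtype=bool)   # returns m itself: dtype already matches
b = m.astype(bool, copy=False)  # same, on NumPy 1.7+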
[Numpy-discussion] How to improve performance of slow tri*_indices calculations?
Hi, Running on: In []: np.__version__ Out[]: '1.5.1' In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' For the reference: In []: X= randn(10, 125) In []: timeit dot(X.T, X) 1 loops, best of 3: 170 us per loop In []: X= randn(10, 250) In []: timeit dot(X.T, X) 1000 loops, best of 3: 671 us per loop In []: X= randn(10, 500) In []: timeit dot(X.T, X) 100 loops, best of 3: 5.15 ms per loop In []: X= randn(10, 1000) In []: timeit dot(X.T, X) 100 loops, best of 3: 20 ms per loop In []: X= randn(10, 2000) In []: timeit dot(X.T, X) 10 loops, best of 3: 80.7 ms per loop Performance of triu_indices: In []: timeit triu_indices(125) 1000 loops, best of 3: 662 us per loop In []: timeit triu_indices(250) 100 loops, best of 3: 2.55 ms per loop In []: timeit triu_indices(500) 100 loops, best of 3: 15 ms per loop In []: timeit triu_indices(1000) 10 loops, best of 3: 59.8 ms per loop In []: timeit triu_indices(2000) 1 loops, best of 3: 239 ms per loop So the tri*_indices calculations seems to be unreasonable slow compared to for example calculations of inner products. Now, just to compare for a very naive implementation of triu indices. In []: def iut(n): ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) ..: c= r.copy() ..: a= np.arange(n) ..: m= 0 ..: for i in xrange(n): ..: ni= n- i ..: mni= m+ ni ..: r[m: mni]= i ..: c[m: mni]= a[i: n] ..: m+= ni ..: return (r, c) ..: Are we really calculating the same thing? In []: triu_indices(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) In []: iut(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Seems so, and then its performance: In []: timeit iut(125) 1000 loops, best of 3: 992 us per loop In []: timeit iut(250) 100 loops, best of 3: 2.03 ms per loop In []: timeit iut(500) 100 loops, best of 3: 5.3 ms per loop In []: timeit iut(1000) 100 loops, best of 3: 13.9 ms per loop In []: timeit iut(2000) 10 loops, best of 3: 39.8 ms per loop Even the naive implementation is very slow, but allready outperforms triu_indices, when n is 250! So finally my question is how one could substantially improve the performance of indices calculations? Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
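For comparison, the upper-triangle indices admit a closed form that removes even iut's Python loop (a sketch; the column formula subtracts the flat offset of row r, which is r*(2n - r + 1)/2):

import numpy as np

def iut_v(n):
    r = np.repeat(np.arange(n), np.arange(n, 0, -1))  # row of each entry
    k = np.arange(n * (n + 1) // 2)                   # flat position
    c = k - r * (2 * n - r + 1) // 2 + r              # column of each entry
    return r, c

# same ordering as triu_indices(5) shown above
r, c = iut_v(5)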
Re: [Numpy-discussion] Crosstabulation
Ionut Sandric sandricionut at yahoo.com writes: Thank you Zack: By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy Thanks's again Ionut - Original Message - From: Zachary Pincus zachary.pincus at yale.edu To: Discussion of Numerical Python numpy-discussion at scipy.org Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut, Bucharest, Istanbul Subject: Re: [Numpy-discussion] Crosstabulation Hi Ionut, Check out the tabular package: http://parsemydata.com/tabular/index.html It seems to be basically what you want... it does pivot tables (aka crosstabulation), it's built on top of numpy, and has simple data IO tools. Also check out this discussion on pivot tables from the numpy list a while ago: http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html One question -- what do you mean by raster data? In my arena, that usually means images... and I'm not sure what a crosstabulation on image data would mean! Zach On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote: Sorry, the first email was sent before to finish it... Hi: I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example? Thank you, Ionut ___ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi, You may be able to adapt this simple script to your case. import numpy as np # generate some test data def gen(b, e, m, n): return np.arange(b, e+ 1), np.random.randint(b, e+ 1, (m, n)) m, n= 15, 15 c1, d1= gen(0, 3, m, n); print d1 c2, d2= gen(3, 5, m, n); print d2 # perform actual x-tabulation xtab= np.zeros((len(c1), len(c2)), np.int) for i in xrange(len(c1)): tmp= d2[c1[i]== d1] for j in xrange(len(c2)): xtab[i, j]= np.sum(c2[j]== tmp) print xtab, np.sum(xtab)== np.prod(d1.shape) Anyway it's straightforward to extend it to nd x-tabulations ;-). My 2 cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
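The double loop collapses into one bincount by encoding each cell's class pair as a single integer (a sketch reusing c1, c2, d1, d2 from the script above; assumes NumPy 1.6+ for bincount's minlength):

import numpy as np

i = np.searchsorted(c1, d1.ravel())  # class index of each cell in raster 1
j = np.searchsorted(c2, d2.ravel())  # ... and in raster 2
xtab = np.bincount(i * len(c2) + j,
                   minlength=len(c1) * len(c2)).reshape(len(c1), len(c2))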
Re: [Numpy-discussion] Find indices of largest elements
Nikolaus Rath Nikolaus at rath.org writes: [snip] Not quite, because I'm interested in the n largest values over all elements, not the largest element in each row or column. But Keith's solution seems to work fine, even though I'm still struggling to understand what's going on there. My bad. I just concentrated on your example, not the actual question. However, what's wrong with your above approach [ np.unravel_index(i, x.shape) for i in idx[-3:] ] ? Especially if your n largest elements are just a small fraction of all elements. # Note also the differences a= np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]]) n= 3 # between print [np.unravel_index(ind, a.shape) for ind in np.argsort(a.ravel())[-n:]] # and print [np.where(val== a) for val in np.sort(a.ravel())[-n:]] Regards, eat Best, -Nikolaus ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
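The array form of the same approach, without the per-element list comprehension (a sketch; unravel_index accepts index arrays on reasonably recent NumPy):

import numpy as np

x = np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]])
n = 3
flat = np.argsort(x, axis=None)[-n:]  # flat indices of the n largest
rows, cols = np.unravel_index(flat, x.shape)
# on NumPy 1.8+, np.argpartition(x, -n, axis=None)[-n:] avoids the full sort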
Re: [Numpy-discussion] Find indices of largest elements
Nikolaus Rath Nikolaus at rath.org writes: Hello, How do I best find out the indices of the largest x elements in an array? Example: a = [ [1,8,2], [2,1,3] ] magic_function(a, 2) == [ (0,1), (1,2) ] Since the largest 2 elements are at positions (0,1) and (1,2). Best, -Niko Hi, Just a= np.asarray([[1, 8, 2], [2, 1, 3]]) print np.where((a.T== a.max(axis= 1)).T) However, if any row contains more than 1 max entity, above will fail. Please let me to know if that's relevant for you. -eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
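The tie-safe variant of the reply above, giving exactly one index per row even when a row repeats its maximum:

import numpy as np

a = np.asarray([[1, 8, 2], [2, 1, 3]])
rows = np.arange(a.shape[0])
cols = a.argmax(axis=1)       # first maximal entry of each row
print(list(zip(rows, cols)))  # [(0, 1), (1, 2)], as in the question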
Re: [Numpy-discussion] Getting index of array after applying cond
Shailendra shailendra.vikas at gmail.com writes: I forgot to mention that I wanted this to work for general shape. So I modified it a little bit x = array([[1,2,3,4,5], [6,7,8,7,6], [1,2,3,4,5]]) cond = (x > 5) loc= where(cond) arg_max=argmax(x[cond]) x[tuple([e[arg_max] for e in loc])] 8 But what happens if your x is for example x = array([[1,2,3,4,5], [6,8,8,8,6], [1,2,3,4,5]]) x[tuple([e[arg_max] for e in loc])] would still yield 8, which may or may not be an acceptable answer. Basically what I mean is: why bother with the argmax at all, if your only interest is x[cond].max()? Just my 2 cents. Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
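The index bookkeeping also goes through flatnonzero, which keeps the general-shape case to three lines (a sketch; with ties this picks the first maximal element, as argmax does):

import numpy as np

x = np.asarray([[1, 2, 3, 4, 5], [6, 7, 8, 7, 6], [1, 2, 3, 4, 5]])
cond = x > 5
flat = np.flatnonzero(cond)          # flat indices of the candidates
k = flat[x.ravel()[flat].argmax()]   # flat index of the max under cond
print(np.unravel_index(k, x.shape))  # (1, 2)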
Re: [Numpy-discussion] Getting index of array after applying cond
Shailendra shailendra.vikas at gmail.com writes: Well, this is just a toy problem. argmax represent a method which will give me a index in x[cond] . And for the case of multiple value my requirement is fine with getting any max index. OK. My concern seems to be needless then. BTW, does this current thread relate anyway to the earlier one 'Match two arrays'? If so, would you like to elaborate more about your 'real' problem? Regards, eat Thanks, Shailendra ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] "Match" two arrays
Shailendra shailendra.vikas at gmail.com writes: Hi All, I want to make a function which should be like this code cordinates1=(x1,y1) # x1 and y1 are x-cord and y-cord of a large number of points cordinates2=(x2,y2) # similar to cordinates1 indices1,indices2= match_cordinates(cordinates1,cordinates2) code (x1[indices1],y1[indices1]) matches (x2[indices2],y2[indices2]) where the definition of match is such that: If A is the closest point to B and the distance between A and B is less than delta then it is a match. If A is the closest point to B and the distance between A and B is more than delta then there is no match. Every point has either 1 match (closest point) or none This logic is problematic in the general case. See below. You may need to be able to handle several pairs of 'closest points'! Also, the sizes of cordinates1 and cordinates2 are quite large and outer should not be used. I can think of only C style code to achieve this. Can anyone suggest a pythonic way of doing this? Thanks, Shailendra This is a straightforward implementation as a starting point. eat code import numpy as np def dist(p1, p2): return np.sqrt(np.sum((p1- p2)** 2, 0)) def cdist(p1, p2, trh): Expects 2d arrays p1 and p2, with compatible first dimensions and a threshold. Returns indices of points close to each other -ind[:, 0], array of p1 indices -ind[:, 1], array of p2 indices -ambi, list of list of ambiguous situations (where more than 1 pair of points are 'equally close') The indices are arranged such that dist(p1[:, ind[k, 0]], p2[:, ind[k, 1]]) < trh is true for all k. ind= [] ambi= [] for k in range(p2.shape[1]): d= dist(p1, p2[:, None, k]) i= np.where(d < trh)[0] if 0 < len(i): m= np.where(d[i]== d[i].min())[0] # problematic i= i[m].tolist() ind.append([i[0], k]) if 1 < len(m): ambi.append([ind[-1], i]) return np.array(ind), ambi if __name__ == '__main__': n= 10 trh= 2e-1 p1= np.round(np.random.rand(2, n), 1) p2= np.round(p1+ 1e-1* np.random.randn(2, n), 1) ind, ambi= cdist(p1, p2, trh) print 'points close to each other:' if 0 < len(ind): print 'p1:' print p1[:, ind[:, 0]], ind[:, 0] print 'p2:' print p2[:, ind[:, 1]], ind[:, 1] print 'valid:' print dist(p1[:, ind[:, 0]], p2[:, ind[:, 1]]) < trh print 'with ambiguous situation(s):' if ambi: print ambi else: print 'None' else: print 'None' import timeit n= 1e2 trh= 2e-1 rep= 5 p1= np.random.rand(2, 1e3* n) p2= np.random.randn(2, n) def perf(): ind, ambi= cdist(p1, p2, trh) print 'performance:' t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep print 'min: ', t.min(), 'mean: ', t.mean(), 'max: ', t.max() code ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
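For large point sets the O(n*m) scan above gives way to a KD-tree query (a sketch using scipy.spatial; delta is the match threshold, and misses come back with infinite distance). Note this returns each point's nearest neighbour within delta; enforcing one-to-one matches and resolving the ambiguous ties discussed above still needs extra bookkeeping.

import numpy as np
from scipy.spatial import cKDTree

x1, y1 = np.random.rand(2, 100000)  # large point set
x2, y2 = np.random.rand(2, 1000)
delta = 1e-3

tree = cKDTree(np.column_stack((x1, y1)))
d, i = tree.query(np.column_stack((x2, y2)), distance_upper_bound=delta)
hit = np.isfinite(d)  # points of set 2 that found a match within delta
indices1, indices2 = i[hit], np.flatnonzero(hit)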
Re: [Numpy-discussion] "Match" two arrays
Oops. Wrongly timed. t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep should be t= np.array(timeit.repeat(perf, repeat= rep, number= 1)) eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion