Re: [Numpy-discussion] SVD does not converge on clean matrix

2011-08-14 Thread Charanpal Dhanjal
I had a quick look at the code 
(https://github.com/numpy/numpy/blob/master/numpy/linalg/linalg.py) and 
the numpy.linalg.svd function calls lapack_lite.dgesdd (for real 
matrices), so I guess the non-convergence occurs in this function. As I 
understand it, lapack_lite is used by default unless numpy is built with 
ATLAS/MKL etc. I wonder why the SVD works for Nadav and not for anyone else? 
Any ideas anyone?
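One way to see which LAPACK NumPy is actually using is numpy.show_config(). A minimal sketch (it prints its report, so here the output is captured into a string for inspection):

```python
import io
import contextlib

import numpy as np

# show_config() prints the BLAS/LAPACK build configuration to stdout;
# capture the printed report as a string
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
report = buf.getvalue()
print(report)
```

If the report only mentions the bundled lapack_lite sources and no ATLAS/MKL paths, the SVD is going through the built-in routines.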

Charanpal


On Sat, 13 Aug 2011 13:13:25 -0600, Charles R Harris wrote:
 On Thu, Aug 11, 2011 at 7:23 AM,  wrote:

 Hi all,

 I get an error message numpy.linalg.linalg.LinAlgError: SVD did
 not
 converge when calling numpy.linalg.svd on a clean matrix of size
 (1952,
 895). The matrix is clean in the sense that it contains no NaN or
 Inf
 values. The corresponding npz file is available here:

 
 https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr
 [1]

 Here is some information about my setup: I use Python 2.7.1 on
 Ubuntu
 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be
 solved
 by recompiling numpy with my local ATLAS library (version 3.8.3),
 and this
 didn't seem to help. On another machine with Python 2.7.1 and numpy
 1.5.1
 the SVD does converge; however, it contains 1 NaN singular value and 3
 negative singular values on the order of -10^-1 (singular values should
 always be non-negative).
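As a general sanity check (with a random stand-in matrix, since the real one lives in the npz file), a healthy SVD should return singular values that are all finite and non-negative:

```python
import numpy as np

# hypothetical stand-in for the problem matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))

assert np.isfinite(A).all()   # the "clean" check: no NaN or Inf in the input

s = np.linalg.svd(A, compute_uv=False)
assert np.isfinite(s).all()   # no NaN singular values
assert (s >= 0).all()         # singular values are non-negative by definition
```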

 I also tried computing the SVD of the matrix using Octave 3.2.4 and
 Matlab
 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems.
 Any help
 is greatly appreciated.

 Thanks in advance,
 Charanpal

 Fails here also, fedora 15 64 bits AMD 940. There should be a maximum
 iterations argument somewhere...

 Chuck



 Links:
 --
 [1]
 https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr
 [2] mailto:dhan...@telecom-paristech.fr

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] SVD does not converge on clean matrix

2011-08-14 Thread Lou Pecora
Chuck wrote:




Fails here also, fedora 15 64 bits AMD 940. There should be a maximum 
iterations argument somewhere...

Chuck 

---

   ***  Here's the FIX:

Chuck is right.  There is a max iterations.  Here is a reply from a thread of 
mine in this group several years ago about this problem and some comments that 
might help you.

 From Mr. Damian Menscher who was kind enough to find the iteration 
location and provide some insight:

Ok, so after several hours of trying to read that code, I found
the parameter that needs to be tuned.  In case anyone has this
problem and finds this thread a year from now, here's your hint:

File: Src/dlapack_lite.c
Subroutine: dlasd4_
Line: 22562

There's a for loop there that limits the number of iterations to
20.  Increasing this value to 50 allows my matrix to converge.
I have not bothered to test what the best value for this number
is, though.  In any case, it appears the number just exists to
prevent infinite loops, and 50 isn't really that much closer to
infinity than 20  (Actually, I'm just going to set it to 100
so I don't have to think about it ever again.)

Damian Menscher
-- 
-=#| Physics Grad Student  SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 1412 DCL, Workstation Services Group, CITES Ofc:(217)244-3862 |#=-
-=#| menscher at uiuc.edu www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-


 My reply and a fix of sorts without changing the hard coded iteration 
max:

I have looked in Src/dlapack_lite.c and line 22562 is no longer a line that 
sets a max. iterations parameter.  There are several set in the file, but that 
code is hard to figure (sort of a Fortran-in-C hybrid).  

Here's one, for example:

    maxit = *n * 6 * *n;   // Line 887

I have no idea which parameter to tweak.  Apparently this error is still in 
numpy (at least to my version). Does anyone have a fix?  Should I start a 
ticket (I think this is what people do)?  Any help appreciated.

I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4.

 Possible try/except === 

#  A is the original matrix
from numpy import dot, conj, linalg
try:
    U,W,VT = linalg.svd(A)
except linalg.linalg.LinAlgError:  # Square the matrix and do SVD

    print "Got svd except, trying square of A."
    A2 = dot(conj(A.T), A)
    U,W2,VT = linalg.svd(A2)


This works so far.

---

I've been using that simple fix of squaring the original matrix for several 
years and it's worked every time.  I'm not sure why.  It was just a test and it 
worked.  

You could also change the underlying C or Fortran code, but you then have to 
recompile everything in numpy.  I wasn't that brave.
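For what it's worth, a plausible reason the squaring trick works: the singular values of conj(A.T).dot(A) are the squares of those of A, and the squared (Gram) matrix is symmetric, so a different, often more robust code path handles it. A small sketch of the relationship, including how the left singular vectors of A can be rebuilt (note the svd of A2 only gives you the right singular vectors and squared singular values directly):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))

s = np.linalg.svd(A, compute_uv=False)

# SVD of the squared (Gram) matrix: singular values come back squared
A2 = np.conj(A.T) @ A
U2, s2, _ = np.linalg.svd(A2)

assert np.allclose(np.sqrt(s2), s)

# columns of U2 are the right singular vectors of A (up to sign);
# recover the left singular vectors as U = A V / s
V = U2
U = (A @ V) / np.sqrt(s2)
assert np.allclose(U @ np.diag(np.sqrt(s2)) @ V.T, A)
```

One caveat: squaring doubles the condition number, so very small singular values of A are recovered less accurately from s2.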


-- Lou Pecora, my views are my own.


Re: [Numpy-discussion] Efficient way to load a 1Gb file?

2011-08-14 Thread Torgil Svensson
Try the fromiter function, that will allow you to pass an iterator
which can read the file line by line and not preload the whole file.

file_iterator = iter(open('filename.txt'))
line_parser = lambda x: map(float, x.split('\t'))
a = np.fromiter(itertools.chain.from_iterable(
        itertools.imap(line_parser, file_iterator)), dtype=float)

You also have the option to iterate over the file twice and pass the
count argument.
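A Python 3 sketch of the two-pass variant (itertools.imap is Python 2 only, and np.fromiter consumes a flat stream of scalars, so the per-line values are flattened; the file name and two-column layout here are made up for illustration, and the tiny file is written first so the sketch is self-contained):

```python
import itertools
import numpy as np

path = 'example.txt'            # hypothetical tab-separated data file
with open(path, 'w') as f:      # create a tiny file so the sketch runs
    f.write('1.0\t2.0\n3.0\t4.0\n5.0\t6.0\n')

ncols = 2
with open(path) as f:
    nrows = sum(1 for _ in f)   # first pass: count the lines

with open(path) as f:           # second pass: stream values into the array
    values = itertools.chain.from_iterable(
        (float(v) for v in line.split('\t')) for line in f)
    a = np.fromiter(values, dtype=float, count=nrows * ncols)
a = a.reshape(nrows, ncols)
```

Passing count lets fromiter allocate the array once instead of growing it, which is the point of the second pass over the file.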

//Torgil

On Wed, Aug 10, 2011 at 7:22 PM, Russell E. Owen ro...@uw.edu wrote:
 A coworker is trying to load a 1Gb text data file into a numpy array
 using numpy.loadtxt, but he says it is using up all of his machine's 6Gb
 of RAM. Is there a more efficient way to read such text data files?

 -- Russell



Re: [Numpy-discussion] Questionable reduceat behavior

2011-08-14 Thread Wes McKinney
On Sat, Aug 13, 2011 at 8:06 PM, Mark Wiebe mwwi...@gmail.com wrote:
 Looks like this is the second-oldest open bug in the bug tracker.
 http://projects.scipy.org/numpy/ticket/236
 For what it's worth, I'm in favour of changing this behavior to be more
 consistent as proposed in that ticket.
 -Mark

 On Thu, Aug 11, 2011 at 11:25 AM, Wes McKinney wesmck...@gmail.com wrote:

 I'm a little perplexed why reduceat was made to behave like this:

 In [26]: arr = np.ones((10, 4), dtype=bool)

 In [27]: arr
 Out[27]:
 array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)


 In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0)
 Out[30]:
 array([[3, 3, 3, 3],
       [1, 1, 1, 1],
       [4, 4, 4, 4],
       [2, 2, 2, 2],
       [1, 1, 1, 1]])

 this does not seem intuitively correct. Since we have:

 In [33]: arr[3:3].sum(0)
 Out[33]: array([0, 0, 0, 0])

 I would expect

 array([[3, 3, 3, 3],
       [0, 0, 0, 0],
       [4, 4, 4, 4],
       [2, 2, 2, 2],
       [1, 1, 1, 1]])

 Obviously I can RTFM and see why it does this (if ``indices[i] >=
 indices[i + 1]``, the i-th generalized row is simply
 ``a[indices[i]]``), but it doesn't make much sense to me, and I need
 to work around it. Suggestions?



Well, I certainly hope it doesn't get forgotten about for another 5
years. I think having more consistent behavior would be better rather
than conforming to a seemingly arbitrary decision made ages ago in
Numeric.
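Until the behaviour changes, one workaround sketch is to run reduceat as-is and then zero out the generalized rows whose segment is empty:

```python
import numpy as np

arr = np.ones((10, 4), dtype=int)
indices = np.array([0, 3, 3, 7, 9])

out = np.add.reduceat(arr, indices, axis=0)

# segment i spans indices[i]:indices[i+1] (the last runs to the end);
# an empty segment should sum to 0, not to a[indices[i]]
ends = np.append(indices[1:], arr.shape[0])
out[indices >= ends] = 0
```

This reproduces the result Wes expected: the row for the empty segment [3, 3) becomes zeros instead of a copy of arr[3].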

- Wes


[Numpy-discussion] help translating matlab to numpy

2011-08-14 Thread alan
I'm translating some code from Matlab to numpy, and struggling a bit
since I have very little knowledge of Matlab.

My question is this - the arg function in Matlab (which seems to be deprecated,
they don't show it in their current documentation) is exactly equivalent to
what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians,
but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ?
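For reference, numpy.angle is atan2-based, so its output range is (-pi, pi] (radians by default, with deg=1 for degrees). A quick sketch:

```python
import numpy as np

# the branch cut maps the negative real axis to +pi, not -pi
assert np.isclose(np.angle(-1 + 0j), np.pi)

# fourth-quadrant values come back negative, confirming the (-pi, pi] range
assert np.isclose(np.angle(1 - 1j), -np.pi / 4)

# deg=True (or deg=1) converts the result to degrees
assert np.isclose(np.angle(-1 + 0j, deg=True), 180.0)
```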

Thanks!

-- 
---
| Alan K. Jackson| To see a World in a Grain of Sand  |
| a...@ajackson.org  | And a Heaven in a Wild Flower, |
| www.ajackson.org   | Hold Infinity in the palm of your hand |
| Houston, Texas | And Eternity in an hour. - Blake   |
---


Re: [Numpy-discussion] help translating matlab to numpy

2011-08-14 Thread Fabrice Silva
On Sunday, 14 August 2011 at 12:43 -0500, a...@ajackson.org wrote:
 I'm translating some code from Matlab to numpy, and struggling a bit
 since I have very little knowledge of Matlab.
 
 My question is this - the arg function in Matlab (which seems to be 
 deprecated,
 they don't show it in their current documentation) is exactly equivalent to
 what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians,
 but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ?

Can you tell which toolbox your arg function comes from? Using
help (or which), for instance...
It could help!
-- 
Fabrice



Re: [Numpy-discussion] help translating matlab to numpy

2011-08-14 Thread alan
Never mind, I've been digging through too much stuff and got confused...

I think trying to read Matlab code can do that to you. 8-)

I'm translating some code from Matlab to numpy, and struggling a bit
since I have very little knowledge of Matlab.

My question is this - the arg function in Matlab (which seems to be deprecated,
they don't show it in their current documentation) is exactly equivalent to
what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians,
but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ?

Thanks!

-- 
---
| Alan K. Jackson| To see a World in a Grain of Sand  |
| a...@ajackson.org  | And a Heaven in a Wild Flower, |
| www.ajackson.org   | Hold Infinity in the palm of your hand |
| Houston, Texas | And Eternity in an hour. - Blake   |
---


-- 
---
| Alan K. Jackson| To see a World in a Grain of Sand  |
| a...@ajackson.org  | And a Heaven in a Wild Flower, |
| www.ajackson.org   | Hold Infinity in the palm of your hand |
| Houston, Texas | And Eternity in an hour. - Blake   |
---


Re: [Numpy-discussion] SVD does not converge on clean matrix

2011-08-14 Thread Charanpal Dhanjal
Thanks very much Lou for the information. I tried delving into the C 
code and found a line in the dlasd4_ routine which reads:

for (niter = iter; niter <= MAXITERLOOPS; ++niter) {

This is apparently the main loop for this subroutine and the value of 
MAXITERLOOPS = 100. All I did was increase the maximum number of 
iterations to 200, and this seemed to solve the problem for the matrix 
in question. Let this matrix be called A, then

>>> P0, o0, Q0 = numpy.linalg.svd(A, full_matrices=False)
>>> numpy.linalg.norm((P0*o0).dot(Q0) - A)
1.8558089412794851
>>> numpy.linalg.norm(A)
4.558649005154054
>>> A.shape
(1952, 895)

It seems A has quite a small norm given its dimension, and perhaps this 
explains the error in the SVD (the numpy.linalg.norm((P0*o0).dot(Q0)- A) 
bit).
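As a baseline for comparison: on a well-behaved matrix the reconstruction error should be near machine precision, nowhere near the scale of norm(A) as seen above. A sketch with a random stand-in matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 40))

P, s, Qt = np.linalg.svd(A, full_matrices=False)

# (P * s) scales the columns of P by the singular values
err = np.linalg.norm((P * s) @ Qt - A)

# a healthy SVD reconstructs to roughly 1e-14 relative error
assert err < 1e-10 * np.linalg.norm(A)
```

A relative error of 1.86/4.56, as in the session above, indicates the factorization itself is badly wrong, not just slightly inaccurate.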

To investigate a little further I tried finding the SVD of A*1000:

>>> P0, o0, Q0 = numpy.linalg.svd(A*1000, full_matrices=False)
>>> numpy.isfinite(Q0).all()
False
>>> numpy.isfinite(P0).all()
False
>>> numpy.isfinite(o0).all()
False

and hence increasing the number of iterations does not solve the 
problem in this case. That was about as far as I felt I could go with 
investigating the C code. In the meantime I will try the 
squaring-the-matrix solution.

Incidentally, I am confused as to why numpy calls the lapack_lite 
routines - when I call numpy.show_config() it seems to have detected my 
ATLAS libraries and I would have expected it to use those.

Charanpal




On Sun, 14 Aug 2011 07:27:06 -0700 (PDT), Lou Pecora wrote:
 Chuck wrote:

 -

 Fails here also, fedora 15 64 bits AMD 940. There should be a maximum
 iterations argument somewhere...

 Chuck

  ---

  *** Here's the FIX:

 Chuck is right. There is a max iterations. Here is a reply from a
 thread of mine in this group several years ago about this problem and
 some comments that might help you.

  From Mr. Damian Menscher who was kind enough to find the
 iteration location and provide some insight:

 Ok, so after several hours of trying to read that code, I found
 the parameter that needs to be tuned. In case anyone has this
 problem and finds this thread a year from now, here's your hint:

 File: Src/dlapack_lite.c
 Subroutine: dlasd4_
 Line: 22562

 There's a for loop there that limits the number of iterations to
 20. Increasing this value to 50 allows my matrix to converge.
 I have not bothered to test what the best value for this number
 is, though. In any case, it appears the number just exists to
 prevent infinite loops, and 50 isn't really that much closer to
 infinity than 20 (Actually, I'm just going to set it to 100
 so I don't have to think about it ever again.)

 Damian Menscher
 --
 -=#| Physics Grad Student  SysAdmin @ U Illinois Urbana-Champaign
 |#=-
 -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038
 |#=-
 -=#| 1412 DCL, Workstation Services Group, CITES Ofc:(217)244-3862
 |#=-
 -=#|  www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-

  My reply and a fix of sorts without changing the hard coded
 iteration max:

 I have looked in Src/dlapack_lite.c and line 22562 is no longer a 
 line
 that sets a max. iterations parameter. There are several set in the
 file, but that code is hard to figure (sort of a Fortran-in-C 
 hybrid).


 Here's one, for example:

  maxit = *n * 6 * *n; // Line 887

 I have no idea which parameter to tweak. Apparently this error is
 still in numpy (at least to my version). Does anyone have a fix?
 Should I start a ticket (I think this is what people do)? Any help
 appreciated.

 I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4.

  Possible try/except ===

 # A is the original matrix
 try:
  U,W,VT = linalg.svd(A)
 except linalg.linalg.LinAlgError: # Square the matrix and do SVD

  print "Got svd except, trying square of A."
  A2 = dot(conj(A.T), A)
  U,W2,VT = linalg.svd(A2)

 This works so far.

 
 ---

 I've been using that simple fix of squaring the original matrix
 for several years and it's worked every time. I'm not sure why. It 
 was
 just a test and it worked.

 You could also change the underlying C or Fortran code, but you then
 have to recompile everything in numpy. I wasn't that brave.

 -- Lou Pecora, my views are my own.



Re: [Numpy-discussion] Statistical distributions on samples

2011-08-14 Thread Brennan Williams
You can use scipy.stats.truncnorm, can't you? Unless I misread, you want 
to sample a normal distribution but with generated values only being 
within a specified range? However you also say you want to do this with 
triangular and log normal and for these I presume the easiest way is to 
sample and then accept/reject.


Brennan
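The sample-then-accept/reject idea can be sketched in plain numpy; the loc/scale/bounds below come from Andrea's example (mean 125, range 50 to 200, with sigma = 25 as a made-up choice), and the same loop works for any distribution you can draw from:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_sample(draw, low, high, size):
    """Accept/reject: keep drawing until `size` samples fall in (low, high)."""
    out = np.empty(0)
    while out.size < size:
        x = draw(size)
        out = np.concatenate([out, x[(x > low) & (x < high)]])
    return out[:size]

# normal case; swap the lambda for rng.lognormal or rng.triangular as needed
vel = truncated_sample(lambda n: rng.normal(125.0, 25.0, n), 50.0, 200.0, 1000)
```

With these bounds at 3 sigma the acceptance rate is about 99.7%, so the loop rarely needs more than two passes; for tighter bounds it just draws a few more batches.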

On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote:

Hi Andrea--An easy way to get something like this would be

import numpy as np
import scipy.stats as stats

sigma = 25.0  # some reasonable standard deviation for your application
x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
x = x[x > 50]
x = x[x < 200]

That will give a roughly normal distribution to your velocities, as 
long as, say, sigma <= 25. (I'm using the rule of thumb for the normal 
distribution that normal random samples lie 3 standard deviations away 
from the mean about 1 out of 350 times.) Though you won't be able to 
get exactly normal errors about your mean, since normal random samples 
can theoretically be of any size.


You can use this same process for any other distribution, as long as 
you've chosen a scale variable so that the probability of samples 
being outside your desired interval is really small. Of course, once 
again your random errors won't be exactly from the distribution you 
get your original samples from.


-Chris JS

On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana 
andrea.gav...@gmail.com wrote:


Hi All,

I am working on something that appeared to be a no-brainer
issue (at the beginning), but my complete ignorance of statistics
is overwhelming and I got stuck.

What I am trying to do can be summarized as follows

Let's assume that I have to generate a sample of a 1,000 values
for a variable (let's say, velocity) using a normal distribution
(but later I will have to do it with log-normal, triangular and a
couple of others). The only thing I know about this velocity
sample is the minimum and maximum values (let's say 50 and 200
respectively) and, obviously for the normal distribution (but not
so for the other distributions), the mean value (125 in this case).

Now, I would like to generate this sample of 1,000 points, in
which none of the point has velocity smaller than 50 or bigger
than 200, and the number of samples close to the mean (125) should
be higher than the number of samples close to the minimum and the
maximum, following some kind of normal distribution.

What I have tried up to now is summarized in the code below, but
as you can easily see, I don't really know what I am doing. I am
open to every suggestion, and I apologize for the dumbness of my
question.

import numpy

from scipy import stats
import matplotlib.pyplot as plt

minval, maxval = 50.0, 250.0
x = numpy.linspace(minval, maxval, 500)

samp = stats.norm.rvs(size=len(x))
pdf = stats.norm.pdf(x)
cdf = stats.norm.cdf(x)
ppf = stats.norm.ppf(x)

ax1 = plt.subplot(2, 2, 1)
ax1.plot(range(len(x)), samp)

ax2 = plt.subplot(2, 2, 2)
ax2.plot(x, pdf)

ax3 = plt.subplot(2, 2, 3)
ax3.plot(x, cdf)

ax4 = plt.subplot(2, 2, 4)
ax4.plot(x, ppf)

plt.show()


Andrea.

Imagination Is The Only Weapon In The War Against Reality.
http://xoomer.alice.it/infinity77/








[Numpy-discussion] __array_wrap__ / __array_finalize__ in NumPy v1.4+

2011-08-14 Thread Aronne Merrelli
Hello,

I'm attempting to implement a subclass of ndarray, and becoming confused
about the way __array_wrap__ and __array_finalize__ operate. I boiled it
down to a short subclass, which is the example on the website at
http://docs.scipy.org/doc/numpy-1.6.0/user/basics.subclassing.html, with one
added attribute that is a copy of the self array multiplied by 2. The
doubled copy is stored in a plain ndarray. The attachment has the python
code.

The output below is from NumPy 1.3 and 1.6 (1.4 has the same output as 1.6).
The output from 1.3 matches the documentation on the website. In 1.6,
__array_wrap__ and __array_finalize__ are invoked in the opposite order,
__array_finalize__ appears to be getting an empty array, and array_wrap's
argument is no longer an ndarray but rather an instance of the subclass.
This doesn't match the documentation so I am not sure if this is the correct
behavior in newer NumPy. Is this a bug, or the expected behavior in newer
NumPy versions?

Am I just missing something simple? The actual code I am trying to write
uses essentially the same idea - keeping another array, related to the self
array through some calculation, as another object attribute.  Is there a
better way to accomplish this?

Thanks,
Aronne
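A stripped-down reconstruction of the kind of subclass being described (the actual attachment is opaque in the archive, so this is an illustrative sketch; since the hook-call order differs between NumPy versions, it just records which hooks fire rather than asserting an order):

```python
import numpy as np

class TestClass(np.ndarray):
    calls = []  # class-level log of hook invocations, for inspection

    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    def __array_finalize__(self, obj):
        TestClass.calls.append('__array_finalize__')

    def __array_wrap__(self, arr, context=None):
        TestClass.calls.append('__array_wrap__')
        return np.ndarray.__array_wrap__(self, arr, context)

obj = TestClass([0, 1])
TestClass.calls.clear()
ret = obj + np.array([2, 3])   # triggers the wrap/finalize machinery
```

Running this under 1.3 versus 1.4+ shows the order difference Aronne describes; either way the result comes back as a TestClass instance.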



NumPy version:  1.3.0

object creation
In __array_finalize__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
   obj  type <type 'numpy.ndarray'>, values array([0, 1])

object + ndarray
In __array_wrap__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
   arr  type <type 'numpy.ndarray'>, values array([2, 3])
In __array_finalize__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([2, 3])
   obj  type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])

obj=  [0 1] [0 2]  ret=  [2 3] [4 6]


NumPy version:  1.6.0

object creation
In __array_finalize__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
   obj  type <type 'numpy.ndarray'>, values array([0, 1])

object + ndarray
In __array_finalize__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([
15, 22033837])
   obj  type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
In __array_wrap__:
   self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
   arr  type <class 'array_wrap_test.TestClass'>, values TestClass([2, 3])

obj=  [0 1] [0 2]  ret=  [2 3] [  30 44067674]


array_wrap_test.py
Description: Binary data