Re: [Numpy-discussion] SVD does not converge on clean matrix
I had a quick look at the code (https://github.com/numpy/numpy/blob/master/numpy/linalg/linalg.py): numpy.linalg.svd calls lapack_lite.dgesdd (for real matrices), so I guess the non-convergence occurs in that routine. As I understand it, lapack_lite is used by default unless numpy is built against ATLAS/MKL etc. I wonder why svd works for Nadav and not for anyone else? Any ideas anyone?

Charanpal

On Sat, 13 Aug 2011 13:13:25 -0600, Charles R Harris wrote:

On Thu, Aug 11, 2011 at 7:23 AM, <dhan...@telecom-paristech.fr> wrote:

Hi all,

I get the error "numpy.linalg.linalg.LinAlgError: SVD did not converge" when calling numpy.linalg.svd on a clean matrix of size (1952, 895). The matrix is clean in the sense that it contains no NaN or Inf values. The corresponding npz file is available here: https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr

Here is some information about my setup: I use Python 2.7.1 on Ubuntu 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be solved by recompiling numpy against my local ATLAS library (version 3.8.3), but this didn't seem to help. On another machine with Python 2.7.1 and numpy 1.5.1 the SVD does converge, however it contains 1 NaN singular value and 3 negative singular values on the order of -10^-1 (singular values should always be non-negative). I also tried computing the SVD of the matrix using Octave 3.2.4 and Matlab 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems.

Any help is greatly appreciated. Thanks in advance,

Charanpal

Fails here also, fedora 15 64 bits AMD 940. There should be a maximum iterations argument somewhere...

Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
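Before suspecting the data, a couple of quick sanity checks can confirm a matrix really is "clean" and that a returned SVD is well-formed. This sketch uses a small random stand-in matrix (an assumption for the demo; one would normally np.load the posted npz file and check the real (1952, 895) array):

```python
import numpy as np

# Small random stand-in for the real (1952, 895) matrix from the npz file.
A = np.random.rand(200, 90)
assert np.isfinite(A).all()      # "clean": no NaN or Inf anywhere

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert (s >= 0).all()            # singular values must be non-negative
assert np.all(np.diff(s) <= 0)   # and are returned in descending order
```

If the finiteness check passes but svd still raises LinAlgError, the problem is in the LAPACK iteration, not the input.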
Re: [Numpy-discussion] SVD does not converge on clean matrix
Chuck wrote: "Fails here also, fedora 15 64 bits AMD 940. There should be a maximum iterations argument somewhere... Chuck"

*** Here's the FIX: Chuck is right, there is a max iterations. Here is a reply from a thread of mine in this group several years ago about this problem, with some comments that might help you.

From Mr. Damian Menscher, who was kind enough to find the iteration location and provide some insight:

  Ok, so after several hours of trying to read that code, I found the
  parameter that needs to be tuned. In case anyone has this problem and
  finds this thread a year from now, here's your hint:

    File: Src/dlapack_lite.c
    Subroutine: dlasd4_
    Line: 22562

  There's a for loop there that limits the number of iterations to 20.
  Increasing this value to 50 allows my matrix to converge. I have not
  bothered to test what the best value for this number is, though. In any
  case, it appears the number just exists to prevent infinite loops, and
  50 isn't really that much closer to infinity than 20. (Actually, I'm
  just going to set it to 100 so I don't have to think about it ever
  again.)

  Damian Menscher
  Physics Grad Student and SysAdmin @ U Illinois Urbana-Champaign
  menscher at uiuc.edu, www.uiuc.edu/~menscher/

My reply, and a fix of sorts that avoids changing the hard-coded iteration max: I have looked in Src/dlapack_lite.c, and line 22562 is no longer a line that sets a max-iterations parameter. There are several set in the file, but that code is hard to figure out (sort of a Fortran-in-C hybrid). Here's one, for example:

    maxit = *n * 6 * *n;    /* Line 887 */

I have no idea which parameter to tweak. Apparently this error is still in numpy (at least in my version). Does anyone have a fix? Should I start a ticket (I think this is what people do)? Any help appreciated.
I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4.

Possible try/except:

    # A is the original matrix
    try:
        U, W, VT = linalg.svd(A)
    except linalg.linalg.LinAlgError:
        # Square the matrix and do SVD
        print "Got svd except, trying square of A."
        A2 = dot(conj(A.T), A)
        U, W2, VT = linalg.svd(A2)

This works so far. I've been using that simple fix of squaring the original matrix for several years and it's worked every time. I'm not sure why; it was just a test and it worked. You could also change the underlying C or Fortran code, but then you have to recompile everything in numpy. I wasn't that brave.

-- Lou Pecora, my views are my own.
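A caveat worth spelling out about the squaring trick: svd(A^H A) returns the *squares* of A's singular values and only A's right singular vectors, so the caller has to take a square root and recompute U if the full factorization of A is wanted. A self-contained Python 3 sketch of the fallback with that recovery step added (robust_svd is a hypothetical name, not anything from numpy):

```python
import numpy as np

def robust_svd(A):
    """SVD with Lou's Gram-matrix fallback if dgesdd fails to converge."""
    try:
        return np.linalg.svd(A, full_matrices=False)
    except np.linalg.LinAlgError:
        A2 = np.conj(A.T).dot(A)        # Gram matrix: V diag(s^2) V^H
        _, W2, VT = np.linalg.svd(A2)   # its singular values are s^2
        W = np.sqrt(W2)                 # recover singular values of A
        # Recover left singular vectors via U = A V / s (guard zero s).
        U = A.dot(np.conj(VT.T)) / np.where(W > 0, W, 1.0)
        return U, W, VT

A = np.random.rand(6, 4)
U, W, VT = robust_svd(A)
# Either path should reconstruct A: U diag(W) VT == A
assert np.allclose((U * W).dot(VT), A)
```

Note the Gram-matrix route squares the condition number of A, so it loses roughly half the accuracy of a direct SVD; it is a workaround, not a replacement.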
Re: [Numpy-discussion] Efficient way to load a 1Gb file?
Try the fromiter function; that will allow you to pass an iterator which can read the file line by line instead of preloading the whole file:

    file_iterator = iter(open('filename.txt'))
    line_parser = lambda x: map(float, x.split('\t'))
    a = np.fromiter(itertools.imap(line_parser, file_iterator), dtype=float)

You also have the option of iterating the file twice and passing the count argument.

//Torgil

On Wed, Aug 10, 2011 at 7:22 PM, Russell E. Owen <ro...@uw.edu> wrote:

A coworker is trying to load a 1 Gb text data file into a numpy array using numpy.loadtxt, but he says it is using up all of his machine's 6 Gb of RAM. Is there a more efficient way to read such text data files?

-- Russell
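One wrinkle with the snippet above: np.fromiter consumes a flat iterator of scalars, so the per-line lists need to be flattened before they reach it. A runnable Python 3 sketch of the same streaming idea (StringIO stands in for the real file, and the tab-separated float format is the same assumption as above):

```python
import io
import numpy as np

# StringIO stands in for open('filename.txt'); only one line is ever
# materialized at a time, so memory use stays flat.
text = io.StringIO('1.0\t2.0\n3.0\t4.0\n')

# Generator yielding one float at a time, flattened across lines.
values = (float(v) for line in text for v in line.split('\t'))

# Optionally pass count=rows*cols here to preallocate in one shot.
a = np.fromiter(values, dtype=float).reshape(-1, 2)
print(a)   # [[1. 2.] [3. 4.]]
```

Passing count (known from a first cheap pass over the file) avoids the repeated reallocations fromiter otherwise performs while growing its buffer.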
Re: [Numpy-discussion] Questionable reduceat behavior
On Sat, Aug 13, 2011 at 8:06 PM, Mark Wiebe <mwwi...@gmail.com> wrote:

Looks like this is the second-oldest open bug in the bug tracker: http://projects.scipy.org/numpy/ticket/236

For what it's worth, I'm in favour of changing this behavior to be more consistent, as proposed in that ticket.

-Mark

On Thu, Aug 11, 2011 at 11:25 AM, Wes McKinney <wesmck...@gmail.com> wrote:

I'm a little perplexed why reduceat was made to behave like this:

    In [26]: arr = np.ones((10, 4), dtype=bool)

    In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0)
    Out[30]:
    array([[3, 3, 3, 3],
           [1, 1, 1, 1],
           [4, 4, 4, 4],
           [2, 2, 2, 2],
           [1, 1, 1, 1]])

This does not seem intuitively correct. Since we have:

    In [33]: arr[3:3].sum(0)
    Out[33]: array([0, 0, 0, 0])

I would expect:

    array([[3, 3, 3, 3],
           [0, 0, 0, 0],
           [4, 4, 4, 4],
           [2, 2, 2, 2],
           [1, 1, 1, 1]])

Obviously I can RTFM and see why it does this (if ``indices[i] >= indices[i + 1]``, the i-th generalized row is simply ``a[indices[i]]``), but it doesn't make much sense to me, and I need to work around it. Suggestions?

Well, I certainly hope it doesn't get forgotten about for another 5 years. I think having more consistent behavior would be better, rather than conforming to a seemingly arbitrary decision made ages ago in Numeric.

- Wes
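The quirk is easy to reproduce (using an int array here rather than Wes's bool one, just to keep the output dtype unambiguous): when indices[i] >= indices[i + 1], reduceat emits a[indices[i]] for that slot instead of the empty-slice reduction a plain slice-and-sum would give.

```python
import numpy as np

# Reproduces the behavior Wes describes, with an int array for clarity.
arr = np.ones((10, 4), dtype=int)
out = np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0)

print(out[:, 0])        # [3 1 4 2 1]: slot 1 is arr[3], not an empty sum
print(arr[3:3].sum(0))  # [0 0 0 0]: what ordinary slicing would give
```

So repeated indices cannot be used to mark empty groups; a workaround is to compute the full result and then zero out the rows where indices[i] >= indices[i + 1].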
[Numpy-discussion] help translating matlab to numpy
I'm translating some code from Matlab to numpy, and struggling a bit since I have very little knowledge of Matlab. My question is this: the arg function in Matlab (which seems to be deprecated; they don't show it in their current documentation) is exactly equivalent to what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians, but what is the output range for the Matlab function: -pi to pi, or 0 to 2*pi?

Thanks!

-----------------------------------------------------------------
| Alan K. Jackson   | To see a World in a Grain of Sand         |
| a...@ajackson.org | And a Heaven in a Wild Flower,            |
| www.ajackson.org  | Hold Infinity in the palm of your hand    |
| Houston, Texas    | And Eternity in an hour. - Blake          |
-----------------------------------------------------------------
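For reference, the range question is easy to settle empirically on the numpy side: np.angle is atan2-based, so its output lies in (-pi, pi], which is the same convention Matlab's angle() uses.

```python
import numpy as np

# np.angle returns the phase in (-pi, pi], matching Matlab's angle().
print(np.angle(1j))            # 1.5707... (pi/2)
print(np.angle(-1.0 + 0j))     # 3.1415... (the positive branch, +pi)
print(np.angle(1j, deg=True))  # ~90 degrees with deg=True
```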
Re: [Numpy-discussion] help translating matlab to numpy
On Sunday 14 August 2011 at 12:43 -0500, a...@ajackson.org wrote:

> [...] the arg function in Matlab (which seems to be deprecated, they don't show it in their current documentation) is exactly equivalent to what in Numpy? [quoted question trimmed]

Can you tell which toolbox your arg function comes from? Using help (or which?) for instance... It could help!

-- Fabrice
Re: [Numpy-discussion] help translating matlab to numpy
Never mind, I've been digging through too much stuff and got confused... I think trying to read Matlab code can do that to you. 8-)

> [original question quoted in full, trimmed]

-- Alan
Re: [Numpy-discussion] SVD does not converge on clean matrix
Thanks very much Lou for the information. I tried delving into the C code and found a line in the dlasd4_ routine which reads:

    for (niter = iter; niter <= MAXITERLOOPS; ++niter) {

This is apparently the main loop for this subroutine, and the value of MAXITERLOOPS is 100. All I did was increase the maximum number of iterations to 200, and this seemed to solve the problem for the matrix in question. Let this matrix be called A; then:

    >>> P0, o0, Q0 = numpy.linalg.svd(A, full_matrices=False)
    >>> numpy.linalg.norm((P0*o0).dot(Q0) - A)
    1.8558089412794851
    >>> numpy.linalg.norm(A)
    4.558649005154054
    >>> A.shape
    (1952, 895)

It seems A has quite a small norm given its dimensions, and perhaps this explains the error in the SVD (the numpy.linalg.norm((P0*o0).dot(Q0) - A) bit). To investigate a little further I tried finding the SVD of A*1000:

    >>> P0, o0, Q0 = numpy.linalg.svd(A*1000, full_matrices=False)
    >>> numpy.isfinite(Q0).all()
    False
    >>> numpy.isfinite(P0).all()
    False
    >>> numpy.isfinite(o0).all()
    False

and hence increasing the number of iterations does not solve the problem in this case. That was about as far as I felt I could go with investigating the C code. In the meanwhile I will try the squaring-the-matrix solution.

Incidentally, I am confused as to why numpy calls the lapack_lite routines: when I call numpy.show_config() it seems to have detected my ATLAS libraries, and I would have expected it to use those.

Charanpal

On Sun, 14 Aug 2011 07:27:06 -0700 (PDT), Lou Pecora wrote:

Chuck wrote: "Fails here also, fedora 15 64 bits AMD 940. There should be a maximum iterations argument somewhere... Chuck"

*** Here's the FIX: Chuck is right, there is a max iterations. Here is a reply from a thread of mine in this group several years ago about this problem and some comments that might help you. From Mr. Damian Menscher, who was kind enough to find the iteration location and provide some insight:
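An aside on interpreting the norms in the session above: the absolute residual 1.86 against a matrix norm of 4.56 is a huge *relative* error, which is the scale-free quantity worth checking. A small sketch with a random stand-in matrix shows what a healthy SVD looks like by this measure:

```python
import numpy as np

# Scale-free convergence check: relative reconstruction error of the SVD.
# A converged SVD reconstructs to roughly machine precision regardless of
# norm(A), which makes this easier to judge than the absolute residual.
A = np.random.rand(50, 30)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

rel_err = np.linalg.norm((U * s).dot(Vt) - A) / np.linalg.norm(A)
print(rel_err)   # tiny (near machine epsilon) for a healthy SVD
assert rel_err < 1e-10
```

By this yardstick the quoted session's rel_err of about 0.4 confirms that the factorization returned after bumping the iteration cap is still far from converged.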
[Lou's earlier message, quoted in full, trimmed -- see the original above]

-- Lou Pecora, my views are my own.
Re: [Numpy-discussion] Statistical distributions on samples
You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to sample a normal distribution but with generated values only being within a specified range? However, you also say you want to do this with triangular and log-normal distributions, and for these I presume the easiest way is to sample and then accept/reject.

Brennan

On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote:

Hi Andrea--An easy way to get something like this would be:

    import numpy as np
    import scipy.stats as stats

    sigma = # some reasonable standard deviation for your application
    x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
    x = x[x > 50]
    x = x[x < 200]

That will give a roughly normal distribution to your velocities, as long as, say, sigma < 25. (I'm using the rule of thumb for the normal distribution that normal random samples lie more than 3 standard deviations from the mean only about 1 time out of 350.) Though you won't be able to get exactly normal errors about your mean, since normal random samples can theoretically be of any size. You can use this same process for any other distribution, as long as you've chosen a scale parameter so that the probability of samples falling outside your desired interval is really small. Of course, once again your random errors won't be exactly from the distribution you drew your original samples from.

-Chris JS

On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana <andrea.gav...@gmail.com> wrote:

Hi All, I am working on something that appeared to be a no-brainer issue (at the beginning), but my complete ignorance of statistics is overwhelming and I got stuck. What I am trying to do can be summarized as follows. Let's assume that I have to generate a sample of 1,000 values for a variable (let's say, velocity) using a normal distribution (but later I will have to do it with log-normal, triangular and a couple of others).
The only thing I know about this velocity sample is the minimum and maximum values (let's say 50 and 200 respectively) and, obviously for the normal distribution (but not so for the other distributions), the mean value (125 in this case). Now, I would like to generate this sample of 1,000 points, in which no point has velocity smaller than 50 or bigger than 200, and the number of samples close to the mean (125) should be higher than the number of samples close to the minimum and the maximum, following some kind of normal distribution.

What I have tried up to now is summarized in the code below, but as you can easily see, I don't really know what I am doing. I am open to every suggestion, and I apologize for the dumbness of my question.

    import numpy
    from scipy import stats
    import matplotlib.pyplot as plt

    minval, maxval = 50.0, 250.0
    x = numpy.linspace(minval, maxval, 500)

    samp = stats.norm.rvs(size=len(x))
    pdf = stats.norm.pdf(x)
    cdf = stats.norm.cdf(x)
    ppf = stats.norm.ppf(x)

    ax1 = plt.subplot(2, 2, 1)
    ax1.plot(range(len(x)), samp)
    ax2 = plt.subplot(2, 2, 2)
    ax2.plot(x, pdf)
    ax3 = plt.subplot(2, 2, 3)
    ax3.plot(x, cdf)
    ax4 = plt.subplot(2, 2, 4)
    ax4.plot(x, ppf)
    plt.show()

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/
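Brennan's accept/reject suggestion can be sketched in a few lines of plain numpy: keep drawing from the unconstrained distribution and discard anything outside [50, 200] until 1,000 survivors remain. The sigma=25 value below is an assumption for the demo (any scale where most mass lies inside the bounds works), and the same loop applies unchanged to lognormal or triangular draws.

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi, n = 50.0, 200.0, 1000
mean, sigma = 125.0, 25.0          # sigma is an assumed application value

samples = np.empty(0)
while samples.size < n:
    draw = rng.normal(mean, sigma, size=2 * n)   # oversample, then filter
    draw = draw[(draw >= lo) & (draw <= hi)]     # accept/reject step
    samples = np.concatenate([samples, draw])
samples = samples[:n]

assert samples.size == n
assert samples.min() >= lo and samples.max() <= hi
```

Note the result is the *truncated* normal distribution, not the normal itself; for the pure normal case scipy.stats.truncnorm gives the same thing directly, but rejection generalizes to distributions scipy has no truncated version of.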
[Numpy-discussion] __array_wrap__ / __array_finalize__ in NumPy v1.4+
Hello,

I'm attempting to implement a subclass of ndarray, and becoming confused about the way __array_wrap__ and __array_finalize__ operate. I boiled it down to a short subclass (the example on the website at http://docs.scipy.org/doc/numpy-1.6.0/user/basics.subclassing.html), with one added attribute that is a copy of the self array multiplied by 2. The doubled copy is stored in a plain ndarray. The attachment has the Python code.

The output below is from NumPy 1.3 and 1.6 (1.4 has the same output as 1.6). The output from 1.3 matches the documentation on the website. In 1.6, __array_wrap__ and __array_finalize__ are invoked in the opposite order, __array_finalize__ appears to be getting an empty (uninitialized) array, and __array_wrap__'s argument is no longer an ndarray but rather an instance of the subclass. This doesn't match the documentation, so I am not sure if this is the correct behavior in newer NumPy. Is this a bug, or the expected behavior in newer NumPy versions? Am I just missing something simple?

The actual code I am trying to write uses essentially the same idea: keeping another array, related to the self array through some calculation, as another object attribute. Is there a better way to accomplish this?
Thanks,
Aronne

    NumPy version: 1.3.0
    object creation
    In __array_finalize__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
       obj  type <type 'numpy.ndarray'>, values array([0, 1])
    object + ndarray
    In __array_wrap__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
       arr  type <type 'numpy.ndarray'>, values array([2, 3])
    In __array_finalize__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([2, 3])
       obj  type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
    obj= [0 1] [0 2]
    ret= [2 3] [4 6]

    NumPy version: 1.6.0
    object creation
    In __array_finalize__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
       obj  type <type 'numpy.ndarray'>, values array([0, 1])
    object + ndarray
    In __array_finalize__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([ 15, 22033837])
       obj  type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
    In __array_wrap__:
       self type <class 'array_wrap_test.TestClass'>, values TestClass([0, 1])
       arr  type <class 'array_wrap_test.TestClass'>, values TestClass([2, 3])
    obj= [0 1] [0 2]
    ret= [2 3] [ 30 44067674]

[Attachment: array_wrap_test.py]
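For readers landing here later, a minimal self-contained variant of the attribute-carrying subclass (a sketch modeled on the numpy subclassing guide, not Aronne's actual attachment). The usual defensive idiom, getattr with a default inside __array_finalize__, keeps the attribute intact regardless of which order the hooks fire in or what object they receive:

```python
import numpy as np

class TestClass(np.ndarray):
    """ndarray subclass carrying a 'doubled' companion array."""

    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        obj.doubled = np.asarray(input_array) * 2  # plain-ndarray attribute
        return obj

    def __array_finalize__(self, obj):
        # Called for explicit construction (obj is None), views, and
        # ufunc outputs alike; getattr survives all three cases.
        if obj is None:
            return
        self.doubled = getattr(obj, 'doubled', None)

x = TestClass([0, 1])
y = x + np.array([2, 2])   # exercises the wrap/finalize machinery

print(type(y).__name__)    # TestClass: subclass type is preserved
print(y.tolist())          # [2, 3]
```

This does not restore the 1.3 calling order, but it makes the subclass indifferent to it, which is generally the safer design than relying on a documented hook sequence.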