[Numpy-discussion] dtype comparison and hashing
Hello, Currently in numpy, comparing dtypes for equality with == does an internal PyArray_EquivTypes check, which means that the dtypes NPY_INT and NPY_LONG compare as equal in Python. However, the hash function for dtypes reduces to id(), which is therefore inconsistent with ==. Unfortunately I can't produce a Python snippet showing this since I don't know how to create an NPY_INT dtype in pure Python. Based on the source it looks like hash should raise a TypeError, since tp_hash is null but tp_richcompare is not. Does the following snippet throw an exception for others?

    >>> import numpy
    >>> hash(numpy.dtype('int'))
    5708736

This might be the problem:

    /* Macro to get the tp_richcompare field of a type if defined */
    #define RICHCOMPARE(t) (PyType_HasFeature((t), Py_TPFLAGS_HAVE_RICHCOMPARE) \
                            ? (t)->tp_richcompare : NULL)

I'm using the default Mac OS X 10.5 installation of Python 2.5 and numpy, so maybe those weren't compiled correctly. Has anyone else seen this issue? Thanks, Geoffrey
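For illustration, a hedged sketch of the inconsistency being described, assuming a platform where C int and C long have the same size so that the two char codes below give distinct but equivalent dtype objects (this snippet is not from the original post):

```python
import numpy as np

# 'i' is C int (NPY_INT) and 'l' is C long (NPY_LONG).  Where the two C types
# have the same size, the dtypes are distinct objects that compare equal under
# ==, so a hash based on id() would break the rule that equal objects must
# hash equal.
a = np.dtype('i')
b = np.dtype('l')
print(a == b)              # True when the C sizes match
print(hash(a) == hash(b))  # should also be True if hash is consistent with ==
```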
Re: [Numpy-discussion] memory usage
Huang-Wen Chen wrote: Robert Kern wrote:

    from numpy import *
    for i in range(1000):
        a = random.randn(512**2)
        b = a.argsort(kind='quick')

Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 on OS X, the memory usage is stable. I tried the code fragment on two platforms and the memory usage is also normal:

1. numpy 1.1.1, python 2.5.1 on Vista 32bit
2. numpy 1.2.0, python 2.6 on RedHat 64bit

If I recall correctly, there were some major improvements in python's memory management/garbage collection from version 2.4 to 2.5. If you could try to upgrade your python to 2.5 (and possibly also your numpy to 1.2.0), you'd probably see some better behaviour. Regards, Vincent.
Re: [Numpy-discussion] LU factorization?
2008/10/15 Charles R Harris [EMAIL PROTECTED]: numpy.linalg has qr and cholesky factorizations, but LU factorization is only available in scipy. That doesn't seem quite right. I think it would make sense to include the LU factorization in numpy among the basic linalg operations, and probably LU_solve also. Thoughts? I've needed it a lot in the past, and it is a perfect fit for numpy.linalg. It also paves the way to a reduced-row-echelon routine in the Matrix class. It seems technically feasible, so I am in favour. Regards Stéfan
Re: [Numpy-discussion] var bias reason?
Gabriel Gellner wrote: Some colleagues noticed that var uses biased formulas by default in numpy; searching for the reason only brought up: http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias which I totally agree with, but there was no response? Any reason for this? I will try to respond to this as it was me who made the change. I think there have been responses, but I think I've preferred to stay quiet rather than feed a flame war. Ultimately, it is a matter of preference and I don't think there would be equal weight given to all the arguments surrounding the decision by everybody. I will attempt to articulate my reasons: dividing by n is the maximum likelihood estimator of variance and I prefer that justification more than the un-biased justification for a default (especially given that bias is just one part of the error in an estimator). Having every package that computes the mean return the un-biased estimate gives it more cultural weight than the concept deserves, I think. Any surprise that is created by the different default should be mitigated by the fact that it's an opportunity to learn something about what you are doing. Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (Hopefully, they will resolve a link problem at the above site soon, but you can read the abstract). I'm not trying to persuade anybody with this email (although if you can download the paper at the above link, then I am trying to persuade with that). In this email I'm just trying to give context to the poster as I think the question is legitimate. With that said, there is the ddof parameter so that you can change what the divisor is. I think that is a useful compromise. I'm unhappy with the internal inconsistency of cov, as I think it was an oversight. I'd be happy to see cov changed as well to use the ddof argument instead of the bias keyword, but that is an API change and requires some transition discussion and work. The only other argument I've heard against the current situation is unit testing with MATLAB or R code. Just use ddof=1 when comparing against MATLAB and R code is my suggestion. Best regards, -Travis
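For reference, the ddof compromise Travis mentions works like this in ordinary numpy usage (a minimal sketch, not code from the thread):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.var(x))           # 1.25   -- divides by n, the numpy default
print(np.var(x, ddof=1))   # 1.666… -- divides by n - 1, the "unbiased" estimate
```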
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote: [...] Here is a paper I wrote on the subject that you might find useful: https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1 (Hopefully, they will resolve a link problem at the above site soon, but you can read the abstract). Yes, I hope so too; I would be happy to read the article. On the limits of unbiasedness, the following document mentions an example (in a different context than variance estimation): http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf AFAIK, even statisticians who consider themselves as mostly frequentist (if that makes any sense) do not advocate unbiasedness as such an important concept anymore (Larry Wasserman mentions it in his All of Statistics). cheers, David
Re: [Numpy-discussion] var bias reason?
I'm behind Travis on this one. -- Paul On Wed, Oct 15, 2008 at 11:19 AM, David Cournapeau [EMAIL PROTECTED] wrote: [...]
Re: [Numpy-discussion] var bias reason?
Me too. S On Wednesday 15 October 2008 11:31:44 am Paul Barrett wrote: I'm behind Travis on this one. -- Paul [...] -- Scott M. Ransom
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote: [...] Thanks for the reply, I look forward to reading the paper when it is available. The major issue in my mind is not the technical issue but the surprise factor. I can't think of a single other package that uses this as the default, and since it is also a method of ndarray (which is a built-in type and can't be monkey-patched) there is no way of taking a different view (that is, supplying my own function) without the confusion I am feeling in my own lab . . . (less technical people need to understand that they shouldn't use a method of the same name). I worry about having numpy take this unpopular stance (as far as packages go) simply to fight the good fight, as a built-in method/behaviour of any ndarray, rather than as a built-in function, which presents no such problem, as it allows dissent over a clearly muddy issue. Sorry for the noise, and I am happy to see there is a reason, but I can't help but find this a wart for purely pedagogical reasons. Gabriel
Re: [Numpy-discussion] var bias reason?
Hi, While I disagree, I really do not care because this is documented. But perhaps a clear warning is needed at the start so it is clear what the default ddof means, instead of it being buried in the Notes section. Also I am surprised that you did not directly reference the Stein estimator (your minimum mean-squared estimator) and known effects in your paper: http://en.wikipedia.org/wiki/James-Stein_estimator So I did not find this any different from what is already known about the Stein estimator. Bruce PS While I may have gotten access via my University, I did get it from the "Access this item" link: https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf Travis E. Oliphant wrote: [...]
Re: [Numpy-discussion] var bias reason?
On Wed, Oct 15, 2008 at 9:19 AM, David Cournapeau [EMAIL PROTECTED] wrote: [...] AFAIK, even statisticians who consider themselves as mostly frequentist (if that makes any sense) do not advocate unbiasedness as... Frequently frequentist? Chuck
[Numpy-discussion] Any numpy trick for my problem ?
Hi, I have a matrix of 2100 rows, and I want to calculate blockwise mean vectors. Each block consists of 10 consecutive rows. My code looks like this:

    rv = []
    for i in range(0, 2100, 10):
        rv.append(mean(matrix[i:i+10], axis=0))
    return array(rv)

Is there a more elegant and maybe faster method to perform this calculation? Greetings, Uwe
Re: [Numpy-discussion] Any numpy trick for my problem ?
That's cool. Thanks for your fast answer. Greetings, Uwe On 15 Okt., 12:56, Charles R Harris [EMAIL PROTECTED] wrote: [...]
[Numpy-discussion] Array printing differences between 64- and 32-bit platforms
Hi Folks, In porting some code to a 64-bit machine, I ran across the following issue. On the 64-bit machine, an array with dtype=int32 prints the dtype explicitly, whereas on a 32-bit machine it doesn't. The same is true for dtype=intc (since 'intc is int32' -- True), and the converse is true for dtype=int64 and dtype=longlong. Arrays with dtype of plain int both print without the dtype, but then you're not using the same underlying size anymore. I think this is handled in core/numeric.py in array_repr and just above, where certain types are added to _typelessdata if they are subclasses of 'int'. The issubclass test returns different values on platforms with different word lengths. This difference can make doctests fail and although there are doctest tricks to prevent the failure, they're a bit annoying to use. I was wondering about introducing a new print setting that forced dtypes to be printed always. Is there any support for that? Any other ideas would also be most welcome. Thanks, Ken Basye
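A minimal sketch of the behaviour being described; the commented output reflects the platform difference reported in the thread, not something guaranteed by any particular numpy version:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)
print(repr(a))
# Reported behaviour:
#   on a 32-bit platform: array([1, 2, 3])
#   on a 64-bit platform: array([1, 2, 3], dtype=int32)
```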
Re: [Numpy-discussion] how to save a large array into a file quickly
On Oct 14 15:29 -1000, Eric Firing wrote: frank wang wrote: Hi, I have a large ndarray that I want to dump to a file. I know that I can use a for loop to write one data item at a time. Since Python is a very powerful language, I want to find a way that will dump the data fast and clean. The data can be in floating point or integer. Use numpy.save for a single array, or numpy.savez for multiple ndarrays, assuming you will want to read them back with numpy. If you want to dump to a text file, use numpy.savetxt. If you want to dump to a binary file to be read by another program, you might want to use the tofile method of the ndarray. I've just updated [1] to mention scipy.io.npfile as well as numpy.save and friends. Now, I hope that all common ways to read/write arrays are present in one place. [1] http://scipy.org/Cookbook/InputOutput best, steve
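A short sketch of the options listed above; the file names are placeholders, not from the original message:

```python
import numpy as np

a = np.random.rand(1000, 1000)

np.save('data.npy', a)     # single array, fast binary; reload with np.load('data.npy')
np.savez('data.npz', a=a)  # several arrays in one file
np.savetxt('data.txt', a)  # plain text: readable, but slower and larger
a.tofile('data.raw')       # raw binary for other programs; no shape/dtype stored
```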
Re: [Numpy-discussion] Any numpy trick for my problem ?
On Wed, Oct 15, 2008 at 4:47 AM, Uwe Schmitt [EMAIL PROTECTED] wrote: Hi, I have a matrix of 2100 rows, and I want to calculate blockwise mean vectors. Each block consists of 10 consecutive rows. My code looks like this:

    rv = []
    for i in range(0, 2100, 10):
        rv.append(mean(matrix[i:i+10], axis=0))
    return array(rv)

Is there a more elegant and maybe faster method to perform this calculation? Something like

    In [1]: M = np.random.ranf((40,5))
    In [2]: M.reshape(4,10,5).mean(axis=1)
    Out[2]:
    array([[ 0.57979278,  0.50013352,  0.66783389,  0.4009187 ,  0.36379445],
           [ 0.46938844,  0.34449102,  0.56419189,  0.49134703,  0.61380198],
           [ 0.5644788 ,  0.61734034,  0.3656104 ,  0.63147275,  0.46319345],
           [ 0.56556899,  0.59012606,  0.39691084,  0.26566127,  0.57107896]])

Chuck
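Applied to the original poster's 2100-row case, the same reshape trick would look roughly like this (the matrix is a random stand-in, and the 8-column width is assumed for illustration):

```python
import numpy as np

matrix = np.random.rand(2100, 8)                  # stand-in for the real data
block_means = matrix.reshape(210, 10, -1).mean(axis=1)  # one mean vector per block of 10 rows
print(block_means.shape)                          # (210, 8)
```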
Re: [Numpy-discussion] how to save a large array into a file quickly
On 10/14/2008 9:23 PM frank wang apparently wrote: I have a large ndarray that I want to dump to a file. I know that I can use a for loop to write one data item at a time. Since Python is a very powerful language, I want to find a way that will dump the data fast and clean. The data can be in floating point or integer. Use the ``tofile()`` method: http://www.scipy.org/Numpy_Example_List#head-2acd2a84907edbd410bf426847403ce8ea151814 hth, Alan Isaac
[Numpy-discussion] memory usage (Emil Sidky)
Huang-Wen Chen wrote: Robert Kern wrote: [...] If you could try to upgrade your python to 2.5 (and possibly also your numpy to 1.2.0), you'd probably see some better behaviour. Regards, Vincent. Problem fixed. Thanks. But it turns out there were two things going on: (1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory usage for the loop with argsort in it. (2) Unfortunately, when I went back to my original program and ran it with the upgraded numpy, it still was chewing up tons of memory. I finally found the problem. Consider the following two code snippets (an extension of my previous example):

    from numpy import *
    d = []
    for i in range(1000):
        a = random.randn(512**2)
        b = a.argsort(kind='quick')
        c = b[-100:]
        d.append(c)

and

    from numpy import *
    d = []
    for i in range(1000):
        a = random.randn(512**2)
        b = a.argsort(kind='quick')
        c = b[-100:].copy()
        d.append(c)

The difference is that c is a reference to the last 100 elements of b in the first example, while c is a copy of the last 100 in the second example. Both examples yield identical results (provided randn is run with the same seed value). But the former chews up tons of memory, and the latter doesn't. I don't know if this explanation makes any sense, but it is as if python has to keep all the generated b's around in the first example because c is only a reference. Anyway, bottom line is that my problem is solved. Thanks, Emil
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 00:23, Charles R Harris [EMAIL PROTECTED] wrote: Hi All, numpy.linalg has qr and cholesky factorizations, but LU factorization is only available in scipy. That doesn't seem quite right. I think it would make sense to include the LU factorization in numpy among the basic linalg operations, and probably LU_solve also. Thoughts? -1. As far as I am concerned, numpy.linalg exists because Numeric had LinearAlgebra, and we had to provide it to allow people to upgrade. I do not want to see an expansion of functionality to maintain. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] memory usage (Emil Sidky)
When you slice an array, you keep the original array in memory until the slice is deleted. The slice uses the original array memory and is not a copy. The second example explicitly makes a copy. Perry On Oct 15, 2008, at 2:31 PM, emil wrote: [...]
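A small sketch of the view-versus-copy distinction Perry describes; the .base attribute shows whether an array still refers to another array's memory (illustrative code, not from the thread):

```python
import numpy as np

b = np.arange(512**2)

sliced = b[-100:]           # a view: shares b's memory, so b cannot be freed
print(sliced.base is b)     # True

copied = b[-100:].copy()    # an independent 100-element array
print(copied.base is None)  # True -- only the copy's own memory is kept
```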
Re: [Numpy-discussion] LU factorization?
2008/10/15 Robert Kern [EMAIL PROTECTED]: numpy.linalg has qr and cholesky factorizations, but LU factorization is only available in scipy. That doesn't seem quite right. I think it would make sense to include the LU factorization in numpy among the basic linalg operations, and probably LU_solve also. Thoughts? -1. As far as I am concerned, numpy.linalg exists because Numeric had LinearAlgebra, and we had to provide it to allow people to upgrade. I do not want to see an expansion of functionality to maintain. It's silly to have a crippled linear algebra module in NumPy. Either take it out, or finish it. NumPy without the linear algebra module would be much less useful to many of us. Regards Stefan
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 1:06 PM, Robert Kern [EMAIL PROTECTED] wrote: On Wed, Oct 15, 2008 at 00:23, Charles R Harris [EMAIL PROTECTED] wrote: Hi All, numpy.linalg has qr and cholesky factorizations, but LU factorization is only available in scipy. That doesn't seem quite right. I think it would make sense to include the LU factorization in numpy among the basic linalg operations, and probably LU_solve also. Thoughts? -1. As far as I am concerned, numpy.linalg exists because Numeric had LinearAlgebra, and we had to provide it to allow people to upgrade. I do not want to see an expansion of functionality to maintain. I would be happier with that argument if scipy were broken into separately downloadable modules and released on a regular schedule. As is, I think that exposing already existing (and used) functions in numpy's lapack_lite is unlikely to increase the maintenance burden and will add to the usefulness of barebones numpy out of the box. Chuck
Re: [Numpy-discussion] dtype comparison and hashing
On Wed, Oct 15, 2008 at 02:20, Geoffrey Irving [EMAIL PROTECTED] wrote: [...] I'm using the default Mac OS X 10.5 installation of Python 2.5 and numpy, so maybe those weren't compiled correctly. Has anyone else seen this issue? Actually, the problem is that we provide a hash function explicitly. In multiarraymodule.c:

    PyArrayDescr_Type.tp_hash = (hashfunc)_Py_HashPointer;

That is a violation of the hashing protocol (objects which compare equal and are hashable need to hash equal), and should be fixed. -- Robert Kern
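The protocol Robert refers to is the general Python rule that objects which compare equal must hash equal; a minimal non-numpy sketch of a class that satisfies it:

```python
# Illustration of Python's hashing contract (not numpy code): if __eq__ says
# two objects are equal, __hash__ must return the same value for both.
# Hashing by id() breaks this whenever distinct objects compare equal.
class Key:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        # depends only on what __eq__ compares, never on id(self)
        return hash(self.value)

assert Key(3) == Key(3)
assert hash(Key(3)) == hash(Key(3))
```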
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 14:43, Stéfan van der Walt [EMAIL PROTECTED] wrote: 2008/10/15 Robert Kern [EMAIL PROTECTED]: [...] -1. As far as I am concerned, numpy.linalg exists because Numeric had LinearAlgebra, and we had to provide it to allow people to upgrade. I do not want to see an expansion of functionality to maintain. It's silly to have a crippled linear algebra module in NumPy. Either take it out, or finish it. And what exactly would constitute finishing it? Considering that it has not had an LU decomposition since the days it was LinearAlgebra, without anyone stepping up to add it, I hardly consider it crippled without it. It has a clear purpose, replacing LinearAlgebra so that people could upgrade from Numeric. It's not crippled because it doesn't serve some other purpose. -- Robert Kern
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 14:49, Charles R Harris [EMAIL PROTECTED] wrote: [...] I would be happier with that argument if scipy were broken into separately downloadable modules and released on a regular schedule. Then that is the deficiency that we should spend time on, not duplicating the functionality again. -- Robert Kern
Re: [Numpy-discussion] Array printing differences between 64- and 32-bit platforms
On Wed, Oct 15, 2008 at 10:52 AM, Ken Basye [EMAIL PROTECTED] wrote: [...] I was wondering about introducing a new print setting that forced dtypes to be printed always. Is there any support for that? Any other ideas would also be most welcome. I'm inclined to say always print the type, but that is a behavior change and might break some current code. I'm not sure how we should handle small fixups like that. Chuck
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 2:04 PM, Robert Kern [EMAIL PROTECTED] wrote: [...] Then that is the deficiency that we should spend time on, not duplicating the functionality again. Should we break out the linear algebra part of scipy and make it a separate package? I suspect that would add to the build burden, because we would then have a new package to maintain and release binaries for. I don't see the problem with having a bit of linear algebra as part of the numpy base package. My own feeling is that numpy isn't the bare core of array functionality; rather, it is the elementary or student version with enough functionality to be useful, while scipy adds advanced features that commercial packages would charge extra for. To some extent this is also a matter of hierarchy, as numpy includes functions used by packages further up the food chain. Chuck
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 15:21, Charles R Harris [EMAIL PROTECTED] wrote: [...] I don't see the problem with having a bit of linear algebra as part of the numpy base package. Which bits? The current set has worked fine for more than 10 years. Where do we stop? There will always be someone who wants just one more function. And a case can always be made that adding just that function is reasonable. -- Robert Kern
Re: [Numpy-discussion] LU factorization?
On 10/15/2008 4:26 PM Robert Kern apparently wrote: Which bits? Those in lapack_lite? Alan Isaac
Re: [Numpy-discussion] LU factorization?
On Wed, Oct 15, 2008 at 15:33, Charles R Harris [EMAIL PROTECTED] wrote: [...] I would just add the bits that are already there and don't add any extra dependencies, i.e., they are there when numpy is built without ATLAS or other external packages. The determinant function in linalg uses the LU decomposition, so I don't see why that shouldn't be available to the general user. I'm softening to this argument. But mostly because it will be you who will have to defend this arbitrary line in the sand in the future rather than me. :-) -- Robert Kern
Re: [Numpy-discussion] LU factorization?
2008/10/15 Robert Kern [EMAIL PROTECTED]: Which bits? The current set has worked fine for more than 10 years. I'm surprised no one has requested the LU decomposition in NumPy before -- it is a fundamental building block in linear algebra. I think it is going too far to state that NumPy's linear algebra module serves simply as an upgrade path for those coming from Numeric. Its use has developed far beyond that. Where do we stop? There will always be someone who wants just one more function. And a case can always be made that adding just that function is reasonable. I'd rather we examine each function on a case-by-case basis than put a solid freeze on NumPy. Regards Stéfan
Re: [Numpy-discussion] LU factorization?
Charles R Harris wrote: I would just add the bits that are already there and don't add any extra dependencies, i.e., they are there when numpy is built without ATLAS or other external packages. The determinant function in linalg uses the LU decomposition, so I don't see why that shouldn't be available to the general user. If LU is already part of lapack_lite and somebody is willing to put in the work to expose the functionality to the end user in a reasonable way, then I think it should be added. -Travis
Re: [Numpy-discussion] LU factorization?
If LU is already part of lapack_lite and somebody is willing to put in the work to expose the functionality to the end user in a reasonable way, then I think it should be added. +1
Re: [Numpy-discussion] LU factorization?
OK, I take this as a go-ahead with the proviso that it's my problem. The big question is naming. SciPy has lu, lu_factor, lu_solve, cholesky, cho_factor, and cho_solve. The code for lu and lu_factor isn't the same, although they both look to call the same underlying function; the same is true of the cholesky code. I also see various functions with the same names as their numpy counterparts. So my inclination would be to use lu and lu_solve. Likewise, maybe add cho_solve to complement cholesky. I don't have strong preferences one way or the other. Thoughts? Chuck
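For comparison, this is roughly how the existing scipy.linalg pair is used; a numpy-level lu/lu_solve would presumably follow the same factor-once, solve-many pattern (the matrix and right-hand side below are made up):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

lu, piv = lu_factor(A)        # factor A once
x = lu_solve((lu, piv), b)    # reuse the factorization for each right-hand side
print(np.allclose(A @ x, b))  # True
```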