Re: [Numpy-discussion] a question about freeze on numpy 1.7.0
On Mon, Feb 25, 2013 at 3:53 PM, Bradley M. Froehle brad.froe...@gmail.com wrote:

> I can reproduce this with NumPy 1.7.0, but I'm not convinced the bug lies within NumPy. The exception is not being raised on the `del sys` line; rather, it is being raised in numpy.__init__:
>
>   File "/home/bfroehle/.local/lib/python2.7/site-packages/cx_Freeze/initscripts/Console.py", line 27, in <module>
>     exec code in m.__dict__
>   File "numpytest.py", line 1, in <module>
>     import numpy
>   File "/home/bfroehle/.local/lib/python2.7/site-packages/numpy/__init__.py", line 147, in <module>
>     from core import *
> AttributeError: 'module' object has no attribute 'sys'
>
> This is because, somehow, `'sys' in numpy.core.__all__` returns True in the cx_Freeze context but False in the regular Python context.
>
> -Brad

On Sun, Feb 24, 2013 at 10:49 PM, Gelin Yan dynami...@gmail.com wrote:

> On Mon, Feb 25, 2013 at 9:16 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
>
>> Hi Gelin,
>>
>> On Sun, Feb 24, 2013 at 12:08 AM, Gelin Yan dynami...@gmail.com wrote:
>>
>>> Hi All,
>>>
>>> When I used numpy 1.7.0 with cx_Freeze 4.3.1 on Windows, I quickly found that even a simple `import numpy` can make the program fail with the following exception:
>>>
>>> AttributeError: 'module' object has no attribute 'sys'
>>>
>>> After poking around the code I noticed that /numpy/core/__init__.py has a line `del sys` at the bottom. After I commented out this line and repacked the whole program, it ran fine. I also noticed this `del sys` did not exist in numpy 1.6.2. I am curious why this `del sys` is there and whether it is safe to omit it. Thanks.
>>
>> The `del sys` line was introduced in the commit:
>>
>> https://github.com/numpy/numpy/commit/4c0576fe9947ef2af8351405e0990cebd83ccbb6
>>
>> and it seems to me that it is needed so that the numpy.core namespace is not cluttered by it. Can you post the full stack trace of your program (and preferably some instructions on how to reproduce the problem)? It should become clear where the problem is.
>>
>> Thanks,
>> Ondrej
>
> Hi Ondrej,
>
> I attached two files here for demonstration. You need cx_Freeze to build a standalone executable: simply run `python setup.py build` and then run the resulting executable; you should see this exception. This example works with numpy 1.6.2. Thanks.
>
> Regards,
> gelin yan

Hi Bradley,

So is this supposed to be a bug in cx_Freeze? Is there any workaround other than omitting `del sys`? If the answer is no, I may consider submitting a ticket on the cx_Freeze site. Thanks.

Regards,
gelin yan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
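Brad's diagnosis — `'sys'` ending up in `__all__` even though `del sys` has removed the name — can be reproduced without cx_Freeze by simulating it on a toy module. This is a sketch only: the `core` module and its contents here are hypothetical stand-ins, not numpy's actual code.

```python
import types

# Toy module mimicking the relevant shape of numpy/core/__init__.py:
# import sys, derive __all__ from the namespace, then `del sys`.
# If __all__ is computed *before* the deletion (as apparently happened
# in the frozen environment), 'sys' sneaks into it.
core = types.ModuleType("core")
exec(
    "import sys\n"
    "value = 42\n"
    "__all__ = [n for n in dir() if not n.startswith('_')]\n"
    "del sys\n",
    core.__dict__,
)

# `from core import *` resolves every name listed in __all__,
# so a stale 'sys' entry blows up exactly like the reported traceback:
caught = None
try:
    ns = {name: getattr(core, name) for name in core.__all__}
except AttributeError as exc:
    caught = exc  # 'module' object has no attribute 'sys'

print('sys' in core.__all__)  # True: __all__ still lists the deleted name
```

This is why commenting out `del sys` "fixes" the crash: the name is restored, even though the real bug is the stale `__all__`.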
Re: [Numpy-discussion] a question about freeze on numpy 1.7.0
I submitted a bug report (and patch) to cx_Freeze. You can follow up with them at http://sourceforge.net/p/cx-freeze/bugs/36/.

-Brad

On Mon, Feb 25, 2013 at 12:06 AM, Gelin Yan dynami...@gmail.com wrote:

> Hi Bradley,
>
> So is this supposed to be a bug in cx_Freeze? Is there any workaround other than omitting `del sys`? If the answer is no, I may consider submitting a ticket on the cx_Freeze site. Thanks.
>
> Regards,
> gelin yan
[Numpy-discussion] Leaking memory problem
Hi!

I was wondering if anyone could help me find a memory leak problem with NumPy. My project is quite massive and I haven't been able to construct a simple example which reproduces the problem.

I have an iterative algorithm which should not increase the memory usage as the iteration progresses. However, after the first iteration, 1 GB of memory is used, and it steadily increases until at about 100-200 iterations 8 GB is used and the program exits with MemoryError.

I have a collection of objects which contain large arrays. In each iteration, the objects are updated in turn by re-computing the arrays they contain. The number of arrays and their sizes are constant (they do not change during the iteration). So the memory usage should not increase, and I'm a bit confused: how can the program run out of memory if it can easily compute at least a few iterations?

I've tried to use Pympler, but I've understood that it doesn't show the memory usage of NumPy arrays..? I also tried gc.set_debug(gc.DEBUG_UNCOLLECTABLE) and then printing gc.garbage at each iteration, but that doesn't show anything.

Does anyone have any ideas how to debug this kind of memory leak bug? And how to find out whether the bug is in my code, NumPy, or elsewhere?

Thanks for any help!
Jaakko
[Numpy-discussion] What should np.ndarray.__contains__ do
Hello all,

Currently the `__contains__` method, i.e. the `in` operator on arrays, does not return what the user would expect when, in the operation `a in b`, `a` is not a single element (see In [3]-[4] below).

The first solution coming to mind might be checking `all()` over all the dimensions given by argument `a` (see In [5] for a simplistic example). This does not play too well with broadcasting, however. One could maybe simply *not* broadcast at all (i.e. require a.shape == b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise. On the other hand, one could say that broadcasting `a` onto `b` should mean `any` along that dimension (see In [8]). The other way around should maybe raise an error, though (see In [9] to understand what I mean).

I think using the broadcasting dimensions, where `a` is repeated over `b`, as the dimensions to apply the `any` logic on is the most general way for numpy to handle this consistently, while the other way around could be handled with an `all`, but to me makes so little sense that I think it should be an error. Of course this is different from a list of lists, which gives False in these cases, but arrays are not lists of lists...

As a side note, since for loops etc. iterate as `for item in array`, I do not think that vectorizing along `a` as np.in1d does is reasonable. `in` should return a single boolean.

I have opened an issue for it:
https://github.com/numpy/numpy/issues/3016#issuecomment-14045545

Regards,
Sebastian

In [1]: a = np.array([0, 2])

In [2]: b = np.arange(10).reshape(5,2)

In [3]: b
Out[3]:
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]: a in b
Out[4]: True

In [5]: (b == a).any()
Out[5]: True

In [6]: (b == a).all(0).any()  # the 0 could be multiple axes
Out[6]: False

In [7]: a_2d = a[None,:]

In [8]: a_2d in b  # broadcast dimension means any - True
Out[8]: True

In [9]: [0, 1] in b[:,:1]  # should not work (or be False, not True)
Out[9]: True
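The "all over `a`'s axes, any over the rest" behaviour from In [5]-[6] can be written as a small helper. This is a sketch for discussion only — the name `row_in` is hypothetical, not a proposed API:

```python
import numpy as np

def row_in(a, b):
    """True iff `a` matches a full trailing-dimensions element of `b`
    (the In [6] semantics: all() over a's axes, any() over the rest).
    No broadcasting: a.shape must equal b's trailing shape."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape[b.ndim - a.ndim:]:
        return False
    trailing = tuple(range(b.ndim - a.ndim, b.ndim))
    return bool((b == a).all(axis=trailing).any())

b = np.arange(10).reshape(5, 2)
print(row_in([0, 1], b))  # True: matches row 0 as a whole
print(row_in([0, 2], b))  # False: 0 and 2 appear, but in different rows
```

For a scalar left-hand side, `trailing` is empty and this degrades to plain element membership, which matches the current behaviour for single elements.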
Re: [Numpy-discussion] Adding .abs() method to the array object
First, sorry that I didn't search for an old thread, but because I disagree with the conclusion I would at least like to state my reason:

> I don't like np.abs(arr).max() because I have to concentrate too much on the braces, especially if arr is a calculation

This exactly. Adding an abs() into an old expression is always a little annoyance due to the parentheses. The argument that np.abs() also works is true for (almost?) every other method. The fact that so many methods already exist, especially for most of the commonly used functions (min, max, dot, mean, std, argmin, argmax, conj, T), makes me miss abs. Of course, if one were to redesign the API, one would drop most methods (I am looking at you, ptp and byteswap). But the object is already cluttered, and adding abs is IMO a logical application of "practicality beats purity".

greetings
Till
Re: [Numpy-discussion] Adding .abs() method to the array object
On Sat, Feb 23, 2013 at 9:34 PM, Benjamin Root ben.r...@ou.edu wrote:

> My issue is having to remember which ones are methods and which ones are functions. There doesn't seem to be a rhyme or reason for the choices, and I would rather like to see that a line is drawn, but I am not picky as to where it is drawn.

I like that. I think it would be a good idea to find a good line for NumPy 2.0. As we will already be breaking the API, why not break this part of it at the same time? I don't have any idea what a good line would be... Does someone have a good idea? Do you agree that it would be a good idea for 2.0?

Fred
Re: [Numpy-discussion] Leaking memory problem
I added allocation tracking tools to numpy for exactly this reason. They are not very well documented, but you can see how to use them here:

https://github.com/numpy/numpy/tree/master/tools/allocation_tracking

Ray

On Mon, Feb 25, 2013 at 8:41 AM, Jaakko Luttinen jaakko.lutti...@aalto.fi wrote:

> Hi! I was wondering if anyone could help me in finding a memory leak problem with NumPy. My project is quite massive and I haven't been able to construct a simple example which reproduces the problem.
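For readers without the numpy tracking tools handy: the standard library's tracemalloc module (added in Python 3.4, so after this thread) does something similar — snapshot diffs point at the source lines accumulating memory. A rough sketch with a deliberately leaky loop standing in for the iterative algorithm:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

leak = []                            # stand-in for state that keeps growing
for _ in range(100):
    leak.append(bytearray(10_000))   # ~1 MB retained across "iterations"

after = tracemalloc.take_snapshot()
# Statistics are sorted biggest-first, so the top entry is the
# line responsible for the largest growth between the snapshots:
top = after.compare_to(before, 'lineno')[0]
print(top.size_diff)                 # bytes gained at that line
```

Taking a snapshot once per iteration and diffing against the previous one narrows a leak down to a single line even in a large codebase.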
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki mail.t...@gmx.de wrote:

> The fact that so many methods already exist, especially for most of the commonly used functions (min, max, dot, mean, std, argmin, argmax, conj, T), makes me miss abs. Of course, if one were to redesign the API, one would drop most methods (I am looking at you, ptp and byteswap). But the object is already cluttered, and adding abs is IMO a logical application of "practicality beats purity".

I tend to agree here. The situation isn't all that dire for the number of methods on an array. No scrolling at reasonably small terminal sizes.

[~/] [3]: x.<TAB>
x.T           x.copy       x.getfield      x.put           x.std
x.all         x.ctypes     x.imag          x.ravel         x.strides
x.any         x.cumprod    x.item          x.real          x.sum
x.argmax      x.cumsum     x.itemset       x.repeat        x.swapaxes
x.argmin      x.data       x.itemsize      x.reshape       x.take
x.argsort     x.diagonal   x.max           x.resize        x.tofile
x.astype      x.dot        x.mean          x.round         x.tolist
x.base        x.dtype      x.min           x.searchsorted  x.tostring
x.byteswap    x.dump       x.nbytes        x.setfield      x.trace
x.choose      x.dumps      x.ndim          x.setflags      x.transpose
x.clip        x.fill       x.newbyteorder  x.shape         x.var
x.compress    x.flags      x.nonzero       x.size          x.view
x.conj        x.flat       x.prod          x.sort
x.conjugate   x.flatten    x.ptp           x.squeeze

I find myself typing things like arr.abs() and arr.unique() quite often.

Skipper
Re: [Numpy-discussion] Leaking memory problem
On Mon, Feb 25, 2013 at 8:41 AM, Jaakko Luttinen jaakko.lutti...@aalto.fi wrote:

> Hi! I was wondering if anyone could help me in finding a memory leak problem with NumPy. My project is quite massive and I haven't been able to construct a simple example which reproduces the problem. I have an iterative algorithm which should not increase the memory usage as the iteration progresses. However, after the first iteration, 1 GB of memory is used and it steadily increases until at about 100-200 iterations 8 GB is used and the program exits with MemoryError.

There are some stories where Python's garbage collection is too slow to kick in. Try calling gc.collect() in the loop to see if it helps. Roughly what I remember: collection is triggered by object counts, so if you have a few very large arrays, memory increases but garbage collection doesn't start yet.

Josef
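Josef's point — the collector is triggered by allocation counts, not bytes, so a few huge objects caught in reference cycles can pile up between collections — can be demonstrated with plain Python objects standing in for the arrays. This is a sketch, not Jaakko's actual code:

```python
import gc

class Holder:
    def __init__(self, payload):
        self.payload = payload
        self.ref = self          # reference cycle: refcounting alone
                                 # can never free this object

gc.collect()                     # start from a clean slate
gc.disable()                     # simulate the collector not kicking in
for _ in range(50):
    Holder(bytearray(100_000))   # each pass strands ~100 kB in a cycle

freed = gc.collect()             # an explicit collect() reclaims them all
gc.enable()
print(freed)                     # number of unreachable objects found
```

If calling `gc.collect()` once per iteration flattens the memory curve, the "leak" is just deferred cycle collection; if memory still grows, something is genuinely holding references (or the leak is below the Python layer).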
Re: [Numpy-discussion] What should np.ndarray.__contains__ do
On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg sebast...@sipsolutions.net wrote:

> Hello all, currently the `__contains__` method, i.e. the `in` operator on arrays, does not return what the user would expect when, in the operation `a in b`, `a` is not a single element (see In [3]-[4] below).

True, I did not expect that!

> As a side note, since for loops etc. iterate as `for item in array`, I do not think that vectorizing along `a` as np.in1d does is reasonable. `in` should return a single boolean.

Python effectively calls bool() on the return value from __contains__, so reasonableness doesn't even come into it -- the only possible behaviours for `in` are to return True, False, or raise an exception.

I admit that I don't actually really understand any of this discussion of broadcasting. `in`'s semantics are "is this scalar in this container?" (and the scalarness is enforced by Python, as per above). So I think we should find some approach where the left argument is treated as a scalar.

The two approaches that I can see, and which generalize the behaviour of simple Python lists in natural ways, are:

a) the left argument is coerced to a scalar of the appropriate type, then we check if that value appears anywhere in the array (basically raveling the right argument).

b) for an array with shape (n1, n2, n3, ...), the left argument is treated as an array of shape (n2, n3, ...), and we check if that subarray (as a whole) appears anywhere in the array. Or in other words, 'A in B' is true iff there is some i such that np.array_equal(B[i], A).

Question 1: are there any other sensible options that aren't on this list?

Question 2: if not, then which should we choose? (Or we could choose both, I suppose, depending on what the left argument looks like.)

Between these two options, I like (a) and don't like (b). The pretending-to-be-a-list-of-lists special-case behaviour for multidimensional arrays is already weird and confusing, and besides, I'd expect equality comparison on arrays to use ==, not array_equal. So (b) feels pretty inconsistent with other numpy conventions to me.

-n
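The two options can be sketched as plain functions. The names `contains_a`/`contains_b` are hypothetical, for discussion only, not proposed APIs:

```python
import numpy as np

def contains_a(B, A):
    """Option (a): coerce A to a scalar, then look for that value
    anywhere in B (ravel semantics)."""
    scalar = np.asarray(A, dtype=B.dtype).item()  # raises if A isn't scalar-like
    return bool((B.ravel() == scalar).any())

def contains_b(B, A):
    """Option (b): True iff some B[i] equals A as a whole subarray,
    using np.array_equal rather than broadcasting."""
    A = np.asarray(A)
    return any(np.array_equal(B[i], A) for i in range(len(B)))

B = np.arange(10).reshape(5, 2)
print(contains_a(B, 3))        # True: 3 appears somewhere in B
print(contains_b(B, [2, 3]))   # True: [2, 3] is exactly row 1
print(contains_b(B, [0, 2]))   # False: elements present, but no row matches
```

Note that under today's broadcasting-based behaviour `[0, 2] in B` returns True, which is exactly the surprise that started the thread.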
Re: [Numpy-discussion] What should np.ndarray.__contains__ do
The problem with (b) is that it breaks down if the two arrays have the same dimensionality. I think a better approach would be, for `a in b` with `a` having n dimensions: return True if there is any subarray of `b` that matches `a` along the last n dimensions. So if `a` has 3 dimensions and `b` has 6, `a in b` is true iff there are any i, j, k, m, n, p such that:

    a == b[i, j, k, m:m+a.shape[0], n:n+a.shape[1], p:p+a.shape[2]]

This isn't a very clear way to describe it, but I think it is consistent with the concept of `a` being a subarray of `b`, even when they have the same dimensionality.

On Feb 25, 2013 5:34 PM, Nathaniel Smith n...@pobox.com wrote:

> The two approaches that I can see, and which generalize the behaviour of simple Python lists in natural ways, are:
>
> a) the left argument is coerced to a scalar of the appropriate type, then we check if that value appears anywhere in the array (basically raveling the right argument).
>
> b) for an array with shape (n1, n2, n3, ...), the left argument is treated as an array of shape (n2, n3, ...), and we check if that subarray (as a whole) appears anywhere in the array. Or in other words, 'A in B' is true iff there is some i such that np.array_equal(B[i], A).
>
> Between these two options, I like (a) and don't like (b).
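Todd's subarray matching, restricted to 1-D for clarity, is a sliding-window comparison. A sketch with a hypothetical helper name, not a proposed implementation:

```python
import numpy as np

def subarray_in(a, b):
    """1-D version of the proposal: True iff `a` occurs as a
    contiguous slice of `b` -- so [0, 3] is 'in' [1, 0, 3, 5]."""
    a, b = np.asarray(a), np.asarray(b)
    n = a.shape[0]
    # Slide a length-n window over b and compare each slice as a whole:
    return any(np.array_equal(b[i:i + n], a)
               for i in range(b.shape[0] - n + 1))

print(subarray_in([0, 3], [1, 0, 3, 5]))  # True: contiguous match
print(subarray_in([3, 0], [1, 0, 3, 5]))  # False: wrong order
```

The n-dimensional version would slide the window over every axis not matched exactly, which is why it subsumes option (b) (where the window only slides over the first axis) and still works when `a.ndim == b.ndim`.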
Re: [Numpy-discussion] What should np.ndarray.__contains__ do
On Mon, 2013-02-25 at 16:33, Nathaniel Smith wrote:

> The two approaches that I can see, and which generalize the behaviour of simple Python lists in natural ways, are:
>
> a) the left argument is coerced to a scalar of the appropriate type, then we check if that value appears anywhere in the array (basically raveling the right argument).
>
> b) for an array with shape (n1, n2, n3, ...), the left argument is treated as an array of shape (n2, n3, ...), and we check if that subarray (as a whole) appears anywhere in the array. Or in other words, 'A in B' is true iff there is some i such that np.array_equal(B[i], A).
>
> Between these two options, I like (a) and don't like (b). The pretending-to-be-a-list-of-lists special-case behaviour for multidimensional arrays is already weird and confusing, and besides, I'd expect equality comparison on arrays to use ==, not array_equal. So (b) feels pretty inconsistent with other numpy conventions to me.

I agree with rejecting (b). (a) seems a good way to think about the problem, and I don't see other sensible options. The question is: say you have arrays b = [[0, 1], [2, 3]] and a = [[0, 1]]. Since they are both 2-d, should b be interpreted as containing two 2-d elements? Another way of seeing this would be to ignore one-sized dimensions in `a` for the sake of defining its "element". This would allow:

In [1]: b = np.arange(10).reshape(5,2)

In [2]: b
Out[2]:
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [3]: a = np.array([[0, 1]])  # extra dimensions at the start

In [4]: a in b
Out[4]: True

# But it would also allow transpose, since now the last axis is a dummy:

In [5]: a.T in b.T
Out[5]: True

Those two examples could also be a shape-mismatch error. I tend to think they are reasonable enough to work, but then the user could just reshape/transpose to achieve the same. I also wondered about b having e.g. b.shape = (5,1) with a.shape = (1,2) being sensible enough not to be an error, but this "element" thinking is a good reason for rejecting it IMO.

Maybe this is clearer,
Sebastian
Re: [Numpy-discussion] What should np.ndarray.__contains__ do
On Mon, 2013-02-25 at 18:01 +0100, Todd wrote: The problem with b is that it breaks down if the two status have the same dimensionality. I think a better approach would be for a in b With a having n dimensions, it returns true if there is any subarray of b that matches a along the last n dimensions. So if a has 3 dimensions and b has 6, a in b is true iff there is any i, j, k, m, n, p such that a=b[i, j, k, m:m+a.shape[0], n:n+a.shape[1], p:p+a.shape[2]] ] This isn't a very clear way to describe it, but I think it is consistent with the concept of a being a subarray of b even when they have the same dimensionality. Oh, great point. Guess this is the most general way, I completely missed this option. Allows [0, 3] in [1, 0, 3, 5] to be true. I am not sure if this kind of matching should be part of the in operator or not, though on the other hand it would only do something reasonable when otherwise an error would be thrown and it definitely is useful and compatible to what anyone else might expect. On Feb 25, 2013 5:34 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg sebast...@sipsolutions.net wrote: Hello all, currently the `__contains__` method or the `in` operator on arrays, does not return what the user would expect when in the operation `a in b` the `a` is not a single element (see In [3]-[4] below). True, I did not expect that! The first solution coming to mind might be checking `all()` for all dimensions given in argument `a` (see line In [5] for a simplistic example). This does not play too well with broadcasting however, but one could maybe simply *not* broadcast at all (i.e. a.shape == b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise. On the other hand one could say broadcasting of `a` onto `b` should be any along that dimension (see In [8]). The other way should maybe raise an error though (see In [9] to understand what I mean). 
I think using broadcasting dimensions where `a` is repeated over `b` as the dimensions to use any logic on is the most general way for numpy to handle this consistently, while the other way around could be handled with an `all` but to me makes so little sense that I think it should be an error. Of course this is different to a list of lists, which gives False in these cases, but arrays are not list of lists... As a side note, since for loop, etc. use for item in array, I do not think that vectorizing along `a` as np.in1d does is reasonable. `in` should return a single boolean. Python effectively calls bool() on the return value from __contains__, so reasonableness doesn't even come into it -- the only possible behaviours for `in` are to return True, False, or raise an exception. I admit that I don't actually really understand any of this discussion of broadcasting. in's semantics are, is this scalar in this container? (And the scalarness is enforced by Python, as per above.) So I think we should find some approach where the left argument is treated as a scalar. The two approaches that I can see, and which generalize the behaviour of simple Python lists in natural ways, are: a) the left argument is coerced to a scalar of the appropriate type, then we check if that value appears anywhere in the array (basically raveling the right argument). b) for an array with shape (n1, n2, n3, ...), the left argument is treated as an array of shape (n2, n3, ...), and we check if that subarray (as a whole) appears anywhere in the array. Or in other words, 'A in B' is true iff there is some i such that np.array_equals(B[i], A). Question 1: are there any other sensible options that aren't on this list? Question 2: if not, then which should we choose? (Or we could choose both, I suppose, depending on what the left argument looks like.) Between these two options, I like (a) and don't like (b). 
The pretending-to-be-a-list-of-lists special-case behaviour for multidimensional arrays is already weird and confusing, and besides, I'd expect equality comparison on arrays to use ==, not np.array_equal.
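Nathaniel's two options can be written out as small helpers (hypothetical names; neither is current ndarray behaviour):

```python
import numpy as np

def contains_a(x, b):
    # Option (a): treat the left operand as a scalar and search the
    # raveled right operand for it.
    return bool(np.any(b == x))

def contains_b(a, b):
    # Option (b): 'A in B' iff there is some i such that
    # np.array_equal(B[i], A).
    a = np.asarray(a)
    return any(np.array_equal(b[i], a) for i in range(b.shape[0]))

b = np.array([[1, 0, 3], [2, 2, 2]])
print(contains_a(3, b))           # True: 3 occurs somewhere in b
print(contains_b([1, 0, 3], b))   # True: equal to b[0] as a whole
print(contains_b([0, 1, 3], b))   # False: no row matches as a whole
```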
Re: [Numpy-discussion] Adding .abs() method to the array object
On Sat, Feb 23, 2013 at 1:33 PM, Robert Kern robert.k...@gmail.com wrote: On Sat, Feb 23, 2013 at 7:25 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Feb 23, 2013 at 3:38 PM, Till Stensitzki mail.t...@gmx.de wrote: Hello, I know that the array object is already crowded, but I would like to see the abs method added, especially for doing work on the console. Considering that many much less used functions are also implemented as methods, I don't think adding one more would be problematic.

My gut feeling is that we have too many methods on ndarray, not too few, but in any case, can you elaborate? What's the rationale for why np.abs(a) is so much harder than a.abs(), and why this function and not other unary functions?

Or even abs(a).

Well, that just calls a method: In [1]: ones(3).__abs__() Out[1]: array([ 1., 1., 1.]) Which shows the advantage of methods: they provide universal function hooks. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, Feb 25, 2013 at 7:11 PM, Charles R Harris charlesr.har...@gmail.com wrote: [...] Well, that just calls a method: In [1]: ones(3).__abs__() Out[1]: array([ 1., 1., 1.]) Which shows the advantage of methods: they provide universal function hooks.

Maybe we should start to advertise magic methods. I only recently discovered I can use divmod instead of the numpy functions: divmod(np.array([1.4]), 1) (array([ 1.]), array([ 0.4])) np.array([1.4]).__divmod__(1) (array([ 1.]), array([ 0.4])) Josef
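Josef's divmod example, written out (same values as in the message):

```python
import numpy as np

# The builtin divmod() dispatches to ndarray.__divmod__,
# so both spellings give the same (quotient, remainder) pair.
q1, r1 = divmod(np.array([1.4]), 1)
q2, r2 = np.array([1.4]).__divmod__(1)

print(q1, r1)   # [ 1.] [ 0.4]
print(q2, r2)   # identical to the builtin form
```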
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, Feb 25, 2013 at 7:49 PM, josef.p...@gmail.com wrote: [...] Maybe we should start to advertise magic methods. I only recently discovered I can use divmod instead of the numpy functions: divmod(np.array([1.4]), 1) (array([ 1.]), array([ 0.4])) np.array([1.4]).__divmod__(1) (array([ 1.]), array([ 0.4]))

Thanks for the hint. My new favorite :) (freq - nobs * probs).__abs__().max() 132.0 Josef
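A runnable version of that pattern with made-up numbers (the 132.0 above comes from Josef's own data, which isn't shown):

```python
import numpy as np

# Hypothetical observed frequencies vs. expected counts.
freq = np.array([10., 20., 30.])
probs = np.array([0.2, 0.3, 0.5])
nobs = 60

# Largest absolute residual, using __abs__ to avoid an extra
# layer of parentheses around the expression.
max_abs_resid = (freq - nobs * probs).__abs__().max()
print(max_abs_resid)   # 2.0, identical to np.abs(freq - nobs * probs).max()
```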
Re: [Numpy-discussion] drawing the line (was: Adding .abs() method to the array object)
I'm hoping this discussion will return to the drawing the line question. http://stackoverflow.com/questions/8108688/in-python-when-should-i-use-a-function-instead-of-a-method Alan Isaac
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, 2013-02-25 at 10:50 -0500, Skipper Seabold wrote: On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki mail.t...@gmx.de wrote: First, sorry that I didn't search for an old thread, but because I disagree with the conclusion I would at least state my reason: I don't like np.abs(arr).max() because I have to concentrate too much on the braces, especially if arr is a calculation.

This exactly; adding an abs into an old expression is always a little annoyance due to the parentheses. The argument that np.abs() also works is true for (almost?) every other method. The fact that so many methods already exist, especially for most of the commonly used functions (min, max, dot, mean, std, argmin, argmax, conj, T), makes me miss abs. Of course, if one would redesign the API, one would drop most methods (I am looking at you, ptp and byteswap). But the object is already cluttered, and adding abs is imo a logical application of practicality beats purity.

I tend to agree here. The situation isn't all that dire for the number of methods on an array. No scrolling at reasonably small terminal sizes. [~/] [3]: x. x.T x.copy x.getfield x.put x.std x.all x.ctypes x.imag x.ravel x.strides x.any x.cumprod x.item x.real x.sum x.argmax x.cumsum x.itemset x.repeat x.swapaxes x.argmin x.data x.itemsize x.reshape x.take x.argsort x.diagonal x.max x.resize x.tofile x.astype x.dot x.mean x.round x.tolist x.base x.dtype x.min x.searchsorted x.tostring x.byteswap x.dump x.nbytes x.setfield x.trace x.choose x.dumps x.ndim x.setflags x.transpose x.clip x.fill x.newbyteorder x.shape x.var x.compress x.flags x.nonzero x.size x.view x.conj x.flat x.prod x.sort x.conjugate x.flatten x.ptp x.squeeze

Two small things (not sure if it matters much). But first, almost all of these methods are related to the container and not the elements. Second, actually using a method arr.abs() has a tiny pitfall, since abs would work on numpy types, but not on Python types.
This means that: np.array([1, 2, 3]).max().abs() works, but np.array([1, 2, 3], dtype=object).max().abs() breaks. Python has a safe name for abs already...

I find myself typing things like arr.abs() and arr.unique() quite often. Skipper
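Since `.abs()` does not exist yet, Sebastian's asymmetry can be demonstrated with an existing scalar method such as `.conj` as a stand-in: `max()` on a regular array returns a numpy scalar carrying the ndarray-style methods, while on an object array it returns a plain Python int, which has none of them.

```python
import numpy as np

m = np.array([1, 2, 3]).max()                  # numpy scalar (np.int64)
o = np.array([1, 2, 3], dtype=object).max()    # plain Python int

print(hasattr(m, 'conj'))   # True
print(hasattr(o, 'conj'))   # False -- a hypothetical .abs() would split the same way
print(abs(m), abs(o))       # the builtin abs() works uniformly on both
```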
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, Feb 25, 2013 at 9:20 PM, Sebastian Berg sebast...@sipsolutions.net wrote: [...] Two small things (not sure if it matters much). But first almost all of these methods are related to the container and not the elements.
Second, actually using a method arr.abs() has a tiny pitfall, since abs would work on numpy types, but not on python types. This means that: np.array([1, 2, 3]).max().abs() works, but np.array([1, 2, 3], dtype=object).max().abs() breaks. Python has a safe name for abs already...

>>> (np.array([1, 2, 3], dtype=object)).max()
3
>>> (np.array([1, 2, 3], dtype=object)).__abs__().max()
3
>>> (np.array([1, 2, '3'], dtype=object)).__abs__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'
>>> map(abs, [1, 2, 3])
[1, 2, 3]
>>> map(abs, [1, 2, '3'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'

I don't see a difference. (I don't expect to use max abs on anything other than numbers.) Josef

I find myself typing things like arr.abs() and arr.unique() quite often. Skipper
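Josef's point in runnable form: `ndarray.__abs__` (which the builtin `abs(arr)` calls) works elementwise even for object dtype, as long as each element itself supports abs():

```python
import numpy as np

a = np.array([1, -2, 3], dtype=object)
print(abs(a))                  # builtin abs() goes through ndarray.__abs__
print(a.__abs__().max())       # 3, same thing called explicitly

try:
    abs(np.array([1, 2, '3'], dtype=object))
except TypeError as exc:
    print(exc)                 # strings have no __abs__, so this fails
```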
Re: [Numpy-discussion] Adding .abs() method to the array object
On Mon, Feb 25, 2013 at 9:58 PM, josef.p...@gmail.com wrote: On Mon, Feb 25, 2013 at 9:20 PM, Sebastian Berg sebast...@sipsolutions.net wrote: [...] Two small things (not sure if it matters much). But first almost all of these methods are related to the container and not the elements.
[...] or maybe more useful:

>>> from decimal import Decimal
>>> d = [Decimal(str(k)) for k in np.linspace(-1, 1, 5)]
>>> map(abs, d)
[Decimal('1.0'), Decimal('0.5'), Decimal('0.0'), Decimal('0.5'), Decimal('1.0')]
>>> np.asarray(d).__abs__()
array([1.0, 0.5, 0.0, 0.5, 1.0], dtype=object)
>>> np.asarray(d).__abs__()[0]
Decimal('1.0')

Josef
Re: [Numpy-discussion] Leaking memory problem
Is this with 1.7? There were a few memory leak fixes in 1.7, so if you aren't using that you should try it to be sure. And if you are using it, then there is one known memory leak bug in 1.7 that you might want to check whether you're hitting: https://github.com/numpy/numpy/issues/2969 -n

On 25 Feb 2013 13:41, Jaakko Luttinen jaakko.lutti...@aalto.fi wrote: Hi! I was wondering if anyone could help me find a memory leak problem with NumPy. My project is quite massive and I haven't been able to construct a simple example which would reproduce the problem. I have an iterative algorithm which should not increase the memory usage as the iteration progresses. However, after the first iteration, 1GB of memory is used, and it steadily increases until at about 100-200 iterations 8GB is used and the program exits with MemoryError.

I have a collection of objects which contain large arrays. In each iteration, the objects are updated in turns by re-computing the arrays they contain. The number of arrays and their sizes are constant (they do not change during the iteration). So the memory usage should not increase, and I'm a bit confused: how can the program run out of memory if it can easily compute at least a few iterations? I've tried to use Pympler, but I've understood that it doesn't show the memory usage of NumPy arrays..? I also tried gc.set_debug(gc.DEBUG_UNCOLLECTABLE) and then printing gc.garbage at each iteration, but that doesn't show anything. Does anyone have any ideas how to debug this kind of memory leak bug? And how to find out whether the bug is in my code, NumPy or elsewhere? Thanks for any help! Jaakko
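One low-tech way to narrow down a leak like this (a sketch only; Unix-specific via the stdlib `resource` module, and the array sizes here are made up) is to log the process's peak RSS once per iteration. If it keeps climbing while the working set should be constant, some reference is being retained:

```python
import resource
import numpy as np

def peak_rss():
    # Peak resident set size of this process;
    # kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

leaked = []                        # stand-in for an accidental reference
baseline = peak_rss()
for i in range(5):
    arr = np.zeros((512, 512))     # ~2 MB recomputed each iteration
    leaked.append(arr)             # comment this out and the growth stops
    print(i, peak_rss() - baseline)
```

Bisecting by disabling one update step at a time while watching this number usually points at which object is holding on to old arrays.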