Re: [Numpy-discussion] Boolean arrays
> Ideally, I would like in1d to always be the right answer to this problem. It should be easy to put in an if statement to switch to a kern_in()-type function in the case of large ar1 but small ar2. I will do some timing tests and make a patch.

I uploaded a timing test and a patch to arraysetops.py here:

http://projects.scipy.org/numpy/ticket/1603

The new in1d() uses the kern_in algorithm when it's faster, and the existing algorithm otherwise. The speedup compared to the old in1d() for cases with very large ar1 and small ar2 can be up to 10x on my laptop.

If someone with commit access could take a look and apply it if ok, that would be great.

Thanks,
Neil
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
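The dispatch idea in this message can be sketched roughly as follows. This is not the code from the ticket: the name `in1d_sketch` and the cutoff `small_ar2` are hypothetical, chosen only for illustration, and `np.isin` stands in for the existing general-purpose algorithm.

```python
import numpy as np

def in1d_sketch(ar1, ar2, small_ar2=10):
    """Return a boolean mask that is True where an element of ar1 is in ar2.

    small_ar2 is a made-up cutoff; a real patch would pick it from timings.
    """
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.asarray(ar2).ravel()
    if len(ar2) < small_ar2:
        # kern_in-style loop: one vectorized comparison per valid value.
        mask = np.zeros(len(ar1), dtype=bool)
        for value in ar2:
            mask |= (ar1 == value)
        return mask
    # Otherwise fall back to NumPy's general membership test.
    return np.isin(ar1, ar2)
```

Both branches return a mask of the same shape, so callers do not need to know which one ran.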
Re: [Numpy-discussion] Boolean arrays
Nathaniel Smith njs at pobox.com writes:
> On Fri, Aug 27, 2010 at 1:35 PM, Robert Kern robert.kern at gmail.com wrote:
> > As valid gets larger, in1d() will catch up but for smallish sizes of valid, which I suspect given the non-numeric nature of the OP's (Hi, Brett!) request, kern_in() is usually better.
>
> Oh well, I was just guessing based on algorithmic properties. Sounds like there might be some optimizations possible to in1d then, if anyone had a reason to care :-).

Ideally, I would like in1d to always be the right answer to this problem. It should be easy to put in an if statement to switch to a kern_in()-type function in the case of large ar1 but small ar2. I will do some timing tests and make a patch.

Incidentally, the timing tests done when in1d was introduced only considered the case when len(ar1) == len(ar2). In this case the current in1d is pretty much always faster than kern_in().

Neil
Re: [Numpy-discussion] Boolean arrays
2010/8/27 Brett Olsen brett.ol...@gmail.com:
> If there's multiple possible valid values, I've come up with a couple possible methods, but they all seem to be inefficient or kludges:
>
>     >>> valid = N.array(('a', 'c'))
>     >>> (ar == valid[0]) | (ar == valid[1])
>     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
>     >>> N.array(map(lambda x: x in valid, ar))
>     array([ True, False, True, False, False, True, False, True, True], dtype=bool)

    (ar[..., numpy.newaxis] == valid).T.sum(axis=0).T > 0

should also do the job. But it eats up memory. (It employs broadcasting.)

Friedrich
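A minimal sketch of the broadcasting approach, using the sample data from the original question. It uses `any(axis=1)` instead of the transpose-and-sum form, which is a substitution made here for readability; the variable name `table` is only illustrative.

```python
import numpy as np

ar = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'])
valid = np.array(['a', 'c'])

# ar[:, np.newaxis] has shape (9, 1); comparing it with valid (shape (2,))
# broadcasts to a (9, 2) boolean table, one column per valid value.
table = ar[:, np.newaxis] == valid

# A row with at least one True means the element matches some valid value.
mask = table.any(axis=1)
print(mask)
```

The intermediate table holds len(ar) * len(valid) booleans, which is the memory cost mentioned above.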
Re: [Numpy-discussion] Boolean arrays
2010/8/27, Robert Kern robert.k...@gmail.com:
> [~] |2 def kern_in(x, valid):
> ..     mask = np.zeros(x.shape, dtype=bool)
> ..     for good in valid:
> ..         mask |= (x == good)
> ..     return mask
> ..
> [~] |6 ar = np.random.randint(100, size=100)
> [~] |7 valid = np.arange(0, 100, 5)
> [~] |8 %timeit kern_in(ar, valid)
> 10 loops, best of 3: 115 ms per loop
> [~] |9 %timeit np.in1d(ar, valid)
> 1 loops, best of 3: 279 ms per loop

Another possibility is to use numexpr. On a machine with 2 x E5520 quad-core processors (i.e. a total of 8 physical cores and, with hyperthreading, 16 logical cores):

In [1]: import numpy as np

In [2]: def kern_in(x, valid):
   ...:     mask = np.zeros(x.shape, dtype=bool)
   ...:     for good in valid:
   ...:         mask |= (x == good)
   ...:     return mask
   ...:

In [3]: ar = np.random.randint(100, size=1000)

In [4]: valid = np.arange(0, 100, 5)

In [5]: timeit kern_in(ar, valid)
1 loops, best of 3: 1.21 s per loop

In [6]: sexpr = "|".join([ "(ar == %d)" % v for v in valid ])

In [7]: sexpr  # (ar == 0) | (ar == 1) == (0,1) in ar
Out[7]: '(ar == 0)|(ar == 5)|(ar == 10)|(ar == 15)|(ar == 20)|(ar == 25)|(ar == 30)|(ar == 35)|(ar == 40)|(ar == 45)|(ar == 50)|(ar == 55)|(ar == 60)|(ar == 65)|(ar == 70)|(ar == 75)|(ar == 80)|(ar == 85)|(ar == 90)|(ar == 95)'

In [9]: import numexpr as nx

In [10]: timeit nx.evaluate(sexpr)
10 loops, best of 3: 71.9 ms per loop

That's a speed-up of almost 17x with respect to the kern_in() function, but not all of it is due to the use of the full 16 threads. Using only one thread gives:

In [11]: nx.set_num_threads(1)

In [12]: timeit nx.evaluate(sexpr)
1 loops, best of 3: 586 ms per loop

which is about 2x faster than kern_in() on this machine. It is not always possible to use numexpr, but in this case it seems to work pretty well.

--
Francesc Alted
[Numpy-discussion] Boolean arrays
Hello,

I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:

    >>> import numpy as N
    >>> ar = N.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
    >>> ar == 'a'
    array([ True, False, False, False, False, True, False, False, True], dtype=bool)

If there are multiple possible valid values, I've come up with a couple of possible methods, but they all seem to be inefficient or kludges:

    >>> valid = N.array(('a', 'c'))
    >>> (ar == valid[0]) | (ar == valid[1])
    array([ True, False, True, False, False, True, False, True, True], dtype=bool)
    >>> N.array(map(lambda x: x in valid, ar))
    array([ True, False, True, False, False, True, False, True, True], dtype=bool)

Is there a numpy-appropriate way to do this?

Thanks,
Brett Olsen
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen brett.ol...@gmail.com wrote:
> Hello,
>
> I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:
>
>     >>> import numpy as N
>     >>> ar = N.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
>     >>> ar == 'a'
>     array([ True, False, False, False, False, True, False, False, True], dtype=bool)
>
> If there's multiple possible valid values, I've come up with a couple possible methods, but they all seem to be inefficient or kludges:
>
>     >>> valid = N.array(('a', 'c'))
>     >>> (ar == valid[0]) | (ar == valid[1])
>     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
>     >>> N.array(map(lambda x: x in valid, ar))
>     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
>
> Is there a numpy-appropriate way to do this?
>
> Thanks,
> Brett Olsen

amap: Like Map, but for arrays.

    >>> ar = numpy.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
    >>> valid = ('a', 'c')
    >>> numpy.amap(lambda x: x in valid, ar)
    array([ True, False, True, False, False, True, False, True, True], dtype=bool)
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 15:10, Ken Watford kwatford+sc...@gmail.com wrote:
> On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen brett.ol...@gmail.com wrote:
> > Hello,
> >
> > I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:
> >
> >     >>> import numpy as N
> >     >>> ar = N.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
> >     >>> ar == 'a'
> >     array([ True, False, False, False, False, True, False, False, True], dtype=bool)
> >
> > If there's multiple possible valid values, I've come up with a couple possible methods, but they all seem to be inefficient or kludges:
> >
> >     >>> valid = N.array(('a', 'c'))
> >     >>> (ar == valid[0]) | (ar == valid[1])
> >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> >     >>> N.array(map(lambda x: x in valid, ar))
> >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> >
> > Is there a numpy-appropriate way to do this?
> >
> > Thanks,
> > Brett Olsen
>
> amap: Like Map, but for arrays.
>
>     >>> ar = numpy.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
>     >>> valid = ('a', 'c')
>     >>> numpy.amap(lambda x: x in valid, ar)
>     array([ True, False, True, False, False, True, False, True, True], dtype=bool)

I'm not sure what version of numpy this would be in; I've never seen it.

But in any case, that would be very slow for large arrays since it would invoke a Python function call for every value in ar. Instead, iterate over the valid array, which is much shorter:

    mask = np.zeros(ar.shape, dtype=bool)
    for good in valid:
        mask |= (ar == good)

Wrap that up into a function and you're good to go. That's about as efficient as it gets unless the valid array gets large.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
  -- Umberto Eco
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern robert.k...@gmail.com wrote:
> But in any case, that would be very slow for large arrays since it would invoke a Python function call for every value in ar. Instead, iterate over the valid array, which is much shorter:
>
>     mask = np.zeros(ar.shape, dtype=bool)
>     for good in valid:
>         mask |= (ar == good)
>
> Wrap that up into a function and you're good to go. That's about as efficient as it gets unless the valid array gets large.

Probably even more efficient if 'ar' is large and 'valid' is small, and shorter to boot:

    np.in1d(ar, valid)

-- Nathaniel
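A short sketch of the built-in membership test on the data from the original question. The thread uses `np.in1d`; this example assumes a NumPy release recent enough to provide `np.isin`, which serves the same purpose and is the spelling newer releases recommend.

```python
import numpy as np

ar = np.array(['a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'])
valid = ['a', 'c']

# True wherever the element of ar occurs anywhere in valid.
mask = np.isin(ar, valid)

# The mask can be used directly to pick out the valid entries.
selected = ar[mask]
print(mask)
print(selected)
```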
Re: [Numpy-discussion] Boolean arrays
On 27 August 2010 16:17, Robert Kern robert.k...@gmail.com wrote:
> On Fri, Aug 27, 2010 at 15:10, Ken Watford kwatford+sc...@gmail.com wrote:
> > On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen brett.ol...@gmail.com wrote:
> > > Hello,
> > >
> > > I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:
> > >
> > >     >>> import numpy as N
> > >     >>> ar = N.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
> > >     >>> ar == 'a'
> > >     array([ True, False, False, False, False, True, False, False, True], dtype=bool)
> > >
> > > If there's multiple possible valid values, I've come up with a couple possible methods, but they all seem to be inefficient or kludges:
> > >
> > >     >>> valid = N.array(('a', 'c'))
> > >     >>> (ar == valid[0]) | (ar == valid[1])
> > >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> > >     >>> N.array(map(lambda x: x in valid, ar))
> > >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> > >
> > > Is there a numpy-appropriate way to do this?
> > >
> > > Thanks,
> > > Brett Olsen
> >
> > amap: Like Map, but for arrays.
> >
> >     >>> ar = numpy.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
> >     >>> valid = ('a', 'c')
> >     >>> numpy.amap(lambda x: x in valid, ar)
> >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
>
> I'm not sure what version of numpy this would be in; I've never seen it.
>
> But in any case, that would be very slow for large arrays since it would invoke a Python function call for every value in ar. Instead, iterate over the valid array, which is much shorter:
>
>     mask = np.zeros(ar.shape, dtype=bool)
>     for good in valid:
>         mask |= (ar == good)
>
> Wrap that up into a function and you're good to go. That's about as efficient as it gets unless the valid array gets large.

The problem here is really one of how you specify which values are valid. If your only specification is a Python function, then you're stuck calling that Python function once for each possible value; there is no way around it.

But it could happen that you have an array of possible values and a corresponding boolean array that says whether they're valid or not. Then there's a shortcut that's probably faster than OR-ing as Robert suggests:

In [3]: A = np.array([1,2,6,4,4,2,1,7,8,2,2,1])

In [4]: B = np.unique1d(A)

In [5]: B
Out[5]: array([1, 2, 4, 6, 7, 8])

Here C (defined next) specifies which of the distinct values are valid. C could be computed using some sort of validity function (which it may be possible to vectorize). In any case it covers only the distinct values, and they're sorted (so you can use ranges).

In [6]: C = np.array([True,True,True,False,False,True])

Now to compute the validity of A:

In [10]: C[np.searchsorted(B,A)]
Out[10]: array([ True, True, False, True, True, True, True, False, True, True, True, True], dtype=bool)

Anne
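The lookup-table idea above can be sketched with current NumPy as follows. `np.unique` is used in place of `np.unique1d`, and the `return_inverse` variant is an addition made here, not part of the original message; it produces the same indices that `searchsorted` finds.

```python
import numpy as np

A = np.array([1, 2, 6, 4, 4, 2, 1, 7, 8, 2, 2, 1])

# Sorted distinct values of A.
B = np.unique(A)

# Validity flag for each distinct value, in the same order as B
# (same flags as in the message above).
C = np.array([True, True, True, False, False, True])

# Each element of A is located in B, and its flag is looked up in C.
mask = C[np.searchsorted(B, A)]

# Equivalent route: np.unique can return the positions directly.
B2, inverse = np.unique(A, return_inverse=True)
mask2 = C[inverse]
print(mask)
```

The validity check then runs once per distinct value rather than once per element of A.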
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 4:17 PM, Robert Kern robert.k...@gmail.com wrote:
> On Fri, Aug 27, 2010 at 15:10, Ken Watford kwatford+sc...@gmail.com wrote:
> > On Fri, Aug 27, 2010 at 3:58 PM, Brett Olsen brett.ol...@gmail.com wrote:
> > > Hello,
> > >
> > > I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:
> > >
> > >     >>> import numpy as N
> > >     >>> ar = N.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
> > >     >>> ar == 'a'
> > >     array([ True, False, False, False, False, True, False, False, True], dtype=bool)
> > >
> > > If there's multiple possible valid values, I've come up with a couple possible methods, but they all seem to be inefficient or kludges:
> > >
> > >     >>> valid = N.array(('a', 'c'))
> > >     >>> (ar == valid[0]) | (ar == valid[1])
> > >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> > >     >>> N.array(map(lambda x: x in valid, ar))
> > >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
> > >
> > > Is there a numpy-appropriate way to do this?
> > >
> > > Thanks,
> > > Brett Olsen
> >
> > amap: Like Map, but for arrays.
> >
> >     >>> ar = numpy.array(('a', 'b', 'c', 'b', 'b', 'a', 'd', 'c', 'a'))
> >     >>> valid = ('a', 'c')
> >     >>> numpy.amap(lambda x: x in valid, ar)
> >     array([ True, False, True, False, False, True, False, True, True], dtype=bool)
>
> I'm not sure what version of numpy this would be in; I've never seen it.

Ah, my fault. I started ipython in pylab mode, expected to find something like amap, found it, and assumed it was in numpy. It's actually in matplotlib.mlab, strangely enough.
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith n...@pobox.com wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern robert.k...@gmail.com wrote:
> > But in any case, that would be very slow for large arrays since it would invoke a Python function call for every value in ar. Instead, iterate over the valid array, which is much shorter:
> >
> >     mask = np.zeros(ar.shape, dtype=bool)
> >     for good in valid:
> >         mask |= (ar == good)
> >
> > Wrap that up into a function and you're good to go. That's about as efficient as it gets unless the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small, and shorter to boot:
>
>     np.in1d(ar, valid)

Not according to my timings:

[~] |2 def kern_in(x, valid):
..     mask = np.zeros(x.shape, dtype=bool)
..     for good in valid:
..         mask |= (x == good)
..     return mask
..
[~] |6 ar = np.random.randint(100, size=100)
[~] |7 valid = np.arange(0, 100, 5)
[~] |8 %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop
[~] |9 %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop

As valid gets larger, in1d() will catch up, but for smallish sizes of valid, which I suspect is the case given the non-numeric nature of the OP's (Hi, Brett!) request, kern_in() is usually better.

--
Robert Kern
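A rough way to repeat this comparison outside IPython is sketched below. The array size, seed, and repeat count are arbitrary choices made for this example, `np.isin` stands in for `np.in1d`, and the timings will depend on the NumPy version and machine, so no particular result should be expected.

```python
import timeit
import numpy as np

def kern_in(x, valid):
    # One vectorized comparison per valid value, OR-ed into the mask.
    mask = np.zeros(x.shape, dtype=bool)
    for good in valid:
        mask |= (x == good)
    return mask

rng = np.random.default_rng(0)            # fixed seed, arbitrary
ar = rng.integers(0, 100, size=100_000)   # size chosen only for this sketch
valid = np.arange(0, 100, 5)

# Both approaches should agree before their speed is compared.
same = np.array_equal(kern_in(ar, valid), np.isin(ar, valid))

t_loop = timeit.timeit(lambda: kern_in(ar, valid), number=10)
t_isin = timeit.timeit(lambda: np.isin(ar, valid), number=10)
print("results agree:", same)
print("kern_in: %.4f s, np.isin: %.4f s (10 runs each)" % (t_loop, t_isin))
```

Varying `len(valid)` and `len(ar)` independently shows where the crossover lies on a given setup.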
Re: [Numpy-discussion] Boolean arrays
On Fri, Aug 27, 2010 at 1:35 PM, Robert Kern robert.k...@gmail.com wrote:
> [~] |8 %timeit kern_in(ar, valid)
> 10 loops, best of 3: 115 ms per loop
> [~] |9 %timeit np.in1d(ar, valid)
> 1 loops, best of 3: 279 ms per loop
>
> As valid gets larger, in1d() will catch up but for smallish sizes of valid, which I suspect given the non-numeric nature of the OP's (Hi, Brett!) request, kern_in() is usually better.

Oh well, I was just guessing based on algorithmic properties. Sounds like there might be some optimizations possible to in1d then, if anyone had a reason to care :-).

-- Nathaniel
[Numpy-discussion] boolean arrays
Hi all,

is the following behaviour correct?

    >>> a = array(([True,True],[True,True]))
    >>> b = array(([False,False],[False,False]))
    >>> a+b
    array([[ True, True],
           [ True, True]])

I would have expected False.

Nils
Re: [Numpy-discussion] boolean arrays
It is obvious to me that True+False == True. Why do you think it should be False?

Nadav

On Thu, 2009-11-26 at 14:20 +0100, Nils Wagner wrote:
> Hi all,
>
> is the following behaviour correct
>
>     >>> a = array(([True,True],[True,True]))
>     >>> b = array(([False,False],[False,False]))
>     >>> a+b
>     array([[ True, True],
>            [ True, True]])
>
> I have expected False.
>
> Nils
Re: [Numpy-discussion] boolean arrays
On Thursday 26 November 2009 at 18:26 +0200, Nadav Horesh wrote:
> It is obvious to me that True+False == True. Why do you think it should be False?

I can understand that it is not obvious that '+' stands for logical 'or', and '*' for logical 'and'...

--
Fabrice Silva si...@lma.cnrs-mrs.fr
LMA UPR CNRS 7051
Re: [Numpy-discussion] boolean arrays
On Thu, Nov 26, 2009 at 02:43:14PM +0100, Fabrice Silva wrote:
> On Thursday 26 November 2009 at 18:26 +0200, Nadav Horesh wrote:
> > It is obvious to me that True+False == True. Why do you think it should be False?
>
> I can understand that it is not obvious that '+' stands for logical 'or', and '*' for logical 'and'...

In Boolean algebra, this is the common convention. The reason is that only 'or' can correspond to the additive law of an algebra: its null element (False) is absorbing for 'and'. In other words, if you map '+' and '*' the other way round, you'll get surprising behaviors.

Gaël
Re: [Numpy-discussion] boolean arrays
On 11/26/2009 8:20 AM, Nils Wagner wrote:
>     >>> a = array(([True,True],[True,True]))
>     >>> b = array(([False,False],[False,False]))
>     >>> a+b

NumPy's boolean operations are very well behaved.

    >>> a = np.array(([True,True],[True,True]))
    >>> a+a
    array([[ True, True],
           [ True, True]], dtype=bool)

Compare Python:

    >>> True + True
    2

Ugh! Not fixing this in Python 3 was a big mistake, imo.

Alan Isaac
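The convention discussed in this thread can be checked with a small truth-table sketch. The array names are illustrative only; the behaviour shown is that of boolean arrays under `+` and `*`, next to plain Python booleans, which are integers.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

sum_ab = a + b     # elementwise logical OR, result stays boolean
prod_ab = a * b    # elementwise logical AND, result stays boolean

py_sum = True + True   # plain Python bools behave as the integers 1 and 1

# Reductions are different: summing a boolean array counts the True entries.
count_true = a.sum()

print(sum_ab, prod_ab, py_sum, count_true)
```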
Re: [Numpy-discussion] boolean arrays
On Thursday 26 November 2009 at 14:44 +0100, Gael Varoquaux wrote:
> On Thu, Nov 26, 2009 at 02:43:14PM +0100, Fabrice Silva wrote:
> > On Thursday 26 November 2009 at 18:26 +0200, Nadav Horesh wrote:
> > > It is obvious to me that True+False == True. Why do you think it should be False?
> >
> > I can understand that it is not obvious that '+' stands for logical 'or', and '*' for logical 'and'...
>
> In Boolean algebra, this is the common convention. The reason is that only 'or' can correspond to the additive law of an algebra: its null element (False) is absorbing for 'and'. In other words, if you map '+' and '*' the other way round, you'll get surprising behaviors.

I fully agree with you. My point was to complete Nadav's comment with potentially missing information, trying to figure out why Nils was expecting False...

--
Fabrice Silva si...@lma.cnrs-mrs.fr
LMA UPR CNRS 7051
Re: [Numpy-discussion] boolean arrays
On Thu, 26 Nov 2009 15:14:04 +0100 Fabrice Silva si...@lma.cnrs-mrs.fr wrote:
> On Thursday 26 November 2009 at 14:44 +0100, Gael Varoquaux wrote:
> > On Thu, Nov 26, 2009 at 02:43:14PM +0100, Fabrice Silva wrote:
> > > On Thursday 26 November 2009 at 18:26 +0200, Nadav Horesh wrote:
> > > > It is obvious to me that True+False == True. Why do you think it should be False?
> > >
> > > I can understand that it is not obvious that '+' stands for logical 'or', and '*' for logical 'and'...
> >
> > In Boolean algebra, this is the common convention. The reason is that only 'or' can correspond to the additive law of an algebra: its null element (False) is absorbing for 'and'. In other words, if you map '+' and '*' the other way round, you'll get surprising behaviors.
>
> I fully agree with you. My point was to complete Nadav's comment with potentially missing information, trying to figure out why Nils was expecting False...
>
> --
> Fabrice Silva si...@lma.cnrs-mrs.fr
> LMA UPR CNRS 7051

Sorry, I mixed up '+' and '&'.

    >>> a = array(([True,True],[True,True]))
    >>> b = array(([False,False],[False,False]))
    >>> a & b
    array([[False, False],
           [False, False]], dtype=bool)

Cheers,
Nils
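For reference, the explicit logical operators on boolean arrays can be sketched with the same two arrays. This assumes only standard NumPy operators; the result names are illustrative.

```python
import numpy as np

a = np.array([[True, True], [True, True]])
b = np.array([[False, False], [False, False]])

and_ab = a & b    # elementwise AND
or_ab = a | b     # elementwise OR (same result as a + b for boolean arrays)
xor_ab = a ^ b    # elementwise exclusive OR
not_a = ~a        # elementwise NOT

# The named functions give the same results as the operators.
same_and = np.array_equal(and_ab, np.logical_and(a, b))

print(and_ab)
print(or_ab)
```

Note that `&` and `|` bind more tightly than comparisons such as `==`, so comparisons combined with them need parentheses, as in `(ar == valid[0]) | (ar == valid[1])`.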
Re: [Numpy-discussion] boolean arrays
On Thu, Nov 26, 2009 at 7:35 PM, Nils Wagner nwag...@iam.uni-stuttgart.de wrote:
> Sorry, I mixed up '+' and '&'.
>
>     >>> a = array(([True,True],[True,True]))
>     >>> b = array(([False,False],[False,False]))
>     >>> a & b
>     array([[False, False],
>            [False, False]], dtype=bool)
>
> Cheers,
> Nils

Hey, this is a classic problem with + (sometimes pronounced 'and') and & (also pronounced 'and'). Happens to all of us sometimes.

cu,