Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Hi Josef, thanks for the summary! I am responding below; later I will make an enhancement ticket.

josef.p...@gmail.com wrote:
On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton neilcrigh...@gmail.com wrote:
Robert Cimrman cimrman3 at ntc.zcu.cz writes:
Anne Archibald wrote:
1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present

1. merge _nu version into one function
---
You mean something like:

    def intersect1d(ar1, ar2, assume_unique=False):
        if not assume_unique:
            return intersect1d_nu(ar1, ar2)
        else:
            ...  # the current code

intersect1d_nu could be still exported to numpy namespace, or not.

+1 - from the user's point of view there should just be intersect1d and setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests can be used if speed is a problem.

+1 on rolling the _nu versions this way into the plain version; this would avoid a lot of the confusion. It would not be a code-breaking API change for existing correct usage (but there would be some speed regression without adding the keyword).

+1

deprecate intersect1d_nu
^^ intersect1d_nu could be still exported to numpy namespace, or not.
I would say not, if they are the default branch of the non _nu version
+1 on deprecation

+0

2. alias as in
---
I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from readability, unlike the extra a in arange.

I don't like the extra a's either, once namespaces are commonly used.
alias setmember1d_nu as `in1d` or `isin1d`, because the function is an "in" and not a set operation
+1

+1

3. behavior of other set functions
---
guarantee that setdiff1d works for non-unique arrays (even when implementation changes), and change documentation
+1

+1, it is useful for non-unique arrays.

need to check other functions
^^ union1d: works for non-unique arrays, obvious from source

Yes.

setxor1d: requires unique arrays
>>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
array([2, 4, 5, 6])
>>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
array([0, 3, 4, 5, 6])

setxor: add keyword option and call unique by default
+1 for symmetry

+1 - you mean np.setxor1d(np.unique(a), np.unique(b)) to become np.setxor1d(a, b, assume_unique=False), right?

ediff1d and unique1d are defined for non-unique arrays

yes

4. name of keyword
---
intersect1d(ar1, ar2, assume_unique=False)
alternative isunique=False or just unique=False
+1 less to write

We should look at other functions in numpy (and/or scipy) to see what the common scheme is here. -1e-1 to the proposed names, as isunique is singular only, and unique=False does not show the intent clearly to me. What about ar1_unique=False, ar2_unique=False - to address each argument specifically?

5. module name
---
rename arraysetops to something easier to read like setfun. I think it would only affect internal changes since all functions are exported to the main numpy name space
+1e-4 (I got used to arrayse_tops)

+0 (internal change only). Other numpy/scipy submodules containing a bunch of functions are called *pack (fftpack, arpack, lapack), *alg (linalg), *utils. *fun is commonly used in the matlab world.

6. keep docs in sync with correct usage
---
obvious
+1

thanks,
r.
___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
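A minimal sketch of the merged function from item 1, runnable against current NumPy. The name `intersect1d_merged` is hypothetical (chosen so it does not shadow `np.intersect1d`), and the unique-assuming branch uses a concatenate, sort and compare-neighbours trick; that branch is an assumption about how the fast path can work, not a copy of the arraysetops source, although it is consistent with the intersect1d outputs quoted in this thread.

```python
import numpy as np

def intersect1d_merged(ar1, ar2, assume_unique=False):
    """Sorted intersection of two 1-d arrays (hypothetical sketch).

    With assume_unique=False (the proposed default) both inputs are
    reduced to their unique values first, i.e. intersect1d_nu behaviour.
    With assume_unique=True the caller promises the inputs are already
    sets, and the np.unique calls are skipped for speed.
    """
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.asarray(ar2).ravel()
    if not assume_unique:
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    # If both inputs are sets, a value appears twice in the sorted
    # concatenation exactly when it belongs to both of them.
    aux = np.sort(np.concatenate((ar1, ar2)))
    return aux[:-1][aux[1:] == aux[:-1]]
```

With the default, repeated values are harmless: `intersect1d_merged([1, 1, 2, 3, 3, 4], [1, 4, 5])` gives `[1, 4]`. Passing `assume_unique=True` on non-unique input reproduces the misleading `[1, 1, 3, 4]` that intersect1d returns for Alan Isaac's example, which is exactly the speed-versus-safety trade-off the keyword is meant to make explicit.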
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Robert Cimrman wrote:
Hi Josef, thanks for the summary! I am responding below, later I will make an enhancement ticket.

Done, see http://projects.scipy.org/numpy/ticket/1133

r.
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Robert Cimrman cimrman3 at ntc.zcu.cz writes:
Anne Archibald wrote:
1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present

You mean something like:

    def intersect1d(ar1, ar2, assume_unique=False):
        if not assume_unique:
            return intersect1d_nu(ar1, ar2)
        else:
            ...  # the current code

intersect1d_nu could be still exported to numpy namespace, or not.

+1 - from the user's point of view there should just be intersect1d and setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests can be used if speed is a problem.

I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from readability, unlike the extra a in arange.

Can we summarise the discussion in this thread and write up a short proposal about what we'd like to change in arraysetops, and how to make the changes? Then it's easy for other people to give their opinion on any changes. I can do this if no one else has time.

Neil
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton neilcrigh...@gmail.com wrote:
Robert Cimrman cimrman3 at ntc.zcu.cz writes:
Anne Archibald wrote:
1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present

1. merge _nu version into one function
---
You mean something like:

    def intersect1d(ar1, ar2, assume_unique=False):
        if not assume_unique:
            return intersect1d_nu(ar1, ar2)
        else:
            ...  # the current code

intersect1d_nu could be still exported to numpy namespace, or not.

+1 - from the user's point of view there should just be intersect1d and setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests can be used if speed is a problem.

+1 on rolling the _nu versions this way into the plain version; this would avoid a lot of the confusion. It would not be a code-breaking API change for existing correct usage (but there would be some speed regression without adding the keyword).

deprecate intersect1d_nu
^^ intersect1d_nu could be still exported to numpy namespace, or not.
I would say not, if they are the default branch of the non _nu version
+1 on deprecation

2. alias as in
---
I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from readability, unlike the extra a in arange.

I don't like the extra a's either, once namespaces are commonly used.
alias setmember1d_nu as `in1d` or `isin1d`, because the function is an "in" and not a set operation
+1

Can we summarise the discussion in this thread and write up a short proposal about what we'd like to change in arraysetops, and how to make the changes? Then it's easy for other people to give their opinion on any changes. I can do this if no one else has time.

other points

3. behavior of other set functions
---
guarantee that setdiff1d works for non-unique arrays (even when implementation changes), and change documentation
+1

need to check other functions
^^ union1d: works for non-unique arrays, obvious from source

setxor1d: requires unique arrays
>>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
array([2, 4, 5, 6])
>>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
array([0, 3, 4, 5, 6])

setxor: add keyword option and call unique by default
+1 for symmetry

ediff1d and unique1d are defined for non-unique arrays

4. name of keyword
---
intersect1d(ar1, ar2, assume_unique=False)
alternative isunique=False or just unique=False
+1 less to write

5. module name
---
rename arraysetops to something easier to read like setfun. I think it would only affect internal changes since all functions are exported to the main numpy name space
+1e-4 (I got used to arrayse_tops)

6. keep docs in sync with correct usage
---
obvious

That's my summary and opinions

Josef

Neil
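Item 3 proposes that setxor1d call unique by default. A minimal sketch under that assumption: `setxor1d_sketch` is a hypothetical name, and the neighbour-flag trick used to keep the values that occur exactly once is one way to do it, not necessarily the arraysetops code. With the default, it returns the `[0, 3, 4, 5, 6]` shown above for the np.unique-wrapped call.

```python
import numpy as np

def setxor1d_sketch(ar1, ar2, assume_unique=False):
    """Sorted symmetric difference of two 1-d arrays (hypothetical sketch).

    Following the "call unique by default" proposal, duplicates are
    removed unless the caller passes assume_unique=True.
    """
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.asarray(ar2).ravel()
    if not assume_unique:
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    aux = np.sort(np.concatenate((ar1, ar2)))
    if aux.size == 0:
        return aux
    # flag[i] is True where aux[i] differs from its left neighbour
    # (padded with True at both ends).  A value is kept only if it
    # differs from both neighbours, i.e. it occurs exactly once.
    flag = np.concatenate(([True], aux[1:] != aux[:-1], [True]))
    return aux[flag[1:] & flag[:-1]]
```

Calling unique first is what makes "occurs exactly once in the concatenation" equivalent to "belongs to exactly one of the two sets"; without it, a value repeated inside one input is wrongly dropped or kept.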
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
josef.p...@gmail.com wrote:
On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman cimrm...@ntc.zcu.cz wrote:
josef.p...@gmail.com wrote:
On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote:
On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
in(b) or in_iterable(b) method, such that you could do a.in(b) which would return a boolean array of the same shape as a with elements true if the equivalent a members were members in the iterable b.

That would really be what I would be looking for.

Just using in might promise more than it does, e.g. it works only for one-dimensional arrays; maybe in1d. With in, I would expect a generic function as in Python that works with many array types and dimensions. (But I haven't checked whether it would work with a 1d structured array or object array.)

I found arraysetops because of unique1d, but I didn't figure out what the subpackage really does, because I was reading arrayse-tops instead of array-set-ops.

I am bad at choosing names, but note that numpy sub-modules usually do not use underscores, so array_set_ops would not fit well.

I would have chosen something like setfun. Since this is in numpy, that sets refers to arrays should be implied.

Yes, good idea. I am not sure how to proceed, if people agree (name contest is open!). What about making an alias name setfun, and deprecating the name arraysetops?

BTW, for the docs, I haven't found a counter example where np.setdiff1d gives the wrong answer for non-unique arrays.

In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
Out[4]: array([ True, False, True, True, True], dtype=bool)

setdiff1d: diff, not member

Looking at the source, I think setdiff always works, even for non-unique arrays.

Whoops, sorry. setdiff1d seems really to work for non-unique arrays - it relies on the behaviour above though :) - there is always one correct False even for repeated entries in the first array.

r.
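Robert's observation can be checked with a membership test that is correct for repeated entries. A minimal sketch, using `np.isin` (added to NumPy after this thread, so an assumption relative to the 2009 API) as the reference: the correct mask for the example above is `[False, False, True, True, True]`, while setmember1d returned True for the first 1. Keeping the False entries of the reported mask still yields the right set difference, because one copy of 1 is marked False.

```python
import numpy as np

a = np.array([1, 1, 2, 4, 2])
b = np.array([3, 2, 4])

# Membership that is correct even when a has repeated entries.
mask = np.isin(a, b)

# Set difference as "unique values of a that do not occur in b".
diff = np.unique(a[~mask])

# The mask reported above for setmember1d on this input (first entry wrong).
reported = np.array([True, False, True, True, True])
# Keeping its False entries still gives the right answer here, because
# the second copy of 1 is correctly marked False.
diff_from_reported = a[~reported]
```

Both routes give `[1]` for this input, which is why no counterexample for setdiff1d turned up even though the mask it relies on is wrong for repeated entries.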
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On 4-Jun-09, at 4:38 PM, Anne Archibald wrote:
It seems to me that this is the basic source of the problem. Perhaps this can be addressed? I realize maintaining compatibility with the current behaviour is necessary, so how about a multistage deprecation:
1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present
One could do something similar with setmember1d.

+1 on this idea. I've been bitten by the non-unique stuff in the past, especially with setmember1d, not realizing that both need to be unique.

David
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

hth,
Alan Isaac
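How the one-liner works, as a minimal sketch with sample data chosen for illustration: `b[:, None]` turns b into a column, so `a == b[:, None]` broadcasts to a `len(b) x len(a)` table, and reducing each column with a boolean sum (which acts as a logical OR) marks the elements of a that occur somewhere in b. Using `any(axis=0)` instead is our substitution; it is an equivalent and arguably clearer spelling.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

# Row i of the table marks where a == b[i]; shape is (len(b), len(a)).
table = a == b[:, None]

# A column containing any True means that element of a occurs in b.
mask_sum = table.sum(axis=0, dtype=bool)   # the reduction used in the post
mask_any = table.any(axis=0)               # equivalent spelling

selected = a[mask_any]                     # keeps a's order and repeats
```

The price is the intermediate table, whose size grows as `len(a) * len(b)`; that is the memory concern raised in the reply.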
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

This is my preferred way when b is small and has unique elements. If the elements in b are not unique, then b can be replaced by np.unique(b).

If b is large, this creates a huge intermediate array.

The advantage of the new setmember1d_nu is that it handles large b very efficiently. My try on it was more than 10 times slower than the proposed solution for larger arrays.

Josef

hth,
Alan Isaac
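The memory point can be made concrete. The setmember1d_nu code under review is not shown in this thread, so the following is only a sketch of one sort-based approach (a swapped-in technique using `np.unique` and `np.searchsorted`; `member1d_sorted` is a hypothetical name). It needs O(len(a) + len(b)) extra memory instead of the `len(b) x len(a)` table built by the broadcasting one-liner, and it handles repeated values in either array.

```python
import numpy as np

def member1d_sorted(a, b):
    """Boolean mask: which elements of a occur in b (hypothetical sketch)."""
    a = np.asarray(a).ravel()
    ub = np.unique(b)                 # sorted, duplicates removed
    if ub.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # Position at which each element of a would be inserted into ub.
    idx = np.searchsorted(ub, a)
    # Positions past the end cannot be matches; clip them so indexing is safe.
    idx = np.clip(idx, 0, ub.size - 1)
    # An element is a member exactly when the value found there equals it.
    return ub[idx] == a
```

For example, `member1d_sorted([1, 1, 2, 4, 2], [3, 2, 4])` gives `[False, False, True, True, True]`, which is the correct answer for the input where setmember1d marked the first 1 as a member.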
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
If b is large this creates a huge intermediate array

True enough, but one could then use fromiter:

    setb = set(b)
    itr = (ai for ai in a if ai in setb)
    out = np.fromiter(itr, dtype=a.dtype)

I suspect (?) that b would have to be pretty big relative to a for the repeated testing to be more costly than sorting a.

Or if a stable order is not important (I don't recall if the OP specified), one could just

    np.intersect1d(a, np.unique(b))

On a different note, I think a name change is needed for your function. (Compare intersect1d_nu to see the potential confusion. And btw, what is the use case for intersect1d, which gives neither a set intersection nor a multiset intersection?)

Cheers,
Alan Isaac
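A runnable version of the fromiter idea, with sample data chosen for illustration. One deviation from the snippet above is our assumption: the arrays are converted with `.tolist()` so that the set holds plain Python ints and the generator compares Python objects rather than NumPy scalars. Unlike `np.intersect1d(a, np.unique(b))`, this keeps the order and the repeats of a.

```python
import numpy as np

a = np.array([5, 1, 1, 2, 3, 3, 4])
b = np.array([4, 1, 4])

# Build a hash set once; each membership test is then O(1) on average.
setb = set(b.tolist())

# Keep the elements of a that occur in b, in a's order and with repeats,
# without creating any len(a) x len(b) intermediate array.
itr = (ai for ai in a.tolist() if ai in setb)
out = np.fromiter(itr, dtype=a.dtype)
```

Here `out` is `[1, 1, 4]`. The loop runs in Python, so for large a it is expected to be slower than a vectorised sort-based test, which matches the speed caveat raised in the reply.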
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
If b is large this creates a huge intermediate array

True enough, but one could then use fromiter:

    setb = set(b)
    itr = (ai for ai in a if ai in setb)
    out = np.fromiter(itr, dtype=a.dtype)

I suspect (?) that b would have to be pretty big relative to a for the repeated testing to be more costly than sorting a.

I didn't look at this case very closely for speed. setmember1d and setmember1d_nu return a boolean array that can be used for indexing, not the actual elements. Your iterator is in Python and could be pretty slow, but I only ran the performance script attached to the ticket, and the speed differences between the different ways of doing it were pretty big for large arrays.

Or if a stable order is not important (I don't recall if the OP specified), one could just np.intersect1d(a, np.unique(b))

This requires that `a` also has only unique elements. intersect1d_nu doesn't require unique elements.

On a different note, I think a name change is needed for your function. (Compare intersect1d_nu to see the potential confusion. And btw, what is the use case for intersect1d, which gives neither a set intersection nor a multiset intersection?)

intersect1d gives set intersection if both arrays have only unique elements (i.e. are sets). I thought the naming is pretty clear:

intersect1d(a,b)        set intersection if a and b with unique elements
intersect1d_nu(a,b)     set intersection if a and b with non-unique elements
setmember1d(a,b)        boolean index array for a of set intersection if a and b with unique elements
setmember1d_nu(a,b)     boolean index array for a of set intersection if a and b with non-unique elements

The new docs http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/ are a bit clearer. However, I haven't used any of these functions much, and none of them are *my* functions. Of the arraysetops functions, I use unique1d most (because of the return index). I just keep track of these functions because of their use for categorical and dummy variables.

Josef

Cheers,
Alan Isaac
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
Or if a stable order is not important (I don't recall if the OP specified), one could just np.intersect1d(a, np.unique(b))

On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
This requires that also `a` has only unique elements. intersect1d_nu doesn't require unique elements.

>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4])
>>> np.intersect1d(a, np.unique(b))
array([1, 1, 3, 4])

(And thus my question about intersect1d...)

Cheers,
Alan
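Why the extra 1 and the 3 appear: assuming intersect1d sorts the concatenation of its inputs and keeps each value that equals its right-hand neighbour (an assumption about the implementation, though it reproduces every intersect1d output quoted in this thread), a value repeated inside `a` looks exactly like a value shared between the two arrays. A minimal sketch of that step:

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.unique(np.array([1, 4]))

# Unique-assuming intersection (sketch): sort the concatenation and keep
# every value that equals its right-hand neighbour.
aux = np.sort(np.concatenate((a, b)))   # [1 1 1 2 3 3 4 4]
same_as_next = aux[1:] == aux[:-1]
result = aux[:-1][same_as_next]
```

`result` is `[1, 1, 3, 4]`: the run of three 1s contributes two matches, and the two 3s from `a` match each other even though 3 is not in `b`. Making `b` unique does not help, because the spurious pairs come from `a`.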
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 11:12 AM, Alan G Isaac ais...@american.edu wrote:
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
Or if a stable order is not important (I don't recall if the OP specified), one could just np.intersect1d(a, np.unique(b))

On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
This requires that also `a` has only unique elements. intersect1d_nu doesn't require unique elements.

>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4])
>>> np.intersect1d(a, np.unique(b))
array([1, 1, 3, 4])

(And thus my question about intersect1d...)

Yes, I know, and in my current numpy help file this is the only example there is, which is very misleading for its intended use.

>>> a = np.array([1, 1, 2, 3, 3, 4])
>>> b = np.array([1, 4, 5])
>>> np.intersect1d(np.unique(a), np.unique(b))
array([1, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

Josef

Cheers,
Alan
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
intersect1d gives set intersection if both arrays have only unique elements (i.e. are sets). I thought the naming is pretty clear:

intersect1d(a,b)        set intersection if a and b with unique elements
intersect1d_nu(a,b)     set intersection if a and b with non-unique elements
setmember1d(a,b)        boolean index array for a of set intersection if a and b with unique elements
setmember1d_nu(a,b)     boolean index array for a of set intersection if a and b with non-unique elements

>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4, 4, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

That is, intersect1d_nu is the actual set intersection function. (I.e., intersect1d and intersect1d_nu would most naturally have swapped names.) That is why the appended _nu will not communicate what was intended. (I.e., setmember1d_nu will not be a match for intersect1d_nu.)

Cheers,
Alan Isaac
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Alan G Isaac wrote:
On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
intersect1d gives set intersection if both arrays have only unique elements (i.e. are sets). I thought the naming is pretty clear:

intersect1d(a,b)        set intersection if a and b with unique elements
intersect1d_nu(a,b)     set intersection if a and b with non-unique elements
setmember1d(a,b)        boolean index array for a of set intersection if a and b with unique elements
setmember1d_nu(a,b)     boolean index array for a of set intersection if a and b with non-unique elements

>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4, 4, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

That is, intersect1d_nu is the actual set intersection function. (I.e., intersect1d and intersect1d_nu would most naturally have swapped names.) That is why the appended _nu will not communicate what was intended. (I.e., setmember1d_nu will not be a match for intersect1d_nu.)

The naming should express this: intersect1d expects its arguments to be sets, intersect1d_nu does not. A set has unique elements by definition.

cheers,
r.
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 11:19 AM, Alan G Isaac ais...@american.edu wrote:
On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
intersect1d gives set intersection if both arrays have only unique elements (i.e. are sets). I thought the naming is pretty clear:

intersect1d(a,b)        set intersection if a and b with unique elements
intersect1d_nu(a,b)     set intersection if a and b with non-unique elements
setmember1d(a,b)        boolean index array for a of set intersection if a and b with unique elements
setmember1d_nu(a,b)     boolean index array for a of set intersection if a and b with non-unique elements

>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4, 4, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

That is, intersect1d_nu is the actual set intersection function. (I.e., intersect1d and intersect1d_nu would most naturally have swapped names.) That is why the appended _nu will not communicate what was intended. (I.e., setmember1d_nu will not be a match for intersect1d_nu.)

intersect1d is the intersection between sets (which are stored as arrays); just like in the mathematical definition, the two sets only have unique elements.

intersect1d_nu is the intersection between two arrays which can have repeated elements. The result is a set, i.e. unique elements, stored as an array.

Same for setmember1d, setmember1d_nu.

So the postfix `_nu` only means that this function also works if the two arrays are not really sets, i.e. they are not required to have unique elements for the result to make sense.

intersect1d should throw a domain error if you give it arrays with non-unique elements, which is not done for speed reasons.

Cheers,
Alan Isaac
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
intersect1d is the intersection between sets (which are stored as arrays), just like in the mathematical definition the two sets only have unique elements

Hmm. OK, I see you and Robert believe this. But it does not match the documentation. But indeed, I see that the documentation is incorrect. E.g.,
>>> np.intersect1d([1,1,2,3,3,4],[1,4])
array([1, 1, 3, 4])

Is this a bug or a documentation bug?

intersect1d_nu is the intersection between two arrays which can have repeated elements. The result is a set, i.e. unique elements, stored as an array
same for setmember1d, setmember1d_nu

I cannot understand this. Following your proposed reasoning, I expect a[setmember1d_nu(a,b)] to return the same as intersect1d_nu(a, b). It does not.

so postfix `_nu` only means that this function also works if the two arrays are not really sets

But that just begs the question: what does 'works' mean? See my previous comment (above).

intersect1d should throw a domain error if you give it arrays with non-unique elements, which is not done for speed reasons

*If* intersect1d behaved *exactly* as documented, the example intersect1d(a, np.unique(b)) shows that the documented behavior can be useful. And indeed, this would be the match to a[setmember1d_nu(a,b)]

Cheers,
Alan Isaac
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 12:32 PM, Alan G Isaac ais...@american.edu wrote:
On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
intersect1d is the intersection between sets (which are stored as arrays), just like in the mathematical definition the two sets only have unique elements

Hmm. OK, I see you and Robert believe this. But it does not match the documentation. But indeed, I see that the documentation is incorrect. E.g.,
>>> np.intersect1d([1,1,2,3,3,4],[1,4])
array([1, 1, 3, 4])

Is this a bug or a documentation bug?

intersect1d_nu is the intersection between two arrays which can have repeated elements. The result is a set, i.e. unique elements, stored as an array
same for setmember1d, setmember1d_nu

I cannot understand this. Following your proposed reasoning, I expect a[setmember1d_nu(a,b)] to return the same as intersect1d_nu(a, b). It does not.

I don't have setmember1d_nu available right now, but from my reading we should have

    intersect1d_nu(a, b) == np.unique(a[setmember1d_nu(a,b)])

so postfix `_nu` only means that this function also works if the two arrays are not really sets

But that just begs the question: what does 'works' mean? See my previous comment (above).

intersect1d should throw a domain error if you give it arrays with non-unique elements, which is not done for speed reasons

*If* intersect1d behaved *exactly* as documented, the example intersect1d(a, np.unique(b)) shows that the documented behavior can be useful. And indeed, this would be the match to a[setmember1d_nu(a,b)]

I don't know if anyone looked at the behavior for unintended usage.

intersect1d rearranges, sorts
>>> np.intersect1d([4,1,3,3],[3,4])
array([3, 3, 4])

but it gives you the correct multiplicity
>>> np.intersect1d([4,4,4,1,3,3],np.unique([3,4,3,0]))
array([3, 3, 4, 4, 4])

so I guess we have

    np.intersect1d([4,4,4,1,3,3], np.unique([3,4,3,0])) == np.sort(a[setmember1d_nu(a,b)])

For the example from the help file I don't find any meaningful interpretation:
>>> np.intersect1d([1,3,3],[3,1,1])
array([1, 1, 3, 3])

wrong answer
>>> np.setmember1d([4,1,1,3,3],[3,4])
array([ True, True, False, True, True], dtype=bool)

Note: there are two versions of the docs for np.intersect1d: the currently published docs, which describe the actual behavior (for the non-unique case), and the new docs on the doc editor http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/ that describe the intended usage of the functions, which also correspond more closely to the original source docstring (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227 ).

That's my interpretation.

If you think that the functions also make sense for the unintended usage, then you could add an example to the new docs.

Josef
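Josef's first relation, `intersect1d_nu(a, b) == np.unique(a[setmember1d_nu(a, b)])`, can be checked with current NumPy stand-ins. This is an assumption: `np.isin` plays the role of setmember1d_nu, and `np.intersect1d` with its default of uniquifying the inputs plays the role of intersect1d_nu. A minimal sketch on the example from the post:

```python
import numpy as np

a = np.array([4, 4, 4, 1, 3, 3])
b = np.array([3, 4, 3, 0])

mask = np.isin(a, b)          # membership mask, correct for repeats

# a[mask] keeps a's order and multiplicity ...
kept = a[mask]                # [4 4 4 3 3]
# ... while the set intersection is sorted and duplicate-free.
inter = np.intersect1d(a, b)  # uniquifies both inputs by default

# First relation: unique(a[mask]) equals the set intersection.
identity_holds = np.array_equal(np.unique(kept), inter)

# For this particular input, sorting a[mask] matches the [3, 3, 4, 4, 4]
# that intersect1d(a, np.unique(b)) is quoted as returning above.
sorted_kept = np.sort(kept)
```

The first relation is what Alan's "a[setmember1d_nu(a,b)] does not equal intersect1d_nu(a, b)" is missing: the mask keeps repeats and order, so a final `np.unique` is needed to reach the set result.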
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
Note: there are two versions of the docs for np.intersect1d, the currently published docs which describe the actual behavior (for the non-unique case), and the new docs on the doc editor http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/ that describe the intended usage of the functions, which also corresponds closer to the original source docstring (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227 ). that's my interpretation

Again, the distributed docs do *not* describe the actual behavior for the non-unique case. E.g.,
>>> np.intersect1d([1,1,2,3,3,4], [1,4])
array([1, 1, 3, 4])

Might this be a better example of failure than the one in the doc editor? However, the doc editor version states that the function fails for the non-unique case, so it seems there was a documentation bug that is in the process of being fixed.

Thanks,
Alan
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac ais...@american.edu wrote:
On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
Note: there are two versions of the docs for np.intersect1d, the currently published docs which describe the actual behavior (for the non-unique case), and the new docs on the doc editor http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/ that describe the intended usage of the functions, which also corresponds closer to the original source docstring (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227 ). that's my interpretation

Again, the distributed docs do *not* describe the actual behavior for the non-unique case. E.g.,
>>> np.intersect1d([1,1,2,3,3,4], [1,4])
array([1, 1, 3, 4])

Might this be a better example of failure than the one in the doc editor?

Thanks, that's a very clear example of a wrong answer, and it removes the question of whether the function makes any sense for the non-unique case. I changed the example in the doc editor to this one. It will hopefully be merged with the source at the next update.

Josef

However, the doc editor version states that the function fails for the non-unique case, so it seems there was a documentation bug that is in the process of being fixed.

Yes

Thanks,
Alan
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Concerning the name setmember1d_nu, I personally find it quite verbose, and not the name I would expect as a non-insider coming to numpy, not knowing all the names of the more specialised, hidden-away functions and not being a Python wiz either.

I think ain(a,b) would be the name I would have expected as an array equivalent of `a in b` (just as arange is the array version of range), or I would have anticipated that an ndarray object would have an in(b) or in_iterable(b) method, such that you could do a.in(b), which would return a boolean array of the same shape as a with elements True where the corresponding members of a are members of the iterable b.

When I had a problem where I needed this function, I could not find anything like it, and after looking around and also asking here I got some hints to use the 1d functions, which gave me the idea to implement the few-line, very simple proposal for `a in b` that is now under review as the new function setmember1d_nu(a,b).

While I see that this function name is in line with the existing functions, I really think the names are non-intuitive. I would therefore propose that it also be aliased to a more intuitive name such as ain(a,b), or perhaps better a.in(b).

Again, I am probably missing some important points here as an inexperienced Python programmer and numpy user; I am just trying to give some input from a beginner's point of view, if that can be of any help.

Thank you,
Kim
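What Kim describes can be sketched as a plain function. `ain` is the hypothetical name from the post, built here on `np.isin` from later NumPy releases (an assumption relative to the API discussed in this thread). Note that `in` is a reserved word in Python, so a method literally spelled `a.in(b)` would not parse; a function or a differently named method would be needed. The sketch applies the 1-d test to the flattened array and reshapes the result, so the output has the same shape as a.

```python
import numpy as np

def ain(a, b):
    """Element-wise "a in b" (hypothetical sketch).

    Returns a boolean array with the same shape as a that is True
    wherever the element of a occurs somewhere in the iterable b.
    """
    a = np.asarray(a)
    # Accept any finite iterable for b (list, set, generator, array, ...).
    b = np.asarray(list(b)).ravel()
    # Run the 1-d membership test on the flattened data, then restore a's shape.
    return np.isin(a.ravel(), b).reshape(a.shape)
```

For instance, `ain(np.array([[1, 2], [3, 4]]), {2, 3})` gives `[[False, True], [True, False]]`, and `a[ain(a, b)]` extracts the matching elements, which was the original question of the thread.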
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
in(b) or in_iterable(b) method, such that you could do a.in(b) which would return a boolean array of the same shape as a with elements true if the equivalent a members were members in the iterable b.

That would really be what I would be looking for.

Gaël
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
2009/6/4 josef.p...@gmail.com: intersect1d should throw a domain error if you give it arrays with non-unique elements, which is not done for speed reasons

It seems to me that this is the basic source of the problem. Perhaps this can be addressed? I realize maintaining compatibility with the current behaviour is necessary, so how about a multistage deprecation:

1. add a keyword argument assume_unique to intersect1d; if it is not present, check for uniqueness and emit a warning if the input is not unique
2. change the warning to an exception

Optionally:

3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present

One could do something similar with setmember1d. This would remove the pitfall of the 1d assumption and the wart of the _nu names without hampering performance for people who know they have unique arrays and are in a hurry. Anne
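Stages 1 and 2 of this proposal could look roughly like the following sketch. Everything in it is hypothetical: the sentinel object used to detect an omitted keyword, the choice of DeprecationWarning and the message text are assumptions, and the fast path simply sorts the concatenated inputs and keeps values equal to their right-hand neighbour, which is only correct when each input is duplicate-free.

```python
import warnings
import numpy as np

_NOT_GIVEN = object()  # hypothetical sentinel: distinguishes "omitted" from False

def intersect1d(ar1, ar2, assume_unique=_NOT_GIVEN):
    """Hypothetical stage-1 intersect1d following the deprecation plan."""
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.asarray(ar2).ravel()
    if assume_unique is _NOT_GIVEN:
        # Stage 1: keyword absent, so check uniqueness and warn.
        # (Stage 2 would raise ValueError here instead of warning.)
        if np.unique(ar1).size != ar1.size or np.unique(ar2).size != ar2.size:
            warnings.warn("intersect1d called with non-unique input; "
                          "pass assume_unique explicitly",
                          DeprecationWarning, stacklevel=2)
        assume_unique = True  # keep the old (fast) behaviour for now
    if not assume_unique:
        ar1, ar2 = np.unique(ar1), np.unique(ar2)
    # Fast path: correct only if ar1 and ar2 are each duplicate-free.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]

print(intersect1d([1, 1, 2, 3, 3, 4], [1, 4], assume_unique=False))
```

Using a sentinel rather than a plain default of False is what lets stage 1 tell "the caller said nothing" apart from "the caller explicitly asked for the safe path".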
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote: in(b) or in_iterable(b) method, such that you could do a.in(b) which would return a boolean array of the same shape as a with elements true if the equivalent a members were members in the iterable b. That would really by what I would be looking for. Just using `in` might promise more than it does, e.g. it works only for one-dimensional arrays; maybe in1d. With `in`, I would expect a generic function, as in Python, that works with many array types and dimensions. (But I haven't checked whether it would work with a 1d structured array or object array.) I found arraysetops because of unique1d, but I didn't figure out what the subpackage really does, because I was reading arrayse-tops instead of array-set-ops. BTW, for the docs, I haven't found a counterexample where np.setdiff1d gives the wrong answer for non-unique arrays. Josef
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote: Just using in might promise more than it does, eg. it works only for one dimensional arrays, maybe in1d. With in, Then 'in_1d' I found arraysetops because of unique1d, but I didn't figure out what the subpackage really does, because I was reading arrayse-tops instead of array-set-ops That's why I push people to use more underscores. IMHO PEP8 lacks a push for underscores. Gaël
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Thu, Jun 4, 2009 at 4:52 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote: Just using in might promise more than it does, eg. it works only for one dimensional arrays, maybe in1d. With in, Then 'in_1d' No, if the breaks in a name are obvious, I still prefer names without underscores. I don't think `1d` or `2d` needs to be separated from the word, as in in1d. I always remember how to spell unique1d, but I usually have to check how to spell at_least_2d, or maybe atleast_2d, or even atleast2d. How about

def setmember1d_nu(a, b):
    ...

# aliases
set_member_1d_but_it_does_not_really_have_to_be_a_set = setmember1d_nu
in1d = setmember1d_nu

Josef

>>> [f for f in dir(np) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'ediff1d', 'histogram2d', 'intersect1d', 'poly1d', 'setdiff1d', 'setmember1d', 'setxor1d', 'union1d', 'unique1d']
>>> [f for f in dir(scipy.signal) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'convolve2d', 'correlate2d', 'cspline1d', 'cspline2d', 'medfilt2d', 'qspline1d', 'qspline2d', 'sepfir2d']
>>> [f for f in dir(scipy.stats) if f[-2:]=='1d' or f[-2:]=='2d']
[]
>>> [f for f in dir(scipy.ndimage) if f[-2:]=='1d' or f[-2:]=='2d']
['convolve1d', 'correlate1d', 'gaussian_filter1d', 'generic_filter1d', 'maximum_filter1d', 'minimum_filter1d', 'spline_filter1d', 'uniform_filter1d']
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
josef.p...@gmail.com wrote: On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac ais...@american.edu wrote: On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote: Note: there are two versions of the docs for np.intersect1d: the currently published docs, which describe the actual behavior (for the non-unique case), and the new docs on the doc editor http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/ that describe the intended usage of the functions, which also correspond more closely to the original source docstring (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227). That's my interpretation. Again, the distributed docs do *not* describe the actual behavior for the non-unique case. E.g.,

>>> np.intersect1d([1,1,2,3,3,4], [1,4])
array([1, 1, 3, 4])

Might this be a better example of failure than the one in the doc editor? Thanks, that's a very clear example of a wrong answer, and it removes the question of whether the function makes any sense for the non-unique case. I changed the example in the doc editor to this one. It will hopefully be merged with the source at the next update. Thank you Josef! r.
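To see where Alan's output can come from, here is a sketch assuming the fast intersection concatenates both inputs, sorts them, and keeps every element equal to its right-hand neighbour. That is an assumption about the implementation, although it reproduces the output above exactly. Duplicates inside a (the two 1s and the two 3s) form neighbour pairs of their own, so 1 is reported twice and 3, which is not in b at all, is reported too. Making both inputs unique first gives the expected answer.

```python
import numpy as np

def neighbour_intersect(ar1, ar2):
    """Sort-and-compare-neighbours intersection (illustrative sketch).

    Correct only if ar1 and ar2 are each free of duplicates: a value that
    repeats *within one input* also produces an equal neighbour pair.
    """
    aux = np.concatenate((np.ravel(ar1), np.ravel(ar2)))
    aux.sort()
    # Keep every element that equals the element after it.
    return aux[:-1][aux[1:] == aux[:-1]]

a = [1, 1, 2, 3, 3, 4]
b = [1, 4]
print(neighbour_intersect(a, b))                        # [1 1 3 4]
print(neighbour_intersect(np.unique(a), np.unique(b)))  # [1 4]
```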
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Kim Hansen wrote: Concerning the name setmember1d_nu, I personally find it quite verbose and not the name I would expect as a non-insider coming to numpy and not knowing all the names of the more special hidden-away functions and not being a python-wiz either. To explain the naming: those names are used in MATLAB for functions with similar functionality. If better names are found, I have nothing against them. What I particularly do not like is the _nu suffix (yes, blame me). r.
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Anne Archibald wrote: 2009/6/4 josef.p...@gmail.com: intersect1d should throw a domain error if you give it arrays with non-unique elements, which is not done for speed reasons It seems to me that this is the basic source of the problem. Perhaps this can be addressed? I realize maintaining compatibility with the current behaviour is necessary, so how about a multistage deprecation: 1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique 2. change the warning to an exception Optionally: 3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present One could do something similar with setmember1d. This would remove the pitfall of the 1d assumption and the wart of the _nu names without hampering performance for people who know they have unique arrays and are in a hurry. You mean something like:

def intersect1d(ar1, ar2, assume_unique=False):
    if not assume_unique:
        return intersect1d_nu(ar1, ar2)
    else:
        ...  # the current code

intersect1d_nu could still be exported to the numpy namespace, or not. I like this. I do not understand, however, what you mean by "remove the pitfall of the 1d assumption". cheers, r.
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
josef.p...@gmail.com wrote: On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote: in(b) or in_iterable(b) method, such that you could do a.in(b) which would return a boolean array of the same shape as a with elements true if the equivalent a members were members in the iterable b. That would really by what I would be looking for. Just using in might promise more than it does, eg. it works only for one dimensional arrays, maybe in1d. With in, I would expect a generic function as in python that works with many array types and dimensions. (But I haven't checked whether it would work with a 1d structured array or object array.) I found arraysetops because of unique1d, but I didn't figure out what the subpackage really does, because I was reading arrayse-tops instead of array-set-ops I am bad at choosing names, but note that numpy sub-modules usually do not use underscores, so array_set_ops would not fit well. BTW, for the docs, I haven't found a counter example where np.setdiff1d gives the wrong answer for non-unique arrays.

In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
Out[4]: array([ True, False, True, True, True], dtype=bool)

r.
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman cimrm...@ntc.zcu.cz wrote: josef.p...@gmail.com wrote: On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote: in(b) or in_iterable(b) method, such that you could do a.in(b) which would return a boolean array of the same shape as a with elements true if the equivalent a members were members in the iterable b. That would really by what I would be looking for. Just using in might promise more than it does, eg. it works only for one dimensional arrays, maybe in1d. With in, I would expect a generic function as in python that works with many array types and dimensions. (But I haven't checked whether it would work with a 1d structured array or object array.) I found arraysetops because of unique1d, but I didn't figure out what the subpackage really does, because I was reading arrayse-tops instead of array-set-ops I am bad in choosing names, but note that numpy sub-modules usually do not use underscores, so array_set_ops would not fit well. I would have chosen something like setfun. Since this is in numpy, it should be implied that "sets" refers to arrays. BTW, for the docs, I haven't found a counter example where np.setdiff1d gives the wrong answer for non-unique arrays. In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] ) Out[4]: array([ True, False, True, True, True], dtype=bool) setdiff1d: diff, not member. Looking at the source, I think setdiff always works, even for non-unique arrays. Josef
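One way to hunt for the counterexample Josef mentions is a brute-force comparison against a plain-Python set difference on small random integer arrays with many repeated values. This sketch tests present-day np.setdiff1d and assumes the intended result is the sorted, duplicate-free values of a that do not occur in b, which is how current NumPy documents the function; adjust expected if other semantics are intended. The helper name and its parameters are made up for illustration, and finding nothing is evidence, not a proof.

```python
import numpy as np

def find_setdiff1d_counterexample(trials=1000, size=8, high=5, seed=0):
    """Hypothetical helper: search random small integer arrays (with many
    duplicates) for a case where np.setdiff1d disagrees with a Python set
    difference. Returns the first offending (a, b) pair, or None.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a = rng.integers(0, high, size=size)
        b = rng.integers(0, high, size=size // 2)
        # Reference answer: sorted values of a that are absent from b.
        expected = sorted(set(a.tolist()) - set(b.tolist()))
        if np.setdiff1d(a, b).tolist() != expected:
            return a, b
    return None

print(find_setdiff1d_counterexample())
```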
[Numpy-discussion] extract elements of an array that are contained in another array?
Hi, I want to extract the elements of an array (say, a) that are contained in another array (say, b). That is, if a=array([1,1,2,3,3,4]) and b=array([1,4]), then I want array([1,1,4]). I did the following, but it is very slow (maybe because a is very long):

c=array([])
for x in b:
    c=append(c,a[a==x])

Is there any way to speed it up? Thanks! -Ning
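The loop rescans all of a once per element of b and reallocates c on every append. A vectorised alternative builds a single boolean mask over a; this sketch uses np.isin from present-day NumPy, which did not exist when this question was posted. Both give the values asked for here, with two differences worth knowing: the loop's result is float, because c starts as an empty float array, and it groups elements by the order of b, whereas the mask keeps a's dtype and original order.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

# The loop from the question, for comparison: one full pass over a per
# element of b, plus a reallocation of c on every np.append call.
c = np.array([])
for x in b:
    c = np.append(c, a[a == x])

# Vectorised alternative: one boolean mask, True where a's element is in b.
mask = np.isin(a, b)
d = a[mask]

print(c)  # [1. 1. 4.]
print(d)  # [1 1 4]
```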
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On Wed, Jun 3, 2009 at 8:29 PM, Ning Sean nings...@gmail.com wrote: Hi, I want to extract elements of an array (say, a) that are contained in another array (say, b). That is, if a=array([1,1,2,3,3,4]), b=array([1,4]), then I want array([1,1,4]). I did the following but the speed is very slow (maybe because a is very long): c=array([]) for x in b: c=append(c,a[a==x]) any way to speed it up? Thanks! -Ning It's waiting in Trac for inclusion in numpy: http://projects.scipy.org/numpy/ticket/1036 The current version only handles arrays with unique elements. You can copy the ticket attachment; the version there is very fast. Josef
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Thanks! Tried it and it is about twice as fast as my approach. -Ning