Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-08 Thread Robert Cimrman
Hi Josef,

thanks for the summary! I am responding below, later I will make an 
enhancement ticket.

josef.p...@gmail.com wrote:
 On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton neilcrigh...@gmail.com wrote:
 Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Anne Archibald wrote:

 1. add a keyword argument to intersect1d assume_unique; if it is not
 present, check for uniqueness and emit a warning if not unique
 2. change the warning to an exception
 Optionally:
 3. change the meaning of the function to that of intersect1d_nu if the
 keyword argument is not present

 
 1. merge _nu version into one function
 ---
 
 You mean something like:

 def intersect1d(ar1, ar2, assume_unique=False):
  if not assume_unique:
  return intersect1d_nu(ar1, ar2)
  else:
  ... # the current code

 intersect1d_nu could be still exported to numpy namespace, or not.

 +1 - from the user's point of view there should just be intersect1d and
 setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
 can be used if speed is a problem.
 
 + 1 on rolling the _nu versions this way into the plain version, this
 would avoid a lot of the confusion.
 It would not be a code breaking API change for existing correct usage
 (but some speed regression without adding keyword)

+1

 deprecate intersect1d_nu
 ^^
 intersect1d_nu could be still exported to numpy namespace, or not.
 I would say not, if they are the default branch of the non _nu version
 
 +1 on deprecation

+0

 2. alias as in
 -
 I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
 another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
 readability, unlike the extra a in arange.
 I don't like the extra a's either, once namespaces are commonly used
 
 alias setmember1d_nu as `in1d` or `isin1d`, because the function is an
 `in` test and not a set operation
 +1

+1

 3. behavior of other set functions
 ---
 
 guarantee that setdiff1d works for non-unique arrays (even when
 implementation changes), and change documentation
 +1

+1, it is useful for non-unique arrays.

 need to check other functions
 ^^
 union1d:  works for non-unique arrays, obvious from source

Yes.

 setxor1d: requires unique arrays
 np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
 array([2, 4, 5, 6])
 np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
 array([0, 3, 4, 5, 6])
 
 setxor: add keyword option and call unique by default
 +1 for symmetry

+1 - you mean np.setxor1d(np.unique(a), np.unique(b)) to become 
np.setxor1d(a, b, assume_unique=False), right?

 ediff1d and unique1d are defined for non-unique arrays

yes

 4. name of keyword
 
 
 intersect1d(ar1, ar2, assume_unique=False)
 
 alternative isunique=False  or just unique=False
 +1 less to write

We should look at other functions in numpy (and/or scipy) to see what the
common scheme is here. -1e-1 to the proposed names, as isunique is singular
only, and unique=False does not clearly show the intent to me. What
about ar1_unique=False, ar2_unique=False, to address each argument
specifically?

 5. module name
 ---
 
 rename arraysetops to something easier to read like setfun. I think it
 would only affect internal changes since all functions are exported to
 the main numpy name space
 +1e-4  (I got used to arrayse_tops)

+0 (internal change only). Other numpy/scipy submodules containing a
bunch of functions are called *pack (fftpack, arpack, lapack), *alg
(linalg), *utils. *fun is commonly used in the matlab world.

 6. keep docs in sync with correct usage
 -
 
 obvious

+1

thanks,
r.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-08 Thread Robert Cimrman
Robert Cimrman wrote:
 Hi Josef,
 
 thanks for the summary! I am responding below, later I will make an 
 enhancement ticket.

Done, see http://projects.scipy.org/numpy/ticket/1133
r.


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-06 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Anne Archibald wrote:

  1. add a keyword argument to intersect1d assume_unique; if it is not
  present, check for uniqueness and emit a warning if not unique
  2. change the warning to an exception
  Optionally:
  3. change the meaning of the function to that of intersect1d_nu if the
  keyword argument is not present
  
 You mean something like:
 
 def intersect1d(ar1, ar2, assume_unique=False):
  if not assume_unique:
  return intersect1d_nu(ar1, ar2)
  else:
  ... # the current code
 
 intersect1d_nu could be still exported to numpy namespace, or not.
 

+1 - from the user's point of view there should just be intersect1d and
setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
can be used if speed is a problem.

I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from 
readability, unlike the extra a in arange.

Can we summarise the discussion in this thread and write up a short proposal
about what we'd like to change in arraysetops, and how to make the changes? 
Then it's easy for other people to give their opinion on any changes. I can do
this if no one else has time.


Neil




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-06 Thread josef . pktd
On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton neilcrigh...@gmail.com wrote:
 Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Anne Archibald wrote:

  1. add a keyword argument to intersect1d assume_unique; if it is not
  present, check for uniqueness and emit a warning if not unique
  2. change the warning to an exception
  Optionally:
  3. change the meaning of the function to that of intersect1d_nu if the
  keyword argument is not present
 

1. merge _nu version into one function
---

 You mean something like:

 def intersect1d(ar1, ar2, assume_unique=False):
      if not assume_unique:
          return intersect1d_nu(ar1, ar2)
      else:
          ... # the current code

 intersect1d_nu could be still exported to numpy namespace, or not.


 +1 - from the user's point of view there should just be intersect1d and
 setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
 can be used if speed is a problem.

+ 1 on rolling the _nu versions this way into the plain version, this
would avoid a lot of the confusion.
It would not be a code breaking API change for existing correct usage
(but some speed regression without adding keyword)
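The merge described above can be sketched as follows. This is only an illustration under stated assumptions: the name intersect1d_merged is hypothetical, and the fast path assumes the sort-and-compare-neighbours technique, which is not quoted anywhere in this thread.

```python
import numpy as np

def intersect1d_merged(ar1, ar2, assume_unique=False):
    """Hypothetical merged intersect1d; not NumPy's actual source."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    if not assume_unique:
        # The branch formerly provided by intersect1d_nu: deduplicate first.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    # Fast path: both inputs are now sets, so a value that appears twice in
    # the sorted concatenation must occur in both arrays.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4, 4, 4])
result_default = intersect1d_merged(a, b)
result_fast = intersect1d_merged(np.unique(a), np.unique(b), assume_unique=True)
```

Existing correct callers (those already passing unique arrays) get the same answer either way; they only pay for the extra np.unique calls until they add assume_unique=True.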

deprecate intersect1d_nu
^^
 intersect1d_nu could be still exported to numpy namespace, or not.
I would say not, if they are the default branch of the non _nu version

+1 on deprecation


2. alias as in
-

 I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
 another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
 readability, unlike the extra a in arange.
I don't like the extra a's either, once namespaces are commonly used

alias setmember1d_nu as `in1d` or `isin1d`, because the function is an
`in` test and not a set operation
+1
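To make the "in, not a set operation" point concrete, here is a minimal sketch of an element-wise membership test for 1-d arrays. The name in1d_sketch and the binary-search approach are assumptions made for illustration; the thread does not show the proposed setmember1d_nu source.

```python
import numpy as np

def in1d_sketch(ar1, ar2):
    """Hypothetical element-wise `in` for 1-d arrays (illustration only)."""
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.unique(ar2)              # sorted, duplicates removed
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    # Binary-search each element of ar1 in the sorted ar2.
    idx = np.searchsorted(ar2, ar1)
    idx[idx == ar2.size] = 0          # clamp positions past the end
    return ar2[idx] == ar1

mask = in1d_sketch([1, 1, 2, 4, 2], [3, 2, 4])
# mask is [False, False, True, True, True]
```

The result has one boolean per element of the first argument, regardless of repeats in either input, which is why it reads more like `in` than like a set function.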


 Can we summarise the discussion in this thread and write up a short proposal
 about what we'd like to change in arraysetops, and how to make the changes?
 Then it's easy for other people to give their opinion on any changes. I can do
 this if no one else has time.


 other points

3. behavior of other set functions
---

guarantee that setdiff1d works for non-unique arrays (even when
implementation changes), and change documentation
+1

need to check other functions
^^
union1d:  works for non-unique arrays, obvious from source

setxor1d: requires unique arrays
>>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
array([2, 4, 5, 6])
>>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
array([0, 3, 4, 5, 6])

setxor: add keyword option and call unique by default
+1 for symmetry
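For comparison, this is how the proposed keyword looks in use. The sketch assumes a NumPy release in which setxor1d already accepts assume_unique and deduplicates by default; the version discussed in this thread did not.

```python
import numpy as np

a = [1, 2, 3, 3, 4, 5]
b = [0, 0, 1, 2, 2, 6]

# Deduplicate explicitly, then tell setxor1d the inputs are already sets.
explicit = np.setxor1d(np.unique(a), np.unique(b), assume_unique=True)

# With the default assume_unique=False the deduplication happens internally.
default = np.setxor1d(a, b)
```

Both calls give array([0, 3, 4, 5, 6]), the symmetric difference of {1, 2, 3, 4, 5} and {0, 1, 2, 6}.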

ediff1d and unique1d are defined for non-unique arrays


4. name of keyword


intersect1d(ar1, ar2, assume_unique=False)

alternative isunique=False  or just unique=False
+1 less to write


5. module name
---

rename arraysetops to something easier to read like setfun. I think it
would only affect internal changes since all functions are exported to
the main numpy name space
+1e-4  (I got used to arrayse_tops)


6. keep docs in sync with correct usage
-

obvious


That's my summary and opinions

Josef


 Neil




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-05 Thread Robert Cimrman
josef.p...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman cimrm...@ntc.zcu.cz wrote:
 josef.p...@gmail.com wrote:
 On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
 in(b) or in_iterable(b) method, such that you could do a.in(b)
 which would return a boolean array of the same shape as a with
 elements true if the equivalent a members were members in the iterable
 b.
 That would really be what I would be looking for.

 Just using `in` might promise more than it delivers, e.g. it works only for
 one-dimensional arrays; maybe in1d. With `in`, I would expect a
 generic function, as in python, that works with many array types and
 dimensions. (But I haven't checked whether it would work with a 1d
 structured array or object array.)

 I found arraysetops because of unique1d, but I didn't figure out what
 the subpackage really does, because I was reading arrayse-tops
 instead of array-set-ops
 I am bad at choosing names, but note that numpy sub-modules usually do
 not use underscores, so array_set_ops would not fit well.
 
 I would have chosen something like setfun. Since this is in numpy,
 it should be implied that sets refers to arrays.

Yes, good idea. I am not sure how to proceed if people agree (the name
contest is open!). What about making an alias named setfun and deprecating
the name arraysetops?

 BTW, for the docs, I haven't found a counter example where
 np.setdiff1d gives the wrong answer for non-unique arrays.
 In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
 Out[4]: array([ True, False,  True,  True,  True], dtype=bool)
 
 setdiff1d: diff, not member
 Looking at the source, I think setdiff always works, even for
 non-unique arrays.

Whoops, sorry. setdiff1d seems really to work for non-unique arrays - it 
relies on the behaviour above though :) - there is always one correct 
False even for repeated entries in the first array.
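A quick check of the same inputs with setdiff1d. This sketch assumes a current NumPy release, where np.isin is available; it only confirms the result and does not reproduce the setmember1d-based implementation discussed above.

```python
import numpy as np

a = np.array([1, 1, 2, 4, 2])
b = np.array([3, 2, 4])

# Values of a that do not occur in b, as a sorted set.
diff = np.setdiff1d(a, b)

# The same answer built from an element-wise membership mask.
manual = np.unique(a[~np.isin(a, b)])
```

Both give array([1]): the repeated 1 in the first array collapses to a single entry, and the repeated 2 is dropped because it occurs in b.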

r.




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-05 Thread David Warde-Farley
On 4-Jun-09, at 4:38 PM, Anne Archibald wrote:

 It seems to me that this is the basic source of the problem. Perhaps
 this can be addressed? I realize maintaining compatibility with the
 current behaviour is necessary, so how about a multistage deprecation:

 1. add a keyword argument to intersect1d assume_unique; if it is not
 present, check for uniqueness and emit a warning if not unique
 2. change the warning to an exception
 Optionally:
 3. change the meaning of the function to that of intersect1d_nu if the
 keyword argument is not present

 One could do something similar with setmember1d.

+1 on this idea. I've been bitten by the non-unique stuff in the past,  
especially with setmember1d, not realizing that both need to be unique.
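Stage 1 of the proposed deprecation might look roughly like this. The function name, the None default, and the warning text are all hypothetical; the intersection itself assumes the sort-and-compare-neighbours approach and deliberately keeps the old result so that only the warning is new.

```python
import warnings
import numpy as np

def intersect1d_stage1(ar1, ar2, assume_unique=None):
    """Hypothetical stage 1 of the deprecation; illustration only."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    if assume_unique is None:
        # Keyword not given: check the inputs and warn about duplicates.
        for name, arr in (("ar1", ar1), ("ar2", ar2)):
            if np.unique(arr).size != arr.size:
                warnings.warn(name + " has repeated elements; the result "
                              "may not be a set intersection",
                              UserWarning, stacklevel=2)
    # Unchanged behaviour: values equal to their right-hand neighbour.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]
```

Callers who pass assume_unique=True skip the check and its cost; stage 2 would swap warnings.warn for an exception.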

David


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

hth,
Alan Isaac



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
 a[(a==b[:,None]).sum(axis=0,dtype=bool)]

This is my preferred way when b is small and has unique elements.
If the elements in b are not unique, then b can be replaced by np.unique(b).
If b is large, this creates a huge intermediate array.

The advantage of the new setmember1d_nu is that it handles large b
very efficiently. My try on it was more than 10 times slower than the
proposed solution for larger arrays.
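The size of that intermediate array is easy to see in a small sketch. It uses .any(axis=0) in place of the sum(axis=0, dtype=bool) from the one-liner, and np.isin from current NumPy as a stand-in for the proposed setmember1d_nu; both substitutions are assumptions made for illustration.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4, 4])

# Broadcasting compares every element of b with every element of a,
# so the temporary has shape (len(b), len(a)).
table = (a == b[:, None])
mask_broadcast = table.any(axis=0)

# Membership test that avoids the two-dimensional temporary.
mask_isin = np.isin(a, b)

selected = a[mask_broadcast]
```

Here table has shape (3, 6); for arrays of a million elements each, the same temporary would need 10**12 booleans.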

Josef

 hth,
 Alan Isaac


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
 On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
 a[(a==b[:,None]).sum(axis=0,dtype=bool)]


On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
 If b is large this creates a huge intermediate array


True enough, but one could then use fromiter:
setb = set(b)
itr = (ai for ai in a if ai in setb)
out = np.fromiter(itr, dtype=a.dtype)

I suspect (?) that b would have to be pretty
big relative to a for the repeated testing
to be more costly than sorting a.
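A self-contained version of the fromiter idea, with sample arrays chosen here for illustration. Converting to plain Python values with tolist() before hashing is an added precaution, not part of the original snippet.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4, 4, 4])

setb = set(b.tolist())                       # hash set of plain Python ints
itr = (ai for ai in a.tolist() if ai in setb)
out = np.fromiter(itr, dtype=a.dtype)
```

The result keeps both the order and the multiplicity of a (here [1, 1, 4]), at the price of a Python-level loop over a.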

Or if a stable order is not important (I don't
recall if the OP specified), one could just
np.intersect1d(a, np.unique(b))

On a different note, I think a name change
is needed for your function. (Compare
intersect1d_nu to see the potential
confusion. And btw, what is the use case
for intersect1d, which gives neither a
set intersection nor a multiset intersection?)

Cheers,
Alan Isaac



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
 On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac ais...@american.edu wrote:
 a[(a==b[:,None]).sum(axis=0,dtype=bool)]


 On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
 If b is large this creates a huge intermediate array


 True enough, but one could then use fromiter:
 setb = set(b)
 itr = (ai for ai in a if ai in setb)
 out = np.fromiter(itr, dtype=a.dtype)

 I suspect (?) that b would have to be pretty
 big relative to a for the repeated testing
 to be more costly than sorting a.

I didn't look at this case very closely for speed. setmember1d and
setmember1d_nu return a boolean array that can be used for indexing,
not the actual elements.

Your iterator is in python and could be pretty slow, but I only ran
the performance script attached to the ticket, and the speed
differences between the different ways of doing it were pretty big for
large arrays.


 Or if a stable order is not important (I don't
 recall if the OP specified), one could just
 np.intersect1d(a, np.unique(b))

This also requires that `a` has only unique elements.
intersect1d_nu doesn't require unique elements.


 On a different note, I think a name change
 is needed for your function. (Compare
 intersect1d_nu to see the potential
 confusion. And btw, what is the use case
 for intersect1d, which gives neither a
 set intersection nor a multiset intersection?)

intersect1d gives set intersection if both arrays have only unique
elements (i.e. are sets).
I thought the naming is pretty clear:

intersect1d(a,b)      set intersection if a and b have unique elements
intersect1d_nu(a,b)   set intersection if a and b have non-unique elements
setmember1d(a,b)      boolean index array for a of set intersection if
                      a and b have unique elements
setmember1d_nu(a,b)   boolean index array for a of set intersection if
                      a and b have non-unique elements

The new docs http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
are a bit clearer.

However, I haven't used either of these functions much, and none of
them are *my* functions.
Of the arraysetops functions, I use unique1d most (because of the
return index).
I just keep track of these functions because of the use for
categorical and dummy variables.

Josef


 Cheers,
 Alan Isaac



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
 On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
 Or if a stable order is not important (I don't
 recall if the OP specified), one could just
 np.intersect1d(a, np.unique(b))

On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
 This requires that also `a` has only unique elements.
 intersect1d_nu doesn't require unique elements.


>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4])
>>> np.intersect1d(a, np.unique(b))
array([1, 1, 3, 4])

(And thus my question about intersect1d...)
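One way to see where an output like [1, 1, 3, 4] can come from: assume the function sorts the concatenation of both arrays and keeps every value that equals its right-hand neighbour. That is an assumption (the source is not quoted in this thread), but a sketch built on it reproduces the result above. The helper name adjacent_duplicates is hypothetical.

```python
import numpy as np

def adjacent_duplicates(ar1, ar2):
    """Sketch of a sort-and-compare intersection; only valid for unique inputs."""
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    # Keep each value that is followed by an equal value.
    return aux[:-1][aux[1:] == aux[:-1]]

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])
naive = adjacent_duplicates(a, b)      # [1, 1, 3, 4]
current = np.intersect1d(a, b)         # deduplicates by default in current NumPy
```

With repeats in a, a duplicate inside a single array is indistinguishable from a value shared by both, so 3 is reported even though it is not in b, and 1 appears twice.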

Cheers,
Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 11:12 AM, Alan G Isaac ais...@american.edu wrote:
 On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac ais...@american.edu wrote:
 Or if a stable order is not important (I don't
 recall if the OP specified), one could just
 np.intersect1d(a, np.unique(b))

 On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
 This requires that also `a` has only unique elements.
 intersect1d_nu doesn't require unique elements.


 a
 array([1, 1, 2, 3, 3, 4])
 b
 array([1, 4])
 np.intersect1d(a, np.unique(b))
 array([1, 1, 3, 4])

 (And thus my question about intersect1d...)

Yes, I know, and in my current numpy help file this is the only
example there is, which is very misleading for its intended use.

>>> a = np.array([1, 1, 2, 3, 3, 4])
>>> b = np.array([1, 4, 5])
>>> np.intersect1d(np.unique(a), np.unique(b))
array([1, 4])

>>> np.intersect1d_nu(a,b)
array([1, 4])

Josef


 Cheers,
 Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
 intersect1d gives set intersection if both arrays have 
 only unique elements (i.e. are sets).  I thought the 
 naming is pretty clear:

 intersect1d(a,b)   set intersection if a and b with unique elements 
 intersect1d_nu(a,b)   set intersection if a and b with non-unique elements 
 setmember1d(a,b)  boolean index array for a of set intersection if a 
 and b with unique elements 
 setmember1d_nu(a,b)  boolean index array for a of set intersection if 
 a and b with non-unique elements 


>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4, 4, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

That is, intersect1d_nu is the actual set intersection
function.  (I.e., intersect1d and intersect1d_nu would most
naturally have swapped names.)  That is why the appended _nu
will not communicate what was intended.  (I.e.,
setmember1d_nu will not be a match for intersect1d_nu.)

Cheers,
Alan Isaac




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Alan G Isaac wrote:
 On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
 intersect1d gives set intersection if both arrays have 
 only unique elements (i.e. are sets).  I thought the 
 naming is pretty clear:
 
 intersect1d(a,b)   set intersection if a and b with unique elements 
 intersect1d_nu(a,b)   set intersection if a and b with non-unique elements 
 setmember1d(a,b)  boolean index array for a of set intersection if a 
 and b with unique elements 
 setmember1d_nu(a,b)  boolean index array for a of set intersection if 
 a and b with non-unique elements 
 
 
 a
 array([1, 1, 2, 3, 3, 4])
 b
 array([1, 4, 4, 4])
 np.intersect1d_nu(a,b)
 array([1, 4])
 
 That is, intersect1d_nu is the actual set intersection
 function.  (I.e., intersect1d and intersect1d_nu would most
 naturally have swapped names.)  That is why the appended _nu
 will not communicate what was intended.  (I.e.,
 setmember1d_nu will not be a match for intersect1d_nu.)

The naming should express this: intersect1d expects its arguments to be
sets, intersect1d_nu does not. A set has unique elements by definition.

cheers,
r.


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 11:19 AM, Alan G Isaac ais...@american.edu wrote:
 On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
 intersect1d gives set intersection if both arrays have
 only unique elements (i.e. are sets).  I thought the
 naming is pretty clear:

 intersect1d(a,b)   set intersection if a and b with unique elements
 intersect1d_nu(a,b)   set intersection if a and b with non-unique elements
 setmember1d(a,b)  boolean index array for a of set intersection if a
 and b with unique elements
 setmember1d_nu(a,b)  boolean index array for a of set intersection if
 a and b with non-unique elements


 a
 array([1, 1, 2, 3, 3, 4])
 b
 array([1, 4, 4, 4])
 np.intersect1d_nu(a,b)
 array([1, 4])

 That is, intersect1d_nu is the actual set intersection
 function.  (I.e., intersect1d and intersect1d_nu would most
 naturally have swapped names.)  That is why the appended _nu
 will not communicate what was intended.  (I.e.,
 setmember1d_nu will not be a match for intersect1d_nu.)

intersect1d is the intersection between sets (which are stored as
arrays); just as in the mathematical definition, the two sets have only
unique elements.

intersect1d_nu is the intersection between two arrays which can have
repeated elements. The result is a set, i.e. unique elements, stored
as an array.

The same holds for setmember1d and setmember1d_nu.

So the postfix `_nu` only means that this function also works if the two
arrays are not really sets, i.e. are not required to have unique
elements to make sense.


intersect1d should throw a domain error if you give it arrays with
non-unique elements, which is not done for speed reasons
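A checked variant would look something like the sketch below. The wrapper name intersect1d_strict is hypothetical, ValueError stands in for the "domain error", and the assume_unique keyword on np.intersect1d assumes a current NumPy release.

```python
import numpy as np

def intersect1d_strict(ar1, ar2):
    """Hypothetical checked wrapper; raises instead of returning nonsense."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    for name, arr in (("ar1", ar1), ("ar2", ar2)):
        # np.unique sorts its input, so each check costs about as much as
        # the intersection itself; that is the speed concern.
        if np.unique(arr).size != arr.size:
            raise ValueError(name + " contains repeated elements")
    return np.intersect1d(ar1, ar2, assume_unique=True)

ok = intersect1d_strict([1, 2, 3, 4], [1, 4, 5])
```

Calling intersect1d_strict([1, 1, 2], [1]) raises ValueError rather than silently returning a wrong answer.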


 Cheers,
 Alan Isaac


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
 intersect1d  is the intersection between sets (which are stored as 
 arrays), just like in the mathematical definition the two sets only 
 have unique elements 

Hmmm. OK, I see you and Robert believe this.
But it does not match the documentation.
But indeed, I see that the documentation is incorrect.
E.g.,

>>> np.intersect1d([1,1,2,3,3,4],[1,4])
array([1, 1, 3, 4])

Is this a bug or a documentation bug?



 intersect1d_nu is the intersection between two arrays which can have 
 repeated elements. The result is a set, i.e. unique elements, stored 
 as an array 

 same for setmember1d, setmember1d_nu 

I cannot understand this.
Following your proposed reasoning,
I expect a[setmember1d_nu(a,b)]
to return the same as
intersect1d_nu(a, b).
It does not.



 so  postfix `_nu` only means that this function also works 
 if the two arrays are not really sets

But that just begs the question: what does 'works' mean?
See my previous comment (above).



 intersect1d should throw a domain error if you give it arrays with 
 non-unique elements, which is not done for speed reasons 

*If* intersect1d behaved *exactly* as documented,
the example
intersect1d(a, np.unique(b))
shows that the documented behavior can be useful.
And indeed, this would be the match to
a[setmember1d_nu(a,b)]

Cheers,
Alan Isaac




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 12:32 PM, Alan G Isaac ais...@american.edu wrote:
 On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
 intersect1d  is the intersection between sets (which are stored as
 arrays), just like in the mathematical definition the two sets only
 have unique elements

 Hmmm. OK, I see you and Robert believe this.
 But it does not match the documentation.
 But indeed, I see that the documentation is incorrect.
 E.g.,

 np.intersect1d([1,1,2,3,3,4],[1,4])
 array([1, 1, 3, 4])

 Is this a bug or a documentation bug?



 intersect1d_nu is the intersection between two arrays which can have
 repeated elements. The result is a set, i.e. unique elements, stored
 as an array

 same for setmember1d, setmember1d_nu

 I cannot understand this.
 Following your proposed reasoning,
 I expect a[setmember1d_nu(a,b)]
 to return the same as
 intersect1d_nu(a, b).
 It does not.

I don't have setmember1d_nu available right now, but from my reading
we should have

 intersect1d_nu(a, b) == np.unique(a[setmember1d_nu(a,b)])
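This identity can be checked with the functions available in current NumPy, using np.intersect1d (which deduplicates by default) in place of intersect1d_nu and np.isin in place of setmember1d_nu. Those substitutions are assumptions; neither _nu function is called here.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4, 4, 4])

lhs = np.intersect1d(a, b)              # set intersection of the values
rhs = np.unique(a[np.isin(a, b)])       # mask a, then deduplicate
same = np.array_equal(lhs, rhs)
```

Without the np.unique call the masked array is [1, 1, 4], which is why a[setmember1d_nu(a,b)] alone does not equal the set intersection.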





 so  postfix `_nu` only means that this function also works
 if the two arrays are not really sets

 But that just begs the question: what does 'works' mean?
 See my previous comment (above).



 intersect1d should throw a domain error if you give it arrays with
 non-unique elements, which is not done for speed reasons

 *If* intersect1d behaved *exactly* as documented,
 the example
 intersect1d(a, np.unique(b))
 shows that the documented behavior can be useful.
 And indeed, this would be the match to
 a[setmember1d_nu(a,b)]

I don't know if anyone has looked at the behavior for unintended usage.

intersect1d rearranges and sorts:
>>> np.intersect1d([4,1,3,3],[3,4])
array([3, 3, 4])

but it gives you the correct multiplicity:
>>> np.intersect1d([4,4,4,1,3,3],np.unique([3,4,3,0]))
array([3, 3, 4, 4, 4])

so I guess, we have
np.intersect1d([4,4,4,1,3,3], np.unique([3,4,3,0])) ==
np.sort(a[setmember1d_nu(a,b)])
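The right-hand side of that relation can be reproduced with current NumPy, again using np.isin as an assumed stand-in for setmember1d_nu.

```python
import numpy as np

a = np.array([4, 4, 4, 1, 3, 3])
b = np.array([3, 4, 3, 0])

# Keep every element of a that occurs in b, then sort.
kept = np.sort(a[np.isin(a, b)])
```

kept is [3, 3, 4, 4, 4]: each element of a that occurs in b survives with its original multiplicity, and repeats in b make no difference.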

For the example from the help file I can't find any meaningful interpretation:
>>> np.intersect1d([1,3,3],[3,1,1])
array([1, 1, 3, 3])


wrong answer:
>>> np.setmember1d([4,1,1,3,3],[3,4])
array([ True,  True, False,  True,  True], dtype=bool)

Note: there are two versions of the docs for np.intersect1d: the
currently published docs, which describe the actual behavior (for the
non-unique case), and the new docs in the doc editor
http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
which describe the intended usage of the functions and also
correspond more closely to the original source docstring
(http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227).
That's my interpretation.

If you think that functions make sense also for the unintended
usage, then you could add an example to the new docs.

Josef


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
 Note: there are two versions of the docs for np.intersect1d, the
 currently published docs which describe the actual behavior (for the
 non-unique case), and the new docs on the doc editor
 http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
 that describe the intended usage of the functions, which also
 corresponds closer to the original source docstring
 (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
 ). that's my interpretation


Again, the distributed docs do *not* describe the actual
behavior for the non-unique case.  E.g.,

>>> np.intersect1d([1,1,2,3,3,4], [1,4])
array([1, 1, 3, 4])

Might this be a better example of
failure than the one in the doc editor?

However the doc editor version states that the function
fails for the non-unique case, so it seems there was a
documentation bug that is in the process of being fixed.

Thanks,
Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac ais...@american.edu wrote:
 On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
 Note: there are two versions of the docs for np.intersect1d, the
 currently published docs which describe the actual behavior (for the
 non-unique case), and the new docs on the doc editor
 http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
 that describe the intended usage of the functions, which also
 corresponds closer to the original source docstring
 (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
 ). that's my interpretation


 Again, the distributed docs do *not* describe the actual
 behavior for the non-unique case.  E.g.,

 np.intersect1d([1,1,2,3,3,4], [1,4])
 array([1, 1, 3, 4])

 Might this be a better example of
 failure than the one in the doc editor?

Thanks, that's a very clear example of a wrong answer,
and it removes the question whether the function makes any sense for
the non-unique case.
I changed the example in the doc editor to this one.

It will hopefully be merged with the source at the next update.

Josef



 However the doc editor version states that the function
 fails for the non-unique case, so it seems there was a
 documentation bug that is in the process of being fixed.

Yes


 Thanks,
 Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Kim Hansen
Concerning the name setmember1d_nu, I personally find it quite verbose
and not the name I would expect as a non-insider coming to numpy and
not knowing all the names of the more special hidden-away functions
and not being a python-wiz either.

I think ain(a,b) would be the name I had expected as an array
equivalent of `a in b` (just as arange is the array version of range),
or I would have anticipated that an ndarray object would have an
in(b) or in_iterable(b) method, such that you could do a.in(b),
which would return a boolean array of the same shape as a with
elements true where the corresponding members of a are members of the
iterable b.

When I had a problem where I needed this function, I could not find
anything near that, and after looking around and also asking here I
got some hints to use the 1d functions, which gave me the idea to
implement the few-line, very simple proposal for "a in b", which is
now under review as the new function setmember1d_nu(a,b).
While I see that this function name is in line with the existing
functions, I really think the names are non-intuitive. I would
therefore propose that it also be aliased to a more intuitive name
such as ain(a,b) or, perhaps better, a.in(b).
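
A minimal sketch of the element-wise membership test described above. The name ain comes from the proposal; the implementation via np.unique and np.searchsorted is an assumption made for illustration and is not the code attached to the ticket.

```python
import numpy as np

def ain(a, b):
    """Boolean array shaped like `a`: True where the element occurs in `b`."""
    a = np.asarray(a)
    b = np.unique(b)                       # sorted, duplicates removed
    if b.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # Position at which each element of a would be inserted into sorted b.
    idx = np.searchsorted(b, a)
    # Elements larger than everything in b get index b.size; clip them to
    # the last slot, where the comparison below is False anyway.
    idx = np.minimum(idx, b.size - 1)
    return b[idx] == a

a = np.array([1, 1, 2, 3, 3, 4])
mask = ain(a, [1, 4])
print(mask)
print(a[mask])
```

Because np.searchsorted accepts inputs of any shape, the mask keeps the shape of a, which is the behaviour asked for here.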

Again, I am probably missing some important points here as an
inexperienced Python programmer and numpy user; I am just trying to
give some input from a beginner's point of view, if that can be of
any help.

Thank you,

Kim


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Gael Varoquaux
On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
 in(b) or in_iterable(b) method, such that you could do a.in(b)
 which would return a boolean array of the same shape as a with
 elements true if the equivalent a members were members in the iterable
 b.

That would really be what I would be looking for.

Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Anne Archibald
2009/6/4  josef.p...@gmail.com:

 intersect1d should throw a domain error if you give it arrays with
 non-unique elements, which is not done for speed reasons

It seems to me that this is the basic source of the problem. Perhaps
this can be addressed? I realize maintaining compatibility with the
current behaviour is necessary, so how about a multistage deprecation:

1. add a keyword argument assume_unique to intersect1d; if it is not
present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the
keyword argument is not present

One could do something similar with setmember1d.

This would remove the pitfall of the 1d assumption and the wart of the
_nu names without hampering performance for people who know they have
unique arrays and are in a hurry.
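
A minimal sketch of what stage 1 could look like. The function name intersect1d_stage1 and the None default are hypothetical, and the sort-based body is an assumption: it is simply an intersection that is correct only for unique inputs and that reproduces the array([1, 1, 3, 4]) result reported in this thread.

```python
import warnings
import numpy as np

def intersect1d_stage1(ar1, ar2, assume_unique=None):
    """Hypothetical stage 1: warn if the keyword is absent and inputs repeat."""
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.asarray(ar2).ravel()
    if assume_unique is None:
        has_repeats = (np.unique(ar1).size != ar1.size
                       or np.unique(ar2).size != ar2.size)
        if has_repeats:
            warnings.warn("intersect1d received non-unique input; "
                          "the result may be wrong", UserWarning, stacklevel=2)
    # Sort-based intersection, valid only when each input is unique:
    # a value present in both arrays appears as an adjacent equal pair.
    aux = np.sort(np.concatenate((ar1, ar2)))
    return aux[:-1][aux[1:] == aux[:-1]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = intersect1d_stage1([1, 1, 2, 3, 3, 4], [1, 4])
print(result, len(caught))
```

Stage 2 would replace the warnings.warn call with a raised exception; callers who pass assume_unique=True skip the check entirely and pay no extra cost.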

Anne


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
 in(b) or in_iterable(b) method, such that you could do a.in(b)
 which would return a boolean array of the same shape as a with
 elements true if the equivalent a members were members in the iterable
 b.

 That would really be what I would be looking for.


Just using `in` might promise more than it delivers, e.g. it works only
for one-dimensional arrays, so maybe in1d. With `in`, I would expect a
generic function, as in Python, that works with many array types and
dimensions. (But I haven't checked whether it would work with a 1d
structured array or object array.)

I found arraysetops because of unique1d, but I didn't figure out what
the subpackage really does, because I was reading arrayse-tops
instead of array-set-ops

BTW, for the docs, I haven't found a counter example where
np.setdiff1d gives the wrong answer for non-unique arrays.

Josef


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Gael Varoquaux
On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote:
 Just using in might promise more than it does, eg. it works only for
 one dimensional arrays, maybe in1d. With in, 

Then 'in_1d'

 I found arraysetops because of unique1d, but I didn't figure out what
 the subpackage really does, because I was reading arrayse-tops
 instead of array-set-ops

That's why I push people to use more underscores. IMHO PEP8 lacks a push
for underscores.

Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 4:52 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote:
 Just using in might promise more than it does, eg. it works only for
 one dimensional arrays, maybe in1d. With in,

 Then 'in_1d'

No, if the breaks in a name are obvious, I still prefer names without
underscores. I don't think `1d` or `2d` needs to be separated from the
word: in1d.
I always remember how to spell unique1d, but I usually have to check
how to spell at_least_2d, or maybe atleast_2d or even atleast2d.

how about

def setmember1d_nu(a, b):
    ...

# aliases
set_member_1d_but_it_does_not_really_have_to_be_a_set = setmember1d_nu
in1d = setmember1d_nu

Josef

>>> [f for f in dir(np) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'ediff1d', 'histogram2d', 'intersect1d',
'poly1d', 'setdiff1d', 'setmember1d', 'setxor1d', 'union1d',
'unique1d']

>>> [f for f in dir(scipy.signal) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'convolve2d', 'correlate2d', 'cspline1d',
'cspline2d', 'medfilt2d', 'qspline1d', 'qspline2d', 'sepfir2d']

>>> [f for f in dir(scipy.stats) if f[-2:]=='1d' or f[-2:]=='2d']
[]

>>> [f for f in dir(scipy.ndimage) if f[-2:]=='1d' or f[-2:]=='2d']
['convolve1d', 'correlate1d', 'gaussian_filter1d', 'generic_filter1d',
'maximum_filter1d', 'minimum_filter1d', 'spline_filter1d',
'uniform_filter1d']



 I found arraysetops because of unique1d, but I didn't figure out what
 the subpackage really does, because I was reading arrayse-tops
 instead of array-set-ops

 That's why I push people to use more underscores. IMHO PEP8 lacks a push
 for underscores.

 Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
josef.p...@gmail.com wrote:
 On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac ais...@american.edu wrote:
 On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
 Note: there are two versions of the docs for np.intersect1d, the
 currently published docs which describe the actual behavior (for the
 non-unique case), and the new docs on the doc editor
 http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
 that describe the intended usage of the functions, which also
 corresponds closer to the original source docstring
 (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
 ). that's my interpretation

 Again, the distributed docs do *not* describe the actual
 behavior for the non-unique case.  E.g.,

 np.intersect1d([1,1,2,3,3,4], [1,4])
 array([1, 1, 3, 4])

 Might this be a better example of
 failure than the one in the doc editor?
 
 Thanks, that's a very clear example of a wrong answer,
 and it removes the question whether the function makes any sense for
 the non-unique case.
 I changed the example in the doc editor to this one.
 
 It will hopefully be merged with the source at the next update.

Thank you Josef!

r.


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Kim Hansen wrote:
 Concerning the name setmember1d_nu, I personally find it quite verbose
 and not the name I would expect as a non-insider coming to numpy and
 not knowing all the names of the more special hidden-away functions
 and not being a python-wiz either.

To explain the naming: the names follow those used in MATLAB for
functions with similar functionality. If better names are found, I am
not against changing them.

What I particularly do not like is the _nu suffix (yes, blame me).

r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Anne Archibald wrote:
 2009/6/4  josef.p...@gmail.com:
 
 intersect1d should throw a domain error if you give it arrays with
 non-unique elements, which is not done for speed reasons
 
 It seems to me that this is the basic source of the problem. Perhaps
 this can be addressed? I realize maintaining compatibility with the
 current behaviour is necessary, so how about a multistage deprecation:
 
 1. add a keyword argument to intersect1d assume_unique; if it is not
 present, check for uniqueness and emit a warning if not unique
 2. change the warning to an exception
 Optionally:
 3. change the meaning of the function to that of intersect1d_nu if the
 keyword argument is not present
 
 One could do something similar with setmember1d.
 
 This would remove the pitfall of the 1d assumption and the wart of the
 _nu names without hampering performance for people who know they have
 unique arrays and are in a hurry.

You mean something like:

def intersect1d(ar1, ar2, assume_unique=False):
    if not assume_unique:
        return intersect1d_nu(ar1, ar2)
    else:
        ... # the current code

intersect1d_nu could be still exported to numpy namespace, or not.

I like this. I do not understand, however, what you mean by "remove the
pitfall of the 1d assumption"?
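
For readers on a current NumPy: np.intersect1d accepts a keyword with this name in today's releases. The snippet below is a short usage sketch that assumes a reasonably recent NumPy install; it shows today's behaviour only and says nothing about the 2009 code discussed here.

```python
import numpy as np

# Default (assume_unique=False): inputs are deduplicated first, so
# repeated values no longer leak into the result.
safe = np.intersect1d([1, 1, 2, 3, 3, 4], [1, 4])
print(safe)

# Opt-in fast path: the caller promises that both inputs are unique.
fast = np.intersect1d([1, 2, 3], [2, 3, 4], assume_unique=True)
print(fast)
```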

cheers,
r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
josef.p...@gmail.com wrote:
 On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
 in(b) or in_iterable(b) method, such that you could do a.in(b)
 which would return a boolean array of the same shape as a with
 elements true if the equivalent a members were members in the iterable
 b.
 That would really be what I would be looking for.

 
 Just using in might promise more than it does, eg. it works only for
 one dimensional arrays, maybe in1d. With in, I would expect a
 generic function as in python that works with many array types and
 dimensions. (But I haven't checked whether it would work with a 1d
 structured array or object array.)
 
 I found arraysetops because of unique1d, but I didn't figure out what
 the subpackage really does, because I was reading arrayse-tops
 instead of array-set-ops

I am bad at choosing names, but note that numpy sub-modules usually do
not use underscores, so array_set_ops would not fit well.

 BTW, for the docs, I haven't found a counter example where
 np.setdiff1d gives the wrong answer for non-unique arrays.

In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
Out[4]: array([ True, False,  True,  True,  True], dtype=bool)
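
The output above flags the second 1 as absent but the first 1 as present, although 1 does not occur in the second array. A minimal sketch of one way to get a correct mask when values repeat: reduce both inputs to unique values, test membership there, and map the answer back. The name member_nonunique is hypothetical, and this is not claimed to be the code from the ticket.

```python
import numpy as np

def member_nonunique(ar1, ar2):
    """Membership mask for ar1 in ar2 that tolerates repeated values."""
    ar1 = np.asarray(ar1).ravel()
    # inverse maps every element of ar1 to its slot in uniq1.
    uniq1, inverse = np.unique(ar1, return_inverse=True)
    uniq2 = np.unique(ar2)
    # With unique inputs, shared values show up as adjacent equal pairs.
    both = np.sort(np.concatenate((uniq1, uniq2)))
    common = both[:-1][both[1:] == both[:-1]]
    mask_unique = np.zeros(uniq1.shape, dtype=bool)
    mask_unique[np.searchsorted(uniq1, common)] = True
    # Expand the per-unique-value answer back to the layout of ar1.
    return mask_unique[inverse]

print(member_nonunique([1, 1, 2, 4, 2], [3, 2, 4]))
```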

r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman cimrm...@ntc.zcu.cz wrote:
 josef.p...@gmail.com wrote:
 On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
 in(b) or in_iterable(b) method, such that you could do a.in(b)
 which would return a boolean array of the same shape as a with
 elements true if the equivalent a members were members in the iterable
 b.
 That would really be what I would be looking for.


 Just using in might promise more than it does, eg. it works only for
 one dimensional arrays, maybe in1d. With in, I would expect a
 generic function as in python that works with many array types and
 dimensions. (But I haven't checked whether it would work with a 1d
 structured array or object array.)

 I found arraysetops because of unique1d, but I didn't figure out what
 the subpackage really does, because I was reading arrayse-tops
 instead of array-set-ops

 I am bad at choosing names, but note that numpy sub-modules usually do
 not use underscores, so array_set_ops would not fit well.

I would have chosen something like setfun. Since this is in numpy,
it should be implied that the sets are arrays.


 BTW, for the docs, I haven't found a counter example where
 np.setdiff1d gives the wrong answer for non-unique arrays.

 In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
 Out[4]: array([ True, False,  True,  True,  True], dtype=bool)

setdiff1d: diff, not member.
Looking at the source, I think setdiff always works, even for
non-unique arrays.
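
One way to hunt for a counterexample is a small randomized comparison against Python's built-in set difference. This is a sketch under assumptions: it exercises whatever NumPy is installed today (np.random.default_rng needs a fairly recent release) and takes the sorted unique difference as the expected answer, so it says nothing definite about the 2009 source.

```python
import numpy as np

rng = np.random.default_rng(0)
mismatches = 0
for _ in range(1000):
    # Short arrays over a small value range, so repeats are common.
    a = rng.integers(0, 6, size=rng.integers(0, 8))
    b = rng.integers(0, 6, size=rng.integers(0, 8))
    expected = sorted(set(a.tolist()) - set(b.tolist()))
    if np.setdiff1d(a, b).tolist() != expected:
        mismatches += 1
print(mismatches)
```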

Josef


 r.



[Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-03 Thread Ning Sean
Hi, I want to extract elements of an array (say, a) that are contained in
another array (say, b). That is, if a=array([1,1,2,3,3,4]), b=array([1,4]),
then I want array([1,1,4]).

I did the following but the speed is very slow (maybe because a is very
long):

c=array([])
for x in b:
   c=append(c,a[a==x])

any way to speed it up?

Thanks!
-Ning


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-03 Thread josef . pktd
On Wed, Jun 3, 2009 at 8:29 PM, Ning Sean nings...@gmail.com wrote:
 Hi, I want to extract elements of an array (say, a) that are contained in
 another array (say, b). That is, if a=array([1,1,2,3,3,4]), b=array([1,4]),
 then I want array([1,1,4]).

 I did the following but the speed is very slow (maybe because a is very
 long):

 c=array([])
 for x in b:
    c=append(c,a[a==x])

 any way to speed it up?

 Thanks!
 -Ning



It's waiting in Trac for inclusion in numpy
http://projects.scipy.org/numpy/ticket/1036
The current version only handles arrays with unique elements.

You can copy the ticket attachment, the version there is very fast.
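
For readers on a current NumPy, the same selection can be written without the ticket attachment. A minimal sketch, assuming np.isin is available (it was added in a later NumPy release and did not exist at the time of this thread):

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

mask = np.isin(a, b)   # one boolean per element of a
c = a[mask]            # keeps the order and dtype of a
print(c)
```

Unlike the loop with append, which starts from a float array and groups the output by the elements of b, this keeps the elements in the order they appear in a.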

Josef


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-03 Thread Ning Sean
Thanks! Tried it and it is about twice as fast as my approach.

-Ning

On Wed, Jun 3, 2009 at 7:45 PM, josef.p...@gmail.com wrote:

 On Wed, Jun 3, 2009 at 8:29 PM, Ning Sean nings...@gmail.com wrote:
  Hi, I want to extract elements of an array (say, a) that are contained in
  another array (say, b). That is, if a=array([1,1,2,3,3,4]),
 b=array([1,4]),
  then I want array([1,1,4]).
 
  I did the following but the speed is very slow (maybe because a is very
  long):
 
  c=array([])
  for x in b:
     c=append(c,a[a==x])
 
  any way to speed it up?
 
  Thanks!
  -Ning
 


 It's waiting in Trac for inclusion in numpy
 http://projects.scipy.org/numpy/ticket/1036
 The current version only handles arrays with unique elements.

 You can copy the ticket attachment, the version there is very fast.

 Josef