[Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Nicolas P. Rougier

Hello,

I've found some strange behavior, or I'm missing something obvious (or np.unique
is not supposed to work with structured arrays).

I'm trying to extract unique values from a simple structured array, but it does
not seem to work as expected.
Here is a minimal script showing the problem:

import numpy as np

# One field 'v' holding three float32 values per record
V = np.zeros(4, dtype=[('v', np.float32, 3)])
V['v'] = [[0.5,  0.0,      1.0],
          [0.5, -1.e-16,   1.0],   # [0.5, +1.e-16,  1.0] works
          [0.5,  0.0,     -1.0],
          [0.5, -1.e-16,  -1.0]]   # [0.5, +1.e-16, -1.0] works
V_ = np.zeros_like(V)
V_['v'][:,0] = V['v'][:,0].round(decimals=3)
V_['v'][:,1] = V['v'][:,1].round(decimals=3)
V_['v'][:,2] = V['v'][:,2].round(decimals=3)

print(np.unique(V_))
[([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)]


While I would have expected:

[([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)]


Can anyone confirm?


Nicolas
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Jaime Fernández del Río
I can confirm; the issue seems to be in sorting:

>>> np.sort(V_)
array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],),
       ([0.5, -0.0, -1.0],)],
      dtype=[('v', 'f4', (3,))])

These I think are handled by the generic sort functions, and it looks like
the comparison function being used is the one for a VOID dtype with no
fields, so it is being done byte-wise, hence the problems with 0.0 and
-0.0. Not sure where exactly the bug is, though...
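
To make the 0.0 versus -0.0 point concrete, here is a minimal sketch using only the Python standard library. It assumes that the built-in round() is a fair stand-in for the ndarray.round() call in the original script, and float32_bytes is a helper name made up for this illustration. It shows that the two zeros compare equal as numbers while their float32 byte patterns differ in the sign bit.

```python
import struct

def float32_bytes(x):
    """Return the 4 raw bytes of x stored as a little-endian float32."""
    return struct.pack("<f", x)

pos_zero = 0.0
# Rounding a tiny negative number gives zero, but the sign is kept
neg_zero = round(-1e-16, 3)

print(neg_zero)                       # -0.0
print(pos_zero == neg_zero)           # True: equal as floating-point numbers
print(float32_bytes(pos_zero).hex())  # 00000000
print(float32_bytes(neg_zero).hex())  # 00000080 (sign bit set)
print(float32_bytes(pos_zero) == float32_bytes(neg_zero))  # False
```

Any comparison that looks only at the bytes will therefore treat the two records as different, even though an element-wise == would not.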

Jaime





Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Eelco Hoogendoorn
It does not sound like an issue with unique, but rather like a matter of
floating-point equality and representation. Do the 'identical' elements pass
an equality test?



Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Eelco Hoogendoorn
Oh yeah, this could be it. Floating-point equality and bitwise equality are not
the same thing.
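
As a rough illustration of that difference applied to deduplication, the sketch below uses plain Python tuples and the struct module rather than NumPy. The rows mirror the rounded values from the original script; as_bytes is a hypothetical helper, and the "+ 0.0" normalization is one possible workaround that was not proposed in this thread.

```python
import struct

# The rounded rows from the original script, as plain tuples
rows = [
    (0.5,  0.0,  1.0),
    (0.5, -0.0,  1.0),
    (0.5,  0.0, -1.0),
    (0.5, -0.0, -1.0),
]

def as_bytes(row):
    """Pack a row of three values as little-endian float32 bytes."""
    return struct.pack("<3f", *row)

# Deduplicate by value: 0.0 == -0.0 and both hash alike
unique_by_value = set(rows)
# Deduplicate by raw bytes: the sign bit makes the rows distinct
unique_by_bytes = {as_bytes(r) for r in rows}
print(len(unique_by_value))  # 2
print(len(unique_by_bytes))  # 4

# Adding 0.0 turns -0.0 into +0.0 and leaves other values unchanged
normalized = [tuple(x + 0.0 for x in row) for row in rows]
unique_normalized = {as_bytes(r) for r in normalized}
print(len(unique_normalized))  # 2
```

In NumPy terms, the same normalization would amount to adding 0.0 to the field before calling np.unique, although whether that is acceptable depends on whether the sign of zero matters for the data.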



Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Jaime Fernández del Río
Structured arrays are of VOID dtype, but with a non-None names attribute:

>>> V_.dtype.num
20
>>> V_.dtype.names
('v',)
>>> V_.view(np.void).dtype.num
20
>>> V_.view(np.void).dtype.names


The comparison function uses the STRING comparison function if names is
None, or a proper field by field comparison if not, see here:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arraytypes.c.src#L2675
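
The dispatch described above can be modeled in a few lines of pure Python. This is only a toy model under stated assumptions, not NumPy's C implementation: records are dicts mapping a field name to a tuple of floats, and FIELDS, raw_bytes, three_way and compare are all names invented for the illustration.

```python
import struct

FIELDS = ("v",)  # layout of the toy record: one field of three float32 values

def raw_bytes(rec):
    """Concatenate the float32 bytes of every field, in layout order."""
    return b"".join(
        struct.pack("<%df" % len(rec[name]), *rec[name]) for name in FIELDS
    )

def three_way(x, y):
    """Return -1, 0 or 1, like a C comparison callback."""
    return (x > y) - (x < y)

def compare(a, b, names):
    """Toy model: byte-wise compare if names is None, else field by field."""
    if names is None:
        return three_way(raw_bytes(a), raw_bytes(b))
    for name in names:
        result = three_way(a[name], b[name])
        if result != 0:
            return result
    return 0

a = {"v": (0.5, 0.0, 1.0)}
b = {"v": (0.5, -0.0, 1.0)}
print(compare(a, b, names=("v",)))  # 0: equal when compared by value
print(compare(a, b, names=None))    # -1: different when compared as bytes
```

If the byte-wise branch is taken for a record that contains -0.0, a sort built on this comparison will place numerically equal records apart, which matches the np.sort output shown earlier in the thread.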

With a quick look at the source, the only fishy thing I see is that the
original array has the sort axis moved to the end of the shape tuple, and
is then copied into a contiguous array here:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L1151

But that new array should preserve the dtype unchanged, and hence the right
compare function should be called. If no one with a better understanding of
the internals spots it, I will try to further debug it over the weekend.

Jaime

