[Numpy-discussion] np.unique with structured arrays
Hello,

I've found a strange behavior, or I'm missing something obvious (or np.unique is not supposed to work with structured arrays). I'm trying to extract unique values from a simple structured array, but it does not seem to work as expected. Here is a minimal script showing the problem:

    import numpy as np

    V = np.zeros(4, dtype=[('v', np.float32, 3)])
    V['v'] = [[0.5, 0.0, 1.0],
              [0.5, -1.e-16, 1.0],   # [0.5, +1.e-16, 1.0] works
              [0.5, 0.0, -1.0],
              [0.5, -1.e-16, -1.0]]  # [0.5, +1.e-16, -1.0] works

    # Round each column of the 'v' field to 3 decimals.
    V_ = np.zeros_like(V)
    V_['v'][:, 0] = V['v'][:, 0].round(decimals=3)
    V_['v'][:, 1] = V['v'][:, 1].round(decimals=3)
    V_['v'][:, 2] = V['v'][:, 2].round(decimals=3)

    print(np.unique(V_))

This prints:

    [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)]

while I would have expected:

    [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)]

Can anyone confirm?

Nicolas

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
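A quick way to see what the rounding step does to the -1.e-16 entries is to look at the sign bit of the result. This minimal sketch reuses only the second-column values from the script and relies on `ndarray.round` and `np.signbit`:

```python
import numpy as np

# Second column of the original V: exact zeros and tiny negative numbers.
col = np.array([0.0, -1.e-16, 0.0, -1.e-16], dtype=np.float32)
rounded = col.round(decimals=3)

print(rounded)               # the -1e-16 entries come out as -0.
print((rounded == 0).all())  # True: numerically, every entry is zero
print(np.signbit(rounded))   # but the sign bit is set on the former -1e-16 entries
```

So the rounded array contains both +0.0 and -0.0, which print differently and, as the replies below discuss, are stored differently.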
Re: [Numpy-discussion] np.unique with structured arrays
I can confirm; the issue seems to be in sorting:

    >>> np.sort(V_)
    array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],), ([0.5, -0.0, -1.0],)],
          dtype=[('v', '<f4', (3,))])

These, I think, are handled by the generic sort functions, and it looks like the comparison function being used is the one for a VOID dtype with no fields, so the comparison is done byte-wise, hence the problems with 0.0 and -0.0. I'm not sure exactly where the bug is, though...

Jaime

On Fri, Aug 22, 2014 at 6:20 AM, Nicolas P. Rougier nicolas.roug...@inria.fr wrote:

--
(\__/)
( O.o)
( )
This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
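The byte-wise explanation also accounts for the order in the `np.sort` output above. The sketch below emulates a byte-wise comparison in Python by sorting the records of the rounded array on their raw bytes; it is an emulation, not NumPy's C compare function. The +0.0 records come first because the sign bit makes one byte of -0.0 equal to 0x80 where +0.0 has 0x00:

```python
import numpy as np

# The rounded array from the original post (the -1e-16 entries became -0.0).
V_ = np.zeros(4, dtype=[('v', np.float32, 3)])
V_['v'] = [[0.5, 0.0, 1.0],
           [0.5, -0.0, 1.0],
           [0.5, 0.0, -1.0],
           [0.5, -0.0, -1.0]]

# Each record as its raw 12 bytes (3 x float32): what a byte-wise compare sees.
raw = [rec.tobytes() for rec in V_]

# Python compares bytes objects lexicographically, byte by byte, as unsigned values.
order = sorted(range(len(V_)), key=lambda i: raw[i])
print(order)                    # [0, 2, 1, 3]
print(V_['v'][order].tolist())  # +0.0 records first, then -0.0 records
```

Records 0, 2, 1, 3 are exactly ([0.5, 0.0, 1.0], [0.5, 0.0, -1.0], [0.5, -0.0, 1.0], [0.5, -0.0, -1.0]), the order `np.sort` returned.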
Re: [Numpy-discussion] np.unique with structured arrays
It does not sound like an issue with unique, but rather like a matter of floating-point equality and representation. Do the 'identical' elements pass an equality test?

-----Original Message-----
From: Nicolas P. Rougier nicolas.roug...@inria.fr
Sent: 22-8-2014 15:21
To: Discussion of Numerical Python numpy-discussion@scipy.org
Subject: [Numpy-discussion] np.unique with structured arrays
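The equality question can be answered with a pairwise test over the rows of the rounded field. This sketch rebuilds `V_` with the -0.0 entries written in directly and uses broadcasting to compare every row with every other row as floats:

```python
import numpy as np

V_ = np.zeros(4, dtype=[('v', np.float32, 3)])
V_['v'] = [[0.5, 0.0, 1.0],
           [0.5, -0.0, 1.0],
           [0.5, 0.0, -1.0],
           [0.5, -0.0, -1.0]]

rows = V_['v']  # the field as a plain (4, 3) float32 array
# eq[i, j] is True when rows i and j are equal element by element as floats.
eq = (rows[:, None, :] == rows[None, :, :]).all(axis=-1)
print(eq.astype(int))  # block diagonal: rows 0/1 and rows 2/3 compare equal
```

As floats, the "identical" rows do pass the equality test, so float comparison alone would yield two distinct rows.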
Re: [Numpy-discussion] np.unique with structured arrays
Oh yeah, this could be it. Floating-point equality and bitwise equality are not the same thing.

-----Original Message-----
From: Jaime Fernández del Río jaime.f...@gmail.com
Sent: 22-8-2014 16:22
To: Discussion of Numerical Python numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] np.unique with structured arrays
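The mismatch between floating-point and bitwise equality goes in both directions. Reinterpreting float32 values as `uint32` with `ndarray.view` exposes their bit patterns. The values below (signed zeros and NaN) are illustrative choices, not taken from the thread:

```python
import numpy as np

x = np.array([0.0, -0.0, np.nan, np.nan], dtype=np.float32)
bits = x.view(np.uint32)  # the same 4 bytes per element, read as unsigned integers

# Signed zero: equal as floats, different as bits.
print(x[0] == x[1], bits[0] == bits[1])  # True False
print(hex(int(bits[0] ^ bits[1])))       # 0x80000000: only the sign bit differs

# NaN: never equal as floats, yet these two copies have identical bits.
print(x[2] == x[3], bits[2] == bits[3])  # False True
```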
Re: [Numpy-discussion] np.unique with structured arrays
Structured arrays are of VOID dtype, but with a non-None names attribute:

    >>> V_.dtype.num
    20
    >>> V_.dtype.names
    ('v',)
    >>> V_.view(np.void).dtype.num
    20
    >>> V_.view(np.void).dtype.names

The comparison function uses the STRING comparison function if names is None, or a proper field-by-field comparison if not; see here:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arraytypes.c.src#L2675

From a quick look at the source, the only fishy thing I see is that the original array has the sort axis moved to the end of the shape tuple, and is then copied into a contiguous array here:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L1151

But that new array should preserve the dtype unchanged, and hence the right compare function should be called. If no one with a better understanding of the internals spots it, I will try to debug it further over the weekend.

Jaime

On Fri, Aug 22, 2014 at 7:54 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote:
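The two compare functions described above (STRING, i.e. byte-wise, when `names` is None; field by field otherwise) can be mimicked in pure Python to see what each does to a sort-then-compare-neighbours unique, which is how `np.unique` worked at the time. This is only an illustrative sketch: `bytewise_cmp`, `fieldwise_cmp` and `unique_by` are hypothetical helpers, not NumPy internals.

```python
import numpy as np
from functools import cmp_to_key

V_ = np.zeros(4, dtype=[('v', np.float32, 3)])
V_['v'] = [[0.5, 0.0, 1.0], [0.5, -0.0, 1.0], [0.5, 0.0, -1.0], [0.5, -0.0, -1.0]]


def bytewise_cmp(a, b):
    """Hypothetical: compare two records by raw bytes, like a VOID dtype without fields."""
    ra, rb = a.tobytes(), b.tobytes()
    return (ra > rb) - (ra < rb)


def fieldwise_cmp(a, b):
    """Hypothetical: compare two records field by field, element by element, as floats."""
    for name in a.dtype.names:
        for x, y in zip(np.ravel(a[name]), np.ravel(b[name])):
            if x < y:
                return -1
            if x > y:
                return 1
    return 0


def unique_by(arr, cmp):
    """Sort with `cmp`, then keep each record that differs from its predecessor."""
    items = sorted(arr, key=cmp_to_key(cmp))
    kept = [items[0]]
    for prev, cur in zip(items, items[1:]):
        if cmp(prev, cur) != 0:
            kept.append(cur)
    return kept


print(len(unique_by(V_, bytewise_cmp)))   # 4: -0.0 and 0.0 records are kept apart
print(len(unique_by(V_, fieldwise_cmp)))  # 2: the result Nicolas expected
```

With the byte-wise comparator, each -0.0 record never compares equal to its +0.0 counterpart, which reproduces the four-record output; a field-by-field float comparison leaves the expected two.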