Re: [Numpy-discussion] improving arraysetops
Hi Neil,

Neil Crighton wrote:
>>> What about merging unique and unique1d? They're essentially identical for
>>> an array input, but unique uses the builtin set() for non-array inputs and
>>> so is around 2x faster in this case - see below. Is it worth accepting a
>>> speed regression for unique to get rid of the function duplication? (Or
>>> can they be combined?)
>>
>> unique1d can return the indices - can this be achieved by using set(), too?
>
> No, set() can't return the indices as far as I know.
>
>> The implementation for arrays is the same already, IMHO, so I would
>> prefer adding return_index, return_inverse to unique (automatically
>> converting input to array, if necessary), and deprecate unique1d.
>>
>> We can view it also as adding the set() approach to unique1d, when the
>> return_index, return_inverse arguments are not set, and renaming
>> unique1d -> unique.
>
> This sounds good. If you don't have time to do it, I don't mind having
> a go at writing a patch to implement these changes (deprecate the existing
> unique1d, rename unique1d to unique and add the set approach from the old
> unique, and the other changes mentioned in
> http://projects.scipy.org/numpy/ticket/1133).

That would be really great - I will not be online starting tomorrow till
the end of next week (more or less), so I can really look at the issue
after I return.

[...]

>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
>> position 28: ordinal not in range(128)
>>
>> It disappears after increasing the array size, or the integer size.
>> In [39]: np.__version__
>> Out[39]: '1.4.0.dev7047'
>>
>> r.
>
> Weird! From the error message, it looks like a problem with ipython's timeit
> function rather than unique. I can't reproduce it on my machine
> (numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163 ).

True, I have ipython 0.9.1, that might cause the problem.

cheers,
r.
___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] improving arraysetops
> > What about merging unique and unique1d? They're essentially identical for
> > an array input, but unique uses the builtin set() for non-array inputs and
> > so is around 2x faster in this case - see below. Is it worth accepting a
> > speed regression for unique to get rid of the function duplication? (Or
> > can they be combined?)
>
> unique1d can return the indices - can this be achieved by using set(), too?

No, set() can't return the indices as far as I know.

> The implementation for arrays is the same already, IMHO, so I would
> prefer adding return_index, return_inverse to unique (automatically
> converting input to array, if necessary), and deprecate unique1d.
>
> We can view it also as adding the set() approach to unique1d, when the
> return_index, return_inverse arguments are not set, and renaming
> unique1d -> unique.

This sounds good. If you don't have time to do it, I don't mind having a go
at writing a patch to implement these changes (deprecate the existing
unique1d, rename unique1d to unique and add the set approach from the old
unique, and the other changes mentioned in
http://projects.scipy.org/numpy/ticket/1133).

> I have found a strange bug in unique():
>
> In [24]: l = list(np.random.randint(100, size=1000))
>
> In [25]: %timeit np.unique(l)
> ---
> UnicodeEncodeError                        Traceback (most recent call last)
>
> /usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
>     951         else:
>     952             magic_args = self.var_expand(magic_args,1)
> --> 953             return fn(magic_args)
>     954
>     955     def ipalias(self,arg_s):
>
> /usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
>    1829                                                   precision,
>    1830                                                   best * scaling[order],
> -> 1831                                                   units[order])
>    1832         if tc > tc_min:
>    1833             print "Compiler time: %.2f s" % tc
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
> position 28: ordinal not in range(128)
>
> It disappears after increasing the array size, or the integer size.
> In [39]: np.__version__
> Out[39]: '1.4.0.dev7047'
>
> r.

Weird! From the error message, it looks like a problem with ipython's timeit
function rather than unique. I can't reproduce it on my machine
(numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163 ).

Neil
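The character in the error message, u'\xb5', is the micro sign that timeit prints in results such as "793 µs per loop". One plausible reading, not confirmed in the thread, is that the error only shows up when the timing is reported in microseconds, which would fit the report that it disappears for larger arrays. A minimal sketch of the encoding failure itself, written in Python 3 rather than the Python 2.5 of the traceback; the helper name encodes_as_ascii is hypothetical:

```python
# U+00B5 MICRO SIGN, the character named in the UnicodeEncodeError above.
micro = "\xb5"

# A timeit-style report in microseconds contains the micro sign ...
report_us = "1000 loops, best of 3: 793 " + micro + "s per loop"
# ... while a report in milliseconds is plain ASCII.
report_ms = "10 loops, best of 3: 78 ms per loop"


def encodes_as_ascii(text):
    """Return True if text can be encoded with the 'ascii' codec."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_ascii(report_us))
print(encodes_as_ascii(report_ms))
```

Only the microsecond report fails to encode, so the failure depends on which unit the timing lands in and not on what unique() returns.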
Re: [Numpy-discussion] improving arraysetops
Neil Crighton wrote:
> Robert Cimrman ntc.zcu.cz> writes:
>
>> Hi,
>>
>> I am starting a new thread, so that it reaches the interested people.
>> Let us discuss improvements to arraysetops (array set operations) at [1]
>> (allowing non-unique arrays as function arguments, better naming
>> conventions and documentation).
>>
>> r.
>>
>> [1] http://projects.scipy.org/numpy/ticket/1133
>>
>
> Hi,
>
> These changes look good to me. For point (1) I think we should fold the
> unique and _nu code into a single function. For point (3) I like in1d - it's
> shorter than isin1d but is still clear.

yes, the _nu functions will be useless then, their bodies can be moved into
the generic functions.

> What about merging unique and unique1d? They're essentially identical for an
> array input, but unique uses the builtin set() for non-array inputs and so is
> around 2x faster in this case - see below. Is it worth accepting a speed
> regression for unique to get rid of the function duplication? (Or can they
> be combined?)

unique1d can return the indices - can this be achieved by using set(), too?

The implementation for arrays is the same already, IMHO, so I would
prefer adding return_index, return_inverse to unique (automatically
converting input to array, if necessary), and deprecate unique1d.

We can view it also as adding the set() approach to unique1d, when the
return_index, return_inverse arguments are not set, and renaming
unique1d -> unique.

> Neil
>
>
> In [24]: l = list(np.random.randint(100, size=1))
> In [25]: %timeit np.unique1d(l)
> 1000 loops, best of 3: 1.9 ms per loop
> In [26]: %timeit np.unique(l)
> 1000 loops, best of 3: 793 µs per loop
> In [27]: l = list(np.random.randint(100, size=100))
> In [28]: %timeit np.unique(l)
> 10 loops, best of 3: 78 ms per loop
> In [29]: %timeit np.unique1d(l)
> 10 loops, best of 3: 233 ms per loop

I have found a strange bug in unique():

In [24]: l = list(np.random.randint(100, size=1000))

In [25]: %timeit np.unique(l)
---
UnicodeEncodeError                        Traceback (most recent call last)

/usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
    951         else:
    952             magic_args = self.var_expand(magic_args,1)
--> 953             return fn(magic_args)
    954
    955     def ipalias(self,arg_s):

/usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
   1829                                                   precision,
   1830                                                   best * scaling[order],
-> 1831                                                   units[order])
   1832         if tc > tc_min:
   1833             print "Compiler time: %.2f s" % tc

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
position 28: ordinal not in range(128)

It disappears after increasing the array size, or the integer size.
In [39]: np.__version__
Out[39]: '1.4.0.dev7047'

r.
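The merge proposed above (one unique that takes return_index and return_inverse, with the set() approach kept for non-array input when no indices are requested) could look roughly like the following. This is only a sketch under stated assumptions: the name unique_sketch is hypothetical, the code is not the patch discussed in ticket 1133, and it is not NumPy's actual implementation.

```python
import numpy as np


def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d, for illustration only."""
    need_indices = return_index or return_inverse
    if not need_indices and not isinstance(ar, np.ndarray):
        # Fast path: set() removes duplicates but cannot report indices,
        # so it is used only when no index output is requested.
        return np.asarray(sorted(set(ar)))

    # General path: convert the input to a flat array and sort it.
    ar = np.asarray(ar).ravel()
    # A stable sort keeps equal items in input order, so the first item
    # of each run of equal values is its first occurrence in the input.
    perm = ar.argsort(kind="stable")
    aux = ar[perm]
    # flag marks the first element of every run of equal values.
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]

    result = (aux[flag],)
    if return_index:
        # Positions in the input where each unique value first appears.
        result += (perm[flag],)
    if return_inverse:
        # For each input item, the position of its value in the unique array.
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[perm] = np.cumsum(flag) - 1
        result += (inverse,)
    return result[0] if len(result) == 1 else result


u, idx, inv = unique_sketch([3, 1, 2, 3, 1], return_index=True, return_inverse=True)
print(u, idx, inv)
```

Indexing the unique values with the inverse, u[inv], reconstructs the original input, which is the property that a set() cannot provide.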
Re: [Numpy-discussion] improving arraysetops
Robert Cimrman ntc.zcu.cz> writes:
>
> Hi,
>
> I am starting a new thread, so that it reaches the interested people.
> Let us discuss improvements to arraysetops (array set operations) at [1]
> (allowing non-unique arrays as function arguments, better naming
> conventions and documentation).
>
> r.
>
> [1] http://projects.scipy.org/numpy/ticket/1133
>

Hi,

These changes look good to me. For point (1) I think we should fold the
unique and _nu code into a single function. For point (3) I like in1d - it's
shorter than isin1d but is still clear.

What about merging unique and unique1d? They're essentially identical for an
array input, but unique uses the builtin set() for non-array inputs and so is
around 2x faster in this case - see below. Is it worth accepting a speed
regression for unique to get rid of the function duplication? (Or can they be
combined?)

Neil

In [24]: l = list(np.random.randint(100, size=1))
In [25]: %timeit np.unique1d(l)
1000 loops, best of 3: 1.9 ms per loop
In [26]: %timeit np.unique(l)
1000 loops, best of 3: 793 µs per loop
In [27]: l = list(np.random.randint(100, size=100))
In [28]: %timeit np.unique(l)
10 loops, best of 3: 78 ms per loop
In [29]: %timeit np.unique1d(l)
10 loops, best of 3: 233 ms per loop
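The comparison above can be repeated outside IPython with the standard timeit module. The sketch below times two hypothetical stand-ins for the two strategies on list input: unique_via_set for the set() approach and unique_via_sort for the sort-based approach. The list size, the repeat count, and both function names are assumptions, and the timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# A plain Python list of small integers; the size is an arbitrary choice.
values = list(rng.integers(100, size=10_000))


def unique_via_set(seq):
    """Deduplicate with the builtin set(), then sort the survivors."""
    return np.asarray(sorted(set(seq)))


def unique_via_sort(seq):
    """Convert to an array, sort it, and keep the first item of each run."""
    aux = np.sort(np.asarray(seq))
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]
    return aux[flag]


for func in (unique_via_set, unique_via_sort):
    seconds = timeit.timeit(lambda: func(values), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Both functions return the same sorted unique values, so any difference in the printed times comes from the strategy alone.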
[Numpy-discussion] improving arraysetops
Hi,

I am starting a new thread, so that it reaches the interested people.
Let us discuss improvements to arraysetops (array set operations) at [1]
(allowing non-unique arrays as function arguments, better naming
conventions and documentation).

r.

[1] http://projects.scipy.org/numpy/ticket/1133