Re: [Numpy-discussion] improving arraysetops

2009-06-17 Thread Robert Cimrman
Hi Neil,

Neil Crighton wrote:
>>> What about merging unique and unique1d?  They're essentially identical for
>>> an array input, but unique uses the builtin set() for non-array inputs and
>>> so is around 2x faster in this case - see below. Is it worth accepting a
>>> speed regression for unique to get rid of the function duplication?  (Or
>>> can they be combined?)
>> unique1d can return the indices - can this be achieved by using set(), too?
>>
> 
> No, set() can't return the indices as far as I know.
> 
>> The implementation for arrays is the same already, IMHO, so I would
>> prefer adding return_index, return_inverse to unique (automatically
>> converting input to array, if necessary), and deprecate unique1d.
>>
>> We can view it also as adding the set() approach to unique1d, when the
>> return_index, return_inverse arguments are not set, and renaming
>> unique1d -> unique.
>>
> 
> This sounds good. If you don't have time to do it, I don't mind having
> a go at writing a patch to implement these changes (deprecate the existing
> unique1d, rename unique1d to unique and add the set approach from the old
> unique, and the other changes mentioned in
> http://projects.scipy.org/numpy/ticket/1133).

That would be really great - I will be offline from tomorrow until the
end of next week (more or less), so I can only really look at the issue
after I return.

[...]
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
>> position 28: ordinal not in range(128)
>>
>> It disappears after increasing the array size, or the integer size.
>> In [39]: np.__version__
>> Out[39]: '1.4.0.dev7047'
>>
>> r.
> 
> Weird! From the error message, it looks like a problem with ipython's timeit
> function rather than unique. I can't reproduce it on my machine
> (numpy 1.4.0.dev, r7059;   IPython 0.10.bzr.r1163 ).

True, I have IPython 0.9.1; that might be the cause of the problem.

cheers,
r.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] improving arraysetops

2009-06-17 Thread Neil Crighton
> > What about merging unique and unique1d?  They're essentially identical for
> > an array input, but unique uses the builtin set() for non-array inputs and
> > so is around 2x faster in this case - see below. Is it worth accepting a
> > speed regression for unique to get rid of the function duplication?  (Or
> > can they be combined?)
>
> unique1d can return the indices - can this be achieved by using set(), too?
>

No, set() can't return the indices as far as I know.

> The implementation for arrays is the same already, IMHO, so I would
> prefer adding return_index, return_inverse to unique (automatically
> converting input to array, if necessary), and deprecate unique1d.
>
> We can view it also as adding the set() approach to unique1d, when the
> return_index, return_inverse arguments are not set, and renaming
> unique1d -> unique.
>

This sounds good. If you don't have time to do it, I don't mind having
a go at writing a patch to implement these changes (deprecate the existing
unique1d, rename unique1d to unique and add the set approach from the old
unique, and the other changes mentioned in
http://projects.scipy.org/numpy/ticket/1133).

> I have found a strange bug in unique():
>
> In [24]: l = list(np.random.randint(100, size=1000))
>
> In [25]: %timeit np.unique(l)
> ---------------------------------------------------------------------------
> UnicodeEncodeError                        Traceback (most recent call last)
>
> /usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
>     951         else:
>     952             magic_args = self.var_expand(magic_args,1)
> --> 953             return fn(magic_args)
>     954
>     955     def ipalias(self,arg_s):
>
> /usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
>    1829                 precision,
>    1830                 best * scaling[order],
> -> 1831                 units[order])
>    1832         if tc > tc_min:
>    1833             print "Compiler time: %.2f s" % tc
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
> position 28: ordinal not in range(128)
>
> It disappears after increasing the array size, or the integer size.
> In [39]: np.__version__
> Out[39]: '1.4.0.dev7047'
>
> r.

Weird! From the error message, it looks like a problem with ipython's timeit
function rather than unique. I can't reproduce it on my machine
(numpy 1.4.0.dev, r7059;   IPython 0.10.bzr.r1163 ).
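
A small illustration of why the error points at the timing output rather than at unique: u'\xb5' is the micro sign, which appears in the "µs per loop" unit that timeit prints elsewhere in this thread, and it has no ASCII encoding. This is only a sketch in Python 3 syntax (the session above ran Python 2.5), and the variable names are made up for the example. It would also be consistent with the error disappearing for larger inputs, presumably because the result is then reported in "ms".

```python
# The unit label timeit uses for microsecond results contains U+00B5.
micro_label = "\u00b5s"

try:
    micro_label.encode("ascii")
    ascii_ok = True
    failing_char = None
except UnicodeEncodeError as exc:
    # Same error class as in the traceback: no ASCII byte exists for the micro sign.
    ascii_ok = False
    failing_char = exc.object[exc.start:exc.end]

# A millisecond label is plain ASCII and encodes without trouble.
ms_bytes = "ms".encode("ascii")

print(ascii_ok, repr(failing_char), ms_bytes)
```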

Neil


Re: [Numpy-discussion] improving arraysetops

2009-06-15 Thread Robert Cimrman
Neil Crighton wrote:
> Robert Cimrman  ntc.zcu.cz> writes:
> 
>> Hi,
>>
>> I am starting a new thread, so that it reaches the interested people.
>> Let us discuss improvements to arraysetops (array set operations) at [1] 
>> (allowing non-unique arrays as function arguments, better naming 
>> conventions and documentation).
>>
>> r.
>>
>> [1] http://projects.scipy.org/numpy/ticket/1133
>>
> 
> Hi,
> 
> These changes look good to me.  For point (1) I think we should fold the 
> unique and _nu code into a single function. For point (3) I like in1d - it's 
> shorter than isin1d but is still clear.

Yes, the _nu functions will be useless then; their bodies can be moved 
into the generic functions.

> What about merging unique and unique1d?  They're essentially identical for an 
> array input, but unique uses the builtin set() for non-array inputs and so is 
> around 2x faster in this case - see below. Is it worth accepting a speed 
> regression for unique to get rid of the function duplication?  (Or can they 
> be combined?) 

unique1d can return the indices - can this be achieved by using set(), too?

The implementation for arrays is the same already, IMHO, so I would 
prefer adding return_index, return_inverse to unique (automatically 
converting input to array, if necessary), and deprecate unique1d.

We can view it also as adding the set() approach to unique1d, when the 
return_index, return_inverse arguments are not set, and renaming 
unique1d -> unique.
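
A minimal sketch of the combined function described above, not the patch itself: it takes the set() route when no index output is requested and the input is not an array, and a sort-based route otherwise. The name unique_sketch and the stable-sort details are assumptions for illustration, and the set() branch assumes a flat sequence of hashable items.

```python
import numpy as np

def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d; illustrative only."""
    want_indices = return_index or return_inverse
    if not want_indices and not isinstance(ar, np.ndarray):
        # set() path from the old unique: no index information available.
        return np.array(sorted(set(ar)))

    ar = np.asarray(ar).ravel()
    order = ar.argsort(kind="stable")      # stable, so ties keep input order
    sorted_ar = ar[order]
    # True wherever a new value starts in the sorted data.
    is_first = np.ones(sorted_ar.shape, dtype=bool)
    is_first[1:] = sorted_ar[1:] != sorted_ar[:-1]

    result = [sorted_ar[is_first]]
    if return_index:
        # Index of the first occurrence of each unique value in the input.
        result.append(order[is_first])
    if return_inverse:
        # Position of every input element within the unique array.
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[order] = np.cumsum(is_first) - 1
        result.append(inverse)
    return result[0] if len(result) == 1 else tuple(result)

values = [3, 1, 3, 2, 1]
uniq = unique_sketch(values)
uniq2, first_idx, inverse = unique_sketch(values, return_index=True,
                                          return_inverse=True)
```

With this layout, uniq2[inverse] reconstructs the original input, which is something the set() path cannot offer.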

> Neil
> 
> 
> In [24]: l = list(np.random.randint(100, size=1))
> In [25]: %timeit np.unique1d(l)
> 1000 loops, best of 3: 1.9 ms per loop
> In [26]: %timeit np.unique(l)
> 1000 loops, best of 3: 793 µs per loop
> In [27]: l = list(np.random.randint(100, size=100))
> In [28]: %timeit np.unique(l)
> 10 loops, best of 3: 78 ms per loop
> In [29]: %timeit np.unique1d(l)
> 10 loops, best of 3: 233 ms per loop

I have found a strange bug in unique():

In [24]: l = list(np.random.randint(100, size=1000))

In [25]: %timeit np.unique(l)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

/usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
    951         else:
    952             magic_args = self.var_expand(magic_args,1)
--> 953             return fn(magic_args)
    954
    955     def ipalias(self,arg_s):

/usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
   1829                 precision,
   1830                 best * scaling[order],
-> 1831                 units[order])
   1832         if tc > tc_min:
   1833             print "Compiler time: %.2f s" % tc

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in 
position 28: ordinal not in range(128)

It disappears after increasing the array size, or the integer size.
In [39]: np.__version__
Out[39]: '1.4.0.dev7047'

r.



Re: [Numpy-discussion] improving arraysetops

2009-06-14 Thread Neil Crighton
Robert Cimrman  ntc.zcu.cz> writes:

> 
> Hi,
> 
> I am starting a new thread, so that it reaches the interested people.
> Let us discuss improvements to arraysetops (array set operations) at [1] 
> (allowing non-unique arrays as function arguments, better naming 
> conventions and documentation).
> 
> r.
> 
> [1] http://projects.scipy.org/numpy/ticket/1133
> 

Hi,

These changes look good to me.  For point (1) I think we should fold the 
unique and _nu code into a single function. For point (3) I like in1d - it's 
shorter than isin1d but is still clear.

What about merging unique and unique1d?  They're essentially identical for an 
array input, but unique uses the builtin set() for non-array inputs and so is 
around 2x faster in this case - see below. Is it worth accepting a speed 
regression for unique to get rid of the function duplication?  (Or can they be 
combined?) 


Neil


In [24]: l = list(np.random.randint(100, size=1))
In [25]: %timeit np.unique1d(l)
1000 loops, best of 3: 1.9 ms per loop
In [26]: %timeit np.unique(l)
1000 loops, best of 3: 793 µs per loop
In [27]: l = list(np.random.randint(100, size=100))
In [28]: %timeit np.unique(l)
10 loops, best of 3: 78 ms per loop
In [29]: %timeit np.unique1d(l)
10 loops, best of 3: 233 ms per loop
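
For anyone wanting to repeat this comparison outside IPython, a rough sketch with the standard timeit module follows. The input size, the helper names and the use of np.random.default_rng are assumptions, and current np.unique stands in for the sort-based unique1d. Which route is faster depends on the NumPy version and the input type, so no expected timings are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# A plain Python list, as in the session above; the size is an arbitrary choice.
values = list(rng.integers(100, size=10_000))

def unique_via_set(seq):
    # set()-based route, as used by the old unique for non-array input.
    return sorted(set(seq))

def unique_via_sort(seq):
    # Sort-based route: convert to an array first, then deduplicate.
    return np.unique(np.asarray(seq))

# Best of three repeats, ten calls each, similar in spirit to %timeit.
t_set = min(timeit.repeat(lambda: unique_via_set(values), number=10, repeat=3))
t_sort = min(timeit.repeat(lambda: unique_via_sort(values), number=10, repeat=3))

same_result = list(unique_via_sort(values)) == unique_via_set(values)
print("set: %.4f s, sort: %.4f s, same result: %s" % (t_set, t_sort, same_result))
```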



[Numpy-discussion] improving arraysetops

2009-06-09 Thread Robert Cimrman
Hi,

I am starting a new thread, so that it reaches the interested people.
Let us discuss improvements to arraysetops (array set operations) at [1] 
(allowing non-unique arrays as function arguments, better naming 
conventions and documentation).
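
To illustrate what accepting non-unique arrays means in practice, here is a short sketch, assuming a current NumPy release in which intersect1d takes an assume_unique keyword; it is not necessarily the exact interface proposed in the ticket.

```python
import numpy as np

a = np.array([1, 1, 2, 3])
b = np.array([1, 3, 3, 4])

# Inputs with repeated values are handled directly.
common = np.intersect1d(a, b)

# If the caller already knows the inputs are unique, the extra
# deduplication step can be skipped.
common_fast = np.intersect1d(np.unique(a), np.unique(b), assume_unique=True)

print(common, common_fast)
```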

r.

[1] http://projects.scipy.org/numpy/ticket/1133