Re: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available

2014-08-27 Thread Charles R Harris
On Wed, Aug 27, 2014 at 3:52 PM, Orion Poplawski wrote:

> On 08/27/2014 11:07 AM, Julian Taylor wrote:
> > Hello,
> >
> > > Almost punctually for EuroScipy, we have finally managed to release the
> > > first release candidate of NumPy 1.9.
> > > We intend to fix only bugs until the final release, which we plan to do
> > > in the next 1-2 weeks.
>
>
> I'm seeing the following errors from setup.py:
>
>
> non-existing path in 'numpy/f2py': 'docs'
> non-existing path in 'numpy/f2py': 'f2py.1'
>
> non-existing path in 'numpy/lib': 'benchmarks'
>

Hmm, benchmarks is long gone, and the f2py files also no longer exist.
Since this causes no problems on my system, I suspect something else. How
are you doing the install?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
I just checked the docs on ufuncs, and it appears that's a solved problem
now, since ufunc.reduceat now comes with an axis argument. Or maybe it
already did when I wrote that, but I simply wasn't paying attention. Either
way, the code is fully vectorized now, in both grouped and non-grouped
axes. It's a lot of code, but aside from some O(1) and O(n) bookkeeping,
all that happens for a grouping is an argsort of the keys, and then the
reduction itself, all fully vectorized.

Note that I sort the values first, and then use ufunc.reduceat on the
groups. It would seem to me that ufunc.at should be more efficient, by
avoiding this indirection, but testing very much revealed the opposite, for
reasons unclear to me. Perhaps that's changed now as well.
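
For concreteness, the pattern can be sketched in a few lines of plain numpy
(illustrative only; this is not the actual grouping module, and the names
are made up):

import numpy as np

keys = np.random.randint(0, 4, 1000)
values = np.random.rand(1000)

# O(n log n) precomputation: a stable argsort of the keys
order = np.argsort(keys, kind='mergesort')
sorted_keys = keys[order]
sorted_values = values[order]

# start index of each group in the sorted order (O(n))
starts = np.flatnonzero(
    np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1])))
unique_keys = sorted_keys[starts]

# the reduction itself, fully vectorized over all groups at once
sums = np.add.reduceat(sorted_values, starts)
counts = np.diff(np.concatenate((starts, [len(sorted_values)])))
means = sums / counts

# the ufunc.at route mentioned above: no sort needed, but in the tests
# referred to above it was the slower of the two
sums_at = np.zeros(unique_keys.max() + 1)
np.add.at(sums_at, keys, values)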


On Wed, Aug 27, 2014 at 11:32 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Yes, I was aware of that. But the point would be to provide true
> vectorization on those operations.
>
> The way I see it, numpy may not have to have a GroupBy implementation, but
> it should at least enable implementing one that is fast and efficient over
> any axis.
>
>
> On Wed, Aug 27, 2014 at 12:38 PM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> i.e., if the grouped axis is small but the other axes are not, you could
>> write this, which avoids the python loop over the long axis that
>> np.vectorize would otherwise perform.
>>
>> import numpy as np
>> from grouping import group_by
>> keys = np.random.randint(0,4,10)
>> values = np.random.rand(10,2000)
>> for k,g in zip(*group_by(keys)(values)):
>>     print k, g.mean(0)
>>
>>
>>
>>
>> On Wed, Aug 27, 2014 at 9:29 PM, Eelco Hoogendoorn <
>> hoogendoorn.ee...@gmail.com> wrote:
>>
>>> For instance, this also works as expected (100 keys of 1d int arrays and 100
>>> values of 1d float arrays):
>>>
>>> group_by(randint(0,4,(100,2))).mean(rand(100,2))
>>>
>>>
>>> On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn <
>>> hoogendoorn.ee...@gmail.com> wrote:
>>>
 If I understand you correctly, the current implementation supports
 these operations. All reductions over groups (except for median) are
 performed through the corresponding ufunc (see GroupBy.reduce). This works
 on multidimensional arrays as well, although this broadcasting over the
 non-grouping axes is accomplished using np.vectorize. Actual vectorization
 only happens over the axis being grouped over, but this is usually a long
 axis. If it isn't, it is more efficient to perform a reduction by means of
 splitting the array by its groups first, and then mapping the iterable of
 groups over some reduction operation (as noted in the docstring of
 GroupBy.reduce).


 On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río <
 jaime.f...@gmail.com> wrote:

> Hi Eelco,
>
> I took a deeper look into your code a couple of weeks back. I don't
> think I have fully grasped everything it allows, but I agree that some
> form of what you have there is highly desirable. Along the same lines,
> for some time I have been thinking that the right place for a `groupby`
> in numpy is as a method of ufuncs, so that `np.add.groupby(arr, groups)`
> would do a multidimensional version of `np.bincount(groups, weights=arr)`.
> You would then need a more powerful version of `np.unique` to produce the
> `groups`, but that is something that Joe Kington's old PR was very close
> to achieving; that PR should probably be resurrected as well. But yes,
> there seems to be material for a NEP here, and some guidance from one of
> the numpy devs would be helpful in getting this somewhere.
>
> Jaime
>
>
> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> It wouldn't hurt to have this function, but my intuition is that its
>> use will be minimal. If you are already working with sorted arrays, you
>> already have a flop cost on that order of magnitude, and the optimized
>> merge saves you a factor two at the very most. Using numpy means you are
>> sacrificing factors of two and beyond relative to pure C left, right, and
>> center anyway, so if this kind of thing matters to you, you probably won't
>> be working in numpy in the first place.
>>
>> That said, I share your interest in overhauling arraysetops. I see
>> many opportunities for expanding its functionality. There is a question
>> that amounts to 'how do I do group-by in numpy' on stackoverflow almost
>> every week. That would have my top priority, but also things like
>> extending np.unique to things like graph edges, or other more complex
>> input, are very often useful to me.
>>
>> I wrote up a draft a while ago
>> which accomplishes all of the above and more. It reimplements functions
>> like np.unique around a common Index object.

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Daπid
On 27 August 2014 19:02, Jaime Fernández del Río wrote:

>
> Since there is at least one other person out there that likes it, is there
> any more interest in such a function? If yes, any comments on what the
> proper interface for extra output should be? Although perhaps the best is
> to leave that out for starters and see what use people make of it, if any.
>

I think a perhaps more useful thing would be to implement timsort. I
understand it is capable of taking full advantage of partially sorted
arrays, with the extra safety of not assuming that the individual arrays
are sorted. This would also be useful for other real-world cases where the
data is already partially sorted.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available

2014-08-27 Thread Orion Poplawski
On 08/27/2014 11:07 AM, Julian Taylor wrote:
> Hello,
>
> Almost punctually for EuroScipy, we have finally managed to release the
> first release candidate of NumPy 1.9.
> We intend to fix only bugs until the final release, which we plan to do
> in the next 1-2 weeks.


I'm seeing the following errors from setup.py:


non-existing path in 'numpy/f2py': 'docs'
non-existing path in 'numpy/f2py': 'f2py.1'

non-existing path in 'numpy/lib': 'benchmarks'

It would be nice if f2py.1 was installed in /usr/share/man/man1/f2py.1.

Filed as https://github.com/numpy/numpy/issues/5010

-- 
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301   http://www.nwra.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Jaime Fernández del Río
Yes, I was aware of that. But the point would be to provide true
vectorization on those operations.

The way I see it, numpy may not have to have a GroupBy implementation, but
it should at least enable implementing one that is fast and efficient over
any axis.


On Wed, Aug 27, 2014 at 12:38 PM, Eelco Hoogendoorn <
hoogendoorn.ee...@gmail.com> wrote:

> i.e., if the grouped axis is small but the other axes are not, you could
> write this, which avoids the python loop over the long axis that
> np.vectorize would otherwise perform.
>
> import numpy as np
> from grouping import group_by
> keys = np.random.randint(0,4,10)
> values = np.random.rand(10,2000)
> for k,g in zip(*group_by(keys)(values)):
>     print k, g.mean(0)
>
>
>
>
> On Wed, Aug 27, 2014 at 9:29 PM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> For instance, this also works as expected (100 keys of 1d int arrays and 100
>> values of 1d float arrays):
>>
>> group_by(randint(0,4,(100,2))).mean(rand(100,2))
>>
>>
>> On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn <
>> hoogendoorn.ee...@gmail.com> wrote:
>>
>>> If I understand you correctly, the current implementation supports these
>>> operations. All reductions over groups (except for median) are performed
>>> through the corresponding ufunc (see GroupBy.reduce). This works on
>>> multidimensional arrays as well, although this broadcasting over the
>>> non-grouping axes is accomplished using np.vectorize. Actual vectorization
>>> only happens over the axis being grouped over, but this is usually a long
>>> axis. If it isn't, it is more efficient to perform a reduction by means of
> splitting the array by its groups first, and then mapping the iterable of
>>> groups over some reduction operation (as noted in the docstring of
>>> GroupBy.reduce).
>>>
>>>
>>> On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río <
>>> jaime.f...@gmail.com> wrote:
>>>
 Hi Eelco,

 I took a deeper look into your code a couple of weeks back. I don't
 think I have fully grasped everything it allows, but I agree that some
 form of what you have there is highly desirable. Along the same lines, for
 some time I have been thinking that the right place for a `groupby` in numpy
 is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a
 multidimensional version of `np.bincount(groups, weights=arr)`. You would
 then need a more powerful version of `np.unique` to produce the `groups`,
 but that is something that Joe Kington's old PR was very close to
 achieving; that PR should probably be resurrected as well. But yes, there
 seems to be material for a NEP here, and some guidance from one of the
 numpy devs would be helpful in getting this somewhere.

 Jaime


 On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
 hoogendoorn.ee...@gmail.com> wrote:

> It wouldn't hurt to have this function, but my intuition is that its
> use will be minimal. If you are already working with sorted arrays, you
> already have a flop cost on that order of magnitude, and the optimized
> merge saves you a factor two at the very most. Using numpy means you are
> sacrificing factors of two and beyond relative to pure C left, right, and
> center anyway, so if this kind of thing matters to you, you probably won't
> be working in numpy in the first place.
>
> That said, I share your interest in overhauling arraysetops. I see
> many opportunities for expanding its functionality. There is a question
> that amounts to 'how do I do group-by in numpy' on stackoverflow almost
> every week. That would have my top priority, but also things like
> extending np.unique to things like graph edges, or other more complex
> input, are very often useful to me.
>
> I wrote up a draft a while ago
> which accomplishes all of the above and more. It reimplements functions
> like np.unique around a common Index object. This index object 
> encapsulates
> the precomputation (sorting) required for efficient set-ops on different
> datatypes, and provides a common interface to obtain the kind of
> information you are talking about (which is used extensively internally in
> the implementation of group_by, for instance).
>
> i.e., this functionality allows you to write neat things like
> group_by(randint(0,9,(100,2))).median(rand(100))
>
> But I have the feeling much more could be done in this direction, and
> I feel this draft could really use a bit of back and forth. If we are 
> going
> to completely rewrite arraysetops, we might as well do it right.
>
>
> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> A request was opened on github to add a `merge` function to numpy that
>> would merge two sorted 1d arrays into a single sorted 1d array.

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
i.e., if the grouped axis is small but the other axes are not, you could
write this, which avoids the python loop over the long axis that
np.vectorize would otherwise perform.

import numpy as np
from grouping import group_by
keys = np.random.randint(0,4,10)
values = np.random.rand(10,2000)
for k,g in zip(*group_by(keys)(values)):
    print k, g.mean(0)
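
A rough plain-numpy equivalent of the above, using the reduceat axis
argument mentioned earlier instead of the grouping module (a sketch with
made-up variable names, illustrative only):

import numpy as np

keys = np.random.randint(0, 4, 10)
values = np.random.rand(10, 2000)

order = np.argsort(keys, kind='mergesort')
sorted_keys = keys[order]
sorted_values = values[order]
starts = np.flatnonzero(
    np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1])))
counts = np.diff(np.concatenate((starts, [len(sorted_keys)])))

# one reduceat call sums each contiguous block of rows; no python loop
# over the long (2000-element) axis
group_means = np.add.reduceat(sorted_values, starts, axis=0) / counts[:, None]
for k, g in zip(sorted_keys[starts], group_means):
    print k, g[:3]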




On Wed, Aug 27, 2014 at 9:29 PM, Eelco Hoogendoorn <
hoogendoorn.ee...@gmail.com> wrote:

> For instance, this also works as expected (100 keys of 1d int arrays and 100
> values of 1d float arrays):
>
> group_by(randint(0,4,(100,2))).mean(rand(100,2))
>
>
> On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> If I understand you correctly, the current implementation supports these
>> operations. All reductions over groups (except for median) are performed
>> through the corresponding ufunc (see GroupBy.reduce). This works on
>> multidimensional arrays as well, although this broadcasting over the
>> non-grouping axes is accomplished using np.vectorize. Actual vectorization
>> only happens over the axis being grouped over, but this is usually a long
>> axis. If it isn't, it is more efficient to perform a reduction by means of
>> splitting the array by its groups first, and then mapping the iterable of
>> groups over some reduction operation (as noted in the docstring of
>> GroupBy.reduce).
>>
>>
>> On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>> Hi Eelco,
>>>
>>> I took a deeper look into your code a couple of weeks back. I don't
>>> think I have fully grasped everything it allows, but I agree that some
>>> form of what you have there is highly desirable. Along the same lines, for
>>> some time I have been thinking that the right place for a `groupby` in numpy
>>> is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a
>>> multidimensional version of `np.bincount(groups, weights=arr)`. You would
>>> then need a more powerful version of `np.unique` to produce the `groups`,
>>> but that is something that Joe Kington's old PR was very close to
>>> achieving; that PR should probably be resurrected as well. But yes, there
>>> seems to be material for a NEP here, and some guidance from one of the
>>> numpy devs would be helpful in getting this somewhere.
>>>
>>> Jaime
>>>
>>>
>>> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
>>> hoogendoorn.ee...@gmail.com> wrote:
>>>
 It wouldn't hurt to have this function, but my intuition is that its
 use will be minimal. If you are already working with sorted arrays, you
 already have a flop cost on that order of magnitude, and the optimized
 merge saves you a factor two at the very most. Using numpy means you are
 sacrificing factors of two and beyond relative to pure C left, right, and
 center anyway, so if this kind of thing matters to you, you probably won't
 be working in numpy in the first place.

 That said, I share your interest in overhauling arraysetops. I see many
 opportunities for expanding its functionality. There is a question that
 amounts to 'how do I do group-by in numpy' on stackoverflow almost every
 week. That would have my top priority, but also things like extending
 np.unique to things like graph edges, or other more complex input, are very
 often useful to me.

 I wrote up a draft a while ago which
 accomplishes all of the above and more. It reimplements functions like
 np.unique around a common Index object. This index object encapsulates the
 precomputation (sorting) required for efficient set-ops on different
 datatypes, and provides a common interface to obtain the kind of
 information you are talking about (which is used extensively internally in
 the implementation of group_by, for instance).

 i.e., this functionality allows you to write neat things like
 group_by(randint(0,9,(100,2))).median(rand(100))

 But I have the feeling much more could be done in this direction, and I
 feel this draft could really use a bit of back and forth. If we are going
 to completely rewrite arraysetops, we might as well do it right.


 On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
 jaime.f...@gmail.com> wrote:

> A request was opened on github to add a `merge` function to numpy that
> would merge two sorted 1d arrays into a single sorted 1d array. I have 
> been
> playing around with that idea for a while, and have a branch in my numpy
> fork that adds a `mergesorted` function to `numpy.lib`:
>
>
> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a
>
> I drew inspiration from C++ STL algorithms, and merged into a single
> function what merge, set_union, set_intersection, set_difference and
> set_symmetric_difference do there.

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
For instance, this also works as expected (100 keys of 1d int arrays and 100
values of 1d float arrays):

group_by(randint(0,4,(100,2))).mean(rand(100,2))
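
For reference, a plain-numpy sketch of grouping on such composite (row)
keys, standing in for group_by; the void-view trick for row-wise uniqueness
is the main assumption here:

import numpy as np
from numpy.random import randint, rand

keys = randint(0, 4, (100, 2))
values = rand(100, 2)

# view each key row as one opaque void scalar so np.unique works row-wise
kview = np.ascontiguousarray(keys).view(
    np.dtype((np.void, keys.dtype.itemsize * keys.shape[1]))).ravel()
unique, inverse = np.unique(kview, return_inverse=True)

# grouped mean over the first axis via a scatter-add
sums = np.zeros((len(unique), values.shape[1]))
np.add.at(sums, inverse, values)
means = sums / np.bincount(inverse)[:, None]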


On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn <
hoogendoorn.ee...@gmail.com> wrote:

> If I understand you correctly, the current implementation supports these
> operations. All reductions over groups (except for median) are performed
> through the corresponding ufunc (see GroupBy.reduce). This works on
> multidimensional arrays as well, although this broadcasting over the
> non-grouping axes is accomplished using np.vectorize. Actual vectorization
> only happens over the axis being grouped over, but this is usually a long
> axis. If it isn't, it is more efficient to perform a reduction by means of
> splitting the array by its groups first, and then mapping the iterable of
> groups over some reduction operation (as noted in the docstring of
> GroupBy.reduce).
>
>
> On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> Hi Eelco,
>>
>> I took a deeper look into your code a couple of weeks back. I don't think
>> I have fully grasped everything it allows, but I agree that some form
>> of what you have there is highly desirable. Along the same lines, for
>> some time I have been thinking that the right place for a `groupby` in numpy
>> is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a
>> multidimensional version of `np.bincount(groups, weights=arr)`. You would
>> then need a more powerful version of `np.unique` to produce the `groups`,
>> but that is something that Joe Kington's old PR was very close to
>> achieving; that PR should probably be resurrected as well. But yes, there
>> seems to be material for a NEP here, and some guidance from one of the
>> numpy devs would be helpful in getting this somewhere.
>>
>> Jaime
>>
>>
>> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
>> hoogendoorn.ee...@gmail.com> wrote:
>>
>>> It wouldn't hurt to have this function, but my intuition is that its use
>>> will be minimal. If you are already working with sorted arrays, you already
>>> have a flop cost on that order of magnitude, and the optimized merge saves
>>> you a factor two at the very most. Using numpy means you are sacrificing
>>> factors of two and beyond relative to pure C left, right, and center anyway,
>>> so if this kind of thing matters to you, you probably won't be working in
>>> numpy in the first place.
>>>
>>> That said, I share your interest in overhauling arraysetops. I see many
>>> opportunities for expanding its functionality. There is a question that
>>> amounts to 'how do I do group-by in numpy' on stackoverflow almost every
>>> week. That would have my top priority, but also things like extending
>>> np.unique to things like graph edges, or other more complex input, are very
>>> often useful to me.
>>>
>>> I wrote up a draft a while ago which
>>> accomplishes all of the above and more. It reimplements functions like
>>> np.unique around a common Index object. This index object encapsulates the
>>> precomputation (sorting) required for efficient set-ops on different
>>> datatypes, and provides a common interface to obtain the kind of
>>> information you are talking about (which is used extensively internally in
>>> the implementation of group_by, for instance).
>>>
>>> i.e., this functionality allows you to write neat things like
>>> group_by(randint(0,9,(100,2))).median(rand(100))
>>>
>>> But I have the feeling much more could be done in this direction, and I
>>> feel this draft could really use a bit of back and forth. If we are going
>>> to completely rewrite arraysetops, we might as well do it right.
>>>
>>>
>>> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
>>> jaime.f...@gmail.com> wrote:
>>>
 A request was opened on github to add a `merge` function to numpy that
 would merge two sorted 1d arrays into a single sorted 1d array. I have been
 playing around with that idea for a while, and have a branch in my numpy
 fork that adds a `mergesorted` function to `numpy.lib`:


 https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a

 I drew inspiration from C++ STL algorithms, and merged into a single
 function what merge, set_union, set_intersection, set_difference and
 set_symmetric_difference do there.

 My first thought when implementing this was to not make it a public
 function, but use it under the hood to speed up some of the functions of
 `arraysetops.py`, which are now merging two already sorted arrays by
 doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my
 testing, but the speed-ups weren't that great.

 One other thing I saw value in for some of the `arraysetops.py`
 functions, but couldn't fully figure out, was in providing extra output
 aside from the merged arrays, either in the form of indices, or of boolean
 masks, indicating which items of the original arrays made it into the
 merged one, and/or where they ended up in it.

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
If I understand you correctly, the current implementation supports these
operations. All reductions over groups (except for median) are performed
through the corresponding ufunc (see GroupBy.reduce). This works on
multidimensional arrays as well, although this broadcasting over the
non-grouping axes is accomplished using np.vectorize. Actual vectorization
only happens over the axis being grouped over, but this is usually a long
axis. If it isn't, it is more efficient to perform a reduction by means of
splitting the array by its groups first, and then mapping the iterable of
groups over some reduction operation (as noted in the docstring of
GroupBy.reduce).
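
A minimal sketch of that splitting strategy in plain numpy (illustrative
only, not GroupBy.reduce itself):

import numpy as np

keys = np.random.randint(0, 4, 10)      # short grouped axis
values = np.random.rand(10, 2000)       # long non-grouped axes

order = np.argsort(keys, kind='mergesort')
sorted_keys = keys[order]
starts = np.flatnonzero(
    np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1])))

# split into per-group blocks, then map the reduction over the few groups;
# each g.mean(axis=0) is itself fully vectorized over the long axes
groups = np.split(values[order], starts[1:])
means = [g.mean(axis=0) for g in groups]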


On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Hi Eelco,
>
> I took a deeper look into your code a couple of weeks back. I don't think
> I have fully grasped everything it allows, but I agree that some form
> of what you have there is highly desirable. Along the same lines, for
> some time I have been thinking that the right place for a `groupby` in numpy
> is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a
> multidimensional version of `np.bincount(groups, weights=arr)`. You would
> then need a more powerful version of `np.unique` to produce the `groups`,
> but that is something that Joe Kington's old PR was very close to
> achieving; that PR should probably be resurrected as well. But yes, there
> seems to be material for a NEP here, and some guidance from one of the
> numpy devs would be helpful in getting this somewhere.
>
> Jaime
>
>
> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
> hoogendoorn.ee...@gmail.com> wrote:
>
>> It wouldn't hurt to have this function, but my intuition is that its use
>> will be minimal. If you are already working with sorted arrays, you already
>> have a flop cost on that order of magnitude, and the optimized merge saves
>> you a factor two at the very most. Using numpy means you are sacrificing
>> factors of two and beyond relative to pure C left, right, and center anyway,
>> so if this kind of thing matters to you, you probably won't be working in
>> numpy in the first place.
>>
>> That said, I share your interest in overhauling arraysetops. I see many
>> opportunities for expanding its functionality. There is a question that
>> amounts to 'how do I do group-by in numpy' on stackoverflow almost every
>> week. That would have my top priority, but also things like extending
>> np.unique to things like graph edges, or other more complex input, are very
>> often useful to me.
>>
>> I wrote up a draft a while ago which
>> accomplishes all of the above and more. It reimplements functions like
>> np.unique around a common Index object. This index object encapsulates the
>> precomputation (sorting) required for efficient set-ops on different
>> datatypes, and provides a common interface to obtain the kind of
>> information you are talking about (which is used extensively internally in
>> the implementation of group_by, for instance).
>>
>> i.e., this functionality allows you to write neat things like
>> group_by(randint(0,9,(100,2))).median(rand(100))
>>
>> But I have the feeling much more could be done in this direction, and I
>> feel this draft could really use a bit of back and forth. If we are going
>> to completely rewrite arraysetops, we might as well do it right.
>>
>>
>> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
>> jaime.f...@gmail.com> wrote:
>>
>>> A request was opened on github to add a `merge` function to numpy that
>>> would merge two sorted 1d arrays into a single sorted 1d array. I have been
>>> playing around with that idea for a while, and have a branch in my numpy
>>> fork that adds a `mergesorted` function to `numpy.lib`:
>>>
>>>
>>> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a
>>>
>>> I drew inspiration from C++ STL algorithms, and merged into a single
>>> function what merge, set_union, set_intersection, set_difference and
>>> set_symmetric_difference do there.
>>>
>>> My first thought when implementing this was to not make it a public
>>> function, but use it under the hood to speed up some of the functions of
>>> `arraysetops.py`, which are now merging two already sorted arrays by
>>> doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my
>>> testing, but the speed-ups weren't that great.
>>>
>>> One other thing I saw value in for some of the `arraysetops.py`
>>> functions, but couldn't fully figure out, was in providing extra output
>>> aside from the merged arrays, either in the form of indices, or of boolean
>>> masks, indicating which items of the original arrays made it into the
>>> merged one, and/or where they ended up in it.
>>>
>>> Since there is at least one other person out there that likes it, is
>>> there any more interest in such a function? If yes, any comments on what
>>> the proper interface for extra output should be? Although perhaps the best
>>> is to leave that out for starters and see what use people make of it, if
>>> any.

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Jaime Fernández del Río
Hi Eelco,

I took a deeper look into your code a couple of weeks back. I don't think I
have fully grasped everything it allows, but I agree that some form of
what you have there is highly desirable. Along the same lines, for some time
I have been thinking that the right place for a `groupby` in numpy is as a
method of ufuncs, so that `np.add.groupby(arr, groups)` would do a
multidimensional version of `np.bincount(groups, weights=arr)`. You would
then need a more powerful version of `np.unique` to produce the `groups`,
but that is something that Joe Kington's old PR was very close to
achieving; that PR should probably be resurrected as well. But yes, there
seems to be material for a NEP here, and some guidance from one of the
numpy devs would be helpful in getting this somewhere.
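
To make that concrete: np.bincount already is the grouped sum in 1d, and
np.add.at can stand in for the hypothetical np.add.groupby in the
multidimensional case (a sketch only; no such method exists):

import numpy as np

groups = np.array([0, 1, 0, 2, 1, 0])
arr = np.random.rand(6)

# 1d: exactly what bincount with weights computes
sums_1d = np.bincount(groups, weights=arr)

# multidimensional stand-in for the hypothetical np.add.groupby(arr2, groups)
arr2 = np.random.rand(6, 3)
sums_2d = np.zeros((groups.max() + 1, arr2.shape[1]))
np.add.at(sums_2d, groups, arr2)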

Jaime


On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn <
hoogendoorn.ee...@gmail.com> wrote:

> It wouldn't hurt to have this function, but my intuition is that its use
> will be minimal. If you are already working with sorted arrays, you already
> have a flop cost on that order of magnitude, and the optimized merge saves
> you a factor two at the very most. Using numpy means you are sacrificing
> factors of two and beyond relative to pure C left, right, and center anyway,
> so if this kind of thing matters to you, you probably won't be working in
> numpy in the first place.
>
> That said, I share your interest in overhauling arraysetops. I see many
> opportunities for expanding its functionality. There is a question that
> amounts to 'how do I do group-by in numpy' on stackoverflow almost every
> week. That would have my top priority, but also things like extending
> np.unique to things like graph edges, or other more complex input, are very
> often useful to me.
>
> I wrote up a draft a while ago which
> accomplishes all of the above and more. It reimplements functions like
> np.unique around a common Index object. This index object encapsulates the
> precomputation (sorting) required for efficient set-ops on different
> datatypes, and provides a common interface to obtain the kind of
> information you are talking about (which is used extensively internally in
> the implementation of group_by, for instance).
>
> i.e., this functionality allows you to write neat things like
> group_by(randint(0,9,(100,2))).median(rand(100))
>
> But I have the feeling much more could be done in this direction, and I
> feel this draft could really use a bit of back and forth. If we are going
> to completely rewrite arraysetops, we might as well do it right.
>
>
> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
> jaime.f...@gmail.com> wrote:
>
>> A request was opened on github to add a `merge` function to numpy that
>> would merge two sorted 1d arrays into a single sorted 1d array. I have been
>> playing around with that idea for a while, and have a branch in my numpy
>> fork that adds a `mergesorted` function to `numpy.lib`:
>>
>>
>> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a
>>
>> I drew inspiration from C++ STL algorithms, and merged into a single
>> function what merge, set_union, set_intersection, set_difference and
>> set_symmetric_difference do there.
>>
>> My first thought when implementing this was to not make it a public
>> function, but use it under the hood to speed up some of the functions of
>> `arraysetops.py`, which are now merging two already sorted arrays by
>> doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my
>> testing, but the speed-ups weren't that great.
>>
>> One other thing I saw value in for some of the `arraysetops.py`
>> functions, but couldn't fully figure out, was in providing extra output
>> aside from the merged arrays, either in the form of indices, or of boolean
>> masks, indicating which items of the original arrays made it into the
>> merged one, and/or where they ended up in it.
>>
>> Since there is at least one other person out there that likes it, is
>> there any more interest in such a function? If yes, any comments on what
>> the proper interface for extra output should be? Although perhaps the best
>> is to leave that out for starters and see what use people make of it, if
>> any.
>>
>> Jaime
>>
>> --
>> (\__/)
>> ( O.o)
>> ( > <) This is Conejo. Copy Conejo into your signature and help him with
>> his plans for world domination.
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
It wouldn't hurt to have this function, but my intuition is that its use
will be minimal. If you are already working with sorted arrays, you already
have a flop cost on that order of magnitude, and the optimized merge saves
you a factor two at the very most. Using numpy means you are sacrificing
factors of two and beyond relative to pure C left, right, and center anyway,
so if this kind of thing matters to you, you probably won't be working in
numpy in the first place.

That said, I share your interest in overhauling arraysetops. I see many
opportunities for expanding its functionality. There is a question that
amounts to 'how do I do group-by in numpy' on stackoverflow almost every
week. That would have my top priority, but also things like extending
np.unique to things like graph edges, or other more complex input, are very
often useful to me.

I wrote up a draft a while ago which
accomplishes all of the above and more. It reimplements functions like
np.unique around a common Index object. This index object encapsulates the
precomputation (sorting) required for efficient set-ops on different
datatypes, and provides a common interface to obtain the kind of
information you are talking about (which is used extensively internally in
the implementation of group_by, for instance).

i.e., this functionality allows you to write neat things like
group_by(randint(0,9,(100,2))).median(rand(100))

But I have the feeling much more could be done in this direction, and I
feel this draft could really use a bit of back and forth. If we are going
to completely rewrite arraysetops, we might as well do it right.
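
To give a flavor of the Index idea, a minimal hypothetical sketch (the
actual draft does much more than this):

import numpy as np

class Index(object):
    """Encapsulate the sorting precomputation behind set-ops and grouping."""
    def __init__(self, keys):
        self.keys = np.asarray(keys)
        self.sorter = np.argsort(self.keys, kind='mergesort')
        sorted_keys = self.keys[self.sorter]
        self.starts = np.flatnonzero(
            np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1])))
        self.unique = sorted_keys[self.starts]

    def group_reduce(self, values, ufunc=np.add):
        # any ufunc reduction over groups reuses the precomputed sort
        sorted_values = np.asarray(values)[self.sorter]
        return self.unique, ufunc.reduceat(sorted_values, self.starts)

Every set-op or group_by built on top of such an object then shares one
O(n log n) precomputation instead of re-sorting for each operation.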


On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> A request was opened on github to add a `merge` function to numpy that would
> merge two sorted 1d arrays into a single sorted 1d array. I have been
> playing around with that idea for a while, and have a branch in my numpy
> fork that adds a `mergesorted` function to `numpy.lib`:
>
>
> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a
>
> I drew inspiration from C++ STL algorithms, and merged into a single
> function what merge, set_union, set_intersection, set_difference and
> set_symmetric_difference do there.
>
> My first thought when implementing this was to not make it a public
> function, but use it under the hood to speed up some of the functions of
> `arraysetops.py`, which are now merging two already sorted arrays by
> doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my
> testing, but the speed-ups weren't that great.
>
> One other thing I saw value in for some of the `arraysetops.py` functions,
> but couldn't fully figure out, was in providing extra output aside from the
> merged arrays, either in the form of indices, or of boolean masks,
> indicating which items of the original arrays made it into the merged one,
> and/or where they ended up in it.
>
> Since there is at least one other person out there that likes it, is there
> any more interest in such a function? If yes, any comments on what the
> proper interface for extra output should be? Although perhaps the best is
> to leave that out for starters and see what use people make of it, if any.
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Conejo. Copy Conejo into your signature and help him with
> his plans for world domination.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should concatenate broadcast shapes?

2014-08-27 Thread Jaime Fernández del Río
On Wed, Aug 27, 2014 at 10:01 AM, Robert Kern  wrote:

> On Wed, Aug 27, 2014 at 5:44 PM, Jaime Fernández del Río wrote:
> > After reading this stackoverflow question:
> >
> >
> http://stackoverflow.com/questions/25530223/append-a-list-at-the-end-of-each-row-of-2d-array
> >
> > I was reminded that the `np.concatenate` family of functions do not
> > broadcast the shapes of their inputs:
> >
> > >>> import numpy as np
> > >>> a = np.arange(6).reshape(3, 2)
> > >>> b = np.arange(6, 8)
> > >>> np.concatenate((a, b), axis=1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > ValueError: all the input arrays must have same number of dimensions
> > >>> np.concatenate((a, b[None]), axis=1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > ValueError: all the input array dimensions except for the concatenation
> > axis must match exactly
> > >>> np.concatenate((a, np.tile(b[None], (a.shape[0], 1))), axis=1)
> > array([[0, 1, 6, 7],
> >        [2, 3, 6, 7],
> >        [4, 5, 6, 7]])
>
> In my experience, when I get that ValueError, it has usually been a
> legitimate error on my part and broadcasting would not have
> accomplished what I wanted. Typically, I forgot to transpose
> something. If we allowed broadcasting, my most common source of errors
> using these functions would silently do something unintended.
>

That makes sense; I kind of figured there had to be a reason. So though it
may be beating a dead horse, perhaps adding a `broadcast=False` argument to
the function would do the trick? No side effects unless you ask for them,
in which case you had it coming...
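
As a sketch of what that opt-in keyword could look like (hypothetical, not
numpy API; it broadcasts only over the non-concatenated axes, reusing the
zero-stride trick from the original post):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def concatenate_bc(arrays, axis=0, broadcast=False):
    arrays = [np.asarray(a) for a in arrays]
    if not broadcast:
        return np.concatenate(arrays, axis=axis)
    ndim = max(a.ndim for a in arrays)
    # prepend length-1 axes so all inputs have the same rank
    arrays = [a.reshape((1,) * (ndim - a.ndim) + a.shape) for a in arrays]
    # target length on every axis except the concatenation axis
    target = [max(a.shape[ax] for a in arrays) for ax in range(ndim)]
    views = []
    for a in arrays:
        shape, strides = [], []
        for ax in range(ndim):
            if ax == axis or a.shape[ax] == target[ax]:
                shape.append(a.shape[ax])
                strides.append(a.strides[ax])
            elif a.shape[ax] == 1:
                # stretch length-1 axes with a zero stride
                shape.append(target[ax])
                strides.append(0)
            else:
                raise ValueError("shapes are not broadcastable")
        views.append(as_strided(a, tuple(shape), tuple(strides)))
    return np.concatenate(views, axis=axis)

With the arrays from the earlier example, concatenate_bc((a, b), axis=1,
broadcast=True) gives the same 3x4 result as the as_strided version, while
the default behavior stays as strict as today.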

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available

2014-08-27 Thread Julian Taylor
Hello,

Almost punctually for EuroScipy, we have finally managed to release the
first release candidate of NumPy 1.9.
We intend to fix only bugs until the final release, which we plan to do
in the next 1-2 weeks.

In this release numerous performance improvements have been added. Most
significantly, the indexing code has been rewritten to be several times
faster for most cases, and the performance of using small arrays and
scalars has almost doubled.
Plenty of other functions have been improved too: nonzero, where,
count_nonzero, floating point min/max, boolean argmin/argmax,
searchsorted, triu/tril, and masked sorting can be expected to perform
significantly better in many cases.

NumPy now also releases the GIL for more functions; most notably,
indexing now releases it, and the random module's state object has a
private lock instead of using the GIL. This allows leveraging pure
Python threads more efficiently.

To make working with arrays containing NaN values easier, nanmedian and
nanpercentile have been added, which ignore these values. These functions,
as well as the regular median and percentile, now also support the
generalized axis arguments that ufuncs already have; these allow reducing
along multiple axes in one call.
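
For example (a small usage sketch of the new functions):

import numpy as np

a = np.random.rand(3, 4, 5)
a[0, 0, 0] = np.nan

m1 = np.nanmedian(a, axis=0)        # shape (4, 5); the NaN is ignored
m2 = np.nanmedian(a, axis=(0, 1))   # generalized axis: two axes in one call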

Please see the release notes for all the details. Please also take note
of the many small compatibility notes and deprecations.
https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst

The source tarballs and win32 binaries can be downloaded here:
https://sourceforge.net/projects/numpy/files/NumPy/1.9.0rc1

Cheers,
Julian Taylor



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Jaime Fernández del Río
A request was opened on github to add a `merge` function to numpy that would
merge two sorted 1d arrays into a single sorted 1d array. I have been
playing around with that idea for a while, and have a branch in my numpy
fork that adds a `mergesorted` function to `numpy.lib`:

https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a

I drew inspiration from C++ STL algorithms, and merged into a single
function what merge, set_union, set_intersection, set_difference and
set_symmetric_difference do there.

My first thought when implementing this was to not make it a public
function, but use it under the hood to speed up some of the functions of
`arraysetops.py`, which are now merging two already sorted arrays by
doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my
testing, but the speed-ups weren't that great.

One other thing I saw value in for some of the `arraysetops.py` functions,
but couldn't fully figure out, was in providing extra output aside from the
merged arrays, either in the form of indices, or of boolean masks,
indicating which items of the original arrays made it into the merged one,
and/or where they ended up in it.

Since there is at least one other person out there that likes it, is there
any more interest in such a function? If yes, any comments on what the
proper interface for extra output should be? Although perhaps the best is
to leave that out for starters and see what use people make of it, if any.
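
For reference, the core idea plus one possible form of the extra output can
be sketched in pure numpy (illustrative only; the branch above differs in
both interface and implementation):

import numpy as np

def mergesorted_sketch(a, b):
    # merge two sorted 1d arrays, also reporting which outputs came from a
    c = np.concatenate((a, b))
    order = np.argsort(c, kind='mergesort')  # stable: ties keep a's items first
    merged = c[order]
    from_a = order < len(a)                  # boolean-mask flavor of extra output
    return merged, from_a

a = np.array([1, 3, 5])
b = np.array([2, 3, 4])
merged, from_a = mergesorted_sketch(a, b)
# merged -> [1 2 3 3 4 5], from_a -> [ True False  True False False  True]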

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should concatenate broadcast shapes?

2014-08-27 Thread Robert Kern
On Wed, Aug 27, 2014 at 5:44 PM, Jaime Fernández del Río wrote:
> After reading this stackoverflow question:
>
> http://stackoverflow.com/questions/25530223/append-a-list-at-the-end-of-each-row-of-2d-array
>
> I was reminded that the `np.concatenate` family of functions do not
> broadcast the shapes of their inputs:
>
> >>> import numpy as np
> >>> a = np.arange(6).reshape(3, 2)
> >>> b = np.arange(6, 8)
> >>> np.concatenate((a, b), axis=1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: all the input arrays must have same number of dimensions
> >>> np.concatenate((a, b[None]), axis=1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: all the input array dimensions except for the concatenation axis
> must match exactly
> >>> np.concatenate((a, np.tile(b[None], (a.shape[0], 1))), axis=1)
> array([[0, 1, 6, 7],
>        [2, 3, 6, 7],
>        [4, 5, 6, 7]])

In my experience, when I get that ValueError, it has usually been a
legitimate error on my part and broadcasting would not have
accomplished what I wanted. Typically, I forgot to transpose
something. If we allowed broadcasting, my most common source of errors
using these functions would silently do something unintended.


a = np.arange(6).reshape(3, 2)
b = np.arange(6, 9)  # b.shape == (3,)
# I *intend* to append b as a new column, but forget to make b.shape==(3,1)
c = np.hstack([a, b])

# If hstack() doesn't broadcast, that will fail and show me my error.
# If it does broadcast, it "succeeds" but gives me something I didn't want:
array([[0, 1, 6, 7, 8],
       [2, 3, 6, 7, 8],
       [4, 5, 6, 7, 8]])

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Should concatenate broadcast shapes?

2014-08-27 Thread Jaime Fernández del Río
After reading this stackoverflow question:

http://stackoverflow.com/questions/25530223/append-a-list-at-the-end-of-each-row-of-2d-array

I was reminded that the `np.concatenate` family of functions do not
broadcast the shapes of their inputs:

>>> import numpy as np
>>> a = np.arange(6).reshape(3, 2)
>>> b = np.arange(6, 8)
>>> np.concatenate((a, b), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: all the input arrays must have same number of dimensions
>>> np.concatenate((a, b[None]), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: all the input array dimensions except for the concatenation
axis must match exactly
>>> np.concatenate((a, np.tile(b[None], (a.shape[0], 1))), axis=1)
array([[0, 1, 6, 7],
       [2, 3, 6, 7],
       [4, 5, 6, 7]])

But there doesn't seem to be any fundamental reason why they shouldn't:

>>> from numpy.lib.stride_tricks import as_strided
>>> b_ = as_strided(b, (a.shape[0],)+b.shape, (0,)+b.strides)
>>> np.concatenate((a, b_), axis=1)
array([[0, 1, 6, 7],
       [2, 3, 6, 7],
       [4, 5, 6, 7]])

Is there any fundamental interface design reason why things are the way
they are? Or is it simply that no one has implemented broadcasting for
these functions? Without thinking much about it, I am +1 on doing this...
At the least, it would probably be good to add a note to the docs
explaining why broadcasting is not implemented.

Jaime




-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Convert 3d NumPy array into 2d

2014-08-27 Thread Sebastian Berg
On Thu, 2014-08-28 at 00:08 +0900, phinn stuart wrote:
> Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array
> into (480L, 1440L)?
> 

Just slice it: arr[0, ...] will do the trick. If you are daring,
np.squeeze also works, or of course np.reshape.
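
All three, side by side (illustrative):

import numpy as np

arr = np.zeros((1, 480, 1440))
a1 = arr[0, ...]                 # slicing: drops exactly the leading axis
a2 = np.squeeze(arr)             # drops *all* length-1 axes
a3 = arr.reshape(480, 1440)      # explicit target shape
assert a1.shape == a2.shape == a3.shape == (480, 1440)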

- Sebastian

> 
> Thanks in advance.
> 
> phinn
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Convert 3d NumPy array into 2d

2014-08-27 Thread Julian Taylor
On 27.08.2014 17:08, phinn stuart wrote:
> Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into
> (480L, 1440L)?
> 
> Thanks in advance.

np.squeeze removes singleton (length-1) dimensions:

In [2]: np.squeeze(np.ones((1,23,232))).shape
Out[2]: (23, 232)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Convert 3d NumPy array into 2d

2014-08-27 Thread Benjamin Root
There is also np.squeeze(), which will eliminate any singleton dimensions
(but I personally hate using it because it can accidentally squeeze out
dimensions that you didn't intend to squeeze when you have arbitrary input
data).

Ben Root


On Wed, Aug 27, 2014 at 11:12 AM, Wagner Sebastian <
sebastian.wagner...@ait.ac.at> wrote:

> Hi,
>
> Our short example-data:
>
> >>> data = np.arange(10).reshape(1,5,2)
> >>> data
> array([[[0, 1],
>         [2, 3],
>         [4, 5],
>         [6, 7],
>         [8, 9]]])
>
> Shape is (1,5,2)
>
> Two possibilities:
>
> >>> data.reshape(5,2)
> array([[0, 1],
>        [2, 3],
>        [4, 5],
>        [6, 7],
>        [8, 9]])
>
> Or just:
>
> >>> data[0]
> array([[0, 1],
>        [2, 3],
>        [4, 5],
>        [6, 7],
>        [8, 9]])
>
> From: numpy-discussion-boun...@scipy.org
> [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of phinn stuart
> Sent: Wednesday, 27 August 2014 17:09
> To: python-l...@python.org; scipy-u...@scipy.org; numpy-discussion@scipy.org
> Subject: [Numpy-discussion] Convert 3d NumPy array into 2d
>
> Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into
> (480L, 1440L)?
>
> Thanks in advance.
>
> phinn
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Convert 3d NumPy array into 2d

2014-08-27 Thread Wagner Sebastian
Hi,

Our short example-data:

>>> data = np.arange(10).reshape(1,5,2)
>>> data
array([[[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]])

Shape is (1,5,2)

Two possibilities:

>>> data.reshape(5,2)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

Or just:

>>> data[0]
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])


From: numpy-discussion-boun...@scipy.org 
[mailto:numpy-discussion-boun...@scipy.org] On Behalf Of phinn stuart
Sent: Wednesday, 27 August 2014 17:09
To: python-l...@python.org; scipy-u...@scipy.org; numpy-discussion@scipy.org
Subject: [Numpy-discussion] Convert 3d NumPy array into 2d

Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into (480L, 
1440L)?

Thanks in advance.

phinn
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Convert 3d NumPy array into 2d

2014-08-27 Thread phinn stuart
Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into
(480L, 1440L)?

Thanks in advance.

phinn
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters

2014-08-27 Thread Derek Homeier
On 26 Aug 2014, at 09:05 pm, Adrian Altenhoff wrote:

>> But you are right that the problem with using the first_values, which
>> should of course be valid, somehow stems from the use of usecols; it
>> seems that in the loop
>>
>>    for (i, conv) in user_converters.items():
>>
>> i in user_converters and in usecols get out of sync. This certainly
>> looks like a bug; the entire way of modifying i inside the loop appears
>> a bit dangerous to me. I'll have a look to see if I can make this safer.
> Thanks.
>> 
>> As long as your data don’t actually contain any missing values you might 
>> also simply use np.loadtxt.
> Ok, wasn't aware of that function so far. I will try that!
> 
It was first_values that needed to be addressed by the original indices.
I have created a short test from your case and submitted a fix at
https://github.com/numpy/numpy/pull/5006

Cheers,
Derek
 
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion