Re: [Numpy-discussion] GSoC?

2016-02-16 Thread Stephan Hoyer
On Wed, Feb 10, 2016 at 4:22 PM, Chris Barker  wrote:

> We might consider adding "improve duck typing for numpy arrays"
>>
>
> care to elaborate on that one?
>
> I know it has come up on here that it would be good to have some code in
> numpy itself that made it easier to make array-like objects (i.e., do
> indexing the same way). Is that what you mean?
>

I was thinking particularly of improving the compatibility of numpy
functions (e.g., concatenate) with non-numpy array-like objects, but now
that you mention it utilities to make it easier to make array-like objects
could also be a good thing.

In any case, I've now elaborated on my thought into a full project idea on
the Wiki:
https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas#improved-duck-typing-support-for-n-dimensional-arrays

Arguably, this might be too difficult for most GSoC students -- the API
design questions here are quite contentious. But given that "Pythonic
dtypes" is still up there as a GSoC proposal, it's in good company.
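A concrete example of the compatibility gap (the `MyArray` class here is hypothetical, just to illustrate the coercion):

```python
import numpy as np

class MyArray:
    """A minimal array-like: sized and indexable, but not an ndarray."""
    def __init__(self, data):
        self._data = list(data)
    def __len__(self):
        return len(self._data)
    def __getitem__(self, i):
        return self._data[i]

# np.concatenate coerces its inputs to plain ndarrays, so the duck
# type is lost -- the result is an ndarray, not a MyArray:
out = np.concatenate([MyArray([1, 2]), MyArray([3, 4])])
print(type(out))     # <class 'numpy.ndarray'>
print(out.tolist())  # [1, 2, 3, 4]
```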

Cheers,
Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Windows wheels for testing

2016-02-16 Thread Matthew Brett
On Sat, Feb 13, 2016 at 9:55 AM, Jonathan Helmus  wrote:
>
>
> On 2/12/16 10:23 PM, Matthew Brett wrote:
>>
>> On Fri, Feb 12, 2016 at 8:18 PM, R Schumacher  wrote:
>>>
>>> At 03:45 PM 2/12/2016, you wrote:

 PS C:\tmp> c:\Python35\python -m venv np-testing
 PS C:\tmp> .\np-testing\Scripts\Activate.ps1
 (np-testing) PS C:\tmp> pip install -f
 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds numpy nose
>>>
>>>
>>> C:\Python34\Scripts>pip install  "D:\Python
>>> distros\numpy-1.10.4-cp34-none-win_amd64.whl"
>>> Unpacking d:\python distros\numpy-1.10.4-cp34-none-win_amd64.whl
>>> Installing collected packages: numpy
>>> Successfully installed numpy
>>> Cleaning up...
>>>
>>> C:\Python34\Scripts>..\python
>>> Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:16:31) [MSC v.1600 64
>>> bit
>>> (AMD64)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>
>> import numpy
>> numpy.test()
>>>
>>> Running unit tests for numpy
>>> NumPy version 1.10.4
>>> NumPy relaxed strides checking option: False
>>> NumPy is installed in C:\Python34\lib\site-packages\numpy
>>> Python version 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:16:31) [MSC
>>> v.1600 64 bit (AMD64)]
>>> nose version 1.3.7
>>>
>>> ...FS... [long runs of test progress dots elided; the output contains
>>> one 'F' (failure) and several 'S' (skips)]
>>> C:\Python34\lib\unittest\case.py:162: DeprecationWarning: using a
>>> non-integer number instead of an integer will result in an error in
>>> the future
>>>   callable_obj(*args, **kwargs)
>>> [the same DeprecationWarning is emitted several more times; remaining
>>> progress dots elided]

Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread josef.pktd
On Tue, Feb 16, 2016 at 2:48 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Please correct me if I misunderstood, but the code in that commit is
> doing a full sort, somewhat similar to what
> `scipy.stats.scoreatpercentile` does. If that is correct, I will run some
> benchmarks first, but I think there is value in going forward with a
> numpy version that extends the current partitioning scheme.
>

I think so, but it's hiding inside pandas groupby, which also uses a hash,
IIUC.
AFAICS, the main reason it's implemented this way is to get correct tie
handling.

There could be large performance differences depending on whether there are
many ties (discretized data) or only unique floats.

(just guessing)

Josef



>
> - Joe
>
> On Tue, Feb 16, 2016 at 2:39 PM,   wrote:
> >
> >
> > On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz
> >  wrote:
> >>
> >> Thanks for pointing me to that. I had something a bit different in
> >> mind but that definitely looks like a good start.
> >>
> >> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee 
> >> wrote:
> >> > See earlier discussion here:
> https://github.com/numpy/numpy/issues/6326
> >> > Basically, naïvely sorting may be faster than a not-so-optimized
> version
> >> > of
> >> > quickselect.
> >> >
> >> > Antony
> >> >
> >> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz
> >> > :
> >> >>
> >> >> I would like to add a `weights` keyword to `np.partition`,
> >> `np.percentile` and `np.median`. My reason for doing so is to allow
> >> >> `np.histogram` to process automatic bin selection with weights.
> >> >> Currently, weights are not supported for the automatic bin selection
> >> >> and would be difficult to support in `auto` mode without having
> >> >> `np.percentile` support a `weights` keyword. I suspect that there are
> >> >> many other uses for such a feature.
> >> >>
> >> >> I have taken a preliminary look at the C implementation of the
> >> >> partition functions that are the basis for `partition`, `median` and
> >> >> `percentile`. I think that it would be possible to add versions (or
> >> >> just extend the functionality of existing ones) that check the ratio
> >> >> of the weights below the partition point to the total sum of the
> >> >> weights instead of just counting elements.
> >> >>
> >> >> One of the main advantages of such an implementation is that it would
> >> >> allow any real weights to be handled correctly, not just integers.
> >> >> Complex weights would not be supported.
> >> >>
> >> >> The purpose of this email is to see if anybody objects, has ideas or
> >> >> cares at all about this proposal before I spend a significant amount
> >> >> of time working on it. For example, did I miss any functions in my
> >> >> list?
> >> >>
> >> >> Regards,
> >> >>
> >> >> -Joe
> >
> >
> >
> > statsmodels just got weighted quantiles
> > https://github.com/statsmodels/statsmodels/pull/2707
> >
> > I didn't try to figure out its computational efficiency, and we would
> > gladly delegate to whatever fast algorithm would be in numpy.
> >
> > Josef
> >
> >


Re: [Numpy-discussion] building NumPy with gcc if Python was built with icc?!?

2016-02-16 Thread Nathaniel Smith
In principle this should work (offer may be void on windows which has its
own special weirdnesses, but I assume you're not on windows). icc and gcc
should both support the same calling conventions and so forth. It sounds
like you're just running into an annoying build system configuration issue
where Python likes to remember the compiler options used to build Python,
and then when it's time to build extension modules, the extension
modules ask distutils what to do and distutils tells them to use these
remembered options to build themselves as well. So you want to look into
what compiler flag defaults are being exported by your python build, and
figure out some way to make it export the ones you want instead of the
defaults. I don't think there's anything really numpy specific about this,
since it's about cpython's own build system plus stdlib -- I'd try asking
on python-list or so.
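One way to see what got remembered, from Python itself (a sketch; the environment-variable override in the comment is a common workaround, not guaranteed to be honored by every distutils version):

```python
import sysconfig

# CPython records the compiler and flags it was built with; distutils
# consults these values when compiling extension modules such as numpy.
print(sysconfig.get_config_var("CC"))      # compiler used to build Python
print(sysconfig.get_config_var("CFLAGS"))  # flags that may be icc-specific

# A common workaround is to override them for a single build, e.g.:
#   CC=gcc CFLAGS="-O2 -fPIC" pip install numpy
# (assumption: your distutils/setuptools version honors these variables)
```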

-n
On Feb 16, 2016 11:40 AM, "BERGER Christian" 
wrote:

> Hi All,
>
>
>
> Here's a potentially dumb question: is it possible to build NumPy with
> gcc, if python was built with icc?
>
> Right now, the build is failing in the toolchain check phase, because gcc
> doesn't know how to handle icc-specific c flags (like -fp-model, prec-sqrt,
> ...)
>
> In our environment we're providing an embedded python that our customers
> should be able to use and extend with 3rd party modules (like numpy).
> Problem is that our sw is built using icc, but we don't want to force our
> customers to do the same and we also don't want to build every possible 3rd
> party module for our customers.
>
>
>
> Thanks for your help,
>
> Christian
>
>
>
> This email and any attachments are intended solely for the use of the
> individual or entity to whom it is addressed and may be confidential and/or
> privileged.
>
> If you are not one of the named recipients or have received this email in
> error,
>
> (i) you should not read, disclose, or copy it,
>
> (ii) please notify sender of your receipt by reply email and delete this
> email and all attachments,
>
> (iii) Dassault Systemes does not accept or assume any liability or
> responsibility for any use of or reliance on this email.
>
> For other languages, go to http://www.3ds.com/terms/email-disclaimer
>


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread Joseph Fox-Rabinovitz
Please correct me if I misunderstood, but the code in that commit is
doing a full sort, somewhat similar to what
`scipy.stats.scoreatpercentile` does. If that is correct, I will run some
benchmarks first, but I think there is value in going forward with a
numpy version that extends the current partitioning scheme.
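As a point of comparison, a sort-based weighted percentile can be sketched in pure numpy (the cumulative-weight interpolation rule used here is one of several reasonable conventions, and an assumption of this sketch; the proposed C implementation would use partitioning instead of a full sort):

```python
import numpy as np

def weighted_percentile(values, q, weights):
    """Sort-based weighted percentile (full sort, not partitioning).

    The cumulative-weight interpolation rule is an assumption; other
    conventions exist and give slightly different results.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    # Cumulative weight fraction at the center of each sorted sample
    cw = (np.cumsum(w) - 0.5 * w) / np.sum(w)
    return np.interp(q / 100.0, cw, v)

# With unit weights this agrees with np.percentile for mid-range q:
print(weighted_percentile([1, 2, 3, 4], 50, [1, 1, 1, 1]))  # 2.5
```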

- Joe

On Tue, Feb 16, 2016 at 2:39 PM,   wrote:
>
>
> On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz
>  wrote:
>>
>> Thanks for pointing me to that. I had something a bit different in
>> mind but that definitely looks like a good start.
>>
>> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee 
>> wrote:
>> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
>> > Basically, naïvely sorting may be faster than a not-so-optimized version
>> > of
>> > quickselect.
>> >
>> > Antony
>> >
>> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz
>> > :
>> >>
>> >> I would like to add a `weights` keyword to `np.partition`,
> >> `np.percentile` and `np.median`. My reason for doing so is to allow
>> >> `np.histogram` to process automatic bin selection with weights.
>> >> Currently, weights are not supported for the automatic bin selection
>> >> and would be difficult to support in `auto` mode without having
>> >> `np.percentile` support a `weights` keyword. I suspect that there are
>> >> many other uses for such a feature.
>> >>
>> >> I have taken a preliminary look at the C implementation of the
>> >> partition functions that are the basis for `partition`, `median` and
>> >> `percentile`. I think that it would be possible to add versions (or
>> >> just extend the functionality of existing ones) that check the ratio
>> >> of the weights below the partition point to the total sum of the
>> >> weights instead of just counting elements.
>> >>
>> >> One of the main advantages of such an implementation is that it would
>> >> allow any real weights to be handled correctly, not just integers.
>> >> Complex weights would not be supported.
>> >>
>> >> The purpose of this email is to see if anybody objects, has ideas or
>> >> cares at all about this proposal before I spend a significant amount
>> >> of time working on it. For example, did I miss any functions in my
>> >> list?
>> >>
>> >> Regards,
>> >>
>> >> -Joe
>
>
>
> statsmodels just got weighted quantiles
> https://github.com/statsmodels/statsmodels/pull/2707
>
> I didn't try to figure out its computational efficiency, and we would
> gladly delegate to whatever fast algorithm would be in numpy.
>
> Josef
>
>


Re: [Numpy-discussion] building NumPy with gcc if Python was built with icc?!?

2016-02-16 Thread G Young
I'm not sure about anyone else, but having been playing around with both
gcc and icc, I'm afraid you might be out of luck.  Is there any reason why
you can't use a Python distribution built with gcc?

On Tue, Feb 16, 2016 at 7:39 PM, BERGER Christian 
wrote:

> Hi All,
>
>
>
> Here's a potentially dumb question: is it possible to build NumPy with
> gcc, if python was built with icc?
>
> Right now, the build is failing in the toolchain check phase, because gcc
> doesn't know how to handle icc-specific c flags (like -fp-model, prec-sqrt,
> ...)
>
> In our environment we're providing an embedded python that our customers
> should be able to use and extend with 3rd party modules (like numpy).
> Problem is that our sw is built using icc, but we don't want to force our
> customers to do the same and we also don't want to build every possible 3rd
> party module for our customers.
>
>
>
> Thanks for your help,
>
> Christian
>
>
>


[Numpy-discussion] building NumPy with gcc if Python was built with icc?!?

2016-02-16 Thread BERGER Christian
Hi All,

Here's a potentially dumb question: is it possible to build NumPy with gcc, if 
python was built with icc?
Right now, the build is failing in the toolchain check phase, because gcc 
doesn't know how to handle icc-specific c flags (like -fp-model, prec-sqrt, ...)
In our environment we're providing an embedded python that our customers should 
be able to use and extend with 3rd party modules (like numpy). Problem is that 
our sw is built using icc, but we don't want to force our customers to do the 
same and we also don't want to build every possible 3rd party module for our 
customers.

Thanks for your help,
Christian




Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread josef.pktd
On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Thanks for pointing me to that. I had something a bit different in
> mind but that definitely looks like a good start.
>
> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee 
> wrote:
> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
> > Basically, naïvely sorting may be faster than a not-so-optimized version
> of
> > quickselect.
> >
> > Antony
> >
> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com>:
> >>
> >> I would like to add a `weights` keyword to `np.partition`,
> >> `np.percentile` and `np.median`. My reason for doing so is to allow
> >> `np.histogram` to process automatic bin selection with weights.
> >> Currently, weights are not supported for the automatic bin selection
> >> and would be difficult to support in `auto` mode without having
> >> `np.percentile` support a `weights` keyword. I suspect that there are
> >> many other uses for such a feature.
> >>
> >> I have taken a preliminary look at the C implementation of the
> >> partition functions that are the basis for `partition`, `median` and
> >> `percentile`. I think that it would be possible to add versions (or
> >> just extend the functionality of existing ones) that check the ratio
> >> of the weights below the partition point to the total sum of the
> >> weights instead of just counting elements.
> >>
> >> One of the main advantages of such an implementation is that it would
> >> allow any real weights to be handled correctly, not just integers.
> >> Complex weights would not be supported.
> >>
> >> The purpose of this email is to see if anybody objects, has ideas or
> >> cares at all about this proposal before I spend a significant amount
> >> of time working on it. For example, did I miss any functions in my
> >> list?
> >>
> >> Regards,
> >>
> >> -Joe


statsmodels just got weighted quantiles
https://github.com/statsmodels/statsmodels/pull/2707

I didn't try to figure out its computational efficiency, and we would
gladly delegate to whatever fast algorithm would be in numpy.

Josef


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread Joseph Fox-Rabinovitz
Thanks for pointing me to that. I had something a bit different in
mind but that definitely looks like a good start.

On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee  wrote:
> See earlier discussion here: https://github.com/numpy/numpy/issues/6326
> Basically, naïvely sorting may be faster than a not-so-optimized version of
> quickselect.
>
> Antony
>
> 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz :
>>
>> I would like to add a `weights` keyword to `np.partition`,
>> `np.percentile` and `np.median`. My reason for doing so is to allow
>> `np.histogram` to process automatic bin selection with weights.
>> Currently, weights are not supported for the automatic bin selection
>> and would be difficult to support in `auto` mode without having
>> `np.percentile` support a `weights` keyword. I suspect that there are
>> many other uses for such a feature.
>>
>> I have taken a preliminary look at the C implementation of the
>> partition functions that are the basis for `partition`, `median` and
>> `percentile`. I think that it would be possible to add versions (or
>> just extend the functionality of existing ones) that check the ratio
>> of the weights below the partition point to the total sum of the
>> weights instead of just counting elements.
>>
>> One of the main advantages of such an implementation is that it would
>> allow any real weights to be handled correctly, not just integers.
>> Complex weights would not be supported.
>>
>> The purpose of this email is to see if anybody objects, has ideas or
>> cares at all about this proposal before I spend a significant amount
>> of time working on it. For example, did I miss any functions in my
>> list?
>>
>> Regards,
>>
>> -Joe


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread Antony Lee
See earlier discussion here: https://github.com/numpy/numpy/issues/6326
Basically, naïvely sorting may be faster than a not-so-optimized version of
quickselect.

Antony

2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz :

> I would like to add a `weights` keyword to `np.partition`,
> `np.percentile` and `np.median`. My reason for doing so is to allow
> `np.histogram` to process automatic bin selection with weights.
> Currently, weights are not supported for the automatic bin selection
> and would be difficult to support in `auto` mode without having
> `np.percentile` support a `weights` keyword. I suspect that there are
> many other uses for such a feature.
>
> I have taken a preliminary look at the C implementation of the
> partition functions that are the basis for `partition`, `median` and
> `percentile`. I think that it would be possible to add versions (or
> just extend the functionality of existing ones) that check the ratio
> of the weights below the partition point to the total sum of the
> weights instead of just counting elements.
>
> One of the main advantages of such an implementation is that it would
> allow any real weights to be handled correctly, not just integers.
> Complex weights would not be supported.
>
> The purpose of this email is to see if anybody objects, has ideas or
> cares at all about this proposal before I spend a significant amount
> of time working on it. For example, did I miss any functions in my
> list?
>
> Regards,
>
> -Joe


Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-16 Thread Sérgio
Just something I tried with pandas:

>>> image
array([[[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],

   [[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],

   [[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])

>>> label
array([[0, 1, 2, 3, 4],
   [1, 2, 3, 4, 5],
   [2, 3, 4, 5, 6],
   [3, 4, 5, 6, 7]])

>>> dt = pd.DataFrame(np.vstack((label.ravel(), image.reshape(3, 20))).T)
>>> labelled_image = dt.groupby(0)

>>> labelled_image.mean().values
array([[ 0, 20, 40],
   [ 3, 23, 43],
   [ 6, 26, 46],
   [ 9, 29, 49],
   [10, 30, 50],
   [13, 33, 53],
   [16, 36, 56],
   [19, 39, 59]])

Sergio


> Date: Sat, 13 Feb 2016 22:41:13 -0500
> From: Allan Haldane 
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] [Suggestion] Labelled Array
> Message-ID: <56bff759.7010...@gmail.com>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Impressive!
>
> Possibly there's still a case for including a 'groupby' function in
> numpy itself since it's a generally useful operation, but I do see less
> of a need given the nice pandas functionality.
>
> At least, next time someone asks a stackoverflow question like the ones
> below someone should tell them to use pandas!
>
> (copied from my gist for future list reference).
>
> http://stackoverflow.com/questions/4373631/sum-array-by-number-in-numpy
>
> http://stackoverflow.com/questions/31483912/split-numpy-array-according-to-values-in-the-array-a-condition/31484134#31484134
>
> http://stackoverflow.com/questions/31863083/python-split-numpy-array-based-on-values-in-the-array
>
> http://stackoverflow.com/questions/28599405/splitting-an-array-into-two-smaller-arrays-in-python
>
> http://stackoverflow.com/questions/7662458/how-to-split-an-array-according-to-a-condition-in-numpy
>
> Allan
>
>
> On 02/13/2016 01:39 PM, Jeff Reback wrote:
> > In [10]: pd.options.display.max_rows=10
> >
> > In [13]: np.random.seed(1234)
> >
> > In [14]: c = np.random.randint(0,32,size=10)
> >
> > In [15]: v = np.arange(10)
> >
> > In [16]: df = DataFrame({'v' : v, 'c' : c})
> >
> > In [17]: df
> > Out[17]:
> >  c  v
> > 0  15  0
> > 1  19  1
> > 2   6  2
> > 3  21  3
> > 4  12  4
> > ........
> > 5   7  5
> > 6   2  6
> > 7  27  7
> > 8  28  8
> > 9   7  9
> >
> > [10 rows x 2 columns]
> >
> > In [19]: df.groupby('c').count()
> > Out[19]:
> > v
> > c
> > 0   3136
> > 1   3229
> > 2   3093
> > 3   3121
> > 4   3041
> > ..   ...
> > 27  3128
> > 28  3063
> > 29  3147
> > 30  3073
> > 31  3090
> >
> > [32 rows x 1 columns]
> >
> > In [20]: %timeit df.groupby('c').count()
> > 100 loops, best of 3: 2 ms per loop
> >
> > In [21]: %timeit df.groupby('c').mean()
> > 100 loops, best of 3: 2.39 ms per loop
> >
> > In [22]: df.groupby('c').mean()
> > Out[22]:
> > v
> > c
> > 0   49883.384885
> > 1   50233.692165
> > 2   48634.116069
> > 3   50811.743992
> > 4   50505.368629
> > ..   ...
> > 27  49715.349425
> > 28  50363.501469
> > 29  50485.395933
> > 30  50190.155223
> > 31  50691.041748
> >
> > [32 rows x 1 columns]
> >
> >
> > On Sat, Feb 13, 2016 at 1:29 PM,  > > wrote:
> >
> >
> >
> > On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane
> > > wrote:
> >
> > Sorry, to reply to myself here, but looking at it with fresh
> > eyes maybe the performance of the naive version isn't too bad.
> > Here's a comparison of the naive vs a better implementation:
> >
> > # (snippets below assume a `from numpy import *`-style namespace)
> > def split_classes_naive(c, v):
> >  return [v[c == u] for u in unique(c)]
> >
> > def split_classes(c, v):
> >  perm = c.argsort()
> >  csrt = c[perm]
> >  div = where(csrt[1:] != csrt[:-1])[0] + 1
> >  return [v[x] for x in split(perm, div)]
> >
> > >>> c = randint(0,32,size=10)
> > >>> v = arange(10)
> > >>> %timeit split_classes_naive(c,v)
> > 100 loops, best of 3: 8.4 ms per loop
> > >>> %timeit split_classes(c,v)
> > 100 loops, best of 3: 4.79 ms per loop
> >
> >
> > The usecases I recently started to target for similar things are 1
> > Million or more rows and 1 uniques in the labels.
> > The second version should be faster for large number of uniques, I
> > guess.
> >
> > Overall numpy is falling far behind pandas in terms of simple
> > groupby operations. bincount and histogram (IIRC) worked for some
> > cases but are rather limited.
> >
> > reduce_at looks nice for cases where it applies.
> >
> > In contrast to 
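The bincount-based approach mentioned above can be sketched as follows (limited to non-negative integer labels, which is part of why it is "rather limited"):

```python
import numpy as np

def groupby_mean(labels, values):
    """Group means via bincount; works only for non-negative int labels."""
    counts = np.bincount(labels)                  # group sizes
    sums = np.bincount(labels, weights=values)    # per-group sums
    return sums / counts

c = np.array([0, 1, 0, 2, 1])
v = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(groupby_mean(c, v))  # [20. 35. 40.]
```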

Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-16 Thread Sebastian Berg
On Di, 2016-02-16 at 00:13 -0500, josef.p...@gmail.com wrote:
> 
> 
> On Tue, Feb 16, 2016 at 12:09 AM,  wrote:
> > 
> > 

> > 
> > 
> > Or, it forces everyone to watch out for the color of the ducks :)
> > 
> > It's just a number, whether it's python scalar, numpy scalar, 1D or
> > 2D.
> > And once we squeeze, we cannot iterate over it anymore. 
> > 
> > 
> > This looks like the last problem with have in statsmodels master.
> > Part of the reason that 0.10 hurt quite a bit is that we are using
> > in statsmodels some of the grey zones so we don't have to commit to
> > a specific usage. Even if a user or developer tries a "weird" case,
> > it works for most of the results, but breaks in some unknown
> > places. 
> > 
> > 
> I meant 1.11 here.
>  

The reason for this part is that `arr[np.array([1])]` is very different
from `arr[np.array(1)]`. For `list[np.array([1])]` if you allow
`operator.index(np.array([1]))` you will not get equivalent results for
lists and arrays.

The normal array result cannot work for lists. We had open bug reports
about it. Of course I dislike it in any case ;), but that is the
reasoning behind being a bit more restrictive for `__index__`.
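The difference Sebastian describes can be seen directly (a small illustration with arbitrary values):

```python
import numpy as np

arr = np.arange(5) * 10  # [ 0 10 20 30 40]

# A 1-element array index keeps a dimension (selects a subarray):
print(arr[np.array([1])])  # [10]

# A 0-d array index acts like a scalar index:
print(arr[np.array(1)])    # 10

# For a plain list, allowing operator.index(np.array([1])) would have to
# collapse the 1-element array to a scalar -- a different result than
# the array case above:
print([0, 10, 20, 30, 40][1])  # 10
```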

- Sebastian


> > (In the current case a cryptic exception would be raised if the
> > user has two constant columns in the regression. Which is fine for
> > some usecases but not for every result.)
> > 
> > Josef
> >  
> > > 
> > > Chuck
> > > 


Re: [Numpy-discussion] Fwd: Numexpr-3.0 proposal

2016-02-16 Thread Francesc Alted
2016-02-16 10:04 GMT+01:00 Robert McLeod :

> On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
> gregor.thalham...@gmail.com> wrote:
>
>>
>> Dear Robert,
>>
>> thanks for your effort on improving numexpr. Indeed, vectorized math
>> libraries (VML) can give a large boost in performance (~5x), except for a
>> couple of basic operations (add, mul, div), which current compilers are
>> able to vectorize automatically. With recent gcc even more functions are
>> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
>> special flags depending on the platform (SSE, AVX present?), runtime
>> detection of processor capabilities would be nice for distributing
>> binaries. Some time ago, since I lost access to Intels MKL, I patched
>> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
>> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
>>
>> As you increased the opcode size, I could imagine providing a bit to
>> switch (during runtime) between internal functions and vectorized ones,
>> that would be handy for tests and benchmarks.
>>
>
> Dear Gregor,
>
> Your suggestion to separate the opcode signature from the library used to
> execute it is very clever. Based on your suggestion, I think that the
> natural evolution of the opcodes is to specify them by function signature
> and library, using a two-level dict, i.e.
>
> numexpr.interpreter.opcodes['exp_f8f8f8'][gnu] = some_enum
> numexpr.interpreter.opcodes['exp_f8f8f8'][msvc] = some_enum +1
> numexpr.interpreter.opcodes['exp_f8f8f8'][vml] = some_enum + 2
> numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum +3
>

Yes, with a two-level dictionary you can access the functions implementing
opcodes much faster, and hence add many more opcodes without too much
slow-down.
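A minimal sketch of such a registry (all names here are hypothetical
illustrations, not the actual numexpr internals):

```python
# Sketch of a two-level opcode registry: function signature -> library -> enum.
# All names are hypothetical illustrations, not the real numexpr internals.
from itertools import count

_next_enum = count()
opcodes = {}

def register(signature, library):
    """Assign the next free enum value to the (signature, library) pair."""
    opcodes.setdefault(signature, {})[library] = next(_next_enum)

for lib in ('gnu', 'msvc', 'vml', 'yeppp'):
    register('exp_f8f8f8', lib)

# Lookup is two dict accesses, regardless of how many opcodes are registered.
print(opcodes['exp_f8f8f8']['vml'])   # -> 2
```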


>
> I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If
> I do it the way you suggested, funccodes.hpp and the many #define's for
> function codes in the interpreter can hopefully be removed, which would
> simplify the overall codebase. One could potentially take it a step
> further and plan (optimize) each expression, similar to what FFTW does with
> regards to matrix shape. That is, the basic way to control the library
> would be with a singleton library argument, i.e.:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2)", lib=vml )
>
> However, we could also permit a tuple to be passed in, where each element
> of the tuple reflects the library to use for each operation in the AST tree:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2)",
>                       lib=(gnu,gnu,gnu,yeppp,gnu) )
>
> In this case the ops are (mul,mul,div,log,mul).  The op-code picking is
> done by the Python side, and this tuple could be potentially optimized by
> numexpr rather than hand-optimized, by trying various permutations of the
> linked C math libraries. The wisdom from the planning could be pickled and
> saved in a wisdom file.  Currently Numexpr has cacheDict in util.py but
> there's no reason this can't be pickled and saved to disk. I've done a
> similar thing by creating wrappers for PyFFTW already.
>

I like the idea of having numexpr probe the various permutations of linked
C math libraries during the initial iteration and then cache the result
somehow. That will probably require run-time detection of the available C
math libraries (a numexpr binary should be able to run on different machines
with different libraries and computing capabilities), but in exchange it
will allow the fastest execution paths independently of the machine that
runs the code.
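A toy sketch of what such probing could look like, in the spirit of FFTW's
planning (fake_evaluate is a dummy stand-in for the real numexpr virtual
machine; none of these names are real numexpr API):

```python
# Toy "planner": brute-force time every per-op library assignment and keep
# the fastest one.  fake_evaluate is a dummy stand-in for the real kernels.
import itertools
import timeit

def fake_evaluate(expr, lib):
    # Pretend different library choices have different costs.
    return sum(len(l) for l in lib) + len(expr)

def plan(expr, n_ops, libraries=('gnu', 'vml', 'yeppp')):
    """Return the fastest library tuple found by exhaustive timing."""
    best, best_time = None, float('inf')
    for combo in itertools.product(libraries, repeat=n_ops):
        t = timeit.timeit(lambda: fake_evaluate(expr, combo), number=100)
        if t < best_time:
            best, best_time = combo, t
    return best

best = plan("A*log(foo**2 / bar**2)", n_ops=5)
print(best)  # the fastest assignment found on this machine
```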

-- 
Francesc Alted


[Numpy-discussion] Fwd: Numexpr-3.0 proposal

2016-02-16 Thread Robert McLeod
On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
gregor.thalham...@gmail.com> wrote:

>
> Dear Robert,
>
> thanks for your effort on improving numexpr. Indeed, vectorized math
> libraries (VML) can give a large boost in performance (~5x), except for a
> couple of basic operations (add, mul, div), which current compilers are
> able to vectorize automatically. With recent gcc even more functions are
> vectorized (see https://sourceware.org/glibc/wiki/libmvec), but you need
> special flags depending on the platform (SSE, AVX present?); runtime
> detection of processor capabilities would be nice for distributing
> binaries. Some time ago, since I lost access to Intel's MKL, I patched
> numexpr to use Accelerate/vecLib on OS X, which is preinstalled on every
> Mac; see the veclib_support branch of https://github.com/geggo/numexpr.git.
>
> As you increased the opcode size, I could imagine providing a bit to
> switch (during runtime) between internal functions and vectorized ones,
> that would be handy for tests and benchmarks.
>

Dear Gregor,

Your suggestion to separate the opcode signature from the library used to
execute it is very clever. Based on your suggestion, I think that the
natural evolution of the opcodes is to specify them by function signature
and library, using a two-level dict, i.e.

numexpr.interpreter.opcodes['exp_f8f8f8'][gnu] = some_enum
numexpr.interpreter.opcodes['exp_f8f8f8'][msvc] = some_enum +1
numexpr.interpreter.opcodes['exp_f8f8f8'][vml] = some_enum + 2
numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum +3

I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If I
do it the way you suggested, funccodes.hpp and the many #define's for
function codes in the interpreter can hopefully be removed, which would
simplify the overall codebase. One could potentially take it a step
further and plan (optimize) each expression, similar to what FFTW does with
regards to matrix shape. That is, the basic way to control the library
would be with a singleton library argument, i.e.:

result = ne.evaluate( "A*log(foo**2 / bar**2)", lib=vml )

However, we could also permit a tuple to be passed in, where each element
of the tuple reflects the library to use for each operation in the AST tree:

result = ne.evaluate( "A*log(foo**2 / bar**2)", lib=(gnu,gnu,gnu,yeppp,gnu) )

In this case the ops are (mul,mul,div,log,mul).  The op-code picking is
done by the Python side, and this tuple could be potentially optimized by
numexpr rather than hand-optimized, by trying various permutations of the
linked C math libraries. The wisdom from the planning could be pickled and
saved in a wisdom file.  Currently Numexpr has cacheDict in util.py but
there's no reason this can't be pickled and saved to disk. I've done a
similar thing by creating wrappers for PyFFTW already.
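The pickling idea might look something like the following (file name and
wisdom structure are hypothetical, not existing numexpr code):

```python
# Hypothetical sketch of persisting planning "wisdom" (the best library
# tuple found per expression) to disk with pickle, as described above.
import os
import pickle
import tempfile

WISDOM_FILE = os.path.join(tempfile.gettempdir(), 'numexpr_wisdom.pkl')

def load_wisdom():
    # Return the cached wisdom dict, or an empty dict on a cold start.
    try:
        with open(WISDOM_FILE, 'rb') as fh:
            return pickle.load(fh)
    except (OSError, pickle.PickleError, EOFError):
        return {}

def save_wisdom(wisdom):
    with open(WISDOM_FILE, 'wb') as fh:
        pickle.dump(wisdom, fh)

wisdom = load_wisdom()
wisdom["A*log(foo**2 / bar**2)"] = ('gnu', 'gnu', 'gnu', 'yeppp', 'gnu')
save_wisdom(wisdom)
```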

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch 
robbmcl...@gmail.com


Re: [Numpy-discussion] Numexpr-3.0 proposal

2016-02-16 Thread Robert McLeod
On Mon, Feb 15, 2016 at 7:28 AM, Ralf Gommers 
wrote:

>
>
> On Sun, Feb 14, 2016 at 11:19 PM, Robert McLeod 
> wrote:
>
>>
>> 4.) I took a stab at converting from distutils to setuptools, but this
>> seems challenging with numpy as a dependency. I wonder if anyone has tried
>> monkey-patching so that setup.py build_ext uses distutils and then pass the
>> interpreter.pyd/so as a data file, or some other such chicanery?
>>
>
> Not sure what you mean, since numexpr already uses setuptools:
> https://github.com/pydata/numexpr/blob/master/setup.py#L22. What is the
> real goal you're trying to achieve?
>
> This monkeypatching is a bad idea:
> https://github.com/robbmcleod/numexpr/blob/numexpr-3.0/setup.py#L19. Both
> setuptools and numpy.distutils already do that, and that's already one too
> many. So you definitely don't want to add a third place. You can use the
> -j (--parallel) flag to numpy.distutils instead, see
> http://docs.scipy.org/doc/numpy-dev/user/building.html#parallel-builds
>
> Ralf
>

Dear Ralf,

Yes, this appears to be a bad idea.  I was just trying to work out whether
I could use the more object-oriented approach that I am familiar with from
setuptools to easily build wheels for PyPI.  Thanks for the comments and
links; I didn't know I could parallelize the numpy build.

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch 
robbmcl...@gmail.com