Re: [Numpy-discussion] Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

2021-03-15 Thread PIERRE AUGIER

----- Original Message -----
> From: "Juan Nunez-Iglesias"
> To: "numpy-discussion"
> Sent: Sunday 14 March 2021 07:15:39
> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5
> and 0.20 explaining a perf regression with Pythran

> Hi Pierre,
> 
> If you’re able to compile NumPy locally and you have reliable benchmarks, you
> can write a script that tests the runtime of your benchmark and reports it as 
> a
> test pass/fail. You can then use “git bisect run” to automatically find the
> commit that caused the issue. That will help narrow down the discussion before
> it gets completely derailed a second time. 
> 
> [ https://lwn.net/Articles/317154/ | https://lwn.net/Articles/317154/ ]
> 
> Juan.

Thanks a lot for this advice Juan! I wasn't able to use Git but with `hg 
bisect` I managed to find that the first "bad" commit is

https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f  
 ENH: implement NEP-35's `like=` argument (gh-16935)

From the point of view of my benchmark, this commit changes the behavior of 
arr.copy(): the resulting arrays do not give the same performance. This makes 
sense because the commit is indeed about array creation.
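
For reference, the bisection can be automated with a small script driven by 
`git bisect run` (or `hg bisect --command`). Here is a minimal sketch; the 
`bench.py` call and the 0.30 s threshold are placeholders to adapt to the real 
benchmark and machine:

import subprocess
import sys
import time

# Rebuild NumPy from the current checkout (placeholder; adapt to your setup).
subprocess.run(
    [sys.executable, "-m", "pip", "install", ".", "--no-build-isolation"],
    check=True,
)

start = time.perf_counter()
subprocess.run([sys.executable, "bench.py"], check=True)
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.3f} s")

# Exit code 0 means "good" for `git bisect run`; codes 1-124 mean "bad".
sys.exit(0 if elapsed < 0.30 else 1)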

I haven't yet studied this commit in detail (it is quite big and not simple) 
and I'm not sure I'll be able to understand it, and in particular why it leads 
to such a performance regression!

Cheers,

Pierre

> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER wrote:
> 
> Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy
> --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy 1.20.1.
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution, while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() and np.copy(arr) do not give the same result, with arr obtained
> from a Pandas dataframe with arr = df.values. It's strange because
> type(df.values) gives numpy.ndarray, so I would expect arr.copy() and
> np.copy(arr) to give exactly the same result.
> 
> Note that I think I'm doing quite serious and reproducible benchmarks. I also
> checked that this regression is reproducible on another computer.
> 
> Cheers,
> 
> Pierre
> 
> [...]

Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

2021-03-12 Thread PIERRE AUGIER
Hi,

I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy 
--force-reinstall` and I can reproduce the regression.

Good news, I was able to reproduce the difference with only Numpy 1.20.1. 

Arrays prepared with (`df` is a Pandas dataframe)

arr = df.values.copy()

or 

arr = np.ascontiguousarray(df.values)

lead to "slow" execution while arrays prepared with

arr = np.copy(df.values)

lead to faster execution.

arr.copy() and np.copy(arr) do not give the same result, with arr obtained from 
a Pandas dataframe with arr = df.values. It's strange because type(df.values) 
gives numpy.ndarray, so I would expect arr.copy() and np.copy(arr) to give 
exactly the same result.
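
Here is a minimal sketch of the kind of check I mean (values, flags and 
data-pointer alignment of the two copies); the small random dataframe is just 
a stand-in for the real data:

import numpy as np
import pandas as pd

# Stand-in dataframe; the real data comes from a file in the actual benchmark.
df = pd.DataFrame(np.random.rand(1000, 4), columns=list("abcd"))

arr0 = df.values.copy()
arr1 = np.copy(df.values)

print(np.array_equal(arr0, arr1))  # the values are identical
print(arr0.flags)                  # C_CONTIGUOUS, OWNDATA, ALIGNED, ...
print(arr1.flags)
# Offset of the first element within a 64-byte block: a different placement
# relative to cache-line/SIMD boundaries could explain a speed difference.
print(arr0.__array_interface__["data"][0] % 64,
      arr1.__array_interface__["data"][0] % 64)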

Note that I think I'm doing quite serious and reproducible benchmarks. I also 
checked that this regression is reproducible on another computer.

Cheers,

Pierre

----- Original Message -----
> From: "Sebastian Berg"
> To: "numpy-discussion"
> Sent: Friday 12 March 2021 22:50:24
> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5
> and 0.20 explaining a perf regression with Pythran

> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>> Hi,
>> 
>> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
>> could explain a performance regression (~15 %) with Pythran.
>> 
>> I observe this regression with the script
>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>> 
>> Pythran reimplements Numpy so it is not about Numpy code for
>> computation. However, Pythran of course uses the native array
>> contained in a Numpy array. I'm quite sure that something has changed
>> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
>> since I don't get the same performance with Numpy 0.20. I checked
>> that the values in the arrays are the same and that the flags
>> characterizing the arrays are also the same.
>> 
>> Good news, I'm now able to obtain the performance difference just
>> with Numpy 0.19.5. In this code, I load the data with Pandas and need
>> to prepare contiguous Numpy arrays to give them to Pythran. With
>> Numpy 0.19.5, if I use np.copy I get better performance than with
>> np.ascontiguousarray. With Numpy 0.20, both functions create arrays
>> giving the same performance with Pythran (again, less good than with
>> Numpy 0.19.5).
>> 
>> Note that this code is very efficient (more than 100 times faster
>> than using Numpy), so I guess that things like alignment or memory
>> location can lead to such a difference.
>> 
>> More details in this issue
>> https://github.com/serge-sans-paille/pythran/issues/1735
>> 
>> Any help to understand what has changed would be greatly appreciated!
>> 
> 
> If you want to really dig into this, it would be good to do profiling
> to find out at where the differences are.
> 
> Without that, I don't have much appetite to investigate personally. The
> reason is that fluctuations of ~30% (or even much more) when running
> the NumPy benchmarks are very common.
> 
> I am not aware of an immediate change in NumPy, especially since you
> are talking about Pythran, and only the memory space or the interface code
> should matter.
> As to the interface code... I would expect it to be quite a bit faster,
> not slower.
> There was no change around data allocation, so at best what you are
> seeing is a different pattern in how the "small array cache" ends up
> being used.
> 
> 
> Unfortunately, getting stable benchmarks that reflect code changes
> exactly is tough...  Here is a nice blog post from Victor Stinner where
> he had to go as far as using "profile guided compilation" to avoid
> fluctuations:
> 
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> 
> I somewhat hope that this is also the reason for the huge fluctuations
> we see in the NumPy benchmarks due to absolutely unrelated code
> changes.
> But I did not have the energy to try it (and a probably fixed bug in
> gcc makes it a bit harder right now).
> 
> Cheers,
> 
> Sebastian
> 
> 
> 
> 
>> Cheers,
>> Pierre


[Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

2021-03-12 Thread PIERRE AUGIER
Hi,

I'm looking for a difference between Numpy 0.19.5 and 0.20 which could explain 
a performance regression (~15 %) with Pythran.

I observe this regression with the script 
https://github.com/paugier/nbabel/blob/master/py/bench.py

Pythran reimplements Numpy so it is not about Numpy code for computation. 
However, Pythran of course uses the native array contained in a Numpy array. 
I'm quite sure that something has changed between Numpy 0.19.5 and 0.20 (or 
between the corresponding wheels?) since I don't get the same performance with 
Numpy 0.20. I checked that the values in the arrays are the same and that the 
flags characterizing the arrays are also the same.

Good news, I'm now able to obtain the performance difference just with Numpy 
0.19.5. In this code, I load the data with Pandas and need to prepare 
contiguous Numpy arrays to give them to Pythran. With Numpy 0.19.5, if I use 
np.copy I get better performance than with np.ascontiguousarray. With Numpy 
0.20, both functions create arrays giving the same performance with Pythran 
(again, less good than with Numpy 0.19.5).

Note that this code is very efficient (more than 100 times faster than using 
Numpy), so I guess that things like alignment or memory location can lead to 
such a difference.

More details in this issue 
https://github.com/serge-sans-paille/pythran/issues/1735

Any help to understand what has changed would be greatly appreciated!

Cheers,
Pierre


[Numpy-discussion] Type annotation for Numpy arrays, accelerators and numpy.typing

2021-02-16 Thread PIERRE AUGIER
Hi,

When Numpy 1.20 was released, I discovered numpy.typing and its documentation 
https://numpy.org/doc/stable/reference/typing.html

I know that it is very new but I'm a bit lost. A good API to describe array 
types would be useful not only for type checkers but also for Python 
accelerators using ndarrays (in particular Pythran, Numba, Cython, Transonic).

For Transonic, I'd like to be able to use numpy.typing internally to have a 
better implementation of what we need in transonic.typing (in particular, one 
compatible with type checkers like MyPy).

However, it seems that I can't do anything with what I see today in 
numpy.typing.
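
For reference, here is a minimal sketch with what numpy.typing offers in 1.20 
as far as I can tell (essentially ArrayLike and DTypeLike): fine for type 
checkers, but without the dtype/ndim/order information an accelerator needs:

import numpy as np
from numpy.typing import ArrayLike, DTypeLike

def as_float_array(data: ArrayLike, dtype: DTypeLike = np.float64) -> np.ndarray:
    # Accepts anything array-like; nothing is said about ndim or memory order.
    return np.asarray(data, dtype=dtype)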

For Python-Numpy accelerators, we need to be able to define precise array types 
to limit the compilation time and give useful hints for optimizations (ndim, 
partial or full shape). We also need fused types.

What can be done with Transonic is described in these pages: 
https://transonic.readthedocs.io/en/latest/examples/type_hints.html and 
https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html

I think it would be good to be able to do things like that with numpy.typing. 
It may be already possible but I can't find how in the doc.

I can give a few examples here. First, a very simple one:

from transonic import Array

Af3d = Array[float, "3d"]

# Note that this can also be written without Array just as
Af3d = "float[:,:,:]"

# same thing but only contiguous C ordered
Af3d = Array[float, "3d", "C"]

Note: being able to limit the compilation to C-ordered arrays is very important 
since it can drastically decrease the compilation time/memory, and some 
numerical kernels are anyway written to be efficient only with C (or Fortran) 
ordered arrays.

# 2d color image
A_im = Array[np.int16, "[:,:,3]"]

Now, fused types. This example is taken from a real life case 
(https://foss.heptapod.net/fluiddyn/fluidsim/-/blob/branch/default/fluidsim/base/time_stepping/pseudo_spect.py)
 so it's really useful in practice.

import numpy as np

from transonic import Type, NDim, Array, Union

N = NDim(2, 3, 4)
A = Array[np.complex128, N, "C"]
Am1 = Array[np.complex128, N - 1, "C"]

N123 = NDim(1, 2, 3)
A123c = Array[np.complex128, N123, "C"]
A123f = Array[np.float64, N123, "C"]

T = Type(np.float64, np.complex128)
A1 = Array[T, N, "C"]
A2 = Array[T, N - 1, "C"]
ArrayDiss = Union[A1, A2]
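
Just to make the idea concrete, here is a strawman sketch (an assumption of 
mine, not an existing numpy.typing API) of how such metadata could be attached 
with standard typing.Annotated (Python >= 3.9):

from typing import Annotated

import numpy as np

# Strawman only: the ("ndim", ...) and ("order", ...) markers are invented for
# this example. A type checker sees a plain np.ndarray, while an accelerator
# could inspect the extra metadata.
Af3dC = Annotated[np.ndarray, np.float64, ("ndim", 3), ("order", "C")]

def compute(arr: Af3dC) -> float:
    return float(arr.sum())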

To summarize, type annotations are already used, and will increasingly be used, 
for Python-Numpy accelerators. It would be good to also consider this 
application when designing numpy.typing.

Cheers,
Pierre


Re: [Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python

2021-01-26 Thread PIERRE AUGIER
Hi,

Here are some preliminary results of an experiment on energy consumption 
measurement at Grid'5000 
(https://www.grid5000.fr/w/Energy_consumption_monitoring_tutorial). The goal is 
to gather enough material to submit a serious comment on a recent article 
published in Nature Astronomy (Zwart, 2020), which recommends abandoning the 
use and teaching of Python because of the ecological impact of computing.

The attached figure shows the CO2 production as a function of elapsed time for 
different implementations (https://github.com/paugier/nbabel). It can be 
compared with 
https://raw.githubusercontent.com/paugier/nbabel/master/py/fig/fig_ecolo_impact_transonic.png
 taken from Zwart (2020).

The implementations labeled "nbabel.org" have been found on 
https://www.nbabel.org/ and have recently been used in the article by Zwart 
(2020). Note that these C++, Fortran and Julia implementations are not well 
optimized. However, I think they are representative of many C++, Fortran and 
Julia codes written by scientists.

There is one simple Pythran implementation which is really fast (slightly 
slower than the fastest implementation in Julia but it is not a big deal).

Note that of course these results do not show that Python is faster than C++! 
We just show here that it's easy to write very efficient implementations of 
numerically intensive problems in Python.

I think I will soon have enough material to contact the editors to ask how we 
can publicly answer this comment published in their journal. With this work 
https://github.com/paugier/nbabel (proper energy consumption measurements), I 
clearly demonstrate that the basis of Zwart's comment is factually wrong (his 
benchmark and Figure 3), so most of his conclusions are also wrong.

I'd like to know if some people involved in the community are willing to be 
co-authors of this potential reply.

Cheers,
Pierre

--
Pierre Augier - CR CNRS http://www.legi.grenoble-inp.fr
LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
BP53, 38041 Grenoble Cedex, France    tel: +33.4.56.52.86.16

----- Original Message -----
> From: "PIERRE AUGIER"
> To: "numpy-discussion"
> Sent: Tuesday 24 November 2020 16:47:05
> Subject: [Numpy-discussion] Comment published in Nature Astronomy about The
> ecological impact of computing with Python

> Hi,
> 
> I recently took a bit of time to study the comment "The ecological impact of
> high-performance computing in astrophysics" published in Nature Astronomy
> (Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y,
> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best however,
> for the environment is to abandon Python for a more environmentally friendly
> (compiled) programming language.".
> 
> I wrote a simple Python-Numpy implementation of the problem used for this 
> study
> (https://www.nbabel.org) and, accelerated by Transonic-Pythran, it's very
> efficient. Here are some numbers (elapsed times in s, smaller is better):
> 
>| # particles |  Py | C++ | Fortran | Julia |
>|-|-|-|-|---|
>| 1024|  29 |  55 |   41|   45  |
>| 2048| 123 | 231 |  166|  173  |
> 
> The code and a modified figure are here: https://github.com/paugier/nbabel
> (There is no check on the results for https://www.nbabel.org, so one still has
> to be very careful.)
> 
> I think that the Numpy community should spend a bit of energy to show what can
> be done with the existing tools to get very high performance (and low CO2
> production) with Python. This work could be the basis of a serious reply to 
> the
> comment by Zwart (2020).
> 
> Unfortunately the Python solution in https://www.nbabel.org is very bad in 
> terms
> of performance (and therefore CO2 production). It is also true for most of the
> Python solutions for the Computer Language Benchmarks Game in
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes here
> https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
> 
> We could try to fix this so that people see that in many cases, it is not
> necessary to "abandon Python for a more environmentally friendly (compiled)
> programming language". One of the longest and hardest task would be to
> implement the different cases of the Computer Language Benchmarks Game in
> standard and modern Python-Numpy. Then, optimizing and accelerating such code
> should be doable and we should be able to get very good performance at least
> for some cases. Good news for this project, (i) the first point can be done by
> anyone with good knowledge in Python-Numpy (many potential workers), (ii) for
> some cases, there are already good Python implementations and (iii) the work
> ca

[Numpy-discussion] Showing by examples how Python-Numpy can be efficient even for computationally intensive tasks

2020-11-26 Thread PIERRE AUGIER
me decreases to 8 s (a x85 speedup with Pythran 0.9.8!).

I conclude from these types of results that we need to tell Python users how to 
accelerate their Python-Numpy codes when they feel the need for it. I think 
acceleration tools should be mentioned on the Numpy website. I also think we 
should spend a bit of energy playing some benchmark games.

It would be much better if we could change the widespread idea about Python 
performance for numerical problems from "Python is very slow and ineffective 
for most algorithms" to "interpreted Python can be very slow but, with the 
existing Python accelerators, one can be extremely efficient with Python".

Pierre

> 
> On Tue, Nov 24, 2020 at 11:12 AM Ilhan Polat <ilhanpo...@gmail.com> wrote:
> 
> Do we have to take it seriously to start with? Because, with absolutely no
> offense meant, I am having significant difficulty doing so.
> 
> On Tue, Nov 24, 2020 at 4:58 PM PIERRE AUGIER
> <pierre.aug...@univ-grenoble-alpes.fr> wrote:
> 
> 
> Hi,
> 
> I recently took a bit of time to study the comment "The ecological impact of
> high-performance computing in astrophysics" published in Nature Astronomy
> (Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y,
> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best however,
> for the environment is to abandon Python for a more environmentally friendly
> (compiled) programming language.".
> 
> I wrote a simple Python-Numpy implementation of the problem used for this
> study (https://www.nbabel.org) and, accelerated by Transonic-Pythran, it's
> very efficient. Here are some numbers (elapsed times in s, smaller is better):
> 
>| # particles | Py | C++ | Fortran | Julia |
>|-|-|-|-|---|
>| 1024 | 29 | 55 | 41 | 45 |
>| 2048 | 123 | 231 | 166 | 173 |
> 
> The code and a modified figure are here: https://github.com/paugier/nbabel
> (There is no check on the results for https://www.nbabel.org, so one still
> has to be very careful.)
> 
> I think that the Numpy community should spend a bit of energy to show what can
> be done with the existing tools to get very high performance (and low CO2
> production) with Python. This work could be the basis of a serious reply to 
> the
> comment by Zwart (2020).
> 
> Unfortunately the Python solution in https://www.nbabel.org is very bad in
> terms of performance (and therefore CO2 production). It is also true for most
> of the Python solutions for the Computer Language Benchmarks Game in
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes here
> https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
> 
> We could try to fix this so that people see that in many cases, it is not
> necessary to "abandon Python for a more environmentally friendly (compiled)
> programming language". One of the longest and hardest task would be to
> implement the different cases of the Computer Language Benchmarks Game in
> standard and modern Python-Numpy. Then, optimizing and accelerating such code
> should be doable and we should be able to get very good performance at least
> for some cases. Good news for this project, (i) the first point can be done by
> anyone with good knowledge in Python-Numpy (many potential workers), (ii) for
> some cases, there are already good Python implementations and (iii) the work
> can easily be parallelized.
> 
> It is not a criticism, but the (beautiful and very nice) new Numpy website
> https://numpy.org/ is not very convincing in terms of
> performance. It's written "Performant The core of NumPy is well-optimized C
> code. Enjoy the flexibility of Python with the speed of compiled code." It's
> true that the core of Numpy is well-optimized C code but to seriously compete
> with C++, Fortran or Julia in terms of numerical performance, one needs to use
> other tools to move the compiled-interpreted boundary outside the hot loops. 
> So
> it could be reasonable to mention such tools (in particular Numba, Pythran,
> Cython and Transonic).
> 
> Is there already something planned to answer to Zwart (2020)?
> 
> Any opinions or suggestions on this potential project?
> 
> Pierr

[Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python

2020-11-24 Thread PIERRE AUGIER
Hi,

I recently took a bit of time to study the comment "The ecological impact of 
high-performance computing in astrophysics" published in Nature Astronomy 
(Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y, 
https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best however, 
for the environment is to abandon Python for a more environmentally friendly 
(compiled) programming language.".

I wrote a simple Python-Numpy implementation of the problem used for this study 
(https://www.nbabel.org) and, accelerated by Transonic-Pythran, it's very 
efficient. Here are some numbers (elapsed times in s, smaller is better):

| # particles |  Py | C++ | Fortran | Julia |
|-|-|-|-|---|
| 1024|  29 |  55 |   41|   45  |
| 2048| 123 | 231 |  166|  173  |

The code and a modified figure are here: https://github.com/paugier/nbabel 
(There is no check on the results for https://www.nbabel.org, so one still has 
to be very careful.)
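
For context, accelerating a function with Transonic looks roughly like this (a 
minimal sketch; `advance_positions` is an illustrative example, not the actual 
nbabel code):

from transonic import boost

@boost
def advance_positions(positions: "float[:,:]", velocities: "float[:,:]", dt: float):
    # Plain Python-Numpy code; with Transonic, the annotated function can be
    # compiled ahead of time by Pythran (or another backend) and falls back to
    # normal Python when no compiled extension is available.
    positions += dt * velocities
    return positions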

I think that the Numpy community should spend a bit of energy to show what can 
be done with the existing tools to get very high performance (and low CO2 
production) with Python. This work could be the basis of a serious reply to the 
comment by Zwart (2020).

Unfortunately the Python solution in https://www.nbabel.org is very bad in 
terms of performance (and therefore CO2 production). It is also true for most 
of the Python solutions for the Computer Language Benchmarks Game in 
https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes here 
https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).

We could try to fix this so that people see that in many cases, it is not 
necessary to "abandon Python for a more environmentally friendly (compiled) 
programming language". One of the longest and hardest task would be to 
implement the different cases of the Computer Language Benchmarks Game in 
standard and modern Python-Numpy. Then, optimizing and accelerating such code 
should be doable and we should be able to get very good performance at least 
for some cases. Good news for this project: (i) the first point can be done by 
anyone with good knowledge in Python-Numpy (many potential workers), (ii) for 
some cases, there are already good Python implementations and (iii) the work 
can easily be parallelized.

It is not a criticism, but the (beautiful and very nice) new Numpy website 
https://numpy.org/ is not very convincing in terms of performance. It's written 
"Performant The core of NumPy is well-optimized C code. Enjoy the flexibility 
of Python with the speed of compiled code." It's true that the core of Numpy is 
well-optimized C code but to seriously compete with C++, Fortran or Julia in 
terms of numerical performance, one needs to use other tools to move the 
compiled-interpreted boundary outside the hot loops. So it could be reasonable 
to mention such tools (in particular Numba, Pythran, Cython and Transonic).

Is there already something planned to answer Zwart (2020)?

Any opinions or suggestions on this potential project?

Pierre

PS: Of course, alternative Python interpreters (PyPy, GraalPython, Pyjion, 
Pyston, etc.) could also be used, especially if HPy 
(https://github.com/hpyproject/hpy) is successful (C core of Numpy written in 
HPy, Cython able to produce HPy code, etc.). However, I tend to be a bit 
skeptical about the ability of such technologies to reach very high performance 
for low-level Numpy code (performance that can be reached by replacing whole 
Python functions with optimized compiled code). Of course, I hope I'm wrong! 
IMHO, it does not remove the need for a successful HPy!

--
Pierre Augier - CR CNRS http://www.legi.grenoble-inp.fr
LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
BP53, 38041 Grenoble Cedex, France    tel: +33.4.56.52.86.16


Re: [Numpy-discussion] Transonic Vision: unifying Python-Numpy accelerators

2019-11-12 Thread PIERRE AUGIER


> Date: Wed, 6 Nov 2019 23:49:08 -0500
> From: Ralf Gommers 
> To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Transonic Vision: unifying
>   Python-Numpy accelerators
> 
> On Mon, Nov 4, 2019 at 4:54 PM PIERRE AUGIER <
> pierre.aug...@univ-grenoble-alpes.fr> wrote:
> 
>> Dear Python-Numpy community,
>>
>> Transonic is a pure Python package to easily accelerate modern
>> Python-Numpy code with different accelerators (currently Cython, Pythran
>> and Numba).
>>
>> I'm trying to get some funding for this project. The related work would
>> benefit in particular to Cython, Numba, Pythran and Xtensor.
>>
>> To obtain this funding, we really need some feedback from some people
>> knowing the subject of performance with Python-Numpy code.
>>
>> That's one of the reason why we wrote this long and serious text on
>> Transonic Vision: http://tiny.cc/transonic-vision. We describe some
>> issues (perf for numerical kernels, incompatible accelerators, community
>> split between experts and simple users, ...) and possible improvements.
>>
> 
> Thanks Pierre, that's a very interesting vision paper.

Thanks Ralf for this kind and interesting reply!

> 
> In case you haven't seen it, there was a discussion on the pandas-dev
> mailing list a couple of weeks ago about adopting Numba as a dependency
> (and issues with that).
> 
> Your comment on my assessment from 1.5 years ago being a little unfair to
> Pythran may be true - not sure it was at the time, but Pythran seems to
> mature nicely.
> 
> The ability to switch between just-in-time and ahead-of-time compilation is
> nice. One thing I noticed is that this actual switching is not completely
> fluent: the jit and boost decorators have different signatures, and there's
> no way to globally switch behavior (say with an env var, as for backend
> selection).

Yes, it seems evident now but I forgot to update the jit decorators when I was 
working on the boost decorator.  
My first "targets" for Transonic are packages for which the ahead-of-time mode 
seems more adapted.

This incompatibility between the 2 main decorators used in Transonic will soon 
be fixed!

Regarding the way to globally switch behavior, I'll open a dedicated issue.

>> Help would be very much appreciated.
>>
> 
> I'd be interested to help think about adoption and/or funding.
> 
> Cheers,
> Ralf
>

As you've seen with the jit/boost incompatibility, I guess API design would be 
better if people knowing the subject could be included in some discussions.

For example, I had to design the Python API for type declaration of arrays (see 
https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html) 
since I didn't find anything suitable. My implementation is not great either, 
since types in transonic.typing and in `typing` are currently not compatible! 
(However, it won't be difficult to fix that.)

Another API design that needs to be thought about concerns user-defined types 
in Transonic. This is for the future because Pythran does not currently support 
that, but I think we will have to be able to support a kind of dataclass, 
something like the equivalent of a C struct (corresponding to Cython `cdef 
class` and Numba `jitclass`).
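
To give an idea of the Numba side (a minimal sketch; the `Point` class is just 
an example, not an existing Transonic API):

from numba import float64
from numba.experimental import jitclass  # `from numba import jitclass` in older versions

# A struct-like user-defined type, roughly what a Transonic "dataclass" would
# have to map onto for the Numba backend.
@jitclass([("x", float64), ("y", float64)])
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5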

A more theoretical subject that would be interesting to investigate is the 
subset of Python-Numpy that can and should be implemented by accelerators. For 
example, I think a function having different branches with different types for 
the returned objects depending on runtime values cannot be rewritten as 
efficient modern C++.
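
A tiny made-up example of what I mean:

import numpy as np

def load(n, as_array):
    # The return type depends on a runtime value, so there is no single static
    # C++ signature this function could be compiled to.
    if as_array:
        return np.zeros(n)  # numpy.ndarray
    return [0.0] * n        # Python list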

If you know people potentially interested to discuss about these subjects, 
please tell me.

Cheers,
Pierre



Re: [Numpy-discussion] Transonic Vision: unifying Python-Numpy accelerators

2019-11-04 Thread PIERRE AUGIER
Dear Python-Numpy community,

Transonic is a pure Python package to easily accelerate modern Python-Numpy 
code with different accelerators (currently Cython, Pythran and Numba).

I'm trying to get some funding for this project. The related work would benefit 
Cython, Numba, Pythran and Xtensor in particular.

To obtain this funding, we really need some feedback from some people knowing 
the subject of performance with Python-Numpy code.

That's one of the reasons why we wrote this long and serious text on Transonic 
Vision: http://tiny.cc/transonic-vision. We describe some issues (perf for 
numerical kernels, incompatible accelerators, community split between experts 
and simple users, ...) and possible improvements.

Help would be very much appreciated.

Now a coding riddle:

import numpy as np
from transonic import jit

@jit(native=True, xsimd=True)
def fxfy(ft, fn, theta):
    sin_theta = np.sin(theta)
    cos_theta = np.cos(theta)
    fx = cos_theta * ft - sin_theta * fn
    fy = sin_theta * ft + cos_theta * fn
    return fx, fy

@jit(native=True, xsimd=True)
def fxfy_loops(ft, fn, theta):
    n0 = theta.size
    fx = np.empty_like(ft)
    fy = np.empty_like(fn)
    for index in range(n0):
        sin_theta = np.sin(theta[index])
        cos_theta = np.cos(theta[index])
        fx[index] = cos_theta * ft[index] - sin_theta * fn[index]
        fy[index] = sin_theta * ft[index] + cos_theta * fn[index]
    return fx, fy

How do the performances of these functions compare with pure Numpy, Numba and 
Pythran?

You can find out the answer in our note http://tiny.cc/transonic-vision :-)
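
For those who want to try at home, here is a minimal sketch of a timing harness 
(the array size is arbitrary and the warm-up details depend on the backend):

from timeit import timeit

theta = np.random.uniform(0, 2 * np.pi, 1_000_000)
ft = np.random.rand(theta.size)
fn = np.random.rand(theta.size)

for name, func in [("fxfy", fxfy), ("fxfy_loops", fxfy_loops)]:
    func(ft, fn, theta)  # warm-up call so the just-in-time compilation happens first
    print(name, timeit(lambda: func(ft, fn, theta), number=10))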

Pierre

> Message: 1
> Date: Thu, 31 Oct 2019 21:16:06 +0100 (CET)
> From: PIERRE AUGIER 
> To: numpy-discussion@python.org
> Subject: [Numpy-discussion] Transonic Vision: unifying Python-Numpy
>   accelerators
> Message-ID:
>   
> <1080118635.5930814.1572552966711.javamail.zim...@univ-grenoble-alpes.fr>
>   
> Content-Type: text/plain; charset=utf-8
> 
> Dear Python-Numpy community,
> 
> Few years ago I started to use a lot Python and Numpy for science. I'd like to
> thanks all people who contribute to this fantastic community.
> 
> I used a lot Cython, Pythran and Numba and for the FluidDyn project, we 
> created
> Transonic, a pure Python package to easily accelerate modern Python-Numpy code
> with different accelerators. We wrote a long and serious text to explain why 
> we
> think Transonic could have a positive impact on the scientific Python
> ecosystem.
> 
> Here it is: http://tiny.cc/transonic-vision
> 
> Feedback and discussions would be greatly appreciated!
> 
> Pierre
> 
> --
> Pierre Augier - CR CNRS http://www.legi.grenoble-inp.fr
> LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
> BP53, 38041 Grenoble Cedex, France    tel: +33.4.56.52.86.16


[Numpy-discussion] Transonic Vision: unifying Python-Numpy accelerators

2019-10-31 Thread PIERRE AUGIER
Dear Python-Numpy community,

A few years ago I started to use Python and Numpy a lot for science. I'd like 
to thank all the people who contribute to this fantastic community.

I have used Cython, Pythran and Numba a lot, and for the FluidDyn project we created 
Transonic, a pure Python package to easily accelerate modern Python-Numpy code 
with different accelerators. We wrote a long and serious text to explain why we 
think Transonic could have a positive impact on the scientific Python ecosystem.

Here it is: http://tiny.cc/transonic-vision

Feedback and discussions would be greatly appreciated!

Pierre

--
Pierre Augier - CR CNRS http://www.legi.grenoble-inp.fr
LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
BP53, 38041 Grenoble Cedex, France    tel: +33.4.56.52.86.16


[Numpy-discussion] Efficiency of Numpy wheels and simple way to benchmark Numpy installation?

2018-05-27 Thread PIERRE AUGIER
Hello,

I don't know if this is a good place to ask such questions. As advised here 
https://www.scipy.org/scipylib/mailing-lists.html#stackoverflow, I first posted 
a question on Stack Overflow:

https://stackoverflow.com/questions/50475989/efficiency-of-numpy-wheels-and-simple-benchmark-for-numpy-installations

Since I got no feedback, I'm trying here. My questions are:

- When we care about performance, is it a good practice to rely on wheels 
(especially for Numpy)? Will it be slower than using (for example) a 
conda-built Numpy?

- Are there simple commands to benchmark Numpy installations and get a good 
idea of their overall performance?

I explain a little bit more in the stackoverflow question...
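
For reference, here is the kind of rough check I have in mind (the matrix size 
is arbitrary and this only probes BLAS-bound performance):

import time

import numpy as np

np.show_config()  # which BLAS/LAPACK this NumPy build is linked against

n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
a @ b
print(f"{n}x{n} matmul: {time.perf_counter() - start:.3f} s")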

Pierre Augier