Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Xavier Combelle
I never said it was impossible, just very hard. On 17/01/2017 at 16:48, Stephan Houben wrote: > Hi Xavier, > > In this bright age of IEEE-754 compatible CPUs, > it is certainly possible to achieve reproducible FP. > I worked for a company whose software produced bit-identical > results on vario

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Gregory P. Smith
Makes sense, thanks! math.fma() it is. :) On Tue, Jan 17, 2017, 7:48 AM Stephan Houben wrote: > Hi Xavier, > > In this bright age of IEEE-754 compatible CPUs, > it is certainly possible to achieve reproducible FP. > I worked for a company whose software produced bit-identical > results on vario

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Stephan Houben
Hi Xavier, In this bright age of IEEE-754 compatible CPUs, it is certainly possible to achieve reproducible FP. I worked for a company whose software produced bit-identical results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris, Windows). The trick is to closely RTFM for your CPU

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Xavier Combelle
> Generally speaking, there are two reasons why people may *not* want an > FMA operation. > 1. They need their results to be reproducible across > compilers/platforms. (the most common reason) > Reproducibility of floating-point calculations is very hard to achieve. A good survey of the problem i

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-17 Thread Stephan Houben
Hi Gregory, 2017-01-16 20:28 GMT+01:00 Gregory P. Smith : > Is there a good reason not to detect single expression multiply adds and > just emit a new FMA bytecode? > Yes ;-) See below. > Is our goal for floats to strictly match the result of the same operations > coded in unoptimized C using d

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Sven R. Kunze
On 16.01.2017 20:28, Gregory P. Smith wrote: Is there a good reason not to detect single expression multiply adds and just emit a new FMA bytecode? Same question here. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/m

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Gregory P. Smith
Is there a good reason not to detect single expression multiply adds and just emit a new FMA bytecode? Is our goal for floats to strictly match the result of the same operations coded in unoptimized C using doubles? Or can we be more precise on occasion? I guess a similar question may be asked o
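To make the question concrete, here is a minimal sketch of what "detecting single expression multiply adds" could look like at the syntax level. It uses the standard `ast` module; the helper name `find_multiply_adds` is hypothetical and not something proposed in the thread. Note that such a check is purely syntactic: it cannot tell whether the operands will be floats at run time, and fusing the operations would change the rounded result of existing code.

```python
import ast

def find_multiply_adds(source):
    """Return the text of every expression shaped like a*b + c or c + a*b.

    Hypothetical helper for illustration only; it inspects syntax and knows
    nothing about the run-time types of the operands.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Look for an addition whose left or right operand is a multiplication.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            if any(isinstance(side, ast.BinOp) and isinstance(side.op, ast.Mult)
                   for side in (node.left, node.right)):
                hits.append(ast.unparse(node))  # ast.unparse needs Python 3.9+
    return hits

candidates = find_multiply_adds("r = x*y + z")
no_candidates = find_multiply_adds("r = x + y")
```

Here `candidates` holds the one matching expression and `no_candidates` is empty.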

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread David Mertz
My understanding is that NumPy does NOT currently support a direct FMA operation "natively." However, higher-level routines like `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of FMA within the underlying libraries. On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum wrote:

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Guido van Rossum
Does numpy support this? --Guido (mobile) On Jan 16, 2017 7:27 AM, "Stephan Houben" wrote: > Hi Steve, > > Very good! > Here is a version which also handles the nan's, infinities, > negative zeros properly. > > === > import math > from fractions import Fraction > > def fma2(x, y, z)

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Stephan Houben
Hi Steve, Very good! Here is a version which also handles NaNs, infinities, and negative zeros properly.
===
import math
from fractions import Fraction

def fma2(x, y, z):
    if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
        result = float(Fraction(x)*Fraction(y
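The snippet above is cut off in the archive. As a separate illustration of the same idea (not a reconstruction of Stephan's code), here is a minimal sketch of a software fallback that computes the exact rational result and rounds once. The name `fma_sketch` and the way the special cases are split are assumptions; it also assumes the default round-to-nearest mode.

```python
import math
from fractions import Fraction

def fma_sketch(x, y, z):
    """Emulate x*y + z with a single rounding (illustrative sketch only)."""
    x, y, z = float(x), float(y), float(z)
    if not (math.isfinite(x) and math.isfinite(y)):
        # inf*0 gives nan, inf*finite gives inf, nan propagates: plain
        # float arithmetic already yields the IEEE-754 result here.
        return x * y + z
    if not math.isfinite(z):
        # The exact product of two finite numbers is finite, so adding an
        # infinite or nan z yields z. (x*y + z could wrongly give nan if the
        # float product overflowed to inf.)
        return z
    exact = Fraction(x) * Fraction(y) + Fraction(z)
    if exact == 0:
        if x == 0.0 or y == 0.0:
            # Product and z are both signed zeros; float arithmetic applies
            # the IEEE-754 sign rules for zero sums exactly.
            return x * y + z
        # Exact cancellation of nonzero terms gives +0.0 in round-to-nearest.
        return 0.0
    try:
        return float(exact)  # one correctly rounded conversion
    except OverflowError:
        # Result too large for a double: return a correctly signed infinity.
        return math.inf if exact > 0 else -math.inf
```

The exact-rational route is slow compared with a hardware instruction, so it is only meant to show which cases a fallback has to get right.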

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Steven D'Aprano
On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: [...] > So the following would not be a valid FMA fallback > > double bad_fma(double x, double y, double z) { > return x*y + z; > } [...] > Upshot: if we want to provide a software fallback in the Python code, we > need to do somet

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Stephan Houben
Hi Victor, The fallback implementations in the various libc take care to preserve the correct rounding behaviour. Let me stress that *fused* multiply-add means the specific rounding behaviour as defined in the standard IEEE-754 2008 (i.e. with guaranteed *no* intermediate rounding). So the follo

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-16 Thread Victor Stinner
2017-01-15 18:25 GMT+01:00 Juraj Sukop : > C99 includes `fma` function to `math.h` [6] and emulates the calculation if > the FMA instruction is not present on the host CPU [7]. If even the libc function falls back to x*y followed by +z, it's fine to add such a function to the Python stdlib. It m

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Mark Dickinson
On Sun, Jan 15, 2017 at 5:25 PM, Juraj Sukop wrote: > This proposal is then about adding new `fma` function with the following > signature to `math` module: > > math.fma(x, y, z) Sounds good to me. Please could you open an issue on the bug tracker (http://bugs.python.org)? Thanks, Mark

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Chris Angelico
On Mon, Jan 16, 2017 at 4:25 AM, Juraj Sukop wrote: > There is a simple module for Python 3 demonstrating the fused multiply-add > operation which was built with a simple `python3 setup.py build` under Linux > [9]. > > Any feedback is greatly appreciated! +1. Just tried it out, and apart from dropp

Re: [Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Stephan Houben
Hi Juraj, I think this would be a very useful addition to the `math` module. The gating issue is probably C compiler support. The most important non-C99 C compiler for Python is probably MS Visual Studio. And that one appears to support it: https://msdn.microsoft.com/en-us/library/mt720715.aspx

[Python-ideas] Fused multiply-add (FMA)

2017-01-15 Thread Juraj Sukop
Hello! Fused multiply-add (henceforth FMA) is an operation which calculates the product of two numbers and then the sum of the product and a third number with just one floating-point rounding. More concretely: r = x*y + z The value of `r` is the same as if the RHS was calculated with infinit
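A small worked example may help show why the single rounding matters. The sketch below compares ordinary `x*y + z` (two roundings) with an exact rational computation rounded once, using only the standard `fractions` module. The helper name `fused` and the input values are chosen for illustration and do not come from the proposal; the example assumes Python floats are IEEE-754 doubles.

```python
from fractions import Fraction

def fused(x, y, z):
    # Compute x*y + z exactly as a rational number, then round once to float.
    # (Finite inputs only; illustrative helper, not a library function.)
    return float(Fraction(x) * Fraction(y) + Fraction(z))

x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30
z = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the subsequent addition of -1.0 loses the small term entirely.
two_roundings = x * y + z      # 0.0

# With a single rounding the small term survives exactly.
one_rounding = fused(x, y, z)  # -2**-60
```

The two results differ because the intermediate product is rounded before the addition in the first case, while the second case rounds only the final sum.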