On Tue, Jul 2, 2013 at 6:30 PM, Pekka Jääskeläinen <
[email protected]> wrote:

> On 07/02/2013 10:56 PM, Erik Schnetter wrote:
>
>> Pocl automatically sets fp-contract=off when generating code. This has
>> significant implications; in particular, NaNs are then handled
>> incorrectly. Pocl should never do this on its own -- this needs to be a
>> user option, enabled e.g. by pragmas or by choosing fast-math.
>>
>
> How does this have implications to handling NaNs?
>
> This is what clang --help states:
>
> -ffp-contract=<value>   Form fused FP ops (e.g. FMAs): fast (everywhere) |
> on (according to FP_CONTRACT pragma, default) | off (never fuse)
>

Interesting. I'll investigate more. Maybe there is interference from
another bug. I thought that fp-contract=off would also split explicit fma
calls, but that doesn't seem to be the issue here.

> One cannot now enable the automatic fusing of fmul+fadd (fast) using the
> pragma, but I thought it's safe this way around. In Clang 3.2 it didn't
> fuse them automatically and in 3.3 it started to convert them to the
> fmuladd intrinsics, without considering the ISA support (at least
> properly!), leading to the perf regression.
>
>
>> A nearby comment states
>>
>> # With fp-contract we get calls to fma with processors which do not
>> # have fma instructions. These ruin the performance. Better to have
>> # the mul+add separated in the IR.
>>
>> Which architecture and which benchmark is this?
>>
>
> At least my Core 2 duo. The above mentioned CFD of Rodinia at least
> had a huge runtime explosion from this as it ended up having several
> function calls instead of the direct machine instructions. Several others
> slowed down too.
>
>
>> Could we instead use fast-math for benchmarks only?
>>
>
> The case here is the other way around? I force-disabled an
> fp optimization as it seems to not be always a clear improvement.
> So I'm not forcing an unsafe fast FP optimization here.
>
> This was done until a better way around this is found.
>
> Some background for the Clang's fp-contract implementation:
> http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/114632
>
> So.. perhaps Subtarget::isFMACheap() in LLVM should return false on
> the Core 2 Duo and the problem should go away.


Yes, it should. Cheap FMAs are only available on modern AMD processors, and
not on any Intel processors yet.

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
