Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-21 Thread Richard Earnshaw (lists)
On 13/09/16 12:35, Wilco Dijkstra wrote:
> Jakub wrote:
>> On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
>>> This patch adds an optimized route to the fpclassify builtin
>>> for floating point numbers which are similar to IEEE-754 in format.
>>>
>>> The goal is to make it faster by:
>>> 1. Trying to determine the most common case first
>>>(e.g. the float is a Normal number) and then the
>>>rest. The amount of code generated at -O2 are
>>>about the same +/- 1 instruction, but the code
>>>is much better.
>>> 2. Using integer operation in the optimized path.
>>
>> Is it generally preferable to use integer operations for this instead
>> of floating point operations?  I mean various targets have quite high costs
>> of moving data in between the general purpose and floating point register
>> file, often it has to go through memory etc.
> 
> It is generally preferable indeed - there was a *very* long discussion about
> integer vs FP on the GLIBC mailing list when I updated math.h to use the
> GCC builtins a while back (the GLIBC implementation used a non-inlined
> unoptimized integer implementation, so an inlined FP implementation seemed
> a good intermediate solution).
> 
> Integer operations are generally lower latency and enable bit manipulation
> tricks like the fast early exit. The FP version requires execution of 5
> branches for a "normal" FP value and loads several floating point
> immediates. There are also many targets with emulated floating point types,
> so 5 calls to the comparison lib function would be seriously slow.
> Note that using so many FP comparisons is not just slow; they also aren't
> correct for signalling NaNs, so this patch fixes bug 66462 for fpclassify
> as well.

And don't forget that getting the results of a floating-point comparison
back to the branch unit may be no faster than transferring the value in
the first place.

R.

> 
> I would suggest someone with access to a machine with slow FP moves (POWER?)
> benchmark this using the fpclassify test
> (glibc/benchtests/bench-math-inlines.c) so we know for sure.
> 
> Wilco
> 



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-21 Thread Richard Biener
On Wed, 21 Sep 2016, Joseph Myers wrote:

> On Tue, 20 Sep 2016, Michael Meissner wrote:
> 
> > It would be better to have a fpclassify2 pattern, and if it isn't
> > defined, then do the machine independent processing.  That is the way it is
> > done elsewhere.
> 
> But note:
> 
> * The __builtin_fpclassify function takes arguments for all the possible 
> FP_* results, so the insn pattern would need to map the results to the 
> arguments passed to __builtin_fpclassify.  (They are documented as needing 
> to be constants, of type int.)

Yeah, that's the reason we "lower" this early.

> * Then you want that mapping step to get optimized away in the case of a 
> comparison fpclassify (...) == FP_SUBNORMAL (for example), or a switch 
> over possible results.  Will the RTL optimizers do that given the insns 
> structured appropriately?

I think it makes sense to fold fpclassify (...) == N to more specific
classification builtins that do not have this issue if possible.  OTOH
RTL expansion could detect some of the non-builtin ways to do such checks
and see if an optab exists as well.

> (For that matter, I don't know if the GIMPLE optimizers will optimize away 
> such a mapping either, but they clearly should.  I've wondered what the 
> right approach would be for making FLT_ROUNDS properly depend on the 
> rounding mode - bug 30569, 
>  - where the same issues 
> apply.  For boolean operations such as isnan you don't have such 
> complications.)

I think they do via jump-threading.

Richard.


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-21 Thread Richard Biener
On Tue, 20 Sep 2016, Jeff Law wrote:

> On 09/20/2016 06:00 AM, Tamar Christina wrote:
> > 
> > 
> > On 16/09/16 20:49, Jeff Law wrote:
> > > On 09/12/2016 10:19 AM, Tamar Christina wrote:
> > > > Hi All,
> > > > +
> > > > +  /* Re-interpret the float as an unsigned integer type
> > > > + with equal precision.  */
> > > > +  int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION
> > > > (type), 0);
> > > > +  int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
> > > > +  fold_build1_loc (loc, NOP_EXPR,
> > > > +   build_pointer_type (int_arg_type),
> > > > +fold_build1_loc (loc, ADDR_EXPR,
> > > > + build_pointer_type (type), arg)));
> > > Doesn't this make ARG addressable?  Which in turn means ARG won't be
> > > exposed to the gimple/ssa optimizers.  Or is it the case that when
> > > fpclassify is used its argument is already in memory (and thus
> > > addressable?)
> > > 
> > I believe that it is the case that when fpclassify is used the argument
> > is already addressable, but I am not 100% certain. I may be able to do
> > this differently so I'll come back to you on this one.
> The more I think about it, the more I suspect ARG is only going to already be
> marked as addressable if it has already had its address taken.

Sure, if it has its address taken ... but I don't see how
fpclassify requires the arg to have its address taken.

> But I think we can look at this as an opportunity.  If ARG is already
> addressable, then it's most likely going to be living in memory (there are
> exceptions).  If ARG is most likely going to be living in memory, then we
> clearly want to use your fast integer path, regardless of the target.
> 
> If ARG is not addressable, then it's not as clear, as the object is likely
> going to be assigned into an FP register.  Integer operations on an FP
> register will likely force a sequence where we dump the register into
> memory, load from memory into a GPR, then bit test on the GPR.  That gets
> very expensive on some architectures.
> 
> Could we defer lowering in the case where the object is not addressable until
> gimple->rtl expansion time?  That's the best time to introduce target
> dependencies into the code we generate.

Note that GIMPLE doesn't require something to be addressable just because
you access random pieces of it.  The IL has tricks like allowing
MEM[ + CST] w/o actually marking the decl TREE_ADDRESSABLE (and the
expanders trying to cope with that) and there is of course
BIT_FIELD_REF, which you can use to extract arbitrary bits off any
entity without it living in memory (and again the expanders trying to
cope with that).

So may I suggest to move the "folding" from builtins.c to gimplify.c
and simply emit GIMPLE directly there?  That would make it also clearer
that we are dealing with a lowering process rather than a "folding".

Doing it in GIMPLE lowering is another possibility - we lower things
like posix_memalign and setjmp there as well.

Thanks,
Richard.


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-20 Thread Joseph Myers
On Tue, 20 Sep 2016, Michael Meissner wrote:

> It would be better to have a fpclassify2 pattern, and if it isn't
> defined, then do the machine independent processing.  That is the way it is
> done elsewhere.

But note:

* The __builtin_fpclassify function takes arguments for all the possible 
FP_* results, so the insn pattern would need to map the results to the 
arguments passed to __builtin_fpclassify.  (They are documented as needing 
to be constants, of type int.)

* Then you want that mapping step to get optimized away in the case of a 
comparison fpclassify (...) == FP_SUBNORMAL (for example), or a switch 
over possible results.  Will the RTL optimizers do that given the insns 
structured appropriately?

(For that matter, I don't know if the GIMPLE optimizers will optimize away 
such a mapping either, but they clearly should.  I've wondered what the 
right approach would be for making FLT_ROUNDS properly depend on the 
rounding mode - bug 30569, 
 - where the same issues 
apply.  For boolean operations such as isnan you don't have such 
complications.)

* If flag_signaling_nans, then any pattern should work for signaling NaNs.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-20 Thread Michael Meissner
On Tue, Sep 20, 2016 at 01:19:07PM +0100, Tamar Christina wrote:
> On 19/09/16 23:16, Michael Meissner wrote:
> >On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
> >>Hi All,
> >>
> >>This patch adds an optimized route to the fpclassify builtin
> >>for floating point numbers which are similar to IEEE-754 in format.
> >>
> >>The goal is to make it faster by:
> >>1. Trying to determine the most common case first
> >>(e.g. the float is a Normal number) and then the
> >>rest. The amount of code generated at -O2 is
> >>about the same +/- 1 instruction, but the code
> >>is much better.
> >>2. Using integer operations in the optimized path.
> >>
> >>At a high level, the optimized path uses integer operations
> >>to perform the following:
> >>
> >>   if (exponent bits aren't all set or unset)
> >>  return Normal;
> >>   else if (no bits are set on the number after masking out
> >>sign bits then)
> >>  return Zero;
> >>   else if (exponent has no bits set)
> >>  return Subnormal;
> >>   else if (mantissa has no bits set)
> >>  return Infinite;
> >>   else
> >>  return NaN;
> >I haven't looked at fpclassify.  I assume we can define a backend insn to do
> >the right thing?  One of the things we've noticed over the years with the
> >PowerPC is that it can be rather expensive to move things from the floating
> >point/vector unit to the integer registers and vice versa.  This is
> >particularly true if you have to do the transfer via the memory unit via
> >stores and loads of different sizes.
> >
> Hmm, what do you mean with the right thing? Do you mean never to use the
> integer version?

On the forthcoming PowerPC with ISA 3.0 (power9), we have different ways to
do classification within the floating point unit.

For example, there is the XSTSTDCDP instruction that can set a condition code
register to whether the value is 0, NaN, Infinity, Denormal.  We might come up
with a clever set of tests to use 4 of these instructions to return the
appropriate FP_* value.

Even if we want to do it by looking at the exponent, ISA 3.0 defines
instructions like XSXEXPDP that extracts the exponent from a double precision
value and returns it in a GPR register.

> If so then no, it currently determines it based on the format.
> I could potentially add a hook to allow backends to opt-in/out if
> there's a concern this might be slower.

It would be better to have a fpclassify2 pattern, and if it isn't
defined, then do the machine independent processing.  That is the way it is
done elsewhere.

> Though is the move that much slower that it negates the benefits we
> should get from not having to do
> 4 branches in the normal case?

It depends.  We have a lot of other stuff for ISA 3.0 on our plates, and
truthfully, we won't be able to answer the question about performance until we
get real hardware, but I would prefer not to be locked into an existing
implementation.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-20 Thread Joseph Myers
On Tue, 20 Sep 2016, Jeff Law wrote:

> Could we defer lowering in the case where the object is not addressable until
> gimple->rtl expansion time?  That's the best time to introduce target
> dependencies into the code we generate.

If we do that (remembering that -fsignaling-nans always wants the integer 
path), we need to make sure there are tests of fpclassify that reliably 
exercise both paths.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-20 Thread Jeff Law

On 09/20/2016 06:00 AM, Tamar Christina wrote:



On 16/09/16 20:49, Jeff Law wrote:

On 09/12/2016 10:19 AM, Tamar Christina wrote:

Hi All,
+
+  /* Re-interpret the float as an unsigned integer type
+ with equal precision.  */
+  int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION
(type), 0);
+  int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
+  fold_build1_loc (loc, NOP_EXPR,
+   build_pointer_type (int_arg_type),
+fold_build1_loc (loc, ADDR_EXPR,
+ build_pointer_type (type), arg)));

Doesn't this make ARG addressable?  Which in turn means ARG won't be
exposed to the gimple/ssa optimizers.  Or is it the case that when
fpclassify is used its argument is already in memory (and thus
addressable?)


I believe that it is the case that when fpclassify is used the argument
is already addressable, but I am not 100% certain. I may be able to do
this differently so I'll come back to you on this one.
The more I think about it, the more I suspect ARG is only going to 
already be marked as addressable if it has already had its address taken.



But I think we can look at this as an opportunity.  If ARG is already 
addressable, then it's most likely going to be living in memory (there 
are exceptions).  If ARG is most likely going to be living in memory, 
then we clearly want to use your fast integer path, regardless of the 
target.


If ARG is not addressable, then it's not as clear, as the object is
likely going to be assigned into an FP register.  Integer operations on
an FP register will likely force a sequence where we dump the register
into memory, load from memory into a GPR, then bit test on the GPR.
That gets very expensive on some architectures.


Could we defer lowering in the case where the object is not addressable 
until gimple->rtl expansion time?  That's the best time to introduce 
target dependencies into the code we generate.



Jeff


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-20 Thread Tamar Christina



On 16/09/16 20:49, Jeff Law wrote:

On 09/12/2016 10:19 AM, Tamar Christina wrote:

Hi All,
+
+  /* Re-interpret the float as an unsigned integer type
+ with equal precision.  */
+  int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION 
(type), 0);

+  int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
+  fold_build1_loc (loc, NOP_EXPR,
+   build_pointer_type (int_arg_type),
+fold_build1_loc (loc, ADDR_EXPR,
+ build_pointer_type (type), arg)));
Doesn't this make ARG addressable?  Which in turn means ARG won't be 
exposed to the gimple/ssa optimizers.  Or is it the case that when
fpclassify is used its argument is already in memory (and thus 
addressable?)


I believe that it is the case that when fpclassify is used the argument
is already addressable, but I am not 100% certain. I may be able to do
this differently so I'll come back to you on this one.

+ exp, const1));
+
+  /* Combine the values together.  */
+  specials = fold_build3_loc (loc, COND_EXPR, int_type, zero_check, fp_zero,
+   fold_build3_loc (loc, COND_EXPR, int_type, exp_lsb_set,
+fold_build3_loc (loc, COND_EXPR, int_type, mantissa_any_set,
+  HONOR_NANS (mode) ? fp_nan : fp_normal,
+  HONOR_INFINITIES (mode) ? fp_infinite : fp_normal),
+fp_subnormal));

So this implies you're running on generic, not gimple, right?
Otherwise you can't generate these kinds of expressions.




Yes this is generic.


diff --git a/gcc/real.h b/gcc/real.h
index 59af580e78f2637be84f71b98b45ec6611053222..36ded57cf4db7c30c935bdb24219a167480f39c8 100644

--- a/gcc/real.h
+++ b/gcc/real.h
@@ -161,6 +161,15 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+
+  /* This flag indicates whether the format can be used in the optimized
+ code paths for the __builtin_fpclassify function and friends.
+ The format has to have the same NaN and INF representation as normal
+ IEEE floats (e.g. exp must have all bits set), the most significant
+ bit must be the sign bit, followed by exp bits of at most 32 bits.
+ Lastly the floating point number must be representable as an integer.
+ The base of the number also must be base 2.  */
+  bool is_binary_ieee_compatible;
   const char *name;
 };
I think Joseph has already commented on the contents of the 
initializer and a few more cases were we can use the optimized paths.


However, I do have a general question.  There are some targets which 
have FPUs that are basically IEEE, but don't support certain IEEE 
features like NaNs, denorms, etc.


Presumably all that's needed is for those targets to define a hook to 
describe which checks will always be false and you can check the 
hook's return value.  Right?


Yes, that should be enough. Formats without NaNs and Infinities are
already handled, but that is tied to the real format rather than a
particular target.


Can you please include some tests to verify you're getting the initial 
code generation you want?  Ideally there'd be execution tests too 
where you generate one of the special nodes, then call the __builtin 
and verify that you get the expected results back. The latter in 
particular are key since it'll allow us to catch problems much earlier 
across the wide variety of targets GCC supports.


I can add some code generation tests. There are I believe already some 
execution tests, which test both correct and incorrect output.


I think you already had plans to post an updated patch.  Please 
include the fixes noted above in that update.


Yes I will include your feedback in it. I'm currently waiting for some 
extra performance numbers.


Thanks,
Tamar



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-19 Thread Michael Meissner
On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
> Hi All,
> 
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.
> 
> The goal is to make it faster by:
> 1. Trying to determine the most common case first
>(e.g. the float is a Normal number) and then the
>rest. The amount of code generated at -O2 is
>about the same +/- 1 instruction, but the code
>is much better.
> 2. Using integer operations in the optimized path.
> 
> At a high level, the optimized path uses integer operations
> to perform the following:
> 
>   if (exponent bits aren't all set or unset)
>  return Normal;
>   else if (no bits are set on the number after masking out
>  sign bits then)
>  return Zero;
>   else if (exponent has no bits set)
>  return Subnormal;
>   else if (mantissa has no bits set)
>  return Infinite;
>   else
>  return NaN;

I haven't looked at fpclassify.  I assume we can define a backend insn to do
the right thing?  One of the things we've noticed over the years with the
PowerPC is that it can be rather expensive to move things from the floating
point/vector unit to the integer registers and vice versa.  This is
particularly true if you have to do the transfer via the memory unit via
stores and loads of different sizes.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-16 Thread Jeff Law

On 09/12/2016 10:19 AM, Tamar Christina wrote:

Hi All,

This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same ± 1 instruction, but the code
   is much better.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits aren't all set or unset)
 return Normal;
  else if (no bits are set on the number after masking out
   sign bits then)
 return Zero;
  else if (exponent has no bits set)
 return Subnormal;
  else if (mantissa has no bits set)
 return Infinite;
  else
 return NaN;

In case the optimization can't be applied the old
implementation is used as a fall-back.

A limitation with this new approach is that the exponent
of the floating point has to fit in 31 bits and the floating
point has to have an IEEE like format and values for NaN and INF
(e.g. for NaN and INF all bits of the exp must be set).

To determine this IEEE likeness a new boolean was added to real_format.

Regression tests ran on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 uses its own implementation rather than
the fpclassify builtin.

As an example, Aarch64 now generates for classification of doubles:

f:
        fmov    x1, d0
        mov     w0, 7
        sbfx    x2, x1, 52, 11
        add     w3, w2, 1
        tst     w3, 0x07FE
        bne     .L1
        mov     w0, 13
        tst     x1, 0x7fff
        beq     .L1
        mov     w0, 11
        tbz     x2, 0, .L1
        tst     x1, 0xf
        mov     w0, 3
        mov     w1, 5
        csel    w0, w0, w1, ne
.L1:
        ret

No new tests as there are existing tests covering the functionality.
glibc benchmarks were run against the builtin and show a 31.3%
performance gain.

Ok for trunk?

Thanks,
Tamar

PS. I don't have commit rights so if OK can someone apply the patch for me.

gcc/
2016-08-25  Tamar Christina  
Wilco Dijkstra  

* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
* gcc/real.h (real_format): Added is_ieee_compatible field.
* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
(mips_single_format): Likewise.
(motorola_single_format): Likewise.
(spu_single_format): Likewise.
(ieee_double_format): Likewise.
(mips_double_format): Likewise.
(motorola_double_format): Likewise.
(ieee_extended_motorola_format): Likewise.
(ieee_extended_intel_128_format): Likewise.
(ieee_extended_intel_96_round_53_format): Likewise.
(ibm_extended_format): Likewise.
(mips_extended_format): Likewise.
(ieee_quad_format): Likewise.
(mips_quad_format): Likewise.
(vax_f_format): Likewise.
(vax_d_format): Likewise.
(vax_g_format): Likewise.
(decimal_single_format): Likewise.
(decimal_quad_format): Likewise.
(ieee_half_format): Likewise.
(mips_single_format): Likewise.
(arm_half_format): Likewise.
(real_internal_format): Likewise.


gcc-public.patch


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1073e35b17b1bc1f6974c71c940bd9d82bbbfc0f..58bf129f9a0228659fd3b976d38d021d1d5bd6bb 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7947,10 +7947,8 @@ static tree
 fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
 {
   tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
-arg, type, res, tmp;
+arg, type, res;
   machine_mode mode;
-  REAL_VALUE_TYPE r;
-  char buf[128];

   /* Verify the required arguments in the original call.  */
   if (nargs != 6
@@ -7970,14 +7968,143 @@ fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
   arg = args[5];
   type = TREE_TYPE (arg);
   mode = TYPE_MODE (type);
-  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+  const real_format *format = REAL_MODE_FORMAT (mode);
+
+  /*
+  For IEEE 754 types:
+
+  fpclassify (x) ->
+   !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
+? (x & sign_mask == 0 ? FP_ZERO :
+  (exp & exp_mask == exp_mask
+ ? (mantissa == 0 ? FP_INFINITE : FP_NAN) :
+ FP_SUBNORMAL)):
+   FP_NORMAL.
+
+  Otherwise
+
+  fpclassify (x) ->
+   isnan (x) ? FP_NAN :
+   (fabs (x) == Inf ? FP_INFINITE :
+  (fabs (x) >= DBL_MIN ? FP_NORMAL :
+(x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+  */
+
+  /* Check if the number that is being classified is close enough to IEEE 754
+ format to be able to go in the early exit code.  */
+  if (format->is_binary_ieee_compatible)
+

Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Richard Biener
On September 15, 2016 5:52:34 PM GMT+02:00, Jeff Law  wrote:
>On 09/14/2016 02:24 AM, Richard Biener wrote:
>> On Tue, Sep 13, 2016 at 6:15 PM, Jeff Law  wrote:
>>> On 09/13/2016 02:41 AM, Jakub Jelinek wrote:

 On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
>
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.
>
> The goal is to make it faster by:
> 1. Trying to determine the most common case first
>(e.g. the float is a Normal number) and then the
>rest. The amount of code generated at -O2 is
>about the same +/- 1 instruction, but the code
>is much better.
> 2. Using integer operations in the optimized path.


 Is it generally preferable to use integer operations for this instead
 of floating point operations?  I mean various targets have quite high
 costs of moving data in between the general purpose and floating point
 register file, often it has to go through memory etc.
>>>
>>> Bit testing/twiddling is obviously a trade-off for a non-addressable
>>> object.  I don't think there's any reasonable way to always generate
>>> the most efficient code as it's going to depend on (for example)
>>> register allocation behavior.
>>>
>>> So what we're stuck doing is relying on the target costing bits to
>>> guide this kind of thing.
>>
>> I think the reason for this patch is to provide a general optimized
>> integer version.
>And just to be clear, that's fine with me.  While there are cases where
>bit twiddling hurts, I think bit twiddling is generally better.
>
>
>> I think it asks for a FP (class) propagation pass somewhere (maybe as
>> part of complex lowering which already has a similar "coarse" lattice --
>> not that I like its implementation very much) and doing the "lowering"
>> there.
>Not a bad idea -- I wonder how much a coarse tracking of the exceptional
>cases would allow later optimization.

I guess it really depends on the ability to set ffast-math flags on
individual stmts (or at least built-in calls).

Richard.

>>
>> Not something that should block this patch though.
>Agreed.
>
>jeff




Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Jeff Law

On 09/14/2016 02:24 AM, Richard Biener wrote:

On Tue, Sep 13, 2016 at 6:15 PM, Jeff Law  wrote:

On 09/13/2016 02:41 AM, Jakub Jelinek wrote:


On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:


This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same +/- 1 instruction, but the code
   is much better.
2. Using integer operations in the optimized path.



Is it generally preferable to use integer operations for this instead
of floating point operations?  I mean various targets have quite high
costs of moving data in between the general purpose and floating point
register file, often it has to go through memory etc.


Bit testing/twiddling is obviously a trade-off for a non-addressable object.
I don't think there's any reasonable way to always generate the most
efficient code as it's going to depend on (for example) register allocation
behavior.

So what we're stuck doing is relying on the target costing bits to guide
this kind of thing.


I think the reason for this patch is to provide a general optimized
integer version.
And just to be clear, that's fine with me.  While there are cases where 
bit twiddling hurts, I think bit twiddling is generally better.




I think it asks for a FP (class) propagation pass somewhere (maybe as part of
complex lowering which already has a similar "coarse" lattice -- not that I like
its implementation very much) and doing the "lowering" there.
Not a bad idea -- I wonder how much a coarse tracking of the exceptional 
cases would allow later optimization.




Not something that should block this patch though.

Agreed.

jeff


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Joseph Myers
On Thu, 15 Sep 2016, Tamar Christina wrote:

> a rather large cost in complexity. Also wouldn't this be problematic
> for other functions as well, such as expand_builtin_signbit?

expand_builtin_signbit computes a word number and the bit position in that 
word.  It has no problem with 128-bit types on 32-bit systems where the 
largest integer mode supported for scalar variables is DImode.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Joseph Myers
On Thu, 15 Sep 2016, Wilco Dijkstra wrote:

> Yes, if there are targets which don't implement TImode operations then 
> surely they should be automatically split into DImode operations before 
> or during Expand?

The operations generally don't exist if the mode fails the 
scalar_mode_supported_p hook.  I don't know whether there are sufficient 
TImode operations for the bitwise operations you need here, even in the 
case where it fails that hook (and so you can't declare variables with 
that mode) - it's arithmetic, and the ABI support needed for argument 
passing, that are harder to do by splitting into smaller modes (and that 
GCC generally only handles in libgcc for 2-word operands, not for 4-word 
operands).

> So for now it would seem best to keep the boolean false for quad formats 
> on 32-bit targets.

This is a function of command-line options, not the format, so it can't go 
in the table.  The table should describe the format properties only.

Does the expansion work, in fact, for __float128 on 32-bit x86, given the 
boolean set to true (other relevant cases include 128-bit long double on 
32-bit s390 and 32-bit sparc with appropriate options to make long double 
128-bit)?  If it does, it may be OK to use modes that fail the 
scalar_mode_supported_p hook.  If something doesn't work in that case, the 
right way to avoid an expansion is not to set the boolean to false in the 
table of formats, it's to loop over supported integer modes seeing if 
there is one wide enough that also passes the scalar_mode_supported_p 
hook.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Wilco Dijkstra
Tamar Christina wrote:
> On 13/09/16 13:43, Joseph Myers wrote:
> > On Tue, 13 Sep 2016, Tamar Christina wrote:
>>
> >> On 12/09/16 23:33, Joseph Myers wrote:
> >>> Why is this boolean false for ieee_quad_format, mips_quad_format and
> >>> ieee_half_format?  They should meet your description (even if the x86 /
> >>> m68k "extended" formats don't because of the leading mantissa bit being
> >>> set for infinities).
> >>>
> >> Ah, I played it a bit too safe there. I will change this and do some
> >> re-testing and update the patch.
> > It occurred to me that there might be an issue with your approach of
> > overlaying the floating-point value with a single integer, when the quad
> > formats are used on 32-bit systems where TImode isn't fully supported as a
> > scalar mode.  However, if that's an issue the answer isn't to mark the
> > formats as non-IEEE, it's to support ORing together the relevant parts of
> > multiple words when determining whether the mantissa is nonzero (or some
> > equivalent logic).
> >
> I have been trying to reproduce this on the architectures I have access to
> but have been unable to so far. In practice if this does happen though 
> isn't it the fault of the system for advertising partial TImode support and 
> support of IEEE types?
>
> It seems to me that in order for me to be able to do this, fpclassify
> would incur a rather large cost in complexity. Also wouldn't this be
> problematic for other functions as well, such as expand_builtin_signbit?

Yes, if there are targets which don't implement TImode operations then surely
they should be automatically split into DImode operations before or during
Expand?  GCC's implementation of types larger than the register int type is
generally extremely poor, as it is missing such an expansion (practically
all other compilers do this), so adding it would improve things
significantly.

So for now it would seem best to keep the boolean false for quad formats on
32-bit targets.

Wilco



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Tamar Christina



On 13/09/16 13:43, Joseph Myers wrote:

On Tue, 13 Sep 2016, Tamar Christina wrote:



On 12/09/16 23:33, Joseph Myers wrote:

Why is this boolean false for ieee_quad_format, mips_quad_format and
ieee_half_format?  They should meet your description (even if the x86 /
m68k "extended" formats don't because of the leading mantissa bit being
set for infinities).


Ah, I played it a bit too safe there. I will change this and do some
re-testing and update the patch.

It occurred to me that there might be an issue with your approach of
overlaying the floating-point value with a single integer, when the quad
formats are used on 32-bit systems where TImode isn't fully supported as a
scalar mode.  However, if that's an issue the answer isn't to mark the
formats as non-IEEE, it's to support ORing together the relevant parts of
multiple words when determining whether the mantissa is nonzero (or some
equivalent logic).


I have been trying to reproduce this on the architectures I have access to
but have been unable to so far. In practice, if this does happen, isn't it
the fault of the system for advertising partial TImode support and support
of IEEE types?

It seems to me that in order to do this, fpclassify would incur a rather
large cost in complexity. Also, wouldn't this be problematic for other
functions as well, such as expand_builtin_signbit?



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-14 Thread Richard Biener
On Tue, Sep 13, 2016 at 6:15 PM, Jeff Law  wrote:
> On 09/13/2016 02:41 AM, Jakub Jelinek wrote:
>>
>> On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
>>>
>>> This patch adds an optimized route to the fpclassify builtin
>>> for floating point numbers which are similar to IEEE-754 in format.
>>>
>>> The goal is to make it faster by:
>>> 1. Trying to determine the most common case first
>>>(e.g. the float is a Normal number) and then the
>>>rest. The amount of code generated at -O2 is
>>>about the same +/- 1 instruction, but the code
>>>is much better.
>>> 2. Using integer operations in the optimized path.
>>
>>
>> Is it generally preferable to use integer operations for this instead
>> of floating point operations?  I mean various targets have quite high
>> costs
>> of moving data in between the general purpose and floating point register
>> file, often it has to go through memory etc.
>
> Bit testing/twiddling is obviously a trade-off for a non-addressable object.
> I don't think there's any reasonable way to always generate the most
> efficient code as it's going to depend on (for example) register allocation
> behavior.
>
> So what we're stuck doing is relying on the target costing bits to guide
> this kind of thing.

I think the reason for this patch is to provide a general optimized integer
version.

The only reason not to use integer operations (compared to what
fold_builtin_classify does currently) is that the folding is done very early
at the moment, and it's harder to optimize the integer bit-twiddling with
more FP context known.  For example, if we know if (! isnan ()), then unless
we also expand that inline via bit-twiddling, nothing will optimize the
followup test from the fpclassify.  This might be somewhat moot at the
moment given our lack of FP value-range propagation, but it should be a
general concern (of doing this too early).
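
Richard's concern can be made concrete with a small hypothetical example (`classify_not_nan` is my name for it): the FP_NAN arm below is dead after the guard, but that is only provable by a pass that sees both expansions as comparable bit tests.

```c
#include <math.h>

/* After the isnan guard, fpclassify can never return FP_NAN here, but
   if the builtin is folded too early, nothing shares the NaN test with
   the guard and the dead arm survives.  */
int classify_not_nan (double x)
{
  if (__builtin_isnan (x))
    return -1;
  return __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL,
                               FP_SUBNORMAL, FP_ZERO, x);
}
```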

I think it asks for an FP (class) propagation pass somewhere (maybe as part
of complex lowering, which already has a similar "coarse" lattice -- not
that I like its implementation very much) and doing the "lowering" there.

Not something that should block this patch though.

Richard.

> jeff


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Jeff Law

On 09/13/2016 02:41 AM, Jakub Jelinek wrote:

On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:

This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same +/- 1 instruction, but the code
   is much better.
2. Using integer operations in the optimized path.


Is it generally preferable to use integer operations for this instead
of floating point operations?  I mean various targets have quite high costs
of moving data in between the general purpose and floating point register
file, often it has to go through memory etc.
Bit testing/twiddling is obviously a trade-off for a non-addressable 
object.  I don't think there's any reasonable way to always generate the 
most efficient code as it's going to depend on (for example) register 
allocation behavior.


So what we're stuck doing is relying on the target costing bits to guide 
this kind of thing.


jeff


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Joseph Myers
On Tue, 13 Sep 2016, Wilco Dijkstra wrote:

> I would suggest that someone with access to a machine with slow FP moves
> (POWER?) benchmark this using the fpclassify test
> (glibc/benchtests/bench-math-inlines.c) so we know for sure.

And if for some operations on some architectures the floating-point 
version is faster, that just means we need a hook to choose between them 
(in the default -fno-signaling-nans case, since -fsignaling-nans should 
always use the integer version).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Joseph Myers
On Tue, 13 Sep 2016, Tamar Christina wrote:

> On 12/09/16 23:41, Joseph Myers wrote:
> > Are you making endianness assumptions - specifically, does the
> > reinterpretation as an integer require that WORDS_BIG_ENDIAN and
> > FLOAT_WORDS_BIG_ENDIAN are the same?  If so, I think that's OK (in that
> > the only target where they aren't the same seems to be pdp11 which doesn't
> > use IEEE formats), but probably the code should check explicitly.
> > 
> No, if I understood the question correctly then this should be OK, since I
> always access the float as an integer of equivalent precision.  So a 64-bit
> float will be addressed as a 64-bit int.

My point is that there are theoretically systems where the order of words 
in a 64-bit float is not the same as the order of words in a 64-bit 
integer.  Though it may be the case in practice that no such targets in 
GCC use IEEE formats (and that pdp11 is the only target without all of 
BYTES_BIG_ENDIAN, WORDS_BIG_ENDIAN and FLOAT_WORDS_BIG_ENDIAN the same).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Joseph Myers
On Tue, 13 Sep 2016, Tamar Christina wrote:

> 
> 
> On 12/09/16 23:33, Joseph Myers wrote:
> > Why is this boolean false for ieee_quad_format, mips_quad_format and
> > ieee_half_format?  They should meet your description (even if the x86 /
> > m68k "extended" formats don't because of the leading mantissa bit being
> > set for infinities).
> > 
> Ah, I played it a bit too safe there. I will change this and do some 
> re-testing and update the patch.

It occurred to me that there might be an issue with your approach of 
overlaying the floating-point value with a single integer, when the quad 
formats are used on 32-bit systems where TImode isn't fully supported as a 
scalar mode.  However, if that's an issue the answer isn't to mark the 
formats as non-IEEE, it's to support ORing together the relevant parts of 
multiple words when determining whether the mantissa is nonzero (or some 
equivalent logic).
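
A sketch of that equivalent logic, assuming for illustration a binary128 value held as four 32-bit words in big-endian word order, with the top word carrying the sign, the 15 exponent bits and the first 16 mantissa bits (the layout and the helper name are mine, not the patch's):

```c
#include <stdint.h>
#include <stdbool.h>

/* OR together every word that contains mantissa bits instead of
   comparing one 128-bit integer; under the assumed layout only the low
   16 bits of the top word belong to the mantissa.  */
static bool mantissa_nonzero (uint32_t w0, uint32_t w1,
                              uint32_t w2, uint32_t w3)
{
  return ((w0 & 0x0000ffffu) | w1 | w2 | w3) != 0;
}
```

With this, infinity (exponent all ones, mantissa zero) and NaN (exponent all ones, mantissa nonzero) can be told apart without any TImode operation.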

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Tamar Christina


On 12/09/16 23:41, Joseph Myers wrote:

Are you making endianness assumptions - specifically, does the
reinterpretation as an integer require that WORDS_BIG_ENDIAN and
FLOAT_WORDS_BIG_ENDIAN are the same?  If so, I think that's OK (in that
the only target where they aren't the same seems to be pdp11 which doesn't
use IEEE formats), but probably the code should check explicitly.


No, if I understood the question correctly then this should be OK, since I
always access the float as an integer of equivalent precision.  So a 64-bit
float will be addressed as a 64-bit int.

Tamar




Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Tamar Christina



On 12/09/16 23:33, Joseph Myers wrote:

Why is this boolean false for ieee_quad_format, mips_quad_format and
ieee_half_format?  They should meet your description (even if the x86 /
m68k "extended" formats don't because of the leading mantissa bit being
set for infinities).

Ah, I played it a bit too safe there. I will change this and do some
re-testing and update the patch.

Tamar



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Tamar Christina



On 12/09/16 23:28, Joseph Myers wrote:

On Mon, 12 Sep 2016, Tamar Christina wrote:

Similar changes may be useful for __builtin_isfinite, __builtin_isnan,
__builtin_isinf, __builtin_isinf_sign, __builtin_isnormal.

Will your version always use only integer operations if the format is IEEE
enough?
Yes it will. The idea was indeed to also do those calls, but to start with
this one first to see what the feedback would be. I believe there's a ticket
for that as well: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462


Tamar



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Wilco Dijkstra
Jakub wrote:
> On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
> > This patch adds an optimized route to the fpclassify builtin
> > for floating point numbers which are similar to IEEE-754 in format.
> > 
> > The goal is to make it faster by:
> > 1. Trying to determine the most common case first
> >(e.g. the float is a Normal number) and then the
> >rest. The amount of code generated at -O2 is
> >about the same +/- 1 instruction, but the code
> >is much better.
> > 2. Using integer operations in the optimized path.
> 
> Is it generally preferable to use integer operations for this instead
> of floating point operations?  I mean various targets have quite high costs
> of moving data in between the general purpose and floating point register
> file, often it has to go through memory etc.

It is generally preferable indeed - there was a *very* long discussion about
integer vs FP on the GLIBC mailing list when I updated math.h to use the GCC
builtins a while back (the GLIBC implementation used a non-inlined
unoptimized integer implementation, so an inlined FP implementation seemed a
good intermediate solution).

Integer operations are generally lower latency and enable bit-manipulation
tricks like the fast early exit. The FP version requires execution of 5
branches for a "normal" FP value and loads several floating-point
immediates. There are also many targets with emulated floating-point types,
so 5 calls to the comparison lib function would be seriously slow.  Note
that using so many FP comparisons is not just slow, they also aren't correct
for signalling NaNs, so this patch also fixes bug 66462 for fpclassify.
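
As a rough illustration of the integer path and the fast early exit (a sketch of the idea, not the code GCC emits; the constants are for IEEE binary64):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classify a binary64 value using only integer operations.  The first
   test catches FP_NORMAL with one subtraction and one unsigned compare:
   after masking the sign, normal numbers are exactly those whose biased
   exponent field is neither all-zeros nor all-ones.  */
static int fpclassify_int (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);            /* reinterpret the bits */
  uint64_t ax = bits & 0x7fffffffffffffffULL; /* clear the sign bit */

  if (ax - 0x0010000000000000ULL < 0x7fe0000000000000ULL)
    return FP_NORMAL;                         /* the fast early exit */
  if (ax == 0)
    return FP_ZERO;
  if (ax < 0x0010000000000000ULL)
    return FP_SUBNORMAL;
  if (ax == 0x7ff0000000000000ULL)
    return FP_INFINITE;
  return FP_NAN;                              /* exp all ones, mantissa != 0 */
}
```

Unlike the chain of FP comparisons, none of these tests can raise an exception for a signalling NaN.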

I would suggest that someone with access to a machine with slow FP moves
(POWER?) benchmark this using the fpclassify test
(glibc/benchtests/bench-math-inlines.c) so we know for sure.

Wilco



Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Jakub Jelinek
On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.
> 
> The goal is to make it faster by:
> 1. Trying to determine the most common case first
>(e.g. the float is a Normal number) and then the
>rest. The amount of code generated at -O2 are
>about the same +/- 1 instruction, but the code
>is much better.
> 2. Using integer operation in the optimized path.

Is it generally preferable to use integer operations for this instead
of floating point operations?  I mean various targets have quite high costs
of moving data in between the general purpose and floating point register
file, often it has to go through memory etc.

Jakub


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-12 Thread Joseph Myers
Are you making endianness assumptions - specifically, does the 
reinterpretation as an integer require that WORDS_BIG_ENDIAN and 
FLOAT_WORDS_BIG_ENDIAN are the same?  If so, I think that's OK (in that 
the only target where they aren't the same seems to be pdp11 which doesn't 
use IEEE formats), but probably the code should check explicitly.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-12 Thread Joseph Myers
On Mon, 12 Sep 2016, Tamar Christina wrote:

> A limitation with this new approach is that the exponent of the
> floating-point type has to fit in 31 bits, and the type has to have an
> IEEE-like format and values for NaN and INF (e.g. for NaN and INF all
> bits of the exp must be set).
> 
> To determine this IEEE likeness a new boolean was added to real_format.

Why is this boolean false for ieee_quad_format, mips_quad_format and 
ieee_half_format?  They should meet your description (even if the x86 / 
m68k "extended" formats don't because of the leading mantissa bit being 
set for infinities).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-12 Thread Joseph Myers
On Mon, 12 Sep 2016, Tamar Christina wrote:

> Hi All,
> 
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.

Similar changes may be useful for __builtin_isfinite, __builtin_isnan, 
__builtin_isinf, __builtin_isinf_sign, __builtin_isnormal.

Will your version always use only integer operations if the format is IEEE
enough?  If so, it could be used by glibc's <math.h> if __SUPPORT_SNAN__
(-fsignaling-nans), except in the case where IBM long double is supported,
whereas presently all those built-in functions are avoided by glibc's
<math.h> for -fsignaling-nans.  The same applies to integer versions of
the other functions - whether or not they are beneficial in performance
normally, they are correct for -fsignaling-nans, which the present
built-in functions aren't.

(I intend to add issubnormal and iszero macros to glibc following TS 
18661-1; built-in versions of those would be useful as well.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-12 Thread Andrew Pinski
On Mon, Sep 12, 2016 at 6:21 PM, Moritz Klammler  wrote:
>
> Tamar Christina  writes:
>
>> Hi All,
>>
>> This patch adds an optimized route to the fpclassify builtin
>> for floating point numbers which are similar to IEEE-754 in format.
>>
>> [...]
>
> I might be the least competent person on this list to review this patch
> but nevertheless read it out of interest and stumbled over a comment
> that I believe could be improved for clarity.
>
> diff --git a/gcc/real.h b/gcc/real.h
> index 59af580e78f2637be84f71b98b45ec6611053222..36ded57cf4db7c30c935bdb24219a167480f39c8 100644
> --- a/gcc/real.h
> +++ b/gcc/real.h
> @@ -161,6 +161,15 @@ struct real_format
>bool has_signed_zero;
>bool qnan_msb_set;
>bool canonical_nan_lsbs_set;
> +
> +  /* This flag indicates whether the format can be used in the optimized
> + code paths for the __builtin_fpclassify function and friends.
> + The format has to have the same NaN and INF representation as normal
> + IEEE floats (e.g. exp must have all bits set), most significant bit must be
> + sign bit, followed by exp bits of at most 32 bits.  Lastly the floating
> + point number must be representable as an integer.  The base of the number
> + also must be base 2.  */
> +  bool is_binary_ieee_compatible;
>const char *name;
>  };
>
> My first issue is that
>
>> The format has to have the same NaN and INF representation as normal
>> IEEE floats
>
> is kind of an oxymoron because NaNs and INFs are not "normal" IEEE
> floats.

Let me clarify what was originally meant: some floating-point formats use
the same layout as IEEE but don't support INF or NaNs (the SPU v1 float,
for example; v2 supports both).

Thanks,
Andrew.


>
> Second,
>
>> the floating point number must be representable as an integer
>
> is also somewhat misleading because it could be interpreted in the
> (obviously nonsensical) way that the floating-point *values* have to be
> integral.  (I think it should be possible to *interpret* not *represent*
> them as integers.)
>
> So I would like to suggest the following rewording.
>
>> This flag indicates whether the format is suitable for the optimized
>> code paths for the __builtin_fpclassify function and friends.  For
>> this, the format must be a base 2 representation with the sign bit as
>> the most-significant bit followed by (exp <= 32) exponent bits
>> followed by the mantissa bits.  It must be possible to interpret the
>> bits of the floating-point representation as an integer.  NaNs and
>> INFs must be represented by the same schema used by IEEE 754.  (NaNs
>> must be represented by an exponent with all bits 1, any mantissa
>> except all bits 0 and any sign bit.  +INF and -INF must be represented
>> by an exponent with all bits 1, a mantissa with all bits 0 and a sign
>> bit of 0 and 1 respectively.)
>
> I hope this is clearer and still matches what the comment was supposed
> to say.
> --
> OpenPGP:
>
> Public Key:   http://openpgp.klammler.eu
> Fingerprint:  2732 DA32 C8D0 EEEC A081  BE9D CF6C 5166 F393 A9C0


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-12 Thread Moritz Klammler

Tamar Christina  writes:

> Hi All,
>
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.
>
> [...]

I might be the least competent person on this list to review this patch
but nevertheless read it out of interest and stumbled over a comment
that I believe could be improved for clarity.

diff --git a/gcc/real.h b/gcc/real.h
index 59af580e78f2637be84f71b98b45ec6611053222..36ded57cf4db7c30c935bdb24219a167480f39c8 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -161,6 +161,15 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+
+  /* This flag indicates whether the format can be used in the optimized
+ code paths for the __builtin_fpclassify function and friends.
+ The format has to have the same NaN and INF representation as normal
+ IEEE floats (e.g. exp must have all bits set), most significant bit must be
+ sign bit, followed by exp bits of at most 32 bits.  Lastly the floating
+ point number must be representable as an integer.  The base of the number
+ also must be base 2.  */
+  bool is_binary_ieee_compatible;
   const char *name;
 };

My first issue is that

> The format has to have the same NaN and INF representation as normal
> IEEE floats

is kind of an oxymoron because NaNs and INFs are not "normal" IEEE
floats.

Second,

> the floating point number must be representable as an integer

is also somewhat misleading because it could be interpreted in the
(obviously nonsensical) way that the floating-point *values* have to be
integral.  (I think it should be possible to *interpret* not *represent*
them as integers.)

So I would like to suggest the following rewording.

> This flag indicates whether the format is suitable for the optimized
> code paths for the __builtin_fpclassify function and friends.  For
> this, the format must be a base 2 representation with the sign bit as
> the most-significant bit followed by (exp <= 32) exponent bits
> followed by the mantissa bits.  It must be possible to interpret the
> bits of the floating-point representation as an integer.  NaNs and
> INFs must be represented by the same schema used by IEEE 754.  (NaNs
> must be represented by an exponent with all bits 1, any mantissa
> except all bits 0 and any sign bit.  +INF and -INF must be represented
> by an exponent with all bits 1, a mantissa with all bits 0 and a sign
> bit of 0 and 1 respectively.)

I hope this is clearer and still matches what the comment was supposed
to say.
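
Moritz's schema, written out for binary32 (the helper names are mine; this is just the described bit layout, not code from the patch):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* binary32 layout: sign(1) | exp(8) | mantissa(23).  With the sign bit
   masked off, exp-all-ones with mantissa-zero is +/-INF (0x7f800000),
   and anything above that value has exp all ones and a nonzero
   mantissa, i.e. is a NaN.  */
static int is_inf_bits (float f)
{
  uint32_t b;
  memcpy (&b, &f, sizeof b);
  return (b & 0x7fffffffu) == 0x7f800000u;
}

static int is_nan_bits (float f)
{
  uint32_t b;
  memcpy (&b, &f, sizeof b);
  return (b & 0x7fffffffu) > 0x7f800000u;
}
```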
-- 
OpenPGP:

Public Key:   http://openpgp.klammler.eu
Fingerprint:  2732 DA32 C8D0 EEEC A081  BE9D CF6C 5166 F393 A9C0

