Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-31 Thread Carl Love via Gcc-patches
Kewen:

On Wed, 2023-05-31 at 17:11 +0800, Kewen.Lin wrote:
> > So, there is no need for the builtin to have to determine if the
> > user
> > is storing the result of the __builtin_set_fpscr_rn.  The RN bits
> > will
> > always be updated by the __builtin_set_fpscr_rn builtin and the
> > existing fields of the FPSCR will always be returned by the
> > builtin.
> 
> Yeah, I agree, even with pre-P9 code when the returned value is
> unused,
> I'd expect DCE can eliminate the part for the FPSCR bits reading and
> masking, it's just like before (only setting RN bits).
> 
> The only concern I mentioned before is the built-in name doesn't
> clearly
> match what it does (with extending, it returns something instead)
> since
> it's only saying "set" and setting RN bits, the return value is
> easily
> misunderstood as returning old RN bits, the documentation has to
> explain
> and note it well.
> 
> Looking forward to Segher's opinion on this.

I have the patch to extend the __builtin_set_fpscr_rn builtin working. 
I agree the documentation on the instructions in the ISA is not really
clear about that.  It needs to be much more explicit in the builtin
description that the current RN field is returned then the field is
updated with the new RN bits from the argument.  

I sent the patch, with the updated builtin description and testcases to
the GLibC team to see what they thought of it.  The goal was for the
builtin to be effectively a "drop in replacement" for the inline asm
that they have.  I was planning on posting the new version if the GLibC
team says it works for them.  Hopefully I will hear from them soon.

Carl 



Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-31 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/5/25 23:59, Carl Love wrote:
> Peter, Kewen:
> 
> On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
>> on 2023/5/24 23:20, Carl Love wrote:
>>> On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
 on 2023/5/24 06:30, Peter Bergner wrote:
> On 5/23/23 12:24 AM, Kewen.Lin wrote:
>> on 2023/5/23 01:31, Carl Love wrote:
>>> The builtins were requested for use in GLibC.  As of
>>> version
>>> 2.31 they
>>> were added as inline asm.  They requested a builtin so the
>>> asm
>>> could be
>>> removed.
>>
>> So IMHO we also want the similar support for mffscrn, that is
>> to
>> make
>> use of mffscrn and mffscrni on Power9 and later, but falls
>> back
>> to 
>> __builtin_set_fpscr_rn + mffs similar on older platforms.
>
> So __builtin_set_fpscr_rn everything we want (sets the RN bits)
> and
> uses mffscrn/mffscrni on P9 and later and uses older insns on
> pre-
> P9.
> The only problem is we don't return the current FPSCR bits, as
> the
> bif
> is defined to return void.

 Yes.

> Crazy idea, but could we extend the built-in
> with an overload that returns the FPSCR bits?  

 So you agree that we should make this proposed new bif handle
 pre-P9
 just
 like some other existing bifs. :)  I think extending it is good
 and
 doable,
 but the only concern here is the bif name
 "__builtin_set_fpscr_rn",
 which
 matches the existing behavior (only set rounding) but doesn't
 match
 the
 proposed extending behavior (set rounding and get some env bits
 back).
 Maybe it's not a big deal if the documentation clarify it well.
>>>
>>> Extending the builtin to pre Power 9 is straight forward and I
>>> agree
>>> would make good sense to do.
>>>
>>> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
>>> the
>>> new functionality.  Peter suggests overloading the builtin to
>>> either
>>> return void or returns FPSCR bits.  It is my understanding that the
>>> return value for a given builtin had to be the same, i.e. you can't
>>> overload the return value. Maybe you can with Bill's new
>>> infrastructure?  I recall having problems trying to overload the
>>> return
>>> value in the past and Bill said you couldn't do it.  I play with
>>> this
>>> and see if I can overload the return value.
>>
>> Your understanding on that we fail to overload this for just
>> different
>> return types is correct.  But previously I interpreted the extending
>> proposal as to extend
>>
>>   void __builtin_set_fpscr_rn (int);
>>
>> to 
>>
>>   void __builtin_set_fpscr_rn (int, double*);
>>
>> The related address taken and store here can be optimized out
>> normally.
> 
> I don't think that is correct.   The current definition of the builtin

Just to clarify: I didn't meant to suggest it, just explained why I
thought overloading is doable previously as I interpreted it with one
extended argument. :)

> is:
> 
>  void __builtin_set_fpscr_rn (int);
> 
> The proposal by Peter was to change the return type to double, i.e.
> 
>  double __builtin_set_fpscr_rn (int);
> 
> Peter also said the following:
> 
>The built-in machinery can see that the usage is expecting a return
>value or not and for the pre-P9 code, can skip generating the ending
>mffs if we don't want the return value.
> 
> Which I don't think we want.  The mffscrn and mffscrni instructions
> return the contents of the control bits in the FPSCR, that is, bits
> 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
> into the corresponding bits in register FRT. All other bits in register
> FRT are set to 0.  
> 
> The instructions also updates the current RN field of the FPSCR with
> the new RN supplied the second argument of the instruction.  So, the
> instructions update the RN field just like the __builtin_set_fpscr_rn. 
> So, we can use the existing __builtin_set_fpscr_rn to update the RN for
> all ISAs, we just need to have __builtin_set_fpscr_rn always return a
> double with the desired fields from the FPSCR (the current RN).  This
> will then emulate the behavior of the mffscrn and mffscrni
> instructions.  The current uses of __builtin_set_fpscr_rn will just
> ignore the return value which is not a problem.  The return value can
> be stored in the places were the user is currently using the inline asm
> for the mffscrn and mffscrni instructions.
> 
> The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
> mffscrni on Power 9 and throwing away the result from the instruction. 
> We just need to change __builtin_set_fpscr_rn to return the value
> instead.  For the pre Power 9 code, the builtin will need to read the
> full FPSCR, mask of the desired fields and return the fields.
> 
> So, there is no need for the builtin to have to determine if the user
> is storing the result of the __builtin_set_fpscr_rn.  The RN bits 

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-25 Thread Carl Love via Gcc-patches
Peter, Kewen:

On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
> on 2023/5/24 23:20, Carl Love wrote:
> > On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> > > on 2023/5/24 06:30, Peter Bergner wrote:
> > > > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > > > on 2023/5/23 01:31, Carl Love wrote:
> > > > > > The builtins were requested for use in GLibC.  As of
> > > > > > version
> > > > > > 2.31 they
> > > > > > were added as inline asm.  They requested a builtin so the
> > > > > > asm
> > > > > > could be
> > > > > > removed.
> > > > > 
> > > > > So IMHO we also want the similar support for mffscrn, that is
> > > > > to
> > > > > make
> > > > > use of mffscrn and mffscrni on Power9 and later, but falls
> > > > > back
> > > > > to 
> > > > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > > > 
> > > > So __builtin_set_fpscr_rn everything we want (sets the RN bits)
> > > > and
> > > > uses mffscrn/mffscrni on P9 and later and uses older insns on
> > > > pre-
> > > > P9.
> > > > The only problem is we don't return the current FPSCR bits, as
> > > > the
> > > > bif
> > > > is defined to return void.
> > > 
> > > Yes.
> > > 
> > > > Crazy idea, but could we extend the built-in
> > > > with an overload that returns the FPSCR bits?  
> > > 
> > > So you agree that we should make this proposed new bif handle
> > > pre-P9
> > > just
> > > like some other existing bifs. :)  I think extending it is good
> > > and
> > > doable,
> > > but the only concern here is the bif name
> > > "__builtin_set_fpscr_rn",
> > > which
> > > matches the existing behavior (only set rounding) but doesn't
> > > match
> > > the
> > > proposed extending behavior (set rounding and get some env bits
> > > back).
> > > Maybe it's not a big deal if the documentation clarify it well.
> > 
> > Extending the builtin to pre Power 9 is straight forward and I
> > agree
> > would make good sense to do.
> > 
> > I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
> > the
> > new functionality.  Peter suggests overloading the builtin to
> > either
> > return void or returns FPSCR bits.  It is my understanding that the
> > return value for a given builtin had to be the same, i.e. you can't
> > overload the return value. Maybe you can with Bill's new
> > infrastructure?  I recall having problems trying to overload the
> > return
> > value in the past and Bill said you couldn't do it.  I play with
> > this
> > and see if I can overload the return value.
> 
> Your understanding on that we fail to overload this for just
> different
> return types is correct.  But previously I interpreted the extending
> proposal as to extend
> 
>   void __builtin_set_fpscr_rn (int);
> 
> to 
> 
>   void __builtin_set_fpscr_rn (int, double*);
> 
> The related address taken and store here can be optimized out
> normally.

I don't think that is correct.   The current definition of the builtin
is:

 void __builtin_set_fpscr_rn (int);

The proposal by Peter was to change the return type to double, i.e.

 double __builtin_set_fpscr_rn (int);

Peter also said the following:

   The built-in machinery can see that the usage is expecting a return
   value or not and for the pre-P9 code, can skip generating the ending
   mffs if we don't want the return value.

Which I don't think we want.  The mffscrn and mffscrni instructions
return the contents of the control bits in the FPSCR, that is, bits
29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
into the corresponding bits in register FRT. All other bits in register
FRT are set to 0.  

The instructions also updates the current RN field of the FPSCR with
the new RN supplied the second argument of the instruction.  So, the
instructions update the RN field just like the __builtin_set_fpscr_rn. 
So, we can use the existing __builtin_set_fpscr_rn to update the RN for
all ISAs, we just need to have __builtin_set_fpscr_rn always return a
double with the desired fields from the FPSCR (the current RN).  This
will then emulate the behavior of the mffscrn and mffscrni
instructions.  The current uses of __builtin_set_fpscr_rn will just
ignore the return value which is not a problem.  The return value can
be stored in the places were the user is currently using the inline asm
for the mffscrn and mffscrni instructions.

The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
mffscrni on Power 9 and throwing away the result from the instruction. 
We just need to change __builtin_set_fpscr_rn to return the value
instead.  For the pre Power 9 code, the builtin will need to read the
full FPSCR, mask of the desired fields and return the fields.

So, there is no need for the builtin to have to determine if the user
is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
always be updated by the __builtin_set_fpscr_rn builtin and the
existing fields of the FPSCR will always be returned by the builtin.

Please let me know if you agree.  I think I have this sorted out

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-24 Thread Kewen.Lin via Gcc-patches
on 2023/5/24 23:20, Carl Love wrote:
> On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
>> on 2023/5/24 06:30, Peter Bergner wrote:
>>> On 5/23/23 12:24 AM, Kewen.Lin wrote:
 on 2023/5/23 01:31, Carl Love wrote:
> The builtins were requested for use in GLibC.  As of version
> 2.31 they
> were added as inline asm.  They requested a builtin so the asm
> could be
> removed.

 So IMHO we also want the similar support for mffscrn, that is to
 make
 use of mffscrn and mffscrni on Power9 and later, but falls back
 to 
 __builtin_set_fpscr_rn + mffs similar on older platforms.
>>>
>>> So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
>>> uses mffscrn/mffscrni on P9 and later and uses older insns on pre-
>>> P9.
>>> The only problem is we don't return the current FPSCR bits, as the
>>> bif
>>> is defined to return void.
>>
>> Yes.
>>
>>> Crazy idea, but could we extend the built-in
>>> with an overload that returns the FPSCR bits?  
>>
>> So you agree that we should make this proposed new bif handle pre-P9
>> just
>> like some other existing bifs. :)  I think extending it is good and
>> doable,
>> but the only concern here is the bif name "__builtin_set_fpscr_rn",
>> which
>> matches the existing behavior (only set rounding) but doesn't match
>> the
>> proposed extending behavior (set rounding and get some env bits
>> back).
>> Maybe it's not a big deal if the documentation clarify it well.
> 
> Extending the builtin to pre Power 9 is straight forward and I agree
> would make good sense to do.
> 
> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add the
> new functionality.  Peter suggests overloading the builtin to either
> return void or returns FPSCR bits.  It is my understanding that the
> return value for a given builtin had to be the same, i.e. you can't
> overload the return value. Maybe you can with Bill's new
> infrastructure?  I recall having problems trying to overload the return
> value in the past and Bill said you couldn't do it.  I play with this
> and see if I can overload the return value.

Your understanding on that we fail to overload this for just different
return types is correct.  But previously I interpreted the extending
proposal as to extend
 
  void __builtin_set_fpscr_rn (int);

to 

  void __builtin_set_fpscr_rn (int, double*);

The related address taken and store here can be optimized out normally.

BR,
Kewen


Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-24 Thread Peter Bergner via Gcc-patches
On 5/24/23 10:20 AM, Carl Love wrote:
> Extending the builtin to pre Power 9 is straight forward and I agree
> would make good sense to do.
> 
> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add the
> new functionality.  Peter suggests overloading the builtin to either
> return void or returns FPSCR bits.  It is my understanding that the
> return value for a given builtin had to be the same, i.e. you can't
> overload the return value. Maybe you can with Bill's new
> infrastructure?  I recall having problems trying to overload the return
> value in the past and Bill said you couldn't do it.  I play with this
> and see if I can overload the return value.

In this case, I don't think we need a built-in overload, but just change
the current built-in to return a double rather than void.  All of the
old code should still work, since they'll just ignore the return
value.  As I said, the built-in machinery can see whether we're assigning
the built-in return value to a variable or not, ie, the difference between

  __builtin_set_fpscr_rn ();

versus:

  foo = __builtin_set_fpscr_rn ();

In the former case, the built-in can expand exactly as how it does now.
In the later case, we'll use the target rtx we're passed in as the
destination of the mffscrn[i] insn for P9/10 and for pre-P9, we'll
use the target for an additional mffs instruction which we don't
generate now.  Note we'd only generate the mffs when we're passed in
a target rtx as in the second case.  The first case we won't.

This is all assuming Segher is fine with the change this way.  If we do
go this way, I would recommend adding a predefined macro that users can
test for to know whether the built-in returns a value or not.

If Segher doesn't like this idea, then it's all moot! :-)

Peter


Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-24 Thread Carl Love via Gcc-patches
On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> on 2023/5/24 06:30, Peter Bergner wrote:
> > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > on 2023/5/23 01:31, Carl Love wrote:
> > > > The builtins were requested for use in GLibC.  As of version
> > > > 2.31 they
> > > > were added as inline asm.  They requested a builtin so the asm
> > > > could be
> > > > removed.
> > > 
> > > So IMHO we also want the similar support for mffscrn, that is to
> > > make
> > > use of mffscrn and mffscrni on Power9 and later, but falls back
> > > to 
> > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > 
> > So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
> > uses mffscrn/mffscrni on P9 and later and uses older insns on pre-
> > P9.
> > The only problem is we don't return the current FPSCR bits, as the
> > bif
> > is defined to return void.
> 
> Yes.
> 
> > Crazy idea, but could we extend the built-in
> > with an overload that returns the FPSCR bits?  
> 
> So you agree that we should make this proposed new bif handle pre-P9
> just
> like some other existing bifs. :)  I think extending it is good and
> doable,
> but the only concern here is the bif name "__builtin_set_fpscr_rn",
> which
> matches the existing behavior (only set rounding) but doesn't match
> the
> proposed extending behavior (set rounding and get some env bits
> back).
> Maybe it's not a big deal if the documentation clarify it well.

Extending the builtin to pre Power 9 is straight forward and I agree
would make good sense to do.

I am a bit concerned on how to extend __builtin_set_fpscr_rn to add the
new functionality.  Peter suggests overloading the builtin to either
return void or returns FPSCR bits.  It is my understanding that the
return value for a given builtin had to be the same, i.e. you can't
overload the return value. Maybe you can with Bill's new
infrastructure?  I recall having problems trying to overload the return
value in the past and Bill said you couldn't do it.  I play with this
and see if I can overload the return value.
> 
> 
> > To be honest, I like
> > the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].
> 
> +1
> 
> BR,
> Kewen
> 
> > The built-in machinery can see that the usage is expecting a return
> > value
> > or not and for the pre-P9 code, can skip generating the ending mffs
> > if
> > we don't want the return value.
> > 
> > Peter
> > 
> > 



Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-23 Thread Kewen.Lin via Gcc-patches
on 2023/5/24 06:30, Peter Bergner wrote:
> On 5/23/23 12:24 AM, Kewen.Lin wrote:
>> on 2023/5/23 01:31, Carl Love wrote:
>>> The builtins were requested for use in GLibC.  As of version 2.31 they
>>> were added as inline asm.  They requested a builtin so the asm could be
>>> removed.
>>
>> So IMHO we also want the similar support for mffscrn, that is to make
>> use of mffscrn and mffscrni on Power9 and later, but falls back to 
>> __builtin_set_fpscr_rn + mffs similar on older platforms.
> 
> So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
> uses mffscrn/mffscrni on P9 and later and uses older insns on pre-P9.
> The only problem is we don't return the current FPSCR bits, as the bif
> is defined to return void.

Yes.

> Crazy idea, but could we extend the built-in
> with an overload that returns the FPSCR bits?  

So you agree that we should make this proposed new bif handle pre-P9 just
like some other existing bifs. :)  I think extending it is good and doable,
but the only concern here is the bif name "__builtin_set_fpscr_rn", which
matches the existing behavior (only set rounding) but doesn't match the
proposed extending behavior (set rounding and get some env bits back).
Maybe it's not a big deal if the documentation clarify it well.


> To be honest, I like
> the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].

+1

BR,
Kewen

> The built-in machinery can see that the usage is expecting a return value
> or not and for the pre-P9 code, can skip generating the ending mffs if
> we don't want the return value.
> 
> Peter
> 
>


Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-23 Thread Peter Bergner via Gcc-patches
On 5/23/23 12:24 AM, Kewen.Lin wrote:
> on 2023/5/23 01:31, Carl Love wrote:
>> The builtins were requested for use in GLibC.  As of version 2.31 they
>> were added as inline asm.  They requested a builtin so the asm could be
>> removed.
>
> So IMHO we also want the similar support for mffscrn, that is to make
> use of mffscrn and mffscrni on Power9 and later, but falls back to 
> __builtin_set_fpscr_rn + mffs similar on older platforms.

So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
uses mffscrn/mffscrni on P9 and later and uses older insns on pre-P9.
The only problem is we don't return the current FPSCR bits, as the bif
is defined to return void.  Crazy idea, but could we extend the built-in
with an overload that returns the FPSCR bits?  To be honest, I like
the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].
The built-in machinery can see that the usage is expecting a return value
or not and for the pre-P9 code, can skip generating the ending mffs if
we don't want the return value.

Peter




Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-22 Thread Kewen.Lin via Gcc-patches
on 2023/5/23 01:31, Carl Love wrote:
> On Mon, 2023-05-22 at 14:36 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/19 05:12, Carl Love via Gcc-patches wrote:
>>> GCC maintainers:
>>>
>>> version 2.  Fixed an issue with the test case.  The dg-options line
>>> was
>>> missing.
>>>
>>> The following patch adds an overloaded builtin.  There are two
>>> possible
>>> arguments for the builtin.  The builtin definitions are:
>>>
>>>   double __builtin_mffscrn (unsigned long int);
>>>   double __builtin_mffscrn (double);
>>>
>>
>> We already have one  bif __builtin_set_fpscr_rn for RN setting,
>> apparently
>> these two are mainly for direct mapping to mffscr[ni] and want the
>> FPSCR bits.
>> I'm curious what's the requirements requesting these two built-in
>> functions?
> 
> The builtins were requested for use in GLibC.  As of version 2.31 they
> were added as inline asm.  They requested a builtin so the asm could be
> removed.
> 

OK, thanks for the information.

>>
>>> The patch has been tested on Power 10 with no regressions.  
>>>
>>> Please let me know if the patch is acceptable for
>>> mainline.  Thanks.
>>>
>>> Carl
>>>
>>> 
>>> rs6000: Add buildin for mffscrn instructions
>>>
>>
>> s/buildin/built-in/
> 
> fixed
>>
>>> This patch adds overloaded __builtin_mffscrn for the move From
>>> FPSCR
>>> Control & Set R instruction with an immediate argument.  It also
>>> adds the
>>> builtin with a floating point register argument.  A new runnable
>>> test is
>>> added for the new builtin.
>>
>> s/Set R/Set RN/
> 
> fixed
> 
>>> gcc/
>>>
>>> * config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
>>> __builtin_mffscrnd): Add builtin definitions.
>>> * config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
>>> overloaded definition.
>>> * doc/extend.texi: Add documentation for __builtin_mffscrn.
>>>
>>> gcc/testsuite/
>>>
>>> * gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
>>> builtin.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def |   7 ++
>>>  gcc/config/rs6000/rs6000-overload.def |   5 +
>>>  gcc/doc/extend.texi   |   8 ++
>>>  .../gcc.target/powerpc/builtin-mffscrn.c  | 106
>>> ++
>>>  4 files changed, 126 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-
>>> mffscrn.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 92d9b46e1b9..67125473684 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -2875,6 +2875,13 @@
>>>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>>>  XL_LEN_R xl_len_r {}
>>>  
>>> +; Immediate instruction only uses the least significant two bits
>>> of the
>>> +; const int.
>>> +  double __builtin_mffscrni (const int<2>);
>>> +MFFSCRNI rs6000_mffscrni {}
>>> +
>>> +  double __builtin_mffscrnd (double);
>>> +MFFSCRNF rs6000_mffscrn {}
>>>  
>>
>> Why are them put in [power9-64] rather than [power9]?  IMHO [power9]
>> is the
>> correct stanza for them.
> 
> Moved them to power 9 stanza.
> 
>>   Besides, {nosoft} attribute is required.
> 
> OK, added that.  I was trying to figure out why nosoft is needed.  The
> instructions are manipulating bits in a physical register that controls
> the hardware floating point instructions.  It looks to me like that
> would be why.  Because if you were using msoft float then the floating
> point HW registers are disabled and the floating point operations are
> done using software.  Did I figure this out correctly?

Yes, and also the destination of these two instructions is hardware float
register, its relatives mffs and mffsl have that as well.

> 
>  
>>
>>>  ; Builtins requiring hardware support for IEEE-128 floating-point.
>>>  [ieee128-hw]
>>> diff --git a/gcc/config/rs6000/rs6000-overload.def
>>> b/gcc/config/rs6000/rs6000-overload.def
>>> index c582490c084..adda2df69ea 100644
>>> --- a/gcc/config/rs6000/rs6000-overload.def
>>> +++ b/gcc/config/rs6000/rs6000-overload.def
>>> @@ -78,6 +78,11 @@
>>>  ; like after a required newline, but nowhere else.  Lines
>>> beginning with
>>>  ; a semicolon are also treated as blank lines.
>>>  
>>> +[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
>>> +  double __builtin_mffscrn (const int<2>);
>>> +MFFSCRNI
>>> +  double __builtin_mffscrn (double);
>>> +MFFSCRNF
>>>  
>>>  [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
>>>vsq __builtin_vec_bcdadd (vsq, vsq, const int);
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index ed8b9c8a87b..f16c046051a 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned
>>> int comparison, _Decimal128 value);
>>>  
>>>  double __builtin_mffsl(void);
>>>  
>>> +double __builtin_mffscrn (unsigned long int);
>>> +double 

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-22 Thread Carl Love via Gcc-patches
On Mon, 2023-05-22 at 14:36 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/19 05:12, Carl Love via Gcc-patches wrote:
> > GCC maintainers:
> > 
> > version 2.  Fixed an issue with the test case.  The dg-options line
> > was
> > missing.
> > 
> > The following patch adds an overloaded builtin.  There are two
> > possible
> > arguments for the builtin.  The builtin definitions are:
> > 
> >   double __builtin_mffscrn (unsigned long int);
> >   double __builtin_mffscrn (double);
> > 
> 
> We already have one  bif __builtin_set_fpscr_rn for RN setting,
> apparently
> these two are mainly for direct mapping to mffscr[ni] and want the
> FPSCR bits.
> I'm curious what's the requirements requesting these two built-in
> functions?

The builtins were requested for use in GLibC.  As of version 2.31 they
were added as inline asm.  They requested a builtin so the asm could be
removed.

> 
> > The patch has been tested on Power 10 with no regressions.  
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl
> > 
> > 
> > rs6000: Add buildin for mffscrn instructions
> > 
> 
> s/buildin/built-in/

fixed
> 
> > This patch adds overloaded __builtin_mffscrn for the move From
> > FPSCR
> > Control & Set R instruction with an immediate argument.  It also
> > adds the
> > builtin with a floating point register argument.  A new runnable
> > test is
> > added for the new builtin.
> 
> s/Set R/Set RN/

fixed

> > gcc/
> > 
> > * config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
> > __builtin_mffscrnd): Add builtin definitions.
> > * config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
> > overloaded definition.
> > * doc/extend.texi: Add documentation for __builtin_mffscrn.
> > 
> > gcc/testsuite/
> > 
> > * gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
> > builtin.
> > ---
> >  gcc/config/rs6000/rs6000-builtins.def |   7 ++
> >  gcc/config/rs6000/rs6000-overload.def |   5 +
> >  gcc/doc/extend.texi   |   8 ++
> >  .../gcc.target/powerpc/builtin-mffscrn.c  | 106
> > ++
> >  4 files changed, 126 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-
> > mffscrn.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 92d9b46e1b9..67125473684 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -2875,6 +2875,13 @@
> >pure vsc __builtin_vsx_xl_len_r (void *, signed long);
> >  XL_LEN_R xl_len_r {}
> >  
> > +; Immediate instruction only uses the least significant two bits
> > of the
> > +; const int.
> > +  double __builtin_mffscrni (const int<2>);
> > +MFFSCRNI rs6000_mffscrni {}
> > +
> > +  double __builtin_mffscrnd (double);
> > +MFFSCRNF rs6000_mffscrn {}
> >  
> 
> Why are them put in [power9-64] rather than [power9]?  IMHO [power9]
> is the
> correct stanza for them.

Moved them to power 9 stanza.

>   Besides, {nosoft} attribute is required.

OK, added that.  I was trying to figure out why nosoft is needed.  The
instructions are manipulating bits in a physical register that controls
the hardware floating point instructions.  It looks to me like that
would be why.  Because if you were using msoft float then the floating
point HW registers are disabled and the floating point operations are
done using software.  Did I figure this out correctly?

 
> 
> >  ; Builtins requiring hardware support for IEEE-128 floating-point.
> >  [ieee128-hw]
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..adda2df69ea 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -78,6 +78,11 @@
> >  ; like after a required newline, but nowhere else.  Lines
> > beginning with
> >  ; a semicolon are also treated as blank lines.
> >  
> > +[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
> > +  double __builtin_mffscrn (const int<2>);
> > +MFFSCRNI
> > +  double __builtin_mffscrn (double);
> > +MFFSCRNF
> >  
> >  [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
> >vsq __builtin_vec_bcdadd (vsq, vsq, const int);
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index ed8b9c8a87b..f16c046051a 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned
> > int comparison, _Decimal128 value);
> >  
> >  double __builtin_mffsl(void);
> >  
> > +double __builtin_mffscrn (unsigned long int);
> > +double __builtin_mffscrn (double);
> 
> s/unsigned long int/const int/

Fixed

> 
> Note that this section is for all configurations and your
> implementation is put
> __builtin_mffscrn power9 only, so if the intention (requirement) is
> to make this
> be for also all 

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-22 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/5/19 05:12, Carl Love via Gcc-patches wrote:
> GCC maintainers:
> 
> version 2.  Fixed an issue with the test case.  The dg-options line was
> missing.
> 
> The following patch adds an overloaded builtin.  There are two possible
> arguments for the builtin.  The builtin definitions are:
> 
>   double __builtin_mffscrn (unsigned long int);
>   double __builtin_mffscrn (double);
> 

We already have one  bif __builtin_set_fpscr_rn for RN setting, apparently
these two are mainly for direct mapping to mffscr[ni] and want the FPSCR bits.
I'm curious what's the requirements requesting these two built-in functions?

> The patch has been tested on Power 10 with no regressions.  
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
> Carl
> 
> 
> rs6000: Add buildin for mffscrn instructions
> 

s/buildin/built-in/

> This patch adds overloaded __builtin_mffscrn for the move From FPSCR
> Control & Set R instruction with an immediate argument.  It also adds the
> builtin with a floating point register argument.  A new runnable test is
> added for the new builtin.

s/Set R/Set RN/

> 
> gcc/
> 
>   * config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
>   __builtin_mffscrnd): Add builtin definitions.
>   * config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
>   overloaded definition.
>   * doc/extend.texi: Add documentation for __builtin_mffscrn.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
>   builtin.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |   7 ++
>  gcc/config/rs6000/rs6000-overload.def |   5 +
>  gcc/doc/extend.texi   |   8 ++
>  .../gcc.target/powerpc/builtin-mffscrn.c  | 106 ++
>  4 files changed, 126 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 92d9b46e1b9..67125473684 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2875,6 +2875,13 @@
>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>  XL_LEN_R xl_len_r {}
>  
> +; Immediate instruction only uses the least significant two bits of the
> +; const int.
> +  double __builtin_mffscrni (const int<2>);
> +MFFSCRNI rs6000_mffscrni {}
> +
> +  double __builtin_mffscrnd (double);
> +MFFSCRNF rs6000_mffscrn {}
>  

Why are them put in [power9-64] rather than [power9]?  IMHO [power9] is the
correct stanza for them.  Besides, {nosoft} attribute is required.

>  ; Builtins requiring hardware support for IEEE-128 floating-point.
>  [ieee128-hw]
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index c582490c084..adda2df69ea 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -78,6 +78,11 @@
>  ; like after a required newline, but nowhere else.  Lines beginning with
>  ; a semicolon are also treated as blank lines.
>  
> +[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
> +  double __builtin_mffscrn (const int<2>);
> +MFFSCRNI
> +  double __builtin_mffscrn (double);
> +MFFSCRNF
>  
>  [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
>vsq __builtin_vec_bcdadd (vsq, vsq, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ed8b9c8a87b..f16c046051a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
> comparison, _Decimal128 value);
>  
>  double __builtin_mffsl(void);
>  
> +double __builtin_mffscrn (unsigned long int);
> +double __builtin_mffscrn (double);

s/unsigned long int/const int/

Note that this section is for all configurations and your implementation is put
__builtin_mffscrn power9 only, so if the intention (requirement) is to make this
be for also all configurations, we need to deal with the cases without the 
support
of actual hw insns mffscrn{,i}, just like the existing handlings for mffsl etc.

> +
>  @end smallexample
>  The @code{__builtin_byte_in_set} function requires a
>  64-bit environment supporting ISA 3.0 or later.  This function returns
> @@ -18511,6 +18514,11 @@ the FPSCR.  The instruction is a lower latency 
> version of the @code{mffs}
>  instruction.  If the @code{mffsl} instruction is not available, then the
>  builtin uses the older @code{mffs} instruction to read the FPSCR.
>  
> +The @code{__builtin_mffscrn} returns the contents of the control bits in the
> +FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
> +contents of bits [62:63] of the unsigned long int or double argument are 
> placed
> +into bits [62:63] of the FPSCR (RN).
> +

I know this description is copied from ISA doc, but this part is for GCC 
documentation,
the 

[PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-18 Thread Carl Love via Gcc-patches
GCC maintainers:

version 2.  Fixed an issue with the test case.  The dg-options line was
missing.

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl


rs6000: Add buildin for mffscrn instructions

This patch adds overloaded __builtin_mffscrn for the move From FPSCR
Control & Set R instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 ++
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |   8 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 106 ++
 4 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 92d9b46e1b9..67125473684 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2875,6 +2875,13 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {}
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..adda2df69ea 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ed8b9c8a87b..f16c046051a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
comparison, _Decimal128 value);
 
 double __builtin_mffsl(void);
 
+double __builtin_mffscrn (unsigned long int);
+double __builtin_mffscrn (double);
+
 @end smallexample
 The @code{__builtin_byte_in_set} function requires a
 64-bit environment supporting ISA 3.0 or later.  This function returns
@@ -18511,6 +18514,11 @@ the FPSCR.  The instruction is a lower latency version 
of the @code{mffs}
 instruction.  If the @code{mffsl} instruction is not available, then the
 builtin uses the older @code{mffs} instruction to read the FPSCR.
 
+The @code{__builtin_mffscrn} returns the contents of the control bits in the
+FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
+contents of bits [62:63] of the unsigned long int or double argument are placed
+into bits [62:63] of the FPSCR (RN).
+
 @node Basic PowerPC Built-in Functions Available on ISA 3.1
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..26c666a4091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mpower9-vector -mdejagnu-cpu=power9" } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn (EXPECTED2);
+
+  /* Read FPSCR, bits [62:63] should have been set to 0x2 by previous builtin
+ call.  */
+  val.d = __builtin_mffscrn (EXPECTED1);
+  /* The expected result is the