[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-10-21 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei abandoned this revision.
pengfei added a comment.

This is not needed anymore, thanks @RKSimon.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-10-20 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added a comment.
Herald added a subscriber: StephenFan.

@pengfei Do we still need this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-16 Thread Kan Shengchen via Phabricator via cfe-commits
skan added a comment.

In D120395#3365622, @craig.topper wrote:

>> So at this point we have these options:
>>
>> 1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
>> 2. Deprecate the `__m[128|256|512]bh` types and replace them with 
>> `__m[128|256|512]i`
>> 3. Add load/store/insert/extract intrinsics for the `__bfloat16` type
>>
>> Of these, I'd prefer the third option because both of the first two require 
>> an overloaded use of the vector-integer type. I already don't like that 
>> we use the same type for any size integer vector. Using it for BF16 vectors 
>> just seems wrong.
>
> The third option also needs bitcast intrinsics to/from the same sized ps/pd/i 
> type. We have similar casts between the other 3 types already.

If we'd like to avoid conflicting with a future bf16 ABI, the second option is 
the right direction.
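
For concreteness, a minimal sketch of how the dot-product code discussed later 
in this thread might read under the second option, assuming `_mm_dpbf16_ps` 
were redeclared to take `__m128i` directly (a hypothetical future signature, 
not the one in today's headers):

  #include <immintrin.h>

  // Hypothetical option 2: the bf16 bit patterns simply live in __m128i, and
  // the explicit casts to __m128bh that today's signatures require disappear.
  __m128 dot_bf16(const void *p1, const void *p2, __m128 acc) {
    __m128i a = _mm_loadu_si128((const __m128i *)p1);  // raw bf16 bit patterns
    __m128i b = _mm_loadu_si128((const __m128i *)p2);
    return _mm_dpbf16_ps(acc, a, b);  // assumes a hypothetical __m128i signature
  }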


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-07 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

> So at this point we have these options:
>
> 1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
> 2. Deprecate the `__m[128|256|512]bh` types and replace them with 
> `__m[128|256|512]i`
> 3. Add load/store/insert/extract intrinsics for the `__bfloat16` type
>
> Of these, I'd prefer the third option because both of the first two require 
> an overloaded use of the vector-integer type. I already don't like that 
> we use the same type for any size integer vector. Using it for BF16 vectors 
> just seems wrong.

The third option also needs bitcast intrinsics to/from the same sized ps/pd/i 
type. We have similar casts between the other 3 types already.
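
To make that concrete, the missing casts would presumably mirror the existing 
`_mm256_castps_si256`-style family. The names below are hypothetical 
placeholders for illustration; none of them exist in the headers today:

  #include <immintrin.h>

  // Hypothetical bitcast intrinsics for the BF16 vector type, by analogy with
  // the existing ps/pd/si cast family (declarations only, not real intrinsics).
  __m256bh _mm256_castsi256_pbh(__m256i __a);  // reinterpret integer bits as bf16
  __m256i  _mm256_castpbh_si256(__m256bh __a); // reinterpret bf16 bits as integers
  __m256bh _mm256_castps_pbh(__m256 __a);      // reinterpret float bits as bf16
  __m256   _mm256_castpbh_ps(__m256bh __a);    // reinterpret bf16 bits as floats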


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3359255, @pengfei wrote:

> I don't agree. Unlike `__fp16`, `__bf16` is simply an ARM-specific type.

Why is __bf16 an ARM-specific type? It's a type that describes a floating point 
value with a specific binary representation that is supported on some ARM and 
some Intel processors. Why should the type be ARM-specific? What the clang 
documentation says is just a description of the current implementation. We can 
change the documentation based on what the compiler supports.

In D120395#3359255, @pengfei wrote:

> So it is not a question of easy or difficult. It's a wrong direction that we 
> can't go in (introducing a new IR type without a clear ABI definition). We 
> made that mistake with the `half` type: https://godbolt.org/z/K55s5zqPG We 
> shouldn't make it again.

If people are writing code that uses variables of this type, we aren't really 
solving anything by pretending it's a different type because the ABI for the 
type they're using hasn't been defined. The solution here is to work with HJ 
and others to get the ABI defined for this type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3358533, @craig.topper wrote:

> __m256bh should not have been a new type. It should have been an alias of 
> __m256i. We don't have load/store intrinsics for __m256bh so if you can even 
> get the __m256bh type in and out of memory using load/store intrinsics, it is 
> only because we allow lax vector conversion by default. 
> -fno-lax-vector-conversions will probably break any code trying to load/store 
> it using a load/store intrinsic. If __m256bh was made a struct as at one 
> point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, 
> store, and cast intrinsics for it. We would probably want insert/extract 
> element intrinsics as well.

From this, it sounds like our intrinsics support is incomplete for this type. 
Even if it is defined as an alias of some existing type (such as __m256i), I 
would have to do the mental gymnastics of mixing and matching intrinsics to get 
the behavior I want. I think this gets back to the question @scanon asked about 
the semantics of this type. What should I be able to do with it? Can I load a 
vector of these values from memory? Can I store them to memory? Can I assign 
one vector to another? Can I pass it as an argument? Can I use it as a return 
type? It looks like the only things we have intrinsics for (for the vector types) 
are converting to and from single precision vectors and performing the dot 
product accumulate operation.

I was wondering about similar issues when I was looking at what we had for the 
__bfloat16 type. We have intrinsics to convert this type to and from single 
precision floating point values, but I can't do anything else with it. Nothing 
else at all, including inserting it into a vector of bf16 values.
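
A small sketch of the gap being described, assuming the current headers; the 
integer-insert workaround below is only an illustration of what a user has to 
resort to, not an intrinsic designed for BF16:

  #include <immintrin.h>

  // The only scalar operations available are the float conversions...
  float bf16_roundtrip(float x) {
    __bfloat16 b = _mm_cvtness_sbh(x);  // float -> bf16
    return _mm_cvtsbh_ss(b);            // bf16 -> float
  }

  // ...so even inserting one scalar into a bf16 vector has to fall back on the
  // integer intrinsics plus a vector cast, because no bf16 insert exists.
  __m128bh insert_lane0(__m128bh v, __bfloat16 b) {
    __m128i bits = _mm_insert_epi16((__m128i)v, (short)b, 0);
    return (__m128bh)bits;
  }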

So @pengfei is trying to press ahead with the backend implementation, but our 
front end support is incomplete. That might explain why Phoebe and I haven't 
been able to agree on what should be done here.

This patch is strictly a front end patch, but it's trying to just wedge 
definitions into header files to get the desired outcome in the code 
generation. From the user's perspective, it feels totally broken.

Consider this function.

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the vectors using the integer load intrinsics??
    __m128i temp1 = _mm_loadu_epi32(p1);
    __m128i temp2 = _mm_loadu_epi32(p2);

    // Zero-initialize a base value vector
    __m128 base = _mm_set_ps1(0.0f);

    // Perform the dot product
    return _mm_dpbf16_ps(base, temp1, temp2);
  }

Is this what you'd expect with the current definitions? It looks like it produces 
the instructions I expected, but with -fno-lax-vector-conversions I get an 
error unless I explicitly bitcast the arguments from `__m128i` to `__m128bh`.
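
For reference, this is the shape of the example with the explicit casts that 
-fno-lax-vector-conversions demands, assuming C-style vector casts between the 
same-size types (a sketch, not code taken from the review):

  __m128 f_explicit(__bfloat16 *p1, __bfloat16 *p2) {
    // Still loaded through the integer load intrinsics...
    __m128i temp1 = _mm_loadu_epi32(p1);
    __m128i temp2 = _mm_loadu_epi32(p2);

    __m128 base = _mm_set_ps1(0.0f);

    // ...but explicitly reinterpreted as __m128bh before the dot product.
    return _mm_dpbf16_ps(base, (__m128bh)temp1, (__m128bh)temp2);
  }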

I think that just brings me up to speed with what Craig was saying, right?

So at this point we have these options:

1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
2. Deprecate the `__m[128|256|512]bh` types and replace them with 
`__m[128|256|512]i`
3. Add load/store/insert/extract intrinsics for the `__bfloat16` type

Of these, I'd prefer the third option because both of the first two require 
an overloaded use of the vector-integer type. I already don't like that we use 
the same type for any size integer vector. Using it for BF16 vectors just seems 
wrong.

For the example above, I'd like to write code like this:

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the BF16 vectors
    __m128bh v1 = _mm_load_pbh(p1);
    __m128bh v2 = _mm_load_pbh(p2);

    // Zero-initialize a base value vector
    __m128 base = _mm_set_ps1(0.0f);

    // Perform the dot product
    return _mm_dpbf16_ps(base, v1, v2);
  }

That's more work, but it has the merit of allowing me to use types that match 
what the program is doing.

The fact that you can't pass a __m128i value to a function that is expecting 
__m128bh is a good thing. We shouldn't be making changes that prevent this 
diagnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

> if we define the `__bfloat16` type as the built-in `__bf16` type, then the 
> front end can apply whatever rules it has for that type, including adding 
> whatever ABI handling is needed for BF16 values.

I don't agree. Unlike `__fp16`, `__bf16` is simply an ARM-specific type. I 
don't see why we need to define `__bfloat16` with it: 
https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
Besides, the doc says it "is only available when supported in hardware". 
Since we only have vector instructions, I don't think we support the scalar 
type here.

> If that ends up being the same as the rules for unsigned short, that's no 
> problem. The front end can implement it that way.

Doesn't that mean the front end still generates `i16` in IR for it?

> The point is, by telling the front end that this is a BF16 value, we allow 
> the front end to control the semantics for it.

If you are saying we should support a separate `__bfloat16` type in the front 
end and generate `i16` in IR just for diagnostics, I'm fine with it. The 
problem is that we can't reuse `__bf16`'s representation `BFloat16Ty`. Instead, 
we would have to follow the approach used for fp16, e.g., `HalfTy` for `__fp16` 
(no ABI) and `Float16Ty` for `_Float16` (has ABI). But that doesn't seem worth 
the effort to me.

> Also, you say we can't assume what registers will be used (in the eventual 
> ABI?) but we are assuming exactly that. If the ABI is ever defined 
> differently than what clang and gcc are currently doing, they will both be 
> wrong.

No new IR types, no ABI issues. Declarations of double-underscore types are 
free of ABI constraints. `__fp16` is a good example: https://godbolt.org/z/6qKqGc6Gj

> I don't see why we would treat BF16 values as unsigned short and i16 
> throughout the compiler just to make the backend implementation easier when 
> we already have types available for BF16.

So it is not a question of easy or difficult. It's a wrong direction that we 
can't go in (introducing a new IR type without a clear ABI definition). We made 
that mistake with the `half` type: https://godbolt.org/z/K55s5zqPG We shouldn't 
make it again.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D120395#3358533, @craig.topper wrote:

> In D120395#3358453, @andrew.w.kaylor wrote:
>
>> In D120395#3356355, @pengfei wrote:
>>
>>> Good question! This is actually in the scope of the ABI. Unfortunately, we 
>>> don't have a BF16 ABI at present. We can't assume which physical registers 
>>> arguments are passed and returned in before we have such hardware. For 
>>> example, ARM has a soft-FP ABI that supports FP arithmetic operations but 
>>> passes and returns arguments in integer registers. When we enabled an ISA 
>>> set whose type has no ABI representation, e.g., F16C, we borrowed that 
>>> concept. And as a trade-off, we used an integer type rather than 
>>> introducing a new IR type, since we don't need to support the arithmetic 
>>> operations.
>>
>> I don't see the point of the ARM soft-float comparison, given that X86 
>> doesn't have the strict distinction between integer and floating point 
>> registers that ARM has, at least not for the XMM/YMM/ZMM registers. Consider 
>> the following code:
>>
>>   __m128bh foo(__m128 x) {
>>     return _mm_cvtneps_pbh(x);
>>   }
>>   __m128 bar(__m128bh x) {
>>     return _mm_cvtpbh_ps(x);
>>   }
>>
>> Currently, both clang and gcc will use XMM0 for the argument and return 
>> value in both functions. Is XMM0 an integer register or a floating point 
>> register? There is no such distinction. It's true that the x86_64 psABI does 
>> talk about the general purpose registers as integer registers, and both 
>> clang and gcc will use one of these registers for `__bfloat16` values, but 
>> that's an implementation detail (and a dubious one, considering that nearly 
>> anything useful that you can do with a `__bfloat16` will require moving it 
>> into an SSE register).
>>
>> Also, you say we can't assume what registers will be used (in the eventual 
>> ABI?) but we are assuming exactly that. If the ABI is ever defined 
>> differently than what clang and gcc are currently doing, they will both be 
>> wrong.
>>
>> But all of this only applies to the backend code generation. It has very 
>> little to do with the intrinsic definition in the header file or the IR 
>> generated by the front end. If we continue to define `__bfloat16` as an 
>> `unsigned short` in the header file, the front end will treat it as an 
>> `unsigned short` and it will use its rules for `unsigned short` to generate 
>> IR. If the ABI is ever defined to treat BF16 differently than `unsigned 
>> short`, the front end won't be able to do anything about that because we've 
>> told the front end that the value is an unsigned short.
>>
>> On the other hand, if we define the `__bfloat16` type as the built-in 
>> `__bf16` type, then the front end can apply whatever rules it has for that 
>> type, including adding whatever ABI handling is needed for BF16 values. If 
>> that ends up being the same as the rules for `unsigned short`, that's no 
>> problem. The front end can implement it that way. If it ends up being 
>> something different, the front end can apply rules for whatever the 
>> alternative is. The point is, by telling the front end that this is a BF16 
>> value, we allow the front end to control the semantics for it. This would 
>> then result in the front end generating IR using `bfloat` as the type for 
>> BF16 values. Again, this is a correct and accurate description of the value. 
>> It allows the optimizer to reason about it correctly in any way it needs to.
>>
>> I don't see why we would treat BF16 values as `unsigned short` and `i16` 
>> throughout the compiler just to make the backend implementation easier when 
>> we already have types available for BF16.
>
> __m256bh should not have been a new type. It should have been an alias of 
> __m256i. We don't have load/store intrinsics for __m256bh so if you can even 
> get the __m256bh type in and out of memory using load/store intrinsics, it is 
> only because we allow lax vector conversion by default. 
> -fno-lax-vector-conversions will probably break any code trying to load/store 
> it using a load/store intrinsic. If __m256bh was made a struct as at one 
> point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, 
> store, and cast intrinsics for it. We would probably want insert/extract 
> element intrinsics as well.

-fno-lax-vector-conversions does indeed break the load/store intrinsics 
https://godbolt.org/z/b5WPj84Pa
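
The breakage is presumably of this shape (a reconstruction of the kind of code 
behind the link, not its actual contents):

  #include <immintrin.h>

  __m256bh load_bf16(const void *p) {
    // _mm256_loadu_si256 returns __m256i; without lax vector conversions the
    // implicit __m256i -> __m256bh conversion in the return is rejected.
    return _mm256_loadu_si256((const __m256i *)p);
  }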


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D120395#3358453, @andrew.w.kaylor wrote:

> In D120395#3356355, @pengfei wrote:
>
>> Good question! This is actually in the scope of the ABI. Unfortunately, we 
>> don't have a BF16 ABI at present. We can't assume which physical registers 
>> arguments are passed and returned in before we have such hardware. For 
>> example, ARM has a soft-FP ABI that supports FP arithmetic operations but 
>> passes and returns arguments in integer registers. When we enabled an ISA 
>> set whose type has no ABI representation, e.g., F16C, we borrowed that 
>> concept. And as a trade-off, we used an integer type rather than introducing 
>> a new IR type, since we don't need to support the arithmetic operations.
>
> I don't see the point of the ARM soft-float comparison, given that X86 
> doesn't have the strict distinction between integer and floating point 
> registers that ARM has, at least not for the XMM/YMM/ZMM registers. Consider 
> the following code:
>
>   __m128bh foo(__m128 x) {
>     return _mm_cvtneps_pbh(x);
>   }
>   __m128 bar(__m128bh x) {
>     return _mm_cvtpbh_ps(x);
>   }
>
> Currently, both clang and gcc will use XMM0 for the argument and return value 
> in both functions. Is XMM0 an integer register or a floating point register? 
> There is no such distinction. It's true that the x86_64 psABI does talk about 
> the general purpose registers as integer registers, and both clang and gcc 
> will use one of these registers for `__bfloat16` values, but that's an 
> implementation detail (and a dubious one, considering that nearly anything 
> useful that you can do with a `__bfloat16` will require moving it into an SSE 
> register).
>
> Also, you say we can't assume what registers will be used (in the eventual 
> ABI?) but we are assuming exactly that. If the ABI is ever defined 
> differently than what clang and gcc are currently doing, they will both be 
> wrong.
>
> But all of this only applies to the backend code generation. It has very 
> little to do with the intrinsic definition in the header file or the IR 
> generated by the front end. If we continue to define `__bfloat16` as an 
> `unsigned short` in the header file, the front end will treat it as an 
> `unsigned short` and it will use its rules for `unsigned short` to generate 
> IR. If the ABI is ever defined to treat BF16 differently than `unsigned 
> short`, the front end won't be able to do anything about that because we've 
> told the front end that the value is an unsigned short.
>
> On the other hand, if we define the `__bfloat16` type as the built-in 
> `__bf16` type, then the front end can apply whatever rules it has for that 
> type, including adding whatever ABI handling is needed for BF16 values. If 
> that ends up being the same as the rules for `unsigned short`, that's no 
> problem. The front end can implement it that way. If it ends up being 
> something different, the front end can apply rules for whatever the 
> alternative is. The point is, by telling the front end that this is a BF16 
> value, we allow the front end to control the semantics for it. This would 
> then result in the front end generating IR using `bfloat` as the type for 
> BF16 values. Again, this is a correct and accurate description of the value. 
> It allows the optimizer to reason about it correctly in any way it needs to.
>
> I don't see why we would treat BF16 values as `unsigned short` and `i16` 
> throughout the compiler just to make the backend implementation easier when 
> we already have types available for BF16.

__m256bh should not have been a new type. It should have been an alias of 
__m256i. We don't have load/store intrinsics for __m256bh so if you can even 
get the __m256bh type in and out of memory using load/store intrinsics, it is 
only because we allow lax vector conversion by default. 
-fno-lax-vector-conversions will probably break any code trying to load/store 
it using a load/store intrinsic. If __m256bh was made a struct as at one point 
proposed, this would have been broken.

If we want __m256bh to be a unique type using __bf16, we must define load, 
store, and cast intrinsics for it. We would probably want insert/extract 
element intrinsics as well.
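
A sketch of the surface area that would imply; every name below is a 
hypothetical placeholder chosen by analogy with the existing ps/epi16 
intrinsics, and none of them exist in the headers:

  #include <immintrin.h>

  // Hypothetical declarations only: the load/store and element-access surface
  // that a first-class __m256bh type would need.
  __m256bh   _mm256_loadu_pbh(void const *__p);
  void       _mm256_storeu_pbh(void *__p, __m256bh __a);
  __m256bh   _mm256_insert_pbh(__m256bh __a, __bfloat16 __b, int __index);
  __bfloat16 _mm256_extract_pbh(__m256bh __a, int __index);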


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3356355, @pengfei wrote:

> Good question! This is actually in the scope of the ABI. Unfortunately, we 
> don't have a BF16 ABI at present. We can't assume which physical registers 
> arguments are passed and returned in before we have such hardware. For 
> example, ARM has a soft-FP ABI that supports FP arithmetic operations but 
> passes and returns arguments in integer registers. When we enabled an ISA set 
> whose type has no ABI representation, e.g., F16C, we borrowed that concept. 
> And as a trade-off, we used an integer type rather than introducing a new IR 
> type, since we don't need to support the arithmetic operations.

I don't see the point of the ARM soft-float comparison, given that X86 doesn't 
have the strict distinction between integer and floating point registers that 
ARM has, at least not for the XMM/YMM/ZMM registers. Consider the following 
code:

  __m128bh foo(__m128 x) {
    return _mm_cvtneps_pbh(x);
  }
  __m128 bar(__m128bh x) {
    return _mm_cvtpbh_ps(x);
  }

Currently, both clang and gcc will use XMM0 for the argument and return value 
in both functions. Is XMM0 an integer register or a floating point register? 
There is no such distinction. It's true that the x86_64 psABI does talk about 
the general purpose registers as integer registers, and both clang and gcc will 
use one of these registers for `__bfloat16` values, but that's an 
implementation detail (and a dubious one, considering that nearly anything 
useful that you can do with a `__bfloat16` will require moving it into an SSE 
register).

Also, you say we can't assume what registers will be used (in the eventual 
ABI?) but we are assuming exactly that. If the ABI is ever defined differently 
than what clang and gcc are currently doing, they will both be wrong.

But all of this only applies to the backend code generation. It has very little 
to do with the intrinsic definition in the header file or the IR generated by 
the front end. If we continue to define `__bfloat16` as an `unsigned short` in 
the header file, the front end will treat it as an `unsigned short` and it will 
use its rules for `unsigned short` to generate IR. If the ABI is ever defined 
to treat BF16 differently than `unsigned short`, the front end won't be able to 
do anything about that because we've told the front end that the value is an 
unsigned short.

On the other hand, if we define the `__bfloat16` type as the built-in `__bf16` 
type, then the front end can apply whatever rules it has for that type, 
including adding whatever ABI handling is needed for BF16 values. If that ends 
up being the same as the rules for `unsigned short`, that's no problem. The 
front end can implement it that way. If it ends up being something different, 
the front end can apply rules for whatever the alternative is. The point is, by 
telling the front end that this is a BF16 value, we allow the front end to 
control the semantics for it. This would then result in the front end 
generating IR using `bfloat` as the type for BF16 values. Again, this is a 
correct and accurate description of the value. It allows the optimizer to 
reason about it correctly in any way it needs to.

I don't see why we would treat BF16 values as `unsigned short` and `i16` 
throughout the compiler just to make the backend implementation easier when we 
already have types available for BF16.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Thanks @andrew.w.kaylor! You are totally correct about the intention and the 
current implementations.

> Concretely, what are the semantics that we want for the BF16 types and 
> intrinsics? Unlike the other floating-point types, there's no standard to 
> guide this, so it's even more important to clearly specify how these types 
> are to be used, instead of having an ad-hoc semantics of whatever someone 
> happens to implement.

Good question! This is actually in the scope of the ABI. Unfortunately, we 
don't have a BF16 ABI at present. We can't assume which physical registers 
arguments are passed and returned in before we have such hardware. For example, 
ARM has a soft-FP ABI that supports FP arithmetic operations but passes and 
returns arguments in integer registers. When we enabled an ISA set whose type 
has no ABI representation, e.g., F16C, we borrowed that concept. And as a 
trade-off, we used an integer type rather than introducing a new IR type, since 
we don't need to support the arithmetic operations.
This patch, as well as D120411, is trying to follow what we are doing for F16C. 
The difference is that we are supporting the scalar type too. That's why I put 
them into two patches.
Back to the question:

1. There is no BF16 type, and no semantics for it, before the ABI is ready on X86.
2. All intrinsics take BF16 as an integer type (see the sketch below).
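
A paraphrased sketch of what that means with the current headers; the typedef 
and prototypes below are a rendering of what this thread describes, not 
verbatim header text:

  // The scalar "type" is just storage for the bf16 bit pattern...
  typedef unsigned short __bfloat16;

  // ...and the scalar intrinsics traffic in that integer-typed storage.
  __bfloat16 _mm_cvtness_sbh(float __A);    // float -> bf16 bits
  float      _mm_cvtsbh_ss(__bfloat16 __A); // bf16 bits -> float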


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.
Herald added a project: All.

In D120395#3346591, @scanon wrote:

> There's a lot of churn around proposed "solutions" on this and related PR, 
> but not a very clear analysis of what the problem we're trying to solve is.

I thought the problem that the patch was originally trying to solve is that 
`__bfloat16` variables could be used in arithmetic operations (using the 
`__bfloat16` type defined in the avx512bf16intrin.h header). For example, clang 
currently compiles this code without any diagnostics if the target processor 
has the required features (avx512bf16 and avx512vl).

  #include <immintrin.h>
  float f(float x, float y) {
    __bfloat16 x16 = _mm_cvtness_sbh(x);
    __bfloat16 y16 = _mm_cvtness_sbh(y);
    __bfloat16 z16 = x16 + y16;
    return _mm_cvtsbh_ss(z16);
  }

https://godbolt.org/z/vcbcGsPPx

The problem is that the instructions generated for that code are completely 
wrong because `__bfloat16` is defined as `unsigned short`. It relies on the user 
knowing that they shouldn't use this type in arithmetic operations.
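
For contrast, one way to express what that example presumably meant, using only 
the conversions the header actually provides (a rewrite for illustration, not 
code from the review):

  #include <immintrin.h>

  float f_intended(float x, float y) {
    // Round each input to bf16 precision, add as float (there is no scalar
    // bf16 add), then round the sum back to bf16 before widening it again.
    float x16 = _mm_cvtsbh_ss(_mm_cvtness_sbh(x));
    float y16 = _mm_cvtsbh_ss(_mm_cvtness_sbh(y));
    return _mm_cvtsbh_ss(_mm_cvtness_sbh(x16 + y16));
  }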

Like I said, I //thought// that was the original intention of this patch. 
However, the latest version of the patch doesn't prevent this at all. In fact, 
it makes the problem worse by asking the user to define the BF16 variables as 
unsigned short in their code. Getting correct behavior from that point on is 
left entirely to the user.

@pengfei Please correct me if I misunderstood the purpose of this patch.

In D120395#3346591, @scanon wrote:

> Concretely, what are the semantics that we want for the BF16 types and 
> intrinsics? Unlike the other floating-point types, there's no standard to 
> guide this, so it's even more important to clearly specify how these types 
> are to be used, instead of having an ad-hoc semantics of whatever someone 
> happens to implement.

The binary representation of a BF16 value (such as the value returned by 
_mm_cvtness_sbh) is, as Phoebe mentioned, the "brain floating point type" as 
described here: https://en.wikichip.org/wiki/brain_floating-point_format
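
In short: 1 sign bit, 8 exponent bits, and 7 fraction bits, i.e. the upper half 
of an IEEE-754 binary32. A truncation-only sketch of that correspondence 
(ignoring the round-to-nearest-even behavior the hardware conversions use):

  #include <stdint.h>
  #include <string.h>

  // bf16 is the high 16 bits of a float; truncation drops the low 16 fraction bits.
  static uint16_t bf16_bits_from_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
  }

  static float float_from_bf16_bits(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
  }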

Unfortunately, what you can do with it seems to depend on the target 
architecture. For very recent x86 processors, you can convert vectors of this 
type to and from single precision floating point and you can do a SIMD dot 
product and accumulate operation (VDPBF16PS), but the only way to do this is 
with intrinsics. Some ARM processors support other operations, but I think with 
similar restrictions (i.e. only accessible through intrinsics). Apart from 
intrinsics, it is treated as a storage-only type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-02-25 Thread Steve Canon via Phabricator via cfe-commits
scanon added a comment.

There's a lot of churn around proposed "solutions" on this and related PR, but 
not a very clear analysis of what the problem we're trying to solve is.

Concretely, what are the semantics that we want for the BF16 types and 
intrinsics? Unlike the other floating-point types, there's no standard to guide 
this, so it's even more important to clearly specify how these types are to be 
used, instead of having an ad-hoc semantics of whatever someone happens to 
implement.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-02-25 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Headers/avx512bf16intrin.h:37
 ///and fraction field is extended to 23 bits.
-static __inline__ float __DEFAULT_FN_ATTRS _mm_cvtsbh_ss(__bfloat16 __A) {
+static __inline__ float __DEFAULT_FN_ATTRS _mm_cvtsbh_ss(unsigned short __A) {
   return __builtin_ia32_cvtsbf162ss_32(__A);

Are we trying to make our intrinsics weakly typed? I don't like this change at 
all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-02-25 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3344890, @pengfei wrote:

> Discussed with GCC folks. We think it's better to use the same approach as 
> D120411, that is, replacing it with short int.

Which GCC folks did you discuss it with? I ask because GCC currently considers 
__m128bh and __m128i to be incompatible types, which is what I would expect: 
https://godbolt.org/z/z9nefbrEc
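
Presumably a test of roughly this shape (a reconstruction, not the contents of 
the link):

  #include <immintrin.h>

  __m128 consume(__m128bh v) { return _mm_cvtpbh_ps(v); }

  __m128 f(__m128i bits) {
    // GCC rejects this call because __m128i and __m128bh are distinct vector
    // types; clang's default lax vector conversions let it through.
    return consume(bits);
  }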


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395
