[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

thank you @craig.topper  and @pengfei .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D155145#4556178 , @craig.topper 
wrote:

> In D155145#4556157 , @anna wrote:
>
>> In D155145#4554786 , @anna wrote:
>>
 Can you capture the values of EAX, EBX, ECX, and EDX after the two calls 
 to getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's 
 a bug in CPUID on Sandy Bridge.
>>>
>>> Sure, on the original code before the patch you suggested right?
>>> The two calls are:
>>>
>>>bool HasLeaf7 =
>>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, , , 
>>> , );
>>>   +   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
>>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>>> << EDX
>>>   + << "\n";
>>> 
>>>   Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
>>>   
>>>   bool HasLeaf7Subleaf1 =
>>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, , , 
>>> , );
>>>   +   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
>>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>>> << EDX
>>>   + << "\n";
>>>   Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
>>>   ...
>>>   we set avxvnniint16 after this
>>>
>>> Takes a while to get a build on this machine, should have the output soon.
>>
>> @craig.topper here is the output:
>>
>>   Before setting fsgsbase the value for EAX: 0 EBX: 0 ECX: 0  EDX: 
>> 2617246720 // this is after the HasLeaf7 calculation
>>   Before setting sha512 the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
>> // this is after the HasLeaf7Subleaf1 calculation
>>
>> So, with your patch `HasLeaf7Subleaf1` is 0 as EAX is 0. Pls let me know if 
>> you need  any additional diagnostics output (we actually lose access to the 
>> machine on friday, since it is being retired!).
>>
>>> The documentation says that invalid subleaves of leaf 7 should return all 
>>> 0s. So we thought it was safe to check the bits of sub leaf 1 even if eax 
>>> from subleaf 0 doesn't say subleaf 1 is supported.
>>
>> This means the CPUID doesn't satisfy the documentation since EDX != 0 for 
>> SubLeaf1?
>
> Interestingly all of the bits set in EDX are features that were things that 
> were added in microcode patches in the wake of vulnerabilities like Spectre 
> and Meltdown. Maybe the microcode patch forgot to check the subleaf since 
> there was no subleaf implemented when sandy bridge was originally made.
>
> I think my patch is the correct fix given that information. I'll post a patch 
> for review shortly.

Thanks Craig! That makes sense to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D155145#4556157 , @anna wrote:

> In D155145#4554786 , @anna wrote:
>
>>> Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to 
>>> getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a 
>>> bug in CPUID on Sandy Bridge.
>>
>> Sure, on the original code before the patch you suggested right?
>> The two calls are:
>>
>>bool HasLeaf7 =
>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, , , , 
>> );
>>   +   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>> << EDX
>>   + << "\n";
>> 
>>   Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
>>   
>>   bool HasLeaf7Subleaf1 =
>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, , , , 
>> );
>>   +   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>> << EDX
>>   + << "\n";
>>   Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
>>   ...
>>   we set avxvnniint16 after this
>>
>> Takes a while to get a build on this machine, should have the output soon.
>
> @craig.topper here is the output:
>
>   Before setting fsgsbase the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
> // this is after the HasLeaf7 calculation
>   Before setting sha512 the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
> // this is after the HasLeaf7Subleaf1 calculation
>
> So, with your patch `HasLeaf7Subleaf1` is 0 as EAX is 0. Pls let me know if 
> you need  any additional diagnostics output (we actually lose access to the 
> machine on friday, since it is being retired!).
>
>> The documentation says that invalid subleaves of leaf 7 should return all 
>> 0s. So we thought it was safe to check the bits of sub leaf 1 even if eax 
>> from subleaf 0 doesn't say subleaf 1 is supported.
>
> This means the CPUID doesn't satisfy the documentation since EDX != 0 for 
> SubLeaf1?

Interestingly all of the bits set in EDX are features that were things that 
were added in microcode patches in the wake of vulnerabilities like Spectre and 
Meltdown. Maybe the microcode patch forgot to check the subleaf since there was 
no subleaf implemented when sandy bridge was originally made.

I think my patch is the correct fix given that information. I'll post a patch 
for review shortly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D155145#4556157 , @anna wrote:

> In D155145#4554786 , @anna wrote:
>
>>> Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to 
>>> getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a 
>>> bug in CPUID on Sandy Bridge.
>>
>> Sure, on the original code before the patch you suggested right?
>> The two calls are:
>>
>>bool HasLeaf7 =
>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, , , , 
>> );
>>   +   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>> << EDX
>>   + << "\n";
>> 
>>   Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
>>   
>>   bool HasLeaf7Subleaf1 =
>>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, , , , 
>> );
>>   +   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
>>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " 
>> << EDX
>>   + << "\n";
>>   Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
>>   ...
>>   we set avxvnniint16 after this
>>
>> Takes a while to get a build on this machine, should have the output soon.
>
> @craig.topper here is the output:
>
>   Before setting fsgsbase the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
> // this is after the HasLeaf7 calculation
>   Before setting sha512 the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
> // this is after the HasLeaf7Subleaf1 calculation
>
> So, with your patch `HasLeaf7Subleaf1` is 0 as EAX is 0. Pls let me know if 
> you need  any additional diagnostics output (we actually lose access to the 
> machine on friday, since it is being retired!).
>
>> The documentation says that invalid subleaves of leaf 7 should return all 
>> 0s. So we thought it was safe to check the bits of sub leaf 1 even if eax 
>> from subleaf 0 doesn't say subleaf 1 is supported.
>
> This means the CPUID doesn't satisfy the documentation since EDX != 0 for 
> SubLeaf1?

The identical EDX value looks dubious to me. Could you compile and run above 
code and paste the result here? Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

In D155145#4554786 , @anna wrote:

>> Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to 
>> getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a 
>> bug in CPUID on Sandy Bridge.
>
> Sure, on the original code before the patch you suggested right?
> The two calls are:
>
>bool HasLeaf7 =
>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, , , , 
> );
>   +   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << 
> EDX
>   + << "\n";
> 
>   Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
>   
>   bool HasLeaf7Subleaf1 =
>   MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, , , , 
> );
>   +   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
>   + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << 
> EDX
>   + << "\n";
>   Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
>   ...
>   we set avxvnniint16 after this
>
> Takes a while to get a build on this machine, should have the output soon.

@ctopper here is the output:

  Before setting fsgsbase the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 
// this is after the HasLeaf7 calculation
  Before setting sha512 the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 // 
this is after the HasLeaf7Subleaf1 calculation

So, with your patch `HasLeaf7Subleaf1` is 0 as EAX is 0. Pls let me know if you 
need  any additional diagnostics output (we actually lose access to the machine 
on friday, since it is being retired!).

> The documentation says that invalid subleaves of leaf 7 should return all 0s. 
> So we thought it was safe to check the bits of sub leaf 1 even if eax from 
> subleaf 0 doesn't say subleaf 1 is supported.

This means the CPUID doesn't satisfy the documentation since EDX != 0 for 
SubLeaf1?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

Thanks @anna and @craig.topper 
I think we can dump the value with the simple code

  $ cat cpuid.c
  #include 
  #include 
  
  int main() {
unsigned int info[4];
for (int i = 0; i < 2; ++i) {
  __get_cpuid_count(7, 1, info, info + 1, info + 2, info + 3);
  printf("%08x\n", info[0]);
  printf("%08x\n", info[1]);
  printf("%08x\n", info[2]);
  printf("%08x\n", info[3]);
}
  }
  
  $ clang cpuid.c && ./a.out


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

> Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to 
> getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a bug 
> in CPUID on Sandy Bridge.

Sure, on the original code before the patch you suggested right?
The two calls are:

   bool HasLeaf7 =
  MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, , , , 
);
  +   llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
  + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << 
EDX
  + << "\n";

  Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
  
  bool HasLeaf7Subleaf1 =
  MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, , , , 
);
  +   llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
  + << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << 
EDX
  + << "\n";
  Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
  ...
  we set avxvnniint16 after this

Takes a while to get a build on this machine, should have the output soon.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D155145#4553922 , @anna wrote:

> In D155145#4551621 , @craig.topper 
> wrote:
>
>> In D155145#4551526 , @anna wrote:
>>
>>> In D155145#4544068 , @pengfei 
>>> wrote:
>>>
 In D155145#4543326 , @anna wrote:

> We see a crash bisected to this patch about using an illegal instruction. 
> Here's the CPUInfo for the machine:
>
>   CPU info:
>   current cpu id: 22
>   total 32(physical cores 16) (assigned logical cores 32) (assigned 
> physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads 
> per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, 
> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, 
> aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
>   AvgLoads: 0.30, 0.10, 0.18
>   CPU Model and flags from /proc/cpuinfo:
>   model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
> nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
> ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic 
> popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb 
> stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln 
> pts md_clear flush_l1d
>   Online cpus: 0-31
>   Offline cpus:
>   BIOS frequency limitation: 
>   Frequency switch latency (ns): 2
>   Available cpu frequencies: 
>   Current governor: schedutil
>   Core performance/turbo boost: 
>
> I don't see `avxvnniint16` in the flags list nor avx2. So, this 
> (relatively new) instruction shouldn't be generated for this machine. Any 
> ideas on why this might be happening?

 As far as I can see from the patch, the only way to generate avxvnniint16 
 instructions is to call its specific intrinsics explicitly. And we will 
 check compiling options in FE before allowing to call the intrinsics. We 
 do have an optimization to generate vnni instructions without intrinsics, 
 but we haven't extend it to avxvnniint16 so far.
 So I don't know what's wrong in your case, could you provide a reproducer 
 for your problem?
>>>
>>> I've investigated what is going on. With this patch, we are now passing in 
>>> `+avxvnniint16` into machine attributes. With that attribute, we now 
>>> generate an instruction which is illegal on sandybridge machine:
>>>
>>>0x3013f2af:  jmpq   0x3013f09b
>>>  0x3013f2b4:mov%rax,%rdi
>>>  0x3013f2b7:and$0xfff0,%rdi
>>>   => 0x3013f2bb:vpbroadcastd %xmm0,%ymm2
>>>  0x3013f2c0:vpbroadcastd %xmm1,%ymm3
>>>
>>> The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: 
>>> https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has 
>>> only AVX flag.
>>>
>>> This is the complete mattr generated:
>>>
>>>   !3 = 
>>> !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}
>>>
>>> I've confirmed if we changed to `-avxvnniint16` we do not generate 
>>> `vpbroadcastd`.
>>>
>>> W.r.t. how we get the machine attributes generated through our front-end:
>>>
>>>   if (!sys::getHostCPUFeatures(Features))
>>> return std::move(mattr);
>>> 
>>>   // Fill mattr with default values.
>>>   mattr.reserve(Features.getNumItems());
>>>   for (auto  : Features) {
>>> std::string attr(I.first());
>>> mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
>>>   }
>>>
>>> So, the problem is in getHostCPUFeatures, possibly this line from the patch 
>>> : 
>>> `Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && 
>>> HasAVXSave;`.
>>
>> Does this patch help
>>
>>   diff --git 

[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-02 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

In D155145#4551621 , @craig.topper 
wrote:

> In D155145#4551526 , @anna wrote:
>
>> In D155145#4544068 , @pengfei 
>> wrote:
>>
>>> In D155145#4543326 , @anna wrote:
>>>
 We see a crash bisected to this patch about using an illegal instruction. 
 Here's the CPUInfo for the machine:

   CPU info:
   current cpu id: 22
   total 32(physical cores 16) (assigned logical cores 32) (assigned 
 physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads 
 per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, 
 mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, 
 clmul, ht, tsc, tscinvbit, tscinv, clflush
   AvgLoads: 0.30, 0.10, 0.18
   CPU Model and flags from /proc/cpuinfo:
   model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall 
 nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
 xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl 
 vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
 tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp 
 tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts 
 md_clear flush_l1d
   Online cpus: 0-31
   Offline cpus:
   BIOS frequency limitation: 
   Frequency switch latency (ns): 2
   Available cpu frequencies: 
   Current governor: schedutil
   Core performance/turbo boost: 

 I don't see `avxvnniint16` in the flags list nor avx2. So, this 
 (relatively new) instruction shouldn't be generated for this machine. Any 
 ideas on why this might be happening?
>>>
>>> As far as I can see from the patch, the only way to generate avxvnniint16 
>>> instructions is to call its specific intrinsics explicitly. And we will 
>>> check compiling options in FE before allowing to call the intrinsics. We do 
>>> have an optimization to generate vnni instructions without intrinsics, but 
>>> we haven't extend it to avxvnniint16 so far.
>>> So I don't know what's wrong in your case, could you provide a reproducer 
>>> for your problem?
>>
>> I've investigated what is going on. With this patch, we are now passing in 
>> `+avxvnniint16` into machine attributes. With that attribute, we now 
>> generate an instruction which is illegal on sandybridge machine:
>>
>>0x3013f2af:   jmpq   0x3013f09b
>>  0x3013f2b4: mov%rax,%rdi
>>  0x3013f2b7: and$0xfff0,%rdi
>>   => 0x3013f2bb: vpbroadcastd %xmm0,%ymm2
>>  0x3013f2c0: vpbroadcastd %xmm1,%ymm3
>>
>> The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: 
>> https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only 
>> AVX flag.
>>
>> This is the complete mattr generated:
>>
>>   !3 = 
>> !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}
>>
>> I've confirmed if we changed to `-avxvnniint16` we do not generate 
>> `vpbroadcastd`.
>>
>> W.r.t. how we get the machine attributes generated through our front-end:
>>
>>   if (!sys::getHostCPUFeatures(Features))
>> return std::move(mattr);
>> 
>>   // Fill mattr with default values.
>>   mattr.reserve(Features.getNumItems());
>>   for (auto  : Features) {
>> std::string attr(I.first());
>> mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
>>   }
>>
>> So, the problem is in getHostCPUFeatures, possibly this line from the patch 
>> : 
>> `Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && 
>> HasAVXSave;`.
>
> Does this patch help
>
>   diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
>   index 1141df09307c..11a6879fb76a 100644
>   --- a/llvm/lib/TargetParser/Host.cpp
>   +++ 

[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-01 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D155145#4551526 , @anna wrote:

> In D155145#4544068 , @pengfei wrote:
>
>> In D155145#4543326 , @anna wrote:
>>
>>> We see a crash bisected to this patch about using an illegal instruction. 
>>> Here's the CPUInfo for the machine:
>>>
>>>   CPU info:
>>>   current cpu id: 22
>>>   total 32(physical cores 16) (assigned logical cores 32) (assigned 
>>> physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads 
>>> per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, 
>>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, 
>>> clmul, ht, tsc, tscinvbit, tscinv, clflush
>>>   AvgLoads: 0.30, 0.10, 0.18
>>>   CPU Model and flags from /proc/cpuinfo:
>>>   model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>>   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall 
>>> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
>>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl 
>>> vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
>>> tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp 
>>> tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts 
>>> md_clear flush_l1d
>>>   Online cpus: 0-31
>>>   Offline cpus:
>>>   BIOS frequency limitation: 
>>>   Frequency switch latency (ns): 2
>>>   Available cpu frequencies: 
>>>   Current governor: schedutil
>>>   Core performance/turbo boost: 
>>>
>>> I don't see `avxvnniint16` in the flags list nor avx2. So, this (relatively 
>>> new) instruction shouldn't be generated for this machine. Any ideas on why 
>>> this might be happening?
>>
>> As far as I can see from the patch, the only way to generate avxvnniint16 
>> instructions is to call its specific intrinsics explicitly. And we will 
>> check compiling options in FE before allowing to call the intrinsics. We do 
>> have an optimization to generate vnni instructions without intrinsics, but 
>> we haven't extend it to avxvnniint16 so far.
>> So I don't know what's wrong in your case, could you provide a reproducer 
>> for your problem?
>
> I've investigated what is going on. With this patch, we are now passing in 
> `+avxvnniint16` into machine attributes. With that attribute, we now generate 
> an instruction which is illegal on sandybridge machine:
>
>0x3013f2af:jmpq   0x3013f09b
>  0x3013f2b4:  mov%rax,%rdi
>  0x3013f2b7:  and$0xfff0,%rdi
>   => 0x3013f2bb:  vpbroadcastd %xmm0,%ymm2
>  0x3013f2c0:  vpbroadcastd %xmm1,%ymm3
>
> The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: 
> https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only 
> AVX flag.
>
> This is the complete mattr generated:
>
>   !3 = 
> !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}
>
> I've confirmed if we changed to `-avxvnniint16` we do not generate 
> `vpbroadcastd`.
>
> W.r.t. how we get the machine attributes generated through our front-end:
>
>   if (!sys::getHostCPUFeatures(Features))
> return std::move(mattr);
> 
>   // Fill mattr with default values.
>   mattr.reserve(Features.getNumItems());
>   for (auto  : Features) {
> std::string attr(I.first());
> mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
>   }
>
> So, the problem is in getHostCPUFeatures, possibly this line from the patch : 
> `Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && 
> HasAVXSave;`.

Does this patch help

  diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
  index 1141df09307c..11a6879fb76a 100644
  --- a/llvm/lib/TargetParser/Host.cpp
  +++ b/llvm/lib/TargetParser/Host.cpp
  @@ -1769,7 +1769,7 @@ bool sys::getHostCPUFeatures(StringMap ) 
{
 Features["amx-tile"]   = HasLeaf7 && ((EDX >> 24) & 1) && HasAMXSave;
 Features["amx-int8"]   = 

[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-08-01 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

In D155145#4544068 , @pengfei wrote:

> In D155145#4543326 , @anna wrote:
>
>> We see a crash bisected to this patch about using an illegal instruction. 
>> Here's the CPUInfo for the machine:
>>
>>   CPU info:
>>   current cpu id: 22
>>   total 32(physical cores 16) (assigned logical cores 32) (assigned physical 
>> cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) 
>> family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, 
>> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, 
>> tsc, tscinvbit, tscinv, clflush
>>   AvgLoads: 0.30, 0.10, 0.18
>>   CPU Model and flags from /proc/cpuinfo:
>>   model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
>> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
>> tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
>> tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp 
>> tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts 
>> md_clear flush_l1d
>>   Online cpus: 0-31
>>   Offline cpus:
>>   BIOS frequency limitation: 
>>   Frequency switch latency (ns): 2
>>   Available cpu frequencies: 
>>   Current governor: schedutil
>>   Core performance/turbo boost: 
>>
>> I don't see `avxvnniint16` in the flags list nor avx2. So, this (relatively 
>> new) instruction shouldn't be generated for this machine. Any ideas on why 
>> this might be happening?
>
> As far as I can see from the patch, the only way to generate avxvnniint16 
> instructions is to call its specific intrinsics explicitly. And we will check 
> compiling options in FE before allowing to call the intrinsics. We do have an 
> optimization to generate vnni instructions without intrinsics, but we haven't 
> extend it to avxvnniint16 so far.
> So I don't know what's wrong in your case, could you provide a reproducer for 
> your problem?

I've investigated what is going on. With this patch, we are now passing in 
`+avxvnniint16` into machine attributes. With that attribute, we now generate 
an instruction which is illegal on sandybridge machine:

   0x3013f2af:  jmpq   0x3013f09b
 0x3013f2b4:mov%rax,%rdi
 0x3013f2b7:and$0xfff0,%rdi
  => 0x3013f2bb:vpbroadcastd %xmm0,%ymm2
 0x3013f2c0:vpbroadcastd %xmm1,%ymm3

The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: 
https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only 
AVX flag.

This is the complete mattr generated:

  !3 = 
!{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}

I've confirmed if we changed to `-avxvnniint16` we do not generate 
`vpbroadcastd`.

W.r.t. how we get the machine attributes generated through our front-end:

  if (!sys::getHostCPUFeatures(Features))
return std::move(mattr);

  // Fill mattr with default values.
  mattr.reserve(Features.getNumItems());
  for (auto  : Features) {
std::string attr(I.first());
mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
  }

So, the problem is in getHostCPUFeatures, possibly this line from the patch : 
`Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && 
HasAVXSave;`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-07-28 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei added a comment.

In D155145#4543326 , @anna wrote:

> We see a crash bisected to this patch about using an illegal instruction. 
> Here's the CPUInfo for the machine:
>
>   CPU info:
>   current cpu id: 22
>   total 32(physical cores 16) (assigned logical cores 32) (assigned physical 
> cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) 
> family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, 
> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, 
> tsc, tscinvbit, tscinv, clflush
>   AvgLoads: 0.30, 0.10, 0.18
>   CPU Model and flags from /proc/cpuinfo:
>   model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
> tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
> tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp 
> tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts 
> md_clear flush_l1d
>   Online cpus: 0-31
>   Offline cpus:
>   BIOS frequency limitation: 
>   Frequency switch latency (ns): 2
>   Available cpu frequencies: 
>   Current governor: schedutil
>   Core performance/turbo boost: 
>
> I don't see `avxvnniint16` in the flags list nor avx2. So, this (relatively 
> new) instruction shouldn't be generated for this machine. Any ideas on why 
> this might be happening?

As far as I can see from the patch, the only way to generate avxvnniint16 
instructions is to call its specific intrinsics explicitly. And we will check 
compiling options in FE before allowing to call the instructions. We do have an 
optimization to generate vnni instructions without intrinsics, but we haven't 
extend it to avxvnniint16 so far.
So I don't know what's wrong in your case, could you provide a reproducer for 
your problem?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-07-28 Thread Anna Thomas via Phabricator via cfe-commits
anna added a comment.

We see a crash bisected to this patch about using an illegal instruction. 
Here's the CPUInfo for the machine:

  CPU info:
  current cpu id: 22
  total 32(physical cores 16) (assigned logical cores 32) (assigned physical 
cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) 
family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, sse2, 
sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, tsc, 
tscinvbit, tscinv, clflush
  AvgLoads: 0.30, 0.10, 0.18
  CPU Model and flags from /proc/cpuinfo:
  model name  : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp 
tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear 
flush_l1d
  Online cpus: 0-31
  Offline cpus:
  BIOS frequency limitation: 
  Frequency switch latency (ns): 2
  Available cpu frequencies: 
  Current governor: schedutil
  Core performance/turbo boost: 

I don't see `avxvnniint16` in the flags list nor avx2. So, this (relatively 
new) instruction shouldn't be generated for this machine. Any ideas on why this 
might be happening?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-07-19 Thread Phoebe Wang via Phabricator via cfe-commits
pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-07-19 Thread Freddy, Ye via Phabricator via cfe-commits
FreddyYe added a comment.

ping... Anyone help accept?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155145: [X86] Add AVX-VNNI-INT16 instructions.

2023-07-17 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: clang/lib/Headers/avxvnniint16intrin.h:26
+
+static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwsud_epi32(__m128i __A,
+ __m128i __B,

doxygen descriptions?



Comment at: llvm/test/MC/Disassembler/X86/avx-vnni-int16-64.txt:3
+# RUN: llvm-mc --disassemble %s -triple=x86_64 --output-asm-variant=1 | 
FileCheck %s --check-prefixes=INTEL
+
+# ATT:vpdpwsud %ymm4, %ymm3, %ymm2

try to use some x86_64-- specific registers to improve test coverage


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits