Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-07 Thread Michel Dänzer

On 08/12/16 12:02 AM, Roland Scheidegger wrote:
> The bug in llvm has been fixed, can you confirm lp_test_format passes again?

Yep, it does, thanks!


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-07 Thread Roland Scheidegger

The bug in llvm has been fixed, can you confirm lp_test_format passes again?

Roland

On 06.12.2016 at 19:00, Roland Scheidegger wrote:
> Ok, here is the bug:
> https://llvm.org/bugs/show_bug.cgi?id=31296
> 
> Roland
> 
> On 06.12.2016 at 18:47, Roland Scheidegger wrote:
>> Actually I've verified this quickly with llc.
>> With -mattr=xop, it produces
>>
>> fetch_r32_float_float:  # @fetch_r32_float_float
>> .cfi_startproc
>> # BB#0: # %entry
>> vpermilps   $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
>> vmovaps %xmm0, (%rdi)
>> retq
>>
>> which is very obviously garbage (it even managed to optimize out the
>> actual load, just the constants are left...). So this is a llvm bug with
>> xop indeed. I'm going to file a bug, but in the interim I don't know
>> what mesa should do - this is one reason why we didn't want to enable
>> features which we didn't actually test previously (that said, if we
>> don't enable them, the llvm bugs we hit will probably never get
>> fixed...). We could of course force-disable xop (albeit in theory it's
>> nice - we really can make use of that damn missing vector shift which
>> otherwise requires avx2).
>>
>> Roland
>>
>>
>> On 06.12.2016 at 17:34, Roland Scheidegger wrote:
>>> Interesting. Can you show the IR / assembly? I don't get any failures here.
>>> I'm wondering if it's trying to use XOP and there's some bug there (or
>>> we're relying on undefined behavior which doesn't happen to work with
>>> it). Albeit since there's not actually any conversion involved in this
>>> case (float 1 channel -> float 4 channel) the assembly here looks
>>> trivial and I can't see how it could go wrong.
>>>
>>> I get (with a couple days old llvm):
>>> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048
>>> x i32], [128 x i64] }*) {
>>> entry:
>>>   %5 = getelementptr i8, i8* %1, i32 0
>>>   %6 = bitcast i8* %5 to i32*
>>>   %7 = load i32, i32* %6
>>>   %8 = zext i32 %7 to i128
>>>   %9 = bitcast i128 %8 to <4 x float>
>>>   %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>>>   store <4 x float> %10, <4 x float>* %0
>>>   ret void
>>> }
>>>
>>> fetch_r32_float_float:
>>>  0: pushq   %rbp
>>>  1: movq    %rsp, %rbp
>>>  4: movl    (%rsi), %eax
>>>  6: vmovq   %rax, %xmm0
>>> 11: movabsq $140375561531392, %rax
>>> 21: vmovaps (%rax), %xmm1
>>> 25: vshufps $0, %xmm1, %xmm0, %xmm0
>>> 30: vshufps $72, %xmm1, %xmm0, %xmm0
>>> 35: vmovaps %xmm0, (%rdi)
>>> 39: popq    %rbp
>>> 40: retq
>>>
>>> The only thing I can think of is maybe the load/zext in combination with
>>> the shuffle going wrong - the shuffle combiner in llvm has a couple xop
>>> cases.
>>>
>>> fwiw printing of the values is a bit suboptimal, the "packed" 00 00 80
>>> bf value really is a float 0xbf800000 and you don't see the other
>>> channels at all albeit in this case there aren't any...
>>>
>>> Roland
>>>
>>> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>>>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>>>> Use llvm provided API based on cpuid rather than our own
>>>>> manually maintained list of mattr enabling/disabling.
>>>>
>>>> This change broke the llvmpipe unit test lp_test_format for me:
>>>>
>>>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>>>> FAILED
>>>>   Packed: 00 00 00 00
>>>>   Unpacked (0,0): 1 0 0 1 obtained
>>>>   0 0 0 1 expected
>>>> FAILED
>>>>   Packed: 00 00 80 bf
>>>>   Unpacked (0,0): 1 0 0 1 obtained
>>>>   -1 0 0 1 expected
>>>>
>>>>
>>>> This is on:
>>>>
>>>> processor  : 0
>>>> vendor_id  : AuthenticAMD
>>>> cpu family : 21
>>>> model  : 48
>>>> model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>>>> stepping   : 1
>>>> microcode  : 0x6003106
>>>> cpu MHz: 4100.000
>>>> cache size : 2048 KB
>>>> physical id: 0
>>>> siblings   : 4
>>>> core id: 0
>>>> cpu cores  : 2
>>>> apicid : 16
>>>> initial apicid : 0
>>>> fpu: yes
>>>> fpu_exception  : yes
>>>> cpuid level: 13
>>>> wp : yes
>>>> flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>>> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>>> pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
>>>> aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
>>>> popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm
>>>> sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce
>>>> nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate
>>>> vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale
>>>> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Roland Scheidegger

Ok, here is the bug:
https://llvm.org/bugs/show_bug.cgi?id=31296

Roland

On 06.12.2016 at 18:47, Roland Scheidegger wrote:
> Actually I've verified this quickly with llc.
> With -mattr=xop, it produces
> 
> fetch_r32_float_float:  # @fetch_r32_float_float
> .cfi_startproc
> # BB#0: # %entry
> vpermilps   $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
> vmovaps %xmm0, (%rdi)
> retq
> 
> which is very obviously garbage (it even managed to optimize out the
> actual load, just the constants are left...). So this is a llvm bug with
> xop indeed. I'm going to file a bug, but in the interim I don't know
> what mesa should do - this is one reason why we didn't want to enable
> features which we didn't actually test previously (that said, if we
> don't enable them, the llvm bugs we hit will probably never get
> fixed...). We could of course force-disable xop (albeit in theory it's
> nice - we really can make use of that damn missing vector shift which
> otherwise requires avx2).
> 
> Roland
> 
> 
> On 06.12.2016 at 17:34, Roland Scheidegger wrote:
>> Interesting. Can you show the IR / assembly? I don't get any failures here.
>> I'm wondering if it's trying to use XOP and there's some bug there (or
>> we're relying on undefined behavior which doesn't happen to work with
>> it). Albeit since there's not actually any conversion involved in this
>> case (float 1 channel -> float 4 channel) the assembly here looks
>> trivial and I can't see how it could go wrong.
>>
>> I get (with a couple days old llvm):
>> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048
>> x i32], [128 x i64] }*) {
>> entry:
>>   %5 = getelementptr i8, i8* %1, i32 0
>>   %6 = bitcast i8* %5 to i32*
>>   %7 = load i32, i32* %6
>>   %8 = zext i32 %7 to i128
>>   %9 = bitcast i128 %8 to <4 x float>
>>   %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>>   store <4 x float> %10, <4 x float>* %0
>>   ret void
>> }
>>
>> fetch_r32_float_float:
>>  0: pushq   %rbp
>>  1: movq    %rsp, %rbp
>>  4: movl    (%rsi), %eax
>>  6: vmovq   %rax, %xmm0
>> 11: movabsq $140375561531392, %rax
>> 21: vmovaps (%rax), %xmm1
>> 25: vshufps $0, %xmm1, %xmm0, %xmm0
>> 30: vshufps $72, %xmm1, %xmm0, %xmm0
>> 35: vmovaps %xmm0, (%rdi)
>> 39: popq    %rbp
>> 40: retq
>>
>> The only thing I can think of is maybe the load/zext in combination with
>> the shuffle going wrong - the shuffle combiner in llvm has a couple xop
>> cases.
>>
>> fwiw printing of the values is a bit suboptimal, the "packed" 00 00 80
>> bf value really is a float 0xbf800000 and you don't see the other
>> channels at all albeit in this case there aren't any...
>>
>> Roland
>>
>> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>>> Use llvm provided API based on cpuid rather than our own
>>>> manually maintained list of mattr enabling/disabling.
>>>
>>> This change broke the llvmpipe unit test lp_test_format for me:
>>>
>>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>>> FAILED
>>>   Packed: 00 00 00 00
>>>   Unpacked (0,0): 1 0 0 1 obtained
>>>   0 0 0 1 expected
>>> FAILED
>>>   Packed: 00 00 80 bf
>>>   Unpacked (0,0): 1 0 0 1 obtained
>>>   -1 0 0 1 expected
>>>
>>>
>>> This is on:
>>>
>>> processor   : 0
>>> vendor_id   : AuthenticAMD
>>> cpu family  : 21
>>> model   : 48
>>> model name  : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>>> stepping: 1
>>> microcode   : 0x6003106
>>> cpu MHz : 4100.000
>>> cache size  : 2048 KB
>>> physical id : 0
>>> siblings: 4
>>> core id : 0
>>> cpu cores   : 2
>>> apicid  : 16
>>> initial apicid  : 0
>>> fpu : yes
>>> fpu_exception   : yes
>>> cpuid level : 13
>>> wp  : yes
>>> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
>>> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
>>> pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid 
>>> aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 
>>> popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm 
>>> sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce 
>>> nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate 
>>> vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale 
>>> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
>>> bugs: fxsave_leak sysret_ss_attrs null_seg
>>> bogomips: 8200.42
>>> TLB size: 1536 4K pages
>>> clflush size: 64
>>> cache_alignment : 64
>>> address sizes   : 48 bits physical, 48 bits virtual
>>> power

Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Roland Scheidegger

Actually I've verified this quickly with llc.
With -mattr=xop, it produces

fetch_r32_float_float:  # @fetch_r32_float_float
.cfi_startproc
# BB#0: # %entry
vpermilps   $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
vmovaps %xmm0, (%rdi)
retq

which is very obviously garbage (it even managed to optimize out the
actual load, only the constants are left...). So this is indeed an llvm
bug with xop. I'm going to file a bug, but in the interim I don't know
what mesa should do - this is one reason why we didn't want to enable
features which we hadn't actually tested previously (that said, if we
don't enable them, the llvm bugs we hit will probably never get
fixed...). We could of course force-disable xop (although in theory it's
nice - we really can make use of that damn missing vector shift which
otherwise requires avx2).

Roland


On 06.12.2016 at 17:34, Roland Scheidegger wrote:
> Interesting. Can you show the IR / assembly? I don't get any failures here.
> I'm wondering if it's trying to use XOP and there's some bug there (or
> we're relying on undefined behavior which doesn't happen to work with
> it). Albeit since there's not actually any conversion involved in this
> case (float 1 channel -> float 4 channel) the assembly here looks
> trivial and I can't see how it could go wrong.
> 
> I get (with a couple days old llvm):
> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048
> x i32], [128 x i64] }*) {
> entry:
>   %5 = getelementptr i8, i8* %1, i32 0
>   %6 = bitcast i8* %5 to i32*
>   %7 = load i32, i32* %6
>   %8 = zext i32 %7 to i128
>   %9 = bitcast i128 %8 to <4 x float>
>   %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>   store <4 x float> %10, <4 x float>* %0
>   ret void
> }
> 
> fetch_r32_float_float:
>  0: pushq   %rbp
>  1: movq    %rsp, %rbp
>  4: movl    (%rsi), %eax
>  6: vmovq   %rax, %xmm0
> 11: movabsq $140375561531392, %rax
> 21: vmovaps (%rax), %xmm1
> 25: vshufps $0, %xmm1, %xmm0, %xmm0
> 30: vshufps $72, %xmm1, %xmm0, %xmm0
> 35: vmovaps %xmm0, (%rdi)
> 39: popq    %rbp
> 40: retq
> 
> The only thing I can think of is maybe the load/zext in combination with
> the shuffle going wrong - the shuffle combiner in llvm has a couple xop
> cases.
> 
> fwiw printing of the values is a bit suboptimal, the "packed" 00 00 80
> bf value really is a float 0xbf800000 and you don't see the other
> channels at all albeit in this case there aren't any...
> 
> Roland
> 
> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>> Use llvm provided API based on cpuid rather than our own
>>> manually maintained list of mattr enabling/disabling.
>>
>> This change broke the llvmpipe unit test lp_test_format for me:
>>
>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>> FAILED
>>   Packed: 00 00 00 00
>>   Unpacked (0,0): 1 0 0 1 obtained
>>   0 0 0 1 expected
>> FAILED
>>   Packed: 00 00 80 bf
>>   Unpacked (0,0): 1 0 0 1 obtained
>>   -1 0 0 1 expected
>>
>>
>> This is on:
>>
>> processor: 0
>> vendor_id: AuthenticAMD
>> cpu family   : 21
>> model: 48
>> model name   : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>> stepping : 1
>> microcode: 0x6003106
>> cpu MHz  : 4100.000
>> cache size   : 2048 KB
>> physical id  : 0
>> siblings : 4
>> core id  : 0
>> cpu cores: 2
>> apicid   : 16
>> initial apicid   : 0
>> fpu  : yes
>> fpu_exception: yes
>> cpuid level  : 13
>> wp   : yes
>> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
>> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
>> pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid 
>> aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 
>> popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm 
>> sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce 
>> nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate 
>> vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale 
>> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
>> bugs : fxsave_leak sysret_ss_attrs null_seg
>> bogomips : 8200.42
>> TLB size : 1536 4K pages
>> clflush size : 64
>> cache_alignment  : 64
>> address sizes: 48 bits physical, 48 bits virtual
>> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
>>
>>
>>
> 


Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Roland Scheidegger

Interesting. Can you show the IR / assembly? I don't get any failures here.
I'm wondering if it's trying to use XOP and there's some bug there (or
we're relying on undefined behavior which doesn't happen to work with
it). That said, since there isn't actually any conversion involved in this
case (float 1 channel -> float 4 channel), the assembly here looks
trivial and I can't see how it could go wrong.

I get (with a couple days old llvm):
define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048
x i32], [128 x i64] }*) {
entry:
  %5 = getelementptr i8, i8* %1, i32 0
  %6 = bitcast i8* %5 to i32*
  %7 = load i32, i32* %6
  %8 = zext i32 %7 to i128
  %9 = bitcast i128 %8 to <4 x float>
  %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
  store <4 x float> %10, <4 x float>* %0
  ret void
}

fetch_r32_float_float:
 0: pushq   %rbp
 1: movq    %rsp, %rbp
 4: movl    (%rsi), %eax
 6: vmovq   %rax, %xmm0
11: movabsq $140375561531392, %rax
21: vmovaps (%rax), %xmm1
25: vshufps $0, %xmm1, %xmm0, %xmm0
30: vshufps $72, %xmm1, %xmm0, %xmm0
35: vmovaps %xmm0, (%rdi)
39: popq    %rbp
40: retq

The only thing I can think of is maybe the load/zext in combination with
the shuffle going wrong - the shuffle combiner in llvm has a couple xop
cases.

fwiw printing of the values is a bit suboptimal: the "packed" 00 00 80
bf value really is a float 0xbf800000, and you don't see the other
channels at all, although in this case there aren't any...

Roland

On 06.12.2016 at 07:27, Michel Dänzer wrote:
> On 06/12/16 02:39 AM, Tim Rowley wrote:
>> Use llvm provided API based on cpuid rather than our own
>> manually maintained list of mattr enabling/disabling.
> 
> This change broke the llvmpipe unit test lp_test_format for me:
> 
> Testing PIPE_FORMAT_R32_FLOAT (float) ...
> FAILED
>   Packed: 00 00 00 00
>   Unpacked (0,0): 1 0 0 1 obtained
>   0 0 0 1 expected
> FAILED
>   Packed: 00 00 80 bf
>   Unpacked (0,0): 1 0 0 1 obtained
>   -1 0 0 1 expected
> 
> 
> This is on:
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 21
> model : 48
> model name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping  : 1
> microcode : 0x6003106
> cpu MHz   : 4100.000
> cache size: 2048 KB
> physical id   : 0
> siblings  : 4
> core id   : 0
> cpu cores : 2
> apicid: 16
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 13
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
> eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
> avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
> perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 
> xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
> decodeassists pausefilter pfthreshold overflow_recov
> bugs  : fxsave_leak sysret_ss_attrs null_seg
> bogomips  : 8200.42
> TLB size  : 1536 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> 


Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Rowley, Timothy O

Interesting.  My testing was done using piglit on an avx512 capable processor, 
where I didn’t see any regressions.

llvmpipe’s “make check” also passes for me with this change on avx2 and avx512 
machines.

Was this the only regression you saw?

-Tim

> On Dec 6, 2016, at 12:27 AM, Michel Dänzer wrote:
> 
> On 06/12/16 02:39 AM, Tim Rowley wrote:
>> Use llvm provided API based on cpuid rather than our own
>> manually maintained list of mattr enabling/disabling.
> 
> This change broke the llvmpipe unit test lp_test_format for me:
> 
> Testing PIPE_FORMAT_R32_FLOAT (float) ...
> FAILED
>  Packed: 00 00 00 00
>  Unpacked (0,0): 1 0 0 1 obtained
>  0 0 0 1 expected
> FAILED
>  Packed: 00 00 80 bf
>  Unpacked (0,0): 1 0 0 1 obtained
>  -1 0 0 1 expected
> 
> 
> This is on:
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 21
> model : 48
> model name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping  : 1
> microcode : 0x6003106
> cpu MHz   : 4100.000
> cache size: 2048 KB
> physical id   : 0
> siblings  : 4
> core id   : 0
> cpu cores : 2
> apicid: 16
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 13
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
> eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
> avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
> perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 
> xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
> decodeassists pausefilter pfthreshold overflow_recov
> bugs  : fxsave_leak sysret_ss_attrs null_seg
> bogomips  : 8200.42
> TLB size  : 1536 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> 
> -- 
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-05 Thread Michel Dänzer

On 06/12/16 02:39 AM, Tim Rowley wrote:
> Use llvm provided API based on cpuid rather than our own
> manually maintained list of mattr enabling/disabling.

This change broke the llvmpipe unit test lp_test_format for me:

Testing PIPE_FORMAT_R32_FLOAT (float) ...
FAILED
  Packed: 00 00 00 00
  Unpacked (0,0): 1 0 0 1 obtained
  0 0 0 1 expected
FAILED
  Packed: 00 00 80 bf
  Unpacked (0,0): 1 0 0 1 obtained
  -1 0 0 1 expected


This is on:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 48
model name      : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
stepping        : 1
microcode       : 0x6003106
cpu MHz         : 4100.000
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 16
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf
eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave
avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext
perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt
arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
pausefilter pfthreshold overflow_recov
bugs            : fxsave_leak sysret_ss_attrs null_seg
bogomips        : 8200.42
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]



-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-05 Thread Roland Scheidegger

On 05.12.2016 at 18:39, Tim Rowley wrote:
> Use llvm provided API based on cpuid rather than our own
> manually maintained list of mattr enabling/disabling.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index a68428d..21d9e15 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -542,6 +542,20 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
> llvm::SmallVector MAttrs;
>  
>  #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> +#if HAVE_LLVM >= 0x0400
> +   /* llvm-3.7+ implements sys::getHostCPUFeatures for x86,
> +    * which allows us to enable/disable code generation based
> +    * on the results of cpuid.
> +    */
> +   llvm::StringMap<bool> features;
> +   llvm::sys::getHostCPUFeatures(features);
> +
> +   for (StringMapIterator<bool> f = features.begin();
> +        f != features.end();
> +        ++f) {
> +      MAttrs.push_back(((*f).second ? "+" : "-") + (*f).first().str());
> +   }
> +#else
> /*
>  * We need to unset attributes because sometimes LLVM mistakenly assumes
>  * certain features are present given the processor name.
> @@ -596,6 +610,7 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
> MAttrs.push_back("-avx512vl");
>  #endif
>  #endif
> +#endif
>  
>  #if defined(PIPE_ARCH_PPC)
> MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
> 

Reviewed-by: Roland Scheidegger 

[Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-05 Thread Tim Rowley

Use llvm provided API based on cpuid rather than our own
manually maintained list of mattr enabling/disabling.
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index a68428d..21d9e15 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -542,6 +542,20 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
    llvm::SmallVector<std::string, 16> MAttrs;
 
 #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
+#if HAVE_LLVM >= 0x0400
+   /* llvm-3.7+ implements sys::getHostCPUFeatures for x86,
+    * which allows us to enable/disable code generation based
+    * on the results of cpuid.
+    */
+   llvm::StringMap<bool> features;
+   llvm::sys::getHostCPUFeatures(features);
+
+   for (StringMapIterator<bool> f = features.begin();
+        f != features.end();
+        ++f) {
+      MAttrs.push_back(((*f).second ? "+" : "-") + (*f).first().str());
+   }
+#else
    /*
     * We need to unset attributes because sometimes LLVM mistakenly assumes
     * certain features are present given the processor name.
@@ -596,6 +610,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
    MAttrs.push_back("-avx512vl");
 #endif
 #endif
+#endif
 
 #if defined(PIPE_ARCH_PPC)
    MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
-- 
2.9.3
