>>>On Mon, Dec 04, 2017 at 08:16:47PM +0000, Bhanuprakash Bodireddy wrote:
>>>> Processors support a prefetch instruction in anticipation of a write, but
>>>> compilers (gcc) won't use it unless explicitly asked to do so, even
>>>> with '-march=native' specified.
>>>>
>>>> [Problem]
>>>> Case A:
>>>> OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>>> __builtin_prefetch(addr, 1, 3)
>>>> leaq -112(%rbp), %rax [Assembly]
>>>> prefetchw (%rax)
>>>>
>>>> Case B:
>>>> OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>> __builtin_prefetch(addr, 1, 1)
>>>> leaq -112(%rbp), %rax [Assembly]
>>>> prefetchw (%rax) <***problem***>
>>>>
>>>> In spite of specifying -march=native and using Low Temporal Write (OPCH_LTW),
>>>> the compiler generates the 'prefetchw' instruction instead of the 'prefetchwt1'
>>>> instruction available on the processor.
>>>>
>>>> [Solution]
>>>> Include -mprefetchwt1
>>>>
>>>> Case B:
>>>> OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>> __builtin_prefetch(addr, 1, 1)
>>>> leaq -112(%rbp), %rax [Assembly]
>>>> prefetchwt1 (%rax)
>>>>
>>>> [Testing]
>>>> $ ./boot.sh
>>>> $ ./configure
>>>> checking target hint for cgcc... x86_64
>>>> checking whether gcc accepts -mprefetchwt1... yes
>>>> $ make -j
>>>>
>>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at
>>>> intel.com>
>>>
>>>Does this have any effect if the architecture or CPU configured for
>>>use does not support prefetchwt1?
>>
>> That's a good question, and I spent a reasonable amount of time today figuring this out.
>> I have Haswell, Broadwell and Skylake CPUs and they all support this instruction.
>
>Hmm. I have 2 different Broadwell machines (Xeon E5 v4 and i7-6800K) and
>neither of them has the prefetchwt1 instruction according to cpuid:
>
> PREFETCHWT1 = false
Xeon E5-26XX v4 is a Broadwell workstation/server part, but i7-6800K is a Skylake
desktop variant, whereas E3-12XX v5 is the equivalent Skylake workstation/server
variant. AFAIK, prefetchwt1 should be available on the above processors; I am not
sure why cpuid reports otherwise.
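
To cross-check what the cpuid tool reports, the feature bit can also be read directly. The sketch below assumes GCC or Clang on x86 and uses __get_cpuid_count() from <cpuid.h>. PREFETCHWT1 is enumerated in CPUID leaf 7, subleaf 0, ECX bit 0. The helper names bit_is_set() and cpu_has_prefetchwt1() are made up for illustration and are not part of OVS.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Return true if bit 'bit' (0..31) is set in 'reg'. */
static bool
bit_is_set(unsigned int reg, unsigned int bit)
{
    assert(bit < 32);
    return (reg >> bit) & 1u;
}

/* Hypothetical helper (not an OVS function): report whether the running
 * CPU advertises PREFETCHWT1, i.e. CPUID.(EAX=07H, ECX=0):ECX bit 0. */
static bool
cpu_has_prefetchwt1(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return bit_is_set(ecx, 0);
#else
    return false;   /* Not an x86 CPU, so the instruction cannot exist. */
#endif
}
```

Calling cpu_has_prefetchwt1() on each of the machines in question should agree with the "PREFETCHWT1 = false" line quoted above if the cpuid tool is decoding the same bit.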
pmd_thread_main()
-------------------------------------------------------------------------------------------
With OPCH_HTW, we see the prefetchw instruction.
    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
            process_packets =
                dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                           poll_list[i].port_no);
            cycles_count_intermediate(pmd, poll_list[i].rxq,
Address Source Line Assembly
0x6e29ef 4,086 movl 0x823ecb(%rip), %edi
0x6e29f5 4,085 movq 0x50(%rsp), %rax
0x6e29fa 4,086 test %edi, %edi
0x6e29fc 4,085 prefetchwz (%rax)
----------------------------------------------------------------------------------------
With OPCH_LTW, we can see the prefetchwt1b instruction being used (change made
to show this).
    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_LTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
            ..........
Address Source Line Assembly
0x6e29ef 4,086 movl 0x823ecb(%rip), %edi
0x6e29f5 4,085 movq 0x50(%rsp), %rax
0x6e29fa 4,086 test %edi, %edi
0x6e29fc 4,085 prefetchwt1b (%rax)
-----------------------------------------------------------------------------------------
>
>This means that introducing this change will break binary compatibility even
>between CPUs of the same generation, i.e. I will not be able to run binaries
>compiled on your system on mine.
>
>If that's true, I'd prefer not to have this change.
>
>Anyway, adding this change will make it impossible to compile a generic binary
>for different platforms if your build server supports prefetchwt1.
>There should be a way to disable this arch-specific compiler flag even if it
>is supported on my current platform.
I see your point: a build server can be more advanced and support the
prefetchwt1 instruction. When I copy the precompiled binaries to a server that
does not support it and run them, how do they behave?
I am not sure about this. Maybe Red Hat/Canonical developers can comment on how
they handle this kind of case.
I will try to check this on my side.
- Bhanuprakash.
>
>Best regards, Ilya Maximets.
>
>> But I found that this instruction isn't enabled by default even with
>> -march=native, so it needs to be enabled explicitly.
>>
>> Coming to your question, there won't be side effects from using OPCH_LTW.
>> On processors that *don't* support PREFETCHW and PREFETCHWT1, the
>> compiler generates a 'prefetcht1' instruction.
>> On processors that support PREFETCHW, the compiler generates a 'prefetchw'
>> instruction.
>> On processors that support PREFETCHW & PREFETCHWT1, the compiler
>> generates a 'prefetchwt1' instruction with -mprefetchwt1 explicitly enabled.
>>
>>>If it could lead to that situation, then this does not seem like the
>>>right thing to do, and we might want to fall back to recommending use
>>>of the option when the person building knows that the software will
>>>run on a machine with prefetchwt1.
>>
>> According to the above, on processors that don't support this instruction,
>> a 'prefetcht1' instruction would be generated, and it doesn't have side effects.
>> I verified this using https://gcc.godbolt.org/ by carefully checking the
>> instructions generated for different compiler versions and -march flags.
>>
>> - Bhanuprakash.
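
For readers who want to reproduce the experiment described above, here is a minimal sketch of the two write-prefetch hints. The macro names SKETCH_PREFETCH_HTW and SKETCH_PREFETCH_LTW and the function fill_with_prefetch() are hypothetical; they only mirror the __builtin_prefetch(addr, 1, 3) and __builtin_prefetch(addr, 1, 1) calls quoted in the patch and are not the actual OVS_PREFETCH_CACHE implementation. The 64-byte stride is an assumed cache-line size. Which instruction the compiler emits for each hint depends on the compiler version and the -march/-m flags in use, so the generated assembly should be inspected rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for OPCH_HTW / OPCH_LTW.  __builtin_prefetch()
 * requires compile-time constants for its 'rw' and 'locality' arguments,
 * hence macros rather than a function taking a hint parameter. */
#define SKETCH_PREFETCH_HTW(addr) __builtin_prefetch((addr), 1, 3)
#define SKETCH_PREFETCH_LTW(addr) __builtin_prefetch((addr), 1, 1)

#define SKETCH_CACHE_LINE 64   /* Assumed cache-line size in bytes. */

/* Fill 'buf' with 'value', issuing a low-temporal-locality write prefetch
 * for the next cache line at the start of each line.  The prefetch is only
 * a hint: the bytes written are the same with or without it.
 * Returns the number of bytes written. */
static size_t
fill_with_prefetch(unsigned char *buf, size_t len, unsigned char value)
{
    assert(buf != NULL);

    for (size_t i = 0; i < len; i++) {
        if (i % SKETCH_CACHE_LINE == 0 && i + SKETCH_CACHE_LINE < len) {
            SKETCH_PREFETCH_LTW(&buf[i + SKETCH_CACHE_LINE]);
        }
        buf[i] = value;
    }
    return len;
}
```

Compiling this file with and without -mprefetchwt1 (where the compiler accepts that flag) and comparing the assembly for fill_with_prefetch() is one way to see the difference between Case A and Case B from the patch description.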
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev