>On Mon, Dec 04, 2017 at 08:16:47PM +0000, Bhanuprakash Bodireddy wrote:
>> Processors support prefetch instruction in anticipation of write but
>> compilers(gcc) won't use them unless explicitly asked to do so even
>> with '-march=native' specified.
>>
>> [Problem]
>>   Case A:
>>     OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>        __builtin_prefetch(addr, 1, 3)
>>          leaq    -112(%rbp), %rax        [Assembly]
>>          prefetchw  (%rax)
>>
>>   Case B:
>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>        __builtin_prefetch(addr, 1, 1)
>>          leaq    -112(%rbp), %rax        [Assembly]
>>          prefetchw  (%rax)             <***problem***>
>>
>>   Inspite of specifying -march=native and using Low Temporal Write(OPCH_LTW),
>>   the compiler generates 'prefetchw' instruction instead of 'prefetchwt1'
>>   instruction available on processor.
>>
>> [Solution]
>>    Include -mprefetchwt1
>>
>>   Case B:
>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>        __builtin_prefetch(addr, 1, 1)
>>          leaq    -112(%rbp), %rax        [Assembly]
>>          prefetchwt1  (%rax)
>>
>> [Testing]
>>   $ ./boot.sh
>>   $ ./configure
>>      checking target hint for cgcc... x86_64
>>      checking whether gcc accepts -mprefetchwt1... yes
>>   $ make -j
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> <[email protected]>
>
>Does this have any effect if the architecture or CPU configured for use does
>not support prefetchwt1?

That's a good question, and I spent a fair amount of time today figuring
this out. I have Haswell, Broadwell and Skylake CPUs, and they all support
this instruction. However, I found that the instruction isn't enabled by
default, even with -march=native, so it has to be enabled explicitly.

Coming to your question, there are no side effects from using OPCH_LTW:

  - On processors that support *neither* PREFETCHW nor PREFETCHWT1, the
    compiler generates a 'prefetcht1' instruction.
  - On processors that support PREFETCHW, the compiler generates a
    'prefetchw' instruction.
  - On processors that support both PREFETCHW and PREFETCHWT1, the compiler
    generates a 'prefetchwt1' instruction when -mprefetchwt1 is explicitly
    enabled.

>If it could lead to that situation, then this does not
>seem like the right thing to do, and we might want to fall back to
>recommending use of the option when the person building knows that the
>software will run on a machine with prefetchwt1.

As described above, on processors that don't support this instruction, a
'prefetcht1' instruction is generated instead, so there are no side
effects. I verified this using https://gcc.godbolt.org/, checking the
instructions generated for different compiler versions and -march flags.

- Bhanuprakash.
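
For anyone who wants to repeat the check locally, here is a minimal sketch of a write prefetch with low temporal locality, mirroring the __builtin_prefetch(addr, 1, 1) call quoted above. The SKETCH_ macro, the sketch_ function names and the prefetch distance are hypothetical and are not the OVS definitions. The sketch also assumes the compiler predefines __PRFCHW__ and __PREFETCHWT1__ when the corresponding instruction sets are enabled, which is worth confirming for the compiler version in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper modeled on the OPCH_LTW hint quoted above:
 * rw = 1 (prefetch for write), locality = 1 (low temporal locality).
 * Both arguments must be compile-time constants. */
#define SKETCH_PREFETCH_LTW(addr) __builtin_prefetch((addr), 1, 1)

/* Reports which write-prefetch extension the compiler was told to target.
 * Assumption: __PREFETCHWT1__ and __PRFCHW__ are predefined when the
 * matching -m flags (or a -march value that implies them) are in effect. */
static const char *
sketch_write_prefetch_isa(void)
{
#if defined(__PREFETCHWT1__)
    return "prefetchwt1";
#elif defined(__PRFCHW__)
    return "prefetchw";
#else
    return "baseline";
#endif
}

/* Fills dst[0..n) with value, prefetching a few elements ahead for write.
 * The prefetch is only a hint, so the stored data is the same whichever
 * instruction the compiler selects for it. */
static size_t
sketch_fill(int *dst, size_t n, int value)
{
    const size_t ahead = 16;    /* Prefetch distance: an arbitrary choice. */
    size_t i;

    assert(dst != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (i + ahead < n) {
            SKETCH_PREFETCH_LTW(&dst[i + ahead]);
        }
        dst[i] = value;
    }
    return n;
}
```

Compiling this with gcc -S, once with and once without the -mprefetchwt1 flag from the patch, and comparing the prefetch instruction in the generated assembly should show the same difference as on godbolt, provided the compiler version accepts that flag.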
