>> -----Original Message-----
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
>> Sent: Friday, January 12, 2018 5:41 PM
>> To: d...@openvswitch.org
>> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
>>
>> This commit introduces prefetch variants by using the GCC built-in
>> prefetch function.
>>
>> The prefetch variants gives the user better control on designing data
>> caching strategy in order to increase cache efficiency and minimize
>> cache pollution. Data reference patterns here can be classified in to
>>
>>  - Non-temporal(NT) - Data that is referenced once and not reused in
>>                       immediate future.
>>  - Temporal         - Data will be used again soon.
>>
>> The Macro variants can be used where there are
>>  - Predictable memory access patterns.
>>  - Execution pipeline can stall if data isn't available.
>>  - Time consuming loops.
>>
>> For example:
>>
>>   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>>     - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>>     - __builtin_prefetch(addr, 0, 1)
>>     - Prefetch data in to L3 cache for readonly purpose.
>>
>>   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>     - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>>     - __builtin_prefetch(addr, 1, 3)
>>     - Prefetch data in to all caches in anticipation of write. In doing
>>       so it invalidates other cached copies so as to gain 'exclusive'
>>       access.
>>
>>   OVS_PREFETCH(addr)
>>     - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>>     - __builtin_prefetch(addr, 0, 3)
>>     - Prefetch data in to all caches in anticipation of read and that
>>       data will be used again soon (HTR - High Temporal Read).
>>
>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
>> ---
>>  include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 139 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
>> index c7cb930..94bb24d 100644
>> --- a/include/openvswitch/compiler.h
>> +++ b/include/openvswitch/compiler.h
>> @@ -222,18 +222,149 @@
>>      static void f(void)
>>  #endif
>>
>> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
>> - * line containing the given address to a CPU cache.
>> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
>> - * written to.  Depending on the target CPU, this can generate the same
>> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
>> - * exclusive state. */
>>  #if __GNUC__
>> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
>> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
>> +enum cache_locality {
>> +    NON_TEMPORAL_LOCALITY,
>> +    LOW_TEMPORAL_LOCALITY,
>> +    MODERATE_TEMPORAL_LOCALITY,
>> +    HIGH_TEMPORAL_LOCALITY
>> +};
>> +
>> +enum cache_rw {
>> +    PREFETCH_READ,
>> +    PREFETCH_WRITE
>> +};
>> +
>> +/* The prefetch variants gives the user better control on designing data
>> + * caching strategy in order to increase cache efficiency and minimize
>> + * cache pollution. Data reference patterns here can be classified in to
>> + *
>> + *   Non-temporal(NT) - Data that is referenced once and not reused in
>> + *                      immediate future.
>> + *   Temporal         - Data will be used again soon.
>> + *
>> + * The Macro variants can be used where there are
>> + *   o Predictable memory access patterns.
>> + *   o Execution pipeline can stall if data isn't available.
>> + *   o Time consuming loops.
>> + *
>> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
>> + * line containing the given address to a CPU cache. The second argument
>> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is going
>> + * to be read or written to by core.
>> + *
>> + * Example Usage:
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 1)
>> + *       - Prefetch data in to L3 cache for readonly purpose.
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>> + *       - __builtin_prefetch(addr, 1, 3)
>> + *       - Prefetch data in to all caches in anticipation of write. In doing
>> + *         so it invalidates other cached copies so as to gain 'exclusive'
>> + *         access.
>> + *
>> + *   OVS_PREFETCH(addr)
>> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 3)
>> + *       - Prefetch data in to all caches in anticipation of read and that
>> + *         data will be used again soon (HTR - High Temporal Read).
>> + *
>> + * Implementation details of prefetch hint instructions may vary across
>> + * different processors and microarchitectures.
>
>Herein lies a potential problem, have you tested this on systems that have
>different interpretations of the prefetch hints? What about systems that
>don't support it?

[BHANU] I have tested it on different Intel microarchitectures (Haswell,
Broadwell, Skylake). I understand that you are concerned about the ARM
platform. ARM does support the prefetch variants, and they have the same
functionality as on x86_64. For example, take the code snippet below:

    void pref(void *p)
    {
        __builtin_prefetch(p, 0, 0);
        __builtin_prefetch(p, 0, 1);
        __builtin_prefetch(p, 0, 2);
        __builtin_prefetch(p, 0, 3);
        __builtin_prefetch(p, 1, 0);
        __builtin_prefetch(p, 1, 1);
        __builtin_prefetch(p, 1, 2);
        __builtin_prefetch(p, 1, 3);
    }

When compiled on ARM64 with gcc 5.4, it produces:

    pref:
        prfm    PLDL1STRM, [x0]
        prfm    PLDL3KEEP, [x0]
        prfm    PLDL2KEEP, [x0]
        prfm    PLDL1KEEP, [x0]
        prfm    PSTL1STRM, [x0]
        prfm    PSTL3KEEP, [x0]
        prfm    PSTL2KEEP, [x0]
        prfm    PSTL1KEEP, [x0]
        ret

For details on the instruction, see:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/PRFM_imm.html

The best way to verify different platforms and compiler versions is to use
https://gcc.godbolt.org/

>
>In some cases OVS will be compiled on one system but then deployed on
>another, they might not be the same HW platform. What happens in that
>case?

[BHANU] If the target doesn't support the prefetch instruction, it might be
a NOP on that platform, and it doesn't cause any application crashes or
performance penalties.

>
>Will it behave as expected i.e. similar fashion to how prefetch currently
>behaves?

[BHANU] Yes.

>
>> + *
>> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and OPCH_HTW
>> + * uses prefetchw instruction when available. Refer Documentation on how
>> + * to enable prefetchwt1 instruction.
>
>Just to clarify, Is it HW documentation for a user's setup they must refer to?

[BHANU] No, I meant the OvS documentation in this patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343101.html

>Are there any extra setup steps for compilers etc. for these instructions?

[BHANU] Yes, and they are described in the documentation at the link above.
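
As a rough illustration of the point above that a prefetch is only a hint and never changes program results, here is a minimal standalone sketch. It is not the OVS macro: SKETCH_PREFETCH_READ, SKETCH_PREFETCH_AHEAD and sketch_sum are hypothetical names made up for this example, and the prefetch distance of 16 elements is an arbitrary value, not a tuned one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper (not the OVS macro): expands to the GCC/Clang builtin
 * when available and to nothing otherwise. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define SKETCH_PREFETCH_READ(addr) ((void) 0)
#endif

/* Number of elements to prefetch ahead; an arbitrary illustrative value. */
#define SKETCH_PREFETCH_AHEAD 16

/* Sums 'n' values, prefetching a few elements ahead of the current one.
 * The prefetch cannot change the returned sum; it can only affect timing. */
uint64_t
sketch_sum(const uint32_t *values, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_PREFETCH_AHEAD < n) {
            SKETCH_PREFETCH_READ(&values[i + SKETCH_PREFETCH_AHEAD]);
        }
        total += values[i];
    }
    return total;
}
```

Whether the hint helps, is ignored, or is emitted at all depends on the compiler flags and the target CPU, so any benefit would have to be measured on the deployment hardware.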
>
>I would expect something like this to be added to the OVS docs.
>
>> + *
>> + * PREFETCH HINT    Instruction     GCC builtin function
>> + * -------------------------------------------------------
>> + *   OPCH_NTR       prefetchnta     __builtin_prefetch(a, 0, 0)
>> + *   OPCH_LTR       prefetcht2      __builtin_prefetch(a, 0, 1)
>> + *   OPCH_MTR       prefetcht1      __builtin_prefetch(a, 0, 2)
>> + *   OPCH_HTR       prefetcht0      __builtin_prefetch(a, 0, 3)
>> + *
>> + *   OPCH_NTW       prefetchwt1     __builtin_prefetch(a, 1, 0)
>> + *   OPCH_LTW       prefetchwt1     __builtin_prefetch(a, 1, 1)
>> + *   OPCH_MTW       prefetchwt1     __builtin_prefetch(a, 1, 2)
>> + *   OPCH_HTW       prefetchw       __builtin_prefetch(a, 1, 3)
>> + *
>> + * */
>> +#define OVS_PREFETCH_CACHE_HINT                                             \
>> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,                    \
>> +         "Fetch data to non-temporal cache close to processor"              \
>> +         "to minimize cache pollution")                                     \
>> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,                    \
>> +         "Fetch data to L2 and L3 cache")                                   \
>> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,               \
>> +         "Fetch data to L2 and L3 caches, same as LTR on"                   \
>> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")    \
>> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,                   \
>> +         "Fetch data in to all cache levels L1, L2 and L3")                 \
>> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,                   \
>> +         "Fetch data to L2 and L3 cache in exclusive state"                 \
>> +         "in anticipation of write")                                        \
>> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,                   \
>> +         "Fetch data to L2 and L3 cache in exclusive state")                \
>> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,              \
>> +         "Fetch data in to L2 and L3 caches in exclusive state")            \
>> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,                  \
>> +         "Fetch data in to all cache levels in exclusive state")
>> +
>> +/* Indexes for cache prefetch types. */
>> +enum {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +/* Cache prefetch types. */
>> +enum ovs_prefetch_type {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                         \
>
>Checkpatch caught the following:
>
>ERROR: Improper whitespace around control block
>#164 FILE: include/openvswitch/compiler.h:331:
>#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
>
>Lines checked: 204, Warnings: 0, Errors: 1

[BHANU] I will fix this.

>> +{                                                                           \
>> +    case OPCH_NTR:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);   \
>> +        break;                                                              \
>> +    case OPCH_LTR:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);   \
>> +        break;                                                              \
>> +    case OPCH_MTR:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_READ,                           \
>> +                           MODERATE_TEMPORAL_LOCALITY);                     \
>> +        break;                                                              \
>> +    case OPCH_HTR:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY);  \
>> +        break;                                                              \
>> +    case OPCH_NTW:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY);  \
>> +        break;                                                              \
>> +    case OPCH_LTW:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY);  \
>> +        break;                                                              \
>> +    case OPCH_MTW:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE,                          \
>> +                           MODERATE_TEMPORAL_LOCALITY);                     \
>> +        break;                                                              \
>> +    case OPCH_HTW:                                                          \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY); \
>> +        break;                                                              \
>> +}
>> +
>> +/* Retain this for backward compatibility. */
>> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
>> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>  #else
>>  #define OVS_PREFETCH(addr)
>>  #define OVS_PREFETCH_WRITE(addr)
>> +#define OVS_PREFETCH_CACHE(addr, OP)
>>  #endif
>>
>>  /* Build assertions.
>> --
>> 2.4.11
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
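
For readers unfamiliar with the X-macro pattern that the quoted patch uses for OVS_PREFETCH_CACHE_HINT, the following is a minimal standalone sketch of the same idea. All SK_* names and the reduced four-entry table are hypothetical and chosen only for illustration; this is not the code from the patch. One list is expanded three times: into indexes, into one-bit-per-hint values, and into a switch that passes constant arguments to __builtin_prefetch().

```c
#include <stddef.h>

/* Hypothetical hint list in the X-macro style; each entry carries the
 * enum name, the rw argument and the locality argument. */
#define SK_HINTS                        \
    SK_HINT(SK_NTR, 0, 0)               \
    SK_HINT(SK_HTR, 0, 3)               \
    SK_HINT(SK_NTW, 1, 0)               \
    SK_HINT(SK_HTW, 1, 3)

/* First expansion: consecutive indexes 0, 1, 2, ... */
enum {
#define SK_HINT(ENUM, RW, LOCALITY) ENUM##_INDEX,
    SK_HINTS
#undef SK_HINT
    SK_N_HINTS
};

/* Second expansion: one bit per hint. */
enum sk_hint {
#define SK_HINT(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    SK_HINTS
#undef SK_HINT
};

/* Third expansion: dispatch to the builtin with compile-time constants.
 * Compiles to nothing useful on compilers without __builtin_prefetch. */
static inline void
sk_prefetch(const void *addr, enum sk_hint hint)
{
#if defined(__GNUC__)
    switch (hint) {
#define SK_HINT(ENUM, RW, LOCALITY)                 \
    case ENUM:                                      \
        __builtin_prefetch(addr, RW, LOCALITY);     \
        break;
    SK_HINTS
#undef SK_HINT
    }
#else
    (void) addr;
    (void) hint;
#endif
}
```

Wrapping the switch in an inline function, as this sketch does, is one possible design choice; the patch itself expands the switch directly from a macro.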