[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Hanoch Haim changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #25 from Hanoch Haim ---
Hi Richard,

You were right all along. I was looking in the wrong place! I understand it
now, and it is not a gcc issue; gcc 7/8 simply generate better code than gcc 6.

1. Alignment is contagious: gcc marks every parent object of an aligned object
   as aligned too.
2. With a statically allocated object there is no issue.
3. The issue in my case was a dynamic allocation of a different object that
   includes the aligned object. That parent object is assumed to be aligned,
   but it was allocated dynamically (not aligned).

Thank you for the explanation.
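For readers who hit the same problem, one possible way to make point 3 safe is to give the dynamically allocated parent a class-scope operator new that honors its alignment. The following is only a sketch under assumptions: the `LatencyOwner` class and the member layouts are hypothetical stand-ins (the thread never shows the real parent object), and `posix_memalign` is used because Richard names it in comment #10.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // posix_memalign, free (POSIX)
#include <new>      // std::bad_alloc

// Hypothetical stand-ins for the classes named in the thread.
struct CTimeHistogram {
    std::uint64_t m_hcnt[8];
} __attribute__((aligned(64)));

struct CCPortLatency {
    int m_id = 0;
    CTimeHistogram m_hist{};  // makes CCPortLatency 64-byte aligned too
};

// Hypothetical parent that is created with `new`. Its alignment is inherited
// from m_port, so the allocation function has to honor it.
struct LatencyOwner {
    int m_flags = 0;
    CCPortLatency m_port;

    // Class-scope allocation function: every `new LatencyOwner` goes
    // through here and gets storage aligned to alignof(LatencyOwner).
    static void* operator new(std::size_t size) {
        void* raw = nullptr;
        if (posix_memalign(&raw, alignof(LatencyOwner), size) != 0) {
            throw std::bad_alloc();
        }
        return raw;
    }

    // Memory from posix_memalign is released with free().
    static void operator delete(void* p) noexcept { free(p); }
};
```

With this in place, `new LatencyOwner()` returns a pointer whose address is a multiple of 64, so the aligned stores the compiler emits for `m_port.m_hist` are valid. Array forms (`new LatencyOwner[n]`) would need matching `operator new[]` and `operator delete[]`, which this sketch leaves out.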
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #24 from Richard Biener ---
(In reply to Richard Biener from comment #23)
> (In reply to Hanoch Haim from comment #22)
> >
> > "Of course it does, because without aligning the container you cannot have
> > aligned members. Maximum alignment always propagates outwards."
> >
> > Sorry, your answer is still not clear, so let me give a short example.
> > In this case there is a discrepancy between two gcc modules:
> >
> > 1. The module that generates the code thinks that it is aligned
> >    (CCPortLatency)
> > 2. However, the linker puts it in a non-aligned location
> >
> > "
> > class CTimeHistogram {
> >
> > } __rte_cache_aligned;
> >
> > class CCPortLatency {
> > public:
> >     CTimeHistogram m_hist;
> > };
> >
> > class Root {
> >
> >     CCPortLatency port;
> >
> > } __rte_cache_aligned;
> >
> > static Root root;
> > "
> >
> > In this case can I expect root.port to be aligned because its child (m_hist)
> > was defined as aligned and it propagates? Or should I explicitly ask both to
> > be aligned?
>
> Yes, for
>
> class CTimeHistogram {
> } __attribute__((aligned(64)));
> class CCPortLatency {
> public:
>     CTimeHistogram m_hist;
> };
> class Root {
>     CCPortLatency port;
> };
> static Root root;
>
> 'root' will be aligned to 64 bytes. This is also what you can easily
> observe when inspecting the ELF object:
>
> Section Headers:
>   [Nr] Name Type Address Offset
>        Size EntSize Flags Link Info Align
> ...
>   [ 3] .bss NOBITS 0040
>        0040 WA 0 0 64
>
> Symbol table '.symtab' contains 8 entries:
>    Num: Value Size Type Bind Vis Ndx Name
> ...
>      5: 64 OBJECT LOCAL DEFAULT 3 _ZL4root
>
> compiled without optimization since root is unused and will otherwise
> be eliminated.

You can also check with

  static_assert (__alignof__(root.port.m_hist) == 64, "oops");

(need to make port public for this, eh)
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #23 from Richard Biener ---
(In reply to Hanoch Haim from comment #22)
>
> "Of course it does, because without aligning the container you cannot have
> aligned members. Maximum alignment always propagates outwards."
>
> Sorry, your answer is still not clear, so let me give a short example.
> In this case there is a discrepancy between two gcc modules:
>
> 1. The module that generates the code thinks that it is aligned
>    (CCPortLatency)
> 2. However, the linker puts it in a non-aligned location
>
> "
> class CTimeHistogram {
>
> } __rte_cache_aligned;
>
> class CCPortLatency {
> public:
>     CTimeHistogram m_hist;
> };
>
> class Root {
>
>     CCPortLatency port;
>
> } __rte_cache_aligned;
>
> static Root root;
> "
>
> In this case can I expect root.port to be aligned because its child (m_hist)
> was defined as aligned and it propagates? Or should I explicitly ask both to
> be aligned?

Yes, for

class CTimeHistogram {
} __attribute__((aligned(64)));
class CCPortLatency {
public:
    CTimeHistogram m_hist;
};
class Root {
    CCPortLatency port;
};
static Root root;

'root' will be aligned to 64 bytes. This is also what you can easily
observe when inspecting the ELF object:

Section Headers:
  [Nr] Name Type Address Offset
       Size EntSize Flags Link Info Align
...
  [ 3] .bss NOBITS 0040
       0040 WA 0 0 64

Symbol table '.symtab' contains 8 entries:
   Num: Value Size Type Bind Vis Ndx Name
...
     5: 64 OBJECT LOCAL DEFAULT 3 _ZL4root

compiled without optimization since root is unused and will otherwise
be eliminated.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #22 from Hanoch Haim ---
"Of course it does, because without aligning the container you cannot have
aligned members. Maximum alignment always propagates outwards."

Sorry, your answer is still not clear, so let me give a short example.
In this case there is a discrepancy between two gcc modules:

1. The module that generates the code thinks that it is aligned
   (CCPortLatency)
2. However, the linker puts it in a non-aligned location

"
class CTimeHistogram {

} __rte_cache_aligned;

class CCPortLatency {
public:
    CTimeHistogram m_hist;
};

class Root {

    CCPortLatency port;

} __rte_cache_aligned;

static Root root;
"

In this case can I expect root.port to be aligned because its child (m_hist)
was defined as aligned and it propagates? Or should I explicitly ask both to
be aligned?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #21 from Richard Biener ---
(In reply to Hanoch Haim from comment #19)
> After some investigation, I think it is not a gcc issue; please verify.
> One of the internal objects does not include a 64B alignment.
>
> #define __rte_cache_aligned __attribute__((__aligned__(64)));
>
> class CTimeHistogram {
>
> } __rte_cache_aligned;
>
>
> class CCPortLatency {
> public:
>     CTimeHistogram m_hist;
> } __rte_cache_aligned; <<= without this, it is not aligned while the code
> generation assumed it is aligned!
>
> class Root {
>
>     CCPortLatency port;
>
> } __rte_cache_aligned;
>
>
> Is it valid? Why did the code generation assume that CCPortLatency is aligned
> because one of its internal objects is aligned?

Of course it does, because without aligning the container you cannot have
aligned members. Maximum alignment always propagates outwards.
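The outward propagation described in this comment can be checked at compile time. The sketch below assumes made-up member layouts (the real classes have many more members) and an LP64 target for the size check; only the class names come from the thread.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>

// Only the innermost type carries an explicit alignment attribute.
struct CTimeHistogram {
    std::uint64_t m_hcnt[8];  // hypothetical payload, 64 bytes
} __attribute__((aligned(64)));

struct CCPortLatency {        // no attribute of its own
    int m_id;                 // hypothetical member placed before the histogram
    CTimeHistogram m_hist;
};

struct Root {                 // no attribute either
    char m_tag;               // hypothetical member
    CCPortLatency port;
};

// The strictest member alignment becomes the alignment of each container.
static_assert(alignof(CTimeHistogram) == 64, "explicitly requested");
static_assert(alignof(CCPortLatency) == 64, "inherited from m_hist");
static_assert(alignof(Root) == 64, "inherited from port");

// Padding is inserted so the aligned member starts on a 64-byte boundary
// relative to the (assumed aligned) start of the container.
static_assert(offsetof(CCPortLatency, m_hist) == 64, "padded after m_id");
static_assert(sizeof(CCPortLatency) == 128, "size is a multiple of 64");

// A static object gets the alignment from the compiler and linker.
static Root root;
```

Because member offsets are fixed relative to the start of the object, the compiler can only guarantee that `m_hist` is aligned if it may assume the enclosing object is aligned as well. That assumption is what the generated `vmovdqa` relies on.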
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #20 from Hanoch Haim ---
One more thing.
I would expect the issue to be in the CTimeHistogram functions (the class
defined as aligned), but the code generation issue was in the parent object
(CCPortLatency).
Why does the compiler assume that if one of the internal objects is defined as
aligned, the parent is aligned too?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #19 from Hanoch Haim ---
After some investigation, I think it is not a gcc issue; please verify.
One of the internal objects does not include a 64B alignment.

#define __rte_cache_aligned __attribute__((__aligned__(64)));

class CTimeHistogram {

} __rte_cache_aligned;


class CCPortLatency {
public:
    CTimeHistogram m_hist;
} __rte_cache_aligned; <<= without this, it is not aligned while the code
generation assumed it is aligned!

class Root {

    CCPortLatency port;

} __rte_cache_aligned;


Is it valid? Why did the code generation assume that CCPortLatency is aligned
because one of its internal objects is aligned?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #18 from Alexander Monakov ---
It seems the problem is not in CCPortLatency::Create, but rather one of its
callers. Try to investigate in gdb where a misaligned pointer is derived from a
64-byte aligned pointer to the toplevel g_trex object (i.e. work up the stack
from the point of the crash).
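Alongside the gdb session suggested here, a small runtime check placed at the points where pointers are handed out can show where alignment is lost. This is a sketch only: `is_aligned_for` and `g_storage` are hypothetical names introduced for illustration, and the member layouts are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the classes named in the thread.
struct CTimeHistogram {
    std::uint64_t m_hcnt[8];
} __attribute__((aligned(64)));

struct CCPortLatency {
    int m_id;
    CTimeHistogram m_hist;
};

// True when address p satisfies the alignment the compiler assumes for T.
// Only the numeric address is inspected; nothing is dereferenced.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// 64-byte aligned raw storage, standing in for a correctly placed object.
alignas(64) unsigned char g_storage[256];
```

Here `is_aligned_for<CCPortLatency>(g_storage)` holds, while the same check on `g_storage + 8` fails. Adding `assert(is_aligned_for<CCPortLatency>(this))` at the top of a suspect member function, or right after each allocation, turns the eventual `vmovdqa` fault into an assertion closer to the place where the misaligned pointer was created.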
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #17 from Uroš Bizjak ---
The asm dump claims that the access is aligned to 32 bytes:

#(insn 14 31 9 2 (set (mem:V4DI (plus:DI (reg/f:DI 3 bx [orig:90 this ] [90])
#                (const_int 64 [0x40])) [6 MEM[(long unsigned int *)this_6(D) + 64B]+0 S32 A256])
#        (reg:V4DI 21 xmm0 [92])) "../../src/stateful_rx_core.cpp":254 1228 {movv4di_internal}
#     (nil))
        vmovdqa %ymm0, 64(%rbx) # 14    movv4di_internal/4      [length = 5]

which gets expanded from:

;; MEM[(long unsigned int *)this_6(D) + 64B] = { 0, 0, 0, 0 };

(insn 13 12 14 (set (reg:V4DI 92)
        (const_vector:V4DI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) "../../src/stateful_rx_core.cpp":254 -1
     (nil))

(insn 14 13 0 (set (mem:V4DI (plus:DI (reg/f:DI 90 [ this ])
                (const_int 64 [0x40])) [6 MEM[(long unsigned int *)this_6(D) + 64B]+0 S32 A256])
        (reg:V4DI 92)) "../../src/stateful_rx_core.cpp":254 -1
     (nil))

So, not a target issue.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #16 from Hanoch Haim ---
The global/parent object CGlobalTRex is aligned (64B) as expected:

(gdb) p _trex
$1 = (CGlobalTRex *) 0xc365c0

Could you explain why it is a problem to define the internal objects with the
same alignment as the parent (64B)?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #15 from Richard Biener ---
(In reply to Richard Biener from comment #14)
> (In reply to Hanoch Haim from comment #12)
> > Removing __rte_cache_aligned does not solve the issue
> >
> >
> > diff --git a/src/time_histogram.h b/src/time_histogram.h
> > index 07e66b49..26a37248 100755
> > --- a/src/time_histogram.h
> > +++ b/src/time_histogram.h
> > @@ -133,10 +133,10 @@ private:
> >      uint32_t m_win_cnt;
> >      uint32_t m_hot_max;
> >      dsec_t m_max_ar[HISTOGRAM_QUEUE_SIZE]; // Array of maximum latencies for previous periods
> > -    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] __rte_cache_aligned ;
> > +    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] ;
> >      // Hdr histogram instance
> >      hdr_histogram *m_hdrh;
> > -};
> > +} __rte_cache_aligned;
>
> There are more aligned attributes. I see
>
> class CLatencyManager : public TrexRxCore {
> ...
>     volatile bool m_do_stop __attribute__((__aligned__(64))) ;
>
> struct rte_ring {
>     char name[32] __attribute__((__aligned__(64)));
>
> class CFlowGenListPerThread {
> ...
> } __attribute__((__aligned__(64)));
>
> etc.
>
> Can you check the .bss section Alignment in the final executable/shared
> object?
> Do you by chance substitute the program loader for something not honoring
> large alignment of .bss sections?

You can also check with a debugger whether your global static object is
properly aligned.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #14 from Richard Biener ---
(In reply to Hanoch Haim from comment #12)
> Removing __rte_cache_aligned does not solve the issue
>
>
> diff --git a/src/time_histogram.h b/src/time_histogram.h
> index 07e66b49..26a37248 100755
> --- a/src/time_histogram.h
> +++ b/src/time_histogram.h
> @@ -133,10 +133,10 @@ private:
>      uint32_t m_win_cnt;
>      uint32_t m_hot_max;
>      dsec_t m_max_ar[HISTOGRAM_QUEUE_SIZE]; // Array of maximum latencies for previous periods
> -    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] __rte_cache_aligned ;
> +    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] ;
>      // Hdr histogram instance
>      hdr_histogram *m_hdrh;
> -};
> +} __rte_cache_aligned;

There are more aligned attributes. I see

class CLatencyManager : public TrexRxCore {
...
    volatile bool m_do_stop __attribute__((__aligned__(64))) ;

struct rte_ring {
    char name[32] __attribute__((__aligned__(64)));

class CFlowGenListPerThread {
...
} __attribute__((__aligned__(64)));

etc.

Can you check the .bss section Alignment in the final executable/shared
object?
Do you by chance substitute the program loader for something not honoring
large alignment of .bss sections?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #13 from Hanoch Haim ---
One more thing: the parent object is defined with 64-byte alignment.

class CGlobalTRex {
..
} __rte_cache_aligned;

static CGlobalTRex trex;
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Hanoch Haim changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---

--- Comment #12 from Hanoch Haim ---
Removing __rte_cache_aligned does not solve the issue


diff --git a/src/time_histogram.h b/src/time_histogram.h
index 07e66b49..26a37248 100755
--- a/src/time_histogram.h
+++ b/src/time_histogram.h
@@ -133,10 +133,10 @@ private:
     uint32_t m_win_cnt;
     uint32_t m_hot_max;
     dsec_t m_max_ar[HISTOGRAM_QUEUE_SIZE]; // Array of maximum latencies for previous periods
-    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] __rte_cache_aligned ;
+    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] ;
     // Hdr histogram instance
     hdr_histogram *m_hdrh;
-};
+} __rte_cache_aligned;
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #11 from Hanoch Haim ---
Thanks for the quick answer.
The parent object is static (bss) and wasn't dynamically allocated using
new/malloc. gcc sets the addresses of the parent object and its children.
Is there a way to solve it without removing the alignment?
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #10 from Richard Biener ---
I can reproduce the aligned stores with -O3 -march=core-avx2 with GCC 7/8/9.

Note that when I check alignof(CCPortLatency) I do get 64 byte alignment
because it has a private member of type CTimeHistogram, which has a member

    uint64_t m_hcnt[HISTOGRAM_SIZE_LOG][HISTOGRAM_SIZE] __attribute__((__aligned__(64))) ;

This kind of overaligned type doesn't play well with "old" C++ new: you need
support for overaligned types, which is only in newer C++ standards, or you
have to resort to posix_memalign or friends to allocate memory.

Or simply drop the aligned attribute from above.
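The posix_memalign route mentioned in this comment could look roughly like the sketch below, which pairs the aligned allocation with placement new. The helpers `create_port` and `destroy_port` and the member layouts are hypothetical and are not taken from the TRex sources. (The "newer C++ standards" alternative is the aligned new added in C++17.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>  // posix_memalign, free (POSIX)
#include <new>      // placement new

// Hypothetical stand-ins for the classes named in the thread.
struct CTimeHistogram {
    std::uint64_t m_hcnt[8];
} __attribute__((aligned(64)));

struct CCPortLatency {
    int m_id = 0;
    CTimeHistogram m_hist{};
};

// Allocate storage with the alignment the type requires, then construct the
// object in it. Returns nullptr if the allocation fails.
CCPortLatency* create_port() {
    void* raw = nullptr;
    if (posix_memalign(&raw, alignof(CCPortLatency), sizeof(CCPortLatency)) != 0) {
        return nullptr;
    }
    return new (raw) CCPortLatency();
}

// Run the destructor explicitly, then release the storage with free(),
// which is the matching deallocator for posix_memalign.
void destroy_port(CCPortLatency* port) {
    if (port == nullptr) {
        return;
    }
    port->~CCPortLatency();
    free(port);
}
```

A caller would pair the two helpers, for example `CCPortLatency* p = create_port(); /* use p */ destroy_port(p);`. Calling `delete` on such a pointer would be wrong, because the memory did not come from operator new.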
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #9 from Hanoch Haim ---
Attached. I hope this is what you are looking for.
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #8 from Hanoch Haim ---
Created attachment 46542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46542&action=edit
stateful_rx_core.ss
[Bug target/91043] GCC produces unaligned vmovdqa vector data access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043

--- Comment #7 from Hanoch Haim ---
Created attachment 46541
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46541&action=edit
stateful_rx_core.ii

compress ii