[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 Martin Liška changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #12 from Martin Liška --- I fixed the basic problematic, please open a separate for the tricky C++ issues..
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #11 from Martin Liška --- Author: marxin Date: Wed Sep 5 11:28:49 2018 New Revision: 264124 URL: https://gcc.gnu.org/viewcvs?rev=264124=gcc=rev Log: Group switch cases in switch lowering (PR tree-optimization/87205). 2018-09-05 Martin Liska PR tree-optimization/87205 * tree-switch-conversion.c (pass_lower_switch::execute): Group cases for switch statements. 2018-09-05 Martin Liska PR tree-optimization/87205 * gcc.dg/tree-ssa/pr87205-2.c: New test. * gcc.dg/tree-ssa/pr87205.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr87205-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr87205.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-switch-conversion.c
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #10 from Martin Liška --- (In reply to Peter Dimov from comment #8) > (In reply to Martin Liška from comment #7) > > I'm not sure here Y are different types here and member access based on > > the type is distinct. > > Yes, one could argue that, I suppose. But in the `return ((Y<0>*)p)->v;` > case the member access _is_ lifted outside the jump table. If that's correct > there, it should also be correct here. :-) Yes, it's hoisted out (lifted). But it happens very late in optimization pipeline, that's why we create first jump table and later we isolate the expression.
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #9 from Peter Dimov --- For more context, see https://godbolt.org/z/SzfpKr ``` #include template struct variant { std::aligned_union_t<0, T...> storage_; unsigned index_; }; template auto visit( variant& v, F f ) { switch( v.index_ ) { case 0: return f( (T0&)v.storage_ ); case 1: return f( (T1&)v.storage_ ); case 2: return f( (T2&)v.storage_ ); case 3: return f( (T3&)v.storage_ ); case 4: return f( (T4&)v.storage_ ); case 5: return f( (T5&)v.storage_ ); default: __builtin_unreachable(); } } struct X { int v; }; template struct Y: X { }; using V = variant, Y<1>, Y<2>, Y<3>, Y<4>, Y<5>>; void f( X& ); int g( int ); int h1( V& v ) { return visit( v, [](X const& x){ return x.v; } ); } int h2( V& v ) { return visit( v, [](auto&& x){ return x.v; } ); } void h3( V& v ) { return visit( v, [](auto&& x){ f(x); } ); } int h4( V& v ) { return visit( v, [](auto&& x){ return g(x.v); } ); } ``` This generates ``` h1(variant, Y<1>, Y<2>, Y<3>, Y<4>, Y<5> >&): mov eax, DWORD PTR [rdi] ret h2(variant, Y<1>, Y<2>, Y<3>, Y<4>, Y<5> >&): mov eax, DWORD PTR [rdi] ret h3(variant, Y<1>, Y<2>, Y<3>, Y<4>, Y<5> >&): cmp DWORD PTR [rdi+4], 5 jbe .L15 .L15: jmp f(X&) h4(variant, Y<1>, Y<2>, Y<3>, Y<4>, Y<5> >&): cmp DWORD PTR [rdi+4], 5 jbe .L19 .L19: mov edi, DWORD PTR [rdi] jmp g(int) ``` so the member access is folded in both cases (which is good!), even though the first occurs through X& and the second through Y&. I've been unable to determine what makes the optimizations misfire. This code should in principle be the same as the simplified one, but it isn't.
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #8 from Peter Dimov --- (In reply to Martin Liška from comment #7) > I'm not sure here Y are different types here and member access based on > the type is distinct. Yes, one could argue that, I suppose. But in the `return ((Y<0>*)p)->v;` case the member access _is_ lifted outside the jump table. If that's correct there, it should also be correct here. :-)
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #7 from Martin Liška --- (In reply to Peter Dimov from comment #5) > Another: > > ``` > struct X > { > int v; > }; > > template struct Y: X > { > }; > > void f( int v ); > > void h( unsigned ix, void* p ) > { > switch( ix ) > { > case 0: f( ((Y<0>*)p)->v ); break; > case 1: f( ((Y<1>*)p)->v ); break; > case 2: f( ((Y<2>*)p)->v ); break; > case 3: f( ((Y<3>*)p)->v ); break; > case 4: f( ((Y<4>*)p)->v ); break; > case 5: f( ((Y<5>*)p)->v ); break; > default: __builtin_unreachable(); > } > } > ``` > > ``` > h(unsigned int, void*): > mov edi, edi > jmp [QWORD PTR .L4[0+rdi*8]] > .L4: > .quad .L3 > .quad .L3 > .quad .L3 > .quad .L3 > .quad .L3 > .quad .L3 > .L3: > mov edi, DWORD PTR [rsi] > jmp f(int) > ``` > > https://godbolt.org/z/pGVx6W > > This however demonstrates a different problem, so it may need to go into a > separate bug. I'm not sure here Y are different types here and member access based on the type is distinct.
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #6 from Martin Liška --- > ``` > h(unsigned int, void*): > cmp edi, 5 > jbe .L5 > .L5: > mov rdi, rsi > jmp f(X*) > ``` > > https://godbolt.org/z/2Lh_GZ Good, my patch can handle that and can generate direct call: _Z1hjPv: .LFB0: .cfi_startproc movq%rsi, %rdi jmp _Z1fP1X .cfi_endproc
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #5 from Peter Dimov --- Another: ``` struct X { int v; }; template struct Y: X { }; void f( int v ); void h( unsigned ix, void* p ) { switch( ix ) { case 0: f( ((Y<0>*)p)->v ); break; case 1: f( ((Y<1>*)p)->v ); break; case 2: f( ((Y<2>*)p)->v ); break; case 3: f( ((Y<3>*)p)->v ); break; case 4: f( ((Y<4>*)p)->v ); break; case 5: f( ((Y<5>*)p)->v ); break; default: __builtin_unreachable(); } } ``` ``` h(unsigned int, void*): mov edi, edi jmp [QWORD PTR .L4[0+rdi*8]] .L4: .quad .L3 .quad .L3 .quad .L3 .quad .L3 .quad .L3 .quad .L3 .L3: mov edi, DWORD PTR [rsi] jmp f(int) ``` https://godbolt.org/z/pGVx6W This however demonstrates a different problem, so it may need to go into a separate bug.
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 --- Comment #4 from Peter Dimov --- If the code is not the same the jump table is not optimized out and there's no extra check. But it also happens with code that is not the same on the C++ side, for example: ``` struct X { int v; }; template struct Y: X { }; void f( X* x ); void h( unsigned ix, void* p ) { switch( ix ) { case 0: f( (Y<0>*)p ); break; case 1: f( (Y<1>*)p ); break; case 2: f( (Y<2>*)p ); break; case 3: f( (Y<3>*)p ); break; case 4: f( (Y<4>*)p ); break; case 5: f( (Y<5>*)p ); break; default: __builtin_unreachable(); } } ``` ``` h(unsigned int, void*): cmp edi, 5 jbe .L5 .L5: mov rdi, rsi jmp f(X*) ``` https://godbolt.org/z/2Lh_GZ A variation on the same theme, which demonstrates another kind of missed optimization: ``` struct X { int v; }; template struct Y: X { }; int h( unsigned ix, void* p ) { switch( ix ) { case 0: return ((Y<0>*)p)->v; case 1: return ((Y<1>*)p)->v; case 2: return ((Y<2>*)p)->v; case 3: return ((Y<3>*)p)->v; case 4: return ((Y<4>*)p)->v; case 5: return ((Y<5>*)p)->v; default: __builtin_unreachable(); } } ``` ``` h(unsigned int, void*): mov edi, edi mov eax, DWORD PTR [rsi] jmp [QWORD PTR .L4[0+rdi*8]] .L4: .quad .L9 .quad .L8 .quad .L7 .quad .L6 .quad .L5 .quad .L3 .L5: ret .L3: ret .L9: ret .L8: ret .L7: ret .L6: ret ``` https://godbolt.org/z/lCzlR2 There's a table, so there's no redundant check.
[Bug tree-optimization/87205] Inefficient code generation for switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87205 Martin Liška changed: What|Removed |Added Target Milestone|--- |9.0