[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|8.0                         |9.0

--- Comment #6 from Richard Biener ---
Re-targeting to GCC 9.
[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #5 from rguenther at suse dot de ---
On Mon, 8 Jan 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518
>
> --- Comment #4 from Jakub Jelinek ---
> Before store-merging we have:
>   MEM[(int *)] = { 5, 4, 3, 2 };
>   t_2 = arr[0];
>   _65 = arr[1];
>   _69 = arr[2];
>   _73 = arr[3];
>   arr[0] = _69;
>   arr[1] = _73;
>   arr[2] = 1;
>   arr[3] = t_2;
>   vect__2.5_38 = MEM[(int *)];
> and all store-merging can do with this is what it does:
>   MEM[(int *)] = { 5, 4, 3, 2 };
>   t_2 = arr[0];
>   _65 = arr[1];
>   _69 = arr[2];
>   _73 = arr[3];
>   _46 = MEM[(int *) + 8B];
>   MEM[(int *)] = _46;
>   arr[2] = 1;
>   arr[3] = t_2;
>   vect__2.5_38 = MEM[(int *)];
> where the _69 and _73 sets can be DCEd later. store-merging has no framework
> like FRE to do analysis what memory location contains at which point.
> So we'd need another late FRE pass to handle this?

Without enhancing FRE that wouldn't help. I think the best thing is to try
sanitizing the IL produced by unrolling so SLP vectorization can do its job
here.

Nothing for GCC 8 though.
[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #4 from Jakub Jelinek ---
Before store-merging we have:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _69 = arr[2];
  _73 = arr[3];
  arr[0] = _69;
  arr[1] = _73;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
and all store-merging can do with this is what it does:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _69 = arr[2];
  _73 = arr[3];
  _46 = MEM[(int *) + 8B];
  MEM[(int *)] = _46;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
where the _69 and _73 sets can be DCEd later. store-merging has no framework
like FRE to do analysis what memory location contains at which point.
So we'd need another late FRE pass to handle this?
[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #3 from Richard Biener ---
Yes, it's not really something new but a known issue with late
value-numbering. Note that FRE wouldn't know how to simplify this either;
we'd need store-merging to effectively vectorize the earlier sets.

BB vectorization doesn't do this because after unrolling we see

  vect_cst__46 = { 5, 4, 3, 2 };
  MEM[(int *)] = vect_cst__46;
  arr[4] = 1;
  t_2 = arr[0];
  arr[0] = 5;
  arr[0] = t_2;
  t_32 = arr[0];
  _65 = arr[1];
  arr[0] = _65;
  arr[1] = t_32;
  t_68 = arr[0];
  _69 = arr[2];
  arr[0] = _69;
  arr[2] = t_68;
  t_72 = arr[0];
  _73 = arr[3];
  arr[0] = _73;
  arr[3] = t_72;
  t_76 = arr[0];
  _77 = arr[4];
  arr[0] = _77;
  arr[4] = t_76;
  i_80 = 1;
  ivtmp_81 = 4;
  pretmp_82 = arr[0];
  t_87 = arr[i_80];
  arr[i_80] = pretmp_82;
  ...

and BB vectorization is confused by the dead stores (and DSE would be by the
missed constant propagations).
[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
Since r253975 we vectorize this and thus no longer unroll it. With
-O3 -fno-vect-cost-model we've generated what GCC 8 emits, or something
similar, for many years, including 4.4+ (r140264 already shows this behavior).
[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |7.2.1
            Version|tree-ssa                    |8.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2018-01-05
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|Missing optimization:       |[8 Regression] Missing
                   |useless instructions should |optimization: useless
                   |be dropped                  |instructions should be
                   |                            |dropped
   Target Milestone|---                         |8.0

--- Comment #1 from Richard Biener ---
Works fine with GCC 7. I suppose unroller limits hit and/or we're just lucky
that GCC 7 doesn't vectorize the reduction loop ...

Trunk has

t.C:16:20: note: loop vectorized
t.C:21:10: note: basic block vectorized

resulting in

  [local count: 178992762]:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _46 = MEM[(int *) + 8B];
  MEM[(int *)] = _46;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
  vect_sum_21.8_30 = VEC_PERM_EXPR;
  vect_sum_21.8_15 = vect_sum_21.8_30 + vect__2.5_38;
  vect_sum_21.8_59 = VEC_PERM_EXPR ;
  vect_sum_21.8_60 = vect_sum_21.8_15 + vect_sum_21.8_59;
  stmp_sum_21.7_61 = BIT_FIELD_REF ;
  sum_27 = stmp_sum_21.7_61 + _65;
  _23 = (unsigned int) sum_27;
  arr ={v} {CLOBBER};
  return _23;

while GCC 7 simply unrolls the loop. DOM is not able to simplify the vector
load from MEM[(int *)], but it can simplify the scalar loads from the
unrolled variant.
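
To make the "useless instructions" concrete, here is a minimal C++ sketch that replays the trunk statement sequence from the dump by hand. It rests on several assumptions that the dump as archived does not confirm: that every MEM[(int *)] reference addresses the start of arr, that the stored vector has four int lanes, that _46 is a two-int value loaded at byte offset 8, and that the VEC_PERM_EXPR / BIT_FIELD_REF sequence amounts to a plain horizontal add. The function and variable names are hypothetical and are not taken from the testcase.

```cpp
#include <cstring>

// Hand replay of the trunk dump; each comment names the statement it
// is assumed to correspond to.
int emulate_trunk_dump() {
    int arr[4];
    const int init[4] = {5, 4, 3, 2};
    std::memcpy(arr, init, sizeof init);    // MEM[(int *)] = { 5, 4, 3, 2 };

    int t2 = arr[0];                        // t_2 = arr[0];
    int v65 = arr[1];                       // _65 = arr[1];

    int v46[2];
    std::memcpy(v46, arr + 2, sizeof v46);  // _46 = MEM[(int *) + 8B];
    std::memcpy(arr, v46, sizeof v46);      // MEM[(int *)] = _46;

    arr[2] = 1;                             // arr[2] = 1;
    arr[3] = t2;                            // arr[3] = t_2;

    // Assumed stand-in for the vector reduction (VEC_PERM_EXPR + adds
    // + BIT_FIELD_REF): sum all four lanes of the reloaded vector.
    int lane_sum = 0;
    for (int lane : arr)
        lane_sum += lane;

    return lane_sum + v65;                  // sum_27 = stmp_sum_21.7_61 + _65;
}
```

Under those assumptions emulate_trunk_dump() returns 15: the array ends up as { 3, 2, 1, 5 }, which sums to 11, plus the 4 saved in _65. Every value is known at compile time, so in principle the whole block could fold to a constant return.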