https://llvm.org/bugs/show_bug.cgi?id=24449
Bug ID: 24449
Summary: [x86] avoid big bad immediates in the instruction stream, part 3: merge stores
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedb...@nondot.org
Reporter: spatel+l...@rotateright.com
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

This report is based on the post-commit thread for r244601:
http://reviews.llvm.org/rL244601
http://reviews.llvm.org/D11363

Sean Silva observed sequences of 11-byte (!) instructions:

   20b7f: 48 c7 80 78 01 00 00 00 00 00 00   movq $0, 376(%rax)
   20b8a: 48 c7 80 80 01 00 00 00 00 00 00   movq $0, 384(%rax)
   20b95: 48 c7 80 88 01 00 00 00 00 00 00   movq $0, 392(%rax)
   20ba0: 48 c7 80 90 01 00 00 00 00 00 00   movq $0, 400(%rax)
   ...

One way to reduce this bloat is to merge adjacent stores into a larger store:

   vxorps  %ymm0, %ymm0, %ymm0     [c5 fc 57 c0]
   vmovups %ymm0, 376(%rax)        [c5 fc 11 80 78 01 00 00]

That's 12 bytes instead of 44, assuming we have AVX for a 32-byte store; if
not, use SSE for 16-byte stores.

It looks like this is a problem specifically in lowering a memset of zeros. If
I change the constant value in the example below, I get the expected store
merging that is implemented in the DAGCombiner:

$ cat memset.c
void big_zero(int *a) {
  a[0] = 0;
  a[1] = 0;
  a[2] = 0;
  a[3] = 0;
  a[4] = 0;
  a[5] = 0;
  a[6] = 0;
  a[7] = 0;
}

void big_nonzero(int *a) {
  a[0] = 1;
  a[1] = 1;
  a[2] = 1;
  a[3] = 1;
  a[4] = 1;
  a[5] = 1;
  a[6] = 1;
  a[7] = 1;
}

$ ./clang -O2 -fomit-frame-pointer -mavx memset.c -S -o -
_big_zero
...
    movq    $0, 24(%rdi)
    movq    $0, 16(%rdi)
    movq    $0, 8(%rdi)
    movq    $0, (%rdi)
    retq

_big_nonzero:                           ## @big_nonzero
...
    vmovaps LCPI1_0(%rip), %ymm0    ## ymm0 = [1,1,1,1,1,1,1,1]
    vmovups %ymm0, (%rdi)
    vzeroupper
    retq
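The 12-versus-44 byte comparison above can be checked with a small piece of C
arithmetic. This is only a sketch and not part of the original report: the
function and constant names are made up for illustration, and it assumes every
store uses a 32-bit displacement (as in the 376(%rax) example) and that the
region size is a multiple of 32 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts for the pieces of an x86-64 instruction encoding.
   All names here are hypothetical, chosen for this sketch only. */
enum {
    REX_PREFIX  = 1, /* 0x48: REX.W for a 64-bit operand */
    VEX2_PREFIX = 2, /* 0xc5 plus one payload byte */
    OPCODE      = 1,
    MODRM       = 1,
    DISP32      = 4, /* 32-bit displacement, e.g. 376 = 0x178 */
    IMM32       = 4  /* 32-bit immediate, sign-extended to 64 bits */
};

/* movq $imm32, disp32(%reg): 48 c7 /0 <disp32> <imm32> */
size_t movq_imm_store_size(void) {
    return REX_PREFIX + OPCODE + MODRM + DISP32 + IMM32;
}

/* vxorps %ymm0, %ymm0, %ymm0: c5 fc 57 c0 */
size_t vxorps_reg_size(void) {
    return VEX2_PREFIX + OPCODE + MODRM;
}

/* vmovups %ymm0, disp32(%reg): c5 fc 11 /r <disp32> */
size_t vmovups_store_size(void) {
    return VEX2_PREFIX + OPCODE + MODRM + DISP32;
}

/* Code size to zero `bytes` bytes with one 8-byte movq per quadword. */
size_t scalar_zeroing_size(size_t bytes) {
    return (bytes / 8) * movq_imm_store_size();
}

/* Code size to zero `bytes` bytes with one vxorps to materialize the
   zero register, then one 32-byte vmovups per 32-byte chunk. */
size_t avx_zeroing_size(size_t bytes) {
    return vxorps_reg_size() + (bytes / 32) * vmovups_store_size();
}
```

For the 32-byte region in the report, scalar_zeroing_size(32) gives 44 and
avx_zeroing_size(32) gives 12. The gap widens for larger regions because the
vxorps cost is paid only once.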