> What compiler / version / flags / OS did you try?
>

I am running my experiments on a machine with:

- Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
- Ubuntu 18.04.6 LTS
- LLVM/Clang 15.0.6 (built from source)

These are the flags I am using:

CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
LDFLAGS = -fuse-ld=lld -Wl,-q

> FWIW, I've experimented with LTO and PGO a bunch, both with gcc and clang. I
> did hit a crash in gcc, but that did turn out to be a compiler bug, and
> actually reduced to something not even needing LTO.
>

Good to hear that it works. I just need to figure out what is going wrong on my end, then.

> I saw quite substantial speedups with PGO, but I only tested very specific
> workloads. IIRC it was >15% gain in concurrent readonly pgbench.
>

I successfully applied PGO on its own and obtained similar gains with TPC-C and TPC-H workloads.

> I dimly recall failing to get some benefit out of bolt for some reason that I
> unfortunately don't even vaguely recall.
>

With BOLT I got similar gains, slightly higher than with PGO, but not for all queries in TPC-H. In fact, I observed small (2-4%) regressions with BOLT.

--
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carva...@ic.unicamp.br
joao.carva...@ualberta.ca