> What compiler / version / flags / OS did you try?
>

I am running the experiments on a machine with:

   - Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
   - Ubuntu 18.04.6 LTS
   - LLVM/Clang 15.0.6 (built from source)

These are the flags I am using:

CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
LDFLAGS = -fuse-ld=lld -Wl,-q


> FWIW, I've experimented with LTO and PGO a bunch, both with gcc and clang. I
> did hit a crash in gcc, but that did turn out to be a compiler bug, and
> actually reduced to something not even needing LTO.
>

Good to hear that it works. I just need to figure out what is going wrong
on my end then.


> I saw quite substantial speedups with PGO, but I only tested very specific
> workloads. IIRC it was >15% gain in concurrent readonly pgbench.
>

I successfully applied PGO alone and obtained similar gains with TPC-C and
TPC-H workloads.

> I dimly recall failing to get some benefit out of bolt for some reason that I
> unfortunately don't even vaguely recall.
>

With BOLT I got gains similar to, and slightly higher than, those from PGO,
but not for all queries in TPC-H. In fact, I observed small (2-4%)
regressions with BOLT on some of them.

-- 
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carva...@ic.unicamp.br
joao.carva...@ualberta.ca
