Re: JIT compiling with LLVM v10.1
Hi,

On 02/14/2018 01:17 PM, Andres Freund wrote:
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.

I thought https://db.in.tum.de/~leis/papers/adaptiveexecution.pdf?lang=en
was relevant for this thread.

Best regards,
 Jesper
Re: JIT compiling with LLVM v10.1
Hi,

On 2018-02-15 11:59:46 +0300, Konstantin Knizhnik wrote:
> It is well known fact that Postgres spends most of the time in sequence scan
> queries for warm data in deforming tuples (17% in case of TPC-H Q1).

I think that the majority of the time therein is not actually bottlenecked
by CPU, but by cache misses. It might be worthwhile to repeat your analysis
with the last patch of my series applied, and the #define FASTORDER
uncommented.

> Postgres tries to optimize access to the tuple by caching fixed size
> offsets to the fields whenever possible and loading attributes on demand.
> It is also well know recommendation to put fixed size, non-null, frequently
> used attributes at the beginning of table's attribute list to make this
> optimization work more efficiently.

FWIW, I think this optimization causes vastly more trouble than it's worth.

> You can see in the code of heap_deform_tuple shows that first NULL value
> will switch it to "slow" mode:

Note that in most workloads the relevant codepath isn't heap_deform_tuple
but slot_deform_tuple.

> 1. Modern platforms are mostly limited by memory access time, number of
> performed instructions is less critical.

I don't think this is quite the correct result. Especially because a lot of
time is spent accessing memory, having code that the CPU can execute
out-of-order (by speculatively executing forward) is hugely beneficial.
Some of the benefit of JITing comes from being able to start deforming the
next field while memory fetches for the previous one are still ongoing (iff
dealing with fixed width cols).

> 2. For large number of attributes JIT-ing of deform tuple can improve speed
> up to two time. Which is quite good result from my point of view.

+1

Note the last version has a small deficiency in decoding varlena datums
that I need to fix (varsize_any isn't inlined anymore).

Greetings,

Andres Freund
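The point about fixed-width columns can be illustrated with a small sketch. This is not the code the patch generates (that is LLVM IR emitted by llvmjit_deform.c); it is a toy C model, and the names deform_generic and deform_specialized_4xint4 are hypothetical. It assumes a row of four NOT NULL int4 columns with no alignment padding. In the generic loop each column's offset depends on the running offset from the previous column, while in the specialized version every offset is a constant, so the loads do not depend on each other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCOLS 4

/*
 * Generic path (toy model): the offset of column i is only known once the
 * length of column i-1 has been read from the descriptor and added to the
 * running offset, which creates a loop-carried dependency.
 */
void
deform_generic(const uint8_t *data, const int *attlen, int natts, int32_t *out)
{
	int			off = 0;

	for (int i = 0; i < natts; i++)
	{
		/* this toy only handles 4-byte columns */
		assert(attlen[i] == (int) sizeof(int32_t));
		memcpy(&out[i], data + off, sizeof(int32_t));
		off += attlen[i];
	}
}

/*
 * Specialized path (toy model): for a layout known in advance to be four
 * NOT NULL int4 columns, every offset is a compile-time constant, so the
 * four loads are independent of each other.
 */
void
deform_specialized_4xint4(const uint8_t *data, int32_t *out)
{
	memcpy(&out[0], data + 0, sizeof(int32_t));
	memcpy(&out[1], data + 4, sizeof(int32_t));
	memcpy(&out[2], data + 8, sizeof(int32_t));
	memcpy(&out[3], data + 12, sizeof(int32_t));
}
```

Both functions produce the same values for the same input; the difference is only in how much the CPU has to know before it can issue each load. Whether that translates into a measurable gain depends on the workload and is not something this sketch demonstrates.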
Re: JIT compiling with LLVM v10.1
On 14.02.2018 21:17, Andres Freund wrote:
> Hi,
>
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.
>>
>> The layout is now as follows:
>>
>> src/backend/jit/jit.c:
>>     Part of JITing always linked into the server. Supports loading the
>>     LLVM using JIT library.
>>
>> src/backend/jit/llvm/
>> Infrastructure:
>>     llvmjit.c:
>>         General code generation and optimization infrastructure
>>     llvmjit_error.cpp, llvmjit_wrap.cpp:
>>         Error / backward compat wrappers
>>     llvmjit_inline.cpp:
>>         Cross module inlining support
>> Code-Gen:
>>     llvmjit_expr.c
>>         Expression compilation
>>     llvmjit_deform.c
>>         Deform compilation
>
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
>     extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
>
> The other two biggest changes are that I've added a README
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
> and that I've revised the configure support so it does more error
> checks, and moved it into config/llvm.m4.
>
> There's a larger smattering of small changes too.
>
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
>
> Greetings,
>
> Andres Freund

I have made some more experiments with the efficiency of JIT-ing tuple
deforming and I want to share the results (I hope they will be
interesting).
It is a well-known fact that Postgres spends most of the time in sequential
scan queries on warm data in deforming tuples (17% in the case of TPC-H Q1).
Postgres tries to optimize access to the tuple by caching fixed-size
offsets to the fields whenever possible and loading attributes on demand.
It is also a well-known recommendation to put fixed-size, non-null,
frequently used attributes at the beginning of the table's attribute list to
make this optimization work more efficiently.

The code of heap_deform_tuple shows that the first NULL value switches it
to "slow" mode:

    for (attnum = 0; attnum < natts; attnum++)
    {
        Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, attnum);

        if (hasnulls && att_isnull(attnum, bp))
        {
            values[attnum] = (Datum) 0;
            isnull[attnum] = true;
            slow = true;        /* can't use attcacheoff anymore */
            continue;
        }

I tried to investigate the importance of this optimization and the actual
penalty of "slow" mode. At the same time I wanted to understand how JIT
helps to speed up tuple deforming.
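To make the "slow" mode switch concrete, here is a self-contained toy model of the mechanism. It is a sketch under stated assumptions, not PostgreSQL's implementation: ToyAttr, toy_att_isnull and toy_deform are hypothetical names, only 4-byte fixed-width columns are supported, and alignment is ignored. The null bitmap follows the same convention as att_isnull (a set bit means the attribute is present). The function returns how many attributes were located through a cached offset, so the effect of an early NULL is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a tuple descriptor entry; NOT Form_pg_attribute. */
typedef struct ToyAttr
{
	int			attlen;			/* fixed width in bytes (toy: always 4) */
	int			attcacheoff;	/* cached byte offset, or -1 if unknown */
} ToyAttr;

/* A set bit means "not null", the same convention att_isnull uses. */
static bool
toy_att_isnull(int attnum, const uint8_t *bits)
{
	return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}

/*
 * Deform a toy tuple into values[]/isnull[]. Returns the number of
 * attributes whose position was taken from the cached offset.
 */
int
toy_deform(ToyAttr *atts, int natts, const uint8_t *nullbits,
		   const uint8_t *data, int32_t *values, bool *isnull)
{
	int			off = 0;
	bool		slow = false;
	int			cache_hits = 0;

	for (int attnum = 0; attnum < natts; attnum++)
	{
		ToyAttr    *att = &atts[attnum];

		if (nullbits != NULL && toy_att_isnull(attnum, nullbits))
		{
			values[attnum] = 0;
			isnull[attnum] = true;
			slow = true;		/* later offsets now differ per tuple */
			continue;			/* a NULL occupies no space in the data */
		}
		isnull[attnum] = false;

		if (!slow && att->attcacheoff >= 0)
		{
			off = att->attcacheoff; /* fast path: reuse cached offset */
			cache_hits++;
		}
		else if (!slow)
			att->attcacheoff = off; /* first visit: remember the offset */
		/* in slow mode, off is simply the running offset */

		assert(att->attlen == (int) sizeof(int32_t));
		memcpy(&values[attnum], data + off, sizeof(int32_t));
		off += att->attlen;
	}
	return cache_hits;
}
```

In this model a tuple with no NULLs fills the cache on the first call and hits it for every column afterwards, while a tuple whose first column is NULL never uses the cache for the remaining columns. That mirrors the difference between the t1/t3 and t2 layouts described in this message, although the toy says nothing about how large the real penalty is.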
I have populated three tables with data:

create table t1(id integer primary key,
                c1 integer, c2 integer, c3 integer, c4 integer, c5 integer,
                c6 integer, c7 integer, c8 integer, c9 integer);
create table t2(id integer primary key,
                c1 integer, c2 integer, c3 integer, c4 integer, c5 integer,
                c6 integer, c7 integer, c8 integer, c9 integer);
create table t3(id integer primary key,
                c1 integer not null, c2 integer not null, c3 integer not null,
                c4 integer not null, c5 integer not null, c6 integer not null,
                c7 integer not null, c8 integer not null, c9 integer not null);

insert into t1 (id,c1,c2,c3,c4,c5,c6,c7,c8)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0);
insert into t2 (id,c2,c3,c4,c5,c6,c7,c8,c9)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0);
insert into t3 (id,c1,c2,c3,c4,c5,c6,c7,c8,c9)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0,0);

vacuum analyze t1;
vacuum analyze t2;
vacuum analyze t3;

t1 contains NULL in the last column (c9), t2 in the first column (c1), and
t3 has all attributes declared as NOT NULL (and JIT can use this knowledge
to generate more efficient deforming code). The whole data set is held in
memory (shared buffer size is greater than database size) and I
intentionally switched off parallel execution to make the results more
deterministic.

I ran two queries calculating aggregates on one/all not-null fields (t*
stands for t1, t2 and t3 in turn):

select sum(c8) from t*;
select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8) from t*;

As expected, 35% of the time was spent in heap_deform_tuple. But the
results (msec) were slightly confusing and unexpected:

select sum(c8) from t*;

        w/o JIT   with JIT
t1      763       563
t2      772       570
t3      776       592

select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8) from t*;

        w/o JIT   with JIT
t1      1239      742
t2      1233      747
t3      1255      803

I repeat each query 10 times and take the minimal time ( I think that it
Re: JIT compiling with LLVM v10.1
Hi,

On 2018-02-14 23:32:17 +0100, Pierre Ducroquet wrote:
> Here are the LLVM4 and LLVM3.9 compatibility patches.
> Successfully built, and executed some silly queries with JIT forced to make
> sure it worked.

Thanks! I'm going to integrate them into my series in the next few days.

Regards,

Andres
Re: JIT compiling with LLVM v10.1
On Wednesday, February 14, 2018 7:17:10 PM CET Andres Freund wrote:
> Hi,
>
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> > I've pushed v10.0. The big (and pretty painful to make) change is that
> > now all the LLVM specific code lives in src/backend/jit/llvm, which is
> > built as a shared library which is loaded on demand.
> >
> > The layout is now as follows:
> >
> > src/backend/jit/jit.c:
> >     Part of JITing always linked into the server. Supports loading the
> >     LLVM using JIT library.
> >
> > src/backend/jit/llvm/
> >
> > Infrastructure:
> >     llvmjit.c:
> >         General code generation and optimization infrastructure
> >
> >     llvmjit_error.cpp, llvmjit_wrap.cpp:
> >         Error / backward compat wrappers
> >
> >     llvmjit_inline.cpp:
> >         Cross module inlining support
> >
> > Code-Gen:
> >     llvmjit_expr.c
> >
> >         Expression compilation
> >
> >     llvmjit_deform.c
> >
> >         Deform compilation
>
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
>     extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
>
> The other two biggest changes are that I've added a README
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
> and that I've revised the configure support so it does more error
> checks, and moved it into config/llvm.m4.
>
> There's a larger smattering of small changes too.
>
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
>
> Greetings,
>
> Andres Freund

Hi

Here are the LLVM4 and LLVM3.9 compatibility patches.
Successfully built, and executed some silly queries with JIT forced to make
sure it worked.

 Pierre

From c856a5db2f0ba34ba7c230a65f60277ae0e7347f Mon Sep 17 00:00:00 2001
From: Pierre
Date: Fri, 2 Feb 2018 09:11:55 +0100
Subject: [PATCH 1/8] Add support for LLVM4 in llvmjit.c

Signed-off-by: Pierre Ducroquet
---
 src/backend/jit/llvm/llvmjit.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 7a96ece0f7..7557dc9a19 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -222,13 +222,20 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		addr = 0;
 		if (LLVMOrcGetSymbolAddressIn(handle->stack, &addr, handle->orc_handle, mangled))
-			elog(ERROR, "failed to lookup symbol");
+			elog(ERROR, "failed to lookup symbol %s", mangled);
 		if (addr)
 			return (void *) addr;
 	}
 #else
+#if LLVM_VERSION_MAJOR < 5
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, mangled)))
+		return (void *) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, mangled)))
+		return (void *) addr;
+	elog(ERROR, "failed to lookup symbol %s for %s", mangled, funcname);
+#else
 	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled))
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
@@ -237,6 +244,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
 		return (void *) addr;
+#endif // LLVM_VERSION_MAJOR
+
 #endif
 	elog(ERROR, "failed to JIT: %s", funcname);
@@ -374,11 +383,18 @@ llvm_compile_module(LLVMJitContext *context)
 	 * faster instruction selection mechanism is used.
 	 */
 	{
-		LLVMSharedModuleRef smod;
 		instr_time tb, ta;

 		/* emit the code */
 		INSTR_TIME_SET_CURRENT(ta);
+#if LLVM_VERSION_MAJOR < 5
+		orc_handle = LLVMOrcAddEagerlyCompiledIR(compile_orc, context->module,
+												 llvm_resolve_symbol, NULL);
+		// It seems there is no error return from that function in LLVM < 5.
+#else
+		LLVMSharedModuleRef smod;
+
+		LLVMSharedModuleRef smod;
 		smod = LLVMOrcMakeSharedModule(context->module);
 		if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
 										llvm_resolve_symbol, NULL))
@@ -386,6 +402,7 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to jit module");
 		}
 		LLVMOrcDisposeSharedModuleRef(smod);
+#endif
 		INSTR_TIME_SET_CURRENT(tb);
 		INSTR_TIME_SUBTRACT(tb, ta);
 		ereport(DEBUG1, (errmsg("time to emit: %.3fs",
--
2.16.1

From a44378f05c33a40c485f26e5f007614100c70fe7 Mon Sep 17 00:00:00 2001
From: Pierre
Date: Fri, 2 Feb 2018 09:13:40 +0100
Subject: [PATCH 2/8] Add LLVM4 support in llvmjit_error.cpp

Signed-off-by: Pierre Ducroquet
---
 src/backend/jit/llvm/llvmjit_error.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit_error.cpp b/src/backend/jit/llvm/llvmjit_error.c
Re: JIT compiling with LLVM v10.1
Hi,

On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> I've pushed v10.0. The big (and pretty painful to make) change is that
> now all the LLVM specific code lives in src/backend/jit/llvm, which is
> built as a shared library which is loaded on demand.
>
> The layout is now as follows:
>
> src/backend/jit/jit.c:
>     Part of JITing always linked into the server. Supports loading the
>     LLVM using JIT library.
>
> src/backend/jit/llvm/
> Infrastructure:
>     llvmjit.c:
>         General code generation and optimization infrastructure
>     llvmjit_error.cpp, llvmjit_wrap.cpp:
>         Error / backward compat wrappers
>     llvmjit_inline.cpp:
>         Cross module inlining support
> Code-Gen:
>     llvmjit_expr.c
>         Expression compilation
>     llvmjit_deform.c
>         Deform compilation

I've pushed a revised version that hopefully should address Jeff's
wish/need of being able to experiment with this out of core. There's now
a "jit_provider" PGC_POSTMASTER GUC that's by default set to
"llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
set of callbacks via
    extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
which can also be implemented by any other potential provider.

The other two biggest changes are that I've added a README
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
and that I've revised the configure support so it does more error
checks, and moved it into config/llvm.m4.

There's a larger smattering of small changes too.

I'm pretty happy with how the separation of core / shlib looks now. I'm
planning to work on cleaning and then pushing some of the preliminary
patches (fixed tupledesc, grouping) over the next few days.

Greetings,

Andres Freund
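The provider mechanism described above (an init function that fills a table of callbacks, selected by a setting and loaded on first use) can be sketched in a few lines of C. Everything below is a toy model under stated assumptions: the message names JitProviderCallbacks and _PG_jit_provider_init but does not show the struct's fields, so ToyJitProviderCallbacks, its two members, toy_provider_init and toy_jit_compile are all hypothetical. The shared-library loading step is replaced by a plain string comparison so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL callback table; the real field list is not in this thread. */
typedef struct ToyJitProviderCallbacks
{
	bool		(*compile_expr) (const char *expr);
	void		(*release_context) (void);
} ToyJitProviderCallbacks;

/* ---- provider side (would live in the shared library) ---- */

int			toy_compiled_count = 0;

static bool
toy_compile_expr(const char *expr)
{
	toy_compiled_count++;
	return expr != NULL && expr[0] != '\0';
}

static void
toy_release_context(void)
{
	toy_compiled_count = 0;
}

/* Toy counterpart of an init hook: fill in the callback table. */
void
toy_provider_init(ToyJitProviderCallbacks *cb)
{
	cb->compile_expr = toy_compile_expr;
	cb->release_context = toy_release_context;
}

/* ---- core side (always linked into the server) ---- */

static ToyJitProviderCallbacks toy_provider;
static bool toy_provider_loaded = false;

/*
 * Try to JIT-compile an expression. The provider is initialized lazily on
 * first use; a real server would load the shared library named by the
 * setting here instead of comparing strings.
 */
bool
toy_jit_compile(const char *provider_name, const char *expr)
{
	if (!toy_provider_loaded)
	{
		if (strcmp(provider_name, "llvmjit") != 0)
			return false;		/* unknown provider: carry on without JIT */
		toy_provider_init(&toy_provider);
		toy_provider_loaded = true;
	}
	return toy_provider.compile_expr(expr);
}
```

The design point this tries to capture is that the core only ever sees function pointers, so an out-of-core provider needs nothing more than its own init function with the agreed signature.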