Re: effect of JIT tuple deform?
On Wednesday, June 27, 2018 5:38:31 PM CEST Pavel Stehule wrote: > 2018-06-27 17:19 GMT+02:00 Tomas Vondra : > > On 06/26/2018 09:25 PM, Pavel Stehule wrote: > >> Hi > >> > >> ... > >> > >> So I am able to see effect of jit_tuple_deforming, and very well, but > >> only if optimization is active. When optimization is not active then > >> jit_tuple_deforming does slowdown. > >> > >> So maybe a usage of jit_tuple_deforming can be conditioned by > >> jit_optimization? > > > > Can you share the test case and some detail about the hardware and > > PostgreSQL configuration? > > I did very simple test > > > 0. > > master branch without asserts, shared buffer to 1GB > tested on Lenovo T520 8GB RAM 8CPU, i7 > Fedora 28, gcc CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer" --with-llvm > > 1. > > select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') || > ')' from generate_series(1,100) g(i) \gexec > > > 2. > > begin; > select 'insert into wt values(' || (select > string_agg((random()*1)::int::text,',') from generate_series(1,j - j + > 100) g(i)) || ')' from generate_series(1,100) gg(j) \gexec > insert into wt select * from wt; > commit; > > 3. > > set max_paralel_workers to 0; -- the effect of JIT will be more visible > > analyze wt; > \timing > > select sum(c99) from wt; > > I tested some combination of: > > jit: off on > jit_inline_above_cost: 0, 10 > jit_optimize_above_cost: 0, 10 > jit_tuple_deforming: on, off > > > My primitive tests shows nice possitive effect of jit_tuple_deforming if > jit optimization is active. When jit optimization is not active, then > jit_tuple_deforming did slowdown in my test. > > So there is range of costs between 10 and 50 where > jit_tuple_deforming didn't work well (without optimization) > > I am limmited by small memory of my notebook - when I created table larger > than 3GB, then I got IO waits on my crypted disc, and any effect of JIT was > eliminated. > > Regards > > Pavel Hi I have studied this case a bit, and I think too that there is something wrong here. Right now, jit_optimize is a -O3. It's really expensive, and triggering it here is not the right solution. In the attached patch, I force a -O1 for tuple deforming. With your test case, on my computer, the results are : - no jit : 850ms - jit with tuple deforming without optimizations : 1650 ms (1.5ms spent optimizing) - jit without tuple deforming : 820ms (0.2ms) - jit with tuple deforming with optimization (-O3) : 770ms (105ms) - jit with tuple deforming with patch (-O1) : 725ms (54ms) I will look at the generated code for tuple deforming, but I think we should pre-optimize the LLVM bytecode if we do not want to involve the LLVM optimization passes. Regards Pierre >From c2e70c8fbb7715283d3d53bdf5a70e4db18c99a9 Mon Sep 17 00:00:00 2001 From: Pierre Ducroquet Date: Mon, 2 Jul 2018 13:44:10 +0200 Subject: [PATCH] Introduce opt1 in LLVM/JIT, and force it with deforming --- src/backend/jit/llvm/llvmjit.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 5d0cdab1fc..025319e9c1 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -91,10 +91,12 @@ static const char *llvm_layout = NULL; static LLVMTargetMachineRef llvm_opt0_targetmachine; +static LLVMTargetMachineRef llvm_opt1_targetmachine; static LLVMTargetMachineRef llvm_opt3_targetmachine; static LLVMTargetRef llvm_targetref; static LLVMOrcJITStackRef llvm_opt0_orc; +static LLVMOrcJITStackRef llvm_opt1_orc; static LLVMOrcJITStackRef llvm_opt3_orc; @@ -277,6 +279,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname) #if LLVM_VERSION_MAJOR < 5 if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, funcname))) return (void *) (uintptr_t) addr; + if ((addr = LLVMOrcGetSymbolAddress(llvm_opt1_orc, funcname))) + return (void *) (uintptr_t) addr; if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, funcname))) return (void *) (uintptr_t) addr; #else @@ -284,6 +288,10 @@ llvm_get_function(LLVMJitContext *context, const char *funcname) elog(ERROR, "failed to look up symbol \"%s\"", funcname); if (addr) return (void *) (uintptr_t) addr; + if (LLVMOrcGetSymbolAddress(llvm_opt1_orc, , funcname)) + elog(ERROR, "failed to look up symbol \"%s\"", funcname); + if (addr) + return (void *) (uintptr_t) addr; if (LLVMOrcGetSymbolAddress(llvm_opt3_orc, , funcname)) elog(ERROR, "failed to look up symbol \"%s\"", funcname); if (addr) @@ -420,6 +428,8 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) if (context->base.flags & PGJIT_OPT3) compile_optlevel = 3; + else if (context->base.flags & PGJIT_DEFORM) + compile_optlevel = 1; else compile_optlevel = 0; @@ -491,6 +501,8 @@ llvm_compile_module(LLVMJitContext *context) if (context->base.flags & PGJIT_OPT3) compile_orc =
Re: effect of JIT tuple deform?
2018-06-27 17:19 GMT+02:00 Tomas Vondra : > On 06/26/2018 09:25 PM, Pavel Stehule wrote: > >> Hi >> >> ... >> >> So I am able to see effect of jit_tuple_deforming, and very well, but >> only if optimization is active. When optimization is not active then >> jit_tuple_deforming does slowdown. >> >> So maybe a usage of jit_tuple_deforming can be conditioned by >> jit_optimization? >> >> > Can you share the test case and some detail about the hardware and > PostgreSQL configuration? > I did very simple test 0. master branch without asserts, shared buffer to 1GB tested on Lenovo T520 8GB RAM 8CPU, i7 Fedora 28, gcc CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer" --with-llvm 1. select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') || ')' from generate_series(1,100) g(i) \gexec 2. begin; select 'insert into wt values(' || (select string_agg((random()*1)::int::text,',') from generate_series(1,j - j + 100) g(i)) || ')' from generate_series(1,100) gg(j) \gexec insert into wt select * from wt; commit; 3. set max_paralel_workers to 0; -- the effect of JIT will be more visible analyze wt; \timing select sum(c99) from wt; I tested some combination of: jit: off on jit_inline_above_cost: 0, 10 jit_optimize_above_cost: 0, 10 jit_tuple_deforming: on, off My primitive tests shows nice possitive effect of jit_tuple_deforming if jit optimization is active. When jit optimization is not active, then jit_tuple_deforming did slowdown in my test. So there is range of costs between 10 and 50 where jit_tuple_deforming didn't work well (without optimization) I am limmited by small memory of my notebook - when I created table larger than 3GB, then I got IO waits on my crypted disc, and any effect of JIT was eliminated. Regards Pavel > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: effect of JIT tuple deform?
On 06/26/2018 09:25 PM, Pavel Stehule wrote: Hi ... So I am able to see effect of jit_tuple_deforming, and very well, but only if optimization is active. When optimization is not active then jit_tuple_deforming does slowdown. So maybe a usage of jit_tuple_deforming can be conditioned by jit_optimization? Can you share the test case and some detail about the hardware and PostgreSQL configuration? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: effect of JIT tuple deform?
Hi I played with it and maybe I got interesting result. When I played with really big table, then I got IO waits and effect of jit_tuple_deforming is near to zero (as expected) When I played with smaller table under RAM, then I can see positive effect of JIT_TD, but only when optimization is active. When optimization was false, then JIT_TD has negative slowdown. Shared buffers 1GB table wt .. 100columns, 2M rows, cca 823MB set max_parallel_workers to 0; query: select sum(c99) from wt; default (jit active) ... 1853ms set jit_tuple_deforming to off .. 1397ms set jit off .. 1400ms == enforced inline == jit_tuple_deforming on .. 1610ms jit_tuple_deforming off .. 1420ms == enforced optimize == jit_tuple_deforming on .. 842ms jit_tuple_deforming off .. 1420ms I played with setting and I got the best speed with jit_inline_above_cost ..some pretty high number jit_optimize_above_cost 0 jit_tuple_deforming on So I am able to see effect of jit_tuple_deforming, and very well, but only if optimization is active. When optimization is not active then jit_tuple_deforming does slowdown. So maybe a usage of jit_tuple_deforming can be conditioned by jit_optimization? Regards Pavel
Re: effect of JIT tuple deform?
2018-06-25 5:41 GMT+02:00 Andres Freund : > On 2018-06-25 05:32:37 +0200, Pavel Stehule wrote: > > 2018-06-24 22:32 GMT+02:00 Dmitry Dolgov <9erthali...@gmail.com>: > > > > > > On 23 June 2018 at 08:47, Pavel Stehule > wrote: > > > > > > > > > > > > 2018-06-23 8:35 GMT+02:00 Pavel Stehule : > > > >> > > > >> Hi > > > >> > > > >> I try to measure effect of JIT tuple deform and I don't see any > possible > > > >> effect. > > > >> > > > >> Is it this feature active in master branch? > > > >> > > > >> Is possible to see this feature in EXPLAIN ANALYZE? > > > > > > > > > > > > Unfortunately I got slowdown > > > > > > > > 0. shared buffers = 1GB > > > > 1. create table with 50 int columns > > > > 2. insert into this table 4M rows > > > > > > Hi, > > > > > > Looks like I can reproduce the situation you're talking about (with > some > > > minor > > > differences) > > > > > > =# explain analyze select sum(data45) from test_deform; > > > QUERY PLAN > > > > > > --- > > > Finalize Aggregate > > > (cost=127097.71..127097.72 rows=1 width=8) > > > (actual time=813.957..813.957 rows=1 loops=1) > > >-> Gather > > >(cost=127097.50..127097.71 rows=2 width=8) > > >(actual time=813.946..813.950 rows=3 loops=1) > > > Workers Planned: 2 > > > Workers Launched: 2 > > > -> Partial Aggregate > > > (cost=126097.50..126097.51 rows=1 width=8) > > > (actual time=802.585..802.585 rows=1 loops=3) > > >-> Parallel Seq Scan on test_deform > > >(cost=0.00..121930.80 rows=180 width=4) > > >(actual time=0.040..207.326 rows=133 loops=3) > > > Planning Time: 0.212 ms > > > JIT: > > >Functions: 6 > > >Generation Time: 3.076 ms > > >Inlining: false > > >Inlining Time: 0.000 ms > > >Optimization: false > > >Optimization Time: 1.328 ms > > >Emission Time: 8.601 ms > > > Execution Time: 820.127 ms > > > (16 rows) > > > > > > =# set jit_tuple_deforming to off; > > > > > > =# explain analyze select sum(data45) from test_deform; > > > QUERY PLAN > > > > > > --- > > > Finalize Aggregate > > > (cost=127097.71..127097.72 rows=1 width=8) > > > (actual time=742.578..742.578 rows=1 loops=1) > > >-> Gather > > >(cost=127097.50..127097.71 rows=2 width=8) > > >(actual time=742.529..742.569 rows=3 loops=1) > > > Workers Planned: 2 > > > Workers Launched: 2 > > > -> Partial Aggregate > > > (cost=126097.50..126097.51 rows=1 width=8) > > > (actual time=727.876..727.876 rows=1 loops=3) > > >-> Parallel Seq Scan on test_deform > > >(cost=0.00..121930.80 rows=180 width=4) > > >(actual time=0.044..204.097 rows=133 loops=3) > > > Planning Time: 0.361 ms > > > JIT: > > >Functions: 4 > > >Generation Time: 2.840 ms > > >Inlining: false > > >Inlining Time: 0.000 ms > > >Optimization: false > > >Optimization Time: 0.751 ms > > >Emission Time: 6.436 ms > > > Execution Time: 749.827 ms > > > (16 rows) > > > > > > But at the same time on the bigger dataset (20M rows instead of 4M) the > > > very > > > same query gets better: > > > > > > =# explain analyze select sum(data45) from test_deform; > > > QUERY PLAN > > > > > > --- > > > Finalize Aggregate > > > (cost=631482.92..631482.93 rows=1 width=8) > > > (actual time=2350.288..2350.288 rows=1 loops=1) > > >-> Gather > > >(cost=631482.71..631482.92 rows=2 width=8) > > >(actual time=2350.276..235
Re: effect of JIT tuple deform?
On 2018-06-25 05:32:37 +0200, Pavel Stehule wrote: > 2018-06-24 22:32 GMT+02:00 Dmitry Dolgov <9erthali...@gmail.com>: > > > > On 23 June 2018 at 08:47, Pavel Stehule wrote: > > > > > > > > > 2018-06-23 8:35 GMT+02:00 Pavel Stehule : > > >> > > >> Hi > > >> > > >> I try to measure effect of JIT tuple deform and I don't see any possible > > >> effect. > > >> > > >> Is it this feature active in master branch? > > >> > > >> Is possible to see this feature in EXPLAIN ANALYZE? > > > > > > > > > Unfortunately I got slowdown > > > > > > 0. shared buffers = 1GB > > > 1. create table with 50 int columns > > > 2. insert into this table 4M rows > > > > Hi, > > > > Looks like I can reproduce the situation you're talking about (with some > > minor > > differences) > > > > =# explain analyze select sum(data45) from test_deform; > > QUERY PLAN > > > > --- > > Finalize Aggregate > > (cost=127097.71..127097.72 rows=1 width=8) > > (actual time=813.957..813.957 rows=1 loops=1) > >-> Gather > >(cost=127097.50..127097.71 rows=2 width=8) > >(actual time=813.946..813.950 rows=3 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Partial Aggregate > > (cost=126097.50..126097.51 rows=1 width=8) > > (actual time=802.585..802.585 rows=1 loops=3) > >-> Parallel Seq Scan on test_deform > >(cost=0.00..121930.80 rows=180 width=4) > >(actual time=0.040..207.326 rows=133 loops=3) > > Planning Time: 0.212 ms > > JIT: > >Functions: 6 > >Generation Time: 3.076 ms > >Inlining: false > >Inlining Time: 0.000 ms > >Optimization: false > >Optimization Time: 1.328 ms > >Emission Time: 8.601 ms > > Execution Time: 820.127 ms > > (16 rows) > > > > =# set jit_tuple_deforming to off; > > > > =# explain analyze select sum(data45) from test_deform; > > QUERY PLAN > > > > --- > > Finalize Aggregate > > (cost=127097.71..127097.72 rows=1 width=8) > > (actual time=742.578..742.578 rows=1 loops=1) > >-> Gather > >(cost=127097.50..127097.71 rows=2 width=8) > >(actual time=742.529..742.569 rows=3 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Partial Aggregate > > (cost=126097.50..126097.51 rows=1 width=8) > > (actual time=727.876..727.876 rows=1 loops=3) > >-> Parallel Seq Scan on test_deform > >(cost=0.00..121930.80 rows=180 width=4) > >(actual time=0.044..204.097 rows=133 loops=3) > > Planning Time: 0.361 ms > > JIT: > >Functions: 4 > >Generation Time: 2.840 ms > >Inlining: false > >Inlining Time: 0.000 ms > >Optimization: false > >Optimization Time: 0.751 ms > >Emission Time: 6.436 ms > > Execution Time: 749.827 ms > > (16 rows) > > > > But at the same time on the bigger dataset (20M rows instead of 4M) the > > very > > same query gets better: > > > > =# explain analyze select sum(data45) from test_deform; > > QUERY PLAN > > > > --- > > Finalize Aggregate > > (cost=631482.92..631482.93 rows=1 width=8) > > (actual time=2350.288..2350.288 rows=1 loops=1) > >-> Gather > >(cost=631482.71..631482.92 rows=2 width=8) > >(actual time=2350.276..2350.279 rows=3 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Partial Aggregate > > (cost=630482.71..630482.72 rows=1 width=8) > > (actual time=2328.378..2328.379 rows=1 loops=3) > >-> Parallel Seq Scan on test_deform > >(cost=0.00..609649.37 rows=837 width=4) > >(actual time=0.027..1175.960 rows=667 loops=3) > > Planning Time: 0.600 ms > > JIT: > >Functions: 6 > &g
Re: effect of JIT tuple deform?
Hi, On 2018-06-24 22:32:07 +0200, Dmitry Dolgov wrote: > `perf diff` indeed shows that in the first case (with the 4M rows dataset) the > jitted version has some noticeable delta for one call, and unfortunately so > far > I couldn't figure out which one exactly because of JIT (btw, who can explain > how to see a correct full `perf report` in this case? Somehow `perf > inject --jit -o > perf.data.jitted` and jit_profiling_support didn't help). jit_profiling_support currently requires a patch (that I'm about to merge into their trunk) in LLVM. I've posted it a couple times to the list. > But since on the bigger dataset I've got expected results, maybe it's just a > sign that JIT kicks in too early in this case and what's necessary is to > adjust > jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost? Yea, that's very likely the problem. I wonder if we should multiply the cost w/ the number of targeted workers to reduce problems with parallelism... Greetings, Andres Freund
Re: effect of JIT tuple deform?
2018-06-23 8:35 GMT+02:00 Pavel Stehule : > Hi > > I try to measure effect of JIT tuple deform and I don't see any possible > effect. > > Is it this feature active in master branch? > > Is possible to see this feature in EXPLAIN ANALYZE? > Unfortunately I got slowdown 0. shared buffers = 1GB 1. create table with 50 int columns 2. insert into this table 4M rows postgres=# \dt+ wt List of relations ++--+---+---++-+ | Schema | Name | Type | Owner | Size | Description | ++--+---+---++-+ | public | wt | table | pavel | 893 MB | | ++--+---+---++-+ (1 row) default setting postgres=# explain analyze select sum(c45) from wt; +--+ |QUERY PLAN| +--+ | Finalize Aggregate (cost=136120.69..136120.70 rows=1 width=8) (actual time=879.547..879.547 rows=1 loops=1) | | -> Gather (cost=136120.47..136120.68 rows=2 width=8) (actual time=879.514..879.538 rows=3 loops=1) | | Workers Planned: 2 | | Workers Launched: 2 | | -> Partial Aggregate (cost=135120.47..135120.48 rows=1 width=8) (actual time=850.283..850.284 rows=1 loops=3) | | -> Parallel Seq Scan on wt (cost=0.00..130953.77 rows=178 width=4) (actual time=0.071..223.338 rows=147 loops=3) | | Planning Time: 0.158 ms | | JIT: | | Functions: 6 | | Generation Time: 4.267 ms | | Inlining: false | | Inlining Time: 0.000 ms | | Optimization: false | | Optimization Time: 2.472 ms | | Emission Time: 15.929 ms | | Execution Time: 899.496 ms | +--+ (16 rows) postgres=# set jit_tuple_deforming to off; SET postgres=# explain analyze select sum(c45) from wt; +--+ |QUERY PLAN| +--+ | Finalize Aggregate (cost=136120.69..136120.70 rows=1 width=8) (actual time=743.667..743.667 rows=1 loops=1) | | -> Gather (cost=136120.47..136120.68 rows=2 width=8) (actual time=743.654..743.657 rows=3 loops=1) | | Workers Planned: 2 | | Workers Launched: 2 | | -> Partial Aggregate (cost=135120.47..135120.48 rows=1 width=8) (actual time=715.532..715.533 rows=1 loops=3) | | -> Parallel Seq Scan on wt (cost=0.00..130953.77 rows=178 width=4) (actual time=0.067..216.245 rows=147 loops=3) | | Planning Time: 0.157 ms | | JIT: | | Functions: 4 | | Generation Time: 1.989 ms | | Inlining: false | | Inlining Time: 0.000 ms | | Optimization: false | | Optimization Time: 0.449 ms | | Emission Time: 7.254 ms | | Execution Time: 761.549 ms | +--+ (16 rows) When jit_tuple_deforming is enabled, the query is slower about 100ms, looks like performance regression > Regards > > Pavel >
effect of JIT tuple deform?
Hi I try to measure effect of JIT tuple deform and I don't see any possible effect. Is it this feature active in master branch? Is possible to see this feature in EXPLAIN ANALYZE? Regards Pavel