Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi,

On 2017-02-10 18:18:13 +0100, Markus Nullmeier wrote:
> Well, if this thread of thought about hand-crafted JIT should be taken
> up again by someone at some point in time, I guess it could be possible
> to reuse tools that are already out there, such as "DynASM"
> ( http://luajit.org/dynasm_features.html ) from the LuaJIT project
> (currently offers x86-64, x86-32, ARM-32, PPC-32, and MIPS-32).
> Maybe one could even recycle parts of the LuaJIT project itself
> ( http://luajit.org/luajit.html ,
> http://article.gmane.org/gmane.comp.lang.lua.general/58908 ).

FWIW, I'd looked at dynasm/luajit.

One big reason to go for LLVM is that it has nearly all the
infrastructure to make backend-functions/operators inlineable.
Especially for some of the arithmetic operations and such, that'd be
quite useful performance-wise. With LLVM you can just use clang on C to
generate the IR, do some work to boil down the IR modules to the
relevant functions (i.e. remove non sql-callable functions), for which
LLVM has infrastructure, and then inline the functions that way. That's
a lot harder to do with nearly everything else (save gcc's jit library,
but the licensing and stability situation makes that unattractive).

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 12/06/16 21:40, Andres Freund wrote:
> On 2016-12-06 14:35:43 -0600, Nico Williams wrote:
>> On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
>>> On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
>>>>> I concur with your feeling that hand-rolled JIT is right out. But
>>>>
>>>> Yeah, that way lies maintenance madness.
>>>
>>> I'm not quite that sure about that. I had a lot of fun doing some
>>> hand-rolled x86 JITing. Not that is a ward against me being mad. But
>>> more seriously: Manually doing a JIT gives you a lot faster compilation
>>> times, which makes JIT applicable in a lot more situations.
>>
>> What I meant is that each time there are new ISA extensions, or
>> differences in how relevant/significant different implementations of the
>> same ISA implement certain instructions, and/or every time you want to
>> add a new architecture... someone has to do a lot of very low-level
>> work.
>
> Yea, that's why I didn't pursue this path further. I *personally* think
> it'd be perfectly fine to only support JITing on linux x86_64 and
> aarch64 for now. And those I'd be willing to work on. But since I know
> that's not project policy...

Well, if this thread of thought about hand-crafted JIT should be taken
up again by someone at some point in time, I guess it could be possible
to reuse tools that are already out there, such as "DynASM"
( http://luajit.org/dynasm_features.html ) from the LuaJIT project
(currently offers x86-64, x86-32, ARM-32, PPC-32, and MIPS-32).
Maybe one could even recycle parts of the LuaJIT project itself
( http://luajit.org/luajit.html ,
http://article.gmane.org/gmane.comp.lang.lua.general/58908 ).

--
Markus Nullmeier            http://www.g-vo.org
German Astrophysical Virtual Observatory (GAVO)
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 6, 2016 at 11:10:59AM -0800, Andres Freund wrote:
> > I concur with your feeling that hand-rolled JIT is right out. But
> > I'm not sure that whatever performance gain we might get in this
> > direction is worth the costs.
>
> Well, I'm not impartial, but I don't think we do our users a service by
> leaving significant speedups untackled, and after spending a *LOT* of
> time on this, I don't see much other choice than JITing. Note that
> nearly everything performance sensitive is moving towards doing JITing
> in some form or another.

Agreed, we don't really have a choice. After all the optimizations we
have done to so many subsystems, our executor is relatively slow and is
a major drag on our system, specifically for long-running queries. The
base problem with the executor is the state machines at so many levels,
and we really can't optimize that while keeping a reasonable maintenance
burden. This is where JIT and LLVM help. I outlined two external
projects that were researching this in this blog entry:

    http://momjian.us/main/blogs/pgblog/2016.html#April_1_2016

I am excited to now be seeing WIP code.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Mon, Dec 12, 2016 at 6:14 PM, Andres Freund wrote:
> For Q1 I think the bigger win is JITing the transition function
> invocation in advance_aggregates/transition_function - that's IIRC where
> the biggest bottleneck lies.

Yeah, we bundle the agg core into our expr work... no point otherwise since
we do it for OLAP.

As for experience, I think you have found out for yourself. There is a lot
that can be done and heuristics are involved in many places to decide
whether to jit fully, partially, or not at all. But it looks like you have
a solid basis now to proceed and explore the beyond :-)

Send me private email if you have a particular question.

Regards,
-cktan
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi,

On 2016-12-12 18:11:13 -0800, CK Tan wrote:
> Andres,
>
> > dev (no jiting):
> > Time: 30343.532 ms
>
> > dev (jiting):
> > SET jit_tuple_deforming = on;
> > SET jit_expressions = true;
> >
> > Time: 24439.803 ms
>
> FYI, ~20% improvement for TPCH Q1 is consistent with what we find when we
> only jit expression.

For Q1 I think the bigger win is JITing the transition function
invocation in advance_aggregates/transition_function - that's IIRC where
the biggest bottleneck lies.

If you have any details about your JITing experience that you're willing
to talk about ...

Regards,

Andres
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Andres,

> dev (no jiting):
> Time: 30343.532 ms

> dev (jiting):
> SET jit_tuple_deforming = on;
> SET jit_expressions = true;
>
> Time: 24439.803 ms

FYI, ~20% improvement for TPCH Q1 is consistent with what we find when we
only jit expressions.

Cheers,
-cktan
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 6, 2016 at 9:19 AM, Andres Freund wrote:
> 0009 WIP: Add minimal keytest implementation.
>
> More or less experimental patch that tries to implement simple
> expression of the OpExpr(ScalarVar, Const) into a single expression
> evaluation step. The benefits probably aren't big enough if we do end
> up doing JITing of expressions.

Seems like we are trying to achieve the same thing with the 'heap scan key
push down patch' [1] as well. But I think with this patch you are covering
OpExpr(ScalarVar, Const) for all the cases, whereas with [1] we are
currently only doing it for seqscan and we are trying to make that
generic for other nodes as well.

So do you see any advantage of continuing [1]? Is there something extra
we can achieve with [1] that we cannot get with this patch?

[1] https://www.postgresql.org/message-id/CAFiTN-takT6Z4s3tGDwyC9bhYf%2B1gumpvW5bo_fpeNUy%2BrL-kg%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
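Editorial sketch: to make the quoted 0009 description a bit more concrete, the following standalone C fragment shows what folding OpExpr(ScalarVar, Const) into one evaluation step could look like. This is not code from either patch. `DemoSlot`, `FusedKeyTest`, `KeyTestOp` and `fused_keytest` are hypothetical names, and the column type is assumed to be a plain 64-bit integer with strict NULL handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an already-deformed tuple slot. */
typedef struct DemoSlot
{
    const int64_t *values;   /* column values */
    const bool    *isnull;   /* per-column NULL flags */
} DemoSlot;

typedef enum { KT_EQ, KT_LT, KT_GT } KeyTestOp;

/* One fused step: "column[attnum] <op> constant". */
typedef struct FusedKeyTest
{
    int       attnum;
    KeyTestOp op;
    int64_t   constant;
} FusedKeyTest;

/*
 * Evaluate the fused step. Instead of three steps (fetch variable, fetch
 * constant, call operator) everything happens in one place. A NULL column
 * yields a NULL result (strict semantics), reported through *resnull.
 */
bool
fused_keytest(const FusedKeyTest *kt, const DemoSlot *slot, bool *resnull)
{
    int64_t v;

    assert(kt->attnum >= 0);

    if (slot->isnull[kt->attnum])
    {
        *resnull = true;
        return false;
    }

    *resnull = false;
    v = slot->values[kt->attnum];

    switch (kt->op)
    {
        case KT_EQ:
            return v == kt->constant;
        case KT_LT:
            return v < kt->constant;
        case KT_GT:
            return v > kt->constant;
    }
    return false;
}
```

A caller would build one `FusedKeyTest` per qualifying clause at plan initialization and call `fused_keytest` per tuple. Whether that is still worthwhile once expressions are JITed is exactly the open question raised in the quoted text.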
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 7 December 2016 at 14:39, Craig Ringer wrote:
> On 7 December 2016 at 04:13, Robert Haas wrote:
>
>> I wonder how feasible it would be to make this a run-time dependency
>> rather than a compile option.
>
> Or something that's compiled with the server, but produces a separate
> .so that's the only thing that links to LLVM. So packagers can avoid a
> dependency on LLVM for postgres.

Ahem, next time I'll finish the thread first. Nevermind.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 7 December 2016 at 04:13, Robert Haas wrote:

> I wonder how feasible it would be to make this a run-time dependency
> rather than a compile option.

Or something that's compiled with the server, but produces a separate
.so that's the only thing that links to LLVM. So packagers can avoid a
dependency on LLVM for postgres.

I suspect it wouldn't be worth the complexity, the added indirection
necessary, etc. If you're using packages then pulling in LLVM isn't a
big deal. If you're not, then don't use --with-llvm .

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 13:27:14 -0800, Peter Geoghegan wrote:
> On Mon, Dec 5, 2016 at 7:49 PM, Andres Freund wrote:
> > I tried to address 2) by changing the C implementation. That brings some
> > measurable speedups, but it's not huge. A bigger speedup is making
> > slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers;
> > but it's still not huge. Finally I turned to just-in-time (JIT)
> > compiling the code for tuple deforming. That doesn't save the cost of
> > 1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The
> > first part is done in 0008, the JITing in 0012.
>
> A more complete motivating example would be nice. For example, it
> would be nice to see the overall speedup for some particular TPC-H
> query.

Well, it's a bit WIP-y for that - not all TPCH queries run JITed yet, as
I've not done that for enough expression types... And you run quickly
into other bottlenecks.

But here we go for TPCH (scale 10) Q01:

master:
Time: 33885.381 ms

  16.29%  postgres  postgres        [.] slot_getattr
  12.85%  postgres  postgres        [.] ExecMakeFunctionResultNoSets
  10.85%  postgres  postgres        [.] advance_aggregates
   6.91%  postgres  postgres        [.] slot_deform_tuple
   6.70%  postgres  postgres        [.] advance_transition_function
   4.59%  postgres  postgres        [.] ExecProject
   4.25%  postgres  postgres        [.] float8_accum
   3.69%  postgres  postgres        [.] tuplehash_insert
   2.39%  postgres  postgres        [.] float8pl
   2.20%  postgres  postgres        [.] bpchareq
   2.03%  postgres  postgres        [.] check_stack_depth

profile:
(note that all expression evaluated things are distributed among many
functions)

dev (no jiting):
Time: 30343.532 ms

profile:
  16.57%  postgres  postgres        [.] slot_deform_tuple
  13.39%  postgres  postgres        [.] ExecEvalExpr
   8.64%  postgres  postgres        [.] advance_aggregates
   8.58%  postgres  postgres        [.] advance_transition_function
   5.83%  postgres  postgres        [.] float8_accum
   5.14%  postgres  postgres        [.] tuplehash_insert
   3.89%  postgres  postgres        [.] float8pl
   3.60%  postgres  postgres        [.] slot_getattr
   2.66%  postgres  postgres        [.] bpchareq
   2.56%  postgres  postgres        [.] heap_getnext

dev (jiting):
SET jit_tuple_deforming = on;
SET jit_expressions = true;

Time: 24439.803 ms

profile:
  11.11%  postgres  postgres        [.] slot_deform_tuple
  10.87%  postgres  postgres        [.] advance_aggregates
   9.74%  postgres  postgres        [.] advance_transition_function
   6.53%  postgres  postgres        [.] float8_accum
   5.25%  postgres  postgres        [.] tuplehash_insert
   4.31%  postgres  perf-10698.map  [.] deform0
   3.68%  postgres  perf-10698.map  [.] evalexpr6
   3.53%  postgres  postgres        [.] slot_getattr
   3.41%  postgres  postgres        [.] float8pl
   2.84%  postgres  postgres        [.] bpchareq

(note how expression eval went from 13.39% to roughly 4%)

The slot_deform_tuple cost here is primarily cache misses. If you do the
"memory order" iteration, it drops significantly.

The JIT generated code still leaves a lot on the table, i.e. this is
definitely not the best we can do. We also deform half the tuple twice,
because I've not yet added support for starting to deform in the middle
of a tuple.

Independent of new expression evaluation and/or JITing, if you make
advance_aggregates and advance_transition_function inline functions (or
you do profiling accounting for children), you'll notice that ExecAgg()
+ advance_aggregates + advance_transition_function themselves take up
about 20% cpu-time. That's *not* including the hashtable management,
the actual transition functions, and such themselves.

If you have queries where tuple deforming is a bigger proportion of the
load, or where expression evaluation (including projection) is a larger
part (any NULLs e.g.) you can get a lot bigger wins, even without
actually optimizing the generated code (which I've not yet done).

Just btw: float8_accum really should use an internal aggregation type
instead of using postgres array...

Andres
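Editorial sketch: the three timings above can be put into relative terms with a few lines of C. The helper names `runtime_reduction` and `speedup_factor` are made up for this illustration. The inputs are the times reported in the message and nothing else is assumed.

```c
#include <assert.h>

/*
 * Fraction of the runtime that was saved when going from before_ms to
 * after_ms, e.g. 0.25 means the query now takes 25% less time.
 */
double
runtime_reduction(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (before_ms - after_ms) / before_ms;
}

/* How many times faster the new run is, e.g. 1.25 means "1.25x". */
double
speedup_factor(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}
```

With the reported numbers, `runtime_reduction(30343.532, 24439.803)` is about 0.195 (dev without JIT to dev with JIT), `runtime_reduction(33885.381, 30343.532)` is about 0.105 (master to dev without JIT), and `runtime_reduction(33885.381, 24439.803)` is about 0.279 (master to dev with JIT).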
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Mon, Dec 5, 2016 at 7:49 PM, Andres Freund wrote:
> I tried to address 2) by changing the C implementation. That brings some
> measurable speedups, but it's not huge. A bigger speedup is making
> slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers;
> but it's still not huge. Finally I turned to just-in-time (JIT)
> compiling the code for tuple deforming. That doesn't save the cost of
> 1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The
> first part is done in 0008, the JITing in 0012.

A more complete motivating example would be nice. For example, it
would be nice to see the overall speedup for some particular TPC-H
query.

--
Peter Geoghegan
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 06, 2016 at 12:36:41PM -0800, Andres Freund wrote:
> On 2016-12-06 15:25:44 -0500, Tom Lane wrote:
> > I'm not entirely thrilled with the idea of this being a configure-time
> > decision, because that forces packagers to decide for their entire
> > audience whether it's okay to depend on LLVM. That would be an untenable
> > position to put e.g. Red Hat's packagers in: either they screw the people
> > who want performance or they screw the people who want security.

There's no security issue. The dependency is on LLVM libraries, not
LLVM front-ends (e.g., clang(1)).

I don't think there's a real issue as to distros/packagers/OS vendors.
They already have to package LLVM, and they already package LLVM
libraries separately from LLVM front-ends.

> The argument for not installing a C compiler seems to be that it makes it
> less convenient to build an executable. I doubt that having a C(++)
> library for code generation is convenient enough to change the picture
> there.

The security argument goes back to the days of the Morris worm, which
depended on having developer tools (specifically in that case, ld(1),
the link-editor). But JIT via LLVM won't give hackers a way to generate
or link arbitrary object code.

Nico
--
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 14:35:43 -0600, Nico Williams wrote:
> On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
> > On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
> > > > I concur with your feeling that hand-rolled JIT is right out. But
> > >
> > > Yeah, that way lies maintenance madness.
> >
> > I'm not quite that sure about that. I had a lot of fun doing some
> > hand-rolled x86 JITing. Not that is a ward against me being mad. But
> > more seriously: Manually doing a JIT gives you a lot faster compilation
> > times, which makes JIT applicable in a lot more situations.
>
> What I meant is that each time there are new ISA extensions, or
> differences in how relevant/significant different implementations of the
> same ISA implement certain instructions, and/or every time you want to
> add a new architecture... someone has to do a lot of very low-level
> work.

Yea, that's why I didn't pursue this path further. I *personally* think
it'd be perfectly fine to only support JITing on linux x86_64 and
aarch64 for now. And those I'd be willing to work on. But since I know
that's not project policy...

- Andres
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 15:25:44 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
> >> I guess the $64 question that has to be addressed here is whether we're
> >> prepared to accept LLVM as a run-time dependency. There are some reasons
> >> why we might not be:
>
> > Indeed. It'd only be a soft dependency obviously.
>
> Oh, so we'd need to maintain both the LLVM and the traditional expression
> execution code? That seems like a bit of a pain, but maybe we can live
> with it.

Yea, that's why I converted the "traditional" expression evaluation into
a different format first - that way the duplication is a lot
lower. E.g. scalar var eval looks like:

    EEO_CASE(EEO_INNER_VAR):
        {
            int attnum = op->d.var.attnum;

            Assert(op->d.var.attnum >= 0);
            *op->resnull = innerslot->tts_isnull[attnum];
            *op->resvalue = innerslot->tts_values[attnum];

            EEO_DISPATCH(op);
        }

in normal evaluation and like

    case EEO_INNER_VAR:
        {
            LLVMValueRef value, isnull;
            LLVMValueRef v_attnum;

            v_attnum = LLVMConstInt(LLVMInt32Type(), op->d.var.attnum, false);
            value = LLVMBuildLoad(builder, LLVMBuildGEP(builder, v_innervalues, &v_attnum, 1, ""), "");
            isnull = LLVMBuildLoad(builder, LLVMBuildGEP(builder, v_innernulls, &v_attnum, 1, ""), "");
            LLVMBuildStore(builder, value, v_resvaluep);
            LLVMBuildStore(builder, isnull, v_resnullp);

            LLVMBuildBr(builder, opblocks[i + 1]);
            break;
        }

for JITed evaluation.

> I'm not entirely thrilled with the idea of this being a configure-time
> decision, because that forces packagers to decide for their entire
> audience whether it's okay to depend on LLVM. That would be an untenable
> position to put e.g. Red Hat's packagers in: either they screw the people
> who want performance or they screw the people who want security.

Hm. I've a bit of a hard time buying the security argument here. Having
LLVM (not clang!) installed doesn't really change the picture that
much. In either case you can install binaries, and you're very likely
already using some program that does JIT internally. And postgres itself
gives you plenty of ways to execute arbitrary code as superuser.

The argument for not installing a C compiler seems to be that it makes it
less convenient to build an executable. I doubt that having a C(++)
library for code generation is convenient enough to change the picture
there.

> I think it'd be all right if we can build this so that the direct
> dependency on LLVM is confined to a separately-packageable extension.
> That way, a packager can produce a core postgresql-server package
> that does not require LLVM, plus a postgresql-llvm package that does,
> and the "no compiler please" crowd simply doesn't install the latter
> package.

That should be possible, but I'm not sure it's worth the effort. The JIT
infrastructure will need resowner integration and such. We can obviously
split things so that part is independent of LLVM, but I'm unconvinced
that the benefit is large enough.

> The alternative would be to produce two independent builds of the
> server, which I suppose might be acceptable but it sure seems like
> a kluge, or at least something that simply wouldn't get done by
> most vendors.

Hm. We could make that a make target ourselves ;)

Regards,

Andres
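Editorial sketch: the "different format" mentioned above is a flat array of steps that a small loop walks through, which is what lets the interpreter and the JIT share one description of the expression. The following standalone C fragment illustrates that shape with a plain `switch`. It is not the patch's code: `DemoStep`, `DemoSlot`, `StepOp` and `demo_eval` are hypothetical names, values are assumed to be 64-bit integers, and results go into a tiny register file rather than through result pointers as in the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NREGS 4

typedef enum { STEP_INNER_VAR, STEP_CONST, STEP_ADD, STEP_DONE } StepOp;

/* One entry of the flat "program" an expression tree is compiled into. */
typedef struct DemoStep
{
    StepOp  opcode;
    int     attnum;     /* STEP_INNER_VAR: column to fetch */
    int64_t constval;   /* STEP_CONST: value to load */
    int     resreg;     /* register receiving the result */
    int     lreg;       /* STEP_ADD: left input register */
    int     rreg;       /* STEP_ADD: right input register */
} DemoStep;

/* Hypothetical stand-in for an already-deformed tuple slot. */
typedef struct DemoSlot
{
    const int64_t *values;
    const bool    *isnull;
} DemoSlot;

/* Walk the step array until STEP_DONE; NULL inputs make the sum NULL. */
int64_t
demo_eval(const DemoStep *steps, const DemoSlot *innerslot, bool *isnull)
{
    int64_t regval[DEMO_NREGS] = {0};
    bool    regnull[DEMO_NREGS] = {false};
    const DemoStep *op = steps;

    for (;;)
    {
        switch (op->opcode)
        {
            case STEP_INNER_VAR:
                assert(op->attnum >= 0);
                regval[op->resreg] = innerslot->values[op->attnum];
                regnull[op->resreg] = innerslot->isnull[op->attnum];
                break;
            case STEP_CONST:
                regval[op->resreg] = op->constval;
                regnull[op->resreg] = false;
                break;
            case STEP_ADD:
                regnull[op->resreg] = regnull[op->lreg] || regnull[op->rreg];
                regval[op->resreg] = regnull[op->resreg]
                    ? 0
                    : regval[op->lreg] + regval[op->rreg];
                break;
            case STEP_DONE:
                *isnull = regnull[op->resreg];
                return regval[op->resreg];
        }
        op++;   /* the role EEO_DISPATCH plays in the quoted code */
    }
}
```

A JIT backend would loop over the same array once at compile time and emit code for each opcode, in the way the quoted `case EEO_INNER_VAR` emits loads and stores, instead of executing the step directly. The sketch ignores integer overflow, which a real addition operator would have to check.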
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
> On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
> > A bigger concern might be interface stability. IIRC the LLVM C/C++
> > interfaces are not very stable, but bitcode is.
>
> The C API is a lot more stable than the C++ bit, that's the primary
> reason I ended up using it, despite the C++ docs being better.

Ah.

> > > I concur with your feeling that hand-rolled JIT is right out. But
> >
> > Yeah, that way lies maintenance madness.
>
> I'm not quite that sure about that. I had a lot of fun doing some
> hand-rolled x86 JITing. Not that is a ward against me being mad. But
> more seriously: Manually doing a JIT gives you a lot faster compilation
> times, which makes JIT applicable in a lot more situations.

What I meant is that each time there are new ISA extensions, or
differences in how relevant/significant different implementations of the
same ISA implement certain instructions, and/or every time you want to
add a new architecture... someone has to do a lot of very low-level
work.

> > > I'm not sure that whatever performance gain we might get in this
> > > direction is worth the costs.
> >
> > Byte-/bit-coding query plans then JITting them is very likely to improve
> > performance significantly.
>
> Note that what I'm proposing is a far cry away from that - this converts
> two (performance wise two, size wise one) significant subsystems, but far
> from all the executors to be JIT able. I think there's some more low

Yes, I know.

> hanging fruits (particularly aggregate transition functions), but
> converting everything seems to hit the wrong spot in the
> benefit/effort/maintainability triangle.

Maybe? At least with the infrastructure in place for it someone might
try it and see.

Nico
--
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi,

On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
> A bigger concern might be interface stability. IIRC the LLVM C/C++
> interfaces are not very stable, but bitcode is.

The C API is a lot more stable than the C++ bit, that's the primary
reason I ended up using it, despite the C++ docs being better.

> > I concur with your feeling that hand-rolled JIT is right out. But
>
> Yeah, that way lies maintenance madness.

I'm not quite that sure about that. I had a lot of fun doing some
hand-rolled x86 JITing. Not that that is a ward against me being mad. But
more seriously: Manually doing a JIT gives you a lot faster compilation
times, which makes JIT applicable in a lot more situations.

> > I'm not sure that whatever performance gain we might get in this
> > direction is worth the costs.
>
> Byte-/bit-coding query plans then JITting them is very likely to improve
> performance significantly.

Note that what I'm proposing is a far cry away from that - this converts
two (performance wise two, size wise one) significant subsystems, but far
from all the executors to be JIT able. I think there's some more low
hanging fruits (particularly aggregate transition functions), but
converting everything seems to hit the wrong spot in the
benefit/effort/maintainability triangle.

- Andres
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Andres Freund writes:
> On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
>> I guess the $64 question that has to be addressed here is whether we're
>> prepared to accept LLVM as a run-time dependency. There are some reasons
>> why we might not be:

> Indeed. It'd only be a soft dependency obviously.

Oh, so we'd need to maintain both the LLVM and the traditional expression
execution code? That seems like a bit of a pain, but maybe we can live
with it.

>> * How will we answer people who say they can't accept having a compiler
>> installed on their production boxes for security reasons?

> I think they'll just not enable --with-llvm in that case, and get
> inferior performance. Note that installing llvm does not imply
> installing a full blown C compiler (although I think that's largely
> moot, you can get pretty much the same things done with just compiling
> LLVM IR).

I'm not entirely thrilled with the idea of this being a configure-time
decision, because that forces packagers to decide for their entire
audience whether it's okay to depend on LLVM. That would be an untenable
position to put e.g. Red Hat's packagers in: either they screw the people
who want performance or they screw the people who want security.

I think it'd be all right if we can build this so that the direct
dependency on LLVM is confined to a separately-packageable extension.
That way, a packager can produce a core postgresql-server package
that does not require LLVM, plus a postgresql-llvm package that does,
and the "no compiler please" crowd simply doesn't install the latter
package.

The alternative would be to produce two independent builds of the
server, which I suppose might be acceptable but it sure seems like
a kluge, or at least something that simply wouldn't get done by
most vendors.

			regards, tom lane
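Editorial sketch: one way to picture the "separately-packageable extension" idea is a provider hook that the core server consults and that stays empty unless an optional module registers itself. The C fragment below is only an illustration of that shape under stated assumptions. `DemoJitProvider`, `demo_register_jit_provider`, `demo_get_evaluator` and the fake provider are hypothetical names, no LLVM is involved, and the actual loading of a shared library is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An evaluator for some fixed expression, here simply "input + 1". */
typedef int (*ExprEvalFn)(int input);

/* What an optional, separately packaged module would hand to the core. */
typedef struct DemoJitProvider
{
    /* Returns a compiled evaluator, or NULL if compilation is not possible. */
    ExprEvalFn (*compile_expr)(void);
} DemoJitProvider;

/* Stays NULL when the optional package is not installed. */
static const DemoJitProvider *demo_jit_provider = NULL;

void
demo_register_jit_provider(const DemoJitProvider *provider)
{
    assert(provider == NULL || provider->compile_expr != NULL);
    demo_jit_provider = provider;
}

/* The always-available interpreted path that the core keeps maintaining. */
int
demo_interpret_expr(int input)
{
    return input + 1;
}

/* Prefer the provider if present and willing, else fall back. */
ExprEvalFn
demo_get_evaluator(bool jit_enabled)
{
    if (jit_enabled && demo_jit_provider != NULL)
    {
        ExprEvalFn fn = demo_jit_provider->compile_expr();

        if (fn != NULL)
            return fn;
    }
    return demo_interpret_expr;
}

/* Stand-ins for what the optional module would supply. */
int
demo_fake_jitted_expr(int input)
{
    return input + 1;
}

ExprEvalFn
demo_fake_compile(void)
{
    return demo_fake_jitted_expr;
}

const DemoJitProvider demo_fake_provider = { demo_fake_compile };
```

The point of the design is that the core never references LLVM symbols directly. Only the module behind the provider struct would link against it, which is also why both evaluation paths have to produce identical results.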
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 06, 2016 at 01:56:28PM -0500, Tom Lane wrote:
> Andres Freund writes:
> > I'm posting a quite massive series of WIP patches here, to get some
> > feedback.
>
> I guess the $64 question that has to be addressed here is whether we're
> prepared to accept LLVM as a run-time dependency. There are some reasons
> why we might not be:
>
> * The sheer mass of the dependency. What's the installed footprint of
> LLVM, versus a Postgres server? How hard is it to install from source?

As long as it's optional, does this matter?

A bigger concern might be interface stability. IIRC the LLVM C/C++
interfaces are not very stable, but bitcode is.

> * How will we answer people who say they can't accept having a compiler
> installed on their production boxes for security reasons?

You don't need the front-ends (e.g., clang) installed in order to JIT.

> * Are there any currently-interesting platforms that LLVM doesn't work
> for? (I'm worried about RISC-V as much as legacy systems.)

The *BSDs support more platforms than LLVM does, that's for sure.
(NetBSD supports four more, IIRC, including ia64.) But the patches make
LLVM optional anyways, so this should be a non-issue.

> I concur with your feeling that hand-rolled JIT is right out. But

Yeah, that way lies maintenance madness.

> I'm not sure that whatever performance gain we might get in this
> direction is worth the costs.

Byte-/bit-coding query plans then JITting them is very likely to improve
performance significantly. Whether you want the maintenance overhead is
another story.

Sometimes byte-coding + interpretation yields a significant improvement
by reducing cache pressure on the icache and the size of the program to
be interpreted. Having the option to JIT or not JIT might be useful.

Nico
--
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 15:13:21 -0500, Robert Haas wrote:
> Presumably this is going to need to be something that a user can get
> via yum install or apt-get install on common systems.

Right. apt-get install llvm-dev (or llvm-3.9-dev or such if you want to
install a specific version), does the trick here.

It's a bit easier to develop with a hand compiled version, because then
LLVM adds a boatload of asserts to its IR builder, which catches a fair
amount of mistakes. Nothing you'd run in production though (just like
you don't use a cassert build...).

> I wonder how feasible it would be to make this a run-time dependency
> rather than a compile option. That's probably overcomplicating
> things, but...

I don't think that's feasible at all unfortunately - the compiler IR
(which then is JITed by LLVM) is generated via another C API. We could
rebuild that one, but that'd be a lot of work.

Andres
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 6, 2016 at 2:10 PM, Andres Freund wrote:
>> * The sheer mass of the dependency. What's the installed footprint of
>> LLVM, versus a Postgres server? How hard is it to install from source?
>
> Worked for me first try, but I'm perhaps not the best person to judge.
> It does take a while to compile though (~20min on my laptop).

Presumably this is going to need to be something that a user can get
via yum install or apt-get install on common systems.

I wonder how feasible it would be to make this a run-time dependency
rather than a compile option. That's probably overcomplicating
things, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 11:10:59 -0800, Andres Freund wrote:
> > * Are there any currently-interesting platforms that LLVM doesn't work
> > for? (I'm worried about RISC-V as much as legacy systems.)
>
> LLVM itself I don't think is a problem, it seems to target a wide range
> of platforms. The platforms that don't support JIT compiling might be a
> bit larger, since that involves more than just generating code.

The OS specific part is handling the executable format. The JIT we'd be
using (MCJIT) has support for ELF, MachO, and COFF. The architecture
specific bits seem to be there for x86, arm (little endian and big
endian), aarch64 (64-bit arm, again be/le), mips, ppc64.

Somebody is working on RISC-V support for llvm (i.e. it appears to be
working, but is not merged) - but given it's not integrated into gcc
either, I'm not seeing that being an argument.

Greetings,

Andres Freund
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On 2016-12-06 14:04:09 -0500, Robert Haas wrote:
> I've heard at least one and maybe several PGCon presentations about
> people JITing tuple deformation and getting big speedups, and I'd like
> to finally hear one from somebody who intends to integrate that into
> PostgreSQL.

I certainly want to.
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi,

On 2016-12-06 13:56:28 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I'm posting a quite massive series of WIP patches here, to get some feedback.
>
> I guess the $64 question that has to be addressed here is whether we're prepared to accept LLVM as a run-time dependency. There are some reasons why we might not be:

Indeed. It'd only be a soft dependency obviously.

> * The sheer mass of the dependency. What's the installed footprint of LLVM, versus a Postgres server? How hard is it to install from source?

Worked for me first try, but I'm perhaps not the best person to judge. It does take a while to compile though (~20min on my laptop).

> * How will we answer people who say they can't accept having a compiler installed on their production boxes for security reasons?

I think they'll just not enable --with-llvm in that case, and get inferior performance. Note that installing llvm does not imply installing a full blown C compiler (although I think that's largely moot, you can get pretty much the same things done with just compiling LLVM IR).

> * Are there any currently-interesting platforms that LLVM doesn't work for? (I'm worried about RISC-V as much as legacy systems.)

LLVM itself I don't think is a problem, it seems to target a wide range of platforms. The platforms that don't support JIT compiling might be a bit larger, since that involves more than just generating code.

> I concur with your feeling that hand-rolled JIT is right out. But I'm not sure that whatever performance gain we might get in this direction is worth the costs.

Well, I'm not impartial, but I don't think we do our users a service by leaving significant speedups untackled, and after spending a *LOT* of time on this, I don't see much other choice than JITing. Note that nearly everything performance sensitive is moving towards doing JITing in some form or another.
Greetings, Andres Freund
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
On Tue, Dec 6, 2016 at 1:56 PM, Tom Lane wrote:
> Andres Freund writes:
>> I'm posting a quite massive series of WIP patches here, to get some feedback.
>
> I guess the $64 question that has to be addressed here is whether we're prepared to accept LLVM as a run-time dependency. There are some reasons why we might not be:
>
> * The sheer mass of the dependency. What's the installed footprint of LLVM, versus a Postgres server? How hard is it to install from source?
>
> * How will we answer people who say they can't accept having a compiler installed on their production boxes for security reasons?
>
> * Are there any currently-interesting platforms that LLVM doesn't work for? (I'm worried about RISC-V as much as legacy systems.)

I think anything that requires LLVM -- or, for that matter, anything that does JIT by any means -- has got to be optional. But I don't think --with-llvm as a compile option is inherently problematic. Also, I think this is probably a direction we need to go. I've heard at least one and maybe several PGCon presentations about people JITing tuple deformation and getting big speedups, and I'd like to finally hear one from somebody who intends to integrate that into PostgreSQL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
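The "optional via a compile option" position can be pictured as a build in which the fast path only exists when a configure switch defines a macro, while the interpreted path is always present. This is a minimal sketch under stated assumptions: `USE_EXAMPLE_JIT` is a hypothetical macro standing in for whatever a `--with-llvm` style switch would define, and `jit_add()` is a stand-in for generated code rather than anything from the patches.

```c
#include <stdbool.h>

/*
 * USE_EXAMPLE_JIT is a hypothetical macro standing in for whatever a
 * --with-llvm style configure switch would define.
 */

#ifdef USE_EXAMPLE_JIT
/* Stand-in for JIT-generated code; only built when the option is enabled. */
static int
jit_add(int a, int b)
{
    return a + b;
}
#else
/* The portable interpreted path, used when the option is not enabled. */
static int
interp_add(int a, int b)
{
    return a + b;
}
#endif

/* Reports whether this build contains the optional fast path. */
static bool
jit_compiled_in(void)
{
#ifdef USE_EXAMPLE_JIT
    return true;
#else
    return false;
#endif
}

/* Callers get the same result either way; only the speed would differ. */
static int
eval_add(int a, int b)
{
#ifdef USE_EXAMPLE_JIT
    return jit_add(a, b);
#else
    return interp_add(a, b);
#endif
}
```

A build without the macro still answers every query, just through the slower path, which is the "not enable --with-llvm and get inferior performance" outcome described elsewhere in the thread.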
Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Andres Freund writes:
> I'm posting a quite massive series of WIP patches here, to get some feedback.

I guess the $64 question that has to be addressed here is whether we're prepared to accept LLVM as a run-time dependency. There are some reasons why we might not be:

* The sheer mass of the dependency. What's the installed footprint of LLVM, versus a Postgres server? How hard is it to install from source?

* How will we answer people who say they can't accept having a compiler installed on their production boxes for security reasons?

* Are there any currently-interesting platforms that LLVM doesn't work for? (I'm worried about RISC-V as much as legacy systems.)

I concur with your feeling that hand-rolled JIT is right out. But I'm not sure that whatever performance gain we might get in this direction is worth the costs.

regards, tom lane
[HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Hi Everyone,

TL;DR: Making things faster. Architectural evaluation.

As some of you might be aware, I've been working on making execution of larger queries in PostgreSQL faster. While working on "batched execution" I came to the conclusion that it, while necessary, isn't currently showing a large benefit because expression evaluation and tuple deforming are massive bottlenecks.

I'm posting a quite massive series of WIP patches here, to get some feedback.

Tuple deforming is slow for two reasons:

1) It's the first thing that accesses tuples, i.e. it'll often incur cache misses. That's partially fundamental, but also partially can be addressed, e.g. through changing the access order in heap as in [1].
2) Tuple deforming has a lot of unpredictable branches, because it has to cope with various types of fields. We e.g. perform alignment in a lot of unneeded cases, do null checks for NOT NULL columns et al.

I tried to address 2) by changing the C implementation. That brings some measurable speedups, but it's not huge. A bigger speedup is making slot_getattr, slot_getsomeattrs, slot_getallattrs very trivial wrappers; but it's still not huge. Finally I turned to just-in-time (JIT) compiling the code for tuple deforming. That doesn't save the cost of 1), but it gets rid of most of 2) (from ~15% to ~3% in TPCH-Q01). The first part is done in 0008, the JITing in 0012.

Expression evaluation and projection is another major bottleneck.

1) Our recursive expression evaluation puts a *lot* of pressure on the stack.
2) There's a lot of indirect function calls when recursing to other expression nodes. These are hard to predict, because the same node type (say ExecEvalAnd()) is used in different parts of an expression tree, and invokes different sub-nodes.
3) The function calls to operators and other functions are hard to predict, leading to a significant number of pipeline stalls.
4) There's a fair amount of pg_list.h list style iteration going on, those are cache and pipeline inefficient.

After some experimenting I came to the conclusion that the recursive processing is a fundamental impediment to making this faster. I've converted (0006) expression processing and projection into an opcode dispatch based interpreter. That yields, especially for complex expressions and larger projections, a significant speedup in itself. But similarly to the deforming, expression evaluation remains a bottleneck after that, primarily because there's still a lot of unpredictable jumps and calls, and because loads/stores have to be complex (e.g. ExprContext->ecxt_innertuple->tts_values[i]/tts_isnull[i] for a single scalar var evaluation).

Using the opcode based representation of expression evaluation (as it's nearly linear, and has done a lot of the lookups ahead of time), it's actually quite easy to JIT compile. *After JITing expression evaluation itself is more than ten times faster than before*. But unfortunately that doesn't mean that queries are ten times faster - usually we'll hit bottlenecks elsewhere relatively soon.

WRT expression evaluation, the biggest cost afterwards is the relatively high overhead of V1 function calls - register based parameter passing is a lot faster.

After experimenting a bit with doing JITing manually (a lot of eye-stabbing kind of fun), I chose to use LLVM.

An overview of the patch-queue so far:

0001 Make get_last_attnums more generic.

Boring prerequisite.

0002 More efficient AggState->pertrans iteration.

Relatively boring minor optimization, but it turns out to be an easily hit bottleneck. Will commit independently.

0003 Avoid materializing SRFs in the FROM list.
0004 Allow ROWS FROM to return functions as single record column.
0005 Basic implementation of targetlist SRFs via ROWS FROM.
0006 Remove unused code related to targetlist SRFs.
These are basically just pre-requisites for the faster expression evaluation, and discussed elsewhere [2]. This implementation is *NOT* going to survive, because we ended up coming to the conclusion that using a separate executor node to expand SRFs is a better plan. But the new expression evaluation code won't be able to handle SRFs...

0007 WIP: Optimize slot_deform_tuple() significantly.

This a) turns tuple deforming into an opcode based dispatch loop (using computed goto on gcc/clang). b) moves a lot of the logic from slot_deform_tuple() callsites into itself - that turns out to be more efficient. I'm not entirely sure it's worth doing the opcode based dispatch part, if we're going to also do the JIT bit - it's a fair amount of code, and the speed difference only matters on large numbers of rows.

0008 WIP: Faster expression processing and targetlist projection.

This, functionally nearly complete, patch turns expression evaluation (and tuple deforming as a special case of that) into a "mini language" which is interpreted using either a while(true) switch(opcode) or computed goto to jump from opcode to opcode. It does