Hi,

As previously mentioned, tuple deforming is a major bottleneck, and JITing it can be highly beneficial. I previously posted a prototype that does JITing at the slot_deform_tuple() level, caching the generated deforming function in the tupledesc.
Storing things in the tupledesc isn't a great concept however - the lifetime of the generated function is hard to manage. But more importantly, and even if we moved this into the slot, it precludes important optimizations.

JITing the deforming is a *lot* more efficient if we can combine it with the JITing of the expressions using the deformed columns. There are a couple of reasons for that:

1) By knowing the exact attnum the caller is going to request, the code can be optimized. No need to generate code for columns not deformed. If there are NOT NULL columns at/after the last to-be-deformed column, there's no need to generate checks about the length of the null-bitmap - getting rid of about half the branches!

2) By generating the deforming code as part of the generated expression code, both are generated together. That saves a good chunk of the code generation overhead and of the memory mapping overhead, and it noticeably reduces function call overhead (because relative near calls can be used).

3) LLVM's optimizer can inline parts / all of the tuple deforming code into the expression evaluation function, further reducing overhead. In simpler cases and with some additional prodding, LLVM can even interleave deforming of individual columns and their use (note that I'm not proposing to do so initially).

4) If we know that the underlying tuple is an actual nonvirtual tuple, e.g. on the scan level, the slot deforming of NOT NULL columns can be replaced with direct byte accesses to the relevant column - a good chunk faster again (note that I'm not proposing to do so initially).

The problem however is that when generating the expression code we don't have the necessary information. In my current prototype I'm emitting the LLVM IR (the input to LLVM) at ExecInitExpr() time for all expressions in a tree. That allows emitting the code for all functions in an executor tree in one go.
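To make point 1) concrete, here is a minimal sketch in C. It is not the prototype's code: ToyTuple, toy_deform_generic() and toy_deform_first2_notnull() are hypothetical names made up for illustration, and the layout is a deliberately simplified stand-in for a real heap tuple (fixed-width int32 columns, a single byte of null bits). It contrasts a generic deforming loop with the kind of straight-line code a JIT could emit once it knows that only the first two columns are requested and that both are NOT NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ATTS 8

/* Hypothetical, simplified tuple; NOT PostgreSQL's HeapTupleHeader. */
typedef struct ToyTuple
{
    uint8_t natts;              /* attributes physically stored in the tuple */
    uint8_t nullbits;           /* bit i set => attribute i is NULL */
    int32_t data[TOY_MAX_ATTS]; /* non-NULL values, packed in column order */
} ToyTuple;

/*
 * Generic deforming: has to work for any tuple and any number of requested
 * columns, so every column pays for two checks (is the attribute stored at
 * all, and is its null bit set).
 */
void
toy_deform_generic(const ToyTuple *tup, int upto,
                   int32_t *values, bool *isnull)
{
    int off = 0;

    assert(upto <= TOY_MAX_ATTS);
    for (int i = 0; i < upto; i++)
    {
        if (i >= tup->natts || (tup->nullbits & (1u << i)))
        {
            values[i] = 0;
            isnull[i] = true;
        }
        else
        {
            values[i] = tup->data[off++];
            isnull[i] = false;
        }
    }
}

/*
 * Specialized deforming, as it might be generated for an expression that
 * only needs columns 0 and 1, both assumed to be declared NOT NULL.
 * No loop, no attribute-count check, no null-bit checks. The assert only
 * documents the precondition; generated code would not contain it.
 */
void
toy_deform_first2_notnull(const ToyTuple *tup,
                          int32_t *values, bool *isnull)
{
    assert(tup->natts >= 2 && (tup->nullbits & 0x3) == 0);
    values[0] = tup->data[0];
    isnull[0] = false;
    values[1] = tup->data[1];
    isnull[1] = false;
}
```

For a tuple that satisfies the precondition, both functions fill in the same values for the first two columns; the specialized one simply does so without any branches. Real tuples have variable-width and aligned columns, so actual generated code would be more involved than this.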
But unfortunately the current executor initialization "framework" doesn't provide information about the underlying slot tupledescs at that time. Nor does it actually guarantee that the tupledescs / slots stay the same over the course of the execution.

Therefore I'd like to somehow change things so that the executor keeps track of whether the tupledescs of inner/outer/scan are going to change, and if not, provides them. The right approach here seems to be to add a bit of extra data to ExecAssignScanType etc., and move ExecInitExpr / ExecInitQual / ExecAssignScanProjectionInfo / ... to after that. We could then keep track of the relevant tupledescs somewhere in PlanState - that's a bit ugly, but I don't quite see how to avoid that unless we want to add major executor-node awareness into expression evaluation.

Thoughts? Better ideas?

Greetings,

Andres Freund