Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-27 13:30:11 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-27 12:18:37 -0400, Tom Lane wrote: > >> My feeling at this point is that we might be better off disabling > >> the computed-goto case by default. At the very least, we're going > >> to need a version check that restricts it to latest gcc. > > > In my measurements it's still faster in at least gcc-5/6, even without > > the option (largely because it avoids array bounds checks on the jump > > table built for the switch). > > Hm. What test cases are you using? I used tpc-h - seems like a realistic enough testcase. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-27 12:18:37 -0400, Tom Lane wrote: >> My feeling at this point is that we might be better off disabling >> the computed-goto case by default. At the very least, we're going >> to need a version check that restricts it to latest gcc. > In my measurements it's still faster in at least gcc-5/6, even without > the option (largely because it avoids array bounds checks on the jump > table built for the switch). Hm. What test cases are you using? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-27 12:18:37 -0400, Tom Lane wrote: > I wrote: > > As to the point of whether it actually helps or not ... > > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*. We go from about half of > > the dispatches getting routed through a common location, to *all* of them > > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF > > survives as a separate jump). This seems like a bug, but there it is. > > So after a bit of googling, this is a very longstanding complaint: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284 > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785 > > (hm, think I know the second submitter) I don't think that's precisely the same issue - as long as some of the goto branches survive, we're not hitting the full brunt of the compgoto thing. I think we're essentially avoiding some of that because we're "precomputing" the dispatch_table lookup. To count the number of jumps I used: gdb -batch -ex 'disassemble/s ExecInterpExpr' execExprInterp.o|grep jmpq|grep -v ExecInterpExpr|wc -l which'll only include indirect jumps (since otherwise jumps will look like "jmpq 0x35f2 ") standard flags -fno-crossjumping gcc-5 (5.4.1-8):34 82 gcc-6 (6.3.0-8):34 82 gcc-7 (7.0.1): 71 108 gcc-snapshot: 72 108 So that doesn't look too bad. > My feeling at this point is that we might be better off disabling > the computed-goto case by default. At the very least, we're going > to need a version check that restricts it to latest gcc. In my measurements it's still faster in at least gcc-5/6, even without the option (largely because it avoids array bounds checks on the jump table built for the switch). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-27 11:52:05 -0400, Tom Lane wrote: > As to the point of whether it actually helps or not ... > > on gcc 6.3.1 (Fedora 25), yes it does. Seems to be one "jmp *something" > per EEO_NEXT or EEO_JUMP. > > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*. We go from about half of > the dispatches getting routed through a common location, to *all* of them > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF > survives as a separate jump). This seems like a bug, but there it is. Gah :(. I have gcc 5,6,7 here, but nothing earlier. I'm now inclined to go the version check routine, for the individual versions we can (and want) confirm this on... I'm not too concerned about not doing so on gcc 4.4 or older... > So this means we'd need some serious research to decide whether to apply > it. And I'm suspecting we'd end up with a compiler version test. Indeed. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I wrote: > As to the point of whether it actually helps or not ... > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*. We go from about half of > the dispatches getting routed through a common location, to *all* of them > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF > survives as a separate jump). This seems like a bug, but there it is. So after a bit of googling, this is a very longstanding complaint: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785 (hm, think I know the second submitter) I'm not sure we should be relying on a gcc bug fix that's barely four months old. Even assuming it fixes the issue without new regressions, most people are not going to have it anytime soon. My feeling at this point is that we might be better off disabling the computed-goto case by default. At the very least, we're going to need a version check that restricts it to latest gcc. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
As to the point of whether it actually helps or not ... on gcc 6.3.1 (Fedora 25), yes it does. Seems to be one "jmp *something" per EEO_NEXT or EEO_JUMP. on gcc 4.4.7 (RHEL 6), it makes things *WORSE*. We go from about half of the dispatches getting routed through a common location, to *all* of them (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF survives as a separate jump). This seems like a bug, but there it is. So this means we'd need some serious research to decide whether to apply it. And I'm suspecting we'd end up with a compiler version test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-27 11:22:40 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-27 09:33:43 -0400, Tom Lane wrote: > >> I think it would, primarily because if we find out that some other compiler > >> spells this differently, we could handle it totally within configure. > > > I'm unconvinced that we could sensibly map different compiler's options > > on the same option name - I'd be surprised if they would have a similar > > enough effect. > > [ shrug... ] OK, next question is whether pgindent treats this > layout sanely. The patch was run through pgindent, i.e. it seems to be ok with it. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-27 09:33:43 -0400, Tom Lane wrote: >> I think it would, primarily because if we find out that some other compiler >> spells this differently, we could handle it totally within configure. > I'm unconvinced that we could sensibly map different compiler's options > on the same option name - I'd be surprised if they would have a similar > enough effect. [ shrug... ] OK, next question is whether pgindent treats this layout sanely. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-27 09:33:43 -0400, Tom Lane wrote: > Andres Freund writes: > > Checking for this isn't entirely pretty - see my attached attempt at > > doing so. I considered hiding > > __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind > > a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that > > makes things better. > > I think it would, primarily because if we find out that some other compiler > spells this differently, we could handle it totally within configure. I'm unconvinced that we could sensibly map different compiler's options on the same option name - I'd be surprised if they would have a similar enough effect. > Isn't our practice to put __attribute__ at the end of a function > declaration or definition, not in the middle someplace? We could move it to the declaration - which doesn't seem like an improvement here, given that it's about optimization not API changes - but in definitions you can't move it after the function name: static Datum ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) #ifdef HAVE_FUNCATTR_NO_CROSSJUMPING __attribute__((optimize("no-crossjumping"))) #endif : /home/andres/src/postgresql/src/backend/executor/execExprInterp.c:279:1: error: attributes should be specified before the declarator in a function definition Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: >> I personally find per-function annotation ala >> __attribute__((optimize("no-crossjumping"))) >> cleaner anyway. I tested that, and it seems to work. >> >> Obviously we'd have to hide that behind a configure test. Could also do >> tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier. Agreed. > Checking for this isn't entirely pretty - see my attached attempt at > doing so. I considered hiding > __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind > a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that > makes things better. I think it would, primarily because if we find out that some other compiler spells this differently, we could handle it totally within configure. Isn't our practice to put __attribute__ at the end of a function declaration or definition, not in the middle someplace? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-25 20:59:27 -0700, Andres Freund wrote: > On 2017-03-25 23:51:45 -0400, Tom Lane wrote: > > Andres Freund writes: > > > On March 25, 2017 4:56:11 PM PDT, Ants Aasma wrote: > > >> I haven't had the time to research this properly, but initial tests > > >> show that with GCC 6.2 adding > > >> > > >> #pragma GCC optimize ("no-crossjumping") > > >> > > >> fixes merging of the op tail jumps. > > >> > > >> Some quick and dirty benchmarking suggests that the benefit for the > > >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in > > >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local > > >> vars before the indirect jump is somewhere between a tiny benefit and > > >> no effect, certainly not worth introducing extra complexity. Clang 3.8 > > >> does the correct thing out of the box and is a couple of percent > > >> faster than GCC with the pragma. > > > > > That's large enough to be worth doing (although I recall you seeing all > > > jumps commonalized). We should probably do this on a per function basis > > > however (either using pragma push option, or function attributes). > > > > Seems like it would be fine to do it on a per-file basis. > > I personally find per-function annotation ala > __attribute__((optimize("no-crossjumping"))) > cleaner anyway. I tested that, and it seems to work. > > Obviously we'd have to hide that behind a configure test. Could also do > tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier. Checking for this isn't entirely pretty - see my attached attempt at doing so. I considered hiding __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that makes things better. Comments? Greetings, Andres Freund >From ce42086871ebfcbb7ae52266e9c4549b5958e80b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sun, 26 Mar 2017 22:38:21 -0700 Subject: [PATCH] Disable gcc's crossjumping optimization for ExecInterpExpr(). GCC merges code in ExecInterpExpr() too aggressively, reducing branch prediction accuracy. As this is performance critical code, go through the trouble of disabling the relevant optimization for the function. To do so, have to detect whether the compiler support __attribute__((optimize("no-crossjumping"))) as a function annotation. Do so in configure. Discussion: https://postgr.es/m/20170326035927.5mubkfdtaqlrg...@alap3.anarazel.de --- config/c-compiler.m4 | 35 ++ configure | 41 +++ configure.in | 1 + src/backend/executor/execExprInterp.c | 9 src/include/pg_config.h.in| 4 src/include/pg_config.h.win32 | 4 6 files changed, 94 insertions(+) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 3321f226f3..ac72539dec 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -297,6 +297,41 @@ fi])# PGAC_C_COMPUTED_GOTO +# PGAC_C_FUNCATTR_NO_CROSSJUMPING +# --- +# Check if the C compiler understands the +# __attribute__((optimize("no-crossjumping"))) function attribute. +# Define HAVE_FUNCATTR_NO_CROSSJUMPING if so. +# +# At some later point it might make sense to generalize this so we can +# check for other optimization flags, but so far there's no need for +# that. +# +# Have to enable Werror, as some compilers (e.g. clang) would +# otherwise just warn about an unknown type of attribute. +AC_DEFUN([PGAC_C_FUNCATTR_NO_CROSSJUMPING], +[AC_CACHE_CHECK([for function attribute disabling crossjumping optimizations], pgac_cv_funcattr_no_crossjumping, +[ac_save_c_werror_flag=$ac_c_werror_flag +ac_c_werror_flag=yes +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([ +extern int testme(void); +int +__attribute__((optimize("no-crossjumping"))) +testme(void) +{ +return 0; +} +])], +[pgac_cv_funcattr_no_crossjumping=yes], +[pgac_cv_funcattr_no_crossjumping=no]) +ac_c_werror_flag=$ac_save_c_werror_flag]) +if test x"$pgac_cv_funcattr_no_crossjumping" = xyes ; then +AC_DEFINE(HAVE_FUNCATTR_NO_CROSSJUMPING, 1, + [Define to 1 if your compiler understands __attribute__((optimize("no-crossjumping"))).]) +fi])# PGAC_C_FUNCATTR_NO_CROSSJUMPING + + + # PGAC_C_VA_ARGS # -- # Check if the C compiler understands C99-style variadic macros, diff --git a/configure b/configure index 4b8229e959..87c4799be3 100755 --- a/configure +++ b/configure @@ -11835,6 +11835,47 @@ if test x"$pgac_cv_computed_goto" = xyes ; then $as_echo "#define HAVE_COMPUTED_GOTO 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for function attribute disabling crossjumping optimizations" >&5 +$as_echo_n "checking for function attribute disabling crossjumping optimizations... " >&6; } +if ${pgac_cv_funcattr_no_crossjumping+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_save_c_werror_flag=$ac_c_werror_flag +ac_c_we
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-25 23:51:45 -0400, Tom Lane wrote: > Andres Freund writes: > > On March 25, 2017 4:56:11 PM PDT, Ants Aasma wrote: > >> I haven't had the time to research this properly, but initial tests > >> show that with GCC 6.2 adding > >> > >> #pragma GCC optimize ("no-crossjumping") > >> > >> fixes merging of the op tail jumps. > >> > >> Some quick and dirty benchmarking suggests that the benefit for the > >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in > >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local > >> vars before the indirect jump is somewhere between a tiny benefit and > >> no effect, certainly not worth introducing extra complexity. Clang 3.8 > >> does the correct thing out of the box and is a couple of percent > >> faster than GCC with the pragma. > > > That's large enough to be worth doing (although I recall you seeing all > > jumps commonalized). We should probably do this on a per function basis > > however (either using pragma push option, or function attributes). > > Seems like it would be fine to do it on a per-file basis. I personally find per-function annotation ala __attribute__((optimize("no-crossjumping"))) cleaner anyway. I tested that, and it seems to work. Obviously we'd have to hide that behind a configure test. Could also do tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier. > If you're > worried about pessimizing the out-of-line subroutines, we could move > those to a different file --- it's pretty questionable that they're > in execExprInterp.c in the first place, considering they're meant to be > used by more than just that execution method. I indeed am, but having the code in the same file has a minor advantage: It allows the compiler to partially inline them, if it feels like it (e.g. moving null checks inline). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On March 25, 2017 4:56:11 PM PDT, Ants Aasma wrote: >> I haven't had the time to research this properly, but initial tests >> show that with GCC 6.2 adding >> >> #pragma GCC optimize ("no-crossjumping") >> >> fixes merging of the op tail jumps. >> >> Some quick and dirty benchmarking suggests that the benefit for the >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local >> vars before the indirect jump is somewhere between a tiny benefit and >> no effect, certainly not worth introducing extra complexity. Clang 3.8 >> does the correct thing out of the box and is a couple of percent >> faster than GCC with the pragma. > That's large enough to be worth doing (although I recall you seeing all jumps > commonalized). We should probably do this on a per function basis however > (either using pragma push option, or function attributes). Seems like it would be fine to do it on a per-file basis. If you're worried about pessimizing the out-of-line subroutines, we could move those to a different file --- it's pretty questionable that they're in execExprInterp.c in the first place, considering they're meant to be used by more than just that execution method. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On March 25, 2017 4:56:11 PM PDT, Ants Aasma wrote: >On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund >wrote: >>> At least with current gcc (6.3.1 on Fedora 25) at -O2, >>> what I see is multiple places jumping to the same indirect jump >>> instruction :-(. It's not a total disaster: as best I can tell, all >the >>> uses of EEO_JUMP remain distinct. But gcc has chosen to implement >about >>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of >>> instructions that increment the "op" register and then do an >indirect >>> jump :-(. >> >> Yea, I see some of that too - "usually" when there's more than just >the >> jump in common. I think there's some gcc variables that influence >this >> (min-crossjump-insns (5), max-goto-duplication-insns (8)). Might be >> worthwhile experimenting with setting them locally via a pragma or >such. >> I think Aants wanted to experiment with that, too. > >I haven't had the time to research this properly, but initial tests >show that with GCC 6.2 adding > >#pragma GCC optimize ("no-crossjumping") > >fixes merging of the op tail jumps. > >Some quick and dirty benchmarking suggests that the benefit for the >interpreter is about 15% (5% speedup on a workload that spends 1/3 in >ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local >vars before the indirect jump is somewhere between a tiny benefit and >no effect, certainly not worth introducing extra complexity. Clang 3.8 >does the correct thing out of the box and is a couple of percent >faster than GCC with the pragma. That's large enough to be worth doing (although I recall you seeing all jumps commonalized). We should probably do this on a per function basis however (either using pragma push option, or function attributes). Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund wrote: >> At least with current gcc (6.3.1 on Fedora 25) at -O2, >> what I see is multiple places jumping to the same indirect jump >> instruction :-(. It's not a total disaster: as best I can tell, all the >> uses of EEO_JUMP remain distinct. But gcc has chosen to implement about >> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of >> instructions that increment the "op" register and then do an indirect >> jump :-(. > > Yea, I see some of that too - "usually" when there's more than just the > jump in common. I think there's some gcc variables that influence this > (min-crossjump-insns (5), max-goto-duplication-insns (8)). Might be > worthwhile experimenting with setting them locally via a pragma or such. > I think Aants wanted to experiment with that, too. I haven't had the time to research this properly, but initial tests show that with GCC 6.2 adding #pragma GCC optimize ("no-crossjumping") fixes merging of the op tail jumps. Some quick and dirty benchmarking suggests that the benefit for the interpreter is about 15% (5% speedup on a workload that spends 1/3 in ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local vars before the indirect jump is somewhere between a tiny benefit and no effect, certainly not worth introducing extra complexity. Clang 3.8 does the correct thing out of the box and is a couple of percent faster than GCC with the pragma. Regards, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-25 12:22:15 -0400, Tom Lane wrote: > More random musing ... have you considered making the jump-target fields > in expressions be relative rather than absolute indexes? That is, > EEO_JUMP would look like > > op += (stepno); \ > EEO_DISPATCH(); \ > > instead of > > op = &state->steps[stepno]; \ > EEO_DISPATCH(); \ > > I have not carried out a full patch to make this work, but just making > that one change and examining the generated assembly code looks promising. > Instead of this > > movslq 40(%r14), %r8 > salq$6, %r8 > addq24(%rbx), %r8 > movq%r8, %r14 > jmp *(%r8) > > we get this > > movslq 40(%r14), %rax > salq$6, %rax > addq%rax, %r14 > jmp *(%r14) That seems like a good idea. I've not done this in the committed version (and I don't think we necessarily need to this before the release), but fo rthe future it seems like a good plan. It makes sense that it's faster - there's no need to reference state->steps. > which certainly looks like it ought to be faster. Also, the real reason > I got interested in this at all is that with relative jumps, groups of > steps would be position-independent within the steps array, which would > enable some compile-time tricks that seem impractical with the current > definition. Indeed. > BTW, now that I've spent a bit of time looking at the generated assembly > code, I'm kind of disinclined to believe any arguments about how we have > better control over branch prediction with the jump-threading > implementation. I measured the performance difference between using it and not using it, and it came out a pretty clear plus. On gcc 6.3, gcc master snapshot, and clang-3.9. It's not just that more jumps are duplicated, it's also that the switch() always adds a boundary check. > At least with current gcc (6.3.1 on Fedora 25) at -O2, > what I see is multiple places jumping to the same indirect jump > instruction :-(. It's not a total disaster: as best I can tell, all the > uses of EEO_JUMP remain distinct. But gcc has chosen to implement about > 40 of the 71 uses of EEO_NEXT by jumping to the same couple of > instructions that increment the "op" register and then do an indirect > jump :-(. Yea, I see some of that too - "usually" when there's more than just the jump in common. I think there's some gcc variables that influence this (min-crossjump-insns (5), max-goto-duplication-insns (8)). Might be worthwhile experimenting with setting them locally via a pragma or such. I think Aants wanted to experiment with that, too. Then there's also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785 which causes some forms of computed goto (not ours I think) to be deoptimized in gcc. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > - The !caseExpr->defresult result branch is currently unreachable (and > its equivalent was before the patch) because transformCaseExpr() > generates a default expression. I'm inclined to replace the dead code > with an assertion. Any reason not to do that? Ah-hah. I'd noted that that code wasn't being reached when I did some code coverage checks this morning, but I hadn't found the cause yet. Yeah, replacing the if-test with an assert seems fine. > I think there's some argument to be made that we should move all of > execSRF.c's logic to their respective callsites. There's not really > much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() - > and at least the latter would even become a bit simpler if we didn't > support both callers. Possibly. All that code could stand to be rethought, probably, now that its mission is just to handle the SRF case. But at least all the cruft is in one place for the moment. And I don't think it's anything we have to fix before v10, it's just cosmetic. > Thanks a lot for your work on the patch! You're welcome! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-24 17:16:15 -0400, Tom Lane wrote: > Attached is an updated patch. I believe this is committable, but > since I whacked it around quite a bit, I'm sure you'll want to look > it over first. Indeed. Points: - It's going to take a while for me to get used to execExprInterp.c - I constantly try to switch to a nonexistant buffer ;) - Like your approach with /* Check that current tupdesc doesn't have more fields than we allocated */ in ExecEvalFieldStoreDeForm(). - I like EEO_FLAG_IS_QUAL. - I think at some point (not sure whether in your or in my revision) ScalarArrayOpExpr lost its permission check. Added that. - Added note that we knowingly don't perform permission checks for input/output funcs. - Both pre/post patch, CoerceViaIO doesn't invoke InvokeFunctionExecuteHook - I'm still unclear what that hook is useful for, so ... - The !caseExpr->defresult result branch is currently unreachable (and its equivalent was before the patch) because transformCaseExpr() generates a default expression. I'm inclined to replace the dead code with an assertion. Any reason not to do that? - I see you kept determination of the set of constraints to be checked for domain at initialization time. Made note that that a) might change b) could change behaviour. > Please be sure the commit message notes that function EXECUTE permissions > are now checked at executor startup not first call. We need to document > that in the v10 release notes as an incompatibility, and I'm sure we'll > forget if the commit log doesn't point it out. Done. > * As we discussed, ExecEvalWholeRowVar is now using a pretty inefficient > method for flattening tuples into datums. I will take a to-do item to > fix this. Thanks. > * ExecInitCheck is really just ExecInitExpr with a make_ands_explicit > call in front of it. It turned out that most of the places that were > (or should have been) calling it were doing a make_ands_implicit call > for no other reason than to satisfy its API. I changed most of those > to just call ExecInitExpr directly. There are just a couple of call > sites left, and I think probably those are likewise not really that > happy with this API --- but I didn't take the time to chase down where > the expressions were coming from in those cases. It seems possible > that we could remove ExecInitCheck/ExecPrepareCheck entirely. I'll > look into this later, too. Thanks^2. > * As we discussed, CaseTestValue/DomainValue are pretty messy and need > to be rethought. That might not get done for v10 though. Yea, I'm disinclined to tackle this just now, there's enough other stuff pending - but I'm willing to tackle it for v11 unless you beat me to it. > * I'm not very happy that execSRF.c has two somewhat different > implementations of largely similar functionality, and > SetExprState.elidedFuncState seems like a wart. That's mostly a > pre-existing problem of course. I'm satisfied with leaving it as it is > for now, but eventually some refactoring there would be good. Yea, that's not pretty. Given the historical difference in behaviour between SRF and ROWS FROM (the latter always materializes, the former only when the set function does so itself), I'm not sure they can easily be simplified. I'm not really sure how to get rid of the issue underlying elidedFuncState? I mean we could move that case into nodeFunctionscan, above ExecMakeTableFunctionResult's level, but that doesn't really seem like an improvement. I think there's some argument to be made that we should move all of execSRF.c's logic to their respective callsites. There's not really much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() - and at least the latter would even become a bit simpler if we didn't support both callers. Thanks a lot for your work on the patch! And pushed. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
More random musing ... have you considered making the jump-target fields in expressions be relative rather than absolute indexes? That is, EEO_JUMP would look like op += (stepno); \ EEO_DISPATCH(); \ instead of op = &state->steps[stepno]; \ EEO_DISPATCH(); \ I have not carried out a full patch to make this work, but just making that one change and examining the generated assembly code looks promising. Instead of this movslq 40(%r14), %r8 salq$6, %r8 addq24(%rbx), %r8 movq%r8, %r14 jmp *(%r8) we get this movslq 40(%r14), %rax salq$6, %rax addq%rax, %r14 jmp *(%r14) which certainly looks like it ought to be faster. Also, the real reason I got interested in this at all is that with relative jumps, groups of steps would be position-independent within the steps array, which would enable some compile-time tricks that seem impractical with the current definition. BTW, now that I've spent a bit of time looking at the generated assembly code, I'm kind of disinclined to believe any arguments about how we have better control over branch prediction with the jump-threading implementation. At least with current gcc (6.3.1 on Fedora 25) at -O2, what I see is multiple places jumping to the same indirect jump instruction :-(. It's not a total disaster: as best I can tell, all the uses of EEO_JUMP remain distinct. But gcc has chosen to implement about 40 of the 71 uses of EEO_NEXT by jumping to the same couple of instructions that increment the "op" register and then do an indirect jump :-(. So it seems that we're at the mercy of gcc's whims as to which instruction dispatches will be distinguishable to the hardware; which casts a very dark shadow over any benchmarking-based arguments that X is better than Y for branch prediction purposes. Compiler version differences are likely to matter a lot more than anything we do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
btw ... I just got around to looking at a code coverage report for this patched version, and that reminded me of something I'd already suspected: EEOP_INNER_SYSVAR and EEOP_OUTER_SYSVAR seem to be dead code. That's unsurprising, because we never try to access a tuple's system columns above the scan level. If a query asks for system columns, those get passed up to upper query levels as ordinary user-side columns. We could keep the execution support for those opcodes, or we could rip it out and throw an error in execExpr.c if one would need to be generated. I'm a little inclined to the latter, because it seems like the plan is to grow more support for every opcode in the future. We don't need to be adding support for unreachable opcodes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-24 11:26:27 -0400, Tom Lane wrote: >> Another modest proposal: >> >> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to >> trigger initial tupleslot de-forming. Certainly we want to have a single >> slot_getsomeattrs call per source slot, but as-is, we need a separate >> traversal over the expression tree just to precompute the max attribute >> number needed. That can't be good for expression compile speed, and >> it introduces nontrivial bug risks because the code that does that >> is completely decoupled from the code that emits the EEOP_VAR opcodes >> (which are what's really relying on the de-forming to have happened). > Hm. We had the separate traversal for projections for a long while, and > I don't think there've been a a lot of changes to the extraction of the > last attribute number. That's easily disproven just by looking at the code: /* * Don't examine the arguments or filters of Aggrefs or WindowFuncs, * because those do not represent expressions to be evaluated within the * calling expression's econtext. GroupingFunc arguments are never * evaluated at all. */ if (IsA(node, Aggref)) return false; if (IsA(node, WindowFunc)) return false; if (IsA(node, GroupingFunc)) return false; return expression_tree_walker(node, get_last_attnums_walker, (void *) info); The WindowFunc exception hasn't been there so long, and the GroupingFunc one is very new. And who's to say whether e.g. the recent XMLTABLE patch got this right at all? We could easily be extracting columns we don't need to. I'm willing to leave this as-is for the moment, but I really think we should look into changing it (after the main patch is in). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-24 11:26:27 -0400, Tom Lane wrote: > Another modest proposal: > > I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to > trigger initial tupleslot de-forming. Certainly we want to have a single > slot_getsomeattrs call per source slot, but as-is, we need a separate > traversal over the expression tree just to precompute the max attribute > number needed. That can't be good for expression compile speed, and > it introduces nontrivial bug risks because the code that does that > is completely decoupled from the code that emits the EEOP_VAR opcodes > (which are what's really relying on the de-forming to have happened). Hm. We had the separate traversal for projections for a long while, and I don't think there've been a a lot of changes to the extraction of the last attribute number. I'm very doubtful that the cost of traversing the expression twice is meaningful in comparison to the other costs. > What do you think about a design like this: > > * Drop the FETCHSOME opcodes. > > * Add fields to struct ExprState that will hold the maximum inner, > outer, and scan attribute numbers needed. > > * ExecInitExpr initializes those fields to zero, and then during > ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g. > > state->last_inner_attno = Max(state->last_inner_attno, > variable->varattno); > > * ExecInitExprSlots, get_last_attnums_walker, etc all go away. > > * In the startup segment of ExecInterpExpr, add > > if (state->last_inner_attno > 0) > slot_getsomeattrs(innerslot, state->last_inner_attno); > if (state->last_outer_attno > 0) > slot_getsomeattrs(outerslot, state->last_outer_attno); > if (state->last_scan_attno > 0) > slot_getsomeattrs(scanslot, state->last_scan_attno); > > This would be a little different from the existing code as far as runtime > branch-prediction behavior goes, but it's not apparent to me that it'd be > any worse. I'd be suprised if it weren't. I'm not super strongly against this setup, but I fail to see the benefit of whacking this around. I've benchmarked the previous/current setup fairly extensively, and I'd rather not redo that. In contrast to the other changes you've talked about, this definitely is in the hot-path... > Also, for JIT purposes it'd still be entirely possible to compile the > slot_getsomeattrs calls in-line; you'd just be looking to a different > part of the ExprState struct to find out what to do. Yea, we could do that. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Another modest proposal: I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to trigger initial tupleslot de-forming. Certainly we want to have a single slot_getsomeattrs call per source slot, but as-is, we need a separate traversal over the expression tree just to precompute the max attribute number needed. That can't be good for expression compile speed, and it introduces nontrivial bug risks because the code that does that is completely decoupled from the code that emits the EEOP_VAR opcodes (which are what's really relying on the de-forming to have happened). What do you think about a design like this: * Drop the FETCHSOME opcodes. * Add fields to struct ExprState that will hold the maximum inner, outer, and scan attribute numbers needed. * ExecInitExpr initializes those fields to zero, and then during ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g. state->last_inner_attno = Max(state->last_inner_attno, variable->varattno); * ExecInitExprSlots, get_last_attnums_walker, etc all go away. * In the startup segment of ExecInterpExpr, add if (state->last_inner_attno > 0) slot_getsomeattrs(innerslot, state->last_inner_attno); if (state->last_outer_attno > 0) slot_getsomeattrs(outerslot, state->last_outer_attno); if (state->last_scan_attno > 0) slot_getsomeattrs(scanslot, state->last_scan_attno); This would be a little different from the existing code as far as runtime branch-prediction behavior goes, but it's not apparent to me that it'd be any worse. Also, for JIT purposes it'd still be entirely possible to compile the slot_getsomeattrs calls in-line; you'd just be looking to a different part of the ExprState struct to find out what to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-23 21:58:03 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-23 21:26:19 -0400, Tom Lane wrote: > >> Hmm, I see ... but that only works in the cases where the caller of > >> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em > >> don't. > > > Right, the old and new code comment on that: > > > * inputDesc can be NULL, but if it is not, we check to see whether simple > > * Vars in the tlist match the descriptor. It is important to provide > > * inputDesc for relation-scan plan nodes, as a cross check that the > > relation > > * hasn't been changed since the plan was made. At higher levels of a plan, > > * there is no need to recheck. > > Ah, I'd forgotten the assumption that we only need to check this at scan > level. > > >> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST > >> step types. I could take it back out, but I wonder if it wouldn't be > >> smarter to keep it and remove the restriction in ExecBuildProjectionInfo. > >> Or maybe we could have ExecBuildProjectionInfo emit either > >> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove > >> the reference safe. > > > I think it's probably ok to just leave the check in, and remove those > > comments, and simplify the isSimpleVar stuff to only check if > > IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0) > > Not sure. It's a pretty fair amount of duplicative code, once you finish > dealing with all the ExecJustFoo functions in addition to the main code > paths. At this point I'm inclined to take it back out and improve the > comments around ExecBuildProjectionInfo. I'm ok with both. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-23 21:26:19 -0400, Tom Lane wrote: >> Hmm, I see ... but that only works in the cases where the caller of >> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em >> don't. > Right, the old and new code comment on that: > * inputDesc can be NULL, but if it is not, we check to see whether simple > * Vars in the tlist match the descriptor. It is important to provide > * inputDesc for relation-scan plan nodes, as a cross check that the relation > * hasn't been changed since the plan was made. At higher levels of a plan, > * there is no need to recheck. Ah, I'd forgotten the assumption that we only need to check this at scan level. >> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST >> step types. I could take it back out, but I wonder if it wouldn't be >> smarter to keep it and remove the restriction in ExecBuildProjectionInfo. >> Or maybe we could have ExecBuildProjectionInfo emit either >> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove >> the reference safe. > I think it's probably ok to just leave the check in, and remove those > comments, and simplify the isSimpleVar stuff to only check if > IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0) Not sure. It's a pretty fair amount of duplicative code, once you finish dealing with all the ExecJustFoo functions in addition to the main code paths. At this point I'm inclined to take it back out and improve the comments around ExecBuildProjectionInfo. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-23 21:26:19 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-23 20:36:32 -0400, Tom Lane wrote: > >> The problem here is that once we drop the buffer pin, any pointers we may > >> have into on-disk data are dangling pointers --- we're at risk of some > >> other backend taking away that shared buffer. (So I'm afraid that the > >> commentary there is overly optimistic.) So even though those pointers > >> may still be there beyond tts_nvalid, subsequent references to them are > >> very dangerous. > > > This applies to the code in master as well, no? An ExecEvalScalarVar() > > followed by an ExecEvalWholeRowVar() would have precisely the same > > effect? > > Yeah. The other order would be safe, because ExecEvalScalarVar would do > slot_getattr which would re-extract the value from the newly materialized > tuple. But there definitely seems to be a hazard for the order you > mentioned. > > > Do we need to do anything about this in the back-branches, > > given how unlikely this is going to be in practice? > > Probably not. As I mentioned, I think this may be only theoretical rather > than real, if you believe that buffer pins would only be associated with > slots holding references to regular tuples. And even if it's not > theoretical, the odds of seeing a failure in the field seem pretty tiny > given that a just-released buffer shouldn't be subject to recycling for > a fair while. But I don't want to leave it like this going forward. Ok. > >> Also, while trying to test the above scenario, I realized that the patch > >> as submitted was being way too cavalier about where it was applying > >> CheckVarSlotCompatibility and so on. The ASSIGN_FOO_VAR steps, for > >> instance, had no protection at all. > > > I don't think that's true - the assign checks had copied the code from > > the old ExecBuildProjectionInfo, setting isSimpleVar iff > > (!attr->attisdropped && variable->vartype == attr->atttypid) - we can > > check that for projections in contrast to normal expressions because we > > already know the slot. > > Hmm, I see ... but that only works in the cases where the caller of > ExecBuildProjectionInfo supplied a source slot, and a lot of 'em > don't. Right, the old and new code comment on that: * inputDesc can be NULL, but if it is not, we check to see whether simple * Vars in the tlist match the descriptor. It is important to provide * inputDesc for relation-scan plan nodes, as a cross check that the relation * hasn't been changed since the plan was made. At higher levels of a plan, * there is no need to recheck. and that seems like reasonable to me? That said, I think we can remove that assumption, by checking once. > As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot > of places, including everywhere above the relation scan level. Hm? If inputDesc isn't given we just, before and after, do: if (!inputDesc) isSimpleVar = true; /* can't check type, assume OK */ > I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST > step types. I could take it back out, but I wonder if it wouldn't be > smarter to keep it and remove the restriction in ExecBuildProjectionInfo. > Or maybe we could have ExecBuildProjectionInfo emit either > ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove > the reference safe. I think it's probably ok to just leave the check in, and remove those comments, and simplify the isSimpleVar stuff to only check if IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0) - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-23 20:36:32 -0400, Tom Lane wrote: >> The problem here is that once we drop the buffer pin, any pointers we may >> have into on-disk data are dangling pointers --- we're at risk of some >> other backend taking away that shared buffer. (So I'm afraid that the >> commentary there is overly optimistic.) So even though those pointers >> may still be there beyond tts_nvalid, subsequent references to them are >> very dangerous. > This applies to the code in master as well, no? An ExecEvalScalarVar() > followed by an ExecEvalWholeRowVar() would have precisely the same > effect? Yeah. The other order would be safe, because ExecEvalScalarVar would do slot_getattr which would re-extract the value from the newly materialized tuple. But there definitely seems to be a hazard for the order you mentioned. > Do we need to do anything about this in the back-branches, > given how unlikely this is going to be in practice? Probably not. As I mentioned, I think this may be only theoretical rather than real, if you believe that buffer pins would only be associated with slots holding references to regular tuples. And even if it's not theoretical, the odds of seeing a failure in the field seem pretty tiny given that a just-released buffer shouldn't be subject to recycling for a fair while. But I don't want to leave it like this going forward. >> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't >> clobber the state of the slot. > That seems like a good plan. Yeah. I have the stopgap code in my working copy, and will look at refactoring the tuptoaster code for better performance later. >> Also, while trying to test the above scenario, I realized that the patch >> as submitted was being way too cavalier about where it was applying >> CheckVarSlotCompatibility and so on. The ASSIGN_FOO_VAR steps, for >> instance, had no protection at all. > I don't think that's true - the assign checks had copied the code from > the old ExecBuildProjectionInfo, setting isSimpleVar iff > (!attr->attisdropped && variable->vartype == attr->atttypid) - we can > check that for projections in contrast to normal expressions because we > already know the slot. Hmm, I see ... but that only works in the cases where the caller of ExecBuildProjectionInfo supplied a source slot, and a lot of 'em don't. As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot of places, including everywhere above the relation scan level. I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST step types. I could take it back out, but I wonder if it wouldn't be smarter to keep it and remove the restriction in ExecBuildProjectionInfo. Or maybe we could have ExecBuildProjectionInfo emit either ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove the reference safe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-23 20:36:32 -0400, Tom Lane wrote: > So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't > clobber the state of the slot. Right at the moment, the only way to do > that seems to be to do this instead of ExecFetchSlotTupleDatum: > > tuple = ExecCopySlotTuple(slot); > dtuple = (HeapTupleHeader) > DatumGetPointer(heap_copy_tuple_as_datum(tuple, > slot->tts_tupleDescriptor)); > heap_freetuple(tuple); Hm. One disadvantage would be that repeated whole-row references to the same table would be a bit slower, because we'd repeatedly form a tuple from a virtual one - but I have a hard time coming up with a scenario where that'd matter. I'd suspect that in the end it'd probably even have a *positive* performance impact, because right now the next scalar access will have to deform the whole tuple again, and that seems like a lot more likely scenario. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-23 20:36:32 -0400, Tom Lane wrote: > I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about > > +/* > + * Can't assert tts_nvalid, as wholerow var evaluation or such > + * could have materialized the slot - but the contents are still > + * valid :/ > + */ > +Assert(op->d.var.attnum >= 0); > > is actually averting its eyes from a potentially serious problem. If we > go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and > ExecFetchSlotTuple decides it has to materialize the tuple, then > ExecMaterializeSlot does this: > > /* > * Drop the pin on the referenced buffer, if there is one. > */ > if (BufferIsValid(slot->tts_buffer)) > ReleaseBuffer(slot->tts_buffer); > > slot->tts_buffer = InvalidBuffer; > > /* > * Mark extracted state invalid. This is important because the slot is > * not supposed to depend any more on the previous external data; we > * mustn't leave any dangling pass-by-reference datums in tts_values. > * However, we have not actually invalidated any such datums, if there > * happen to be any previously fetched from the slot. (Note in particular > * that we have not pfree'd tts_mintuple, if there is one.) > */ > slot->tts_nvalid = 0; > > > The problem here is that once we drop the buffer pin, any pointers we may > have into on-disk data are dangling pointers --- we're at risk of some > other backend taking away that shared buffer. (So I'm afraid that the > commentary there is overly optimistic.) So even though those pointers > may still be there beyond tts_nvalid, subsequent references to them are > very dangerous. This applies to the code in master as well, no? An ExecEvalScalarVar() followed by an ExecEvalWholeRowVar() would have precisely the same effect? Do we need to do anything about this in the back-branches, given how unlikely this is going to be in practice? > So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't > clobber the state of the slot. That seems like a good plan. > Also, while trying to test the above scenario, I realized that the patch > as submitted was being way too cavalier about where it was applying > CheckVarSlotCompatibility and so on. The ASSIGN_FOO_VAR steps, for > instance, had no protection at all. I don't think that's true - the assign checks had copied the code from the old ExecBuildProjectionInfo, setting isSimpleVar iff (!attr->attisdropped && variable->vartype == attr->atttypid) - we can check that for projections in contrast to normal expressions because we already know the slot. The relevant comment for that, from before the patch, is: * inputDesc. (Note: if there is a type mismatch then ExecEvalScalarVar * will probably throw an error at runtime, but we leave that to it.) */ > I hope to have a fully reviewed patch to pass back to you tomorrow. > Or Saturday at the latest. Cool. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi,m On 2017-03-23 17:40:55 -0400, Tom Lane wrote: > Stylistic thought ... I am wondering if it wouldn't be a good idea > to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE, > and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less > usage-dependent way as > > EEOP_JUMP unconditional jump > EEOP_JUMP_IF_NULL jump if step result is null > EEOP_JUMP_IF_NOT_NULL jump if step result isn't null > EEOP_JUMP_IF_NOT_TRUE jump if step result isn't TRUE > > One could imagine later filling out this set with the other BoolTest > condition types, but that seems to be all we need right now. Hm, no arguments against, but I'm also not particularly excited about the change. > These are basically just renamings of the step types that exist now, > although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very- > necessary Assert(!op->d.arrayref.state->isassignment). I won't shed a tear about that assert's removal. > Well, I guess I should say that they're renamings of the semantics > that I have for these steps in my working copy; for instance, I got > rid of casewhen.value/casewhen.isnull in favor of letting CASE WHEN > expressions evaluate into the CASE's final output variable. That sounds like a sensible change (in the abstract, I obviously haven't seen your working copy). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about +/* + * Can't assert tts_nvalid, as wholerow var evaluation or such + * could have materialized the slot - but the contents are still + * valid :/ + */ +Assert(op->d.var.attnum >= 0); is actually averting its eyes from a potentially serious problem. If we go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and ExecFetchSlotTuple decides it has to materialize the tuple, then ExecMaterializeSlot does this: /* * Drop the pin on the referenced buffer, if there is one. */ if (BufferIsValid(slot->tts_buffer)) ReleaseBuffer(slot->tts_buffer); slot->tts_buffer = InvalidBuffer; /* * Mark extracted state invalid. This is important because the slot is * not supposed to depend any more on the previous external data; we * mustn't leave any dangling pass-by-reference datums in tts_values. * However, we have not actually invalidated any such datums, if there * happen to be any previously fetched from the slot. (Note in particular * that we have not pfree'd tts_mintuple, if there is one.) */ slot->tts_nvalid = 0; The problem here is that once we drop the buffer pin, any pointers we may have into on-disk data are dangling pointers --- we're at risk of some other backend taking away that shared buffer. (So I'm afraid that the commentary there is overly optimistic.) So even though those pointers may still be there beyond tts_nvalid, subsequent references to them are very dangerous. I think that it's pretty hard to hit this in practice, maybe impossible, because the normal case for an "on-disk" tuple is that TTS_HAS_PHYSICAL_TUPLE is true, so that ExecFetchSlotTuple won't change the state of the slot. If we have a virtual tuple that has to be materialized, then by that very token it won't have a buffer pin to drop. But I find this fragile as heck, and the aforesaid patch comment surely isn't adequately documenting the safety considerations. Also, if there ever were a live bug here, reproducing it would be damn hard because of the low probability that a just-unpinned buffer would get replaced any time soon. (Hm, I wonder whether the buffer cache needs something analogous to the syscaches' CLOBBER_CACHE_ALWAYS behavior...) Besides which, I really really don't like the lack of an "attnum < tts_nvalid" assertion there; that's just obviously failing to check for very simple bugs, such as getting the FETCHSOME steps wrong. So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't clobber the state of the slot. Right at the moment, the only way to do that seems to be to do this instead of ExecFetchSlotTupleDatum: tuple = ExecCopySlotTuple(slot); dtuple = (HeapTupleHeader) DatumGetPointer(heap_copy_tuple_as_datum(tuple, slot->tts_tupleDescriptor)); heap_freetuple(tuple); That's kind of annoying because of the double copying involved, but tuptoaster.c doesn't expose any functionality for this except heap_copy_tuple_as_datum(). I figure we can improve it later --- it looks like we can refactor heap_copy_tuple_as_datum to expose a function that reads from Datum/isnull arrays, and then call it on the slot's tts_values/tts_isnull arrays. Seems like it might be a good idea to think a bit harder about the TupleTableSlot APIs, too, and reduce the number of cases where execTuples exposes destructive changes to the state of a slot. Also, while trying to test the above scenario, I realized that the patch as submitted was being way too cavalier about where it was applying CheckVarSlotCompatibility and so on. The ASSIGN_FOO_VAR steps, for instance, had no protection at all. Think I have that all fixed up though. I hope to have a fully reviewed patch to pass back to you tomorrow. Or Saturday at the latest. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Stylistic thought ... I am wondering if it wouldn't be a good idea to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE, and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less usage-dependent way as EEOP_JUMP unconditional jump EEOP_JUMP_IF_NULL jump if step result is null EEOP_JUMP_IF_NOT_NULL jump if step result isn't null EEOP_JUMP_IF_NOT_TRUE jump if step result isn't TRUE One could imagine later filling out this set with the other BoolTest condition types, but that seems to be all we need right now. These are basically just renamings of the step types that exist now, although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very- necessary Assert(!op->d.arrayref.state->isassignment). Well, I guess I should say that they're renamings of the semantics that I have for these steps in my working copy; for instance, I got rid of casewhen.value/casewhen.isnull in favor of letting CASE WHEN expressions evaluate into the CASE's final output variable. At least to me, I think the compiling code would be more readable this way. I find WHEN_STEP and THEN_STEP a bit odd because they are emitted after, not before, the expressions you'd think they control. ARRAYREF_CHECKINPUT is pretty vaguely named, too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-23 13:14:59 -0400, Tom Lane wrote: > Looking at some of the coding choices you made here, I see that you're > making a hard assumption that called functions never scribble on their > fcinfo->arg/argnull arrays. I do not believe that we've ever had such > an assumption before. I think we did that before, e.g. ExecEvalScalarArrayOp(). Think there's others too. > Are we comfortable with that? If so, we'd better document it. I think it's ok, but we indeed should document it. I recall a note somewhere... Can't find it anywhere however, might have misremembered a note about pass-by-ref arguments. fmgr/README? A note in FunctionCallInfoData's definition? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Looking at some of the coding choices you made here, I see that you're making a hard assumption that called functions never scribble on their fcinfo->arg/argnull arrays. I do not believe that we've ever had such an assumption before. Are we comfortable with that? If so, we'd better document it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-22 13:15:41 -0400, Tom Lane wrote: > BTW, I'm fairly concerned by what you did in nodeTidscan.c, ie delaying > compile of the TID expressions until TidListCreate. I think that probably > fails for cases involving, eg, subplans in the expressions; we need > subplans to get linked to the parent node, and this way won't do it till > (maybe) too late. Barring objection I'm going to rearrange that so that > we still do the compile part at executor start. No objection here. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
BTW, I'm fairly concerned by what you did in nodeTidscan.c, ie delaying compile of the TID expressions until TidListCreate. I think that probably fails for cases involving, eg, subplans in the expressions; we need subplans to get linked to the parent node, and this way won't do it till (maybe) too late. Barring objection I'm going to rearrange that so that we still do the compile part at executor start. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-22 10:41:06 -0400, Tom Lane wrote: >> * execQual.c doesn't seem to have a unifying reason to exist anymore. >> It certainly has little to do with evaluating typical qual expressions; >> what's left in there seems to be mostly concerned with SRFs. I feel >> like it might be a good idea to rename it, but to what? execExprUtils.c >> perhaps? Or maybe we should destroy it altogether, shoving the SRF >> stuff into nodeFunctionscan.c and moving what little remains into >> execUtils.c. > Yea, I was wondering about that too. What would we do with > GetAttributeByName/Num? I was thinking execUtils.c for those. >> * I do not like the function name ExecInstantiateExpr(). Webster's >> defines "instantiate" as "to represent (an abstraction) by a concrete >> instance", which does not seem to me to have a lot to do with what this >> function actually does. >> ... > Either of those work, but they don't strike me as perfect either, but I > can't come up with something better (ExecReadyExprForExec()?). Actually, ExecReadyExpr seems kind of nice for this (using "ready" in its verb sense, "to prepare (someone or something) for an activity or purpose"). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-22 10:41:06 -0400, Tom Lane wrote: > I've been busily hacking away on this, trying to make things cleaner and > fix a couple of bugs I stumbled across. (Confusion between ExecQual and > ExecCheck, for instance - we apparently lack regression tests exercising > constraints-returning-null in corner cases such as table rewrite.) It > will probably be a day or two more before I'm done. Thanks! > A couple of naming questions: > > * I concur with Heikki's dislike for the file name execInterpExpr.c. > Would it make it any better to switch to execExprInterp.c? I think > that having all this stuff under a common pattern execExpr*.c would > be a good thing (and I notice you've already got one comment referring > to them that way ...) That works for me. > * execQual.c doesn't seem to have a unifying reason to exist anymore. > It certainly has little to do with evaluating typical qual expressions; > what's left in there seems to be mostly concerned with SRFs. I feel > like it might be a good idea to rename it, but to what? execExprUtils.c > perhaps? Or maybe we should destroy it altogether, shoving the SRF > stuff into nodeFunctionscan.c and moving what little remains into > execUtils.c. Yea, I was wondering about that too. What would we do with GetAttributeByName/Num? > * I do not like the function name ExecInstantiateExpr(). Webster's > defines "instantiate" as "to represent (an abstraction) by a concrete > instance", which does not seem to me to have a lot to do with what this > function actually does. It perhaps makes a *bit* more sense if you view it from the POV that, with the future JIT support (WIP versions of which I posted previously), it'd actually create a compiled function which'd largely be independent of the of ->steps (except for the non-hotpath functions, which'd still end up using it). So one of the above "conrete instances" would be the interpeted version, another the compiled one. No, not an entirely convincing argument. > There's nothing very abstract about its input. > More, the implication of "instantiate" is that you can instantiate any > number of representatives of the same abstraction, but this scribbles > on the input in a one-way fashion. I think perhaps something like > ExecPrepareExpr or ExecFinishExpr or something along that line would > be better, but nothing is really standing out as le mot juste. Either of those work, but they don't strike me as perfect either, but I can't come up with something better (ExecReadyExprForExec()?). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I've been busily hacking away on this, trying to make things cleaner and fix a couple of bugs I stumbled across. (Confusion between ExecQual and ExecCheck, for instance - we apparently lack regression tests exercising constraints-returning-null in corner cases such as table rewrite.) It will probably be a day or two more before I'm done. A couple of naming questions: * I concur with Heikki's dislike for the file name execInterpExpr.c. Would it make it any better to switch to execExprInterp.c? I think that having all this stuff under a common pattern execExpr*.c would be a good thing (and I notice you've already got one comment referring to them that way ...) * execQual.c doesn't seem to have a unifying reason to exist anymore. It certainly has little to do with evaluating typical qual expressions; what's left in there seems to be mostly concerned with SRFs. I feel like it might be a good idea to rename it, but to what? execExprUtils.c perhaps? Or maybe we should destroy it altogether, shoving the SRF stuff into nodeFunctionscan.c and moving what little remains into execUtils.c. * I do not like the function name ExecInstantiateExpr(). Webster's defines "instantiate" as "to represent (an abstraction) by a concrete instance", which does not seem to me to have a lot to do with what this function actually does. There's nothing very abstract about its input. More, the implication of "instantiate" is that you can instantiate any number of representatives of the same abstraction, but this scribbles on the input in a one-way fashion. I think perhaps something like ExecPrepareExpr or ExecFinishExpr or something along that line would be better, but nothing is really standing out as le mot juste. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-20 16:06:27 -0400, Tom Lane wrote: >> ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared >> size_t and not just int? Since it's an array index, and one that >> certainly can't be bigger than AttrNumber, that seems rather confusing. > Not that I can see, no. I guess I might have "overcompensated" when > changing it from AttrNumber - AttrNumber isn't a good idea because that > needs an extra move-zero-extend, because 16bit indexing isn't that well > supported on x86. But that doesn't mean it should be a 64bit number - > to the contrary actually. OK, will fix in the edits I'm working on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-20 16:06:27 -0400, Tom Lane wrote: > ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared > size_t and not just int? Since it's an array index, and one that > certainly can't be bigger than AttrNumber, that seems rather confusing. Not that I can see, no. I guess I might have "overcompensated" when changing it from AttrNumber - AttrNumber isn't a good idea because that needs an extra move-zero-extend, because 16bit indexing isn't that well supported on x86. But that doesn't mean it should be a 64bit number - to the contrary actually. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared size_t and not just int? Since it's an array index, and one that certainly can't be bigger than AttrNumber, that seems rather confusing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > Additionally I added a regression test for the nearly entirely untested > nodeTidscan.c, after I'd broken it previously without noticing (thanks > Andreas). I went ahead and pushed this part, since it seemed pretty uncontroversial. I added a bit more stuff to get the LOC measurement up on the planner side too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 20:09:03 -0400, Tom Lane wrote: >> I think it would be worth creating a README file giving an overview >> of how all of this patch is supposed to work. You also need to do a >> whole lot more work on the function-level comments. > I tried to improve upon both fronts. I've added the higher level > explanation to executor/README, but I don't feel very strong about that. > I'm not quite sure it's exactly what you wanted however, the above ask > could also be understood to have more of an motivational angle, > describing why and what exactly is changed? I'm also still not sure how > understandable it's for anybody that hasn't had their head in this for a > while... Well, I wasn't totally sure what was needed either. I'm coming to this relatively fresh, having paid little attention to the thread up to now, so maybe I'll try to add material to what you wrote as I figure things out. > Did I understand correctly that you'd rather just merge > ExecGetLastAttnums into execExpr.c, instead of making it globally > available? Yeah, just moving it over seems like the thing to do for now. We can expose it later if there proves to be an actual reason to do that, but as I mentioned, I'm doubtful that there will be one. I'll start reading these... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-19 23:55:50 -0400, Tom Lane wrote: > Andres Freund writes: > > I've been pondering if we can't entirely get rid of CaseTest etc, the > > amount of hackery required seems not like a good thing. One way I'd > > prototyped was to replace them with PARAM_EXEC nodes - then the whole > > issue of them potentially having different values at different parts of > > an expression vanishes because the aliasing is removed. > > Yes, replacing all of that with Param slots had occurred to me too. > We might want to keep the special parse node types for convenience in > reverse-listing, but having them act just like PARAM_EXEC for execution > purposes seems promising. As long as that special parse-time node is part of the same value numbering, that makes sense (could just name make it a subtype of param ala PARAM_CASE). I don't think we actually do anything useful in ruleutils etc with either CaseTest or CoerceToDomainValue. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-16 11:15:16 -0400, Tom Lane wrote: >> The thing that actually made the ExecEvalCase code into a bug was that >> we were using ExprContext-level fields to store the current caseValue, >> allowing aliasing to occur across nested CASEs. I think that in >> this implementation, it ought to be possible to get rid of >> ExprContext.caseValue_datum et al altogether, in favor of some storage >> location that's private to each CASE expression. I'm a bit disappointed >> that that doesn't seem to have happened. > ... Unfortunately > CaseTest/CoerceToDomainValue are reused outside of domain / case > expressions in a bunch of places (plpgsql uses CaseTest for casts > evaluation, validateDomainConstraint/domain_check_input evaluate domain > constraints without a CoerceToDomain node). Yeah, there's various stuff that did that for expediency. > I'd like to get rid of those usages, but that'd recurse into rewriting > plpgsql casts and other random pieces of code into a different approach > - something I'd like to avoid doing at the same as this already large > patch. Agreed, maybe we should just plan to clean that up later. > I've been pondering if we can't entirely get rid of CaseTest etc, the > amount of hackery required seems not like a good thing. One way I'd > prototyped was to replace them with PARAM_EXEC nodes - then the whole > issue of them potentially having different values at different parts of > an expression vanishes because the aliasing is removed. Yes, replacing all of that with Param slots had occurred to me too. We might want to keep the special parse node types for convenience in reverse-listing, but having them act just like PARAM_EXEC for execution purposes seems promising. >> Eventually, I would also like to find a way to remove the restriction >> imposed by the other part of f0c7b789a, ie that we can't inline a SQL >> function when that would result in intermixing two levels of CASE >> expression. > That seems to suggest my PARAM_EXEC idea isn't necessarily perfect - > inlining would cause aliasing again, but it'd also not hard to fix that > up. Right, inlining would probably require some parameter-renumbering to avoid aliasing. But at least we'd have a clear framework for how to handle it, whereas the way things are now, it's just impossible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-16 11:15:16 -0400, Tom Lane wrote: > The thing that actually made the ExecEvalCase code into a bug was that > we were using ExprContext-level fields to store the current caseValue, > allowing aliasing to occur across nested CASEs. I think that in > this implementation, it ought to be possible to get rid of > ExprContext.caseValue_datum et al altogether, in favor of some storage > location that's private to each CASE expression. I'm a bit disappointed > that that doesn't seem to have happened. The patch actually does so - during ExecInitExprRec the "relevant" case/domain testval is stored in ExprState->innermost_*, those pointers are then stored directly in the relevant steps for CaseTest/CoerceToDomainValue evaluation. Unfortunately CaseTest/CoerceToDomainValue are reused outside of domain / case expressions in a bunch of places (plpgsql uses CaseTest for casts evaluation, validateDomainConstraint/domain_check_input evaluate domain constraints without a CoerceToDomain node). I.e ExprContext.caseValue_datum etc. aren't used for normal expressions anymore, just for the ad-hoc hackery in a bunch of places. I'd like to get rid of those usages, but that'd recurse into rewriting plpgsql casts and other random pieces of code into a different approach - something I'd like to avoid doing at the same as this already large patch. I've been pondering if we can't entirely get rid of CaseTest etc, the amount of hackery required seems not like a good thing. One way I'd prototyped was to replace them with PARAM_EXEC nodes - then the whole issue of them potentially having different values at different parts of an expression vanishes because the aliasing is removed. > Eventually, I would also like to find a way to remove the restriction > imposed by the other part of f0c7b789a, ie that we can't inline a SQL > function when that would result in intermixing two levels of CASE > expression. An implementation along the lines of what I've sketched > above could handle that easily enough, as long as we could identify > which nested level of CASE a particular CaseTestExpr belongs to. > I don't know how to do that offhand, but it seems like it ought to be > soluble if we put a bit of thought into it. I haven't thought overly much about this, but I agree, it looks like it should be doable. That seems to suggest my PARAM_EXEC idea isn't necessarily perfect - inlining would cause aliasing again, but it'd also not hard to fix that up. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > That said, it seems this is something that has to wait for a later > release, I'm putting back in similar logic as there was before (not a > branch, but change the opcode to a non-checking variant). Yeah, I was wondering if changing the opcode would be preferable to a first-time flag. >> So my recommendation is to drop 0001 and include the same one-time >> checks that execQual.c currently has as out-of-line one-time checks >> in the new code. We can revisit that later, but time grows short for >> v10. I would much rather have a solid version of 0004 and not 0001, >> than not have anything for v10 because we spent too much time dealing >> with adding new dependencies. > Doing that (+README). OK. I believe that we can get this committed after the documentation problems are sorted. I noticed a lot of small things that bugged me, mostly sloppy comments, but I think that the most efficient way to handle those is for me to make an editorial pass on your next version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-17 11:36:30 -0400, Tom Lane wrote: > Having said all that, I think that 0001 is contributing very little to the > goals of this patch set. Andres stated that he wanted it so as to drop > some of the one-time checks that execQual.c currently does for Vars, but > I'm not really convinced that we could do that safely even with these > additional dependencies in place. Moreover, I really doubt that there's > a horrible performance cost from including something like > > if (unlikely(op->first_execution)) > out_of_line_checking_subroutine(...); > in the execution code for Vars. But it actually does (well, for some relatively small value of horrible) The issue is that op->first_execution is actually badly predictable, because it will be set back/forth between executions of different expressions (in the same plantree). Obviously you'll not notice if you have a Var and then some expensive stuff, but it's noticeable for cheap-ish expressions (say e.g. a single Var). So the branch prediction often doesn't handle this gracefully - it also just expands the set of to-be-tracked jumps. If we had a decent way to actually check this during ExecInitExpr() (et al), the whole discussion would be different - I'd be all expanding the set of such checks even. But I couldn't find a decent way to get there - when expressions are initialized we don't even get an ExprContext (not to speak of valid slots), nor is parent->plan very helpful. That said, it seems this is something that has to wait for a later release, I'm putting back in similar logic as there was before (not a branch, but change the opcode to a non-checking variant). > And that certainly isn't adding any > complexity for JIT compilation that we don't face anyway for other > execution step types. Obviously a if (op->first_execution) isn't an issue, it's actually only doing the first time through that's not easily possible. > So my recommendation is to drop 0001 and include the same one-time > checks that execQual.c currently has as out-of-line one-time checks > in the new code. We can revisit that later, but time grows short for > v10. I would much rather have a solid version of 0004 and not 0001, > than not have anything for v10 because we spent too much time dealing > with adding new dependencies. Doing that (+README). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > [ latest patches ] I looked through 0001 (the composite-type-dependencies one). Although I agree that it'd be good to tighten things up in that area, I do not think we want this specific patch: it tightens things too much. Consider this variant of the existing test case in create_view.sql: create table tt (f1 int, f2 int, f3 int); create function ttf() returns setof tt as 'select * from tt' language sql; create view vv as select f3 from ttf(); alter table tt drop column f3; Current code allows the above (and causes the view to return nulls afterwards). The 0001 patch would forbid the DROP, which is fine, but it would also forbid dropping either of the other two table columns, which I think is insupportable. Also, given the above table and function, consider create view vvv as select ttf(); This view returns a 3-column composite type. Now do alter table tt add column f4 int; Now the view returns a 4-column composite type. But at this point the patch will let you drop the f4 column, but not any of the earlier three. That's just weird. So I'm unhappy with the specific decisions made in 0001. I think what we really want there, probably, is for find_expr_references_walker to do more than nothing with Vars referencing non-RELATION RTEs. Having said all that, I think that 0001 is contributing very little to the goals of this patch set. Andres stated that he wanted it so as to drop some of the one-time checks that execQual.c currently does for Vars, but I'm not really convinced that we could do that safely even with these additional dependencies in place. Moreover, I really doubt that there's a horrible performance cost from including something like if (unlikely(op->first_execution)) out_of_line_checking_subroutine(...); in the execution code for Vars. And that certainly isn't adding any complexity for JIT compilation that we don't face anyway for other execution step types. So my recommendation is to drop 0001 and include the same one-time checks that execQual.c currently has as out-of-line one-time checks in the new code. We can revisit that later, but time grows short for v10. I would much rather have a solid version of 0004 and not 0001, than not have anything for v10 because we spent too much time dealing with adding new dependencies. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I wrote: > Andres Freund writes: >> I don't think there's a danger similar to f0c7b789a here, because the >> "caller" (i.e. the node that needs the expression's result) expects >> resvalue/null to be overwritten. > Yeah, that's what I thought when I wrote the broken code in ExecEvalCase, > too. It was wrong. Along the same line, I notice that you've got some expr step types overwriting their own input, the various flavors of EEOP_BOOLTEST for example. Maybe that's all right but it doesn't really give me a warm feeling, especially when other single-argument operations like EEOP_BOOL_NOT_STEP are done differently. Again, I think a clear explanation of the design is essential to allow people to reason about whether this sort of trickery is safe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 20:09:03 -0400, Tom Lane wrote: >> That scares me quite a bit, because it smells exactly like the sort of >> premature optimization that bit us on the rear in CVE-2016-5423 (cf commit >> f0c7b789a). > I don't think there's a danger similar to f0c7b789a here, because the > "caller" (i.e. the node that needs the expression's result) expects > resvalue/null to be overwritten. Yeah, that's what I thought when I wrote the broken code in ExecEvalCase, too. It was wrong. Basically you've got to be sure that no aliasing can occur, and I think the only way to be safe about that is to have a very clear rule about where results are allowed to get returned to, preferably one that doesn't ever re-use the same target. (I think the repeated use of the same subexpression result address for the arms of an AND or OR is okay, but it would be a good idea to have a clear statement of why.) The thing that actually made the ExecEvalCase code into a bug was that we were using ExprContext-level fields to store the current caseValue, allowing aliasing to occur across nested CASEs. I think that in this implementation, it ought to be possible to get rid of ExprContext.caseValue_datum et al altogether, in favor of some storage location that's private to each CASE expression. I'm a bit disappointed that that doesn't seem to have happened. Eventually, I would also like to find a way to remove the restriction imposed by the other part of f0c7b789a, ie that we can't inline a SQL function when that would result in intermixing two levels of CASE expression. An implementation along the lines of what I've sketched above could handle that easily enough, as long as we could identify which nested level of CASE a particular CaseTestExpr belongs to. I don't know how to do that offhand, but it seems like it ought to be soluble if we put a bit of thought into it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, I got a test failure with this version of the patch in the postges_fdw. It looks to me like it was caused by a typo in the source code which is fixed in the attached patch. After applying this patch check-world passes. Andreas diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c index 1763be5cf0..2ba5a2ea69 100644 --- a/src/backend/executor/nodeTidscan.c +++ b/src/backend/executor/nodeTidscan.c @@ -42,7 +42,6 @@ static void TidListCreate(TidScanState *tidstate); static int itemptr_comparator(const void *a, const void *b); static TupleTableSlot *TidNext(TidScanState *node); - /* * Compute the list of TIDs to be visited, by evaluating the expressions * for them. @@ -101,6 +100,7 @@ TidListCreate(TidScanState *tidstate) else if (IsCTIDVar(arg2)) exprstate = ExecInitExpr((Expr *) linitial(fex->args), &tidstate->ss.ps); + else elog(ERROR, "could not identify CTID variable"); itemptr = (ItemPointer) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-15 20:09:03 -0400, Tom Lane wrote: > Andres Freund writes: > > [ new patches ] > > I've started to look at 0004, and the first conclusion I've come to > is that it's *direly* short of documentation. To the point that I'd > vote against committing it if something isn't done about that. Yea, I asked for input about what's hard to understand and what's not - I stared at this for a *lot* of time, and it all kinda looks easy-ish now. I'm more than willing to expand on the below, and other pieces. > As an example, it's quite unclear how ExprEvalSteps acquire correct > resnull/resvalue pointers, and the algorithm for that seems nontrivial. > It doesn't help any that the arguments of ExecInitExprRec are entirely > undocumented. Generally whatever wants the result of a (sub-)expression passes in the desired resvalue/resnull. E.g. when doing a function call the individual arguments are each prepared for evaluation using ExecInitExprRec() and resvalue/resnull are pointing into fcinfo's arg/nulls[i]. > I think it would be worth creating a README file giving an overview > of how all of this patch is supposed to work. You also need to do a > whole lot more work on the function-level comments. Ok. > A specific thing I noticed in the particular area of > what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup: > > +/* > + * Evaluate array argument into our return value, overwrite > + * with comparison results afterwards. > + */ > +ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, > state, > +resv, resnull); > > That scares me quite a bit, because it smells exactly like the sort of > premature optimization that bit us on the rear in CVE-2016-5423 (cf commit > f0c7b789a). I don't think there's a danger similar to f0c7b789a here, because the "caller" (i.e. the node that needs the expression's result) expects resvalue/null to be overwritten. It'll e.g. be the value "slot" of one arm (is there a better name for one part of a boolean expression?) of a boolean expression. > What's it really buying us to overwrite the return value > early rather than storing into the fcinfo's second argument slot? That'd work just as well. > (The memory of that CVE is part of what's prompting me to demand a clear > explanation of the algorithm for deciding where steps return their > results. Even if this particular code is safe, somebody is going to do > something unsafe in future if there's not a documented rule to follow.) I don't think there's a danger here, but I think you more generally have a point. > Another thing that ties into the do-I-understand-this-at-all question > is this bit: > > +EEO_CASE(EEOP_BOOL_AND_STEP_FIRST) > +{ > +*op->d.boolexpr.anynull = false; > + > +/* > + * Fallthrough (can't be last - ANDs have two arguments at > least). > + */ > +} > + > +EEO_CASE(EEOP_BOOL_AND_STEP) > > It seems like this is missing an "op++;" before falling through. If it > isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull > and then also doing a regular BOOL_AND_STEP, then the comment seems rather > off point. It's intended to fall through this way, i.e. the difference between _FIRST and not is just that only the former clears anynull. What the comment is about, admittedly too cryptically, is that the _LAST step that then evaluates anynull cannot be the same step as EEOP_BOOL_AND_STEP_FIRST, because bool AND/OR always has at least two "arms". Will expand / move. > BTW, it sure seems like ExecInitExprRec and related code ought to set > off all sorts of valgrind complaints? It's not being careful at all > to ensure that all fields of the "scratch" record get initialized before > we memcpy it to someplace. It worked not long ago - valgrind's replacment memcpy() doesn't trigger undefined memory warnings, it just copies the "definedness" of each byte (or bit?). But your point gives me an idea: It seems like a good idea to VALGRIND_MAKE_MEM_UNDEFINED() the "scratch" step at some convenient places, so the definedness of individual operations is more useful. Thanks for the look! - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-15 18:48:28 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-15 18:16:57 -0400, Tom Lane wrote: > >> [ scratches head... ] What deforming logic do you think I'm proposing > >> removing? > > > I thought you were suggesting that we don't do the get_last_attnums (and > > inlined version in the isSimpleVar case) at execution time anymore, > > instead relying on logic in the planner to know how much to deform ahead > > of time. Then we'd do slot_getsomeattrs in the appropriate places. But > > I understood you suggesting to do so only in scan nodes - which doesn't > > seem sufficient, due to the use of materialized / minimal tuples in > > other types of nodes. Did I misunderstand? > > We would need to do it anywhere that we might be projecting from a > materialized tuple, I suppose. From the planner's standpoint it would be > about as easy to do this for all plan nodes as only selected ones. Yea. > Anyway the core point here seems to be that skipping ExecProject misses a > bet because the underlying tuple doesn't get disassembled. A quick-hack > way of seeing if this helps might be to do slot_getallattrs in the places > where we skip ExecProject. > I'm not sure we'd want that as a permanent > solution, because it's possible that it extracts columns not actually > needed, but it should at least move the hotspot in your example case. Hm, that could hurt pretty badly if we actually only access the first few columns. I wonder if we shouldn't essentially get rid of all slot_getattr() calls, and replace them with one slot_getsomeattrs() and then direct tts_values/nulls access. Where we compute the column number is then essentially a separate discussion. Looking around most of them seem to access multiple columns, and several of them probably can't conveniently done via the planner: src/backend/catalog/partition.c 1635: datum = slot_getattr(slot, keycol, &isNull); acesses all the partitioned columns and has convenient location to compute column to deform to (RelationGetPartitionDispatchInfo). src/backend/catalog/index.c 1797: iDatum = slot_getattr(slot, keycol, &isNull); acesses all the partitioned columns and has convenient location to compute column to deform to (BuildIndexInfo). src/backend/utils/adt/orderedsetaggs.c 1196: Datum d = slot_getattr(slot, nargs + 1, &isnull); 1359: Datum d = slot_getattr(slot, nargs + 1, &isnull); no benefit. src/backend/executor/nodeMergeAppend.c 246:datum1 = slot_getattr(s1, attno, &isNull1); 247:datum2 = slot_getattr(s2, attno, &isNull2); accesses all columns in the sort key, convenient place to prepare (ExecInitMergeAppend). src/backend/executor/execQual.c 594: * caught inside slot_getattr). What we have to check for here is the 600: * Note: we allow a reference to a dropped attribute. slot_getattr will 637:return slot_getattr(slot, attnum, isNull); 676:return slot_getattr(slot, attnum, isNull); 1681: return slot_getattr(fcache->funcResultSlot, 1, isNull); Already converted in proposed expression evaluation patch. src/backend/executor/nodeSubplan.c 367:dvalue = slot_getattr(slot, 1, &disnull); 395:prmdata->value = slot_getattr(slot, col, &(prmdata->isnull)); 565:prmdata->value = slot_getattr(slot, col, 1015: dvalue = slot_getattr(slot, 1, &disnull); Accesses all params, convenient location to compute column number (ExecInitSubPlan). src/backend/executor/execCurrent.c 186:/* Use slot_getattr to catch any possible mistakes */ 188: DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot, 193: DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot, Accesses only system columns. src/backend/executor/nodeSetOp.c 107:flag = DatumGetInt32(slot_getattr(inputslot, Not an actual column. src/backend/executor/nodeGatherMerge.c 678:datum1 = slot_getattr(s1, attno, &isNull1); 679:datum2 = slot_getattr(s2, attno, &isNull2); Same as mergeAppend (huh, why did we copy this code), i.e. beneficial. src/backend/executor/functions.c 970:value = slot_getattr(slot, 1, &(fcinfo->isnull)); no benefit. src/backend/executor/execJunk.c 253:return slot_getattr(slot, attno, isNull); no benefit. src/backend/executor/nodeNestloop.c 136:prm->value = slot_getattr(outerTupleSlot, accesses all future param values, convenient place to compute column number. src/backend/executor/execGrouping.c 100:attr1 = slot_getattr(slot1, att, &isNull1); 102:attr2 = slot_getattr(slot2, att, &isNull2); 170:attr1 = slot_getattr(slot1, att, &isNull1); 175:attr2 = slot_getattr(slot2, att, &isNull2); 501:attr = slot_getattr(slot, att, &isNull); accesses all g
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > [ new patches ] I've started to look at 0004, and the first conclusion I've come to is that it's *direly* short of documentation. To the point that I'd vote against committing it if something isn't done about that. As an example, it's quite unclear how ExprEvalSteps acquire correct resnull/resvalue pointers, and the algorithm for that seems nontrivial. It doesn't help any that the arguments of ExecInitExprRec are entirely undocumented. I think it would be worth creating a README file giving an overview of how all of this patch is supposed to work. You also need to do a whole lot more work on the function-level comments. A specific thing I noticed in the particular area of what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup: +/* + * Evaluate array argument into our return value, overwrite + * with comparison results afterwards. + */ +ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, state, +resv, resnull); That scares me quite a bit, because it smells exactly like the sort of premature optimization that bit us on the rear in CVE-2016-5423 (cf commit f0c7b789a). What's it really buying us to overwrite the return value early rather than storing into the fcinfo's second argument slot? (The memory of that CVE is part of what's prompting me to demand a clear explanation of the algorithm for deciding where steps return their results. Even if this particular code is safe, somebody is going to do something unsafe in future if there's not a documented rule to follow.) Another thing that ties into the do-I-understand-this-at-all question is this bit: +EEO_CASE(EEOP_BOOL_AND_STEP_FIRST) +{ +*op->d.boolexpr.anynull = false; + +/* + * Fallthrough (can't be last - ANDs have two arguments at least). + */ +} + +EEO_CASE(EEOP_BOOL_AND_STEP) It seems like this is missing an "op++;" before falling through. If it isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull and then also doing a regular BOOL_AND_STEP, then the comment seems rather off point. It should be more like "Fall through to do regular AND step processing as well". The place where the comment would be on point is probably over here: +case AND_EXPR: +/* + * ANDs have at least two arguments, so that + * no step needs to be both FIRST and LAST. + */ +Assert(list_length(boolexpr->args) >= 2); + +if (off == 0) +scratch.opcode = EEOP_BOOL_AND_STEP_FIRST; +else if (off + 1 == nargs) +scratch.opcode = EEOP_BOOL_AND_STEP_LAST; +else +scratch.opcode = EEOP_BOOL_AND_STEP; +break; although I think the Assert ought to be examining nargs not list_length(boolexpr->args) so that it has some visible connection to the code after it. (This all applies to OR processing as well, of course.) BTW, it sure seems like ExecInitExprRec and related code ought to set off all sorts of valgrind complaints? It's not being careful at all to ensure that all fields of the "scratch" record get initialized before we memcpy it to someplace. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 18:16:57 -0400, Tom Lane wrote: >> [ scratches head... ] What deforming logic do you think I'm proposing >> removing? > I thought you were suggesting that we don't do the get_last_attnums (and > inlined version in the isSimpleVar case) at execution time anymore, > instead relying on logic in the planner to know how much to deform ahead > of time. Then we'd do slot_getsomeattrs in the appropriate places. But > I understood you suggesting to do so only in scan nodes - which doesn't > seem sufficient, due to the use of materialized / minimal tuples in > other types of nodes. Did I misunderstand? We would need to do it anywhere that we might be projecting from a materialized tuple, I suppose. From the planner's standpoint it would be about as easy to do this for all plan nodes as only selected ones. Anyway the core point here seems to be that skipping ExecProject misses a bet because the underlying tuple doesn't get disassembled. A quick-hack way of seeing if this helps might be to do slot_getallattrs in the places where we skip ExecProject. I'm not sure we'd want that as a permanent solution, because it's possible that it extracts columns not actually needed, but it should at least move the hotspot in your example case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-15 18:16:57 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-15 17:33:46 -0400, Tom Lane wrote: > >> We could make the planner mark each table scan node with the highest > >> column number that the plan will access, and use that to drive a > >> slot_getsomeattrs call in advance of any access to tuple contents. > > > probably isn't sufficient - we build non-virtual tuples in a good number > > of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c > > output, ...). I suspect it'd have measurable negative consequences if > > we removed the deforming logic for all expressions/projections above > > such nodes. I guess we could just do such a logic for every Plan node? > > [ scratches head... ] What deforming logic do you think I'm proposing > removing? I thought you were suggesting that we don't do the get_last_attnums (and inlined version in the isSimpleVar case) at execution time anymore, instead relying on logic in the planner to know how much to deform ahead of time. Then we'd do slot_getsomeattrs in the appropriate places. But I understood you suggesting to do so only in scan nodes - which doesn't seem sufficient, due to the use of materialized / minimal tuples in other types of nodes. Did I misunderstand? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 17:33:46 -0400, Tom Lane wrote: >> We could make the planner mark each table scan node with the highest >> column number that the plan will access, and use that to drive a >> slot_getsomeattrs call in advance of any access to tuple contents. > probably isn't sufficient - we build non-virtual tuples in a good number > of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c > output, ...). I suspect it'd have measurable negative consequences if > we removed the deforming logic for all expressions/projections above > such nodes. I guess we could just do such a logic for every Plan node? [ scratches head... ] What deforming logic do you think I'm proposing removing? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-15 17:33:46 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-15 16:07:14 -0400, Tom Lane wrote: > >> As for ExecHashGetHashValue, it's most likely going to be working from > >> virtual tuples passed up to the join, which won't benefit from > >> predetermination of the last column to be accessed. The > >> tuple-deconstruction would have happened while projecting in the scan > >> node below. > > > I think the physical tuple stuff commonly thwarts that argument? On > > master for tpch's Q5 you can e.g. see the following profile (master): > > Hmmm ... I think you're mistaken in fingering the physical-tuple > optimization per se, but maybe skipping ExecProject at the scan level > would cause this result? I think those are often related (i.e. we replace a smaller targetlist with a "physical" one, which then allows to skip ExecProject()). > I've thought for some time that it was dumb to have the executor > reverse-engineering this info at plan startup anyway. Yea, it'd be good if this (and some similar tasks like building interim tuple descriptors) could be moved to the planner. But: > We could make the planner mark each table scan node with the highest > column number that the plan will access, and use that to drive a > slot_getsomeattrs call in advance of any access to tuple contents. probably isn't sufficient - we build non-virtual tuples in a good number of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c output, ...). I suspect it'd have measurable negative consequences if we removed the deforming logic for all expressions/projections above such nodes. I guess we could just do such a logic for every Plan node? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 16:07:14 -0400, Tom Lane wrote: >> As for ExecHashGetHashValue, it's most likely going to be working from >> virtual tuples passed up to the join, which won't benefit from >> predetermination of the last column to be accessed. The >> tuple-deconstruction would have happened while projecting in the scan >> node below. > I think the physical tuple stuff commonly thwarts that argument? On > master for tpch's Q5 you can e.g. see the following profile (master): Hmmm ... I think you're mistaken in fingering the physical-tuple optimization per se, but maybe skipping ExecProject at the scan level would cause this result? I've thought for some time that it was dumb to have the executor reverse-engineering this info at plan startup anyway. We could make the planner mark each table scan node with the highest column number that the plan will access, and use that to drive a slot_getsomeattrs call in advance of any access to tuple contents. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-15 16:07:14 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-15 15:41:22 -0400, Tom Lane wrote: > >> Color me dubious. Which specific other places have you got in mind, and > >> do they have expression trees at hand that would tell them which columns > >> they really need to pull out? > > > I was thinking of execGrouping.c's execTuplesMatch(), > > TupleHashTableHash() (and unequal, but doubt that matters > > performancewise). There's also nodeHash.c's ExecHashGetValue(), but I > > think that'd possibly better fixed differently. > > The execGrouping.c functions don't have access to an expression tree > instructing them which columns to pull out of the tuple, so I fail to see > how get_last_attnums() would be of any use to them. I presume most of the callers do. We'd have to change the API somewhat, unless we just have a small loop in execTuplesMatch() determining the biggest column index (which might be worthwhile / acceptable). TupleHashTableHash() should be able to have that pre-computed in BuildTupleHashTable(). Might be more viable to go that way. > As for ExecHashGetHashValue, it's most likely going to be working from > virtual tuples passed up to the join, which won't benefit from > predetermination of the last column to be accessed. The > tuple-deconstruction would have happened while projecting in the scan > node below. I think the physical tuple stuff commonly thwarts that argument? On master for tpch's Q5 you can e.g. see the following profile (master): + 29.38% postgres postgres [.] ExecScanHashBucket + 16.72% postgres postgres [.] slot_getattr +5.51% postgres postgres [.] heap_getnext -5.50% postgres postgres [.] slot_deform_tuple - 98.07% slot_deform_tuple - 85.98% slot_getattr - 96.59% ExecHashGetHashValue - ExecHashJoin - ExecProcNode + 85.12% ExecHashJoin + 14.88% MultiExecHash + 3.41% ExecMakeFunctionResultNoSets + 14.02% slot_getsomeattrs + 1.58% ExecEvalScalarVarFast I.e. nearly all calls for slot_deform_tuple are from slot_getattrs in ExecHashGetHashValue(). And nearly all the time in slot_getattr is spent on code only executed for actual tuples: │ if (tuple == NULL) /* internal error */ 0.18 │ test %rax,%rax │ ↓ je 223 │* │* (We have to check this separately because of various inheritance and │* table-alteration scenarios: the tuple could be either longer or shorter │* than the tupdesc.) │*/ │ tup = tuple->t_data; 0.47 │ mov0x10(%rax),%rsi │ if (attnum > HeapTupleHeaderGetNatts(tup)) 75.42 │ movzwl 0x12(%rsi),%eax 0.70 │ and$0x7ff,%eax 0.47 │ cmp%eax,%ebx │ ↓ jg e8 - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-15 15:41:22 -0400, Tom Lane wrote: >> Color me dubious. Which specific other places have you got in mind, and >> do they have expression trees at hand that would tell them which columns >> they really need to pull out? > I was thinking of execGrouping.c's execTuplesMatch(), > TupleHashTableHash() (and unequal, but doubt that matters > performancewise). There's also nodeHash.c's ExecHashGetValue(), but I > think that'd possibly better fixed differently. The execGrouping.c functions don't have access to an expression tree instructing them which columns to pull out of the tuple, so I fail to see how get_last_attnums() would be of any use to them. As for ExecHashGetHashValue, it's most likely going to be working from virtual tuples passed up to the join, which won't benefit from predetermination of the last column to be accessed. The tuple-deconstruction would have happened while projecting in the scan node below. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-15 15:41:22 -0400, Tom Lane wrote: > Andres Freund writes: > > On March 15, 2017 12:33:28 PM PDT, Tom Lane wrote: > >> So for my money, you should drop 0002 altogether and just have 0004 > >> remove get_last_attnums() from execUtils.c and stick it into > >> execExpr.c. I suppose you still need the LastAttnumInfo API change > >> so as to decouple it from ProjectionInfo, but that's minor. > > > I think it's quite useful in other places too, we do repeated slot-getattrs > > in a bunch of places in the executor, and it's noticeable performance wise. > > Color me dubious. Which specific other places have you got in mind, and > do they have expression trees at hand that would tell them which columns > they really need to pull out? I was thinking of execGrouping.c's execTuplesMatch(), TupleHashTableHash() (and unequal, but doubt that matters performancewise). There's also nodeHash.c's ExecHashGetValue(), but I think that'd possibly better fixed differently. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On March 15, 2017 12:33:28 PM PDT, Tom Lane wrote: >> So for my money, you should drop 0002 altogether and just have 0004 >> remove get_last_attnums() from execUtils.c and stick it into >> execExpr.c. I suppose you still need the LastAttnumInfo API change >> so as to decouple it from ProjectionInfo, but that's minor. > I think it's quite useful in other places too, we do repeated slot-getattrs > in a bunch of places in the executor, and it's noticeable performance wise. Color me dubious. Which specific other places have you got in mind, and do they have expression trees at hand that would tell them which columns they really need to pull out? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On March 15, 2017 12:33:28 PM PDT, Tom Lane wrote: >Andres Freund writes: >> [ latest patches ] > >I looked through 0002-Make-get_last_attnums-more-generic.patch. >So for my money, you should drop 0002 altogether and just have 0004 >remove get_last_attnums() from execUtils.c and stick it into >execExpr.c. I suppose you still need the LastAttnumInfo API change >so as to decouple it from ProjectionInfo, but that's minor. I think it's quite useful in other places too, we do repeated slot-getattrs in a bunch of places in the executor, and it's noticeable performance wise. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > [ latest patches ] I looked through 0002-Make-get_last_attnums-more-generic.patch. Although it seems relatively unobjectionable on its own, I'm not convinced that it's really useful to try to split it out like this. I see that 0004 removes the only call of ExecGetLastAttnums (the one in ExecBuildProjectionInfo) and then adds a single call in ExecInitExprSlots which is in execExpr.c. To me, the only reason ExecGetLastAttnums nee get_last_attnums is in execUtils.c in the first place is that it is a private subroutine of ExecBuildProjectionInfo. After these changes, it might as well be a private subroutine of ExecInitExprSlots. I'm suspicious of turning it into a globally accessible function as you've done here, because I doubt that it is of global use --- in particular, the fact that it doesn't deal specially with INDEX_VAR Vars seems rather specific to this one use-case. So for my money, you should drop 0002 altogether and just have 0004 remove get_last_attnums() from execUtils.c and stick it into execExpr.c. I suppose you still need the LastAttnumInfo API change so as to decouple it from ProjectionInfo, but that's minor. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-14 19:34:12 -0400, Tom Lane wrote: >> It seems bizarre that you chose to spell the new configure symbol as >> HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO > I went back-and-forth about this a number of times. We have a bunch of > symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and > more of the nearby code seems to use __ rather than _. I don't really > know why we started doing that, but it's far from new.. > Any idea why we introduce __ stuff? The nearby stuff is describing features that have a specific name that includes a leading underscore or two, like __builtin_unreachable(). That doesn't seem to apply here, so I wouldn't add extra underscores. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-14 23:10:25 -0400, Peter Eisentraut wrote: > On 3/14/17 19:40, Andres Freund wrote: > > Any idea why we introduce __ stuff? > > Because the symbols start with an underscore: > > /* Define to 1 if your compiler understands _Static_assert. */ > #undef HAVE__STATIC_ASSERT Oh, I guess that makes some sense. But we do so very inconsistently, given we only keep the leading underscores not the trailing ones (see HAVE__VA_ARGS, HAVE_FUNCNAME__FUNC). > But we don't need to introduce underscores that are not there. Indeed. Don't want to repost for just this, but consider it adapted (and moved as requested by Tom). - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 3/14/17 19:40, Andres Freund wrote: > Any idea why we introduce __ stuff? Because the symbols start with an underscore: /* Define to 1 if your compiler understands _Static_assert. */ #undef HAVE__STATIC_ASSERT There is apparently some inconsistency when symbols start with more than one underscore. But we don't need to introduce underscores that are not there. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-14 19:34:12 -0400, Tom Lane wrote: > Andres Freund writes: > > [ new patch versions ] > > About to leave, but I had time to read 0003: > > It seems bizarre that you chose to spell the new configure symbol as > HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO I went back-and-forth about this a number of times. We have a bunch of symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and more of the nearby code seems to use __ rather than _. I don't really know why we started doing that, but it's far from new.. Any idea why we introduce __ stuff? > Also, being a neatnik, I would insert both the definition and the > call of that macro between PGAC_C_BUILTIN_UNREACHABLE and > PGAC_C_VA_ARGS, keeping it more or less in alphabetical order. That works for me. Thanks, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > [ new patch versions ] About to leave, but I had time to read 0003: It seems bizarre that you chose to spell the new configure symbol as HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO, especially so when the comment for PGAC_C_COMPUTED_GOTO alleges it's the latter. Also, being a neatnik, I would insert both the definition and the call of that macro between PGAC_C_BUILTIN_UNREACHABLE and PGAC_C_VA_ARGS, keeping it more or less in alphabetical order. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-14 14:19:18 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote: > >> It would be good to have someone at least read it before pushing, but > >> I don't think anyone other than you has done so. > > > I'd love for somebody else > > to look through it, I tried asking multiple times... Seems like > > threatening a commit is the best way to get it :P > > I'm willing to look Cool. > but the last few messages make it sound like you're not all that close > to a finished version. Hm, doesn't seem that far off to me - the latest few rounds of changes have been largely polishing (naming, fighting w/ pgindent, language polishing in comments, etc), otherwise I've basically just added a few lines of optimization after doing more profiling. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On Tue, Mar 14, 2017 at 3:16 PM Andres Freund wrote: > Hm. Right now ExprState's are allocated in several places - but we > could easily move that to a central place. Have a bit of a hard time > seing that that branch during *initialization* time is that expensive, > especially given that previously we allocated a lot of memory separately > too. > I didn't make any comparisons of the cost of the new init against the old init with this change in particular - I just saw that it made the new init faster. I also didn't play around to determine if the savings was found in removing the branch misprediction or inlining or both. I certainly wouldn't hold up your commit for this, but it's something that might be worth a second look once the dust has settled.
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-14 22:03:45 +, Douglas Doole wrote: > I do have one observation based on my experiments with your first version > of the code. In my tests, I found that expression init becomes a lot more > expensive in this new model. (That's neither a surprise, nor a > concern.) I suspect that's to a good degree because back then it'd often end up trying the "new" stuff, but then fall back to the old machinery due to unsupported operations. Which isn't a concern anymore, because now there's full coverage. Otherwise the cost aren't, according to my measurements, higher than before, excepting that some stuff that happened at execution time is now happening during initialization. > In particular, the function ExprEvalPushStep() is quite hot. In my > code I made the following changes: > > * Declare ExprEvalPushStep() "inline". > * Remove the "if (es->steps_alloc == 0)" condition > from ExprEvalPushStep(). > * In ExecInitExpr(), add: >state->steps_alloc = 16; >state->steps = palloc(sizeof(ExprEvalStep) * es->steps_alloc); > > I found that this cut the cost of initializing the expression by about 20%. > (Of course, that was on version 1 of your code, so the benefit may well be > different now.) Hm. Right now ExprState's are allocated in several places - but we could easily move that to a central place. Have a bit of a hard time seing that that branch during *initialization* time is that expensive, especially given that previously we allocated a lot of memory separately too. > On Tue, Mar 14, 2017 at 11:51 AM Andres Freund wrote: > > > > Hmm. Could we make the instructions variable size? It would allow packing > > > the small instructions even more tight, and we wouldn't need to obsess > > over > > > a particular maximum size for more complicated instructions. > > > > That makes jumps a lot more complicated. I'd experimented with it and > > given it up as "not worth it". > Back when I was at IBM, I spent a lot of time doing stuff like this. If you > want to commit with the fixed size arrays, I'm happy to volunteer to look > at packing it tighter as a follow-on piece of work. (It was already on my > list of things to try anyhow.) Yea, I think experimenting with that is a good idea. I just think this is complicated enough, and I don't want to add a whole lot more complexity for very debatable benefit. > > If we were to try to do so, we'd also > > not like storing the pointer and enum variants both, since it'd again > > would reduce the density. > From my experience, it's worth the small loss in density to carry around > both the pointer and the enum - it makes debugging so much easier. I found it annoying enough to print the instruction itself, due to the union - so printing the opcode using a function isn't too bad... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres, sorry I haven't had a chance to look at this great stuff you've been doing. I've wanted to get to it, but work keeps getting in the way. ;-) I do have one observation based on my experiments with your first version of the code. In my tests, I found that expression init becomes a lot more expensive in this new model. (That's neither a surprise, nor a concern.) In particular, the function ExprEvalPushStep() is quite hot. In my code I made the following changes: * Declare ExprEvalPushStep() "inline". * Remove the "if (es->steps_alloc == 0)" condition from ExprEvalPushStep(). * In ExecInitExpr(), add: state->steps_alloc = 16; state->steps = palloc(sizeof(ExprEvalStep) * es->steps_alloc); I found that this cut the cost of initializing the expression by about 20%. (Of course, that was on version 1 of your code, so the benefit may well be different now.) On Tue, Mar 14, 2017 at 11:51 AM Andres Freund wrote: > > Hmm. Could we make the instructions variable size? It would allow packing > > the small instructions even more tight, and we wouldn't need to obsess > over > > a particular maximum size for more complicated instructions. > > That makes jumps a lot more complicated. I'd experimented with it and > given it up as "not worth it". Back when I was at IBM, I spent a lot of time doing stuff like this. If you want to commit with the fixed size arrays, I'm happy to volunteer to look at packing it tighter as a follow-on piece of work. (It was already on my list of things to try anyhow.) > If we were to try to do so, we'd also > not like storing the pointer and enum variants both, since it'd again > would reduce the density. > >From my experience, it's worth the small loss in density to carry around both the pointer and the enum - it makes debugging so much easier. - Doug Salesforce
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-14 20:28:51 +0200, Heikki Linnakangas wrote: > On 03/14/2017 07:31 PM, Andres Freund wrote: > > On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote: > > > * How tight are we on space in the ExprEvalStep union? Currently, the > > > jump-threading preparation replaces the opcodes with the goto labels, but > > > it > > > would be really nice to keep the original opcodes, for debugging purposes > > > if > > > nothing else. > > > > There's nothing left to spare :(. For debugging I've just used > > ExecEvalStepOp() - it's inconvenient to directly print the struct > > anyway, since gdb prints the whole union... > > This comment above the union is not accurate: > > > /* > > * Data for an operation. Inline stored data is faster to access, but > > also > > * bloats the size of all instructions. The union should be kept below > > 48 > > * bytes (so the entire struct is below 64bytes, a single cacheline on > > * common systems). > > */ > > On my x86-64 system, the whole struct is 64 bytes, but the union is only 40 > bytes. The fields before the union occupy 24 bytes. IOW, the union should be > kept below 40 bytes, not 48. Point. Will update. > Hmm. Could we make the instructions variable size? It would allow packing > the small instructions even more tight, and we wouldn't need to obsess over > a particular maximum size for more complicated instructions. That makes jumps a lot more complicated. I'd experimented with it and given it up as "not worth it". If we were to try to do so, we'd also not like storing the pointer and enum variants both, since it'd again would reduce the density. > A compromise might be to have the fixed size, but have "continuation" > opcodes for the more complicated instructions. An XmlExpr instruction, for > example, could have an extra instruction after the primary one, just to hold > more fields. When evaluating it, we would just increment the Program Counter > by two instead of one, to skip over the extra instruction. I think for cases where 40 bytes becomes the limit, it's usually better to have out-of-line state. We could do this, but we'd also waste a bunch of space in the array (namely the 24 byte Op "header"). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 03/14/2017 07:31 PM, Andres Freund wrote: On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote: * How tight are we on space in the ExprEvalStep union? Currently, the jump-threading preparation replaces the opcodes with the goto labels, but it would be really nice to keep the original opcodes, for debugging purposes if nothing else. There's nothing left to spare :(. For debugging I've just used ExecEvalStepOp() - it's inconvenient to directly print the struct anyway, since gdb prints the whole union... This comment above the union is not accurate: /* * Data for an operation. Inline stored data is faster to access, but also * bloats the size of all instructions. The union should be kept below 48 * bytes (so the entire struct is below 64bytes, a single cacheline on * common systems). */ On my x86-64 system, the whole struct is 64 bytes, but the union is only 40 bytes. The fields before the union occupy 24 bytes. IOW, the union should be kept below 40 bytes, not 48. Hmm. Could we make the instructions variable size? It would allow packing the small instructions even more tight, and we wouldn't need to obsess over a particular maximum size for more complicated instructions. A compromise might be to have the fixed size, but have "continuation" opcodes for the more complicated instructions. An XmlExpr instruction, for example, could have an extra instruction after the primary one, just to hold more fields. When evaluating it, we would just increment the Program Counter by two instead of one, to skip over the extra instruction. For reference, here are the sizes of all the structs in the union: 40: xmlexpr 40: scalararrayop 40: minmax 40: iocoerce 40: convert_rowtype 32: wholerow 32: rowcompare_step 32: row 32: func 32: fieldstore 32: domaincheck 32: boolexpr 32: arrayexpr 32: arraycoerce 24: casewhen 24: casethen 24: arrayref_checksubscript 16: grouping_func 16: fieldselect 16: constval 16: casetest 16: assign_var 16: arrayref 8: window_func 8: subplan 8: sqlvaluefunction 8: param 8: nulltest_row 8: assign_tmp 8: alternative_subplan 8: aggref 4: var 4: rowcompare_final 4: qualexpr 4: fetch 4: coalesce - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote: >> It would be good to have someone at least read it before pushing, but >> I don't think anyone other than you has done so. > I'd love for somebody else > to look through it, I tried asking multiple times... Seems like > threatening a commit is the best way to get it :P I'm willing to look, but the last few messages make it sound like you're not all that close to a finished version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote: > Patch 0003 is huge. I suspect you mean 0004? If so - yes :(. I unfortunately don't think there's a useful way to split it in smaller chunks - I originally moved ops over one-by-one, but that required a lot of duplicated structs and such... > It would be good to have someone at least read it before pushing, but > I don't think anyone other than you has done so. I'd love for somebody else to look through it, I tried asking multiple times... Seems like threatening a commit is the best way to get it :P - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-14 08:28:02 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > > EEO_SWITCH(op->opcode) > > { > > EEO_CASE(EEO_DONE): > > goto out; > > Oh my. > > > which is a bit annoying. (the EEO_CASE is either a jump label or a case > > statement, depending on computed goto availability). > > > > It seems we could either: > > 1) live with the damage > > 2) disable pgindent > > 3) move the : inside EEO_CASE's definition, and only use {} blocks. > > I think 3) is nasty because the result doesn't look like normal C. We have a bunch of such constructs already however. foreach(), PG_TRY(). That means 3) has the advantage that our editors and pgindent can already deal with it. > EEO_CASE() are potentially jump labels, then indentation becomes > correct. Why not accept it? It looks odd, but that gives a better hint > to the reader about what's going on. Seems likely that every editor would indent them differently :( I'm leaning towards 3) for the reasons above and after trying out the different indentations, but I don't have a strong opinion. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote: > On 03/14/2017 08:53 AM, Andres Freund wrote: > > Besides that, this version has: > > - pgindented most of the affected pieces (i.e. all significant new code > > has been reindent, not all touched one) > > I think you'll need to add all the inner structs ExprEvalStep typedefs.list > to indent them right. Yea, I saw that. Since they're not named atm, That'd probably not work well. I'd be inclined to just live with it :/ > I looked at patch 0004. Thanks! > * EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really assumes > that 'op' has already been set to point to the jump target. I find that a > bit weird. I guess the idea is that you always pass the Program Counter > variable as 'op' argument. For consistency, would be good if EEO_SWITCH() > also took just 'op' as the argument, rather than op->opcode. But I think it > would be more clear if they should both just assumed that there's a variable > called 'op' that points to the current instruction. Hm. I dislike magic variable names. I think I prefer your idea of passing the op entirely to EEO_SWITCH. > * All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to calling > EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(), > to encapsulate setting 'op' and jumping to it in one operation? Ok. > * ExecEvalStepOp() seems relatively expensive, with the linear scan over all > the opcodes, if called on an ExprState that already has > EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the opcode is > a particular one, so you could check if the opcode matches that particular > one, instead of scanning the dispatch table to find what it is. It should be fairly cheap at that place, unless the same expression is instantiated twice for some reason. I have been wondering about adding a second table ptr->op that can be binary searched for the operation to make it cheaper. > * But is ExecEvalStepOp() ever actually get called on an expression with > EEO_FLAG_JUMP_THREADED already set? It's only used in > ExecInstantiateInterpretedExpr(), and it's called only once on each > ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED is > not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and assert > that evalfunc != ExecInterpExpr. The primary use of ExecEvalStepOp() is really debugging, so I'd like to avoid adding that restriction. I guess we can add a second function for this if needed. > * How tight are we on space in the ExprEvalStep union? Currently, the > jump-threading preparation replaces the opcodes with the goto labels, but it > would be really nice to keep the original opcodes, for debugging purposes if > nothing else. There's nothing left to spare :(. For debugging I've just used ExecEvalStepOp() - it's inconvenient to directly print the struct anyway, since gdb prints the whole union... > * execInterpExpr.c is quite a mouthful. How about exprInterpreter.c? > Attached is a patch, on top of your > 0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with some > minor tweaks and comments here and there (search for HEIKKI). It's mostly > the same stuff I listed above, implemented in a quick & dirty way. I hope > it's helpful. Thanks! > typedef enum ExprEvalOp > { > + /* > + * HEIKKI: How about renaming these to EEOP_* ? I think it'd be a > + * good mnemonic to remember that these are opcodes, and just one > + * extra letter. > + */ I don't mind the change. > + /* HEIKKI: Is it safe to assume that sizeof(size_t) >= sizeof(void *) ? > */ Yes, seems very likely - it's supposed to hold any potential sizeof() return value. You could end up on platforms where that's not true, if it used tagged pointers or such. I guess we could instead use a union, but that doesn't seem that much of a benefit. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 03/14/2017 08:53 AM, Andres Freund wrote: Besides that, this version has: - pgindented most of the affected pieces (i.e. all significant new code has been reindent, not all touched one) I think you'll need to add all the inner structs ExprEvalStep typedefs.list to indent them right. My current plan is to not do very much on this tomorrow, do another full pass on Wednesday, and push it, unless there's protest till then. I looked at patch 0004. Some comments: * EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really assumes that 'op' has already been set to point to the jump target. I find that a bit weird. I guess the idea is that you always pass the Program Counter variable as 'op' argument. For consistency, would be good if EEO_SWITCH() also took just 'op' as the argument, rather than op->opcode. But I think it would be more clear if they should both just assumed that there's a variable called 'op' that points to the current instruction. * All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to calling EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(number>), to encapsulate setting 'op' and jumping to it in one operation? * ExecEvalStepOp() seems relatively expensive, with the linear scan over all the opcodes, if called on an ExprState that already has EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the opcode is a particular one, so you could check if the opcode matches that particular one, instead of scanning the dispatch table to find what it is. * But is ExecEvalStepOp() ever actually get called on an expression with EEO_FLAG_JUMP_THREADED already set? It's only used in ExecInstantiateInterpretedExpr(), and it's called only once on each ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED is not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and assert that evalfunc != ExecInterpExpr. * How tight are we on space in the ExprEvalStep union? Currently, the jump-threading preparation replaces the opcodes with the goto labels, but it would be really nice to keep the original opcodes, for debugging purposes if nothing else. * execInterpExpr.c is quite a mouthful. How about exprInterpreter.c? Attached is a patch, on top of your 0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with some minor tweaks and comments here and there (search for HEIKKI). It's mostly the same stuff I listed above, implemented in a quick & dirty way. I hope it's helpful. - Heikki faster-eval-comments.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 3d6a3801c0..d205101b89 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -47,7 +47,14 @@ #include "utils/rel.h" -static bool get_last_attnums(Node *node, ProjectionInfo *projInfo); +typedef struct LastLastAttnumInfo LastLast? Patch 0003 is huge. It would be good to have someone at least read it before pushing, but I don't think anyone other than you has done so. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund wrote: > EEO_SWITCH(op->opcode) > { > EEO_CASE(EEO_DONE): > goto out; Oh my. > which is a bit annoying. (the EEO_CASE is either a jump label or a case > statement, depending on computed goto availability). > > It seems we could either: > 1) live with the damage > 2) disable pgindent > 3) move the : inside EEO_CASE's definition, and only use {} blocks. I think 3) is nasty because the result doesn't look like normal C. If EEO_CASE() are potentially jump labels, then indentation becomes correct. Why not accept it? It looks odd, but that gives a better hint to the reader about what's going on. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-13 01:03:51 -0700, Andres Freund wrote: > What's basically missing here is: > - pgindent run and minimizing resulting damage Running into a bit of an issue here - pgindent mangles something like EEO_SWITCH (op->opcode) { EEO_CASE(EEO_DONE): goto out; EEO_CASE(EEO_INNER_FETCHSOME): /* XXX: worthwhile to check tts_nvalid inline first? */ slot_getsomeattrs(innerslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_OUTER_FETCHSOME): slot_getsomeattrs(outerslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_SCAN_FETCHSOME): slot_getsomeattrs(scanslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_INNER_VAR): { int attnum = op->d.var.attnum; /* * Can't assert tts_nvalid, as wholerow var evaluation or such * could have materialized the slot - but the contents are * still valid :/ */ Assert(op->d.var.attnum >= 0); *op->resnull = innerslot->tts_isnull[attnum]; *op->resvalue = innerslot->tts_values[attnum]; EEO_DISPATCH(op); } into EEO_SWITCH(op->opcode) { EEO_CASE(EEO_DONE): goto out; EEO_CASE(EEO_INNER_FETCHSOME): /* XXX: worthwhile to check tts_nvalid inline first? */ slot_getsomeattrs(innerslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_OUTER_FETCHSOME): slot_getsomeattrs(outerslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_SCAN_FETCHSOME): slot_getsomeattrs(scanslot, op->d.fetch.last_var); EEO_DISPATCH(op); EEO_CASE(EEO_INNER_VAR): { int attnum = op->d.var.attnum; /* * Can't assert tts_nvalid, as wholerow var evaluation or such * could have materialized the slot - but the contents are still * valid :/ */ Assert(op->d.var.attnum >= 0); *op->resnull = innerslot->tts_isnull[attnum]; *op->resvalue = innerslot->tts_values[attnum]; EEO_DISPATCH(op); } which is a bit annoying. (the EEO_CASE is either a jump label or a case statement, depending on computed goto availability). It seems we could either: 1) live with the damage 2) disable pgindent 3) move the : inside EEO_CASE's definition, and only use {} blocks. I'm inclined to go for 3). Opinions? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 03/13/2017 09:03 AM, Andres Freund wrote: Hi, On 2017-03-12 05:40:51 +0100, Tomas Vondra wrote: I wanted to do a bit of testing and benchmarking on this, but 0004 seems to be a bit broken. Well, "broken" in the sense that it's already outdated, because other stuff that got merged. Yes, that's what I meant. Sorry for not being quite clear. The patch does not apply anymore - there are some conflicts in execQual.c, but I think I fixed those. But then I ran into a bunch of compile-time errors, because some of the executor nodes still reference bits that were moved elsewhere. Updated patch attached. Note that this patch has two changes I've not yet evaluated performance-wise. And which are those? I plan to do a bit of testing / benchmarking on this later this week, depending on other testing already happening on my machine. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 13.03.2017 11:03, Andres Freund wrote: Hi, On 2017-03-12 05:40:51 +0100, Tomas Vondra wrote: I wanted to do a bit of testing and benchmarking on this, but 0004 seems to be a bit broken. Well, "broken" in the sense that it's already outdated, because other stuff that got merged. The patch does not apply anymore - there are some conflicts in execQual.c, but I think I fixed those. But then I ran into a bunch of compile-time errors, because some of the executor nodes still reference bits that were moved elsewhere. Updated patch attached. Note that this patch has two changes I've not yet evaluated performance-wise. I got the following results at my system with Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16Gb RAM, TPC-H Q1/Q6 scale 10, sharedBuffers=8Gb, pg_prewarm on lineitem table projection: Q1 Q6 Master 7503 ms 1171 ms Your patch 6420 ms 1034 ms VOPS 396 ms 249 ms VOPS + patch367 ms 233 ms -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 03/07/2017 04:01 AM, Andres Freund wrote: Hi, Attached is an updated version of the patchset, but more importantly some benchmark numbers. I wanted to do a bit of testing and benchmarking on this, but 0004 seems to be a bit broken. The patch does not apply anymore - there are some conflicts in execQual.c, but I think I fixed those. But then I ran into a bunch of compile-time errors, because some of the executor nodes still reference bits that were moved elsewhere. E.g. there's no targetlist in PlanState anymore, yet nodeGatherMerge and nodeTableFuncscan do this: gm_state->ps.targetlist = (List *) ExecInitExpr((Expr *) node->plan.targetlist, (PlanState *) gm_state); Some of the nodes also assign to ps.qual values that are (List *), but that field was switched to ExprState. That needs fixing, I guess. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On Fri, Mar 10, 2017 at 7:42 PM, Bruce Momjian wrote: > On Fri, Mar 10, 2017 at 07:15:59PM -0500, Peter Eisentraut wrote: >> On 3/10/17 14:40, Andres Freund wrote: >> > I'd really like to get it in. The performance improvements on its own >> > are significant, and it provides the basis for significantly larger >> > improvements again (JIT) - those followup improvements are large patches >> > again though, so I'd rather not do all of that next cycle. >> > >> > My next step (over the weekend) is to add tests to execQual.c to get it >> > a good chunk closer to 100% test coverage, and then do the same for the >> > new implementation (there's probably very little additional tests needed >> > after the conversion). Given all tests pass before/after, and there's a >> > lot of them, I think we can have a reasonable confidence of a low bug >> > density. >> >> That seems like a plan, but now would probably be a good time for some >> other hackers to take note of this proposal. > > Well, the executor is long overdue for improvement, so I would love to > have this in 10.0. I am not sure what additional polishing will happen > if we punt it for 11.0. I think the only downside of having it in 10.0 > is that it will not have lived in the source tree for as long a time > between commit and PG release, but if it is tested, that seems fine. Yeah. I think we can't rule out the possibility that this patch is full of bugs, but (1) a lot of the change is pretty mechanical and (2) none of this touches the on-disk format or the catalog contents, so it doesn't break anybody's existing install if we have to patch it. On the other hand, (3) it's a big patch and (4) if expression evaluation doesn't work, PostgreSQL is pretty much unusable. So my feeling is: 1. If Andres commits this and it turns out to be seriously buggy, he should be prepared to revert it without a big fight. We can try again for v11. 2. If Andres commits this and it turns out to be mildly buggy, he should be prepared to address bugs very proactively and very promptly. 3. If Andres wants to commit this for v10, it should be done RSN. Feature freeze isn't technically until the end of March, but giant potentialy-destabilizing patches that land on March 31st are likely to lead to outcomes that nobody wants. Like, I think, everyone else, I'd really like to have these speedups and I think they will benefit everyone who uses PostgreSQL. However, I think that we cannot afford to have a repeat of the simplehash thing where serious bugs sat around for months without really getting properly addressed. If that happens with this, I will be extremely unhappy, and I'm fairly sure I won't be alone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On Fri, Mar 10, 2017 at 07:15:59PM -0500, Peter Eisentraut wrote: > On 3/10/17 14:40, Andres Freund wrote: > > I'd really like to get it in. The performance improvements on its own > > are significant, and it provides the basis for significantly larger > > improvements again (JIT) - those followup improvements are large patches > > again though, so I'd rather not do all of that next cycle. > > > > My next step (over the weekend) is to add tests to execQual.c to get it > > a good chunk closer to 100% test coverage, and then do the same for the > > new implementation (there's probably very little additional tests needed > > after the conversion). Given all tests pass before/after, and there's a > > lot of them, I think we can have a reasonable confidence of a low bug > > density. > > That seems like a plan, but now would probably be a good time for some > other hackers to take note of this proposal. Well, the executor is long overdue for improvement, so I would love to have this in 10.0. I am not sure what additional polishing will happen if we punt it for 11.0. I think the only downside of having it in 10.0 is that it will not have lived in the source tree for as long a time between commit and PG release, but if it is tested, that seems fine. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 3/10/17 14:40, Andres Freund wrote: > I'd really like to get it in. The performance improvements on its own > are significant, and it provides the basis for significantly larger > improvements again (JIT) - those followup improvements are large patches > again though, so I'd rather not do all of that next cycle. > > My next step (over the weekend) is to add tests to execQual.c to get it > a good chunk closer to 100% test coverage, and then do the same for the > new implementation (there's probably very little additional tests needed > after the conversion). Given all tests pass before/after, and there's a > lot of them, I think we can have a reasonable confidence of a low bug > density. That seems like a plan, but now would probably be a good time for some other hackers to take note of this proposal. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 3/10/17 14:24, Andres Freund wrote: > What problem are you thinking of exactly, and what'd be the solution? > I'd guess that people wouldn't like being unable to change columns in a > table if some function returned the type. Well, that's pretty much the problem. For example: create type t1 as (a int, b int); create function f1() returns setof t1 language plpgsql as $$ begin return query values (1, 2); end $$; alter type t1 drop attribute b; select * from f1(); -- fail The dependency system should arguably guard against that. Yes, it would make some things more cumbersome, but, well, it already does that. >> Not necessarily in this patch, but I would like to keep the options >> open. > > Yea, seems worth thinking about. I don't think the patch closes down > options? nope -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-10 09:00:14 -0500, Peter Eisentraut wrote: > I have also looked at the 0002 and 0003 patches, and they seem OK, but > they are obviously not of much use without 0004. What is your ambition > for getting 0004 reviewed and committed for PG10? I'd really like to get it in. The performance improvements on its own are significant, and it provides the basis for significantly larger improvements again (JIT) - those followup improvements are large patches again though, so I'd rather not do all of that next cycle. My next step (over the weekend) is to add tests to execQual.c to get it a good chunk closer to 100% test coverage, and then do the same for the new implementation (there's probably very little additional tests needed after the conversion). Given all tests pass before/after, and there's a lot of them, I think we can have a reasonable confidence of a low bug density. Large parts of the changes are fairly mechanical, the interesting bits probably are: - there's a lot fewer "full blown" node types for expression evaluation anymore. In most cases there's now only a top-level ExprState node that's nodeTag() able. The actual information to execute an instruction now is in ExprState->steps[i] - because of the previous point, it now isn't legal anymore to call ExecInitExpr() on a list of expressions - ExecInitExprList() is a convenience wrapper around manually iterating over the list (gets rid of ugly casts, too) - Because they behave differently on an actual expression evaluation basis, quals / check constraint, now have to be initialized with ExecInitQual/ExecInitCheck(). That way the shortcut behaviour and such can be retained, while also being faster. - PlanState->targetlist is gone. Because expressions aren't initialized on their own anymore, but as part of ExecProject, there's no need for it. - The seperation between ExecInitExprRec() and ExecEvalExpr/ExecInterpExpr() is different - we try to do as much as possible during initialization. - I retained some ugly hackering around innermost_caseval|null/domainval|null - I had started down a path of removing the use of those in random parts of the system, but I ended up not going for it. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 2017-03-10 08:51:25 -0500, Peter Eisentraut wrote: > On 3/7/17 19:14, Andres Freund wrote: > >> Why shouldn't the function itself also depend on the components of its > >> return type? > > Because that'd make it impossible to change the return type components - > > if the return type is baked into the view that's necessary, but for a > > "freestanding function" it's not. If you e.g. have a function that just > > returns a table's rows, it'd certainly be annoying if that'd prevent you > > from dropping columns. > > I think functions breaking when the return type is fiddled with is > actually a not-uncommon problem in practice. It would be nice if that > could be addressed. What problem are you thinking of exactly, and what'd be the solution? I'd guess that people wouldn't like being unable to change columns in a table if some function returned the type. One rather extreme way would be to have functions return a "different" type, that's initially identical to the table/composite type. If the ocmposite type then changes, we'd transparently map between the actual return type, and the one callers expect. But that'd a fair bit of effort, and presumably also quite confusing for users that might e.g. expect a new column to show up. > Not necessarily in this patch, but I would like to keep the options > open. Yea, seems worth thinking about. I don't think the patch closes down options? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I have also looked at the 0002 and 0003 patches, and they seem OK, but they are obviously not of much use without 0004. What is your ambition for getting 0004 reviewed and committed for PG10? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On 3/7/17 19:14, Andres Freund wrote: >> Why shouldn't the function itself also depend on the components of its >> return type? > Because that'd make it impossible to change the return type components - > if the return type is baked into the view that's necessary, but for a > "freestanding function" it's not. If you e.g. have a function that just > returns a table's rows, it'd certainly be annoying if that'd prevent you > from dropping columns. I think functions breaking when the return type is fiddled with is actually a not-uncommon problem in practice. It would be nice if that could be addressed. Not necessarily in this patch, but I would like to keep the options open. Comments from others would be welcome on this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Hi, On 2017-03-07 18:46:31 -0500, Peter Eisentraut wrote: > I looked over > 0001-Add-expression-dependencies-on-composite-type-whole-.patch. That > seems useful independent of the things you have following. > > Even though it is presented more as a preparatory change, it appears to > have noticeable user-facing impact, which we should analyze. Indeed. > For > example, in the regression test, the command > > alter table tt14t drop column f3; > > is now rejected, rightly, but the command > > alter table tt14t drop column f3 cascade; > > ends up dropping the whole view, which might be a bit surprising. We > don't support dropping columns from views, so there might be no > alternative It seems strictly better than silently breaking the view. > , but I wonder what else is changing because of this. I > think that might deserve a bit more analysis and perhaps some test cases. There are already some tests. More is probably good - if you have specific cases in mind... > --- a/src/backend/catalog/dependency.c > +++ b/src/backend/catalog/dependency.c > @@ -206,6 +206,9 @@ static bool object_address_present_add_flags(const > ObjectAddress *object, > static bool stack_address_present_add_flags(const ObjectAddress *object, > int flags, > > ObjectAddressStack *stack); > +static void add_type_addresses(Oid typeid, bool include_components, > ObjectAddresses *addrs); > +static void add_type_component_addresses(Oid typeid, ObjectAddresses > *addrs); > +static void add_class_component_addresses(Oid relid, ObjectAddresses > *addrs); > static void DeleteInitPrivs(const ObjectAddress *object); > > For some reason, the function add_object_address() is singular, and > these new functions are plural Clearly, they take a plural argument, so > your version seems more correct, but I wonder if we should keep that > consistent. I named them plural, because add_object_address only adds a single new entry, but add_type_addresses etc potentially add multiple ones. So I think plural/singular as used here is actually consistent? > +* e.g. not to the right thing for column-less > tables. Note that > > Small typo there. (to -> do) Oops. > @@ -1639,6 +1641,9 @@ find_expr_references_walker(Node *node, > > add_object_address(OCLASS_PROC, funcexpr->funcid, 0, >context->addrs); > + /* dependency on type itself already exists via function */ > + add_type_component_addresses(funcexpr->funcresulttype, > context->addrs); > + > /* fall through to examine arguments */ > } > else if (IsA(node, OpExpr)) > > Why shouldn't the function itself also depend on the components of its > return type? Because that'd make it impossible to change the return type components - if the return type is baked into the view that's necessary, but for a "freestanding function" it's not. If you e.g. have a function that just returns a table's rows, it'd certainly be annoying if that'd prevent you from dropping columns. > How are argument types handled? We fall through to return expression_tree_walker(node, find_expr_references_walker, (void *) context); which'll add a dependency if necessary. If it's e.g. a table column, function return type or such we'll add a dependency there. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
I looked over 0001-Add-expression-dependencies-on-composite-type-whole-.patch. That seems useful independent of the things you have following. Even though it is presented more as a preparatory change, it appears to have noticeable user-facing impact, which we should analyze. For example, in the regression test, the command alter table tt14t drop column f3; is now rejected, rightly, but the command alter table tt14t drop column f3 cascade; ends up dropping the whole view, which might be a bit surprising. We don't support dropping columns from views, so there might be no alternative, but I wonder what else is changing because of this. I think that might deserve a bit more analysis and perhaps some test cases. --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -206,6 +206,9 @@ static bool object_address_present_add_flags(const ObjectAddress *object, static bool stack_address_present_add_flags(const ObjectAddress *object, int flags, ObjectAddressStack *stack); +static void add_type_addresses(Oid typeid, bool include_components, ObjectAddresses *addrs); +static void add_type_component_addresses(Oid typeid, ObjectAddresses *addrs); +static void add_class_component_addresses(Oid relid, ObjectAddresses *addrs); static void DeleteInitPrivs(const ObjectAddress *object); For some reason, the function add_object_address() is singular, and these new functions are plural Clearly, they take a plural argument, so your version seems more correct, but I wonder if we should keep that consistent. +* e.g. not to the right thing for column-less tables. Note that Small typo there. (to -> do) @@ -1639,6 +1641,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_PROC, funcexpr->funcid, 0, context->addrs); + /* dependency on type itself already exists via function */ + add_type_component_addresses(funcexpr->funcresulttype, context->addrs); + /* fall through to examine arguments */ } else if (IsA(node, OpExpr)) Why shouldn't the function itself also depend on the components of its return type? How are argument types handled? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
On Mon, Mar 6, 2017 at 10:01 PM, Andres Freund wrote: > My benchmarking script first prewarms the whole database, then runs the > tpch queries in sequence, repeated three times, and compares the shortes > execution time: Those numbers are stupendous. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers