subject:"Re\: \[HACKERS\] WIP\: Faster Expression Processing v4"

On 2017-03-27 13:30:11 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
> >> My feeling at this point is that we might be better off disabling
> >> the computed-goto case by default.  At the very least, we're going
> >> to need a version check that restricts it to latest gcc.
> 
> > In my measurements it's still faster in at least gcc-5/6, even without
> > the option (largely because it avoids array bounds checks on the jump
> > table built for the switch).
> 
> Hm.  What test cases are you using?

I used tpc-h - seems like a realistic enough testcase.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
>> My feeling at this point is that we might be better off disabling
>> the computed-goto case by default.  At the very least, we're going
>> to need a version check that restricts it to latest gcc.

> In my measurements it's still faster in at least gcc-5/6, even without
> the option (largely because it avoids array bounds checks on the jump
> table built for the switch).

Hm.  What test cases are you using?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
> I wrote:
> > As to the point of whether it actually helps or not ...
> > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> > the dispatches getting routed through a common location, to *all* of them
> > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> > survives as a separate jump).  This seems like a bug, but there it is.
> 
> So after a bit of googling, this is a very longstanding complaint:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
> 
> (hm, think I know the second submitter)

I don't think that's precisely the same issue - as long as some of the
goto branches survive, we're not hitting the full brunt of the compgoto
thing.  I think we're essentially avoiding some of that because we're
"precomputing" the dispatch_table lookup.

To count the number of jumps I used:
gdb  -batch -ex 'disassemble/s ExecInterpExpr' execExprInterp.o|grep jmpq|grep 
-v ExecInterpExpr|wc -l
which'll only include indirect jumps (since otherwise jumps will look
like "jmpq   0x35f2 ")

standard flags  -fno-crossjumping
gcc-5 (5.4.1-8):34  82
gcc-6 (6.3.0-8):34  82
gcc-7 (7.0.1):  71  108
gcc-snapshot:   72  108

So that doesn't look too bad.


> My feeling at this point is that we might be better off disabling
> the computed-goto case by default.  At the very least, we're going
> to need a version check that restricts it to latest gcc.

In my measurements it's still faster in at least gcc-5/6, even without
the option (largely because it avoids array bounds checks on the jump
table built for the switch).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-27 11:52:05 -0400, Tom Lane wrote:
> As to the point of whether it actually helps or not ...
> 
> on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
> per EEO_NEXT or EEO_JUMP.
> 
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

Gah :(.  I have gcc 5,6,7 here, but nothing earlier.  I'm now inclined
to go the version check routine, for the individual versions we can (and
want) confirm this on...  I'm not too concerned about not doing so on
gcc 4.4 or older...


> So this means we'd need some serious research to decide whether to apply
> it.  And I'm suspecting we'd end up with a compiler version test.

Indeed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

I wrote:
> As to the point of whether it actually helps or not ...
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

So after a bit of googling, this is a very longstanding complaint:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785

(hm, think I know the second submitter)

I'm not sure we should be relying on a gcc bug fix that's barely four
months old.  Even assuming it fixes the issue without new regressions,
most people are not going to have it anytime soon.

My feeling at this point is that we might be better off disabling
the computed-goto case by default.  At the very least, we're going
to need a version check that restricts it to latest gcc.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

As to the point of whether it actually helps or not ...

on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
per EEO_NEXT or EEO_JUMP.

on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
the dispatches getting routed through a common location, to *all* of them
(except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
survives as a separate jump).  This seems like a bug, but there it is.

So this means we'd need some serious research to decide whether to apply
it.  And I'm suspecting we'd end up with a compiler version test.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-27 11:22:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
> >> I think it would, primarily because if we find out that some other compiler
> >> spells this differently, we could handle it totally within configure.
> 
> > I'm unconvinced that we could sensibly map different compiler's options
> > on the same option name - I'd be surprised if they would have a similar
> > enough effect.
> 
> [ shrug... ]  OK, next question is whether pgindent treats this
> layout sanely.

The patch was run through pgindent, i.e. it seems to be ok with it.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
>> I think it would, primarily because if we find out that some other compiler
>> spells this differently, we could handle it totally within configure.

> I'm unconvinced that we could sensibly map different compiler's options
> on the same option name - I'd be surprised if they would have a similar
> enough effect.

[ shrug... ]  OK, next question is whether pgindent treats this
layout sanely.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Checking for this isn't entirely pretty - see my attached attempt at
> > doing so.  I considered hiding
> > __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
> > a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
> > makes things better.
> 
> I think it would, primarily because if we find out that some other compiler
> spells this differently, we could handle it totally within configure.

I'm unconvinced that we could sensibly map different compiler's options
on the same option name - I'd be surprised if they would have a similar
enough effect.


> Isn't our practice to put __attribute__ at the end of a function
> declaration or definition, not in the middle someplace?

We could move it to the declaration - which doesn't seem like an
improvement here, given that it's about optimization not API changes -
but in definitions you can't move it after the function name:
static Datum
ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
#ifdef HAVE_FUNCATTR_NO_CROSSJUMPING
__attribute__((optimize("no-crossjumping")))
#endif
:
/home/andres/src/postgresql/src/backend/executor/execExprInterp.c:279:1: error: 
attributes should be specified before the declarator in a function definition


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
>> I personally find per-function annotation ala
>> __attribute__((optimize("no-crossjumping")))
>> cleaner anyway.  I tested that, and it seems to work.
>> 
>> Obviously we'd have to hide that behind a configure test.  Could also do
>> tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.

Agreed.

> Checking for this isn't entirely pretty - see my attached attempt at
> doing so.  I considered hiding
> __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
> a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
> makes things better.

I think it would, primarily because if we find out that some other compiler
spells this differently, we could handle it totally within configure.

Isn't our practice to put __attribute__ at the end of a function
declaration or definition, not in the middle someplace?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-26 Thread Andres Freund

On 2017-03-25 20:59:27 -0700, Andres Freund wrote:
> On 2017-03-25 23:51:45 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
> > >> I haven't had the time to research this properly, but initial tests
> > >> show that with GCC 6.2 adding
> > >> 
> > >> #pragma GCC optimize ("no-crossjumping")
> > >> 
> > >> fixes merging of the op tail jumps.
> > >> 
> > >> Some quick and dirty benchmarking suggests that the benefit for the
> > >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
> > >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
> > >> vars before the indirect jump is somewhere between a tiny benefit and
> > >> no effect, certainly not worth introducing extra complexity. Clang 3.8
> > >> does the correct thing out of the box and is a couple of percent
> > >> faster than GCC with the pragma.
> > 
> > > That's large enough to be worth doing (although I recall you seeing all 
> > > jumps commonalized).  We should probably do this on a per function basis 
> > > however (either using pragma push option, or function attributes).
> > 
> > Seems like it would be fine to do it on a per-file basis.
> 
> I personally find per-function annotation ala
> __attribute__((optimize("no-crossjumping")))
> cleaner anyway.  I tested that, and it seems to work.
> 
> Obviously we'd have to hide that behind a configure test.  Could also do
> tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.

Checking for this isn't entirely pretty - see my attached attempt at
doing so.  I considered hiding
__attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
makes things better.

Comments?

Greetings,

Andres Freund
>From ce42086871ebfcbb7ae52266e9c4549b5958e80b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 26 Mar 2017 22:38:21 -0700
Subject: [PATCH] Disable gcc's crossjumping optimization for ExecInterpExpr().

GCC merges code in ExecInterpExpr() too aggressively, reducing branch
prediction accuracy. As this is performance critical code, go through
the trouble of disabling the relevant optimization for the function.

To do so, have to detect whether the compiler support
__attribute__((optimize("no-crossjumping"))) as a function
annotation. Do so in configure.

Discussion: https://postgr.es/m/20170326035927.5mubkfdtaqlrg...@alap3.anarazel.de
---
 config/c-compiler.m4  | 35 ++
 configure | 41 +++
 configure.in  |  1 +
 src/backend/executor/execExprInterp.c |  9 
 src/include/pg_config.h.in|  4 
 src/include/pg_config.h.win32 |  4 
 6 files changed, 94 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3321f226f3..ac72539dec 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -297,6 +297,41 @@ fi])# PGAC_C_COMPUTED_GOTO
 
 
 
+# PGAC_C_FUNCATTR_NO_CROSSJUMPING
+# ---
+# Check if the C compiler understands the
+# __attribute__((optimize("no-crossjumping"))) function attribute.
+# Define HAVE_FUNCATTR_NO_CROSSJUMPING if so.
+#
+# At some later point it might make sense to generalize this so we can
+# check for other optimization flags, but so far there's no need for
+# that.
+#
+# Have to enable Werror, as some compilers (e.g. clang) would
+# otherwise just warn about an unknown type of attribute.
+AC_DEFUN([PGAC_C_FUNCATTR_NO_CROSSJUMPING],
+[AC_CACHE_CHECK([for function attribute disabling crossjumping optimizations], pgac_cv_funcattr_no_crossjumping,
+[ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([
+extern int testme(void);
+int
+__attribute__((optimize("no-crossjumping")))
+testme(void)
+{
+return 0;
+}
+])],
+[pgac_cv_funcattr_no_crossjumping=yes],
+[pgac_cv_funcattr_no_crossjumping=no])
+ac_c_werror_flag=$ac_save_c_werror_flag])
+if test x"$pgac_cv_funcattr_no_crossjumping" = xyes ; then
+AC_DEFINE(HAVE_FUNCATTR_NO_CROSSJUMPING, 1,
+  [Define to 1 if your compiler understands __attribute__((optimize("no-crossjumping"))).])
+fi])# PGAC_C_FUNCATTR_NO_CROSSJUMPING
+
+
+
 # PGAC_C_VA_ARGS
 # --
 # Check if the C compiler understands C99-style variadic macros,
diff --git a/configure b/configure
index 4b8229e959..87c4799be3 100755
--- a/configure
+++ b/configure
@@ -11835,6 +11835,47 @@ if test x"$pgac_cv_computed_goto" = xyes ; then
 $as_echo "#define HAVE_COMPUTED_GOTO 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for function attribute disabling crossjumping optimizations" >&5
+$as_echo_n "checking for function attribute disabling crossjumping optimizations... " >&6; }
+if ${pgac_cv_funcattr_no_crossjumping+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_we

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-25 23:51:45 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
> >> I haven't had the time to research this properly, but initial tests
> >> show that with GCC 6.2 adding
> >> 
> >> #pragma GCC optimize ("no-crossjumping")
> >> 
> >> fixes merging of the op tail jumps.
> >> 
> >> Some quick and dirty benchmarking suggests that the benefit for the
> >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
> >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
> >> vars before the indirect jump is somewhere between a tiny benefit and
> >> no effect, certainly not worth introducing extra complexity. Clang 3.8
> >> does the correct thing out of the box and is a couple of percent
> >> faster than GCC with the pragma.
> 
> > That's large enough to be worth doing (although I recall you seeing all 
> > jumps commonalized).  We should probably do this on a per function basis 
> > however (either using pragma push option, or function attributes).
> 
> Seems like it would be fine to do it on a per-file basis.

I personally find per-function annotation ala
__attribute__((optimize("no-crossjumping")))
cleaner anyway.  I tested that, and it seems to work.

Obviously we'd have to hide that behind a configure test.  Could also do
tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.


> If you're
> worried about pessimizing the out-of-line subroutines, we could move
> those to a different file --- it's pretty questionable that they're
> in execExprInterp.c in the first place, considering they're meant to be
> used by more than just that execution method.

I indeed am, but having the code in the same file has a minor advantage:
It allows the compiler to partially inline them, if it feels like it
(e.g. moving null checks inline).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane

Andres Freund  writes:
> On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
>> I haven't had the time to research this properly, but initial tests
>> show that with GCC 6.2 adding
>> 
>> #pragma GCC optimize ("no-crossjumping")
>> 
>> fixes merging of the op tail jumps.
>> 
>> Some quick and dirty benchmarking suggests that the benefit for the
>> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
>> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
>> vars before the indirect jump is somewhere between a tiny benefit and
>> no effect, certainly not worth introducing extra complexity. Clang 3.8
>> does the correct thing out of the box and is a couple of percent
>> faster than GCC with the pragma.

> That's large enough to be worth doing (although I recall you seeing all jumps 
> commonalized).  We should probably do this on a per function basis however 
> (either using pragma push option, or function attributes).

Seems like it would be fine to do it on a per-file basis.  If you're
worried about pessimizing the out-of-line subroutines, we could move
those to a different file --- it's pretty questionable that they're
in execExprInterp.c in the first place, considering they're meant to be
used by more than just that execution method.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4



On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
>On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund 
>wrote:
>>> At least with current gcc (6.3.1 on Fedora 25) at -O2,
>>> what I see is multiple places jumping to the same indirect jump
>>> instruction :-(.  It's not a total disaster: as best I can tell, all
>the
>>> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement
>about
>>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
>>> instructions that increment the "op" register and then do an
>indirect
>>> jump :-(.
>>
>> Yea, I see some of that too - "usually" when there's more than just
>the
>> jump in common.  I think there's some gcc variables that influence
>this
>> (min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
>> worthwhile experimenting with setting them locally via a pragma or
>such.
>> I think Aants wanted to experiment with that, too.
>
>I haven't had the time to research this properly, but initial tests
>show that with GCC 6.2 adding
>
>#pragma GCC optimize ("no-crossjumping")
>
>fixes merging of the op tail jumps.
>
>Some quick and dirty benchmarking suggests that the benefit for the
>interpreter is about 15% (5% speedup on a workload that spends 1/3 in
>ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
>vars before the indirect jump is somewhere between a tiny benefit and
>no effect, certainly not worth introducing extra complexity. Clang 3.8
>does the correct thing out of the box and is a couple of percent
>faster than GCC with the pragma.

That's large enough to be worth doing (although I recall you seeing all jumps 
commonalized).  We should probably do this on a per function basis however 
(either using pragma push option, or function attributes).

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Ants Aasma

On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund  wrote:
>> At least with current gcc (6.3.1 on Fedora 25) at -O2,
>> what I see is multiple places jumping to the same indirect jump
>> instruction :-(.  It's not a total disaster: as best I can tell, all the
>> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
>> instructions that increment the "op" register and then do an indirect
>> jump :-(.
>
> Yea, I see some of that too - "usually" when there's more than just the
> jump in common.  I think there's some gcc variables that influence this
> (min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
> worthwhile experimenting with setting them locally via a pragma or such.
> I think Aants wanted to experiment with that, too.

I haven't had the time to research this properly, but initial tests
show that with GCC 6.2 adding

#pragma GCC optimize ("no-crossjumping")

fixes merging of the op tail jumps.

Some quick and dirty benchmarking suggests that the benefit for the
interpreter is about 15% (5% speedup on a workload that spends 1/3 in
ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
vars before the indirect jump is somewhere between a tiny benefit and
no effect, certainly not worth introducing extra complexity. Clang 3.8
does the correct thing out of the box and is a couple of percent
faster than GCC with the pragma.

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-25 12:22:15 -0400, Tom Lane wrote:
> More random musing ... have you considered making the jump-target fields
> in expressions be relative rather than absolute indexes?  That is,
> EEO_JUMP would look like
> 
>   op += (stepno); \
>   EEO_DISPATCH(); \
> 
> instead of
> 
>   op = &state->steps[stepno]; \
>   EEO_DISPATCH(); \
> 
> I have not carried out a full patch to make this work, but just making
> that one change and examining the generated assembly code looks promising.
> Instead of this
> 
>   movslq  40(%r14), %r8
>   salq$6, %r8
>   addq24(%rbx), %r8
>   movq%r8, %r14
>   jmp *(%r8)
> 
> we get this
> 
>   movslq  40(%r14), %rax
>   salq$6, %rax
>   addq%rax, %r14
>   jmp *(%r14)

That seems like a good idea.  I've not done this in the committed
version (and I don't think we necessarily need to this before the
release), but fo rthe future it seems like a good plan.  It makes sense
that it's faster - there's no need to reference state->steps.

> which certainly looks like it ought to be faster.  Also, the real reason
> I got interested in this at all is that with relative jumps, groups of
> steps would be position-independent within the steps array, which would
> enable some compile-time tricks that seem impractical with the current
> definition.

Indeed.

> BTW, now that I've spent a bit of time looking at the generated assembly
> code, I'm kind of disinclined to believe any arguments about how we have
> better control over branch prediction with the jump-threading
> implementation.

I measured the performance difference between using it and not using it,
and it came out a pretty clear plus. On gcc 6.3, gcc master snapshot,
and clang-3.9.  It's not just that more jumps are duplicated, it's also
that the switch() always adds a boundary check.

> At least with current gcc (6.3.1 on Fedora 25) at -O2,
> what I see is multiple places jumping to the same indirect jump
> instruction :-(.  It's not a total disaster: as best I can tell, all the
> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
> instructions that increment the "op" register and then do an indirect
> jump :-(.

Yea, I see some of that too - "usually" when there's more than just the
jump in common.  I think there's some gcc variables that influence this
(min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
worthwhile experimenting with setting them locally via a pragma or such.
I think Aants wanted to experiment with that, too.

Then there's also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
which causes some forms of computed goto (not ours I think) to be
deoptimized in gcc.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane

Andres Freund  writes:
> - The !caseExpr->defresult result branch is currently unreachable (and
>   its equivalent was before the patch) because transformCaseExpr()
>   generates a default expression.  I'm inclined to replace the dead code
>   with an assertion. Any reason not to do that?

Ah-hah.  I'd noted that that code wasn't being reached when I did some
code coverage checks this morning, but I hadn't found the cause yet.
Yeah, replacing the if-test with an assert seems fine.

> I think there's some argument to be made that we should move all of
> execSRF.c's logic to their respective callsites.  There's not really
> much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() -
> and at least the latter would even become a bit simpler if we didn't
> support both callers.

Possibly.  All that code could stand to be rethought, probably, now that
its mission is just to handle the SRF case.  But at least all the cruft is
in one place for the moment.  And I don't think it's anything we have to
fix before v10, it's just cosmetic.

> Thanks a lot for your work on the patch!

You're welcome!

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-24 17:16:15 -0400, Tom Lane wrote:
> Attached is an updated patch.  I believe this is committable, but
> since I whacked it around quite a bit, I'm sure you'll want to look
> it over first.

Indeed.

Points:

- It's going to take a while for me to get used to execExprInterp.c - I
  constantly try to switch to a nonexistant buffer ;)

- Like your approach with
  /* Check that current tupdesc doesn't have more fields than we allocated */
  in ExecEvalFieldStoreDeForm().

- I like EEO_FLAG_IS_QUAL.

- I think at some point (not sure whether in your or in my revision)
  ScalarArrayOpExpr lost its permission check. Added that.

- Added note that we knowingly don't perform permission checks for
  input/output funcs.

- Both pre/post patch, CoerceViaIO doesn't invoke
  InvokeFunctionExecuteHook - I'm still unclear what that hook is
  useful for, so ...

- The !caseExpr->defresult result branch is currently unreachable (and
  its equivalent was before the patch) because transformCaseExpr()
  generates a default expression.  I'm inclined to replace the dead code
  with an assertion. Any reason not to do that?

- I see you kept determination of the set of constraints to be checked
  for domain at initialization time. Made note that that a) might change
  b) could change behaviour.


> Please be sure the commit message notes that function EXECUTE permissions
> are now checked at executor startup not first call.  We need to document
> that in the v10 release notes as an incompatibility, and I'm sure we'll
> forget if the commit log doesn't point it out.

Done.


> * As we discussed, ExecEvalWholeRowVar is now using a pretty inefficient
> method for flattening tuples into datums.  I will take a to-do item to
> fix this.

Thanks.


> * ExecInitCheck is really just ExecInitExpr with a make_ands_explicit
> call in front of it.  It turned out that most of the places that were
> (or should have been) calling it were doing a make_ands_implicit call
> for no other reason than to satisfy its API.  I changed most of those
> to just call ExecInitExpr directly.  There are just a couple of call
> sites left, and I think probably those are likewise not really that
> happy with this API --- but I didn't take the time to chase down where
> the expressions were coming from in those cases.  It seems possible
> that we could remove ExecInitCheck/ExecPrepareCheck entirely.  I'll
> look into this later, too.

Thanks^2.


> * As we discussed, CaseTestValue/DomainValue are pretty messy and need
> to be rethought.  That might not get done for v10 though.

Yea, I'm disinclined to tackle this just now, there's enough other stuff
pending - but I'm willing to tackle it for v11 unless you beat me to it.


> * I'm not very happy that execSRF.c has two somewhat different
> implementations of largely similar functionality, and
> SetExprState.elidedFuncState seems like a wart.  That's mostly a
> pre-existing problem of course.  I'm satisfied with leaving it as it is
> for now, but eventually some refactoring there would be good.

Yea, that's not pretty.  Given the historical difference in behaviour
between SRF and ROWS FROM (the latter always materializes, the former
only when the set function does so itself), I'm not sure they can easily
be simplified.

I'm not really sure how to get rid of the issue underlying
elidedFuncState?  I mean we could move that case into nodeFunctionscan,
above ExecMakeTableFunctionResult's level, but that doesn't really seem
like an improvement.


I think there's some argument to be made that we should move all of
execSRF.c's logic to their respective callsites.  There's not really
much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() -
and at least the latter would even become a bit simpler if we didn't
support both callers.


Thanks a lot for your work on the patch!

And pushed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane

More random musing ... have you considered making the jump-target fields
in expressions be relative rather than absolute indexes?  That is,
EEO_JUMP would look like

op += (stepno); \
EEO_DISPATCH(); \

instead of

op = &state->steps[stepno]; \
EEO_DISPATCH(); \

I have not carried out a full patch to make this work, but just making
that one change and examining the generated assembly code looks promising.
Instead of this

movslq  40(%r14), %r8
salq$6, %r8
addq24(%rbx), %r8
movq%r8, %r14
jmp *(%r8)

we get this

movslq  40(%r14), %rax
salq$6, %rax
addq%rax, %r14
jmp *(%r14)

which certainly looks like it ought to be faster.  Also, the real reason
I got interested in this at all is that with relative jumps, groups of
steps would be position-independent within the steps array, which would
enable some compile-time tricks that seem impractical with the current
definition.

BTW, now that I've spent a bit of time looking at the generated assembly
code, I'm kind of disinclined to believe any arguments about how we have
better control over branch prediction with the jump-threading
implementation.  At least with current gcc (6.3.1 on Fedora 25) at -O2,
what I see is multiple places jumping to the same indirect jump
instruction :-(.  It's not a total disaster: as best I can tell, all the
uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
40 of the 71 uses of EEO_NEXT by jumping to the same couple of
instructions that increment the "op" register and then do an indirect
jump :-(.

So it seems that we're at the mercy of gcc's whims as to which instruction
dispatches will be distinguishable to the hardware; which casts a very
dark shadow over any benchmarking-based arguments that X is better than Y
for branch prediction purposes.  Compiler version differences are likely
to matter a lot more than anything we do.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Tom Lane

btw ... I just got around to looking at a code coverage report for this
patched version, and that reminded me of something I'd already suspected:
EEOP_INNER_SYSVAR and EEOP_OUTER_SYSVAR seem to be dead code.  That's
unsurprising, because we never try to access a tuple's system columns
above the scan level.  If a query asks for system columns, those get
passed up to upper query levels as ordinary user-side columns.

We could keep the execution support for those opcodes, or we could rip it
out and throw an error in execExpr.c if one would need to be generated.
I'm a little inclined to the latter, because it seems like the plan is
to grow more support for every opcode in the future.  We don't need to
be adding support for unreachable opcodes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
>> Another modest proposal:
>> 
>> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
>> trigger initial tupleslot de-forming.  Certainly we want to have a single
>> slot_getsomeattrs call per source slot, but as-is, we need a separate
>> traversal over the expression tree just to precompute the max attribute
>> number needed.  That can't be good for expression compile speed, and
>> it introduces nontrivial bug risks because the code that does that
>> is completely decoupled from the code that emits the EEOP_VAR opcodes
>> (which are what's really relying on the de-forming to have happened).

> Hm.  We had the separate traversal for projections for a long while, and
> I don't think there've been a a lot of changes to the extraction of the
> last attribute number.

That's easily disproven just by looking at the code:

/*
 * Don't examine the arguments or filters of Aggrefs or WindowFuncs,
 * because those do not represent expressions to be evaluated within the
 * calling expression's econtext.  GroupingFunc arguments are never
 * evaluated at all.
 */
if (IsA(node, Aggref))
return false;
if (IsA(node, WindowFunc))
return false;
if (IsA(node, GroupingFunc))
return false;
return expression_tree_walker(node, get_last_attnums_walker,
  (void *) info);

The WindowFunc exception hasn't been there so long, and the GroupingFunc
one is very new.  And who's to say whether e.g. the recent XMLTABLE patch
got this right at all?  We could easily be extracting columns we don't
need to.

I'm willing to leave this as-is for the moment, but I really think we
should look into changing it (after the main patch is in).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Andres Freund

Hi,

On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
> Another modest proposal:
> 
> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
> trigger initial tupleslot de-forming.  Certainly we want to have a single
> slot_getsomeattrs call per source slot, but as-is, we need a separate
> traversal over the expression tree just to precompute the max attribute
> number needed.  That can't be good for expression compile speed, and
> it introduces nontrivial bug risks because the code that does that
> is completely decoupled from the code that emits the EEOP_VAR opcodes
> (which are what's really relying on the de-forming to have happened).

Hm.  We had the separate traversal for projections for a long while, and
I don't think there've been a a lot of changes to the extraction of the
last attribute number.  I'm very doubtful that the cost of traversing
the expression twice is meaningful in comparison to the other costs.


> What do you think about a design like this:
> 
> * Drop the FETCHSOME opcodes.
> 
> * Add fields to struct ExprState that will hold the maximum inner,
> outer, and scan attribute numbers needed.
> 
> * ExecInitExpr initializes those fields to zero, and then during
> ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
> 
>   state->last_inner_attno = Max(state->last_inner_attno,
> variable->varattno);
> 
> * ExecInitExprSlots, get_last_attnums_walker, etc all go away.
> 
> * In the startup segment of ExecInterpExpr, add
> 
>   if (state->last_inner_attno > 0)
>   slot_getsomeattrs(innerslot, state->last_inner_attno);
>   if (state->last_outer_attno > 0)
>   slot_getsomeattrs(outerslot, state->last_outer_attno);
>   if (state->last_scan_attno > 0)
>   slot_getsomeattrs(scanslot, state->last_scan_attno);
> 
> This would be a little different from the existing code as far as runtime
> branch-prediction behavior goes, but it's not apparent to me that it'd be
> any worse.

I'd be suprised if it weren't.

I'm not super strongly against this setup, but I fail to see the benefit
of whacking this around.  I've benchmarked the previous/current setup
fairly extensively, and I'd rather not redo that.  In contrast to the
other changes you've talked about, this definitely is in the hot-path...


> Also, for JIT purposes it'd still be entirely possible to compile the
> slot_getsomeattrs calls in-line; you'd just be looking to a different
> part of the ExprState struct to find out what to do.

Yea, we could do that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Tom Lane

Another modest proposal:

I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
trigger initial tupleslot de-forming.  Certainly we want to have a single
slot_getsomeattrs call per source slot, but as-is, we need a separate
traversal over the expression tree just to precompute the max attribute
number needed.  That can't be good for expression compile speed, and
it introduces nontrivial bug risks because the code that does that
is completely decoupled from the code that emits the EEOP_VAR opcodes
(which are what's really relying on the de-forming to have happened).

What do you think about a design like this:

* Drop the FETCHSOME opcodes.

* Add fields to struct ExprState that will hold the maximum inner,
outer, and scan attribute numbers needed.

* ExecInitExpr initializes those fields to zero, and then during
ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.

state->last_inner_attno = Max(state->last_inner_attno,
  variable->varattno);

* ExecInitExprSlots, get_last_attnums_walker, etc all go away.

* In the startup segment of ExecInterpExpr, add

if (state->last_inner_attno > 0)
slot_getsomeattrs(innerslot, state->last_inner_attno);
if (state->last_outer_attno > 0)
slot_getsomeattrs(outerslot, state->last_outer_attno);
if (state->last_scan_attno > 0)
slot_getsomeattrs(scanslot, state->last_scan_attno);

This would be a little different from the existing code as far as runtime
branch-prediction behavior goes, but it's not apparent to me that it'd be
any worse.  Also, for JIT purposes it'd still be entirely possible to
compile the slot_getsomeattrs calls in-line; you'd just be looking to a
different part of the ExprState struct to find out what to do.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-23 21:58:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> >> Hmm, I see ... but that only works in the cases where the caller of
> >> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> >> don't.
> 
> > Right, the old and new code comment on that:
> 
> >  * inputDesc can be NULL, but if it is not, we check to see whether simple
> >  * Vars in the tlist match the descriptor.  It is important to provide
> >  * inputDesc for relation-scan plan nodes, as a cross check that the 
> > relation
> >  * hasn't been changed since the plan was made.  At higher levels of a plan,
> >  * there is no need to recheck.
> 
> Ah, I'd forgotten the assumption that we only need to check this at scan
> level.
> 
> >> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> >> step types.  I could take it back out, but I wonder if it wouldn't be
> >> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> >> Or maybe we could have ExecBuildProjectionInfo emit either
> >> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> >> the reference safe.
> 
> > I think it's probably ok to just leave the check in, and remove those
> > comments, and simplify the isSimpleVar stuff to only check if
> >  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)
> 
> Not sure.  It's a pretty fair amount of duplicative code, once you finish
> dealing with all the ExecJustFoo functions in addition to the main code
> paths.  At this point I'm inclined to take it back out and improve the
> comments around ExecBuildProjectionInfo.

I'm ok with both.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
>> Hmm, I see ... but that only works in the cases where the caller of
>> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
>> don't.

> Right, the old and new code comment on that:

>  * inputDesc can be NULL, but if it is not, we check to see whether simple
>  * Vars in the tlist match the descriptor.  It is important to provide
>  * inputDesc for relation-scan plan nodes, as a cross check that the relation
>  * hasn't been changed since the plan was made.  At higher levels of a plan,
>  * there is no need to recheck.

Ah, I'd forgotten the assumption that we only need to check this at scan
level.

>> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
>> step types.  I could take it back out, but I wonder if it wouldn't be
>> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
>> Or maybe we could have ExecBuildProjectionInfo emit either
>> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
>> the reference safe.

> I think it's probably ok to just leave the check in, and remove those
> comments, and simplify the isSimpleVar stuff to only check if
>  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)

Not sure.  It's a pretty fair amount of duplicative code, once you finish
dealing with all the ExecJustFoo functions in addition to the main code
paths.  At this point I'm inclined to take it back out and improve the
comments around ExecBuildProjectionInfo.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> >> The problem here is that once we drop the buffer pin, any pointers we may
> >> have into on-disk data are dangling pointers --- we're at risk of some
> >> other backend taking away that shared buffer.  (So I'm afraid that the
> >> commentary there is overly optimistic.)  So even though those pointers
> >> may still be there beyond tts_nvalid, subsequent references to them are
> >> very dangerous.
> 
> > This applies to the code in master as well, no?  An ExecEvalScalarVar()
> > followed by an ExecEvalWholeRowVar() would have precisely the same
> > effect?
> 
> Yeah.  The other order would be safe, because ExecEvalScalarVar would do
> slot_getattr which would re-extract the value from the newly materialized
> tuple.  But there definitely seems to be a hazard for the order you
> mentioned.
> 
> > Do we need to do anything about this in the back-branches,
> > given how unlikely this is going to be in practice?
> 
> Probably not.  As I mentioned, I think this may be only theoretical rather
> than real, if you believe that buffer pins would only be associated with
> slots holding references to regular tuples.  And even if it's not
> theoretical, the odds of seeing a failure in the field seem pretty tiny
> given that a just-released buffer shouldn't be subject to recycling for
> a fair while.  But I don't want to leave it like this going forward.

Ok.


> >> Also, while trying to test the above scenario, I realized that the patch
> >> as submitted was being way too cavalier about where it was applying 
> >> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> >> instance, had no protection at all.
> 
> > I don't think that's true - the assign checks had copied the code from
> > the old ExecBuildProjectionInfo, setting isSimpleVar iff
> > (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> > check that for projections in contrast to normal expressions because we
> > already know the slot.
> 
> Hmm, I see ... but that only works in the cases where the caller of
> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> don't.

Right, the old and new code comment on that:

 * inputDesc can be NULL, but if it is not, we check to see whether simple
 * Vars in the tlist match the descriptor.  It is important to provide
 * inputDesc for relation-scan plan nodes, as a cross check that the relation
 * hasn't been changed since the plan was made.  At higher levels of a plan,
 * there is no need to recheck.

and that seems like reasonable to me?  That said, I think we can remove
that assumption, by checking once.


> As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
> of places, including everywhere above the relation scan level.

Hm? If inputDesc isn't given we just, before and after, do:
if (!inputDesc)
isSimpleVar = true; /* can't check 
type, assume OK */



> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> step types.  I could take it back out, but I wonder if it wouldn't be
> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> Or maybe we could have ExecBuildProjectionInfo emit either
> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> the reference safe.

I think it's probably ok to just leave the check in, and remove those
comments, and simplify the isSimpleVar stuff to only check if
 IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
>> The problem here is that once we drop the buffer pin, any pointers we may
>> have into on-disk data are dangling pointers --- we're at risk of some
>> other backend taking away that shared buffer.  (So I'm afraid that the
>> commentary there is overly optimistic.)  So even though those pointers
>> may still be there beyond tts_nvalid, subsequent references to them are
>> very dangerous.

> This applies to the code in master as well, no?  An ExecEvalScalarVar()
> followed by an ExecEvalWholeRowVar() would have precisely the same
> effect?

Yeah.  The other order would be safe, because ExecEvalScalarVar would do
slot_getattr which would re-extract the value from the newly materialized
tuple.  But there definitely seems to be a hazard for the order you
mentioned.

> Do we need to do anything about this in the back-branches,
> given how unlikely this is going to be in practice?

Probably not.  As I mentioned, I think this may be only theoretical rather
than real, if you believe that buffer pins would only be associated with
slots holding references to regular tuples.  And even if it's not
theoretical, the odds of seeing a failure in the field seem pretty tiny
given that a just-released buffer shouldn't be subject to recycling for
a fair while.  But I don't want to leave it like this going forward.

>> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
>> clobber the state of the slot.

> That seems like a good plan.

Yeah.  I have the stopgap code in my working copy, and will look at
refactoring the tuptoaster code for better performance later.

>> Also, while trying to test the above scenario, I realized that the patch
>> as submitted was being way too cavalier about where it was applying 
>> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
>> instance, had no protection at all.

> I don't think that's true - the assign checks had copied the code from
> the old ExecBuildProjectionInfo, setting isSimpleVar iff
> (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> check that for projections in contrast to normal expressions because we
> already know the slot.

Hmm, I see ... but that only works in the cases where the caller of
ExecBuildProjectionInfo supplied a source slot, and a lot of 'em don't.
As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
of places, including everywhere above the relation scan level.

I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
step types.  I could take it back out, but I wonder if it wouldn't be
smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
Or maybe we could have ExecBuildProjectionInfo emit either
ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
the reference safe.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.  Right at the moment, the only way to do
> that seems to be to do this instead of ExecFetchSlotTupleDatum:
> 
> tuple = ExecCopySlotTuple(slot);
> dtuple = (HeapTupleHeader)
> DatumGetPointer(heap_copy_tuple_as_datum(tuple,
>  slot->tts_tupleDescriptor));
> heap_freetuple(tuple);

Hm.  One disadvantage would be that repeated whole-row references to the
same table would be a bit slower, because we'd repeatedly form a tuple
from a virtual one - but I have a hard time coming up with a scenario
where that'd matter.  I'd suspect that in the end it'd probably even
have a *positive* performance impact, because right now the next scalar
access will have to deform the whole tuple again, and that seems like a
lot more likely scenario.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about
> 
> +/*
> + * Can't assert tts_nvalid, as wholerow var evaluation or such
> + * could have materialized the slot - but the contents are still
> + * valid :/
> + */
> +Assert(op->d.var.attnum >= 0);
> 
> is actually averting its eyes from a potentially serious problem.  If we
> go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
> ExecFetchSlotTuple decides it has to materialize the tuple, then
> ExecMaterializeSlot does this:
> 
> /*
>  * Drop the pin on the referenced buffer, if there is one.
>  */
> if (BufferIsValid(slot->tts_buffer))
> ReleaseBuffer(slot->tts_buffer);
> 
> slot->tts_buffer = InvalidBuffer;
> 
> /*
>  * Mark extracted state invalid.  This is important because the slot is
>  * not supposed to depend any more on the previous external data; we
>  * mustn't leave any dangling pass-by-reference datums in tts_values.
>  * However, we have not actually invalidated any such datums, if there
>  * happen to be any previously fetched from the slot.  (Note in particular
>  * that we have not pfree'd tts_mintuple, if there is one.)
>  */
> slot->tts_nvalid = 0;
> 
> 
> The problem here is that once we drop the buffer pin, any pointers we may
> have into on-disk data are dangling pointers --- we're at risk of some
> other backend taking away that shared buffer.  (So I'm afraid that the
> commentary there is overly optimistic.)  So even though those pointers
> may still be there beyond tts_nvalid, subsequent references to them are
> very dangerous.

This applies to the code in master as well, no?  An ExecEvalScalarVar()
followed by an ExecEvalWholeRowVar() would have precisely the same
effect?  Do we need to do anything about this in the back-branches,
given how unlikely this is going to be in practice?


> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.

That seems like a good plan.


> Also, while trying to test the above scenario, I realized that the patch
> as submitted was being way too cavalier about where it was applying 
> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> instance, had no protection at all.

I don't think that's true - the assign checks had copied the code from
the old ExecBuildProjectionInfo, setting isSimpleVar iff
(!attr->attisdropped && variable->vartype == attr->atttypid) - we can
check that for projections in contrast to normal expressions because we
already know the slot.  The relevant comment for that, from before the
patch, is:
 * inputDesc.  (Note: if there is a type mismatch then ExecEvalScalarVar
 * will probably throw an error at runtime, but we leave that to it.)
 */


> I hope to have a fully reviewed patch to pass back to you tomorrow.
> Or Saturday at the latest.

Cool.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,m

On 2017-03-23 17:40:55 -0400, Tom Lane wrote:
> Stylistic thought ... I am wondering if it wouldn't be a good idea
> to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
> and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
> usage-dependent way as
> 
>   EEOP_JUMP   unconditional jump
>   EEOP_JUMP_IF_NULL   jump if step result is null
>   EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
>   EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE
> 
> One could imagine later filling out this set with the other BoolTest
> condition types, but that seems to be all we need right now.

Hm, no arguments against, but I'm also not particularly excited about
the change.


> These are basically just renamings of the step types that exist now,
> although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
> necessary Assert(!op->d.arrayref.state->isassignment).

I won't shed a tear about that assert's removal.


> Well, I guess I should say that they're renamings of the semantics
> that I have for these steps in my working copy; for instance, I got
> rid of casewhen.value/casewhen.isnull in favor of letting CASE WHEN
> expressions evaluate into the CASE's final output variable.

That sounds like a sensible change (in the abstract, I obviously haven't
seen your working copy).


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about

+/*
+ * Can't assert tts_nvalid, as wholerow var evaluation or such
+ * could have materialized the slot - but the contents are still
+ * valid :/
+ */
+Assert(op->d.var.attnum >= 0);

is actually averting its eyes from a potentially serious problem.  If we
go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
ExecFetchSlotTuple decides it has to materialize the tuple, then
ExecMaterializeSlot does this:

/*
 * Drop the pin on the referenced buffer, if there is one.
 */
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);

slot->tts_buffer = InvalidBuffer;

/*
 * Mark extracted state invalid.  This is important because the slot is
 * not supposed to depend any more on the previous external data; we
 * mustn't leave any dangling pass-by-reference datums in tts_values.
 * However, we have not actually invalidated any such datums, if there
 * happen to be any previously fetched from the slot.  (Note in particular
 * that we have not pfree'd tts_mintuple, if there is one.)
 */
slot->tts_nvalid = 0;


The problem here is that once we drop the buffer pin, any pointers we may
have into on-disk data are dangling pointers --- we're at risk of some
other backend taking away that shared buffer.  (So I'm afraid that the
commentary there is overly optimistic.)  So even though those pointers
may still be there beyond tts_nvalid, subsequent references to them are
very dangerous.

I think that it's pretty hard to hit this in practice, maybe impossible,
because the normal case for an "on-disk" tuple is that
TTS_HAS_PHYSICAL_TUPLE is true, so that ExecFetchSlotTuple won't change
the state of the slot.  If we have a virtual tuple that has to be
materialized, then by that very token it won't have a buffer pin to drop.
But I find this fragile as heck, and the aforesaid patch comment surely
isn't adequately documenting the safety considerations.  Also, if there
ever were a live bug here, reproducing it would be damn hard because of
the low probability that a just-unpinned buffer would get replaced any
time soon.  (Hm, I wonder whether the buffer cache needs something
analogous to the syscaches' CLOBBER_CACHE_ALWAYS behavior...)

Besides which, I really really don't like the lack of an "attnum <
tts_nvalid" assertion there; that's just obviously failing to check for
very simple bugs, such as getting the FETCHSOME steps wrong.

So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
clobber the state of the slot.  Right at the moment, the only way to do
that seems to be to do this instead of ExecFetchSlotTupleDatum:

tuple = ExecCopySlotTuple(slot);
dtuple = (HeapTupleHeader)
DatumGetPointer(heap_copy_tuple_as_datum(tuple,
 slot->tts_tupleDescriptor));
heap_freetuple(tuple);

That's kind of annoying because of the double copying involved, but
tuptoaster.c doesn't expose any functionality for this except
heap_copy_tuple_as_datum().  I figure we can improve it later --- it looks
like we can refactor heap_copy_tuple_as_datum to expose a function that
reads from Datum/isnull arrays, and then call it on the slot's
tts_values/tts_isnull arrays.  Seems like it might be a good idea to
think a bit harder about the TupleTableSlot APIs, too, and reduce the
number of cases where execTuples exposes destructive changes to the
state of a slot.

Also, while trying to test the above scenario, I realized that the patch
as submitted was being way too cavalier about where it was applying 
CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
instance, had no protection at all.  Think I have that all fixed up
though.

I hope to have a fully reviewed patch to pass back to you tomorrow.
Or Saturday at the latest.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Stylistic thought ... I am wondering if it wouldn't be a good idea
to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
usage-dependent way as

EEOP_JUMP   unconditional jump
EEOP_JUMP_IF_NULL   jump if step result is null
EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE

One could imagine later filling out this set with the other BoolTest
condition types, but that seems to be all we need right now.

These are basically just renamings of the step types that exist now,
although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
necessary Assert(!op->d.arrayref.state->isassignment).  Well, I guess
I should say that they're renamings of the semantics that I have
for these steps in my working copy; for instance, I got rid of
casewhen.value/casewhen.isnull in favor of letting CASE WHEN expressions
evaluate into the CASE's final output variable.

At least to me, I think the compiling code would be more readable
this way.  I find WHEN_STEP and THEN_STEP a bit odd because they are
emitted after, not before, the expressions you'd think they control.
ARRAYREF_CHECKINPUT is pretty vaguely named, too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-23 13:14:59 -0400, Tom Lane wrote:
> Looking at some of the coding choices you made here, I see that you're
> making a hard assumption that called functions never scribble on their
> fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
> an assumption before.

I think we did that before, e.g. ExecEvalScalarArrayOp(). Think there's
others too.

> Are we comfortable with that?  If so, we'd better document it.

I think it's ok, but we indeed should document it. I recall a note
somewhere... Can't find it anywhere however, might have misremembered a
note about pass-by-ref arguments.  fmgr/README? A note in
FunctionCallInfoData's definition?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Looking at some of the coding choices you made here, I see that you're
making a hard assumption that called functions never scribble on their
fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
an assumption before.  Are we comfortable with that?  If so, we'd
better document it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-22 Thread Andres Freund

On 2017-03-22 13:15:41 -0400, Tom Lane wrote:
> BTW, I'm fairly concerned by what you did in nodeTidscan.c, ie delaying
> compile of the TID expressions until TidListCreate.  I think that probably
> fails for cases involving, eg, subplans in the expressions; we need
> subplans to get linked to the parent node, and this way won't do it till
> (maybe) too late.  Barring objection I'm going to rearrange that so that
> we still do the compile part at executor start.

No objection here.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-22 Thread Tom Lane

BTW, I'm fairly concerned by what you did in nodeTidscan.c, ie delaying
compile of the TID expressions until TidListCreate.  I think that probably
fails for cases involving, eg, subplans in the expressions; we need
subplans to get linked to the parent node, and this way won't do it till
(maybe) too late.  Barring objection I'm going to rearrange that so that
we still do the compile part at executor start.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-22 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-22 10:41:06 -0400, Tom Lane wrote:
>> * execQual.c doesn't seem to have a unifying reason to exist anymore.
>> It certainly has little to do with evaluating typical qual expressions;
>> what's left in there seems to be mostly concerned with SRFs.  I feel
>> like it might be a good idea to rename it, but to what?  execExprUtils.c
>> perhaps?  Or maybe we should destroy it altogether, shoving the SRF
>> stuff into nodeFunctionscan.c and moving what little remains into
>> execUtils.c.

> Yea, I was wondering about that too.  What would we do with
> GetAttributeByName/Num?

I was thinking execUtils.c for those.

>> * I do not like the function name ExecInstantiateExpr().  Webster's
>> defines "instantiate" as "to represent (an abstraction) by a concrete
>> instance", which does not seem to me to have a lot to do with what this
>> function actually does.
>> ...

> Either of those work, but they don't strike me as perfect either, but I
> can't come up with something better (ExecReadyExprForExec()?).

Actually, ExecReadyExpr seems kind of nice for this (using "ready" in its
verb sense, "to prepare (someone or something) for an activity or
purpose").

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-22 Thread Andres Freund

On 2017-03-22 10:41:06 -0400, Tom Lane wrote:
> I've been busily hacking away on this, trying to make things cleaner and
> fix a couple of bugs I stumbled across.  (Confusion between ExecQual and
> ExecCheck, for instance - we apparently lack regression tests exercising
> constraints-returning-null in corner cases such as table rewrite.)  It
> will probably be a day or two more before I'm done.

Thanks!


> A couple of naming questions:
> 
> * I concur with Heikki's dislike for the file name execInterpExpr.c.
> Would it make it any better to switch to execExprInterp.c?  I think
> that having all this stuff under a common pattern execExpr*.c would
> be a good thing (and I notice you've already got one comment referring
> to them that way ...)

That works for me.


> * execQual.c doesn't seem to have a unifying reason to exist anymore.
> It certainly has little to do with evaluating typical qual expressions;
> what's left in there seems to be mostly concerned with SRFs.  I feel
> like it might be a good idea to rename it, but to what?  execExprUtils.c
> perhaps?  Or maybe we should destroy it altogether, shoving the SRF
> stuff into nodeFunctionscan.c and moving what little remains into
> execUtils.c.

Yea, I was wondering about that too.  What would we do with
GetAttributeByName/Num?


> * I do not like the function name ExecInstantiateExpr().  Webster's
> defines "instantiate" as "to represent (an abstraction) by a concrete
> instance", which does not seem to me to have a lot to do with what this
> function actually does.

It perhaps makes a *bit* more sense if you view it from the POV that,
with the future JIT support (WIP versions of which I posted previously),
it'd actually create a compiled function which'd largely be independent
of the of ->steps (except for the non-hotpath functions, which'd still
end up using it).  So one of the above "conrete instances" would be the
interpeted version, another the compiled one.   No, not an entirely
convincing argument.


> There's nothing very abstract about its input.
> More, the implication of "instantiate" is that you can instantiate any
> number of representatives of the same abstraction, but this scribbles
> on the input in a one-way fashion.  I think perhaps something like
> ExecPrepareExpr or ExecFinishExpr or something along that line would
> be better, but nothing is really standing out as le mot juste.

Either of those work, but they don't strike me as perfect either, but I
can't come up with something better (ExecReadyExprForExec()?).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-22 Thread Tom Lane

I've been busily hacking away on this, trying to make things cleaner and
fix a couple of bugs I stumbled across.  (Confusion between ExecQual and
ExecCheck, for instance - we apparently lack regression tests exercising
constraints-returning-null in corner cases such as table rewrite.)  It
will probably be a day or two more before I'm done.

A couple of naming questions:

* I concur with Heikki's dislike for the file name execInterpExpr.c.
Would it make it any better to switch to execExprInterp.c?  I think
that having all this stuff under a common pattern execExpr*.c would
be a good thing (and I notice you've already got one comment referring
to them that way ...)

* execQual.c doesn't seem to have a unifying reason to exist anymore.
It certainly has little to do with evaluating typical qual expressions;
what's left in there seems to be mostly concerned with SRFs.  I feel
like it might be a good idea to rename it, but to what?  execExprUtils.c
perhaps?  Or maybe we should destroy it altogether, shoving the SRF
stuff into nodeFunctionscan.c and moving what little remains into
execUtils.c.

* I do not like the function name ExecInstantiateExpr().  Webster's
defines "instantiate" as "to represent (an abstraction) by a concrete
instance", which does not seem to me to have a lot to do with what this
function actually does.  There's nothing very abstract about its input.
More, the implication of "instantiate" is that you can instantiate any
number of representatives of the same abstraction, but this scribbles
on the input in a one-way fashion.  I think perhaps something like
ExecPrepareExpr or ExecFinishExpr or something along that line would
be better, but nothing is really standing out as le mot juste.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-21 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-20 16:06:27 -0400, Tom Lane wrote:
>> ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared
>> size_t and not just int?  Since it's an array index, and one that
>> certainly can't be bigger than AttrNumber, that seems rather confusing.

> Not that I can see, no.  I guess I might have "overcompensated" when
> changing it from AttrNumber - AttrNumber isn't a good idea because that
> needs an extra move-zero-extend, because 16bit indexing isn't that well
> supported on x86.  But that doesn't mean it should be a 64bit number -
> to the contrary actually.

OK, will fix in the edits I'm working on.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-20 Thread Andres Freund

On 2017-03-20 16:06:27 -0400, Tom Lane wrote:
> ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared
> size_t and not just int?  Since it's an array index, and one that
> certainly can't be bigger than AttrNumber, that seems rather confusing.

Not that I can see, no.  I guess I might have "overcompensated" when
changing it from AttrNumber - AttrNumber isn't a good idea because that
needs an extra move-zero-extend, because 16bit indexing isn't that well
supported on x86.  But that doesn't mean it should be a 64bit number -
to the contrary actually.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-20 Thread Tom Lane

... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared
size_t and not just int?  Since it's an array index, and one that
certainly can't be bigger than AttrNumber, that seems rather confusing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-20 Thread Tom Lane

Andres Freund  writes:
> Additionally I added a regression test for the nearly entirely untested
> nodeTidscan.c, after I'd broken it previously without noticing (thanks
> Andreas).

I went ahead and pushed this part, since it seemed pretty uncontroversial.
I added a bit more stuff to get the LOC measurement up on the planner
side too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-20 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-15 20:09:03 -0400, Tom Lane wrote:
>> I think it would be worth creating a README file giving an overview
>> of how all of this patch is supposed to work.  You also need to do a
>> whole lot more work on the function-level comments.

> I tried to improve upon both fronts.  I've added the higher level
> explanation to executor/README, but I don't feel very strong about that.

> I'm not quite sure it's exactly what you wanted however, the above ask
> could also be understood to have more of an motivational angle,
> describing why and what exactly is changed?  I'm also still not sure how
> understandable it's for anybody that hasn't had their head in this for a
> while...

Well, I wasn't totally sure what was needed either.  I'm coming to this
relatively fresh, having paid little attention to the thread up to now,
so maybe I'll try to add material to what you wrote as I figure things
out.

> Did I understand correctly that you'd rather just merge
> ExecGetLastAttnums into execExpr.c, instead of making it globally
> available?

Yeah, just moving it over seems like the thing to do for now.  We can
expose it later if there proves to be an actual reason to do that,
but as I mentioned, I'm doubtful that there will be one.

I'll start reading these...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-19 Thread Andres Freund

Hi,


On 2017-03-19 23:55:50 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I've been pondering if we can't entirely get rid of CaseTest etc, the
> > amount of hackery required seems not like a good thing. One way I'd
> > prototyped was to replace them with PARAM_EXEC nodes - then the whole
> > issue of them potentially having different values at different parts of
> > an expression vanishes because the aliasing is removed.
> 
> Yes, replacing all of that with Param slots had occurred to me too.
> We might want to keep the special parse node types for convenience in
> reverse-listing, but having them act just like PARAM_EXEC for execution
> purposes seems promising.

As long as that special parse-time node is part of the same value
numbering, that makes sense (could just name make it a subtype of param
ala PARAM_CASE).  I don't think we actually do anything useful in
ruleutils etc with either CaseTest or CoerceToDomainValue.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-19 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-16 11:15:16 -0400, Tom Lane wrote:
>> The thing that actually made the ExecEvalCase code into a bug was that
>> we were using ExprContext-level fields to store the current caseValue,
>> allowing aliasing to occur across nested CASEs.  I think that in
>> this implementation, it ought to be possible to get rid of
>> ExprContext.caseValue_datum et al altogether, in favor of some storage
>> location that's private to each CASE expression.  I'm a bit disappointed
>> that that doesn't seem to have happened.

> ...  Unfortunately
> CaseTest/CoerceToDomainValue are reused outside of domain / case
> expressions in a bunch of places (plpgsql uses CaseTest for casts
> evaluation, validateDomainConstraint/domain_check_input evaluate domain
> constraints without a CoerceToDomain node).

Yeah, there's various stuff that did that for expediency.

> I'd like to get rid of those usages, but that'd recurse into rewriting
> plpgsql casts and other random pieces of code into a different approach
> - something I'd like to avoid doing at the same as this already large
> patch.

Agreed, maybe we should just plan to clean that up later.

> I've been pondering if we can't entirely get rid of CaseTest etc, the
> amount of hackery required seems not like a good thing. One way I'd
> prototyped was to replace them with PARAM_EXEC nodes - then the whole
> issue of them potentially having different values at different parts of
> an expression vanishes because the aliasing is removed.

Yes, replacing all of that with Param slots had occurred to me too.
We might want to keep the special parse node types for convenience in
reverse-listing, but having them act just like PARAM_EXEC for execution
purposes seems promising.

>> Eventually, I would also like to find a way to remove the restriction
>> imposed by the other part of f0c7b789a, ie that we can't inline a SQL
>> function when that would result in intermixing two levels of CASE
>> expression.

> That seems to suggest my PARAM_EXEC idea isn't necessarily perfect -
> inlining would cause aliasing again, but it'd also not hard to fix that
> up.

Right, inlining would probably require some parameter-renumbering to avoid
aliasing.  But at least we'd have a clear framework for how to handle it,
whereas the way things are now, it's just impossible.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-19 Thread Andres Freund

Hi,

On 2017-03-16 11:15:16 -0400, Tom Lane wrote:
> The thing that actually made the ExecEvalCase code into a bug was that
> we were using ExprContext-level fields to store the current caseValue,
> allowing aliasing to occur across nested CASEs.  I think that in
> this implementation, it ought to be possible to get rid of
> ExprContext.caseValue_datum et al altogether, in favor of some storage
> location that's private to each CASE expression.  I'm a bit disappointed
> that that doesn't seem to have happened.

The patch actually does so - during ExecInitExprRec the "relevant"
case/domain testval is stored in ExprState->innermost_*, those pointers
are then stored directly in the relevant steps for
CaseTest/CoerceToDomainValue evaluation.  Unfortunately
CaseTest/CoerceToDomainValue are reused outside of domain / case
expressions in a bunch of places (plpgsql uses CaseTest for casts
evaluation, validateDomainConstraint/domain_check_input evaluate domain
constraints without a CoerceToDomain node).  I.e
ExprContext.caseValue_datum etc. aren't used for normal expressions
anymore, just for the ad-hoc hackery in a bunch of places.

I'd like to get rid of those usages, but that'd recurse into rewriting
plpgsql casts and other random pieces of code into a different approach
- something I'd like to avoid doing at the same as this already large
patch.

I've been pondering if we can't entirely get rid of CaseTest etc, the
amount of hackery required seems not like a good thing. One way I'd
prototyped was to replace them with PARAM_EXEC nodes - then the whole
issue of them potentially having different values at different parts of
an expression vanishes because the aliasing is removed.

> Eventually, I would also like to find a way to remove the restriction
> imposed by the other part of f0c7b789a, ie that we can't inline a SQL
> function when that would result in intermixing two levels of CASE
> expression.  An implementation along the lines of what I've sketched
> above could handle that easily enough, as long as we could identify
> which nested level of CASE a particular CaseTestExpr belongs to.
> I don't know how to do that offhand, but it seems like it ought to be
> soluble if we put a bit of thought into it.

I haven't thought overly much about this, but I agree, it looks like it
should be doable.

That seems to suggest my PARAM_EXEC idea isn't necessarily perfect -
inlining would cause aliasing again, but it'd also not hard to fix that
up.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Tom Lane

Andres Freund  writes:
> That said, it seems this is something that has to wait for a later
> release, I'm putting back in similar logic as there was before (not a
> branch, but change the opcode to a non-checking variant).

Yeah, I was wondering if changing the opcode would be preferable to
a first-time flag.

>> So my recommendation is to drop 0001 and include the same one-time
>> checks that execQual.c currently has as out-of-line one-time checks
>> in the new code.  We can revisit that later, but time grows short for
>> v10.  I would much rather have a solid version of 0004 and not 0001,
>> than not have anything for v10 because we spent too much time dealing
>> with adding new dependencies.

> Doing that (+README).

OK.  I believe that we can get this committed after the documentation
problems are sorted.  I noticed a lot of small things that bugged me,
mostly sloppy comments, but I think that the most efficient way to
handle those is for me to make an editorial pass on your next version.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Andres Freund

Hi,

On 2017-03-17 11:36:30 -0400, Tom Lane wrote:
> Having said all that, I think that 0001 is contributing very little to the
> goals of this patch set.  Andres stated that he wanted it so as to drop
> some of the one-time checks that execQual.c currently does for Vars, but
> I'm not really convinced that we could do that safely even with these
> additional dependencies in place.  Moreover, I really doubt that there's
> a horrible performance cost from including something like
> 
>   if (unlikely(op->first_execution))
>   out_of_line_checking_subroutine(...);

> in the execution code for Vars.

But it actually does (well, for some relatively small value of horrible)
The issue is that op->first_execution is actually badly predictable,
because it will be set back/forth between executions of different
expressions (in the same plantree).  Obviously you'll not notice if you
have a Var and then some expensive stuff, but it's noticeable for
cheap-ish expressions (say e.g. a single Var).  So the branch prediction
often doesn't handle this gracefully - it also just expands the set of
to-be-tracked jumps.

If we had a decent way to actually check this during ExecInitExpr() (et
al), the whole discussion would be different - I'd be all expanding the
set of such checks even.  But I couldn't find a decent way to get there
- when expressions are initialized we don't even get an ExprContext (not
to speak of valid slots), nor is parent->plan very helpful.

That said, it seems this is something that has to wait for a later
release, I'm putting back in similar logic as there was before (not a
branch, but change the opcode to a non-checking variant).

> And that certainly isn't adding any
> complexity for JIT compilation that we don't face anyway for other
> execution step types.

Obviously a if (op->first_execution) isn't an issue, it's actually only
doing the first time through that's not easily possible.

> So my recommendation is to drop 0001 and include the same one-time
> checks that execQual.c currently has as out-of-line one-time checks
> in the new code.  We can revisit that later, but time grows short for
> v10.  I would much rather have a solid version of 0004 and not 0001,
> than not have anything for v10 because we spent too much time dealing
> with adding new dependencies.

Doing that (+README).

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Tom Lane

Andres Freund  writes:
> [ latest patches ]

I looked through 0001 (the composite-type-dependencies one).  Although
I agree that it'd be good to tighten things up in that area, I do not
think we want this specific patch: it tightens things too much.  Consider
this variant of the existing test case in create_view.sql:

create table tt (f1 int, f2 int, f3 int);
create function ttf() returns setof tt as 'select * from tt' language sql;
create view vv as select f3 from ttf();
alter table tt drop column f3;

Current code allows the above (and causes the view to return nulls
afterwards).  The 0001 patch would forbid the DROP, which is fine,
but it would also forbid dropping either of the other two table
columns, which I think is insupportable.

Also, given the above table and function, consider

create view vvv as select ttf();

This view returns a 3-column composite type.  Now do

alter table tt add column f4 int;

Now the view returns a 4-column composite type.  But at this point the
patch will let you drop the f4 column, but not any of the earlier three.
That's just weird.

So I'm unhappy with the specific decisions made in 0001.  I think what we
really want there, probably, is for find_expr_references_walker to do more
than nothing with Vars referencing non-RELATION RTEs.

Having said all that, I think that 0001 is contributing very little to the
goals of this patch set.  Andres stated that he wanted it so as to drop
some of the one-time checks that execQual.c currently does for Vars, but
I'm not really convinced that we could do that safely even with these
additional dependencies in place.  Moreover, I really doubt that there's
a horrible performance cost from including something like

if (unlikely(op->first_execution))
out_of_line_checking_subroutine(...);

in the execution code for Vars.  And that certainly isn't adding any
complexity for JIT compilation that we don't face anyway for other
execution step types.

So my recommendation is to drop 0001 and include the same one-time
checks that execQual.c currently has as out-of-line one-time checks
in the new code.  We can revisit that later, but time grows short for
v10.  I would much rather have a solid version of 0004 and not 0001,
than not have anything for v10 because we spent too much time dealing
with adding new dependencies.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-16 Thread Tom Lane

I wrote:
> Andres Freund  writes:
>> I don't think there's a danger similar to f0c7b789a here, because the
>> "caller" (i.e. the node that needs the expression's result) expects
>> resvalue/null to be overwritten.

> Yeah, that's what I thought when I wrote the broken code in ExecEvalCase,
> too.  It was wrong.

Along the same line, I notice that you've got some expr step types
overwriting their own input, the various flavors of EEOP_BOOLTEST for
example.  Maybe that's all right but it doesn't really give me a warm
feeling, especially when other single-argument operations like
EEOP_BOOL_NOT_STEP are done differently.  Again, I think a clear
explanation of the design is essential to allow people to reason about
whether this sort of trickery is safe.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-16 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-15 20:09:03 -0400, Tom Lane wrote:
>> That scares me quite a bit, because it smells exactly like the sort of
>> premature optimization that bit us on the rear in CVE-2016-5423 (cf commit
>> f0c7b789a).

> I don't think there's a danger similar to f0c7b789a here, because the
> "caller" (i.e. the node that needs the expression's result) expects
> resvalue/null to be overwritten.

Yeah, that's what I thought when I wrote the broken code in ExecEvalCase,
too.  It was wrong.  Basically you've got to be sure that no aliasing
can occur, and I think the only way to be safe about that is to have a
very clear rule about where results are allowed to get returned to,
preferably one that doesn't ever re-use the same target.  (I think the
repeated use of the same subexpression result address for the arms of
an AND or OR is okay, but it would be a good idea to have a clear
statement of why.)

The thing that actually made the ExecEvalCase code into a bug was that
we were using ExprContext-level fields to store the current caseValue,
allowing aliasing to occur across nested CASEs.  I think that in
this implementation, it ought to be possible to get rid of
ExprContext.caseValue_datum et al altogether, in favor of some storage
location that's private to each CASE expression.  I'm a bit disappointed
that that doesn't seem to have happened.

Eventually, I would also like to find a way to remove the restriction
imposed by the other part of f0c7b789a, ie that we can't inline a SQL
function when that would result in intermixing two levels of CASE
expression.  An implementation along the lines of what I've sketched
above could handle that easily enough, as long as we could identify
which nested level of CASE a particular CaseTestExpr belongs to.
I don't know how to do that offhand, but it seems like it ought to be
soluble if we put a bit of thought into it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andreas Karlsson


Hi,

I got a test failure with this version of the patch in the postges_fdw. 
It looks to me like it was caused by a typo in the source code which is 
fixed in the attached patch.


After applying this patch check-world passes.

Andreas
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 1763be5cf0..2ba5a2ea69 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -42,7 +42,6 @@ static void TidListCreate(TidScanState *tidstate);
 static int	itemptr_comparator(const void *a, const void *b);
 static TupleTableSlot *TidNext(TidScanState *node);
 
-
 /*
  * Compute the list of TIDs to be visited, by evaluating the expressions
  * for them.
@@ -101,6 +100,7 @@ TidListCreate(TidScanState *tidstate)
 			else if (IsCTIDVar(arg2))
 exprstate = ExecInitExpr((Expr *) linitial(fex->args),
 		  &tidstate->ss.ps);
+			else
 elog(ERROR, "could not identify CTID variable");
 
 			itemptr = (ItemPointer)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-15 20:09:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ new patches ]
>
> I've started to look at 0004, and the first conclusion I've come to
> is that it's *direly* short of documentation.  To the point that I'd
> vote against committing it if something isn't done about that.

Yea, I asked for input about what's hard to understand and what's not -
I stared at this for a *lot* of time, and it all kinda looks easy-ish
now.  I'm more than willing to expand on the below, and other pieces.

> As  an example, it's quite unclear how ExprEvalSteps acquire correct
> resnull/resvalue pointers, and the algorithm for that seems nontrivial.
> It doesn't help any that the arguments of ExecInitExprRec are entirely
> undocumented.

Generally whatever wants the result of a (sub-)expression passes in the
desired resvalue/resnull.  E.g. when doing a function call the
individual arguments are each prepared for evaluation using
ExecInitExprRec() and resvalue/resnull are pointing into fcinfo's
arg/nulls[i].

> I think it would be worth creating a README file giving an overview
> of how all of this patch is supposed to work.  You also need to do a
> whole lot more work on the function-level comments.

Ok.

> A specific thing I noticed in the particular area of
> what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup:
>
> +/*
> + * Evaluate array argument into our return value, overwrite
> + * with comparison results afterwards.
> + */
> +ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, 
> state,
> +resv, resnull);
>
> That scares me quite a bit, because it smells exactly like the sort of
> premature optimization that bit us on the rear in CVE-2016-5423 (cf commit
> f0c7b789a).

I don't think there's a danger similar to f0c7b789a here, because the
"caller" (i.e. the node that needs the expression's result) expects
resvalue/null to be overwritten. It'll e.g. be the value "slot" of one
arm (is there a better name for one part of a boolean expression?) of a
boolean expression.

> What's it really buying us to overwrite the return value
> early rather than storing into the fcinfo's second argument slot?

That'd work just as well.

> (The memory of that CVE is part of what's prompting me to demand a clear
> explanation of the algorithm for deciding where steps return their
> results.  Even if this particular code is safe, somebody is going to do
> something unsafe in future if there's not a documented rule to follow.)

I don't think there's a danger here, but I think you more generally have
a point.

> Another thing that ties into the do-I-understand-this-at-all question
> is this bit:
>
> +EEO_CASE(EEOP_BOOL_AND_STEP_FIRST)
> +{
> +*op->d.boolexpr.anynull = false;
> +
> +/*
> + * Fallthrough (can't be last - ANDs have two arguments at 
> least).
> + */
> +}
> +
> +EEO_CASE(EEOP_BOOL_AND_STEP)
>
> It seems like this is missing an "op++;" before falling through.  If it
> isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull
> and then also doing a regular BOOL_AND_STEP, then the comment seems rather
> off point.

It's intended to fall through this way, i.e. the difference between
_FIRST and not is just that only the former clears anynull.  What the
comment is about, admittedly too cryptically, is that the _LAST step
that then evaluates anynull cannot be the same step as
EEOP_BOOL_AND_STEP_FIRST, because bool AND/OR always has at least two
"arms".  Will expand / move.

> BTW, it sure seems like ExecInitExprRec and related code ought to set
> off all sorts of valgrind complaints?  It's not being careful at all
> to ensure that all fields of the "scratch" record get initialized before
> we memcpy it to someplace.

It worked not long ago - valgrind's replacment memcpy() doesn't trigger
undefined memory warnings, it just copies the "definedness" of each byte
(or bit?).  But your point gives me an idea: It seems like a good idea
to VALGRIND_MAKE_MEM_UNDEFINED() the "scratch" step at some convenient
places, so the definedness of individual operations is more useful.

Thanks for the look!

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-15 18:48:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
> >> [ scratches head... ]  What deforming logic do you think I'm proposing
> >> removing?
> 
> > I thought you were suggesting that we don't do the get_last_attnums (and
> > inlined version in the isSimpleVar case) at execution time anymore,
> > instead relying on logic in the planner to know how much to deform ahead
> > of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
> > I understood you suggesting to do so only in scan nodes - which doesn't
> > seem sufficient, due to the use of materialized / minimal tuples in
> > other types of nodes.  Did I misunderstand?
> 
> We would need to do it anywhere that we might be projecting from a
> materialized tuple, I suppose.  From the planner's standpoint it would be
> about as easy to do this for all plan nodes as only selected ones.

Yea.

> Anyway the core point here seems to be that skipping ExecProject misses a
> bet because the underlying tuple doesn't get disassembled.  A quick-hack
> way of seeing if this helps might be to do slot_getallattrs in the places
> where we skip ExecProject.
> I'm not sure we'd want that as a permanent
> solution, because it's possible that it extracts columns not actually
> needed, but it should at least move the hotspot in your example case.

Hm, that could hurt pretty badly if we actually only access the first
few columns.

I wonder if we shouldn't essentially get rid of all slot_getattr()
calls, and replace them with one slot_getsomeattrs() and then direct
tts_values/nulls access.  Where we compute the column number is then
essentially a separate discussion.

Looking around most of them seem to access multiple columns, and several
of them probably can't conveniently done via the planner:

src/backend/catalog/partition.c
1635:   datum = slot_getattr(slot, keycol, &isNull);

acesses all the partitioned columns and has convenient location to
compute column to deform to (RelationGetPartitionDispatchInfo).

src/backend/catalog/index.c
1797:   iDatum = slot_getattr(slot, keycol, &isNull);

acesses all the partitioned columns and has convenient location to
compute column to deform to (BuildIndexInfo).

src/backend/utils/adt/orderedsetaggs.c
1196:   Datum   d = slot_getattr(slot, nargs + 1, &isnull);
1359:   Datum   d = slot_getattr(slot, nargs + 1, &isnull);

no benefit.

src/backend/executor/nodeMergeAppend.c
246:datum1 = slot_getattr(s1, attno, &isNull1);
247:datum2 = slot_getattr(s2, attno, &isNull2);

accesses all columns in the sort key, convenient place to prepare
(ExecInitMergeAppend).

src/backend/executor/execQual.c
594: * caught inside slot_getattr).  What we have to check for here is the
600: * Note: we allow a reference to a dropped attribute.  slot_getattr will
637:return slot_getattr(slot, attnum, isNull);
676:return slot_getattr(slot, attnum, isNull);
1681:   return slot_getattr(fcache->funcResultSlot, 1, 
isNull);

Already converted in proposed expression evaluation patch.

src/backend/executor/nodeSubplan.c
367:dvalue = slot_getattr(slot, 1, &disnull);
395:prmdata->value = slot_getattr(slot, col, 
&(prmdata->isnull));
565:prmdata->value = slot_getattr(slot, col,
1015:   dvalue = slot_getattr(slot, 1, &disnull);

Accesses all params, convenient location to compute column number
(ExecInitSubPlan).

src/backend/executor/execCurrent.c
186:/* Use slot_getattr to catch any possible mistakes */
188:
DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot,
193:
DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot,

Accesses only system columns.

src/backend/executor/nodeSetOp.c
107:flag = DatumGetInt32(slot_getattr(inputslot,

Not an actual column.

src/backend/executor/nodeGatherMerge.c
678:datum1 = slot_getattr(s1, attno, &isNull1);
679:datum2 = slot_getattr(s2, attno, &isNull2);

Same as mergeAppend (huh, why did we copy this code), i.e. beneficial.

src/backend/executor/functions.c
970:value = slot_getattr(slot, 1, &(fcinfo->isnull));

no benefit.

src/backend/executor/execJunk.c
253:return slot_getattr(slot, attno, isNull);

no benefit.

src/backend/executor/nodeNestloop.c
136:prm->value = slot_getattr(outerTupleSlot,

accesses all future param values, convenient place to compute column
number.

src/backend/executor/execGrouping.c
100:attr1 = slot_getattr(slot1, att, &isNull1);
102:attr2 = slot_getattr(slot2, att, &isNull2);
170:attr1 = slot_getattr(slot1, att, &isNull1);
175:attr2 = slot_getattr(slot2, att, &isNull2);
501:attr = slot_getattr(slot, att, &isNull);

accesses all g

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> [ new patches ]

I've started to look at 0004, and the first conclusion I've come to
is that it's *direly* short of documentation.  To the point that I'd
vote against committing it if something isn't done about that.  As
an example, it's quite unclear how ExprEvalSteps acquire correct
resnull/resvalue pointers, and the algorithm for that seems nontrivial.
It doesn't help any that the arguments of ExecInitExprRec are entirely
undocumented.

I think it would be worth creating a README file giving an overview
of how all of this patch is supposed to work.  You also need to do a
whole lot more work on the function-level comments.

A specific thing I noticed in the particular area of
what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup:

+/*
+ * Evaluate array argument into our return value, overwrite
+ * with comparison results afterwards.
+ */
+ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, state,
+resv, resnull);

That scares me quite a bit, because it smells exactly like the sort of
premature optimization that bit us on the rear in CVE-2016-5423 (cf commit
f0c7b789a).  What's it really buying us to overwrite the return value
early rather than storing into the fcinfo's second argument slot?
(The memory of that CVE is part of what's prompting me to demand a clear
explanation of the algorithm for deciding where steps return their
results.  Even if this particular code is safe, somebody is going to do
something unsafe in future if there's not a documented rule to follow.)

Another thing that ties into the do-I-understand-this-at-all question
is this bit:

+EEO_CASE(EEOP_BOOL_AND_STEP_FIRST)
+{
+*op->d.boolexpr.anynull = false;
+
+/*
+ * Fallthrough (can't be last - ANDs have two arguments at least).
+ */
+}
+
+EEO_CASE(EEOP_BOOL_AND_STEP)

It seems like this is missing an "op++;" before falling through.  If it
isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull
and then also doing a regular BOOL_AND_STEP, then the comment seems rather
off point.  It should be more like "Fall through to do regular AND step
processing as well".  The place where the comment would be on point
is probably over here:

+case AND_EXPR:
+/*
+ * ANDs have at least two arguments, so that
+ * no step needs to be both FIRST and LAST.
+ */
+Assert(list_length(boolexpr->args) >= 2);
+
+if (off == 0)
+scratch.opcode = EEOP_BOOL_AND_STEP_FIRST;
+else if (off + 1 == nargs)
+scratch.opcode = EEOP_BOOL_AND_STEP_LAST;
+else
+scratch.opcode = EEOP_BOOL_AND_STEP;
+break;

although I think the Assert ought to be examining nargs not
list_length(boolexpr->args) so that it has some visible connection to the
code after it.  (This all applies to OR processing as well, of course.)

BTW, it sure seems like ExecInitExprRec and related code ought to set
off all sorts of valgrind complaints?  It's not being careful at all
to ensure that all fields of the "scratch" record get initialized before
we memcpy it to someplace.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
>> [ scratches head... ]  What deforming logic do you think I'm proposing
>> removing?

> I thought you were suggesting that we don't do the get_last_attnums (and
> inlined version in the isSimpleVar case) at execution time anymore,
> instead relying on logic in the planner to know how much to deform ahead
> of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
> I understood you suggesting to do so only in scan nodes - which doesn't
> seem sufficient, due to the use of materialized / minimal tuples in
> other types of nodes.  Did I misunderstand?

We would need to do it anywhere that we might be projecting from a
materialized tuple, I suppose.  From the planner's standpoint it would be
about as easy to do this for all plan nodes as only selected ones.

Anyway the core point here seems to be that skipping ExecProject misses a
bet because the underlying tuple doesn't get disassembled.  A quick-hack
way of seeing if this helps might be to do slot_getallattrs in the places
where we skip ExecProject.  I'm not sure we'd want that as a permanent
solution, because it's possible that it extracts columns not actually
needed, but it should at least move the hotspot in your example case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
> >> We could make the planner mark each table scan node with the highest
> >> column number that the plan will access, and use that to drive a
> >> slot_getsomeattrs call in advance of any access to tuple contents.
> 
> > probably isn't sufficient - we build non-virtual tuples in a good number
> > of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
> > output, ...).  I suspect it'd have measurable negative consequences if
> > we removed the deforming logic for all expressions/projections above
> > such nodes.  I guess we could just do such a logic for every Plan node?
> 
> [ scratches head... ]  What deforming logic do you think I'm proposing
> removing?

I thought you were suggesting that we don't do the get_last_attnums (and
inlined version in the isSimpleVar case) at execution time anymore,
instead relying on logic in the planner to know how much to deform ahead
of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
I understood you suggesting to do so only in scan nodes - which doesn't
seem sufficient, due to the use of materialized / minimal tuples in
other types of nodes.  Did I misunderstand?

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
>> We could make the planner mark each table scan node with the highest
>> column number that the plan will access, and use that to drive a
>> slot_getsomeattrs call in advance of any access to tuple contents.

> probably isn't sufficient - we build non-virtual tuples in a good number
> of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
> output, ...).  I suspect it'd have measurable negative consequences if
> we removed the deforming logic for all expressions/projections above
> such nodes.  I guess we could just do such a logic for every Plan node?

[ scratches head... ]  What deforming logic do you think I'm proposing
removing?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
> >> As for ExecHashGetHashValue, it's most likely going to be working from
> >> virtual tuples passed up to the join, which won't benefit from
> >> predetermination of the last column to be accessed.  The
> >> tuple-deconstruction would have happened while projecting in the scan
> >> node below.
> 
> > I think the physical tuple stuff commonly thwarts that argument?  On
> > master for tpch's Q5 you can e.g. see the following profile (master):
> 
> Hmmm ... I think you're mistaken in fingering the physical-tuple
> optimization per se, but maybe skipping ExecProject at the scan level
> would cause this result?

I think those are often related (i.e. we replace a smaller targetlist
with a "physical" one, which then allows to skip ExecProject()).

> I've thought for some time that it was dumb to have the executor
> reverse-engineering this info at plan startup anyway.

Yea, it'd be good if this (and some similar tasks like building interim
tuple descriptors) could be moved to the planner.  But:

> We could make the planner mark each table scan node with the highest
> column number that the plan will access, and use that to drive a
> slot_getsomeattrs call in advance of any access to tuple contents.

probably isn't sufficient - we build non-virtual tuples in a good number
of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
output, ...).  I suspect it'd have measurable negative consequences if
we removed the deforming logic for all expressions/projections above
such nodes.  I guess we could just do such a logic for every Plan node?

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
>> As for ExecHashGetHashValue, it's most likely going to be working from
>> virtual tuples passed up to the join, which won't benefit from
>> predetermination of the last column to be accessed.  The
>> tuple-deconstruction would have happened while projecting in the scan
>> node below.

> I think the physical tuple stuff commonly thwarts that argument?  On
> master for tpch's Q5 you can e.g. see the following profile (master):

Hmmm ... I think you're mistaken in fingering the physical-tuple
optimization per se, but maybe skipping ExecProject at the scan level
would cause this result?

I've thought for some time that it was dumb to have the executor
reverse-engineering this info at plan startup anyway.  We could make the
planner mark each table scan node with the highest column number that the
plan will access, and use that to drive a slot_getsomeattrs call in
advance of any access to tuple contents.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
> >> Color me dubious.  Which specific other places have you got in mind, and
> >> do they have expression trees at hand that would tell them which columns
> >> they really need to pull out?
>
> > I was thinking of execGrouping.c's execTuplesMatch(),
> > TupleHashTableHash() (and unequal, but doubt that matters
> > performancewise).  There's also nodeHash.c's ExecHashGetValue(), but I
> > think that'd possibly better fixed differently.
>
> The execGrouping.c functions don't have access to an expression tree
> instructing them which columns to pull out of the tuple, so I fail to see
> how get_last_attnums() would be of any use to them.

I presume most of the callers do.  We'd have to change the API somewhat,
unless we just have a small loop in execTuplesMatch() determining the
biggest column index (which might be worthwhile / acceptable).
TupleHashTableHash() should be able to have that pre-computed in
BuildTupleHashTable().  Might be more viable to go that way.


> As for ExecHashGetHashValue, it's most likely going to be working from
> virtual tuples passed up to the join, which won't benefit from
> predetermination of the last column to be accessed.  The
> tuple-deconstruction would have happened while projecting in the scan
> node below.

I think the physical tuple stuff commonly thwarts that argument?  On
master for tpch's Q5 you can e.g. see the following profile (master):

+   29.38%  postgres  postgres  [.] ExecScanHashBucket
+   16.72%  postgres  postgres  [.] slot_getattr
+5.51%  postgres  postgres  [.] heap_getnext
-5.50%  postgres  postgres  [.] slot_deform_tuple
   - 98.07% slot_deform_tuple
  - 85.98% slot_getattr
 - 96.59% ExecHashGetHashValue
- ExecHashJoin
   - ExecProcNode
  + 85.12% ExecHashJoin
  + 14.88% MultiExecHash
 + 3.41% ExecMakeFunctionResultNoSets
  + 14.02% slot_getsomeattrs
   + 1.58% ExecEvalScalarVarFast

I.e. nearly all calls for slot_deform_tuple are from slot_getattrs in
ExecHashGetHashValue().  And nearly all the time in slot_getattr is
spent on code only executed for actual tuples:

   │   if (tuple == NULL)  /* internal 
error */
  0.18 │ test   %rax,%rax
   │   ↓ je 223
   │*
   │* (We have to check this separately because of various 
inheritance and
   │* table-alteration scenarios: the tuple could be either 
longer or shorter
   │* than the tupdesc.)
   │*/
   │   tup = tuple->t_data;
  0.47 │ mov0x10(%rax),%rsi
   │   if (attnum > HeapTupleHeaderGetNatts(tup))
 75.42 │ movzwl 0x12(%rsi),%eax
  0.70 │ and$0x7ff,%eax
  0.47 │ cmp%eax,%ebx
   │   ↓ jg e8

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
>> Color me dubious.  Which specific other places have you got in mind, and
>> do they have expression trees at hand that would tell them which columns
>> they really need to pull out?

> I was thinking of execGrouping.c's execTuplesMatch(),
> TupleHashTableHash() (and unequal, but doubt that matters
> performancewise).  There's also nodeHash.c's ExecHashGetValue(), but I
> think that'd possibly better fixed differently.

The execGrouping.c functions don't have access to an expression tree
instructing them which columns to pull out of the tuple, so I fail to see
how get_last_attnums() would be of any use to them.  As for
ExecHashGetHashValue, it's most likely going to be working from virtual
tuples passed up to the join, which won't benefit from predetermination
of the last column to be accessed.  The tuple-deconstruction would have
happened while projecting in the scan node below.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
> >> So for my money, you should drop 0002 altogether and just have 0004
> >> remove get_last_attnums() from execUtils.c and stick it into
> >> execExpr.c.  I suppose you still need the LastAttnumInfo API change
> >> so as to decouple it from ProjectionInfo, but that's minor.
> 
> > I think it's quite useful in other places too, we do repeated slot-getattrs 
> > in a bunch of places in the executor, and it's noticeable performance wise. 
> 
> Color me dubious.  Which specific other places have you got in mind, and
> do they have expression trees at hand that would tell them which columns
> they really need to pull out?

I was thinking of execGrouping.c's execTuplesMatch(),
TupleHashTableHash() (and unequal, but doubt that matters
performancewise).  There's also nodeHash.c's ExecHashGetValue(), but I
think that'd possibly better fixed differently.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
>> So for my money, you should drop 0002 altogether and just have 0004
>> remove get_last_attnums() from execUtils.c and stick it into
>> execExpr.c.  I suppose you still need the LastAttnumInfo API change
>> so as to decouple it from ProjectionInfo, but that's minor.

> I think it's quite useful in other places too, we do repeated slot-getattrs 
> in a bunch of places in the executor, and it's noticeable performance wise. 

Color me dubious.  Which specific other places have you got in mind, and
do they have expression trees at hand that would tell them which columns
they really need to pull out?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4



On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> [ latest patches ]
>
>I looked through 0002-Make-get_last_attnums-more-generic.patch.

>So for my money, you should drop 0002 altogether and just have 0004
>remove get_last_attnums() from execUtils.c and stick it into
>execExpr.c.  I suppose you still need the LastAttnumInfo API change
>so as to decouple it from ProjectionInfo, but that's minor.

I think it's quite useful in other places too, we do repeated slot-getattrs in 
a bunch of places in the executor, and it's noticeable performance wise. 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Andres Freund  writes:
> [ latest patches ]

I looked through 0002-Make-get_last_attnums-more-generic.patch.
Although it seems relatively unobjectionable on its own, I'm not
convinced that it's really useful to try to split it out like this.
I see that 0004 removes the only call of ExecGetLastAttnums (the
one in ExecBuildProjectionInfo) and then adds a single call in
ExecInitExprSlots which is in execExpr.c.  To me, the only reason
ExecGetLastAttnums nee get_last_attnums is in execUtils.c in the first
place is that it is a private subroutine of ExecBuildProjectionInfo.
After these changes, it might as well be a private subroutine of
ExecInitExprSlots.  I'm suspicious of turning it into a globally
accessible function as you've done here, because I doubt that it is of
global use --- in particular, the fact that it doesn't deal specially
with INDEX_VAR Vars seems rather specific to this one use-case.

So for my money, you should drop 0002 altogether and just have 0004
remove get_last_attnums() from execUtils.c and stick it into
execExpr.c.  I suppose you still need the LastAttnumInfo API change
so as to decouple it from ProjectionInfo, but that's minor.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-14 19:34:12 -0400, Tom Lane wrote:
>> It seems bizarre that you chose to spell the new configure symbol as
>> HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO

> I went back-and-forth about this a number of times.  We have a bunch of
> symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and
> more of the nearby code seems to use __ rather than _.  I don't really
> know why we started doing that, but it's far from new..

> Any idea why we introduce __ stuff?

The nearby stuff is describing features that have a specific name that
includes a leading underscore or two, like __builtin_unreachable().
That doesn't seem to apply here, so I wouldn't add extra underscores.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-14 23:10:25 -0400, Peter Eisentraut wrote:
> On 3/14/17 19:40, Andres Freund wrote:
> > Any idea why we introduce __ stuff?
> 
> Because the symbols start with an underscore:
> 
> /* Define to 1 if your compiler understands _Static_assert. */
> #undef HAVE__STATIC_ASSERT

Oh, I guess that makes some sense.  But we do so very inconsistently,
given we only keep the leading underscores not the trailing ones (see
HAVE__VA_ARGS, HAVE_FUNCNAME__FUNC).

> But we don't need to introduce underscores that are not there.

Indeed.  Don't want to repost for just this, but consider it adapted
(and moved as requested by Tom).

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Peter Eisentraut

On 3/14/17 19:40, Andres Freund wrote:
> Any idea why we introduce __ stuff?

Because the symbols start with an underscore:

/* Define to 1 if your compiler understands _Static_assert. */
#undef HAVE__STATIC_ASSERT

There is apparently some inconsistency when symbols start with more than
one underscore.

But we don't need to introduce underscores that are not there.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-14 19:34:12 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ new patch versions ]
> 
> About to leave, but I had time to read 0003:
> 
> It seems bizarre that you chose to spell the new configure symbol as
> HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO

I went back-and-forth about this a number of times.  We have a bunch of
symbols defined with HAVE__ as a prefix (and some with HAVE_GCC__) - and
more of the nearby code seems to use __ rather than _.  I don't really
know why we started doing that, but it's far from new..

Any idea why we introduce __ stuff?

> Also, being a neatnik, I would insert both the definition and the
> call of that macro between PGAC_C_BUILTIN_UNREACHABLE and
> PGAC_C_VA_ARGS, keeping it more or less in alphabetical order.

That works for me.

Thanks,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane

Andres Freund  writes:
> [ new patch versions ]

About to leave, but I had time to read 0003:

It seems bizarre that you chose to spell the new configure symbol as
HAVE__COMPUTED_GOTO rather than HAVE_COMPUTED_GOTO, especially so
when the comment for PGAC_C_COMPUTED_GOTO alleges it's the latter.
Also, being a neatnik, I would insert both the definition and the
call of that macro between PGAC_C_BUILTIN_UNREACHABLE and
PGAC_C_VA_ARGS, keeping it more or less in alphabetical order.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-14 14:19:18 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
> >> It would be good to have someone at least read it before pushing, but
> >> I don't think anyone other than you has done so.
> 
> > I'd love for somebody else
> > to look through it, I tried asking multiple times...  Seems like
> > threatening a commit is the best way to get it :P
> 
> I'm willing to look

Cool.


> but the last few messages make it sound like you're not all that close
> to a finished version.

Hm, doesn't seem that far off to me - the latest few rounds of changes
have been largely polishing (naming, fighting w/ pgindent, language
polishing in comments, etc), otherwise I've basically just added a few
lines of optimization after doing more profiling.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Douglas Doole

On Tue, Mar 14, 2017 at 3:16 PM Andres Freund  wrote:

> Hm.  Right now ExprState's are allocated in several places - but we
> could easily move that to a central place.  Have a bit of a hard time
> seing that that branch during *initialization* time is that expensive,
> especially given that previously we allocated a lot of memory separately
> too.
>

I didn't make any comparisons of the cost of the new init against the old
init with this change in particular - I just saw that it made the new init
faster. I also didn't play around to determine if the savings was found in
removing the branch misprediction or inlining or both.

I certainly wouldn't hold up your commit for this, but it's something that
might be worth a second look once the dust has settled.

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-14 22:03:45 +, Douglas Doole wrote:
> I do have one observation based on my experiments with your first version
> of the code. In my tests, I found that expression init becomes a lot more
> expensive in this new model. (That's neither a surprise, nor a
> concern.)

I suspect that's to a good degree because back then it'd often end up
trying the "new" stuff, but then fall back to the old machinery due to
unsupported operations. Which isn't a concern anymore, because now
there's full coverage.

Otherwise the cost aren't, according to my measurements, higher than
before, excepting that some stuff that happened at execution time is now
happening during initialization.

> In particular, the function ExprEvalPushStep() is quite hot. In my
> code I made the following changes:
> 
>   * Declare ExprEvalPushStep() "inline".
>   * Remove the "if (es->steps_alloc == 0)" condition
> from ExprEvalPushStep().
>   * In ExecInitExpr(), add:
>state->steps_alloc = 16;
>state->steps = palloc(sizeof(ExprEvalStep) * es->steps_alloc);
> 
> I found that this cut the cost of initializing the expression by about 20%.
> (Of course, that was on version 1 of your code, so the benefit may well be
> different now.)

Hm.  Right now ExprState's are allocated in several places - but we
could easily move that to a central place.  Have a bit of a hard time
seing that that branch during *initialization* time is that expensive,
especially given that previously we allocated a lot of memory separately
too.

> On Tue, Mar 14, 2017 at 11:51 AM Andres Freund  wrote:
> 
> > > Hmm. Could we make the instructions variable size? It would allow packing
> > > the small instructions even more tight, and we wouldn't need to obsess
> > over
> > > a particular maximum size for more complicated instructions.
> >
> > That makes jumps a lot more complicated.  I'd experimented with it and
> > given it up as "not worth it".

> Back when I was at IBM, I spent a lot of time doing stuff like this. If you
> want to commit with the fixed size arrays, I'm happy to volunteer to look
> at packing it tighter as a follow-on piece of work. (It was already on my
> list of things to try anyhow.)

Yea, I think experimenting with that is a good idea. I just think this
is complicated enough, and I don't want to add a whole lot more
complexity for very debatable benefit.

> > If we were to try to do so, we'd also
> > not like storing the pointer and enum variants both, since it'd again
> > would reduce the density.

> From my experience, it's worth the small loss in density to carry around
> both the pointer and the enum - it makes debugging so much easier.

I found it annoying enough to print the instruction itself, due to the
union - so printing the opcode using a function isn't too bad...

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Douglas Doole

Andres, sorry I haven't had a chance to look at this great stuff you've
been doing. I've wanted to get to it, but work keeps getting in the way. ;-)

I do have one observation based on my experiments with your first version
of the code. In my tests, I found that expression init becomes a lot more
expensive in this new model. (That's neither a surprise, nor a concern.) In
particular, the function ExprEvalPushStep() is quite hot. In my code I made
the following changes:

  * Declare ExprEvalPushStep() "inline".
  * Remove the "if (es->steps_alloc == 0)" condition
from ExprEvalPushStep().
  * In ExecInitExpr(), add:
   state->steps_alloc = 16;
   state->steps = palloc(sizeof(ExprEvalStep) * es->steps_alloc);

I found that this cut the cost of initializing the expression by about 20%.
(Of course, that was on version 1 of your code, so the benefit may well be
different now.)

On Tue, Mar 14, 2017 at 11:51 AM Andres Freund  wrote:

> > Hmm. Could we make the instructions variable size? It would allow packing
> > the small instructions even more tight, and we wouldn't need to obsess
> over
> > a particular maximum size for more complicated instructions.
>
> That makes jumps a lot more complicated.  I'd experimented with it and
> given it up as "not worth it".

Back when I was at IBM, I spent a lot of time doing stuff like this. If you
want to commit with the fixed size arrays, I'm happy to volunteer to look
at packing it tighter as a follow-on piece of work. (It was already on my
list of things to try anyhow.)

> If we were to try to do so, we'd also
> not like storing the pointer and enum variants both, since it'd again
> would reduce the density.
>

>From my experience, it's worth the small loss in density to carry around
both the pointer and the enum - it makes debugging so much easier.

- Doug
Salesforce

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,

On 2017-03-14 20:28:51 +0200, Heikki Linnakangas wrote:
> On 03/14/2017 07:31 PM, Andres Freund wrote:
> > On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:
> > > * How tight are we on space in the ExprEvalStep union? Currently, the
> > > jump-threading preparation replaces the opcodes with the goto labels, but 
> > > it
> > > would be really nice to keep the original opcodes, for debugging purposes 
> > > if
> > > nothing else.
> > 
> > There's nothing left to spare :(.  For debugging I've just used
> > ExecEvalStepOp() - it's inconvenient to directly print the struct
> > anyway, since gdb prints the whole union...
> 
> This comment above the union is not accurate:
> 
> > /*
> >  * Data for an operation. Inline stored data is faster to access, but 
> > also
> >  * bloats the size of all instructions. The union should be kept below 
> > 48
> >  * bytes (so the entire struct is below 64bytes, a single cacheline on
> >  * common systems).
> >  */
> 
> On my x86-64 system, the whole struct is 64 bytes, but the union is only 40
> bytes. The fields before the union occupy 24 bytes. IOW, the union should be
> kept below 40 bytes, not 48.

Point.  Will update.


> Hmm. Could we make the instructions variable size? It would allow packing
> the small instructions even more tight, and we wouldn't need to obsess over
> a particular maximum size for more complicated instructions.

That makes jumps a lot more complicated.  I'd experimented with it and
given it up as "not worth it".  If we were to try to do so, we'd also
not like storing the pointer and enum variants both, since it'd again
would reduce the density.


> A compromise might be to have the fixed size, but have "continuation"
> opcodes for the more complicated instructions. An XmlExpr instruction, for
> example, could have an extra instruction after the primary one, just to hold
> more fields. When evaluating it, we would just increment the Program Counter
> by two instead of one, to skip over the extra instruction.

I think for cases where 40 bytes becomes the limit, it's usually better
to have out-of-line state.  We could do this, but we'd also waste a
bunch of space in the array (namely the 24 byte Op "header").

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Heikki Linnakangas


On 03/14/2017 07:31 PM, Andres Freund wrote:

On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:

* How tight are we on space in the ExprEvalStep union? Currently, the
jump-threading preparation replaces the opcodes with the goto labels, but it
would be really nice to keep the original opcodes, for debugging purposes if
nothing else.


There's nothing left to spare :(.  For debugging I've just used
ExecEvalStepOp() - it's inconvenient to directly print the struct
anyway, since gdb prints the whole union...


This comment above the union is not accurate:


/*
 * Data for an operation. Inline stored data is faster to access, but 
also
 * bloats the size of all instructions. The union should be kept below 
48
 * bytes (so the entire struct is below 64bytes, a single cacheline on
 * common systems).
 */


On my x86-64 system, the whole struct is 64 bytes, but the union is only 
40 bytes. The fields before the union occupy 24 bytes. IOW, the union 
should be kept below 40 bytes, not 48.


Hmm. Could we make the instructions variable size? It would allow 
packing the small instructions even more tight, and we wouldn't need to 
obsess over a particular maximum size for more complicated instructions.


A compromise might be to have the fixed size, but have "continuation" 
opcodes for the more complicated instructions. An XmlExpr instruction, 
for example, could have an extra instruction after the primary one, just 
to hold more fields. When evaluating it, we would just increment the 
Program Counter by two instead of one, to skip over the extra instruction.


For reference, here are the sizes of all the structs in the union:

40: xmlexpr
40: scalararrayop
40: minmax
40: iocoerce
40: convert_rowtype
32: wholerow
32: rowcompare_step
32: row
32: func
32: fieldstore
32: domaincheck
32: boolexpr
32: arrayexpr
32: arraycoerce
24: casewhen
24: casethen
24: arrayref_checksubscript
16: grouping_func
16: fieldselect
16: constval
16: casetest
16: assign_var
16: arrayref
8: window_func
8: subplan
8: sqlvaluefunction
8: param
8: nulltest_row
8: assign_tmp
8: alternative_subplan
8: aggref
4: var
4: rowcompare_final
4: qualexpr
4: fetch
4: coalesce

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Tom Lane

Andres Freund  writes:
> On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
>> It would be good to have someone at least read it before pushing, but
>> I don't think anyone other than you has done so.

> I'd love for somebody else
> to look through it, I tried asking multiple times...  Seems like
> threatening a commit is the best way to get it :P

I'm willing to look, but the last few messages make it sound like you're
not all that close to a finished version.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-14 08:44:24 -0300, Alvaro Herrera wrote:
> Patch 0003 is huge.

I suspect you mean 0004?  If so - yes :(.  I unfortunately don't think
there's a useful way to split it in smaller chunks - I originally moved
ops over one-by-one, but that required a lot of duplicated structs and
such...

> It would be good to have someone at least read it before pushing, but
> I don't think anyone other than you has done so.

I'd love for somebody else
to look through it, I tried asking multiple times...  Seems like
threatening a commit is the best way to get it :P

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 2017-03-14 08:28:02 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > EEO_SWITCH(op->opcode)
> > {
> > EEO_CASE(EEO_DONE):
> > goto out;
> 
> Oh my.
> 
> > which is a bit annoying.  (the EEO_CASE is either a jump label or a case
> > statement, depending on computed goto availability).
> > 
> > It seems we could either:
> > 1) live with the damage
> > 2) disable pgindent
> > 3) move the : inside EEO_CASE's definition, and only use {} blocks.
> 
> I think 3) is nasty because the result doesn't look like normal C.

We have a bunch of such constructs already however. foreach(),
PG_TRY().  That means 3) has the advantage that our editors and pgindent
can already deal with it.


> EEO_CASE() are potentially jump labels, then indentation becomes
> correct.  Why not accept it?  It looks odd, but that gives a better hint
> to the reader about what's going on.

Seems likely that every editor would indent them differently :(


I'm leaning towards 3) for the reasons above and after trying out the
different indentations, but I don't have a strong opinion.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

Hi,


On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:
> On 03/14/2017 08:53 AM, Andres Freund wrote:
> > Besides that, this version has:
> > - pgindented most of the affected pieces (i.e. all significant new code
> >   has been reindent, not all touched one)
> 
> I think you'll need to add all the inner structs ExprEvalStep typedefs.list
> to indent them right.

Yea, I saw that.  Since they're not named atm, That'd probably not work
well.  I'd be inclined to just live with it :/


> I looked at patch 0004.

Thanks!


> * EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really assumes
> that 'op' has already been set to point to the jump target. I find that a
> bit weird. I guess the idea is that you always pass the Program Counter
> variable as 'op' argument. For consistency, would be good if EEO_SWITCH()
> also took just 'op' as the argument, rather than op->opcode. But I think it
> would be more clear if they should both just assumed that there's a variable
> called 'op' that points to the current instruction.

Hm.  I dislike magic variable names. I think I prefer your idea of
passing the op entirely to EEO_SWITCH.


> * All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to calling
> EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(),
> to encapsulate setting 'op' and jumping to it in one operation?

Ok.


> * ExecEvalStepOp() seems relatively expensive, with the linear scan over all
> the opcodes, if called on an ExprState that already has
> EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the opcode is
> a particular one, so you could check if the opcode matches that particular
> one, instead of scanning the dispatch table to find what it is.

It should be fairly cheap at that place, unless the same expression is
instantiated twice for some reason.  I have been wondering about adding
a second table ptr->op that can be binary searched for the operation to
make it cheaper.


> * But is ExecEvalStepOp() ever actually get called on an expression with
> EEO_FLAG_JUMP_THREADED already set? It's only used in
> ExecInstantiateInterpretedExpr(), and it's called only once on each
> ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED is
> not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and assert
> that evalfunc != ExecInterpExpr.

The primary use of ExecEvalStepOp() is really debugging, so I'd like to
avoid adding that restriction.  I guess we can add a second function for
this if needed.


> * How tight are we on space in the ExprEvalStep union? Currently, the
> jump-threading preparation replaces the opcodes with the goto labels, but it
> would be really nice to keep the original opcodes, for debugging purposes if
> nothing else.

There's nothing left to spare :(.  For debugging I've just used
ExecEvalStepOp() - it's inconvenient to directly print the struct
anyway, since gdb prints the whole union...


> * execInterpExpr.c is quite a mouthful. How about exprInterpreter.c?



> Attached is a patch, on top of your
> 0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with some
> minor tweaks and comments here and there (search for HEIKKI). It's mostly
> the same stuff I listed above, implemented in a quick & dirty way. I hope
> it's helpful.

Thanks!



>  typedef enum ExprEvalOp
>  {
> + /*
> +  * HEIKKI: How about renaming these to EEOP_* ? I think it'd be a
> +  * good mnemonic to remember that these are opcodes, and just one
> +  * extra letter.
> +  */

I don't mind the change.


> + /* HEIKKI: Is it safe to assume that sizeof(size_t) >= sizeof(void *) ? 
> */

Yes, seems very likely - it's supposed to hold any potential sizeof()
return value.  You could end up on platforms where that's not true, if
it used tagged pointers or such.  I guess we could instead use a union,
but that doesn't seem that much of a benefit.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Heikki Linnakangas

On 03/14/2017 08:53 AM, Andres Freund wrote:

Besides that, this version has:
- pgindented most of the affected pieces (i.e. all significant new code
has been reindent, not all touched one)

I think you'll need to add all the inner structs ExprEvalStep
typedefs.list to indent them right.

My current plan is to not do very much on this tomorrow, do another full
pass on Wednesday, and push it, unless there's protest till then.

I looked at patch 0004. Some comments:

* EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really
assumes that 'op' has already been set to point to the jump target. I
find that a bit weird. I guess the idea is that you always pass the
Program Counter variable as 'op' argument. For consistency, would be
good if EEO_SWITCH() also took just 'op' as the argument, rather than
op->opcode. But I think it would be more clear if they should both just
assumed that there's a variable called 'op' that points to the current
instruction.

* All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to
calling EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(number>), to encapsulate setting 'op' and jumping to it in one operation?

* ExecEvalStepOp() seems relatively expensive, with the linear scan over
all the opcodes, if called on an ExprState that already has
EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the
opcode is a particular one, so you could check if the opcode matches
that particular one, instead of scanning the dispatch table to find what
it is.

* But is ExecEvalStepOp() ever actually get called on an expression with
EEO_FLAG_JUMP_THREADED already set? It's only used in
ExecInstantiateInterpretedExpr(), and it's called only once on each
ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED
is not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and
assert that evalfunc != ExecInterpExpr.

* How tight are we on space in the ExprEvalStep union? Currently, the
jump-threading preparation replaces the opcodes with the goto labels,
but it would be really nice to keep the original opcodes, for debugging
purposes if nothing else.

* execInterpExpr.c is quite a mouthful. How about exprInterpreter.c?

Attached is a patch, on top of your
0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with
some minor tweaks and comments here and there (search for HEIKKI). It's
mostly the same stuff I listed above, implemented in a quick & dirty
way. I hope it's helpful.

- Heikki

faster-eval-comments.patch
Description: application/download

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Alvaro Herrera

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3d6a3801c0..d205101b89 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -47,7 +47,14 @@
 #include "utils/rel.h"
 
 
-static bool get_last_attnums(Node *node, ProjectionInfo *projInfo);
+typedef struct LastLastAttnumInfo


LastLast?


Patch 0003 is huge.  It would be good to have someone at least read it
before pushing, but I don't think anyone other than you has done so.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-14 Thread Alvaro Herrera

Andres Freund wrote:

> EEO_SWITCH(op->opcode)
> {
> EEO_CASE(EEO_DONE):
> goto out;

Oh my.

> which is a bit annoying.  (the EEO_CASE is either a jump label or a case
> statement, depending on computed goto availability).
> 
> It seems we could either:
> 1) live with the damage
> 2) disable pgindent
> 3) move the : inside EEO_CASE's definition, and only use {} blocks.

I think 3) is nasty because the result doesn't look like normal C.  If
EEO_CASE() are potentially jump labels, then indentation becomes
correct.  Why not accept it?  It looks odd, but that gives a better hint
to the reader about what's going on.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-13 Thread Andres Freund

Hi,

On 2017-03-13 01:03:51 -0700, Andres Freund wrote:
> What's basically missing here is:
> - pgindent run and minimizing resulting damage

Running into a bit of an issue here - pgindent mangles something like

EEO_SWITCH (op->opcode)
{
EEO_CASE(EEO_DONE):
goto out;

EEO_CASE(EEO_INNER_FETCHSOME):
/* XXX: worthwhile to check tts_nvalid inline first? */
slot_getsomeattrs(innerslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_OUTER_FETCHSOME):
slot_getsomeattrs(outerslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_SCAN_FETCHSOME):
slot_getsomeattrs(scanslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_INNER_VAR):
{
int attnum = op->d.var.attnum;

/*
 * Can't assert tts_nvalid, as wholerow var evaluation or such
 * could have materialized the slot - but the contents are
 * still valid :/
 */
Assert(op->d.var.attnum >= 0);
*op->resnull = innerslot->tts_isnull[attnum];
*op->resvalue = innerslot->tts_values[attnum];
EEO_DISPATCH(op);
}

into

EEO_SWITCH(op->opcode)
{
EEO_CASE(EEO_DONE):
goto out;

EEO_CASE(EEO_INNER_FETCHSOME):
/* XXX: worthwhile to check tts_nvalid inline first? */
slot_getsomeattrs(innerslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_OUTER_FETCHSOME):
slot_getsomeattrs(outerslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_SCAN_FETCHSOME):
slot_getsomeattrs(scanslot, op->d.fetch.last_var);
EEO_DISPATCH(op);

EEO_CASE(EEO_INNER_VAR):
{
int attnum = op->d.var.attnum;

/*
 * Can't assert tts_nvalid, as wholerow var evaluation or such
 * could have materialized the slot - but the contents are still
 * valid :/
 */
Assert(op->d.var.attnum >= 0);
*op->resnull = innerslot->tts_isnull[attnum];
*op->resvalue = innerslot->tts_values[attnum];
EEO_DISPATCH(op);
}


which is a bit annoying.  (the EEO_CASE is either a jump label or a case
statement, depending on computed goto availability).

It seems we could either:
1) live with the damage
2) disable pgindent
3) move the : inside EEO_CASE's definition, and only use {} blocks.

I'm inclined to go for 3).

Opinions?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-13 Thread Tomas Vondra


On 03/13/2017 09:03 AM, Andres Freund wrote:

Hi,

On 2017-03-12 05:40:51 +0100, Tomas Vondra wrote:

I wanted to do a bit of testing and benchmarking on this, but 0004 seems to
be a bit broken.


Well, "broken" in the sense that it's already outdated, because other
stuff that got merged.



Yes, that's what I meant. Sorry for not being quite clear.




The patch does not apply anymore - there are some conflicts
in execQual.c, but I think I fixed those. But then I ran into a bunch of
compile-time errors, because some of the executor nodes still reference bits
that were moved elsewhere.


Updated patch attached.  Note that this patch has two changes I've not
yet evaluated performance-wise.



And which are those?

I plan to do a bit of testing / benchmarking on this later this week, 
depending on other testing already happening on my machine.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-13 Thread Konstantin Knizhnik




On 13.03.2017 11:03, Andres Freund wrote:

Hi,

On 2017-03-12 05:40:51 +0100, Tomas Vondra wrote:

I wanted to do a bit of testing and benchmarking on this, but 0004 seems to
be a bit broken.

Well, "broken" in the sense that it's already outdated, because other
stuff that got merged.



The patch does not apply anymore - there are some conflicts
in execQual.c, but I think I fixed those. But then I ran into a bunch of
compile-time errors, because some of the executor nodes still reference bits
that were moved elsewhere.

Updated patch attached.  Note that this patch has two changes I've not
yet evaluated performance-wise.



I got the following results at my system with Intel(R) Core(TM) i7-4770 
CPU @ 3.40GHz, 16Gb RAM,
TPC-H Q1/Q6 scale 10, sharedBuffers=8Gb, pg_prewarm on lineitem table 
projection:



Q1
Q6
Master
7503 ms 1171 ms
Your patch  6420 ms 1034 ms
VOPS
396 ms
249 ms
VOPS + patch367 ms
233 ms



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-11 Thread Tomas Vondra


Hi,

On 03/07/2017 04:01 AM, Andres Freund wrote:

Hi,

Attached is an updated version of the patchset, but more importantly
some benchmark numbers.



I wanted to do a bit of testing and benchmarking on this, but 0004 seems 
to be a bit broken. The patch does not apply anymore - there are some 
conflicts in execQual.c, but I think I fixed those. But then I ran into 
a bunch of compile-time errors, because some of the executor nodes still 
reference bits that were moved elsewhere.


E.g. there's no targetlist in PlanState anymore, yet nodeGatherMerge and 
nodeTableFuncscan do this:


gm_state->ps.targetlist = (List *)
ExecInitExpr((Expr *) node->plan.targetlist,
 (PlanState *) gm_state);

Some of the nodes also assign to ps.qual values that are (List *), but 
that field was switched to ExprState. That needs fixing, I guess.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-11 Thread Robert Haas

On Fri, Mar 10, 2017 at 7:42 PM, Bruce Momjian  wrote:
> On Fri, Mar 10, 2017 at 07:15:59PM -0500, Peter Eisentraut wrote:
>> On 3/10/17 14:40, Andres Freund wrote:
>> > I'd really like to get it in.  The performance improvements on its own
>> > are significant, and it provides the basis for significantly larger
>> > improvements again (JIT) - those followup improvements are large patches
>> > again though, so I'd rather not do all of that next cycle.
>> >
>> > My next step (over the weekend) is to add tests to execQual.c to get it
>> > a good chunk closer to 100% test coverage, and then do the same for the
>> > new implementation (there's probably very little additional tests needed
>> > after the conversion).  Given all tests pass before/after, and there's a
>> > lot of them, I think we can have a reasonable confidence of a low bug
>> > density.
>>
>> That seems like a plan, but now would probably be a good time for some
>> other hackers to take note of this proposal.
>
> Well, the executor is long overdue for improvement, so I would love to
> have this in 10.0.  I am not sure what additional polishing will happen
> if we punt it for 11.0.  I think the only downside of having it in 10.0
> is that it will not have lived in the source tree for as long a time
> between commit and PG release, but if it is tested, that seems fine.

Yeah.  I think we can't rule out the possibility that this patch is
full of bugs, but (1) a lot of the change is pretty mechanical and (2)
none of this touches the on-disk format or the catalog contents, so it
doesn't break anybody's existing install if we have to patch it.  On
the other hand, (3) it's a big patch and (4) if expression evaluation
doesn't work, PostgreSQL is pretty much unusable.

So my feeling is:

1. If Andres commits this and it turns out to be seriously buggy, he
should be prepared to revert it without a big fight.  We can try again
for v11.

2. If Andres commits this and it turns out to be mildly buggy, he
should be prepared to address bugs very proactively and very promptly.

3. If Andres wants to commit this for v10, it should be done RSN.
Feature freeze isn't technically until the end of March, but giant
potentialy-destabilizing patches that land on March 31st are likely to
lead to outcomes that nobody wants.

Like, I think, everyone else, I'd really like to have these speedups
and I think they will benefit everyone who uses PostgreSQL.  However,
I think that we cannot afford to have a repeat of the simplehash thing
where serious bugs sat around for months without really getting
properly addressed.  If that happens with this, I will be extremely
unhappy, and I'm fairly sure I won't be alone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-10 Thread Bruce Momjian

On Fri, Mar 10, 2017 at 07:15:59PM -0500, Peter Eisentraut wrote:
> On 3/10/17 14:40, Andres Freund wrote:
> > I'd really like to get it in.  The performance improvements on its own
> > are significant, and it provides the basis for significantly larger
> > improvements again (JIT) - those followup improvements are large patches
> > again though, so I'd rather not do all of that next cycle.
> > 
> > My next step (over the weekend) is to add tests to execQual.c to get it
> > a good chunk closer to 100% test coverage, and then do the same for the
> > new implementation (there's probably very little additional tests needed
> > after the conversion).  Given all tests pass before/after, and there's a
> > lot of them, I think we can have a reasonable confidence of a low bug
> > density.
> 
> That seems like a plan, but now would probably be a good time for some
> other hackers to take note of this proposal.

Well, the executor is long overdue for improvement, so I would love to
have this in 10.0.  I am not sure what additional polishing will happen
if we punt it for 11.0.  I think the only downside of having it in 10.0
is that it will not have lived in the source tree for as long a time
between commit and PG release, but if it is tested, that seems fine.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 3/10/17 14:40, Andres Freund wrote:
> I'd really like to get it in.  The performance improvements on its own
> are significant, and it provides the basis for significantly larger
> improvements again (JIT) - those followup improvements are large patches
> again though, so I'd rather not do all of that next cycle.
> 
> My next step (over the weekend) is to add tests to execQual.c to get it
> a good chunk closer to 100% test coverage, and then do the same for the
> new implementation (there's probably very little additional tests needed
> after the conversion).  Given all tests pass before/after, and there's a
> lot of them, I think we can have a reasonable confidence of a low bug
> density.

That seems like a plan, but now would probably be a good time for some
other hackers to take note of this proposal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

On 3/10/17 14:24, Andres Freund wrote:
> What problem are you thinking of exactly, and what'd be the solution?
> I'd guess that people wouldn't like being unable to change columns in a
> table if some function returned the type.

Well, that's pretty much the problem.

For example:

create type t1 as (a int, b int);
create function f1() returns setof t1 language plpgsql
as $$ begin return query values (1, 2); end $$;
alter type t1 drop attribute b;
select * from f1();  -- fail

The dependency system should arguably guard against that.  Yes, it would
make some things more cumbersome, but, well, it already does that.

>> Not necessarily in this patch, but I would like to keep the options
>> open.
> 
> Yea, seems worth thinking about.  I don't think the patch closes down
> options?

nope

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-10 Thread Andres Freund

On 2017-03-10 09:00:14 -0500, Peter Eisentraut wrote:
> I have also looked at the 0002 and 0003 patches, and they seem OK, but
> they are obviously not of much use without 0004.  What is your ambition
> for getting 0004 reviewed and committed for PG10?

I'd really like to get it in.  The performance improvements on its own
are significant, and it provides the basis for significantly larger
improvements again (JIT) - those followup improvements are large patches
again though, so I'd rather not do all of that next cycle.

My next step (over the weekend) is to add tests to execQual.c to get it
a good chunk closer to 100% test coverage, and then do the same for the
new implementation (there's probably very little additional tests needed
after the conversion).  Given all tests pass before/after, and there's a
lot of them, I think we can have a reasonable confidence of a low bug
density.

Large parts of the changes are fairly mechanical, the interesting bits
probably are:

- there's a lot fewer "full blown" node types for expression evaluation
  anymore. In most cases there's now only a top-level ExprState node
  that's nodeTag() able.  The actual information to execute an
  instruction now is in ExprState->steps[i]
- because of the previous point, it now isn't legal anymore to call
  ExecInitExpr() on a list of expressions - ExecInitExprList() is a
  convenience wrapper around manually iterating over the list (gets rid
  of ugly casts, too)
- Because they behave differently on an actual expression evaluation
  basis, quals / check constraint, now have to be initialized with
  ExecInitQual/ExecInitCheck(). That way the shortcut behaviour and such
  can be retained, while also being faster.
- PlanState->targetlist is gone. Because expressions aren't initialized
  on their own anymore, but as part of ExecProject, there's no need for it.
- The seperation between ExecInitExprRec() and
  ExecEvalExpr/ExecInterpExpr() is different - we try to do as much as
  possible during initialization.
- I retained some ugly hackering around
  innermost_caseval|null/domainval|null - I had started down a path of
  removing the use of those in random parts of the system, but I ended
  up not going for it.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-10 Thread Andres Freund

On 2017-03-10 08:51:25 -0500, Peter Eisentraut wrote:
> On 3/7/17 19:14, Andres Freund wrote:
> >> Why shouldn't the function itself also depend on the components of its
> >> return type?
> > Because that'd make it impossible to change the return type components -
> > if the return type is baked into the view that's necessary, but for a
> > "freestanding function" it's not.  If you e.g. have a function that just
> > returns a table's rows, it'd certainly be annoying if that'd prevent you
> > from dropping columns.
> 
> I think functions breaking when the return type is fiddled with is
> actually a not-uncommon problem in practice.  It would be nice if that
> could be addressed.

What problem are you thinking of exactly, and what'd be the solution?
I'd guess that people wouldn't like being unable to change columns in a
table if some function returned the type.

One rather extreme way would be to have functions return a "different"
type, that's initially identical to the table/composite type.  If the
ocmposite type then changes, we'd transparently map between the actual
return type, and the one callers expect.   But that'd a fair bit of
effort, and presumably also quite confusing for users that might
e.g. expect a new column to show up.

> Not necessarily in this patch, but I would like to keep the options
> open.

Yea, seems worth thinking about.  I don't think the patch closes down
options?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4

I have also looked at the 0002 and 0003 patches, and they seem OK, but
they are obviously not of much use without 0004.  What is your ambition
for getting 0004 reviewed and committed for PG10?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Faster Expression Processing v4