Hello,

At Thu, 01 Sep 2016 16:12:31 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20160901.161231.110068639.horiguchi.kyot...@lab.ntt.co.jp>
> There's perfomance degradation for non-asynchronous nodes, as
> shown as 't0' below.
> 
> The patch adds two "if-then" and one additional function call as
> asynchronous stuff into ExecProcnode, which is heavily passed and
> foremerly consists only five meaningful lines. The stuff slows
> performance by about 1% for simple seqscan case. The following is
> the performance numbers previously shown upthread.  (Or the
> difference might be too small to get meaningful performance
> difference..)

I tried __builtin_expect before moving the stuff out of
execProcNode. (attached patch) I found a conversation about the
pragma in past discussion.

https://www.postgresql.org/message-id/CA+TgmoYknejCgWMb8Tg63qA67JoUG2uCc0DZc5mm9=e18am...@mail.gmail.com

> If we can show cases where it reliably produces a significant
> speedup, then I would think it would be worthwhile

I got a result as the followings.

master(67e1e2a)-O2
      time(ms)  stddev(ms)
  t0: 3928.22 (  0.40)   # Simple SeqScan only
  pl: 1665.14 (  0.53)   # Append(SeqScan)

Patched-O2 / NOT Use __builtin_expect
  t0: 4042.69 (  0.92)    degradation to master is 2.9%
  pl: 1698.46 (  0.44)    degradation to master is 2.0%

Patched-O2 / Use __builtin_expect
  t0: 3886.69 (  1.93)    *gain* to master is 1.06%
  pl: 1671.66 (  0.67)    degradation to master is 0.39%

I haven't directly seen the pragmra's implication for
optimization on surrounding code but I suspect there's some
implication. I also tried the pragma to ExecAppend but no
difference seen. The numbers flucture easily by any changes in
the machine's state so the lower digits aren't trustworthy but
several succeeding repetitions showed fluctuations up to some
milliseconds.

execProcNode will be allowed to be as it is if __builtin_expect
is usable but ExecAppend still needs an improvement.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f1f02557f7f4d694f0e3b4d62a6bdfad8e746b03 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Mon, 12 Sep 2016 16:36:37 +0900
Subject: [PATCH] Use __builtin_expect to optimize branches

__builtin_expect can minimize the cost of failure of branch prediction
for certain cases. It seems to work very fine here.
---
 src/backend/executor/execProcnode.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index cef262b..c247fa3 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -585,13 +585,22 @@ ExecExecuteNode(PlanState *node)
  *      execution subtree and every subtree should have an individual context.
  *      ----------------------------------------------------------------
  */
+#define MY_USE_LIKELY
+#if defined MY_USE_LIKELY
+#define my_likely(x) __builtin_expect((x),1)
+#define my_unlikely(x) __builtin_expect((x),0)
+#else
+#define my_likely(x) (x)
+#define my_unlikely(x) (x)
+#endif
+
 TupleTableSlot *
 ExecProcNode(PlanState *node)
 {
 	CHECK_FOR_INTERRUPTS();
 
 	/* Return unconsumed result if any */
-	if (node->result_ready)
+	if (my_unlikely(node->result_ready))
 		return ExecConsumeResult(node);
 
 	if (node->chgParam != NULL) /* something changed */
@@ -599,7 +608,7 @@ ExecProcNode(PlanState *node)
 
 	ExecDispatchNode(node);
 
-	if (!node->result_ready)
+	if (my_unlikely(!node->result_ready))
 		ExecAsyncWaitForNode(node);
 
 	return ExecConsumeResult(node);
-- 
2.9.2

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to