[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-23 Thread Heikki Linnakangas
For the record, on MSVC we can use __assume(0) to mark unreachable code. 
It does the same as gcc's __builtin_unreachable(). I tested it with the 
same Pavel's palloc-heavy test case that you used earlier, with the 
one-shot plan commit temporarily reverted, and saw a similar speedup you 
reported with gcc's __builtin_unreachable(). So, committed that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-21 Thread Steve Singer

On 13-01-21 02:28 AM, Andres Freund wrote:


I haven't removed it from the patch afaik, so it would be great to get a
profile here! Its only for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...


Here is the output from tprof

Here is the baseline:

Average time 37862

Total Ticks For All Processes (./postgres_perftest) = 19089



Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  2270   0.99 aset.c 3bfb84b0
.base_yyparse   1217   0.53 gram.c fe644  168ec
.text   1015   0.44 copyfuncs.c 11bfb4   4430
.MemoryContextAllocZero  535   0.23 mcxt.c 3b990 f0
.MemoryContextAlloc  533   0.23 mcxt.c 3b780 e0
.check_stack_depth   392   0.17 postgres.c 568ac100
.core_yylex  385   0.17 gram.c fb5b4   1c90
.expression_tree_walker  347   0.15 nodeFuncs.c 50da4750
.AllocSetFree308   0.13 aset.c 3c9981b0
.grouping_planner242   0.11 planner.c 2399f0   1d10
.SearchCatCache  198   0.09 catcache.c 7ec20550
._SPI_execute_plan   195   0.09 spi.c 2f0c187c0
.pfree   195   0.09 mcxt.c 3b3b0 70
query_dependencies_walker185   0.08 setrefs.c 2559441b0
.GetSnapshotData 183   0.08 procarray.c 69efc460
.query_tree_walker   176   0.08 nodeFuncs.c 50ae4210
.strncpy 168   0.07 strncpy.s ba080130
.fmgr_info_cxt_security  166   0.07 fmgr.c 3f7b0850
.transformStmt   159   0.07 analyze.c 29091c   12d0
.text141   0.06 parse_collate.c 28ddf8  1
.ExecInitExpr137   0.06 execQual.c 17f18c   15f0
.fix_expr_common 132   0.06 setrefs.c 2557e4160
.standard_ExecutorStart  127   0.06 execMain.c 1d9a00940
.GetCachedPlan   125   0.05 plancache.c ce664310
.strcpy  121   0.05 noname 3bd401a8


With your changes (same test as I described before)

Average: 37938


Total Ticks For All Processes (./postgres_perftest) = 19311

Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  2162   2.17 aset.c 3bfb84b0
.base_yyparse   1242   1.25 gram.c fdc7c  167f0
.text   1028   1.03 copyfuncs.c 11b4d0   4210
.palloc  553   0.56 mcxt.c 3b4c8 d0
.MemoryContextAllocZero  509   0.51 mcxt.c 3b9e8 f0
.core_yylex  413   0.41 gram.c fac2c   1c60
.check_stack_depth   404   0.41 postgres.c 56730100
.expression_tree_walker  320   0.32 nodeFuncs.c 50d28750
.AllocSetFree261   0.26 aset.c 3c9981b0
._SPI_execute_plan   232   0.23 spi.c 2ee6247c0
.GetSnapshotData 221   0.22 procarray.c 69d54460
.grouping_planner211   0.21 planner.c 237b60   1cf0
.MemoryContextAlloc  190   0.19 mcxt.c 3b738 e0
.query_tree_walker   184   0.18 nodeFuncs.c 50a68210
.SearchCatCache  182   0.18 catcache.c 7ea08550
.transformStmt   181   0.18 analyze.c 28e774   12d0
query_dependencies_walker180   0.18 setrefs.c 25397c1b0
.strncpy 175   0.18 strncpy.s b9a60130
.MemoryContextCreate 167   0.17 mcxt.c 3bad8160
.pfree   151   0.15 mcxt.c 3b208 70
.strcpy  150   0.15 noname 3bd401a8
.fmgr_info_cxt_security  146   0.15 fmgr.c 3f790850
.text132   0.13 parse_collate.c 28bc50  1
.ExecInitExpr125   0.13 execQual.c 17de28   15e0
.expression_tree_mutator 124   0.12 nodeFuncs.c 53268   1080
.strcmp  122   0.12 noname 1e44158
.fix_expr_common 117   0.12 setrefs.c 25381c160





Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-21 Thread Andres Freund
On 2013-01-21 11:59:18 -0500, Steve Singer wrote:
 On 13-01-21 02:28 AM, Andres Freund wrote:
 
 I haven't removed it from the patch afaik, so it would be great to get a
 profile here! Its only for xlogdump, but that tool helped me immensely
 and I don't want to maintain it independently...

 Here is the output from tprof

 Here is the baseline:

Any chance to do the test ontop of HEAD? The elog stuff has gone in only
afterwards and might actually effect this enough to be relevant.

Otherwise I have to say I am a bit confused by the mighty difference in
costs attributed to AllocSetAlloc given the amount of calls should be
exactly the same and the number of trampoline function calls should
also stay the same.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-21 Thread Steve Singer

On 13-01-21 12:15 PM, Andres Freund wrote:

On 2013-01-21 11:59:18 -0500, Steve Singer wrote:

On 13-01-21 02:28 AM, Andres Freund wrote:

I haven't removed it from the patch afaik, so it would be great to get a
profile here! Its only for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...

Here is the output from tprof

Here is the baseline:

Any chance to do the test ontop of HEAD? The elog stuff has gone in only
afterwards and might actually effect this enough to be relevant.


HEAD(:fb197290)
Average: 28877

.AllocSetAlloc  1442   1.90 aset.c 3a9384b0
.base_yyparse   1220   1.61 gram.c fdbc0  168ec
.MemoryContextAlloc  485   0.64 mcxt.c 3a154 e0
.core_yylex  407   0.54 gram.c fab30   1c90
.AllocSetFree320   0.42 aset.c 3b3181b0
.MemoryContextAllocZero  316   0.42 mcxt.c 3a364 f0
.grouping_planner271   0.36 planner.c 2b0ce8   1d10
.strncpy 256   0.34 strncpy.s b8ca0130
.expression_tree_walker  222   0.29 nodeFuncs.c 4f734750
._SPI_execute_plan   221   0.29 spi.c 2fb230840
.SearchCatCache  182   0.24 catcache.c 7d870550
.GetSnapshotData 161   0.21 procarray.c 68acc460
.fmgr_info_cxt_security  155   0.20 fmgr.c 3e130850
.pfree   152   0.20 mcxt.c 39d84 70
.expression_tree_mutator 151   0.20 nodeFuncs.c 51c84   1170
.check_stack_depth   142   0.19 postgres.c 5523c100
.text126   0.17 parse_collate.c 233d90  1
.transformStmt   125   0.16 analyze.c 289e88   12c0
.ScanKeywordLookup   123   0.16 kwlookup.c f7ab4154
.new_list120   0.16 list.c 40f74 b0
.subquery_planner115   0.15 planner.c 2b29f8dc0
.GetCachedPlan   115   0.15 plancache.c cdab0310
.ExecInitExpr114   0.15 execQual.c 17e690   15f0
.set_plan_refs   113   0.15 setrefs.c 252630cb0
.standard_ExecutorStart  110   0.14 execMain.c 1d9264940
.heap_form_tuple 107   0.14 heaptuple.c 177c842f0
.query_tree_walker   105   0.14 nodeFuncs.c 4f474210



HEAD + patch:
Average: 29035

Total Ticks For All Processes (./postgres_perftest) = 15044

Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  1422   1.64 aset.c 3a9384b0
.base_yyparse   1201   1.38 gram.c fd1e8  167f0
.palloc  470   0.54 mcxt.c 39e04 d0
.core_yylex  364   0.42 gram.c fa198   1c60
.MemoryContextAllocZero  282   0.33 mcxt.c 3a324 f0
._SPI_execute_plan   250   0.29 spi.c 2f8c18840
.AllocSetFree244   0.28 aset.c 3b3181b0
.strncpy 234   0.27 strncpy.s b86a0130
.expression_tree_walker  229   0.26 nodeFuncs.c 4f698750
.grouping_planner176   0.20 planner.c 2aea84   1d30
.SearchCatCache  170   0.20 catcache.c 7d638550
.GetSnapshotData 168   0.19 procarray.c 68904460
.assign_collations_walker155   0.18 parse_collate.c 231f4c7e0
.subquery_planner141   0.16 planner.c 2b07b4dc0
.fmgr_info_cxt_security  141   0.16 fmgr.c 3e110850
.check_stack_depth   140   0.16 postgres.c 550a0100
.ExecInitExpr138   0.16 execQual.c 17d2f8   15e0
.pfree   137   0.16 mcxt.c 39b44 70
.transformStmt   132   0.15 analyze.c 287dec   12c0
.new_list128   0.15 list.c 40f44 90
.expression_tree_mutator 125   0.14 nodeFuncs.c 51bd8   1080
.preprocess_expression   118   0.14 planner.c 2adf541a0
.strcmp  118   0.14 noname 1e44158
.standard_ExecutorStart  115   0.13 execMain.c 1d77c0940
.MemoryContextAlloc  111   0.13 mcxt.c 3a074 e0
.set_plan_refs   109   0.13 setrefs.c 2506c4ca0





Otherwise I have to say I am a bit confused by the mighty difference in
costs attributed to AllocSetAlloc given the amount of calls should be
exactly the same and the number of trampoline function calls should
also stay the same.
Greetings,

Andres Freund

--
  Andres Freundhttp://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-20 Thread Andres Freund
On 2013-01-19 17:33:05 -0500, Steve Singer wrote:
 On 13-01-09 03:07 PM, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Well, I *did* benchmark it as noted elsewhere in the thread, but thats
 obviously just machine (E5520 x 2) with one rather restricted workload
 (pgbench -S -jc 40 -T60). At least its rather palloc heavy.
 Here are the numbers:
 before:
 #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
 101449.857665
 after:
 #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
 101673.400992
 So on my system if there is a difference, its positive (0.12%).
 pgbench-based testing doesn't fill me with a lot of confidence for this
 --- its numbers contain a lot of communication overhead, not to mention
 that pgbench itself can be a bottleneck.  It struck me that we have a
 recent test case that's known to be really palloc-intensive, namely
 Pavel's example here:
 http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com
 
 I set up a non-cassert build of commit
 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
 reduced the data-copying overhead for that).  On my Fedora 16 machine
 (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
 I get a runtime for Pavel's example of 17023 msec (average over five
 runs).  I then applied oprofile and got a breakdown like this:
...
 
 The number of calls of AllocSetAlloc certainly hasn't changed at all, so
 how did that get faster?
 
 I notice that the postgres executable is about 0.2% smaller, presumably
 because a whole lot of inlined fetches of CurrentMemoryContext are gone.
 This makes me wonder if my result is due to chance improvements of cache
 line alignment for inner loops.
 
 I would like to know if other people get comparable results on other
 hardware (non-Intel hardware would be especially interesting).  If this
 result holds up across a range of platforms, I'll withdraw my objection
 to making palloc a plain function.
 
  regards, tom lane
 

 Sorry for the delay I only read this thread today.


 I just tried Pawel's test on a POWER5 machine with an older version of gcc
 (see the grebe buildfarm animal for details)

 78a5e738e:   37874.855 (average of 6 runs)
 78a5e738 + palloc.h + mcxt.c: 38076.8035

 The functions do seem to slightly slow things down on POWER. I haven't
 bothered to run oprofile or tprof to get a breakdown of the functions
 since Andres has already removed this from his patch.

I haven't removed it from the patch afaik, so it would be great to get a
profile here! Its only for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-19 Thread Steve Singer

On 13-01-09 03:07 PM, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

Well, I *did* benchmark it as noted elsewhere in the thread, but thats
obviously just machine (E5520 x 2) with one rather restricted workload
(pgbench -S -jc 40 -T60). At least its rather palloc heavy.
Here are the numbers:
before:
#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
101449.857665
after:
#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
101673.400992
So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this
--- its numbers contain a lot of communication overhead, not to mention
that pgbench itself can be a bottleneck.  It struck me that we have a
recent test case that's known to be really palloc-intensive, namely
Pavel's example here:
http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit
78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
reduced the data-copying overhead for that).  On my Fedora 16 machine
(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
I get a runtime for Pavel's example of 17023 msec (average over five
runs).  I then applied oprofile and got a breakdown like this:

   samples|  %|
--
108409 84.5083 /home/tgl/testversion/bin/postgres
 13723 10.6975 /lib64/libc-2.14.90.so
  3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
1096010.1495  AllocSetAlloc
6325  5.8572  MemoryContextAllocZeroAligned
6225  5.7646  base_yyparse
3765  3.4866  copyObject
2511  2.3253  MemoryContextAlloc
2292  2.1225  grouping_planner
2044  1.8928  SearchCatCache
1956  1.8113  core_yylex
1763  1.6326  expression_tree_walker
1347  1.2474  MemoryContextCreate
1340  1.2409  check_stack_depth
1276  1.1816  GetCachedPlan
1175  1.0881  AllocSetFree
1106  1.0242  GetSnapshotData
1106  1.0242  _SPI_execute_plan
1101  1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
Now I get an average runtime of 1 ms, a full 2% faster, which is a
bit astonishing, particularly because the oprofile results haven't moved
much:

107642 83.7427 /home/tgl/testversion/bin/postgres
 14677 11.4183 /lib64/libc-2.14.90.so
  3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
10038 9.3537  AllocSetAlloc
6392  5.9562  MemoryContextAllocZeroAligned
5763  5.3701  base_yyparse
4810  4.4821  copyObject
2268  2.1134  grouping_planner
2178  2.0295  core_yylex
1963  1.8292  palloc
1867  1.7397  SearchCatCache
1835  1.7099  expression_tree_walker
1551  1.4453  check_stack_depth
1374  1.2803  _SPI_execute_plan
1282  1.1946  MemoryContextCreate
1187  1.1061  AllocSetFree
...
653   0.6085  palloc0
...
552   0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so
how did that get faster?

I notice that the postgres executable is about 0.2% smaller, presumably
because a whole lot of inlined fetches of CurrentMemoryContext are gone.
This makes me wonder if my result is due to chance improvements of cache
line alignment for inner loops.

I would like to know if other people get comparable results on other
hardware (non-Intel hardware would be especially interesting).  If this
result holds up across a range of platforms, I'll withdraw my objection
to making palloc a plain function.

regards, tom lane



Sorry for the delay I only read this thread today.


I just tried Pawel's test on a POWER5 machine with an older version of 
gcc (see the grebe buildfarm animal for details)


78a5e738e:   37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't bothered 
to run oprofile or tprof to get a breakdown of the functions since Andres has 
already removed this from his patch.

Steve




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-18 Thread Andres Freund
On 2013-01-09 15:07:10 -0500, Tom Lane wrote:
 I would like to know if other people get comparable results on other
 hardware (non-Intel hardware would be especially interesting).  If this
 result holds up across a range of platforms, I'll withdraw my objection
 to making palloc a plain function.

Unfortunately nobody spoke up and I don't have any non-intel hardware
anymore. Well, aside from my phone that is...

Updated patch attached, the previous version missed some contrib modules.

I like the diffstat:
 41 files changed, 224 insertions(+), 636 deletions(-)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From e4960b1659c8bc33661ff07b2ba3cccef653c8fc Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 9 Jan 2013 12:05:37 +0100
Subject: [PATCH] Provide a common malloc wrappers and palloc et al. emulation
 for frontend'ish environs

---
 contrib/oid2name/oid2name.c|  52 +
 contrib/pg_upgrade/pg_upgrade.h|   5 +-
 contrib/pg_upgrade/util.c  |  49 -
 contrib/pgbench/pgbench.c  |  54 +-
 src/backend/storage/file/copydir.c |  12 +--
 src/backend/utils/mmgr/mcxt.c  |  78 +++-
 src/bin/initdb/initdb.c|  40 +-
 src/bin/pg_basebackup/pg_basebackup.c  |   2 +-
 src/bin/pg_basebackup/pg_receivexlog.c |   1 +
 src/bin/pg_basebackup/receivelog.c |   1 +
 src/bin/pg_basebackup/streamutil.c |  38 +-
 src/bin/pg_basebackup/streamutil.h |   4 -
 src/bin/pg_ctl/pg_ctl.c|  39 +-
 src/bin/pg_dump/Makefile   |   6 +-
 src/bin/pg_dump/common.c   |   1 -
 src/bin/pg_dump/compress_io.c  |   1 -
 src/bin/pg_dump/dumpmem.c  |  76 ---
 src/bin/pg_dump/dumpmem.h  |  22 --
 src/bin/pg_dump/dumputils.c|   1 -
 src/bin/pg_dump/dumputils.h|   1 +
 src/bin/pg_dump/pg_backup_archiver.c   |   1 -
 src/bin/pg_dump/pg_backup_custom.c |   2 +-
 src/bin/pg_dump/pg_backup_db.c |   1 -
 src/bin/pg_dump/pg_backup_directory.c  |   1 -
 src/bin/pg_dump/pg_backup_null.c   |   1 -
 src/bin/pg_dump/pg_backup_tar.c|   1 -
 src/bin/pg_dump/pg_dump.c  |   1 -
 src/bin/pg_dump/pg_dump_sort.c |   1 -
 src/bin/pg_dump/pg_dumpall.c   |   1 -
 src/bin/pg_dump/pg_restore.c   |   1 -
 src/bin/pg_resetxlog/pg_resetxlog.c|   5 +-
 src/bin/psql/common.c  |  50 -
 src/bin/psql/common.h  |  10 +--
 src/bin/scripts/common.c   |  49 -
 src/bin/scripts/common.h   |   5 +-
 src/include/port/palloc.h  |  19 +
 src/include/utils/palloc.h |  12 +--
 src/port/Makefile  |   8 +-
 src/port/dirmod.c  |  75 +--
 src/port/palloc.c  | 129 +
 src/tools/msvc/Mkvcbuild.pm|   4 +-
 41 files changed, 224 insertions(+), 636 deletions(-)
 delete mode 100644 src/bin/pg_dump/dumpmem.c
 delete mode 100644 src/bin/pg_dump/dumpmem.h
 create mode 100644 src/include/port/palloc.h
 create mode 100644 src/port/palloc.c

diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c
index a666731..dfd8105 100644
--- a/contrib/oid2name/oid2name.c
+++ b/contrib/oid2name/oid2name.c
@@ -9,6 +9,8 @@
  */
 #include postgres_fe.h
 
+#include port/palloc.h
+
 #include unistd.h
 #ifdef HAVE_GETOPT_H
 #include getopt.h
@@ -50,9 +52,6 @@ struct options
 /* function prototypes */
 static void help(const char *progname);
 void		get_opts(int, char **, struct options *);
-void	   *pg_malloc(size_t size);
-void	   *pg_realloc(void *ptr, size_t size);
-char	   *pg_strdup(const char *str);
 void		add_one_elt(char *eltname, eary *eary);
 char	   *get_comma_elts(eary *eary);
 PGconn	   *sql_conn(struct options *);
@@ -201,53 +200,6 @@ help(const char *progname)
 		   progname, progname);
 }
 
-void *
-pg_malloc(size_t size)
-{
-	void	   *ptr;
-
-	/* Avoid unportable behavior of malloc(0) */
-	if (size == 0)
-		size = 1;
-	ptr = malloc(size);
-	if (!ptr)
-	{
-		fprintf(stderr, out of memory\n);
-		exit(1);
-	}
-	return ptr;
-}
-
-void *
-pg_realloc(void *ptr, size_t size)
-{
-	void	   *result;
-
-	/* Avoid unportable behavior of realloc(NULL, 0) */
-	if (ptr == NULL  size == 0)
-		size = 1;
-	result = realloc(ptr, size);
-	if (!result)
-	{
-		fprintf(stderr, out of memory\n);
-		exit(1);
-	}
-	return result;
-}
-
-char *
-pg_strdup(const char *str)
-{
-	char	   *result = strdup(str);
-
-	if (!result)
-	{
-		fprintf(stderr, out of memory\n);
-		exit(1);
-	}
-	return result;
-}
-
 /*
  * add_one_elt
  *
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
index d5c3fa9..a917b5b 100644
--- 

[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Andres Freund
On 2013-01-12 15:39:16 -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 12.01.2013 20:42, Andres Freund wrote:
  I don't care for that too much in detail -- if errstart were to return
  false (it shouldn't, but if it did) this would be utterly broken,
 
  With the current ereport(), we'll call abort() if errstart returns false 
  and elevel = ERROR. That's even worse than making an error reporting 
  calls that elog.c isn't prepared for.
 
 No it isn't: you'd get a clean and easily interpretable abort.  I am not
 sure exactly what would happen if we plow forward with calling elog.c
 functions after errstart returns false, but it wouldn't be good, and
 debugging a field report of such behavior wouldn't be easy.
 
 This is actually a disadvantage of the proposal to replace the abort()
 calls with __builtin_unreachable(), too.  The gcc boys interpret the
 semantics of that as if control reaches here, the behavior is
 undefined --- and their notion of undefined generally seems to amount
 to we will screw you as hard as we can.  For example, they have no
 problem with using the assumption that control won't reach there to make
 deductions while optimizing the rest of the function.  If it ever did
 happen, I would not want to have to debug it.  The same goes for
 __attribute__((noreturn)), BTW.


We could make add an Assert() additionally?
 
  Yea, I didn't really like it all that much either - but anyway, I have
  yet to find *any* version with a local variable that doesn't lead to
  200kb size increase :(.
 
  Hmm, that's strange. Assuming the optimizer optimizes away the paths in 
  the macro that are never taken when elevel is a constant, I would expect 
  the resulting binary to be smaller than what we have now.
 
 I was wondering about that too, but haven't tried it locally yet.  It
 could be that Andres was looking at an optimization level in which a
 register still got allocated for the local variable.

Nope, it was O2, albeit with -fno-omit-frame-pointer (for usable
hierarchical profiles).

  Here's what I got with and without my patch on my laptop:
 
  -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch
  -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched
 
  But when I build without --enable-debug, the situation reverses:
 
  -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch
  -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched
 
 Yes, I noticed that too: these patches make the debug overhead grow
 substantially for some reason.  It's better to look at the output of
 size rather than the executable's total file size.  That way you don't
 need to rebuild without debug to see reality.  (I guess you could also

But the above is spot on. While I used strip earlier I somehow forgot it
here and so the debug overhead is part of what I saw. But, in comparison
to my baseline (which is replacing the abort() with
pg_unreachable()/__builtin_unreachable()) its still a loss
performancewise which is why I didn't doubt my results...

Pavel's testcase with changing ereport implementations (5 runs, minimal
time):
EREPORT_EXPR_OLD_NO_UNREACHABLE: 9964.015ms,5530296 bytes (stripped)
EREPORT_EXPR_OLD_ABORT:  9765.916ms,5466936 bytes (stripped)
EREPORT_EXPR_OLD_UNREACHABLE:9646.502ms,5462456 bytes (stripped)
EREPORT_STMT_HEIKKI: 10133.968ms,   5435704 bytes (stripped)
EREPORT_STMT_ANDRES: 9548.436ms,5462232 bytes (stripped)

Where my variant is:
#define ereport_domain(elevel, domain, rest)\
do {\
const int elevel_ = elevel; \
if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\
{   \
errfinish rest; \
}   \
if (elevel_= ERROR)\
pg_unreachable();   \
} while(0)

So I suggest using that with an additional Assert(0) in the if(elevel_)
branch.

The assembler generated for my version is exactly the same when elevel
is constant in comparison with Heikki's but doesn't generate any
additional code in the case its not constant and I guess thats where it
gets faster?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Andres Freund
On 2013-01-12 19:47:18 -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  When I compile with gcc -O0, I get one warning with this:
 
  datetime.c: In function �DateTimeParseError�:
  datetime.c:3575:1: warning: �noreturn� function does return [enabled by 
  default]
 
  That suggests that the compiler didn't correctly deduce that 
  ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.
 
 Yeah, I am seeing this too.  It appears to be endemic to the
 local-variable approach, ie if we have
 
   const int elevel_ = (elevel);
   ...
   (elevel_ = ERROR) ? pg_unreachable() : (void) 0
 
 then we do not get the desired results at -O0, which is not terribly
 surprising --- I'd not really expect the compiler to propagate the
 value of elevel_ when not optimizing.
 
 If we don't use a local variable, we don't get the warning, which
 I take to mean that gcc will fold ERROR = ERROR to true even at -O0,
 and that it does this early enough to conclude that unreachability
 holds.
 
 I experimented with some variant ways of phrasing the macro, but the
 only thing that worked at -O0 required __builtin_constant_p, which
 rather defeats the purpose of making this accessible to non-gcc
 compilers.

Yea, its rather sad. The only thing I can see is actually doing both
variants like:

#define ereport_domain(elevel, domain, rest)\
do {\
const int elevel_ = elevel; \
if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\
{   \
errfinish rest; \
}   \
if (pg_is_known_constant(elevel)  (elevel) = ERROR)  \
{   \
pg_unreachable();   \
Assert(0);  \
}   \
else if (elevel_= ERROR)   \
{   \
pg_unreachable();   \
Assert(0);  \
}   \
} while(0)

but that obviously still relies on __builtin_constant_p giving
reasonable answers.

We should probably put that Assert(0) into pg_unreachable()...

 If we go with the local-variable approach, we could probably suppress
 this warning by putting an abort() call at the bottom of
 DateTimeParseError.  It seems a tad ugly though, and what's a bigger
 deal is that if the compiler is unable to deduce unreachability at -O0
 then we are probably going to be fighting such bogus warnings for all
 time to come.  Note also that an abort() (much less a pg_unreachable())
 would not do anything positive like give us a compile warning if we
 mistakenly added a case that could fall through.
 
 On the other hand, if there's only one such case in our tree today,
 maybe I'm worrying too much.

I think the only reason there's only one such case (two here actually,
second in renametrig) is that all others already have been silenced
because the abort() didn't use to be there. And I think the danger youre
alluding to above is a real one.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Andres Freund
On 2013-01-12 16:36:39 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  It does *not* combine elog_start and elog_finish into one function if
  varargs are available although that brings a rather measurable
  size/performance benefit.
 
  Since you've apparently already done the measurement: how much?
  It would be a bit tedious to support two different infrastructures for
  elog(), but if it's a big enough win maybe we should.
 
  Imo its pretty definitely a big enough win. So big I have a hard time
  believing it can be true without negative effects somewhere else.
 
 Well, actually there's a pretty serious negative effect here, which is
 that when it's implemented this way it's impossible to save errno for %m
 before the elog argument list is evaluated.  So I think this is a no-go.
 We've always had the contract that functions in the argument list could
 stomp on errno without care.
 
 If we switch to a do-while macro expansion it'd be possible to do
 something like
 
   do {
   int save_errno = errno;
   int elevel = whatever;
 
   elog_internal( save_errno, elevel, fmt, __VA__ARGS__ );
   } while (0);
 
 but this would almost certainly result in more code bloat not less,
 since call sites would now be responsible for fetching errno.

the numbers are:
old definition: 10393.658ms, 5497912 bytes
old definition + unreachable:   10011.102ms, 5469144 bytes
stmt, two calls, unreachable:   10036.132ms, 5468792 bytes
stmt, one call, unreachable:9443.612ms,  5462232 bytes
stmt, one call, unreachable, save errno:9615.863ms,  5489688 bytes

So while not saving errno is unsurprisingly better its still a win.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 the numbers are:
 old definition: 10393.658ms, 5497912 bytes
 old definition + unreachable:   10011.102ms, 5469144 bytes
 stmt, two calls, unreachable:   10036.132ms, 5468792 bytes
 stmt, one call, unreachable:9443.612ms,  5462232 bytes
 stmt, one call, unreachable, save errno:9615.863ms,  5489688 bytes

I find these numbers pretty hard to credit.  Why should replacing two
calls by one, in code paths that are not being taken, move the runtime
so much?  The argument that a net reduction of code size is a win
doesn't work, because the last case is more code than any except the
first.

I think you're measuring some coincidental effect or other, not a
reproducible performance improvement.  Or there's a bug in the code
you're using.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Andres Freund
On 2013-01-13 14:17:52 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  the numbers are:
  old definition: 10393.658ms, 5497912 bytes
  old definition + unreachable:   10011.102ms, 5469144 bytes
  stmt, two calls, unreachable:   10036.132ms, 5468792 bytes
  stmt, one call, unreachable:9443.612ms,  5462232 bytes
  stmt, one call, unreachable, save errno:9615.863ms,  5489688 bytes
 
 I find these numbers pretty hard to credit.  Why should replacing two
 calls by one, in code paths that are not being taken, move the runtime
 so much?  The argument that a net reduction of code size is a win
 doesn't work, because the last case is more code than any except the
 first.

There are quite some elog(DEBUG*)s in the backend, and those are taken,
so I don't find it unreasonable that doing less work in those cases is
measurable.

And if you look at the disassembly of ERROR codepaths:

saving errno, one function:
Dump of assembler code for function palloc:
639  {
   0x00736397 +7: mov%rdi,%rsi
   0x007363b0 +32:push   %rbp
   0x007363b1 +33:mov%rsp,%rbp
   0x007363b4 +36:sub$0x20,%rsp

640 /* duplicates MemoryContextAlloc to
avoid increased overhead */
641 AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643 if (!AllocSizeIsValid(size))
   0x00736390 +0: cmp$0x3fff,%rdi
   0x0073639a +10:ja 0x7363b0 palloc+32

644elog(ERROR, invalid memory alloc
request size %lu,
   0x007363b8 +40:mov%rdi,-0x8(%rbp)
   0x007363bc +44:callq  0x462010 __errno_location@plt
   0x007363c1 +49:mov-0x8(%rbp),%rsi
   0x007363c5 +53:mov$0x8ace30,%r9d
   0x007363cb +59:mov$0x14,%ecx
   0x007363d0 +64:mov$0x8acefe,%edx
   0x007363d5 +69:mov$0x8ace58,%edi
   0x007363da +74:mov%rsi,(%rsp)
   0x007363de +78:mov(%rax),%r8d
   0x007363e1 +81:mov$0x285,%esi
   0x007363e6 +86:xor%eax,%eax
   0x007363e8 +88:callq  0x71a0b0 elog_all
   0x007363ed:  nopl   (%rax)
645  (unsigned long) size);
646
647 CurrentMemoryContext-isReset = false;
   0x0073639c +12:  mov0x25c6e5(%rip),%rdi
   # 0x992a88 CurrentMemoryContext
   0x007363a7 +23:movb   $0x0,0x30(%rdi)

648
649 return (*CurrentMemoryContext-methods-alloc)
(CurrentMemoryContext, size);
   0x007363a3 +19:mov0x8(%rdi),%rax
   0x007363ab +27:mov(%rax),%rax
   0x007363ae +30:jmpq   *%rax

End of assembler dump.

vs

Dump of assembler code for function palloc:
639  {
   0x007382b0 +0: push   %rbp
   0x007382b1 +1: mov%rsp,%rbp
   0x007382b4 +4: push   %rbx
   0x007382b5 +5: mov%rdi,%rbx
   0x007382b8 +8: sub$0x8,%rsp

640 /* duplicates MemoryContextAlloc to
avoid increased overhead */
641 AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643 if (!AllocSizeIsValid(size))
   0x007382bc +12:cmp$0x3fff,%rdi
   0x007382c3 +19:jbe0x7382ed palloc+61

644elog(ERROR, invalid memory alloc
request size %lu,
   0x007382c5 +21:mov$0x8aec9e,%edx
   0x007382ca +26:mov$0x284,%esi
   0x007382cf +31:mov$0x8aebd0,%edi
   0x007382d4 +36:callq  0x71bcb0 elog_start
   0x007382d9 +41:mov%rbx,%rdx
   0x007382dc +44:mov$0x8aec10,%esi
   0x007382e1 +49:mov$0x14,%edi
   0x007382e6 +54:xor%eax,%eax
   0x007382e8 +56:callq  0x71bac0 elog_finish

645  (unsigned long) size);
646
647 CurrentMemoryContext-isReset = false;
   0x007382ed +61:  mov0x25bc54(%rip),%rdi
   # 0x993f48 CurrentMemoryContext
   0x007382fb +75:movb   $0x0,0x30(%rdi)

648
649 return (*CurrentMemoryContext-methods-alloc)
(CurrentMemoryContext, size);
   0x007382f4 +68:mov%rbx,%rsi
   0x007382f7 +71:mov0x8(%rdi),%rax
   0x007382ff +79:mov(%rax),%rax
   0x00738308 +88:jmpq   *%rax
   0x0073830a:  nopw   0x0(%rax,%rax,1)

650 }
   0x00738302 +82:add$0x8,%rsp
   0x00738306 +86:pop%rbx
   0x00738307 +87:pop%rbp
End of assembler dump.

If other critical paths look like this its not that surprising.

 I think you're measuring some coincidental 

[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-13 Thread Andres Freund
On 2013-01-13 15:44:58 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:

  And if you look at the disassembly of ERROR codepaths:
 
 I think your numbers are being twisted by -fno-omit-frame-pointer.
 What I get, with the __builtin_unreachable version of the macro,
 looks more like

The only difference is that the following two instructions aren't done:
   0x007363b0 +32:push   %rbp
   0x007363b1 +33:mov%rsp,%rbp

Given that that function barely uses registers (from an amd64 pov at
least), thats notreally surprising.

 MemoryContextAlloc:
   cmpq$1073741823, %rsi
   pushq   %rbx
   movq%rsi, %rbx
   ja  .L53
   movq8(%rdi), %rax
   movb$0, 48(%rdi)
   popq%rbx
   movq(%rax), %rax
   jmp *%rax
 .L53:
   movl$__func__.5262, %edx
   movl$576, %esi
   movl$.LC2, %edi
   callelog_start
   movq%rbx, %rdx
   movl$.LC3, %esi
   movl$20, %edi
   xorl%eax, %eax
   callelog_finish
 
 With -fno-omit-frame-pointer it's a little worse, but still not what you
 show --- in particular, for me gcc still pushes the elog calls out of
 the main code path.  I don't think that the main path will get any
 better with one elog function instead of two.  It could easily get worse.
 On many machines, the single-function version would be worse because of
 needing to use more parameter registers, which would translate into more
 save/restore work in the calling function, which is overhead that would
 likely be paid whether elog actually gets called or not.  (As an
 example, in the above code gcc evidently isn't noticing that it doesn't
 need to save/restore rbx so far as the main path is concerned.)
 
 In any case, results from a single micro-benchmark on a single platform
 with a single compiler version (and single set of compiler options)
 don't convince me a whole lot here.

I am not convinced either, I just got curious whether it would be a win
after the __VA_ARGS__ thing made it possible. If the errno problem
didn't exist I would be pretty damn sure its just about always a win,
but the way its now...

We could make elog behave the same as ereport WRT to argument evaluation
but that seems a bit dangerous given it would still be different on some
platforms.

 Basically, the aspects of this that I think are likely to be
 reproducible wins across different platforms are (a) teaching the
 compiler that elog(ERROR) doesn't return, and (b) reducing code size as
 much as possible.  The single-function change isn't going to help on
 either ground --- maybe it would have helped on (b) without the errno
 problem, but that's going to destroy any possible code size savings.

Agreed. I am happy to produce an updated patch unless youre already on
it?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-12 Thread Heikki Linnakangas

On 11.01.2013 23:49, Tom Lane wrote:

Andres Freundand...@2ndquadrant.com  writes:

On 2013-01-11 16:28:13 -0500, Tom Lane wrote:

I'm not very satisfied with that answer.  Right now, Peter's
patch has added a class of bugs that never existed before 9.3, and yours
would add more.  It might well be that those classes are empty ... but
*we can't know that*.  I don't think that keeping a new optimization for
non-gcc compilers is worth that risk.  Postgres is already full of
gcc-only optimizations, anyway.



ISTM that ereport() already has so strange behaviour wrt evaluation of
arguments (i.e doing it only when the message is really logged) that its
doesn't seem to add a real additional risk.


Hm ... well, that's a point.  I may be putting too much weight on the
fact that any such bug for elevel would be a new one that never existed
before.  What do others think?


It sure would be nice to avoid multiple evaluation. At least in xlog.c 
we use emode_for_corrupt_record() to pass the elevel, and it does have a 
side-effect. It's quite harmless in practice, you'll get some extra 
DEBUG1 messages in the log, but still.


Here's one more construct to consider:

#define ereport_domain(elevel, domain, rest) \
do { \
const int elevel_ = elevel; \
		if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || 
elevel_= ERROR) \

{ \
(void) rest; \
if (elevel_= ERROR) \
errfinish_noreturn(1); \
else \
errfinish(1); \
} \
} while(0)

extern void errfinish(int dummy,...);
extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn));

With do-while, ereport() would no longer be an expression. There only 
place where that's a problem in our codebase is in scan.l, bootscanner.l 
and repl_scanner.l, which do this:


#undef fprintf
#define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal(%s, 
msg)))


and flex does this:

(void) fprintf( stderr, %s\n, msg );

but that's easy to work around by creating a wrapper fprintf function in 
those .l files.


When I compile with gcc -O0, I get one warning with this:

datetime.c: In function ‘DateTimeParseError’:
datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by 
default]


That suggests that the compiler didn't correctly deduce that 
ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-12 Thread Andres Freund
On 2013-01-12 12:26:37 +0200, Heikki Linnakangas wrote:
 On 11.01.2013 23:49, Tom Lane wrote:
 Andres Freundand...@2ndquadrant.com  writes:
 On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
 I'm not very satisfied with that answer.  Right now, Peter's
 patch has added a class of bugs that never existed before 9.3, and yours
 would add more.  It might well be that those classes are empty ... but
 *we can't know that*.  I don't think that keeping a new optimization for
 non-gcc compilers is worth that risk.  Postgres is already full of
 gcc-only optimizations, anyway.
 
 ISTM that ereport() already has so strange behaviour wrt evaluation of
 arguments (i.e doing it only when the message is really logged) that its
 doesn't seem to add a real additional risk.
 
 Hm ... well, that's a point.  I may be putting too much weight on the
 fact that any such bug for elevel would be a new one that never existed
 before.  What do others think?
 
 It sure would be nice to avoid multiple evaluation. At least in xlog.c we
 use emode_for_corrupt_record() to pass the elevel, and it does have a
 side-effect. It's quite harmless in practice, you'll get some extra DEBUG1
 messages in the log, but still.
 
 Here's one more construct to consider:
 
 #define ereport_domain(elevel, domain, rest) \
   do { \
   const int elevel_ = elevel; \
   if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, 
 domain) ||
 elevel_= ERROR) \
   { \
   (void) rest; \
   if (elevel_= ERROR) \
   errfinish_noreturn(1); \
   else \
   errfinish(1); \
   } \
   } while(0)
 
 extern void errfinish(int dummy,...);
 extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn));

I am afraid that that would bloat the code again as it would have to put
that if() into each callsite whereas a variant that ends up using
__builtin_unreachable() doesn't. So I think we should use
__builtin_unreachable() while falling back to abort() as before. But
that's really more a detail than anything fundamental about the approach.

I'll play a bit.

 With do-while, ereport() would no longer be an expression. There only place
 where that's a problem in our codebase is in scan.l, bootscanner.l and
 repl_scanner.l, which do this:

I aggree that would easy to fix, so no problem there.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 It does *not* combine elog_start and elog_finish into one function if
 varargs are available although that brings a rather measurable
 size/performance benefit.

 Since you've apparently already done the measurement: how much?
 It would be a bit tedious to support two different infrastructures for
 elog(), but if it's a big enough win maybe we should.

 Imo its pretty definitely a big enough win. So big I have a hard time
 believing it can be true without negative effects somewhere else.

Well, actually there's a pretty serious negative effect here, which is
that when it's implemented this way it's impossible to save errno for %m
before the elog argument list is evaluated.  So I think this is a no-go.
We've always had the contract that functions in the argument list could
stomp on errno without care.

If we switch to a do-while macro expansion it'd be possible to do
something like

do {
int save_errno = errno;
int elevel = whatever;

elog_internal( save_errno, elevel, fmt, __VA__ARGS__ );
} while (0);

but this would almost certainly result in more code bloat not less,
since call sites would now be responsible for fetching errno.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-12 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 When I compile with gcc -O0, I get one warning with this:

 datetime.c: In function ‘DateTimeParseError’:
 datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by 
 default]

 That suggests that the compiler didn't correctly deduce that 
 ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.

Yeah, I am seeing this too.  It appears to be endemic to the
local-variable approach, ie if we have

const int elevel_ = (elevel);
...
(elevel_ = ERROR) ? pg_unreachable() : (void) 0

then we do not get the desired results at -O0, which is not terribly
surprising --- I'd not really expect the compiler to propagate the
value of elevel_ when not optimizing.

If we don't use a local variable, we don't get the warning, which
I take to mean that gcc will fold ERROR = ERROR to true even at -O0,
and that it does this early enough to conclude that unreachability
holds.

I experimented with some variant ways of phrasing the macro, but the
only thing that worked at -O0 required __builtin_constant_p, which
rather defeats the purpose of making this accessible to non-gcc
compilers.

If we go with the local-variable approach, we could probably suppress
this warning by putting an abort() call at the bottom of
DateTimeParseError.  It seems a tad ugly though, and what's a bigger
deal is that if the compiler is unable to deduce unreachability at -O0
then we are probably going to be fighting such bogus warnings for all
time to come.  Note also that an abort() (much less a pg_unreachable())
would not do anything positive like give us a compile warning if we
mistakenly added a case that could fall through.

On the other hand, if there's only one such case in our tree today,
maybe I'm worrying too much.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-10 10:55:20 +0100, Andres Freund wrote:
 On 2013-01-10 10:31:04 +0100, Andres Freund wrote:
  On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
   On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
(We know this doesn't
matter, but gcc doesn't; this version of gcc apparently isn't doing much
with the knowledge that elog won't return.)
  
   Afaics one reason for that is that we don't have any such annotation for
   elog(), just for ereport. And I don't immediately see how it could be
   added to elog without relying on variadic macros. Bit of a shame, there
   probably a good number of callsites that could actually benefit from
   that knowledge.
   Is it worth making that annotation conditional on variadic macro
   support?
 
  A quick test confirms that. With:
 
  diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
  index cbbda04..39cd809 100644
  --- a/src/include/utils/elog.h
  +++ b/src/include/utils/elog.h
  @@ -212,7 +212,13 @@ extern int getinternalerrposition(void);
* elog(ERROR, portal \%s\ not found, stmt-portalname);
*--
*/
  +#ifdef __GNUC__
  +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, 
  PG_FUNCNAME_MACRO), \
  +   elog_finish(elevel, __VA_ARGS__),   
  \
  +   ((elevel) = ERROR ? __builtin_unreachable() : (void) 0)
  +#else
   #define elog   elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), 
  elog_finish
  +#endif
 
   extern void elog_start(const char *filename, int lineno, const char 
  *funcname);
   extern void
 
  nearly the same code is generated for both variants (this is no
  additionally saved registers and such).
 
  Two unfinished things:
  * the above only checks for gcc although all C99 compilers should
support varargs macros
  * It uses __builtin_unreachable() instead of abort() as that creates
a smaller executable. That's gcc 4.5 only. Saves about 30kb in a
both a stripped and unstripped binary.
  * elog(ERROR) i.e. no args would require more macro ugliness, but that
seems unneccesary
 
  Doing the __builtin_unreachable() for ereport as well saves about 50kb.
 
  Given that some of those elog/ereports are in performance critical code
  paths it seems like a good idea to remove code from the
  callsites. Adding configure check for __VA_ARGS__ and
  __builtin_unreachable() should be simple enough?
 
 For whatever its worth - I am not sure its much - after relying on
 variadic macros we can make elog into one function call instead of
 two. That saves another 100kb.
 
 [tinker]
 
 The builtin_unreachable and combined elog thing bring a measurable
 performance benefit of a rather surprising 7% when running Pavel's
 testcase ontop of yesterdays HEAD. So it seems worth investigating.
 About 3% of that is the __builtin_unreachable() and about 4% the
 combining of elog_start/finish.
 
 Now that testcase sure isn't a very representative one for this, but I
 don't really see why it should benefit much more than other cases. Seems
 to be quite the benefit for relatively minor cost.
 Although I am pretty sure just copying the whole of
 elog_start/elog_finish into one elog_all() isn't the best way to go at
 this...

The attached patch:

* adds configure checks for __VA_ARGS__ and __builtin_unreachable
  support
* provides a pg_unreachable() macro that expands to
  __builtin_unreachable() or abort()
* changes #define elog ... into #define elog(elevel, ...) if varargs are
  available

It does *not* combine elog_start and elog_finish into one function if
varargs are available although that brings a rather measurable
size/performance benefit.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 94a3f93f4c8a381358a4dc52a43ec60a8f3e33f6 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 11 Jan 2013 16:58:17 +0100
Subject: [PATCH] Mark elog() as not returning if the capabilities of the C
 compiler allow it

To do that:
* Add a configure check for __VA_ARGS__ support
* Provide pg_unreachable() macro which expands to__builtin_unreachable() if the
  compiler supports it, abort() otherwise

Marking it as such improves code generation and reduces executable size.
---
 config/c-compiler.m4 | 41 +
 configure.in |  2 ++
 src/include/c.h  | 10 ++
 src/include/utils/elog.h | 18 +++---
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 7cbb8ec..f3efe72 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -161,6 +161,47 @@ fi])# PGAC_C_TYPES_COMPATIBLE
 
 
 
+# PGAC_C_VA_ARGS
+# ---
+# Check if the C compiler understands C99 style variadic macros and define
+# HAVE__VA_ARGS if so.
+AC_DEFUN([PGAC_C_VA_ARGS],
+[AC_CACHE_CHECK(for 

Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The attached patch:

 * adds configure checks for __VA_ARGS__ and __builtin_unreachable
   support
 * provides a pg_unreachable() macro that expands to
   __builtin_unreachable() or abort()
 * changes #define elog ... into #define elog(elevel, ...) if varargs are
   available

Seems like a good thing to do --- will review and commit.

 It does *not* combine elog_start and elog_finish into one function if
 varargs are available although that brings a rather measurable
 size/performance benefit.

Since you've apparently already done the measurement: how much?
It would be a bit tedious to support two different infrastructures for
elog(), but if it's a big enough win maybe we should.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-11 14:01:40 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  The attached patch:

  * adds configure checks for __VA_ARGS__ and __builtin_unreachable
support
  * provides a pg_unreachable() macro that expands to
__builtin_unreachable() or abort()
  * changes #define elog ... into #define elog(elevel, ...) if varargs are
available

 Seems like a good thing to do --- will review and commit.

Thanks.

I guess you will catch that anyway, but afaik msvc supports __VA_ARGS__
these days (since v2005 seemingly) and I didn't add HAVE__VA_ARGS to the
respective pg_config.h.win32

  It does *not* combine elog_start and elog_finish into one function if
  varargs are available although that brings a rather measurable
  size/performance benefit.

 Since you've apparently already done the measurement: how much?
 It would be a bit tedious to support two different infrastructures for
 elog(), but if it's a big enough win maybe we should.

Imo its pretty definitely a big enough win. So big I have a hard time
believing it can be true without negative effects somewhere else.

Ontop of the patch youve replied to it saves somewhere around 80kb and
between 0.8 (-S -M prepared), 2% (-S -M simple) and 4% (own stuff) I had
lying around and it consistently gives 6%-7% in Pavel's testcase. With
todays HEAD, not with the one before your fix.

I attached the absolutely ugly and unready patch I used for testing.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e710f22..c1f3200 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1151,6 +1151,43 @@ getinternalerrposition(void)
 	return edata-internalpos;
 }
 
+#if defined(HAVE__VA_ARGS)  defined(COMBINE_ELOG)
+void
+elog_all(const char *filename, int lineno, const char *funcname, int elevel, const char *fmt,...)
+{
+	elog_start(filename, lineno, funcname);
+	{
+		ErrorData  *edata = errordata[errordata_stack_depth];
+		MemoryContext oldcontext;
+
+		CHECK_STACK_DEPTH();
+
+		/*
+		 * Do errstart() to see if we actually want to report the message.
+		 */
+		errordata_stack_depth--;
+		errno = edata-saved_errno;
+		if (!errstart(elevel, edata-filename, edata-lineno, edata-funcname, NULL))
+			return;	/* nothing to do */
+
+		/*
+		 * Format error message just like errmsg_internal().
+		 */
+		recursion_depth++;
+		oldcontext = MemoryContextSwitchTo(ErrorContext);
+
+		EVALUATE_MESSAGE(edata-domain, message, false, false);
+
+		MemoryContextSwitchTo(oldcontext);
+		recursion_depth--;
+
+		/*
+		 * And let errfinish() finish up.
+		 */
+		errfinish(0);
+	}
+}
+#endif
 
 /*
  * elog_start --- startup for old-style API
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index d6e1054..9af530f 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -216,7 +216,15 @@ extern int	getinternalerrposition(void);
  * generated.
  *--
  */
-#ifdef HAVE__VA_ARGS
+#define COMBINE_ELOG
+#if defined(HAVE__VA_ARGS)  defined(COMBINE_ELOG)
+extern void elog_all(const char *filename, int lineno, const char *funcname, int elevel, const char *fmt,...)
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 5, 6)));
+
+#define elog(elevel, ...)\
+	elog_all(__FILE__, __LINE__, PG_FUNCNAME_MACRO, elevel, __VA_ARGS__), \
+		((elevel) = ERROR ? pg_unreachable() : (void) 0)
+#elif defined(HAVE__VA_ARGS)
 #define elog(elevel, ...)\
 	elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO),	\
 		elog_finish(elevel, __VA_ARGS__),\

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 [ patch to mark elog() as non-returning if elevel = ERROR ]

It strikes me that both in this patch, and in Peter's previous patch to
do this for ereport(), there is an opportunity for improvement.  Namely,
that the added code only has any use if elevel is a constant, which in
some places it isn't.  We don't really need a call to abort() to be
executed there, but the compiler knows no better and will have to
generate code to test the value of elevel and call abort().  Which
rather defeats the purpose of saving code; plus the compiler will still
not have any knowledge that code after the ereport isn't reached.
And another thing: what if the elevel argument isn't safe for multiple
evaluation?  No such hazard ever existed before these patches, so I'm
not very comfortable with adding one.  (Even if all our own code is
safe, there's third-party code to worry about.)

When we're using recent gcc, there is an easy fix, which is that the
macros could do this:

__builtin_constant_p(elevel)  (elevel) = ERROR : pg_unreachable() : (void) 0

This gets rid of both useless code and the risk of undesirable multiple
evaluation, while not giving up any optimization possibility that
existed before.  So I think we should add a configure test for
__builtin_constant_p() and do it like that when possible.

That still leaves the question of what to do when the compiler doesn't
have __builtin_constant_p().  Should we use the coding that we have,
or give up and not try to mark the macros as non-returning?  I'm rather
inclined to do the latter, because of the multiple-evaluation-risk
argument.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-11 15:05:54 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  [ patch to mark elog() as non-returning if elevel = ERROR ]

 It strikes me that both in this patch, and in Peter's previous patch to
 do this for ereport(), there is an opportunity for improvement.  Namely,
 that the added code only has any use if elevel is a constant, which in
 some places it isn't.  We don't really need a call to abort() to be
 executed there, but the compiler knows no better and will have to
 generate code to test the value of elevel and call abort().
 Which rather defeats the purpose of saving code; plus the compiler will still
 not have any knowledge that code after the ereport isn't reached.

Yea, I think that's why __builtin_unreachable() is a performance
benefit in comparison with abort().
Both are a benefit over not doing either btw in my measurements.

 And another thing: what if the elevel argument isn't safe for multiple
 evaluation?  No such hazard ever existed before these patches, so I'm
 not very comfortable with adding one.  (Even if all our own code is
 safe, there's third-party code to worry about.)

Hm. I am not really too scared about those dangers I have to admit. If
at all I am caring more for ereport than for elog.
Placing code with side effects as elevel arguments just seems a bit too
absurd.

 When we're using recent gcc, there is an easy fix, which is that the
 macros could do this:

Recent as in 3.1 or so ;)

Its really too bad that there's no __builtin_pure() or
__builtin_side_effect_free() or somesuch :(.

 __builtin_constant_p(elevel)  (elevel) = ERROR : pg_unreachable() : (void)  0

 This gets rid of both useless code and the risk of undesirable multiple
 evaluation, while not giving up any optimization possibility that
 existed before.  So I think we should add a configure test for
 __builtin_constant_p() and do it like that when possible.

I think there might be code somewhere that decides between FATAL and
PANIC which would not hit that path then. Imo not a good enough reason
not to do it, but I wanted to mention it anyway.

 That still leaves the question of what to do when the compiler doesn't
 have __builtin_constant_p().  Should we use the coding that we have,
 or give up and not try to mark the macros as non-returning?  I'm rather
 inclined to do the latter, because of the multiple-evaluation-risk
 argument.

I dislike the latter somewhat as it would mean not to give that
information at all to msvc and others which seems a bit sad. But I don't
feel particularly strongly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-11 15:52:19 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-01-11 15:05:54 -0500, Tom Lane wrote:
  And another thing: what if the elevel argument isn't safe for multiple
  evaluation?  No such hazard ever existed before these patches, so I'm
  not very comfortable with adding one.  (Even if all our own code is
  safe, there's third-party code to worry about.)
 
  Hm. I am not really too scared about those dangers I have to admit.
 
 I agree the scenario doesn't seem all that probable, but what scares me
 here is that if we use __builtin_constant_p(elevel)  (elevel) = ERROR
 in some builds, and just (elevel) = ERROR in others, then if there is
 any code with a multiple-evaluation hazard, it is only buggy in the
 latter builds.  That's sufficiently nasty that I'm willing to give up
 an optimization that we never had before 9.3 anyway.

Well, why use it at all then and not just rely on
__builtin_unreachable() in any recent gcc (and llvm fwiw) and abort()
otherwise? Then the code is small for anything recent (gcc 4.4 afair)
and always consistently buggy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-11 16:16:58 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-01-11 15:52:19 -0500, Tom Lane wrote:
  I agree the scenario doesn't seem all that probable, but what scares me
  here is that if we use __builtin_constant_p(elevel)  (elevel) = ERROR
  in some builds, and just (elevel) = ERROR in others, then if there is
  any code with a multiple-evaluation hazard, it is only buggy in the
  latter builds.  That's sufficiently nasty that I'm willing to give up
  an optimization that we never had before 9.3 anyway.

  Well, why use it at all then and not just rely on
  __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort()
  otherwise? Then the code is small for anything recent (gcc 4.4 afair)
  and always consistently buggy.

 Uh ... because it's *not* unreachable if elevel  ERROR.  Otherwise we'd
 just mark errfinish as __attribute((noreturn)) and be done.  Of course,
 that's a gcc-ism too.

Well, I mean with the double evaluation risk.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-11 Thread Andres Freund
On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-01-11 16:16:58 -0500, Tom Lane wrote:
  Uh ... because it's *not* unreachable if elevel  ERROR.  Otherwise we'd
  just mark errfinish as __attribute((noreturn)) and be done.  Of course,
  that's a gcc-ism too.
 
  Well, I mean with the double evaluation risk.
 
 Oh, are you saying you don't want to make the __builtin_constant_p
 addition?

Yes, that was what I wanted to say.

 I'm not very satisfied with that answer.  Right now, Peter's
 patch has added a class of bugs that never existed before 9.3, and yours
 would add more.  It might well be that those classes are empty ... but
 *we can't know that*.  I don't think that keeping a new optimization for
 non-gcc compilers is worth that risk.  Postgres is already full of
 gcc-only optimizations, anyway.

ISTM that ereport() already has so strange behaviour wrt evaluation of
arguments (i.e doing it only when the message is really logged) that its
doesn't seem to add a real additional risk.
It also means that we potentially need to add more quieting returns et
al to keep the warning level on non-gcc compatible compilers low
(probably only msvc matters here) which we all don't see on our primary
compilers.

Anyway, youve got a strong opinion and I don't, so ...

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-10 Thread Andres Freund
On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
 On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
  (We know this doesn't
  matter, but gcc doesn't; this version of gcc apparently isn't doing much
  with the knowledge that elog won't return.)

 Afaics one reason for that is that we don't have any such annotation for
 elog(), just for ereport. And I don't immediately see how it could be
 added to elog without relying on variadic macros. Bit of a shame, there
 probably a good number of callsites that could actually benefit from
 that knowledge.
 Is it worth making that annotation conditional on variadic macro
 support?

A quick test confirms that. With:

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index cbbda04..39cd809 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -212,7 +212,13 @@ extern int getinternalerrposition(void);
  * elog(ERROR, portal \%s\ not found, stmt-portalname);
  *--
  */
+#ifdef __GNUC__
+#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
+   elog_finish(elevel, __VA_ARGS__),   
\
+   ((elevel) = ERROR ? __builtin_unreachable() : (void) 0)
+#else
 #define elog   elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
+#endif
 
 extern void elog_start(const char *filename, int lineno, const char *funcname);
 extern void

nearly the same code is generated for both variants (this is no
additionally saved registers and such).

Two unfinished things:
* the above only checks for gcc although all C99 compilers should
  support varargs macros
* It uses __builtin_unreachable() instead of abort() as that creates
  a smaller executable. That's gcc 4.5 only. Saves about 30kb in a
  both a stripped and unstripped binary.
* elog(ERROR) i.e. no args would require more macro ugliness, but that
  seems unneccesary

Doing the __builtin_unreachable() for ereport as well saves about 50kb.

Given that some of those elog/ereports are in performance critical code
paths it seems like a good idea to remove code from the
callsites. Adding configure check for __VA_ARGS__ and
__builtin_unreachable() should be simple enough?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-10 Thread Andres Freund
On 2013-01-10 10:31:04 +0100, Andres Freund wrote:
 On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
  On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
   (We know this doesn't
   matter, but gcc doesn't; this version of gcc apparently isn't doing much
   with the knowledge that elog won't return.)
 
  Afaics one reason for that is that we don't have any such annotation for
  elog(), just for ereport. And I don't immediately see how it could be
  added to elog without relying on variadic macros. Bit of a shame, there
  probably a good number of callsites that could actually benefit from
  that knowledge.
  Is it worth making that annotation conditional on variadic macro
  support?

 A quick test confirms that. With:

 diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
 index cbbda04..39cd809 100644
 --- a/src/include/utils/elog.h
 +++ b/src/include/utils/elog.h
 @@ -212,7 +212,13 @@ extern int   getinternalerrposition(void);
   *   elog(ERROR, portal \%s\ not found, stmt-portalname);
   *--
   */
 +#ifdef __GNUC__
 +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), 
 \
 + elog_finish(elevel, __VA_ARGS__),   
 \
 + ((elevel) = ERROR ? __builtin_unreachable() : (void) 0)
 +#else
  #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
 +#endif

  extern void elog_start(const char *filename, int lineno, const char 
 *funcname);
  extern void

 nearly the same code is generated for both variants (this is no
 additionally saved registers and such).

 Two unfinished things:
 * the above only checks for gcc although all C99 compilers should
   support varargs macros
 * It uses __builtin_unreachable() instead of abort() as that creates
   a smaller executable. That's gcc 4.5 only. Saves about 30kb in a
   both a stripped and unstripped binary.
 * elog(ERROR) i.e. no args would require more macro ugliness, but that
   seems unneccesary

 Doing the __builtin_unreachable() for ereport as well saves about 50kb.

 Given that some of those elog/ereports are in performance critical code
 paths it seems like a good idea to remove code from the
 callsites. Adding configure check for __VA_ARGS__ and
 __builtin_unreachable() should be simple enough?

For whatever its worth - I am not sure its much - after relying on
variadic macros we can make elog into one function call instead of
two. That saves another 100kb.

[tinker]

The builtin_unreachable and combined elog thing bring a measurable
performance benefit of a rather surprising 7% when running Pavel's
testcase ontop of yesterdays HEAD. So it seems worth investigating.
About 3% of that is the __builtin_unreachable() and about 4% the
combining of elog_start/finish.

Now that testcase sure isn't a very representative one for this, but I
don't really see why it should benefit much more than other cases. Seems
to be quite the benefit for relatively minor cost.
Although I am pretty sure just copying the whole of
elog_start/elog_finish into one elog_all() isn't the best way to go at
this...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Andres Freund
On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2013-01-09 11:37:46 -0500, Tom Lane wrote:
  I agree that what dirmod is doing is pretty ugly, but it's localized
  enough that I don't care particularly.  (Really, the only reason it's
  a hack and not The Solution is that at the moment there's only one
  file doing it that way.  A trampoline function for use only by code
  that needs to work in both frontend and backend isn't an unreasonable
  idea.)
 
  Well, after the patch the trampoline function would be palloc() itself.
 
 My objection is that you're forcing *all* backend code to go through
 the trampoline.  That probably has a negative impact on performance, and
 if you think it'll get committed without a thorough proof that there's
 no such impact, you're mistaken.  Perhaps you're not aware of exactly
 how hot a hot-spot palloc is?  It's not unlikely that this patch could
 hurt backend performance by several percent all by itself.

There is no additional overhead except some minor increase in code size.
If you look at the patch palloc() now simply directly calls
CurrentMemoryContext-methods-alloc. So there is no additional function
call relative to the previous state.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Andres Freund
On 2013-01-09 12:32:20 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
  My objection is that you're forcing *all* backend code to go through
  the trampoline.  That probably has a negative impact on performance, and
  if you think it'll get committed without a thorough proof that there's
  no such impact, you're mistaken.  Perhaps you're not aware of exactly
  how hot a hot-spot palloc is?  It's not unlikely that this patch could
  hurt backend performance by several percent all by itself.

  There is no additional overhead except some minor increase in code size.
  If you look at the patch palloc() now simply directly calls
  CurrentMemoryContext-methods-alloc. So there is no additional function
  call relative to the previous state.

 Apparently we're failing to communicate, so let me put this as clearly
 as possible: an unsupported assertion that this patch has zero cost
 isn't worth the electrons it's written on.

Well, I *did* benchmark it as noted elsewhere in the thread, but thats
obviously just machine (E5520 x 2) with one rather restricted workload
(pgbench -S -jc 40 -T60). At least its rather palloc heavy.

Here are the numbers:

before:
#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
101449.857665
after:
#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
101673.400992

So on my system if there is a difference, its positive (0.12%).

 In any case it's not clear to me why duplicating code like that is a
 less ugly approach than having different macro definitions for frontend
 and backend.

Imo its better abstracted and more consistent (pfree() is not a macro,
palloc() is) and actually makes it easier to redirect code somewhere
else. It also opens the door for exposing less knowledge in
utils/palloc.h.

Anyway, we can use a macro instead, I don't think its that important.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Well, I *did* benchmark it as noted elsewhere in the thread, but thats
 obviously just machine (E5520 x 2) with one rather restricted workload
 (pgbench -S -jc 40 -T60). At least its rather palloc heavy.

 Here are the numbers:

 before:
 #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
 101449.857665
 after:
 #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
 101673.400992

 So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this
--- its numbers contain a lot of communication overhead, not to mention
that pgbench itself can be a bottleneck.  It struck me that we have a
recent test case that's known to be really palloc-intensive, namely
Pavel's example here:
http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit
78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
reduced the data-copying overhead for that).  On my Fedora 16 machine
(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
I get a runtime for Pavel's example of 17023 msec (average over five
runs).  I then applied oprofile and got a breakdown like this:

  samples|  %|
--
   108409 84.5083 /home/tgl/testversion/bin/postgres
13723 10.6975 /lib64/libc-2.14.90.so
 3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
1096010.1495  AllocSetAlloc
6325  5.8572  MemoryContextAllocZeroAligned
6225  5.7646  base_yyparse
3765  3.4866  copyObject
2511  2.3253  MemoryContextAlloc
2292  2.1225  grouping_planner
2044  1.8928  SearchCatCache
1956  1.8113  core_yylex
1763  1.6326  expression_tree_walker
1347  1.2474  MemoryContextCreate
1340  1.2409  check_stack_depth
1276  1.1816  GetCachedPlan
1175  1.0881  AllocSetFree
1106  1.0242  GetSnapshotData
1106  1.0242  _SPI_execute_plan
1101  1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
Now I get an average runtime of 1 ms, a full 2% faster, which is a
bit astonishing, particularly because the oprofile results haven't moved
much:

   107642 83.7427 /home/tgl/testversion/bin/postgres
14677 11.4183 /lib64/libc-2.14.90.so
 3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
10038 9.3537  AllocSetAlloc
6392  5.9562  MemoryContextAllocZeroAligned
5763  5.3701  base_yyparse
4810  4.4821  copyObject
2268  2.1134  grouping_planner
2178  2.0295  core_yylex
1963  1.8292  palloc
1867  1.7397  SearchCatCache
1835  1.7099  expression_tree_walker
1551  1.4453  check_stack_depth
1374  1.2803  _SPI_execute_plan
1282  1.1946  MemoryContextCreate
1187  1.1061  AllocSetFree
...
653   0.6085  palloc0
...
552   0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so
how did that get faster?

I notice that the postgres executable is about 0.2% smaller, presumably
because a whole lot of inlined fetches of CurrentMemoryContext are gone.
This makes me wonder if my result is due to chance improvements of cache
line alignment for inner loops.

I would like to know if other people get comparable results on other
hardware (non-Intel hardware would be especially interesting).  If this
result holds up across a range of platforms, I'll withdraw my objection
to making palloc a plain function.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Tom Lane
I wrote:
 I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
 Now I get an average runtime of 1 ms, a full 2% faster, which is a
 bit astonishing, particularly because the oprofile results haven't moved
 much:

I studied the assembly code being generated for palloc(), and I believe
I see the reason why it's a bit faster: when there's only a single local
variable that has to survive over the elog call, gcc generates a shorter
function entry/exit sequence.  I had thought of proposing that we code
palloc() like this:

void *
palloc(Size size)
{
MemoryContext context = CurrentMemoryContext;

AssertArg(MemoryContextIsValid(context));

if (!AllocSizeIsValid(size))
elog(ERROR, invalid memory alloc request size %lu,
 (unsigned long) size);

context-isReset = false;

return (*context-methods-alloc) (context, size);
}

but at least on this specific hardware and compiler that would evidently
be a net loss compared to direct use of CurrentMemoryContext.  I would
not put a lot of faith in that result holding up on other machines
though.

In any case this doesn't explain the whole 2% speedup, but it probably
accounts for palloc() showing as slightly cheaper than
MemoryContextAlloc had been in the oprofile listing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Andres Freund
On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
 I wrote:
  I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
  Now I get an average runtime of 1 ms, a full 2% faster, which is a
  bit astonishing, particularly because the oprofile results haven't moved
  much:

 I studied the assembly code being generated for palloc(), and I believe
 I see the reason why it's a bit faster: when there's only a single local
 variable that has to survive over the elog call, gcc generates a shorter
 function entry/exit sequence.

Makes sense.

  I had thought of proposing that we code
 palloc() like this:

 void *
 palloc(Size size)
 {
 MemoryContext context = CurrentMemoryContext;

 AssertArg(MemoryContextIsValid(context));

 if (!AllocSizeIsValid(size))
 elog(ERROR, invalid memory alloc request size %lu,
  (unsigned long) size);

 context-isReset = false;

 return (*context-methods-alloc) (context, size);
 }

 but at least on this specific hardware and compiler that would evidently
 be a net loss compared to direct use of CurrentMemoryContext.  I would
 not put a lot of faith in that result holding up on other machines
 though.

Thats not optimized to the same? ISTM the compiler should produce
exactly the same code for both.

 In any case this doesn't explain the whole 2% speedup, but it probably
 accounts for palloc() showing as slightly cheaper than
 MemoryContextAlloc had been in the oprofile listing.

I'd guess that a good part of the cost is just smeared across all
callers and not individually accountable to any function visible in the
profile. Additionally, With functions as short as MemoryContextAllocZero
profiles like oprofile (and perf) also often leak quite a bit of the
actual cost to the callsites in my experience.

I wonder whether it makes sense to inline the contents pstrdup()
additionally? My gut feeling is not, but...

I would like to move CurrentMemoryContext to memutils.h, but that seems
to require too many changes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Andres Freund
On 2013-01-09 15:07:10 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Well, I *did* benchmark it as noted elsewhere in the thread, but thats
  obviously just machine (E5520 x 2) with one rather restricted workload
  (pgbench -S -jc 40 -T60). At least its rather palloc heavy.
 
  Here are the numbers:
 
  before:
  #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
  101449.857665
  after:
  #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
  101673.400992
 
  So on my system if there is a difference, its positive (0.12%).
 
 pgbench-based testing doesn't fill me with a lot of confidence for this
 --- its numbers contain a lot of communication overhead, not to mention
 that pgbench itself can be a bottleneck. 

I chose it because I looked at profiles of it often enough to know
memory allocation shows up high in the profile (especially without -M
prepared).

 I would like to know if other people get comparable results on other
 hardware (non-Intel hardware would be especially interesting).  If this
 result holds up across a range of platforms, I'll withdraw my objection
 to making palloc a plain function.

I only have access to core2 and Nehalem atm, but FWIW I just tested it
on another workload (decoding WAL changes of a large transaction into
text) and I see improvements around 1.2% on both.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-09 Thread Andres Freund
On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
  I had thought of proposing that we code
  palloc() like this:
 
  void *
  palloc(Size size)
  {
  MemoryContext context = CurrentMemoryContext;
 
  AssertArg(MemoryContextIsValid(context));
 
  if (!AllocSizeIsValid(size))
  elog(ERROR, invalid memory alloc request size %lu,
   (unsigned long) size);
 
  context-isReset = false;
 
  return (*context-methods-alloc) (context, size);
  }
 
  but at least on this specific hardware and compiler that would evidently
  be a net loss compared to direct use of CurrentMemoryContext.  I would
  not put a lot of faith in that result holding up on other machines
  though.

  Thats not optimized to the same? ISTM the compiler should produce
  exactly the same code for both.

 No, that's exactly the point here, you can't just assume that you get
 the same code; this is tense enough that a few instructions matter.
 Remember we're considering non-assert builds, so the AssertArg vanishes.
 So in the form where CurrentMemoryContext is only read after the elog
 call, the compiler can see that it requires but one fetch - it can't
 change within the two lines where it's used.  In the form given above,
 the compiler is required to fetch CurrentMemoryContext into a local
 variable and keep it across the elog call.

I had thought that it could simply store the address in a callee
saved register inside palloc, totally forgetting that palloc would need
to save that register itself in that case.

 (We know this doesn't
 matter, but gcc doesn't; this version of gcc apparently isn't doing much
 with the knowledge that elog won't return.)

Afaics one reason for that is that we don't have any such annotation for
elog(), just for ereport. And I don't immediately see how it could be
added to elog without relying on variadic macros. Bit of a shame, there
probably a good number of callsites that could actually benefit from
that knowledge.
Is it worth making that annotation conditional on variadic macro
support?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers