[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
For the record, on MSVC we can use __assume(0) to mark unreachable code. It does the same as gcc's __builtin_unreachable(). I tested it with the same Pavel's palloc-heavy test case that you used earlier, with the one-shot plan commit temporarily reverted, and saw a similar speedup you reported with gcc's __builtin_unreachable(). So, committed that. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Average time 37862 Total Ticks For All Processes (./postgres_perftest) = 19089 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2270 0.99 aset.c 3bfb84b0 .base_yyparse 1217 0.53 gram.c fe644 168ec .text 1015 0.44 copyfuncs.c 11bfb4 4430 .MemoryContextAllocZero 535 0.23 mcxt.c 3b990 f0 .MemoryContextAlloc 533 0.23 mcxt.c 3b780 e0 .check_stack_depth 392 0.17 postgres.c 568ac100 .core_yylex 385 0.17 gram.c fb5b4 1c90 .expression_tree_walker 347 0.15 nodeFuncs.c 50da4750 .AllocSetFree308 0.13 aset.c 3c9981b0 .grouping_planner242 0.11 planner.c 2399f0 1d10 .SearchCatCache 198 0.09 catcache.c 7ec20550 ._SPI_execute_plan 195 0.09 spi.c 2f0c187c0 .pfree 195 0.09 mcxt.c 3b3b0 70 query_dependencies_walker185 0.08 setrefs.c 2559441b0 .GetSnapshotData 183 0.08 procarray.c 69efc460 .query_tree_walker 176 0.08 nodeFuncs.c 50ae4210 .strncpy 168 0.07 strncpy.s ba080130 .fmgr_info_cxt_security 166 0.07 fmgr.c 3f7b0850 .transformStmt 159 0.07 analyze.c 29091c 12d0 .text141 0.06 parse_collate.c 28ddf8 1 .ExecInitExpr137 0.06 execQual.c 17f18c 15f0 .fix_expr_common 132 0.06 setrefs.c 2557e4160 .standard_ExecutorStart 127 0.06 execMain.c 1d9a00940 .GetCachedPlan 125 0.05 plancache.c ce664310 .strcpy 121 0.05 noname 3bd401a8 With your changes (same test as I described before) Average: 37938 Total Ticks For All Processes (./postgres_perftest) = 19311 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2162 2.17 aset.c 3bfb84b0 .base_yyparse 1242 1.25 gram.c fdc7c 167f0 .text 1028 1.03 copyfuncs.c 11b4d0 4210 .palloc 553 0.56 mcxt.c 3b4c8 d0 .MemoryContextAllocZero 509 0.51 mcxt.c 3b9e8 f0 .core_yylex 413 0.41 gram.c fac2c 1c60 .check_stack_depth 404 0.41 postgres.c 56730100 .expression_tree_walker 320 0.32 nodeFuncs.c 50d28750 .AllocSetFree261 0.26 aset.c 3c9981b0 ._SPI_execute_plan 232 0.23 spi.c 2ee6247c0 .GetSnapshotData 221 0.22 procarray.c 69d54460 .grouping_planner211 0.21 planner.c 237b60 1cf0 .MemoryContextAlloc 190 0.19 mcxt.c 3b738 e0 .query_tree_walker 184 0.18 nodeFuncs.c 50a68210 .SearchCatCache 182 0.18 catcache.c 7ea08550 .transformStmt 181 0.18 analyze.c 28e774 12d0 query_dependencies_walker180 0.18 setrefs.c 25397c1b0 .strncpy 175 0.18 strncpy.s b9a60130 .MemoryContextCreate 167 0.17 mcxt.c 3bad8160 .pfree 151 0.15 mcxt.c 3b208 70 .strcpy 150 0.15 noname 3bd401a8 .fmgr_info_cxt_security 146 0.15 fmgr.c 3f790850 .text132 0.13 parse_collate.c 28bc50 1 .ExecInitExpr125 0.13 execQual.c 17de28 15e0 .expression_tree_mutator 124 0.12 nodeFuncs.c 53268 1080 .strcmp 122 0.12 noname 1e44158 .fix_expr_common 117 0.12 setrefs.c 25381c160 Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-21 11:59:18 -0500, Steve Singer wrote: On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Any chance to do the test ontop of HEAD? The elog stuff has gone in only afterwards and might actually effect this enough to be relevant. Otherwise I have to say I am a bit confused by the mighty difference in costs attributed to AllocSetAlloc given the amount of calls should be exactly the same and the number of trampoline function calls should also stay the same. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 13-01-21 12:15 PM, Andres Freund wrote: On 2013-01-21 11:59:18 -0500, Steve Singer wrote: On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Any chance to do the test ontop of HEAD? The elog stuff has gone in only afterwards and might actually effect this enough to be relevant. HEAD(:fb197290) Average: 28877 .AllocSetAlloc 1442 1.90 aset.c 3a9384b0 .base_yyparse 1220 1.61 gram.c fdbc0 168ec .MemoryContextAlloc 485 0.64 mcxt.c 3a154 e0 .core_yylex 407 0.54 gram.c fab30 1c90 .AllocSetFree320 0.42 aset.c 3b3181b0 .MemoryContextAllocZero 316 0.42 mcxt.c 3a364 f0 .grouping_planner271 0.36 planner.c 2b0ce8 1d10 .strncpy 256 0.34 strncpy.s b8ca0130 .expression_tree_walker 222 0.29 nodeFuncs.c 4f734750 ._SPI_execute_plan 221 0.29 spi.c 2fb230840 .SearchCatCache 182 0.24 catcache.c 7d870550 .GetSnapshotData 161 0.21 procarray.c 68acc460 .fmgr_info_cxt_security 155 0.20 fmgr.c 3e130850 .pfree 152 0.20 mcxt.c 39d84 70 .expression_tree_mutator 151 0.20 nodeFuncs.c 51c84 1170 .check_stack_depth 142 0.19 postgres.c 5523c100 .text126 0.17 parse_collate.c 233d90 1 .transformStmt 125 0.16 analyze.c 289e88 12c0 .ScanKeywordLookup 123 0.16 kwlookup.c f7ab4154 .new_list120 0.16 list.c 40f74 b0 .subquery_planner115 0.15 planner.c 2b29f8dc0 .GetCachedPlan 115 0.15 plancache.c cdab0310 .ExecInitExpr114 0.15 execQual.c 17e690 15f0 .set_plan_refs 113 0.15 setrefs.c 252630cb0 .standard_ExecutorStart 110 0.14 execMain.c 1d9264940 .heap_form_tuple 107 0.14 heaptuple.c 177c842f0 .query_tree_walker 105 0.14 nodeFuncs.c 4f474210 HEAD + patch: Average: 29035 Total Ticks For All Processes (./postgres_perftest) = 15044 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 1422 1.64 aset.c 3a9384b0 .base_yyparse 1201 1.38 gram.c fd1e8 167f0 .palloc 470 0.54 mcxt.c 39e04 d0 .core_yylex 364 0.42 gram.c fa198 1c60 .MemoryContextAllocZero 282 0.33 mcxt.c 3a324 f0 ._SPI_execute_plan 250 0.29 spi.c 2f8c18840 .AllocSetFree244 0.28 aset.c 3b3181b0 .strncpy 234 0.27 strncpy.s b86a0130 .expression_tree_walker 229 0.26 nodeFuncs.c 4f698750 .grouping_planner176 0.20 planner.c 2aea84 1d30 .SearchCatCache 170 0.20 catcache.c 7d638550 .GetSnapshotData 168 0.19 procarray.c 68904460 .assign_collations_walker155 0.18 parse_collate.c 231f4c7e0 .subquery_planner141 0.16 planner.c 2b07b4dc0 .fmgr_info_cxt_security 141 0.16 fmgr.c 3e110850 .check_stack_depth 140 0.16 postgres.c 550a0100 .ExecInitExpr138 0.16 execQual.c 17d2f8 15e0 .pfree 137 0.16 mcxt.c 39b44 70 .transformStmt 132 0.15 analyze.c 287dec 12c0 .new_list128 0.15 list.c 40f44 90 .expression_tree_mutator 125 0.14 nodeFuncs.c 51bd8 1080 .preprocess_expression 118 0.14 planner.c 2adf541a0 .strcmp 118 0.14 noname 1e44158 .standard_ExecutorStart 115 0.13 execMain.c 1d77c0940 .MemoryContextAlloc 111 0.13 mcxt.c 3a074 e0 .set_plan_refs 109 0.13 setrefs.c 2506c4ca0 Otherwise I have to say I am a bit confused by the mighty difference in costs attributed to AllocSetAlloc given the amount of calls should be exactly the same and the number of trampoline function calls should also stay the same. Greetings, Andres Freund -- Andres Freundhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-19 17:33:05 -0500, Steve Singer wrote: On 13-01-09 03:07 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but thats obviously just machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least its rather palloc heavy. Here are the numbers: before: #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 after: #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 So on my system if there is a difference, its positive (0.12%). pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this: ... The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane Sorry for the delay I only read this thread today. I just tried Pawel's test on a POWER5 machine with an older version of gcc (see the grebe buildfarm animal for details) 78a5e738e: 37874.855 (average of 6 runs) 78a5e738 + palloc.h + mcxt.c: 38076.8035 The functions do seem to slightly slow things down on POWER. I haven't bothered to run oprofile or tprof to get a breakdown of the functions since Andres has already removed this from his patch. I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 13-01-09 03:07 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but thats obviously just machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least its rather palloc heavy. Here are the numbers: before: #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 after: #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 So on my system if there is a difference, its positive (0.12%). pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this: samples| %| -- 108409 84.5083 /home/tgl/testversion/bin/postgres 13723 10.6975 /lib64/libc-2.14.90.so 3153 2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so samples %symbol name 1096010.1495 AllocSetAlloc 6325 5.8572 MemoryContextAllocZeroAligned 6225 5.7646 base_yyparse 3765 3.4866 copyObject 2511 2.3253 MemoryContextAlloc 2292 2.1225 grouping_planner 2044 1.8928 SearchCatCache 1956 1.8113 core_yylex 1763 1.6326 expression_tree_walker 1347 1.2474 MemoryContextCreate 1340 1.2409 check_stack_depth 1276 1.1816 GetCachedPlan 1175 1.0881 AllocSetFree 1106 1.0242 GetSnapshotData 1106 1.0242 _SPI_execute_plan 1101 1.0196 extract_query_dependencies_walker I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much: 107642 83.7427 /home/tgl/testversion/bin/postgres 14677 11.4183 /lib64/libc-2.14.90.so 3180 2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so samples %symbol name 10038 9.3537 AllocSetAlloc 6392 5.9562 MemoryContextAllocZeroAligned 5763 5.3701 base_yyparse 4810 4.4821 copyObject 2268 2.1134 grouping_planner 2178 2.0295 core_yylex 1963 1.8292 palloc 1867 1.7397 SearchCatCache 1835 1.7099 expression_tree_walker 1551 1.4453 check_stack_depth 1374 1.2803 _SPI_execute_plan 1282 1.1946 MemoryContextCreate 1187 1.1061 AllocSetFree ... 653 0.6085 palloc0 ... 552 0.5144 MemoryContextAlloc The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane Sorry for the delay I only read this thread today. I just tried Pawel's test on a POWER5 machine with an older version of gcc (see the grebe buildfarm animal for details) 78a5e738e: 37874.855 (average of 6 runs) 78a5e738 + palloc.h + mcxt.c: 38076.8035 The functions do seem to slightly slow things down on POWER. I haven't bothered to run oprofile or tprof to get a breakdown of the functions since Andres has already removed this from his patch. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 15:07:10 -0500, Tom Lane wrote: I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. Unfortunately nobody spoke up and I don't have any non-intel hardware anymore. Well, aside from my phone that is... Updated patch attached, the previous version missed some contrib modules. I like the diffstat: 41 files changed, 224 insertions(+), 636 deletions(-) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From e4960b1659c8bc33661ff07b2ba3cccef653c8fc Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Wed, 9 Jan 2013 12:05:37 +0100 Subject: [PATCH] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs --- contrib/oid2name/oid2name.c| 52 + contrib/pg_upgrade/pg_upgrade.h| 5 +- contrib/pg_upgrade/util.c | 49 - contrib/pgbench/pgbench.c | 54 +- src/backend/storage/file/copydir.c | 12 +-- src/backend/utils/mmgr/mcxt.c | 78 +++- src/bin/initdb/initdb.c| 40 +- src/bin/pg_basebackup/pg_basebackup.c | 2 +- src/bin/pg_basebackup/pg_receivexlog.c | 1 + src/bin/pg_basebackup/receivelog.c | 1 + src/bin/pg_basebackup/streamutil.c | 38 +- src/bin/pg_basebackup/streamutil.h | 4 - src/bin/pg_ctl/pg_ctl.c| 39 +- src/bin/pg_dump/Makefile | 6 +- src/bin/pg_dump/common.c | 1 - src/bin/pg_dump/compress_io.c | 1 - src/bin/pg_dump/dumpmem.c | 76 --- src/bin/pg_dump/dumpmem.h | 22 -- src/bin/pg_dump/dumputils.c| 1 - src/bin/pg_dump/dumputils.h| 1 + src/bin/pg_dump/pg_backup_archiver.c | 1 - src/bin/pg_dump/pg_backup_custom.c | 2 +- src/bin/pg_dump/pg_backup_db.c | 1 - src/bin/pg_dump/pg_backup_directory.c | 1 - src/bin/pg_dump/pg_backup_null.c | 1 - src/bin/pg_dump/pg_backup_tar.c| 1 - src/bin/pg_dump/pg_dump.c | 1 - src/bin/pg_dump/pg_dump_sort.c | 1 - src/bin/pg_dump/pg_dumpall.c | 1 - src/bin/pg_dump/pg_restore.c | 1 - src/bin/pg_resetxlog/pg_resetxlog.c| 5 +- src/bin/psql/common.c | 50 - src/bin/psql/common.h | 10 +-- src/bin/scripts/common.c | 49 - src/bin/scripts/common.h | 5 +- src/include/port/palloc.h | 19 + src/include/utils/palloc.h | 12 +-- src/port/Makefile | 8 +- src/port/dirmod.c | 75 +-- src/port/palloc.c | 129 + src/tools/msvc/Mkvcbuild.pm| 4 +- 41 files changed, 224 insertions(+), 636 deletions(-) delete mode 100644 src/bin/pg_dump/dumpmem.c delete mode 100644 src/bin/pg_dump/dumpmem.h create mode 100644 src/include/port/palloc.h create mode 100644 src/port/palloc.c diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c index a666731..dfd8105 100644 --- a/contrib/oid2name/oid2name.c +++ b/contrib/oid2name/oid2name.c @@ -9,6 +9,8 @@ */ #include postgres_fe.h +#include port/palloc.h + #include unistd.h #ifdef HAVE_GETOPT_H #include getopt.h @@ -50,9 +52,6 @@ struct options /* function prototypes */ static void help(const char *progname); void get_opts(int, char **, struct options *); -void *pg_malloc(size_t size); -void *pg_realloc(void *ptr, size_t size); -char *pg_strdup(const char *str); void add_one_elt(char *eltname, eary *eary); char *get_comma_elts(eary *eary); PGconn *sql_conn(struct options *); @@ -201,53 +200,6 @@ help(const char *progname) progname, progname); } -void * -pg_malloc(size_t size) -{ - void *ptr; - - /* Avoid unportable behavior of malloc(0) */ - if (size == 0) - size = 1; - ptr = malloc(size); - if (!ptr) - { - fprintf(stderr, out of memory\n); - exit(1); - } - return ptr; -} - -void * -pg_realloc(void *ptr, size_t size) -{ - void *result; - - /* Avoid unportable behavior of realloc(NULL, 0) */ - if (ptr == NULL size == 0) - size = 1; - result = realloc(ptr, size); - if (!result) - { - fprintf(stderr, out of memory\n); - exit(1); - } - return result; -} - -char * -pg_strdup(const char *str) -{ - char *result = strdup(str); - - if (!result) - { - fprintf(stderr, out of memory\n); - exit(1); - } - return result; -} - /* * add_one_elt * diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h index d5c3fa9..a917b5b 100644 ---
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-12 15:39:16 -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 12.01.2013 20:42, Andres Freund wrote: I don't care for that too much in detail -- if errstart were to return false (it shouldn't, but if it did) this would be utterly broken, With the current ereport(), we'll call abort() if errstart returns false and elevel = ERROR. That's even worse than making an error reporting calls that elog.c isn't prepared for. No it isn't: you'd get a clean and easily interpretable abort. I am not sure exactly what would happen if we plow forward with calling elog.c functions after errstart returns false, but it wouldn't be good, and debugging a field report of such behavior wouldn't be easy. This is actually a disadvantage of the proposal to replace the abort() calls with __builtin_unreachable(), too. The gcc boys interpret the semantics of that as if control reaches here, the behavior is undefined --- and their notion of undefined generally seems to amount to we will screw you as hard as we can. For example, they have no problem with using the assumption that control won't reach there to make deductions while optimizing the rest of the function. If it ever did happen, I would not want to have to debug it. The same goes for __attribute__((noreturn)), BTW. We could make add an Assert() additionally? Yea, I didn't really like it all that much either - but anyway, I have yet to find *any* version with a local variable that doesn't lead to 200kb size increase :(. Hmm, that's strange. Assuming the optimizer optimizes away the paths in the macro that are never taken when elevel is a constant, I would expect the resulting binary to be smaller than what we have now. I was wondering about that too, but haven't tried it locally yet. It could be that Andres was looking at an optimization level in which a register still got allocated for the local variable. Nope, it was O2, albeit with -fno-omit-frame-pointer (for usable hierarchical profiles). Here's what I got with and without my patch on my laptop: -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched But when I build without --enable-debug, the situation reverses: -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched Yes, I noticed that too: these patches make the debug overhead grow substantially for some reason. It's better to look at the output of size rather than the executable's total file size. That way you don't need to rebuild without debug to see reality. (I guess you could also But the above is spot on. While I used strip earlier I somehow forgot it here and so the debug overhead is part of what I saw. But, in comparison to my baseline (which is replacing the abort() with pg_unreachable()/__builtin_unreachable()) its still a loss performancewise which is why I didn't doubt my results... Pavel's testcase with changing ereport implementations (5 runs, minimal time): EREPORT_EXPR_OLD_NO_UNREACHABLE: 9964.015ms,5530296 bytes (stripped) EREPORT_EXPR_OLD_ABORT: 9765.916ms,5466936 bytes (stripped) EREPORT_EXPR_OLD_UNREACHABLE:9646.502ms,5462456 bytes (stripped) EREPORT_STMT_HEIKKI: 10133.968ms, 5435704 bytes (stripped) EREPORT_STMT_ANDRES: 9548.436ms,5462232 bytes (stripped) Where my variant is: #define ereport_domain(elevel, domain, rest)\ do {\ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\ { \ errfinish rest; \ } \ if (elevel_= ERROR)\ pg_unreachable(); \ } while(0) So I suggest using that with an additional Assert(0) in the if(elevel_) branch. The assembler generated for my version is exactly the same when elevel is constant in comparison with Heikki's but doesn't generate any additional code in the case its not constant and I guess thats where it gets faster? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-12 19:47:18 -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: When I compile with gcc -O0, I get one warning with this: datetime.c: In function �DateTimeParseError�: datetime.c:3575:1: warning: �noreturn� function does return [enabled by default] That suggests that the compiler didn't correctly deduce that ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. Yeah, I am seeing this too. It appears to be endemic to the local-variable approach, ie if we have const int elevel_ = (elevel); ... (elevel_ = ERROR) ? pg_unreachable() : (void) 0 then we do not get the desired results at -O0, which is not terribly surprising --- I'd not really expect the compiler to propagate the value of elevel_ when not optimizing. If we don't use a local variable, we don't get the warning, which I take to mean that gcc will fold ERROR = ERROR to true even at -O0, and that it does this early enough to conclude that unreachability holds. I experimented with some variant ways of phrasing the macro, but the only thing that worked at -O0 required __builtin_constant_p, which rather defeats the purpose of making this accessible to non-gcc compilers. Yea, its rather sad. The only thing I can see is actually doing both variants like: #define ereport_domain(elevel, domain, rest)\ do {\ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\ { \ errfinish rest; \ } \ if (pg_is_known_constant(elevel) (elevel) = ERROR) \ { \ pg_unreachable(); \ Assert(0); \ } \ else if (elevel_= ERROR) \ { \ pg_unreachable(); \ Assert(0); \ } \ } while(0) but that obviously still relies on __builtin_constant_p giving reasonable answers. We should probably put that Assert(0) into pg_unreachable()... If we go with the local-variable approach, we could probably suppress this warning by putting an abort() call at the bottom of DateTimeParseError. It seems a tad ugly though, and what's a bigger deal is that if the compiler is unable to deduce unreachability at -O0 then we are probably going to be fighting such bogus warnings for all time to come. Note also that an abort() (much less a pg_unreachable()) would not do anything positive like give us a compile warning if we mistakenly added a case that could fall through. On the other hand, if there's only one such case in our tree today, maybe I'm worrying too much. I think the only reason there's only one such case (two here actually, second in renametrig) is that all others already have been silenced because the abort() didn't use to be there. And I think the danger youre alluding to above is a real one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-12 16:36:39 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Since you've apparently already done the measurement: how much? It would be a bit tedious to support two different infrastructures for elog(), but if it's a big enough win maybe we should. Imo its pretty definitely a big enough win. So big I have a hard time believing it can be true without negative effects somewhere else. Well, actually there's a pretty serious negative effect here, which is that when it's implemented this way it's impossible to save errno for %m before the elog argument list is evaluated. So I think this is a no-go. We've always had the contract that functions in the argument list could stomp on errno without care. If we switch to a do-while macro expansion it'd be possible to do something like do { int save_errno = errno; int elevel = whatever; elog_internal( save_errno, elevel, fmt, __VA__ARGS__ ); } while (0); but this would almost certainly result in more code bloat not less, since call sites would now be responsible for fetching errno. the numbers are: old definition: 10393.658ms, 5497912 bytes old definition + unreachable: 10011.102ms, 5469144 bytes stmt, two calls, unreachable: 10036.132ms, 5468792 bytes stmt, one call, unreachable:9443.612ms, 5462232 bytes stmt, one call, unreachable, save errno:9615.863ms, 5489688 bytes So while not saving errno is unsurprisingly better its still a win. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Andres Freund and...@2ndquadrant.com writes: the numbers are: old definition: 10393.658ms, 5497912 bytes old definition + unreachable: 10011.102ms, 5469144 bytes stmt, two calls, unreachable: 10036.132ms, 5468792 bytes stmt, one call, unreachable:9443.612ms, 5462232 bytes stmt, one call, unreachable, save errno:9615.863ms, 5489688 bytes I find these numbers pretty hard to credit. Why should replacing two calls by one, in code paths that are not being taken, move the runtime so much? The argument that a net reduction of code size is a win doesn't work, because the last case is more code than any except the first. I think you're measuring some coincidental effect or other, not a reproducible performance improvement. Or there's a bug in the code you're using. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-13 14:17:52 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: the numbers are: old definition: 10393.658ms, 5497912 bytes old definition + unreachable: 10011.102ms, 5469144 bytes stmt, two calls, unreachable: 10036.132ms, 5468792 bytes stmt, one call, unreachable:9443.612ms, 5462232 bytes stmt, one call, unreachable, save errno:9615.863ms, 5489688 bytes I find these numbers pretty hard to credit. Why should replacing two calls by one, in code paths that are not being taken, move the runtime so much? The argument that a net reduction of code size is a win doesn't work, because the last case is more code than any except the first. There are quite some elog(DEBUG*)s in the backend, and those are taken, so I don't find it unreasonable that doing less work in those cases is measurable. And if you look at the disassembly of ERROR codepaths: saving errno, one function: Dump of assembler code for function palloc: 639 { 0x00736397 +7: mov%rdi,%rsi 0x007363b0 +32:push %rbp 0x007363b1 +33:mov%rsp,%rbp 0x007363b4 +36:sub$0x20,%rsp 640 /* duplicates MemoryContextAlloc to avoid increased overhead */ 641 AssertArg(MemoryContextIsValid(CurrentMemoryContext)); 642 643 if (!AllocSizeIsValid(size)) 0x00736390 +0: cmp$0x3fff,%rdi 0x0073639a +10:ja 0x7363b0 palloc+32 644elog(ERROR, invalid memory alloc request size %lu, 0x007363b8 +40:mov%rdi,-0x8(%rbp) 0x007363bc +44:callq 0x462010 __errno_location@plt 0x007363c1 +49:mov-0x8(%rbp),%rsi 0x007363c5 +53:mov$0x8ace30,%r9d 0x007363cb +59:mov$0x14,%ecx 0x007363d0 +64:mov$0x8acefe,%edx 0x007363d5 +69:mov$0x8ace58,%edi 0x007363da +74:mov%rsi,(%rsp) 0x007363de +78:mov(%rax),%r8d 0x007363e1 +81:mov$0x285,%esi 0x007363e6 +86:xor%eax,%eax 0x007363e8 +88:callq 0x71a0b0 elog_all 0x007363ed: nopl (%rax) 645 (unsigned long) size); 646 647 CurrentMemoryContext-isReset = false; 0x0073639c +12: mov0x25c6e5(%rip),%rdi # 0x992a88 CurrentMemoryContext 0x007363a7 +23:movb $0x0,0x30(%rdi) 648 649 return (*CurrentMemoryContext-methods-alloc) (CurrentMemoryContext, size); 0x007363a3 +19:mov0x8(%rdi),%rax 0x007363ab +27:mov(%rax),%rax 0x007363ae +30:jmpq *%rax End of assembler dump. vs Dump of assembler code for function palloc: 639 { 0x007382b0 +0: push %rbp 0x007382b1 +1: mov%rsp,%rbp 0x007382b4 +4: push %rbx 0x007382b5 +5: mov%rdi,%rbx 0x007382b8 +8: sub$0x8,%rsp 640 /* duplicates MemoryContextAlloc to avoid increased overhead */ 641 AssertArg(MemoryContextIsValid(CurrentMemoryContext)); 642 643 if (!AllocSizeIsValid(size)) 0x007382bc +12:cmp$0x3fff,%rdi 0x007382c3 +19:jbe0x7382ed palloc+61 644elog(ERROR, invalid memory alloc request size %lu, 0x007382c5 +21:mov$0x8aec9e,%edx 0x007382ca +26:mov$0x284,%esi 0x007382cf +31:mov$0x8aebd0,%edi 0x007382d4 +36:callq 0x71bcb0 elog_start 0x007382d9 +41:mov%rbx,%rdx 0x007382dc +44:mov$0x8aec10,%esi 0x007382e1 +49:mov$0x14,%edi 0x007382e6 +54:xor%eax,%eax 0x007382e8 +56:callq 0x71bac0 elog_finish 645 (unsigned long) size); 646 647 CurrentMemoryContext-isReset = false; 0x007382ed +61: mov0x25bc54(%rip),%rdi # 0x993f48 CurrentMemoryContext 0x007382fb +75:movb $0x0,0x30(%rdi) 648 649 return (*CurrentMemoryContext-methods-alloc) (CurrentMemoryContext, size); 0x007382f4 +68:mov%rbx,%rsi 0x007382f7 +71:mov0x8(%rdi),%rax 0x007382ff +79:mov(%rax),%rax 0x00738308 +88:jmpq *%rax 0x0073830a: nopw 0x0(%rax,%rax,1) 650 } 0x00738302 +82:add$0x8,%rsp 0x00738306 +86:pop%rbx 0x00738307 +87:pop%rbp End of assembler dump. If other critical paths look like this its not that surprising. I think you're measuring some coincidental
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-13 15:44:58 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: And if you look at the disassembly of ERROR codepaths: I think your numbers are being twisted by -fno-omit-frame-pointer. What I get, with the __builtin_unreachable version of the macro, looks more like The only difference is that the following two instructions aren't done: 0x007363b0 +32:push %rbp 0x007363b1 +33:mov%rsp,%rbp Given that that function barely uses registers (from an amd64 pov at least), thats notreally surprising. MemoryContextAlloc: cmpq$1073741823, %rsi pushq %rbx movq%rsi, %rbx ja .L53 movq8(%rdi), %rax movb$0, 48(%rdi) popq%rbx movq(%rax), %rax jmp *%rax .L53: movl$__func__.5262, %edx movl$576, %esi movl$.LC2, %edi callelog_start movq%rbx, %rdx movl$.LC3, %esi movl$20, %edi xorl%eax, %eax callelog_finish With -fno-omit-frame-pointer it's a little worse, but still not what you show --- in particular, for me gcc still pushes the elog calls out of the main code path. I don't think that the main path will get any better with one elog function instead of two. It could easily get worse. On many machines, the single-function version would be worse because of needing to use more parameter registers, which would translate into more save/restore work in the calling function, which is overhead that would likely be paid whether elog actually gets called or not. (As an example, in the above code gcc evidently isn't noticing that it doesn't need to save/restore rbx so far as the main path is concerned.) In any case, results from a single micro-benchmark on a single platform with a single compiler version (and single set of compiler options) don't convince me a whole lot here. I am not convinced either, I just got curious whether it would be a win after the __VA_ARGS__ thing made it possible. If the errno problem didn't exist I would be pretty damn sure its just about always a win, but the way its now... We could make elog behave the same as ereport WRT to argument evaluation but that seems a bit dangerous given it would still be different on some platforms. Basically, the aspects of this that I think are likely to be reproducible wins across different platforms are (a) teaching the compiler that elog(ERROR) doesn't return, and (b) reducing code size as much as possible. The single-function change isn't going to help on either ground --- maybe it would have helped on (b) without the errno problem, but that's going to destroy any possible code size savings. Agreed. I am happy to produce an updated patch unless youre already on it? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 11.01.2013 23:49, Tom Lane wrote: Andres Freundand...@2ndquadrant.com writes: On 2013-01-11 16:28:13 -0500, Tom Lane wrote: I'm not very satisfied with that answer. Right now, Peter's patch has added a class of bugs that never existed before 9.3, and yours would add more. It might well be that those classes are empty ... but *we can't know that*. I don't think that keeping a new optimization for non-gcc compilers is worth that risk. Postgres is already full of gcc-only optimizations, anyway. ISTM that ereport() already has so strange behaviour wrt evaluation of arguments (i.e doing it only when the message is really logged) that its doesn't seem to add a real additional risk. Hm ... well, that's a point. I may be putting too much weight on the fact that any such bug for elevel would be a new one that never existed before. What do others think? It sure would be nice to avoid multiple evaluation. At least in xlog.c we use emode_for_corrupt_record() to pass the elevel, and it does have a side-effect. It's quite harmless in practice, you'll get some extra DEBUG1 messages in the log, but still. Here's one more construct to consider: #define ereport_domain(elevel, domain, rest) \ do { \ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || elevel_= ERROR) \ { \ (void) rest; \ if (elevel_= ERROR) \ errfinish_noreturn(1); \ else \ errfinish(1); \ } \ } while(0) extern void errfinish(int dummy,...); extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn)); With do-while, ereport() would no longer be an expression. There only place where that's a problem in our codebase is in scan.l, bootscanner.l and repl_scanner.l, which do this: #undef fprintf #define fprintf(file, fmt, msg) ereport(ERROR, (errmsg_internal(%s, msg))) and flex does this: (void) fprintf( stderr, %s\n, msg ); but that's easy to work around by creating a wrapper fprintf function in those .l files. When I compile with gcc -O0, I get one warning with this: datetime.c: In function ‘DateTimeParseError’: datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default] That suggests that the compiler didn't correctly deduce that ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-12 12:26:37 +0200, Heikki Linnakangas wrote: On 11.01.2013 23:49, Tom Lane wrote: Andres Freundand...@2ndquadrant.com writes: On 2013-01-11 16:28:13 -0500, Tom Lane wrote: I'm not very satisfied with that answer. Right now, Peter's patch has added a class of bugs that never existed before 9.3, and yours would add more. It might well be that those classes are empty ... but *we can't know that*. I don't think that keeping a new optimization for non-gcc compilers is worth that risk. Postgres is already full of gcc-only optimizations, anyway. ISTM that ereport() already has so strange behaviour wrt evaluation of arguments (i.e doing it only when the message is really logged) that its doesn't seem to add a real additional risk. Hm ... well, that's a point. I may be putting too much weight on the fact that any such bug for elevel would be a new one that never existed before. What do others think? It sure would be nice to avoid multiple evaluation. At least in xlog.c we use emode_for_corrupt_record() to pass the elevel, and it does have a side-effect. It's quite harmless in practice, you'll get some extra DEBUG1 messages in the log, but still. Here's one more construct to consider: #define ereport_domain(elevel, domain, rest) \ do { \ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || elevel_= ERROR) \ { \ (void) rest; \ if (elevel_= ERROR) \ errfinish_noreturn(1); \ else \ errfinish(1); \ } \ } while(0) extern void errfinish(int dummy,...); extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn)); I am afraid that that would bloat the code again as it would have to put that if() into each callsite whereas a variant that ends up using __builtin_unreachable() doesn't. So I think we should use __builtin_unreachable() while falling back to abort() as before. But that's really more a detail than anything fundamental about the approach. I'll play a bit. With do-while, ereport() would no longer be an expression. There only place where that's a problem in our codebase is in scan.l, bootscanner.l and repl_scanner.l, which do this: I aggree that would easy to fix, so no problem there. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Andres Freund and...@2ndquadrant.com writes: It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Since you've apparently already done the measurement: how much? It would be a bit tedious to support two different infrastructures for elog(), but if it's a big enough win maybe we should. Imo its pretty definitely a big enough win. So big I have a hard time believing it can be true without negative effects somewhere else. Well, actually there's a pretty serious negative effect here, which is that when it's implemented this way it's impossible to save errno for %m before the elog argument list is evaluated. So I think this is a no-go. We've always had the contract that functions in the argument list could stomp on errno without care. If we switch to a do-while macro expansion it'd be possible to do something like do { int save_errno = errno; int elevel = whatever; elog_internal( save_errno, elevel, fmt, __VA__ARGS__ ); } while (0); but this would almost certainly result in more code bloat not less, since call sites would now be responsible for fetching errno. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Heikki Linnakangas hlinnakan...@vmware.com writes: When I compile with gcc -O0, I get one warning with this: datetime.c: In function DateTimeParseError: datetime.c:3575:1: warning: noreturn function does return [enabled by default] That suggests that the compiler didn't correctly deduce that ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. Yeah, I am seeing this too. It appears to be endemic to the local-variable approach, ie if we have const int elevel_ = (elevel); ... (elevel_ = ERROR) ? pg_unreachable() : (void) 0 then we do not get the desired results at -O0, which is not terribly surprising --- I'd not really expect the compiler to propagate the value of elevel_ when not optimizing. If we don't use a local variable, we don't get the warning, which I take to mean that gcc will fold ERROR = ERROR to true even at -O0, and that it does this early enough to conclude that unreachability holds. I experimented with some variant ways of phrasing the macro, but the only thing that worked at -O0 required __builtin_constant_p, which rather defeats the purpose of making this accessible to non-gcc compilers. If we go with the local-variable approach, we could probably suppress this warning by putting an abort() call at the bottom of DateTimeParseError. It seems a tad ugly though, and what's a bigger deal is that if the compiler is unable to deduce unreachability at -O0 then we are probably going to be fighting such bogus warnings for all time to come. Note also that an abort() (much less a pg_unreachable()) would not do anything positive like give us a compile warning if we mistakenly added a case that could fall through. On the other hand, if there's only one such case in our tree today, maybe I'm worrying too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-10 10:55:20 +0100, Andres Freund wrote: On 2013-01-10 10:31:04 +0100, Andres Freund wrote: On 2013-01-10 00:05:07 +0100, Andres Freund wrote: On 2013-01-09 17:28:35 -0500, Tom Lane wrote: (We know this doesn't matter, but gcc doesn't; this version of gcc apparently isn't doing much with the knowledge that elog won't return.) Afaics one reason for that is that we don't have any such annotation for elog(), just for ereport. And I don't immediately see how it could be added to elog without relying on variadic macros. Bit of a shame, there probably a good number of callsites that could actually benefit from that knowledge. Is it worth making that annotation conditional on variadic macro support? A quick test confirms that. With: diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index cbbda04..39cd809 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -212,7 +212,13 @@ extern int getinternalerrposition(void); * elog(ERROR, portal \%s\ not found, stmt-portalname); *-- */ +#ifdef __GNUC__ +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ + elog_finish(elevel, __VA_ARGS__), \ + ((elevel) = ERROR ? __builtin_unreachable() : (void) 0) +#else #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish +#endif extern void elog_start(const char *filename, int lineno, const char *funcname); extern void nearly the same code is generated for both variants (this is no additionally saved registers and such). Two unfinished things: * the above only checks for gcc although all C99 compilers should support varargs macros * It uses __builtin_unreachable() instead of abort() as that creates a smaller executable. That's gcc 4.5 only. Saves about 30kb in a both a stripped and unstripped binary. * elog(ERROR) i.e. no args would require more macro ugliness, but that seems unneccesary Doing the __builtin_unreachable() for ereport as well saves about 50kb. Given that some of those elog/ereports are in performance critical code paths it seems like a good idea to remove code from the callsites. Adding configure check for __VA_ARGS__ and __builtin_unreachable() should be simple enough? For whatever its worth - I am not sure its much - after relying on variadic macros we can make elog into one function call instead of two. That saves another 100kb. [tinker] The builtin_unreachable and combined elog thing bring a measurable performance benefit of a rather surprising 7% when running Pavel's testcase ontop of yesterdays HEAD. So it seems worth investigating. About 3% of that is the __builtin_unreachable() and about 4% the combining of elog_start/finish. Now that testcase sure isn't a very representative one for this, but I don't really see why it should benefit much more than other cases. Seems to be quite the benefit for relatively minor cost. Although I am pretty sure just copying the whole of elog_start/elog_finish into one elog_all() isn't the best way to go at this... The attached patch: * adds configure checks for __VA_ARGS__ and __builtin_unreachable support * provides a pg_unreachable() macro that expands to __builtin_unreachable() or abort() * changes #define elog ... into #define elog(elevel, ...) if varargs are available It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 94a3f93f4c8a381358a4dc52a43ec60a8f3e33f6 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Fri, 11 Jan 2013 16:58:17 +0100 Subject: [PATCH] Mark elog() as not returning if the capabilities of the C compiler allow it To do that: * Add a configure check for __VA_ARGS__ support * Provide pg_unreachable() macro which expands to__builtin_unreachable() if the compiler supports it, abort() otherwise Marking it as such improves code generation and reduces executable size. --- config/c-compiler.m4 | 41 + configure.in | 2 ++ src/include/c.h | 10 ++ src/include/utils/elog.h | 18 +++--- 4 files changed, 68 insertions(+), 3 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 7cbb8ec..f3efe72 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -161,6 +161,47 @@ fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_VA_ARGS +# --- +# Check if the C compiler understands C99 style variadic macros and define +# HAVE__VA_ARGS if so. +AC_DEFUN([PGAC_C_VA_ARGS], +[AC_CACHE_CHECK(for
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Andres Freund and...@2ndquadrant.com writes: The attached patch: * adds configure checks for __VA_ARGS__ and __builtin_unreachable support * provides a pg_unreachable() macro that expands to __builtin_unreachable() or abort() * changes #define elog ... into #define elog(elevel, ...) if varargs are available Seems like a good thing to do --- will review and commit. It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Since you've apparently already done the measurement: how much? It would be a bit tedious to support two different infrastructures for elog(), but if it's a big enough win maybe we should. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-11 14:01:40 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: The attached patch: * adds configure checks for __VA_ARGS__ and __builtin_unreachable support * provides a pg_unreachable() macro that expands to __builtin_unreachable() or abort() * changes #define elog ... into #define elog(elevel, ...) if varargs are available Seems like a good thing to do --- will review and commit. Thanks. I guess you will catch that anyway, but afaik msvc supports __VA_ARGS__ these days (since v2005 seemingly) and I didn't add HAVE__VA_ARGS to the respective pg_config.h.win32 It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Since you've apparently already done the measurement: how much? It would be a bit tedious to support two different infrastructures for elog(), but if it's a big enough win maybe we should. Imo its pretty definitely a big enough win. So big I have a hard time believing it can be true without negative effects somewhere else. Ontop of the patch youve replied to it saves somewhere around 80kb and between 0.8 (-S -M prepared), 2% (-S -M simple) and 4% (own stuff) I had lying around and it consistently gives 6%-7% in Pavel's testcase. With todays HEAD, not with the one before your fix. I attached the absolutely ugly and unready patch I used for testing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index e710f22..c1f3200 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -1151,6 +1151,43 @@ getinternalerrposition(void) return edata-internalpos; } +#if defined(HAVE__VA_ARGS) defined(COMBINE_ELOG) +void +elog_all(const char *filename, int lineno, const char *funcname, int elevel, const char *fmt,...) +{ + elog_start(filename, lineno, funcname); + { + ErrorData *edata = errordata[errordata_stack_depth]; + MemoryContext oldcontext; + + CHECK_STACK_DEPTH(); + + /* + * Do errstart() to see if we actually want to report the message. + */ + errordata_stack_depth--; + errno = edata-saved_errno; + if (!errstart(elevel, edata-filename, edata-lineno, edata-funcname, NULL)) + return; /* nothing to do */ + + /* + * Format error message just like errmsg_internal(). + */ + recursion_depth++; + oldcontext = MemoryContextSwitchTo(ErrorContext); + + EVALUATE_MESSAGE(edata-domain, message, false, false); + + MemoryContextSwitchTo(oldcontext); + recursion_depth--; + + /* + * And let errfinish() finish up. + */ + errfinish(0); + } +} +#endif /* * elog_start --- startup for old-style API diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index d6e1054..9af530f 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -216,7 +216,15 @@ extern int getinternalerrposition(void); * generated. *-- */ -#ifdef HAVE__VA_ARGS +#define COMBINE_ELOG +#if defined(HAVE__VA_ARGS) defined(COMBINE_ELOG) +extern void elog_all(const char *filename, int lineno, const char *funcname, int elevel, const char *fmt,...) +__attribute__((format(PG_PRINTF_ATTRIBUTE, 5, 6))); + +#define elog(elevel, ...)\ + elog_all(__FILE__, __LINE__, PG_FUNCNAME_MACRO, elevel, __VA_ARGS__), \ + ((elevel) = ERROR ? pg_unreachable() : (void) 0) +#elif defined(HAVE__VA_ARGS) #define elog(elevel, ...)\ elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ elog_finish(elevel, __VA_ARGS__),\ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Andres Freund and...@2ndquadrant.com writes: [ patch to mark elog() as non-returning if elevel = ERROR ] It strikes me that both in this patch, and in Peter's previous patch to do this for ereport(), there is an opportunity for improvement. Namely, that the added code only has any use if elevel is a constant, which in some places it isn't. We don't really need a call to abort() to be executed there, but the compiler knows no better and will have to generate code to test the value of elevel and call abort(). Which rather defeats the purpose of saving code; plus the compiler will still not have any knowledge that code after the ereport isn't reached. And another thing: what if the elevel argument isn't safe for multiple evaluation? No such hazard ever existed before these patches, so I'm not very comfortable with adding one. (Even if all our own code is safe, there's third-party code to worry about.) When we're using recent gcc, there is an easy fix, which is that the macros could do this: __builtin_constant_p(elevel) (elevel) = ERROR : pg_unreachable() : (void) 0 This gets rid of both useless code and the risk of undesirable multiple evaluation, while not giving up any optimization possibility that existed before. So I think we should add a configure test for __builtin_constant_p() and do it like that when possible. That still leaves the question of what to do when the compiler doesn't have __builtin_constant_p(). Should we use the coding that we have, or give up and not try to mark the macros as non-returning? I'm rather inclined to do the latter, because of the multiple-evaluation-risk argument. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-11 15:05:54 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: [ patch to mark elog() as non-returning if elevel = ERROR ] It strikes me that both in this patch, and in Peter's previous patch to do this for ereport(), there is an opportunity for improvement. Namely, that the added code only has any use if elevel is a constant, which in some places it isn't. We don't really need a call to abort() to be executed there, but the compiler knows no better and will have to generate code to test the value of elevel and call abort(). Which rather defeats the purpose of saving code; plus the compiler will still not have any knowledge that code after the ereport isn't reached. Yea, I think that's why __builtin_unreachable() is a performance benefit in comparison with abort(). Both are a benefit over not doing either btw in my measurements. And another thing: what if the elevel argument isn't safe for multiple evaluation? No such hazard ever existed before these patches, so I'm not very comfortable with adding one. (Even if all our own code is safe, there's third-party code to worry about.) Hm. I am not really too scared about those dangers I have to admit. If at all I am caring more for ereport than for elog. Placing code with side effects as elevel arguments just seems a bit too absurd. When we're using recent gcc, there is an easy fix, which is that the macros could do this: Recent as in 3.1 or so ;) Its really too bad that there's no __builtin_pure() or __builtin_side_effect_free() or somesuch :(. __builtin_constant_p(elevel) (elevel) = ERROR : pg_unreachable() : (void) 0 This gets rid of both useless code and the risk of undesirable multiple evaluation, while not giving up any optimization possibility that existed before. So I think we should add a configure test for __builtin_constant_p() and do it like that when possible. I think there might be code somewhere that decides between FATAL and PANIC which would not hit that path then. Imo not a good enough reason not to do it, but I wanted to mention it anyway. That still leaves the question of what to do when the compiler doesn't have __builtin_constant_p(). Should we use the coding that we have, or give up and not try to mark the macros as non-returning? I'm rather inclined to do the latter, because of the multiple-evaluation-risk argument. I dislike the latter somewhat as it would mean not to give that information at all to msvc and others which seems a bit sad. But I don't feel particularly strongly. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-11 15:52:19 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-11 15:05:54 -0500, Tom Lane wrote: And another thing: what if the elevel argument isn't safe for multiple evaluation? No such hazard ever existed before these patches, so I'm not very comfortable with adding one. (Even if all our own code is safe, there's third-party code to worry about.) Hm. I am not really too scared about those dangers I have to admit. I agree the scenario doesn't seem all that probable, but what scares me here is that if we use __builtin_constant_p(elevel) (elevel) = ERROR in some builds, and just (elevel) = ERROR in others, then if there is any code with a multiple-evaluation hazard, it is only buggy in the latter builds. That's sufficiently nasty that I'm willing to give up an optimization that we never had before 9.3 anyway. Well, why use it at all then and not just rely on __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort() otherwise? Then the code is small for anything recent (gcc 4.4 afair) and always consistently buggy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-11 16:16:58 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-11 15:52:19 -0500, Tom Lane wrote: I agree the scenario doesn't seem all that probable, but what scares me here is that if we use __builtin_constant_p(elevel) (elevel) = ERROR in some builds, and just (elevel) = ERROR in others, then if there is any code with a multiple-evaluation hazard, it is only buggy in the latter builds. That's sufficiently nasty that I'm willing to give up an optimization that we never had before 9.3 anyway. Well, why use it at all then and not just rely on __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort() otherwise? Then the code is small for anything recent (gcc 4.4 afair) and always consistently buggy. Uh ... because it's *not* unreachable if elevel ERROR. Otherwise we'd just mark errfinish as __attribute((noreturn)) and be done. Of course, that's a gcc-ism too. Well, I mean with the double evaluation risk. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-11 16:28:13 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-11 16:16:58 -0500, Tom Lane wrote: Uh ... because it's *not* unreachable if elevel ERROR. Otherwise we'd just mark errfinish as __attribute((noreturn)) and be done. Of course, that's a gcc-ism too. Well, I mean with the double evaluation risk. Oh, are you saying you don't want to make the __builtin_constant_p addition? Yes, that was what I wanted to say. I'm not very satisfied with that answer. Right now, Peter's patch has added a class of bugs that never existed before 9.3, and yours would add more. It might well be that those classes are empty ... but *we can't know that*. I don't think that keeping a new optimization for non-gcc compilers is worth that risk. Postgres is already full of gcc-only optimizations, anyway. ISTM that ereport() already has so strange behaviour wrt evaluation of arguments (i.e doing it only when the message is really logged) that its doesn't seem to add a real additional risk. It also means that we potentially need to add more quieting returns et al to keep the warning level on non-gcc compatible compilers low (probably only msvc matters here) which we all don't see on our primary compilers. Anyway, youve got a strong opinion and I don't, so ... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-10 00:05:07 +0100, Andres Freund wrote: On 2013-01-09 17:28:35 -0500, Tom Lane wrote: (We know this doesn't matter, but gcc doesn't; this version of gcc apparently isn't doing much with the knowledge that elog won't return.) Afaics one reason for that is that we don't have any such annotation for elog(), just for ereport. And I don't immediately see how it could be added to elog without relying on variadic macros. Bit of a shame, there probably a good number of callsites that could actually benefit from that knowledge. Is it worth making that annotation conditional on variadic macro support? A quick test confirms that. With: diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index cbbda04..39cd809 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -212,7 +212,13 @@ extern int getinternalerrposition(void); * elog(ERROR, portal \%s\ not found, stmt-portalname); *-- */ +#ifdef __GNUC__ +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ + elog_finish(elevel, __VA_ARGS__), \ + ((elevel) = ERROR ? __builtin_unreachable() : (void) 0) +#else #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish +#endif extern void elog_start(const char *filename, int lineno, const char *funcname); extern void nearly the same code is generated for both variants (this is no additionally saved registers and such). Two unfinished things: * the above only checks for gcc although all C99 compilers should support varargs macros * It uses __builtin_unreachable() instead of abort() as that creates a smaller executable. That's gcc 4.5 only. Saves about 30kb in a both a stripped and unstripped binary. * elog(ERROR) i.e. no args would require more macro ugliness, but that seems unneccesary Doing the __builtin_unreachable() for ereport as well saves about 50kb. Given that some of those elog/ereports are in performance critical code paths it seems like a good idea to remove code from the callsites. Adding configure check for __VA_ARGS__ and __builtin_unreachable() should be simple enough? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-10 10:31:04 +0100, Andres Freund wrote: On 2013-01-10 00:05:07 +0100, Andres Freund wrote: On 2013-01-09 17:28:35 -0500, Tom Lane wrote: (We know this doesn't matter, but gcc doesn't; this version of gcc apparently isn't doing much with the knowledge that elog won't return.) Afaics one reason for that is that we don't have any such annotation for elog(), just for ereport. And I don't immediately see how it could be added to elog without relying on variadic macros. Bit of a shame, there probably a good number of callsites that could actually benefit from that knowledge. Is it worth making that annotation conditional on variadic macro support? A quick test confirms that. With: diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index cbbda04..39cd809 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -212,7 +212,13 @@ extern int getinternalerrposition(void); * elog(ERROR, portal \%s\ not found, stmt-portalname); *-- */ +#ifdef __GNUC__ +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ + elog_finish(elevel, __VA_ARGS__), \ + ((elevel) = ERROR ? __builtin_unreachable() : (void) 0) +#else #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish +#endif extern void elog_start(const char *filename, int lineno, const char *funcname); extern void nearly the same code is generated for both variants (this is no additionally saved registers and such). Two unfinished things: * the above only checks for gcc although all C99 compilers should support varargs macros * It uses __builtin_unreachable() instead of abort() as that creates a smaller executable. That's gcc 4.5 only. Saves about 30kb in a both a stripped and unstripped binary. * elog(ERROR) i.e. no args would require more macro ugliness, but that seems unneccesary Doing the __builtin_unreachable() for ereport as well saves about 50kb. Given that some of those elog/ereports are in performance critical code paths it seems like a good idea to remove code from the callsites. Adding configure check for __VA_ARGS__ and __builtin_unreachable() should be simple enough? For whatever its worth - I am not sure its much - after relying on variadic macros we can make elog into one function call instead of two. That saves another 100kb. [tinker] The builtin_unreachable and combined elog thing bring a measurable performance benefit of a rather surprising 7% when running Pavel's testcase ontop of yesterdays HEAD. So it seems worth investigating. About 3% of that is the __builtin_unreachable() and about 4% the combining of elog_start/finish. Now that testcase sure isn't a very representative one for this, but I don't really see why it should benefit much more than other cases. Seems to be quite the benefit for relatively minor cost. Although I am pretty sure just copying the whole of elog_start/elog_finish into one elog_all() isn't the best way to go at this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 11:57:35 -0500, Tom Lane wrote: Andres Freund and...@anarazel.de writes: On 2013-01-09 11:37:46 -0500, Tom Lane wrote: I agree that what dirmod is doing is pretty ugly, but it's localized enough that I don't care particularly. (Really, the only reason it's a hack and not The Solution is that at the moment there's only one file doing it that way. A trampoline function for use only by code that needs to work in both frontend and backend isn't an unreasonable idea.) Well, after the patch the trampoline function would be palloc() itself. My objection is that you're forcing *all* backend code to go through the trampoline. That probably has a negative impact on performance, and if you think it'll get committed without a thorough proof that there's no such impact, you're mistaken. Perhaps you're not aware of exactly how hot a hot-spot palloc is? It's not unlikely that this patch could hurt backend performance by several percent all by itself. There is no additional overhead except some minor increase in code size. If you look at the patch palloc() now simply directly calls CurrentMemoryContext-methods-alloc. So there is no additional function call relative to the previous state. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 12:32:20 -0500, Tom Lane wrote: Andres Freund and...@anarazel.de writes: On 2013-01-09 11:57:35 -0500, Tom Lane wrote: My objection is that you're forcing *all* backend code to go through the trampoline. That probably has a negative impact on performance, and if you think it'll get committed without a thorough proof that there's no such impact, you're mistaken. Perhaps you're not aware of exactly how hot a hot-spot palloc is? It's not unlikely that this patch could hurt backend performance by several percent all by itself. There is no additional overhead except some minor increase in code size. If you look at the patch palloc() now simply directly calls CurrentMemoryContext-methods-alloc. So there is no additional function call relative to the previous state. Apparently we're failing to communicate, so let me put this as clearly as possible: an unsupported assertion that this patch has zero cost isn't worth the electrons it's written on. Well, I *did* benchmark it as noted elsewhere in the thread, but thats obviously just machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least its rather palloc heavy. Here are the numbers: before: #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 after: #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 So on my system if there is a difference, its positive (0.12%). In any case it's not clear to me why duplicating code like that is a less ugly approach than having different macro definitions for frontend and backend. Imo its better abstracted and more consistent (pfree() is not a macro, palloc() is) and actually makes it easier to redirect code somewhere else. It also opens the door for exposing less knowledge in utils/palloc.h. Anyway, we can use a macro instead, I don't think its that important. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but thats obviously just machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least its rather palloc heavy. Here are the numbers: before: #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 after: #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 So on my system if there is a difference, its positive (0.12%). pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this: samples| %| -- 108409 84.5083 /home/tgl/testversion/bin/postgres 13723 10.6975 /lib64/libc-2.14.90.so 3153 2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so samples %symbol name 1096010.1495 AllocSetAlloc 6325 5.8572 MemoryContextAllocZeroAligned 6225 5.7646 base_yyparse 3765 3.4866 copyObject 2511 2.3253 MemoryContextAlloc 2292 2.1225 grouping_planner 2044 1.8928 SearchCatCache 1956 1.8113 core_yylex 1763 1.6326 expression_tree_walker 1347 1.2474 MemoryContextCreate 1340 1.2409 check_stack_depth 1276 1.1816 GetCachedPlan 1175 1.0881 AllocSetFree 1106 1.0242 GetSnapshotData 1106 1.0242 _SPI_execute_plan 1101 1.0196 extract_query_dependencies_walker I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much: 107642 83.7427 /home/tgl/testversion/bin/postgres 14677 11.4183 /lib64/libc-2.14.90.so 3180 2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so samples %symbol name 10038 9.3537 AllocSetAlloc 6392 5.9562 MemoryContextAllocZeroAligned 5763 5.3701 base_yyparse 4810 4.4821 copyObject 2268 2.1134 grouping_planner 2178 2.0295 core_yylex 1963 1.8292 palloc 1867 1.7397 SearchCatCache 1835 1.7099 expression_tree_walker 1551 1.4453 check_stack_depth 1374 1.2803 _SPI_execute_plan 1282 1.1946 MemoryContextCreate 1187 1.1061 AllocSetFree ... 653 0.6085 palloc0 ... 552 0.5144 MemoryContextAlloc The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
I wrote: I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much: I studied the assembly code being generated for palloc(), and I believe I see the reason why it's a bit faster: when there's only a single local variable that has to survive over the elog call, gcc generates a shorter function entry/exit sequence. I had thought of proposing that we code palloc() like this: void * palloc(Size size) { MemoryContext context = CurrentMemoryContext; AssertArg(MemoryContextIsValid(context)); if (!AllocSizeIsValid(size)) elog(ERROR, invalid memory alloc request size %lu, (unsigned long) size); context-isReset = false; return (*context-methods-alloc) (context, size); } but at least on this specific hardware and compiler that would evidently be a net loss compared to direct use of CurrentMemoryContext. I would not put a lot of faith in that result holding up on other machines though. In any case this doesn't explain the whole 2% speedup, but it probably accounts for palloc() showing as slightly cheaper than MemoryContextAlloc had been in the oprofile listing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 15:43:19 -0500, Tom Lane wrote: I wrote: I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much: I studied the assembly code being generated for palloc(), and I believe I see the reason why it's a bit faster: when there's only a single local variable that has to survive over the elog call, gcc generates a shorter function entry/exit sequence. Makes sense. I had thought of proposing that we code palloc() like this: void * palloc(Size size) { MemoryContext context = CurrentMemoryContext; AssertArg(MemoryContextIsValid(context)); if (!AllocSizeIsValid(size)) elog(ERROR, invalid memory alloc request size %lu, (unsigned long) size); context-isReset = false; return (*context-methods-alloc) (context, size); } but at least on this specific hardware and compiler that would evidently be a net loss compared to direct use of CurrentMemoryContext. I would not put a lot of faith in that result holding up on other machines though. Thats not optimized to the same? ISTM the compiler should produce exactly the same code for both. In any case this doesn't explain the whole 2% speedup, but it probably accounts for palloc() showing as slightly cheaper than MemoryContextAlloc had been in the oprofile listing. I'd guess that a good part of the cost is just smeared across all callers and not individually accountable to any function visible in the profile. Additionally, With functions as short as MemoryContextAllocZero profiles like oprofile (and perf) also often leak quite a bit of the actual cost to the callsites in my experience. I wonder whether it makes sense to inline the contents pstrdup() additionally? My gut feeling is not, but... I would like to move CurrentMemoryContext to memutils.h, but that seems to require too many changes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 15:07:10 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but thats obviously just machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least its rather palloc heavy. Here are the numbers: before: #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 after: #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 So on my system if there is a difference, its positive (0.12%). pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. I chose it because I looked at profiles of it often enough to know memory allocation shows up high in the profile (especially without -M prepared). I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. I only have access to core2 and Nehalem atm, but FWIW I just tested it on another workload (decoding WAL changes of a large transaction into text) and I see improvements around 1.2% on both. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
On 2013-01-09 17:28:35 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-09 15:43:19 -0500, Tom Lane wrote: I had thought of proposing that we code palloc() like this: void * palloc(Size size) { MemoryContext context = CurrentMemoryContext; AssertArg(MemoryContextIsValid(context)); if (!AllocSizeIsValid(size)) elog(ERROR, invalid memory alloc request size %lu, (unsigned long) size); context-isReset = false; return (*context-methods-alloc) (context, size); } but at least on this specific hardware and compiler that would evidently be a net loss compared to direct use of CurrentMemoryContext. I would not put a lot of faith in that result holding up on other machines though. Thats not optimized to the same? ISTM the compiler should produce exactly the same code for both. No, that's exactly the point here, you can't just assume that you get the same code; this is tense enough that a few instructions matter. Remember we're considering non-assert builds, so the AssertArg vanishes. So in the form where CurrentMemoryContext is only read after the elog call, the compiler can see that it requires but one fetch - it can't change within the two lines where it's used. In the form given above, the compiler is required to fetch CurrentMemoryContext into a local variable and keep it across the elog call. I had thought that it could simply store the address in a callee saved register inside palloc, totally forgetting that palloc would need to save that register itself in that case. (We know this doesn't matter, but gcc doesn't; this version of gcc apparently isn't doing much with the knowledge that elog won't return.) Afaics one reason for that is that we don't have any such annotation for elog(), just for ereport. And I don't immediately see how it could be added to elog without relying on variadic macros. Bit of a shame, there probably a good number of callsites that could actually benefit from that knowledge. Is it worth making that annotation conditional on variadic macro support? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers