Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Mon, Jan 5, 2015 at 8:42 PM, Atri Sharma atri.j...@gmail.com wrote: Hi All, Please forgive if this is a repost. Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically: CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e; This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series). This allows for eliminating sorts that can be based off the known existing preorder. For eg: SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since existing preorder is known. Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large. The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order. So, for above example EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e; QUERY PLAN --- OrderCheck - Function Scan on correct_order_multicol (2 rows) If order of result rows is not the same as required, an error is raised: SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST; ERROR: Order not same as specified Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc. This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data). The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule. We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function. In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome. I will add the patch to current commitfest. Thoughts? Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule. We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function. That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote: The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule. We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function. That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user. Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not. Regards, Atri -- Regards, Atri *l'apprenant* -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: On 06-01-2015 PM 04:00, Ashutosh Bapat wrote: On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote: We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function. That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user. Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not. Sounds something like ORDERED BY x implying output is known ordered by x perhaps enough hint for the planner to make necessary plan choices though I may be wrong. I may be missing something, but isnt what you mentioned the exact functionality this patch adds?
Re: [HACKERS] pg_rewind in contrib
On Tue, Jan 6, 2015 at 2:38 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's an updated version of pg_rewind. The code itself is the same as before, and corresponds to the sources in the current pg_rewind github repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based mostly on Michael's comments, I have: * replaced VMware copyright notices with PGDG ones. * removed unnecessary cruft from .gitignore * changed the --version line and report bugs notice in --help to match other binaries in the PostgreSQL distribution * moved documentation from README to the user manual. * minor fixes to how the regression tests are launched so that they work again Some more work remains to be done on the regression tests. The way they're launched now is quite weird. It was written that way to make it work outside the PostgreSQL source tree, and also on Windows. Now that it lives in contrib, it should be redesigned. The documentation could also use some work; for now I just converted the existing text from README to sgml format. Some more comments: - The binary pg_rewind needs to be ignored in contrib/pg_rewind/ - Be careful that ./launcher permission should be set to 755 when doing a git add in it... Or regression tests will fail. - It would be good to get make check working so as it includes both check-local and check-remote - installcheck should be a target ignored. - Nitpicking: the header formats of filemap.c, datapagemap.c, datapagemap.h and util.h are incorrect (I pushed a fix about that in pg_rewind itself, feel free to pick it up). - parsexlog.c has a copyright mention to Nippon Telegraph and Telephone Corporation. Cannot we drop it safely? - MSVC build is not supported yet. You need to do something similar to pg_xlogdump, aka some magic with for example xlogreader.c. - Error codes needs to be generated before building pg_rewind. If I do for example a simple configure followed by make I get a failure: $ ./configure $ cd contrib/pg_rewind make In file included from parsexlog.c:16: In file included from ../../src/include/postgres.h:48: ../../src/include/utils/elog.h:69:10: fatal error: 'utils/errcodes.h' file not found #include utils/errcodes.h - Build fails with MinGW as there is visibly some unportable code: copy_fetch.c: In function 'recurse_dir': copy_fetch.c:112:3: warning: implicit declaration of function 'S_ISLNK' [-Wimpli cit-function-declaration] else if (S_ISLNK(fst.st_mode)) ^ copy_fetch.c: In function 'check_samefile': copy_fetch.c:298:2: warning: passing argument 2 of '_fstat64i32' from incompatib le pointer type [enabled by default] if (fstat(fd1, statbuf1) 0) ^ In file included from ../../src/include/port.h:283:0, from ../../src/include/c.h:1050, from ../../src/include/postgres_fe.h:25, from copy_fetch.c:10: c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg ument is of type 'struct stat *' __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat) { ^ copy_fetch.c:304:2: warning: passing argument 2 of '_fstat64i32' from incompatib le pointer type [enabled by default] if (fstat(fd2, statbuf2) 0) ^ In file included from ../../src/include/port.h:283:0, from ../../src/include/c.h:1050, from ../../src/include/postgres_fe.h:25, from copy_fetch.c:10: c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg ument is of type 'struct stat *' __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat) { ^ copy_fetch.c: In function 'truncate_target_file': copy_fetch.c:436:2: warning: implicit declaration of function 'truncate' [-Wimpl icit-function-declaration] if (truncate(dstpath, newsize) != 0) ^ copy_fetch.c: In function 'slurpFile': copy_fetch.c:561:2: warning: passing argument 2 of '_fstat64i32' from incompatib le pointer type [enabled by default] if (fstat(fd, statbuf) 0) ^ In file included from ../../src/include/port.h:283:0, from ../../src/include/c.h:1050, from ../../src/include/postgres_fe.h:25, from copy_fetch.c:10: c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg ument is of type 'struct stat *' __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat) { ^ gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f wrapv -fexcess-precision=standard -g -DG -fno-omit-frame-pointer -I../../src/int erfaces/libpq -I. -I. -I../../src/include -DFRONTEND -I../../src/include/port/ win32 -c -o libpq_fetch.o libpq_fetch.c -MMD -MP -MF .deps/libpq_fetch.Po gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We ndif-labels
Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple
On Tue, Jan 6, 2015 at 12:39 AM, Manuel Kniep man...@adjust.com wrote: Hi, we are running postges 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc 4.8.3 Any ideas ? #17 0x0062bb9d in SPI_execute_with_args ( src=0x22b880bb0 \nCREATE TEMPORARY TABLE [...] #33 0x7f363555ab97 in plpgsql_exec_function (func=0xd888c8, fcinfo=0x7aa89a60) at pl_exec.c:321 #34 0x7f3632be in plpgsql_call_handler (fcinfo=0x7aa89a60) at pl_handler.c:129 [...] #46 0x0072e4eb in exec_simple_query ( query_string=0xd633b0 SELECT 'event' as item, '2014-12-30' as date, 'Backends::Backend9' as backend, '33' as bucket, * FROM materialize_events('2014-11-20', '2014-12-30')) at postgres.c:1048 From the backtrace you are showing, you are creating a temporary table with CREATE TABLE AS within a plpgsql function. Could you provide a self-contained test case? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote: Attached is v3 which besides the fixes mentioned above also includes changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for crash with FETCH FIRST and is rebased against current master. This patch needs a rebase, there is a small conflict in parallel_schedule. Structurally speaking, I think that the tsm methods should be added in src/backend/utils and not src/backend/access which is more high-level as tsm_bernoulli.c and tsm_system.c contain only a set of new procedure functions. Having a single header file tsm.h would be also a good thing. Regarding the naming, is tsm (table sample method) really appealing? Wouldn't it be better to use simply tablesample_* for the file names and the method names? This is a large patch... Wouldn't sampling.[c|h] extracted from ANALYZE live better as a refactoring patch? This would limit a bit bug occurrences on the main patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On Fri, Dec 19, 2014 at 3:53 PM, Noah Misch n...@leadboat.com wrote: localhost template1=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from generate_series(0,7) t(n); clock_timestamp| pg_sleep ---+-- 2014-12-18 08:34:34.522126+00 | 2014-12-18 08:34:34.522126+00 | 2014-12-18 08:34:34.631508+00 | 2014-12-18 08:34:34.631508+00 | 2014-12-18 08:34:34.74089+00 | 2014-12-18 08:34:34.74089+00 | 2014-12-18 08:34:34.850272+00 | 2014-12-18 08:34:34.850272+00 | (8 rows) So, we would need additional information other than the node ID *and* the timestamp to ensure proper transaction commit ordering on Windows. That's not cool and makes this feature very limited on this platform. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On 18-12-2014 AM 04:52, Robert Haas wrote: On Wed, Dec 17, 2014 at 1:53 PM, Josh Berkus j...@agliodbs.com wrote: Sure. But there's a big difference between we're going to take these steps and that problem will be fixable eventually and we're going to retain features of the current partitioning system which make that problem impossible to fix. The drift of discussion on this thread *sounded* like the latter, and I've been calling attention to the issue in an effort to make sure that it's not. Last week, I wrote a longish email listing out the common problems users have with our current partitioning as a way of benchmarking the plan for new partitioning. While some people responded to that post, absolutely nobody discussed the list of issues I gave. Is that because there's universal agreement that I got the major issues right? Seems doubtful. I agreed with many of the things you listed but not all of them. However, I don't think it's realistic to burden whatever patch Amit writes with the duty of, for example, making global indexes work. That's a huge problem all of its own. Now, conceivably, we could try to solve that as part of the next patch by insisting that the partitions have to really be block number ranges within a single relfilenode rather than separate relfilenodes as they are today ... but I think that's a bad design which we would likely regret bitterly. I also think that it would likely make what's being talked about here so complicated that it will never go anywhere. I think it's better that we focus on solving one problem really well - storing metadata for partition boundaries in the catalog so that we can do efficient tuple routing and partition pruning - and leave the other problems for later. Yes, I think partitioning as a whole is a BIG enough project that we need to tackle it as a series of steps each of which is a discussion of its own. The first step might as well be discussing how we represent a partitioned table. We have a number of design decisions to make during this step itself and we would definitely want to reach a consensus on these points. Things like where we indicate if a table is partitioned (pg_class), what the partition key looks like, where it is stored, what the partition definition looks like, where it is stored, how we represent arbitrary number of levels in partitioning hierarchy, how we implement that only leaf level relations in a hierarchy have storage, what are implications of all these choices, etc. Some of these points are being discussed. I agree that while we are discussing these points, we could also be discussing how we solve problems of existing partitioning implementation using whatever the above things end up being. Proposed approaches to solve those problems might be useful to drive the first step as well or perhaps that's how it should be done anyway. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On 06-01-2015 PM 04:00, Ashutosh Bapat wrote: On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote: We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function. That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user. Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not. Sounds something like ORDERED BY x implying output is known ordered by x perhaps enough hint for the planner to make necessary plan choices though I may be wrong. Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Tue, Jan 6, 2015 at 12:30 PM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote: Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not. That implies possibly returning a non ordered result set even when the user explicitly specified an ORDER BY clause. If we are depending on an optimization and it did not work out (even if it is a user error), I think we should error out indicating that the order was incorrect rather than returning non ordered rows, which could be disastrous IMO.
[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
On Mon, Jan 5, 2015 at 5:53 PM, Peter Geoghegan p...@heroku.com wrote: It's not all that easy to recreate, even with my custom instrumentation. With fsync=off, it occurs somewhat infrequently with a custom instrumentation Postgres build: pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10 [Mon Jan 5 17:31:57 2015] init done at count_upsert.pl line 101. [Mon Jan 5 17:32:17 2015] child abnormal exit [Mon Jan 5 17:32:17 2015] DBD::Pg::db selectall_arrayref failed: ERROR: mismatch in xmin for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line 210.\n at count_upsert.pl line 227. [Mon Jan 5 17:32:49 2015] sum is -800 [Mon Jan 5 17:32:49 2015] count is 9515 [Mon Jan 5 17:32:49 2015] normal exit at 1420507969 after 733912 items processed at count_upsert.pl line 182. I think that super deleting broken promise tuples undermines how vacuum interlocking is supposed to work [1]. Hmm. On second thought, I think that the instrumentation I added here can be confused by redirecting line pointers, and that may account for this apparent problem. Still, I would like for us to figure out how it was previously possible for the implementation to update a super-deleted tuple (resulting in unusual all-NULLs tuples) since I though that in order for that to happen, other xacts would have to see what was only ever a promise tuple as a fully-fledged tuple -- in order for the logic that pre-checks for conflicts to find a conflict, it would have to find the tuple already committed (so it certainly couldn't be an unresolved promise tuple). My superficial fix [1] was written without fully understanding what the problem was. In any case, I doubt the changes that I made to tqual.c in that fix will be accepted as-is. [1] http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 01/04/2015 11:44 PM, Josh Berkus wrote: On 01/03/2015 12:56 AM, Heikki Linnakangas wrote: On 01/03/2015 12:28 AM, Josh Berkus wrote: On 01/02/2015 01:57 AM, Heikki Linnakangas wrote: wal_keep_segments does not affect the calculation of CheckPointSegments. If you set wal_keep_segments high enough, checkpoint_wal_size will be exceeded. The other alternative would be to force a checkpoint earlier, i.e. lower CheckPointSegments, so that checkpoint_wal_size would be honored. However, if you set wal_keep_segments high enough, higher than checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no matter how frequently you checkpoint. So you're saying that wal_keep_segments is part of the max_wal_size total, NOT in addition to it? Not sure what you mean. wal_keep_segments is an extra control that can prevent WAL segments from being recycled. It has the same effect as archive_command failing for N most recent segments, if that helps. I mean, if I have these settings: max_wal_size* = 256MB wal_keep_segments = 8 ... then my max wal size is *still* 256MB, NOT 384MB? Right. If that's the case (and I think it's a good plan), then as a follow-on, we should prevent users from setting wal_keep_segments to more than 50% of max_wal_size, no? Not sure if the 50% figure is correct, but I see what you mean: don't allow setting wal_keep_segments so high that we would exceed max_wal_size because of it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Here is a v5 with the vararg macro expanded. -- Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore index 489a2d6..aae819e 100644 --- a/contrib/pgbench/.gitignore +++ b/contrib/pgbench/.gitignore @@ -1 +1,3 @@ +/exprparse.c +/exprscan.c /pgbench diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile index b8e2fc8..2d4033b 100644 --- a/contrib/pgbench/Makefile +++ b/contrib/pgbench/Makefile @@ -4,7 +4,9 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests PGAPPICON = win32 PROGRAM = pgbench -OBJS = pgbench.o $(WIN32RES) +OBJS = pgbench.o exprparse.o $(WIN32RES) + +EXTRA_CLEAN = exprparse.c exprscan.c PG_CPPFLAGS = -I$(libpq_srcdir) PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS) @@ -18,8 +20,22 @@ subdir = contrib/pgbench top_builddir = ../.. include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk + +distprep: exprparse.c exprscan.c endif ifneq ($(PORTNAME), win32) override CFLAGS += $(PTHREAD_CFLAGS) endif + +# There is no correct way to write a rule that generates two files. +# Rules with two targets don't have that meaning, they are merely +# shorthand for two otherwise separate rules. To be safe for parallel +# make, we must chain the dependencies like this. The semicolon is +# important, otherwise make will choose the built-in rule for +# gram.y=gram.c. + +exprparse.h: exprparse.c ; + +# exprscan is compiled as part of exprparse +exprparse.o: exprscan.c diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y new file mode 100644 index 000..243c6b9 --- /dev/null +++ b/contrib/pgbench/exprparse.y @@ -0,0 +1,96 @@ +%{ +/*- + * + * exprparse.y + * bison grammar for a simple expression syntax + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + *- + */ + +#include postgres_fe.h + +#include pgbench.h + +PgBenchExpr *expr_parse_result; + +static PgBenchExpr *make_integer_constant(int64 ival); +static PgBenchExpr *make_variable(char *varname); +static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, + PgBenchExpr *rexpr); + +%} + +%expect 0 +%name-prefix=expr_yy + +%union +{ + int64 ival; + char *str; + PgBenchExpr *expr; +} + +%type expr expr +%type ival INTEGER +%type str VARIABLE +%token INTEGER VARIABLE +%token CHAR_ERROR /* never used, will raise a syntax error */ + +%left '+' '-' +%left '*' '/' '%' +%right UMINUS + +%% + +result: expr{ expr_parse_result = $1; } + +expr: '(' expr ')' { $$ = $2; } + | '+' expr %prec UMINUS { $$ = $2; } + | '-' expr %prec UMINUS { $$ = make_op('-', make_integer_constant(0), $2); } + | expr '+' expr { $$ = make_op('+', $1, $3); } + | expr '-' expr { $$ = make_op('-', $1, $3); } + | expr '*' expr { $$ = make_op('*', $1, $3); } + | expr '/' expr { $$ = make_op('/', $1, $3); } + | expr '%' expr { $$ = make_op('%', $1, $3); } + | INTEGER{ $$ = make_integer_constant($1); } + | VARIABLE { $$ = make_variable($1); } + ; + +%% + +static PgBenchExpr * +make_integer_constant(int64 ival) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_INTEGER_CONSTANT; + expr-u.integer_constant.ival = ival; + return expr; +} + +static PgBenchExpr * +make_variable(char *varname) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_VARIABLE; + expr-u.variable.varname = varname; + return expr; +} + +static PgBenchExpr * +make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_OPERATOR; + expr-u.operator.operator = operator; + expr-u.operator.lexpr = lexpr; + expr-u.operator.rexpr = rexpr; + return expr; +} + +#include exprscan.c diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l new file mode 100644 index 000..4c9229c --- /dev/null +++ b/contrib/pgbench/exprscan.l @@ -0,0 +1,105 @@ +%{ +/*- + * + * exprscan.l + * a lexical scanner for a simple expression syntax + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + *- + */ + +/* line and column number for error reporting */ +static int yyline = 0, yycol = 0; + +/* Handles to the buffer that the lexer uses internally */ +static YY_BUFFER_STATE scanbufhandle; +static char
Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments
On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-05 16:22:56 +0900, Fujii Masao wrote: On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-03 16:03:36 +0100, Andres Freund wrote: pg_basebackup really changed heavily since it's introduction. And desparately needs some restructuring. The patch seems to break pg_receivexlog. I got the following error message while running pg_receivexlog. pg_receivexlog: could not create archive status file mmm/archive_status/00010003.done: No such file or directory Dang. Stupid typo. And my tests didn't catch it, because I had archive_directory in the target directory :( At least it's only broken in master :/ Thanks for the catch. Do you have some additional testsuite or did you catch it manually? Manually... I just tested the tools and options which the patch may affect... Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 2015-01-05 11:34:54 +0200, Heikki Linnakangas wrote: On 01/04/2015 11:44 PM, Josh Berkus wrote: On 01/03/2015 12:56 AM, Heikki Linnakangas wrote: On 01/03/2015 12:28 AM, Josh Berkus wrote: On 01/02/2015 01:57 AM, Heikki Linnakangas wrote: wal_keep_segments does not affect the calculation of CheckPointSegments. If you set wal_keep_segments high enough, checkpoint_wal_size will be exceeded. The other alternative would be to force a checkpoint earlier, i.e. lower CheckPointSegments, so that checkpoint_wal_size would be honored. However, if you set wal_keep_segments high enough, higher than checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no matter how frequently you checkpoint. So you're saying that wal_keep_segments is part of the max_wal_size total, NOT in addition to it? Not sure what you mean. wal_keep_segments is an extra control that can prevent WAL segments from being recycled. It has the same effect as archive_command failing for N most recent segments, if that helps. I mean, if I have these settings: max_wal_size* = 256MB wal_keep_segments = 8 ... then my max wal size is *still* 256MB, NOT 384MB? Right. With that you mean that wal_keep_segments has *no* influence over checkpoint pacing or the contrary? Because upthread you imply that it doesn't, but later comments may mean the contrary. I think that influencing the pacing would be pretty insane - the user certainly doesn't expect drastic performance changes when changing wal_keep_segments. It's confusing enough that it can cause slight peformance variations due to recycling, but we shouldn't make it have a larger influence. If that's the case (and I think it's a good plan), then as a follow-on, we should prevent users from setting wal_keep_segments to more than 50% of max_wal_size, no? Not sure if the 50% figure is correct, but I see what you mean: don't allow setting wal_keep_segments so high that we would exceed max_wal_size because of it. That seems a unrealistic goal. I've seen setups that have set checkpoint_segments intentionally, and with good reasoning, north of 50k. Neither wal_keep_segments, nor failing archive_commands nor replication slot should have an influence on checkpoint pacing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
I've noticed that I missed to add this to the commitfest. I've just added it. It is not meant to end up in a committable state, but at this point I'm searching for some code review and more discusison. I'm also about to send an additional patch to implement an LSN map as an additional fork for heap files. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it signature.asc Description: OpenPGP digital signature
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
I've noticed that I missed to add this to the commitfest. I've just added it. It is not meant to end up in a committable state, but at this point I'm searching for some code review and more discusison. I'm also about to send an additional patch to implement an LSN map as an additional fork for heap files. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Something is broken in logical decoding with CLOBBER_CACHE_ALWAYS
On 2015-01-04 21:54:40 +0100, Andres Freund wrote: On January 4, 2015 9:51:43 PM CET, Andrew Dunstan and...@dunslane.net wrote: On 12/15/2014 12:04 PM, Andres Freund wrote: I think the safest fix would be to defer catchup interrupt processing while you're in this mode. You don't really want to be processing any remote sinval messages at all, I'd think. Well, we need to do relmap, smgr and similar things. So I think that'd be more complicated than we want. Where are we on this? Traffic seems to have gone quite but we still have a bunch of buildfarm animals red. I've a simple fix (similar too what I iriginally outkined) which I plan to post soonish. I've tried a bunch of things roughly in the vein of Tom's suggestions, but they all are more invasive and still incomplete. Attached. Note that part 1) referenced in the commit message is actually problematic in all branches. I think we actually should backpatch that part all the way, because if we ever hit that case the consequences of the current coding will be rather hard to analyze. If others think so as well, I'm going to split the commit into two parts, so commit messages for 9.4 won't reference logical decoding. Since there hasn't been a report of that error for a long while (~8.1 era), I can also live with not backpatching further than 9.4. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From e5a6c2e9211561bc5c9985a8bf107d9911aad9f6 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 5 Jan 2015 11:52:05 +0100 Subject: [PATCH] Improve handling of relcache invalidations to currently invisible relations. The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. 1) The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that we leak some memory. 2) When using a historic snapshot for logical decoding it can validly happen that a relation that's in the relcache isn't visible to that historic snapshot. E.g. if the relation is referenced in the query that uses the SQL interface for logical decoding. While erroring cleanly, instead of hitting an assertion due to 1) is already an improvement, it's not good enough. So additionally allow that case without an error if a historic snapshot is set up - that won't allow an invalid entry to stay in the cache because it's a) already marked invalid and will thus be rebuilt during the next access b) the syscaches will be reset at the end of decoding. There might be prettier solutions to handle this case, but all that we could think of so far end up being much more complex than this simple fix. This fixes the assertion failures reported by the buildfarm (markhor, tick, leech) after the introduction of new regression tests in 89fd41b390a4. The failure there weren't actually directly caused by CLOBBER_CACHE_ALWAYS but via the long runtimes due to it. Backpatch the whole patch to 9.4, and 1) from above all the way back. --- src/backend/utils/cache/relcache.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index c2e574d..2cce2c1 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2192,9 +2192,24 @@ RelationClearRelation(Relation relation, bool rebuild) newrel = RelationBuildDesc(save_relid, false); if (newrel == NULL) { - /* Should only get here if relation was deleted */ - RelationCacheDelete(relation); - RelationDestroyRelation(relation, false); + /* + * We can validly get here, if we're using a historic snapshot in + * which a relation, accessed from outside logical decoding, is + * still invisible. In that case it's fine to just mark the + * relation as invalid and return - it'll fully get reloaded after + * by the cache reset at the end of logical decoding. + */ + if (HistoricSnapshotActive()) +return; + + /* + * This situation shouldn't happen as dropping a relation + * shouldn't be possible if still referenced + * (c.f. CheckTableNotInUse()). But if get here anyway, we can't + * just delete the relcache entry as it's possibly still + * referenced and the error might get caught and handled via a + * subtransaction rollback. + */ elog(ERROR, relation %u
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote: Tue, 30 Dec 2014 21:39:33 +0900 от Michael Paquier michael.paqu...@gmail.com: As I understand now = (pg_time_t) time(NULL); return time in seconds, what is why I multiply value to 1000 to compare with restore_command_retry_interval in milliseconds. This way of doing is incorrect, you would get a wait time of 1s even for retries lower than 1s. In order to get the feature working correctly and to get a comparison granularity sufficient, you need to use TimetampTz for the tracking of the current and last failure time and to use the APIs from TimestampTz for difference calculations. I am not sure about small retry interval of time, in my cases I need interval bigger 5 seconds (20-40 seconds). Right now I limiting this value be bigger 100 milliseconds. Other people may disagree here, but I don't see any reason to put restrictions to this interval of time. Attached is a patch fixing the feature to use TimestampTz, updating as well the docs and recovery.conf.sample which was incorrect, on top of other things I noticed here and there. Alexey, does this new patch look fine for you? -- Michael From 9bb71e39b218885466a53c005a87361d4ae889fa Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 5 Jan 2015 17:10:13 +0900 Subject: [PATCH] Add restore_command_retry_interval to control retries of restore_command restore_command_retry_interval can be specified in recovery.conf to control the interval of time between retries of restore_command when failures occur. --- doc/src/sgml/recovery-config.sgml | 21 + src/backend/access/transam/recovery.conf.sample | 9 ++ src/backend/access/transam/xlog.c | 42 - 3 files changed, 64 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 0c64ff2..0488ae3 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -145,6 +145,27 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows /listitem /varlistentry + varlistentry id=restore-command-retry-interval xreflabel=restore_command_retry_interval + termvarnamerestore_command_retry_interval/varname (typeinteger/type) + indexterm +primaryvarnamerestore_command_retry_interval/ recovery parameter/primary + /indexterm + /term + listitem + para +If varnamerestore_command/ returns nonzero exit status code, retry +command after the interval of time specified by this parameter, +measured in milliseconds if no unit is specified. Default value is +literal5s/. + /para + para +This command is particularly useful for warm standby configurations +to leverage the amount of retries done to restore a WAL segment file +depending on the activity of a master node. + /para + /listitem + /varlistentry + /variablelist /sect1 diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample index b777400..8699a33 100644 --- a/src/backend/access/transam/recovery.conf.sample +++ b/src/backend/access/transam/recovery.conf.sample @@ -58,6 +58,15 @@ # #recovery_end_command = '' # +# +# restore_command_retry_interval +# +# specifies an optional retry interval for restore_command command, if +# previous command returned nonzero exit status code. This can be useful +# to increase or decrease the number of calls of restore_command. +# +#restore_command_retry_interval = '5s' +# #--- # RECOVERY TARGET PARAMETERS #--- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5cc7e47..1944e6f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime; static char *recoveryTargetName; static int recovery_min_apply_delay = 0; static TimestampTz recoveryDelayUntilTime; +static int restore_command_retry_interval = 5000; /* options taken from recovery.conf for XLOG streaming */ static bool StandbyModeRequested = false; @@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void) (errmsg_internal(trigger_file = '%s', TriggerFile))); } + else if (strcmp(item-name, restore_command_retry_interval) == 0) + { + const char *hintmsg; + + if (!parse_int(item-value, restore_command_retry_interval, GUC_UNIT_MS, + hintmsg)) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(parameter \%s\ requires a temporal value, +restore_command_retry_interval), + hintmsg ? errhint(%s, _(hintmsg)) : 0)); + ereport(DEBUG2, + (errmsg_internal(restore_command_retry_interval
Re: [HACKERS] add modulo (%) operator to pgbench
On 1 January 2015 at 21:23, Fabien COELHO coe...@cri.ensmp.fr wrote: I was about to go and look at this, but I had a problem when attempting to compile with MSVC. Thanks! Here is a v4 which includes your fix. I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Regards David Rowley
Re: [HACKERS] add modulo (%) operator to pgbench
I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Indeed. The simplest solution seems to expand this macro so that these is no macro:-) I'll do that. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Sat, Jan 3, 2015 at 1:52 AM, Bruce Momjian br...@momjian.us wrote: On Fri, Jan 2, 2015 at 10:15:57AM -0600, k...@rice.edu wrote: On Fri, Jan 02, 2015 at 01:01:06PM +0100, Andres Freund wrote: On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote: I still don't understand the value of adding WAL compression, given the high CPU usage and minimal performance improvement. The only big advantage is WAL storage, but again, why not just compress the WAL file when archiving. before: pg_xlog is 800GB after: pg_xlog is 600GB. I'm damned sure that many people would be happy with that, even if the *per backend* overhead is a bit higher. And no, compression of archives when archiving helps *zap* with that (streaming, wal_keep_segments, checkpoint_timeout). As discussed before. Greetings, Andres Freund +1 On an I/O constrained system assuming 50:50 table:WAL I/O, in the case above you can process 100GB of transaction data at the cost of a bit more CPU. OK, so given your stats, the feature give a 12.5% reduction in I/O. If that is significant, shouldn't we see a performance improvement? If we don't see a performance improvement, is I/O reduction worthwhile? Is it valuable in that it gives non-database applications more I/O to use? Is that all? I suggest we at least document that this feature as mostly useful for I/O reduction, and maybe say CPU usage and performance might be negatively impacted. OK, here is the email I remember from Fujii Masao this same thread that showed a performance improvement for WAL compression: http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT=hnr7ly516kft5ffqf4pp1q...@mail.gmail.com Why are we not seeing the 33% compression and 15% performance improvement he saw? Because the benchmarks I and Michael used are very difffernet. I just used pgbench, but he used his simple test SQLs (see http://www.postgresql.org/message-id/cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com). Furthermore, the data type of pgbench_accounts.filler column is character(84) and its content is empty, so pgbench_accounts is very compressible. This is one of the reasons I could see good performance improvement and high compression ratio. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Sat, Jan 3, 2015 at 2:24 AM, Bruce Momjian br...@momjian.us wrote: On Fri, Jan 2, 2015 at 02:18:12PM -0300, Claudio Freire wrote: On Fri, Jan 2, 2015 at 2:11 PM, Andres Freund and...@2ndquadrant.com wrote: , I now see the compression patch as something that has negatives, so has to be set by the user, and only wins in certain cases. I am disappointed, and am trying to figure out how this became such a marginal win for 9.5. :-( I find the notion that a multi digit space reduction is a marginal win pretty ridiculous and way too narrow focused. Our WAL volume is a *significant* problem in the field. And it mostly consists out of FPWs spacewise. One thing I'd like to point out, is that in cases where WAL I/O is an issue (ie: WAL archiving), usually people already compress the segments during archiving. I know I do, and I know it's recommended on the web, and by some consultants. So, I wouldn't want this FPW compression, which is desirable in replication scenarios if you can spare the CPU cycles (because of streaming), adversely affecting WAL compression during archiving. To be specific, desirable in streaming replication scenarios that don't use SSL compression. (What percentage is that?) It is something we should mention in the docs for this feature? Even if SSL is used in replication, FPW compression is useful. It can reduce the amount of I/O in the standby side. Sometimes I've seen walreceiver's I/O had become a performance bottleneck especially in synchronous replication cases. FPW compression can be useful for those cases, for example. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments
On 2015-01-05 16:22:56 +0900, Fujii Masao wrote: On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-03 16:03:36 +0100, Andres Freund wrote: pg_basebackup really changed heavily since it's introduction. And desparately needs some restructuring. The patch seems to break pg_receivexlog. I got the following error message while running pg_receivexlog. pg_receivexlog: could not create archive status file mmm/archive_status/00010003.done: No such file or directory Dang. Stupid typo. And my tests didn't catch it, because I had archive_directory in the target directory :( At least it's only broken in master :/ Thanks for the catch. Do you have some additional testsuite or did you catch it manually? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
On Mon, Jan 5, 2015 at 7:56 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I've noticed that I missed to add this to the commitfest. I've just added it. It is not meant to end up in a committable state, but at this point I'm searching for some code review and more discusison. I'm also about to send an additional patch to implement an LSN map as an additional fork for heap files. Moved to CF 2015-02. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments
On 2015-01-05 18:49:06 +0900, Fujii Masao wrote: On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-05 16:22:56 +0900, Fujii Masao wrote: On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-03 16:03:36 +0100, Andres Freund wrote: pg_basebackup really changed heavily since it's introduction. And desparately needs some restructuring. The patch seems to break pg_receivexlog. I got the following error message while running pg_receivexlog. pg_receivexlog: could not create archive status file mmm/archive_status/00010003.done: No such file or directory Dang. Stupid typo. And my tests didn't catch it, because I had archive_directory in the target directory :( At least it's only broken in master :/ I've pushed the trivial fix, and verified using my adapted testscript that it works on all branches. Thanks for the catch. Do you have some additional testsuite or did you catch it manually? Manually... I just tested the tools and options which the patch may affect... Thanks! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com wrote: pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend which uses those functions needs to handle PGLZ_Header. But it basically should be handled via the varlena macros. That is, the frontend still seems to need to understand the varlena datatype. I think we should avoid that. Thought? Hm, yes it may be wiser to remove it and make the data passed to pglz for varlena 8 bytes shorter.. OK, here is the result of this work, made of 3 patches. Thanks for updating the patches! The first two patches move pglz stuff to src/common and make it a frontend utility entirely independent on varlena and its related metadata. - Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still present. There is nothing amazing here, and that's the broken version that has been reverted in 966115c. The patch 1 cannot be applied to the master successfully because of recent change. - The real stuff comes with patch 2, that implements the removal of PGLZ_Header, changing the APIs of compression and decompression to pglz to not have anymore toast metadata, this metadata being now localized in tuptoaster.c. Note that this patch protects the on-disk format (tested with pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of compression and decompression look like with this patch, simply performing operations from a source to a destination: extern int32 pglz_compress(const char *source, int32 slen, char *dest, const PGLZ_Strategy *strategy); extern int32 pglz_decompress(const char *source, char *dest, int32 compressed_size, int32 raw_size); The return value of those functions is the number of bytes written in the destination buffer, and 0 if operation failed. So it's guaranteed that 0 is never returned in success case? I'm not sure if that case can really happen, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 5 Jan 2015 17:16:43 +0900 от Michael Paquier michael.paqu...@gmail.com: On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote: Tue, 30 Dec 2014 21:39:33 +0900 от Michael Paquier michael.paqu...@gmail.com: As I understand now = (pg_time_t) time(NULL); return time in seconds, what is why I multiply value to 1000 to compare with restore_command_retry_interval in milliseconds. This way of doing is incorrect, you would get a wait time of 1s even for retries lower than 1s. In order to get the feature working correctly and to get a comparison granularity sufficient, you need to use TimetampTz for the tracking of the current and last failure time and to use the APIs from TimestampTz for difference calculations. I am not sure about small retry interval of time, in my cases I need interval bigger 5 seconds (20-40 seconds). Right now I limiting this value be bigger 100 milliseconds. Other people may disagree here, but I don't see any reason to put restrictions to this interval of time. Attached is a patch fixing the feature to use TimestampTz, updating as well the docs and recovery.conf.sample which was incorrect, on top of other things I noticed here and there. Alexey, does this new patch look fine for you? -- Michael Hello. Thanks for help. Yes, new patch look fine! -- Alexey Vasiliev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
David Rowley wrote: I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Hm, I just looked at the previous version which used ERROR rather than LOCATE and LOCATION, and I liked that approach better. Can we get that back? I understand that for the purposes of this patch it's easier to not change existing fprintf()/exit() calls, though. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
Fabien COELHO wrote: I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Here is a v5 with the vararg macro expanded. Thanks. On top of evaluateExpr() we need a comment (generally I think pgbench could do with more comments; not saying your patch should add them, just expressing an opinion.) Also, intuitively I would say that the return values of that function should be reversed: return true if things are good. Can we cause a stack overflow in this function with a complex enough expression? I wonder about LOCATE and LOCATION. Can we do away with the latter, and keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or similar? I would just expand an ad-hoc fprintf in the single place where the other macro is used. Are we okay with only integer operands? Is this something we would expand in the future? Is the gaussian/exp random stuff going to work with integer operands, if we want to change it to use function syntax, as expressed elsewhere? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Jan 2, 2015 at 5:36 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 1, 2015 at 11:29 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 1, 2015 at 12:00 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Can we check the number of free bgworkers slots to set the max workers? The real solution here is that this patch can't throw an error if it's unable to obtain the desired number of background workers. It needs to be able to smoothly degrade to a smaller number of background workers, or none at all. I think handling this way can have one side effect which is that if we degrade to smaller number, then the cost of plan (which was decided by optimizer based on number of parallel workers) could be more than non-parallel scan. Ideally before finalizing the parallel plan we should reserve the bgworkers required to execute that plan, but I think as of now we can workout a solution without it. I don't think this is very practical. When cached plans are in use, we can have a bunch of plans sitting around that may or may not get reused at some point in the future, possibly far in the future. The current situation, which I think we want to maintain, is that such plans hold no execution-time resources (e.g. locks) and, generally, don't interfere with other things people might want to execute on the system. Nailing down a bunch of background workers just in case we might want to use them in the future would be pretty unfriendly. I think it's right to view this in the same way we view work_mem. We plan on the assumption that an amount of memory equal to work_mem will be available at execution time, without actually reserving it. If the plan happens to need that amount of memory and if it actually isn't available when needed, then performance will suck; conceivably, the OOM killer might trigger. But it's the user's job to avoid this by not setting work_mem too high in the first place. Whether this system is for the best is arguable: one can certainly imagine a system where, if there's not enough memory at execution time, we consider alternatives like (a) replanning with a lower memory target, (b) waiting until more memory is available, or (c) failing outright in lieu of driving the machine into swap. But devising such a system is complicated -- for example, replanning with a lower memory target might be latch onto a far more expensive plan, such that we would have been better off waiting for more memory to be available; yet trying to waiting until more memory is available might result in waiting forever. And that's why we don't have such a system. We don't need to do any better here. The GUC should tell us how many parallel workers we should anticipate being able to obtain. If other settings on the system, or the overall system load, preclude us from obtaining that number of parallel workers, then the query will take longer to execute; and the plan might be sub-optimal. If that happens frequently, the user should lower the planner GUC to a level that reflects the resources actually likely to be available at execution time. By the way, another area where this kind of effect crops up is with the presence of particular disk blocks in shared_buffers or the system buffer cache. Right now, the planner makes no attempt to cost a scan of a frequently-used, fully-cached relation different than a rarely-used, probably-not-cached relation; and that sometimes leads to bad plans. But if it did try to do that, then we'd have the same kind of problem discussed here -- things might change between planning and execution, or even after the beginning of execution. Also, we might get nasty feedback effects: since the relation isn't cached, we view a plan that would involve reading it in as very expensive, and avoid such a plan. However, we might be better off picking the slow plan anyway, because it might be that once we've read the data once it will stay cached and run much more quickly than some plan that seems better starting from a cold cache. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote: A couple remarks: * Shouldn't this provide more infrastructure to deal with the fact that we might get less parallel workers than we had planned for? Maybe, but I'm not really sure what that should look like. My working theory is that individual parallel applications (executor nodes, or functions that use parallelism internally, or whatever) should be coded in such a way that they work correctly even if the number of workers that starts is smaller than what they requested, or even zero. It may turn out that this is impractical in some cases, but I don't know what those cases are yet so I don't know what the common threads will be. I think parallel seq scan should work by establishing N tuple queues (where N is the number of workers we have). When asked to return a tuple, the master should poll all of those queues one after another until it finds one that contains a tuple. If none of them contain a tuple then it should claim a block to scan and return a tuple from that block. That way, if there are enough running workers to keep up with the master's demand for tuples, the master won't do any of the actual scan itself. But if the workers can't keep up -- e.g. suppose 90% of the CPU consumed by the query is in the filter qual for the scan -- then the master can pitch in along with everyone else. As a non-trivial fringe benefit, if the workers don't all start, or take a while to initialize, the user backend isn't stalled meanwhile. * I wonder if parallel contexts shouldn't be tracked via resowners That is a good question. I confess that I'm somewhat fuzzy about which things should be tracked via the resowner mechanism vs. which things should have their own bespoke bookkeeping. However, the AtEOXact_Parallel() stuff happens well before ResourceOwnerRelease(), which makes merging them seem not particularly clean. * combocid.c should error out in parallel mode, as insurance Eh, which function? HeapTupleHeaderGetCmin(), HeapTupleHeaderGetCmax(), and AtEOXact_ComboCid() are intended to work in parallel mode. HeapTupleHeaderAdjustCmax() could Assert(!IsInParallelMode()) but the only calls are in heap_update() and heap_delete() which already have error checks, so putting yet another elog() there seems like overkill. * I doubt it's a good idea to allow heap_insert, heap_inplace_update, index_insert. I'm not convinced that those will be handled correct and relaxing restrictions is easier than adding them. I'm fine with adding checks to heap_insert() and heap_inplace_update(). I'm not sure we really need to add anything to index_insert(); how are we going to get there without hitting some other prohibited operation first? * I'd much rather separate HandleParallelMessageInterrupt() in one variant that just tells the machinery there's a interrupt (called from signal handlers) and one that actually handles them. We shouldn't even consider adding any new code that does allocations, errors and such in signal handlers. That's just a *bad* idea. That's a nice idea, but I didn't invent the way this crap works today. ImmediateInterruptOK gets set to true while performing a lock wait, and we need to be able to respond to errors while in that state. I think there's got to be a better way to handle that than force every asynchronous operation to have to cope with the fact that ImmediateInterruptOK may be set or not set, but as of today that's what you have to do. * I'm not a fan of the shm_toc stuff... Verbose and hard to read. I'd much rather place a struct there and be careful about not using pointers. That also would obliviate the need to reserve some ids. I don't see how that's going to work with variable-size data structures, and a bunch of the things that we need to serialize are variable-size. Moreover, I'm not really interested in rehashing this design again. I know it's not your favorite; you've said that before. But it makes it possible to write code to do useful things in parallel, a capability that we do not otherwise have. And I like it fine. * Doesn't the restriction in GetSerializableTransactionSnapshotInt() apply for repeatable read just the same? Yeah. I'm not sure whether we really need that check at all, because there is a check in GetTransactionSnapshot() that probably checks the same thing. * I'm not a fan of relying on GetComboCommandId() to restore things in the same order as before. I thought that was a little wonky, but it's not likely to break, and there is an elog(ERROR) there to catch it if it does, so I'd rather not make it more complicated. * I'd say go ahead and commit the BgworkerByOid thing earlier/independently. I've seen need for that a couple times. Woohoo! I was hoping someone would say that. * s/entrypints/entrypoints/ Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via
Re: [HACKERS] Publish autovacuum informations
On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be all right with putting the data structure declarations in a file named something like autovacuum_private.h, especially if it carried an annotation that if you depend on this, don't be surprised if we break your code in future. Works for me. I am not in general surprised when we do things that break my code, or anyway, the code that I'm responsible for maintaining. But I think it makes sense to segregate this into a separate header file so that we are clear that it is only exposed for the benefit of extension authors, not so that other things in the core system can touch it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Change how first WAL segment on new timeline after promotion is
On 01/03/2015 08:59 PM, Andres Freund wrote: Hi Heikki, While writing a test script for http://archives.postgresql.org/message-id/20141205002854.GE21964%40awork2.anarazel.de I noticed that this commit broke starting a pg_basebackup -X * without a recovery.conf present. Which might not be the best idea, but imo is a perfectly valid thing to do. To me the changes to StartupXLOG() in that commit look a bit bogus. The new startLogSegNo is initialized to XLByteToSeg(EndOfLog)? Which points to the end of the record +1? Which thus isn't guaranteed to exist as a segment (e.g. never if the last record was a XLOG_SWITCH). Ah, good point. Did you perhaps intend to use XLogFileInit(use_existing = true) instead of XLogFileOpen()? That works for me. Hmm, that doesn't sound right either. XLogFileInit is used when you switch to a new segment, not to open an old segment for writing. It happens to work, because with use_existing = true it will in fact always open the old segment, instead of creating a new one, but I don't think that's in the spirit of how that function's intended to be used. A very simple fix is to not try opening the segment at all. There isn't actually any requirement to have the segment open at that point, XLogWrite() will open it the first time the WAL is flushed. The WAL is flushed after writing the initial checkpoint or end-of-recovery record, which happens pretty soon anyway. Any objections to the attached? I've attached my preliminary testscript (note it's really not that interesting at this point) that reliably reproduces the problem for me. Thanks! - Heikki diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5cc7e47..e623463 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5646,7 +5646,6 @@ StartupXLOG(void) XLogRecPtr RecPtr, checkPointLoc, EndOfLog; - XLogSegNo startLogSegNo; TimeLineID PrevTimeLineID; XLogRecord *record; TransactionId oldestActiveXID; @@ -6633,7 +6632,6 @@ StartupXLOG(void) */ record = ReadRecord(xlogreader, LastRec, PANIC, false); EndOfLog = EndRecPtr; - XLByteToSeg(EndOfLog, startLogSegNo); /* * Complain if we did not roll forward far enough to render the backup @@ -6741,9 +6739,6 @@ StartupXLOG(void) * buffer cache using the block containing the last record from the * previous incarnation. */ - openLogSegNo = startLogSegNo; - openLogFile = XLogFileOpen(openLogSegNo); - openLogOff = 0; Insert = XLogCtl-Insert; Insert-PrevBytePos = XLogRecPtrToBytePos(LastRec); Insert-CurrBytePos = XLogRecPtrToBytePos(EndOfLog); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish autovacuum informations
2015-01-05 17:40 GMT+01:00 Robert Haas robertmh...@gmail.com: On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be all right with putting the data structure declarations in a file named something like autovacuum_private.h, especially if it carried an annotation that if you depend on this, don't be surprised if we break your code in future. Works for me. I am not in general surprised when we do things that break my code, or anyway, the code that I'm responsible for maintaining. But I think it makes sense to segregate this into a separate header file so that we are clear that it is only exposed for the benefit of extension authors, not so that other things in the core system can touch it. I'm fine with that too. I'll try to find some time to work on that. Thanks. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] Additional role attributes superuser review
On Wed, Dec 24, 2014 at 12:48 PM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: * BACKUP - allows role to perform backup operations * LOGROTATE - allows role to rotate log files * MONITOR - allows role to view pg_stat_* details * PROCSIGNAL - allows role to signal backend processes How about just SIGNAL instead of PROCSIGNAL? Generally, I think we'll be happier if these capabilities have names that are actual words - or combinations of words - rather than partial words, so I'd suggest avoiding things like PROC for PROCESS and AUTH for AUTHORIZATION. In this particular case, it seems like the name of the capability is based off the name of an internal system data structure. That's the sort of thing that we do not want to expose to users. As far as we can, we should try to describe what the capability allows, not the details of how that is (currently) implemented. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 05/01/15 16:17, Petr Jelinek wrote: On 05/01/15 07:28, Fujii Masao wrote: On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pushed with some extra cosmetic tweaks. I got the following assertion failure when I executed pg_xact_commit_timestamp() in the standby server. =# select pg_xact_commit_timestamp('1000'::xid); TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) == ((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c, Line: 315) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2014-12-04 12:01:08 JST sby1 LOG: server process (PID 15545) was terminated by signal 6: Aborted 2014-12-04 12:01:08 JST sby1 DETAIL: Failed process was running: select pg_xact_commit_timestamp('1000'::xid); The way to reproduce this problem is #1. set up and start the master and standby servers with track_commit_timestamp disabled #2. enable track_commit_timestamp in the master and restart the master #3. run some write transactions #4. enable track_commit_timestamp in the standby and restart the standby #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby BTW, at the step #4, I got the following log messages. This might be a hint for this problem. LOG: file pg_commit_ts/ doesn't exist, reading as zeroes CONTEXT: xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09; inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384 This problem still happens in the master. Regards, Attached patch fixes it, I am not sure how happy I am with the way I did it though. Added more comments and made the error message bit clearer. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index ca074da..59d19a0 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -557,6 +557,12 @@ StartupCommitTs(void) TransactionId xid = ShmemVariableCache-nextXid; int pageno = TransactionIdToCTsPage(xid); + if (track_commit_timestamp) + { + ActivateCommitTs(); + return; + } + LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE); /* @@ -571,14 +577,30 @@ StartupCommitTs(void) * This must be called ONCE during postmaster or standalone-backend startup, * when commit timestamp is enabled. Must be called after recovery has * finished. + */ +void +CompleteCommitTsInitialization(void) +{ + if (!track_commit_timestamp) + DeactivateCommitTs(true); +} + +/* + * This must be called when track_commit_timestamp is turned on. + * Note that this only happens during postmaster or standalone-backend startup + * or during WAL replay. + * + * The reason why this SLRU needs separate activation/deactivation functions is + * that it can be enabled/disabled during start and the activation/deactivation + * on master is propagated to slave via replay. Other SLRUs don't have this + * property and they can be just initialized during normal startup. * * This is in charge of creating the currently active segment, if it's not * already there. The reason for this is that the server might have been * running with this module disabled for a while and thus might have skipped * the normal creation point. */ -void -CompleteCommitTsInitialization(void) +void ActivateCommitTs(void) { TransactionId xid = ShmemVariableCache-nextXid; int pageno = TransactionIdToCTsPage(xid); @@ -591,22 +613,6 @@ CompleteCommitTsInitialization(void) LWLockRelease(CommitTsControlLock); /* - * If this module is not currently enabled, make sure we don't hand back - * possibly-invalid data; also remove segments of old data. - */ - if (!track_commit_timestamp) - { - LWLockAcquire(CommitTsLock, LW_EXCLUSIVE); - ShmemVariableCache-oldestCommitTs = InvalidTransactionId; - ShmemVariableCache-newestCommitTs = InvalidTransactionId; - LWLockRelease(CommitTsLock); - - TruncateCommitTs(ReadNewTransactionId()); - - return; - } - - /* * If CommitTs is enabled, but it wasn't in the previous server run, we * need to set the oldest and newest values to the next Xid; that way, we * will not try to read data that might not have been set. @@ -641,6 +647,35 @@ CompleteCommitTsInitialization(void) } /* + * This must be called when track_commit_timestamp is turned off. + * Note that this only happens during postmaster or standalone-backend startup + * or during WAL replay. + * + * Resets CommitTs into invalid state to
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Thu, Jan 1, 2015 at 3:04 PM, Andres Freund and...@2ndquadrant.com wrote: That's true, but if you don't align the beginnings of the allocations, then it's a lot more complicated for the code to properly align stuff within the allocation. It's got to insert a variable amount of padding based on the alignment it happens to get. Hm? Allocate +PG_CACHELINE_SIZE and do var = CACHELINEALIGN(var). Meh. I guess that will work, but I see little advantage in it. Cache-line aligning the allocations is simple and, not of no value, of long precedent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
Robert Haas robertmh...@gmail.com writes: On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote: * I wonder if parallel contexts shouldn't be tracked via resowners That is a good question. I confess that I'm somewhat fuzzy about which things should be tracked via the resowner mechanism vs. which things should have their own bespoke bookkeeping. However, the AtEOXact_Parallel() stuff happens well before ResourceOwnerRelease(), which makes merging them seem not particularly clean. FWIW, the resowner mechanism was never meant as a substitute for bespoke bookkeeping. What it is is a helper mechanism to reduce the need for PG_TRY blocks that guarantee that a resource-releasing function will be called even in error paths. I'm not sure whether that analogy applies well in parallel-execution cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 01/05/2015 12:06 PM, Andres Freund wrote: On 2015-01-05 11:34:54 +0200, Heikki Linnakangas wrote: On 01/04/2015 11:44 PM, Josh Berkus wrote: On 01/03/2015 12:56 AM, Heikki Linnakangas wrote: On 01/03/2015 12:28 AM, Josh Berkus wrote: On 01/02/2015 01:57 AM, Heikki Linnakangas wrote: wal_keep_segments does not affect the calculation of CheckPointSegments. If you set wal_keep_segments high enough, checkpoint_wal_size will be exceeded. The other alternative would be to force a checkpoint earlier, i.e. lower CheckPointSegments, so that checkpoint_wal_size would be honored. However, if you set wal_keep_segments high enough, higher than checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no matter how frequently you checkpoint. So you're saying that wal_keep_segments is part of the max_wal_size total, NOT in addition to it? Not sure what you mean. wal_keep_segments is an extra control that can prevent WAL segments from being recycled. It has the same effect as archive_command failing for N most recent segments, if that helps. I mean, if I have these settings: max_wal_size* = 256MB wal_keep_segments = 8 ... then my max wal size is *still* 256MB, NOT 384MB? Right. With that you mean that wal_keep_segments has *no* influence over checkpoint pacing or the contrary? Because upthread you imply that it doesn't, but later comments may mean the contrary. wal_keep_segments does not influence checkpoint pacing. If that's the case (and I think it's a good plan), then as a follow-on, we should prevent users from setting wal_keep_segments to more than 50% of max_wal_size, no? Not sure if the 50% figure is correct, but I see what you mean: don't allow setting wal_keep_segments so high that we would exceed max_wal_size because of it. I wasn't clear on my opinion here. I think I understood what Josh meant, but I don't think we should do it. Seems like unnecessary nannying of the DBA. Let's just mention in the manual that if you set wal_keep_segments higher than [insert formula here], you will routinely have more WAL in pg_xlog than what checkpoint_wal_size is set to. That seems a unrealistic goal. I've seen setups that have set checkpoint_segments intentionally, and with good reasoning, north of 50k. So? I don't see how that's relevant. Neither wal_keep_segments, nor failing archive_commands nor replication slot should have an influence on checkpoint pacing. Agreed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PGCon 2015 call for papers
PGCon 2015 will be on 18-19 June 2015 at University of Ottawa. * 16-17 (Tue-Wed) tutorials * 18-19 (Thu-Fri) talks - the main part of the conference * 20 (Sat) The Unconference (very popular) PLEASE NOTE: PGCon 2015 is in June. See http://www.pgcon.org/2015/ We are now accepting proposals for the main part of the conference (18-19 June). Proposals can be quite simple. We do not require academic-style papers. You have about two weeks left before submissions close. If you are doing something interesting with PostgreSQL, please submit a proposal. You might be one of the backend hackers or work on a PostgreSQL related project and want to share your know-how with others. You might be developing an interesting system using PostgreSQL as the foundation. Perhaps you migrated from another database to PostgreSQL and would like to share details. These, and other stories are welcome. Both users and developers are encouraged to share their experiences. Here are a some ideas to jump start your proposal process: - novel ways in which PostgreSQL is used - migration of production systems from another database - data warehousing - tuning PostgreSQL for different work loads - replication and clustering - hacking the PostgreSQL code - PostgreSQL derivatives and forks - applications built around PostgreSQL - benchmarking and performance engineering - case studies - location-aware and mapping software with PostGIS - The latest PostgreSQL features and features in development - research and teaching with PostgreSQL - things the PostgreSQL project could do better - integrating PostgreSQL with 3rd-party software Both users and developers are encouraged to share their experiences. The schedule is: 1 Dec 2014 Proposal acceptance begins 19 Jan 2015 Proposal acceptance ends 19 Feb 2015 Confirmation of accepted proposals NOTE: the call for lightning talks will go out very close to the conference. Do not submit lightning talks proposals until then. See also http://www.pgcon.org/2015/papers.php Instructions for submitting a proposal to PGCon 2015 are available from: http://www.pgcon.org/2015/submissions.php — Dan Langille http://langille.org/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
On Mon, Jan 5, 2015 at 11:49 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 24, 2014 at 12:48 PM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: * BACKUP - allows role to perform backup operations * LOGROTATE - allows role to rotate log files * MONITOR - allows role to view pg_stat_* details * PROCSIGNAL - allows role to signal backend processes How about just SIGNAL instead of PROCSIGNAL? Sure. Generally, I think we'll be happier if these capabilities have names that are actual words - or combinations of words - rather than partial words, so I'd suggest avoiding things like PROC for PROCESS and AUTH for AUTHORIZATION. Agreed. In this particular case, it seems like the name of the capability is based off the name of an internal system data structure. That's the sort of thing that we do not want to expose to users. As far as we can, we should try to describe what the capability allows, not the details of how that is (currently) implemented. Agreed. If others are also in agreement on this point then I'll update the patch accordingly. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] pg_rewind in contrib
Here's an updated version of pg_rewind. The code itself is the same as before, and corresponds to the sources in the current pg_rewind github repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based mostly on Michael's comments, I have: * replaced VMware copyright notices with PGDG ones. * removed unnecessary cruft from .gitignore * changed the --version line and report bugs notice in --help to match other binaries in the PostgreSQL distribution * moved documentation from README to the user manual. * minor fixes to how the regression tests are launched so that they work again Some more work remains to be done on the regression tests. The way they're launched now is quite weird. It was written that way to make it work outside the PostgreSQL source tree, and also on Windows. Now that it lives in contrib, it should be redesigned. The documentation could also use some work; for now I just converted the existing text from README to sgml format. Anything else? - Heikki diff --git a/contrib/Makefile b/contrib/Makefile index 195d447..2fe861f 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -32,6 +32,7 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ + pg_rewind \ pg_standby \ pg_stat_statements \ pg_test_fsync \ diff --git a/contrib/pg_rewind/.gitignore b/contrib/pg_rewind/.gitignore new file mode 100644 index 000..816a91d --- /dev/null +++ b/contrib/pg_rewind/.gitignore @@ -0,0 +1,9 @@ +# Files generated during build +/xlogreader.c + +# Generated by test suite +/tmp_check/ +/regression.diffs +/regression.out +/results/ +/regress_log/ diff --git a/contrib/pg_rewind/Makefile b/contrib/pg_rewind/Makefile new file mode 100644 index 000..241c775 --- /dev/null +++ b/contrib/pg_rewind/Makefile @@ -0,0 +1,51 @@ +# Makefile for pg_rewind +# +# Copyright (c) 2013-2014, PostgreSQL Global Development Group +# + +PGFILEDESC = pg_rewind - repurpose an old master server as standby +PGAPPICON = win32 + +PROGRAM = pg_rewind +OBJS = pg_rewind.o parsexlog.o xlogreader.o util.o datapagemap.o timeline.o \ + fetch.o copy_fetch.o libpq_fetch.o filemap.o + +REGRESS = basictest extrafiles databases +REGRESS_OPTS=--use-existing --launcher=./launcher + +PG_CPPFLAGS = -I$(libpq_srcdir) +PG_LIBS = $(libpq_pgport) + +override CPPFLAGS := -DFRONTEND $(CPPFLAGS) + +EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c + +all: pg_rewind + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_rewind +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + +xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/% + rm -f $@ $(LN_S) $ . + +# The regression tests can be run separately against, using the libpq or local +# method to copy the files. For local mode, the makefile target is +# check-local, and for libpq mode, check-remote. The target check-both +# runs the tests in both modes. +check-local: + echo Running tests against local data directory, in copy-mode + bindir=$(bindir) TEST_SUITE=local $(MAKE) installcheck + +check-remote: + echo Running tests against a running standby, via libpq + bindir=$(bindir) TEST_SUITE=remote $(MAKE) installcheck + +check-both: check-local check-remote diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c new file mode 100644 index 000..5c40ec7 --- /dev/null +++ b/contrib/pg_rewind/copy_fetch.c @@ -0,0 +1,586 @@ +/*- + * + * copy_fetch.c + * Functions for copying a PostgreSQL data directory + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * + *- + */ +#include postgres_fe.h + +#include catalog/catalog.h + +#include sys/types.h +#include sys/stat.h +#include dirent.h +#include fcntl.h +#include unistd.h +#include string.h + +#include pg_rewind.h +#include fetch.h +#include filemap.h +#include datapagemap.h +#include util.h + +static void recurse_dir(const char *datadir, const char *path, + process_file_callback_t callback); + +static void execute_pagemap(datapagemap_t *pagemap, const char *path); + +static void remove_target_file(const char *path); +static void create_target_dir(const char *path); +static void remove_target_dir(const char *path); +static void create_target_symlink(const char *path, const char *link); +static void remove_target_symlink(const char *path); + +/* + * Traverse through all files in a data directory, calling 'callback' + * for each file. + */ +void +traverse_datadir(const char *datadir, process_file_callback_t callback) +{ + /* should this copy config files or not? */ + recurse_dir(datadir, NULL, callback); +} + +/* + * recursive part of traverse_datadir + */ +static void +recurse_dir(const char *datadir, const char *parentpath, + process_file_callback_t callback) +{ +
Re: [HACKERS] replicating DROP commands across servers
On Fri, Jan 2, 2015 at 9:59 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Dec 29, 2014 at 2:15 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Here's a patch that tweaks the grammar to use TypeName in COMMENT, SECURITY LABEL, and DROP for the type and domain cases. The required changes in the code are pretty minimal, thankfully. Note the slight changes to the new object_address test also. I think I'm pretty much done with this now, so I intend to push this first thing tomorrow and then the rebased getObjectIdentityParts patch sometime later. This commit 3f88672a4e4d8e648d24ccc65 seems to have broken pg_upgrade for pg_trgm. On 9.2.9 freshly initdb and started with default config: $ createdb jjanes in psql: create extension pg_trgm ; create table foo (x text); create index on foo using gin (upper(x) gin_trgm_ops); Then run 9.5's pg_upgrade and it fails at the stage of restoring the database schema. The problem also occurs doing a self-upgrade from 9.5 to 9.5. The contents of the dump not changed meaningfully between 9.4 and 9.5. I've isolated the problem to the backend applying the pg_restore of the dump, regardless of which version created the dump. After compiling 3c9e4cdbf2ec876dbb7 with CFLAGS=-fno-omit-frame-pointer, I get this backtrace for the core-dump of postmaster during the pg_restore: Core was generated by `postgres: jjanes jjanes [local] ALTER EXTENSION '. Program terminated with signal 11, Segmentation fault. #0 0x005135ff in NameListToString (names=0x257fcf8) at namespace.c:2935 2935if (IsA(name, String)) (gdb) bt #0 0x005135ff in NameListToString (names=0x257fcf8) at namespace.c:2935 #1 0x00512f33 in DeconstructQualifiedName (names=0x257fcf8, nspname_p=0x7fff419bc960, objname_p=0x7fff419bc958) at namespace.c:2648 #2 0x0058a746 in LookupTypeName (pstate=0x0, typeName=0x257fd10, typmod_p=0x0, missing_ok=0 '\000') at parse_type.c:153 #3 0x005220b4 in get_object_address_type (objtype=OBJECT_TYPE, objname=0x257fd50, missing_ok=0 '\000') at objectaddress.c:1306 #4 0x00520cf5 in get_object_address (objtype=OBJECT_TYPE, objname=0x257fd50, objargs=0x0, relp=0x7fff419bcb58, lockmode=4, missing_ok=0 '\000') at objectaddress.c:678 #5 0x005c0f36 in ExecAlterExtensionContentsStmt (stmt=0x257fd80) at extension.c:2906 #6 0x0077508c in ProcessUtilitySlow (parsetree=0x257fd80, queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type oid\nSELECT binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se..., context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60, completionTag=0x7fff419bd100 ) at utility.c:1167 #7 0x0077490e in standard_ProcessUtility (parsetree=0x257fd80, queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type oid\nSELECT binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se..., context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60, completionTag=0x7fff419bd100 ) at utility.c:840 #8 0x00773bcc in ProcessUtility (parsetree=0x257fd80, queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type oid\nSELECT binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se..., context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60, completionTag=0x7fff419bd100 ) at utility.c:313 #9 0x00772dd6 in PortalRunUtility (portal=0x2505b90, utilityStmt=0x257fd80, isTopLevel=0 '\000', dest=0x2581b60, completionTag=0x7fff419bd100 ) at pquery.c:1187 #10 0x00772f8c in PortalRunMulti (portal=0x2505b90, isTopLevel=0 '\000', dest=0x2581b60, altdest=0x2581b60, completionTag=0x7fff419bd100 ) at pquery.c:1318 #11 0x00772560 in PortalRun (portal=0x2505b90, count=9223372036854775807, isTopLevel=0 '\000', dest=0x2581b60, altdest=0x2581b60, completionTag=0x7fff419bd100 ) at pquery.c:816 #12 0x0076c868 in exec_simple_query ( query_string=0x254f990 \n-- For binary upgrade, must preserve pg_type oid\nSELECT binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se...) at postgres.c:1045 #13 0x007708a2 in PostgresMain (argc=1, argv=0x24ed5e0, dbname=0x24ed4b8 jjanes, username=0x24ed4a0 jjanes) at postgres.c:4012 #14 0x00701940 in BackendRun (port=0x250c1b0) at postmaster.c:4130 #15 0x00701083 in BackendStartup (port=0x250c1b0) at postmaster.c:3805 #16 0x006fd8c5 in ServerLoop () at postmaster.c:1573 #17 0x006fd013 in PostmasterMain (argc=18, argv=0x24ec480) at postmaster.c:1220 #18 0x0066476b in main (argc=18, argv=0x24ec480) at
Re: [HACKERS] parallel mode and parallel contexts
On 22 December 2014 at 19:14, Robert Haas robertmh...@gmail.com wrote: Here is another new version, with lots of bugs fixed. An initial blind review, independent of other comments already made on thread. OK src/backend/access/heap/heapam.c heapam.c prohibitions on update and delete look fine OK src/backend/access/transam/Makefile OK src/backend/access/transam/README.parallel README.parallel and all concepts look good PARTIAL src/backend/access/transam/parallel.c wait_patiently mentioned in comment but doesn’t exist Why not make nworkers into a uint? Trampoline? Really? I think we should define what we mean by that, somewhere, rather than just use the term as if it was self-evident. Entrypints? .. OK src/backend/access/transam/varsup.c QUESTIONS src/backend/access/transam/xact.c These comments don’t have any explanation or justification + * This stores XID of the toplevel transaction to which we belong. + * Normally, it's the same as TopTransactionState.transactionId, but in + * a parallel worker it may not be. + * If we're a parallel worker, there may be additional XIDs that we need + * to regard as part of our transaction even though they don't appear + * in the transaction state stack. This comment is copied-and-pasted too many times for safety and elegance + /* +* Workers synchronize transaction state at the beginning of each parallel +* operation, so we can't account for transaction state change after that +* point. (Note that this check will certainly error out if s-blockState +* is TBLOCK_PARALLEL_INPROGRESS, so we can treat that as an invalid case +* below.) +*/ We need a single Check() that contains more detail and comments within Major comments * Parallel stuff at start sounded OK * Transaction stuff strikes me as overcomplicated and error prone. Given there is no explanation or justification for most of it, I’d be inclined to question why its there * If we have a trampoline for DSM, why don’t we use a trampoline for snapshots, then you wouldn’t need to do all this serialize stuff This is very impressive, but it looks like we’re trying to lift too much weight on the first lift. If we want to go anywhere near this, we need to have very clear documentation about how and why its like that. I’m actually very sorry to say that because the review started very well, much much better than most. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POLA violation with \c service=
On Tue, Dec 30, 2014 at 04:48:11PM -0800, David Fetter wrote: On Wed, Dec 17, 2014 at 08:14:04AM -0500, Andrew Dunstan wrote: Yeah, that's the correct solution. It should not be terribly difficult to create a test for a conninfo string in the dbname parameter. That's what libpq does after all. We certainly don't want psql to have to try to interpret the service file. psql just needs to let libpq do its work in this situation. This took a little longer to get time to polish than I'd hoped, but please find attached a patch which: - Correctly connects to service= and postgres(ql)?:// with \c - Disallows tab completion in the above cases I'd like to see about having tab completion actually work correctly in at least the service= case, but that's a matter for a follow-on patch. Thanks to Andrew Dunstan for the original patch, and to Andrew Gierth for his help getting it into shape. Cheers, David. I should mention that the patch also corrects a problem where the password was being saved/discarded at inappropriate times. Please push this patch to the back branches :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] event trigger pg_dump fix
Hi, Attached is fix for http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com It was dumping comment with NULL owner while the function expects empty string for objects without owner (there is pg_strdup down the line). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 6658fda..6541463 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo) query-data, , NULL, NULL, 0, NULL, NULL); dumpComment(fout, dopt, labelq-data, -NULL, NULL, +NULL, , evtinfo-dobj.catId, 0, evtinfo-dobj.dumpId); destroyPQExpBuffer(query); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger test exception message
On Sun, Jan 4, 2015 at 8:09 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andrew Dunstan wrote: I don't wish to seem humorless, but I think this should probably be changed: root/HEAD/pgsql/src/test/regress/sql/event_trigger.sql:248: RAISE EXCEPTION 'I''m sorry Sir, No Rewrite Allowed.'; Quite apart from any other reason, the Sir does seem a bit sexist - we have no idea of the gender of the reader. Probably just 'sorry, no rewrite allowed' would suffice. This seems pointless tinkering to me. Should I start introducing female pronouns in test error messages, to measure how much this will annoy my male conterparts? This is not a user visible message in any case. I'm with Andrew. Aside from any question of sexism, which I do not discount, the capitalization is wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
I think this is a valuable effort, but I wonder how we are going to arrive at a consensus. I don't see any committer supporting these changes. Clearly, there are some advantages to having recovery parameters be GUCs, but the proposed changes also come with some disadvantages, and the tradeoffs aren't so clear. For example, putting recovery target parameters into postgresql.conf might not make sense to some people. Once you have reached the recovery end point, the information is obsolete. Keeping it set could be considered confusing. Moreover, when I'm actually doing point-in-time-recovery attempts, I don't think I want to be messing with my pristine postgresql.conf file. Having those settings separate isn't a bad idea in that case. Some people might like the recovery.done file as a historical record. Having standby.enabled just be deleted would destroy that information once recovery has ended. In the past, when some people have complained that recovery.conf cannot be moved to a configuration directory, I (and others?) have argued that recovery.conf is really more of a command script file than a configuration file (that was before replication was commonplace). The premise of this patch is that some options really are more configuration than command most of the time, but that doesn't mean the old view was invalid. The current system makes it easy to share postgresql.conf between primary and standby and just maintain the information related to the standby locally in recovery.conf. pg_basebackup -R makes that even easier. It's still possible to do this in the new system, but it's decidedly more work. These are all soft reasons, but they add up. The wins on the other hand are obscure: You can now use SHOW to inspect recovery settings. You can design your own configuration file include structures to set them. These are not bad, but is that all? I note that all the new settings are PGC_POSTMASTER, so you can't set them on the fly. Changing primary_conninfo without a restart would be a big win, for example. Is that planned? It might be acceptable to break all the old workflows if the new system was obviously great, but it's not. It's still confusing as heck. For example, we would have a file standby.enabled and a setting standby_mode, which would mean totally different things. I think there is also some conceptual overlap between standby_mode and recovery_target_action (standby_mode is really something like recovery_target_action = keep_going). I also find the various uses of trigger file or to trigger confusing. The old trigger file to trigger promotion makes sense. But calling standby.enabled a trigger file as well confuses two entirely different concepts. I have previously argued for a different approach: Ready recovery.conf as a configuration file after postgresql.conf, but keep it as a file to start recovery. That doesn't break any old workflows, it gets you all the advantages of having recovery parameters in the GUC system, and it keeps all the options open for improving things further in the future. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger pg_dump fix
Petr Jelinek p...@2ndquadrant.com writes: Attached is fix for http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com It was dumping comment with NULL owner while the function expects empty string for objects without owner (there is pg_strdup down the line). This isn't right either: event triggers do have owners. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger pg_dump fix
On 06/01/15 01:02, Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: Attached is fix for http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com It was dumping comment with NULL owner while the function expects empty string for objects without owner (there is pg_strdup down the line). This isn't right either: event triggers do have owners. Bah, of course, stupid me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 6658fda..00b87e7 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo) query-data, , NULL, NULL, 0, NULL, NULL); dumpComment(fout, dopt, labelq-data, -NULL, NULL, +NULL, evtinfo-evtowner, evtinfo-dobj.catId, 0, evtinfo-dobj.dumpId); destroyPQExpBuffer(query); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote: The attached patch fixes Jeff's test case, as far as it goes. But, as I mentioned, it's not intended as a real fix. For one thing, I have made multiple changes to HeapTupleSatisfies*() routines, but haven't fully considered the consequences for, say, ri_triggers.c. HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not modified. I've modified HeapTupleSatisfiesVacuum() to see InvalidTransactionID (not invalid xid infomask bits) as visible for garbage collection (alongside HeapTupleSatisfiesDirty() and HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be surprised if that created new problems in other areas. In short, this patch is just for illustrative purposes. I think that I see another race condition, requiring custom instrumentation to the server to demonstrate. Basically, even with the fix above, there are still similar races. I'm not trying to call ExecLockUpdateTuple() against super deleted NULL tuples, because those are no longer visible to any relevant snapshot type, the fix described above. However, I can still see a change in tuple header xmin between a dirty index scan and attempting to lock that row to update. If I'm not mistaken, the race condition is small enough that something like Jeff's tool won't catch it directly in a reasonable amount of time, because it happens to be the same index key value. It's not all that easy to recreate, even with my custom instrumentation. With fsync=off, it occurs somewhat infrequently with a custom instrumentation Postgres build: pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10 [Mon Jan 5 17:31:57 2015] init done at count_upsert.pl line 101. [Mon Jan 5 17:32:17 2015] child abnormal exit [Mon Jan 5 17:32:17 2015] DBD::Pg::db selectall_arrayref failed: ERROR: mismatch in xmin for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line 210.\n at count_upsert.pl line 227. [Mon Jan 5 17:32:49 2015] sum is -800 [Mon Jan 5 17:32:49 2015] count is 9515 [Mon Jan 5 17:32:49 2015] normal exit at 1420507969 after 733912 items processed at count_upsert.pl line 182. I think that super deleting broken promise tuples undermines how vacuum interlocking is supposed to work [1]. We end up at the wrong tuple, that happens to occupy the same slot as the old one (and happens to have the same index key value, because the race is so small are there are relatively few distinct values in play - it's hard to recreate). Attached instrumentation patch was used with Jeff Jane's tool. If we weren't super deleting in the patch, then I think that our backend's xmin horizon would be enough of an interlock (haven't tested that theory by testing value locking approach #1 implementation yet, but I worried about the scenario quite a lot before and things seemed okay). But we are super deleting with implementation #2, and we're also not holding a pin on the heap buffer or B-Tree leaf page buffer throughout, so this happens (if I'm not mistaken). [1] https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142 -- Peter Geoghegan diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index f8a4017..4ce45e3 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -1189,7 +1189,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot, */ bool ExecCheckIndexConstraints(TupleTableSlot *slot, - EState *estate, ItemPointer conflictTid, + EState *estate, HeapTuple conflictTid, Oid arbiterIdx) { ResultRelInfo *resultRelInfo; @@ -1339,7 +1339,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index, Datum *values, bool *isnull, EState *estate, bool newIndex, bool violationOK, bool wait, - ItemPointer conflictTid) + HeapTuple conflictTid) { Oid *constr_procs; uint16 *constr_strats; @@ -1469,7 +1469,10 @@ retry: { conflict = true; if (conflictTid) -*conflictTid = tup-t_self; + { +conflictTid-t_self = tup-t_self; +conflictTid-t_data = tup-t_data; + } break; } @@ -1506,7 +1509,10 @@ retry: { conflict = true; if (conflictTid) -*conflictTid = tup-t_self; + { +conflictTid-t_self = tup-t_self; +conflictTid-t_data = tup-t_data; + } break; } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 69af2d1..a289ef4 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -56,7 +56,7 @@ static bool ExecLockUpdateTuple(EState *estate, ResultRelInfo *relinfo, -ItemPointer conflictTid, +HeapTuple conflictTid, TupleTableSlot *planSlot, TupleTableSlot *insertSlot, bool canSetTag, @@ -292,7 +292,7 @@ ExecInsert(TupleTableSlot *slot, else { boolconflict; -
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: The patch 1 cannot be applied to the master successfully because of recent change. Yes, that's caused by ccb161b. Attached are rebased versions. - The real stuff comes with patch 2, that implements the removal of PGLZ_Header, changing the APIs of compression and decompression to pglz to not have anymore toast metadata, this metadata being now localized in tuptoaster.c. Note that this patch protects the on-disk format (tested with pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of compression and decompression look like with this patch, simply performing operations from a source to a destination: extern int32 pglz_compress(const char *source, int32 slen, char *dest, const PGLZ_Strategy *strategy); extern int32 pglz_decompress(const char *source, char *dest, int32 compressed_size, int32 raw_size); The return value of those functions is the number of bytes written in the destination buffer, and 0 if operation failed. So it's guaranteed that 0 is never returned in success case? I'm not sure if that case can really happen, though. This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it. -- Michael 20150105_fpw_compression_v13.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Fri, Jan 2, 2015 at 06:25:52PM +0100, Andres Freund wrote: I can't run tests right now... What exactly do you want to see with these tests? that's essentially what I've already benchmarked + some fprintfs? I want to test two patches that both allocate an extra 64 bytes but shift the alignment of just the buffer descriptor memory allocation. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
On Mon, Jan 5, 2015 at 10:39 PM, Alexey Vasiliev leopard...@inbox.ru wrote: Hello. Thanks for help. Yes, new patch look fine! OK cool. Let's get an opinion from a committer then. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Mon, Jan 5, 2015 at 9:57 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote: * Doesn't the restriction in GetSerializableTransactionSnapshotInt() apply for repeatable read just the same? Yeah. I'm not sure whether we really need that check at all, because there is a check in GetTransactionSnapshot() that probably checks the same thing. The check in GetSerializableTransactionSnapshotInt() will also prohibit any user/caller of SetSerializableTransactionSnapshot() in parallel mode as that can't be prohibited by GetTransactionSnapshot(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Possible typo in create_policy.sgml
Hi, Following is perhaps a typo: - qualifications of queries which are run against the table the policy is on, + qualifications of queries which are run against the table if the policy is on, Attached fixes it if so. Thanks, Amit diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml index 4c8c001..c28ba2c 100644 --- a/doc/src/sgml/ref/create_policy.sgml +++ b/doc/src/sgml/ref/create_policy.sgml @@ -40,7 +40,7 @@ CREATE POLICY replaceable class=parametername/replaceable ON replaceable para A policy is an expression which is added to the security-barrier - qualifications of queries which are run against the table the policy is on, + qualifications of queries which are run against the table if the policy is on, or an expression which is added to the with-check options for a table and which is applied to rows which would be added to the table. The security-barrier qualifications will always be evaluated prior to any -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On 01/05/2015 05:43 PM, Peter Eisentraut wrote: The wins on the other hand are obscure: You can now use SHOW to inspect recovery settings. You can design your own configuration file include structures to set them. These are not bad, but is that all? That's not the only potential win, and it's not small either. I'll be able to tell what master a replica is replicating from using via a port 5432 connection (currently there is absolutely no way to do this). I note that all the new settings are PGC_POSTMASTER, so you can't set them on the fly. Changing primary_conninfo without a restart would be a big win, for example. Is that planned? ... but there you have the flaw in the ointment. Unless we make some of the settings superuserset, no, it doesn't win us a lot, except ... I have previously argued for a different approach: Ready recovery.conf as a configuration file after postgresql.conf, but keep it as a file to start recovery. That doesn't break any old workflows, it gets you all the advantages of having recovery parameters in the GUC system, and it keeps all the options open for improving things further in the future. ... and there you hit on one of the other issues with recovery.conf, which is that it's a configuration file with configuration parameters which gets automatically renamed when a standby is promoted. This plays merry hell with configuration management systems. The amount of conditional logic I've had to write for Salt to handle recovery.conf truly doesn't bear thinking about. There may be some other way to make recovery.conf configuration-management friendly, but I haven't thought of it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
The patch that implements INSERT ... ON CONFLICT UPDATE has support and tests for per-column privileges (which are not relevant to the IGNORE variant, AFAICT). However, RLS support is another thing entirely. It has not been properly thought out, and unlike per-column privileges requires careful consideration, as the correct behavior isn't obvious. I've documented the current problems with RLS here: https://wiki.postgresql.org/wiki/UPSERT#RLS It's not clear whether or not the auxiliary UPDATE within an INSERT... ON CONFLICT UPDATE statement should have security quals appended. Stephen seemed to think that that might not be the best solution [1]. I am not sure. I'd like to learn what other people think. What is the best way of integrating RLS with ON CONFLICT UPDATE? What behavior is most consistent with the guarantees of RLS? In particular, should the implementation append security quals to the auxiliary UPDATE, or fail sooner? [1] http://www.postgresql.org/message-id/20141121205926.gk28...@tamriel.snowman.net -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 01/05/2015 09:06 AM, Heikki Linnakangas wrote: I wasn't clear on my opinion here. I think I understood what Josh meant, but I don't think we should do it. Seems like unnecessary nannying of the DBA. Let's just mention in the manual that if you set wal_keep_segments higher than [insert formula here], you will routinely have more WAL in pg_xlog than what checkpoint_wal_size is set to. That seems a unrealistic goal. I've seen setups that have set checkpoint_segments intentionally, and with good reasoning, north of 50k. So? I don't see how that's relevant. Neither wal_keep_segments, nor failing archive_commands nor replication slot should have an influence on checkpoint pacing. Agreed. Oh, right, slots can also cause the log to increase in size. And we've already had the discussion about hard limits, which is maybe a future feature and not part of this patch. Can we figure out a reasonable formula? My thinking is 50% for wal_keep_segments, because we need at least 50% of the wals to do a reasonable spread checkpoint. If max_wal_size is 1GB, and wal_keep_segments is 1.5GB, what would happen? What if wal_keep_segments is 0.9GB? I need to create a fake benchmark for this ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Alvaro, Here is a v6 with most of your suggestions applied. On top of evaluateExpr() we need a comment (generally I think pgbench could do with more comments; not saying your patch should add them, just expressing an opinion.) Also, intuitively I would say that the return values of that function should be reversed: return true if things are good. Comment inverted return value done. I wonder about LOCATE and LOCATION. Can we do away with the latter, and keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or similar? I would just expand an ad-hoc fprintf in the single place where the other macro is used. I've used just one PRINT_ERROR_AT() macro consistently. Are we okay with only integer operands? Is this something we would expand in the future? Is the gaussian/exp random stuff going to work with integer operands, if we want to change it to use function syntax, as expressed elsewhere? Nothing for now, I feel it is for a later round. [other mail] bring ERROR() macro back I also prefer the code with it, but the cost-benefit of a pre-C99 compatible implementation seems quite low, and it does imply less (style) changes with the previous situation as it is. -- Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore index 489a2d6..aae819e 100644 --- a/contrib/pgbench/.gitignore +++ b/contrib/pgbench/.gitignore @@ -1 +1,3 @@ +/exprparse.c +/exprscan.c /pgbench diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile index b8e2fc8..2d4033b 100644 --- a/contrib/pgbench/Makefile +++ b/contrib/pgbench/Makefile @@ -4,7 +4,9 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests PGAPPICON = win32 PROGRAM = pgbench -OBJS = pgbench.o $(WIN32RES) +OBJS = pgbench.o exprparse.o $(WIN32RES) + +EXTRA_CLEAN = exprparse.c exprscan.c PG_CPPFLAGS = -I$(libpq_srcdir) PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS) @@ -18,8 +20,22 @@ subdir = contrib/pgbench top_builddir = ../.. include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk + +distprep: exprparse.c exprscan.c endif ifneq ($(PORTNAME), win32) override CFLAGS += $(PTHREAD_CFLAGS) endif + +# There is no correct way to write a rule that generates two files. +# Rules with two targets don't have that meaning, they are merely +# shorthand for two otherwise separate rules. To be safe for parallel +# make, we must chain the dependencies like this. The semicolon is +# important, otherwise make will choose the built-in rule for +# gram.y=gram.c. + +exprparse.h: exprparse.c ; + +# exprscan is compiled as part of exprparse +exprparse.o: exprscan.c diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y new file mode 100644 index 000..243c6b9 --- /dev/null +++ b/contrib/pgbench/exprparse.y @@ -0,0 +1,96 @@ +%{ +/*- + * + * exprparse.y + * bison grammar for a simple expression syntax + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + *- + */ + +#include postgres_fe.h + +#include pgbench.h + +PgBenchExpr *expr_parse_result; + +static PgBenchExpr *make_integer_constant(int64 ival); +static PgBenchExpr *make_variable(char *varname); +static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, + PgBenchExpr *rexpr); + +%} + +%expect 0 +%name-prefix=expr_yy + +%union +{ + int64 ival; + char *str; + PgBenchExpr *expr; +} + +%type expr expr +%type ival INTEGER +%type str VARIABLE +%token INTEGER VARIABLE +%token CHAR_ERROR /* never used, will raise a syntax error */ + +%left '+' '-' +%left '*' '/' '%' +%right UMINUS + +%% + +result: expr{ expr_parse_result = $1; } + +expr: '(' expr ')' { $$ = $2; } + | '+' expr %prec UMINUS { $$ = $2; } + | '-' expr %prec UMINUS { $$ = make_op('-', make_integer_constant(0), $2); } + | expr '+' expr { $$ = make_op('+', $1, $3); } + | expr '-' expr { $$ = make_op('-', $1, $3); } + | expr '*' expr { $$ = make_op('*', $1, $3); } + | expr '/' expr { $$ = make_op('/', $1, $3); } + | expr '%' expr { $$ = make_op('%', $1, $3); } + | INTEGER{ $$ = make_integer_constant($1); } + | VARIABLE { $$ = make_variable($1); } + ; + +%% + +static PgBenchExpr * +make_integer_constant(int64 ival) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_INTEGER_CONSTANT; + expr-u.integer_constant.ival = ival; + return expr; +} + +static PgBenchExpr * +make_variable(char *varname) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_VARIABLE; + expr-u.variable.varname = varname; + return expr; +} + +static PgBenchExpr * +make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr-etype = ENODE_OPERATOR; +
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
David G Johnston david.g.johns...@gmail.com writes: Atri Sharma wrote If order of result rows is not the same as required, an error is raised: SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST; ERROR: Order not same as specified First reaction for the error was unfavorable but (see below) it likely is the best option and does adequately cover the reason for failure - programmer error. TBH, my first reaction to this entire patch is unfavorable: it's a solution in search of a problem. It adds substantial complication not only for users but for PG developers in order to solve a rather narrow performance issue. What would make sense to me is to teach the planner about inlining SQL functions that include ORDER BY clauses, so that the performance issue of a double sort could be avoided entirely transparently to the user. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's a laudable goal, but I would bet that nothing built on the FDW infrastructure will ever get there. Why? It would be surprising to me if, given that we have gone to some pains to create a system that allows cross-system queries, and hopefully eventually pushdown of quals, joins, and aggregates, we then made sharding work in some completely different way that reuses none of that infrastructure. But maybe I am looking at this the wrong way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On 24/12/14 20:11, Alex Shulgin wrote: Alex Shulgin a...@commandprompt.com writes: - the StandbyModeRequested and StandbyMode should be lowercased like the rest of the GUCs Yes, except that standby_mode is linked to StandbyModeRequested, not the other one. I can try see if there's a sane way out of this. As I see it, renaming these will only add noise to this patch, and there is also ArchiveRecoveryRequested / InArchiveRecovery. This is going to be tricky and I'm not sure it's really worth the effort. I don't buy that to be honest, most (if not all) places that would be affected are already in diff as part of context around other renames anyway and it just does not seem right to rename some variables and not the others. I still think there should be some thought given to merging the hot_standby and standby_mode, but I think we'd need opinion of more people (especially some committers) on this one. - The whole CopyErrorData and memory context logic is not needed in check_recovery_target_time() as the FlushErrorState() is not called there You must be right. I recall having some trouble with strings being free'd prematurely, but if it's ERROR, then there should be no need for that. I'll check again. Hrm, after going through this again I'm pretty sure that was correct: the only way to obtain the current error message is to use CopyErrorData(), but that requires you to switch back to non-error memory context (via Assert). Right, my bad. The FlushErrorState() call is not there because it's handled by the hook caller (or it can exit via ereport() with elevel==ERROR). Yes I've seen that, shouldn't you FreeErrorData(edata) then though? I guess it does not matter too much considering that you are going to throw error and die eventually anyway once you are in that code path, maybe just for the clarity... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's a laudable goal, but I would bet that nothing built on the FDW infrastructure will ever get there. Why? It would be surprising to me if, given that we have gone to some pains to create a system that allows cross-system queries, and hopefully eventually pushdown of quals, joins, and aggregates, we then made sharding work in some completely different way that reuses none of that infrastructure. But maybe I am looking at this the wrong way. Well, we intentionally didn't couple the FDW stuff closely into transaction commit, because of the thought that the far end would not necessarily have Postgres-like transactional behavior, and even if it did there would be about zero chance of having atomic commit with a non-Postgres remote server. postgres_fdw is a seriously bad starting point as far as that goes, because it encourages one to make assumptions that can't possibly work for any other wrapper. I think the idea I sketched upthread of supporting an external transaction manager might be worth pursuing, in that it would potentially lead to having at least an approximation of atomic commit across heterogeneous servers. Independently of that, I think what you are talking about would be better addressed outside the constraints of the FDW mechanism. That's not to say that we couldn't possibly make postgres_fdw use some additional non-FDW infrastructure to manage commits; just that solving this in terms of the FDW infrastructure seems wrongheaded to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_min_apply_delay with a negative value
On 05/01/15 20:44, Robert Haas wrote: On Sat, Jan 3, 2015 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Of course, if recovery_min_apply_delay were a proper GUC, we'd just configure it with a minimum value of zero and be done :-( Amen. We should *really* convert all of the recovery.conf parameters to be GUCs. Well, there is an ongoing effort on that and I think the patch is very close to the state where committer should take a look IMHO, I have only couple of gripes with it now and one of them needs opinions of others anyway. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Mon, Jan 5, 2015 at 11:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: David G Johnston david.g.johns...@gmail.com writes: Atri Sharma wrote If order of result rows is not the same as required, an error is raised: SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST; ERROR: Order not same as specified First reaction for the error was unfavorable but (see below) it likely is the best option and does adequately cover the reason for failure - programmer error. TBH, my first reaction to this entire patch is unfavorable: it's a solution in search of a problem. It adds substantial complication not only for users but for PG developers in order to solve a rather narrow performance issue. I could agree about the scope of the performance issue, but am not sure about the added complication. It essentially is similar to, say, a combination of how Unique is implemented with a flavour or ORDINALITY implementation. A new path that is added in a certain number of cases plus a low overhead node does not seem too bad to me IMO. This is inline with a lot of real world cases I have seen, where the data is *bubbled* up to SRFs, which does give a possibility of an existing order. Couple it with the power to specify ORDER BY in your SRF function and you could save a lot. I am not sure how it complicates for hackers. Could you please elaborate a bit? What would make sense to me is to teach the planner about inlining SQL functions that include ORDER BY clauses, so that the performance issue of a double sort could be avoided entirely transparently to the user. It sounds good, but inlining in current way shall restrict the scope of optimization (which is not applicable for current design). For eg, you cannot inline RECORD returning SRFs... -- Regards, Atri *l'apprenant*
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Fri, Jan 2, 2015 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: In short, you can't force 2PC technology on people who aren't using it already; while for those who are using it already, this isn't nearly good enough as-is. I was involved in some internal discussions related to this patch, so I have some opinions on it. The long-term, high-level goal here is to facilitate sharding. If we've got a bunch of PostgreSQL servers interlinked via postgres_fdw, it should be possible to perform transactions on the cluster in such a way that transactions are just as atomic, consistent, isolated, and durable as they would be with just one server. As far as I know, there is no way to achieve this goal through the use of an external transaction manager, because even if that external transaction manager guarantees, for every transaction, that the transaction either commits on all nodes or rolls back on all nodes, there's no way for it to guarantee that other transactions won't see some intermediate state where the commit has been completed on some nodes but not others. To get that, you need some of integration that reaches down to the way snapshots are taken. I think, though, that it might be worthwhile to first solve the simpler problem of figuring out how to ensure that a transaction commits everywhere or rolls back everywhere, even if intermediate states might still be transiently visible. I don't think this patch, as currently designed, is equal to that challenge, because XACT_EVENT_PRE_COMMIT fires before the transaction is certain to commit - PreCommit_CheckForSerializationFailure or PreCommit_Notify could still error out. We could have a hook that fires after that, but that doesn't solve the problem if a user of that hook can itself throw an error. Even if part of the API contract is that it's not allowed to do so, the actual attempt to commit the change on the remote side can fail due to - e.g. - a network interruption, and that's go to be dealt with somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
Adam, all, * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: If others are also in agreement on this point then I'll update the patch accordingly. Works for me. Thanks! Stephen signature.asc Description: Digital signature
[HACKERS] NODE
hi, I am going through the hashjoin algorithm in postgres. I find a function ExecHashjoin , which is called each time a new tuple is required by the hash join *Node.* could someone explain what exactly node mean in postgres. Thanks
Re: [HACKERS] NODE
On 01/05/2015 09:28 PM, Ravi Kiran wrote: hi, I am going through the hashjoin algorithm in postgres. I find a function ExecHashjoin , which is called each time a new tuple is required by the hash join *Node.* could someone explain what exactly node mean in postgres. See src/backend/nodes/. It's a mechanism that imitates class inheritance in object-oriented languages. Node is the superclass that everything else inherits from. Every Node type supports some basic operations like copy, equals and serialization to/from text. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery_min_apply_delay with a negative value
On Sat, Jan 3, 2015 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Of course, if recovery_min_apply_delay were a proper GUC, we'd just configure it with a minimum value of zero and be done :-( Amen. We should *really* convert all of the recovery.conf parameters to be GUCs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
Robert Haas robertmh...@gmail.com writes: I was involved in some internal discussions related to this patch, so I have some opinions on it. The long-term, high-level goal here is to facilitate sharding. If we've got a bunch of PostgreSQL servers interlinked via postgres_fdw, it should be possible to perform transactions on the cluster in such a way that transactions are just as atomic, consistent, isolated, and durable as they would be with just one server. As far as I know, there is no way to achieve this goal through the use of an external transaction manager, because even if that external transaction manager guarantees, for every transaction, that the transaction either commits on all nodes or rolls back on all nodes, there's no way for it to guarantee that other transactions won't see some intermediate state where the commit has been completed on some nodes but not others. To get that, you need some of integration that reaches down to the way snapshots are taken. That's a laudable goal, but I would bet that nothing built on the FDW infrastructure will ever get there. Certainly the proposed patch doesn't look like it moves us very far towards that set of goalposts. I think, though, that it might be worthwhile to first solve the simpler problem of figuring out how to ensure that a transaction commits everywhere or rolls back everywhere, even if intermediate states might still be transiently visible. Perhaps. I suspect that it might still be a dead end if the ultimate goal is cross-system atomic commit ... but likely it would teach us some useful things anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Hi All, Please forgive if this is a repost. Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically: CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e; This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series). This allows for eliminating sorts that can be based off the known existing preorder. For eg: SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since existing preorder is known. Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large. The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order. So, for above example EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e; QUERY PLAN --- OrderCheck - Function Scan on correct_order_multicol (2 rows) If order of result rows is not the same as required, an error is raised: SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST; ERROR: Order not same as specified Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc. This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data). The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule. In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome. I will add the patch to current commitfest. Thoughts? Regards, Atri orderbycreatefuncver1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Fri, Jan 2, 2015 at 9:04 AM, Amit Kapila amit.kapil...@gmail.com wrote: While working on parallel seq-scan patch to adapt this framework, I noticed few things and have questions regrading the same. 1. Currently parallel worker just attaches to error queue, for tuple queue do you expect it to be done in the same place or in the caller supplied function, if later then we need segment address as input to that function to attach queue to the segment(shm_mq_attach()). Another question, I have in this regard is that if we have redirected messages to error queue by using pq_redirect_to_shm_mq, then how can we set tuple queue for the same purpose. Similarly I think more handling is needed for tuple queue in master backend and the answer to above will dictate what is the best way to do it. I've come to the conclusion that it's a bad idea for tuples to be sent through the same queue as errors. We want errors (or notices, but especially errors) to be processed promptly, but there may be a considerable delay in processing tuples. For example, imagine a plan that looks like this: Nested Loop - Parallel Seq Scan on p - Index Scan on q Index Scan: q.x = p.x The parallel workers should fill up the tuple queues used by the parallel seq scan so that the master doesn't have to do any of that work itself. Therefore, the normal situation will be that those tuple queues are all full. If an error occurs in a worker at that point, it can't add it to the tuple queue, because the tuple queue is full. But even if it could do that, the master then won't notice the error until it reads all of the queued-up tuple messges that are in the queue in front of the error, and maybe some messages from the other queues as well, since it probably round-robins between the queues or something like that. Basically, it could do a lot of extra work before noticing that error in there. Now we could avoid that by having the master read messages from the queue immediately and just save them off to local storage if they aren't error messages. But that's not very desirable either, because now we have no flow-control. The workers will just keep spamming tuples that the master isn't ready for into the queues, and the master will keep reading them and saving them to local storage, and eventually it will run out of memory and die. We could engineer some solution to this problem, of course, but it seems quite a bit simpler to just have two queues. The error queues don't need to be very big (I made them 16kB, which is trivial on any system on which you care about having working parallelism) and the tuple queues can be sized as needed to avoid pipeline stalls. 2. Currently there is no interface for wait_for_workers_to_become_ready() in your patch, don't you think it is important that before start of fetching tuples, we should make sure all workers are started, what if some worker fails to start? I think that, in general, getting the most benefit out of parallelism means *avoiding* situations where backends have to wait for each other. If the relation being scanned is not too large, the user backend might be able to finish the whole scan - or a significant fraction of it - before the workers initialize. Of course in that case it might have been a bad idea to parallelize in the first place, but we should still try to make the best of the situation. If some worker fails to start, then instead of having the full degree N parallelism we were hoping for, we have some degree K N, so things will take a little longer, but everything should still work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 05/01/15 07:28, Fujii Masao wrote: On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pushed with some extra cosmetic tweaks. I got the following assertion failure when I executed pg_xact_commit_timestamp() in the standby server. =# select pg_xact_commit_timestamp('1000'::xid); TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) == ((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c, Line: 315) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2014-12-04 12:01:08 JST sby1 LOG: server process (PID 15545) was terminated by signal 6: Aborted 2014-12-04 12:01:08 JST sby1 DETAIL: Failed process was running: select pg_xact_commit_timestamp('1000'::xid); The way to reproduce this problem is #1. set up and start the master and standby servers with track_commit_timestamp disabled #2. enable track_commit_timestamp in the master and restart the master #3. run some write transactions #4. enable track_commit_timestamp in the standby and restart the standby #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby BTW, at the step #4, I got the following log messages. This might be a hint for this problem. LOG: file pg_commit_ts/ doesn't exist, reading as zeroes CONTEXT: xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09; inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384 This problem still happens in the master. Regards, Attached patch fixes it, I am not sure how happy I am with the way I did it though. And while at it I noticed that redo code for XLOG_PARAMETER_CHANGE sets ControlFile-wal_log_hints = wal_log_hints; shouldn't it be ControlFile-wal_log_hints = xlrec.wal_log_hints; instead? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index ca074da..fcfccf8 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -557,6 +557,12 @@ StartupCommitTs(void) TransactionId xid = ShmemVariableCache-nextXid; int pageno = TransactionIdToCTsPage(xid); + if (track_commit_timestamp) + { + ActivateCommitTs(); + return; + } + LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE); /* @@ -571,14 +577,25 @@ StartupCommitTs(void) * This must be called ONCE during postmaster or standalone-backend startup, * when commit timestamp is enabled. Must be called after recovery has * finished. + */ +void +CompleteCommitTsInitialization(void) +{ + if (!track_commit_timestamp) + DeactivateCommitTs(true); +} + +/* + * This must be called when track_commit_timestamp is turned on. + * Note that this only happens during postmaster or standalone-backend startup + * or during WAL replay. * * This is in charge of creating the currently active segment, if it's not * already there. The reason for this is that the server might have been * running with this module disabled for a while and thus might have skipped * the normal creation point. */ -void -CompleteCommitTsInitialization(void) +void ActivateCommitTs(void) { TransactionId xid = ShmemVariableCache-nextXid; int pageno = TransactionIdToCTsPage(xid); @@ -591,22 +608,6 @@ CompleteCommitTsInitialization(void) LWLockRelease(CommitTsControlLock); /* - * If this module is not currently enabled, make sure we don't hand back - * possibly-invalid data; also remove segments of old data. - */ - if (!track_commit_timestamp) - { - LWLockAcquire(CommitTsLock, LW_EXCLUSIVE); - ShmemVariableCache-oldestCommitTs = InvalidTransactionId; - ShmemVariableCache-newestCommitTs = InvalidTransactionId; - LWLockRelease(CommitTsLock); - - TruncateCommitTs(ReadNewTransactionId()); - - return; - } - - /* * If CommitTs is enabled, but it wasn't in the previous server run, we * need to set the oldest and newest values to the next Xid; that way, we * will not try to read data that might not have been set. @@ -641,6 +642,35 @@ CompleteCommitTsInitialization(void) } /* + * This must be called when track_commit_timestamp is turned off. + * Note that this only happens during postmaster or standalone-backend startup + * or during WAL replay. + * + * Resets CommitTs into invalid state to make sure we don't hand back + * possibly-invalid data; also removes segments of old data. + */ +void +DeactivateCommitTs(bool do_wal) +{ + TransactionId xid = ShmemVariableCache-nextXid; +
Re: [HACKERS] Parallel Seq Scan
* Robert Haas (robertmh...@gmail.com) wrote: I think it's right to view this in the same way we view work_mem. We plan on the assumption that an amount of memory equal to work_mem will be available at execution time, without actually reserving it. Agreed- this seems like a good approach for how to address this. We should still be able to end up with plans which use less than the max possible parallel workers though, as I pointed out somewhere up-thread. This is also similar to work_mem- we certainly have plans which don't expect to use all of work_mem and others that expect to use all of it (per node, of course). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Alvaro, On top of evaluateExpr() we need a comment (generally I think pgbench could do with more comments; not saying your patch should add them, just expressing an opinion.) Having spent some time in pgbench, I agree that more comments are a good thing. Also, intuitively I would say that the return values of that function should be reversed: return true if things are good. Ok. Can we cause a stack overflow in this function with a complex enough expression? Not for any practical purpose, IMO. I wonder about LOCATE and LOCATION. Can we do away with the latter, and keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or similar? I would just expand an ad-hoc fprintf in the single place where the other macro is used. I think that all location information should always be the same, so having it defined only once helps maintenance. If someone fixes the macro and there is one expanded version it is likely that it would not be changed. Maybe we could do with only one macro, though. Are we okay with only integer operands? Is this something we would expand in the future? Probably Is the gaussian/exp random stuff going to work with integer operands, No, it will need a floating point parameter, but I was thinking of only adding constants floats as function arguments in a first approach, and not allow an expression syntax on these, something like: \set n exprand(1, :size+3, 2.0) + 1 But not \set n exprand(1, :size+3, :size/3.14159) + 1 That is debatable. Otherwise we have to take care of typing issues, which would complicate the code significantly with two dynamic types (int float) to handle, propagate and so in the expression evaluation. It is possible though, but it seems to me that it is a lot of bother for a small added value. Anyway, I suggest to keep that for another round and keep the Robert's isofunctional patch as it is before extending. if we want to change it to use function syntax, as expressed elsewhere? I think I'll add a function syntax, and add a new node type to handle these, but the current syntax should/might be preserved for upward compatibility. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple
Hi, we are running postges 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc 4.8.3 getting a segfault from time to time with the below core dump. The error happens only on our production system and is not reproducible a second run after database recovery always succeed without any error. From the coredump I can’t imagine where this error might come from. Any ideas ? Thanks Manuel #0 0x00608f74 in ExecStoreVirtualTuple (slot=0x24a40bd00) at execTuples.c:508 #1 0x005f9d6c in ExecFilterJunk (junkfilter=0x24a40cf50, slot=0x3ecbac18) at execJunk.c:318 #2 0x0061eba4 in ExecModifyTable (node=0x59ac150) at nodeModifyTable.c:988 #3 0x005fda57 in ExecProcNode (node=0x59ac150) at execProcnode.c:377 #4 0x00622a62 in CteScanNext (node=0x29500050) at nodeCtescan.c:103 #5 0x006084e7 in ExecScanFetch (node=0x29500050, accessMtd=0x622938 CteScanNext, recheckMtd=0x622ac8 CteScanRecheck) at execScan.c:82 #6 0x00608556 in ExecScan (node=0x29500050, accessMtd=0x622938 CteScanNext, recheckMtd=0x622ac8 CteScanRecheck) at execScan.c:132 #7 0x00622afd in ExecCteScan (node=0x29500050) at nodeCtescan.c:155 #8 0x005fdb53 in ExecProcNode (node=0x29500050) at execProcnode.c:434 #9 0x005fbcaf in ExecutePlan (estate=0x9280ef0, planstate=0x29500050, operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, direction=ForwardScanDirection, dest=0x13b4530) at execMain.c:1473 #10 0x005fa12b in standard_ExecutorRun (queryDesc=0x228e58a78, direction=ForwardScanDirection, count=0) at execMain.c:307 #11 0x005fa02d in ExecutorRun (queryDesc=0x228e58a78, direction=ForwardScanDirection, count=0) at execMain.c:255 #12 0x0059449a in ExecCreateTableAs (stmt=0x927e3e0, queryString=0x22b880bb0 \n CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n WITH affected AS (\n UPDATE events t SET processed = true\n WHERE tracker_id BETWEEN $1 AND $2\n AND ajdate(t.created_at) ..., params=0x563d0520, completionTag=0x7aa88cc0 ) at createas.c:170 #13 0x007372c0 in ProcessUtilitySlow (parsetree=0x927e3e0, queryString=0x22b880bb0 \n CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n WITH affected AS (\n UPDATE events t SET processed = true\n WHERE tracker_id BETWEEN $1 AND $2\n AND ajdate(t.created_at) ..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:1231 #14 0x0073688f in standard_ProcessUtility (parsetree=0x927e3e0, queryString=0x22b880bb0 \n CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n WITH affected AS (\n UPDATE events t SET processed = true\n WHERE tracker_id BETWEEN $1 AND $2\n AND ajdate(t.created_at) ..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:829 #15 0x00735ba3 in ProcessUtility (parsetree=0x927e3e0, queryString=0x22b880bb0 \n CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n WITH affected AS (\n UPDATE events t SET processed = true\n WHERE tracker_id BETWEEN $1 AND $2\n AND ajdate(t.created_at) ..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:309 #16 0x0062e897 in _SPI_execute_plan (plan=0x7aa88d70, paramLI=0x563d0520, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2160 #17 0x0062bb9d in SPI_execute_with_args ( src=0x22b880bb0 \n CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n WITH affected AS (\n UPDATE events t SET processed = true\n WHERE tracker_id BETWEEN $1 AND $2\n AND ajdate(t.created_at) ..., nargs=4, argtypes=0x22b880ab0, Values=0x22b880af0, Nulls=0x22b881010 \002, read_only=0 '\000', tcount=0) at spi.c:537 #18 0x7f3635560ed0 in exec_stmt_dynexecute (estate=0x7aa890d0, stmt=0xe2f140) at pl_exec.c:3462 #19 0x7f363555cfe9 in exec_stmt (estate=0x7aa890d0, stmt=0xe2f140) at pl_exec.c:1450 #20 0x7f363555cd05 in exec_stmts (estate=0x7aa890d0, stmts=0xe2ef58) at pl_exec.c:1345 #21 0x7f363555cbb0 in exec_stmt_block (estate=0x7aa890d0, block=0xe2fa80) at pl_exec.c:1283 #22 0x7f363555ab97 in plpgsql_exec_function (func=0xe1f5b0, fcinfo=0x11daac0) at pl_exec.c:321 #23 0x7f3632be in plpgsql_call_handler (fcinfo=0x11daac0) at pl_handler.c:129 #24 0x0060114d in ExecMakeFunctionResultNoSets (fcache=0x11daa50, econtext=0x346e2840, isNull=0x7aa89473 , isDone=0x0) at execQual.c:1992 #25 0x00601b4b in ExecEvalFunc (fcache=0x11daa50, econtext=0x346e2840, isNull=0x7aa89473 , isDone=0x0) at execQual.c:2383 #26 0x7f3635563f3b in exec_eval_simple_expr (estate=0x7aa89740, expr=0xdc4ab0,
Re: [HACKERS] add modulo (%) operator to pgbench
I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined. Hm, I just looked at the previous version which used ERROR rather than LOCATE and LOCATION, and I liked that approach better. Can we get that back? It seems that postgresql must be able to compile with a preprocessor which does not handle varargs macros, so I thought it simpler to just expand the macro. If we must have it without a vararg macro, it means creating an adhoc vararg function to handle the vfprintf and exit. Oviously it can be done, if vfprintf is available. The prior style was to repeat fprintf/exit everywhere, I wanted to improve a little, but not to bother too much about portability constraints with pre C99 compilers. I understand that for the purposes of this patch it's easier to not change existing fprintf()/exit() calls, though. The issue is really the portability constraints. C99 is only 16 years old, so it is a minor language:-) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Atri Sharma wrote If order of result rows is not the same as required, an error is raised: SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST; ERROR: Order not same as specified First reaction for the error was unfavorable but (see below) it likely is the best option and does adequately cover the reason for failure - programmer error. It is not data specific (other than by accident) so any runtime attempt to correct an error is going to be wasted effort and overhead. In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome. Could SQL functions be explored such that if the planner sees an order by it omits the post-function sort node otherwise it adds an explicit one? How expensive is sorting an already sorted result? Runtime conditional sorting seems worse case, depending on implementation, since a large result with an error in the last bit will end up nearly double processing. I'd rather deem unsorted output programmer error and raise the error so it is likely to be discovered during testing and not have any meaningful runtime overhead. David J. -- View this message in context: http://postgresql.nabble.com/Patch-to-add-functionality-to-specify-ORDER-BY-in-CREATE-FUNCTION-for-SRFs-tp5832876p5832885.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL information view
On 11/19/14 7:36 AM, Magnus Hagander wrote: + row + entrystructnamepg_stat_ssl/indextermprimarypg_stat_ssl/primary/indexterm/entry + entryOne row per connection (regular and replication), showing statistics about +SSL used on this connection. +See xref linkend=pg-stat-ssl-view for details. + /entry + /row + It doesn't really show statistics. It shows information or data. We should make contrib/sslinfo a wrapper around this view as much as possible. Is it useful to include rows for sessions not using SSL? Should we perpetuate the ssl-naming? Is there a more general term? Will this work for non-OpenSSL implementations? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On 1/5/15, 12:20 PM, Atri Sharma wrote: What would make sense to me is to teach the planner about inlining SQL functions that include ORDER BY clauses, so that the performance issue of a double sort could be avoided entirely transparently to the user. It sounds good, but inlining in current way shall restrict the scope of optimization (which is not applicable for current design). For eg, you cannot inline RECORD returning SRFs... Related... I'd like to see a way to inline a function that does something like: CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$ SELECT a FROM b WHERE lower(b.c) = lower($1) $$ and have the performance be comparable to SELECT ..., (SELECT a FROM b WHERE lower(b.c) = lower(something)) AS foo I realize that there's a whole question about the function not being an SRF, but the thing is this works great when manually inlined and is fast. The SQL function is significantly slower. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Jim Nasby jim.na...@bluetreble.com writes: Related... I'd like to see a way to inline a function that does something like: CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$ SELECT a FROM b WHERE lower(b.c) = lower($1) $$ The reason that's not inlined ATM is that the semantics wouldn't be the same (ie, what happens if the SELECT returns more than one row). It's possible they would be the same if we attached a LIMIT 1 to the function's query, but I'm not 100% sure about that offhand. I'm also not real sure that you'd still get good performance if there were an inserted LIMIT; that would disable at least some optimizations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers