Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly
On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote: Could you confirm again and tell me what problem is happening? FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 < $PATCH on master at edc4345. They compiled and passed make check as well. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Mon, Jan 6, 2014 at 5:37 PM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Hi Fabrizio, On 05/01/14 20:46, Fabrizio Mello wrote: I don't see your code yet, but I would like to know if it is possible to implement this view as an extension. I wanted to do it as an extension - so that I could backport that to previous versions of Postgres. I do not think it is a possibility, given that the client code that is aware of the events lies in pgarch.c. I don't see any particular point in doing that as an extension, as you want to log this statistical information once archiving has either failed or succeeded. I just had a quick look at v2, and it looks like the patch is in good shape. Sorry to be picky, but I am not sure that using a character for m_type is appropriate when delivering messages to the pgstat facility. You might want to use a boolean instead... MAX_XFN_CHARS is not appropriate either, IMO: you should remove it and instead use MAXFNAMELEN of xlog_internal.h, where all the file names related to WAL files, including history and backup files, are generated. Then, the patch looks to be working as expected; here are some results with a short test:

=# \x
Expanded display is on.
=# select * from pg_stat_get_archiver();
-[ RECORD 1 ]----------+------------------------------
archived_wals          | 6
last_archived_wal      | 00010005
last_archived_wal_time | 2014-01-07 17:27:34.752903+09
failed_attempts        | 12
last_failed_wal        | 00010006
last_failed_wal_time   | 2014-01-07 17:31:18.409528+09
stats_reset            |
(1 row)

=# select * from pg_stat_archiver;
-[ RECORD 1 ]----------+------------------------------
archived_wals          | 6
last_archived_wal      | 00010005
last_archived_wal_time | 2014-01-07 17:27:34.752903+09
failed_attempts        | 12
last_failed_wal        | 00010006
last_failed_wal_time   | 2014-01-07 17:31:18.409528+09
stats_reset            | 2014-01-07 17:25:51.949498+09

Regards, -- Michael
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
On Thu, Jan 2, 2014 at 11:34 PM, Bernd Helmle maili...@oopsware.de wrote: --On 1. Januar 2014 23:53:46 +0100 Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi, As much as I've seen people frown upon $subject, it still happens in the wild, and Magnus seems to agree that the current failure mode of our pg_basebackup tool when confronted with the situation is a bug. So here's a fix, attached. I've seen having tablespaces under PGDATA as a policy within several data centres in the past. The main reasoning behind this seems to be that they strictly separate platform and database administration, and for database inventory reasons. They are regularly surprised if you tell them not to use tablespaces in such a way, since they absorbed this practice over the years from other database systems. So +1 for fixing this. Same here. +1 for fixing it. Having tablespaces in PGDATA itself is most of the time part of some obscure policy making it easier to manage all the data in one folder... -- Michael
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On 01/06/2014 08:29 PM, Andres Freund wrote: On 2014-01-06 12:40:25 -0500, Robert Haas wrote: On Mon, Jan 6, 2014 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-06 11:08:41 -0500, Robert Haas wrote: Yea. But at least it would fail reliably instead of just under concurrency and other strange circumstances - and there'd be a safe way out. Currently there seem to be all sorts of odd behaviour possible. I simply don't have a better idea :( Is "forcibly detoast everything" a complete no-go? I realize there are performance concerns with that approach, but I'm not sure how realistic a worry it actually is. The scenario I am primarily worried about is turning a record assignment which previously took up to BLOCK_SIZE + slop amount of memory into something taking up to a gigabyte. That's a pretty damn hefty change. And there's no good way of preventing it short of using a variable for each actually desired column, which imnsho isn't really a solution. We could mitigate that somewhat by doing an optimization pass over the PL/pgSQL code after compilation, checking which fields of a row variable are never referenced, and skipping the detoasting for those fields. It would only work for named row variables, not anonymous record variables, and you would still unnecessarily detoast fields that are sometimes accessed but usually not. But it would avoid the detoasting in the most egregious cases, e.g. where you fetch a whole row into a variable just to access one field. Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. - Heikki
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On 01/06/2014 07:12 PM, Mark Dilger wrote: The reason I was going to all the trouble of creating chrooted environments was to be able to replicate clusters that have tablespaces. You can remove and recreate the symlink in the pg_tblspc directory, after creating the cluster, to point it to a different location. It might be a bit tricky to do that if you have two clusters running at the same time, but it's probably easier than chrooting anyway. For example:

1. stop the standby
2. create the tablespace in master
3. stop master
4. mv the tablespace directory, and modify the symlink in master to point to the new location
5. start standby. It will replay the tablespace creation in the original location
6. restart master. You now have the same tablespace in master and standby, but they point to different locations.

This doesn't allow dynamically creating and dropping tablespaces during tests, but at least it gives you one tablespace to use. Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. - Heikki
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote: Also I cannot reproduce a scenario where "applications must not depend on results read during a transaction that later aborted". In this example the SELECT itself has failed. Can you show an example where a SELECT completes, but the COMMIT blows up? Actually, no, not for a read-only transaction. It happens that the final serialization failure check executed on COMMIT only affects read/write transactions, not read-only ones. That's a pretty specific implementation detail, though, so I wouldn't necessarily rely on it... Here's an example of why applications must not depend on results read during a transaction that later aborted:

  W2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
  W2: UPDATE t SET count=1 WHERE id=1;
  W1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
  W1: SELECT * FROM t WHERE id=1;
  W2: COMMIT;
  R : BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY
  R : SELECT * FROM t;
  R : COMMIT;
! W1: UPDATE t SET count=1 WHERE id=2;
  W1: COMMIT;

If you try this, it'll cause a serialization failure on the line marked with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed before W2. But R saw both (1,1) and (2,0) in the table, and that has to be a consistent snapshot of the database state, meaning W2 appears to have executed before W1. That's an inconsistency, so something has to be rolled back. This particular anomaly requires all three of the transactions, and so it can't be detected until W1 does its UPDATE. Postgres detects the conflict at that point and rolls back W1. So what does this have to do with relying on the results of read-only transactions that abort? Well, what if R had issued ROLLBACK instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to be equivalent for transactions that don't modify the database, or maybe because something else caused the transaction to abort? 
When W1 does its update, it will be checked for serialization failures, but aborted transactions are (intentionally) not included in those checks. W1 is therefore allowed to commit; the apparent serial order of execution is W1 followed by W2, and the results of the aborted transaction R aren't consistent with that. Dan -- Dan R. K. Ports  UW CSE  http://drkp.net/
Re: [HACKERS] truncating pg_multixact/members
On 2014-01-06 20:51:57 -0500, Robert Haas wrote: On Mon, Jan 6, 2014 at 7:50 PM, Jim Nasby j...@nasby.net wrote: On 1/4/14, 8:19 AM, Robert Haas wrote: Also, while multixactid_freeze_min_age should be low, perhaps a million as you suggest, multixactid_freeze_table_age should NOT be lowered to 3 million or anything like it. If you do that, people who are actually doing lots of row locking will start getting many more full-table scans. We want to avoid that at all cost. I'd probably make the default the same as for vacuum_freeze_table_age, so that mxids only cause extra full-table scans if they're being used more quickly than xids. Same default as vacuum_freeze_table_age, or default TO vacuum_freeze_table_age? I'm thinking the latter makes more sense... Same default. I think it's a mistake to keep leading people to think that the sensible values for one set of parameters are somehow related to a sensible set of values for the other set. They're really quite different things. Valid argument - on the other hand, defaulting to the current variable's value has the advantage of being less likely to cause pain when doing a minor version upgrade because suddenly full table vacuums are much more frequent. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Triggers on foreign tables
Hello. Since the last discussion about it (http://www.postgresql.org/message-id/cadyhksugp6ojb1pybtimop3s5fg_yokguto-7rbcltnvaj5...@mail.gmail.com), I finally managed to implement row triggers for foreign tables. The attached patch does not contain regression tests or documentation yet; a revised patch will include those sometime during the week. I'll add it to the commitfest then. A simple test scenario using postgres_fdw is however included as an attachment. The attached patch implements triggers for foreign tables according to the design discussed in the link above. For statement-level triggers, nothing has changed since the last patch: statement-level triggers are simply allowed in the command checking. For row-level triggers, it works as follows:

- during the rewrite phase, every attribute of the foreign table is duplicated as a resjunk entry if a trigger is defined on the relation. These entries are then used to store the old values for a tuple. There is room for improvement here: we could check whether any trigger is in fact a row-level trigger.
- during the execution phase, the before triggers are fired exactly like triggers on regular tables, except that old tuples are not fetched using their ctid, but rebuilt using the previously-stored resjunk attributes.
- for after triggers, the whole queuing mechanism is bypassed for foreign tables. This is IMO acceptable, since foreign tables cannot have constraints or constraint triggers, and thus have no need for deferrable execution.

This design avoids the need for storing and retrieving/identifying remote tuples until the query or transaction end. 
- the duplicated resjunk attributes are identified by being:
  - marked as resjunk in the targetlist
  - not being system or whole-row attributes (varno 0)

There is still one small issue with the attached patch: modifications to the tuple performed by the foreign data wrapper (via the returned TupleTableSlot in the ExecForeignUpdate and ExecForeignInsert hooks) are not visible to the AFTER trigger. This could be fixed by merging the planslot containing the resjunk columns with the returned slot before calling the trigger, but I'm not really sure how to safely perform that. Any advice? Many thanks to Kohei Kaigai for taking the time to help with the design. -- Ronan Dunklau http://dalibo.com - http://dalibo.org

test_with_postgres_fdw.sql Description: application/sql

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b9cd88d..a509595 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3150,13 +3150,16 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, /* No command-specific prep needed */ break; case AT_EnableTrig: /* ENABLE TRIGGER variants */ - case AT_EnableAlwaysTrig: - case AT_EnableReplicaTrig: case AT_EnableTrigAll: case AT_EnableTrigUser: case AT_DisableTrig: /* DISABLE TRIGGER variants */ case AT_DisableTrigAll: case AT_DisableTrigUser: + ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE); + pass = AT_PASS_MISC; + break; + case AT_EnableReplicaTrig: + case AT_EnableAlwaysTrig: case AT_EnableRule: /* ENABLE/DISABLE RULE variants */ case AT_EnableAlwaysRule: case AT_EnableReplicaRule: diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 4eff184..12870eb 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -75,6 +75,14 @@ static HeapTuple GetTupleForTrigger(EState *estate, ItemPointer tid, LockTupleMode lockmode, TupleTableSlot **newSlot); + +static HeapTuple ExtractOldTuple(TupleTableSlot * mixedtupleslot, + 
ResultRelInfo * relinfo); +static void ExecARForeignTrigger(EState * estate, +TriggerData LocTriggerData, +ResultRelInfo *relinfo, +int triggerEvent, int triggerType); + static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo, Trigger *trigger, TriggerEvent event, Bitmapset *modifiedCols, @@ -184,12 +192,22 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, RelationGetRelationName(rel)), errdetail("Views cannot have TRUNCATE triggers."))); } + else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE) + { + if (stmt->isconstraint) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is a foreign table", + RelationGetRelationName(rel)), + errdetail("Foreign tables cannot have constraint triggers."))); + } else + { ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not a table or view", RelationGetRelationName(rel)))); - + } if (!allowSystemTableMods && IsSystemRelation(rel)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), @@ -1062,10 +1080,11 @@ RemoveTriggerById(Oid trigOid) rel = heap_open(relid, AccessExclusiveLock); if (rel->rd_rel->relkind != RELKIND_RELATION && - rel->rd_rel->relkind != RELKIND_VIEW) + rel->rd_rel->relkind !=
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On 2014-01-07 10:45:24 +0200, Heikki Linnakangas wrote: Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. If we're doing something as coarse-grained as this, I agree, it should be master only. I personally vote for just leaving things as is; that seems better than this pessimization, and it's not like loads of people have hit the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] PostgreSQL in Windows console and Ctrl-C
Hello all, when pg_ctl start is used to run PostgreSQL in a console window on Windows, it runs in the background (it is terminated by closing the window, but that is probably inevitable). There is one problem, however: The first Ctrl-C in that window, no matter in which situation, will cause the background postmaster to exit. If you, say, ping something, and press Ctrl-C to stop ping, you probably don't want the database to go away, too. The reason is that Windows delivers the Ctrl-C event to all processes using that console, not just to the foreground one. Here's a patch to fix that. pg_ctl stop still works, and it has no effect when running as a service, so it should be safe. It starts the postmaster in a new process group (similar to calling setpgrp() after fork()) that does not receive Ctrl-C events from the console window. -- Christian diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index 50d4586..89a9544 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** CreateRestrictedProcess(char *cmd, PROCE *** 1561,1566 --- 1561,1567 HANDLE restrictedToken; SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY}; SID_AND_ATTRIBUTES dropSids[2]; + DWORD flags; /* Functions loaded dynamically */ __CreateRestrictedToken _CreateRestrictedToken = NULL; *** CreateRestrictedProcess(char *cmd, PROCE *** 1636,1642 AddUserToTokenDacl(restrictedToken); #endif ! r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, CREATE_SUSPENDED, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL) --- 1637,1650 AddUserToTokenDacl(restrictedToken); #endif ! flags = CREATE_SUSPENDED; ! ! /* Protect console process from Ctrl-C */ ! if (!as_service) { ! flags |= CREATE_NEW_PROCESS_GROUP; ! } ! ! 
r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, flags, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL)
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-06 11:56:28 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I think I am less concerned about pseudotypes.c than about bloating pg_proc.h even further and about the annoyance of editing it - but I guess that should rather be fixed by storing it in a more sensible format at some point... Yeah, getting rid of a dozen pseudotype I/O functions is hardly going to reduce the PITA factor of editing pg_proc.h. It's interesting to think about moving all those DATA() macros into some more-maintainable format --- I'm not sure what it should be exactly, but I think something that can insert plausible defaults for omitted columns would be a big help for pg_proc and maybe some of the other catalogs too. Alvaro previously suggested storing pg_proc in json. Not sure I like it, but it'd sure be better than the current format if we derive unspecified values from other columns (e.g. prorows = 0 iff proretset). I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. Afaics there's really not much reason to keep them stable and it's by far the most frequent conflict I have seen with keeping patches up2date. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] dynamic shared memory and locks
On 2014-01-06 21:35:22 -0300, Alvaro Herrera wrote: Jim Nasby wrote: On 1/6/14, 2:59 PM, Robert Haas wrote: On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: The point I'm making is that no such code should get past review, whether it's got an obvious performance problem or not. Sure, I agree, but we all make mistakes. It's just a judgement call as to how likely you think it is that someone might make this particular mistake, a topic upon which opinions may vary. I don't think it's that unlikely, as the previous implementation's rules, when viewed while squinting, allowed nesting spinlocks. And it's a pretty simple check. Maybe it makes sense to have such a check #ifdef'ed out on most builds to avoid extra overhead, but not having any check at all just because we trust the review process too much doesn't strike me as the best of ideas. I don't think that check would have a relevantly high performance impact in comparison to the rest of --enable-cassert - it's a single process-local variable which is regularly accessed. It will just stay in L1 or even registers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On Jan 7, 2014, at 09:45, Heikki Linnakangas hlinnakan...@vmware.com wrote: Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. +1 The fact that de-TOAST-ing can happen lazily is, at least to me, an implementation detail that shouldn't be observable. If we want to allow people to use lazy de-TOAST-ing as an optimization tool, we should provide an explicit way to do so, e.g. by flagging variables in pl/pgsql as REFERENCE or something like that. best regards, Florian Pflug
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
That was probably me. I'll look into it. On Jan 6, 2014, at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Sun, Dec 29, 2013 at 02:48:21AM -0500, Tom Lane wrote: 3. pg_upgrade ignores the fact that pg_resetxlog failed, and keeps going. Does pg_resetxlog return a non-zero exit status? If so, pg_upgrade should have caught that and exited. It certainly does:

if (errno)
{
    fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
            progname, XLOGDIR, strerror(errno));
    exit(1);
}

The bug is that pg_upgrade appears to assume (in many places, not just this one) that exec_prog() will abort if the called program fails, but *it doesn't*, contrary to the claim in its own header comment. This is because pg_log(FATAL, ...) doesn't call exit(). pg_fatal() does, but that's not what's being called in the throw_error case. I imagine that this used to work correctly and got broken by some ill-advised refactoring, but whatever the origin, it's 100% broken today. regards, tom lane
Re: [HACKERS] [bug fix] multibyte messages are displayed incorrectly on the client
From: Bruce Momjian br...@momjian.us On Sun, Jan 5, 2014 at 04:40:17PM +0900, MauMau wrote: Then, as a happy medium, how about disabling message localization only if the client encoding differs from the server one? That is, compare the client_encoding value in the startup packet with the result of GetPlatformEncoding(). If they don't match, call disable_message_localization(). I think the problem is we don't know the client and server encodings at that time. I suppose we know (or at least believe) those encodings during backend startup: * client encoding - the client_encoding parameter passed in the startup packet, or if that's not present, the client_encoding GUC value. * server encoding - the encoding of strings gettext() returns. That is what GetPlatformEncoding() returns. Am I correct? Regards MauMau
[HACKERS] extra_float_digits and casting from real to numeric
A customer recently upgraded their JDBC driver from 8.4 to 9.2. This enabled the binary wire protocol (effectively between 9.1 and 9.2). They reported that single-precision values inserted into a numeric(10,2) column were suddenly rounded wrongly, i.e. 1.18 was inserted as 1.20, while that had worked before. Of course we told them that single precision is the wrong data type for this, but still, this is a regression. The behavior is easily reproducible with SELECT 1.18::real, which returns 1.2. Now, the JDBC driver sets extra_float_digits = 3, which makes this ::real cast return 1.1797 in psql. This is consistent with the documentation, which suggests that extra_float_digits = 0 will return the same representation on all platforms, so the value must be rounded a bit to account for different implementations. But if extra_float_digits > 0 is set, I'd expect not only the float4 output to be affected by it, but also casts to other datatypes, which is not the case now:

set extra_float_digits = 0;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric | text | to_char
--------+---------+------+---------
    1.2 |    1.20 | 1.2  | 1.2

set extra_float_digits = 1;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric | text | to_char
--------+---------+------+---------
   1.18 |    1.20 | 1.18 | 1.2

set extra_float_digits = 3;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric |  text  | to_char
--------+---------+--------+---------
 1.1797 |    1.20 | 1.1797 | 1.2

Is that sane? Shouldn't FLT_DIG in float4_numeric() be replaced with FLT_DIG + extra_float_digits, like float4out() does, so the extra precision is not lost when inserting float4 data into numeric columns? Likewise, float4_to_char() should be adjusted for to_char output, and correspondingly float8_numeric() and float8_to_char()? 
Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] generic pseudotype IO functions?
Andres Freund and...@2ndquadrant.com writes: I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. -1 ... you've evidently not built any opclasses lately. Yeah, we could probably improve the bootstrap infrastructure enough to not need literal OIDs in so many places in the initial catalog contents, but it'd be lots of trouble. It definitely is *not* something to buy into at the same time as redoing the creation of the .bki file. regards, tom lane
Re: [HACKERS] cleanup in code
David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. It is mildly curious that MSVC fails to understand the unreachability hint here when it does so elsewhere, but for our purposes, that's a purely academic question. regards, tom lane
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-07 10:04:33 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. -1 ... you've evidently not built any opclasses lately. True. Not sure if I ever built one, except when playing around. To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc', so there aren't many references to functions' oids. There's a few procs that are involved in bootstrapping; those likely would need to keep a fixed oid, but that doesn't sound problematic to me. Yeah, we could probably improve the bootstrap infrastructure enough to not need literal OIDs in so many places in the initial catalog contents, but it'd be lots of trouble. It definitely is *not* something to buy into at the same time as redoing the creation of the .bki file. Not sure. Any such infrastructure should have support for filling in defaults for things that don't need to be specified - imo generating an oid for that sounds like it would fit in there. It doesn't - and possibly shouldn't - need to be the same commit, though. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. regards, tom lane
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On 2014-01-07 10:27:14 -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. Seconded. Perhaps the solution is to simply introduce tablespaces located relative to PGDATA? That'd be fracking useful anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
On 7 January 2014 15:47, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? We had a similar email from this same address over a year ago. These have been the only two messages posted, and neither was relevant nor appropriate. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] generic pseudotype IO functions?
Andres Freund and...@2ndquadrant.com writes: To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc' so there aren't many references to functions' oids. *Some* of them are typed regproc. Many are not, because there's a need to refer to overloaded function names. A minimum change before we could even consider abandoning manual assignment of pg_proc entries would be to make regprocedure_in work in bootstrap mode, so that we could use regprocedure as a catalog column type. And regoperator_in, too, if you wanted to auto-assign operator OIDs. And you'd need some solution for generating fmgroids.h, as well as the handmade #define's for selected operator OIDs that appear now in pg_operator.h. There are probably more issues, but those are the ones that occur to me after thirty seconds' thought. Even after all that, we can *not* go over to auto-assignment of pg_type OIDs, because there is way too much stuff that assumes those OIDs are stable (starting with on-disk storage of arrays). I'm not entirely sure that it's okay to have function OIDs varying across releases, either; who's to say that there's not client code looking at fmgroids.h, or otherwise hard-wiring the numbers it uses for fastpath function calls? On the whole I think that this'd be a lot more work, and a lot more potential for breakage, than it's really worth. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote: I cannot have a read-only transaction fail because of serialization anomalies. Can someone show me a working example please? A common case is a read-only transaction reading a closed batch without seeing all of its entries. http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
On Thu, Jan 2, 2014 at 2:06 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Magnus Hagander mag...@hagander.net writes: We can't get away with just comparing the relative part of the pathname. Because it will fail if there is another path with exactly the same length, containing the tablespace. Actually… yeah. I think we might want to store a value in the tablespaceinfo struct indicating whether it's actually inside PGDATA (since we have the full path at that point), and then skip it based on that instead. Or store and pass the value of getcwd() perhaps. I think it's best to stuff in the tablespaceinfo struct either NIL or the relative path of the tablespace when found in $PGDATA, as done in the attached. I've attached a slightly updated patch - I changed around a bit of logic order and updated some comments during my review. And added error-checking. Thanks! I started again from your version for v3. Applied a fairly heavily edited version of this one. I also backpatched it to 9.1 and up. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
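The pitfall Magnus mentions (a plain string-prefix test also matches a sibling directory that merely shares the prefix) is easy to demonstrate outside the server code. A minimal Python sketch, with the helper name and paths invented for illustration:

```python
import os.path

def is_inside(path, pgdata):
    """Component-wise check that path lies inside pgdata (hypothetical helper)."""
    path = os.path.normpath(path)
    pgdata = os.path.normpath(pgdata)
    # commonpath compares whole path components, so /database is not "inside" /data
    return os.path.commonpath([path, pgdata]) == pgdata

# A naive startswith() test matches a sibling directory sharing the prefix:
assert "/database/ts1".startswith("/data")       # false positive
assert not is_inside("/database/ts1", "/data")   # component-wise check is safe
assert is_inside("/data/ts1", "/data")
```

The committed C fix settles this differently (recording in tablespaceinfo whether the tablespace lives under PGDATA), but the failure mode of bare prefix comparison is the same.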
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Robert Haas wrote: I spent some time looking at this tonight. I don't think the value that is displayed for the bitmap memory tracking will be accurate in complex cases. The bitmap heap scan may sit on top of one or more bitmap-and or bitmap-or nodes. When a bitmap-and operation happens, one of the two bitmaps being combined will be thrown out and the number of entries in the other map will, perhaps, be decreased. The peak memory usage for the surviving bitmap will be reflected in the number displayed for the bitmap heap scan, but the peak memory usage for the discarded bitmap will not. This is wholly arbitrary because both bitmaps existed at the same time, side by side, and which one we keep and which one we throw out is essentially random. Thank you for taking time to look at this patch. The peak memory usage for the discarded bitmap *can* be reflected in the number displayed for the bitmap heap scan by the following code in tbm_union() or tbm_intersect():

tbm_union(TIDBitmap *a, const TIDBitmap *b)
{
	Assert(!a->iterating);
+	if (a->nentriesPeak < b->nentriesPeak)
+		a->nentriesPeak = b->nentriesPeak;
	/* Nothing to do if b is empty */
	if (b->nentries == 0)
		return;
***
tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
{
	Assert(!a->iterating);
+	if (a->nentriesPeak < b->nentriesPeak)
+		a->nentriesPeak = b->nentriesPeak;
	/* Nothing to do if a is empty */
	if (a->nentries == 0)
		return;
***

Sorry for the delay. Hmm, fair point. But I'm still not convinced that we really need to add extra accounting for this. What's wrong with just reporting the number of exact and lossy pages? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
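The proposed hunks carry the discarded bitmap's peak over into the survivor before merging. A toy Python model of that accounting (class and field names are invented; the real TIDBitmap merges page tables rather than min'ing counts):

```python
class ToyBitmap:
    """Toy stand-in for TIDBitmap that tracks a peak entry count."""
    def __init__(self, nentries):
        self.nentries = nentries
        self.nentries_peak = nentries

    def intersect(self, other):
        # Mirror of the proposed tbm_intersect() hunk: keep the discarded
        # bitmap's peak so it is not lost when that object goes away.
        if self.nentries_peak < other.nentries_peak:
            self.nentries_peak = other.nentries_peak
        # Grossly simplified merge: an intersection can only shrink.
        self.nentries = min(self.nentries, other.nentries)

a, b = ToyBitmap(100), ToyBitmap(500)
a.intersect(b)                 # b is conceptually thrown away afterwards
assert a.nentries == 100
assert a.nentries_peak == 500  # without the hunk, b's peak of 500 would vanish
```

This illustrates Robert's point: both bitmaps existed side by side, so the reported peak should cover the larger one even though it was discarded.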
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-07 11:08:22 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc' so there aren't many references to functions' oids. *Some* of them are typed regproc. Many are not, because there's a need to refer to overloaded function names. So, for the archives, this seems to be:
* pg_cast
* pg_aggregate
* pg_event_trigger
* pg_foreign_data_wrapper
* pg_language
* pg_proc (duh...)
It strikes me that several of the tables using regproc could very well stand to use regprocedure instead. A minimum change before we could even consider abandoning manual assignment of pg_proc entries would be to make regprocedure_in work in bootstrap mode, so that we could use regprocedure as a catalog column type. Yes. And regoperator_in, too, if you wanted to auto-assign operator OIDs. I personally find that much less interesting because there are so many fewer operators in comparison to procs, so conflicts are rarer. And you'd need some solution for generating fmgroids.h, as well as the handmade #define's for selected operator OIDs that appear now in pg_operator.h. If we're able to generate the .bki file, this seems like a solvable problem. There are probably more issues, but those are the ones that occur to me after thirty seconds' thought. Yes. Even after all that, we can *not* go over to auto-assignment of pg_type OIDs, because there is way too much stuff that assumes those OIDs are stable (starting with on-disk storage of arrays). Yes, I think that's totally out of the question. I'm not entirely sure that it's okay to have function OIDs varying across releases, either; who's to say that there's not client code looking at fmgroids.h, or otherwise hard-wiring the numbers it uses for fastpath function calls? I am not particularly worried about that. 
That'd mean somebody built a solution specifically for calling builtin functions since all user created functions will be dynamic. And that that client is using a fmgroids.h from another release than the one he's working with - a recipe for trouble in such a solution independent of this imo. On the whole I think that this'd be a lot more work, and a lot more potential for breakage, than it's really worth. That might be true. It's probably the most frequent cause for conflicts in patches people submit tho... Anyway, pg_proc's contents would need to be generated before this anyway... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
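For illustration, the fill-in-defaults idea Andres floats could look roughly like this in a hypothetical catalog-data generator. The function, the pinned set, and the starting OID are all invented here; this is not PostgreSQL's actual bootstrap machinery:

```python
PINNED = {16, 23, 25}   # e.g. bool, int4, text: OIDs that must stay fixed forever

def assign_oids(entries, start=10000):
    """Give every catalog row without an explicit oid the next free one,
    leaving manually pinned OIDs untouched (sketch only)."""
    used = PINNED | {e["oid"] for e in entries if "oid" in e}
    next_oid = start
    for e in entries:
        if "oid" not in e:
            while next_oid in used:
                next_oid += 1
            e["oid"] = next_oid
            used.add(next_oid)
    return entries

rows = [{"name": "boolin", "oid": 1242},   # bootstrap-critical, stays pinned
        {"name": "some_func"},             # gets an auto-assigned oid
        {"name": "other_func"}]
assign_oids(rows)
assert [r["oid"] for r in rows] == [1242, 10000, 10001]
```

The property the bootstrap-critical procs need is exactly this: regenerating the file never shifts an explicitly pinned OID, while everything else is free to move.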
Re: [HACKERS] Assertion failure in base backup code path
On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Does the attached seem right to you? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 43a0416..bf1174e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9727,6 +9727,9 @@ XLogFileNameP(TimeLineID tli, XLogSegNo segno)
  *
  * Every successfully started non-exclusive backup must be stopped by calling
  * do_pg_stop_backup() or do_pg_abort_backup().
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
@@ -9747,11 +9750,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("must be superuser or replication role to run a backup")));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
@@ -10053,6 +10051,9 @@ pg_start_backup_callback(int code, Datum arg)
  *
  * Returns the last WAL position that must be present to restore from this
  * backup, and the corresponding timeline ID in *stoptli_p.
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
@@ -10085,11 +10086,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("must be superuser or replication role to run a backup")));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index c421a59..4be31b4 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -56,6 +56,11 @@ pg_start_backup(PG_FUNCTION_ARGS)
 	backupidstr = text_to_cstring(backupid);
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser or replication role to run a backup")));
+
 	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
@@ -82,6 +87,11 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	char		stopxlogstr[MAXFNAMELEN];
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser or replication role to run a backup")));
+
 	stoppoint = do_pg_stop_backup(NULL, true, NULL);
 
 	snprintf(stopxlogstr, sizeof(stopxlogstr), "%X/%X",

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extra_float_digits and casting from real to numeric
Christoph Berg christoph.b...@credativ.de writes: A customer recently upgraded their jdbc driver from 8.4 to 9.2. This enabled the binary wire protocol (effectively between 9.1 and 9.2). They reported that single precision values inserted into a numeric(10,2) column were suddenly rounded wrongly, i.e. 1.18 was inserted as 1.20, while that worked before. Of course we told them that single is the wrong data type for this, but still, this is a regression. I'm not sure that it's fair to characterize that as a regression. If anything, it's more sensible than what happened before. But if extra_float_digits > 0 is set, I'd expect not only the float4 output to be affected by it, but also casts to other datatypes, This proposal scares me. extra_float_digits is strictly a matter of I/O representation, it does not affect any internal calculations. Moreover, since one of the fundamental attributes of type numeric is that it's supposed to give platform-independent results, I don't like the idea that you're likely to get platform-dependent results of conversions from float4/float8. I think your customer got bit by his own bad coding practice, and that should be the end of it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
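The underlying issue is visible without any database: the single-precision value nearest 1.18 lies slightly below 1.18, and the pre-9.2 text protocol hid that by printing the shortened form. A small Python sketch of the representation difference (illustrative only; this is not the JDBC driver's code):

```python
import struct

def float4(x):
    """Round-trip a value through IEEE single precision, as a float4 column stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = float4(1.18)
# The nearest single-precision value to 1.18 is just below it:
assert f"{stored:.10f}" == "1.1799999475"
# With the default shortened text output it still reads as 1.18:
assert f"{stored:.6g}" == "1.18"
# A client parsing the shortened text starts from exactly 1.18, while one
# receiving the raw binary value starts from 1.1799999475...; any further
# conversion to numeric then operates on different inputs.
```

This is why switching to the binary protocol changed what the client handed to the numeric(10,2) conversion, even though the stored float4 bits never changed.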
Re: [HACKERS] Assertion failure in base backup code path
On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote: On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.comwrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Yes, that's what I was thinking and was planning to write you about at some point. Does the attached seem right to you? I haven't run the code, but it looks right to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction Interfaces
On 2013-12-12 16:49:33 +0100, Andres Freund wrote: On 2013-12-12 10:01:21 -0500, Robert Haas wrote: On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund and...@2ndquadrant.com wrote: As far as I have been thinking of, this would be another catalog table like pg_decoding_plugin(oid, dpname name, dpload regproc). Instead of adding another catalog table, I think we should just define a new type. Again, please look at the way that foreign data wrappers do this: I don't really see what the usage of a special type has to do with this, but I think that's beside your main point. What you're saying is that the output plugin is just defined by a function name, possibly schema prefixed. That has an elegance to it. +1 Ok, so I've implemented this, but I am not so sure it's sufficient; there's an issue: Currently a logical replication slot has a plugin assigned; previously that has just been identified by the basename of a .so. But with the above proposal the identifier is pointing to a function, currently via its oid. But what happens if somebody drops or recreates the function? We can't make pg_depend entries or anything since that won't work on a standby. Earlier, if somebody removed the .so we'd just error out, but pg's dependency tracking always only mattered to things inside the catalogs. I see the following possible solutions for this:
1) Accept that fact, and throw an error if the function doesn't exist anymore, or has an unsuitable signature. We can check the return type of output_plugin_callbacks, so that's a pretty specific test.
2) Create a pg_output_plugin catalog and prevent DROP OUTPUT PLUGIN (or similar) when there's a slot defined. But how'd that work if the slot is only defined on standbys? We could have the redo routine block and/or kill the slot if necessary?
3) Don't assign a specific output plugin to a slot, but have it specified every time data is streamed, not just when a slot is created.
Currently that wouldn't be a problem, but I am afraid it will constrict some future optimizations. Good ideas? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in base backup code path
On Tue, Jan 7, 2014 at 5:43 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote: On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Yes, that's what I was thinking and was planning to write you about at some point. Good, then we think the same :) Does the attached seem right to you? I haven't run the code, but it looks right to me. It worked fine in my testing, so I've pushed that version. Looks slightly different in both 9.2 and 9.1 (not clean backpatching) due to code reorganization and such, but AFAICT it's just code in different places. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Failed assertion root-hasLateralRTEs on initsplan.c
I get an assertion failure on initsplan.c line 1325 while executing the following query on HEAD (edc43458d797a5956f4bf39af18cf62abb0077db). It works fine without --enable-cassert.

update subscriber set properties = hstore(a) from (select firstname, lastname from player where player.id = subscriber.id) as a;

Backtrace:

* thread #1: tid = 0x2e16ec, 0x7fff85f0b866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x7fff85f0b866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff8450335c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x7fff82ffdbba libsystem_c.dylib`abort + 125
frame #3: 0x00010e2b7510 postgres`ExceptionalCondition(conditionName=unavailable, errorType=unavailable, fileName=unavailable, lineNumber=unavailable) + 80 at assert.c:54
frame #4: 0x00010e155ab6 postgres`distribute_qual_to_rels(root=unavailable, clause=0x7fd5c382e208, is_deduced='\0', below_outer_join='\0', jointype=JOIN_INNER, qualscope=0x7fd5c3835ee8, ojscope=unavailable, outerjoin_nonnullable=unavailable, deduced_nullable_relids=unavailable, postponed_qual_list=unavailable) + 1254 at initsplan.c:1325
frame #5: 0x00010e154a66 postgres`deconstruct_recurse(root=0x7fd5c382c248, jtnode=0x7fd5c382cde0, below_outer_join='\0', qualscope=0x7fff51c723f8, inner_join_rels=unavailable, postponed_qual_list=0x7fff51c72400) + 870 at initsplan.c:781
frame #6: 0x00010e1548ab postgres`deconstruct_recurse(root=0x7fd5c382c248, jtnode=0x7fd5c382bfd8, below_outer_join='\0', qualscope=0x7fff51c72450, inner_join_rels=0x7fff51c72448, postponed_qual_list=0x7fff51c72440) + 427 at initsplan.c:732
frame #7: 0x00010e1546a1 postgres`deconstruct_jointree(root=unavailable) + 81 at initsplan.c:655
frame #8: 0x00010e156a1b postgres`query_planner(root=0x7fd5c382c248, tlist=0x7fd5c382e398, qp_callback=0x00010e15a660, qp_extra=0x7fff51c725f0) + 219 at planmain.c:145
frame #9: 0x00010e1589d8 postgres`grouping_planner(root=0x7fd5c382c248, tuple_fraction=unavailable) + 2888 at planner.c:1243
frame #10: 0x00010e157adf postgres`subquery_planner(glob=0x7fd5c4007e68, parse=0x7fd5c4007a30, parent_root=unavailable, hasRecursion=unavailable, tuple_fraction=0, subroot=0x7fff51c72900) + 3119 at planner.c:572
frame #11: 0x00010e156cac postgres`standard_planner(parse=0x7fd5c4007a30, cursorOptions=unavailable, boundParams=unavailable) + 236 at planner.c:210
frame #12: 0x00010e1d6356 postgres`pg_plan_query(querytree=0x7fd5c4007a30, cursorOptions=0, boundParams=0x) + 118 at postgres.c:759
frame #13: 0x00010e1d979a postgres`PostgresMain [inlined] pg_plan_queries(cursorOptions=0, querytrees=unavailable, boundParams=unavailable) + 56 at postgres.c:818
frame #14: 0x00010e1d9762 postgres`PostgresMain [inlined] exec_simple_query(query_string=0x7fd5c4006038) + 21 at postgres.c:983
frame #15: 0x00010e1d974d postgres`PostgresMain(argc=unavailable, argv=unavailable, dbname=0x7fd5c301ac30, username=unavailable) + 8749 at postgres.c:4011
frame #16: 0x00010e184c1f postgres`PostmasterMain [inlined] BackendRun + 7551 at postmaster.c:4085
frame #17: 0x00010e184c00 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3774
frame #18: 0x00010e184c00 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1585
frame #19: 0x00010e184c00 postgres`PostmasterMain(argc=unavailable, argv=unavailable) + 7520 at postmaster.c:1240
frame #20: 0x00010e11924f postgres`main(argc=1, argv=0x7fd5c2c03ec0) + 783 at main.c:194
frame #21: 0x7fff897165fd libdyld.dylib`start + 1
frame #22: 0x7fff897165fd libdyld.dylib`start + 1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to reproduce serialization failure for a read only transaction.
On Jan 7, 2014, at 00:38, Jim Nasby j...@nasby.net wrote: This email and the previous one are an awesome bit of information, can we add it to the docs somehow? Even if it's just dumping the emails into a wiki page and referencing it? Most of what I wrote there can be found in README-SSI, I think, under "Apparent Serial Order of Execution", "Heap locking" and "Index AM implementations". I guess it'd be nice if we explained these things in the docs somewhere, though I'm not sure what level of detail would be appropriate. Maybe a good compromise would be to explain dependency graphs, but skip over the different kinds of dependencies (ww, rw, wr). Instead we could say that whenever a transaction *does see* another transaction's modifications it must appear after that transaction in any serial schedule, and whenever a transaction *might see* another transaction's modifications but doesn't due to begin/commit ordering it must appear before that transaction. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:
1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()
See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward, we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks, actually =P) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario:
Master: 9.3.1 (I know, trying to go to 9.3.2)
Slave: 9.3.2

The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
2088223 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8C98 ... desc: visible: rel 1663/883916/151040222; blk 1075
2088224 ... Heap2 ... 32/64, ... lsn: 2BC/46AB8CD0 ... desc: clean: rel 1663/883916/151040222; blk 1075 remxid 107409880
== two lines that matched below ==
2088225 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8D10 ... desc: visible: rel 1663/883916/151040222; blk 1076
2088226 ... Heap2 ... 24/ 164, ... lsn: 2BC/46AB8D48 ... desc: clean: rel 1663/883916/151040222; blk 1076 remxid 107409880
2088227 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8DF0 ... desc: visible: rel 1663/883916/151040222; blk 1077
2088228 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8E28 ... desc: clean: rel 1663/883916/151040222; blk 1077 remxid 107409880
2088229 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8EA8 ... desc: visible: rel 1663/883916/151040222; blk 1078
2088230 ... Heap2 ... 24/ 5748, ... lsn: 2BC/46AB8EE0 ... desc: clean: rel 1663/883916/151040222; blk 1078 remxid 107409880
2088231 ... Heap2 ... 20/52, ... lsn: 2BC/46ABA570 ... desc: visible: rel 1663/883916/151040222; blk 1079

I cropped some columns to make it easy to read (the entire result is attached); if you guys need more information, I can send the xlogdump of all the WALs (or any other information). Also attached the controldata, if needed. Thanks a lot. Best regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nível F! www.dextra.com.br/postgres

2088220-rmgr: Heap2 len (rec/tot): 24/ 192, tx: 0, lsn: 2BC/46AB8B20, prev 2BC/46AB8AE8, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221-rmgr: Heap2 len (rec/tot): 20/52, tx: 0, lsn: 2BC/46AB8BE0, prev 2BC/46AB8B20, bkp: , desc: visible: rel 1663/883916/151040222; blk 1074
2088222-rmgr: Heap2 len (rec/tot): 24/ 128, tx: 0, lsn: 2BC/46AB8C18, prev 2BC/46AB8BE0, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
2088223-rmgr: Heap2 len (rec/tot): 20/52, tx: 0, lsn: 2BC/46AB8C98, prev 2BC/46AB8C18, bkp: , desc: visible: rel 1663/883916/151040222; blk 1075
2088224-rmgr: Heap2 len (rec/tot): 32/64, tx: 0, lsn:
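A toy replay model shows why the record order Heikki describes matters, and why stopping recovery between the two records leaves exactly the inconsistency Matheus hit. All names here are invented; this is not PostgreSQL's recovery code:

```python
def replay(records, stop_after):
    """Apply WAL records in order, stopping early to model recovery ending
    between two records (toy model of the ordering bug)."""
    page = {"dead_tuples": 3, "vm_bit": False}
    for i, rec in enumerate(records):
        if i >= stop_after:
            break                       # recovery stops here
        if rec == "visible":
            page["vm_bit"] = True       # VM claims everything on page is visible
        elif rec == "clean":
            page["dead_tuples"] = 0     # dead tuples actually removed
    return page

# Buggy order ("visible" logged before "clean"): stopping in between leaves
# the VM bit set while dead tuples are still on the page.
broken = replay(["visible", "clean"], stop_after=1)
assert broken["vm_bit"] and broken["dead_tuples"] > 0

# Fixed order: an early stop leaves a consistent page either way.
ok = replay(["clean", "visible"], stop_after=1)
assert ok["dead_tuples"] == 0 and not ok["vm_bit"]
```

In the xlogdump above, the "visible" record for blk 1076 indeed precedes its "clean" record, matching the buggy ordering.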
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
Magnus Hagander mag...@hagander.net writes: Applied a fairly heavily edited version of this one. I also backpatched it to 9.1 and up. Thanks a lot! Did some reviewing and re-testing here; I like using DataDir and IS_DIR_SEP better than what I did, of course ;-) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failed assertion root-hasLateralRTEs on initsplan.c
Emre Hasegeli e...@hasegeli.com writes: I get an assertion failure on initsplan.c line 1325 while executing the following query on HEAD (edc43458d797a5956f4bf39af18cf62abb0077db). It works fine without --enable-cassert. update subscriber set properties = hstore(a) from (select firstname, lastname from player where player.id = subscriber.id) as a; Hm, AFAICS this query should absolutely *not* work; the reference to subscriber.id inside the sub-select is illegal. It might be legal with LATERAL, but not otherwise. So I think this is a parser bug, and there's nothing wrong with the planner's Assert. 9.2 and earlier throw the error I'd expect, so probably something in the LATERAL patches broke this case; will look. The next question is if we should allow it with LATERAL. That would essentially be treating subscriber as having implicitly appeared at the start of the FROM list, which I guess is all right ... but does anyone want to argue against it? I seem to recall some old discussions about allowing the update target to be explicitly shown in FROM, in case you wanted, say, to left join it against something else. Allowing this implicit appearance might limit our options if we ever get around to trying to do that. On the other hand, those discussions were a long time back, so maybe it'll never happen anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
Dan Ports d...@csail.mit.edu wrote: On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote: If you try this, it'll cause a serialization failure on the line marked with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed before W2. But R saw both (1,1) and (2,0) in the table, and that has to be a consistent snapshot of the database state, meaning W2 appears to have executed before W1. That's an inconsistency, so something has to be rolled back. This particular anomaly requires all three of the transactions, and so it can't be detected until W1 does its UPDATE. Postgres detects the conflict at that point and rolls back W1. Yeah, neither of the provided examples rolled back the read only transaction itself; the read only transaction caused a situation where something needed to be rolled back, but since we try to roll back a transaction which has a good chance of succeeding on retry, the read only transaction is not usually a good candidate. I created a new example on the Wiki page where the read only transaction itself must be rolled back because both of the other transactions involved have already committed: https://wiki.postgresql.org/wiki/SSI#Rollover Regarding other questions on the thread: I have no objections to moving the Wiki examples into the docs, but it seemed like a lot to include, and I'm not sure where it belongs. Ideas? Regarding the different results AK got, I set default_transaction_isolation = 'serializable' on my connections before running these for two reasons. (1) It keeps the examples more concise. (2) I think most people using serializable transactions in PostgreSQL set the default and don't set the transaction isolation level on each transaction, since (unlike strategies which rely on blocking, like S2PL) all transactions must be participating in the stricter isolation level for it to be reliable. 
In fact, given the performance benefits of declaring transactions READ ONLY when possible, I have seen shops that make *that* a default, too, and override it for transactions which need to write. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On 01/07/2014 07:15 PM, Matheus de Oliveira wrote: Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:

1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()

See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward: we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks actually, =P ) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario: Master: 9.3.1 (I know, trying to go to 9.3.2) Slave: 9.3.2 The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. How did you set up the standby? Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the "crash recovery first, then start archive recovery" path, because there was no backup label file. 
In that mode, the standby assumes that the system is consistent after replaying all the WAL in pg_xlog, which is correct if you initialize from an offline backup or an atomic filesystem snapshot, for example. But "WAL contains references to invalid pages" could also be a symptom of an inconsistent base backup, caused by an incorrect backup procedure. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote:

Session 1. Setting up:

CREATE TABLE cars(
    license_plate VARCHAR NOT NULL,
    reserved_by VARCHAR NULL
);
INSERT INTO cars(license_plate) VALUES ('SUPRUSR'),('MIDLYPH');

Session 2: W1

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Julia'
    WHERE license_plate = 'SUPRUSR' AND reserved_by IS NULL;
SELECT * FROM Cars WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 3: W2

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Ryan'
    WHERE license_plate = 'MIDLYPH' AND reserved_by IS NULL;
COMMIT;

Session 4: R

BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT * FROM Cars WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 2: W1

COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions

What am I doing wrong?

Even without the read only transaction, the W1 and W2 transactions are a classic case of write skew. It looks like it might actually be benign, since neither transaction is updating license_plate, but serializable logic works at the row level, not the column level. After both transactions update the table there is write skew which must be resolved by cancelling one of the transactions. The first to commit wins and the other one will be cancelled when it attempts to run its next statement, which may or may not be a COMMIT. If, for purposes of demonstration, you add a unique index on license_plate and set enable_seqscan = off, you eliminate the simple write skew and get into more complex ways of breaking things. With that tweak you can run all of those transactions if W1 skips the SELECT. You can let W1 do the SELECT as long as you don't run R. The problem is that the SELECT in W1 sees the work of W1 but not W2 and the SELECT in R sees the work of W2 but not W1. We can't allow that. 
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cleanup in code
On 01/07/2014 05:20 PM, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. Yeah, I was just surprised because I thought MSVC understood it. Committed the additional return statement. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: How to reproduce serialization failure for a read only transaction.
Regarding this: So what does this have to do with relying on the results of read-only transactions that abort? Well, what if you had instead had R ROLLBACK instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to be equivalent for transactions that don't modify the database, or maybe because something else caused the transaction to abort? When W1 does its update, it will be checked for serialization failures, but aborted transactions are (intentionally) not included in those checks. W1 is therefore allowed to commit; the apparent serial order of execution is W1 followed by W2, and the results of the aborted transaction R aren't consistent with that. So if I am reading the data and then commit, I should be always fine, correct? -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569p5785757.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: The following modification to v7 does this.

=
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 380f3ba..233e21c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
 		{
 			EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);

-			if (!bms_equal(member->em_relids, index->rel->relids))
+			if (!bms_equal(member->em_relids, index->rel->relids) ||
+				!IsA(member, Var))
 				continue;
 			else
 			{
==

I'm pretty sure that IsA test prevents the optimization from ever firing. But aside from hasty typos, is there enough left of this optimization to be worth the complication? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
Thom Brown escribió: On 7 January 2014 15:47, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? We had a similar email from this same address over a year ago. These have been the only two messages posted, and neither was relevant nor appropriate. I have unregistered this address from the postgresql.org mailing list. I guess we will have to wait a year to see if this has any effect ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On Tuesday, January 7, 2014 7:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. regards, tom lane If I drop the idea of sudo/chroot and punt for now on testing tablespaces under replication, it should be possible to test the rest of the replication system in a way that meets (a) and (b). Perhaps Andres' idea of tablespaces relative to the data directory will get implemented some day, at which point we wouldn't be punting quite so much. But until then, punt. Would it make sense for this to just be part of 'make check'? That would require creating multiple database clusters under multiple data directories, and having them bind to multiple ports or unix domain sockets. Is that a problem? What's the logic of having replication testing separated from the other pg_regress tests? Granted, not every user of postgres uses replication, but that's true for lots of features, and we don't split things like json into separate test suites. Vendors who run 'make check' as part of their packaging of postgresql would probably benefit from knowing if replication doesn't work on their distro, and they may not change their packaging systems to include a second 'make replicationcheck' step. mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
Mark Dilger markdil...@yahoo.com writes: Would it make sense for this to just be part of 'make check'? Probably not, as (I imagine) it will take quite a bit longer than make check does today. People who are not working on replication related features will be annoyed if a test cycle starts taking 10X longer than it used to, for tests of no value to them. It's already not the case that make check runs every available automated test; the isolation tests, the PL tests, the contrib tests are all separate. There is a make check-world, which I think should reasonably run all of these. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote: So if I am reading the data and then commit, I should be always fine, correct? If a serializable transaction successfully commits, that means that all data read within that transaction can be trusted. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Tom == Tom Lane t...@sss.pgh.pa.us writes: Initial tests suggest that your version is ~40% slower than ours on some workloads. Tom I poked at this a bit with perf and oprofile, and concluded that Tom probably the difference comes from ordered_set_startup() Tom repeating lookups for each group that could be done just once Tom per query. Retesting with your changes shows that the gap is down to 15% but still present:

work_mem=64MB
enable_hashagg=off (for baseline test)

baseline query (333ms on both versions):
select count(*) from (select j from generate_series(1,3) i, generate_series(1,10) j group by j) s;

test query:
select count(*) from (select percentile_disc(0.5) within group (order by i) from generate_series(1,3) i, generate_series(1,10) j group by j) s;

On the original patch as supplied: 571ms - 333ms = 238ms
On current master: 607ms - 333ms = 274ms

Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth and...@tao11.riddles.org.uk writes: Retesting with your changes shows that the gap is down to 15% but still present: There's probably some overhead from traversing advance_transition_function for each row, which your version wouldn't have done. 15% sounds pretty high for that though, since advance_transition_function doesn't have much to do when the transition function is non-strict and the state value is pass-by-value (which internal is). It's possible that we could do something to micro-optimize that case. I also wondered if it was possible to omit rechecking AggCheckCallContext after the first time through in ordered_set_transition(). It seemed unsafe to perhaps not have that happen at all, since if the point is to detect misuse then the mere fact that argument 0 isn't null isn't much comfort. It strikes me though that now we could test for first call by looking at fcinfo->flinfo->fn_extra, and be pretty sure that we've already checked the context if that isn't null. Whether this would save anything noticeable isn't clear though; I didn't see AggCheckCallContext high in the profile. Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. The key reason for that was, and remains, not having the behavior hard-wired in nodeAgg; I believe that this design permits things to be accomplished in aggregate implementation functions that would not have been possible with the original patch. I'm willing to accept some code growth to have that flexibility. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On 2014-01-07 21:36:31 +0200, Heikki Linnakangas wrote:

2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. Yea, that's strange. I first thought it might be the PageIsNew() branch in lazy_scan_heap(). That initializes the page without WAL logging, which would explain it being uninitialized on the standby. But that wouldn't explain why we found something to clean on the primary while the page is still empty on the standby... How did you set up the standby? Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the crash recovery first, then start archive recovery path, because there was no backup label file. Good point. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? I have seen that repeatedly too. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Tom == Tom Lane t...@sss.pgh.pa.us writes: Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. Tom The key reason for that was, and remains, not having the Tom behavior hard-wired in nodeAgg; I believe that this design Tom permits things to be accomplished in aggregate implementation Tom functions that would not have been possible with the original Tom patch. I'm willing to accept some code growth to have that Tom flexibility. Do you have an actual use case? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On Tue, Jan 7, 2014 at 5:36 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/07/2014 07:15 PM, Matheus de Oliveira wrote: Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:

1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()

See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward: we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks actually, =P ) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario: Master: 9.3.1 (I know, trying to go to 9.3.2) Slave: 9.3.2 The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Well... I also didn't understand how it could cause the PANIC. If I got it right, it could only cause a visibility map bit with a wrong value (e.g. first change the bit but fail to mark the tuple, in case of a failure in between - which seems not to have happened). Is that right? Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. That I don't know for sure... How did you set up the standby? 
Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the "crash recovery first, then start archive recovery" path, because there was no backup label file. In that mode, the standby assumes that the system is consistent after replaying all the WAL in pg_xlog, which is correct if you initialize from an offline backup or an atomic filesystem snapshot, for example. But "WAL contains references to invalid pages" could also be a symptom of an inconsistent base backup, caused by an incorrect backup procedure. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? Well, I cannot answer this right now, but it makes sense and is possible. I used a script created by someone else that does pg_start_backup + rsync + pg_stop_backup, but in fact I didn't look at the script to see if it is doing something nasty, such as removing backup_label. I'll be able to check it tomorrow, and then I'll come back with a definitive answer. @andres, if it is really removing backup_label it could also cause that other issue we saw on Monday, right? (yes, I did run the same script). BTW, is there any situation in which backup_label should be removed? I see no reason for doing this, and I have never seen anyone doing it, so I didn't even think that could be it. Thank you guys for your attention. Best regards, --
Re: [HACKERS] dynamic shared memory and locks
On Tue, Jan 7, 2014 at 6:54 AM, Andres Freund and...@2ndquadrant.com wrote: Maybe it makes sense to have such a check #ifdef'ed out on most builds to avoid extra overhead, but not having any check at all just because we trust the review process too much doesn't strike me as the best of ideas. I don't think that check would have relevantly high performance impact in comparison to the rest of --enable-cassert - it's a single process local variable which is regularly accessed. It will just stay in L1 or even registers. I tried it out on a 16-core, 64-hwthread community box, PPC. pgbench -S, 5-minute runs at scale factor 300 with shared_buffers=8GB:

resultsr.cassert.32.300.300:tps = 11341.627815 (including connections establishing)
resultsr.cassert.32.300.300:tps = 11339.407976 (including connections establishing)
resultsr.cassert.32.300.300:tps = 11321.339971 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11492.927372 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11471.810937 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11516.471053 (including connections establishing)

By comparison:

resultsr.master.32.300.300:tps = 197939.111998 (including connections establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections establishing)

So yeah, the overhead is negligible, if not negative. None of the other changes in that patch were even compiled in this case, since all that code is protected by --disable-spinlocks. Maybe somebody can find another workload where the overhead is visible, but here it's not. 
On the other hand, I did discover another bit of ugliness - the macros actually have to be defined this way:

+#define SpinLockAcquire(lock) \
+	(AssertMacro(!any_spinlock_held), any_spinlock_held = true, \
+	 S_LOCK(lock))
+#define SpinLockRelease(lock) \
+	do { Assert(any_spinlock_held); any_spinlock_held = false; \
+		 S_UNLOCK(lock); } while (0)

SpinLockAcquire has to be a comma-separated expression rather than a do {} while (0) block because S_LOCK returns a value (which is used when compiling with LWLOCK_STATS); SpinLockRelease, however, has to be done the other way because S_UNLOCK() is defined as a do {} while (0) block already on PPC, among other architectures. Overall I don't really care enough about this to argue strenuously for it. I'm not nearly so confident as Tom that no errors of this type will creep into future code, but on the other hand, keeping --disable-spinlocks working reliably isn't significant enough for me to want to spend any political capital on it. It's probably got a dim future regardless of what we do here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom The key reason for that was, and remains, not having the Tom behavior hard-wired in nodeAgg; I believe that this design Tom permits things to be accomplished in aggregate implementation Tom functions that would not have been possible with the original Tom patch. I'm willing to accept some code growth to have that Tom flexibility. Do you have an actual use case? Not a concrete example of an aggregate to build, no; but it seemed plausible to me that a new aggregate might want more control over the sorting operation than was possible with your patch. Basically the only flexibility that was there was to sort a hypothetical row before or after its peers, right? Now it's got essentially full control over the sort parameters. One idea that comes to mind is that an aggregate that's interested in the top N rows might wish to flip the sort order, so that the rows it wants come out of the tuplesort first rather than last --- and/or it might want to tell the tuplesort about the row limitation, so that the bounded-sort logic can be used. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
Robert Haas wrote: On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Thank you for taking time to look at this patch. The peak memory usage for the discarded bitmap *can* be reflected in the number displayed for the bitmap heap scan by the following code in tbm_union() or tbm_intersect(): Hmm, fair point. But I'm still not convinced that we really need to add extra accounting for this. What's wrong with just reporting the number of exact and lossy pages? No. I intended to show the desired memory space for a TIDBitmap rather than the peak memory usage for that TIDBitmap. And I thought it'd be better for the latter to be displayed as additional information. However, I've removed the functionality for showing the desired memory space due to technical problems. Now I should probably remove the functionality for showing the peak memory usage too. Yes, as Andres mentioned, showing the peak memory usage is not a bad idea, I think. But I'm starting to think it's not necessarily worth complicating the code ... If there are no objections from others, I'll remove the extra accounting for showing the peak memory usage. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
I wrote: There's probably some overhead from traversing advance_transition_function for each row, which your version wouldn't have done. 15% sounds pretty high for that though, since advance_transition_function doesn't have much to do when the transition function is non strict and the state value is pass-by-value (which internal is). It's possible that we could do something to micro-optimize that case. The most obvious micro-optimization that's possible there is to avoid redoing InitFunctionCallInfoData for each row. I tried this as in the attached patch. It seems to be good for about half a percent overall savings on your example case. That's not much, but then again this helps for *all* aggregates, and many of them are cheaper than ordered aggregates. I see about 2% overall savings on select count(*) from generate_series(1,100); which seems like a more interesting number. As against that, the roughly-kilobyte-sized FunctionCallInfoData is no longer just transient data on the stack, but persists for the lifespan of the query. We pay that already for plain FuncExpr and OpExpr nodes, though. On balance, I'm inclined to apply this; the performance benefit may be marginal but it seems like it makes advance_transition_function's API a tad cleaner by reducing the number of distinct structs it's hacking on. Comments? regards, tom lane diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 827b009..0dafb51 100644 *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *** typedef struct AggStatePerAggData *** 235,240 --- 235,248 */ Tuplesortstate *sortstate; /* sort object, if DISTINCT or ORDER BY */ + + /* + * This field is a pre-initialized FunctionCallInfo struct used for + * calling this aggregate's transfn. We save a few cycles per row by not + * re-initializing the unchanging fields; which isn't much, but it seems + * worth the extra space consumption. 
+ */ + FunctionCallInfoData transfn_fcinfo; } AggStatePerAggData; /* *** static void initialize_aggregates(AggSta *** 290,297 AggStatePerGroup pergroup); static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate, ! FunctionCallInfoData *fcinfo); static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup); static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerAgg peraggstate, --- 298,304 AggStatePerGroup pergroup); static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate); static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup); static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerAgg peraggstate, *** initialize_aggregates(AggState *aggstate *** 399,419 * Given new input value(s), advance the transition function of an aggregate. * * The new values (and null flags) have been preloaded into argument positions ! * 1 and up in fcinfo, so that we needn't copy them again to pass to the ! * transition function. No other fields of fcinfo are assumed valid. * * It doesn't matter which memory context this is called in. */ static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate, ! FunctionCallInfoData *fcinfo) { ! int numTransInputs = peraggstate->numTransInputs; MemoryContext oldContext; Datum newVal; - int i; if (peraggstate->transfn.fn_strict) { --- 406,425 * Given new input value(s), advance the transition function of an aggregate. * * The new values (and null flags) have been preloaded into argument positions ! * 1 and up in peraggstate->transfn_fcinfo, so that we needn't copy them again ! * to pass to the transition function. We also expect that the static fields ! * of the fcinfo are already initialized; that was done by ExecInitAgg(). * * It doesn't matter which memory context this is called in. 
*/ static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate) { ! FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo; MemoryContext oldContext; Datum newVal; if (peraggstate->transfn.fn_strict) { *** advance_transition_function(AggState *ag *** 421,426 --- 427,435 * For a strict transfn, nothing happens when there's a NULL input; we * just keep the prior transValue. */ + int numTransInputs = peraggstate->numTransInputs; + int i; + for (i = 1; i <= numTransInputs; i++) { if (fcinfo->argnull[i]) *** advance_transition_function(AggState *ag *** 467,478
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote: The point remains that you need to duplicate it into every process that might want to use it subsequently, so it makes sense to DuplicateHandle into the parent, and then to advertise that handle value publicly so that other child processes can DuplicateHandle it back into their own process. Well, right now we just reopen the same object from all of the processes, which seems to work fine and doesn't require any of this complexity. The only problem I don't know how to solve is how to make a segment stick around for the whole postmaster lifetime. If duplicating the handle into the postmaster without its knowledge gets us there, it may be worth considering, but that doesn't seem like a good reason to rework the rest of the existing mechanism. I think one has to try this to see if it works as per the need. If it's not urgent, I can try this early next week? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] cleanup in code
On Wed, Jan 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/07/2014 05:20 PM, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. Yeah, I was just surprised because I thought MSVC understood it. Committed the additional return statement. Thanks for committing both the patches for cleanup. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
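The multiple-evaluation hazard Tom refers to is easy to demonstrate with a toy model of C textual macro expansion (Python is used here purely as a simulator; `expand_macro` and `next_value` are illustrative names, not real APIs):

```python
def expand_macro(body, **params):
    """Emulate C preprocessor textual substitution: every occurrence
    of a parameter name in the macro body is replaced by the argument
    *expression*, verbatim -- which is exactly why a side-effecting
    argument can end up running more than once."""
    for name, expr in params.items():
        body = body.replace(name, "(" + expr + ")")
    return body

calls = 0
def next_value():
    global calls
    calls += 1
    return calls

# A C-style MAX macro: ((A) > (B) ? (A) : (B))
expanded = expand_macro("(A) if (A) > (B) else (B)", A="next_value()", B="0")
result = eval(expanded)   # the caller wrote next_value() once...
```

After the call, `calls` is 2: the argument expression was textually duplicated and evaluated twice. This is why a macro as widely used as elog/ereport cannot risk multiple evaluation of its arguments, whatever the warning-suppression benefit.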
[HACKERS] TODO: machine-readable pg_controldata
Below attached patch implements the following todo item: machine-readable pg_controldata? http://www.postgresql.org/message-id/4b901d73.8030...@agliodbs.com Possible approaches: 1. Implement as a backend function and provide a view to the user. - But in this approach the user can only get this information when the server is running. 2. Extend the pg_controldata tool to provide the value of an individual attribute. A first draft version of the patch is attached in the mail, implemented using approach 2. Description of the patch: 1. Below is the list of the options which are exposed by pg_controldata (all checkpoint, redo and backup related options are exposed). I think we don't need to expose options for other parameters, because they will not be useful independently. (The user can simply print all values, same as the old behavior.)

-v, --catalog-version                      catalog version
-l, --latest-checkpoint-loc                Latest checkpoint location
-L, --prev-checkpoint-loc                  Prior checkpoint location
-r, --latest-checkpoint-redoloc            Latest checkpoint's REDO location
-t, --latest-checkpoint-timeline           Latest checkpoint's TimeLineID
-T, --latest-checkpoint-prevtimeline       Latest checkpoint's PrevTimeLineID
-c, --latest-checkpoint-time               Time of latest checkpoint
-x, --latest-checkpoint-nextxid            Latest checkpoint's NextXID
-o, --latest-checkpoint-nextoid            Latest checkpoint's NextOID
-X, --latest-checkpoint-nextmulti-xactid   Latest checkpoint's NextMultiXactId
-O, --latest-checkpoint-nextmulti-offset   Latest checkpoint's NextMultiOffset
-q, --latest-checkpoint-oldestxid          Latest checkpoint's oldestXID
-a, --latest-checkpoint-oldest-activexid   Latest checkpoint's oldestActiveXID
-m, --latest-checkpoint-oldest-multixid    Latest checkpoint's oldestMultiXid
-e, --min-recovery-endloc                  Minimum recovery ending location
-E, --min-recovery-endloc-timeline         Min recovery ending loc's timeline
-b, --backup-start-location                Backup start location
-B, --backup-end-location                  Backup end location

2.
If the user provides an individual option, then the direct value of that parameter will be printed (as shown in the example below); the parameter name is not printed, so the user need not parse the output. ./pg_controldata --latest-checkpoint-time -- run with the latest-checkpoint-time option Tue 07 Jan 2014 05:22:42 PM IST -- the output is the direct value 3. This feature can save parsing effort for users who need to get the value of an individual parameter, e.g. Time of latest checkpoint for standby lag monitoring. Please provide your suggestions... Regards, Dilip controldata_v1.patch Description: controldata_v1.patch
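To make the parsing-effort argument concrete, this is roughly what a monitoring script has to do today: scrape the labeled output. The sketch below is illustrative — the labels match pg_controldata's output format, but the sample values are made up:

```python
import re

# Illustrative excerpt of `pg_controldata $PGDATA` output; values are
# invented, labels follow the real tool's "Label:   value" layout.
SAMPLE = """\
Catalog version number:               201306121
Latest checkpoint location:           0/17BB0B8
Time of latest checkpoint:            Tue 07 Jan 2014 05:22:42 PM IST
Minimum recovery ending location:     0/0
"""

def controldata_field(output, label):
    """Extract one field from pg_controldata output by its label --
    the per-user parsing effort that a single-value option such as
    --latest-checkpoint-time would make unnecessary."""
    m = re.search(r"^%s:\s*(.*)$" % re.escape(label), output, re.MULTILINE)
    if m is None:
        raise KeyError(label)
    return m.group(1).strip()
```

With the proposed patch, `controldata_field(...)` collapses to running the tool with one flag and reading stdout verbatim.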
Re: [HACKERS] Get more from indices.
Hello, tgl I'm pretty sure that IsA test prevents the optimization from ever firing. Thank you. tgl But aside from hasty typos, Oops! I've picked up the wrong node. It always denies pathkeys extension. | !IsA(member, Var)) is a mistake for the following. | !IsA(member->em_expr, Var)) tgl is there enough left of this optimization to tgl be worth the complication? Certainly. This patch is intended to go with my other UNION-ALL optimization patch at first. It helps very much with UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned tables)-DISTINCT-ORDERBY-LIMIT. = test tables create table pu1 (a int not null, b int not null, c int, d text); create unique index i_pu1_ab on pu1 (a, b); create table cu11 (like pu1 including all) inherits (pu1); create table cu12 (like pu1 including all) inherits (pu1); create table pu2 (like pu1 including all); create table cu21 (like pu2 including all) inherits (pu2); create table cu22 (like pu2 including all) inherits (pu2); insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(00, 09) a); insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(10, 19) a); insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(20, 29) a); insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(30, 39) a); The following example is an implicit union derived from partitioned tables created as above. Limit is added to highlight the effect.
9.4dev with no patch, | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | | Limit (actual time=246.653..246.730 rows=100 loops=1) | -> Unique (actual time=246.646..246.713 rows=100 loops=1) | -> Sort (actual time=246.645..246.666 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | Sort Method: external sort Disk: 5280kB | -> Append (actual time=0.017..52.608 rows=20 loops=1) | -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1) | -> Seq Scan on cu11 (actual time=0.015..17.047 rows=10 loops=1) | -> Seq Scan on cu12 (actual time=0.007..15.474 rows=10 loops=1) | Total runtime: 247.854 ms Fairly common query. It will get the following plan with this patch. | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | - | Limit (actual time=0.108..0.350 rows=100 loops=1) | -> Unique (actual time=0.104..0.329 rows=100 loops=1) | -> Merge Append (actual time=0.103..0.214 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1) | -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1) | -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1) | Total runtime: 0.666 ms The same could be said of union with my other union-pullup patch. We will get the following result with only the union-pullup patch.
|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 1; | QUERY PLAN |--- | Limit (actual time=474.825..482.326 rows=1 loops=1) | -> Unique (actual time=474.824..481.357 rows=1 loops=1) | -> Sort (actual time=474.823..477.101 rows=1 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | Sort Method: external sort Disk: 10568kB | -> Append (actual time=0.018..99.310 rows=40 loops=1) | -> Seq Scan on cu11 (actual time=0.017..16.263 rows=10 loops=1) | -> Seq Scan on cu12 (actual time=0.006..14.831 rows=10 loops=1) | -> Seq Scan on cu21 (actual time=0.006..14.766 rows=10 loops=1) | -> Seq Scan on cu22 (actual time=0.006..14.721 rows=10 loops=1) | Total runtime: 484.555 ms This is also quite a common operation but the implicit DISTINCT prevents the planner from selecting index scans. (ORDER BY is not essential but needed because the planner does not consult distinct_pathkeys when planning scans and I've left it as it is.) The planner gives the following plan with this patch. | =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 1; |QUERY PLAN | - | Limit (actual time=0.197..14.739 rows=1 loops=1) | -> Unique (actual time=0.191..13.527 rows=1 loops=1) | -> Merge Append (actual time=0.191..7.894 rows=1 loops=1) |
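The reason Merge Append beats Append + Sort in these plans can be sketched abstractly: each child index scan already yields rows in order, so a k-way merge can stop as soon as the LIMIT is satisfied, without reading any child to the end. A hedged Python analogy (illustrative names, not planner code):

```python
import heapq
from itertools import islice

def merge_append_limit(sorted_inputs, limit):
    """Lazily k-way merge already-sorted child streams (what Merge
    Append does with one index scan per child table) and stop after
    `limit` rows.  Unlike Append + Sort, no child is read to the end."""
    merged = heapq.merge(*sorted_inputs)    # pulls child rows on demand
    return list(islice(merged, limit))

# Each "partition" yields its rows in index order.
cu11 = iter(range(0, 100, 2))   # even keys
cu12 = iter(range(1, 100, 2))   # odd keys
assert merge_append_limit([cu11, cu12], 5) == [0, 1, 2, 3, 4]
# The generators were only partially consumed -- the lazy-merge payoff.
assert next(cu11) < 100
```

The Unique node on top then only has to compare adjacent rows of the merged stream, which is why the patched plans finish in under a millisecond instead of sorting the whole append result to disk.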
Re: [HACKERS] Standalone synchronous master
On Wed, Nov 13, 2013 at 6:39 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Add a new eager synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php This patch implementation is in the same line as it was given in the earlier thread. Some of the additional important changes are: 1. Have added two GUC variables to take commands from the user to be executed a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. In the description of both switches (a & b), you are telling that it will switch to standalone mode; I think by your point 1b. you mean to say the other way (switch from standalone to sync mode). Instead of getting commands, why can't we just log such actions? I think adding 3 new guc variables for this functionality seems to be a bit high. Also what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode; if it continues to send data then I think naming it as 'enable_standalone_master' is not good. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Bug in visibility map WAL-logging
On Tue, Jan 7, 2014 at 11:36 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. Unless the vacuum subsequently truncated the file to be shorter and the backup was taken after that? -- greg
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
I've been thinking about this some more and I don't think you can avoid the need to remap vars and attrs. I agree. I was hoping to let expand_targetlist / preprocess_targetlist do the job, but I'm no longer convinced that'll do the job without mangling them in the process. AIUI, your modified version of rewriteTargetView() will turn an UPDATE query with a rangetable of the form The patch I posted is clearly incorrect in several ways. Unfortunately I'm still coming up to speed with the rewriter / planner / executor at the same time as trying to make major modifications to them. This tends to produce some false starts. (It no longer attempts to use expand_targetlist in the rewriter, btw, that's long gone). Since the posted patch I've sorted out RETURNING list rewriting and a few other issues. There are some major issues remaining with the approach though: - If we have nested views, we need to pass the tuple ctid up the chain of expanded view subqueries so ExecModifyTable can consume it. This is proving to be a lot harder than I'd hoped. - expand_targetlist / preprocess_targetlist operate on the resultRelation. With updatable s.b. views, the resultRelation is no longer the relation from which tuples are being read for input into ExecModifyTable. - In subqueries, resultRelation is only set for the outermost layer. So preprocess_targetlist doesn't usefully add tlist entries for the inner subquery layers at all. - It is necessary to expand DEFAULTs into expression tlist entries in the innermost query layer so that Vars added further up in the subquery chain can refer to them. In the current experimental patch DEFAULTs aren't populated correctly. So we have the following needs: - passing ctids up through layers of subqueries - target-list expansion through layers of subqueries - Differentiating between resultRelation to heapmodifytuple and the source of the tuples to feed into heapmodifytuple now that these are no longer the same thing. 
I was originally hoping all this could be dealt with in the rewriter, by changing resultRelation to point to the next-inner-most RTE whenever a target view is expanded. This turns out to be too simplistic: - There is no ctid col on a view, so if we have v2 over v1 over table t, when we expand v2 there's no way to create a tlist entry to point to v1's ctid. It won't have one until we've expanded v1 into t in the next pass. The same issue applies to expanding the tlist to carry cols of t up the subquery chain in the rewrite phase. - Rowmarks are added to point to resultrelation after rewrite, and these then add tlist entries in preprocess_targetlist. These TLEs will point to the base resultRelation, which isn't correct when we're updating nested subqueries. So we just can't do this during recursive rewrite, because the required information is only available once the target view is fully expanded into nested subqueries. It seems that tlist fixup and injection of the ctid up the subquery chain must be done after all recursive rewriting. To do these fixups later, we need to keep track of which nested series of subqueries is the target, i.e. produces tuples including resjunk ctid cols to feed into ExecModifyTuple. Currently this information is resultRelation. The more I think about this, the more I think that the approach of my original patch was neater. The idea was to have 2 new pieces of code: 1). In rewriteTargetView() decorate the target RTE with any security barrier quals (in the new rte->securityQuals field), instead of merging them with the main query's quals. So the output of this stage of rewriting would be something like rtable: 1: relation RTE (base table) - resultRelation - securityQuals = [view quals] fromList: [1] So you're proposing to still flatten views during rewrite, as the current code does, but just keep track of s.b. quals separately? If so, what about multiple S.B. 
views may be nested and their quals must be isolated from each other, not just from the outer query? That's why I see subqueries as such a useful model. 2). After all rewriting is complete, scan the query and turn all relation RTEs with non-empty securityQuals into subquery RTEs (making a copy of the original RTE in the case of the result relation). I'm not sure I understand how this would work in the face of multiple levels of nesting s.b. views. Are you thinking of doing recursive expansion? Another ugly feature of my earlier patch was the changes it made to expand_target_list() and the need to track the query's sourceRelation. I've been fighting the need to add the concept of a sourceRelation for this purpose too. Both of those things can be avoided simply by moving the subquery expansion code (2) to after expand_target_list(), and hence also after expand_inherited_tables(). That's certainly interesting. I'm reading over the patch now. There is still a lot more testing to be done
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/08/2014 02:00 PM, Craig Ringer wrote: I'm not sure I understand how this would work in the face of multiple levels of nesting s.b. views. Are you thinking of doing recursive expansion? Never mind, that part is clearly covered in the patch. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Simple improvements to freespace allocation
Current freespace code gives a new block insert target (NBT) from anywhere in the table. That isn't very useful with bigger tables and it would be useful to be able to specify different algorithms for producing NBTs. ALTER TABLE foo WITH (freespace = ); Three simple and useful models come to mind * CONCURRENT This is the standard/current model. Naming it like this emphasises why we pick NBTs in the way we do. * PACK We want the table to be smaller, so rather than run a VACUUM FULL we want to force the table to choose an NBT at the start of the table, even at the expense of concurrency. By avoiding putting new data at the top of the table we allow the possibility that VACUUM will shrink the table size. This is the same as current except we always reset the FSM pointer to zero and re-seek from there. This takes some time to have an effect, but is much less invasive than VACUUM FULL. * RECENT For large tables with an append-mostly use case it would be easier to prefer NBTs from the last two 1GB segments of a table, allowing them to be more easily cached. This is the same as current except when we wrap we don't go to block 0, we go to the first block of the penultimate (max - 1) segment. For tables <= 2 segments this is no change from the existing algorithm. For larger tables it would focus updates/inserts into a much reduced and yet still large area and allow better caching. These are small patches. ...other possibilities, though more complex are... * IN-MEMORY A large table may only have some of its blocks in memory. It would be useful to force an NBT to be a block already in shared_buffers IFF a table is above a certain size (use the same threshold as seq scans, i.e. 25% of shared_buffers). That may be difficult to achieve in practice, so not sure about this one. Like it? Any ideas? We might also allow a custom NBT policy though allowing user code at that point could be dangerous. Thoughts? 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
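The wrap behavior of the three proposed policies can be sketched as a single function. This is illustrative Python, not a patch; it assumes the default segment geometry of 1 GB segments and 8 kB blocks, i.e. 131072 blocks per segment, and the policy names are the ones from the post:

```python
RELSEG_BLOCKS = 131_072   # blocks per 1 GB segment at the default 8 kB block size

def wrap_target(policy, nblocks):
    """Where the free-space search restarts after the pointer reaches
    the end of a table of `nblocks` blocks, under each proposed policy."""
    if policy in ("CONCURRENT", "PACK"):
        # CONCURRENT wraps to the start of the table; PACK additionally
        # keeps resetting the pointer to zero so new rows pack the front.
        return 0
    if policy == "RECENT":
        last_seg = (nblocks - 1) // RELSEG_BLOCKS   # 0-based last segment
        if last_seg < 2:
            return 0          # <= 2 segments: no change from existing algorithm
        # first block of the penultimate (max - 1) segment
        return (last_seg - 1) * RELSEG_BLOCKS
    raise ValueError(policy)
```

So under RECENT a 10-segment table would confine new-block targets to the last two segments (roughly 2 GB), keeping the write working set cacheable while still leaving a large area to spread concurrent inserts over.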
Re: [HACKERS] Standalone synchronous master
On 8th Jan, 2014, Amit Kapila wrote: Add a new eager synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php This patch implementation is in the same line as it was given in the earlier thread. Some of the additional important changes are: 1. Have added two GUC variables to take commands from the user to be executed a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. In the description of both switches (a & b), you are telling that it will switch to standalone mode; I think by your point 1b. you mean to say the other way (switch from standalone to sync mode). Yes, you are right; it's a typo. Instead of getting commands, why can't we just log such actions? I think adding 3 new guc variables for this functionality seems to be a bit high. Actually in the earlier discussion, as well as in the TODO added, it is mentioned to execute some kind of command to alert the administrator. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php In my current patch, I have kept the LOG along with the command. Also what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode; if it continues to send data then I think naming it as 'enable_standalone_master' is not good. Yes, we can change the name to something more appropriate; some options are: 1. enable_async_master 2. sync_standalone_master 3. enable_nowait_master 4. enable_nowait_resp_master Please provide your suggestions on the above names, or any others. 
Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Simple improvements to freespace allocation
On 01/08/2014 08:56 AM, Simon Riggs wrote: Current freespace code gives a new block insert target (NBT) from anywhere in the table. That isn't very useful with bigger tables and it would be useful to be able to specify different algorithms for producing NBTs. I've actually been surprised how little demand there has been for alternative algorithms. When I wrote the current FSM implementation, I expected people to start coming up with all kinds of wishes, but it didn't happen. There have been very few complaints; everyone seems to be satisfied with the way it works now. So I'm not convinced there's much need for this. ALTER TABLE foo WITH (freespace = ); Three simple and useful models come to mind * CONCURRENT This is the standard/current model. Naming it like this emphasises why we pick NBTs in the way we do. * PACK We want the table to be smaller, so rather than run a VACUUM FULL we want to force the table to choose an NBT at the start of the table, even at the expense of concurrency. By avoiding putting new data at the top of the table we allow the possibility that VACUUM will shrink the table size. This is the same as current except we always reset the FSM pointer to zero and re-seek from there. This takes some time to have an effect, but is much less invasive than VACUUM FULL. We already reset the FSM pointer to zero on vacuum. Would the above actually make any difference in practice? * RECENT For large tables with an append-mostly use case it would be easier to prefer NBTs from the last two 1GB segments of a table, allowing them to be more easily cached. This is the same as current except when we wrap we don't go to block 0, we go to the first block of the penultimate (max - 1) segment. For tables <= 2 segments this is no change from the existing algorithm. For larger tables it would focus updates/inserts into a much reduced and yet still large area and allow better caching. Umm, wouldn't that bloat the table with no limit? 
Putting my DBA/developer hat on, I don't understand when I would want to use that setting. These are small patches. ...other possibilities, though more complex are... * IN-MEMORY A large table may only have some of its blocks in memory. It would be useful to force an NBT to be a block already in shared_buffers IFF a table is above a certain size (use the same threshold as seq scans, i.e. 25% of shared_buffers). That may be difficult to achieve in practice, so not sure about this one. Like it? Any ideas? Yeah, that seems nice, although I have a feeling that it's not worth the complexity. There's one policy that I'd like to see: maintaining cluster order. When inserting a new tuple, try to place it close to other tuples with similar keys, to keep the table clustered. In practice, CLUSTER CONCURRENTLY might be more useful, though. We might also allow a custom NBT policy though allowing user code at that point could be dangerous. Yeah, I don't think there's much need for that. Overall, - Heikki