Re: [HACKERS] [v9.2] DROP statement reworks
2011/10/20 Robert Haas robertmh...@gmail.com:
> On Thu, Oct 20, 2011 at 10:49 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
>> part-3: drop statement reworks for other object classes
>
> This is going to need some rebasing.

OK, I rebased it.

This patch includes a bug fix for the case where we try to drop a
non-existent operator family with IF EXISTS.

> I'm thinking we should probably pull that part out and do it
> separately.  Seems like it should probably be back-patched.

I checked the REL9_0_STABLE branch: the v9.0 implementation looks
correct to me. The bug may have been introduced when
OpFamilyCacheLookup() gained its missing_ok argument. :-(

/*
 * RemoveOpFamily
 *		Deletes an opfamily.
 */
void
RemoveOpFamily(RemoveOpFamilyStmt *stmt)
{
	Oid			amID,
				opfID;
	HeapTuple	tuple;
	ObjectAddress object;

	/*
	 * Get the access method's OID.
	 */
	amID = GetSysCacheOid1(AMNAME, CStringGetDatum(stmt->amname));
	if (!OidIsValid(amID))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("access method \"%s\" does not exist",
						stmt->amname)));

	/*
	 * Look up the opfamily.
	 */
	tuple = OpFamilyCacheLookup(amID, stmt->opfamilyname);
	if (!HeapTupleIsValid(tuple))
	{
		if (!stmt->missing_ok)
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_OBJECT),
					 errmsg("operator family \"%s\" does not exist for access method \"%s\"",
							NameListToString(stmt->opfamilyname),
							stmt->amname)));
		else
			ereport(NOTICE,
					(errmsg("operator family \"%s\" does not exist for access method \"%s\"",
							NameListToString(stmt->opfamilyname),
							stmt->amname)));
		return;
	}

-- 
KaiGai Kohei kai...@kaigai.gr.jp
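For anyone who wants to check the intended IF EXISTS behavior by hand,
a minimal demonstration (the operator family name below is deliberately
nonexistent), matching the messages in the code above:

DROP OPERATOR FAMILY no_such_fam USING btree;
-- ERROR:  operator family "no_such_fam" does not exist for access method "btree"

DROP OPERATOR FAMILY IF EXISTS no_such_fam USING btree;
-- NOTICE:  operator family "no_such_fam" does not exist for access method "btree"
-- (completes successfully)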
Re: [HACKERS] funny lock mode in DropTrigger
Excerpts from Alvaro Herrera's message of vie oct 21 00:40:26 -0300 2011:

> Besides, the docs state that no backend code uses ShareRowExclusiveLock
> anyway (13.3 "Explicit Locking"). I guess that if the patch to reduce
> lock strength in ALTER TABLE goes in again, it'll need to update this
> chapter to match.

Ah, I also noticed that the ALTER EXTENSION and SECURITY LABEL commands
use ShareUpdateExclusiveLock, and they failed, when committed, to note
this in chapter 13.3 of the docs. Not sure how strict we are about
documenting these things. (I note COMMENT already fails to document its
ShareUpdateExclusiveLock lock.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] funny lock mode in DropTrigger
I just noticed that DropTrigger uses ShareRowExclusiveLock to lock the
relation it's going to drop the trigger on. The comment right above it
says that this should match RemoveTriggerById, but that one uses
AccessExclusiveLock -- so one of them (or the comment) is not right.

Besides, the docs state that no backend code uses ShareRowExclusiveLock
anyway (13.3 "Explicit Locking"). I guess that if the patch to reduce
lock strength in ALTER TABLE goes in again, it'll need to update this
chapter to match.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org
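For the curious, the lock mode a DDL command actually takes is easy to
observe from a second session; a quick sketch, using a scratch table
and trigger of your own (the names below are placeholders):

-- Session 1: leave the transaction open so the lock is still held
BEGIN;
DROP TRIGGER my_trigger ON my_table;

-- Session 2: see which lock mode session 1 holds on the table
SELECT l.mode, l.granted
  FROM pg_locks l
  JOIN pg_class c ON c.oid = l.relation
 WHERE c.relname = 'my_table';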
Re: [HACKERS] ProcessStandbyHSFeedbackMessage can make global xmin go backwards
Excerpts from Tom Lane's message of jue oct 20 19:20:19 -0300 2011:

> So I've concluded that there's just no point in the GetOldestXmin
> clamping, and we should just apply the xmin value we get from the
> standby if it passes basic sanity checks (the epoch test), and hope
> that we're not too late to prevent loss of the data the standby
> wanted.

I am happy that the HS patch has introduced uses for the xid epoch. I
had to use it for multixact truncation in the FK locks patch, and was
nervous because I wasn't seeing any user in core code other than the
txid functions; this lack of callers, added to the comments in
GetNextXidAndEpoch ("we do not support such things"), made me feel a
bit uncomfortable about using it. I'm happy to have this problem out of
my mind now.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
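For reference, the txid functions mentioned above are the SQL-level
consumers of the epoch: they return 64-bit values that extend the
32-bit xid with the epoch, so results stay monotonic across xid
wraparound. For example:

SELECT txid_current();           -- epoch-extended xid of the current transaction
SELECT txid_current_snapshot();  -- xmin:xmax:xip_list, all epoch-extended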
[HACKERS] psql \set vs \copy - bug or expected behaviour?
It looks like \copy is just passing the text of the query unadjusted to
COPY. I get a syntax error on :x with the \copy below, on both 9.0 and
9.1.

=== test script ===
\set x '''HELLO'''

-- Works
\echo :x

-- Works
\o '/tmp/test1.txt'
COPY (SELECT :x) TO STDOUT;

-- Doesn't work
\copy (SELECT :x) TO '/tmp/test2.txt'
=== end script ===

-- 
Richard Huxton
Archonet Ltd
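A workaround follows from the script itself: since variable
interpolation does work in a server-side COPY statement, redirect
COPY ... TO STDOUT through \o instead of using \copy:

\set x '''HELLO'''
\o /tmp/test2.txt
COPY (SELECT :x) TO STDOUT;
\o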
Re: [HACKERS] [v9.2] DROP statement reworks
On Fri, Oct 21, 2011 at 5:08 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> The v9.0 implementation looks correct to me. The bug may have been
> introduced when OpFamilyCacheLookup() gained its missing_ok
> argument. :-(

Yep, looks that way. Will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.2] Fix Leaky View Problem
So, I will split the patch into two parts as follows, in the next
commit fest.

Part-1) Views with security_barrier reloption
The part-1 portion provides views with a security_barrier reloption,
which keeps their sub-queries unflattened at the prepjointree.c stage.
In addition, these sub-queries (the ones that originally came from
views with the security_barrier option) do not allow qualifiers to be
pushed down from the upper level. That should prevent both of the
problematic scenarios.

Part-2) Functions with leakproof attribute
The part-2 portion provides functions with a leakproof attribute,
which allows leakproof functions to be pushed down into sub-queries
even when those originally came from security views. That should
minimize the performance penalty when we use views for row-level
security purposes.

2011/10/19 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> Well, there's clearly some way to prevent pushdown from happening,
>> because sticking a LIMIT in there does the trick...
>
> I already pointed you at subquery_is_pushdown_safe ...
>
> regards, tom lane

-- 
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] pg_upgrade if 'postgres' database is dropped
On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> pg_upgrade doesn't work if the 'postgres' database has been dropped
> in the old cluster:
>
> ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d ~/pgsql.91stable/data -D data-upgraded/
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system OID user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Creating catalog dump                                       ok
> Checking for prepared transactions                          ok
> New cluster database "postgres" does not exist in the old cluster
> Failure, exiting
>
> That's a bit unfortunate. We have some other tools that don't work
> without the 'postgres' database, like 'reindexdb --all', but it would
> still be nice if pg_upgrade did.

+1. I think our usual policy is to try "postgres" first and then try
"template1", so it would seem logical for pg_upgrade to do the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On Fri, Oct 21, 2011 at 10:36 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> So, I will split the patch into two parts as follows, in the next
> commit fest.
>
> Part-1) Views with security_barrier reloption
> The part-1 portion provides views with a security_barrier reloption,
> which keeps their sub-queries unflattened at the prepjointree.c stage.
> In addition, these sub-queries (the ones that originally came from
> views with the security_barrier option) do not allow qualifiers to be
> pushed down from the upper level. That should prevent both of the
> problematic scenarios.
>
> Part-2) Functions with leakproof attribute
> The part-2 portion provides functions with a leakproof attribute,
> which allows leakproof functions to be pushed down into sub-queries
> even when those originally came from security views. That should
> minimize the performance penalty when we use views for row-level
> security purposes.

Sounds reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces
On Wed, Oct 19, 2011 at 7:46 PM, Florian Pflug f...@phlo.org wrote:
> On Oct20, 2011, at 01:19 , Tom Lane wrote:
>> Florian Pflug f...@phlo.org writes:
>>> Taking this even further, why do we bother with non-immutable
>>> (i.e., depending on the database's contents) checks during
>>> ALTER ROLE/DATABASE SET at all?
>>
>> Yeah, I was wondering about that one too. It would not solve all the
>> problems here, but skipping validity checks would fix some of them.
>> The trouble of course is what happens if the value is found to be
>> bad when you try to use it ...
>
> Presumably we'd detect that during logon, because the GUC assignment
> hook will quite probably complain. I'd vote for emitting a warning in
> that case. This is also what we do currently if we fail to set the
> GUC to the desired value due to permission issues:
>
> postgres=# create role r1 login;
> CREATE ROLE
> postgres=# create role r2;
> CREATE ROLE
> postgres=# alter role r1 set role = r2;
> ALTER ROLE
> postgres=# \connect - r1
> WARNING:  permission denied to set role "r2"
> WARNING:  permission denied to set role "r2"
> You are now connected to database "postgres" as user "r1".
>
> (Dunno why that WARNING appears twice)
>
> Since an ALTER DATABASE/ROLE SET doesn't prevent the user from
> overriding the value, ignoring invalid settings shouldn't be a
> security risk.
>
> best regards,
> Florian Pflug

I didn't realize these dependencies weren't immutable. If that is the
desired behavior, then I agree a warning should be sufficient to catch
typos and oversights.

If you did want to make them immutable, I also like Florian's idea of
a dependency graph. This would make the dumps less readable, though.
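For context, the problem named in the subject line is an ordering one:
pg_dumpall can emit the ALTER ROLE ... SET before the CREATE TABLESPACE
it depends on, so the restore fails on the first statement.
Schematically (role and tablespace names invented for illustration):

ALTER ROLE app_user SET default_tablespace = 'ts_fast';  -- fails: no such tablespace yet
...
CREATE TABLESPACE ts_fast OWNER app_user LOCATION '/mnt/fast';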
Re: [HACKERS] WIP: Join push-down for foreign tables
How about the current status of this patch? It is still marked
"Waiting on Author". If Hanada-san would propose contrib/pgsql_fdw as
a basis of the join-pushdown feature, I'll likely volunteer to review
the patch. I'm also interested in this feature. Hopefully, I'd like to
try other kinds of push-down (such as aggregate, sort, ...) or
updatable foreign tables. :-)

Thanks,

2011/10/10 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> This might be out of left field, but wouldn't it make more sense to
>> get postgresql_fdw committed first, and then add the join push-down
>> functionality afterwards? I mean, otherwise, we're going to be left
>> with a situation where we have join pushdown in core, but the only
>> FDW that can actually make use of it lives elsewhere.
>
> There's likely to be a lot of FDW infrastructure that will not be
> exercised by anything in core or contrib ...
>
> regards, tom lane

-- 
KaiGai Kohei kai...@kaigai.gr.jp
[HACKERS] Synchronized snapshots versus multiple databases
I've thought of another nasty problem for the sync-snapshots patch.
Consider the following sequence of events:

1. Transaction A, which is about to export a snapshot, is running in
   database X.

2. Transaction B is making some changes in database Y.

3. A takes and exports a snapshot showing B's xid as running.

4. Transaction B ends.

5. Autovacuum launches in database Y. It sees nothing running in Y, so
   it decides it can vacuum dead rows right up to nextXid, including
   anything B deleted.

6. Transaction C starts in database Y, and imports the snapshot from A.
   Now it thinks it can see rows deleted by B ... but vacuum is busy
   removing them, or maybe already finished doing so.

The problem here is that A's xmin is ignored by GetOldestXmin when
calculating cutoff XIDs for non-shared tables in database Y, so it
doesn't protect would-be adoptees of the exported snapshot.

I can see a few alternatives, none of them very pleasant:

1. Restrict exported snapshots to be loaded only by transactions
   running in the same database as the exporter. This would fix the
   problem, but it cuts out one of the main use-cases for sync
   snapshots, namely getting cluster-wide-consistent dumps in
   pg_dumpall.

2. Allow a snapshot exported from another database to be loaded so long
   as this doesn't cause the DB-local value of GetOldestXmin to go
   backwards. However, in scenarios such as the above, C is certain to
   fail such a test. To make it work, pg_dumpall would have to start
   advance-guard transactions in each database before it takes the
   intended-to-be-shared snapshot, and probably even wait for these to
   be oldest. Ick.

3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
   the current database. This sounds bad, but OTOH I don't think
   there's ever been any proof that this optimization is worth much in
   real-world usage. We've already had to lobotomize that optimization
   for walsender processes, anyway.

4. Somehow mark the xmin of a process that has exported a snapshot so
   that it will be honored in all DBs, not just the current one. The
   difficulty here is that we'd need to know *at the time the snap is
   taken* that it's going to be exported. (Consider the scenario above,
   except that A doesn't get around to exporting the snapshot it took
   in step 3 until between steps 5 and 6. If the xmin wasn't already
   marked as globally applicable when vacuum looked at it in step 5, we
   lose.) This is do-able but it will contort the user-visible API of
   the sync snapshots feature. One way we could do it is to require
   that transactions that want to export snapshots set a transaction
   mode before they take their first snapshot.

Thoughts, better ideas?

regards, tom lane
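For readers unfamiliar with the patch, the export/import flow under
discussion looks roughly like this (a sketch, assuming the
pg_export_snapshot() / SET TRANSACTION SNAPSHOT interface; the snapshot
identifier shown is invented):

-- Transaction A (the exporter):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();           -- returns an identifier, say '000003A1-1'

-- Transaction C (the importer), as its first action:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';

Under alternative 1 above, the SET TRANSACTION SNAPSHOT would simply
fail whenever C is connected to a different database than A.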
Re: [HACKERS] [PATCH] Log crashed backend's query v3
Hi, here's version 4 of the patch.

On Wed, Oct 19, 2011 at 19:34, Robert Haas robertmh...@gmail.com wrote:
> I think it would be safer to write this so that
> pgstat_get_crashed_backend_activity writes its answer into a
> statically allocated buffer and returns a pointer to that buffer,
> rather than using palloc. I think the current coding might lead to a
> memory leak in the postmaster

Good catch about the memory leak; I always assumed that the caller
takes care of cleaning the memory context. But looking at the code,
that doesn't seem to happen in postmaster.

Using a global buffer would waste memory in every backend, but this is
needed rarely, and only in postmaster. So instead I'm allocating the
buffer on the stack in LogChildExit(), and passing that to
pgstat_get_crashed_backend_activity() in its arguments. I use a
character array of 1024 bytes in LogChildExit(), since
'track_activity_query_size' is unknown at compile time (1024 is the
default). I could have used alloca(), but that doesn't seem portable
or robust with arbitrary inputs coming from a GUC.

> Also, how about having CreateSharedBackendStatus store the length of
> the backend activity buffer in a global somewhere, instead of
> repeating the calculation here?

Sure, I added a BackendActivityBufferSize global to pgstat.c.

>> return "<command string not enabled>";
>
> I'd suggest that we instead return "command string not found", and
> avoid making judgements about how things got that way.

Originally I wanted to use exactly the same messages as
pg_stat_get_backend_activity; but you're right, we should be as
accurate as possible. I think "command string empty" is better, since
it means the PID was found, but it had a zero-length activity string.

> It's almost making me cry thinking about how much time this would
> have saved me

Thanks for your review and the generous words. :)

Regards,
Marti

From bce8e81ea4811a823ec7c3a0ad15ff63b5cd1be4 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Fri, 21 Oct 2011 18:36:50 +0300
Subject: [PATCH] Log crashed backend's query (activity string)

The crashing query is often a good starting point in debugging the
cause, and much more easily accessible than core dumps.

We're extra-paranoid in reading the activity buffer since it might be
corrupt. All non-ASCII characters are replaced with '?'.

Example output:
LOG:  server process (PID 31451) was terminated by signal 9: Killed
DETAIL:  Running query: DO LANGUAGE plpythonu 'import os;os.kill(os.getpid(),9)'
---
 src/backend/postmaster/pgstat.c     |   73 +++++++++++++++++++++++-
 src/backend/postmaster/postmaster.c |   22 ++++---
 src/backend/utils/adt/ascii.c       |   34 +++++++++++
 src/include/pgstat.h                |    2 +
 src/include/utils/ascii.h           |    1 +
 5 files changed, 125 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 44956c1..ba64f23 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -58,6 +58,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "storage/procsignal.h"
+#include "utils/ascii.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2228,6 +2229,7 @@ static PgBackendStatus *MyBEEntry = NULL;
 static char *BackendClientHostnameBuffer = NULL;
 static char *BackendAppnameBuffer = NULL;
 static char *BackendActivityBuffer = NULL;
+static Size BackendActivityBufferSize = 0;
 
 
 /*
@@ -2310,9 +2312,12 @@ CreateSharedBackendStatus(void)
 	}
 
 	/* Create or attach to the shared activity buffer */
-	size = mul_size(pgstat_track_activity_query_size, MaxBackends);
+	BackendActivityBufferSize = mul_size(pgstat_track_activity_query_size,
+										 MaxBackends);
 	BackendActivityBuffer = (char *)
-		ShmemInitStruct("Backend Activity Buffer", size, &found);
+		ShmemInitStruct("Backend Activity Buffer",
+						BackendActivityBufferSize,
+						&found);
 
 	if (!found)
 	{
@@ -2751,6 +2756,70 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
 	return "<backend information not available>";
 }
 
+/* ----------
+ * pgstat_get_crashed_backend_activity() -
+ *
+ *	Return a string representing the current activity of the backend with
+ *	the specified PID. Like the function above, but reads shared memory with
+ *	the expectation that it may be corrupt. Returns either a pointer to a
+ *	constant string, or writes into the 'buffer' argument and returns it.
+ *
+ *	This function is only intended to be used by postmaster to report the
+ *	query that crashed the backend. In particular, no attempt is made to
+ *	follow the correct concurrency protocol when accessing the
+ *	BackendStatusArray. But that's OK, in the worst case we'll return a
+ *	corrupted message. We also must take care not to trip on ereport(ERROR).
+ *
+ *	Note: return strings for special cases match pg_stat_get_backend_activity.
+ * ----------
+ */
+const char *
+pgstat_get_crashed_backend_activity(int pid,
Re: [HACKERS] ProcessStandbyHSFeedbackMessage can make global xmin go backwards
On Wed, Oct 19, 2011 at 6:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
> call GetOldestXmin and then the result will politely hold still while
> it considers what to do next. But in fact, whoever has the oldest xmin
> could exit their transaction, allowing the global minimum to advance.
> If a VACUUM process then inspects the ProcArray, it could compute an
> oldest xmin that is newer than the value that
> ProcessStandbyHSFeedbackMessage installs just afterwards. So much for
> keeping the data the standby wanted.
>
> AFAICS we have to do all the logic about choosing the new value for
> MyProc->xmin while holding ProcArrayLock, which IMO means that it
> should go into a function in procarray.c. The fact that walsender.c
> is taking ProcArrayLock and whacking MyProc->xmin around is already a
> horrid violation of modularity, even if it weren't incorrect.
>
> Also, it seems like using GetOldestXmin is quite wrong here anyway; we
> don't really want a result modified by vacuum_defer_cleanup_age, do
> we? It looks to me like that would result in vacuum_defer_cleanup_age
> being applied twice: the walsender process sets its xmin the defer age
> into the past, and then subsequent calls of GetOldestXmin compute a
> result that is the defer age further back. IOW this is an independent
> mechanism that also results in the computed global xmin going
> backwards.
>
> (Now that we have a standby feedback mechanism, I'm a bit tempted to
> propose getting rid of vacuum_defer_cleanup_age altogether, rather
> than hacking things to avoid the above.)

curious: are these bugs in production, and what would be the user
visible symptoms of seeing them in the wild?

merlin
Re: [HACKERS] pg_comments (was: Allow \dd to show constraint comments)
On Mon, Oct 17, 2011 at 05:04, Robert Haas robertmh...@gmail.com wrote:
> Hearing no cries of "oh, yes, please", I'm marking this Returned with
> Feedback for now. We can always revisit it if we hear that more
> people want it.

I think this would be an improvement, but it's pretty low on my
wishlist. I haven't checked the patch, though.

Regards,
Marti
Re: [HACKERS] ProcessStandbyHSFeedbackMessage can make global xmin go backwards
Merlin Moncure mmonc...@gmail.com writes:
> On Wed, Oct 19, 2011 at 6:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it
>> can call GetOldestXmin and then the result will politely hold still
>> while it considers what to do next.
>
> curious: are these bugs in production, and what would be the user
> visible symptoms of seeing them in the wild?

There's no bug so far as data integrity on the master goes. The risk is
that you'd see queries failing with replication conflicts on a
hot-standby slave even though you thought you'd protected them by
setting hot_standby_feedback = on. That would happen if any rows
actually got vacuumed despite the standby's attempt to set an xmin that
would protect them. This is possible anyway at walsender startup, but I
think the logic errors made it more probable.

regards, tom lane
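For anyone following along, the setting in question goes in the
standby's postgresql.conf; a minimal sketch:

# On the standby:
hot_standby = on
hot_standby_feedback = on   # report the standby's xmin back to the master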
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Oct21, 2011, at 17:36 , Tom Lane wrote:
> 1. Restrict exported snapshots to be loaded only by transactions
> running in the same database as the exporter. This would fix the
> problem, but it cuts out one of the main use-cases for sync snapshots,
> namely getting cluster-wide-consistent dumps in pg_dumpall.

Isn't the use-case getting consistent *parallel* dumps of a single
database rather than a consistent dump of multiple databases? Since we
don't have atomic cross-database commits, what does using the same
snapshot to dump multiple databases buy us?

On those grounds, +1 for option 1 here.

> 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
> the current database. This sounds bad, but OTOH I don't think there's
> ever been any proof that this optimization is worth much in real-world
> usage. We've already had to lobotomize that optimization for walsender
> processes, anyway.

Hm, we've told people who wanted cross-database access to tables in the
past to either

* use dblink, or
* not split their tables over multiple databases in the first place,
  and to use schemas instead.

If we remove the GetOldestXmin optimization, we're essentially
reversing course on this. Do we really wanna go there?

best regards,
Florian Pflug
Re: [HACKERS] funny lock mode in DropTrigger
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
> I just noticed that DropTrigger uses ShareRowExclusiveLock to lock the
> relation it's going to drop the trigger on. The comment right above it
> says that this should match RemoveTriggerById, but that one uses
> AccessExclusiveLock -- so one of them (or the comment) is not right.

Yeah, this is a bug. I think what happened is:

1. Simon committed his patch to reduce lock levels.

2. Robert copied that lock level in his commit
   4240e429d0c2d889d0cda23c618f94e12c13ade7, which refactored use of
   RangeVarGetRelid.

3. When I reverted most of Simon's change, it didn't occur to me to
   look for places that had copied the bad lock levels --- I was just
   looking at what his patch had done.

I will fix this, and also do some looking to see if the bogus lock
levels propagated anywhere else...

regards, tom lane
Re: [HACKERS] funny lock mode in DropTrigger
Alvaro Herrera alvhe...@commandprompt.com writes:
> Ah, I also noticed that the ALTER EXTENSION and SECURITY LABEL
> commands use ShareUpdateExclusiveLock, and they failed, when
> committed, to note this in chapter 13.3 of the docs. Not sure how
> strict we are about documenting these things. (I note COMMENT already
> fails to document its ShareUpdateExclusiveLock lock.)

To my mind, the list in 13.3 is only meant to cover what lock levels
are used *with tables*. Locks applied to other kinds of objects would
probably need a different list altogether.

regards, tom lane
Re: [HACKERS] Synchronized snapshots versus multiple databases
On 10/21/2011 12:05 PM, Florian Pflug wrote:
> On Oct21, 2011, at 17:36 , Tom Lane wrote:
>> 1. Restrict exported snapshots to be loaded only by transactions
>> running in the same database as the exporter. This would fix the
>> problem, but it cuts out one of the main use-cases for sync
>> snapshots, namely getting cluster-wide-consistent dumps in
>> pg_dumpall.
>
> Isn't the use-case getting consistent *parallel* dumps of a single
> database rather than a consistent dump of multiple databases? Since we
> don't have atomic cross-database commits, what does using the same
> snapshot to dump multiple databases buy us?

That was my understanding of the use case.

cheers

andrew
[HACKERS] So, is COUNT(*) fast now?
Laments at:
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
http://wiki.postgresql.org/wiki/Slow_Counting

I tried this on my MacBook Pro this morning, using pgbench -i -s 500 to
create a database about 7.5GB in size, and then using SELECT sum(1)
FROM pgbench_accounts as a test query, on a build WITHOUT
--enable-cassert. This machine has 4GB of memory, and I set
shared_buffers = 400MB. (No, I'm not sure whether that's the optimal
setting for shared_buffers for this machine.)

With enable_indexonlyscan = false, times for this query are:
96799.747 ms, 89108.987 ms, 114433.664 ms. With
enable_indexonlyscan = true, times were: 16779.325 ms, 16537.726 ms,
16703.229 ms. That's about six times faster.

It's worth noting that the pgbench_accounts table has relatively narrow
rows. On a table with wider rows (but not so wide that they get toasted
and become narrow again), the benefit might be more. On the other hand,
this is also a table that's just been vacuumed, and you've got the
happy case where the table (6404 MB) does not fit in memory but the
index (1071 MB) does. What happens on a smaller test case? Here are the
results at scale factor = 100:

enable_indexonlyscan = false times: 1774.945 ms, 1784.985 ms, 1836.099 ms
enable_indexonlyscan = true times: 1450.445 ms, 1452.407 ms, 1452.426 ms

That's about a 23% speedup. At this scale factor, everything fits into
memory, but the index by itself (214 MB) fits into shared_buffers while
the table (1281 MB) does not. Let's crank the scale factor down some
more. Here are the results at scale_factor = 20:

enable_indexonlyscan = false times: 352.213 ms, 353.988 ms, 350.859 ms
enable_indexonlyscan = true times: 300.623 ms, 301.355 ms, 302.346 ms

Now the entire database fits into shared_buffers, but we're still
seeing a 17% speedup. But this turns out to be misleading. The
ring-buffer logic is actually preventing shared_buffers from getting
all of the heap blocks in cache quickly. If I run the query over and
over again until the whole table actually makes it into shared buffers,
the sequential scan gets much faster:

enable_indexonlyscan = false times after lots and lots of prewarming:
215.487 ms, 219.006 ms, 218.490 ms

That's a bit disappointing - it's now more than a third faster to do
the sequential scan, even though the sequential scan has to touch six
times as many blocks (at scale factor 20, index is 43 MB, table is
256 MB), all of which are in cache. Of course, touching that many fewer
blocks does have some advantages if there is concurrent activity on the
system, but it still seems unfortunate that the ratio of runtime to
blocks touched is more than 8x higher for the index-only case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
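The measurement loop itself is simple to reproduce; a sketch in psql,
after initializing with pgbench -i -s <scale> as described above:

\timing on
SET enable_indexonlyscan = false;
SELECT sum(1) FROM pgbench_accounts;   -- repeat a few times, record the times
SET enable_indexonlyscan = true;
SELECT sum(1) FROM pgbench_accounts;   -- repeat a few times, record the times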
Re: [HACKERS] [v9.2] Object access hooks with arguments support (v1)
> When someone comes along in another year or two and adds materialized
> views, will they need to pass some additional data to the object
> access hook? Probably, but I bet you're the only one who can quickly
> figure out what it is. That's no good. We're not going to make changes
> to PostgreSQL core that only you can maintain, and that are likely to
> be broken by future commits. At this point I feel pretty good that
> someone can look at the stuff that we've done so far with SECURITY
> LABEL and the object access hooks, and if they add a new object type,
> they can make those things apply to the new object type too by
> copying what's already there, without making any reference to the
> sepgsql code. There's a clear abstraction boundary, and people who are
> working on one side of the boundary do not need to understand the
> details of what happens on the other side of the boundary.

I checked my older implementation, based on 8.4.x and 9.0.x, which
includes all the features that I want to implement. At least, it does
not require much information beyond what the DAC model already needs,
although SELECT INTO was an exception. That is natural enough, because
both perform similar checks. For example, sepgsql required the OID of
the source database to compute the default security label on the new
database at createdb() time; that OID was used for permission checks in
the DAC model also. For another example, sepgsql also required the OIDs
of the type's underlying functions to check permissions on them at
DefineType(); those were also used by the DAC model, except that its
checks were commented out with #ifdef NOT_USED because superuser() was
already checked.

So, how do we launch this effort according to the principles:
- Hooks that are used for security checks should be deployed around the
  existing DAC checks.
- The delivered arguments should not be model-specific.

I don't have a clear idea for reworking the existing routines along the
lines I proposed long ago, which wrapped a series of DAC checks and the
entrypoint of the MAC hooks into a single function. A straightforward
idea is to deploy the object access hook around the existing DAC checks
with a new OAT_* label, such as OAT_CREATE. In the case of relation
creation, that would be DefineRelation() and OpenIntoRel(), to bypass
the internal invocation of heap_create_with_catalog(). Please tell me
if we have a different idea of code reworking to simplify deployment of
the hooks.

> In this particular case, I think it might be reasonable to change the
> DAC behavior, so that a CREATE TABLE AS SELECT or SELECT INTO requires
> insert privileges on the new table as well as permission to create it
> in the first place. I don't particularly see any reason to require
> different privileges for CREATE TABLE followed by INSERT .. SELECT
> than what we require when the two commands are rolled into one. Prior
> to 9.0, this case couldn't arise, because we didn't have default
> privileges, so I'm inclined to think that the way it works now is a
> historical accident rather than a deliberate design decision.

That will help with this mismatch between DAC and MAC. I'll submit a
separate patch to handle this behavior. Probably, all we need to do
here is invoke ExecCheckRTPerms().

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [v9.2] Object access hooks with arguments support (v1)
On Fri, Oct 21, 2011 at 12:44 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> I checked my older implementation, based on 8.4.x and 9.0.x, which
> includes all the features that I want to implement. At least, it does
> not require much information beyond what the DAC model already needs,
> although SELECT INTO was an exception. That is natural enough, because
> both perform similar checks.

OK. It seems like it might be helpful to put together a list of all of
the things you want to check permissions on, maybe on a wiki page
somewhere, and indicate in there which ones are done and which ones are
not done, and what additional information you think is needed in each
case, and flag any MAC/DAC discrepancies that you are concerned about.
I think that might help us reach agreement on the best way forward.
Also, then, as we get things committed, we can track progress versus
that roadmap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Synchronized snapshots versus multiple databases
Andrew Dunstan and...@dunslane.net writes:
> On 10/21/2011 12:05 PM, Florian Pflug wrote:
>> On Oct21, 2011, at 17:36 , Tom Lane wrote:
>>> 1. Restrict exported snapshots to be loaded only by transactions
>>> running in the same database as the exporter. This would fix the
>>> problem, but it cuts out one of the main use-cases for sync
>>> snapshots, namely getting cluster-wide-consistent dumps in
>>> pg_dumpall.
>>
>> Isn't the use-case getting consistent *parallel* dumps of a single
>> database rather than a consistent dump of multiple databases? Since
>> we don't have atomic cross-database commits, what does using the same
>> snapshot to dump multiple databases buy us?
>
> That was my understanding of the use case.

Um, which one are you supporting?

Anyway, the value of using the same snapshot across all of a pg_dumpall
run would be that you could be sure that what you'd dumped concerning
role and tablespace objects was consistent with what you then dump
about database-local objects. (In principle, anyway --- I'm not sure
how much of that happens under SnapshotNow rules because of use of
backend functions. But you'll most certainly never be able to guarantee
it if pg_dumpall can't export its snapshot to each subsidiary pg_dump
run.)

regards, tom lane
Re: [HACKERS] Synchronized snapshots versus multiple databases
Florian Pflug f...@phlo.org writes:
> On Oct21, 2011, at 17:36 , Tom Lane wrote:
>> 3. Remove the optimization that lets GetOldestXmin ignore XIDs
>> outside the current database. This sounds bad, but OTOH I don't think
>> there's ever been any proof that this optimization is worth much in
>> real-world usage. We've already had to lobotomize that optimization
>> for walsender processes, anyway.
>
> Hm, we've told people who wanted cross-database access to tables in
> the past to either
>
> * use dblink, or
> * not split their tables over multiple databases in the first place,
>   and to use schemas instead.
>
> If we remove the GetOldestXmin optimization, we're essentially
> reversing course on this. Do we really wanna go there?

Huh? The behavior of GetOldestXmin is purely a backend-internal matter.
I don't see how it's related to cross-database access --- or at least,
changing this would not represent a significant move towards supporting
that.

regards, tom lane
Re: [HACKERS] So, is COUNT(*) fast now?
Robert Haas robertmh...@gmail.com writes:
> That's a bit disappointing - it's now more than a third faster to do
> the sequential scan, even though the sequential scan has to touch six
> times as many blocks (at scale factor 20, index is 43 MB, table is
> 256 MB), all of which are in cache. Of course, touching that many
> fewer blocks does have some advantages if there is concurrent activity
> on the system, but it still seems unfortunate that the ratio of
> runtime to blocks touched is more than 8x higher for the index-only
> case.

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise. The whole point of the index-only optimization is to
avoid I/O. When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

regards, tom lane
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> I've thought of another nasty problem for the sync-snapshots patch.
>
> 1. Restrict exported snapshots to be loaded only by transactions
> running in the same database as the exporter. This would fix the
> problem, but it cuts out one of the main use-cases for sync snapshots,
> namely getting cluster-wide-consistent dumps in pg_dumpall.
>
> 2. Allow a snapshot exported from another database to be loaded so
> long as this doesn't cause the DB-local value of GetOldestXmin to go
> backwards. However, in scenarios such as the above, C is certain to
> fail such a test. To make it work, pg_dumpall would have to start
> advance-guard transactions in each database before it takes the
> intended-to-be-shared snapshot, and probably even wait for these to be
> oldest. Ick.
>
> 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
> the current database. This sounds bad, but OTOH I don't think there's
> ever been any proof that this optimization is worth much in real-world
> usage. We've already had to lobotomize that optimization for walsender
> processes, anyway.
>
> 4. Somehow mark the xmin of a process that has exported a snapshot so
> that it will be honored in all DBs, not just the current one. The
> difficulty here is that we'd need to know *at the time the snap is
> taken* that it's going to be exported. (Consider the scenario above,
> except that A doesn't get around to exporting the snapshot it took in
> step 3 until between steps 5 and 6. If the xmin wasn't already marked
> as globally applicable when vacuum looked at it in step 5, we lose.)
> This is do-able but it will contort the user-visible API of the sync
> snapshots feature. One way we could do it is to require that
> transactions that want to export snapshots set a transaction mode
> before they take their first snapshot.

I am unexcited by #2 on usability grounds. I agree with you that #3
might end up being a fairly small pessimization in practice, but I'd be
inclined to just do #1 for now and revisit the issue when and if
someone shows an interest in revamping pg_dumpall to do what you're
proposing (and hopefully a bunch of other cleanup too).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> That's a bit disappointing - it's now more than a third faster to do
>> the sequential scan, even though the sequential scan has to touch six
>> times as many blocks (at scale factor 20, index is 43 MB, table is
>> 256 MB), all of which are in cache. Of course, touching that many
>> fewer blocks does have some advantages if there is concurrent
>> activity on the system, but it still seems unfortunate that the ratio
>> of runtime to blocks touched is more than 8x higher for the
>> index-only case.
>
> I don't know why you'd imagine that touching an index is free, or even
> cheap, CPU-wise. The whole point of the index-only optimization is to
> avoid I/O. When you try it on a case where there's no I/O to be saved,
> *and* no shared-buffers contention to be avoided, there's no way it's
> going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Oct21, 2011, at 19:09 , Tom Lane wrote:
> Florian Pflug f...@phlo.org writes:
>> Hm, we've told people who wanted cross-database access to tables in
>> the past to either use dblink, or not split their tables over
>> multiple databases in the first place, and to use schemas instead.
>> If we remove the GetOldestXmin optimization, we're essentially
>> reversing course on this. Do we really wanna go there?
>
> Huh? The behavior of GetOldestXmin is purely a backend-internal
> matter. I don't see how it's related to cross-database access --- or
> at least, changing this would not represent a significant move towards
> supporting that.

AFAIR, the performance hit we'd take by making the vacuum cutoff point
(i.e. GetOldestXmin()) global instead of database-local has been
repeatedly used in the past as an argument against cross-database
queries. I have to admit that I currently cannot seem to find an entry
in the archives to back that up, though.

best regards,
Florian Pflug
Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces
On Oct21, 2011, at 16:42 , Phil Sorber wrote:
> If you did want to make them immutable, I also like Florian's idea of
> a dependency graph. This would make the dumps less readable, though.

Hm, I kinda reversed my opinion on that, though - i.e., I no longer
think that the dependency graph idea has much merit. For two reasons:

First, dependencies work on OIDs, not on names. Thus, for the
dependency machinery to work for GUCs, they'd also need to store OIDs
instead of names of referenced schema objects (otherwise you get into
trouble if objects are renamed).

Second, that of course doesn't work, at least for roles, because roles
are shared objects, but the referenced objects might be database-local
(search_path, for example).

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Log crashed backend's query v3
On Fri, Oct 21, 2011 at 11:45 AM, Marti Raudsepp ma...@juffo.org wrote:
>> It's almost making me cry thinking about how much time this would
>> have saved me
>
> Thanks for your review and the generous words. :)

I have committed this version. I'm expecting Tom to try to find a
scenario in which it's unfixably broken, so we'll see how that turns
out; but there seems to be significant support for this feature and I'm
hopeful that this will pass (or can be made to pass) muster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug f...@phlo.org wrote:
> AFAIR, the performance hit we'd take by making the vacuum cutoff point
> (i.e. GetOldestXmin()) global instead of database-local has been
> repeatedly used in the past as an argument against cross-database
> queries. I have to admit that I currently cannot seem to find an entry
> in the archives to back that up, though.

I think the main argument against cross-database queries is that every
place in the backend that, for example, uses an OID to identify a table
would need to be modified to use a database OID and a table OID. Even
if the distributed performance penalty of such a change doesn't bother
you, the amount of code churn that it would take to make such a change
is mind-boggling.

I haven't seen anyone explain why they really need this feature anyway,
and I think it's going in the wrong direction. IMHO, anyone who wants
to be doing cross-database queries should be using schemas instead, and
if that's not workable for some reason, then we should improve the
schema implementation until it becomes workable. I think that the
target use case for separate databases ought to be multi-tenancy, but
what is needed there is actually more isolation (e.g. wrt role names,
cluster-wide visibility of pg_database contents, etc.), not less.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Synchronized snapshots versus multiple databases
On 10/21/2011 01:06 PM, Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> On 10/21/2011 12:05 PM, Florian Pflug wrote:
>>> On Oct21, 2011, at 17:36 , Tom Lane wrote:
>>>> 1. Restrict exported snapshots to be loaded only by transactions
>>>> running in the same database as the exporter. This would fix the
>>>> problem, but it cuts out one of the main use-cases for sync
>>>> snapshots, namely getting cluster-wide-consistent dumps in
>>>> pg_dumpall.
>>>
>>> Isn't the use-case getting consistent *parallel* dumps of a single
>>> database rather than a consistent dump of multiple databases? Since
>>> we don't have atomic cross-database commits, what does using the
>>> same snapshot to dump multiple databases buy us?
>>
>> That was my understanding of the use case.
>
> Um, which one are you supporting?

#1 seemed OK from this POV. Everything else looks ickier and/or more
fragile, at first glance anyway.

> Anyway, the value of using the same snapshot across all of a
> pg_dumpall run would be that you could be sure that what you'd dumped
> concerning role and tablespace objects was consistent with what you
> then dump about database-local objects. (In principle, anyway --- I'm
> not sure how much of that happens under SnapshotNow rules because of
> use of backend functions. But you'll most certainly never be able to
> guarantee it if pg_dumpall can't export its snapshot to each
> subsidiary pg_dump run.)

For someone who is concerned with that, maybe pg_dumpall could have an
option to take an EXCLUSIVE lock on the shared catalogs?

cheers

andrew
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Oct21, 2011, at 19:47 , Robert Haas wrote:
> On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug f...@phlo.org wrote:
>> AFAIR, the performance hit we'd take by making the vacuum cutoff
>> point (i.e. GetOldestXmin()) global instead of database-local has
>> been repeatedly used in the past as an argument against
>> cross-database queries. I have to admit that I currently cannot seem
>> to find an entry in the archives to back that up, though.
>
> I haven't seen anyone explain why they really need this feature
> anyway, and I think it's going in the wrong direction. IMHO, anyone
> who wants to be doing cross-database queries should be using schemas
> instead, and if that's not workable for some reason, then we should
> improve the schema implementation until it becomes workable. I think
> that the target use case for separate databases ought to be
> multi-tenancy, but what is needed there is actually more isolation
> (e.g. wrt role names, cluster-wide visibility of pg_database contents,
> etc.), not less.

Agreed. I wasn't trying to argue for cross-database queries - quite the
opposite, actually. My point was more that since we've used database
isolation as an argument against cross-database queries in the past, we
shouldn't sacrifice it now for synchronized snapshots.

best regards,
Florian Pflug
Re: [HACKERS] So, is COUNT(*) fast now?
Robert Haas robertmh...@gmail.com writes:
> On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I don't know why you'd imagine that touching an index is free, or
>> even cheap, CPU-wise. The whole point of the index-only optimization
>> is to avoid I/O. When you try it on a case where there's no I/O to be
>> saved, *and* no shared-buffers contention to be avoided, there's no
>> way it's going to be a win.
>
> Well, call me naive, but I would have thought touching six times less
> data would make the operation run faster, not slower.

It's not touching six times less data. It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other. You've arranged the test case so that all these
tuples are in shared buffers already, so there's no data movement to be
avoided.

What this test case proves is that btree's overhead per index tuple
touched is significantly more than the cost of the fastest path through
HeapTupleSatisfiesMVCC, which I don't find surprising considering how
much sweat has been expended on that code path over the years. (It may
well be that it's not even btree at fault but the per-tuple visits to
the visibility map ... did you do any oprofiling yet?)

regards, tom lane
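For reference, a typical oprofile session for this sort of question
might look roughly like the following (a sketch, assuming oprofile is
installed and the kernel supports it; adjust the binary path for your
build):

$ sudo opcontrol --init
$ sudo opcontrol --no-vmlinux
$ sudo opcontrol --start
  ... run the test query a few times in psql ...
$ sudo opcontrol --dump
$ opreport --symbols `pg_config --bindir`/postgres | head -n 30
$ sudo opcontrol --shutdown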
Re: [HACKERS] Synchronized snapshots versus multiple databases
Florian Pflug f...@phlo.org writes:
> AFAIR, the performance hit we'd take by making the vacuum cutoff point
> (i.e. GetOldestXmin()) global instead of database-local has been
> repeatedly used in the past as an argument against cross-database
> queries. I have to admit that I currently cannot seem to find an entry
> in the archives to back that up, though.

To my mind, the main problem with cross-database queries is that none
of the backend is set up to deal with more than one set of system
catalogs.

regards, tom lane
Re: [HACKERS] Synchronized snapshots versus multiple databases
On Fri, Oct 21, 2011 at 2:06 PM, Florian Pflug f...@phlo.org wrote:
> On Oct21, 2011, at 19:47 , Robert Haas wrote:
>> On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug f...@phlo.org wrote:
>>> AFAIR, the performance hit we'd take by making the vacuum cutoff
>>> point (i.e. GetOldestXmin()) global instead of database-local has
>>> been repeatedly used in the past as an argument against
>>> cross-database queries. I have to admit that I currently cannot seem
>>> to find an entry in the archives to back that up, though.
>>
>> I haven't seen anyone explain why they really need this feature
>> anyway, and I think it's going in the wrong direction. IMHO, anyone
>> who wants to be doing cross-database queries should be using schemas
>> instead, and if that's not workable for some reason, then we should
>> improve the schema implementation until it becomes workable. I think
>> that the target use case for separate databases ought to be
>> multi-tenancy, but what is needed there is actually more isolation
>> (e.g. wrt role names, cluster-wide visibility of pg_database
>> contents, etc.), not less.
>
> Agreed. I wasn't trying to argue for cross-database queries - quite
> the opposite, actually. My point was more that since we've used
> database isolation as an argument against cross-database queries in
> the past, we shouldn't sacrifice it now for synchronized snapshots.

Right, I agree. It might be nice to take a cluster-wide dump that is
guaranteed to be transactionally consistent, but I bet a lot of people
would actually be happier to see us go the opposite direction - e.g.
give each database its own XID space, so that activity in one database
doesn't accelerate the need for anti-wraparound vacuums in another
database. Not sure that could ever actually happen, but the point is
that people probably should not be relying on serializability across
databases too much, because the whole point of the multiple databases
feature is to have multiple, independent databases in one cluster that
are thoroughly isolated from each other, and any future changes we make
should probably lean in that direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> I don't know why you'd imagine that touching an index is free, or
>>> even cheap, CPU-wise. The whole point of the index-only optimization
>>> is to avoid I/O. When you try it on a case where there's no I/O to
>>> be saved, *and* no shared-buffers contention to be avoided, there's
>>> no way it's going to be a win.
>>
>> Well, call me naive, but I would have thought touching six times less
>> data would make the operation run faster, not slower.
>
> It's not touching six times less data. It's touching the exact same
> number of tuples either way, just index tuples in one case and heap
> tuples in the other.

Yeah, but it works out to fewer pages.

> You've arranged the test case so that all these tuples are in shared
> buffers already, so there's no data movement to be avoided. What this
> test case proves is that btree's overhead per index tuple touched is
> significantly more than the cost of the fastest path through
> HeapTupleSatisfiesMVCC, which I don't find surprising considering how
> much sweat has been expended on that code path over the years.

I think HeapTupleSatisfiesMVCC is probably being skipped anyway in this
case, since all the heap pages should be PD_ALL_VISIBLE.

> (It may well be that it's not even btree at fault but the per-tuple
> visits to the visibility map ... did you do any oprofiling yet?)

No, but I think that might be a good idea. Maybe I'll go do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] psql command for bytea output
A few months ago, I blogged about the difficulty of getting psql to put a bytea datum into a file. See http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Today I ran into the problem again, and it struck me that we could fairly easily have a new command to handle this, called, say, bcopy. So it would look like: \bcopy (select generate_bytea()) to foo.bin Thoughts? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
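For reference, the workaround available today is roughly the following: hex-encode inside psql and decode outside it (generate_bytea() is the placeholder from the example above; file names are illustrative):

\t on
\a
\o /tmp/foo.hex
SELECT encode(generate_bytea(), 'hex');
\o
-- then, from a shell: xxd -r -p /tmp/foo.hex > foo.bin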
Re: [HACKERS] Synchronized snapshots versus multiple databases
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. Restrict exported snapshots to be loaded only by transactions running in the same database as the exporter. This would fix the problem, but it cuts out one of the main use-cases for sync snapshots, namely getting cluster-wide-consistent dumps in pg_dumpall. I am unexcited by #2 on usability grounds. I agree with you that #3 might end up being a fairly small pessimization in practice, but I'd be inclined to just do #1 for now and revisit the issue when and if someone shows an interest in revamping pg_dumpall to do what you're proposing (and hopefully a bunch of other cleanup too). Seems like that is the consensus view, so that's what I'll do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
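For context, the feature under discussion is driven roughly like this; under option #1 the second session must be connected to the same database as the first (the snapshot token below is a made-up placeholder):

-- session 1:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();            -- suppose it returns '000003A1-1'

-- session 2, same database:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';
-- both transactions now see the same data until they end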
Re: [HACKERS] So, is COUNT(*) fast now?
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: What this test case proves is that btree's overhead per index tuple touched is significantly more than the cost of the fastest path through HeapTupleSatisfiesMVCC, which I don't find surprising considering how much sweat has been expended on that code path over the years. I think HeapTupleSatisfiesMVCC is probably being skipped anyway in this case, since all the heap pages should be PD_ALL_VISIBLE. Proves my point ;-) ... you're comparing a code path that's been beat on for *years* with one that just got written. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: A few months ago, I blogged about the difficulty of getting psql to put a bytea datum into a file. See http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Today I ran into the problem again, and it struck me that we could fairly easily have a new command to handle this, called, say, bcopy. So it would look like: \bcopy (select generate_bytea()) to foo.bin Thoughts? Isn't it better to fix the current tools to work well with bytea? Pavel cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
On 10/21/2011 02:44 PM, Pavel Stehule wrote: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: A few months ago, I blogged about the difficulty of getting psql to put a bytea datum into a file. See http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Today I ran into the problem again, and it struck me that we could fairly easily have a new command to handle this, called, say, bcopy. So it would look like: \bcopy (select generate_bytea()) to foo.bin Thoughts? Isn't it better to fix the current tools to work well with bytea? Such as? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: On 10/21/2011 02:44 PM, Pavel Stehule wrote: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: A few months ago, I blogged about the difficulty of getting psql to put a bytea datum into a file. See http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Today I ran into the problem again, and it struck me that we could fairly easily have a new command to handle this, called, say, bcopy. So it would look like: \bcopy (select generate_bytea()) to foo.bin Thoughts? Isn't it better to fix the current tools to work well with bytea? Such as? Something like: \copy ... to foo.bin format binary. The COPY API can do it; it supports binary load and binary dump, so only the client interface is missing. Pavel cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
On 10/21/2011 02:51 PM, Pavel Stehule wrote: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: On 10/21/2011 02:44 PM, Pavel Stehule wrote: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: A few months ago, I blogged about the difficulty of getting psql to put a bytea datum into a file. See http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html. Today I ran into the problem again, and it struck me that we could fairly easily have a new command to handle this, called, say, bcopy. So it would look like: \bcopy (select generate_bytea()) to foo.bin Thoughts? Isn't it better to fix the current tools to work well with bytea? Such as? Something like: \copy ... to foo.bin format binary. The COPY API can do it; it supports binary load and binary dump, so only the client interface is missing. That would be fine if you could suppress the file header/trailer and field header, so all you got was the raw data. But making COPY do that seems no cleaner than what I suggested. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think HeapTupleSatisfiesMVCC is probably being skipped anyway in this case, since all the heap pages should be PD_ALL_VISIBLE. Proves my point ;-) ... you're comparing a code path that's been beat on for *years* with one that just got written. I know. I wrote a chunk of it. :-) My point is just that it'd be nice to make it better. Anyhow, here's the scoop. On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of oprofile data:

176830 13.0801 postgres postgres ExecProject
170028 12.5770 postgres postgres IndexOnlyNext
96631   7.1478 postgres postgres visibilitymap_test
86019   6.3628 postgres postgres advance_aggregates
74366   5.5009 postgres postgres ExecScan
72428   5.3575 postgres postgres ExecClearTuple
68483   5.0657 postgres postgres btgettuple
60614   4.4836 postgres postgres advance_transition_function
59680   4.4145 postgres postgres ExecProcNode
52295   3.8683 postgres postgres _bt_checkkeys
52078   3.8522 libc-2.12.90.so libc-2.12.90.so __memcpy_sse2
49548   3.6651 postgres postgres index_getnext_tid
48265   3.5702 postgres postgres ExecEvalConst
42989   3.1799 postgres postgres _bt_next
40544   2.9990 postgres postgres _bt_readpage
35162   2.6009 no-vmlinux no-vmlinux /no-vmlinux
33639   2.4883 postgres postgres MemoryContextReset

And without index-only scans, but everything in shared_buffers:

169515 18.4261 postgres postgres ExecProject
94827  10.3076 postgres postgres heapgettup_pagemode
84850   9.2231 postgres postgres advance_aggregates
57998   6.3043 postgres postgres advance_transition_function
55638   6.0478 postgres postgres ExecEvalConst
53684   5.8354 postgres postgres heapgetpage
51411   5.5883 postgres postgres ExecScan
48387   5.2596 postgres postgres ExecProcNode
44129   4.7968 postgres postgres ExecStoreTuple
30759   3.3435 postgres postgres heap_getnext
25923   2.8178 postgres postgres SeqNext
24145   2.6245 postgres postgres CheckForSerializableConflictOut
23155   2.5169 postgres postgres ExecAgg
18864   2.0505 postgres postgres heap_page_prune_opt
18784   2.0418 no-vmlinux no-vmlinux /no-vmlinux

The index-only scan takes about 385 ms, while the non-index-only version takes about 284 ms. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 3:07 PM, Robert Haas robertmh...@gmail.com wrote: [ oprofile results ] *grovels through the line-by-line results* Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is probably being folded into IndexOnlyNext in the per-function timings:

ExecClearTuple(slot);
for (i = 0; i < nindexatts; i++)
    values[i] = index_getattr(itup, i + 1, itupdesc, isnull[i]);
ExecStoreVirtualTuple(slot);

If I'm reading these results right, that section is about 3% of the total number of samples. Also, this line is kind of expensive:

if (!visibilitymap_test(scandesc->heapRelation,
                        ItemPointerGetBlockNumber(tid),
                        node->ioss_VMBuffer))

Around 2%. But I don't see any way to avoid that, or even make it cheaper. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
Robert Haas robertmh...@gmail.com writes: Anyhow, here's the scoop. On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of oprofile data: 176830 13.0801 postgres postgres ExecProject Hm, that's weird. In both these cases, I'd have expected that ExecProject would get optimized away thanks to selection of a physical tlist for the scan node. Wonder if that got broken ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
Robert Haas robertmh...@gmail.com writes: Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is probably being folded into IndexOnlyNext in the per-function timings:

ExecClearTuple(slot);
for (i = 0; i < nindexatts; i++)
    values[i] = index_getattr(itup, i + 1, itupdesc, isnull[i]);
ExecStoreVirtualTuple(slot);

I had wondered whether it'd be worth optimizing that along the lines of slot_getallattrs(). But most indexes probably have only one column, or anyway not enough to make for a useful savings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Anyhow, here's the scoop. On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of oprofile data: 176830 13.0801 postgres postgres ExecProject Hm, that's weird. In both these cases, I'd have expected that ExecProject would get optimized away thanks to selection of a physical tlist for the scan node. Wonder if that got broken ... If it did, it looks like it wasn't recent. I set up the same test case on my MacBook using REL9_1_STABLE and REL9_0_STABLE and set a breakpoint on ExecProject(). Both back-branches appear to also call ExecProject() for every tuple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
Pavel Stehule pavel.steh...@gmail.com writes: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: On 10/21/2011 02:44 PM, Pavel Stehule wrote: Isn't it better to fix the current tools to work well with bytea? Such as? Something like: \copy ... to foo.bin format binary No, because COPY BINARY will emit its own sort of wrappers around the data. What I don't like about Andrew's proposal is that it seems rather limited. Why bytea in particular? Text chunks could probably also use a direct output method. And what about input? Could we do anything with a notion of a COPY RAW mode, that would dump or read the data without any header, column, or row separators? On input I suppose you'd have to restrict it to one column --- on output, we could leave re-dividing the data to the user's ingenuity ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
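A sketch of how such a mode might look; FORMAT raw is hypothetical syntax invented here to illustrate the idea, not a committed feature (table and column names are likewise invented):

-- output: a single column's value, no headers or row/field separators
COPY (SELECT payload FROM blobs WHERE id = 42) TO '/tmp/blob.bin' (FORMAT raw);

-- input: the entire file becomes one value in a one-column row
COPY blobs (payload) FROM '/tmp/blob.bin' (FORMAT raw);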
Re: [HACKERS] [PATCH] Log crashed backend's query v3
Robert Haas robertmh...@gmail.com writes: I have committed this version. I'm expecting Tom to try to find a scenario in which it's unfixably broken, so we'll see how that turns out; but there seems to be significant support for this feature and I'm hopeful that this will pass (or can be made to pass) muster. I found some problems with it, but with the changes I just committed it seems like it should be fairly bulletproof. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Log crashed backend's query v3
On Fri, Oct 21, 2011 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I have committed this version. I'm expecting Tom to try to find a scenario in which it's unfixably broken, so we'll see how that turns out; but there seems to be significant support for this feature and I'm hopeful that this will pass (or can be made to pass) muster. I found some problems with it, but with the changes I just committed it seems like it should be fairly bulletproof. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
2011/10/21 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: 2011/10/21 Andrew Dunstan andrew.duns...@pgexperts.com: On 10/21/2011 02:44 PM, Pavel Stehule wrote: Isn't it better to fix the current tools to work well with bytea? Such as? Something like: \copy ... to foo.bin format binary No, because COPY BINARY will emit its own sort of wrappers around the data. True - it's not useful for this case. What I don't like about Andrew's proposal is that it seems rather limited. Why bytea in particular? Text chunks could probably also use a direct output method. And what about input? Could we do anything with a notion of a COPY RAW mode, that would dump or read the data without any header, column, or row separators? On input I suppose you'd have to restrict it to one column --- on output, we could leave re-dividing the data to the user's ingenuity ... +1. Andrew, does input/output of anything other than one row/one column make sense? Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql command for bytea output
On 10/21/2011 04:43 PM, Pavel Stehule wrote: What I don't like about Andrew's proposal is that it seems rather limited. Why bytea in particular? Text chunks could probably also use a direct output method. And what about input? Could we do anything with a notion of a COPY RAW mode, that would dump or read the data without any header, column, or row separators? On input I suppose you'd have to restrict it to one column --- on output, we could leave re-dividing the data to the user's ingenuity ... +1. Andrew, does input/output of anything other than one row/one column make sense? Well, my main use case is only for output of a single datum. Sure we could run them all together on output, but it's hard to see what use that would be unless everything were of fixed size. Being able to load a single datum would certainly be good, and not just for bytea fields - for any large piece of text as well, in fact. One use case I have for that involves a number of fairly large XSL stylesheets that need to be loaded into a database. Currently that's done via psql variables, but it's a bit grotty. I also think this is really just a psql issue. Most other clients (e.g. a perl DBD::Pg client) will have no trouble handling all this now. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \set vs \copy - bug or expected behaviour?
On Fri, Oct 21, 2011 at 7:24 AM, Richard Huxton d...@archonet.com wrote: It looks like \copy is just passing the text of the query unadjusted to COPY. I get a syntax error on :x with the \copy below on both 9.0 and 9.1

=== test script ===
\set x '''HELLO'''
-- Works
\echo :x
-- Works
\o '/tmp/test1.txt'
COPY (SELECT :x) TO STDOUT;
-- Doesn't work
\copy (SELECT :x) TO '/tmp/test2.txt'
=== end script ===

I'm not sure whether that's a bug per se, but I can see where a behavior change might be an improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
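The working case in the script points at the workaround available today: psql expands :variables in SQL it sends to the server, while \copy forwards its argument text verbatim, so routing server-side COPY output through \o gets interpolation and a file in one step. A minimal sketch:

\set x '''HELLO'''
\o /tmp/test2.txt
COPY (SELECT :x) TO STDOUT;   -- ordinary SQL, so :x is expanded
\o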
Re: [HACKERS] psql command for bytea output
Andrew Dunstan andrew.duns...@pgexperts.com writes: I also think this is really just a psql issue. Most other clients (e.g. a perl DBD::Pg client) will have no trouble handling all this now. Well, that's debatable, and in any case I think you'd find it will be easier to push this into COPY than to invent new psql backslash commands for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXECUTE tab completion
On Thu, Oct 20, 2011 at 5:16 PM, Andreas Karlsson andr...@proxel.se wrote: A new version is attached. Looks fine. Marking ready for committer (CF 2011-11). Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
On Fri, Oct 21, 2011 at 11:14 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't know why you'd imagine that touching an index is free, or even cheap, CPU-wise. The whole point of the index-only optimization is to avoid I/O. When you try it on a case where there's no I/O to be saved, *and* no shared-buffers contention to be avoided, there's no way it's going to be a win. Well, call me naive, but I would have thought touching six times less data would make the operation run faster, not slower. It's not touching six times less data. It's touching the exact same number of tuples either way, just index tuples in one case and heap tuples in the other. Yeah, but it works out to fewer pages. But since those pages are already in RAM, why would it matter all that much? (Other than in the case of highly concurrent access, which you don't seem to be testing?) One of Tom's commits that made it not lock the same index page over and over again (once for each tuple on it) made me think it should be much faster than the seq scan, but a bit of random flailing about convinced me that any savings from this were compensated for by the high overhead of FunctionCall2Coll and all of the hokey-pokey that that call entails, which a seqscan can skip entirely. If count(*) could cause the index-only scan to happen in physical order of the index, rather than logical order, that might be a big win. Both for all-in-memory and for not-all-in-memory. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] database file encryption.
Hi, I am new to this list. I haven't ever contributed code before, and have no idea how to go about it. I have made some changes to my copy of the 9.1.1 code that encrypt/decrypt the database files on the fly using the AES-256 cipher. It passes all the tests. :-) The changes are limited to:

src/backend/storage/file/fd.c
src/backend/storage/file/buffile.c
src/backend/libpq/be-fsstubs.c

At the moment the password is hardcoded into the source; I don't yet know how to get it passed in. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers