Re: [HACKERS] ASYNC Privileges proposal
In fairness NOTIFY has only had a payload since 9.0, and the issue of trust is mainly tied to data leaking from the payload, so I suspect I won't be the last person to request this as people re-visit NOTIFY :) ...but I totally get your point of course.

My first thought was also that having control at the channel level would be ideal, but that would be a huge implementation change by the looks of things, would certainly affect performance a great deal more, and would not really give much more benefit than could be attained with database-level control + a "SECURITY DEFINER" function.

Chris

On 20 May 2013 03:23, Tom Lane wrote:
> Chris Farmiloe writes:
> > I find the current LISTEN / NOTIFY rather limited in the context of
> > databases with multiple roles. As it stands it is not possible to
> > restrict the use of LISTEN or NOTIFY to specific roles, and therefore
> > notifications (and their payloads) cannot really be trusted as coming
> > from any particular source.
>
> TBH, nobody has complained about this in the fifteen-plus years that
> LISTEN has been around. I'm dubious about adding privilege-checking
> overhead for everybody to satisfy a complaint from one person.
>
> > I'd like to propose a new ASYNC database privilege that would control
> > whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the
> > associated pg_notify function.
>
> ... and if I did think that there were an issue here, I doubt I'd think
> that a privilege as coarse-grained as that would fix it. Surely you'd
> want per-channel privileges if you were feeling paranoid about this,
> not to mention separate read and write privileges. But the demand for
> that just isn't out there.
>
> regards, tom lane
Re: [HACKERS] ASYNC Privileges proposal
Chris Farmiloe writes:
> I find the current LISTEN / NOTIFY rather limited in the context of
> databases with multiple roles. As it stands it is not possible to restrict
> the use of LISTEN or NOTIFY to specific roles, and therefore notifications
> (and their payloads) cannot really be trusted as coming from any particular
> source.

TBH, nobody has complained about this in the fifteen-plus years that LISTEN has been around. I'm dubious about adding privilege-checking overhead for everybody to satisfy a complaint from one person.

> I'd like to propose a new ASYNC database privilege that would control
> whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the
> associated pg_notify function.

... and if I did think that there were an issue here, I doubt I'd think that a privilege as coarse-grained as that would fix it. Surely you'd want per-channel privileges if you were feeling paranoid about this, not to mention separate read and write privileges. But the demand for that just isn't out there.

regards, tom lane
[HACKERS] ASYNC Privileges proposal
Hey all,

I find the current LISTEN / NOTIFY rather limited in the context of databases with multiple roles. As it stands it is not possible to restrict the use of LISTEN or NOTIFY to specific roles, and therefore notifications (and their payloads) cannot really be trusted as coming from any particular source. If the payloads of notifications could be trusted, then applications could make better use of them, without fear of leaking any sensitive information to anyone who shouldn't be able to see it.

I'd like to propose a new ASYNC database privilege that would control whether a role can use the LISTEN, NOTIFY and UNLISTEN statements and the associated pg_notify function, i.e.:

GRANT ASYNC ON DATABASE TO bob;
REVOKE ASYNC ON DATABASE FROM bob;

SECURITY DEFINER functions could then be used anywhere that finer-grained access control was required, as in the sketch below.

I had a quick play to see what might be involved [attached], and would like to hear people's thoughts: good idea, bad idea, not like that! etc.

Chris.

async_privileges_r0.patch
Description: Binary data
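For concreteness, here is a rough, hypothetical sketch (not part of the attached patch) of how the proposed database-level privilege and a SECURITY DEFINER wrapper could combine to give per-channel control. The database name mydb, the channel name orders, and the function notify_orders are invented for the example, and the REVOKE ASYNC line uses the proposed syntax, which exists in no released server:

REVOKE ASYNC ON DATABASE mydb FROM bob;  -- proposed syntax

CREATE FUNCTION notify_orders(payload text) RETURNS void AS $$
BEGIN
    -- Runs with the privileges of the function owner, so bob can post
    -- to this one channel without holding the ASYNC privilege himself.
    PERFORM pg_notify('orders', payload);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE EXECUTE ON FUNCTION notify_orders(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION notify_orders(text) TO bob;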
Re: [HACKERS] psql \watch versus \timing
Jeff Janes writes:
> I'd like to run the same query repeatedly and see how long it takes each
> time. I thought \watch would be excellent for this, but it turns out that
> using \watch suppresses the output of \timing.
> Is this intentional, or unavoidable?

\watch uses PSQLexec, not SendQuery; the latter implements \timing, which I agree is arguably useful here, but also the autocommit/auto-savepoint behavior, which probably isn't a good idea. It might be a good idea to refactor those two routines into one routine with some sort of bitmap flags argument to control the various add-on behaviors, but that seems like not 9.3 material anymore.

> Also, is it just me or does the inability to watch more frequently than
> once a second make it a lot less useful than it could be?

It did not seem that exciting to me. In particular, we've already found out that \watch with zero delay is a pretty bad idea, so you'd have to make a case for what smaller minimum to use if it's not to be 1 second.

regards, tom lane
[HACKERS] psql \watch versus \timing
I'd like to run the same query repeatedly and see how long it takes each time. I thought \watch would be excellent for this, but it turns out that using \watch suppresses the output of \timing. Is this intentional, or unavoidable?

Also, is it just me or does the inability to watch more frequently than once a second make it a lot less useful than it could be?

Cheers,

Jeff
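For reference, a minimal psql session showing the interaction described above; the query is an arbitrary stand-in:

\timing on
SELECT count(*) FROM pg_class;   -- the result is followed by a "Time: ..." line
\watch 1                         -- re-runs the query every second, but no
                                 -- "Time: ..." lines are printed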
Re: [HACKERS] Parallel Sort
On Wed, May 15, 2013 at 11:11 AM, Noah Misch wrote:
> On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote:
> > The concept of clause parallelism for backend workers is close to the
> > concept of clause shippability introduced in Postgres-XC. In the case of
> > XC, the equivalent of the master backend is a backend located on a node
> > called Coordinator that merges and organizes results fetched in parallel
> > from remote nodes where data scans occur (on nodes called Datanodes). The
> > backends used for tuple scans across Datanodes share the same data
> > visibility as they use the same snapshot and transaction ID as the backend
> > on Coordinator. This is different from the parallelism here, as there is
> > no idea of snapshot import to worker backends.
>
> Worker backends would indeed share snapshot and XID.
>
> > However, the code in the XC planner used for clause shippability evaluation
> > is definitely worth looking at, just considering the many similarities it
> > shares with parallelism when evaluating if a given clause can be executed
> > on a worker backend or not. It would be a waste to implement the same
> > thing twice if there is code already available.
>
> Agreed. Local parallel query is very similar to distributed query; the
> specific IPC cost multipliers differ, but that's about it. I hope we can
> benefit from XC's experience in this area.

I believe parallel execution is much easier to do if the data is partitioned. Of course it is possible to make only the sort operation parallel, but then the question would be how to split and pass each tuple to workers. XC and Greenplum use the notion of a hash-distributed table, which enables the parallel sort (XC doesn't perform parallel sort on replicated tables, I guess). For postgres, I don't think a hash-distributed table is a foreseeable option, but MergeAppend over inheritance is a good choice to run in parallel; a sketch of that shape follows below. You won't even need to modify many lines of the sort execution code if you correctly dispatch the work, as it's just a matter of splitting and assigning the subnodes of the query plan to workers. Transactions and locks will be tricky though, and we might end up introducing a small set of snapshot-sharing infrastructure for the former and a notion of session id rather than process id for the latter. I don't think SnapshotNow is the problem, as the executor is reading catalogs with it today anyway.

Thanks,
Hitoshi

--
Hitoshi Harada
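As an illustration of the MergeAppend-over-inheritance shape referred to above (a hedged sketch; the table and column names are invented), an ordered scan over an inheritance tree already decomposes into per-child sorted scans, and each child scan is a natural unit to hand to a worker:

CREATE TABLE events (id int, ts timestamptz);
CREATE TABLE events_q1 (CHECK (ts <  '2013-04-01')) INHERITS (events);
CREATE TABLE events_q2 (CHECK (ts >= '2013-04-01')) INHERITS (events);
CREATE INDEX ON events_q1 (ts);
CREATE INDEX ON events_q2 (ts);

-- With enough data and fresh statistics, an ordered scan of the parent
-- should produce a MergeAppend node whose per-child subplans could each
-- be dispatched to a separate worker backend:
EXPLAIN SELECT * FROM events ORDER BY ts;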
Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node
On 25 March 2013 19:14, Heikki Linnakangas wrote:
> On 15.03.2013 04:25, Michael Paquier wrote:
>> Hi,
>>
>> When trying to *promote* a slave as master by removing recovery.conf and
>> restarting the node, I found an assertion failure on the master branch:
>>
>> LOG: database system was shut down in recovery at 2013-03-15 10:22:27 JST
>> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
>> "xlog.c", Line: 4954)
>>
>> (gdb) bt
>> #0 0x7f95af03b2c5 in raise () from /usr/lib/libc.so.6
>> #1 0x7f95af03c748 in abort () from /usr/lib/libc.so.6
>> #2 0x0086ce71 in ExceptionalCondition (conditionName=0x8f2af0
>>    "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
>>    "FailedAssertion", fileName=0x8f076b "xlog.c",
>>    lineNumber=4954) at assert.c:54
>> #3 0x004fe499 in StartupXLOG () at xlog.c:4954
>> #4 0x006f9d34 in StartupProcessMain () at startup.c:224
>> #5 0x0050ef92 in AuxiliaryProcessMain (argc=2,
>>    argv=0x7fffa6fc3d20) at bootstrap.c:423
>> #6 0x006f8816 in StartChildProcess (type=StartupProcess) at
>>    postmaster.c:4956
>> #7 0x006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
>>    postmaster.c:1237
>> #8 0x0065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>>
>> OK, this is not the cleanest way to promote a node, as it doesn't do any
>> safety checks related to promotion, but 9.2 and previous versions allowed
>> you to do that properly.
>>
>> The assertion was introduced by commit 3f0ab05 in order to record
>> minRecoveryPointTLI properly in the control file at the end of recovery
>> in the case of a crash.
>> However, in the case of a slave node properly shut down in recovery which
>> is then restarted as a master, the code path of this assertion is taken.
>> What do you think of the attached patch? It avoids the update of
>> recoveryTargetTLI and recoveryTargetIsLatest if the node has been shut
>> down while in recovery.
>> Another possibility could be to add to the assertion some conditions based
>> on the state of controlFile, but I think it is more consistent simply not
>> to update those fields.
>
> Simon, can you comment on this? ISTM we could just remove the assertion and
> update the comment to mention that this can happen. If there is a min
> recovery point, surely we always need to recover to the timeline containing
> that point, so setting recoveryTargetTLI to minRecoveryPointTLI seems
> sensible.

Fixed using the latest TLI available and removing the assertion.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fast promotion failure
On 7 May 2013 10:57, Heikki Linnakangas wrote:
> While testing the bug from the "Assertion failure at standby promotion"
> thread, I bumped into a different bug in fast promotion. When the first
> checkpoint after fast promotion is performed, there is no guarantee that
> the checkpointer process is running with the correct, new, ThisTimeLineID.
> In CreateCheckPoint(), we have this:
>
>>     /*
>>      * An end-of-recovery checkpoint is created before anyone is allowed to
>>      * write WAL. To allow us to write the checkpoint record, temporarily
>>      * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is
>>      * initialized, which we need here and in AdvanceXLInsertBuffer.)
>>      */
>>     if (flags & CHECKPOINT_END_OF_RECOVERY)
>>         LocalSetXLogInsertAllowed();
>
> That ensures that ThisTimeLineID is updated when performing an
> end-of-recovery checkpoint, but it doesn't get executed with fast
> promotion. The consequence is that the checkpoint is created with the old
> timeline, and subsequent recovery from it will fail.

LocalSetXLogInsertAllowed() is called by CreateEndOfRecoveryRecord(), but in fast promotion this is called by the startup process, not the checkpointer process. So there is no longer an explicit call to InitXLOGAccess() in the checkpointer process via a checkpoint with the CHECKPOINT_END_OF_RECOVERY flag.

However, there is a call to RecoveryInProgress() at the top of the main loop of the checkpointer, which does explicitly state that it "initializes TimeLineID if it's not set yet." The checkpointer makes the decision about whether to run a restartpoint or a checkpoint directly from the answer to that, modified only in the case of a CHECKPOINT_END_OF_RECOVERY. So the appropriate call is made, and I don't agree with the statement that this "doesn't get executed with fast promotion", or with the title of the thread.

So while I believe that the checkpointer might have an incorrect TLI and that you've seen a bug, what isn't clear is that the checkpointer is the only process that would see an incorrect TLI, or why such processes see an incorrect TLI. It seems equally likely at this point that the TLI is being set incorrectly somehow, and that is why it is being read incorrectly.

I see that the comment in InitXLOGAccess() ("ThisTimeLineID doesn't change so we need no lock to copy it") is incorrect, which seems worth correcting, since there's no need to save time in a once-only function.

Continuing to investigate further.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] fast promotion and log_checkpoints
On 1 May 2013 10:05, Fujii Masao wrote:
> In HEAD, when the standby is promoted, recovery requests the checkpoint
> but doesn't wait for its completion. I found the checkpoint starting log
> message of this checkpoint looks odd, as follows:
>
> LOG: checkpoint starting:
>
> I think something like the following is better:
>
> LOG: checkpoint starting: end-of-recovery
>
> In 9.2 or before, the "end-of-recovery" part is logged. Even if we changed
> the behavior of the end-of-recovery checkpoint, I think that it's more
> intuitive to label it as "end-of-recovery". Thoughts?

The checkpoint isn't an "end-of-recovery" checkpoint; it's just the first checkpoint after the end of recovery. I don't think it should say "end-of-recovery".

The problem is that we've now changed the code to trigger a checkpoint in a place that wasn't part of the original design, so the checkpoint called at that point isn't supplied with a reason and so has nothing to print.

It would be possible to redesign this with a special new reason, or we could just use "time" as the reason, or we could just leave it. Doing nothing is easy, though so are the others, so we can choose anything we want. What do we want it to say?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Block write statistics WIP
I have some time now for working on the mystery of why there are no block write statistics in the database. I hacked out the statistics collection and reporting side already; a rough patch is attached if you want to see the boring parts. I guessed that there had to be a gotcha behind why this wasn't done before now, and I've found one so far.

All of the read statistics are collected with a macro that expects to know a Relation number. Callers of ReadBuffer pass one. On the write side, the two places that increment the existing write counters (pg_stat_statements, vacuum) are MarkBufferDirty and MarkBufferDirtyHint. Neither of those is given a Relation though. Inspecting the Buffer passed can only find the buffer tag's RelFileNode. I've thought of two paths to get a block write count out of that so far:

-Provide a function to find the Relation from the RelFileNode. There is a warning about the perils of assuming you can map that way from a buftag value in buf_internals.h though:

"Note: the BufferTag data must be sufficient to determine where to write the block, without reference to pg_class or pg_tablespace entries. It's possible that the backend flushing the buffer doesn't even believe the relation is visible yet (its xact may have started before the xact that created the rel). The storage manager must be able to cope anyway."

-Modify every caller of MarkDirty* to include a relation when that information is available. There are about 200 callers of those functions around, so that won't be fun. I noted that many of them are in the XLog replay functions, which won't have the relation--but those aren't important to count anyway.

Neither of these options feels very good to me, so selecting between the two feels like picking the lesser of two evils. I'm fine with chugging through all of the callers to modify MarkDirty*, but I didn't want to do that only to have the whole approach rejected as wrong. That's why I stopped here to get some feedback.

The way that MarkDirty requires this specific underlying storage manager behavior to work properly strikes me as a bit of a layering violation too. I'd like the read and write paths to have a similar API, but here they don't even operate on the same type of inputs. Addressing that is probably harder than just throwing a hack on the existing code though.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a03bfa6..1773d59 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -467,15 +467,19 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
+            pg_stat_get_blocks_dirtied(C.oid) AS heap_blks_dirtied,
             sum(pg_stat_get_blocks_fetched(I.indexrelid) -
                     pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_read,
             sum(pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_hit,
+            sum(pg_stat_get_blocks_dirtied(I.indexrelid))::bigint AS idx_blks_dirtied,
             pg_stat_get_blocks_fetched(T.oid) -
                     pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
+            pg_stat_get_blocks_dirtied(T.oid) AS toast_blks_dirtied,
             pg_stat_get_blocks_fetched(X.oid) -
                     pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
             pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+            pg_stat_get_blocks_dirtied(X.oid) AS tidx_blks_dirtied
     FROM pg_class C LEFT JOIN
             pg_index I ON C.oid = I.indrelid LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
@@ -530,6 +534,7 @@ CREATE VIEW pg_statio_all_indexes AS
             pg_stat_get_blocks_fetched(I.oid) -
                     pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
             pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+            pg_stat_get_blocks_dirtied(I.oid) AS idx_blks_dirtied
     FROM pg_class C JOIN
             pg_index X ON C.oid = X.indrelid JOIN
             pg_class I ON I.oid = X.indexrelid
@@ -554,6 +559,7 @@ CREATE VIEW pg_statio_all_sequences AS
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS blks_read,
             pg_stat_get_blocks_hit(C.oid) AS blks_hit
+            pg_stat_get_blocks_dirtied(C.oid) AS blks_dirtied
     FROM pg_class C LEFT JOIN
             pg_namespace N ON (N.oid = C.relnamespace)
     WHERE C.relkind = 'S';
@@ -622,6 +628,7 @@ CREATE VIEW pg_stat_database AS
             pg_stat_get_db_blocks_fetched(D.oid) -
                     pg_stat_get_db_blocks_hit(D.oid) AS blks_re
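Assuming the patch above were applied, the new dirtied counters would be visible through ordinary statistics queries; for example (a hypothetical usage sketch, not something shipped with the patch):

SELECT relname, heap_blks_dirtied, idx_blks_dirtied
  FROM pg_statio_all_tables
 ORDER BY heap_blks_dirtied DESC
 LIMIT 10;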