Re: [HACKERS] ASYNC Privileges proposal

2013-05-19 Thread Chris Farmiloe
In fairness, NOTIFY has only had a payload since 9.0, and the issue of
trust is mainly tied to data leaking from the payload, so I suspect I
won't be the last person to request this as people re-visit NOTIFY :)
...but I totally get your point, of course.

My first thought was also that having control at the channel level would be
ideal, but by the looks of things that would be a huge implementation
change, would certainly affect performance a great deal more, and would not
give much benefit beyond what can be attained with database-level
control plus a "SECURITY DEFINER" function.


Chris


On 20 May 2013 03:23, Tom Lane  wrote:

> Chris Farmiloe  writes:
> > I find the current LISTEN / NOTIFY rather limited in the context of
> > databases with multiple roles. As it stands it is not possible to
> restrict
> > the use of LISTEN or NOTIFY to specific roles, and therefore
> notifications
> > (and their payloads) cannot really be trusted as coming from any
> particular
> > source.
>
> TBH, nobody has complained about this in the fifteen-plus years that
> LISTEN has been around.  I'm dubious about adding privilege-checking
> overhead for everybody to satisfy a complaint from one person.
>
> > I'd like to propose a new ASYNC database privilege that would control
> > whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the
> > associated pg_notify function.
>
> ... and if I did think that there were an issue here, I doubt I'd think
> that a privilege as coarse-grained as that would fix it.  Surely you'd
> want per-channel privileges if you were feeling paranoid about this,
> not to mention separate read and write privileges.  But the demand for
> that just isn't out there.
>
> regards, tom lane
>


Re: [HACKERS] ASYNC Privileges proposal

2013-05-19 Thread Tom Lane
Chris Farmiloe  writes:
> I find the current LISTEN / NOTIFY rather limited in the context of
> databases with multiple roles. As it stands it is not possible to restrict
> the use of LISTEN or NOTIFY to specific roles, and therefore notifications
> (and their payloads) cannot really be trusted as coming from any particular
> source.

TBH, nobody has complained about this in the fifteen-plus years that
LISTEN has been around.  I'm dubious about adding privilege-checking
overhead for everybody to satisfy a complaint from one person.

> I'd like to propose a new ASYNC database privilege that would control
> whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the
> associated pg_notify function.

... and if I did think that there were an issue here, I doubt I'd think
that a privilege as coarse-grained as that would fix it.  Surely you'd
want per-channel privileges if you were feeling paranoid about this,
not to mention separate read and write privileges.  But the demand for
that just isn't out there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ASYNC Privileges proposal

2013-05-19 Thread Chris Farmiloe
Hey all,

I find the current LISTEN / NOTIFY rather limited in the context of
databases with multiple roles. As it stands it is not possible to restrict
the use of LISTEN or NOTIFY to specific roles, and therefore notifications
(and their payloads) cannot really be trusted as coming from any particular
source.

If the payloads of notifications could be trusted, then applications could
make better use of them, without fear of leaking any sensitive information
to anyone who shouldn't be able to see it.

I'd like to propose a new ASYNC database privilege that would control
whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the
associated pg_notify function.

ie:
GRANT ASYNC ON DATABASE  TO bob;
REVOKE ASYNC ON DATABASE  FROM bob;

SECURITY DEFINER functions could then be used anywhere that a finer grained
access control was required.
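As a sketch of that idea (the function and channel names here are
hypothetical, and this assumes the proposed ASYNC privilege has been
revoked from ordinary roles):

```sql
-- Hypothetical sketch: with ASYNC revoked from ordinary roles, only this
-- function (running with its owner's rights) can send on the channel.
CREATE FUNCTION notify_orders(payload text) RETURNS void AS $$
BEGIN
    -- the channel name is fixed here, so callers cannot spoof other channels
    PERFORM pg_notify('orders', payload);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON FUNCTION notify_orders(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION notify_orders(text) TO bob;
```

Listeners would still need the database-level privilege to LISTEN, so the
fine-grained part applies only to the sending side in this sketch.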

I had a quick play to see what might be involved [attached], and would like
to hear people's thoughts: good idea, bad idea, not like that! etc.

Chris.


async_privileges_r0.patch
Description: Binary data



Re: [HACKERS] psql \watch versus \timing

2013-05-19 Thread Tom Lane
Jeff Janes  writes:
> I'd like to run the same query repeatedly and see how long it takes each time.
> I thought \watch would be excellent for this, but it turns out that using
> \watch suppresses the output of \timing.

> Is this intentional, or unavoidable?

\watch uses PSQLexec not SendQuery; the latter implements \timing which
I agree is arguably useful here, but also autocommit/auto-savepoint
behavior which probably isn't a good idea.

It might be a good idea to refactor those two routines into one routine
with some sort of bitmap flags argument to control the various add-on
behaviors, but that seems like not 9.3 material anymore.

> Also, is it just me, or does the inability to watch more frequently than
> once a second make it a lot less useful than it could be?

It did not seem that exciting to me.  In particular, we've already found
out that \watch with zero delay is a pretty bad idea, so you'd have to
make a case for what smaller minimum to use if it's not to be 1 second.

regards, tom lane




[HACKERS] psql \watch versus \timing

2013-05-19 Thread Jeff Janes
I'd like to run the same query repeatedly and see how long it takes each time.

I thought \watch would be excellent for this, but it turns out that using
\watch suppresses the output of \timing.

Is this intentional, or unavoidable?

Also, is it just me, or does the inability to watch more frequently than
once a second make it a lot less useful than it could be?
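One workaround sketch (not something proposed in this thread): have the
query report its own elapsed time, so \watch displays it even though
\timing output is suppressed:

```sql
-- Approximate per-run timing from inside the query itself; the delta is
-- taken when the select-list expression is evaluated, so it excludes
-- parse/plan and client round-trip time.
SELECT count(*) AS rows,
       clock_timestamp() - statement_timestamp() AS elapsed
FROM pg_class;
-- then in psql:  \watch 1
```

This understates the wall-clock time \timing would show, but it is enough
to compare successive runs of the same query.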


Cheers,

Jeff


Re: [HACKERS] Parallel Sort

2013-05-19 Thread Hitoshi Harada
On Wed, May 15, 2013 at 11:11 AM, Noah Misch  wrote:

> On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote:
> > The concept of clause parallelism for backend worker is close to the
> > concept of clause shippability introduced in Postgres-XC. In the case of
> > XC, the equivalent of the master backend is a backend located on a node
> > called Coordinator that merges and organizes results fetched in parallel
> > from remote nodes where data scans occur (on nodes called Datanodes). The
> > backends used for tuple scans across Datanodes share the same data
> > visibility as they use the same snapshot and transaction ID as the
> > backend on Coordinator. This is different from the proposed parallelism,
> > as there is no notion of importing a snapshot into worker backends.
>
> Worker backends would indeed share snapshot and XID.
>
> > However, the code in the XC planner used for clause shippability
> > evaluation is definitely worth looking at, considering the many
> > similarities it shares with parallelism when evaluating whether a given
> > clause can be executed on a worker backend or not. It would be a waste
> > to implement the same thing twice if there is code already available.
>
> Agreed.  Local parallel query is very similar to distributed query; the
> specific IPC cost multipliers differ, but that's about it.  I hope we can
> benefit from XC's experience in this area.
>
>
I believe parallel execution is much easier if the data is partitioned.
Of course it is possible to make only the sort operation parallel, but
then the question would be how to split and pass each tuple to workers.
XC and Greenplum use the notion of a hash-distributed table, which
enables the parallel sort (XC doesn't perform parallel sort on
replicated tables, I guess).  For postgres, I don't think a
hash-distributed table is a foreseeable option, but MergeAppend over
inheritance is a good candidate to run in parallel.  You wouldn't even
need to modify many lines of the sort execution code if you dispatch the
work correctly, as it's just a matter of splitting the query plan and
assigning its subnodes to workers.  Transactions and locks will be
tricky though, and we might end up introducing a small set of
snapshot-sharing infrastructure for the former, and the notion of a
session id rather than a process id for the latter.  I don't think
SnapshotNow is a problem, as the executor already reads catalogs with it
today.
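The MergeAppend-over-inheritance shape described above can be seen on an
unmodified server (table names here are made up for illustration); each
per-child sorted scan under the MergeAppend is the unit one could hand to
a worker:

```sql
-- Inheritance children with per-child indexes; given enough data, the
-- planner can satisfy ORDER BY with a MergeAppend over ordered child
-- scans instead of one big Sort over an Append.
CREATE TABLE measurements (id int, val numeric);
CREATE TABLE measurements_a () INHERITS (measurements);
CREATE TABLE measurements_b () INHERITS (measurements);
CREATE INDEX ON measurements_a (id);
CREATE INDEX ON measurements_b (id);

EXPLAIN SELECT * FROM measurements ORDER BY id;
```

Whether MergeAppend is actually chosen depends on table sizes and stats,
but when it is, the child scans below it are independent, which is what
makes this plan shape attractive for parallelism.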

Thanks,
Hitoshi


-- 
Hitoshi Harada


Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-05-19 Thread Simon Riggs
On 25 March 2013 19:14, Heikki Linnakangas  wrote:
> On 15.03.2013 04:25, Michael Paquier wrote:
>>
>> Hi,
>>
>> When trying to *promote* a slave as master by removing recovery.conf and
>> restarting node, I found an assertion failure on master branch:
>> LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
>> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
>> "xlog.c", Line: 4954)
>> (gdb) bt
>> #0  0x7f95af03b2c5 in raise () from /usr/lib/libc.so.6
>> #1  0x7f95af03c748 in abort () from /usr/lib/libc.so.6
>> #2  0x0086ce71 in ExceptionalCondition (conditionName=0x8f2af0
>> "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
>> "FailedAssertion", fileName=0x8f076b "xlog.c",
>>  lineNumber=4954) at assert.c:54
>> #3  0x004fe499 in StartupXLOG () at xlog.c:4954
>> #4  0x006f9d34 in StartupProcessMain () at startup.c:224
>> #5  0x0050ef92 in AuxiliaryProcessMain (argc=2,
>> argv=0x7fffa6fc3d20) at bootstrap.c:423
>> #6  0x006f8816 in StartChildProcess (type=StartupProcess) at
>> postmaster.c:4956
>> #7  0x006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
>> postmaster.c:1237
>> #8  0x0065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>> Ok, this is not the cleanest way to promote a node, as it doesn't do any
>> safety checks at promotion, but 9.2 and previous versions allowed doing
>> this properly.
>>
>> The assertion has been introduced by commit 3f0ab05 in order to record
>> properly minRecoveryPointTLI in control file at the end of recovery in the
>> case of a crash.
>> However, in the case of a slave node properly shutdown in recovery which
>> is
>> then restarted as a master, the code path of this assertion is taken.
>> What do you think of the patch attached? It avoids the update of
>> recoveryTargetTLI and recoveryTargetIsLatest if the node has been shutdown
>> while in recovery.
>> Another possibility could be to add in the assertion some conditions based
>> on the state of controlFile but I think it is more consistent simply not
>> to
>> update those fields.
>
>
> Simon, can you comment on this? ISTM we could just remove the assertion and
> update the comment to mention that this can happen. If there is a min
> recovery point, surely we always need to recover to the timeline containing
> that point, so setting recoveryTargetTLI to minRecoveryPointTLI seems
> sensible.

Fixed using the latest TLI available and removing the assertion.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast promotion failure

2013-05-19 Thread Simon Riggs
On 7 May 2013 10:57, Heikki Linnakangas  wrote:
> While testing the bug from the "Assertion failure at standby promotion", I
> bumped into a different bug in fast promotion. When the first checkpoint
> after fast promotion is performed, there is no guarantee that the
> checkpointer process is running with the correct, new, ThisTimeLineID. In
> CreateCheckPoint(), we have this:
>
>> /*
>>  * An end-of-recovery checkpoint is created before anyone is
>> allowed to
>>  * write WAL. To allow us to write the checkpoint record,
>> temporarily
>>  * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
>>  * initialized, which we need here and in AdvanceXLInsertBuffer.)
>>  */
>> if (flags & CHECKPOINT_END_OF_RECOVERY)
>> LocalSetXLogInsertAllowed();
>
>
> That ensures that ThisTimeLineID is updated when performing an
> end-of-recovery checkpoint, but it doesn't get executed with fast promotion.
> The consequence is that the checkpoint is created with the old timeline, and
> subsequent recovery from it will fail.

LocalSetXLogInsertAllowed() is called by CreateEndOfRecoveryRecord(),
but in fast promotion this is called by Startup process, not
Checkpointer process. So there is no longer an explicit call to
InitXLOGAccess() in the checkpointer process via a checkpoint with
flag CHECKPOINT_END_OF_RECOVERY.

However, there is a call to RecoveryInProgress() at the top of the
main loop of the checkpointer, which does explicitly state that it
"initializes TimeLineID if it's not set yet." The checkpointer makes
the decision about whether to run a restartpoint or a checkpoint
directly from the answer to that, modified only in the case of a
CHECKPOINT_END_OF_RECOVERY.

So the appropriate call is made and I don't agree with the statement
that this "doesn't get executed with fast promotion", or the title of
the thread.

So while I believe that the checkpointer might have an incorrect TLI
and that you've seen a bug, what isn't clear is that the checkpointer
is the only process that would see an incorrect TLI, or why such
processes see an incorrect TLI. It seems equally likely at this point
that the TLI may be set incorrectly somehow and that is why it is
being read incorrectly.

I see that the comment in InitXLOGAccess() is incorrect:
"ThisTimeLineID doesn't change so we need no lock to copy it". That
seems worth correcting, since there's no need to save time in a
once-only function.

Continuing to investigate further.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] fast promotion and log_checkpoints

2013-05-19 Thread Simon Riggs
On 1 May 2013 10:05, Fujii Masao  wrote:

> In HEAD, when the standby is promoted, recovery requests the checkpoint
> but doesn't wait for its completion. I found the checkpoint starting log 
> message
> of this checkpoint looks odd as follows:
>
> LOG:  checkpoint starting:
>
> I think something like the following is better.
>
> LOG:  checkpoint starting: end-of-recovery
>
> In 9.2 or before, "end-of-recovery" part is logged. Even if we changed the
> behavior of end-of-recovery checkpoint, I think that it's more intuitive to
> label it as "end-of-recovery". Thought?

The checkpoint isn't an "end-of-recovery" checkpoint; it's just the
first checkpoint after the end of recovery. I don't think it should
say "end-of-recovery".

The problem is that we've now changed the code to trigger a checkpoint
in a place that wasn't part of the original design, so the checkpoint
called at that point isn't supplied with a reason and so has nothing
to print.

It would be possible to redesign this with a special new reason, or we
could just use "time" as the reason, or we could just leave it.

Doing nothing is easy, though so are the other options, so we can choose
whichever we want. What do we want it to say?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Block write statistics WIP

2013-05-19 Thread Greg Smith
I have some time now to work on the mystery of why there are no
block write statistics in the database.  I hacked out the statistics
collection and reporting side already; a rough patch is attached if you
want to see the boring parts.


I guessed that there had to be a gotcha behind why this wasn't done 
before now, and I've found one so far.  All of the read statistics are 
collected with a macro that expects to know a Relation number.  Callers 
to ReadBuffer pass one.  On the write side, the two places that 
increment the existing write counters (pg_stat_statements, vacuum) are 
MarkBufferDirty and MarkBufferDirtyHint.  Neither of those is given a 
Relation though.  Inspecting the Buffer passed can only find the buffer 
tag's RelFileNode.


I've thought of two paths to get a block write count out of that so far:

-Provide a function to find the Relation from the RelFileNode.  There is 
a warning about the perils of assuming you can map that way from a 
buftag value in buf_internals.h though:


"Note: the BufferTag data must be sufficient to determine where to write 
the block, without reference to pg_class or pg_tablespace entries.  It's 
possible that the backend flushing the buffer doesn't even believe the 
relation is visible yet (its xact may have started before the xact that 
created the rel).  The storage manager must be able to cope anyway."


-Modify every caller of MarkDirty* to include a relation when that 
information is available.  There are about 200 callers of those 
functions around, so that won't be fun.  I noted that many of them are 
in the XLog replay functions, which won't have the relation--but those 
aren't important to count anyway.


Neither of these options feels very good to me, so selecting between the 
two feels like picking the lesser of two evils.  I'm fine with chugging 
through all of the callers to modify MarkDirty*, but I didn't want to do 
that only to have the whole approach rejected as wrong.  That's why I 
stopped here to get some feedback.


The way that MarkDirty requires this specific underlying storage manager 
behavior to work properly strikes me as a bit of a layering violation 
too.  I'd like the read and write paths to have a similar API, but here 
they don't even operate on the same type of inputs.  Addressing that is 
probably harder than just throwing a hack on the existing code though.
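Assuming the columns land as named in the attached rough patch (they are
not available in any stock server), usage would look something like:

```sql
-- Which tables are dirtying the most blocks?  The *_blks_dirtied columns
-- below are the ones proposed by the attached WIP patch; everything else
-- is the existing pg_statio_all_tables view.
SELECT schemaname, relname,
       heap_blks_dirtied, idx_blks_dirtied
FROM pg_statio_all_tables
ORDER BY heap_blks_dirtied DESC
LIMIT 10;
```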


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a03bfa6..1773d59 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -467,15 +467,19 @@ CREATE VIEW pg_statio_all_tables AS
 pg_stat_get_blocks_fetched(C.oid) -
 pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
 pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
+pg_stat_get_blocks_dirtied(C.oid) AS heap_blks_dirtied,
 sum(pg_stat_get_blocks_fetched(I.indexrelid) -
 pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_read,
 sum(pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_hit,
+sum(pg_stat_get_blocks_dirtied(I.indexrelid))::bigint AS idx_blks_dirtied,
 pg_stat_get_blocks_fetched(T.oid) -
 pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
 pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
+pg_stat_get_blocks_dirtied(T.oid) AS toast_blks_dirtied,
 pg_stat_get_blocks_fetched(X.oid) -
 pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit,
+pg_stat_get_blocks_dirtied(X.oid) AS tidx_blks_dirtied
 FROM pg_class C LEFT JOIN
 pg_index I ON C.oid = I.indrelid LEFT JOIN
 pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
@@ -530,6 +534,7 @@ CREATE VIEW pg_statio_all_indexes AS
 pg_stat_get_blocks_fetched(I.oid) -
 pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
-pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit,
+pg_stat_get_blocks_dirtied(I.oid) AS idx_blks_dirtied
 FROM pg_class C JOIN
 pg_index X ON C.oid = X.indrelid JOIN
 pg_class I ON I.oid = X.indexrelid
@@ -554,6 +559,7 @@ CREATE VIEW pg_statio_all_sequences AS
 pg_stat_get_blocks_fetched(C.oid) -
 pg_stat_get_blocks_hit(C.oid) AS blks_read,
-pg_stat_get_blocks_hit(C.oid) AS blks_hit
+pg_stat_get_blocks_hit(C.oid) AS blks_hit,
+pg_stat_get_blocks_dirtied(C.oid) AS blks_dirtied
 FROM pg_class C
 LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
 WHERE C.relkind = 'S';
@@ -622,6 +628,7 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_blocks_fetched(D.oid) -
 pg_stat_get_db_blocks_hit(D.oid) AS blks_re