Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread KaiGai Kohei
(2010/05/21 1:14), Robert Haas wrote:
 In yesterday's development meeting, we talked about the possibility of
 a basic SE-PostgreSQL implementation that checks permissions only for
 DML.  Greg Smith offered the opinion that this could provide much of
 the benefit of SE-PostgreSQL for many users, while being much simpler.
   In fact, SE-PostgreSQL would need to get control in just one place:
 ExecCheckRTPerms.  This morning, Stephen Frost and I worked up a quick
 patch showing how we could add a hook here to let a hypothetical
 SE-PostgreSQL module get control in the relevant place.  The attached
 patch also includes a toy contrib module showing how it could be used
 to enforce arbitrary security policy.
 
 I don't think that this by itself would be quite enough framework for
 a minimal SE-PostgreSQL implementation - for that, you'd probably need
 an object-labeling facility in core which SE-PostgreSQL could leverage
 - or else some other way to determine the label associated with
 a given object - but I think that plus this would be enough.

I'd like to point out two more places that need to be considered
for DML permission checks, in addition to ExecCheckRTPerms().

* DoCopy()

Although DoCopy() is called from standard_ProcessUtility(), it behaves
as a DML statement rather than DDL. It checks ACL_SELECT or ACL_INSERT on
the copied table or attributes, similar to what ExecCheckRTEPerms() does.

* RI_Initial_Check()

RI_Initial_Check() is a function called by the ALTER TABLE command when adding
an FK constraint between two relations. The permission to execute the ALTER
TABLE command itself is checked in ATPrepCmd() and ATAddForeignKeyConstraint(),
so it is not affected by the DML permission reworks.

When we add a new FK constraint, both the existing FK and PK relations have
to satisfy the new constraint. So, RI_Initial_Check() checks whether the
PK relation has tuples corresponding to those in the FK relation.
It tries to execute a secondary query using the SPI_*() functions if no
access violations are expected. Otherwise, it scans the FK relation
sequentially with per-tuple checks (see validateForeignKeyConstraint()),
which is slow.

If we have an external security provider which denies access to the FK/PK
relations where the default PG checks would allow it, RI_Initial_Check() tries
to execute the secondary SELECT statement and raises an access violation error,
even though we were already allowed to execute the ALTER TABLE statement.

Therefore, we also need to check DML permissions in RI_Initial_Check(), prior
to the secondary query, to avoid an unexpected access violation error.
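
The decision described above can be sketched as a toy model in plain C (the types, names and ACL bit are invented for illustration; this is not PostgreSQL code): run the fast set-oriented query only when the checker grants the needed permissions, otherwise fall back to the slow per-tuple scan.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for PostgreSQL's ACL bits and relations */
#define ACL_SELECT (1 << 0)

typedef struct { unsigned granted_perms; } ToyRelation;

/* True when every required permission bit has been granted */
static bool toy_check_dml(const ToyRelation *rel, unsigned required)
{
    return (required & ~rel->granted_perms) == 0;
}

typedef enum { FAST_SPI_QUERY, SLOW_PER_TUPLE_SCAN } FkCheckStrategy;

/*
 * Mirror of the proposed RI_Initial_Check() behavior: use the fast
 * set-oriented SPI query only when no access violation is expected on
 * either relation; otherwise fall back to the per-tuple scan done by
 * validateForeignKeyConstraint().
 */
static FkCheckStrategy choose_fk_check(const ToyRelation *fk,
                                       const ToyRelation *pk)
{
    if (toy_check_dml(fk, ACL_SELECT) && toy_check_dml(pk, ACL_SELECT))
        return FAST_SPI_QUERY;
    return SLOW_PER_TUPLE_SCAN;
}
```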

BTW, I guess the reason why permissions on attributes are not checked here is
that we missed it during v8.4 development.


The attached patch provides a common checker function for DML, and modifies
ExecCheckRTPerms(), CopyTo() and RI_Initial_Check() to call the checker
function instead of doing individual ACL checks.

Most of the checker function is cut & paste from ExecCheckRTEPerms(),
but its arguments are modified for easy invocation from other functions.

  extern bool check_dml_permissions(Oid relOid,
                                    Oid userId,
                                    AclMode requiredPerms,
                                    Bitmapset *selCols,
                                    Bitmapset *modCols,
                                    bool abort);
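
The heart of such a checker is the same mask arithmetic ExecCheckRTEPerms() uses today: mask the granted bits out of the required ones and fail if anything remains. A minimal, self-contained sketch of that logic (types simplified, names hypothetical; the real function also handles per-column bitmapsets and can ereport() instead of returning false):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int AclMode;

/* Illustrative subset of PostgreSQL's ACL bits */
#define ACL_SELECT ((AclMode) 1 << 0)
#define ACL_INSERT ((AclMode) 1 << 1)
#define ACL_UPDATE ((AclMode) 1 << 2)

/*
 * Toy model of the proposed checker: succeed only when every bit of
 * requiredPerms is present in the granted mask.
 */
static bool toy_check_dml_permissions(AclMode granted, AclMode requiredPerms)
{
    AclMode remainingPerms = requiredPerms & ~granted;

    return remainingPerms == 0;
}
```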

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 21,26 
--- 21,27 
  #include <arpa/inet.h>
  
  #include "access/heapam.h"
+ #include "access/sysattr.h"
  #include "access/xact.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
***
*** 41,46 
--- 42,48 
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
+ #include "utils/security.h"
  #include "utils/snapmgr.h"
  
  
***
*** 725,733  DoCopy(const CopyStmt *stmt, const char *queryString)
  	List	   *force_notnull = NIL;
  	bool		force_quote_all = false;
  	bool		format_specified = false;
- 	AclMode		required_access = (is_from ? ACL_INSERT : ACL_SELECT);
- 	AclMode		relPerms;
- 	AclMode		remainingPerms;
  	ListCell   *option;
  	TupleDesc	tupDesc;
  	int			num_phys_attrs;
--- 727,732 
***
*** 988,993  DoCopy(const CopyStmt *stmt, const char *queryString)
--- 987,996 
  
  	if (stmt->relation)
  	{
+ 		Bitmapset  *columnsSet = NULL;
+ 		List	   *attnums;
+ 		ListCell   *cur;
+ 
  		Assert(!stmt->query);
  		cstate->queryDesc = NULL;
  
***
*** 998,1026  DoCopy(const CopyStmt *stmt, const char *queryString)
  		tupDesc = RelationGetDescr(cstate->rel);
  
  		/* Check relation permissions. */
! 		relPerms = pg_class_aclmask(RelationGetRelid(cstate->rel), GetUserId(),
! 	required_access, ACLMASK_ALL);
! 		remainingPerms = required_access & ~relPerms;
! 		if (remainingPerms != 0)
  		{
! 			/* We don't have 

Re: [HACKERS] [v9.1] access control reworks in ALTER TABLE

2010-05-24 Thread KaiGai Kohei
In Ottawa, I had a talk with Robert Haas about these reworks.

Because some ALTER TABLE options need information which can only
be gathered at the execution stage, not the preparation stage, the
patch tried to move all the access control stuff into the execution stage.

However, he pointed out a problem: a user could acquire locks on the
tables to be altered without having any permission on them.
That can allow DoS attacks, so we agreed we should not move these
checks into the execution stage.
Fortunately, only three ALTER TABLE options need information
gathered at the execution stage (add inheritance, add FK constraints,
add index), so we can rework them using separate checker functions;
it is not a problem.

I marked it as returned with feedback.
In the 1st CF, we focus on the reworks of the DML permission checks,
so the rest of the DDL work should be done in the 2nd.

Thanks,

(2010/03/17 16:26), KaiGai Kohei wrote:
 The attached patch reworks the access control logic corresponding
 to the ALTER TABLE statement, as groundwork for the upcoming
 enhanced security features.
 
 As we discussed in the last commit fest, a preferable way is to
 wrap up each unit of access control logic into functions which
 are categorized by the actions to be checked. These functions
 will be able to serve as entrypoints for the existing permission checks
 and the upcoming security features, like SELinux.
 
 The first step of this effort is to consolidate the existing access
 control code into a unit for each operation. Once we rework them,
 it will be straightforward to replace these units with access control
 functions. ALTER TABLE is the most functional command in PostgreSQL,
 so it may be a good starter for this effort.
 
 
 In this patch, I put the access control code in the execution stage,
 after all the needed information has been gathered. Then, privileges
 are checked at once.
 
 The ALTER TABLE implementation, exceptionally, has a few stages.
 The preparation stage applies sanity and permission checks, and
 expands the given command to child relations. Then, the execution
 stage updates the system catalogs according to the given command.
 However, we don't have multiple stages in most DDL statements.
 So, even if we adopt a scheme that applies permission checks in the
 preparation stage, we cannot reuse that scheme for the others.
 
 Most AT commands check ownership of the relation using
 ATSimplePermissions() in the preparation stage, but a few
 commands now also need to check privileges in the execution stage,
 for several reasons.
 
 (1) commands with self recursion
 ATExecDropColumn(), ATAddCheckConstraint() and ATExecDropConstraint()
 implement their recursive calls by themselves. This means we cannot know
 the child relations to be altered in the preparation stage, so they
 typically contain the following code.
 
 |   /* At top level, permission check was done in ATPrepCmd, else do it */
 |   if (recursing)
 |   ATSimplePermissions(rel, false);
 
 (2) commands directly called from alter.c
 RenameRelation(), renameatt() and AlterTableNamespace() are directly
 called from alter.c, so they have no preparation stage. Instead of
 ATSimplePermissions(), CheckRelationOwnership() is called by the
 caller. Also, renameatt() checks ownership of the relation during
 its recursive calls.
 
 (3) commands that need privileges on other objects
 - AlterTableNamespace() checks ACL_CREATE on the specified namespace
in the LookupCreationNamespace().
 - ATPrepSetTableSpace() checks ACL_CREATE on the specified tablespace.
 - ATExecAddIndex() checks ACL_CREATE on the namespace and ACL_CREATE
on the specified tablespace (if exist) in DefineIndex().
 - ATExecAddInherit() checks ownership of the specified parent relation.
 - ATAddForeignKeyConstraint() checks ACL_REFERENCES on both of the
specified table/columns.
 
 (4) commands that need their own sanity checks
 ATSimplePermissions() also ensures the relation is a table (or view)
 and not a system catalog, in addition to checking its ownership.
 However, some commands need to apply exceptional sanity checks:
 ATSimplePermissionsRelationOrIndex() is used for sanity checks of the
 AT commands which are allowed on indexes, and ATPrepSetStatistics()
 skips the check that ensures the relation is not a system catalog.
 
 
 This patch breaks down ATSimplePermissions() into sanity checks
 and permission checks, and the latter part is moved to the execution
 stage:
   - ATCheckSanity()
   - ATCheckOwnership()
 
 This eliminates the (1) special-case handling in the commands with
 self recursion, and the (4) exceptional sanity checks, because
 ATCheckSanity() takes three arguments to control its behavior.
 
 It also means we no longer need to treat the (2) functions as
 exceptional, because all the commands apply their checks in the
 execution stage.
 
 For the (3) functions, it moves the ownership checks next to the
 permission checks on the other object, 

Re: [HACKERS] Specification for Trusted PLs?

2010-05-24 Thread Jan Wieck

On 5/23/2010 11:19 PM, Andrew Dunstan wrote:


Jan Wieck wrote:


ISTM we are in danger of confusing several different things. A user 
that doesn't want data to be shared should not stash it in global 
objects. But to me, trusting a language is not about making data 
private, but about not allowing the user to do things that are 
dangerous, such as referencing memory, or the file system, or the 
operating system, or network connections, or loading code which might 
do any of those things.


How is loading code which might do any of those things different 
from writing a stored procedure, that accesses data, a careless 
superuser left in a global variable? Remember, the code of a PL 
function is open source - like in everyone can select from 
pg_proc. You really don't expect anyone to scan for your global 
variables just because they can write functions in the same language?




Well, that threat arises from the unsafe actions of the careless 
superuser. And we could at least ameliorate it by providing a per role 
data stash, at very little cost, as I mentioned. It's not like we don't 
know about such threats, and I'm certainly not pretending they don't 
exist. The 9.0 PL/Perl docs say:


The %_SHARED variable and other global state within the language is
public data, available to all PL/Perl functions within a session.
Use with care, especially in situations that involve use of multiple
roles or SECURITY DEFINER functions.


But the threats I was referring to arise if the language allows them to, 
without any requirement for unsafe actions by another user. Protecting 
against those is the essence of trustedness in my mind at least.


I can agree with that.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Robert Haas
2010/5/24 KaiGai Kohei kai...@ak.jp.nec.com:
 BTW, I guess the reason why permissions on attributes are not checked here is
 that we missed it during v8.4 development.

That's a little worrying.  Can you construct and post a test case
where this results in a user-visible failure in CVS HEAD?

 The attached patch provides a common checker function for DML, and modifies
 ExecCheckRTPerms(), CopyTo() and RI_Initial_Check() to call the checker
 function instead of doing individual ACL checks.

This looks pretty sane to me, although I have not done a full review.
I am disinclined to create a whole new directory for it.   I think the
new function should go in src/backend/catalog/aclchk.c and be declared
in src/include/utils/acl.h.  If that sounds reasonable to you, please
revise and post an updated patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



[HACKERS] Synchronization levels in SR

2010-05-24 Thread Fujii Masao
Hi,

I'm now designing the synchronous replication feature based on
SR for 9.1, which is also being discussed in another thread:
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php

In the first design phase, I'd like to clarify which synch levels
should be supported in 9.1 and how they should be specified by users.

Log-shipping replication has several possible synch levels, as follows.

   The transaction commit on the master
   #1 doesn't wait for replication (already supported in 9.0)
   #2 waits for WAL to be received by the standby
   #3 waits for WAL to be received and flushed by the standby
   #4 waits for WAL to be received, flushed and replayed by
  the standby
   ...etc?

Which should we include in 9.1? I'd like to add #2 and #3.
They are enough for the high-availability use case (i.e., to
prevent a failover from losing any committed transactions).
AFAIR, MySQL semi-synchronous replication supports #2 level.

#4 is useful in some cases, but might often make the
transaction commit on the master get stuck, since a read-only
query can easily block recovery via a lock conflict. So
#4 seems not worth working on until that HS problem
has been addressed. Thoughts?

Second, we need to discuss how to specify the synch
level. There are three approaches:

* Per standby
  Since the purpose, location and H/W resources often differ
  from one standby to another, specifying the level per standby
  (i.e., setting the level in recovery.conf) is a
  straightforward approach, I think. For example, we can
  choose #3 for a high-availability standby near the master,
  and #1 (async) for a remote disaster-recovery standby.

* Per transaction
  Define a PGC_USERSET option specifying the level, and
  set it on the master according to the purpose of the
  transaction. In this approach, for example, we can choose
  #4 for a transaction which should be visible on the
  standby as soon as a successful commit has been
  returned to the client. We can also choose #1 for a
  time-critical but not mission-critical transaction.

* Mix
  Allow users to specify the level per standby and per
  transaction at the same time, and then calculate the actual
  level from them using some algorithm.
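
As one illustration of such an algorithm (purely hypothetical; the level names and the choice of taking the stricter setting are my own), the effective level could simply be the stricter of the per-standby and per-transaction values:

```c
#include <assert.h>

/* Hypothetical synch levels, ordered from weakest to strictest */
typedef enum {
    SYNC_ASYNC  = 1,  /* #1: don't wait for replication */
    SYNC_RECV   = 2,  /* #2: wait for WAL to be received */
    SYNC_FLUSH  = 3,  /* #3: received and flushed */
    SYNC_REPLAY = 4   /* #4: received, flushed and replayed */
} SyncLevel;

/* One possible "Mix" rule: the effective level is the stricter one */
static SyncLevel effective_sync_level(SyncLevel per_standby,
                                      SyncLevel per_transaction)
{
    return per_standby > per_transaction ? per_standby : per_transaction;
}
```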

Which should we adopt for 9.1? I'd like to implement the
per-standby approach first, since it's simple and seems
to cover more use cases. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 1:27 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, May 19, 2010 at 2:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Oh, right. How about allowing the postmaster only in PM_STARTUP,
 PM_RECOVERY, PM_HOT_STANDBY or PM_WAIT_READONLY state to invoke
 walreceiver? We can keep walreceiver alive until all read only
 backends have gone, and prevent unexpected startup of walreceiver.

 Yes, that seems like something we should be checking, if we aren't already.

 I'll do that.

 Here is the updated version. I added the above-mentioned check
 into the patch.

This looks pretty reasonable to me, but I guess I feel like it would
be better to drive the CancelBackup() decision off of whether we've
ever reached PM_RUN rather than consulting XLogCtl.  It just feels
cleaner to me to drive all of the postmaster decisions off of the same
signalling mechanism rather than having a separate one (that only
works because it's used very late in shutdown when we theoretically
don't need a lock) just for this one case.

I could be all wet, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Greg Sabino Mullane



 In light of the proposed purging scheme, how would it be able to distinguish 
 between those two cases (nothing there yet vs. was there but purged)?

 There is a difference between an empty result set and an exception.

No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005240928
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-24 Thread Heikki Linnakangas

On 22/05/10 16:35, Tom Lane wrote:

Josh Berkusj...@agliodbs.com  writes:

   From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:


Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak.


Me.


 Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG.


Hmm, we don't rely on setting hint bits to truncate CLOG anymore 
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php). 
It's the replacement of xids with FrozenXid that matters, the hint bits 
are really just hints.


Doesn't change the conclusion, though: you still need to replace XIDs 
with FrozenXids to truncate the clog. Conceivably we could keep around 
more than 2^32 transactions in clog with this scheme, but then you need 
a lot more space for the clog. But perhaps it would be better to do that 
than to launch anti-wraparound vacuums, or to refuse more updates in the 
extreme cases.



 So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page.  Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.


(As also discussed in the Royal Oak) I think we should simply not dirty 
a page when a hint bit is updated. Reading a page from disk is 
expensive, setting hint bits on the access is generally cheap compared 
to that. But that is orthogonal to the idea of a per-page XID epoch.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Simon Riggs
On Mon, 2010-05-24 at 09:26 -0400, Robert Haas wrote:
 On Mon, May 24, 2010 at 1:27 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, May 19, 2010 at 2:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Oh, right. How about allowing the postmaster only in PM_STARTUP,
  PM_RECOVERY, PM_HOT_STANDBY or PM_WAIT_READONLY state to invoke
  walreceiver? We can keep walreceiver alive until all read only
  backends have gone, and prevent unexpected startup of walreceiver.
 
  Yes, that seems like something we should be checking, if we aren't 
  already.
 
  I'll do that.
 
  Here is the updated version. I added the above-mentioned check
  into the patch.
 
 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl. 

That is exactly what XLogCtl tells us and why it is suggested for use.

  It just feels
 cleaner to me to drive all of the postmaster decisions off of the same
 signalling mechanism rather than having a separate one (that only
 works because it's used very late in shutdown when we theoretically
 don't need a lock) just for this one case.
 
 I could be all wet, though.
 
-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Sun, May 23, 2010 at 9:44 PM, Jan Wieck janwi...@yahoo.com wrote:
 I'm not sure the retention policies of the shared buffer cache, the WAL
 buffers, CLOG buffers and every other thing we try to cache are that easy to
 fold into one single set of logic. But I'm all ears.

I'm not sure either, although it seems like LRU ought to be good
enough for most things.  I'm more worried about things like whether
the BufferDesc abstraction is going to get in the way.

 CommitTransaction() inside of xact.c will call a function that inserts
 a new record into this array. The operation will for most of the time be
 nothing more than taking a spinlock and adding the record to shared memory.
 All the data for the record is readily available, does not require
 further locking, and can be collected locally before taking the spinlock.

 What happens when you need to switch pages?

 Then the code will have to grab another free buffer or evict one.

Hopefully not while holding a spin lock.  :-)

 The function will return the sequence number, which CommitTransaction()
 in turn will record in the WAL commit record together with the
 begin_timestamp. While both the begin and the commit timestamps
 are crucial to determine what data a particular transaction should have
 seen, the row count is not, and will not be recorded in WAL.

 It would certainly be better if we didn't have to bloat the commit xlog
 records to do this.  Is there any way to avoid that?

 If you can tell me how a crash-recovering system can figure out what the
 exact sequence number of the WAL commit record at hand should be, let's
 rip it out.

Hmm...  could we get away with WAL-logging the next sequence number
just once per checkpoint?  When you replay the checkpoint record, you
update the control file with the sequence number.  Then all the
commits up through the next checkpoint just use consecutive numbers
starting at that value.
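
A toy model of that bookkeeping (names invented; not actual xlog code): the checkpoint record carries the next commit sequence number, and replay then hands out consecutive numbers, so individual commit records need not store one.

```c
#include <assert.h>
#include <stdint.h>

/* The checkpoint record WAL-logs the next commit sequence number */
typedef struct { uint64_t next_commit_seqno; } ToyCheckpoint;

/* State rebuilt while replaying WAL after a crash */
typedef struct { uint64_t next_seqno; } ToyRecoveryState;

static void toy_replay_checkpoint(ToyRecoveryState *st,
                                  const ToyCheckpoint *ckpt)
{
    st->next_seqno = ckpt->next_commit_seqno;
}

/* Each replayed commit record consumes the next consecutive number */
static uint64_t toy_replay_commit(ToyRecoveryState *st)
{
    return st->next_seqno++;
}
```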

 It is an option. "Keep it until I tell you" is a perfectly valid
 configuration option. One you probably don't want to forget about, but
 valid none the less.

As Tom is fond of saying, if it breaks, you get to keep both pieces.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 9:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2010-05-24 at 09:26 -0400, Robert Haas wrote:
 On Mon, May 24, 2010 at 1:27 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, May 19, 2010 at 2:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Oh, right. How about allowing the postmaster only in PM_STARTUP,
  PM_RECOVERY, PM_HOT_STANDBY or PM_WAIT_READONLY state to invoke
  walreceiver? We can keep walreceiver alive until all read only
  backends have gone, and prevent unexpected startup of walreceiver.
 
  Yes, that seems like something we should be checking, if we aren't 
  already.
 
  I'll do that.
 
  Here is the updated version. I added the above-mentioned check
  into the patch.

 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl.

 That is exactly what XLogCtl tells us and why it is suggested for use.

Sure.  My only point is that the postmaster doesn't (and can't) use
that method of getting the information at any other time when it is
needed, so I don't know why we'd want to use it in just this one case.
 Maybe there's a reason, but it's not obvious to me.

  It just feels
 cleaner to me to drive all of the postmaster decisions off of the same
 signalling mechanism rather than having a separate one (that only
 works because it's used very late in shutdown when we theoretically
 don't need a lock) just for this one case.

 I could be all wet, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-24 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 (As also discussed in the Royal Oak) I think we should simply not dirty 
 a page when a hint bit is updated. Reading a page from disk is 
 expensive, setting hint bits on the access is generally cheap compared 
 to that. But that is orthogonal to the idea of a per-page XID epoch.

I'm not sure it's cheap.  What you suggest would result in a substantial
increase in clog accesses, which means (1) more I/O and (2) more
contention.  Certainly it's worth experimenting with, but it's no
guaranteed win.

regards, tom lane



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 24, 2010 at 9:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2010-05-24 at 09:26 -0400, Robert Haas wrote:
 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl.
 
 That is exactly what XLogCtl tells us and why it is suggested for use.

 Sure.  My only point is that the postmaster doesn't (and can't) use
 that method of getting the information at any other time when it is
 needed, so I don't know why we'd want to use it in just this one case.
  Maybe there's a reason, but it's not obvious to me.

I'm with Robert on this.  The postmaster is designed to be driven by an
internal state machine.  Making it rely on the contents of shared memory
is a fundamentally dangerous idea.  It might coincidentally be safe in
this one case, but I can easily imagine that property failing as a result
of subsequent changes.

The postmaster should not look at shared memory if there is any
reasonable alternative, and we clearly have a reasonable alternative.

regards, tom lane



Re: [HACKERS] Specification for Trusted PLs?

2010-05-24 Thread Greg Sabino Mullane



 Well, the best way to define what a trusted language can do is to
 define a *whitelist* of what it can do, not a blacklist of what it
 can't do. That's the only way to get a complete definition. It's then
 up to the implementation step to figure out how to represent that in
 the form of tests.

 Yes, PL/Perl is following this approach. For a whitelist see
 plperl_opmask.h (generated by plperl_opmask.pl at build phase).

Ah, okay, I can mostly agree with that. My objection was with trying 
to build a cross-language generic whitelist. But it looks like the 
ship has already sailed upthread and we've more or less got a working 
definition. David, I think you started this thread, I assume you have 
some concrete reason for asking about this (new trusted language?). 
May have been stated, but I missed it.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005241025
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Fujii Masao
On Mon, May 24, 2010 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote:
 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl.  It just feels
 cleaner to me to drive all of the postmaster decisions off of the same
 signalling mechanism rather than having a separate one (that only
 works because it's used very late in shutdown when we theoretically
 don't need a lock) just for this one case.

Okay, how about the attached patch? It uses the postmaster-local flag
ReachedEndOfRecovery (better name?) instead of the XLogCtl one.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_smart_shutdown_in_recovery_v5_fujii.patch
Description: Binary data



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 10:35 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, May 24, 2010 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote:
 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl.  It just feels
 cleaner to me to drive all of the postmaster decisions off of the same
 signalling mechanism rather than having a separate one (that only
 works because it's used very late in shutdown when we theoretically
 don't need a lock) just for this one case.

 Okay, how about the attached patch? It uses the postmaster-local flag
 ReachedEndOfRecovery (better name?) instead of XLogCtl one.

Looks good to me.  I will test and apply.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-24 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 Okay, how about the attached patch? It uses the postmaster-local flag
 ReachedEndOfRecovery (better name?) instead of the XLogCtl one.

ReachedNormalRunning, perhaps?

regards, tom lane



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.
 
This issue intersects with the serializable work I've been doing.
The above is true for databases using S2PL, but not under snapshot
isolation or the SSI implementation of serializable transactions.
In particular, the snapshot anomalies which can cause
non-serializable behavior happen precisely because the apparent
order of execution doesn't match anything so linear as the
order of commit.
 
I'll raise that receipting example again. You have transactions
which grab the current deposit data and insert it into receipts, as
payments are received. At some point in the afternoon, the deposit
date in a control table is changed to the next day, so that the
receipts up to that point can be deposited during banking hours with
the current date as their deposit date. A report is printed (and
likely a transfer transaction recorded to move cash in drawer to
cash in checking, but I'll ignore that aspect for this example).
Some receipts may not be committed when the update to the date in
the control table is committed.
 
This is eventually consistent -- once all the receipts with the
old date commit or roll back, the database is OK, but until then you
might be able to select the new date in the control table and the
set of receipts matching the old date without the database telling
you that you're missing data. The new serializable implementation
fixes this, but there are open R&D items (due to the need to discuss
the issues) on the related Wiki page concerning hot standby and
other replication. Will we be able to support transactional
integrity on slave machines?
 
What if the update to the control table and the insert of receipts
all happen on the master, but someone decides to move the (now
happily working correctly with serializable transactions) reporting
to a slave machine? (And by the way, don't get too hung up on this
particular example, I could generate dozens more on demand -- the
point is that order of commit doesn't always correspond to apparent
order of execution; in this case the receipts *appear* to have
executed first, because they are using a value later updated to
something else by a different transaction, even though that other
transaction *committed* first.)
 
Replicating or recreating the whole predicate locking and conflict
detection on slaves is not feasible for performance reasons. (I
won't elaborate unless someone feels that's not intuitively
obvious.) The only sane way I can see to have a slave database allow
serializable behavior is to WAL-log the acquisition of a snapshot by
a serializable transaction, and the rollback or commit, on the
master, and to have the serializable snapshot build on a slave
exclude any serializable transactions for which there are still
concurrent serializable transactions. Yes, that does mean WAL-
logging the snapshot acquisition even if the transaction doesn't yet
have an xid, and WAL-logging the commit or rollback even if it never
acquires an xid.
 
I think this solves the issue Jan raises as long as serializable
transactions are used; if they aren't, there are no guarantees of
transactional integrity no matter how you track commit sequence,
unless it can be based on S2PL-type blocking locks.  I'll have to
leave that to someone else to sort out.
 
-Kevin




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 11:24 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.

 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not.

I think you're confusing two subtly different things.  The way to
prove that a set of transactions running under some implementation of
serializability is actually serializable is to construct a serial
order of execution consistent with the view of the database that each
transaction saw.  This may or may not match the commit order, as you
say.  But the commit order is still the order the effects of those
transactions have become visible - if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it would
see all the transactions which committed before it and none of those
that committed afterward.  So I think Jan's statement is correct.

Having said that, I think your concerns about how things will look
from a slave's point of view are possibly valid.  A transaction
running on a slave is essentially a read-only transaction that the
master doesn't know about.  It's not clear to me whether adding such a
transaction to the timeline could result in either (a) that
transaction being rolled back or (b) some impact on which other
transactions got rolled back.  If it did, that would obviously be a
problem for serializability on slaves, though your proposed fix sounds
like it would be prohibitively expensive for many users.  But can this
actually happen?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-24 Thread Michael Meskes
On Sun, May 23, 2010 at 11:50:00AM -0400, Stephen Frost wrote:
  If we need this we should do it properly with autoconf.

I absolutely agree and planned to do that *after* the release if it makes sense
for the rest of PG, but wouldn't want to mess with it in the current
situation. On the other hand I didn't want to release with that bug in there.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber mes...@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] Synchronization levels in SR

2010-05-24 Thread Heikki Linnakangas

On 24/05/10 16:20, Fujii Masao wrote:

The log-shipping replication has some synch levels as follows.

The transaction commit on the master
#1 doesn't wait for replication (already supported in 9.0)
#2 waits for WAL to be received by the standby
#3 waits for WAL to be received and flushed by the standby
#4 waits for WAL to be received, flushed and replayed by
   the standby
..etc?

Which should we include in 9.1? I'd like to add #2 and #3.
They are enough for the high-availability use case (i.e., to
prevent a failover from losing any committed transactions).
AFAIR, MySQL semi-synchronous replication supports level #2.

#4 is useful for some cases, but might often make the
transaction commit on the master get stuck, since a read-only
query can easily block recovery through lock conflicts. So
#4 seems not worth working on until that HS problem
has been addressed. Thoughts?


I see a lot of value in #4; it makes it possible to distribute read-only 
load to the standby using something like pgbouncer, completely 
transparently to the application. In the lesser modes, the application 
can see slightly stale results.


But whatever we can easily implement, really. Pick one that you think is 
the easiest and start with that, but keep the other modes in mind in the 
design and in the user interface so that you don't paint yourself in the 
corner.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-24 Thread Michael Meskes
On Sat, May 22, 2010 at 11:20:50PM -0400, Stephen Frost wrote:
   git diff -p), I noted that c.h is already included by both extern.h
   and ecpg.header through postgres_fe.h.  Given this and that we're
   already doing a lot of similar #define's there (unlike in those other
   files), I felt c.h was a more appropriate place.  Putting it in c.h
   also means we don't have to duplicate that code.

But do other parts of PG also need it? Keep in mind that this works for ecpg
because it needs LLONG_MIN or LONGLONG_MIN anyway. I'm not sure if there are
compilers that have long long without those defines, but I'd guess there
aren't.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber mes...@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



[HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT

2010-05-24 Thread Alex Goncharov
I have some libpq-using application code, in which fetching the data
follows this logic (after a statement has been prepared):



  PQexecPrepared(pg_result, pg_conn, pg_statement_name, input_param_cnt,
 param_values, param_lengths, param_formats, result_format);
  
  PQntuples(rows_in_result, pg_result);
  
  /* The application provides storage so that I can pass a certain number
   * of rows (rows_to_pass_up) to the caller, and I repeat the following
   * loop until the rows_to_pass_up batches cover all the rows_in_result
   * (pg_row_num_base keeps track of where I am in the process). */
  
  for (int row_idx = 0; row_idx < rows_to_pass_up; ++row_idx) {
    const int pg_row_number = row_idx + pg_row_num_base;

    for (int pg_column_number = 0; pg_column_number < result_column_cnt_;
         ++pg_column_number) {
      PQgetvalue(value, pg_result, pg_row_number, pg_column_number);
      PQgetlength(length, pg_result, pg_row_number, pg_column_number);
    }
  }



My question is: am I doing the right thing from the perspective of the
data size being passed from BE to FE?

The code in `bin/psql' relies on the value of the FETCH_COUNT
parameter to build an appropriate

fetch forward FETCH_COUNT from _psql_cursor

command.

No equivalent of FETCH_COUNT is available at the libpq level, so I
assume that the interface I am using is smart enough not to send
gigabytes of data to FE.

Is that right? Is the logic I am using safe and good?

Where does the result set (GBs of data) reside after I call
PQexecPrepared?  On BE, I hope?

Thanks,

-- Alex -- alex-goncha...@comcast.net --



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-24 Thread Josh Berkus



I'm not sure it's cheap.  What you suggest would result in a substantial
increase in clog accesses, which means (1) more I/O and (2) more
contention.  Certainly it's worth experimenting with, but it's no
guaranteed win.


It seems like there's a number of issues we could fix by making the CLOG 
more efficient somehow -- from the elimination of hint bits to the 
ability to freeze pages without writing them.


Not, of course, that I have any idea how to do that.

--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Robert Haas  wrote:
 
 I think you're confusing two subtly different things.
 
The only thing I'm confused about is what benefit anyone expects to
get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.
 
 But the commit order is still the order the effects of those
 transactions have become visible - if we inserted a new read-only
 transaction into the stream at some arbitrary point in time, it
 would see all the transactions which committed before it and none
 of those that committed afterward.
 
Isn't that what a snapshot does already?
 
 your proposed fix sounds like it would be prohibitively expensive
 for many users. But can this actually happen?
 
How so?  The transaction start/end logging, or looking at that data
when building a snapshot?
 
-Kevin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Heikki Linnakangas

On 24/05/10 19:51, Kevin Grittner wrote:

The only thing I'm confused about is what benefit anyone expects to
get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.


Right, it doesn't. What it provides is a way to reconstruct a snapshot 
at any point in time, after the fact. For example, after transactions A, 
C, D and B have committed in that order, it allows you to reconstruct a 
snapshot just like you would've gotten immediately after the commit of 
A, C, D and B respectively. That's useful for replication tools like 
Slony that need to commit the changes of those transactions on the 
slave in the same order as they were committed on the master.


I don't know enough about Slony et al. to understand why that'd be 
better than the current heartbeat mechanism they use, taking a 
snapshot every few seconds and batching commits.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-24 Thread Tom Lane
Michael Meskes mes...@postgresql.org writes:
 On Sat, May 22, 2010 at 11:20:50PM -0400, Stephen Frost wrote:
 git diff -p), I noted that c.h is already included by both extern.h
 and ecpg.header through postgres_fe.h.  Given this and that we're
  already doing a lot of similar #define's there (unlike in those other
 files), I felt c.h was a more appropriate place.  Putting it in c.h
 also means we don't have to duplicate that code.

 But do other parts of PG also need it? Keep in mind that this works for ecpg
 because it needs LLONG_MIN or LONGLONG_MIN anyway. I'm not sure if there are
 compilers that have long long without those defines, but I'd guess there
 aren't.

I think the current coding is extremely fragile (if it indeed works at
all) because of its assumption that limits.h has been included
already.  In any case, we have configure tests that exist only for the
benefit of contrib modules, so it's hard to argue that we shouldn't have
one that exists only for ecpg.

I think we should fix this (properly) for 9.0.

regards, tom lane



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Bruce Momjian
Stefan Kaltenbrunner wrote:
 On 05/19/2010 05:16 PM, Bruce Momjian wrote:
  Andres Freund wrote:
  On Wednesday 19 May 2010 22:39:32 Bruce Momjian wrote:
  There are some limitations when migrating from 8.3 to 8.4, but not when
  migrating from 8.3 to 9.0, because we added a feature to 9.0.  Can you
  give a specific example?
  Didn't the 'name' alignment change?
  
  Uh, the heading above that item is:
  
  <title>Limitations in migrating <emphasis>from</> PostgreSQL
  8.3</title>
  
  What is unclear there?  It covers going to 8.4 and 9.0.
 
 well the wording makes it kinda unclear on what happens if you go FROM
 8.4 to 9.0. If there are no known limits we might want to add a small
 note saying so. If there are some we might want to restructure the
 paragraph a bit...

Sorry for the delay in replying.  The section you list is titled:

F.31.4. Limitations in migrating from PostgreSQL 8.3

http://developer.postgresql.org/pgdocs/postgres/pgupgrade.html

and the first sentence is:

   pg_upgrade will not work for a migration from 8.3 if a user column

I have updated the paragraph to be:

   Upgrading from PostgreSQL 8.3 has additional restrictions not present
   when upgrading from later PostgreSQL releases.  For example,
   pg_upgrade will not work for a migration from 8.3 if a user column
   is defined as:

Can you suggest other wording?

FYI, the items that affect only 8.3 to 8.4 migrations are no longer in
the 9.0 pg_upgrade docs because we don't support 8.4 as a target
anymore;  specifically:

Limitations In Migrating _to_ PostgreSQL 8.4
--
pg_migrator will not work if a user column is defined as:

o  a user-defined composite data type
o  a user-defined array data type
o  a user-defined enum data type

You must drop any such columns and migrate them manually.

You can see the full documentation here:


http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pg-migrator/pg_migrator/README?rev=1.78content-type=text/x-cvsweb-markup

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] beta testing - pg_upgrade bug fix - double free

2010-05-24 Thread Bruce Momjian
Pavel Stehule wrote:
 Hello
 
 it fixes bug
 
 pg_upgrade(13359) malloc: *** error for object 0x801600:
 non-page-aligned, non-allocated pointer being freed
 *** set a breakpoint in malloc_error_break to debug
 
 
 arget 03:31 /usr/local/src/postgresql/contrib/pg_upgrade git diff .
 diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
 index 31f12fb..f989229 100644
 --- a/contrib/pg_upgrade/check.c
 +++ b/contrib/pg_upgrade/check.c
 @@ -154,7 +154,6 @@ issue_warnings(migratorContext *ctx, char
 *sequence_script_file_name)
  ctx->new.bindir,
 ctx->new.port, sequence_script_file_name,
  ctx->logfile);
    unlink(sequence_script_file_name);
 -   pg_free(sequence_script_file_name);
    check_ok(ctx);
    }

Thanks for the report.  Tom has applied your fix.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Stefan Kaltenbrunner

On 05/24/2010 07:43 PM, Bruce Momjian wrote:

Stefan Kaltenbrunner wrote:

On 05/19/2010 05:16 PM, Bruce Momjian wrote:

Andres Freund wrote:

On Wednesday 19 May 2010 22:39:32 Bruce Momjian wrote:

There are some limitations when migrating from 8.3 to 8.4, but not when
migrating from 8.3 to 9.0, because we added a feature to 9.0.  Can you
give a specific example?

Didn't the 'name' alignment change?


Uh, the heading above that item is:

   <title>Limitations in migrating <emphasis>from</>  PostgreSQL
   8.3</title>

What is unclear there?  It covers going to 8.4 and 9.0.


well the wording makes it kinda unclear on what happens if you go FROM
8.4 to 9.0. If there are no known limits we might want to add a small
note saying so. If there are some we might want to restructure the
paragraph a bit...


Sorry for the delay in replying.  The section you list is titled:

F.31.4. Limitations in migrating from PostgreSQL 8.3

http://developer.postgresql.org/pgdocs/postgres/pgupgrade.html

and the first sentence is:

pg_upgrade will not work for a migration from 8.3 if a user column

I have updated the paragraph to be:

Upgrading from PostgreSQL 8.3 has additional restrictions not present
when upgrading from later PostgreSQL releases.  For example,
pg_upgrade will not work for a migration from 8.3 if a user column
is defined as:

Can you suggest other wording?


hmm that seems better thanks, however I just noticed that we don't have 
a general limitations section. The way the docs are now done suggests 
that there are not limitations at all (except for the two warnings in 
the migration guide). Is pg_upgrade really up to the point where it can 
fully replace pg_dump & pg_restore independent of the loaded (contrib) 
or even third party modules(like postgis or custom datatypes etc)?




Stefan



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 12:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:
 I think you're confusing two subtly different things.

 The only thing I'm confused about is what benefit anyone expects to
 get from looking at data between commits in some way other than our
 current snapshot mechanism.  Can someone explain a use case where
 what Jan is proposing is better than snapshot isolation?  It doesn't
 provide any additional integrity guarantees that I can see.

It's a tool for replication solutions to use.

 But the commit order is still the order the effects of those
 transactions have become visible - if we inserted a new read-only
 transaction into the stream at some arbitrary point in time, it
 would see all the transactions which committed before it and none
 of those that committed afterward.

 Isn't that what a snapshot does already?

Yes, for a particular transaction.  But this is to allow transactions
to be replayed (in order) on another node.

 your proposed fix sounds like it would be prohibitively expensive
 for many users. But can this actually happen?

 How so?  The transaction start/end logging, or looking at that data
 when building a snapshot?

I guess what I'm asking is - if the reconstructed transaction order
inferred by SSI doesn't match the actual commit order, can we get a
serialization anomaly on the standby by replaying transactions there
in commit order?  Can you give an example and explain how your
proposal would solve it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Sun May 23 20:38:14 -0400 2010:
 On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote:
  The system will have postgresql.conf options for enabling/disabling the
  whole shebang, how many shared buffers to allocate for managing access
  to the data and to define the retention period of the data based on data
  volume and/or age of the commit records.
 
 It would be nice if this could just be managed out of shared_buffers
 rather than needing to configure a separate pool just for this
 feature.

FWIW we've talked about this for years -- see old discussions about how
pg_subtrans becomes a bottleneck in certain cases and you want to
enlarge the number of buffers allocated to it (probably easy to find by
searching posts from Jignesh).  I'm guessing the new notify code would
benefit from this as well.

It'd be nice to have as a side effect, but if not, IMHO this proposal
could simply use a fixed buffer pool like all other slru.c callers until
someone gets around to fixing that.  Adding more GUC switches for this
strikes me as overkill.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Bruce Momjian
Stefan Kaltenbrunner wrote:
  I have updated the paragraph to be:
 
  Upgrading from PostgreSQL 8.3 has additional restrictions not present
  when upgrading from later PostgreSQL releases.  For example,
  pg_upgrade will not work for a migration from 8.3 if a user column
  is defined as:
 
  Can you suggest other wording?
 
 hmm that seems better thanks, however I just noticed that we don't have 
 a general limitations section. The way the docs are now done suggests 
 that there are not limitations at all (except for the two warnings in 
 the migration guide). Is pg_upgrade really up to the point where it can 
 fully replace pg_dump & pg_restore independent of the loaded (contrib) 
 or even third party modules(like postgis or custom datatypes etc)?

Yea, that's about right.  I can add limitations if you want.  ;-)

The only open pg_upgrade items are the ones on our TODO list:

http://wiki.postgresql.org/wiki/Todo

(I can't give you a URL hash-reference to the section because it doesn't
work on Firefox and no one seems to be able to fix it.)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 3:07 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 It'd be nice to have as a side effect, but if not, IMHO this proposal
 could simply use a fixed buffer pool like all other slru.c callers until
 someone gets around to fixing that.  Adding more GUC switches for this
 strikes me as overkill.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Dan Ports
On Sun, May 23, 2010 at 04:21:58PM -0400, Jan Wieck wrote:
 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

This is very interesting to me as I've been doing some (research --
nowhere near production-level) work on building a transactional
application-level (i.e. memcached-like) cache atop Postgres. One of the
features I needed to support it was basically what you describe.

Without getting too far into the details of what I'm doing, I needed to
make it clear to a higher layer which commits were visible to a given
query. That is, I wanted to know both the order of commits and where
particular snapshots fit into this ordering. (A SnapshotData struct
obviously contains the visibility information, but a representation in
terms of the commit ordering is both more succinct and allows for easy
ordering comparisons).

Something you might want to consider, then, is adding an interface to
find out the timestamp of the current transaction's snapshot, i.e. the
timestamp of the most recent committed transaction visible to it. I
wouldn't expect this to be difficult to implement as transaction
completion/visibility is already synchronized via ProcArrayLock.

 Each record of the Transaction Commit Info consists of
 
   txid  xci_transaction_id
   timestamptz   xci_begin_timestamp
   timestamptz   xci_commit_timestamp
   int64 xci_total_rowcount

Another piece of information that seems useful to provide here would be
the logical timestamp of the transaction, i.e. a counter that's
incremented by one for each transaction. But maybe that's implicit in
the log ordering?

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.


I've actually implemented some semblance of this on Postgres 8.2, but it
sounds like what you're interested in is more sophisticated. In
particular, I wasn't at all concerned with durability or WAL stuff, and
I had some specific requirements about when it was OK to purge the
data. Because of this (and very limited development time), I just
threw something together with a simple shared buffer.

I don't think I have any useful code to offer, but let me know if
there's some way I can help out.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 3:11 PM, Bruce Momjian br...@momjian.us wrote:
 Stefan Kaltenbrunner wrote:
  I have updated the paragraph to be:
 
      Upgrading from PostgreSQL 8.3 has additional restrictions not present
      when upgrading from later PostgreSQL releases.  For example,
      pg_upgrade will not work for a migration from 8.3 if a user column
      is defined as:
 
  Can you suggest other wording?

 hmm that seems better thanks, however I just noticed that we don't have
 a general limitations section. The way the docs are now done suggests
 that there are not limitations at all (except for the two warnings in
 the migration guide). Is pg_upgrade really up to the point where it can
 fully replace pg_dump & pg_restore independent of the loaded (contrib)
 or even third party modules(like postgis or custom datatypes etc)?

 Yea, that's about right.  I can add limiations if you want.  ;-)

I don't believe this.  For one thing, I am pretty sure that if there
are ABI differences between loadable modules between the old and new
cluster, hilarity will ensue.

 The only open pg_upgrade items are the ones on our TODO list:

        http://wiki.postgresql.org/wiki/Todo

 (I can't give you a URL hash-reference to the section because it doesn't
 work on Firefox and no one seems to be able to fix it.)

It works OK for me.  The link to /contrib/pg_upgrade within the nav
section at the top righthand corner of the page seems to work just
fine.

http://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-24 Thread Michael Meskes
 I think the current coding is extremely fragile (if it indeed works at
 all) because of its assumption that limits.h has been included

Well, this is the case in the code so far. 

 already.  In any case, we have configure tests that exist only for the
 benefit of contrib modules, so it's hard to argue that we shouldn't have
 one that exists only for ecpg.
 
 I think we should fix this (properly) for 9.0.

Ok, I don't mind fixing it properly for 9.0. Will do so as soon as I find time.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber mes...@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-24 Thread Stephen Frost
* Michael Meskes (mes...@postgresql.org) wrote:
  I think the current coding is extremely fragile (if it indeed works at
  all) because of its assumption that limits.h has been included
 
 Well, this is the case in the code so far. 

Right, the existing code comes after limits.h is included; my suggestion to
put it in c.h would have lost limits.h and broken things.  Sorry about
that.  I didn't realize the dependency, and make check didn't complain
(not that I'm sure there's even a way we could have a regression test
for this...).  I didn't intend to imply the currently-committed code
didn't work (I figured it was probably fine :); I was just trying to tidy
up a bit.

Thanks!

Stephen




Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 24, 2010 at 3:11 PM, Bruce Momjian br...@momjian.us wrote:
  Stefan Kaltenbrunner wrote:
   I have updated the paragraph to be:
  
       Upgrading from PostgreSQL 8.3 has additional restrictions not present
       when upgrading from later PostgreSQL releases.  For example,
       pg_upgrade will not work for a migration from 8.3 if a user column
       is defined as:
  
   Can you suggest other wording?
 
  Hmm, that seems better, thanks. However, I just noticed that we don't have
  a general limitations section. The way the docs are now done suggests
  that there are no limitations at all (except for the two warnings in
  the migration guide). Is pg_upgrade really up to the point where it can
  fully replace pg_dump & pg_restore independent of the loaded (contrib)
  or even third-party modules (like PostGIS or custom datatypes, etc.)?
 
  Yea, that's about right.  I can add limitations if you want.  ;-)
 
 I don't believe this.  For one thing, I am pretty sure that if there
 are ABI differences between loadable modules between the old and new
 cluster, hilarity will ensue.

Well, the point is that our existing code doesn't have any
incompatibilities that I know of.  We could certainly add some in 9.1.

  The only open pg_upgrade items are the ones on our TODO list:
 
         http://wiki.postgresql.org/wiki/Todo
 
  (I can't give you a URL hash-reference to the section because it doesn't
  work on Firefox and no one seems to be able to fix it.)
 
 It works OK for me.  The link to /contrib/pg_upgrade within the nav
 section at the top righthand corner of the page seems to work just
 fine.
 
 http://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

The problem is that the Contents menu on the top right of the page
doesn't allow a clickable link to that section, and many others.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-24 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of Fri May 21 17:57:35 -0400 2010:

 Problem: currently, if your database has a large amount of cold data, 
 such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer 
 needs to touch it thanks to the visibility map.  However, every 
 freeze_age transactions, very old pages need to be sucked into memory 
 and rewritten just in order to freeze those pages.  This can have a huge 
 impact on system performance, and seems unjustified because the pages 
 are not actually being used.

I think this is nonsense.  If you have 3-years-old sales transactions,
and your database has any interesting churn, the tuples in those pages have
been frozen for a very long time *already*.  The problem is vacuum
reading them in so that it can verify there's nothing to do.  If we want
to avoid *reading* those pages, this solution is useless:

 Suggested resolution: we would add a 4-byte field to the *page* header 
 which would track the XID wraparound count.

because you still have to read the page.

I think what you're looking for is for this Xid wraparound count to be
stored elsewhere, not inside the page.  That way vacuum can read it and
skip the page altogether.  I think a freeze map has
been mentioned downthread.

I remember mentioning some time ago that we could declare some tables as
frozen, i.e. not needing vacuum.  This strikes me as similar, except
at the page level rather than table level.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Robert Haas  wrote:
 
 It's a tool for replication solutions to use.
 
I was thrown by the original post referencing data warehousing.
For replication I definitely see that it would be good to provide
some facility to grab a coherent snapshot out of the transaction
stream, but I'm still not clear on a use case where other solutions
aren't better.  If you want a *particular* past snapshot, something
akin to the transactional caching that Dan Ports mentioned seems
best.  If you just want a coherent snapshot like snapshot isolation,
the current mechanisms seem to work (unless I'm missing something?).
If you want solid data integrity querying the most recent replicated
data, the proposal I posted earlier in the thread is the best I can
see, so far.
 
 if the reconstructed transaction order inferred by SSI doesn't
 match the actual commit order, can we get a serialization anomaly
 on the standby by replaying transactions there in commit order?
 
Yes.  If we don't do *something* to address it, the replicas
(slaves) will operate as read-only snapshot isolation, not true
serializable.
 
 Can you give an example and explain how your proposal would solve
 it?
 
I gave an example (without rigorous proof accompanying it, granted)
earlier in the thread.  In that example, if you allow a selection
against a snapshot which includes the earlier commit (the update of
the control table) and before the later commits (the receipts which
used the old deposit date) you have exactly the kind of
serialization anomaly which the work in progress prevents on the
source (master) database -- the receipts *appear* to run in earlier
transactions because they see the pre-update deposit date, but they
show up out of order.  As far as I'm concerned this is only a
problem if the user *requested* serializable behavior for all
transactions involved.
 
If we send the information I suggested in the WAL stream, then any
slave using the WAL stream could build a snapshot for a serializable
transaction which excluded serializable transactions from the source
which overlap with still-pending serializable transactions on the
source.  In this example, the update of the control table would not
be visible to a serializable transaction on the slave until any
overlapping serializable transactions (which would include any
receipts using the old date) had also committed, so you could never
see the writes out of order.
 
I don't think that passing detailed predicate locking information
would be feasible from a performance perspective, but since the
slaves are read-only, I think it is fine to pass just the minimal
transaction-level information I described.
 
-Kevin





Re: [HACKERS] unnailing shared relations (was Re: global temporary tables)

2010-05-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Fri May 21 10:20:38 -0400 2010:

 Actually, there's another way we could do this.   Instead of creating
 pg_shared_class and pg_shared_attribute and moving all of the catalog
 entries for the shared relations into those tables, we could consider
 leaving the catalog entries in the unshared copies of pg_class,
 pg_attribute, etc. and DUPLICATING them in a shared catalog which
 would only be used prior to selecting a database.  Once we selected a
 database we'd switch to using the database-specific pg_class et al.
 Obviously that's a little grotty but it might (?) be easier, and
 possibly a step along the way.

Uh, how does this work when you change the entries for shared relations
in a database-specific pg_class?  Keeping everything in sync seems hard,
if not impossible.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] unnailing shared relations (was Re: global temporary tables)

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 4:23 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Excerpts from Robert Haas's message of Fri May 21 10:20:38 -0400 2010:

 Actually, there's another way we could do this.   Instead of creating
 pg_shared_class and pg_shared_attribute and moving all of the catalog
 entries for the shared relations into those tables, we could consider
 leaving the catalog entries in the unshared copies of pg_class,
 pg_attribute, etc. and DUPLICATING them in a shared catalog which
 would only be used prior to selecting a database.  Once we selected a
 database we'd switch to using the database-specific pg_class et al.
 Obviously that's a little grotty but it might (?) be easier, and
 possibly a step along the way.

 Uh, how does this work when you change the entries for shared relations
 in a database-specific pg_class?  Keeping everything in sync seems hard,
 if not impossible.

Well, I might be missing something here, but pg_class already IS
database-specific.  If you change anything very significant about a
shared rel in one copy of pg_class today, you're toast, IIUC.  This
proposal doesn't make that any better, but I don't think it makes it
any worse either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 3:41 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Mon, May 24, 2010 at 3:11 PM, Bruce Momjian br...@momjian.us wrote:
  Stefan Kaltenbrunner wrote:
   I have updated the paragraph to be:
  
       Upgrading from PostgreSQL 8.3 has additional restrictions not present
       when upgrading from later PostgreSQL releases.  For example,
       pg_upgrade will not work for a migration from 8.3 if a user column
       is defined as:
  
   Can you suggest other wording?
 
  Hmm, that seems better, thanks. However, I just noticed that we don't have
  a general limitations section. The way the docs are now done suggests
  that there are no limitations at all (except for the two warnings in
  the migration guide). Is pg_upgrade really up to the point where it can
  fully replace pg_dump & pg_restore independent of the loaded (contrib)
  or even third-party modules (like PostGIS or custom datatypes, etc.)?
 
  Yea, that's about right.  I can add limitations if you want.  ;-)

 I don't believe this.  For one thing, I am pretty sure that if there
 are ABI differences between loadable modules between the old and new
 cluster, hilarity will ensue.

 Well, the point is that our existing code doesn't have any
  incompatibilities that I know of.  We could certainly add some in 9.1.

Yes, or third-party vendors could add some for us.  We can't guarantee
this in general.

  The only open pg_upgrade items are the ones on our TODO list:
 
        http://wiki.postgresql.org/wiki/Todo
 
  (I can't give you a URL hash-reference to the section because it doesn't
  work on Firefox and no one seems to be able to fix it.)

 It works OK for me.  The link to /contrib/pg_upgrade within the nav
 section at the top righthand corner of the page seems to work just
 fine.

 http://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

 The problem is that the Contents menu on the top right of the page
 doesn't allow a clickable link to that section, and many others.

It does for me...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 4:03 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:

 It's a tool for replication solutions to use.

 I was thrown by the original post referencing data warehousing.
 For replication I definitely see that it would be good to provide
 some facility to grab a coherent snapshot out of the transaction
 stream, but I'm still not clear on a use case where other solutions
 aren't better.  If you want a *particular* past snapshot, something
 akin to the transactional caching that Dan Ports mentioned seems
 best.  If you just want a coherent snapshot like snapshot isolation,
 the current mechanisms seem to work (unless I'm missing something?).
 If you want solid data integrity querying the most recent replicated
 data, the proposal I posted earlier in the thread is the best I can
 see, so far.

Well, AIUI, what you're really trying to do is derive the delta
between an old snapshot and a newer snapshot.

 Can you give an example and explain how your proposal would solve
 it?

 I gave an example (without rigorous proof accompanying it, granted)
 earlier in the thread.  In that example, if you allow a selection
 against a snapshot which includes the earlier commit (the update of
 the control table) and before the later commits (the receipts which
 used the old deposit date) you have exactly the kind of
 serialization anomaly which the work in progress prevents on the
 source (master) database -- the receipts *appear* to run in earlier
 transactions because they see the pre-update deposit date, but they
 show up out of order.

Yep, I see it now.

  As far as I'm concerned this is only a
 problem if the user *requested* serializable behavior for all
 transactions involved.

Agreed.

 If we send the information I suggested in the WAL stream, then any
 slave using the WAL stream could build a snapshot for a serializable
 transaction which excluded serializable transactions from the source
 which overlap with still-pending serializable transactions on the
 source.  In this example, the update of the control table would not
 be visible to a serializable transaction on the slave until any
 overlapping serializable transactions (which would include any
 receipts using the old date) had also committed, so you could never
 see the writes out of order.

 I don't think that passing detailed predicate locking information
 would be feasible from a performance perspective, but since the
 slaves are read-only, I think it is fine to pass just the minimal
 transaction-level information I described.

I suspect that's still going to be sort of hard on performance, but
maybe we should get serializable working and committed on one node
first and then worry about how to distribute it.  I think there might
be other approaches to this problem (global transaction coordinator?
standby requests snapshot from primary?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Jan Wieck

On 5/24/2010 12:51 PM, Kevin Grittner wrote:

Robert Haas  wrote:
 

I think you're confusing two subtly different things.
 
The only thing I'm confused about is what benefit anyone expects to

get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.
 

But the commit order is still the order the effects of those
transactions have become visible - if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it
would see all the transactions which committed before it and none
of those that committed afterward.
 
Isn't that what a snapshot does already?


It does, and the proposal is merely an alternative serving the same purpose.

Have you ever looked at one of those queries that Londiste or Slony 
issue against the provider DB in order to get all the log data that has 
been committed between two snapshots? Is that really the best you can 
think of?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] unnailing shared relations (was Re: global temporary tables)

2010-05-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Mon May 24 17:18:21 -0400 2010:
 On Mon, May 24, 2010 at 4:23 PM, Alvaro Herrera alvhe...@alvh.no-ip.org 
 wrote:
  Excerpts from Robert Haas's message of Fri May 21 10:20:38 -0400 2010:

  Uh, how does this work when you change the entries for shared relations
  in a database-specific pg_class?  Keeping everything in sync seems hard,
  if not impossible.
 
 Well, I might be missing something here, but pg_class already IS
 database-specific.  If you change anything very significant about a
 shared rel in one copy of pg_class today, you're toast, IIUC.  This
 proposal doesn't make that any better, but I don't think it makes it
 any worse either.

I thought the whole point of this exercise was precisely to avoid this
sort of problem.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Dave Page
On Mon, May 24, 2010 at 5:20 PM, Robert Haas robertmh...@gmail.com wrote:

 It works OK for me.  The link to /contrib/pg_upgrade within the nav
 section at the top righthand corner of the page seems to work just
 fine.

 http://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

 The problem is that the Contents menu on the top right of the page
 doesn't allow a clickable link to that section, and many others.

 It does for me...

Doesn't here. FYI, neither do others such as 2.6, 2.7, 6.1 & 6.11.


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 24, 2010 at 3:41 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Mon, May 24, 2010 at 3:11 PM, Bruce Momjian br...@momjian.us wrote:
   Stefan Kaltenbrunner wrote:
I have updated the paragraph to be:
   
    Upgrading from PostgreSQL 8.3 has additional restrictions not present
    when upgrading from later PostgreSQL releases.  For example,
    pg_upgrade will not work for a migration from 8.3 if a user column
    is defined as:
   
Can you suggest other wording?
  
   Hmm, that seems better, thanks. However, I just noticed that we don't have
   a general limitations section. The way the docs are now done suggests
   that there are no limitations at all (except for the two warnings in
   the migration guide). Is pg_upgrade really up to the point where it can
   fully replace pg_dump & pg_restore independent of the loaded (contrib)
   or even third-party modules (like PostGIS or custom datatypes, etc.)?
  
   Yea, that's about right.  I can add limitations if you want.  ;-)
 
  I don't believe this.  For one thing, I am pretty sure that if there
  are ABI differences between loadable modules between the old and new
  cluster, hilarity will ensue.
 
  Well, the point is that our existing code doesn't have any
   incompatibilities that I know of.  We could certainly add some in 9.1.
 
 Yes, or third-party vendors could add some for us.  We can't guarantee
 this in general.

What is your point?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Jan Wieck

On 5/24/2010 3:10 PM, Dan Ports wrote:

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.


It is a glimpse into the future. Several years of pain doing replication 
work have taught me that knowing approximately how much work the next 
chunk will be before you select it all is a really useful thing.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 5:38 PM, Dave Page dp...@pgadmin.org wrote:
 On Mon, May 24, 2010 at 5:20 PM, Robert Haas robertmh...@gmail.com wrote:

 It works OK for me.  The link to /contrib/pg_upgrade within the nav
 section at the top righthand corner of the page seems to work just
 fine.

 http://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade

 The problem is that the Contents menu on the top right of the page
 doesn't allow a clickable link to that section, and many others.

 It does for me...

 Doesn't here. FYI, neither do others such as 2.6, 2.7, 6.1 & 6.11.

Oh, interesting.  2.6 and 2.7 don't work for me, but 6.1 and 6.11 do.
That is really odd.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Dan Ports
On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:
 Jan Wieck wrote:
  
  In some systems (data warehousing, replication), the order of
  commits is important, since that is the order in which changes
  have become visible.
  
 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not. In particular, the snapshot anomalies which
 can cause non-serializable behavior happen precisely because the
 apparent order of execution doesn't match anything so linear as
 order of commit.

All true, but this doesn't pose a problem in snapshot isolation. Maybe
this is obvious to everyone else, but just to be clear: a transaction's
snapshot is determined entirely by which transactions committed before
it snapshotted (and hence are visible to it). Thus, replaying update
transactions in the same order on a slave makes the same sequence of
states visible to it.

Of course (as in your example) some of these states could expose
snapshot isolation anomalies. But that's true on a single-replica
system too.


Now, stepping into the SSI world...

 Replicating or recreating the whole predicate locking and conflict
 detection on slaves is not feasible for performance reasons. (I
 won't elaborate unless someone feels that's not intuitively
 obvious.) The only sane way I can see to have a slave database allow
 serializable behavior is to WAL-log the acquisition of a snapshot by
 a serializable transaction, and the rollback or commit, on the
 master, and to have the serializable snapshot build on a slave
 exclude any serializable transactions for which there are still
 concurrent serializable transactions. Yes, that does mean WAL-
 logging the snapshot acquisition even if the transaction doesn't yet
 have an xid, and WAL-logging the commit or rollback even if it never
 acquires an xid.

One important observation is that any anomaly that occurs on the slave
can be resolved by aborting a local read-only transaction. This is a
good thing, because the alternatives are too horrible to consider.

You could possibly cut the costs of predicate locking by having the
master ship with each transaction the list of predicate locks it
acquired. But you'd still have to track locks for read-only
transactions, so maybe that's not a significant cost improvement. On
the other hand, if you're willing to pay the price of serializability
on the master, why not the slaves too?

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread KaiGai Kohei
(2010/05/24 22:18), Robert Haas wrote:
 2010/5/24 KaiGai Kohei kai...@ak.jp.nec.com:
 BTW, I guess the reason why permissions on attributes are not checked here is
 that we missed it during v8.4 development.
 
 That's a little worrying.  Can you construct and post a test case
 where this results in a user-visible failure in CVS HEAD?

Sorry, after more detailed consideration, it seems to me that the permission
checks in RI_Initial_Check() and its fallback mechanism are nonsense.

See the following commands.

  postgres=# CREATE USER ymj;
  CREATE ROLE
  postgres=# CREATE TABLE pk_tbl (a int primary key, b text);
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "pk_tbl_pkey"
for table "pk_tbl"
  CREATE TABLE
  postgres=# CREATE TABLE fk_tbl (x int, y text);
  CREATE TABLE
  postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
  ALTER TABLE
  postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
  ALTER TABLE
  postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
  REVOKE
  postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
  GRANT

At this point, 'ymj' has ownership of, and REFERENCES permission on,
both pk_tbl and fk_tbl. In this case, RI_Initial_Check() will return
false and the fallback seqscan will run. But:

  postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
  ERROR:  permission denied for relation pk_tbl
  CONTEXT:  SQL statement "SELECT 1 FROM ONLY public.pk_tbl x WHERE a
OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"

From more careful observation of the code, validateForeignKeyConstraint()
also calls RI_FKey_check_ins() for each scanned tuple, but that also executes
a SELECT statement internally using the SPI_*() interface.

In other words, both execution paths require SELECT permission
to validate a new FK constraint.


I think we need a new SPI_*() interface which allows running a given query
without any permission checks; these queries are purely internal,
so we can trust that they are harmless.
Is there any other idea?

 The attached patch provides a common checker function of DML, and modifies
 ExecCheckRTPerms(), CopyTo() and RI_Initial_Check() to call the checker
 function instead of individual ACL checks.
 
 This looks pretty sane to me, although I have not done a full review.
 I am disinclined to create a whole new directory for it.   I think the
 new function should go in src/backend/catalog/aclchk.c and be declared
 in src/include/utils/acl.h.  If that sounds reasonable to you, please
 revise and post an updated patch.
 

I'm afraid that src/backend/catalog/aclchk.c will become overcrowded
in the future. If it is ugly to deploy checker functions in separate dirs/files,
one idea is to put it in execMain.c, instead of ExecCheckRTEPerms().

It also suggests where the checker functions should be deployed in the
upcoming DDL reworks. In a similar way, will we deploy them in
src/backend/command/pg_database, for example?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] (9.1) btree_gist support for searching on not equals

2010-05-24 Thread Takahiro Itagaki

Marko Tiikkaja marko.tiikk...@cs.helsinki.fi wrote:

 On 5/21/10 11:47 PM +0300, Jeff Davis wrote:
  It also allows you to enforce the constraint that only one tuple exists
  in a table by doing something like:
 
 create table a
 (
   i int,
   exclude using gist (i with <>),
   unique (i)
 );

+1.  I've not read the code, but it might be worth considering that we can
abort index scans once we find the first index entry for i. We must
scan all candidates for WHERE i <> ?, but we can abort in the constraint
case because we know the existing values are all the same.

 FWIW, this is achievable a lot more easily:
 CREATE UNIQUE INDEX a_single_row ON a ((1));

The former exclusion constraint means one and the same value for all rows,
but your alternative means a single row, right?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Florian Pflug
On May 25, 2010, at 0:42 , Dan Ports wrote:
 On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:
 Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.
 
 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not. In particular, the snapshot anomalies which
 can cause non-serializable behavior happen precisely because the
 apparent order of execution doesn't match anything so linear as
 order of commit.
 
 All true, but this doesn't pose a problem in snapshot isolation. Maybe
 this is obvious to everyone else, but just to be clear: a transaction's
 snapshot is determined entirely by which transactions committed before
 it snapshotted (and hence are visible to it). Thus, replaying update
 transactions in the same order on a slave makes the same sequence of
 states visible to it.

The subtle point here is whether you consider the view from the outside (in 
the sense of what a read-only transaction started at an arbitrary time can or 
cannot observe), or from the inside (what updating transactions can observe 
and might base their updates on).

The former case is completely determined by the commit ordering of the 
transactions, while the latter is not - otherwise serializability wouldn't be 
such a hard problem.

For some problems, like replication, the former (outside) view is what 
matters - if the slave synthesizes transactions that insert/update/delete the very 
same tuples as the original transaction did, and commits them in the same 
order, no read-only transaction can observe the difference. But that is *not* a 
serial schedule of the original transactions, since the transactions are *not* 
the same - they merely touch the same tuples. In fact, if you try replaying the 
original SQL, you *will* get different results on the slave, and not only 
because of now() and the like.
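
Florian's last point can be illustrated with a tiny sketch under snapshot
isolation (the table name is illustrative):

```sql
-- t starts empty; T1 and T2 run concurrently under snapshot isolation:
--   T1: INSERT INTO t SELECT count(*) FROM t;  -- snapshot sees 0 rows
--   T2: INSERT INTO t SELECT count(*) FROM t;  -- snapshot also sees 0 rows
-- The master ends up with t = {0, 0}.
-- Replaying the original SQL serially on a slave yields t = {0, 1},
-- because the second INSERT now sees the first one's row. Replaying the
-- actual tuples (0 and 0) in commit order reproduces the master exactly.
```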

best regards,
Florian Pflug






Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread KaiGai Kohei
(2010/05/25 4:11), Stephen Frost wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 I'd like to point out two more points are necessary to be considered
 for DML permission checks in addition to ExecCheckRTPerms().

 * DoCopy()

 Although DoCopy() is called from standard_ProcessUtility(), it performs
 as a DML statement, rather than DDL. It checks ACL_SELECT or ACL_INSERT on
 the copied table or attributes, similar to what ExecCheckRTEPerms() does.
 
 Rather than construct a complicated API for this DML activity, why don't
 we just make ExecCheckRTPerms available for DoCopy to use?  It seems
 like we could move ExecCheckRTPerms() to acl.c without too much trouble.
 acl.h already includes parsenodes.h and has knowledge of RangeVar's.
 Once DoCopy is using that, this issue resolves itself with the hook that
 Robert already wrote up.

We have two options: if the checker function takes a list of RangeTblEntry,
it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
if the checker function takes the arguments in my patch, it will be convenient
for DoCopy(), but not for ExecCheckRTPerms().

In my patch it takes six arguments, but all of them can be referenced from
the given RangeTblEntry. On the other hand, if DoCopy() has to set up
a pseudo RangeTblEntry to call the checker function, it needs to set up
a similar or slightly larger number of variables.

As I replied in the earlier message, it may be an idea to rename and change
the definition of ExecCheckRTEPerms() without moving it anywhere.
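
For concreteness, a hook at ExecCheckRTPerms() along these lines might look
roughly as follows (a sketch only; the names and signature are assumptions
about the proposal, not a committed API):

```c
/* Hypothetical sketch, loosely following the proposal in this thread. */
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                              bool ereport_on_violation);
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;

bool
ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
{
    ListCell   *l;

    foreach(l, rangeTable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);

        if (!ExecCheckRTEPerms(rte))
        {
            if (ereport_on_violation)
                aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS,
                               get_rel_name(rte->relid));
            return false;       /* soft failure for callers such as
                                 * RI_Initial_Check() */
        }
    }

    /* let a loadable module (e.g. SE-PostgreSQL) apply extra checks */
    if (ExecutorCheckPerms_hook)
        return (*ExecutorCheckPerms_hook) (rangeTable, ereport_on_violation);

    return true;
}
```

A checker with this shape can be shared by the executor, DoCopy(), and
RI_Initial_Check(), provided the non-executor callers build suitable RT entries.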

 * RI_Initial_Check()

 RI_Initial_Check() is a function called on ALTER TABLE command to add FK
 constraints between two relations. The permission to execute this ALTER TABLE
 command itself is checked on ATPrepCmd() and ATAddForeignKeyConstraint(),
 so it does not affect anything on the DML permission reworks.
 
 I'm not really thrilled with how RI_Initial_Check() does it's own
 permissions checking and then calls SPI expecting things to 'just work'.
 Not sure if there's some way we could handle failure from SPI, or, if it
 was changed to call ExecCheckRTPerms() instead, how it would handle
 failure cases from there.  One possible solution would be to have an
 additional option to ExecCheckRTPerms() which asks it to just check and
 return false if there's a problem, rather than unconditionally calling
 aclcheck_error() whenever it finds a problem.
 
 Using the same function for both the initial check in RI_Initial_Check()
 and then from SPI would eliminate issues where those two checks disagree
 for some reason, which would be good in the general case.

Sorry, I missed that the fallback path also needs SELECT permissions, because
validateForeignKeyConstraint() calls RI_FKey_check_ins(), which internally
tries to execute a SELECT statement using the SPI_*() interface.
But it is a separate issue from the DML permission reworks.

It seems to me the permission checks in RI_Initial_Check() are a bit ad hoc.
What we really want to do here is validate the new FK constraint.
So validateForeignKeyConstraint() is intended to provide a fallback path
when table-level permission is denied, isn't it?

In this case, we should execute the secondary query without permission checks,
because the permissions for ALTER TABLE are already checked, and we can trust
that the given query is harmless.

 BTW, I guess the reason why permissions on attributes are not checked here is
 that we missed it during v8.4 development.
 
 Indeed, but at the same time, this looks to be a 'fail-safe' situation.
 Basically, this is checking table-level permissions, which, if you have,
 gives you sufficient rights to SELECT against the table (any column).
 What this isn't doing is allowing the option of column-level permissions
 to be sufficient for this requirement.  That's certainly an oversight in
 the column-level permissions handling (sorry about that), but it's not
 horrible- there's a workaround if RI_Initial_Check returns false already
 anyway.

Yes, it is harmless except for performance in a corner case.
If a user has table-level permissions, there is no need to check column-
level permissions, even if that check were implemented.

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Stephen Frost
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
   postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
   ALTER TABLE
   postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
   ALTER TABLE
   postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
   REVOKE
   postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
   GRANT
 
 At that time, the 'ymj' has ownership and REFERENCES permissions on
 both of pk_tbl and fk_tbl. In this case, RI_Initial_Check() shall return
 and the fallback-seqscan will run. But,

ymj may be considered an 'owner' on that table, but in this case, it
doesn't have SELECT rights on it.  Now, you might argue that we should
assume that the owner has SELECT rights (since they're granted by
default), even if they've been revoked, but that's a whole separate
issue.

   postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
   ERROR:  permission denied for relation pk_tbl
   CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."pk_tbl" x WHERE "a" 
 OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"

I think you've got another issue here that's not related.  Perhaps
something wrong with a patch you've applied?  Otherwise, what version of
PG is this?  Using 8.2, 8.3, 8.4 and a recent git checkout, I get:

postgres=# CREATE USER ymj;
CREATE ROLE
postgres=# CREATE TABLE pk_tbl (a int primary key, b text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index pk_tbl_pkey 
for table pk_tbl
CREATE TABLE
postgres=# CREATE TABLE fk_tbl (x int, y text);
CREATE TABLE
postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
ALTER TABLE
postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
ALTER TABLE
postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
REVOKE
postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
GRANT
postgres=# SET ROLE ymj;
SET
postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
ALTER TABLE
postgres=> 

 I think we need a new SPI_*() interface which allows running the given query
 without any permission checks, because these queries are purely internal,
 so we can trust that the query is harmless.
 Is there any other idea?

Yeah, I don't see that going anywhere...

 I'm afraid that src/backend/catalog/aclchk.c will become overcrowded
 in the future. If it is ugly to deploy checker functions in separate
 dirs/files, one idea is to put them in execMain.c, instead of
 ExecCheckRTEPerms().

No, this is not a service of the executor, putting it in execMain.c does
not make any sense.

 It also suggests where the checker functions should be deployed in the
 upcoming DDL reworks. In a similar way, we will deploy them in
 src/backend/command/pg_database, for example?

We'll worry about DDL when we get there.  It won't be any time soon.  I
would strongly recommend that you concentrate on building an SELinux
module using the hook function that Robert wrote or none of this is
going to end up going anywhere.  If and when we find other places which
handle DML and need adjustment, we can identify how to handle them at
that time.

Hopefully by the time we're comfortable with DML, some of the DDL
permissions checking rework will have been done and how to move forward
with allowing external security modules to handle that will be clear.

Thanks,

Stephen




Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 The subtle point here is whether you consider the view from the outside (in 
 the sense of what a read-only transaction started at an arbitrary time can or 
 cannot observe), or from the inside (what updating transactions can observe 
 and might base their updates on).

 The former case is completely determined by the commit ordering of the 
 transactions, while the latter is not - otherwise serializability wouldn't be 
 such a hard problem.

BTW, doesn't all this logic fall in a heap as soon as you consider
read-committed transactions?

regards, tom lane



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Stephen Frost
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 We have two options: if the checker function takes a list of RangeTblEntry,
 it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
 if the checker function takes the arguments in my patch, it will be convenient
 for DoCopy(), but not for ExecCheckRTPerms().
 
 In my patch it takes six arguments, but all of them can be referenced from
 the given RangeTblEntry. On the other hand, if DoCopy() has to set up
 a pseudo RangeTblEntry to call the checker function, it needs to set up
 a similar or slightly larger number of variables.

I don't know that it's really all that difficult to set up an RT in
DoCopy or RI_Initial_Check().  In my opinion, those are the strange or
corner cases- not the Executor code, through which all 'regular' DML is
done.  It makes me wonder if COPY shouldn't have been implemented using
the Executor instead, but that's, again, a completely separate topic.
It wasn't, but it wants to play like it operates in the same kind of way
as INSERT, so it needs to pick up the slack.

 As I replied in the earlier message, it may be an idea to rename and change
 the definition of ExecCheckRTEPerms() without moving it anywhere.

And, again, I don't see that as a good idea at all.

  * RI_Initial_Check()
 
 It seems to me the permission checks in RI_Initial_Check() are a bit ad hoc.

I agree with this- my proposal would address this in a way which would be
guaranteed to be consistent: by using the same code path to do both
checks.  I'm still not thrilled with how RI_Initial_Check() works, but
rewriting that isn't part of this.

 In this case, we should execute the secondary query without permission checks,
 because the permissions for ALTER TABLE are already checked, and we can trust
 that the given query is harmless.

I dislike the idea of providing a new SPI interface (on the face of
it), and also dislike the idea of having a 'skip all permissions
checking' option for anything which resembles SPI.  I would rather ask
the question of whether it really makes sense to use SPI to check FKs as
they're being added, but we're not going to solve that issue here.

Thanks,

Stephen




Re: [HACKERS] Synchronization levels in SR

2010-05-24 Thread Josh Berkus

 #4 is useful for some cases, but might often make the
 transaction commit on the master get stuck, since a read-only
 query can easily block recovery via lock conflicts. So
 #4 does not seem worth working on until that HS problem
 has been addressed. Thoughts?

I agree that #4 should be done last, but it will be needed, not in the
least by your employer ;-) .  I don't see any obvious way to make #4
compatible with any significant query load on the slave, but in general
I'd think that users of #4 are far more concerned with 0% data loss than
they are with getting the slave to run read queries.

 Second, we need to discuss about how to specify the synch
 level. There are three approaches:
 
 * Per standby
 
 * Per transaction

Ach, I'm torn.  I can see strong use cases for both of the above.
Really, I think:

 * Mix
   Allow users to specify the level per standby and
   transaction at the same time, and then calculate the real
   level from them by using some algorithm.

What we should do is specify it per-standby, and then have a USERSET GUC
on the master which specifies which transactions will be synched, and
those will be synched only on the slaves which are set up to support
synch.  That is, if you have:

Master
Slave #1: synch
Slave #2: not synch
Slave #3: not synch

And you have:
Session #1: synch
Session #2: not synch

Session #1 will be synched on Slave #1 before commit.  Nothing will be
synched on Slaves 2 and 3, and session #2 will not wait for synch on any
slave.

I think this model delivers the maximum HA flexibility to users while
still making intuitive logical sense.
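
In configuration terms, a sketch of the model Josh describes (the parameter
names are illustrative assumptions, not existing GUCs):

```
# master: name the standbys that synchronous commits must wait for
synchronous_standbys = 'slave1'      # slaves 2 and 3 remain asynchronous

# per session (a USERSET GUC), choose whether this transaction waits:
SET synchronous_commit = on;    -- session #1: waits for slave1 at commit
SET synchronous_commit = off;   -- session #2: waits for no one
```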

 Which should we adopt for 9.1? I'd like to implement the
 per-standby approach at first since it's simple and seems
 to cover more use cases. Thought?

If people agree that the above is our roadmap, implementing
per-standby first makes sense, and then we can implement per-session
GUC later.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] Clearing psql's input buffer after auto-reconnect

2010-05-24 Thread Tom Lane
We determined that $SUBJECT would be a good idea in this thread:
http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php

I looked a bit at what it would take to make this happen.  The
difficulty is that the input buffer is a local variable in MainLoop(),
and so are a bunch of other subsidiary variables that would need to be
reset along with it.  The place where auto-reconnect presently happens
is CheckConnection(), which is in a different file and is also several
levels of subroutine call away from MainLoop.  AFAICS there are three
ways that we might attack this:

1. Massive restructuring of the code in common.c so that the fact of
a connection reset having happened can be returned back to MainLoop.

2. Export much of MainLoop's internal state as globals, so that
CheckConnection can hack on it directly.

3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
the connection, to force control to go back to MainLoop directly.
MainLoop is already coded to clear its local state after catching a
longjmp.

Now #1 might be the best long-term solution but I have no particular
appetite to tackle it, and #2 is just too ugly to contemplate.  That
leaves #3, which is a bit ugly in its own right but seems like the best
fix we're likely to get.

Comments, better ideas?

regards, tom lane



Re: [HACKERS] Synchronization levels in SR

2010-05-24 Thread Fujii Masao
On Tue, May 25, 2010 at 1:18 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I see a lot of value in #4; it makes it possible to distribute read-only
 load to the standby using something like pgbouncer, completely transparently
 to the application.

Agreed.

 In the lesser modes, the application can see slightly
 stale results.

Yes

BTW, even if we get #4, we would need to be careful that we
might see not-yet-acknowledged results from the standby. That is,
a transaction's commit might become visible on the standby before
the master returns success to the client. I think that we will
never get completely transaction-consistent results from the
standby until we have implemented the snapshot cloning feature.
http://wiki.postgresql.org/wiki/ClusterFeatures#Export_snapshots_to_other_sessions

 But whatever we can easily implement, really. Pick one that you think is the
 easiest and start with that, but keep the other modes in mind in the design
 and in the user interface so that you don't paint yourself in the corner.

Yep, the design and implementation for #2 and #3 should be
easily extensible to #4. I'll keep that in mind.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Regression testing for psql

2010-05-24 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 Add regression testing for psql backslash commands
 
 This patch adds rather extensive regression testing
 of the psql backslash commands.  Hopefully this will
 minimize issues such as the one which cropped up
 recently with \h segfaulting.  Note that we don't
 currently explicit check all the \h options and that
 pretty much any catalog changes will mean that this
 needs to also be updated.  Still, it's a start, we can
 reduce the set of tests if that makes sense or they
 become a problem.

And..  it's way too big to send to the list.  The patch is available
here:

http://snowman.net/~sfrost/psql-regress-help.patch

Of course, if people want to suggest tests that just shouldn't be
included, I can go through and strip things out.

Thanks,

Stephen




Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread KaiGai Kohei
(2010/05/25 10:13), Stephen Frost wrote:
 KaiGai,
 
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
ALTER TABLE
postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
ALTER TABLE
postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
REVOKE
postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
GRANT

 At that time, the 'ymj' has ownership and REFERENCES permissions on
 both of pk_tbl and fk_tbl. In this case, RI_Initial_Check() shall return
 and the fallback-seqscan will run. But,
 
 ymj may be considered an 'owner' on that table, but in this case, it
 doesn't have SELECT rights on it.  Now, you might argue that we should
 assume that the owner has SELECT rights (since they're granted by
 default), even if they've been revoked, but that's a whole separate
 issue.

Yes, it is an entirely separate issue. I don't intend to argue whether
we can assume that the default PG permissions allow the owner to SELECT
on the table, or not.

postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
ERROR:  permission denied for relation pk_tbl
CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."pk_tbl" x WHERE "a" 
 OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
 
 I think you've got another issue here that's not related.  Perhaps
 something wrong with a patch you've applied?  Otherwise, what version of
 PG is this?  Using 8.2, 8.3, 8.4 and a recent git checkout, I get:
 
 postgres=# CREATE USER ymj;
 CREATE ROLE
 postgres=# CREATE TABLE pk_tbl (a int primary key, b text);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index pk_tbl_pkey 
 for table pk_tbl
 CREATE TABLE
 postgres=# CREATE TABLE fk_tbl (x int, y text);
 CREATE TABLE
 postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
 ALTER TABLE
 postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
 ALTER TABLE
 postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
 REVOKE
 postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
 GRANT
 postgres=# SET ROLE ymj;
 SET
 postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
 ALTER TABLE
 postgres=>

Sorry, I forgot to copy & paste the INSERT statements just after CREATE TABLE.

The secondary RI_FKey_check_ins() is invoked inside the while() loop driven by
heap_getnext(), so it is not called for an empty table.

For completeness, the full sequence is:

  postgres=# CREATE USER ymj;
  CREATE ROLE
  postgres=# CREATE TABLE pk_tbl (a int primary key, b text);
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index pk_tbl_pkey 
for table pk_tbl
  CREATE TABLE
  postgres=# CREATE TABLE fk_tbl (x int, y text);
  CREATE TABLE
| postgres=# INSERT INTO pk_tbl VALUES (1,'aaa'), (2,'bbb'), (3,'ccc');
| INSERT 0 3
| postgres=# INSERT INTO fk_tbl VALUES (1,'xxx'), (2,'yyy'), (3,'zzz');
| INSERT 0 3
  postgres=# ALTER TABLE pk_tbl OWNER TO ymj;
  ALTER TABLE
  postgres=# ALTER TABLE fk_tbl OWNER TO ymj;
  ALTER TABLE
  postgres=# REVOKE ALL ON pk_tbl, fk_tbl FROM ymj;
  REVOKE
  postgres=# GRANT REFERENCES ON pk_tbl, fk_tbl TO ymj;
  GRANT
  postgres=# SET ROLE ymj;
  SET
  postgres=> ALTER TABLE fk_tbl ADD FOREIGN KEY (x) REFERENCES pk_tbl (a);
  ERROR:  permission denied for relation pk_tbl
  CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."pk_tbl" x WHERE "a" 
OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"

I could reproduce it on the 8.4.4, but didn't try on the prior releases.

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread KaiGai Kohei
(2010/05/25 10:27), Stephen Frost wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 We have two options: if the checker function takes a list of RangeTblEntry,
 it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
 if the checker function takes the arguments in my patch, it will be convenient
 for DoCopy(), but not for ExecCheckRTPerms().
 
 In my patch it takes six arguments, but all of them can be referenced from
 the given RangeTblEntry. On the other hand, if DoCopy() has to set up
 a pseudo RangeTblEntry to call the checker function, it needs to set up
 a similar or slightly larger number of variables.
 
 I don't know that it's really all that difficult to set up an RT in
 DoCopy or RI_Initial_Check().  In my opinion, those are the strange or
 corner cases- not the Executor code, through which all 'regular' DML is
 done.  It makes me wonder if COPY shouldn't have been implemented using
 the Executor instead, but that's, again, a completely separate topic.
 It wasn't, but it wants to play like it operates in the same kind of way
 as INSERT, so it needs to pick up the slack.
 

Yes, it is not difficult to set up.
The reason I preferred a checker function taking six arguments is that
DoCopy() / RI_Initial_Check() would have to set up a RangeTblEntry in
addition to the bitmapsets, but there is no other significant reason.

OK, let's add a hook in ExecCheckRTPerms().

 * RI_Initial_Check()

 It seems to me the permission checks in RI_Initial_Check() are a bit ad hoc.
 
 I agree with this- my proposal would address this in a way whih would be
 guaranteed to be consistant: by using the same code path to do both
 checks.  I'm still not thrilled with how RI_Initial_Check() works, but
 rewriting that isn't part of this.

I agree to ignore the problem for now.
It implicitly assumes the owner has SELECT privilege on the FK/PK tables,
so the minimal SELinux module will also implicitly assume the client has
similar permissions on them.

 In this case, we should execute the secondary query without permission checks,
 because the permissions for ALTER TABLE are already checked, and we can trust
 that the given query is harmless.
 
 I dislike the idea of providing a new SPI interfance (on the face of
 it), and also dislike the idea of having a skip all permissions
 checking option for anything which resembles SPI.  I would rather ask
 the question of if it really makes sense to use SPI to check FKs as
 they're being added, but we're not going to solve that issue here.

Apart from the topic of this thread, I guess using SPI allows us to utilize
the query optimizer and cascaded triggers to implement FK constraints
with minimal code.

Thanks
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Robert Haas
2010/5/24 KaiGai Kohei kai...@ak.jp.nec.com:
 I think we need a new SPI_*() interface which allows running the given query
 without any permission checks, because these queries are purely internal,
 so we can trust that the query is harmless.
[...]
 I'm afraid that src/backend/catalog/aclchk.c will become overcrowded
 in the future. If it is ugly to deploy checker functions in separate
 dirs/files, one idea is to put them in execMain.c, instead of
 ExecCheckRTEPerms().

Both of these are bad ideas, for reasons Stephen Frost has articulated well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 9:27 PM, Stephen Frost sfr...@snowman.net wrote:
 * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
 We have two options: if the checker function takes a list of RangeTblEntry,
 it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely,
 if the checker function takes the arguments in my patch, it will be convenient
 for DoCopy(), but not for ExecCheckRTPerms().
 
 In my patch it takes six arguments, but all of them can be referenced from
 the given RangeTblEntry. On the other hand, if DoCopy() has to set up
 a pseudo RangeTblEntry to call the checker function, it needs to set up
 a similar or slightly larger number of variables.

 I don't know that it's really all that difficult to set up an RT in
 DoCopy or RI_Initial_Check().  In my opinion, those are the strange or
 corner cases- not the Executor code, through which all 'regular' DML is
 done.  It makes me wonder if COPY shouldn't have been implemented using
 the Executor instead, but that's, again, a completely separate topic.
 It wasn't, but it wants to play like it operates in the same kind of way
 as INSERT, so it needs to pick up the slack.

I think this approach is definitely worth investigating.  KaiGai, can
you please work up what the patch would look like if we do it this
way?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:
 What is your point?

My point is that I think Stefan has a good point when he says this:

    hmm that seems better thanks, however I just noticed that we don't have
    a general limitations section. The way the docs are now done suggests
    that there are no limitations at all (except for the two warnings in
    the migration guide). Is pg_upgrade really up to the point where it can
    fully replace pg_dump & pg_restore independent of the loaded (contrib)
    or even third-party modules (like postgis or custom datatypes etc)?

I think he is quite right to be concerned about these issues and if
the limitations in this area are not well-documented so that users can
easily be aware of them, then IMHO that is something we should
correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-24 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 ... It makes me wonder if COPY shouldn't have been implemented using
 the Executor instead, but that's, again, a completely separate topic.
 It wasn't, but it wants to play like it operates in the same kind of way
 as INSERT, so it needs to pick up the slack.

FWIW, we've shifted COPY more towards using executor support over the
years.  I'm pretty sure that it didn't originally use the executor's
index-entry-insertion infrastructure, for instance.

Building an RT entry seems like a perfectly sane thing to do in order
to make it use the executor's permissions infrastructure.
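
A rough sketch of how DoCopy() might build such an entry (field names taken
from parsenodes.h; the details are an assumption for illustration, not the
actual patch):

```c
/* Inside DoCopy(), after the target relation has been opened (sketch). */
RangeTblEntry *rte;
ListCell   *cur;

rte = makeNode(RangeTblEntry);
rte->rtekind = RTE_RELATION;
rte->relid = RelationGetRelid(rel);
rte->requiredPerms = is_from ? ACL_INSERT : ACL_SELECT;

/* For a column list, record the attributes actually touched
 * (COPY FROM would presumably use the modified-columns set instead). */
foreach(cur, attnums)
{
    int attno = lfirst_int(cur) - FirstLowInvalidHeapAttributeNumber;

    rte->selectedCols = bms_add_member(rte->selectedCols, attno);
}

/* Reuse the executor's permission machinery (and any hook behind it). */
ExecCheckRTPerms(list_make1(rte), true);
```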

regards, tom lane



Re: [HACKERS] (9.1) btree_gist support for searching on not equals

2010-05-24 Thread Jeff Davis
On Sat, 2010-05-22 at 01:02 +0300, Marko Tiikkaja wrote:
 On 5/21/10 11:47 PM +0300, Jeff Davis wrote:
  It also allows you to enforce the constraint that only one tuple exists
  in a table by doing something like:
 
 create table a
 (
   i int,
   exclude using gist (i with <>),
   unique (i)
 );
 
 FWIW, this is achievable a lot more easily:
 CREATE UNIQUE INDEX a_single_row ON a ((1));
 

Yes, you're right. Also, neither of us accounted for NULLs, so I suppose
a NOT NULL is necessary as well.

I think the original case (same values only) is potentially useful
enough that we should support it.

Regards,
Jeff Davis




Re: [HACKERS] pg_upgrade docs

2010-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 24, 2010 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:
  What is your point?
 
 My point is that I think Stefan has a good point when he says this:
 
    hmm that seems better thanks, however I just noticed that we don't have
    a general limitations section. The way the docs are now done suggests
    that there are no limitations at all (except for the two warnings in
    the migration guide). Is pg_upgrade really up to the point where it can
    fully replace pg_dump & pg_restore independent of the loaded (contrib)
    or even third-party modules (like postgis or custom datatypes etc)?
 
 I think he is quite right to be concerned about these issues and if
 the limitations in this area are not well-documented so that users can
 easily be aware of them, then IMHO that is something we should
 correct.

Have you read the docs?  It does mention the issue with /contrib and
stuff.  How do I document a limitation I don't know about?  This is all
very vague.  Please suggest some wording.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regression testing for psql

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 10:51 PM, Stephen Frost sfr...@snowman.net wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
     Add regression testing for psql backslash commands

     This patch adds rather extensive regression testing
     of the psql backslash commands.  Hopefully this will
     minimize issues such as the one which cropped up
     recently with \h segfaulting.  Hopefully this will
     minimize issues such as the one which cropped up
     recently with \h segfaulting.  Note that we don't
     currently explicitly check all the \h options, and
     pretty much any catalog change will mean that this
     needs to be updated as well.  Still, it's a start; we
     can reduce the set of tests if that makes sense or
     they become a problem.

 And..  it's way too big to send to the list.  The patch is available
 here:

 http://snowman.net/~sfrost/psql-regress-help.patch

 Of course, if people want to suggest tests that just shouldn't be
 included, I can go through and strip things out.

Well...  I'm a little reluctant to believe that we should have 3.3M of
tests for the entire backend and 5M of tests just for psql.  Then,
too, there's the fact that many of these tests fail on my machine
because my username is not sfrost, and/or because of row-ordering
differences on backslash commands without enough ORDER BY to fully
determine the output order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] (9.1) btree_gist support for searching on not equals

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 11:25 PM, Jeff Davis pg...@j-davis.com wrote:
 I think the original case (same values only) is potentially useful
 enough that we should support it.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronization levels in SR

2010-05-24 Thread Fujii Masao
On Tue, May 25, 2010 at 10:29 AM, Josh Berkus j...@agliodbs.com wrote:
 I agree that #4 should be done last, but it will be needed, not least
 by your employer ;-) .  I don't see any obvious way to make #4
 compatible with any significant query load on the slave, but in general
 I'd think that users of #4 are far more concerned with 0% data loss than
 they are with getting the slave to run read queries.

Since #2 and #3 are enough for 0% data loss, I think that such users
would be more concerned about what results are visible in the standby.
No?

 What we should do is specify it per-standby, and then have a USERSET GUC
 on the master which specifies which transactions will be synched, and
 those will be synched only on the slaves which are set up to support
 synch.  That is, if you have:

 Master
 Slave #1: synch
 Slave #2: not synch
 Slave #3: not synch

 And you have:
 Session #1: synch
 Session #2: not synch

 Session #1 will be synched on Slave #1 before commit.  Nothing will be
 synched on Slaves 2 and 3, and session #2 will not wait for synch on any
 slave.

 I think this model delivers the maximum HA flexibility to users while
 still making intuitive logical sense.

This makes sense.

Since a simple boolean GUC flag is much easier to implement than full
per-transaction levels (which would have four valid values: #1, #2,
#3 and #4), I'll do that.
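A rough sketch of how the model above could look in configuration
terms (every parameter name here is hypothetical, nothing is settled
syntax):

```
# master's postgresql.conf -- hypothetical parameter names
standby_slave1_mode = 'synch'    # Slave #1 replicates synchronously
standby_slave2_mode = 'async'    # Slaves #2 and #3 do not
standby_slave3_mode = 'async'

# per-session USERSET GUC on the master:
#   Session #1:  SET synchronous_replication = on;
#                -- COMMIT waits until Slave #1 has synched
#   Session #2:  SET synchronous_replication = off;
#                -- COMMIT returns without waiting on any slave
```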

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] unnailing shared relations (was Re: global temporary tables)

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 5:37 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Excerpts from Robert Haas's message of Mon May 24 17:18:21 -0400 2010:
 On Mon, May 24, 2010 at 4:23 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
  Excerpts from Robert Haas's message of Fri May 21 10:20:38 -0400 2010:

  Uh, how does this work when you change the entries for shared relations
  in a database-specific pg_class?  Keeping everything in sync seems hard,
  if not impossible.

 Well, I might be missing something here, but pg_class already IS
 database-specific.  If you change anything very significant about a
 shared rel in one copy of pg_class today, you're toast, IIUC.  This
 proposal doesn't make that any better, but I don't think it makes it
 any worse either.

 I thought the whole point of this exercise was precisely to avoid this
 sort of problem.

Short answer: Nope.

Long answer: It would be nice to do that, but in order to accomplish
that we would need to create pg_shared_foo for all relevant pg_foo
and teach the backend code to check both tables in every case.  That
seemed hard, so I suggested just duplicating the entries, thereby
giving processes like the autovacuum launcher the ability to look at
any shared relation without it needing to be nailed, but not actually
solving the whole problem.

It may be a bad idea.  It was just a thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] unnailing shared relations (was Re: global temporary tables)

2010-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 24, 2010 at 5:37 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Excerpts from Robert Haas's message of Mon May 24 17:18:21 -0400 2010:
 Well, I might be missing something here, but pg_class already IS
 database-specific.  If you change anything very significant about a
 shared rel in one copy of pg_class today, you're toast, IIUC.  This
 proposal doesn't make that any better, but I don't think it makes it
 any worse either.
 
 I thought the whole point of this exercise was precisely to avoid this
 sort of problem.

 Short answer: Nope.

In practice, it's very difficult to change anything about a system
catalog anyway, because so many of its properties are baked into the
behavior of the C code.  Whether there's a single copy of the catalog
rows is the least of your worries there.

 Long answer: It would be nice to do that, but in order to accomplish
 that we would need to create pg_shared_foo for all relevant pg_foo
 and teach the backend code to check both tables in every case.  That
 seemed hard, so I suggested just duplicating the entries, thereby
 giving processes like the autovacuum launcher the ability to look at
 any shared relation without it needing to be nailed, but not actually
 solving the whole problem.

I hadn't been paying that much attention to this thread, but it's
sounding to me like it's based on a false premise.  Having a shared copy
of the catalog entries for a shared catalog would accomplish little or
nothing in terms of eliminating nailed relcache entries.  You might be
able to de-nail pg_database and friends, but only at the cost of instead
nailing up entries for pg_shared_class and friends.  Which seems to me
like a net step backwards in terms of the maintenance overhead for
relcache entries.

If we really cared about not nailing these (and I'm not aware of any
evidence that we should care), it would probably be more useful to try
to institute an intermediate level of nailing (stapling?
scotch-taping?) that locked the entry into memory only until we'd
finished bootstrapping the backend.

regards, tom lane



[HACKERS] ROLLBACK TO SAVEPOINT

2010-05-24 Thread Sam Vilain
The note at the end of

http://www.postgresql.org/docs/8.4/static/sql-savepoint.html

led me to believe that if you roll back to the same savepoint name
twice in a row, you might start walking back through the savepoints.
I guess I missed the note on ROLLBACK TO SAVEPOINT explaining that
that is not how it works.

Here is the section:

SQL requires a savepoint to be destroyed automatically when another
savepoint with the same name is established. In PostgreSQL, the old
savepoint is kept, though only the more recent one will be used when
rolling back or releasing. (Releasing the newer savepoint will cause the
older one to again become accessible to ROLLBACK TO SAVEPOINT and
RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.

I think it could be improved by also communicating:

Rollback to a savepoint never releases it; you can safely repeat
ROLLBACK TO SAVEPOINT statements without unwinding the transaction, even
if you are re-using savepoint names.
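
For example (a quick sketch; the table name t is just for
illustration):

```sql
BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT sp;
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT sp;  -- undoes the INSERT of 2; sp still exists
INSERT INTO t VALUES (3);
ROLLBACK TO SAVEPOINT sp;  -- undoes the INSERT of 3; repeating is safe
SAVEPOINT sp;              -- same name: the old sp is kept, the new one shadows it
RELEASE SAVEPOINT sp;      -- destroys only the newer sp ...
ROLLBACK TO SAVEPOINT sp;  -- ... so this rolls back to the older sp
COMMIT;                    -- t ends up containing only the row (1)
```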

Well, maybe no-one else will ever have the misconception I did, but
there it is.

Sam.



Re: [HACKERS] Synchronization levels in SR

2010-05-24 Thread Fujii Masao
On Mon, May 24, 2010 at 10:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 At the first design phase, I'd like to clarify which synch levels
 should be supported in 9.1 and how they should be specified by users.

There is another question about synch level:

When should the master wait for replication?

In my current design, the backend waits for replication only at
the end of the transaction commit. Is this enough? Are there other
waiting points?

For example, should smart or fast shutdown on the master wait for the
shutdown checkpoint record to be replicated to the standby (btw, in
9.0, shutdown only waits for the checkpoint record to be *sent*)?
Should pg_switch_xlog() wait for the whole of the original WAL file
to be replicated?

I'm not sure whether these two wait-for-replication points have real
use cases, so I'm inclined to think they are not worth implementing,
but..

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
