date:20100526

[HACKERS] [RFC] Security label support

2010-05-26 Thread KaiGai Kohei

As we talked at the developer meeting on Ottawa, it needs to provide
a capability to assign a short text identifier on database objects
to support label based ESP (such as SELinux).
So, I'd like to propose a few approaches to support security label
as a draft of discussion.

An example of label: "system_u:object_r:sepgsql_ro_table_t:s0".

The format/contains/meanings of the security label shall be parsed
and validated by ESP module, so all we need to do is associate such a
short text on a certain database. It is quite similar to COMMENT ON.

I don't want to support multiple labels of an object in this stage,
because it makes ESP interfaces more complex and it is unclear whether
it is actually wanted. For example, OS does not support multiple MAC
features concurrently.

Here are a few idea to support security labels.
In this stage, I think the idea of [2] is most reasonable for us.
(Perhaps, I guess Stephen has same opinion, because the idea was
originally came from him.)


[1] Inject a text label field to every system catalog
-

This idea tries to add a new field to the schema of existing system
catalog.

Its implementation will be simple at first, however, it will be
entirely painful to modify every system catalog definitions and
(typically) Create() functions under the src/backend/commands/.

I doubt it is a correct way, even if short-term development.
It will be reasonabel just only conceptual development.


[2] Using OID as a key of text representation in separated catalog
--

This idea is similar to pg_description/pg_shdescription.
A new system catalog pg_seclabel and pg_shseclabel stores text form
of labels for pair of the relation-Id, object-Oid and object-Subid.
It does not damage to the schema of existing system catalog,

It adds two new system catalogs; pg_seclabel (local) and pg_shseclabel (shared).
The catalogs shall be declared as follows:

  CATALOG(pg_seclabel, 3037) BKI_WITHOUT_OIDS
  {
  Oid relid;  /* OID of the catalog containing the object */
  Oid objid;  /* OID of the object itself */
  int4subid;  /* column number, or 0 if unused */
  textlabel;  /* text form of security label */
  } FormData_pg_seclabel;

We also add a dependency between the labeled object and the security
label itself. It also enables to clean up orphan labels automatically,
without any new invention.

The related code will be stored in src/backend/catalog/pg_seclabel.c.
It provides an internal interface to assign a security label on
a certain database object when creation or relabeling.

However, it also has a limitation from the viewpoint of long-term.
>From the definition, OID of database objects are unique. So, we cannot
share text form of labels even if massive number of database objects
have an identical security label; it can lead waste of storage consumption
because of the duplicated security labels. So, this idea shall be switched
to the [3] when we support row-level security with ESP.
But I think the idea [2] is reasonable in short-term development.


[3] Using security-Id as a key of text representation in separated catalog
--

This idea is a derivation from the idea of [2].
It also stores text form of labels into pg_seclabel/pg_shseclabel, but it
shall be identified with a pair of relation-Id and security-Id which is
newly supported.

The security-Id shall be stored within padding area of HeapTupleHeader like
object-Id. But, unlike object-Id, it does not need to be unique for each tuples.
It allows multiple tuples has same security-Id that is related to a certain
text form of security label. It means we can reduce waste of storage due to
the duplicated labels in text (Note, massive number of objects tend to share
a limited number of labels in general).

So, this approach has advantage toward the idea of [2], however, it needs more
code to be implemented/reviewed than [2], such as management of security-Id,
reclaim of orphan labels and so on.
Therefore, it is not feasible at the statring-up stage, as long as row-level
security with ESP is not available.


* SQL Statement
---

It also need to provide SQL statement to manage security label of the database
object. I plan the following statement to change the security label.

  ALTER xxx  SECURITY LABEL TO 'label';

  (For columns)
  ALTER TABLE  ALTER  SECURITY LABEL TO 'label';

The 'xxx' part is replaced by an object class, such as TABLE, SCHEMA and so on.

When the ALTER command is executed, ESP module validate the given label,
in addition to permission checks to relabel it.
If no ESP module is available, the ALTER always raises a feature-not-supported
error.

  Example)
  ALTER TABLE t1 SECURITY LABEL TO 'system_u:object_r:sepgsql_ro_table_t:s0';

  ALTER SCHEMA kaigai SECURITY LABEL TO 'us

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Pavel Stehule

2010/5/27 Robert Haas :
> On Wed, May 26, 2010 at 9:28 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Wed, May 26, 2010 at 8:21 PM, Tom Lane  wrote:
 If we go with the spec's syntax I think we'd have no realistic choice
 except to forbid => altogether as an operator name.  (And no, I'm not
 for that.)
>>
>>> I suppose the most painful thing about doing that is that it would
>>> break hstore.  Are there other commonly-used modules that rely on =>
>>> as an operator name?
>>
>> There don't seem to be any other contrib modules that define => as an
>> operator name, but I'm not sure what's out there on pgfoundry or
>> elsewhere.  The bigger issue to me is not so much hstore itself as that
>> this is an awfully attractive operator name for anything container-ish.
>> Wasn't the JSON-datatype proposal using => for an operator at one stage?
>> (The current wiki page for it doesn't seem to reflect any such idea,
>> though.)  And I think I remember Oleg & Teodor proposing such an
>> operator in conjunction with some GIN-related idea or other.
>>
>>> In spite of the difficulties, I'm reluctant to give up on it.  I
>>> always thought that the "AS" syntax was a crock and I'm not eager to
>>> invent another crock to replace it.  Being compatible with the SQL
>>> standard and with Oracle is not to be taken lightly.
>>
>> Yeah, I know.  Though this could end up being one of the bits of the
>> spec that we politely decline to follow, like upper-casing identifiers.
>> Still, it's a good idea to think again before we've set the release
>> in stone ...
>
> Perhaps one idea would be to:
>
> 1. Invent a new crock for now.
> 2. Add a duplicate version of the hstore => operator with a different name.
> 3. Emit a warning whenever an operator called => is created.
> 4. Document that beginning in PG 9.1, we will no longer support => as
> an operator name.

+1

Pavel

>
> That's still going to cause a fair amount of pain, but certainly less
> if we decide it now rather than later.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Pavel Stehule

2010/5/27 Tom Lane :
> Robert Haas  writes:
>> On Wed, May 26, 2010 at 8:21 PM, Tom Lane  wrote:
>>> If we go with the spec's syntax I think we'd have no realistic choice
>>> except to forbid => altogether as an operator name.  (And no, I'm not
>>> for that.)
>
>> I suppose the most painful thing about doing that is that it would
>> break hstore.  Are there other commonly-used modules that rely on =>
>> as an operator name?
>
> There don't seem to be any other contrib modules that define => as an
> operator name, but I'm not sure what's out there on pgfoundry or
> elsewhere.  The bigger issue to me is not so much hstore itself as that
> this is an awfully attractive operator name for anything container-ish.
> Wasn't the JSON-datatype proposal using => for an operator at one stage?
> (The current wiki page for it doesn't seem to reflect any such idea,
> though.)  And I think I remember Oleg & Teodor proposing such an
> operator in conjunction with some GIN-related idea or other.
>
>> In spite of the difficulties, I'm reluctant to give up on it.  I
>> always thought that the "AS" syntax was a crock and I'm not eager to
>> invent another crock to replace it.  Being compatible with the SQL
>> standard and with Oracle is not to be taken lightly.
>
> Yeah, I know.  Though this could end up being one of the bits of the
> spec that we politely decline to follow, like upper-casing identifiers.
> Still, it's a good idea to think again before we've set the release
> in stone ...

we have a last minutes for decision. any other change will need years
- like 'standard strings'. I agree so it's not good time for change.
But this change is a few lines in parser.

Regards
Pavel


>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Simon Riggs

On Thu, 2010-05-27 at 02:18 +0300, Heikki Linnakangas wrote:
> On 27/05/10 01:23, Simon Riggs wrote:
> > On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote:
> >> On 26/05/10 23:31, Dimitri Fontaine wrote:
> >>>d. choice of commit or rollback at timeout
> >>
> >> Rollback is not an option. There is no going back after the commit
> >> record has been flushed to disk or sent to a standby.
> >
> > There's definitely no going back after the xid has been removed from
> > procarray because other transactions will then depend upon the final
> > state. Currently we PANIC if we abort after we've marked clog, though
> > that happens after XLogFlush(), which is where we're planning to wait
> > for synch rep. If we abort after having written a commit record to disk
> > we can still successfully generate an abort record as well. (Luckily, I
> > note HS does actually cope with that. Phew!)
> >
> > So actually, an abort is a reasonable possibility, though I know it
> > doesn't sound like it could be at first thought.

> Hmm, that's an interesting thought. Interesting, as in crazy ;-).

:-) It's a surprising thought for me also.

> I don't understand how HS could handle that. As soon as it sees the 
> commit record, the transaction becomes visible to readers.

I meant not-barf completely.

> >> The choice is to either commit anyway after the timeout, or wait forever.
> >
> > Hmm, wait forever. What happens if we try to shutdown fast while there
> > is a transaction that is waiting forever? Is that then a commit, even
> > though it never made it to the standby? How would we know it was safe to
> > switchover or not? Hmm.
> 
> Refuse to shut down until the standby acknowledges the commit. That's 
> the only way to be sure..
> 
> In practice, hard synchronous "don't return ever until the commit hits 
> the standby" behavior is rarely what admins actually want, because it's 
> disastrous from an availability point of view. More likely, admins want 
> "wait for ack from standby, unless it's not responding, in which case to 
> hell with redundancy and just act like a single server". It makes sense 
> if you just want to make sure that the standby doesn't return stale 
> results when it's working properly, and you're not worried about 
> durability but I'm not sure it's very sound otherwise.

Which is also crazy. If you're using synch rep its because you care
deeply about durability. Some people wish to treat the COMMIT as a
guarantee, not just a shrug.

I agree that don't-return-ever isn't something anyone will want.

What we need is a "COMMIT with ERROR" message!

Note that Oracle gives the options of COMMIT | SHUTDOWN at this point.
Shutdown is an implicit abort for the writing transaction...

At this point the primary thinks standby is no longer available. If we
have a split brain situation then we should be assuming we will STONITH
and shutdown the primary anyway. If we have more than one standby we can
stay up and probably shouldn't be sending an abort after a commit.

The trouble is *every* option is crazy from some perspective, so we must
consider them all, to see whether they are practical or impractical.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Pavel Stehule

2010/5/27 Heikki Linnakangas :
> On 27/05/10 02:09, alvherre wrote:
>>
>> Excerpts from Andrew Dunstan's message of mié may 26 18:52:33 -0400 2010:
>>
>>> I think we should fix it now.  Quick thought: maybe we could use FOR
>>> instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's
>>> mechanism for this is 'paramname =>  value', but I think that has
>>> problems because of our possibly use of =>  as an operator - otherwise
>>> that would be by far the best way to go.
>>
>> I think we were refraining from =>  because the standard didn't specify
>> this back then -- AFAIU this was introduced very recently.  But now that
>> it does, and that the syntax we're implementing conflicts with a
>> different feature, it seems wise to use the standard-mandated syntax.
>>
>> The problem with the =>  operator seems best resolved as not accepting
>> such an operator in a function parameter, which sucks but we don't seem
>> to have a choice.  Perhaps we could allow "=>" to resolve as the
>> operator for the case the user really needs to use it; or a
>> schema-qualified operator.
>
> AFAIU, the standard doesn't say anything about named parameters. Oracle uses
> =>, but as you said, that's ambiguous with the => operator.
>
> +1 for FOR.
>

I don't see any advantage of "FOR". We can change ir to support new
standard or don't change it.

Pavel

> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Pavel Stehule

>>
>
> I think we should fix it now.  Quick thought: maybe we could use FOR instead
> of AS: select myfunc(7 for a, 6 for b); IIRC the standard's mechanism for
> this is 'paramname => value', but I think that has problems because of our
> possibly use of => as an operator - otherwise that would be by far the best
> way to go.
>

What is advice of "FOR" instead "AS"?

it is exactly same.

Regards
Pavel

> cheers
>
> andrew
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-26 Thread Heikki Linnakangas


On 27/05/10 03:25, Florian Pflug wrote:

On May 27, 2010, at 0:58 , Heikki Linnakangas wrote:

On 26/05/10 02:00, Sam Vilain wrote:

Florian Pflug wrote:

On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:

Releasing the newer savepoint will cause the older one to again become 
accessible, as the doc says, but rolling back to a savepoint does not 
implicitly release it. You'll have to use RELEASE SAVEPOINT for that.


Ah, now I get it. Thanks.

Would changing "Releasing the newer savepoint will cause ... " to "Explicitly releasing the 
newer savepoint" or maybe even "Explicitly releasing the newer savepoint with RELEASE SAVEPOINT 
will cause ..." make things clearer?


Yes, probably - your misreading matches my misreading of it :-)


+1.


Patch that changes the wording to "Explicitly releasing the newer savepoint with 
RELEASE SAVEPOINT will cause ..." is attached.


Thanks, committed. I left out the "Explicitly", though, because as Sam 
pointed out the newer savepoint can also be implicitly released by 
rolling back to an earlier savepoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Simon Riggs

On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
> On Wed, May 26, 2010 at 10:20 PM, Simon Riggs  wrote:
> > On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
> >
> >> I guess that dropping the support of #3 doesn't reduce complexity
> >> since the code of #3 is almost the same as that of #2. Like
> >> walreceiver sends the ACK after receiving the WAL in #2 case, it has
> >> only to do the same thing after the WAL flush.
> >
> > Hmm, well the code for #3 is similar also to the code for #4. So if you
> > do #2, its easy to do #2, #3 and #4 together.
> 
> No. #4 requires the way of prompt communication between walreceiver and
> startup process, but #2 and #3 not. That is, in #4, walreceiver has to
> wake the startup process up as soon as it has flushed WAL. OTOH, the
> startup process has to wake walreceiver up as soon as it has replayed
> WAL, to request it to send the ACK to the master. In #2 and #3, the
> prompt communication from walreceiver to startup process, i.e., changing
> the poll loop in the startup process would also be useful for the data
> to be visible immediately on the standby. But it's not required.

You need to pass WAL promptly on primary from backend to WALSender.
Whatever mechanism you use can also be reused symmetrically on standby
to provide #4. So not a problem.

> > The comment is about whether having #3 makes sense from a user interface
> > perspective. It's easy to add options, but they must have useful
> > meaning.
> 
> #3 would be useful for people wanting further robustness. In #2,
> when simultaneous power failure on the master and the standby,
> and concurrent disk crash on the master happen, transaction whose
> "success" indicator has been returned to a client might be lost.
> #3 can avoid such a critical situation. This is one of reasons that
> DRBD supports "Protocol C", I think.

Which few people use it, or if they do its because DRBD didn't
originally support multiple standbys. Not worth emulating IMHO.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Jesper Krogh



On 27/05/2010, at 02.48, Robert Haas  wrote:


On Wed, May 26, 2010 at 8:06 PM, Tom Lane  wrote:

Josh Berkus  writes:

How does that get us out of reading and writing old pages, though?


Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to  
solve
your problem, because they cannot become set without having visited  
the

page.


Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.   Consider a table that is
regularly written but append-only.  Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.


Just a thought.  Wouldn't a All-visible bit also enable index only  
scans to some degree?


Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Add _PG_init to PL language handler documentation

2010-05-26 Thread Joshua Tolley

On Wed, May 26, 2010 at 03:47:25PM -0400, Robert Haas wrote:
> On Tue, May 25, 2010 at 4:34 AM, Jonathan Leto  wrote:
> > This tiny doc patch adds _PG_init to the skeleton example code for a
> > PL. The information is quite valuable to PL authors, who might miss it
> > when it is described in the shared library documentation.
> 
> I'm not sure it does much good to add it to the template without any
> explanation of what it does.  What might make more sense is to add a
> cross-reference to the section on dynamic loading, where this is
> documented.

+1. How about the attached (which, incidentally, tested successfully on my
box, because I've managed to achieve doc building nirvana through blindly
flailing about until it worked...)?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
diff --git a/doc/src/sgml/plhandler.sgml b/doc/src/sgml/plhandler.sgml
index b82a5da..1162bcf 100644
*** a/doc/src/sgml/plhandler.sgml
--- b/doc/src/sgml/plhandler.sgml
***
*** 95,101 
 
  
 
! This is a template for a procedural-language handler written in C:
  
  #include "postgres.h"
  #include "executor/spi.h"
--- 95,104 
 
  
 
! This is a template for a procedural-language handler written in C. The
! documentation on C-language functions found in 
! applies to language handlers as well as any other C functions, and may be
! helpful to understand some portions of this code.
  
  #include "postgres.h"
  #include "executor/spi.h"


signature.asc
Description: Digital signature

[HACKERS] pg_trgm

2010-05-26 Thread Tatsuo Ishii

Hi,

Anyone working on make contrib/pg_trgm mutibyte encoding aware? If
not, I'm interested in the work.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii

> My "stand-alone" means libSQL can be used from many modules
> without duplicated codes. For example, copy routines for raw
> parse trees should be in the DLL rather than in postgres.exe.
> 
> Then, we need to consider other products than pgpool. Who will
> use the dll? If pgpool is the only user, we might not allow to
> modify core codes only for one usecase. More research other than
> pgpool is required to decide the interface routines for libSQL.

If the user of the new API is only pgpool-II, I hadn't made the
propose in the first place. It's a waste of time and I would rather
keep on borrowing the parse code. I thought there were several people
who needed the API as well in the cluster meeting. If somebody who
made such a vote in the meeting is on the list, please express your
opinion for the API.

I'm not in the position of speaking for other products.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-26 Thread KaiGai Kohei

Stephen,

>> The 'failure' may make an impression of generic errors not only permission 
>> denied.
>> How about 'error_on_violation'?
> 
> Maybe 'ereport_on_violation'?  I dunno, guess one isn't really better
> than the other.  You need to go back and fix the comment though- you
> still say 'abort' there.

I have no preference between 'error_on_violation' and 'ereport_on_violation'.
OK, I fixed it.

>> BTW, I wonder whether acl.h is a correct place to explain about the hook,
>> although I added comments for the hook.
> 
> Guess I don't really see a problem putting the comments there.  By the
> way, have we got a place where we actually document the hooks we support
> somewhere in the official documentation..?  If so, that should certainly
> be updated too..

I could not find Executor hooks from doc/src/sgml using grep.
If so, it might be worth to list them on the wikipage.

>> I think we should add a series of explanation about ESP hooks in the internal
>> section of the documentation, when the number of hooks reaches a dozen for
>> example.
> 
> I believe the goal will be to avoid reaching a dozen hooks for this.

Maybe, all we need to hook on DML permissions is only this one.

> All-in-all, I'm pretty happy with these.  Couple minor places which
> could use some copy editing, but that's about it.
> 
> Next, we need to get the security label catalog and the grammar to
> support it implemented and then from that an SELinux module should
> be pretty easy to implement.  Based on the discussions at PGCon, Robert
> is working on the security label catalog and grammar.  The current plan
> is to have a catalog similar to pg_depend, to minimize impact to the
> rest of the backend and to those who aren't interested in using security
> labels.

Pg_depend? not pg_description/pg_shdescription?

I basically agree with the idea that minimizes damages to the existing schema
of system catalogs, but I cannot imagine something like pg_depend well.

I'd like to post a new thread to discuss the security label support. OK?

> Of course, there will also need to be hooks there for an
> external module to enforce restrictions associated with changing labels
> on various objects in the system.

Yes, the user given has to be validated by ESP.

Thanks,
-- 
KaiGai Kohei 
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
***
*** 135,140  static AclMode pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnu
--- 135,144 
  
  static AclResult check_rte_privileges(RangeTblEntry *rte, bool ereport_on_violation);
  
+ /*
+  * External security provider hooks
+  */
+ check_relation_privileges_hook_type check_relation_privileges_hook = NULL;
  
  #ifdef ACLDEBUG
  static void
***
*** 4730,4735  get_user_default_acl(GrantObjectType objtype, Oid ownerId, Oid nsp_oid)
--- 4734,4742 
   *	It checks access permissions for all relations listed in a range table.
   *	If violated, it raises an error or returns false depending on the 'abort'
   *	argument.
+  *	It also invokes an external security provide to check the permissions.
+  *	If it is available, both of the default PG checks and external checks
+  *	have to allow the required accesses for the relations.
   */
  AclResult
  check_relation_privileges(List *rangeTable, bool ereport_on_violation)
***
*** 4746,4751  check_relation_privileges(List *rangeTable, bool ereport_on_violation)
--- 4753,4763 
  		if (retval != ACLCHECK_OK)
  			return retval;
  	}
+ 
+ 	/* External security provider invocation */
+ 	if (check_relation_privileges_hook)
+ 		retval = (*check_relation_privileges_hook)(rangeTable,
+    ereport_on_violation);
  	return retval;
  }
  
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***
*** 466,471  InitPlan(QueryDesc *queryDesc, int eflags)
--- 466,475 
  	 * check anything for other RTE types (function, join, subquery, ...).
  	 * Function RTEs are checked by init_fcache when the function is prepared
  	 * for execution. Join, subquery, and special RTEs need no checks.
+ 	 *
+ 	 * If available, it also invokes an external security provider. In this
+ 	 * case, both of the default PG checks and the external checks have to
+ 	 * allow the required accesses on the relation
  	 */
  	check_relation_privileges(rangeTable, true);
  
*** a/src/include/utils/acl.h
--- b/src/include/utils/acl.h
***
*** 196,201  typedef enum AclObjectKind
--- 196,216 
  	MAX_ACL_KIND/* MUST BE LAST */
  } AclObjectKind;
  
+ /*
+  * External security provider hooks
+  */
+ 
+ /*
+  * check_relation_privileges_hook
+  *	It allows an ESP to get control at check_relation_privileges().
+  *	A list of RangeTblEntry to be referenced and a flag to inform preferable
+  *	bahavior on access violations.
+  *	The ESP shall return ACLCHECK_OK if it allows the required access.
+  *	Elsewhere, it raises an error or return other AclResult statu

Re: [HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb

2010-05-26 Thread Tom Lane

Craig Ringer  writes:
> On 27/05/10 10:21, Tom Lane wrote:
>> What will happen as things stand is that all the certs get loaded
>> into a common pool.  That's not too horrible as long as there are
>> not actual conflicts, but it could mean that for example some
>> connections trust CA certs that the app programmer expected to only
>> be trusted for other connections.  I did arrange (and test) that the
>> client cert and key are local to each connection, but leakage of
>> trusted root certs is a different story.

>> We could avoid this problem if we were willing to set up a separate
>> SSL_context for each connection, but I'm not sure if it's worth that.
>> The scenario where a single application process is managing multiple
>> distinct sets of trusted certs seems a bit far-fetched anyway.

> OpenSSL really doesn't seem to be designed for multiple truly 
> independent SSL contexts. The SSL context stuff has clearly been hacked 
> on after the fact to a library that started out having only one global 
> state, and it's pretty incomplete. I'm honestly not sure it's worth 
> trying to allow per-connection trust going, especially as (AFAIK) 
> there's no evidence that anybody _wants_ per-client-connection SSL trust 
> anyway.

Precisely.  I'm not excited about doing anything about this in the near
term, or even the not-so-near term.  I just wanted to get the
information into the PG archives in case it ever does become
significant.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Takahiro Itagaki

Tatsuo Ishii  wrote:

> > The proposal will be acceptable only when all of the technical issues
> > are solved. The libSQL should also be available in stand-alone.
> > It should not be a collection of half-baked functions.
> 
> What do you mean by "should also be available in stand-alone"? If you
> want more abstract API than "libSQL", you could invent such a thing
> based on it as much as you like. IMO anything need to parse/operate
> the raw parse tree should be in libSQL.

My "stand-alone" means libSQL can be used from many modules
without duplicated codes. For example, copy routines for raw
parse trees should be in the DLL rather than in postgres.exe.

Then, we need to consider other products than pgpool. Who will
use the dll? If pgpool is the only user, we might not allow to
modify core codes only for one usecase. More research other than
pgpool is required to decide the interface routines for libSQL.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb

2010-05-26 Thread Craig Ringer


On 27/05/10 10:21, Tom Lane wrote:


What will happen as things stand is that all the certs get loaded
into a common pool.  That's not too horrible as long as there are
not actual conflicts, but it could mean that for example some
connections trust CA certs that the app programmer expected to only
be trusted for other connections.  I did arrange (and test) that the
client cert and key are local to each connection, but leakage of
trusted root certs is a different story.

>

We could avoid this problem if we were willing to set up a separate
SSL_context for each connection, but I'm not sure if it's worth that.
The scenario where a single application process is managing multiple
distinct sets of trusted certs seems a bit far-fetched anyway.


OpenSSL really doesn't seem to be designed for multiple truly 
independent SSL contexts. The SSL context stuff has clearly been hacked 
on after the fact to a library that started out having only one global 
state, and it's pretty incomplete. I'm honestly not sure it's worth 
trying to allow per-connection trust going, especially as (AFAIK) 
there's no evidence that anybody _wants_ per-client-connection SSL trust 
anyway.


If I really needed that sort of thing, I'd try to abstract the OpenSSL 
stuff away a layer, so it could be replaced by Mozilla NSS (or GnuTLS, 
or whatever) if an application needed it. This would make it easier to 
use libpq in apps that already have their own SSL/TLS environment based 
on a different library, and would make it possible for apps to use more 
flexible libraries if they needed to do complex things with SSL/TLS.


Really, though, is it necessary? What's the use case for trusting one 
cert for one connection, and another cert for another connection, beyond 
what's already provided by 'verify-full' in terms of hostname/ip 
checking? It *is* possible in more sophisticated APIs (Java, NSS too I 
think) but it's hard to imagine client-side reasons to use it.


(Or have I misunderstood what you're describing?)

--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql's is_select_command is naive

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 10:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> It knows that queries beginning with "select" or "values" are select
>> commands, but it seems not to be clued in about "table" and "with".
>
> What we really ought to do IMO is throw out the entire current
> implementation of fetch_count.  If libpq exposed access to the
> protocol-level fetch count, we could implement it without this
> cursor kluge.

I suspect that would make a lot of people very happy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql's is_select_command is naive

2010-05-26 Thread Tom Lane

Robert Haas  writes:
> It knows that queries beginning with "select" or "values" are select
> commands, but it seems not to be clued in about "table" and "with".

What we really ought to do IMO is throw out the entire current
implementation of fetch_count.  If libpq exposed access to the
protocol-level fetch count, we could implement it without this
cursor kluge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Fujii Masao

On Wed, May 26, 2010 at 10:20 PM, Simon Riggs  wrote:
> On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
>
>> I guess that dropping the support of #3 doesn't reduce complexity
>> since the code of #3 is almost the same as that of #2. Like
>> walreceiver sends the ACK after receiving the WAL in #2 case, it has
>> only to do the same thing after the WAL flush.
>
> Hmm, well the code for #3 is similar also to the code for #4. So if you
> do #2, its easy to do #2, #3 and #4 together.

No. #4 requires the way of prompt communication between walreceiver and
startup process, but #2 and #3 not. That is, in #4, walreceiver has to
wake the startup process up as soon as it has flushed WAL. OTOH, the
startup process has to wake walreceiver up as soon as it has replayed
WAL, to request it to send the ACK to the master. In #2 and #3, the
prompt communication from walreceiver to startup process, i.e., changing
the poll loop in the startup process would also be useful for the data
to be visible immediately on the standby. But it's not required.

> The comment is about whether having #3 makes sense from a user interface
> perspective. It's easy to add options, but they must have useful
> meaning.

#3 would be useful for people wanting further robustness. In #2,
when simultaneous power failure on the master and the standby,
and concurrent disk crash on the master happen, transaction whose
"success" indicator has been returned to a client might be lost.
#3 can avoid such a critical situation. This is one of reasons that
DRBD supports "Protocol C", I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii

> I read your proposal says "postgres.exe" will link to "libSQL.dll",
> and "pgpool.exe" will also link to the DLL, right?

Perhaps.

> I think it is reasonable, but I'm not sure what part of postgres
> should be in the DLL. Obviously we should avoid code duplication
> between the DLL and "postgres.exe".
>
> > - create an exportable version of memory manager
> > - create an exportable exception handling routines(i.e. elog)
> 
> Are there any other issues? For example,
>   - How to split headers for raw parser nodes?
>   - Which module do we define T_xxx enumerations and support functions?
> (outfuncs, readfuncs, copyfuncs, and equalfuncs)
> 
> The proposal will be acceptable only when all of the technical issues
> are solved. The libSQL should also be available in stand-alone.
> It should not be a collection of half-baked functions.

What do you mean by "should also be available in stand-alone"? If you
want more abstract API than "libSQL", you could invent such a thing
based on it as much as you like. IMO anything need to parse/operate
the raw parse tree should be in libSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb

2010-05-26 Thread Tom Lane

I wrote:
> It strikes me that we could not only fix this case, but make the libpq
> code simpler and more like the backend case, if we got rid of
> client_cert_cb and instead preloaded the ~/.postgresql/postgresql.crt
> file using SSL_CTX_use_certificate_chain_file().

Just for the archives: I've applied a patch along that line, but it made
me realize afresh what an ugly kluge OpenSSL's API is.  Unless I've
missed something basic, it's only possible to load additional certs for
a cert chain into an SSL_CTX object, not an SSL object.  (This is
reflected in the fact that SSL_CTX_use_certificate_chain_file doesn't
have an SSL-object equivalent, and even more fundamentally that
SSL_CTX_add_extra_chain_cert doesn't either.)  Basically, certificate
stores are managed at the SSL_CTX level not as part of SSL objects.

This is a problem for libpq because it tries to maintain only one
SSL_CTX object per client-side process: if you have multiple PG
connections in a single client process, either concurrently or one 
after another, they all share the same SSL_CTX.  Now that doesn't
matter so long as all the connections use the same
sslcert/sslkey/sslrootcert/sslcrl settings, but what if they don't?

What will happen as things stand is that all the certs get loaded
into a common pool.  That's not too horrible as long as there are
not actual conflicts, but it could mean that for example some
connections trust CA certs that the app programmer expected to only
be trusted for other connections.  I did arrange (and test) that the
client cert and key are local to each connection, but leakage of
trusted root certs is a different story.

We could avoid this problem if we were willing to set up a separate
SSL_context for each connection, but I'm not sure if it's worth that.
The scenario where a single application process is managing multiple
distinct sets of trusted certs seems a bit far-fetched anyway.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql's is_select_command is naive

2010-05-26 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> For now, we can probably get by with just adding those to the list of
> things it checks for.  But I wonder what we'll do about this if we
> ever get CTEs for insert/update/delete statements...  you'd have to
> parse the whole darn thing to figure out whether it was WITH...SELECT
> or WITH...INSERT or WITH...INSERT...RETURNING.

Would be nice if it could just ask the backend if it's planning to send
back some data. :)

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii

> As was already discussed, I don't believe that premise.  None of the
> applications you cite would be able to make use of the raw parser
> output, because it doesn't contain the semantic information they need.
> If what you actually meant was the analyzed parse tree, that *might*
> serve the need depending on just what is wanted (in particular,
> properties that could be affected by the expansion of views or
> inlineable functions could still not be determined reliably).
> But you can't have that without access to the current system catalog
> contents.

No, what pgpoo-II needs is a raw parse tree. When it needs info in the
system catalog, it sends SELECT to PostgreSQL. So that would be no
problem.

> In any case there's the serious problem that we simply are not going
> to promise that the parser output representation is stable.  We've
> changed it many times in the past and will do so in the future.

That's acceptable at least for pgpool-II. Basically what I need is,
a)SQL statement type, b)target tables, c)target columns(functions)
etc., which seem pretty stable among versions. Even if PostgreSQL
changes the representation of the praser, pgpool-II could ask the
PostgreSQL version and could undertstand the different
representations. Pgpool-II has already done this with the system
catalog changes.

Also good thing is, the parser provides nice APIs to process the parse
tree: raw_expression_tree_walker, outfuncs and macros. Those will
absorb the version difference.

> Quite aside from whether the result would be of any use or not, that
> opinion is obviously wrong.  This would be at least as difficult to
> maintain as ecpg ... which has been a enormous time sink.

>From reading README.parser of ecpg, the maintenance problem with ecpg
seems comes from that it needs to modify the grammer. My proposal
does not require the grammer changes. So I don't understand why you
think this would be difficult as ecpg.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-26 Thread Stephen Frost

KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
> Stephen, thanks for comments.
> 
> The attached three patches are the revised and divided ones.
> 
>  A: add makeRangeTblEntry()

Ok, didn't actually expect that.  Guess my suggestion would have been to
just use makeNode() since there wasn't anything more appropriate already.
Still, I don't really have a problem with your makeRangeTblEntry() and
you certainly found quite a few places to use it.

>  B: major reworks of DML permission checks

No serious issues with this that I saw either.  I definitely think it's
cleaner using makeRangeTblEntry() and check_relation_privileges().

>  C: add an ESP hook on the DML permission checks

This also looks good to me, though I don't know that you really need the
additional comment in execMain.c about the hook.  I would make sure that
you have a comment around check_rte_privileges() which says not to call
it directly because you'll bypass the hook (and potentially cause a
security leak by doing so).  Don't recall seeing that, apologies if it
was there.

> IIRC, Robert suggested that a verb should lead the function name.
> So, I renamed it into check_relation_privileges() and check_rte_privileges().

Yeah, that's alright.  I'm on the fence about using 'relation' or using
'rangetbl' there, but certainly whomever commits this could trivially
change it to whatever they prefer.

> The 'failure' may make an impression of generic errors not only permission 
> denied.
> How about 'error_on_violation'?

Maybe 'ereport_on_violation'?  I dunno, guess one isn't really better
than the other.  You need to go back and fix the comment though- you
still say 'abort' there.

> > - Have you checked if there are any bad side-effects from calling
> >ri_FetchConstraintInfo before doing the permissions checking?
> 
> The ri_FetchConstraintInfo() only references SysCaches to set up given
> local variable without any other locks except for ones acquired by syscache.c.

Ok.

> > - The hook in acl.h should be separated out and brought to the top and
> >documented independently as to exactly where the hook is and what it
> >can be used for, along with what the arguments mean, etc.  Similairly,
> >chkpriv_relation_perms should really have a short comment for it about
> >what it's for.  Something more than 'security checker function'.
> 
> OK, at the patch-C, I moved the definition of the hook into the first half
> of acl.h, but it needs to be declared after the AclResult definition.

Fair enough.

> BTW, I wonder whether acl.h is a correct place to explain about the hook,
> although I added comments for the hook.

Guess I don't really see a problem putting the comments there.  By the
way, have we got a place where we actually document the hooks we support
somewhere in the official documentation..?  If so, that should certainly
be updated too..

> I think we should add a series of explanation about ESP hooks in the internal
> section of the documentation, when the number of hooks reaches a dozen for
> example.

I believe the goal will be to avoid reaching a dozen hooks for this.

All-in-all, I'm pretty happy with these.  Couple minor places which
could use some copy editing, but that's about it.

Next, we need to get the security label catalog and the grammar to
support it implemented and then from that an SELinux module should
be pretty easy to implement.  Based on the discussions at PGCon, Robert
is working on the security label catalog and grammar.  The current plan
is to have a catalog similar to pg_depend, to minimize impact to the
rest of the backend and to those who aren't interested in using security
labels.  Of course, there will also need to be hooks there for an
external module to enforce restrictions associated with changing labels
on various objects in the system.

Thanks,

Stephen

signature.asc
Description: Digital signature

[HACKERS] psql's is_select_command is naive

2010-05-26 Thread Robert Haas

It knows that queries beginning with "select" or "values" are select
commands, but it seems not to be clued in about "table" and "with".

For now, we can probably get by with just adding those to the list of
things it checks for.  But I wonder what we'll do about this if we
ever get CTEs for insert/update/delete statements...  you'd have to
parse the whole darn thing to figure out whether it was WITH...SELECT
or WITH...INSERT or WITH...INSERT...RETURNING.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-26 Thread Mark Kirkwood


On 27/05/10 12:25, Florian Pflug wrote:



Patch that changes the wording to "Explicitly releasing the newer savepoint with 
RELEASE SAVEPOINT will cause ..." is attached.

Unfortunately, this patch is untested. I couldn't get openjade + DocBook to 
work on OSX for some reason :-(

   


FWIW docs with this patch seem to build ok for me on Ubuntu Lucid 64 bit.

regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 9:06 PM, Fujii Masao  wrote:
> On Wed, May 26, 2010 at 9:40 PM, Robert Haas  wrote:
>> On Mon, May 24, 2010 at 10:35 AM, Fujii Masao  wrote:
>>> On Mon, May 24, 2010 at 10:26 PM, Robert Haas  wrote:
 This looks pretty reasonable to me, but I guess I feel like it would
 be better to drive the CancelBackup() decision off of whether we've
 ever reached PM_RUN rather than consulting XLogCtl.  It just feels
 cleaner to me to drive all of the postmaster decisions off of the same
 signalling mechanism rather than having a separate one (that only
 works because it's used very late in shutdown when we theoretically
 don't need a lock) just for this one case.
>>>
>>> Okay, how about the attached patch? It uses the postmaster-local flag
>>> "ReachedEndOfRecovery" (better name?) instead of XLogCtl one.
>>
>> I've committed part of this patch, with the naming change that Tom
>> suggested.
>
> Thanks!
>
>> The parts I haven't committed are:
>>
>> 1. I don't see why we need to reset ReachedEndOfRecovery starting over
>> from PM_NO_CHILDREN.  It seems to me that once we reach PM_RUN, we
>> CAN'T go back to needing the backup label file, even if we have a
>> subsequent backend crash.  If I'm wrong, please let me know why and
>> I'll go put this back (with an appropriate comment).
>
> That reset has nothing to do with cancellation of the backup mode.
> I just added it since postmaster might use ReachedNormalRunning for
> another purpose in the future. For now, I have no objection not to
> add it.

OK, good.  I'm not sure it would be right to add it any way - reached
normal running sounds to me like it ought to mean "reached normal
running, ever" rather than "reached normal running since the last
backend crash".  In any event, it's moot for now.

>> 2. The changes to avoid launching WALReceiver except during certain
>> PM_* states.  It seems fairly sensible, but what is the case where
>> adding this logic prevents a problem?
>
> The problem is that shutdown would get stuck when walreceiver is
> invoked in PM_WAIT_BACKEND state. Imagine the following scenario:
>
> 1. Fast shutdown is requested just before the startup process calls
>   RequestXLogStreaming() which is the function to request postmaster
>   to invoke walreceiver.
>
> 2. pmdie() sends SIGTERM to the startup process, but not walreceiver
>   because it's not been started yet. Then, pmdie() switches the
>   state into PM_WAIT_BACKENDS.
>
> 3. The startup process goes through RequestXLogStreaming() and requests
>   postmaster to start walreceiver before processing SIGTERM sent from
>   pmdie(). Then the startup process exits, and postmaster invokes
>   walreceiver in PM_WAIT_BACKENDS state.
>
> 4. Once postmaster has reached PM_WAIT_BACKENDS, there is no way to send
>   SIGTERM to walreceiver. OTOH, postmaster cannot advance the state from
>   PM_WAIT_BACKENDS until walreceiver has gone away. So shutdown gets
>   stuck.

OK, makes sense.   I have committed this part also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Takahiro Itagaki

Tatsuo Ishii  wrote:

> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit of PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middle ware those need to parse SQL statements.

I read your proposal says "postgres.exe" will link to "libSQL.dll",
and "pgpool.exe" will also link to the DLL, right?

I think it is reasonable, but I'm not sure what part of postgres
should be in the DLL. Obviously we should avoid code duplication
between the DLL and "postgres.exe".

> - create an exportable version of memory manager
> - create an exportable exception handling routines(i.e. elog)

Are there any other issues? For example,
  - How to split headers for raw parser nodes?
  - Which module do we define T_xxx enumerations and support functions?
(outfuncs, readfuncs, copyfuncs, and equalfuncs)

The proposal will be acceptable only when all of the technical issues
are solved. The libSQL should also be available in stand-alone.
It should not be a collection of half-baked functions.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ExecutorCheckPerms() hook

2010-05-26 Thread KaiGai Kohei

Stephen, thanks for comments.

The attached three patches are the revised and divided ones.

 A: add makeRangeTblEntry()
 B: major reworks of DML permission checks
 C: add an ESP hook on the DML permission checks

(2010/05/27 0:09), Stephen Frost wrote:
> KaiGai,
> 
> * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
>> The attached patch is a revised one for DML permission checks.
> 
> This is certainly alot better.
> 
>> ToDo:
>> - makeRangeTblEntry() stuff to allocate a RTE node with given parameter
>>is not yet.
> 
> I'd certainly like to see the above done, or to understand why it can't
> be if that turns out to be the case.

The patch-A tries to implement makeRangeTblEntry() which takes only rtekind
as argument right now.
Other fields are initialized to zero, using makeNode().

> A couple of other comments, all pretty minor things:
> 
> - I'd still rather see the hook itself in another patch, but given that
>we've determined that none of this is going to go into 9.0, it's not
>as big a deal.

OK, I divided the ESP hook part into the patch-C.

> - The hook definition in aclchk.c should really be at the top of that
>file.  We've been pretty consistant about putting hooks at the top of
>files instead of deep down in the file, this should also follow that
>scheme.

OK, I moved it.

> - Some of the comments at the top of chkpriv_rte_perms probably make
>sense to move up to where it's called from execMain.c.  Specifically,
>the comments about the other RTE types (function, join, subquery).
>I'd probably change the comment in chkpriv_rte_perms to be simpler-
>"This is only used for checking plain relation permissions, nothing
>else is checked here", and also have that same comment around
>chkpriv_relation_perms, both in aclchk.c and in acl.h.

OK, I edited the comment as follows:

|   /*
|* Do permissions checks. The check_relation_privileges() checks access
|* permissions for all relations listed in a range table, but does not
|* check anything for other RTE types (function, join, subquery, ...).
|* Function RTEs are checked by init_fcache when the function is prepared
|* for execution. Join, subquery, and special RTEs need no checks.
|*/

> - I'd move chkpriv_relation_perms above chkpriv_rte_perms, it's what we
>expect people to use, after all.

OK, I reordered it.

> - Don't particularly like the function names.  How about
>relation_privilege_check?  Or rangetbl_privilege_check?  We don't use
>'perms' much (uh, at all?) in function names, and even if we did, it'd
>be redundant and not really help someone understand what the function
>is doing.

IIRC, Robert suggested that a verb should lead the function name.
So, I renamed it into check_relation_privileges() and check_rte_privileges().

> - I don't really like having 'abort' as the variable name for the 2nd
>argument.  I'm not finding an obvious convention right now, but maybe
>something like "error_on_failure" instead?

The 'failure' may make an impression of generic errors not only permission 
denied.
How about 'error_on_violation'?

> - In DoCopy, some comments about what you're doing there to set up for
>calling chkpriv_relation_perms would be good (like the comment you
>removed- /* We don't have table permissions, check per-column
>permissions */, updated to for something like "build an RTE with the
>columns referenced marked to check for necessary privileges").
>Additionally, it might be worth considering if having an RTE built
>farther up in DoCopy would make sense and would then be usable for
>other bits in DoCopy.

I edited the comments as follows:

| /*
|  * Check relation permissions.
|  * We built an RTE with the relation and columns to be accessed
|  * to check for necessary privileges in the common way.
|  */

> - In RI_Initial_Check, why not build up an actual list of RTEs and just
>call chkpriv_relation_perms once?  Also, you should add comments
>there, again, about what you're doing and why.  If you can use another
>function to build the actual RTE, this will probably fall out more
>sensibly too.

Good catch! I fixed the invocation of checker function with list_make2().

And, I edited the comments as follows:

| /*
|  * We built a pair of RTEs of FK/PK relations and columns referenced
|  * in the test query to check necessary permission in the common way.
|  */

> - Have you checked if there are any bad side-effects from calling
>ri_FetchConstraintInfo before doing the permissions checking?

The ri_FetchConstraintInfo() only references SysCaches to set up given
local variable without any other locks except for ones acquired by syscache.c.

> - The hook in acl.h should be separated out and brought to the top and
>documented independently as to exactly where the hook is and what it
>can be used for, along with what the arguments mean, etc.  Similairly,
>chkpriv_relation_perms should real

Re: [HACKERS] release notes

2010-05-26 Thread Bruce Momjian

Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> > Andrew Dunstan  writes:
> >   
> >> OK ... I guess I was bothered because this has been referred to in a 
> >> public press release about the Beta. The PLPerl security stuff is 
> >> missing too, so I'll fix that also.
> >> 
> >
> > The security stuff isn't relevant to the 9.0 notes, since it's already
> > been fixed in a previous release.  In general we don't document bug
> > fixes in a major release if they already appeared in previous
> > back-patches; the major release notes are quite verbose enough without
> > such duplication.  (In effect, the idea is that major release notes
> > should represent the delta from the previous major release *as it stood
> > at the time of the new major release*.)
> >   
> 
> 
> OK, then I will just remove the now redundant item regarding Safe.pm.

Yes, please make whatever updates to the release notes you feel are
appropriate.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tom Lane

Tatsuo Ishii  writes:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit of PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middle ware those need to parse SQL statements.

As was already discussed, I don't believe that premise.  None of the
applications you cite would be able to make use of the raw parser
output, because it doesn't contain the semantic information they need.
If what you actually meant was the analyzed parse tree, that *might*
serve the need depending on just what is wanted (in particular,
properties that could be affected by the expansion of views or
inlineable functions could still not be determined reliably).
But you can't have that without access to the current system catalog
contents.

In any case there's the serious problem that we simply are not going
to promise that the parser output representation is stable.  We've
changed it many times in the past and will do so in the future.

> I think those works are essentially a refactoring of existing raw
> parser, and will not add performance degration nor maintenance cost.

Quite aside from whether the result would be of any use or not, that
opinion is obviously wrong.  This would be at least as difficult to
maintain as ecpg ... which has been a enormous time sink.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 9:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 26, 2010 at 8:21 PM, Tom Lane  wrote:
>>> If we go with the spec's syntax I think we'd have no realistic choice
>>> except to forbid => altogether as an operator name.  (And no, I'm not
>>> for that.)
>
>> I suppose the most painful thing about doing that is that it would
>> break hstore.  Are there other commonly-used modules that rely on =>
>> as an operator name?
>
> There don't seem to be any other contrib modules that define => as an
> operator name, but I'm not sure what's out there on pgfoundry or
> elsewhere.  The bigger issue to me is not so much hstore itself as that
> this is an awfully attractive operator name for anything container-ish.
> Wasn't the JSON-datatype proposal using => for an operator at one stage?
> (The current wiki page for it doesn't seem to reflect any such idea,
> though.)  And I think I remember Oleg & Teodor proposing such an
> operator in conjunction with some GIN-related idea or other.
>
>> In spite of the difficulties, I'm reluctant to give up on it.  I
>> always thought that the "AS" syntax was a crock and I'm not eager to
>> invent another crock to replace it.  Being compatible with the SQL
>> standard and with Oracle is not to be taken lightly.
>
> Yeah, I know.  Though this could end up being one of the bits of the
> spec that we politely decline to follow, like upper-casing identifiers.
> Still, it's a good idea to think again before we've set the release
> in stone ...

Perhaps one idea would be to:

1. Invent a new crock for now.
2. Add a duplicate version of the hstore => operator with a different name.
3. Emit a warning whenever an operator called => is created.
4. Document that beginning in PG 9.1, we will no longer support => as
an operator name.

That's still going to cause a fair amount of pain, but certainly less
if we decide it now rather than later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 8:59 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 26, 2010 at 8:06 PM, Tom Lane  wrote:
>>> Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
>>> your problem, because they cannot become set without having visited the
>>> page.
>
>> Well, maybe I'm confused here, but arranging things so that we NEVER
>> have to visit the page after initially writing it seems like it's
>> setting the bar almost impossibly high.
>
> Well, that was the use-case that Josh was on about when this idea came
> up: high-volume append-only log tables that in most cases will never be
> read, so his client wants to get rid of the extra I/O for maintenance
> visits to once-written pages.

Well, I'll just note that using PD_ALL_VISIBLE as I'm proposing is
basically equivalent to Josh's original proposal of using an XID epoch
except that it addresses all three of the "obvious issues" which he
noted in his original email; plus it doesn't prevent truncating CLOG
(on the assumption that we rejigger things not to consult clog when
the page is marked PD_ALL_VISIBLE).

> If you're willing to allow one visit and rewrite of each page, then
> we can do that today with maybe a bit of rejiggering of vacuum's
> when-to-freeze heuristics.

Hmm, yeah.  Maybe we should freeze when we set PD_ALL_VISIBLE; that
might be just as good, and simpler.  Assuming the visibility map is
sufficiently crash-safe/non-buggy, we could then teach VACUUM that
it's OK to advance relfrozenxid even when doing just a partial vacuum
- because any pages that were skipped must contain only frozen tuples.
 Previously you've objected to proposals in this direction because
they might destroy forensic information, but maybe we should do it
anyway.

Either way, I think if we do this it *basically* gets rid of
anti-wraparound vacuum.  Yeah, we'll still do routine partial vacuums,
but what you won't have is... write the table, vacuum, vacuum, vacuum,
vacuum, OK, everything's visible to everyone, don't need to vacuum any
more...  months pass...  boom, unexpected full-table vacuum.  The
beginning part is the same, but you get rid of the boom at the end.
The only way I see to cut down vacuum activity even further is to
freeze them (and set the visibility map bit) before evicting them from
shared_buffers.  That's really the only way to get "write once and
only once", but it's pretty hit or miss, because the xmin horizon
might not advance fast enough to make it actually work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Tom Lane

Robert Haas  writes:
> On Wed, May 26, 2010 at 8:21 PM, Tom Lane  wrote:
>> If we go with the spec's syntax I think we'd have no realistic choice
>> except to forbid => altogether as an operator name.  (And no, I'm not
>> for that.)

> I suppose the most painful thing about doing that is that it would
> break hstore.  Are there other commonly-used modules that rely on =>
> as an operator name?

There don't seem to be any other contrib modules that define => as an
operator name, but I'm not sure what's out there on pgfoundry or
elsewhere.  The bigger issue to me is not so much hstore itself as that
this is an awfully attractive operator name for anything container-ish.
Wasn't the JSON-datatype proposal using => for an operator at one stage?
(The current wiki page for it doesn't seem to reflect any such idea,
though.)  And I think I remember Oleg & Teodor proposing such an
operator in conjunction with some GIN-related idea or other.

> In spite of the difficulties, I'm reluctant to give up on it.  I
> always thought that the "AS" syntax was a crock and I'm not eager to
> invent another crock to replace it.  Being compatible with the SQL
> standard and with Oracle is not to be taken lightly.

Yeah, I know.  Though this could end up being one of the bits of the
spec that we politely decline to follow, like upper-casing identifiers.
Still, it's a good idea to think again before we've set the release
in stone ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regression testing for psql

2010-05-26 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> There might be some value in psql backslash command tests that
> are designed to depend on just one or a few tables (or other appropriate
> objects).

Updated, much much smaller, patch attached.  Also available, again, at
http://snowman.net/~sfrost/psql-regress-help.patch

Basically, I removed anything that would produce data directly from
the catalogs by trying to find a 'none' object which matched.  This
still goes through alot of the same setup and query, it's just that
there aren't any results.

Thanks,

Stephen

psql-regress-help.patch.gz
Description: Binary data

signature.asc
Description: Digital signature

Re: [HACKERS] exporting raw parser

2010-05-26 Thread Josh Berkus


> I think those works are essentially a refactoring of existing raw
> parser, and will not add performance degration nor maintenance cost.
> 
> Comments?

You should call it "libSQL"; who knows, other DB projects might want it.
 They seem to borrow our parser enough as it is.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-26 Thread Fujii Masao

On Wed, May 26, 2010 at 9:54 PM, Robert Haas  wrote:
> On Tue, May 25, 2010 at 6:19 AM, Fujii Masao  wrote:
>> On Tue, May 18, 2010 at 3:09 PM, Fujii Masao  wrote:
> (2)
> pg_ctl -ms stop emits the following warning whenever there is the
> backup_label file in $PGDATA.
>
>       WARNING: online backup mode is active
>       Shutdown will not complete until pg_stop_backup() is called.
>
> This warning doesn't fit in with the shutdown during recovery case.
> Since smart shutdown might be requested by other than pg_ctl, the
> warning should be emitted in server side rather than client, I think.
> How about moving the warning to the server side?
>>>
>>> Though I'm not sure if this should be fixed for 9.0, I attached the
>>> patch (move_bkp_cancel_warning_v1.patch).
>>
>> This patch is worth applying for 9.0? If not, I'll add it into
>> the next CF for 9.1.
>
> I'm not convinced that this is a good idea, because ISTM it will make
> the error message to be less likely to be seen by the person running
> pg_ctl.  In any case, it's a behavior change, so I think that means
> it's a no-go for 9.0.
>
> In terms of 9.1, it might make sense to log something to both places.
> But maybe we shouldn't just do it once - maybe it should happen every
> 30 s or so until we actually manage to shut down, with a list of
> what's still blocking shutdown a la errdetail_busy_db.

Fair enough. I'll think of the issue in detail again for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-26 Thread Fujii Masao

On Wed, May 26, 2010 at 9:40 PM, Robert Haas  wrote:
> On Mon, May 24, 2010 at 10:35 AM, Fujii Masao  wrote:
>> On Mon, May 24, 2010 at 10:26 PM, Robert Haas  wrote:
>>> This looks pretty reasonable to me, but I guess I feel like it would
>>> be better to drive the CancelBackup() decision off of whether we've
>>> ever reached PM_RUN rather than consulting XLogCtl.  It just feels
>>> cleaner to me to drive all of the postmaster decisions off of the same
>>> signalling mechanism rather than having a separate one (that only
>>> works because it's used very late in shutdown when we theoretically
>>> don't need a lock) just for this one case.
>>
>> Okay, how about the attached patch? It uses the postmaster-local flag
>> "ReachedEndOfRecovery" (better name?) instead of XLogCtl one.
>
> I've committed part of this patch, with the naming change that Tom
> suggested.

Thanks!

> The parts I haven't committed are:
>
> 1. I don't see why we need to reset ReachedEndOfRecovery starting over
> from PM_NO_CHILDREN.  It seems to me that once we reach PM_RUN, we
> CAN'T go back to needing the backup label file, even if we have a
> subsequent backend crash.  If I'm wrong, please let me know why and
> I'll go put this back (with an appropriate comment).

That reset has nothing to do with cancellation of the backup mode.
I just added it since postmaster might use ReachedNormalRunning for
another purpose in the future. For now, I have no objection not to
add it.

> 2. The changes to avoid launching WALReceiver except during certain
> PM_* states.  It seems fairly sensible, but what is the case where
> adding this logic prevents a problem?

The problem is that shutdown would get stuck when walreceiver is
invoked in PM_WAIT_BACKEND state. Imagine the following scenario:

1. Fast shutdown is requested just before the startup process calls
   RequestXLogStreaming() which is the function to request postmaster
   to invoke walreceiver.

2. pmdie() sends SIGTERM to the startup process, but not walreceiver
   because it's not been started yet. Then, pmdie() switches the
   state into PM_WAIT_BACKENDS.

3. The startup process goes through RequestXLogStreaming() and requests
   postmaster to start walreceiver before processing SIGTERM sent from
   pmdie(). Then the startup process exits, and postmaster invokes
   walreceiver in PM_WAIT_BACKENDS state.

4. Once postmaster has reached PM_WAIT_BACKENDS, there is no way to send
   SIGTERM to walreceiver. OTOH, postmaster cannot advance the state from
   PM_WAIT_BACKENDS until walreceiver has gone away. So shutdown gets
   stuck.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Heikki Linnakangas


On 27/05/10 03:57, Robert Haas wrote:

Being compatible with the SQL
standard and with Oracle is not to be taken lightly.


I seem to be alone believing that the SQL standard doesn't say anything 
about named function parameters. Can someone point me to the relevant 
section of the standard?


As evidence to the contrary:
http://archives.postgresql.org/pgsql-hackers/2009-08/msg00558.php

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii

I'm thinking about exporting the raw parser and related modules as a C
library. Though this will not be an immediate benefit of PostgreSQL
itself, it will be a huge benefit for any PostgreSQL
applications/middle ware those need to parse SQL statements.

For example, pgpool-II parses queries to know if it's a read query or
not. In other case, it needs to know if a SELECT statement includes
any temporal constructor such as CURRENT_TIME_STAMP. These are not a
trivial job since SQL grammar is complex. For this purpose pgpool-II
copies PostgreSQL parser code and use it. Of course maintaining the
part is pain since PostgreSQL's parser will be changed from release to
release.

I believe not only pgpool-II but some connection pooling middle wares
need SQL parser as well(pgbouncer?). Also any tool which accepts SQL
statement as its input would also need SQL parser(pgAdmin?). For them
exported raw parser will be a huge benefit.

The implementation will not be very difficult since pgpool-II has
already done most of necessary work for this:

- extract raw parser part from parser directory, which include gram.y,
  scan.l and keywords.c

- extract utility functions needed to handle raw parse tree:
  nodes/nodes.c makefunc.c etc.

- create an exportable version of memory manager

- create an exportable exception handling routines(i.e. elog)

- wrap all of above into a libXX*.so

I think those works are essentially a refactoring of existing raw
parser, and will not add performance degration nor maintenance cost.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Distclean does not remove gram.c

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 8:20 PM, Andrew Dunstan  wrote:
> Gurjeet Singh wrote:
>>
>> I did a `git clean -f -d` and even that did not remove gram.c, apparently
>> because this file _was_ alive at some point in the past hence git won't
>> remove it even though the current branch does not have gram.c.
>
> At first glance that looks like a git bug.

My guess is that either .git/info/exclude or a .gitignore file
someplace says to ignore gram.c.  git clean -df will not remove such
files; you need git clean -dfx if you want that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Tom Lane

Robert Haas  writes:
> On Wed, May 26, 2010 at 8:06 PM, Tom Lane  wrote:
>> Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
>> your problem, because they cannot become set without having visited the
>> page.

> Well, maybe I'm confused here, but arranging things so that we NEVER
> have to visit the page after initially writing it seems like it's
> setting the bar almost impossibly high.

Well, that was the use-case that Josh was on about when this idea came
up: high-volume append-only log tables that in most cases will never be
read, so his client wants to get rid of the extra I/O for maintenance
visits to once-written pages.

If you're willing to allow one visit and rewrite of each page, then
we can do that today with maybe a bit of rejiggering of vacuum's
when-to-freeze heuristics.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 8:21 PM, Tom Lane  wrote:
> alvherre  writes:
>> The problem with the => operator seems best resolved as not accepting
>> such an operator in a function parameter, which sucks but we don't seem
>> to have a choice.
>
> "Sucks" is not the word; "utterly unacceptable" is the word.  Having an
> expression mean different things depending on context is a recipe for
> unbelievable nightmares.  Can you imagine dealing with that in a query
> generator for example?  Or even ruleutils.c?
>
> If we go with the spec's syntax I think we'd have no realistic choice
> except to forbid => altogether as an operator name.  (And no, I'm not
> for that.)

I suppose the most painful thing about doing that is that it would
break hstore.  Are there other commonly-used modules that rely on =>
as an operator name?

In spite of the difficulties, I'm reluctant to give up on it.  I
always thought that the "AS" syntax was a crock and I'm not eager to
invent another crock to replace it.  Being compatible with the SQL
standard and with Oracle is not to be taken lightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 8:25 PM, Florian Pflug  wrote:
> Unfortunately, this patch is untested. I couldn't get openjade + DocBook to 
> work on OSX for some reason :-(

That is a truly awful nightmare.  Dave Page dug up some old
instructions which got me through it - I'm guessing he doesn't mind my
posting them, but let me double-check with him first.  Actually, I
think we should incorporate them into our docs.  It's beyond me how
anyone ever got this to work, even once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 8:06 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> How does that get us out of reading and writing old pages, though?
>
> Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
> your problem, because they cannot become set without having visited the
> page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high.   Consider a table that is
regularly written but append-only.  Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

If, however, we decree that you can't write a new tuple into a
PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
still have the small, incremental vacuums but those are pretty cheap,
and in any event, I don't see any way to get rid of them unless
someone can devise a scheme to do away with vacuum entirely.  But you
won't need the full-table vacuum to freeze tuples, because you can
freeze them opportunistically the next time those pages are written
(at which point freezing will be very cheap because the page has to be
written to disk at that point no matter what).

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] out-of-date comment in CreateRestartPoint()

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 16:54, Heikki Linnakangas wrote:

On 26/05/10 16:16, Robert Haas wrote:

This comment obviously requires adjustment now that HS is committed.
The obvious way to change it is to replace "when we get hot standby
capability" with "when running in Hot Standby mode", but I'm not clear
whether that's all that's required.


I think that's all that's required.


Committed. I changed it to "when hot standby is enabled".

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regression testing for psql

2010-05-26 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> There might be some value in psql backslash command tests that
> are designed to depend on just one or a few tables (or other appropriate
> objects).  Dumping large fractions of the catalogs will just be a net
> loss.

Fair enough, I can certainly do that pretty easily.  Stripping out the
unqualified \d*S commands should result in a much, much, much smaller
output, with corresponding less changes due to catalog changes.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Regression testing for psql

2010-05-26 Thread Tom Lane

Stephen Frost  writes:
> * alvherre (alvhe...@commandprompt.com) wrote:
>> (And if we want something like this, I think we should not have a single
>> huge file for the complete test, but a set of smaller files.  I'd even
>> put the bunch in src/bin/psql/regress rather than the main regress dir.)

> The actual set of tests is rather small.  The output is large, but
> that's just because we have alot of things in the catalog.

It sounds to me like this is going to be like the rules regression test
writ large; specifically the part that dumps out view definitions for
all the built-in views.  And that, quite frankly, has been a huge
maintenance burden and AFAIR has never once had any redeeming social
value in terms of catching a bug.  If you're testing things that way,
don't.  There might be some value in psql backslash command tests that
are designed to depend on just one or a few tables (or other appropriate
objects).  Dumping large fractions of the catalogs will just be a net
loss.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-26 Thread Florian Pflug

On May 27, 2010, at 0:58 , Heikki Linnakangas wrote:
> On 26/05/10 02:00, Sam Vilain wrote:
>> Florian Pflug wrote:
>>> On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:
 Releasing the newer savepoint will cause the older one to again become 
 accessible, as the doc says, but rolling back to a savepoint does not 
 implicitly release it. You'll have to use RELEASE SAVEPOINT for that.
>>> 
>>> Ah, now I get it. Thanks.
>>> 
>>> Would changing "Releasing the newer savepoint will cause ... " to 
>>> "Explicitly releasing the newer savepoint" or maybe even "Explicitly 
>>> releasing the newer savepoint with RELEASE SAVEPOINT will cause ..." make 
>>> things clearer?
>> 
>> Yes, probably - your misreading matches my misreading of it :-)
> 
> +1.

Patch that changes the wording to "Explicitly releasing the newer savepoint 
with RELEASE SAVEPOINT will cause ..." is attached.

Unfortunately, this patch is untested. I couldn't get openjade + DocBook to 
work on OSX for some reason :-(

best regards,
Florian Pflug


doc-savepoint-explicitrelease.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Tom Lane

alvherre  writes:
> The problem with the => operator seems best resolved as not accepting
> such an operator in a function parameter, which sucks but we don't seem
> to have a choice.

"Sucks" is not the word; "utterly unacceptable" is the word.  Having an
expression mean different things depending on context is a recipe for
unbelievable nightmares.  Can you imagine dealing with that in a query
generator for example?  Or even ruleutils.c?

If we go with the spec's syntax I think we'd have no realistic choice
except to forbid => altogether as an operator name.  (And no, I'm not
for that.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Distclean does not remove gram.c

2010-05-26 Thread Andrew Dunstan




Gurjeet Singh wrote:


I did a `git clean -f -d` and even that did not remove gram.c, 
apparently because this file _was_ alive at some point in the past 
hence git won't remove it even though the current branch does not have 
gram.c.





At first glance that looks like a git bug.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Tom Lane

Josh Berkus  writes:
> How does that get us out of reading and writing old pages, though?

Yeah.  Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Distclean does not remove gram.c

2010-05-26 Thread Tom Lane

Gurjeet Singh  writes:
> The src/backend/parser/gram.c is a generated file, so shouldn't this be
> removed by `make distclean`, or maybe even by `make clean`?

No.  It's shipped in distribution tarballs.  If you want all derived
files to be removed, use make maintainer-clean.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regression testing for psql

2010-05-26 Thread Stephen Frost

* alvherre (alvhe...@commandprompt.com) wrote:
> Excerpts from Stephen Frost's message of mié may 26 15:19:59 -0400 2010:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> > > Then, too, there's the fact that many of these tests fail on my
> > > machine because my username is not sfrost, 
> > 
> > I've updated the patch to address this, it's again at:
> > http://snowman.net/~sfrost/psql-regress-help.patch
> 
> Isn't this kind of test a pain to maintain?  If somebody add a new SQL
> command, it will affect the entire \h output and she'll have to either
> apply the changes without checking them, or manually check the complete
> list.  I have only to add a new function to make the test fail ...

Erm, last I checked, we already provide a diff output for the changes to
the regression output.  That doesn't help when you add a new column to a
catalog, but that's a far cry from just adding some new SQL syntax or
adding a function.

> Also, having to exclude tests that mention the database owner means that
> you're only testing a fraction of the commands, so any possible problem
> has a large chance of going undetected.  I mean, if we're going to test
> this kind of thing, shouldn't we be using something that allows us to
> ignore the db owner name?  A simple wildcard in place of the owner name
> would suffice ... or do we need a regex for some other reason?

Yes, being able to use a regexp or something would be nice, but we don't
have those mechanics in the regression tests now (unless I missed
something).

> The \h output normally depends on terminal width.  Have you handled that
> somehow?

I don't think that'll actually be an issue, but would love to hear from
people if it is.

> (And if we want something like this, I think we should not have a single
> huge file for the complete test, but a set of smaller files.  I'd even
> put the bunch in src/bin/psql/regress rather than the main regress dir.)

The actual set of tests is rather small.  The output is large, but
that's just because we have alot of things in the catalog.

Thanks,

Stephen


signature.asc
Description: Digital signature

[HACKERS] Distclean does not remove gram.c

2010-05-26 Thread Gurjeet Singh

The src/backend/parser/gram.c is a generated file, so shouldn't this be
removed by `make distclean`, or maybe even by `make clean`?

I did a `git clean -f -d` and even that did not remove gram.c, apparently
because this file _was_ alive at some point in the past hence git won't
remove it even though the current branch does not have gram.c.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.enterprisedb.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 4:34 PM, Kevin Grittner wrote:

Jan Wieck  wrote:
 

Without this logic, the replication system could not combine
multiple origin sessions into one replication session without
risking to never find a state, in which it can commit.
 
My latest idea for handling this in WAL-based replication involves

WAL-logging information about the transaction through which a the
committing transaction makes it safe to view.  There are a few
options here at the detail level that I'm still thinking through.
The idea would be that the xmin from read-only queries on the slaves
might be somewhere behind where you would expect based on
transactions committed.  (The details involve such things as where
non-serializable transactions fall into the plan on both sides, and
whether it's worth the effort to special-case read-only transactions
on the master.)
 
I can't say that I'm 100% sure that some lurking detail won't shoot

this technique down for HS, but it seems good to me at a conceptual
level.


Without simulating multiple simultaneous transactions during playback, 
how are you going to manage that the tuples, already inserted on behalf 
of the ongoing master transactions, disappear when they abort on the master?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Heikki Linnakangas


On 27/05/10 02:09, alvherre wrote:

Excerpts from Andrew Dunstan's message of mié may 26 18:52:33 -0400 2010:


I think we should fix it now.  Quick thought: maybe we could use FOR
instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's
mechanism for this is 'paramname =>  value', but I think that has
problems because of our possibly use of =>  as an operator - otherwise
that would be by far the best way to go.


I think we were refraining from =>  because the standard didn't specify
this back then -- AFAIU this was introduced very recently.  But now that
it does, and that the syntax we're implementing conflicts with a
different feature, it seems wise to use the standard-mandated syntax.

The problem with the =>  operator seems best resolved as not accepting
such an operator in a function parameter, which sucks but we don't seem
to have a choice.  Perhaps we could allow "=>" to resolve as the
operator for the case the user really needs to use it; or a
schema-qualified operator.


AFAIU, the standard doesn't say anything about named parameters. Oracle 
uses =>, but as you said, that's ambiguous with the => operator.


+1 for FOR.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CIText and pattern_ops

2010-05-26 Thread David E. Wheeler

Just picking up a dropped thread, does anyone have a patch for this?

  http://archives.postgresql.org/pgsql-hackers/2010-04/msg01191.php

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread David E. Wheeler

On May 26, 2010, at 4:09 PM, alvherre wrote:

> The problem with the => operator seems best resolved as not accepting
> such an operator in a function parameter, which sucks but we don't seem
> to have a choice.  Perhaps we could allow "=>" to resolve as the
> operator for the case the user really needs to use it; or a
> schema-qualified operator.

I think requiring schema-qualification is an acceptable compromise.

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Josh Berkus


> Just this second posted about that, as it turns out.
> 
> I have a v3 *almost* ready of the keepalive patch. It still makes sense
> to me after a few days reflection, so is worth discussion and review. In
> or out, I want this settled within a week. Definitely need some R&R
> here.

Does the keepalive fix all the issues with max_standby_delay?  Tom?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Heikki Linnakangas


On 27/05/10 01:23, Simon Riggs wrote:

On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote:

On 26/05/10 23:31, Dimitri Fontaine wrote:

   d. choice of commit or rollback at timeout


Rollback is not an option. There is no going back after the commit
record has been flushed to disk or sent to a standby.


There's definitely no going back after the xid has been removed from
procarray because other transactions will then depend upon the final
state. Currently we PANIC if we abort after we've marked clog, though
that happens after XLogFlush(), which is where we're planning to wait
for synch rep. If we abort after having written a commit record to disk
we can still successfully generate an abort record as well. (Luckily, I
note HS does actually cope with that. Phew!)

So actually, an abort is a reasonable possibility, though I know it
doesn't sound like it could be at first thought.


Hmm, that's an interesting thought. Interesting, as in crazy ;-).

I don't understand how HS could handle that. As soon as it sees the 
commit record, the transaction becomes visible to readers.



The choice is to either commit anyway after the timeout, or wait forever.


Hmm, wait forever. What happens if we try to shutdown fast while there
is a transaction that is waiting forever? Is that then a commit, even
though it never made it to the standby? How would we know it was safe to
switchover or not? Hmm.


Refuse to shut down until the standby acknowledges the commit. That's 
the only way to be sure..


In practice, hard synchronous "don't return ever until the commit hits 
the standby" behavior is rarely what admins actually want, because it's 
disastrous from an availability point of view. More likely, admins want 
"wait for ack from standby, unless it's not responding, in which case to 
hell with redundancy and just act like a single server". It makes sense 
if you just want to make sure that the standby doesn't return stale 
results when it's working properly, and you're not worried about 
durability but I'm not sure it's very sound otherwise.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread alvherre

Excerpts from Andrew Dunstan's message of mié may 26 18:52:33 -0400 2010:

> I think we should fix it now.  Quick thought: maybe we could use FOR 
> instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's 
> mechanism for this is 'paramname => value', but I think that has 
> problems because of our possibly use of => as an operator - otherwise 
> that would be by far the best way to go.

I think we were refraining from => because the standard didn't specify
this back then -- AFAIU this was introduced very recently.  But now that
it does, and that the syntax we're implementing conflicts with a
different feature, it seems wise to use the standard-mandated syntax.

The problem with the => operator seems best resolved as not accepting
such an operator in a function parameter, which sucks but we don't seem
to have a choice.  Perhaps we could allow "=>" to resolve as the
operator for the case the user really needs to use it; or a
schema-qualified operator.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 5:12 PM, Heikki Linnakangas wrote:

On 26/05/10 23:49, Jan Wieck wrote:

In this implementation it wouldn't even matter if a transaction that was
recorded actually never made it because it crashed before the WAL flush.
It would be reported by this "commit order" feature, but there would be
no traces of whatever it did to be found inside the DB, so that anomaly
is harmless.


Hmm, I think it would also not matter if the reported commit order 
doesn't match exactly the order of the commit records, as long as 
there's no dependency between the two transactions.


What I'm after is that I think it would be enough to establish the 
commit order using deferred triggers that are fired during pre-commit 
processing. The trigger could get a number from a global sequence to 
establish the commit order, and write it to a table. So you still need a 
global sequence, but it's only needed once per commit.


You're not trying to derail this thread into yet another of our famous 
"commit trigger" battles, are you?




(you have to handle deferred triggers that fire after the commit-order 
trigger. perhaps by getting another number from the global sequence and 
replacing the previous number with it)


I could imagine a commit trigger as a special case that is fired AFTER 
the trigger queue was shut down, so any operation that causes any 
further triggers to fire would automatically abort the transaction. A 
draconian, but reasonable restriction.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] primary/secondary/master/slave/standby

2010-05-26 Thread Greg Stark

On Wed, May 12, 2010 at 8:00 PM, Simon Riggs  wrote:
> On Wed, 2010-05-12 at 19:33 +0300, Peter Eisentraut wrote:
>> The server's messages and the documentation uses all of these terms in
>> mixed ways.  Maybe we could decide on some preferred terminology and
>> adjust the existing texts.  Ideas?
>
> Never user the term "secondary" myself.
>
> I deliberately use "standby" rather than "slave", to differentiate
> between an exact replica and a synchronised copy (respectively).


Fwiw I like the word "replica" but I don't see an obvious choice of
word to pair it with


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 02:00, Sam Vilain wrote:

Florian Pflug wrote:

On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:

Releasing the newer savepoint will cause the older one to again become 
accessible, as the doc says, but rolling back to a savepoint does not 
implicitly release it. You'll have to use RELEASE SAVEPOINT for that.


Ah, now I get it. Thanks.

Would changing "Releasing the newer savepoint will cause ... " to "Explicitly releasing the 
newer savepoint" or maybe even "Explicitly releasing the newer savepoint with RELEASE SAVEPOINT 
will cause ..." make things clearer?


Yes, probably - your misreading matches my misreading of it :-)


+1.


There is another way you can get there - releasing to a savepoint before
the re-used savepoint name will also release the savepoints after it.

ie

savepoint foo;
savepoint bar;
savepoint foo;
release to savepoint bar;
release to savepoint foo;

After the first release, the second 'foo' savepoint is gone.  I think
this is a key advantage in saving the old savepoints.


Yep. Do we need to mention that in that notice? I don't think so, it 
would become really verbose. Florian's wording above seems fine.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs

On Wed, 2010-05-26 at 15:45 -0700, Josh Berkus wrote:
> > Committed with chunk size of 128 kB. I hope that's a reasonable
> > compromise, in the absence of any performance test data either way.
> 
> So where are we with max_standby_delay?  Status check?

Just this second posted about that, as it turns out.

I have a v3 *almost* ready of the keepalive patch. It still makes sense
to me after a few days reflection, so is worth discussion and review. In
or out, I want this settled within a week. Definitely need some R&R
here.

-- 
 Simon Riggs   www.2ndQuadrant.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Andrew Dunstan




Peter Eisentraut wrote:

It turns out that the SQL standard uses the function call notation

foo(this AS that)

for something else:

 ::=  

 ::= [   ] 

 ::=  [  [ {   }... ] ] 

 ::= 
| 
| 

 ::=  AS 

In systems that have inheritance of composite types, this is used to
specify which type the value is supposed to be interpreted as (for
example, to treat the value as a supertype).

Seems kind of bad to overload this with something completely different.
What should we do?

  


I think we should fix it now.  Quick thought: maybe we could use FOR 
instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's 
mechanism for this is 'paramname => value', but I think that has 
problems because of our possibly use of => as an operator - otherwise 
that would be by far the best way to go.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs

On Sun, 2010-05-16 at 17:11 +0100, Simon Riggs wrote:

> New version, with some other cleanup of wait processing.
> 
> New logic is that when Startup asks for next applychunk of WAL it saves
> the lastChunkTimestamp. That is then the base time used by
> WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
> Since multiple receivechunks can arrive from primary before Startup asks
> for next applychunk we use the oldest receivechunk timestamp, not the
> latest. Doing it this way means the lastChunkTimestamp doesn't change
> when new keepalives arrive, so we have a stable and reasonably accurate
> recordSendTimestamp for each WAL record.
> 
> The keepalive is sent as the first part of a new message, if any. So
> partial chunks of data always have an accurate timestamp, even if that
> is slightly older as a result. Doesn't make much difference except with
> very large chunks.
> 
> I think that addresses the points raised on this thread and others.

Was about to post v3 after your last commit, but just found a minor bug
in my v2->v3 changes, even though they were fairly light. Too tired to
fix now. The general thinking underlying this patch is still the same
though and is worth discussing over next few days.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs

On Thu, 2010-05-27 at 01:34 +0300, Heikki Linnakangas wrote:

> Committed with chunk size of 128 kB.

Thanks. I'm sure that's fine.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Josh Berkus


> Committed with chunk size of 128 kB. I hope that's a reasonable
> compromise, in the absence of any performance test data either way.

So where are we with max_standby_delay?  Status check?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Heikki Linnakangas


On 19/05/10 00:37, Simon Riggs wrote:

On Tue, 2010-05-18 at 17:25 -0400, Heikki Linnakangas wrote:

On 18/05/10 17:17, Simon Riggs wrote:

There's no reason that the buffer size we use for XLogRead() should be
the same as the send buffer, if you're worried about that. My point is
that pq_putmessage contains internal flushes so at the libpq level you
gain nothing by big batches. The read() will be buffered anyway with
readahead so not sure what the issue is. We'll have to do this for sync
rep anyway, so what's the big deal? Just do it now, once. Do we really
want 9.1 code to differ here?


Do what? What exactly is it that you want instead, then?


Read and write smaller messages, so the latency is minimised. Libpq will
send in 8192 byte packets, so writing anything larger gains nothing when
the WAL data is also chunked at exactly the same size.


Committed with chunk size of 128 kB. I hope that's a reasonable 
compromise, in the absence of any performance test data either way.


I'm weary of setting it as low as 8k, as there is some per-message 
overhead. Some of that could be avoided by rearranging the loops so that 
the ps display is not updated at every message as you suggested, but I 
don't feel doing any extra rearrangements at this point. It would not be 
hard, but it also certainly wouldn't make the code simpler.


I believe in practice 128kB is just as good as 8k from the 
responsiveness point of view. If a standby is not responding, walsender 
will be stuck trying to send no matter what the block size is. If it 
responding, no matter how slowly, 128kB should get transferred pretty 
quickly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Simon Riggs

On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote:
> On 26/05/10 23:31, Dimitri Fontaine wrote:
> >   d. choice of commit or rollback at timeout
> 
> Rollback is not an option. There is no going back after the commit 
> record has been flushed to disk or sent to a standby.

There's definitely no going back after the xid has been removed from
procarray because other transactions will then depend upon the final
state. Currently we PANIC if we abort after we've marked clog, though
that happens after XLogFlush(), which is where we're planning to wait
for synch rep. If we abort after having written a commit record to disk
we can still successfully generate an abort record as well. (Luckily, I
note HS does actually cope with that. Phew!) 

So actually, an abort is a reasonable possibility, though I know it
doesn't sound like it could be at first thought.

> The choice is to either commit anyway after the timeout, or wait forever.

Hmm, wait forever. What happens if we try to shutdown fast while there
is a transaction that is waiting forever? Is that then a commit, even
though it never made it to the standby? How would we know it was safe to
switchover or not? Hmm.

Oracle offers options of COMMIT | SHUTDOWN in this case.

-- 
 Simon Riggs   www.2ndQuadrant.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Simon Riggs

On Wed, 2010-05-26 at 17:31 -0400, Jan Wieck wrote:

> You can do this only with per standby options, by giving each standby a 
> weight, or a number of votes. Your DEV server would have a weight of 
> zero, while your production standby's have higher weights, depending on 
> their importance for your overall infrastructure. As long as majority 
> means >50% of all votes in the house, you don't have a split brain risk.

Yes, you could do that with per-standby options.

If you give each standby a weight then the parameter has much less
meaning for the user. It doesn't mean number of replicas any more, it
means something else with local and changeable meaning. A fractional
quorum suffers the same way.

What would make some sense would be to have an option for "vote=0|1" so
that a standby would never take part in the transaction sync when
vote=0.

But you still have the problem of specifying rules if insufficient
servers with vote=1 are available. The reaction to that is to supply
more servers with vote=1, though then you need a way to specify how many
servers with vote=1 you care about.

-- 
 Simon Riggs   www.2ndQuadrant.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] 9.0 Open Items: Code and Documentation sections

2010-05-26 Thread Selena Deckelmann

Hi!

I've spent some time going through all the email threads since Alpha4
was released, and have updated the 9.0 Open Items list.

We have two sections of open items which need initial review by -hackers:

http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items#Code
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items#Documentation

If you know that an item has been resolved, please move it to the
Resolved section. If you have additional information about the status
of the page, please feel free to update the details accordingly!

Thanks!
-selena

-- 
http://chesnok.com/daily - me

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] primary/secondary/master/slave/standby

2010-05-26 Thread Heikki Linnakangas


On 12/05/10 22:23, Robert Haas wrote:

On Wed, May 12, 2010 at 3:01 PM, Heikki Linnakangas
  wrote:

Tom Lane wrote:

If so, master/standby would probably work.


+1 for master/standby.

It's worth remembering that a "standby server" might not be actively
connected to a master server. A server that's reading WAL from an
archive backup, for example, can be put to standby mode. "Standby"
covers that case too, better than "slave".


So does this mean we should rename primary_conninfo?


Yes, I think it does. I'll change it tomorrow, barring objections or 
someone else changing it first.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 23:31, Dimitri Fontaine wrote:

So if you want simplicity to admin, effective data availability and
precise control over the global setup, I say go for:
  a. transaction level control of the replication level
  b. cascading support
  c. quorum with timeout
  d. choice of commit or rollback at timeout

Then give me a setup example that you can't express fully.


One master, one synchronous standby on another continent for HA 
purposes, and one asynchronous reporting server in the same rack as the 
master. You don't want to set up the reporting server as a cascaded 
slave of the standby on the other continent, because that would double 
the bandwidth required, but you also don't want the master to wait for 
the reporting server.


The possibilities are endless... Your proposal above covers a pretty 
good set of scenarios, but it's by no means complete. If we try to solve 
everything the configuration will need to be written in a 
Turing-complete Replication Description Language. We'll have to pick a 
useful, easy-to-understand subset that covers the common scenarios. To 
handle the more exotic scenarios, you can write a proxy that sits in 
front of the master, and implements whatever rules you wish, with the 
rules written in C.


BTW, I think we're going to need a separate config file for listing the 
standbys anyway. There you can write per-server rules and options, but 
explicitly knowing about all the standbys also allows the master to 
recycle WAL as soon as it has been streamed to all the registered 
standbys. Currently we just keep wal_keep_segments files around, just in 
case there's a standby out there that needs them.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLE...ALTER COLUMN vs inheritance

2010-05-26 Thread Bernd Helmle




--On 4. November 2009 09:57:27 -0500 Tom Lane  wrote:


Yeah, this is a known issue.  The ALTER should be rejected, but it is
not, because we don't have enough infrastructure to notice that the
constraint is inherited and logically can't be dropped.  I think the
consensus was that the way to fix this (along with some other problems)
is to start representing NOT NULL constraints in pg_constraint, turning
attnotnull into just a bit of denormalization for performance.


Lost it a little from my radar, but here's is a first shot on this issue 
now. The patch creates a new CONSTRAINT_NOTNULL contype and assigns all 
required information for the NOT NULL constraint to it. Currently the 
constraint records the attribute number it belongs to and manages the 
inheritance properties. Passes regression tests with some adjustments to 
pg_constraint output.


The patch as it stands employs a dedicated code path for 
ATExecDropNotNull(), thus duplicates the behavior of 
ATExecDropConstraint(). I'm not really satisfied with this, but i did it 
this way to prevent some heavy conditional rearrangement in 
ATExecDropConstraint(). Maybe its worth to move the code to adjust 
constraint inheritance properties into a separate function.


There's also a remaining issue which needs to be addressed: currently 
pg_get_constraintdef_worker() is special case'd to dump the correct syntax 
for the NOT NULL constraint, which is totally redundant. I'm not sure how 
to do it correctly, since for example ATExecAlterColumnType() actually uses 
it to restore dependent constraints on a column. We might want to just move 
the special case there.


I understand that people are busy with the remaining open items list for 
9.0, so it's okay to discuss this during the upcoming reviewfest.



   Bernd


notnull_constraint.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Jan Wieck


On 5/26/2010 12:55 PM, Heikki Linnakangas wrote:

On 26/05/10 18:31, Robert Haas wrote:

And frankly, I don't think it's possible for quorum commit to reduce
the number of parameters.  Even if we have that feature available, not
everyone will want to use it.  And the people who don't will
presumably need whatever parameters they would have needed if quorum
commit hadn't been available in the first place.


Agreed, quorum commit is not a panacea.

For example, suppose that you have two servers, master and a standby, 
and you want transactions to be synchronously committed to both, so that 
in the event of a meteor striking the master, you don't lose any 
transactions that have been replied to the client as committed.


Now you want to set up a temporary replica of the master at a 
development server, for testing purposes. If you set quorum to 2, your 
development server becomes critical infrastructure, which is not what 
you want. If you set quorum to 1, it also becomes critical 
infrastructure, because it's possible that a transaction has been 
replicated to the test server but not the real production standby, and a 
meteor strikes.


Per-standby settings would let you express that, but not OTOH the quorum 
behavior where you require N out of M to acknowledge the commit before 
returning to client.



You can do this only with per standby options, by giving each standby a 
weight, or a number of votes. Your DEV server would have a weight of 
zero, while your production standby's have higher weights, depending on 
their importance for your overall infrastructure. As long as majority 
means >50% of all votes in the house, you don't have a split brain risk.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Show schema name on REINDEX DATABASE

2010-05-26 Thread Greg Sabino Mullane


-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


>>> Patch attached to show the schema *and* table name when doing
>>> a REINDEX DATABASE.
> 
>> Is this something that can be added to 9.1 commitfest?

> Not in this form, apparently.  Can we convince Greg or someone else to
> work on adding some more error message fields?

I think you might be confusing this with my somewhat more controversial 
patch to show schemas when a COPY error occurs. This just changes
the normal output, not the error output. Nobody has complained 
about this patch yet.


- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005261728
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkv9kvcACgkQvJuQZxSWSsgdvwCgygmLyKdAHKflL8WucdNvfuPn
84sAoOBuvAIgupEUcThNWW3KsPaIu9MQ
=Ptoe
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 23:31, Dimitri Fontaine wrote:

  d. choice of commit or rollback at timeout


Rollback is not an option. There is no going back after the commit 
record has been flushed to disk or sent to a standby.


The choice is to either commit anyway after the timeout, or wait forever.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [ADMIN] command tag logging

2010-05-26 Thread alvherre

Excerpts from Ray Stell's message of mié may 26 17:08:33 -0400 2010:
> I just installed a compiled from src 8.3.11.  I usually include %i, command 
> tag,
> in the log_line_prefix setting.  This causes some spewage I'd not seen before
> on connection received lines as if it is dumping the environment:
> 
> [unknown],17462,[unknown],2010-05-26 16:04:33.293 
> EDT,4bfd7ed1.4436,1,2010-05-26 16:04:33 
> EDT,0,/usr/local/pgsql8311/bin/postgres-D/var/database/pgsql/alerts_subscribeMANPATH=/usr/local/pgsql/man:HOSTNAME=test.cns.vt.eduTERM=xtermSHELL=/bin/bashHISTSIZE=1000ANT_HOME=/var/local/apache-antUSER=postgresqlLS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:LD_LIBRARY_PATH=/usr/lib/openssl/:/usr/local/pgsql/lib:PATH=/usr/java/jdk1.6.0_20/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/db03/app/oracle/product/11.1.0/bin:/var/local/apache-ant/bin:/usr/local/maven/bin:/usr/local/pgsql/bi
 
nMAIL=/var/spool/mail/postgresql_=/usr/local/pgsql8311/bin/postgresPWD=/home/postgresqlINPUTRC=/etc/inputrcJAVA_HOME=/usr/java/jdk1.6.0_20LANG=en_US.UTF-8PGSYSCONFDIR=/usr/local/pgsql8311/etcSSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpassPGDIR=/usr/local/pgsqlHOME=/home/postgresqlSHLVL=2M2_HOME=/usr/local/mavenLOGNAME=postgresqlCVS_RSH=/usr/bin/sshPGDATA=/var/database/pgsql/alerts_subscribeLESSOPEN=|/usr/bin/lesspipe.sh
 %sORACLE_HOME=/db03/app/oracle/product/11.1.0G_BROKEN_FILENAMES=1 LOG:  
connection received: host=198.82.3.23 port=49723

Hmm, I bet it's the recent %.*s patch.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 4:52 PM, Heikki Linnakangas wrote:

Ok, I think I understand it now. The commit order is enough, because 
replaying the actions in the order "all actions of B, then all actions 
of A" yields the same result.


Precisely.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 23:49, Jan Wieck wrote:

In this implementation it wouldn't even matter if a transaction that was
recorded actually never made it because it crashed before the WAL flush.
It would be reported by this "commit order" feature, but there would be
no traces of whatever it did to be found inside the DB, so that anomaly
is harmless.


Hmm, I think it would also not matter if the reported commit order 
doesn't match exactly the order of the commit records, as long as 
there's no dependency between the two transactions.


What I'm after is that I think it would be enough to establish the 
commit order using deferred triggers that are fired during pre-commit 
processing. The trigger could get a number from a global sequence to 
establish the commit order, and write it to a table. So you still need a 
global sequence, but it's only needed once per commit.


(you have to handle deferred triggers that fire after the commit-order 
trigger. perhaps by getting another number from the global sequence and 
replacing the previous number with it)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Show schema name on REINDEX DATABASE

2010-05-26 Thread alvherre

Excerpts from Selena Deckelmann's message of mié may 26 11:07:40 -0400 2010:
> On Mon, Apr 5, 2010 at 9:29 AM, Greg Sabino Mullane  wrote:
> > Patch attached to show the schema *and* table name when doing
> > a REINDEX DATABASE.
> 
> Is this something that can be added to 9.1 commitfest?

Not in this form, apparently.  Can we convince Greg or someone else to
work on adding some more error message fields?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-26 Thread Josh Berkus


> What if we drove it off of the PD_ALL_VISIBLE bit on the page itself,
> rather than the visibility map bit?  It would be safe to clear the
> visibility map bit without touching the page, but if you clear the
> PD_ALL_VISIBLE bit on the page itself then you set all the hint bits
> and freeze all the tuples.  In the case where the visibility map bit
> gets cleared but the page-level bit is still set, a future vacuum can
> notice and reset the visibility map bit.  But whenever the visibility
> map bit is set, you know that the page-level bit MUST be set, so you
> needn't vacuum those pages, even for anti-wraparound: you know they'll
> be frozen when and if they ever get written again.

How does that get us out of reading and writing old pages, though?  If
we're going to set a bit on them, we might as well freeze them.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 23:45, Heikki Linnakangas wrote:

On 26/05/10 23:32, Jan Wieck wrote:

Consider two transactions A and B that due to transaction batching
between snapshots get applied together. Let the order of actions be

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those
individual row actions, then it would not have any idea if within that
batch the action of B (higher XID) actually came first. Without that
knowledge there is a 50/50 chance of getting your replica out of sync
with that simple conflict.


Hmm, I don't see how even a fully reliable WAL-logged commit-order log
would save you then. It seems that you need to not only know the
relative order of commits, but the order of commits relative to actions
within the transactions. I.e. in the above example it's not enough to
know that B committed before A, you also have to know that A updated the
row only after B committed.


Ok, I think I understand it now. The commit order is enough, because 
replaying the actions in the order "all actions of B, then all actions 
of A" yields the same result.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 12:38 PM, Greg Stark wrote:

On Wed, May 26, 2010 at 5:10 PM, Jan Wieck  wrote:

... but to answer that request, actually I don't even think we should be
discussing API specifics.



How about just API generalities? Like, where do you need this data, on
the master or on the slave? Would PGXC like it on the transaction
coordinator?

What question do you need to answer, do you need to pull out sets of
commits in certain ranges or look up specific transaction ids and find
out when they committed? Or do you only need to answer which of two
transaction ids committed first?


The question I want answered is

  "what was the order and xid of the next 0..n transactions, that
  committed after transaction X?"

Preferably I would avoid scanning the entire available WAL just to get 
the next n xid's to process.


The proposal assigned a unique serial number (file segment and position 
driven) to each xid and used that for the ordering as well as 
identification of the last known transaction. That is certainly a 
premature implementation detail.


In this implementation it wouldn't even matter if a transaction that was 
recorded actually never made it because it crashed before the WAL flush. 
It would be reported by this "commit order" feature, but there would be 
no traces of whatever it did to be found inside the DB, so that anomaly 
is harmless.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas


On 26/05/10 23:32, Jan Wieck wrote:

Consider two transactions A and B that due to transaction batching
between snapshots get applied together. Let the order of actions be

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those
individual row actions, then it would not have any idea if within that
batch the action of B (higher XID) actually came first. Without that
knowledge there is a 50/50 chance of getting your replica out of sync
with that simple conflict.


Hmm, I don't see how even a fully reliable WAL-logged commit-order log 
would save you then. It seems that you need to not only know the 
relative order of commits, but the order of commits relative to actions 
within the transactions. I.e. in the above example it's not enough to 
know that B committed before A, you also have to know that A updated the 
row only after B committed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] functional call named notation clashes with SQL feature

2010-05-26 Thread Peter Eisentraut

It turns out that the SQL standard uses the function call notation

foo(this AS that)

for something else:

 ::=  

 ::= [   ] 

 ::=  [  [ {   }... ] ] 

 ::= 
| 
| 

 ::=  AS 

In systems that have inheritance of composite types, this is used to
specify which type the value is supposed to be interpreted as (for
example, to treat the value as a supertype).

Seems kind of bad to overload this with something completely different.
What should we do?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 4:11 PM, Dimitri Fontaine wrote:

So even ordering the txid and txid_snapshots with respect to WAL commit
time (LSN) won't be the whole story, for any given transaction
containing more than one event we also need to have them in order. I
know Jan didn't forget about it so it must either be in the proposal or
easily derived, too tired to recheck.


No, that detail is actually not explained in the proposal. When applying 
all changes in transaction commit order, there is no need for a global 
sequence. A local counter per backend is sufficient because the total 
order of ,  yields a similarly 
agreeable order of actions.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Kevin Grittner

Jan Wieck  wrote:

> Without this logic, the replication system could not combine
> multiple origin sessions into one replication session without
> risking to never find a state, in which it can commit.

My latest idea for handling this in WAL-based replication involves
WAL-logging information about the transaction through which a the
committing transaction makes it safe to view.  There are a few
options here at the detail level that I'm still thinking through.
The idea would be that the xmin from read-only queries on the slaves
might be somewhere behind where you would expect based on
transactions committed.  (The details involve such things as where
non-serializable transactions fall into the plan on both sides, and
whether it's worth the effort to special-case read-only transactions
on the master.)

I can't say that I'm 100% sure that some lurking detail won't shoot
this technique down for HS, but it seems good to me at a conceptual
level.

I think, however, that this fails to work for systems like Slony and
Londiste because there could be transactions writing to tables which
are not replication targets, so the snapshot adjustments wouldn't be
safe.  True?  (If not true, I think that adding some sort of xmin
value, depending on the answers to the above questions, to Jan's
proposed structure might support better transactional integrity,
even to the level of full serializable support, at the cost of
delaying visibility of committed data.)

-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck


On 5/26/2010 3:16 PM, Heikki Linnakangas wrote:

On 26/05/10 21:43, Jan Wieck wrote:

On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:

It would not get called during recovery, but I believe that would be
sufficient for Slony. You could always batch commits that you don't
know when they committed as if they committed simultaneously.


Here you are mistaken. If the origin crashes but can recover not yet
flushed to xlog-commit-order transactions, then the consumer has no idea
about the order of those commits, which throws us back to the point
where we require a non cacheable global sequence to replay the
individual actions of those "now batched" transactions in an agreeable
order.

The commit order data needs to be covered by crash recovery.


Perhaps I'm missing something, 


Apparently, more about that at the end.


I'm thinking that the commit-order log would contain two kinds of records:

a) Transaction with XID X committed
b) All transactions with XID < X committed


If that was true then long running transactions would delay all commits 
for transactions that started after them. Do they?




During normal operation we write the 1st kind of record at every commit. 
After crash recovery (perhaps at the first commit after recovery or when 
the slon daemon first polls the server, as there's no hook for 
end-of-recovery), we write the 2nd kind of record.


I think the callback is also called during backend startup, which means 
that it could record the first XID to come which is known from the 
control file and in that case, all < XID's are committed or aborted.


Which leads us to your missing piece above, the need for the global non 
cacheable sequence.


Consider two transactions A and B that due to transaction batching 
between snapshots get applied together. Let the order of actions be


1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those 
individual row actions, then it would not have any idea if within that 
batch the action of B (higher XID) actually came first. Without that 
knowledge there is a 50/50 chance of getting your replica out of sync 
with that simple conflict.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Dimitri Fontaine

Simon Riggs  writes:
> On Wed, 2010-05-26 at 19:55 +0300, Heikki Linnakangas wrote:
>> Now you want to set up a temporary replica of the master at a 
>> development server, for testing purposes. If you set quorum to 2, your
>> development server becomes critical infrastructure, which is not what 
>> you want.
>
> That's a good argument for standby relays. 

Well it seems to me we can have the best of both worlds as soon as we
have cascading support. Even in the test server example, this one would
be a slave of the main slave, not counted into the quorum on the master.

Now that's the quorum on the slave that would be deciding on the
availability of the test server. Set it down to 0 and your test server
has no impact on the production environment.

In the example of one master and 4 slaves in 2 different locations,
you'll have a quorum of 2 on the master, which will know about 2 slaves
only. And each of them will have 1 slave, with a quorum to set to 0 or 1
depending on what you want to achieve.

So if you want simplicity to admin, effective data availability and
precise control over the global setup, I say go for:
 a. transaction level control of the replication level
 b. cascading support
 c. quorum with timeout
 d. choice of commit or rollback at timeout

Then give me a setup example that you can't express fully.

As far as the options to control the whole thing are concerned, I think
that the cascading support does not add any. So that's 3 GUCs.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 4:11 PM, Dimitri Fontaine
 wrote:
> Heikki Linnakangas  writes:
>> Perhaps I'm missing something, but I thought that Slony currently uses a
>> heartbeat, and all transactions committed between two beats are banged
>> together and committed as one in the slave so that their relative commit
>> order doesn't matter.
>
> I guess Slony does the same as pgq here: all events of all those
> transactions between two given ticks are batched together in the order
> of the event commits. (In fact the batches are made at the consumer
> request, so possibly spreading more than 2 ticks at a time).
>
> If you skip that event ordering (within transactions), you can't
> maintain foreign keys on the slaves, among other things.
>
> The idea of this proposal is to be able to get this commit order
> directly from where the information is maintained, rather than use some
> sort of user sequence for that.

Exactly.

> So even ordering the txid and txid_snapshots with respect to WAL commit
> time (LSN) won't be the whole story, for any given transaction
> containing more than one event we also need to have them in order. I
> know Jan didn't forget about it so it must either be in the proposal or
> easily derived, too tired to recheck.

Right, so the point is - with this proposal, he can switch to using a
LOCAL sequence number to order events within the session and then
order the sessions using the commit ordering.  Right now, he has to
use a GLOBAL sequence number because there's no way to know the commit
order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] cursor_to_xml: How to signal end?

2010-05-26 Thread Peter Eisentraut

Currently, cursor_to_xml returns an empty string when the end of the
cursor is reached (meaning the fetch returned zero rows).  As discussed
on -general, that's kind of weird, because you'd have to do something
like

IF val::text = ''

to test for the end, since there is no = operator for xml, and also
because '' isn't a valid XML value at all.

What would be a good way to address this?  Return null perhaps?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Dimitri Fontaine

Heikki Linnakangas  writes:
> Perhaps I'm missing something, but I thought that Slony currently uses a
> heartbeat, and all transactions committed between two beats are banged
> together and committed as one in the slave so that their relative commit
> order doesn't matter. 

I guess Slony does the same as pgq here: all events of all those
transactions between two given ticks are batched together in the order
of the event commits. (In fact the batches are made at the consumer
request, so possibly spreading more than 2 ticks at a time).

If you skip that event ordering (within transactions), you can't
maintain foreign keys on the slaves, among other things.

The idea of this proposal is to be able to get this commit order
directly from where the information is maintained, rather than use some
sort of user sequence for that.

So even ordering the txid and txid_snapshots with respect to WAL commit
time (LSN) won't be the whole story, for any given transaction
containing more than one event we also need to have them in order. I
know Jan didn't forget about it so it must either be in the proposal or
easily derived, too tired to recheck.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regression testing for psql

2010-05-26 Thread alvherre

Excerpts from Stephen Frost's message of mié may 26 15:19:59 -0400 2010:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > Then, too, there's the fact that many of these tests fail on my
> > machine because my username is not sfrost, 
> 
> I've updated the patch to address this, it's again at:
> http://snowman.net/~sfrost/psql-regress-help.patch

Isn't this kind of test a pain to maintain?  If somebody add a new SQL
command, it will affect the entire \h output and she'll have to either
apply the changes without checking them, or manually check the complete
list.  I have only to add a new function to make the test fail ...

Also, having to exclude tests that mention the database owner means that
you're only testing a fraction of the commands, so any possible problem
has a large chance of going undetected.  I mean, if we're going to test
this kind of thing, shouldn't we be using something that allows us to
ignore the db owner name?  A simple wildcard in place of the owner name
would suffice ... or do we need a regex for some other reason?

The \h output normally depends on terminal width.  Have you handled that
somehow?

(And if we want something like this, I think we should not have a single
huge file for the complete test, but a set of smaller files.  I'd even
put the bunch in src/bin/psql/regress rather than the main regress dir.)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Add _PG_init to PL language handler documentation

2010-05-26 Thread Robert Haas

On Tue, May 25, 2010 at 4:34 AM, Jonathan Leto  wrote:
> This tiny doc patch adds _PG_init to the skeleton example code for a
> PL. The information is quite valuable to PL authors, who might miss it
> when it is described in the shared library documentation.

I'm not sure it does much good to add it to the template without any
explanation of what it does.  What might make more sense is to add a
cross-reference to the section on dynamic loading, where this is
documented.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Joshua D. Drake

On Wed, 2010-05-26 at 15:37 -0400, Robert Haas wrote:
> On Wed, May 26, 2010 at 3:13 PM, Simon Riggs  wrote:
> >> I don't really understand this comment.  I have said, and I believe,
> >> that a system without quorum commit is simpler than one with quorum
> >> commit.  I'd debate the point with you but I find the point so
> >> self-evident that I don't even know where to begin arguing it.
> >
> >> It's simply an opinion, which I believe to
> >> be based on solid technical reasoning, but which I might change my
> >> mind about if someone convinces me that I'm looking at the problem the
> >> wrong way.
> >
> > You're saying you have solid technical reasons, but they are so
> > self-evident that you can't even begin to argue them. Why are you so
> > sure your reasons are solid?? Regrettably, I say this doesn't make any
> > sense, however much you write.
> 
> Yeah, especially when you juxtapose two different parts of my email
> that were talking about different things.

/me rings bell and orders the fighters to their respective corners.

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
> 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronization levels in SR

2010-05-26 Thread Robert Haas

On Wed, May 26, 2010 at 3:13 PM, Simon Riggs  wrote:
>> I don't really understand this comment.  I have said, and I believe,
>> that a system without quorum commit is simpler than one with quorum
>> commit.  I'd debate the point with you but I find the point so
>> self-evident that I don't even know where to begin arguing it.
>
>> It's simply an opinion, which I believe to
>> be based on solid technical reasoning, but which I might change my
>> mind about if someone convinces me that I'm looking at the problem the
>> wrong way.
>
> You're saying you have solid technical reasons, but they are so
> self-evident that you can't even begin to argue them. Why are you so
> sure your reasons are solid?? Regrettably, I say this doesn't make any
> sense, however much you write.

Yeah, especially when you juxtapose two different parts of my email
that were talking about different things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 174 matches

Mail list logo