Re: [HACKERS] Simplifying replication

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 9:59 PM, Josh Berkus  wrote:
>
>> If you set wal_keep_segments=0, archive_mode=on, and
>> archive_command=, you might run out of disk space.
>>
>> If you set wal_keep_segments=-1, you might run out of disk space.
>>
>> Are you any more screwed in the second case than you are in the first
>> case?
>
> It is the same to the user either way.  In either case you have to
> change some settings and restart the master.

Except that changing wal_keep_segments doesn't require restarting the master.

The point of allowing -1 was to allow someone to set it to that value
temporarily, to be able to do a hot backup without having to guess how
large to set it.  If you don't have enough disk space for a backup to
complete, you're kind of hosed either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-26 Thread Josh Berkus

> If you set wal_keep_segments=0, archive_mode=on, and
> archive_command=, you might run out of disk space.
> 
> If you set wal_keep_segments=-1, you might run out of disk space.
> 
> Are you any more screwed in the second case than you are in the first
> case?

It is the same to the user either way.  In either case you have to
change some settings and restart the master.

Well, for the archive case, you could conceivably mass-delete the
archive files.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-26 Thread Robert Haas
On Thu, Oct 21, 2010 at 8:33 PM, Bruce Momjian  wrote:
> Robert Haas wrote:
>> On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus  wrote:
>> > On 10/20/10 6:54 PM, Robert Haas wrote:
>> >> I find it impossible to believe that's
>> >> a good decision, and IMHO we should be focusing on how to make the
>> >> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
>> >> most of the same benefits without throwing away hard-won performance.
>> >
>> > I'd be happy to accept that. ?Is it possible, though?
>>
>> I sketched an outline of the problem AIUI here:
>>
>> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php
>>
>> I think it's possible; I'm not quite sure how hard it is.
>> Unfortunately, I've not had as much PG-hacking time lately as I'd
>> like...
>
> Have we documented these TODOs?

I have not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security hook on authorization

2010-10-26 Thread Robert Haas
On Mon, Oct 25, 2010 at 9:45 PM, Robert Haas  wrote:
> Oh.  You know, I am realizing that I misread this patch.  This hook is
> actually after authentication has been done; it's merely before we've
> told the client what happened.  So maybe this is committable as-is,
> modulo some work on the comments.

Committed, with some work on the comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Dean Rasheed's message of mar oct 26 15:46:56 -0300 2010:
>> Well ELEMENT is a reserved keyword in SQL:2008, to support multisets,
>> so if we ever supported that feature...

> Hah!

Hmmm ... I dug through SQL:2008, and so far as I can find, the only use
of ELEMENT as a keyword is for , which
is defined as "return the sole element of a multiset of one element":

 ::= ELEMENT   

This is stated to be equivalent to

( SELECT M.E FROM UNNEST (mve) AS M(E) )

AFAICS, if we were to implement this, we'd do it as an ordinary function
named element(), just like unnest() is an ordinary function in our
implementation.  Reserving a common word for as tiny of a notational
savings as this would be stupid.

Of course, it's possible that in future versions the committee might
extend ELEMENT() in ways that we can't duplicate as a simple function.
But that's all hypothetical --- you could as well argue that they might
decide to reserve any other word, too.


But ... having said all that, I have to agree that ELEMENT seems
preferable to LABEL if we ignore micro-considerations of parser
efficiency --- I still think LABEL is a pretty poor choice of word
here.  Personally I'd still take VALUE as being my first choice though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-10-26 Thread Robert Haas
On Mon, Oct 25, 2010 at 12:51 PM, Peter Eisentraut  wrote:
> On mån, 2010-10-25 at 09:33 -0400, Robert Haas wrote:
>> It seems we're still missing some relevant details, because hdparm
>> doesn't seem to work on SCSI devices.  Is sdparm the right utility in
>> that case?  Does anyone know what the correct incantations look like?
>
> Search the sdparm man page for "Writeback Cache".  It has detailed
> examples.

Here's a patch.  This adds a few more details about sdparm and makes
it clear that it applies to both FreeBSD and Linux.  But, perhaps more
significantly, it rearranges what is currently a fairly long paragraph
into a bulleted list, which I think is more readable.  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


wal-reliability-list.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Alvaro Herrera
Excerpts from Dean Rasheed's message of mar oct 26 15:46:56 -0300 2010:
> On 26 October 2010 17:04, David E. Wheeler  wrote:
> > On Oct 26, 2010, at 7:15 AM, Robert Haas wrote:
> >
> >>> Notwithstanding the above, I don't think ELEMENT would be a very bad 
> >>> choice.
> >>
> >> I still think we should just go for LABEL and be done with it.  But
> >> y'all can ignore me if you want...
> >
> > +1
> 
> Well ELEMENT is a reserved keyword in SQL:2008, to support multisets,
> so if we ever supported that feature...

Hah!

Well, here's a patch for LABEL in any case.  If we're going to have to
reserve ELEMENT in the future then there doesn't seem to be much point
in not choosing that one though.  Should we take a poll?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


0001-Change-syntax-to-add-a-new-enum-value-to-ALTER-TYPE-.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

2010-10-26 Thread Robert Haas
On Wed, Oct 20, 2010 at 5:54 PM, Marti Raudsepp  wrote:
> Here's the second patch from my coccicheck run. Originally it flagged
> the fact that the opened file in psql's process_file() wasn't being
> closed in the ON_ERROR_STOP path, but there seem to be two more
> unintended behaviors here.
>
> (1) In the error path, the value of pset.inputfile wasn't being
> properly restored. The caller does free(fname) on line 786, so
> psql.inputfile would point to unallocated memory.
>
> (2) The more significant issue is that stdin *was closed in the
> success return path. So when you run a script with two "\i -" lines,
> the first "\q" would close stdin and the next one would fail with:
>    psql:-:0: could not read from input file: Bad file descriptor
>
> In fact, this means that stdin was being accessed after being
> fclose()d, which is undefined behavior per ANSI C, though it seems to
> work just fine on Linux.
>
> The new behavior requires the same amount of "\q"s as the number of
> executions of '-' because stdin is never closed.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 8:27 AM, Simon Riggs  wrote:
> On Thu, 2010-10-21 at 20:57 -0400, Robert Haas wrote:
>
>> Very true.  But the lack of a -1 setting for wal_keep_segments means
>> that if you would like to take a backup without archiving, you must
>> set wal_keep_segments to a value greater than or equal to the rate at
>> which you generate WAL segments multiplied by the time it takes you to
>> run a backup.  If that doesn't qualify as requiring arcane knowledge,
>> I'm mystified as to what ever could.
>
> People are missing the point here:
>
> You have to put the WAL files *somewhere* while you do the base backup.
> PostgreSQL can't itself work out where that is, nor can it work out
> ahead of time how big it will need to be, since it is up to you how you
> do your base backup. Setting a parameter to -1 doesn't make the problem
> go away, it just pretends and hopes it doesn't exist, but screws you
> badly if you do hit the wall.

If you set wal_keep_segments=0, archive_mode=on, and
archive_command=, you might run out of disk space.

If you set wal_keep_segments=-1, you might run out of disk space.

Are you any more screwed in the second case than you are in the first
case?  Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ask for review of MERGE

2010-10-26 Thread Simon Riggs
On Tue, 2010-10-26 at 16:08 -0400, Robert Haas wrote:
> On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs  wrote:
> > I agree with your analysis. Let me review...
> > [review]
> 
> Sounds like we're on the same page.
> 
> > Two options for resolution are
> >
> > 1) Throw SERIALIZABLE error
> >
> > 2) Implement something similar to EvalPlanQual
> > As you say, we already resolve this situation for concurrent updates by
> > following the update chain from a row that is visible to the latest row.
> > For MERGE the semantics need to be subtely different here: we need to
> > follow the update chain to the latest row, but from a row that we
> > *can't* see.
> >
> > MERGE is still very useful without the need for 2), though fails in some
> > cases for concurrent use. The failure rate would increase as the number
> > of concurrent MERGErs and/or number of rows in source table. Those
> > errors are no more serious than are possible now.
> >
> > So IMHO we should implement 1) now and come back later to implement 2).
> > Given right now there may be other issues with 2) it seems unsafe to
> > rely on the assumption that we'll fix them by end of release.
> 
> Yeah.  In fact, I'm not sure we're ever going to want to implement #2
> - I think that needs more study to determine whether there's even
> something there that makes sense to implement at all.  But certainly I
> wouldn't count on it happening for 9.1.

2) sounds weird, until you realise it is exactly how you would need to
code a PL/pgSQL procedure to do the equivalent of MERGE. Not doing it
just makes no sense in the longer term. I agree it will take a while to
think it through in sufficient detail.

In the meantime it's a good argument for the ELSE construct at the end
of the WHEN clauses, so we can do something other than skip a row or
throw an ERROR.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash in plancache with subtransactions

2010-10-26 Thread Tom Lane
I wrote:
> Heikki Linnakangas  writes:
>> One simple idea is to keep a flag along with the executor state to 
>> indicate that the executor state is currently in use. Set it just before 
>> calling ExecEvalExpr, and reset afterwards. If the flag is already set 
>> in the beginning of exec_eval_simple_expr, we have recursed, and must 
>> create a new executor state.

> Yeah, the same thought occurred to me in the shower this morning.
> I'm concerned about possible memory leakage during repeated recursion,
> but maybe that can be dealt with.

I spent some time poking at this today, and developed the attached
patch, which gets rid of all the weird assumptions associated with
"simple expressions" in plpgsql, in favor of just doing another
ExecInitExpr per expression in each call of the function.  While this is
a whole lot cleaner than what we have, I'm afraid that it's unacceptably
slow.  I'm seeing an overall slowdown of 2X to 3X on function execution
with examples like:

create or replace function speedtest10(x float8) returns float8 as $$
declare
  z float8 := x;
begin
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  z := z * 2 + 1;
  return z;
end
$$
language plpgsql immutable;

Now, this is about the worst case for the patch.  This function's
runtime depends almost entirely on the speed of simple expressions,
and because there's no internal looping, we only get to use the result
of each ExecInitExpr once per function call.  So probably "typical" use
cases wouldn't be quite so bad; but still it seems like we can't go this
route.  We need to be able to use the ExecInitExpr results across
successive calls one way or another.

I'll look into creating an in-use flag next.

regards, tom lane

diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index a28c6707e4670be17ed1c947d383f1d11b9c90c5..0963efe1b5b5e7e974a682d2a7f71d6427180e9b 100644
*** a/src/pl/plpgsql/src/gram.y
--- b/src/pl/plpgsql/src/gram.y
*** decl_statement	: decl_varname decl_const
*** 490,495 
--- 490,496 
  		curname_def = palloc0(sizeof(PLpgSQL_expr));
  
  		curname_def->dtype = PLPGSQL_DTYPE_EXPR;
+ 		curname_def->expr_num = plpgsql_nExprs++;
  		strcpy(buf, "SELECT ");
  		cp1 = new->refname;
  		cp2 = buf + strlen(buf);
*** read_sql_construct(int until,
*** 2277,2282 
--- 2278,2284 
  	expr->query			= pstrdup(ds.data);
  	expr->plan			= NULL;
  	expr->paramnos		= NULL;
+ 	expr->expr_num  = plpgsql_nExprs++;
  	expr->ns= plpgsql_ns_top();
  	pfree(ds.data);
  
*** make_execsql_stmt(int firsttoken, int lo
*** 2476,2481 
--- 2478,2484 
  	expr->query			= pstrdup(ds.data);
  	expr->plan			= NULL;
  	expr->paramnos		= NULL;
+ 	expr->expr_num  = plpgsql_nExprs++;
  	expr->ns= plpgsql_ns_top();
  	pfree(ds.data);
  
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 3ddcf3b5a595e0847627cc10c4084258f44cc352..4feb14d2bec87de5d8a66071ab3a66da3c2a2d76 100644
*** a/src/pl/plpgsql/src/pl_comp.c
--- b/src/pl/plpgsql/src/pl_comp.c
***
*** 37,42 
--- 37,47 
  
  /* --
   * Our own local and global variables
+  *
+  * Ideally these would live in some dynamically-allocated structure, but
+  * it's unpleasant to pass such a thing around in a Bison parser.  As long
+  * as plpgsql function compilation isn't re-entrant, it's okay to use
+  * static storage for these.
   * --
   */
  PLpgSQL_stmt_block *plpgsql_parse_result;
*** int			plpgsql_nDatums;
*** 46,51 
--- 51,58 
  PLpgSQL_datum **plpgsql_Datums;
  static int	datums_last = 0;
  
+ int			plpgsql_nExprs;
+ 
  char	   *plpgsql_error_funcname;
  bool		plpgsql_DumpExecTree = false;
  bool		plpgsql_check_syntax = false;
*** do_compile(FunctionCallInfo fcinfo,
*** 367,372 
--- 374,381 
  	 sizeof(PLpgSQL_datum *) * datums_alloc);
  	datums_last = 0;
  
+ 	plpgsql_nExprs = 0;
+ 
  	switch (is_trigger)
  	{
  		case false:
*** do_compile(FunctionCallInfo fcinfo,
*** 693,698 
--- 702,708 
  	function->datums = palloc(sizeof(PLpgSQL_datum *) * plpgsql_nDatums);
  	for (i = 0; i < plpgsql_nDatums; i++)
  		function->datums[i] = plpgsql_Datums[i];
+ 	function->nexprs = plpgsql_nExprs;
  
  	/* Debug dump for completed functions */
  	if (plpgsql_DumpExecTree)
*** plpgsql_compile_inline(char *proc_source
*** 788,793 
--- 798,804 
  	plpgsql_nDatums = 0;
  	plpgsql_Datums = palloc(sizeof(PLpgSQL_datum *) * datums_alloc);
  	datums_last = 0;
+ 	plpgsql_nExprs = 0;
  
  	/* Set up as though in a function returning VOID */
  	function->fn_rettype = VOIDOID;
*** plpgsql_compile_inline(char *proc_source
*** 838,843 
--- 849,855 
  	function->datums = palloc(sizeof(PLpgSQL_datum *) * plpgsq

Re: [HACKERS] ask for review of MERGE

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs  wrote:
> I agree with your analysis. Let me review...
> [review]

Sounds like we're on the same page.

> Two options for resolution are
>
> 1) Throw SERIALIZABLE error
>
> 2) Implement something similar to EvalPlanQual
> As you say, we already resolve this situation for concurrent updates by
> following the update chain from a row that is visible to the latest row.
> For MERGE the semantics need to be subtely different here: we need to
> follow the update chain to the latest row, but from a row that we
> *can't* see.
>
> MERGE is still very useful without the need for 2), though fails in some
> cases for concurrent use. The failure rate would increase as the number
> of concurrent MERGErs and/or number of rows in source table. Those
> errors are no more serious than are possible now.
>
> So IMHO we should implement 1) now and come back later to implement 2).
> Given right now there may be other issues with 2) it seems unsafe to
> rely on the assumption that we'll fix them by end of release.

Yeah.  In fact, I'm not sure we're ever going to want to implement #2
- I think that needs more study to determine whether there's even
something there that makes sense to implement at all.  But certainly I
wouldn't count on it happening for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-26 Thread Simon Riggs
On Thu, 2010-10-21 at 20:57 -0400, Robert Haas wrote:

> Very true.  But the lack of a -1 setting for wal_keep_segments means
> that if you would like to take a backup without archiving, you must
> set wal_keep_segments to a value greater than or equal to the rate at
> which you generate WAL segments multiplied by the time it takes you to
> run a backup.  If that doesn't qualify as requiring arcane knowledge,
> I'm mystified as to what ever could.

People are missing the point here:

You have to put the WAL files *somewhere* while you do the base backup.
PostgreSQL can't itself work out where that is, nor can it work out
ahead of time how big it will need to be, since it is up to you how you
do your base backup. Setting a parameter to -1 doesn't make the problem
go away, it just pretends and hopes it doesn't exist, but screws you
badly if you do hit the wall. 

My view is that is irresponsible, even if I share people's wish that the
problem did not exist.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ask for review of MERGE

2010-10-26 Thread Simon Riggs
On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote:
> On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark  wrote:
> > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas  wrote:
> >> Now, as Greg says, that might be what some people want, but it's
> >> certainly monumentally unserializable.
> >
> > To be clear when I said it's what people want what I meant was that in
> > the common cases it's doing exactly what people want. As opposed to
> > getting closer to what people want in general but not quite hitting
> > the mark in the common cases.
> >
> > Just as an example I think it's important that in the simplest case,
> > upsert of a single record, it be 100% guaranteed to do the naive
> > upsert. If two users are doing the merge of a single key at the same
> > time one of them had better insert and one of them had better update
> > or else users are going to be monumentally surprised.
> 
> Hmm, so let's think about that case.
> 
> The first merge comes along and finds no match so it fires the NOT
> MATCHED rule, which inserts a tuple.  The second merge comes along and
> finds no match, so it also fires the NOT MATCHED rule and tries to
> insert a tuple.  But upon consulting the PRIMARY KEY index it finds
> that an in-doubt tuple exists so it goes to sleep waiting for the
> first transaction to commit or abort.  If the first transaction
> commits it then decides that the jig is up and fails.  We could
> (maybe) fix this by doing something similar to what EPQ does for
> updates: when the first transaction commits, instead of redoing the
> insert, we back up and recheck whether the new tuple would have
> matched the join clause and, if so, we instead fire the MATCHED action
> on the updated tuple.  If not, we fire NOT MATCHED anyway.  I'm not
> sure how hard that would be, or whether it would introduce any other
> nasty anomalies in more complex cases.
> 
> Alternatively, we could introduce an UPSERT or REPLACE statement
> intended to handle exactly this case and leave MERGE for more complex
> situations.  It's pretty easy to imagine what the coding of that
> should look like: if we encounter an in-doubt tuple in we wait on its
> xmin.  If the transaction aborts, we insert.  If it commits, and we're
> in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
> or SERIALIZABLE mode, we abort with a serialization error.  That's a
> lot simpler to understand and reason about than MERGE in its full
> generality.
> 
> I think it's pretty much hopeless to think that MERGE is going to work
> in complex concurrent scenarios without creating serialization
> anomalies, or at least rollbacks.  I think that's baked into the
> nature of what the statement does.  To simulate MERGE, you need to
> read from the target table and then do writes that depend on what you
> read.  If you do that with the commands that are available today,
> you're going to get serialization anomalies and/or rollbacks under
> concurrency.  The mere fact of that logic being inside the database
> rather than outside isn't going to make that go away.  Now sometimes,
> as with exclusion constraints, you can play games with dirty snapshots
> to get the semantics you want, but whether that's possible in a
> particular case depends on the details of the operation being
> performed, and here I think it can't be done.  Some operations are
> *fundamentally* unserializable.

I agree with your analysis. Let me review...

There is a case that locking alone won't resolve, however that locking
works. The transaction history for that is:

T1: takes snapshot
T2: INSERT row1
T2: COMMIT;
T1: attempts to determine if MATCHED or NOT MATCHED. 

The answer is neither of those two answers. If we say it is NOT MATCHED
then we will just fail on any INSERT, or if we say it is MATCHED then
technically we can't see the row so the UPDATE should fail. The COMMIT
of T2 releases any locks that would have helped resolve this, and note
that even if T1 attempts to lock that row as early as possible, only a
table level lock would prevent T2 from doing that action.

Two options for resolution are

1) Throw SERIALIZABLE error

2) Implement something similar to EvalPlanQual
As you say, we already resolve this situation for concurrent updates by
following the update chain from a row that is visible to the latest row.
For MERGE the semantics need to be subtely different here: we need to
follow the update chain to the latest row, but from a row that we
*can't* see.

MERGE is still very useful without the need for 2), though fails in some
cases for concurrent use. The failure rate would increase as the number
of concurrent MERGErs and/or number of rows in source table. Those
errors are no more serious than are possible now.

So IMHO we should implement 1) now and come back later to implement 2).
Given right now there may be other issues with 2) it seems unsafe to
rely on the assumption that we'll fix them by end of release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 Post

Re: [HACKERS] Extensible executor nodes for preparation of SQL/MED

2010-10-26 Thread Tom Lane
Greg Stark  writes:
> On Mon, Oct 25, 2010 at 8:28 AM, Tom Lane  wrote:
>> But it might be a good change anyway from a performance standpoint,
>> in case a call through a function pointer is faster than a big switch.
>> Have you tried benchmarking it on common platforms?

> I've always wondered why we didn't use function pointers. It seems
> like it would make the code a lot easier to maintain with fewer places
> that need to be updated every time we add a node.

> But I always assumed a big part of the reason was performance.
> Generally compilers hate optimizing code using function pointers. They
> pretty much kill any inter-procedural optimization since it's very
> hard to figure out what set of functions you might have called and
> make any deductions about what side effects those functions might or
> might not have had. Even at the chip level function pointers tend to
> be slow since they cause full pipeline stalls where the processor has
> no idea where the next instruction is coming from until it's finished
> loading the data from the previous instruction.

At least in the case of the plan-node-related switches, the called
functions tend to be big and ugly enough that it's really hard to credit
any meaningful inter-procedural optimization could happen.  Side effects
would have to be "pretty much everything".

> On the other hand of course it's not obvious what algorithm gcc should
> use to implement largish switch statements like these. It might be
> doing a fairly large number of operations doing a binary search or
> hash lookup to determine which branch to take.

I confess to not having actually examined the assembly code anytime
recently, but I'd always supposed it would be an array-lookup anytime
the set of case labels was reasonably dense, which it should be in these
cases.  So I'm not sure I believe the pipeline stall argument either.
Any way you slice it, there's going to be a call to a function that
is going to be really really hard to predict in advance --- unless of
course you rely on statistics like "where did I jump to the last few
times I was here", which I think modern CPUs do have the ability to
do.

But this is all just arm-waving of course.  Benchmarks would be a lot
more convincing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Peter Eisentraut
On tis, 2010-10-26 at 11:53 -0700, Jeff Davis wrote:
> Case #2 is the strange one, I think. It's not actually just an
> adaptation of #1. #1 requires that all elements of the array have a
> corresponding PK value; but #2 just requires that one of them does.
> Peter, can you clarify case #2? Did you have a use case in mind?

[ That's the period references timestamp case. ]

You're right, it's probably not useful.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 3:00 PM, Josh Berkus  wrote:
>
>> I agree that the standby might get ahead, but this doesn't necessarily
>> lead to database corruption. Here, the interesting case is what happens
>> when the primary fails, which can lead to *either* of the following two
>> cases:
>> 1) The standby, due to some triggering mechanism, becomes the new
>> primary. In this case, even if the standby was ahead, its fine.
>> 2) The primary comes back as primary. In this case, the standby will
>> connect again to the primary. At this point, *if* somehow we are able to
>> detect that the standby is ahead, then we should abort the standby and
>> create a standby from scratch.
>
> Yes.  And we weren't able to implement that for 9.0.  It's worth
> revisiting for 9.1.  In fact, the issue of "is the standby ahead of the
> master" has come up repeatedly in potential failure scenarios; I think
> we're going to need a fairly bulletproof method to determine this.

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Josh Berkus

> I agree that the standby might get ahead, but this doesn't necessarily
> lead to database corruption. Here, the interesting case is what happens
> when the primary fails, which can lead to *either* of the following two
> cases:
> 1) The standby, due to some triggering mechanism, becomes the new
> primary. In this case, even if the standby was ahead, its fine.
> 2) The primary comes back as primary. In this case, the standby will
> connect again to the primary. At this point, *if* somehow we are able to
> detect that the standby is ahead, then we should abort the standby and
> create a standby from scratch.

Yes.  And we weren't able to implement that for 9.0.  It's worth
revisiting for 9.1.  In fact, the issue of "is the standby ahead of the
master" has come up repeatedly in potential failure scenarios; I think
we're going to need a fairly bulletproof method to determine this.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 2:57 PM, fazool mein  wrote:
>
> On Tue, Oct 26, 2010 at 11:23 AM, Robert Haas  wrote:
>>
>> On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
>>  wrote:
>> >
>> >> Can you please describe why
>> >> walsender reading directly from the buffers was given up? To avoid a
>> >> lot
>> >> of
>> >> locking?
>> >
>> > To avoid locking yes, and complexity in general.
>>
>> And the fact that it might allow the standby to get ahead of the
>> master, leading to silent database corruption.
>>
>
> I agree that the standby might get ahead, but this doesn't necessarily lead
> to database corruption. Here, the interesting case is what happens when the
> primary fails, which can lead to *either* of the following two cases:
> 1) The standby, due to some triggering mechanism, becomes the new primary.
> In this case, even if the standby was ahead, its fine.

True.

> 2) The primary comes back as primary. In this case, the standby will connect
> again to the primary. At this point, *if* somehow we are able to detect that
> the standby is ahead, then we should abort the standby and create a standby
> from scratch.

Unless you set restart_after_crash=off, the master could
crash-and-restart before you can do anything about it.  But that
doesn't exist in the 9.0 branch.

> I agree with Heikki that going through all this trouble only makes sense if
> there is a huge performance boost.

There's probably quite a large performance boost in the sync rep case
from allowing the master and standby to fsync() in parallel, but first
we need to get something that works at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread fazool mein
On Tue, Oct 26, 2010 at 11:23 AM, Robert Haas  wrote:

> On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
>  wrote:
> >
> >> Can you please describe why
> >> walsender reading directly from the buffers was given up? To avoid a lot
> >> of
> >> locking?
> >
> > To avoid locking yes, and complexity in general.
>
> And the fact that it might allow the standby to get ahead of the
> master, leading to silent database corruption.
>
>
I agree that the standby might get ahead, but this doesn't necessarily lead
to database corruption. Here, the interesting case is what happens when the
primary fails, which can lead to *either* of the following two cases:
1) The standby, due to some triggering mechanism, becomes the new primary.
In this case, even if the standby was ahead, its fine.
2) The primary comes back as primary. In this case, the standby will connect
again to the primary. At this point, *if* somehow we are able to detect that
the standby is ahead, then we should abort the standby and create a standby
from scratch.

I agree with Heikki that going through all this trouble only makes sense if
there is a huge performance boost.


Re: [HACKERS] security label error message

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 2:20 PM, Peter Eisentraut  wrote:
> Isn't this error message logically backwards?
>
> =# SECURITY LABEL ON SCHEMA public IS NULL;
> ERROR:  22023: security label providers have been loaded

Ouch.  How embarrassing.  Fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensible executor nodes for preparation of SQL/MED

2010-10-26 Thread Greg Stark
On Mon, Oct 25, 2010 at 8:28 AM, Tom Lane  wrote:
> But it might be a good change anyway from a performance standpoint,
> in case a call through a function pointer is faster than a big switch.
> Have you tried benchmarking it on common platforms?

I've always wondered why we didn't use function pointers. It seems
like it would make the code a lot easier to maintain with fewer places
that need to be updated every time we add a node.

But I always assumed a big part of the reason was performance.
Generally compilers hate optimizing code using function pointers. They
pretty much kill any inter-procedural optimization since it's very
hard to figure out what set of functions you might have called and
make any deductions about what side effects those functions might or
might not have had. Even at the chip level function pointers tend to
be slow since they cause full pipeline stalls where the processor has
no idea where the next instruction is coming from until it's finished
loading the data from the previous instruction.

On the other hand of course it's not obvious what algorithm gcc should
use to implement largish switch statements like these. It might be
doing a fairly large number of operations doing a binary search or
hash lookup to determine which branch to take.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Jeff Davis
On Mon, 2010-10-25 at 17:57 -0700, Greg Stark wrote:
> On Mon, Oct 25, 2010 at 5:24 PM, Jeff Davis  wrote:
> > I think that's easier when the PK must contain the FK, because then you
> > only need to lock one record. Even when you need to lock multiple
> > records, it seems feasible, and is just an index lookup, right? Do you
> > see a particular problem?
> 
> Well if you lock multiple records then it's not clear what operations
> you should conflict with. Removing any one of them wouldn't actually
> invalidate the foreign key reference unless you remove the last one.

I didn't word my statement clearly. If the PK contains the FK, and you
have an Exclusion Constraint on the PK (as Peter suggested), then you
only need to lock one record. I think that logic would be pretty much
the same as a normal FK.

The case where you might need to lock multiple records is when the FK
contains the PK (case #1 in Peter's original email). But in that case,
you would still have a UNIQUE constraint on the PK (right?) and removing
any referenced element should cause a conflict.

Case #2 is the strange one, I think. It's not actually just an
adaptation of #1. #1 requires that all elements of the array have a
corresponding PK value; but #2 just requires that one of them does.
Peter, can you clarify case #2? Did you have a use case in mind?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Jeff Davis
On Tue, 2010-10-26 at 20:25 +0300, Peter Eisentraut wrote:
> Let's say you have
> 
> PK
> 
> 1
> 2
> 3
> 4
> 5
> 
> FK
> 
> [1,2,3]
> [3,4,5]
> [4,4,4]
> 
> When you delete PK = 3, what do you  expect to happen?  OK, you might
> decide to delete the first two rows from the FK table.  This might or
> might not make sense in a particular case, but on delete cascade is an
> option the user has to choose explicitly.

That's what I would expect.

> But I don't see what to do
> about cascading an update when you, say, update PK 1 => 6.

Intuitively, I would expect all 1's to be replaced by 6's in all arrays.
But I can now see why you would be hesitant to try to support that.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Dean Rasheed
On 26 October 2010 17:04, David E. Wheeler  wrote:
> On Oct 26, 2010, at 7:15 AM, Robert Haas wrote:
>
>>> Notwithstanding the above, I don't think ELEMENT would be a very bad choice.
>>
>> I still think we should just go for LABEL and be done with it.  But
>> y'all can ignore me if you want...
>
> +1
>

Well ELEMENT is a reserved keyword in SQL:2008, to support multisets,
so if we ever supported that feature...

But I don't feel strongly about this. I think the overall consensus so
far is in favour of LABEL.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] EOCF

2010-10-26 Thread David Fetter
Folks,

I just realized I hadn't closed out the commitfest earlier.  Have done
so.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
 wrote:
> On 26.10.2010 21:03, fazool mein wrote:
>>>
>>> Might I suggest adopting the same technique walsender does, ie just read
>>> the data back from disk?  There's a reason why we gave up trying to have
>>> walsender read directly from the buffers.
>>>
>> That is exactly what I do not want to do, i.e. read from disk, as long as
>> the piece of WAL is available in the buffers.
>
> Why not? If the reason is performance, I'd like to see some performance
> numbers to show that it's worth the trouble. You could perhaps do a quick
> and dirty hack that doesn't do the locking 100% correctly first, and do some
> benchmarking on that to get a ballpark number of how much potential there
> is. Or run oprofile on the current walsender implementation to see how much
> time is currently spent reading WAL from the kernel buffers.
>
>> Can you please describe why
>> walsender reading directly from the buffers was given up? To avoid a lot
>> of
>> locking?
>
> To avoid locking yes, and complexity in general.

And the fact that it might allow the standby to get ahead of the
master, leading to silent database corruption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] security label error message

2010-10-26 Thread Peter Eisentraut
Isn't this error message logically backwards?

=# SECURITY LABEL ON SCHEMA public IS NULL;
ERROR:  22023: security label providers have been loaded



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Heikki Linnakangas

On 26.10.2010 21:03, fazool mein wrote:

Might I suggest adopting the same technique walsender does, ie just read
the data back from disk?  There's a reason why we gave up trying to have
walsender read directly from the buffers.


That is exactly what I do not want to do, i.e. read from disk, as long as
the piece of WAL is available in the buffers.


Why not? If the reason is performance, I'd like to see some performance 
numbers to show that it's worth the trouble. You could perhaps do a 
quick and dirty hack that doesn't do the locking 100% correctly first, 
and do some benchmarking on that to get a ballpark number of how much 
potential there is. Or run oprofile on the current walsender 
implementation to see how much time is currently spent reading WAL from 
the kernel buffers.



Can you please describe why
walsender reading directly from the buffers was given up? To avoid a lot of
locking?


To avoid locking yes, and complexity in general.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread fazool mein
> Might I suggest adopting the same technique walsender does, ie just read
> the data back from disk?  There's a reason why we gave up trying to have
> walsender read directly from the buffers.
>
>
That is exactly what I do not want to do, i.e. read from disk, as long as
the piece of WAL is available in the buffers. Can you please describe why
walsender reading directly from the buffers was given up? To avoid a lot of
locking?
The locking issue might not be a problem considering synchronous
replication. In synchronous replication, the primary will anyways wait for
the standby to send a confirmation before it can do more WAL inserts. Hence,
reading from buffers might be better in this case.

So, as I understand from the emails, we need to lock both WALWriteLock and
WALInsertLock in exclusive mode for reading from buffers. Agreed?

Thanks.


Re: [HACKERS] Range Types, discrete and/or continuous

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 1:42 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis  wrote:
>>> However, this is orthogonal, I think. I can always ask the user to
>>> specify everything when creating a Range Type, and then we can make them
>>> default to use the interface functions later. Some, like "plus" might be
>>> constant, but people certainly might want to specify alternate
>>> comparators.
>
>> If it were me, I would go design and implement the type interface part
>> first.   But it's not.
>
> I agree with Jeff's plan: seems like taking a first cut at the higher
> level is worthwhile, to make sure you know what you need from the
> type-system interfaces.
>
> FWIW, I don't agree with the proposed syntax.  We already have a
> perfectly extensible CREATE TYPE syntax, so there is no reason to
> implement this as an ALTER TYPE operation.  What's more, altering
> existing datatype declarations is fraught with all kinds of fun
> risks, as we were reminded with the recent enum patch.

Fair enough.  I'm not wedded to the syntax (or the order of development).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-26 Thread Josh Berkus

> What happens if max_wal_size is less than checkpoint_segments?
> Currently a checkpoint tries to leave WAL files which were generated
> from the prior ckpt start to current ckpt end. Because those WAL files
> are required for crash recovery. But we should delete some of them
> according to max_wal_size?

The ideas is that max_wal_size would *replace* checkpoint_segments.  The
checkpoint_segments setting is baffling to most PG DBAs.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, discrete and/or continuous

2010-10-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis  wrote:
>> However, this is orthogonal, I think. I can always ask the user to
>> specify everything when creating a Range Type, and then we can make them
>> default to use the interface functions later. Some, like "plus" might be
>> constant, but people certainly might want to specify alternate
>> comparators.

> If it were me, I would go design and implement the type interface part
> first.   But it's not.

I agree with Jeff's plan: seems like taking a first cut at the higher
level is worthwhile, to make sure you know what you need from the
type-system interfaces.

FWIW, I don't agree with the proposed syntax.  We already have a
perfectly extensible CREATE TYPE syntax, so there is no reason to
implement this as an ALTER TYPE operation.  What's more, altering
existing datatype declarations is fraught with all kinds of fun
risks, as we were reminded with the recent enum patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, discrete and/or continuous

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis  wrote:
> On Mon, 2010-10-25 at 21:07 -0400, Robert Haas wrote:
>> See, that gets complicated, because now you're restricting the range
>> of values that can be expressed by the range type to something less
>> than the natural range of the data type.  I am not sure the value of
>> supporting that is sufficient to justify the amount of extra code that
>> will be required to make it work.  I'd say for a first version, nail
>> down the representation.  Perhaps in a future version you could have
>> compress/uncompress methods sort of like GIST,
>
> OK, I can live with that.
>
>> ALTER TYPE timestamptz
>>     ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
>>     ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);
>
> I think we chatted about this before. Sounds like a good idea to me
> (except the name -- "increment" is not the same as "plus").
>
> However, this is orthogonal, I think. I can always ask the user to
> specify everything when creating a Range Type, and then we can make them
> default to use the interface functions later. Some, like "plus" might be
> constant, but people certainly might want to specify alternate
> comparators.

If it were me, I would go design and implement the type interface part
first.   But it's not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Peter Eisentraut
On mån, 2010-10-25 at 17:57 -0700, Greg Stark wrote:
> Well if you lock multiple records then it's not clear what operations
> you should conflict with. Removing any one of them wouldn't actually
> invalidate the foreign key reference unless you remove the last one.
> 
> I always assumed this was why we require the unique constraint at all.

I did mention that you would need an exclusion constraint in some of the
cases, to get an effect analogous to unique constraints.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, discrete and/or continuous

2010-10-26 Thread Jeff Davis
On Mon, 2010-10-25 at 21:07 -0400, Robert Haas wrote:
> See, that gets complicated, because now you're restricting the range
> of values that can be expressed by the range type to something less
> than the natural range of the data type.  I am not sure the value of
> supporting that is sufficient to justify the amount of extra code that
> will be required to make it work.  I'd say for a first version, nail
> down the representation.  Perhaps in a future version you could have
> compress/uncompress methods sort of like GIST,

OK, I can live with that. 

> ALTER TYPE timestamptz
> ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
> ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);

I think we chatted about this before. Sounds like a good idea to me
(except the name -- "increment" is not the same as "plus").

However, this is orthogonal, I think. I can always ask the user to
specify everything when creating a Range Type, and then we can make them
default to use the interface functions later. Some, like "plus" might be
constant, but people certainly might want to specify alternate
comparators.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Peter Eisentraut
On mån, 2010-10-25 at 17:38 -0700, Jeff Davis wrote:
> > Implementing the foreign key side of this merely requires the system
> to
> > have some knowledge of the required "contains" operator, which it
> does
> > in the array case, and something can surely be arranged for the
> range
> > case.  The problem is you can't do cascading updates or deletes, but
> you
> > could do on update/delete restrict, which is still useful.
> 
> Why can't you do cascading updates/deletes?

Let's say you have

PK

1
2
3
4
5

FK

[1,2,3]
[3,4,5]
[4,4,4]

When you delete PK = 3, what do you  expect to happen?  OK, you might
decide to delete the first two rows from the FK table.  This might or
might not make sense in a particular case, but on delete cascade is an
option the user has to choose explicitly.  But I don't see what to do
about cascading an update when you, say, update PK 1 => 6.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign keys for array/period contains relationships

2010-10-26 Thread Peter Eisentraut
On mån, 2010-10-25 at 22:10 +0200, Pavel Stehule wrote:
> 2010/10/25 Robert Haas :
> >> Example #1: Foreign key side is an array, every member must match some
> >> PK.
> >>
> >> CREATE TABLE pk (a int PRIMARKY KEY, ...);
> >>
> >> CREATE TABLE fk (x int[] REFERENCES pk (a), ...);
> 
> What about optimalizations and planning?

Some work will be required to get the respective checking queries to do
the right think fast, but it's solvable in principle.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Jeff Janes's message of mar oct 26 12:22:38 -0300 2010:
>> I don't think that holding WALWriteLock accomplishes much.  It
>> prevents part of the buffer from being written out to OS/disk, and
>> thus becoming eligible for being overwritten in the buffer, but the
>> WALInsertLock prevents it from actually being overwritten.  And what
>> if the part of the buffer you want to read was already eligible for
>> overwriting but not yet actually overwritten?  WALWriteLock won't
>> allow you to safely access it, but WALInsertLock will (assuming you
>> have a safe way to identify the record in the first place).  For
>> either case, holding it in shared mode would be sufficient.

> And horrible for performance, I imagine.  Those locks are highly trafficked.

I think you might actually need *both* locks to ensure the WAL buffers
aren't changing underneath you.  If you don't have the walwriter locked
out, it is free to change the state of a buffer from "dirty" to
"written" and then to "prepared to receive next page of WAL".  If the
latter doesn't involve changing the content of the buffer today, it
still could tomorrow.

And on top of all that, there remains the problem that the piece of WAL
you want might already be gone from the buffers.

Might I suggest adopting the same technique walsender does, ie just read
the data back from disk?  There's a reason why we gave up trying to have
walsender read directly from the buffers.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread David E. Wheeler
On Oct 26, 2010, at 7:15 AM, Robert Haas wrote:

>> Notwithstanding the above, I don't think ELEMENT would be a very bad choice.
> 
> I still think we should just go for LABEL and be done with it.  But
> y'all can ignore me if you want...

+1

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Alvaro Herrera
Excerpts from Jeff Janes's message of mar oct 26 12:22:38 -0300 2010:

> I don't think that holding WALWriteLock accomplishes much.  It
> prevents part of the buffer from being written out to OS/disk, and
> thus becoming eligible for being overwritten in the buffer, but the
> WALInsertLock prevents it from actually being overwritten.  And what
> if the part of the buffer you want to read was already eligible for
> overwriting but not yet actually overwritten?  WALWriteLock won't
> allow you to safely access it, but WALInsertLock will (assuming you
> have a safe way to identify the record in the first place).  For
> either case, holding it in shared mode would be sufficient.

And horrible for performance, I imagine.  Those locks are highly trafficked.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog.c: WALInsertLock vs. WALWriteLock

2010-10-26 Thread Jeff Janes
On Mon, Oct 25, 2010 at 6:31 PM, Robert Haas  wrote:
> On Fri, Oct 22, 2010 at 3:08 PM, fazool mein  wrote:
>> I'm writing a function that will read data from the buffer in xlog (i.e.
>> from XLogCtl->pages and XLogCtl->xlblocks). I want to make sure that I am
>> doing it correctly.
>> For reading from the buffer, do I need to lock WALInsertLock or
>> WALWriteLock? Also, can you explain a bit the usage of 'LW_SHARED'. Can we
>> use it for read purposes?
>
> Holding WALInsertLock in shared mode prevents other processes from
> inserting WAL, or in other words it keeps the "end" position from
> moving, while holding WALWriteLock in shared mode prevents other
> processes from writing the WAL from the buffers out to the operating
> system, or in other words it keeps the "start" position from moving.
> So you could probably take WALInsertLock in shared mode, figure out
> the current end of WAL position, release the lock;

Once you release the WALInsertLock, someone else can grab it and
overwrite the part of the buffer you think you are reading.
So I think you have to hold WALInsertLock throughout the duration of
the operation.

Of course it couldn't be overwritten if the wal record itself is not
yet written from buffer to the OS/disk.  But since you are not yet
holding the WALWriteLock, this could be happening at any time.

> then take
> WALWriteLock in shared mode, read any WAL before the end of WAL
> position, and release the lock.  But note that this wouldn't guarantee
> that you read all WAL as it's generated

I don't think that holding WALWriteLock accomplishes much.  It
prevents part of the buffer from being written out to OS/disk, and
thus becoming eligible for being overwritten in the buffer, but the
WALInsertLock prevents it from actually being overwritten.  And what
if the part of the buffer you want to read was already eligible for
overwriting but not yet actually overwritten?  WALWriteLock won't
allow you to safely access it, but WALInsertLock will (assuming you
have a safe way to identify the record in the first place).  For
either case, holding it in shared mode would be sufficient.


Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Andrew Dunstan's message of mar oct 26 10:54:59 -0300 2010:
>> Unlike the other suggestions, ELEMENT is not currently a keyword. That 
>> doesn't rule it out entirely, but it's a factor worth consideration.

> It can be added as an unreserved keyword, as in the attached patch.

> I also like ELEMENT better than the other suggestions, so I'm gonna
> commit this unless there are objections.

I definitely vote *against* adding a new keyword for this, unreserved or
otherwise.  Every keyword we add bloats the bison parser by some
fractional amount, costing performance across the board.

While I'm not very thrilled with LABEL, it at least has the virtue that
we already paid the price for it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of mar oct 26 10:54:59 -0300 2010:

> On 10/26/2010 03:02 AM, Dean Rasheed wrote:
> > In mathematics (and I think also computer science), the term
> > conventionally used the refer to the things in an enumeration is
> > "element", so how about ADD ELEMENT?
> 
> Unlike the other suggestions, ELEMENT is not currently a keyword. That 
> doesn't rule it out entirely, but it's a factor worth consideration.

It can be added as an unreserved keyword, as in the attached patch.

I also like ELEMENT better than the other suggestions, so I'm gonna
commit this unless there are objections.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


0001-Change-syntax-to-add-a-new-enum-value-to-ALTER-TYPE-.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 9:54 AM, Andrew Dunstan  wrote:
>
>
> On 10/26/2010 03:02 AM, Dean Rasheed wrote:
>>
>> In mathematics (and I think also computer science), the term
>> conventionally used the refer to the things in an enumeration is
>> "element", so how about ADD ELEMENT?
>
> Unlike the other suggestions, ELEMENT is not currently a keyword. That
> doesn't rule it out entirely, but it's a factor worth consideration.
>
>> The label is just one of the ways of identifying the element, and the
>> value is element's OID. The thing you're adding is an element, with
>> both a label and a value.
>>
>
> No, I think that's the wrong way of thinking about it entirely. The label
> *is* the value and the OID is simply an implementation detail, which for the
> most part we keep completely hidden from the user. We could have implemented
> enums in ways that did not involve OIDs at all, with identical semantics.
>
> Notwithstanding the above, I don't think ELEMENT would be a very bad choice.

I still think we should just go for LABEL and be done with it.  But
y'all can ignore me if you want...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Rollback sequence reset on TRUNCATE rollback patch

2010-10-26 Thread Steve Singer
The attached patch modifies TRUNCATE ... RESTART IDENTITY so that if the 
transaction rolls back the restart of the sequence will also be rolled back.


It follows the general outline discussed at 
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00550.php of 
assigning a new reffilenode to the sequence.



I will add this to the next commitfest.


Steve
diff --git a/doc/src/sgml/ref/truncate.sgml b/doc/src/sgml/ref/truncate.sgml
index f32d255..137eade 100644
*** a/doc/src/sgml/ref/truncate.sgml
--- b/doc/src/sgml/ref/truncate.sgml
*** TRUNCATE [ TABLE ] [ ONLY ] 
  
-   
-
- Any ALTER SEQUENCE RESTART operations performed as a
- consequence of using the RESTART IDENTITY option are
- nontransactional and will not be rolled back on failure.  To minimize
- the risk, these operations are performed only after all the rest of
- TRUNCATE's work is done.  However, there is still a risk
- if TRUNCATE is performed inside a transaction block that is
- aborted afterwards.  For example, consider
  
- 
- BEGIN;
- TRUNCATE TABLE foo RESTART IDENTITY;
- COPY foo FROM ...;
- COMMIT;
- 
- 
- If the COPY fails partway through, the table data
- rolls back correctly, but the sequences will be left with values
- that are probably smaller than they had before, possibly leading
- to duplicate-key failures or other problems in later transactions.
- If this is likely to be a problem, it's best to avoid using
- RESTART IDENTITY, and accept that the new contents of
- the table will have higher serial numbers than the old.
-
-   
   
  
   
--- 159,165 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 04b0c71..4fb9093 100644
*** a/src/backend/commands/sequence.c
--- b/src/backend/commands/sequence.c
***
*** 35,41 
  #include "utils/lsyscache.h"
  #include "utils/resowner.h"
  #include "utils/syscache.h"
! 
  
  /*
   * We don't want to log each fetching of a value from a sequence,
--- 35,41 
  #include "utils/lsyscache.h"
  #include "utils/resowner.h"
  #include "utils/syscache.h"
! #include "utils/snapmgr.h"
  
  /*
   * We don't want to log each fetching of a value from a sequence,
*** static void init_params(List *options, b
*** 96,101 
--- 96,104 
  static void do_setval(Oid relid, int64 next, bool iscalled);
  static void process_owned_by(Relation seqrel, List *owned_by);
  
+ static void init_seq_relation(Relation rel,TupleDesc tupDesc,Datum * value,
+ 			  bool * null,List * owned_by);
+ 
  
  /*
   * DefineSequence
*** DefineSequence(CreateSeqStmt *seq)
*** 109,119 
  	CreateStmt *stmt = makeNode(CreateStmt);
  	Oid			seqoid;
  	Relation	rel;
! 	Buffer		buf;
! 	Page		page;
! 	sequence_magic *sm;
! 	HeapTuple	tuple;
! 	TupleDesc	tupDesc;
  	Datum		value[SEQ_COL_LASTCOL];
  	bool		null[SEQ_COL_LASTCOL];
  	int			i;
--- 112,118 
  	CreateStmt *stmt = makeNode(CreateStmt);
  	Oid			seqoid;
  	Relation	rel;
! 	TupleDesc tupDesc;
  	Datum		value[SEQ_COL_LASTCOL];
  	bool		null[SEQ_COL_LASTCOL];
  	int			i;
*** DefineSequence(CreateSeqStmt *seq)
*** 210,217 
  
  	rel = heap_open(seqoid, AccessExclusiveLock);
  	tupDesc = RelationGetDescr(rel);
  
! 	/* Initialize first page of relation with special magic number */
  
  	buf = ReadBuffer(rel, P_NEW);
  	Assert(BufferGetBlockNumber(buf) == 0);
--- 209,293 
  
  	rel = heap_open(seqoid, AccessExclusiveLock);
  	tupDesc = RelationGetDescr(rel);
+ 	init_seq_relation(rel, tupDesc,value,null,owned_by);
+ 	heap_close(rel, NoLock);
+ }
  
! /**
!  * Resets the relation used by a sequence.
!  *
!  * The sequence is reset to its initial values,
!  * the old sequence is left in place in case the
!  * transaction rolls back.
!  */
! void ResetSequenceRelation(Oid seq_relid,List * options)
! {
! 	Relation seq_rel = relation_open(seq_relid,AccessExclusiveLock);
! 	SeqTable elm;
! 	Page page;
! 	Form_pg_sequence seq;
! 	Buffer buf;
! 	TupleDesc tupDesc;
! 	sequence_magic * sm;
! 	HeapTupleData tuple;
! 	ItemId lp;
! 	Datum * values;
! 	bool * isnull;
! 
! 	/**
! 	 * Read the old sequence.
! 	 *
! 	 */
! 	init_sequence(seq_relid,&elm,&seq_rel);
! 	seq = read_info(elm,seq_rel,&buf);
! 	page = BufferGetPage(buf);
! 	sm = (sequence_magic *) PageGetSpecialPointer(page);
! 
! 	if (sm->magic != SEQ_MAGIC)
! 		elog(ERROR, "bad magic number in sequence \"%s\": %08X",
! 			 RelationGetRelationName(seq_rel), sm->magic);
! 
! 	lp = PageGetItemId(page, FirstOffsetNumber);
! 	Assert(ItemIdIsNormal(lp));
! 
! 	tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp);
! 	tupDesc = RelationGetDescr(seq_rel);
! 	values=palloc(sizeof(Datum)*tupDesc->natts);
! 	isnull=palloc(sizeof(bool)*tupDesc->natts);
! 	heap_deform_tuple(&tuple,tupDesc,values,isnull);
! 	UnlockReleaseBuffer(buf);
! 	relation_close(seq_rel,NoLock);
! 
! 	/**
! 	 * Get a new Relfilenode
! 	 */
! 	RelationSetNewRelfilenode(seq_rel,RecentXmin);
! 
!

Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Andrew Dunstan



On 10/26/2010 03:02 AM, Dean Rasheed wrote:

In mathematics (and I think also computer science), the term
conventionally used the refer to the things in an enumeration is
"element", so how about ADD ELEMENT?


Unlike the other suggestions, ELEMENT is not currently a keyword. That 
doesn't rule it out entirely, but it's a factor worth consideration.



The label is just one of the ways of identifying the element, and the
value is element's OID. The thing you're adding is an element, with
both a label and a value.



No, I think that's the wrong way of thinking about it entirely. The 
label *is* the value and the OID is simply an implementation detail, 
which for the most part we keep completely hidden from the user. We 
could have implemented enums in ways that did not involve OIDs at all, 
with identical semantics.


Notwithstanding the above, I don't think ELEMENT would be a very bad choice.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, discrete and/or continuous

2010-10-26 Thread Hitoshi Harada
2010/10/26 Robert Haas :
> On Mon, Oct 25, 2010 at 8:13 PM, Jeff Davis  wrote:
>> On Mon, 2010-10-25 at 18:03 -0400, Robert Haas wrote:
>>> Hmm.  Do you have some concrete examples of cases where a range type
>>> might want to do some representational optimization?
>>
>> Let's say for instance you want to keep an timestamp range in 16 bytes.
>> You could have an 8-byte timestamp, a 7-byte integer that represents the
>> offset from that timestamp in microseconds, and one byte for flags (e.g.
>> NULL or infinite boundaries, etc.). I'm not sure that you can make that
>> representation work in a generic way.
>
> See, that gets complicated, because now you're restricting the range
> of values that can be expressed by the range type to something less
> than the natural range of the data type.  I am not sure the value of
> supporting that is sufficient to justify the amount of extra code that
> will be required to make it work.  I'd say for a first version, nail
> down the representation.  Perhaps in a future version you could have
> compress/uncompress methods sort of like GIST, but for a first cut it
> seems highly desirable to be able to say something like:
>
> CREATE TYPE int_range AS RANGE (BASETYPE = int);
>
> I hear you complaining that we don't know the values you've called
> dtype, cmpfunc, addfunc, and subfunc.  It seems pretty reasonable to
> extract cmpfunc, if unspecified, from the default btree opclass for
> the data type.  For the rest, I'm inclined to propose that we support
> something like:
>
> ALTER TYPE timestamptz
>    ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
>    ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);
>
> or
>
> ALTER TYPE integer
>   ADD INTERFACE increment int4pl (integer, integer),
>   ADD INTERFACE decrement int4mi (integer, integer),
>   ADD VALUE addative_unit 1::integer,
>   ADD VALUE addative_identity 0::integer;
>
> IIRC, Window functions need this information too, so there's value in
> associating it with the base type, even if we want to allow users to
> override it when creating ranges.

Yeah, window functions will require 'how to add or subtract value' of
ORDER BY expression data type to search window frame boundary when we
want to support RANGE BETWEEN frame option. I tried to retrieve the
information by hard-coding '+'/'-' to get it from pg_oper in the
previous patch, but actually we found out we need more general way
like above. This will help it. But I don't have blue-print of catalog
format for it yet, between knn gist and range types.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-10-26 Thread David Fetter
On Tue, Oct 26, 2010 at 12:35:13PM +0100, Dean Rasheed wrote:
> On 25 October 2010 21:01, David Fetter  wrote:
> > Folks,
> >
> > Please find attached patch for $subject :)
> >
> 
> Thanks for looking at this. I forgot about tab completion.
> 
> I think that the change to ALTER TRIGGER is not necessary. AFAICT it
> works OK unmodified. In fact, the modified code here:
> 
> *** 971,977  psql_completion(char *text, int start, int end)
>   else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
>pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
>pg_strcasecmp(prev_wd, "ON") == 0)
> ! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
> 
>   /* ALTER TRIGGER  ON  */
>   else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
> --- 1055,1061 
>   else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
>pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
>pg_strcasecmp(prev_wd, "ON") == 0)
> ! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_writeables, NULL);
> 
>   /* ALTER TRIGGER  ON  */
>   else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
> 
> appears to be unreachable, because it is preceded by
> 
> else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
>  pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
> {
> completion_info_charp = prev2_wd;
> COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
> }

It is indeed unreachable.

> which works for tables and views, and makes the next "elseif"
> impossible to satisfy.  So I think that block could just be deleted,
> right?

Yes.  Good catch.  New patch attached :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***
*** 303,308  static const SchemaQuery Query_for_list_of_tables = {
--- 303,375 
NULL
  };
  
+ /* The bit masks for the following three functions come from
+  * src/include/catalog/pg_trigger.h.
+  */
+ static const SchemaQuery Query_for_list_of_insertables = {
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS "
+   "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 2) = t.tgtype))",
+   /* viscondition */
+   "pg_catalog.pg_table_is_visible(c.oid)",
+   /* namespace */
+   "c.relnamespace",
+   /* result */
+   "pg_catalog.quote_ident(c.relname)",
+   /* qualresult */
+   NULL
+ };
+ 
+ static const SchemaQuery Query_for_list_of_deleteables = {
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS "
+   "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 3) = t.tgtype))",
+   /* viscondition */
+   "pg_catalog.pg_table_is_visible(c.oid)",
+   /* namespace */
+   "c.relnamespace",
+   /* result */
+   "pg_catalog.quote_ident(c.relname)",
+   /* qualresult */
+   NULL
+ };
+ 
+ static const SchemaQuery Query_for_list_of_updateables = {
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS "
+   "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 4) = t.tgtype))",
+   /* viscondition */
+   "pg_catalog.pg_table_is_visible(c.oid)",
+   /* namespace */
+   "c.relnamespace",
+   /* result */
+   "pg_catalog.quote_ident(c.relname)",
+   /* qualresult */
+   NULL
+ };
+ 
+ static const SchemaQuery Query_for_list_of_writeables = {
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS "
+   "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
(t.tgtype & ( (1<<2) | (1<<3) | (1<<4)))::bool)",
+   /* viscondition */
+   "pg_catalog.pg_table_is_visible(c.oid)",
+   /* namespace */
+   "c.relnamespace",
+   /* result */
+   "pg_catalog.quote_ident(c.relname)",
+   /* qualresult */
+   NULL
+ };
+ 
  static const SchemaQuery Query_for_list_of_tisv = {
/* catname */
"pg_catalog.pg_class c",
***
*** 333,338  static const SchemaQuery Query_for_list_of_tsv = {
--- 400,420 
NULL
  };
  
+ static const SchemaQuery Query_for_list_of_tv = {
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relk

Re: [HACKERS] SQL/MED with simple wrappers

2010-10-26 Thread Shigeru HANADA
Thanks for your comments.

On Mon, 25 Oct 2010 15:05:51 +0200
Pavel Stehule  wrote:
> > 4) List of foreign connections
> > Users (especially DBAs?) might want to see list of foreign connections.
> > Currently postgresql_fdw provides its own connection list via
> > postgresql_fdw_connections view.  Common view such as
> > pg_foreign_connections would be needed?  If so, function which returns
> > list of active connections would be necessary in FDW API.
> >
> 
> + list of foreign tables?

I've implemented that functionality in some places.

1) \det psql command shows list of foreign table in the format like
   \dew and \des.
2) pg_foreign_tables catalog shows pair of OIDs (relation oid and
   server oid) and options in raw format.
3) views in information_schema, foreign_tables and foreign_table_options
   show information about foreign tables in SQL standard format.

Here the detail of \det psql command is.

\det psql command (followed naming of \des/\dew) shows list of
foreign tables, and \det  shows the list of foreign tables
whose name match the pattern.  For example of file_fdw:

postgres=# \det
   List of foreign tables
Table |   Server
--+-
 csv_accounts | file_server
 csv_branches | file_server
 csv_history  | file_server
 csv_tellers  | file_server
(4 rows)

Adding postfix "+" shows per-table generic options too.

postgres=# \det+
   List of foreign tables
Table |   Server|Options
--+-+
 csv_accounts | file_server | 
{format=csv,filename=/home/hanada/DB/CSV/pgbench_accounts.csv}
 csv_branches | file_server | 
{format=csv,filename=/home/hanada/DB/CSV/pgbench_branches.csv}
 csv_history  | file_server | 
{format=csv,filename=/home/hanada/DB/CSV/pgbench_history.csv}
 csv_tellers  | file_server | 
{format=csv,filename=/home/hanada/DB/CSV/pgbench_tellers.csv}
(4 rows)

I have chosen \det+ command to show per-table options because \d+
command has already many columns and seems difficult to add long
values.

In addition to \det, \dec command would be necessary to show per-column
options with columns.  It hasn't been implemented yet, though.

> > 5) Routine mapping
> > If a function in local query can be evaluated on the remote side in
> > same semantics, it seems efficient to push the function down to the
> > remote side.  But how can we know whether the function can be pushed
> > down or not?  For such purpose, SQL/MED standard defines "routine
> > mapping".  Should we implement routine mapping?
> >
> 
> is it related to aggregate functions? If yes, this it can be really
> significant help

Yes.  I was thinking about only normal functions at original post,
though.  To push down aggregate function to remote side, FDW would
need additional planner hook to merge Aggregate node in to ForeignScan
node.  Such planner hook might be able to handle Order or Limit node
too.

> > 7) Using cursor in postgresql_fdw
> > postgresql_fdw fetches all of the result tuples from the foreign
> > server in the first pgIterate() call.  It could cause out-of-memory if
> > the result set was huge.  If libpq supports protocol-level cursor,
> > postgresql_fdw will be able to divide result set into some sets and
> > lower the usage of memory.  Or should we use declare implicit cursor
> > with DECLARE statement?  One connection can be used by multiple
> > ForeignScan nodes in a local query alternately.  An issue is that
> > cursor requires implicit transaction block.  Is it OK to start
> > transaction automatically?
> 
> I don't know why DECLARE statement is problem? Can you explain it, please.

The most serious issue would be that SQL-level cursors require
explicit transaction block.  To use SQL-level cursors with shared
connections between multiple ForeignScan, postgresql_fdw need to
manage transaction state per connection.  Especially, recovering
error would make codes complex.  Or, we would be able to take the
easiest way, discarding connection at the error.

I'll try to implement cursor-version of postgresql_fdw experimentally
to make this issue clearer.

Regards,
--
Shigeru Hanada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-26 Thread Dimitri Fontaine

Le 25 oct. 2010 à 17:26, Alvaro Herrera a écrit :
> Ah, some reading of the patch reveals that the "script" defaults to the
> control file name, but it can be overridden.

Yes, that's new in v10. In v11 I've kept that and removed the name property in 
the control file, so that we have:

cat contrib/intarray/intarray.control.in 
# intarray
comment = 'one-dimensional arrays of integers: functions, operators, index 
support'
version = 'EXTVERSION'
script = '_int.sql'


> I noticed that you're using ExclusiveLock when creating an extension,
> citing the handling of the global variable create_extension for this.
> There are two problems here: one is that you're releasing the lock way
> too early: if you wanted this to be effective, you'd need to hold on to
> the lock until after you've registered the extension.
> 
> The other is that there is no need for this at all, because this backend
> cannot be concurrently running another CREATE  EXTENSION comand, and
> this is a backend-local variable.  So there's no point.

I wanted to protect from another backend trying to create the same extension at 
the same time. So the critical path is the inserting into the catalog. I now 
see I failed to include the duplicate object check into the critical path, when 
I added that later.

Do you confirm protecting the insertion in the catalog is not worthy of special 
locking? To get proper locking requires some more thinking than I did put in, 
but if you say I'd better remove it...

> Why palloc create_extension every time?  Isn't it better to initialize
> it properly and have a boolean value telling whether it's to be used?
> Also, if an extension fails partway through creation, the var will be
> left set.  I think you need a PG_TRY block to reset it.

Good catches. I'm still uneasy with which memory context what allocation 
belongs too, hence the palloc()ing here.

> (I find the repeated coding pattern that tests create_extension for
> NULL-ness before calling recordDependencyOn a bit awkward; maybe hide it
> in a function or macro?  But then maybe that's just me.  Also, why
> palloc it?  Seems better to have it static.  Notice your new calls to
> recordDependencyOn are the only ones with operands not using the &
> operator.)

In fact the goal of the test is to check if we're in the code path for CREATE 
EXTENSION, rather than pointer validity per-say. I'll go have it static, too, 
with a bool to determine the code path.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

PS: I hope to be able to send this email, but uploading the git repo will be 
uneasy from the wifi here at best. Will send patches if email is ok.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-26 Thread Dimitri Fontaine

Le 25 oct. 2010 à 17:26, Alvaro Herrera a écrit :
> Ah, some reading of the patch reveals that the "script" defaults to the
> control file name, but it can be overridden.

Yes, that's new in v10. In v11 I've kept that and removed the name property in 
the control file, so that we have:

cat contrib/intarray/intarray.control.in 
# intarray
comment = 'one-dimensional arrays of integers: functions, operators, index 
support'
version = 'EXTVERSION'
script = '_int.sql'


> I noticed that you're using ExclusiveLock when creating an extension,
> citing the handling of the global variable create_extension for this.
> There are two problems here: one is that you're releasing the lock way
> too early: if you wanted this to be effective, you'd need to hold on to
> the lock until after you've registered the extension.
> 
> The other is that there is no need for this at all, because this backend
> cannot be concurrently running another CREATE  EXTENSION comand, and
> this is a backend-local variable.  So there's no point.

I wanted to protect from another backend trying to create the same extension at 
the same time. So the critical path is the inserting into the catalog. I now 
see I failed to include the duplicate object check into the critical path, when 
I added that later.

Do you confirm protecting the insertion in the catalog is not worthy of special 
locking? To get proper locking requires some more thinking than I did put in, 
but if you say I'd better remove it...

> Why palloc create_extension every time?  Isn't it better to initialize
> it properly and have a boolean value telling whether it's to be used?
> Also, if an extension fails partway through creation, the var will be
> left set.  I think you need a PG_TRY block to reset it.

Good catches. I'm still uneasy with which memory context what allocation 
belongs too, hence the palloc()ing here.

> (I find the repeated coding pattern that tests create_extension for
> NULL-ness before calling recordDependencyOn a bit awkward; maybe hide it
> in a function or macro?  But then maybe that's just me.  Also, why
> palloc it?  Seems better to have it static.  Notice your new calls to
> recordDependencyOn are the only ones with operands not using the &
> operator.)

In fact the goal of the test is to check if we're in the code path for CREATE 
EXTENSION, rather than pointer validity per-say. I'll go have it static, too, 
with a bool to determine the code path.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

PS: I hope to be able to send this email, but uploading the git repo will be 
uneasy from the wifi here at best. Will send patches if email is ok.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-10-26 Thread Dean Rasheed
On 25 October 2010 21:01, David Fetter  wrote:
> Folks,
>
> Please find attached patch for $subject :)
>

Thanks for looking at this. I forgot about tab completion.

I think that the change to ALTER TRIGGER is not necessary. AFAICT it
works OK unmodified. In fact, the modified code here:

*** 971,977  psql_completion(char *text, int start, int end)
else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
 pg_strcasecmp(prev_wd, "ON") == 0)
!   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);

/* ALTER TRIGGER  ON  */
else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
--- 1055,1061 
else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
 pg_strcasecmp(prev_wd, "ON") == 0)
!   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_writeables, NULL);

/* ALTER TRIGGER  ON  */
else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&

appears to be unreachable, because it is preceded by

else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
 pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
{
completion_info_charp = prev2_wd;
COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
}

which works for tables and views, and makes the next "elseif"
impossible to satisfy. So I think that block could just be deleted,
right?

Regards,
Dean


> Cheers,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fet...@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensible executor nodes for preparation of SQL/MED

2010-10-26 Thread Itagaki Takahiro
Here is a WIP patch to extensible executor nodes.

It replaces large switch blocks with function-pointer calls (renamed to
PlanProcs from VTable). It has small performance win (Please test it!)
and capsulize some details of executor nodes, but is not perfect.

The infrastructure might be used by SQL/MED, but then not only PlanState
but also Plan and Path are required to be customizable. Complete cleanup
would be difficult, but I'm trying to find common ground for existing
postgres' implementation and extensible planner and executor.
Comments and suggestions welcome.


On Tue, Oct 26, 2010 at 12:21 PM, Itagaki Takahiro
 wrote:
> I didn't intend performance, but there is small but measurable win
> in it if I avoided indirections. We might not always need to copy
> a whole vtable into planstate; only ExecProcNode might be enough.
> I'll continue the research.
>
> 24957.767 ms : master (a big switch)
> 25059.838 ms : two indirections (planstate->plan->vtable->fn)
> 24819.298 ms : one indirection (planstate->plan->vtable.fn)
> 24118.436 ms : direct call (planstate->vtable.fn)
>
> So, major benefits of the change might be performance and code refactoring.
> Does anyone have comments about it for the functionality? It might also be
> used by SQL/MED and executor hooks, but I have no specific idea yet.


-- 
Itagaki Takahiro


extensible_execnodes-20101026.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Types and Function Parameters

2010-10-26 Thread Greg
Hi Merlin, I completely forgot about hstore! I'll give it a go. Thanks!






From: Merlin Moncure 
To: Greg 
Cc: Pavel Stehule ; pgsql-hackers@postgresql.org
Sent: Mon, 25 October, 2010 23:52:55
Subject: Re: [HACKERS] Composite Types and Function Parameters

On Mon, Oct 25, 2010 at 6:38 PM, Greg  wrote:
>
> Hi Pavel, thanks! Yeah, thats what I though. I have to have a custom type or 
> a 
>very ugly looking solution for passing the params then.
>
> To Postgre dev. team: If anyone who involved in Postgre development reading 
>this, just a feature suggestion: allow array that can accept combination of 
>any 
>data types to be passed to a function, for example:
>   // declare
>   create function TEST ( anytypearray[] ) ...
>   // calling
>   perform TEST (array[bool, int, etc.] ) 
> This would make such a nice adition to the development for postgre. Although 
>this may be complecated to achieve.

probably hstore would be more appropriate for something like that.
You can also declare functions taking composite arrays, anyarray,
variadic array, and variadic "any", although the latter requires
function implementation in C to get the most out of it.

merlin



  

Re: [HACKERS] add label to enum syntax

2010-10-26 Thread Dean Rasheed
On 25 October 2010 21:31, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> LABEL is already an unreserved keyword, and I'm pretty sure that's all
>> we'll need.
>
> The only reason it's a keyword is the SECURITY LABEL patch that went
> in a month or so ago; which is syntax that might still be thought
> better of before it gets to a release.
>
> But I seem to be in the minority, so I'll shut up now.
>
>                        regards, tom lane
>

In mathematics (and I think also computer science), the term
conventionally used the refer to the things in an enumeration is
"element", so how about ADD ELEMENT?

The label is just one of the ways of identifying the element, and the
value is element's OID. The thing you're adding is an element, with
both a label and a value.

Regards,
Dean


> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers