[HACKERS] Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX

2014-10-07 Thread Noah Misch
On Mon, Oct 06, 2014 at 08:57:54PM -0400, Tom Lane wrote:
> I eventually realized that the critical difference was you'd added
> "CFLAGS=" to the configure call.  On this platform that has the net
> effect of removing -O2 from the compiler flags, and apparently that
> shifts around the stack layout enough to expose the clobber.
> 
> The fix is simple enough: ecpg's version of ParseDateTime is failing
> to check for overrun of the field[] array until *after* it's already
> clobbered the stack:

Thanks for tracking that down.  Oops.
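
For illustration, the broken pattern and the fix look roughly like this
(a minimal sketch only; the names are assumed, this is not the actual
ecpg code):

    #define MAXDATEFIELDS 25

    /*
     * The overrun check must come BEFORE the store into field[]; doing
     * it afterwards means the extra field has already overwritten
     * whatever follows the array on the stack.
     */
    static int
    store_field(char **field, int *nf, char *fieldstart)
    {
        if (*nf >= MAXDATEFIELDS)       /* check first ... */
            return -1;                  /* ... reject the overrun */
        field[(*nf)++] = fieldstart;    /* ... then store safely */
        return 0;
    }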


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread David Fetter
On Wed, Oct 08, 2014 at 12:41:46AM -0300, Fabrízio de Royes Mello wrote:
> On Wed, Oct 8, 2014 at 12:36 AM, Josh Berkus  wrote:
> >
> > On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote:
> > > We can think of a mechanism to create "properties / options" and assign it
> > > to objects (table, index, column, schema, ...) like COMMENT does.
> > >
> > > A quick thought:
> > >
> > > CREATE OPTION [ IF NOT EXISTS ] name
> > > VALIDATOR valfunction
> > > [ DEFAULT value ];
> > >
> > > ALTER TABLE name
> > > SET OPTION optname { TO | = } { value | 'value' | DEFAULT };
> > >
> > > It's just a simple thought of course. We must think more carefully about
> > > the syntax and purposes.
> >
> > If we're going to do this (and it seems like a good idea), we really,
> > really, really need to include some general system views which expose
> > the options in a user-friendly format (like columns, JSON or an array).
> >  It's already very hard for users to get information about what
> > reloptions have been set.
> >
> 
> Maybe into "information_schema" ??

Not there.  The information schema is defined pretty precisely in the
SQL standard, and PostgreSQL-specific information of this kind doesn't
belong in it.

pg_catalog seems like a much more appropriate space for
PostgreSQL-specific settings.
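
As a strawman, a user-friendly view over today's reloptions could be as
simple as this (a sketch only; the view name is hypothetical, and
pg_class.reloptions is a text[] of "name=value" strings):

    CREATE VIEW pg_options_view AS   -- hypothetical name
    SELECT c.oid::regclass AS relation,
           split_part(opt, '=', 1) AS option_name,
           split_part(opt, '=', 2) AS option_value
      FROM pg_class c, unnest(c.reloptions) AS opt
     WHERE c.reloptions IS NOT NULL;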

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog always handles -d option argument as connstr

2014-10-07 Thread Amit Kapila
On Tue, Oct 7, 2014 at 8:13 PM, Sawada Masahiko 
wrote:
>
> On Tue, Oct 7, 2014 at 12:58 PM, Amit Kapila 
wrote:
> > On Mon, Oct 6, 2014 at 10:23 PM, Sawada Masahiko 
> > wrote:
> >>
> >> Hi all,
> >>
> >> pg_receivexlog always handles the argument of the -d option as a
> >> connstr-formatted value, so we can end up specifying the host name and
> >> port number twice.
> >> The other client tools handle the -d option as a connstr value only if
> >> the argument contains an "=" character.
> >
> > pg_basebackup also seems to behave the same as pg_receivexlog.
> > psql also treats it in a similar way.  The behaviour of psql is as
> > below:
> > psql.exe -d="host=localhost port=5432 dbname=postgres"
> > psql: invalid connection option ""
> >
> > psql.exe -d "host=localhost port=5432 dbname=postgres"
> > psql (9.5devel)
> > WARNING: Console code page (437) differs from Windows code page (1252)
> >  8-bit characters might not work correctly. See psql reference
> >  page "Notes for Windows users" for details.
> > Type "help" for help.
> >
> > postgres=#
> >
> >> The document says that pg_receivexlog ignores the database name, and
> >> that this option exists for consistency with other client applications.
> >> But if we specify a database name the way the other client tools allow
> >> ('-d hoge'), then we definitely get an error.
> >
> > What I understand from the document is that it ignores the database
> > name when it is given in a connection string.
> >
>
> Yep, but we can use the -d option like 'psql -d postgres'.
> pg_receivexlog and pg_basebackup don't seem to work like that;
> they always handle the argument as a connstr-formatted value.

psql needs to connect to a particular database to do its work, whereas
pg_basebackup/pg_receivexlog work differently and don't need to connect
to a particular database, so I am not sure why you want them to behave
the same for the -d switch.
Moreover, the docs already clarify this, so what's the issue?
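
For reference, the heuristic the other client tools rely on is roughly
the following (a simplified sketch of libpq's dbname handling, assumed
from the discussion rather than quoted from the source):

    #include <stdbool.h>
    #include <string.h>

    /*
     * Simplified sketch: treat a dbname argument as a full connection
     * string if it contains '=' or looks like a connection URI;
     * otherwise take it as a plain database name.  pg_basebackup and
     * pg_receivexlog instead pass -d straight through as a connstr.
     */
    static bool
    looks_like_connstr(const char *dbname)
    {
        return strchr(dbname, '=') != NULL ||
               strncmp(dbname, "postgresql://", 13) == 0 ||
               strncmp(dbname, "postgres://", 11) == 0;
    }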


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-07 Thread furuyao
> On 09/29/2014 01:13 PM, furu...@pm.nttdata.co.jp wrote:
> >> I don't understand what this patch does. When would you want to use
> >> the new --reply-fsync option? Is there any reason *not* to use it?
> >> In other words, do we need an option for this, couldn't you just
> >> always send the feedback message after fsync?
> >
> > Thanks for the comment.
> >
> > The --reply-fsync option is intended for use in synchronous mode.
> >
> > By specifying the -F option and the --slot option, the process calls
> > fsync() when it receives WAL, and the flush location is set in the
> > feedback message.
> >
> > In that state the interval for sending the feedback message depends
> > on the -s option, so in synchronous mode the master would be left
> > waiting for a feedback message.
> >
> > Therefore the --reply-fsync option is necessary, because it sends the
> > feedback message right after fsync without waiting for the -s
> > interval.
> >
> > The reason for not always sending the feedback message right after
> > fsync is to serve those who want to use fsync only (NOT synchronous
> > mode).
> 
> I still don't get it. AFAICS there are two ways to use pg_receivexlog.
> Either you use it as a synchronous standby, or not.
> 
> What set of options would you pass if you want to use it as a synchronous
> standby? And if you don't? Could we just have a single "--synchronous"
> flag, instead of -F and --reply-fsync?

Thanks for the comment.

If you set "synchronous_commit" to "remote_write", the options would be
different.
The set of options for each case is shown below.


 Synchronous standby (synchronous_commit=on)
 --fsync-interval=-1
 --reply-fsync
 --slot=slotname

 Synchronous standby (synchronous_commit=remote_write)
 --fsync-interval=-1
 --reply-fsync

 Asynchronous
 There are no relevant options.
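
For example, a synchronous-standby invocation would look something like
this (the data directory and slot name are placeholders; --reply-fsync
is the option proposed by this patch):

    pg_receivexlog -D /mnt/standby/pg_xlog --slot=slotname \
        --fsync-interval=-1 --reply-fsync --status-interval=10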


Well, if the response-time delay (the value of "--status-interval=interval")
is acceptable, "--reply-fsync" is unnecessary.
Instead of "--reply-fsync", using "--synchronous" (which combines
"--reply-fsync" and "--fsync-interval=-1") might be easier to understand.
In that case, though, "--fsync-interval=interval" would be a fixed value.
Isn't that a problem?

Regards,

--
Furuya Osamu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Fabrízio de Royes Mello
On Wed, Oct 8, 2014 at 12:36 AM, Josh Berkus  wrote:
>
> On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote:
> > We can think of a mechanism to create "properties / options" and assign it
> > to objects (table, index, column, schema, ...) like COMMENT does.
> >
> > A quick thought:
> >
> > CREATE OPTION [ IF NOT EXISTS ] name
> > VALIDATOR valfunction
> > [ DEFAULT value ];
> >
> > ALTER TABLE name
> > SET OPTION optname { TO | = } { value | 'value' | DEFAULT };
> >
> > It's just a simple thought of course. We must think more carefully about
> > the syntax and purposes.
>
> If we're going to do this (and it seems like a good idea), we really,
> really, really need to include some general system views which expose
> the options in a user-friendly format (like columns, JSON or an array).
>  It's already very hard for users to get information about what
> reloptions have been set.
>

Maybe into "information_schema" ??

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Josh Berkus
On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote:
> We can think of a mechanism to create "properties / options" and assign it
> to objects (table, index, column, schema, ...) like COMMENT does.
> 
> A quick thought:
> 
> CREATE OPTION [ IF NOT EXISTS ] name
> VALIDATOR valfunction
> [ DEFAULT value ];
> 
> ALTER TABLE name
> SET OPTION optname { TO | = } { value | 'value' | DEFAULT };
> 
> It's just a simple thought of course. We must think more carefully about the
> syntax and purposes.

If we're going to do this (and it seems like a good idea), we really,
really, really need to include some general system views which expose
the options in a user-friendly format (like columns, JSON or an array).
 It's already very hard for users to get information about what
reloptions have been set.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump refactor patch to remove global variables

2014-10-07 Thread Alvaro Herrera
Here's a rebased patch for this (I also pgindented it).

One thing I find a bit odd is the fact that we copy
RestoreOptions->superuser into DumpOptions->outputSuperuser (a char *
pointer) without pstrdup or similar.  We're already doing the inverse
elsewhere, and the uses of the routine where this is done are pretty
limited, so it seems harmless.  Still, it's pretty weird.  (Really,
the whole dumpOptionsFromRestoreOptions() business is odd.)

I'm not real happy with the forward struct declaration in pg_backup.h, but
I think this is just general messiness in this code structure and not this
patch's fault.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 2f855cf..17e9574 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -64,7 +64,7 @@ static DumpableObject **nspinfoindex;
 
 static void flagInhTables(TableInfo *tbinfo, int numTables,
 			  InhInfo *inhinfo, int numInherits);
-static void flagInhAttrs(TableInfo *tblinfo, int numTables);
+static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
 static DumpableObject **buildIndexArray(void *objArray, int numObjs,
 Size objSize);
 static int	DOCatalogIdCompare(const void *p1, const void *p2);
@@ -78,7 +78,7 @@ static int	strInArray(const char *pattern, char **arr, int arr_size);
  *	  Collect information about all potentially dumpable objects
  */
 TableInfo *
-getSchemaData(Archive *fout, int *numTablesPtr)
+getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
 {
 	ExtensionInfo *extinfo;
 	InhInfo*inhinfo;
@@ -114,7 +114,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	 */
 	if (g_verbose)
 		write_msg(NULL, "reading user-defined tables\n");
-	tblinfo = getTables(fout, &numTables);
+	tblinfo = getTables(fout, dopt, &numTables);
 	tblinfoindex = buildIndexArray(tblinfo, numTables, sizeof(TableInfo));
 
 	/* Do this after we've built tblinfoindex */
@@ -122,11 +122,11 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 
 	if (g_verbose)
 		write_msg(NULL, "reading extensions\n");
-	extinfo = getExtensions(fout, &numExtensions);
+	extinfo = getExtensions(fout, dopt, &numExtensions);
 
 	if (g_verbose)
 		write_msg(NULL, "reading user-defined functions\n");
-	funinfo = getFuncs(fout, &numFuncs);
+	funinfo = getFuncs(fout, dopt, &numFuncs);
 	funinfoindex = buildIndexArray(funinfo, numFuncs, sizeof(FuncInfo));
 
 	/* this must be after getTables and getFuncs */
@@ -142,7 +142,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 
 	if (g_verbose)
 		write_msg(NULL, "reading user-defined aggregate functions\n");
-	getAggregates(fout, &numAggregates);
+	getAggregates(fout, dopt, &numAggregates);
 
 	if (g_verbose)
 		write_msg(NULL, "reading user-defined operators\n");
@@ -183,7 +183,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 
 	if (g_verbose)
 		write_msg(NULL, "reading default privileges\n");
-	getDefaultACLs(fout, &numDefaultACLs);
+	getDefaultACLs(fout, dopt, &numDefaultACLs);
 
 	if (g_verbose)
 		write_msg(NULL, "reading user-defined collations\n");
@@ -213,7 +213,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	 */
 	if (g_verbose)
 		write_msg(NULL, "finding extension members\n");
-	getExtensionMembership(fout, extinfo, numExtensions);
+	getExtensionMembership(fout, dopt, extinfo, numExtensions);
 
 	/* Link tables to parents, mark parents of target tables interesting */
 	if (g_verbose)
@@ -222,11 +222,11 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 
 	if (g_verbose)
 		write_msg(NULL, "reading column info for interesting tables\n");
-	getTableAttrs(fout, tblinfo, numTables);
+	getTableAttrs(fout, dopt, tblinfo, numTables);
 
 	if (g_verbose)
 		write_msg(NULL, "flagging inherited columns in subtables\n");
-	flagInhAttrs(tblinfo, numTables);
+	flagInhAttrs(dopt, tblinfo, numTables);
 
 	if (g_verbose)
 		write_msg(NULL, "reading indexes\n");
@@ -307,7 +307,7 @@ flagInhTables(TableInfo *tblinfo, int numTables,
  * modifies tblinfo
  */
 static void
-flagInhAttrs(TableInfo *tblinfo, int numTables)
+flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 {
 	int			i,
 j,
@@ -384,7 +384,7 @@ flagInhAttrs(TableInfo *tblinfo, int numTables)
 attrDef->adef_expr = pg_strdup("NULL");
 
 /* Will column be dumped explicitly? */
-if (shouldPrintColumn(tbinfo, j))
+if (shouldPrintColumn(dopt, tbinfo, j))
 {
 	attrDef->separate = false;
 	/* No dependency needed: NULL cannot have dependencies */
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index e50dd8b..ceed115 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -89,11 +89,12 @@ static void WaitForTerminatingWorkers(ParallelState *pstate);
 static void sigTermHandler(int signum);
 #endif
 static void SetupWorker(ArchiveHandle *AH, int pipefd[2], int worker,
+			DumpOptions *dopt,
 			RestoreOptions 

Re: [HACKERS] PL/pgSQL 2

2014-10-07 Thread Jim Nasby

On 10/7/14, 1:08 PM, Rodolfo Campero wrote:

If it were possible to mark a function as "private for its extension" that 
would be awesome (the opposite would work too, i.e. a way to specify a public API, 
meaning the rest is private). For big extensions it's not clear which functions can be 
used directly by users of the extension and which ones are just implementation details.


I would love to have that both for extensions as well as outside of extensions. If you're 
doing sophisticated things in your database you'll end up wanting private objects, and 
right now the only "reasonable" way to do that is to throw them in a _blah 
schema and try to further hide them with permissions games. :(
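
Concretely, that workaround amounts to something like this (a sketch;
all the names here are hypothetical):

    CREATE SCHEMA _myext_private;
    REVOKE ALL ON SCHEMA _myext_private FROM PUBLIC;

    -- an "implementation detail" function, hidden only by convention
    -- and by the schema's permissions
    CREATE FUNCTION _myext_private.helper(int) RETURNS int
        LANGUAGE sql AS $$ SELECT $1 * 2 $$;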
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-07 Thread Peter Geoghegan
On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs  wrote:
> IIRC it wasn't agreed that we needed to identify which indexes in the
> upsert SQL statement itself, since this would be possible in other
> ways and would require programmers to know which unique constraints
> are declared.

Kevin seemed quite concerned about that. That is something that seems
hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can
comment on that, since he was in favor both of being able to specify
user intent by accepting a unique index, and of the MERGE syntax.

> All of the other syntax could easily remain the same, leaving us with
> a command that looks like this...
>
> MERGE CONCURRENTLY INTO foo USING VALUES ()
> WHEN NOT MATCHED THEN
>   INSERT
> WHEN MATCHED THEN
>  UPDATE
>
> Since MERGE now supports DELETE and IGNORE as options, presumably we
> would also want to support those for the UPSERT version also.
> I think it would be useful to also support a mechanism for raising an
> error, as DB2 allows.

It seems like what you're talking about here is just changing the
spelling of what I already have. I think that would be confusing to
users when the time comes to actually implement a fully-generalized
MERGE, even with the clearly distinct MERGE CONCURRENTLY variant
outlined here (which, of course, lacks an outer join, unlike MERGE
proper).

However, unlike the idea of trying to square the circle of producing a
general purpose MERGE command that also supports the UPSERT use-case,
my objection to this much more limited proposal is made purely on
aesthetic grounds. I think that it is not very user-friendly; I do not
think that it's a total disaster, which is what trying to solve both
problems at once (MERGE bulkloading and UPSERTing) would result in. So
FWIW, if the community is really set on something that includes the
keyword MERGE, which is really all you outline here, then I can live
with that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions

2014-10-07 Thread Jim Nasby

On 10/7/14, 2:11 AM, Feike Steenbergen wrote:

On 7 October 2014 01:41, Jim Nasby  wrote:

>The options I see...
>
>1) If there's a definitive way to tell from backend source code what
>commands disallow transactions then we can just use that information to
>generate the list of commands psql shouldn't do that with.
>
>2) Always run the regression test with auto-commit turned off.
>
>3) Run the regression in both modes (presumably only on the build farm due
>to how long it would take).


1) I don't know about a definitive way. I used grep to find all
statements calling PreventTransactionChain.


Perhaps it wouldn't be too horrific to create some perl code that would figure 
out what all of those commands are, and we could then use that to generate the 
appropriate list for psql.


2) - I expect most people use autocommit-on; so only running it in
  autocommit-off would not test the majority of users.
- autocommit-off also obliges you to explicitly roll back transactions after
errors occur; this would probably mean a rewrite of some tests?


Well, that is at least doable, but probably rather ugly. It would probably be 
less ugly if our test framework had a way to test for errors (ala pgTap).

Where I was going with this is a full-on brute-force test: execute every 
possible command with autocommit turned off. We don't need to check that each 
command does what it's supposed to do, only that it can execute.

Of course, the huge problem with that is knowing how to actually successfully 
run each command. :( Theoretically the tests could be structured in such a way 
that there's a subset of tests that just see if the command even executes, but 
creating that is obviously a lot of work and with our current test framework 
probably a real pain to maintain.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-07 Thread Peter Geoghegan
On Tue, Oct 7, 2014 at 11:25 AM, Peter Geoghegan  wrote:
> Now, we're not talking about a huge advantage here (I should re-test
> that).

I attach the raw output from running the bash scripts insert.sh and
update.sh. These are benchmarks that concern performance in terms of
total system throughput (TPS). The scripts are available from my
stress-test suite: https://github.com/petergeoghegan/upsert

These scripts were originally designed to compare UPSERT with an
unsympathetic "gold-standard" for performance: "equivalent" INSERTs
and UPDATEs. I looked at a few runs of 60 seconds, on unlogged tables,
making the most direct comparison between UPSERTs and "equivalent"
INSERTs and UPDATEs that is possible. To be clear, by "equivalent" I
mean UPSERTS where we know we'll only UPDATE (in the case of
update.sh), and UPSERTS where we know we'll only INSERT (in the case
of insert.sh).

Both #1 and #2 do respectably as compared to "equivalent" INSERTs and
UPDATEs. There may be even less sympathetic though more representative
cases, but certainly for these simple cases, performance is solid
across the board. I got these numbers on my laptop, and it may be
necessary to devise a more rigorous benchmark later, but performance
is quite consistent between runs shown here.

Approach #1 wins out with UPDATEs. The heavyweight-lock avoidance
stuff is enough to compensate for the fact that we never INSERT (and
never need B-Tree leaf page heavyweight locks). Median TPS was
19,310.79 for #1. Whereas for #2, it was 18,872.63 TPS. This is the
case even though the "pre-check" for #2 is always appropriate, while
we still acquire page-level heavyweight locks sometimes with #1.

INSERTs see #2 win, and by a wider margin than #1 beat #2 with
UPDATEs. However, insert.sh is by design very unsympathetic towards
#1. It uses a serial primary key, so every INSERT uselessly obtains a
HW lock on the same leaf page for the duration of heap insertion.
Anyway, the median INSERT TPS number is 17,759.53 for #1, and
20,441.57 TPS for #2. So you're pretty much seeing the full brunt of
page heavyweight locking, and it isn't all that bad. However, Heikki
has said something about being more clever with when and how #2 is
made to pre-check (which is always wasted here); so it's possible to
imagine INSERTs becoming faster for #2, while that probably isn't the
case for #1. I think that if I wanted to, I could get #1 to do much
better on another case where page heavyweight locking is more varied.
My goal here was to do the opposite, though.

-- 
Peter Geoghegan
hwlocking (#1):
===

  UPDATES (#1) 
pg@hamster:~/upsert$ ./update.sh
running ./update.sh benchmark 1000 times
Test upsert as a replacement for update...no insert part, for comparison 
against raw update performance
UPSERT with no inserts, only updates, vs. equivalent updates
running ./update.sh benchmark
trying 60 second run (new infrastructure, extended hwlocking)
NOTICE:  table "foo" does not exist, skipping
DROP TABLE
CREATE TABLE
pre-inserting tuples to update
CHECKPOINT
transaction type: Custom query
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 1
duration: 60 s
number of transactions actually processed: 1158697
latency average: 0.414 ms
tps = 19308.753179 (including connections establishing)
tps = 19313.317204 (excluding connections establishing)
trying 60 second run (traditional updates, equivalent to master)
DROP TABLE
CREATE TABLE
pre-inserting tuples to update
CHECKPOINT
transaction type: Custom query
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 1
duration: 60 s
number of transactions actually processed: 1552490
latency average: 0.309 ms
tps = 25873.503004 (including connections establishing)
tps = 25878.810475 (excluding connections establishing)



running ./update.sh benchmark
trying 60 second run (new infrastructure, extended hwlocking)
DROP TABLE
CREATE TABLE
pre-inserting tuples to update
CHECKPOINT
transaction type: Custom query
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 1
duration: 60 s
number of transactions actually processed: 1158654
latency average: 0.414 ms
tps = 19310.786710 (including connections establishing)
tps = 19314.838618 (excluding connections establishing)
trying 60 second run (traditional updates, equivalent to master)
DROP TABLE
CREATE TABLE
pre-inserting tuples to update
CHECKPOINT
transaction type: Custom query
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 1
duration: 60 s
number of transactions actually processed: 1557929
latency average: 0.308 ms
tps = 25964.116323 (including connections establishing)
tps = 25969.841506 (excluding connections establishing)



running ./update.sh benchmark
trying 60 second run (new infrastructure, extended hwlocking)
DROP TABLE
CREATE TABLE
pre-inserting tuples to update
CHECKPOINT
transaction type: Custom query
scaling factor: 100
query mode: simpl

Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-07 Thread Tom Lane
I wrote:
> 4. I've eyeballed the relevant code a bit, and it seems that the only
> implementation aspect that isn't perfectly straightforward is figuring
> out how to cram a zic timezone reference into a datetkn table entry.
> I suggest that before tackling this feature proper, we bring struct
> datetkn into the 21st century by widening it from 12 to 16 bytes, along
> the lines of

> typedef struct
> {
> chartoken[TOKMAXLEN + 1];  /* now always null-terminated */
> chartype;
> int32   value;
> } datetkn;

> and getting rid of all of the very crufty code that deals with
> non-null-terminated token strings and cramming values that don't really
> fit into a char-sized field into "value".  (We might save more code bytes
> that way than we spend on the wider token-table entries :-( ... and we'll
> certainly make the code less ugly.)

Attached is a proposed patch for this part.  It turned out to be quite
a bit more exciting (not in a good way) than I'd expected.  Somewhere
along the way, somebody decided that the easiest way to get from point A
to point B was to insert a minus sign into the FROMVAL() macro, meaning
that FROMVAL() and TOVAL() were not inverses as one would rationally
expect.  There were various ways that one could deal with this bit of
dirty laundry; in particular I considered flipping the sign definition
of TZ/DTZ entry values.  It seemed better in the end to keep the sign
the same (matching the IANA code's convention for timezone offset sign)
and negate the results at the call sites where needed.  Since, in another
bit of weird design, all those call sites already had a multiplication
by MINS_PER_HOUR (the wrong spelling of "60", btw, but I digress) that
needed to be got rid of, this didn't result in touching more places
than I would have had to anyway.

Much worse for the present effort: I was reminded that ecpg contains
a *hard wired* list of known zone abbreviations and their GMT offsets;
a list that doesn't seem to have been updated since around 2003.  I'm
not sure what a reasonable fix would be for that.  ecpg can't assume it
has access to the timezone database, probably, so bringing it up to
speed with what I propose to do in the backend doesn't seem feasible.
For the moment I didn't change the data there, even though a lot of
the entries are obsolete.

The attached patch also fixes a possible free()-of-uninitialized-pointer
problem in PGTYPEStimestamp_defmt_scan().

regards, tom lane

diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 073104d..195ac6e 100644
*** a/src/backend/utils/adt/date.c
--- b/src/backend/utils/adt/date.c
*** timetz_zone(PG_FUNCTION_ARGS)
*** 2710,2718 
  	type = DecodeSpecial(0, lowzone, &val);
  
  	if (type == TZ || type == DTZ)
! 		tz = val * MINS_PER_HOUR;
  	else
  	{
  		tzp = pg_tzset(tzname);
  		if (tzp)
  		{
--- 2710,2722 
  	type = DecodeSpecial(0, lowzone, &val);
  
  	if (type == TZ || type == DTZ)
! 	{
! 		/* abbreviation */
! 		tz = -val;
! 	}
  	else
  	{
+ 		/* try it as a full zone name */
  		tzp = pg_tzset(tzname);
  		if (tzp)
  		{
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 7632d11..b159a08 100644
*** a/src/backend/utils/adt/datetime.c
--- b/src/backend/utils/adt/datetime.c
*** const char *const days[] = {"Sunday", "M
*** 70,101 
   */
  
  /*
-  * Definitions for squeezing values into "value"
-  * We set aside a high bit for a sign, and scale the timezone offsets
-  * in minutes by a factor of 15 (so can represent quarter-hour increments).
-  */
- #define ABS_SIGNBIT		((char) 0200)
- #define VALMASK			((char) 0177)
- #define POS(n)			(n)
- #define NEG(n)			((n)|ABS_SIGNBIT)
- #define SIGNEDCHAR(c)	((c)&ABS_SIGNBIT? -((c)&VALMASK): (c))
- #define FROMVAL(tp)		(-SIGNEDCHAR((tp)->value) * 15) /* uncompress */
- #define TOVAL(tp, v)	((tp)->value = ((v) < 0? NEG((-(v))/15): POS(v)/15))
- 
- /*
   * datetktbl holds date/time keywords.
   *
   * Note that this table must be strictly alphabetically ordered to allow an
   * O(ln(N)) search algorithm to be used.
   *
!  * The token field is NOT guaranteed to be NULL-terminated.
!  *
!  * To keep this table reasonably small, we divide the value for TZ and DTZ
!  * entries by 15 (so they are on 15 minute boundaries) and truncate the token
!  * field at TOKMAXLEN characters.
!  * Formerly, we divided by 10 rather than 15 but there are a few time zones
!  * which are 30 or 45 minutes away from an even hour, most are on an hour
!  * boundary, and none on other boundaries.
   *
   * The static table contains no TZ or DTZ entries, rather those are loaded
   * from configuration files and stored in timezonetktbl, which has the same
--- 70,82 
   */
  
  /*
   * datetktbl holds date/time keywords

Re: [HACKERS] lwlock contention with SSI

2014-10-07 Thread Andres Freund
On October 7, 2014 10:06:25 PM CEST, Kevin Grittner  wrote:
>Robert Haas  wrote:
>> On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner 
>wrote:
>>> Robert Haas  wrote:
>>>> About a month ago, I told Kevin Grittner in an off-list conversation
>>>> that I'd work on providing him with some statistics about lwlock
>>>> contention under SSI.  I then ran a benchmark on a 16-core,
>>>> 64-hardware thread IBM server, testing read-only pgbench performance
>>>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>>>> client threads).
>>>
>>> I hate to say this when I know how much work benchmarking is, but I
>>> don't think any benchmark of serializable transactions has very
>>> much value unless you set any transactions which don't write to
>>> READ ONLY.  I guess it shows how a naive conversion by someone who
>>> doesn't read the docs or chooses to ignore the advice on how to get
>>> good performance will perform, but how interesting is that?
>>>
>>> It might be worth getting TPS numbers from the worst-looking test
>>> from this run, but with the read-only run done after changing
>>> default_transaction_read_only = on.  Some shops using serializable
>>> transactions set that in the postgresql.conf file, and require that
>>> any transaction which will be modifying data override it.
>>
>> Well, we could do that.  But I'm not sure it's very realistic.  The
>> pgbench workload is either 100% write or 100% read, but most real
>> work-loads are mixed; say, 95% read, 5% write.  If the client software
>> has to be responsible for flipping default_transaction_read_only for
>> every write transaction, or just doing BEGIN TRANSACTION READ WRITE
>> and COMMIT around each otherwise-single-statement write transaction,
>> that's a whole bunch of extra server round trips and complexity that
>> most people are not going to want to bother with.
>
>Well, people using serializable transactions have generally opted
>to deal with that rather than using SELECT ... FOR UPDATE, LOCK
>TABLE, etc.  There's no free lunch, and changing BEGIN to BEGIN
>TRANSACTION READ WRITE for those transactions which are expected to
>write data is generally a lot less bother than the other.  

Then it really shouldn't have supplanted the old pseudo serializable (aka 
repeatable read).

There's software where something like this is easy. But I think that set
doesn't overlap much with the set of devs for whom serializable is the
easier way.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-10-07 Thread Alvaro Herrera
Thomas Munro wrote:

> > I attach some additional minor suggestions to your patch.  Please feel
> > free to reword comments differently if you think my wording isn't an
> > improvement (or I've made an English mistake).
> 
> Thanks, these are incorporated in the new version (also rebased).

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lwlock contention with SSI

2014-10-07 Thread Kevin Grittner
Robert Haas  wrote:
> On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner  wrote:
>> Robert Haas  wrote:
>>> About a month ago, I told Kevin Grittner in an off-list conversation
>>> that I'd work on providing him with some statistics about lwlock
>>> contention under SSI.  I then ran a benchmark on a 16-core,
>>> 64-hardware thread IBM server, testing read-only pgbench performance
>>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>>> client threads).
>>
>> I hate to say this when I know how much work benchmarking is, but I
>> don't think any benchmark of serializable transactions has very
>> much value unless you set any transactions which don't write to
>> READ ONLY.  I guess it shows how a naive conversion by someone who
>> doesn't read the docs or chooses to ignore the advice on how to get
>> good performance will perform, but how interesting is that?
>>
>> It might be worth getting TPS numbers from the worst-looking test
>> from this run, but with the read-only run done after changing
>> default_transaction_read_only = on.  Some shops using serializable
>> transactions set that in the postgresql.conf file, and require that
>> any transaction which will be modifying data override it.
>
> Well, we could do that.  But I'm not sure it's very realistic.  The
> pgbench workload is either 100% write or 100% read, but most real
> work-loads are mixed; say, 95% read, 5% write.  If the client software
> has to be responsible for flipping default_transaction_read_only for
> every write transaction, or just doing BEGIN TRANSACTION READ WRITE
> and COMMIT around each otherwise-single-statement write transaction,
> that's a whole bunch of extra server round trips and complexity that
> most people are not going to want to bother with.

Well, people using serializable transactions have generally opted
to deal with that rather than using SELECT ... FOR UPDATE, LOCK
TABLE, etc.  There's no free lunch, and changing BEGIN to BEGIN
TRANSACTION READ WRITE for those transactions which are expected to
write data is generally a lot less bother than the other.  In fact,
most software I have seen using this has a transaction manager in
the Java code which pays attention to the definition of each type
of transaction -- so you override a default in a declaration.
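
Concretely, the pattern is just this (a sketch; the table is
hypothetical, and default_transaction_read_only = on is set in
postgresql.conf):

    BEGIN TRANSACTION READ WRITE;   -- explicit opt-in for writers
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    BEGIN;                          -- everything else defaults to read-only
    SELECT count(*) FROM accounts;
    COMMIT;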

> We can tell them that they have to do it anyway, of course.

The docs already recommend it.

I really would like to see the LW locking issues in SSI brought
up-to-date with the rest of the code, but I would rather focus on
the bottlenecks where people are fundamentally using good technique
than on cases where they are not following the advice in the
docs[1], since following that advice would massively boost
performance without any change to PostgreSQL.

A paper by the University of Sydney[2] found that in their tests
the bottleneck was the linked lists which track read-write
dependencies, reporting that at a concurrency of 128, "Our
profiling showed that PostgreSQL spend 2.3% of the overall runtime
in traversing these list, plus 10% of its runtime waiting on the
corresponding kernel mutexes."  This list is covered by
SerializableXactHashLock, so converting it to something which is not
O(N^2), or using lock-free access, or both, would probably make a big
difference in contention at higher concurrency.
(I think we may be able to do one or the other, but not both.)
Further tests may identify other bottlenecks in reasonable
workloads, but this seems sure to be one which needs attention.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] 
http://www.postgresql.org/docs/current/interactive/transaction-iso.html#XACT-SERIALIZABLE

[2] Hyungsoo Jung, Hyuck Han, Alan Fekete, Uwe Röhm, and Heon Y.
Yeom.  Performance of Serializable Snapshot Isolation on Multicore
Servers.  Technical Report 693, The University of Sydney School of
Information Technologies, December, 2012.
http://sydney.edu.au/engineering/it/research/tr/tr693.pdf
(Quote is from section 5.2, Shared System Data Structures,
subsection PostgreSQL.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 10:51 AM, Andres Freund  wrote:
> On 2014-10-07 10:45:24 -0400, Robert Haas wrote:
>> > It's not like it'd be significantly different today - in a read mostly
>> > workload that's bottlenecked on ProcArrayLock you'll not see many
>> > waits. There you'd have to count the total number of spinlocks cycles to
>> > measure anything interesting.
>>
>> Hmm, really?  I've never had to do that to find bottlenecks.
>
> How did you diagnose procarray contention in a readonly workload
> otherwise, without using perf?

spindelays.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lwlock contention with SSI

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 2:40 PM, Kevin Grittner  wrote:
> Robert Haas  wrote:
>> About a month ago, I told Kevin Grittner in an off-list conversation
>> that I'd work on providing him with some statistics about lwlock
>> contention under SSI.  I then ran a benchmark on a 16-core,
>> 64-hardware thread IBM server, testing read-only pgbench performance
>> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
>> client threads).
>
> I hate to say this when I know how much work benchmarking is, but I
> don't think any benchmark of serializable transactions has very
> much value unless you set any transactions which don't write to
> READ ONLY.  I guess it shows how a naive conversion by someone who
> doesn't read the docs or chooses to ignore the advice on how to get
> good performance will perform, but how interesting is that?
>
> It might be worth getting TPS numbers from the worst-looking test
> from this run, but with the read-only run done after changing
> default_transaction_read_only = on.  Some shops using serializable
> transactions set that in the postgresql.conf file, and require that
> any transaction which will be modifying data override it.

Well, we could do that.  But I'm not sure it's very realistic.  The
pgbench workload is either 100% write or 100% read, but most real
work-loads are mixed; say, 95% read, 5% write.  If the client software
has to be responsible for flipping default_transaction_read_only for
every write transaction, or just doing BEGIN TRANSACTION READ WRITE
and COMMIT around each otherwise-single-statement write transaction,
that's a whole bunch of extra server round trips and complexity that
most people are not going to want to bother with.  We can tell them
that they have to do it anyway, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lwlock contention with SSI

2014-10-07 Thread Kevin Grittner
Robert Haas  wrote:

> About a month ago, I told Kevin Grittner in an off-list conversation
> that I'd work on providing him with some statistics about lwlock
> contention under SSI.  I then ran a benchmark on a 16-core,
> 64-hardware thread IBM server, testing read-only pgbench performance
> at scale factor 300 with 1, 8, and 32 clients (and an equal number of
> client threads).

I hate to say this when I know how much work benchmarking is, but I
don't think any benchmark of serializable transactions has very
much value unless you set any transactions which don't write to
READ ONLY.  I guess it shows how a naive conversion by someone who
doesn't read the docs or chooses to ignore the advice on how to get
good performance will perform, but how interesting is that?

It might be worth getting TPS numbers from the worst-looking test
from this run, but with the read-only run done after changing
default_transaction_read_only = on.  Some shops using serializable
transactions set that in the postgresql.conf file, and require that
any transaction which will be modifying data override it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pageinspect functions

2014-10-07 Thread Oleg Bartunov
On Tue, Oct 7, 2014 at 9:56 PM, Heikki Linnakangas 
wrote:

> On 10/07/2014 08:36 PM, Oleg Bartunov wrote:
>
>> On Tue, Oct 7, 2014 at 9:03 PM, Heikki Linnakangas <
>> hlinnakan...@vmware.com>
>> wrote:
>>
>>> Some time ago, when debugging a GIN bug, I wrote these pageinspect
>>> functions to inspect GIN indexes. They were very useful; we should add
>>> them.
>>>
>>>
>> Maybe we can merge it with contrib/gevel, which we have used for many
>> years for development and debugging purposes?  Have you seen it?
>>
>
> I remember downloading it many years ago, but that's all I remember.
> Where's the latest version?
>

I believe it's here:
http://www.sigaev.ru/git/gitweb.cgi?p=gevel.git;a=summary


> - Heikki
>


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-07 Thread Peter Geoghegan
On Tue, Oct 7, 2014 at 6:06 AM, Robert Haas  wrote:
> I'm not up on the details of what Peter's patch does with heavyweight
> locking, but I'd say it this way: if the patch uses heavyweight
> locking routinely, that's probably not going to scale well[1].   If
> the patch detects possible conflicts and uses heavyweight locking only
> in those cases and for the specific purpose of untangling those
> conflicts, then that might well be OK.

The patch opportunistically tries to use shared buffer locks when a
conflict is expected, when we restart (but only on the unique index
where a conflict was detected). So in the event of a lot of
near-conflicts, the hwlock traffic is quite modest. That, combined
with the fact that it uses what I've called "an index scan with an
identity crisis" (could be a near-insertion + hwlock in advance of
insertion proper, or just something akin to a regular index scan)
makes it perform best (at least with one or two unique indexes, which
is what I tested a few months back). It does not have a pre-check that
is always wasted with insertion-heavy workloads.

Now, we're not talking about a huge advantage here (I should re-test
that). And, in case I wasn't clear: I have misgivings about all 3
designs. Like Simon, I think it is appropriate that we figure out our
exact requirements using the two working prototype patches. Although,
right now #1 and #2 (the prototypes) seem quite comparable, that might
just be down to a failure of imagination. It's hard to be completely
confident about something like that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-10-07 Thread Merlin Moncure
On Tue, Oct 7, 2014 at 12:42 PM, Steven Lembark  wrote:
> On Mon, 1 Sep 2014 15:19:41 +0200
> Joel Jacobson  wrote:
>
>> The fatal problems with Python3 and Perl6 was the inability to mix
>> code between Python2/3 and Perl5/6.
>> We don't have that problem with pl-languages in postgres, so please
>> don't make that comparison, as it's incorrect.
>
> Actually Perl6 can include Perl5 code and allows you to "use v5.6" or "use
> v6.0" to regulate how the code in any one block is compiled w/in the
> program. Even Perl 5 allows mixing blocks/modules with different version
> syntax w/in the same compiler.

I don't think that really helps very much at the end of the day; Perl
6 was a disaster for Perl.  Breaking compatibility of a language is a
good way to kill it off.  Compiler support is only one example of a
very broad set of problems it causes.  Hiding that compatibility
breaking under "language 2.0" doesn't solve anything either.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-10-07 Thread Rodolfo Campero
2014-09-04 18:29 GMT-03:00 Robert Haas :

> On Thu, Sep 4, 2014 at 2:31 PM, Josh Berkus  wrote:
> > Sadly, what's prevented us from having "packages" already has been the
> > insistence of potential feature sponsors that they work *exactly* like
> > PL/SQL's packages, which is incompatible with Postgres namespacing.
> > Also, we'd want any "package" concept to be usable with external PLs as
> > well as PL/pgSQL, which necessitates other Oracle-incompatible changes.
>
> This is not a fun area in which to try to be exactly like Oracle.
> Just to take one example, the whole package is created and dumped as a
> single object, with all of its contained functions *and their
> comments*, including the exact position of those comments, such as
> inside the argument list to document what particular arguments are
> supposed to do.  We've worked out a (partial) solution to that problem
> in Advanced Server, but it's not perfect, and it limits the ability to
> implement other features that PostgreSQL users would probably expect,
> like being able to add a function to a package after-the-fact.
> PostgreSQL has a certain cleanliness of design that comes from doing
> things in a way that makes sense from first principles, rather than
> the way that other people may have done it.  I'm not prepared to say
> that a $184B company made a bad design decision here - it certainly
> seems to have worked out for them - but it's not what I would have
> picked, and it's not a very good fit for other design decisions we've
> made in PostgreSQL already.
>
> All-in-all, I'm pretty happy with our EXTENSION system as a way of
> loading code (and SQL function definitions) in a modular way.  It's
> not perfect, but it's definitely made my life as a developer easier.
> There are some things you can do with an Oracle package but not a
> PostgreSQL extension, but there is an awful lot of overlap, too.  I
> doubt we'd want to duplicate all that machinery just for compatibility
> reasons.
>

If it were possible to mark a function as "private for its extension" that
would be awesome (the opposite would work too, i.e. a way to specify a
public API, meaning the rest is private). For big extensions it's not clear
which functions can be used directly by users of the extension and which
ones are just implementation details.


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-07 Thread Tom Lane
Andrew Dunstan  writes:
> I don't much like the idea of doing an install/initdb/start for every 
> directory in src/bin, though. Can't we at least manage a single 
> installation directory for all these?

Peter had a patch to eliminate the overhead of multiple subinstalls;
not sure where that stands, but presumably it would address your issue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pageinspect functions

2014-10-07 Thread Heikki Linnakangas

On 10/07/2014 08:36 PM, Oleg Bartunov wrote:

On Tue, Oct 7, 2014 at 9:03 PM, Heikki Linnakangas 
wrote:


Some time ago, when debugging a GIN bug, I wrote these pageinspect
functions to inspect GIN indexes. They were very useful; we should add them.



Maybe we can merge it with contrib/gevel, which we have used for many
years for development and debugging purposes?  Have you seen it?


I remember downloading it many years ago, but that's all I remember. 
Where's the latest version?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-10-07 Thread Steven Lembark
On Mon, 1 Sep 2014 15:19:41 +0200
Joel Jacobson  wrote:

> The fatal problems with Python3 and Perl6 was the inability to mix
> code between Python2/3 and Perl5/6.
> We don't have that problem with pl-languages in postgres, so please
> don't make that comparison, as it's incorrect.

Actually Perl6 can include Perl5 code and allows you to "use v5.6" or "use
v6.0" to regulate how the code in any one block is compiled w/in the
program. Even Perl 5 allows mixing blocks/modules with different version
syntax w/in the same compiler.

The mistake Python made was not allowing the Python 3 compiler to 
gracefully handle Python 2 input.

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pageinspect functions

2014-10-07 Thread Oleg Bartunov
On Tue, Oct 7, 2014 at 9:03 PM, Heikki Linnakangas 
wrote:

> Some time ago, when debugging a GIN bug, I wrote these pageinspect
> functions to inspect GIN indexes. They were very useful; we should add them.
>

Maybe we can merge it with contrib/gevel, which we have used for many
years for development and debugging purposes?  Have you seen it?

>
> - Heikki
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


[HACKERS] GIN pageinspect functions

2014-10-07 Thread Heikki Linnakangas
Some time ago, when debugging a GIN bug, I wrote these pageinspect 
functions to inspect GIN indexes. They were very useful; we should add them.
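
Usage is along these lines (a sketch; the index name is a placeholder,
block numbers depend on the index, and the functions are exposed the
same way as the existing pageinspect functions):

    CREATE EXTENSION pageinspect;
    SELECT * FROM gin_metapage(get_raw_page('my_gin_index', 0));
    SELECT * FROM gin_pageopaq(get_raw_page('my_gin_index', 1));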


- Heikki
>From 91ef58aab11e9077ab6a38268a1120806e42f2dd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 12 Sep 2014 13:36:30 +0300
Subject: [PATCH 1/1] Add pageinspect functions for inspecting GIN indexes.

---
 contrib/pageinspect/Makefile  |   7 +-
 contrib/pageinspect/ginfuncs.c| 264 ++
 contrib/pageinspect/pageinspect--1.2--1.3.sql |  42 
 contrib/pageinspect/pageinspect--1.3.sql  | 149 +++
 contrib/pageinspect/pageinspect.control   |   2 +-
 5 files changed, 460 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pageinspect/ginfuncs.c
 create mode 100644 contrib/pageinspect/pageinspect--1.2--1.3.sql
 create mode 100644 contrib/pageinspect/pageinspect--1.3.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index f10229d..1a8b060 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -1,11 +1,12 @@
 # contrib/pageinspect/Makefile
 
 MODULE_big	= pageinspect
-OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o $(WIN32RES)
+OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o ginfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.2.sql pageinspect--1.0--1.1.sql \
-	pageinspect--1.1--1.2.sql pageinspect--unpackaged--1.0.sql
+DATA = pageinspect--1.3.sql pageinspect--1.0--1.1.sql \
+	pageinspect--1.1--1.2.sql pageinspect--1.2--1.3.sql \
+pageinspect--unpackaged--1.0.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 ifdef USE_PGXS
diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
new file mode 100644
index 000..949d19d
--- /dev/null
+++ b/contrib/pageinspect/ginfuncs.c
@@ -0,0 +1,264 @@
+/*
+ * contrib/pageinspect/ginfuncs.c
+ */
+
+#include "postgres.h"
+
+#include "access/gin.h"
+#include "access/gin_private.h"
+#include "access/htup_details.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+
+#define DatumGetItemPointer(X)	 ((ItemPointer) DatumGetPointer(X))
+#define ItemPointerGetDatum(X)	 PointerGetDatum(X)
+
+PG_FUNCTION_INFO_V1(gin_metapage);
+PG_FUNCTION_INFO_V1(gin_pageopaq);
+PG_FUNCTION_INFO_V1(gin_dataleafpage);
+
+Datum
+gin_metapage(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	int			raw_page_size;
+	TupleDesc	tupdesc;
+	Page		page;
+	GinPageOpaque opaq;
+	GinMetaPageData *metadata;
+	HeapTuple	resultTuple;
+	Datum		values[10];
+	bool		nulls[10];
+
+	if (!superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use raw page functions"))));
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+	if (raw_page_size < BLCKSZ)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+	page = VARDATA(raw_page);
+
+	opaq = (GinPageOpaque) PageGetSpecialPointer(page);
+	if (opaq->flags != GIN_META)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page is not a GIN metapage"),
+ errdetail("Flags %04X, expected %04X",
+		   opaq->flags, GIN_META)));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	metadata = GinPageGetMeta(page);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	values[0] = Int64GetDatum(metadata->head);
+	values[1] = Int64GetDatum(metadata->tail);
+	values[2] = Int32GetDatum(metadata->tailFreeSize);
+	values[3] = Int64GetDatum(metadata->nPendingPages);
+	values[4] = Int64GetDatum(metadata->nPendingHeapTuples);
+
+	/* statistics, updated by VACUUM */
+	values[5] = Int64GetDatum(metadata->nTotalPages);
+	values[6] = Int64GetDatum(metadata->nEntryPages);
+	values[7] = Int64GetDatum(metadata->nDataPages);
+	values[8] = Int64GetDatum(metadata->nEntries);
+
+	values[9] = Int32GetDatum(metadata->ginVersion);
+
+	/* Build and return the result tuple. */
+	resultTuple = heap_form_tuple(tupdesc, values, nulls);
+
+	return HeapTupleGetDatum(resultTuple);
+}
+
+
+Datum
+gin_pageopaq(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	int			raw_page_size;
+	TupleDesc	tupdesc;
+	Page		page;
+	GinPageOpaque opaq;
+	HeapTuple	resultTuple;
+	Datum		values[3];
+	bool		nulls[10];
+	Datum		flags[16];
+	int			nflags = 0;
+	uint16		flagbits;
+
+	if (!superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use raw page functions"))));
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+	if (raw_page_size < BLCKSZ)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)",

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-07 Thread Andrew Dunstan


On 10/07/2014 09:53 AM, Andrew Dunstan wrote:


On 10/07/2014 12:15 AM, Tom Lane wrote:

Robert Haas  writes:
On Mon, Oct 6, 2014 at 8:15 PM, Peter Eisentraut  
wrote:

The TAP tests
are arguably already much easier to debug than pg_regress ever was.

Well, maybe.  I wasn't able, after about 5 minutes of searching, to
locate either a log file with details of the failure or the code that
revealed what the test, the expected result, and the actual result
were.  It's possible that all that information is there and I just
don't know where to look; it took me a while to learn where the
various logs (postmaster.log, initdb.log, results) left behind by
pg_regress were, too.  If that information is not there, then I'd say
it's not easier to debug.  If it is and I don't know where to look ...
well then I just need to get educated.

The given case seemed pretty opaque to me too.  Could we maybe
have some documentation about how to debug TAP failures?  Or in
other words, if they're "arguably" easier to debug, how about
presenting that argument?

Also to the point: does the buildfarm script know how to collect
the information needed to debug a TAP failure?

No. In fact, it doesn't yet know how to run those tests. That's on my 
TODO list.

OK, I have a preliminary cut at adding these tests to the client. See 
 
for an example run. The patch is at 



I don't much like the idea of doing an install/initdb/start for every 
directory in src/bin, though. Can't we at least manage a single 
installation directory for all these?


Also I notice that the tests remove their data directories. That could 
make collecting any diagnosis data more difficult. Right now, I have no 
idea what I'm looking for anyway.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OCLASS_ROWSECURITY oversights, and other kvetching

2014-10-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> The RLS patch added OCLASS_ROWSECURITY but it seems that not enough
> effort was made to grep for places that might require adjustment as a
> result.
> 
> In objectaddress.c, getObjectDescription() was updated, but
> getObjectTypeDescription() and getObjectIdentity() were not.
> 
> In dependency.c, object_classes didn't get updated.

I'm trying to recall why I didn't think it was necessary to add it into
more places..  I did do the 'grep' as described.  I'll go back and
review these.

> I also really question why we've got OCLASS_ROWSECURITY but
> OBJECT_POLICY.  In most cases, we name the OBJECT_* construct and the
> OCLASS_* construct similarly.  This is actually just the tip of the
> iceberg: we've got OBJECT_POLICY but OCLASS_ROWSECURITY (no underscore
> between row and security) and then we've got DO_ROW_SECURITY (with an
> underscore) and pg_row_security.

I'm guessing you mean pg_rowsecurity..  DO_ROW_SECURITY is in pg_dump
only and the ROW_SECURITY cases in the backend are representing the
'row_security' GUC.  That said, I'm not against changing things to be
more consistent, of course.  In this case, pg_rowsecurity should really
be pg_policies as that's what's actually in that catalog.  The original
naming was from the notion that the table-level attribute is
'ROW LEVEL SECURITY', but on reflection it's clearer to have it as
pg_policies.

> But then on the other hand the
> source code is in policy.c.  

Right, the functions for dealing with policies are in policy.c, while
the actual implementation of the table-level 'ROW LEVEL SECURITY'
attribute is in rowsecurity.c.

> pg_dump tries to sit on the fence by
> alternating between all the different names and sometimes combining
> them (row-security policy).  Some places refer to row-LEVEL security
> rather than row security or policies.

There are three different things happening in pg_dump, which I suspect is
why it's gotten inconsistent.  There's setting the ROW_SECURITY GUC,
dumping the fact that the table has been set to ENABLE ROW LEVEL
SECURITY, and dumping out the actual policies which are defined on the
table.

> I think this kind of messiness makes code really hard to maintain and
> should be cleaned up now while we have a chance.  For the most part,
> we have chosen to name our catalogs, SQL commands, and internal
> constants by *what kind of object it is* (in this case, a policy)
> rather than by *the feature it provides* (in this case, row security).
> So I think that everything that relates to a policy specifically
> (OCLASS_ROWSECURITY, pg_row_security, etc.) should be renamed to refer
> to policies instead.  The references to row security should be
> preserved only when we are talking about the table-level property,
> which is actually called ROW SECURITY, or the feature in general.

I certainly agree that it can and should be improved.  Given that the
table property is ROW SECURITY, I'd think we would keep the GUC as
'ROW_SECURITY', but change all of the places which are currently working
with policies to use POLICY, such as OCLASS_ROWSECURITY ->
OCLASS_POLICY.  I'll go back through and review it with these three
distinctions top-of-mind and work out what other changes make sense.

Thanks!

Stephen




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Fabrízio de Royes Mello
On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs  wrote:
>
> On 31 July 2014 22:34, Stephen Frost  wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Stephen Frost  writes:
> >> > * Bruce Momjian (br...@momjian.us) wrote:
> >> >> Actually, thinking more, Stephen Frost mentioned that the auditing
> >> >> system has to modify database _state_, and dumping/restoring the state
> >> >> of an extension might be tricky.
> >>
> >> > This is really true of any extension which wants to attach information
> >> > or track things associated with roles or other database objects.  What
> >> > I'd like to avoid is having an extension which does so through an extra
> >> > table or through reloptions or one of the other approaches which exists
> >> > in contrib and which implements a capability we're looking at adding to
> >> > core
> >>
> >> We have core code that uses reloptions --- autovacuum for instance ---
> >> so I'm not exactly clear on why that's so unacceptable for this.
> >
> > There was a pretty good thread regarding reloptions and making it so
> > extensions could use them which seemed to end up with a proposal to turn
> > 'security labels' into a more generic metadata capability.  Using that
> > kind of a mechanism would at least address one of my concerns about
> > using reloptions (specifically that they're specific to relations and
> > don't account for the other objects in the system).  Unfortunately, the
> > flexibility desired for auditing is more than just "all actions of this
> > role" or "all actions on this table" but also "actions of this role on
> > this table", which doesn't fit as well.
>
> Yes, there is a requirement, in some cases, for per role/relation
> metadata. Grant and ACLs are a good example.
>
> When I spoke with Robert about a year ago, he said the patch he was most
> proud of was the reloptions abstraction. Whatever we do in the future,
> keeping metadata in a slightly more abstract form is very useful.
>

When we discussed the rejected patch "store-custom-reloptions" I
laid out my thoughts about it in
http://www.postgresql.org/message-id/cafcns+p+2oa2fg7o-8kwmckazjaywue6mvnnudpurpt0pz8...@mail.gmail.com

We can think of a mechanism to create "properties / options" and assign
them to objects (table, index, column, schema, ...) the way COMMENT does.

A quick thought:

CREATE OPTION [ IF NOT EXISTS ] name
VALIDATOR valfunction
[ DEFAULT value ];

ALTER TABLE name
SET OPTION optname { TO | = } { value | 'value' | DEFAULT };

It's just a rough thought, of course. We'd need to think harder about the
syntax and purposes.


> I hope we can get pgAudit in as a module for 9.5. I also hope that it
> will stimulate the requirements/funding of further work in this area,
> rather than squash it. My feeling is we have more examples of feature
> sets that grow over time (replication, view handling, hstore/JSONB
> etc) than we have examples of things languishing in need of attention
> (partitioning).
>

+1

Regards.

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-07 Thread Jim Nasby

On 10/6/14, 6:19 PM, Jim Nasby wrote:

FWIW, I agree for timestamptz, but I do wish we had a timestamp datatype that stored the 
exact timezone in effect when the data was entered. That can really, REALLY save your 
rear if you screw up either timezone in postgresql.conf, or the server's timezone. The 
part that seems hard (at least to me) is the question of how to actually store the 
timezone, because I don't think storing the text string "America/Central" is 
going to cut it. :/


For the archives... there's an extension that does what I'd been talking about: 
http://pgxn.org/dist/timestampandtz/.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 12:24 PM, Simon Riggs  wrote:
> When I spoke with Robert about a year ago, he said the patch he was most
> proud of was the reloptions abstraction. Whatever we do in the future,
> keeping metadata in a slightly more abstract form is very useful.

To slightly correct the record, I believe I was referring to the
generic EXPLAIN options syntax, since copied into a lot of other
places and used to unblock various patches that would have otherwise
died on the vine.

The reloptions stuff was Alvaro's work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 31 July 2014 22:34, Stephen Frost  wrote:
> > There was a pretty good thread regarding reloptions and making it so
> > extensions could use them which seemed to end up with a proposal to turn
> > 'security labels' into a more generic metadata capability.  Using that
> > kind of a mechanism would at least address one of my concerns about
> > using reloptions (specifically that they're specific to relations and
> > don't account for the other objects in the system).  Unfortunately, the
> > flexibility desired for auditing is more than just "all actions of this
> > role" or "all actions on this table" but also "actions of this role on
> > this table", which doesn't fit as well.
> 
> Yes, there is a requirement, in some cases, for per role/relation
> metadata. Grant and ACLs are a good example.
> 
> When I spoke with Robert about a year ago, he said the patch he was most
> proud of was the reloptions abstraction. Whatever we do in the future,
> keeping metadata in a slightly more abstract form is very useful.

Agreed.

> I hope we can get pgAudit in as a module for 9.5. I also hope that it
> will stimulate the requirements/funding of further work in this area,
> rather than squash it. My feeling is we have more examples of feature
> sets that grow over time (replication, view handling, hstore/JSONB
> etc) than we have examples of things languishing in need of attention
> (partitioning).

I've come around to this also (which I think I commented on
previously..), as it sounds like the upgrade concerns I was worried
about can be addressed, and having pgAudit would certainly be better
than not having any kind of auditing support.

Thanks,

Stephen




Re: [HACKERS] RLS - permissive vs restrictive

2014-10-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> The key point from my angle is that if you grant user alice the right
> to see records where a = 1 and user bob the right to see records where
> a = 2, the multiple-policy approach allows those quals to be
> implemented as index-scans.  If you had a single policy granting all
> users the right to see records where policyfunc() returns true, it
> would never be indexable.

Right, that is certainly an important aspect also.

> I think that Thom's idea of having some policies that are additional
> filter conditions on top of everything else is a pretty good one.
> It's probably possible to construct a case where you need multiple
> levels of AND and OR logic, which Thom's proposal does not provide
> for.  But are there really cases like that which anyone cares about?

I keep coming back to the feeling that we'd need some kind of exception
capability (more than just excluding the owner), without which this
feature wouldn't end up being practical.

> I think we're going to be tempted to think about that question for
> about 60 seconds and say "nope", and that's probably not enough
> thought.  It deserves serious reflection, because I think Thom's
> proposal is terminal: if we do what he's proposing, it'll be hard to
> extend the idea any further if we later discover that it isn't general
> enough.  That having been said, what he's proposing is simple and
> covers a fair amount of ground, and is thus worthy of serious
> consideration, at least IMHO.

Even given the above, I do like the idea in general and have been
thinking we need to provide something along these lines.  I've been
trying to work out if we could provide a way to get to a generalized
CNF capability for policies, but I agree that it's unclear if there's
a real-world need for such.

Thanks,

Stephen




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-10-07 Thread Simon Riggs
On 31 July 2014 22:34, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>> > * Bruce Momjian (br...@momjian.us) wrote:
>> >> Actually, thinking more, Stephen Frost mentioned that the auditing
>> >> system has to modify database _state_, and dumping/restoring the state
>> >> of an extension might be tricky.
>>
>> > This is really true of any extension which wants to attach information
>> > or track things associated with roles or other database objects.  What
>> > I'd like to avoid is having an extension which does so through an extra
>> > table or through reloptions or one of the other approaches which exists
>> > in contrib and which implements a capability we're looking at adding to
>> > core
>>
>> We have core code that uses reloptions --- autovacuum for instance ---
>> so I'm not exactly clear on why that's so unacceptable for this.
>
> There was a pretty good thread regarding reloptions and making it so
> extensions could use them which seemed to end up with a proposal to turn
> 'security labels' into a more generic metadata capability.  Using that
> kind of a mechanism would at least address one of my concerns about
> using reloptions (specifically that they're specific to relations and
> don't account for the other objects in the system).  Unfortunately, the
> flexibility desired for auditing is more than just "all actions of this
> role" or "all actions on this table" but also "actions of this role on
> this table", which doesn't fit as well.

Yes, there is a requirement, in some cases, for per role/relation
metadata. Grant and ACLs are a good example.

When I spoke with Robert about a year ago, he said the patch he was most
proud of was the reloptions abstraction. Whatever we do in the future,
keeping metadata in a slightly more abstract form is very useful.

I hope we can get pgAudit in as a module for 9.5. I also hope that it
will stimulate the requirements/funding of further work in this area,
rather than squash it. My feeling is we have more examples of feature
sets that grow over time (replication, view handling, hstore/JSONB
etc) than we have examples of things languishing in need of attention
(partitioning).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] OCLASS_ROWSECURITY oversights, and other kvetching

2014-10-07 Thread Robert Haas
The RLS patch added OCLASS_ROWSECURITY but it seems that not enough
effort was made to grep for places that might require adjustment as a
result.

In objectaddress.c, getObjectDescription() was updated, but
getObjectTypeDescription() and getObjectIdentity() were not.

In dependency.c, object_classes didn't get updated.

I also really question why we've got OCLASS_ROWSECURITY but
OBJECT_POLICY.  In most cases, we name the OBJECT_* construct and the
OCLASS_* construct similarly.  This is actually just the tip of the
iceberg: we've got OBJECT_POLICY but OCLASS_ROWSECURITY (no underscore
between row and security) and then we've got DO_ROW_SECURITY (with an
underscore) and pg_row_security.  But then on the other hand the
source code is in policy.c.  pg_dump tries to sit on the fence by
alternating between all the different names and sometimes combining
them (row-security policy).  Some places refer to row-LEVEL security
rather than row security or policies.

I think this kind of messiness makes code really hard to maintain and
should be cleaned up now while we have a chance.  For the most part,
we have chosen to name our catalogs, SQL commands, and internal
constants by *what kind of object it is* (in this case, a policy)
rather than by *the feature it provides* (in this case, row security).
So I think that everything that relates to a policy specifically
(OCLASS_ROWSECURITY, pg_row_security, etc.) should be renamed to refer
to policies instead.  The references to row security should be
preserved only when we are talking about the table-level property,
which is actually called ROW SECURITY, or the feature in general.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Ilya Kosmodemiansky
On Tue, Oct 7, 2014 at 4:45 PM, Robert Haas  wrote:
>> It's not like it'd be significantly different today - in a read mostly
>> workload that's bottlenecked on ProcArrayLock you'll not see many
>> waits. There you'd have to count the total number of spinlock cycles to
>> measure anything interesting.
>
> Hmm, really?  I've never had to do that to find bottlenecks.

Not sure. Long waiting time represents the same thing better, or at
least well enough. I think only the acquisition count is important.



>>> Having said that, if there's no blocking or spindelay any more, to me
>>> that doesn't mean we should look for some other measure of contention
>>> instead.  It just means that the whole area is a solved problem, we
>>> don't need to measure contention any more because there isn't any, and
>>> we can move on to other issues once we finish partying.  But I'm mildly
>>> skeptical that the outcome will be as good as all that.
>>
>> It's not. Just because we're not waiting in a spinlock loop doesn't mean
>> there can't be contention... It's just moved one level down, into the cpu.
>
> I guess that's true, but how much of the contention at that level is
> really important to expose to DBAs?  We can put anything that is of
> developer interest only in LWLOCK_STATS.

For a DBA, all this means is that we are waiting on a lock.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Andres Freund
On 2014-10-07 10:45:24 -0400, Robert Haas wrote:
> > It's not like it'd be significantly different today - in a read mostly
> > workload that's bottlenecked on ProcArrayLock you'll not see many
> > waits. There you'd have to count the total number of spinlock cycles to
> > measure anything interesting.
> 
> Hmm, really?  I've never had to do that to find bottlenecks.

How did you diagnose procarray contention in a readonly workload
otherwise, without using perf?

> >> Having said that, if there's no blocking or spindelay any more, to me
> >> that doesn't mean we should look for some other measure of contention
> >> instead.  It just means that the whole area is a solved problem, we
> >> don't need to measure contention any more because there isn't any, and
> >> we can move on to other issues once we finish partying.  But mildly
> >> skeptical that the outcome will be as good as all that.
> >
> > It's not. Just because we're not waiting in a spinlock loop doesn't mean
> > there can't be contention... It's just moved one level down, into the cpu.
> 
> I guess that's true, but how much of the contention at that level is
> really important to expose to DBAs?

I think so. Right now it's hard for them to see whether the rate of
transactions / the isolation mode is a significant problem or not.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Ilya Kosmodemiansky
On Tue, Oct 7, 2014 at 4:30 PM, Andres Freund  wrote:
> On 2014-10-07 17:22:18 +0300, Heikki Linnakangas wrote:
>> FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in
>> shared memory. When you wake up, clear it. That should be cheap enough to
>> have it always enabled. And it can easily be extended to other "waits", e.g.
>> when you're waiting for input from client.
>
> I think there are a few locks where that's interesting. But in my
> experience many slowdowns aren't caused by actual waits but by
> cacheline contention. And for that the number of acquisitions is much
> more relevant than the waiting. The primary example for this is probably
> the procarray lock.

I would say that seeing a particular lwlockid 50 times in 100 samples,
seeing it 50 times in a row, or seeing it only 2 times provides good
and representative information for a DBA. At least it is better than
nothing.

>
>> I don't think counting the number of lock acquisitions is that interesting.
>> It doesn't give you any information on how long the waits were, for
>> example.
>
> Sure, that's a separate thing that we should be able to answer.

The point is that a lot of short waits can sometimes be as bad as one
long wait. That is why it is important, but I think proper sampling
provides a good estimate of this.

>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 10:36 AM, Andres Freund  wrote:
>> That gets painful in a hurry.  We just got rid of something like that
>> with your patch to get rid of all the backend-local buffer pin arrays;
>> I'm not keen to add another such thing right back.
>
> I think it might be ok if we'd exclude buffer locks and made it depend
> on a GUC.

Maybe.  I'm skeptical about whether what you'll get at that point is
really useful.

> It's not like it'd be significantly different today - in a read mostly
> workload that's bottlenecked on ProcArrayLock you'll not see many
> waits. There you'd have to count the total number of spinlock cycles to
> measure anything interesting.

Hmm, really?  I've never had to do that to find bottlenecks.

>> Having said that, if there's no blocking or spindelay any more, to me
>> that doesn't mean we should look for some other measure of contention
>> instead.  It just means that the whole area is a solved problem, we
>> don't need to measure contention any more because there isn't any, and
>> we can move on to other issues once we finish partying.  But I'm mildly
>> skeptical that the outcome will be as good as all that.
>
> It's not. Just because we're not waiting in a spinlock loop doesn't mean
> there can't be contention... It's just moved one level down, into the cpu.

I guess that's true, but how much of the contention at that level is
really important to expose to DBAs?  We can put anything that is of
developer interest only in LWLOCK_STATS.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog always handles -d option argument as connstr

2014-10-07 Thread Sawada Masahiko
On Tue, Oct 7, 2014 at 12:58 PM, Amit Kapila  wrote:
> On Mon, Oct 6, 2014 at 10:23 PM, Sawada Masahiko 
> wrote:
>>
>> Hi all,
>>
>> pg_receivexlog always handles argument of -d option as  connstr formatted
>> value.
>> We can doubly specify host name, port number.
>> The other client tools handles -d option as connstr value only if
>> argument has "=" character.
>
> pg_basebackup also seems to behave same as pg_receivexlog.
> psql also treats it in similar way.  The behaviour of psql is as
> below:
> psql.exe -d="host=localhost port=5432 dbname=postgres"
> psql: invalid connection option ""
>
> psql.exe -d "host=localhost port=5432 dbname=postgres"
> psql (9.5devel)
> WARNING: Console code page (437) differs from Windows code page (1252)
>  8-bit characters might not work correctly. See psql reference
>  page "Notes for Windows users" for details.
> Type "help" for help.
>
> postgres=#
>
>> The document says that pg_receivexlog ignores database name, and this
>> option is called for consistency with other client applications.
>> But if we specify database name like other client tool '-d hoge' ,
>> then we will definitely got error.
>
> What I understand from document is that it ignores database name
> when given in connection string.
>

Yep, but we can use the -d option like 'psql -d postgres'.
pg_receivexlog and pg_basebackup don't seem to work like that;
they always handle the argument as a connstr-formatted value.
psql handles the argument as either a database name or a connstr,
depending on what the argument contains.

$ bin/pg_receivexlog -D xlog_dir/ -S hoge -d postgres -p 5432
pg_receivexlog: missing "=" after "postgres" in connection info string
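
For what it's worth, the rule the other tools rely on is simple. A
minimal sketch of the check (the function name here is invented; libpq
applies an equivalent test when expand_dbname is set):

#include <stdbool.h>
#include <string.h>

/* Treat a -d argument as a connection string only if it looks like one. */
static bool
looks_like_connection_string(const char *dbname)
{
	if (strchr(dbname, '=') != NULL)
		return true;			/* keyword=value syntax */
	if (strncmp(dbname, "postgresql://", 13) == 0 ||
		strncmp(dbname, "postgres://", 11) == 0)
		return true;			/* connection URI syntax */
	return false;				/* plain database name */
}

pg_receivexlog, by contrast, appears to pass the -d argument to the
conninfo parser unconditionally, which is why the bare database name
above draws the missing-"=" error.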

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Ilya Kosmodemiansky
On Tue, Oct 7, 2014 at 4:12 PM, Andres Freund  wrote:
>> I think the easiest way to measure lwlock contention would be to put
>> some counters in the lwlock itself.  My guess, based on a lot of
>> fiddling with LWLOCK_STATS over the years, is that there's no way to
>> count lock acquisitions and releases without harming performance
>> significantly - no matter where we put the counters, it's just going
>> to be too expensive.  However, I believe that incrementing a counter -
>> even in the lwlock itself - might not be too expensive if we only do
>> it when (1) a process goes to sleep or (2) spindelays occur.
>
> Increasing the size will be painful on its own :(.

I am afraid that in this case we should think about minimizing the
overhead rather than avoiding it entirely: for such a DBA-friendly
feature it is worth it.

Let me step back a bit, since the discussion has gone into details
while the overall design idea is still unclear.

What do we actually need: the fact that an lwlock was acquired? The
lock count? The time spent waiting for a lock? The overall lock
duration?

The usual way to explain how any such performance tool works is the
traffic example (and any Oracle/DB2 wait-interface-aware DBA knows
it):

You have some route from home to the office and it takes you an hour.
You try to optimize it and find that although you take a highway with
no speed limit, you usually get stuck in traffic turning from the
highway to your office and spend about 10-30 min there. The
alternative is another route with 2 speed-limit zones and one traffic
light; in total you will lose 2 and 5 minutes in the speed-limit zones
and 2 min at the red light - overall better than 30 minutes in a jam
and even better than 10 min in a jam. That is what it is all about: to
find a bottleneck we need to know that a process holds a certain lock,
that it was held a certain time, or that there are a lot of
shorter-lived locks.

I think sampling even 1-2 times per second and building a sort of
histogram is good enough for the moment, because it shows (if not very
precisely) that a process holds a certain lock, that it was held for a
certain time, or that there are a lot of shorter locks.
After that it is possible to implement something more precise. (As far
as I know, Greg Smith works on some sort of wait events, but it seems
to me there is a lot of work to do to implement an exact analog of OWI.)
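
To make that concrete, a sampler along these lines would be enough (a
rough sketch; MAX_BACKENDS, NUM_LWLOCKS, wait_slots and sample_forever
are invented names, not existing PostgreSQL symbols):

#include <stdint.h>
#include <unistd.h>

#define MAX_BACKENDS	128		/* illustration only */
#define NUM_LWLOCKS		64		/* illustration only */

/* wait_slots[i] holds the lwlock id backend i is sleeping on, or -1;
 * it would live in shared memory and be written by the backends. */
extern volatile int wait_slots[MAX_BACKENDS];

static uint64_t histogram[NUM_LWLOCKS];

static void
sample_forever(void)
{
	for (;;)
	{
		for (int i = 0; i < MAX_BACKENDS; i++)
		{
			int			id = wait_slots[i];

			if (id >= 0 && id < NUM_LWLOCKS)
				histogram[id]++;	/* one tick per observed wait */
		}
		usleep(500 * 1000);		/* sample twice per second */
	}
}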

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Andres Freund
On 2014-10-07 10:30:54 -0400, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 10:12 AM, Andres Freund  wrote:
> > Have you tried/considered putting the counters into a per-backend array
> > somewhere in shared memory? That way they don't blow up the size of
> > frequently ping-ponged cachelines. Then you can summarize those values
> > whenever querying the results.
> 
> The problem with that is that you need O(N*M) memory instead of O(N),
> where N is the number of lwlocks and M is the number of backends.

Right.

> That gets painful in a hurry.  We just got rid of something like that
> with your patch to get rid of all the backend-local buffer pin arrays;
> I'm not keen to add another such thing right back.

I think it might be ok if we'd exclude buffer locks and made it depend
on a GUC.

> It might be viable
> if we excluded the buffer locks, but that also detracts from the
> value.  Plus, no matter how you slice it, you're now touching cache
> lines completely unrelated to the ones you need for the foreground
> work.  That's got a distributed overhead that's hard to measure, but
> it is certainly going to knock other stuff out of the CPU caches to
> some degree.

Yea, it's hard to guess ;(

> >> As a further point, when I study the LWLOCK_STATS output, that stuff
> >> is typically what I'm looking for anyway.  The first few times I ran
> >> with that enabled, I was kind of interested by the total lock counts
> >> ... but that quickly got uninteresting.  The blocking and spindelays
> >> show you where the problems are, so that's the interesting part.
> >
> > I don't really agree with this. Especially with shared locks (even more
> > so if/when the LW_SHARED stuff gets in), there's simply no relevant
> > blocking and spindelay.
> 
> If your patch to implement lwlocks using atomics goes in, then we may
> have to reassess what instrumentation is actually useful here.  I can
> only comment on the usefulness of various bits of instrumentation I
> have used in the past on the code bases we had that time, or my
> patches thereupon.  Nobody here can reasonably be expected to know
> whether the same stuff will still be useful after possible future
> patches that are not even in a reviewable state at present have been
> committed.

It's not like it'd be significantly different today - in a read mostly
workload that's bottlenecked on ProcArrayLock you'll not see many
waits. There you'd have to count the total number of spinlock cycles to
measure anything interesting.

> Having said that, if there's no blocking or spindelay any more, to me
> that doesn't mean we should look for some other measure of contention
> instead.  It just means that the whole area is a solved problem, we
> don't need to measure contention any more because there isn't any, and
> we can move on to other issues once we finish partying.  But I'm mildly
> skeptical that the outcome will be as good as all that.

It's not. Just because we're not waiting in a spinlock loop doesn't mean
there can't be contention... It's just moved one level down, into the cpu.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 10:12 AM, Andres Freund  wrote:
> Have you tried/considered putting the counters into a per-backend array
> somewhere in shared memory? That way they don't blow up the size of
> frequently ping-ponged cachelines. Then you can summarize those values
> whenever querying the results.

The problem with that is that you need O(N*M) memory instead of O(N),
where N is the number of lwlocks and M is the number of backends.
That gets painful in a hurry.  We just got rid of something like that
with your patch to get rid of all the backend-local buffer pin arrays;
I'm not keen to add another such thing right back.  It might be viable
if we excluded the buffer locks, but that also detracts from the
value.  Plus, no matter how you slice it, you're now touching cache
lines completely unrelated to the ones you need for the foreground
work.  That's got a distributed overhead that's hard to measure, but
it is certainly going to knock other stuff out of the CPU caches to
some degree.

>> As a further point, when I study the LWLOCK_STATS output, that stuff
>> is typically what I'm looking for anyway.  The first few times I ran
>> with that enabled, I was kind of interested by the total lock counts
>> ... but that quickly got uninteresting.  The blocking and spindelays
>> show you where the problems are, so that's the interesting part.
>
> I don't really agree with this. Especially with shared locks (even more
> so if/when the LW_SHARED stuff gets in), there's simply no relevant
> blocking and spindelay.

If your patch to implement lwlocks using atomics goes in, then we may
have to reassess what instrumentation is actually useful here.  I can
only comment on the usefulness of various bits of instrumentation I
have used in the past on the code bases we had that time, or my
patches thereupon.  Nobody here can reasonably be expected to know
whether the same stuff will still be useful after possible future
patches that are not even in a reviewable state at present have been
committed.

Having said that, if there's no blocking or spindelay any more, to me
that doesn't mean we should look for some other measure of contention
instead.  It just means that the whole area is a solved problem, we
don't need to measure contention any more because there isn't any, and
we can move on to other issues once we finish partying.  But I'm mildly
skeptical that the outcome will be as good as all that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Andres Freund
On 2014-10-07 17:22:18 +0300, Heikki Linnakangas wrote:
> FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in
> shared memory. When you wake up, clear it. That should be cheap enough to
> have it always enabled. And it can easily be extended to other "waits", e.g.
> when you're waiting for input from client.

I think there are a few locks where that's interesting. But in my
experience many slowdowns aren't caused by actual waits but by
cacheline contention. And for that the number of acquisitions is much
more relevant than the waiting. The primary example for this is probably
the procarray lock.

> I don't think counting the number of lock acquisitions is that interesting.
> It doesn't give you any information on how long the waits were, for
> example.

Sure, that's a separate thing that we should be able to answer.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Heikki Linnakangas

On 10/07/2014 05:04 PM, Robert Haas wrote:

On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian  wrote:

On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:

I actually don't think that's true. Every lock acquisition implies a
number of atomic locks. Those are expensive. And if you see individual
locks acquired a high number of times in multiple processes that's
something important. It causes significant bus traffic between sockets,
while not necessarily visible in the lock held times.


True, but I don't think users are going to get much value from those
numbers, and they are hard to get.  Server developers might want to know
lock counts, but in those cases performance might not be as important.


In summary, I think there are three measurements we can take on locks:

1.  lock wait, from request to acquisition
2.  lock duration, from acquisition to release
3.  lock count

I think #1 is the most useful, and can be tracked by scanning a single
PGPROC lock entry per session (as already outlined), because you can't
wait on more than one lock at a time.

#2 would probably require multiple PGPROC lock entries, though I am
unclear how often a session holds multiple light-weight locks
concurrently.  #3 might require global counters in memory.

#1 seems the most useful from a user perspective, and we can perhaps
experiment with #2 and #3 once that is done.


I agree with some of your thoughts on this, Bruce, but there are some
points I'm not so sure about.

I have a feeling that any system that involves repeatedly scanning the
procarray will either have painful performance impact (if it's
frequent) or catch only a statistically insignificant fraction of lock
acquisitions (if it's infrequent).  The reason I think there may be a
performance impact is because quite a number of heavily-trafficked
shared memory structures are bottlenecked on memory latency, so it's
easy to imagine that having an additional process periodically reading
them would increase cache-line bouncing and hurt performance.  We will
probably need some experimentation to find the best idea.

I think the easiest way to measure lwlock contention would be to put
some counters in the lwlock itself.  My guess, based on a lot of
fiddling with LWLOCK_STATS over the years, is that there's no way to
count lock acquisitions and releases without harming performance
significantly - no matter where we put the counters, it's just going
to be too expensive.  However, I believe that incrementing a counter -
even in the lwlock itself - might not be too expensive if we only do
it when (1) a process goes to sleep or (2) spindelays occur.  Those
operations are expensive enough that I think the cost of an extra
shared memory access won't be too significant.


FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in 
shared memory. When you wake up, clear it. That should be cheap enough 
to have it always enabled. And it can easily be extended to other 
"waits", e.g. when you're waiting for input from client.


I don't think counting the number of lock acquisitions is that
interesting. It doesn't give you any information on how long the waits 
were, for example. I think the question the user or DBA is trying to 
answer is "Why is this query taking so long, even though the CPU is 
sitting idle?". A sampling approach works well for that.


For comparison, the "perf" tool works great for figuring out where the 
CPU time is spent in a process. It works by sampling. This is similar, 
but for wallclock time, and hopefully we can produce more
user-friendly output.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS - permissive vs restrictive

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 9:55 AM, Stephen Frost  wrote:
> ... or you could just only have one policy on the table and do whatever
> you'd like with it (which was the original idea, actually, though I've
> found myself very much liking the ability to have multiple policies, and
> to have them set for specific roles and commands, rather than having to
> have one very complicated policy or having to use a function..).

The key point from my angle is that if you grant user alice the right
to see records where a = 1 and user bob the right to see records where
a = 2, the multiple-policy approach allows those quals to be
implemented as index-scans.  If you had a single policy granting all
users the right to see records where policyfunc() returns true, it
would never be indexable.

I think that Thom's idea of having some policies that are additional
filter conditions on top of everything else is a pretty good one.
It's probably possible to construct a case where you need multiple
levels of AND and OR logic, which Thom's proposal does not provide
for.  But are there really cases like that which anyone cares about?
I think we're going to be tempted to think about that question for
about 60 seconds and say "nope", and that's probably not enough
thought.  It deserves serious reflection, because I think Thom's
proposal is terminal: if we do what he's proposing, it'll be hard to
extend the idea any further if we later discover that it isn't general
enough.  That having been said, what he's proposing is simple and
covers a fair amount of ground, and is thus worthy of serious
consideration, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-07 Thread Simon Riggs
On 7 October 2014 14:06, Robert Haas  wrote:

>> Is there a way of detecting that we are updating a unique constraint
>> column and then applying the HW locking only in that case? Or can we
>> only apply locking when we have multiple unique constraints on a
>> table?
>> If so, I would withdraw any objection to the HW locking technique.
>
> I'm not up on the details of what Peter's patch does with heavyweight
> locking, but I'd say it this way: if the patch uses heavyweight
> locking routinely, that's probably not going to scale well[1].   If
> the patch detects possible conflicts and uses heavyweight locking only
> in those cases and for the specific purpose of untangling those
> conflicts, then that might well be OK.

Yes, what I meant, just more clearly phrased. Thanks for the link.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Andres Freund
On 2014-10-07 10:04:38 -0400, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian  wrote:
> > On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
> >> > I actually don't think that's true. Every lock acquisition implies a
> >> > number of atomic locks. Those are expensive. And if you see individual
> >> > locks acquired a high number of times in multiple processes that's
> >> > something important. It causes significant bus traffic between sockets,
> >> > while not necessarily visible in the lock held times.
> >>
> >> True, but I don't think users are going to get much value from those
> >> numbers, and they are hard to get.  Server developers might want to know
> >> lock counts, but in those cases performance might not be as important.
> >
> > In summary, I think there are three measurements we can take on locks:
> >
> > 1.  lock wait, from request to acquisition
> > 2.  lock duration, from acquisition to release
> > 3.  lock count
> >
> > I think #1 is the most useful, and can be tracked by scanning a single
> > PGPROC lock entry per session (as already outlined), because you can't
> > wait on more than one lock at a time.
> >
> > #2 would probably require multiple PGPROC lock entries, though I am
> > unclear how often a session holds multiple light-weight locks
> > concurrently.  #3 might require global counters in memory.
> >
> > #1 seems the most useful from a user perspective, and we can perhaps
> > experiment with #2 and #3 once that is done.
> 
> I agree with some of your thoughts on this, Bruce, but there are some
> points I'm not so sure about.
> 
> I have a feeling that any system that involves repeatedly scanning the
> procarray will either have painful performance impact (if it's
> frequent) or catch only a statistically insignificant fraction of lock
> acquisitions (if it's infrequent).

Agreed.

> I think the easiest way to measure lwlock contention would be to put
> some counters in the lwlock itself.  My guess, based on a lot of
> fiddling with LWLOCK_STATS over the years, is that there's no way to
> count lock acquisitions and releases without harming performance
> significantly - no matter where we put the counters, it's just going
> to be too expensive.  However, I believe that incrementing a counter -
> even in the lwlock itself - might not be too expensive if we only do
> it when (1) a process goes to sleep or (2) spindelays occur.

Increasing the size will be painful on its own :(.

Have you tried/considered putting the counters into a per-backend array
somewhere in shared memory? That way they don't blow up the size of
frequently ping-ponged cachelines. Then you can summarize those values
whenever querying the results.
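
A sketch of that layout (sizes and names invented): each backend bumps
only its own row, so the hot path never writes a cache line shared with
another backend, and the expensive part runs only when the statistics
are read:

#include <stdint.h>

#define MAX_BACKENDS	128		/* illustration only */
#define NUM_LWLOCKS		64		/* illustration only; note the N*M space cost */

/* One row per backend, in shared memory. */
extern uint64_t lwlock_acquire_counts[MAX_BACKENDS][NUM_LWLOCKS];

static uint64_t
total_acquisitions(int lockid)
{
	uint64_t	sum = 0;

	for (int i = 0; i < MAX_BACKENDS; i++)
		sum += lwlock_acquire_counts[i][lockid];
	return sum;					/* summarize only at query time */
}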

> As a further point, when I study the LWLOCK_STATS output, that stuff
> is typically what I'm looking for anyway.  The first few times I ran
> with that enabled, I was kind of interested by the total lock counts
> ... but that quickly got uninteresting.  The blocking and spindelays
> show you where the problems are, so that's the interesting part.

I don't really agree with this. Especially with shared locks (even more
so if/when the LW_SHARED stuff gets in), there's simply no relevant
blocking and spindelay.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Tom Lane
Robert Haas  writes:
> I think the easiest way to measure lwlock contention would be to put
> some counters in the lwlock itself.  My guess, based on a lot of
> fiddling with LWLOCK_STATS over the years, is that there's no way to
> count lock acquisitions and releases without harming performance
> significantly - no matter where we put the counters, it's just going
> to be too expensive.  However, I believe that incrementing a counter -
> even in the lwlock itself - might not be too expensive if we only do
> it when (1) a process goes to sleep or (2) spindelays occur.  Those
> operations are expensive enough that I think the cost of an extra
> shared memory access won't be too significant.

FWIW, that approach sounds sane to me as well.  I concur with Robert's
fear that adding cycles to the no-contention case will cost so much
as to make the feature unusable in production, or even for realistic
testing; which would mean it's pretty much useless.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian  wrote:
> On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
>> > I actually don't think that's true. Every lock acquisition implies a
>> > number of atomic locks. Those are expensive. And if you see individual
>> > locks acquired a high number of times in multiple processes that's
>> > something important. It causes significant bus traffic between sockets,
>> > while not necessarily visible in the lock held times.
>>
>> True, but I don't think users are going to get much value from those
>> numbers, and they are hard to get.  Server developers might want to know
>> lock counts, but in those cases performance might not be as important.
>
> In summary, I think there are three measurements we can take on locks:
>
> 1.  lock wait, from request to acquisition
> 2.  lock duration, from acquisition to release
> 3.  lock count
>
> I think #1 is the most useful, and can be tracked by scanning a single
> PGPROC lock entry per session (as already outlined), because you can't
> wait on more than one lock at a time.
>
> #2 would probably require multiple PGPROC lock entries, though I am
> unclear how often a session holds multiple light-weight locks
> concurrently.  #3 might require global counters in memory.
>
> #1 seems the most useful from a user perspective, and we can perhaps
> experiment with #2 and #3 once that is done.

I agree with some of your thoughts on this, Bruce, but there are some
points I'm not so sure about.

I have a feeling that any system that involves repeatedly scanning the
procarray will either have painful performance impact (if it's
frequent) or catch only a statistically insignificant fraction of lock
acquisitions (if it's infrequent).  The reason I think there may be a
performance impact is because quite a number of heavily-trafficked
shared memory structures are bottlenecked on memory latency, so it's
easy to imagine that having an additional process periodically reading
them would increase cache-line bouncing and hurt performance.  We will
probably need some experimentation to find the best idea.

I think the easiest way to measure lwlock contention would be to put
some counters in the lwlock itself.  My guess, based on a lot of
fiddling with LWLOCK_STATS over the years, is that there's no way to
count lock acquisitions and releases without harming performance
significantly - no matter where we put the counters, it's just going
to be too expensive.  However, I believe that incrementing a counter -
even in the lwlock itself - might not be too expensive if we only do
it when (1) a process goes to sleep or (2) spindelays occur.  Those
operations are expensive enough that I think the cost of an extra
shared memory access won't be too significant.
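
In code form, the idea is roughly this (the struct and function names
are invented for illustration; the real LWLock layout differs):

#include <stdint.h>

typedef struct LWLockSketch
{
	/* ... the actual lock word, wait queue, etc. would live here ... */
	uint64_t	block_count;		/* bumped only when a backend sleeps */
	uint64_t	spindelay_count;	/* bumped only on spinlock backoff */
} LWLockSketch;

static void
lwlock_note_sleep(LWLockSketch *lock)
{
	/* about to block: one extra store is noise next to the sleep itself */
	lock->block_count++;
}

static void
lwlock_note_spindelay(LWLockSketch *lock)
{
	lock->spindelay_count++;
}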

As a further point, when I study the LWLOCK_STATS output, that stuff
is typically what I'm looking for anyway.  The first few times I ran
with that enabled, I was kind of interested by the total lock counts
... but that quickly got uninteresting.  The blocking and spindelays
show you where the problems are, so that's the interesting part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-10-07 Thread Heikki Linnakangas

On 10/07/2014 01:33 AM, Alvaro Herrera wrote:

Heikki Linnakangas wrote:

On 09/23/2014 10:04 PM, Alvaro Herrera wrote:

+ Open questions
+ --------------
+
+ * Same-size page ranges?
+   Current related literature seems to consider that each "index entry" in a
+   BRIN index must cover the same number of pages.  There doesn't seem to be a


What is the related literature? Is there an academic paper or
something that should be cited as a reference for BRIN?


In the original "minmax-proposal" file, I had these four URLs:

: Other database systems already have similar features. Some examples:
:
: * Oracle Exadata calls this "storage indexes"
:   http://richardfoote.wordpress.com/category/storage-indexes/
:
: * Netezza has "zone maps"
:   http://nztips.com/2010/11/netezza-integer-join-keys/
:
: * Infobright has this automatically within their "data packs" according to a
:   May 3rd, 2009 blog post
:   
http://www.infobright.org/index.php/organizing_data_and_more_about_rough_data_contest/
:
: * MonetDB also uses this technique, according to a published paper
:   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.2662
:   "Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS"

I gave them all a quick look and none of them touches the approach in
detail; in fact other than the Oracle Exadata one, they are all talking
about something else and mention the "minmax" stuff only in passing.  I
don't think any of them is worth citing.


I think the "current related literature" phrase should be removed, if 
there isn't in fact any literature on this. If there's any literature 
worth referencing, should add a proper citation.



I added a USE_ASSERTION-only block in brininsert that runs the union
support proc and compares the output with the one from regular addValue.
I haven't tested this too much yet.


Ok, that's better than nothing. I wonder if it's too strict, though. It 
uses brin_tuple_equal(), which does a memcmp() on the tuples. That will 
trip for any non-meaningful differences, like the scale in a numeric.



* clarify the memory context stuff of support functions that we also
discussed earlier


I re-checked this stuff.  Turns out that the support functions don't
palloc/pfree memory too much, except to update the stuff stored in
BrinValues, by using datumCopy().  This memory is only freed when we
need to update a previous Datum.  There's no way for the brin.c code to
know when the Datum is going to be released by the support proc, and
thus no way for a temp context to be used.

The memory context experiments I alluded to earlier are related to
pallocs done in brininsert / bringetbitmap themselves, not in the
opclass-provided support procs.


At the very least, it needs to be documented.


All in all, I don't think there's much
room for improvement, other than perhaps doing so in brininsert/
bringetbitmap.  Don't really care too much about this either way.


Doing it in brininsert/bringetbitmap seems like the right approach. 
GiST, GIN, and SP-GiST all use a temporary memory context like that.
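
For reference, the pattern in question looks roughly like this (the
context name and surrounding function are illustrative; the
memory-context calls are the standard backend API):

#include "postgres.h"
#include "utils/memutils.h"

static void
brin_work_in_temp_context(void)
{
	MemoryContext tmpcxt;
	MemoryContext oldcxt;

	tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
								   "brin temporary context",
								   ALLOCSET_DEFAULT_MINSIZE,
								   ALLOCSET_DEFAULT_INITSIZE,
								   ALLOCSET_DEFAULT_MAXSIZE);
	oldcxt = MemoryContextSwitchTo(tmpcxt);

	/* ... palloc-heavy per-tuple work goes here ... */

	MemoryContextSwitchTo(oldcxt);
	MemoryContextDelete(tmpcxt);	/* releases everything at once */
}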



It would be wise to reserve some more support procedure numbers, for 
future expansion. Currently, support procs 1-4 are used by BRIN itself, 
and higher numbers can be used by the opclass. The minmax opclasses use 5-8
for the <, <=, >= and > operators. If we ever want to add a new, 
optional, support function to BRIN, we're out of luck. Let's document 
that e.g. support procs < 10 are reserved for BRIN.


The redo routines should be updated to follow the new 
XLogReadBufferForRedo idiom (commit 
f8f4227976a2cdb8ac7c611e49da03aa9e65e0d2).


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS - permissive vs restrictive

2014-10-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Oct 7, 2014 at 6:44 AM, Thom Brown  wrote:
> > It appears that I'm not the only person who finds it somewhat
> > unintuitive for overlapping RLS policies to be permissive rather than
> > restrictive (OR vs AND) (at least 3 others seem to expect AND
> > behaviour), although I understand the reasoning behind
> > it.  And I've since discovered that the same feature in another
> > database system uses the latter rather than the former.
> >
> > I posted a brain coredump of my thoughts on the matter on Depesz's
> > blog 
> > (http://www.depesz.com/2014/10/02/waiting-for-9-5-row-level-security-policies-rls/#comment-187800)
> > and I was wondering if there's a future in allowing both systems.  The
> > syntax is less important than the functionality, where restrictive
> > policies would be AND'd, permissive policies would (like they
> > currently do) be OR'd, and a combination would involve all restrictive
> > plus at least one permissive (i.e. restr1 AND restr2 AND (perm3 OR
> > perm4)).
> >
> > I'm just interested to know what others' thoughts on the matter are.
> 
> I think that could make sense.  I think the main thing to consider is
> the case where different policies apply to different users: what will
> be the combined effect for users who are subjected to any subset of
> those policies?  If the same policies apply to everyone, then you
> can just do it all as a single policy and put whatever Boolean logic
> you like inside of it.

Agreed- this, as I recall, was the crux of the concern when we were
discussing it back in March(?).  One option might be to have a flag on
the table which basically says "and" or "or", but that feels pretty
limiting to me.  Another option would be to allow the user to define the
specific and-vs-or for each policy like so:

ALTER TABLE t SET ROW SECURITY POLICY p1 OR (p2 AND p3);

... or you could just have one policy on the table and do whatever
you'd like with it (which was the original idea, actually, though I've
found myself very much liking the ability to have multiple policies, and
to have them set for specific roles and commands, rather than having to
write one very complicated policy or use a function...).

As mentioned previously, I'm certainly not against any of these options,
but we need to figure out what we actually want first.  I do think 'OR'
is still the right answer if we're only going to do one thing as this
wouldn't make any sense to me:

CREATE ROLE r1;
CREATE ROLE r2;
CREATE ROLE r3;
GRANT r2, r3 TO r1;

CREATE TABLE t (a int);
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
CREATE POLICY p1 ON t TO r2 USING (a = 1);
CREATE POLICY p2 ON t TO r3 USING (a = 2);
GRANT SELECT ON t TO r1, r2, r3;

INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
INSERT INTO t VALUES (3);

SET ROLE r1;
TABLE t;
 a 
---
(0 rows)


SET ROLE r2;
TABLE t;
 a 
---
 1
(1 row)

Given that r2 and r3 are granted to r1, using AND would mean that r1
can't actually see any rows (and this could end up happening at a later
point, too: perhaps r1 only had r2 granted originally and was then
granted r3...).  We could try to come up with convoluted rules for
policies which have roles associated vs. policies which are for all
users, or try to define the policy overlap handling in some other way,
but I don't think we'd be doing our users a favor by doing so.

Thanks,

Stephen




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-07 Thread Andrew Dunstan


On 10/07/2014 12:15 AM, Tom Lane wrote:

Robert Haas  writes:

On Mon, Oct 6, 2014 at 8:15 PM, Peter Eisentraut  wrote:

The TAP tests
are arguably already much easier to debug than pg_regress ever was.

Well, maybe.  I wasn't able, after about 5 minutes of searching, to
locate either a log file with details of the failure or the code that
revealed what the test, the expected result, and the actual result
were.  It's possible that all that information is there and I just
don't know where to look; it took me a while to learn where the
various logs (postmaster.log, initdb.log, results) left behind by
pg_regress were, too.  If that information is not there, then I'd say
it's not easier to debug.  If it is and I don't know where to look ...
well then I just need to get educated.

The given case seemed pretty opaque to me too.  Could we maybe
have some documentation about how to debug TAP failures?  Or in
other words, if they're "arguably" easier to debug, how about
presenting that argument?

Also to the point: does the buildfarm script know how to collect
the information needed to debug a TAP failure?





No. In fact, it doesn't yet know how to run those tests. That's on my 
TODO list.



cheers

andrew




Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions

2014-10-07 Thread Feike Steenbergen
On 7 October 2014 09:55, Marko Tiikkaja  wrote:
> It's not clear to me that this is fixing a problem, to be honest.  If you're
> running autocommit=off, you have an expectation that you can roll back
> commands at will.  It's fine if I can't roll back a VACUUM, for example,
> since I would practically never want to do that.  But  ALTER TYPE .. ADD
> VALUE ..;  is an entirely different beast.  That one's permanent; there's no
> DROP equivalent.  If the command is just executed, and I can't roll it back,
> wouldn't that be a serious violation of the principle of least astonishment?

I think you have a valid and good point; however, autocommit-off mode
can already execute statements which cannot be rolled back.  Perhaps it
would be a good idea to not allow any of these statements in
autocommit-off mode to prevent astonishing users, but that is a
discussion in itself.

My reason for proposing this is to have all these commands treated
consistently.  The expectation of being able to roll back commands at
will cannot currently be fulfilled; many statements that are allowed
with autocommit-off fall into the "different beast" category.

Currently the following statements call PreventTransactionChain and do not
generate errors in autocommit-off mode:
- REINDEX DATABASE
- CREATE INDEX CONCURRENTLY
- ALTER SYSTEM
- CREATE DATABASE
- DROP DATABASE
- CREATE TABLESPACE
- DROP TABLESPACE
- CLUSTER
- VACUUM

The following statements call PreventTransactionChain and do generate errors
in autocommit-off mode:
- DROP INDEX CONCURRENTLY
- ALTER DATABASE ... SET TABLESPACE
- ALTER TYPE ... ADD

I don't see why these last three should be treated separately from the
first list; we should either allow all of these statements, or none of
them, IMHO.

kind regards,

Feike Steenbergen

On 7 October 2014 09:55, Marko Tiikkaja  wrote:
> On 10/7/14, 9:11 AM, Feike Steenbergen wrote:
>>
>> Perhaps I am the only one using autocommit-off mode
>
>
> You most definitely aren't.
>
>> and we shouldn't put effort
>> into fixing this?
>
>
> It's not clear to me that this is fixing a problem, to be honest.  If you're
> running autocommit=off, you have an expectation that you can roll back
> commands at will.  It's fine if I can't roll back a VACUUM, for example,
> since I would practically never want to do that.  But  ALTER TYPE .. ADD
> VALUE ..;  is an entirely different beast.  That one's permanent; there's no
> DROP equivalent.  If the command is just executed, and I can't roll it back,
> wouldn't that be a serious violation of the principle of least astonishment?
> DROP INDEX CONCURRENTLY has a bit of the same problem.  You can CREATE INDEX
> CONCURRENTLY, but it might take days in some cases.
>
> I think that just running the command is a bad idea, and if we want to fix
> something here we should focus on just providing a better error message.
>
>
> .marko




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 8:33 AM, Simon Riggs  wrote:
> On 7 October 2014 03:31, Peter Geoghegan  wrote:
>>> It may be that people on reading this now believe Peter's HW locking
>>> approach is the best. I'm happy to go with consensus.
>>
>> I bet you didn't think that you'd say that a week ago.  :-)
>
> You're right, because last week I thought heavyweight locking sucks
> and I still think that; I haven't said it is the best.
>
> What we've just discovered is that we're locking 100% of the time, but
> it's not needed in 99.9% of cases and is arguable in the 0.1% case -
> not "required" at all.
>
> The price of avoiding that rare and possibly erroneous condition seems
> high to me.
>
> Is there a way of detecting that we are updating a unique constraint
> column and then applying the HW locking only in that case? Or can we
> only apply locking when we have multiple unique constraints on a
> table?
> If so, I would withdraw any objection to the HW locking technique.

I'm not up on the details of what Peter's patch does with heavyweight
locking, but I'd say it this way: if the patch uses heavyweight
locking routinely, that's probably not going to scale well[1].   If
the patch detects possible conflicts and uses heavyweight locking only
in those cases and for the specific purpose of untangling those
conflicts, then that might well be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] As evidence, I offer the fact that removing 1 of 2 places where
heavyweight locking is used in hash indexes resulted in a large
performance gain at high client counts.  See commit
76837c1507cb5a5a0048046233568447729e66dd.




[HACKERS] pg_upgrade, locale and encoding

2014-10-07 Thread Heikki Linnakangas
While looking at bug #11431, I noticed that pg_upgrade still seems to 
think that encoding and locale are cluster-wide properties. We got 
per-database locale support in 8.4, and encoding has been per-database 
much longer than that.


pg_upgrade checks the encoding and locale of template0 in both clusters, 
and throws an error if they don't match. But it doesn't check the locale 
or encoding of postgres or template1 databases. That leads to problems 
if e.g. the postgres database was dropped and recreated with a different 
encoding or locale in the old cluster. We will merrily upgrade it, but 
strings in the database will be incorrectly encoded.


I propose the attached patch, for git master. It's more complicated in 
back-branches, as they still support upgrading from pre-8.4 clusters. We 
haven't heard any complaints from the field on this, so I don't think 
it's worth trying to back-patch this.


This slightly changes the way the locale comparison works. First, it 
ignores the encoding suffix of the locale name. It's of course important 
that the databases have a compatible encoding, but pg_database has a 
separate field for encoding, and that's now compared directly. Secondly, 
it tries to canonicalize the names, by calling setlocale(). That seems 
like a good idea, in response to bug #11431 
(http://www.postgresql.org/message-id/5424090e.9060...@vmware.com).
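
The canonicalization amounts to a round-trip through setlocale(), along
these lines (a simplified sketch, not the patch's exact code):

    #include <locale.h>

    /* return the system's canonical spelling of a locale name, or NULL */
    static char *
    canonicalize_locale(int category, const char *name)
    {
        char       *save;
        char       *canonical = NULL;
        char       *res;

        save = setlocale(category, NULL);   /* remember current setting */
        if (save)
            save = pg_strdup(save);

        res = setlocale(category, name);    /* returns the canonical name */
        if (res)
            canonical = pg_strdup(res);

        if (save)
        {
            setlocale(category, save);      /* restore previous setting */
            free(save);
        }
        return canonical;
    }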


- Heikki
>From ff44c80710ce16a8268ecfe63b6306026d4db87f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 7 Oct 2014 15:38:53 +0300
Subject: [PATCH 1/1] In pg_upgrade, check the encoding and locale of template1
 and postgres dbs.

Lc_collate and lc_ctype have been per-database settings since server version
8.4, but we were still treating them as cluster-wide options, fetching the
values for template0 database, and comparing them. That's backwards; we
don't care about the encoding and locale of the template0 database, as
template0 is guaranteed to contain only ASCII characters. But if there are
any other databases that exist on both clusters (in particular template1 and
postgres databases), their encodings and locales must be compatible.

No backpatching, as earlier versions of pg_upgrade still support upgrading
from 8.3 servers. That would be more complicated.
---
 contrib/pg_upgrade/check.c   | 204 ++-
 contrib/pg_upgrade/controldata.c |  34 ---
 contrib/pg_upgrade/info.c|  14 ++-
 contrib/pg_upgrade/pg_upgrade.h  |   6 +-
 4 files changed, 85 insertions(+), 173 deletions(-)

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index bbfcab7..3df4b95 100644
--- a/contrib/pg_upgrade/check.c
+++ b/contrib/pg_upgrade/check.c
@@ -14,12 +14,10 @@
 #include "pg_upgrade.h"
 
 
-static void set_locale_and_encoding(ClusterInfo *cluster);
 static void check_new_cluster_is_empty(void);
-static void check_locale_and_encoding(ControlData *oldctrl,
-		  ControlData *newctrl);
-static bool equivalent_locale(const char *loca, const char *locb);
-static bool equivalent_encoding(const char *chara, const char *charb);
+static void check_databases_are_compatible(void);
+static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
+static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
 static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
@@ -81,8 +79,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	set_locale_and_encoding(&old_cluster);
-
 	get_pg_database_relfilenode(&old_cluster);
 
 	/* Extract a list of databases and tables from the old cluster */
@@ -127,13 +123,10 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	set_locale_and_encoding(&new_cluster);
-
-	check_locale_and_encoding(&old_cluster.controldata, &new_cluster.controldata);
-
 	get_db_and_rel_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
+	check_databases_are_compatible();
 
 	check_loadable_libraries();
 
@@ -279,93 +272,24 @@ check_cluster_compatibility(bool live_check)
 
 
 /*
- * set_locale_and_encoding()
- *
- * query the database to get the template0 locale
- */
-static void
-set_locale_and_encoding(ClusterInfo *cluster)
-{
-	ControlData *ctrl = &cluster->controldata;
-	PGconn	   *conn;
-	PGresult   *res;
-	int			i_encoding;
-	int			cluster_version = cluster->major_version;
-
-	conn = connectToServer(cluster, "template1");
-
-	/* for pg < 80400, we got the values from pg_controldata */
-	if (cluster_version >= 80400)
-	{
-		int			i_datcollate;
-		int			i_datctype;
-
-		res = executeQueryOrDie(conn,
-"SELECT datcollate, datctype "
-"FROM	pg_catalog.pg_database "
-"WHERE	datname = 'template0' ");
-		assert(PQntuples(res) == 1);
-
-		i_datcollate = PQfnumber(res, "datcollate");
-		i_datctype = PQfnumber(res

Re: [HACKERS] Promise index tuples for UPSERT

2014-10-07 Thread Simon Riggs
On 7 October 2014 03:31, Peter Geoghegan  wrote:

>> It may be that people on reading this now believe Peter's HW locking
>> approach is the best. I'm happy to go with consensus.
>
> I bet you didn't think that you'd say that a week ago.  :-)

You're right, because last week I thought heavyweight locking sucks
and I still think that; I haven't said it is the best.

What we've just discovered is that we're locking 100% of the time, but
it's not needed in 99.9% of cases and is arguable in the 0.1% case -
not "required" at all.

The price of avoiding that rare and possibly erroneous condition seems
high to me.

Is there a way of detecting that we are updating a unique constraint
column and then applying the HW locking only in that case? Or can we
only apply locking when we have multiple unique constraints on a
table?
If so, I would withdraw any objection to the HW locking technique.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-07 Thread Simon Riggs
On 2 October 2014 22:37, Bruce Momjian  wrote:

> OK, it is was just an idea I wanted to point out, and if it doesn't
> work, it more clearly cements that we need UPSERT _and_ MERGE.

It seems clear that having two different initial keywords is popular
because it provides clarity about which aspects of the commands will
be supported.

I like the idea of making the two commands as close as possible in
syntax, which will make it easier to program for and encourage
adoption.
The command name could easily be MERGE [CONCURRENTLY] since that uses
the same concept from earlier DDL syntax/keywords.

In UPSERT, we don't need the ON keyword at all. If we are altering the
syntax, then we can easily remove this.

IIRC it wasn't agreed that we needed a way to identify which indexes to
use in the UPSERT SQL statement itself, since this would be possible in
other ways and would require programmers to know which unique
constraints are declared.

All of the other syntax could easily remain the same, leaving us with
a command that looks like this...

MERGE CONCURRENTLY INTO foo USING VALUES ()
WHEN NOT MATCHED THEN
  INSERT
WHEN MATCHED THEN
 UPDATE

Since MERGE now supports DELETE and IGNORE as options, presumably we
would also want to support those for the UPSERT version also.
I think it would be useful to also support a mechanism for raising an
error, as DB2 allows.

More complex example of MERGE

MERGE INTO product AS T
  USING (SELECT sales.id, sum(sold) AS sold, max(catalog.name) as name
 FROM sales, catalog WHERE sales.id = catalog.id GROUP BY sales.id) AS S
ON S.id = T.id
  WHEN MATCHED AND T.inventory = S.sold
   THEN DELETE
  WHEN MATCHED AND T.inventory < S.sold
   THEN SIGNAL SQLSTATE '78000' SET MESSAGE_TEXT =
'Oversold: ' || S.name
  WHEN MATCHED
   THEN UPDATE SET inventory = T.inventory - S.sold
  WHEN NOT MATCHED
  THEN INSERT VALUES(S.id, S.name, -S.sold);

Full example would be similar to this

MERGE CONCURRENTLY INTO product AS T
  USING VALUES ()
  WHEN MATCHED AND T.inventory = S.sold
   THEN DELETE
  WHEN MATCHED AND T.inventory < S.sold
   THEN SIGNAL SQLSTATE '78000' SET MESSAGE_TEXT =
'Oversold: ' || S.name
  WHEN MATCHED
   THEN UPDATE SET inventory = T.inventory - S.sold
  WHEN NOT MATCHED
  THEN INSERT VALUES(S.id, S.name, -S.sold);

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS - permissive vs restrictive

2014-10-07 Thread Robert Haas
On Tue, Oct 7, 2014 at 6:44 AM, Thom Brown  wrote:
> It appears that I'm not the only person who finds it somewhat
> unintuitive for overlapping RLS policies to be permissive rather than
> restrictive (OR vs AND) (at least 3 others seem to expect AND
> behaviour), although I understand the reasoning behind
> it.  And I've since discovered that the same feature in another
> database system uses the latter rather than the former.
>
> I posted a brain coredump of my thoughts on the matter on Depesz's
> blog 
> (http://www.depesz.com/2014/10/02/waiting-for-9-5-row-level-security-policies-rls/#comment-187800)
> and I was wondering if there's a future in allowing both systems.  The
> syntax is less important than the functionality, where restrictive
> policies would be AND'd, permissive policies would (like they
> currently do) be OR'd, and a combination would involve all restrictive
> plus at least one permissive (i.e. restr1 AND restr2 AND (perm3 OR
> perm4)).
>
> I'm just interested to know what others' thoughts on the matter are.

I think that could make sense.  I think the main thing to consider is
the case where different policies apply to different users: what will
be the combined effect for users who are subjected to any subset of
those policies?  If the same policies apply to everyone, then you
can just do it all as a single policy and put whatever Boolean logic
you like inside of it.
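
For example (table and columns invented for illustration), the combined
restr1 AND restr2 AND (perm3 OR perm4) behavior can already be written
out in a single policy:

    CREATE POLICY p ON t
        USING (tenant = current_user             -- the "restrictive" part
               AND (a = 1 OR is_public));        -- the "permissive" part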

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-07 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
> > I actually don't think that's true. Every lock acquisition implies a
> > number of atomic locks. Those are expensive. And if you see individual
> > locks acquired a high number of times in multiple processes that's
> > something important. It causes significant bus traffic between sockets,
> > while not necessarily visible in the lock held times.
> 
> True, but I don't think users are going to get much value from those
> numbers, and they are hard to get.  Server developers might want to know
> lock counts, but in those cases performance might not be as important.

In summary, I think there are three measurements we can take on locks:

1.  lock wait, from request to acquisition
2.  lock duration, from acquisition to release
3.  lock count

I think #1 is the most useful, and can be tracked by scanning a single
PGPROC lock entry per session (as already outlined), because you can't
wait on more than one lock at a time.
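
Conceptually, something like this (hypothetical fields, not the actual
PGPROC layout):

    int         lw_wait_lockid;     /* LWLock being waited for, or -1 */
    TimestampTz lw_wait_start;      /* when the wait began */

    /*
     * The waiting backend fills these in just before sleeping and clears
     * them on wakeup; a stats scanner walks all PGPROC entries and adds
     * (now - lw_wait_start) into a per-lock wait-time counter.
     */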

#2 would probably require multiple PGPROC lock entries, though I am
unclear how often a session holds multiple light-weight locks
concurrently.  #3 might require global counters in memory.

#1 seems the most useful from a user perspective, and we can perhaps
experiment with #2 and #3 once that is done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




[HACKERS] Lets delete src/test/performance

2014-10-07 Thread Andres Freund
Hi,

The code in there doesn't look very interesting, and is very unlikely
to run these days.  Notably, it relies on a binary called 'postmaster',
among other things...

The last real changes are from a long time ago:

commit 142d42f9386ed81a4f0779ec8a0cad1254173b5e
Author: Vadim B. Mikheev 
Date:   Fri Sep 26 14:57:36 1997 +

Scripts to run queries and data.

commit dbde5caeed4c9bdaf1292e52eafed80bbf01e9e9
Author: Vadim B. Mikheev 
Date:   Fri Sep 26 14:55:44 1997 +

Some results.

commit cf76759f34a172d424301cfa3723baee37f4a7ce
Author: Vadim B. Mikheev 
Date:   Fri Sep 26 14:55:21 1997 +

Start with performance suite.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-10-07 Thread Marti Raudsepp
On Thu, Sep 4, 2014 at 12:13 AM, Peter Geoghegan  wrote:
> On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas  wrote:
>>> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
>>> upsert_pkey UPDATE SET val = 'update';
>>
>> It seems to me that it would be better to specify a conflicting column
>> set rather than a conflicting index name.
>
> I'm open to pursuing that, provided there is a possible implementation
> that's robust against things like BEFORE triggers that modify
> constrained attributes. It must also work well with partial unique
> indexes. So I imagine we'd have to determine a way of looking up the
> unique index only after BEFORE triggers fire. Unless you're
> comfortable with punting on some of these cases by throwing an error,
> then all of this is actually surprisingly ticklish.

Speaking of this, I really don't like the proposed behavior of firing
BEFORE INSERT triggers even before we've decided whether to insert or
update. In the "classical" upsert pattern, changes by a BEFORE INSERT
trigger would get rolled back on conflict, but the new approach seems
surprising: changes from BEFORE INSERT get persisted in the database,
but AFTER INSERT is not fired.

I haven't found any discussion about alternative trigger semantics
for upsert. If there has been any, can you point me to it?


How about this: use the original VALUES results for acquiring a value
lock; if indeed the row didn't conflict, *then* fire BEFORE INSERT
triggers, and throw an error if the trigger changed any columns of the
(specified?) unique key.
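
To make that concrete (an illustrative trigger, reusing the evt_type
table from the example below), the check would have to reject something
like this, which moves the row to a different unique key value after the
value lock was taken:

    CREATE FUNCTION mangle_name() RETURNS trigger
        LANGUAGE plpgsql AS $$
    BEGIN
        NEW.name := lower(NEW.name);  -- modifies a uniquely-constrained column
        RETURN NEW;
    END $$;

    CREATE TRIGGER evt_type_mangle
        BEFORE INSERT ON evt_type
        FOR EACH ROW EXECUTE PROCEDURE mangle_name();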

Advantages of this approach:
1. Would solve the above conundrum about specifying a unique index via columns.
2. In the UPDATE case we can skip evaluating INSERT triggers and
DEFAULT expressions for columns
3. If I'm not missing anything, this approach may also let us get rid
of the CONFLICTING() construct
4. Possibly be closer to MySQL's syntax?

Point (2) is actually a major consideration IMO: if your query is
mostly performing UPDATEs, on a table with SERIAL keys, and you're
using a different key to perform the updates, then you waste sequence
values unnecessarily. I believe this is a very common pattern, for
example:

create table evt_type (id serial primary key, name text unique, evt_count int);
prepare upsert(text) as INSERT into evt_type (name, evt_count) values ($1, 1)
on conflict within evt_type_name_key UPDATE set evt_count=evt_count+1;

execute upsert('foo');
execute upsert('foo');
execute upsert('bar');

# table evt_type;
 id | name | evt_count
----+------+-----------
  1 | foo  | 2
  3 | bar  | 1   <-- id could very well be "2"

Regards,
Marti




[HACKERS] RLS - permissive vs restrictive

2014-10-07 Thread Thom Brown
Hi,

It appears that I'm not the only person who finds it somewhat
unintuitive for overlapping RLS policies to be permissive rather than
restrictive (OR vs AND) (at least 3 others seem to expect AND
behaviour), although I understand the reasoning behind
it.  And I've since discovered that the same feature in another
database system uses the latter rather than the former.

I posted a brain coredump of my thoughts on the matter on Depesz's
blog 
(http://www.depesz.com/2014/10/02/waiting-for-9-5-row-level-security-policies-rls/#comment-187800)
and I was wondering if there's a future in allowing both systems.  The
syntax is less important than the functionality, where restrictive
policies would be AND'd, permissive policies would (like they
currently do) be OR'd, and a combination would involve all restrictive
plus at least one permissive (i.e. restr1 AND restr2 AND (perm3 OR
perm4)).

I'm just interested to know what others' thoughts on the matter are.

Thom




Re: [HACKERS] Feasibility of supporting bind params for all command types

2014-10-07 Thread Craig Ringer
On 10/06/2014 10:13 AM, Tom Lane wrote:
> I think it might be desirable but it'd be a mess, both as to the
> concept/definition and as to the implementation.

Thanks Tom.

The issues around ALTER etc pretty much put it in the
not-worth-caring-about bucket. The issues around parameter typing alone...

I think we just need to add support for client-side parameter binding of
literals with a client-side flag, or by detecting statement type. So
users still get to use bind parameters, but PgJDBC deals with the details.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions

2014-10-07 Thread Marko Tiikkaja

On 10/7/14, 9:11 AM, Feike Steenbergen wrote:

Perhaps I am the only one using autocommit-off mode


You most definitely aren't.


and we shouldn't put effort
into fixing this?


It's not clear to me that this is fixing a problem, to be honest.  If 
you're running autocommit=off, you have an expectation that you can roll 
back commands at will.  It's fine if I can't roll back a VACUUM, for 
example, since I would practically never want to do that.  But  ALTER 
TYPE .. ADD VALUE ..;  is an entirely different beast.  That one's 
permanent; there's no DROP equivalent.  If the command is just executed, 
and I can't roll it back, wouldn't that be a serious violation of the 
principle of least astonishment?  DROP INDEX CONCURRENTLY has a bit of 
the same problem.  You can CREATE INDEX CONCURRENTLY, but it might take 
days in some cases.


I think that just running the command is a bad idea, and if we want to 
fix something here we should focus on just providing a better error message.



.marko




Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions

2014-10-07 Thread Feike Steenbergen
Apologies for the previous message, I didn't send the full version.


On 6 October 2014 16:01, Tom Lane  wrote:
> What class of bug would that prevent exactly?

ERROR: [...] cannot run inside a transaction block

when:
- running psql in AUTOCOMMIT off
- not having started a transaction yet

Currently some statements (ALTER TYPE name ADD VALUE, DROP INDEX CONCURRENTLY)
can only be run in psql when enabling autocommit
(which I consider a bug - either in the code, or in the documentation),
whilst many others (VACUUM, CREATE DATABASE) can be run in AUTOCOMMIT
off because
they will not implicitly create a transaction in psql.
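
For example (session sketch; the index name is made up):

    \set AUTOCOMMIT off
    VACUUM;                            -- fine: psql sends no implicit BEGIN
    DROP INDEX CONCURRENTLY some_idx;  -- fails:
    ERROR:  DROP INDEX CONCURRENTLY cannot run inside a transaction block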

> It seems to me like
> something that would normally get forgotten when we add any new
> such statement.

I think that is probably true; adding new commands to psql's list has
already been forgotten a few times.
Perhaps I am the only one using autocommit-off mode and we shouldn't put effort
into fixing this?

For me the reason to add some tests was to make sure that the current behaviour
will not change in future versions; the function command_no_begin might be added
to, modified, or rewritten.
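
(That function is essentially a keyword-prefix check in psql's common.c,
roughly like this simplified sketch:)

    /* does the query start with a word that must not get an implicit BEGIN? */
    static bool
    command_no_begin_sketch(const char *query)
    {
        int         wordlen = 0;

        while (isspace((unsigned char) *query))
            query++;                    /* skip leading whitespace */
        while (isalpha((unsigned char) query[wordlen]))
            wordlen++;

        if (wordlen == 6 && pg_strncasecmp(query, "VACUUM", 6) == 0)
            return true;
        /* ... likewise for CLUSTER, CREATE/DROP DATABASE, ALTER SYSTEM ... */
        return false;
    }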



On 7 October 2014 01:41, Jim Nasby  wrote:
> The options I see...
>
> 1) If there's a definitive way to tell from backend source code what
> commands disallow transactions then we can just use that information to
> generate the list of commands psql shouldn't do that with.
>
> 2) Always run the regression test with auto-commit turned off.
>
> 3) Run the regression in both modes (presumably only on the build farm due
> to how long it would take).


1) I don't know about a definitive way. I used grep to find all
   statements calling PreventTransactionChain.

2) - I expect most people use autocommit-on; so only running it in
 autocommit-off would not test the majority of users.
   - autocommit-off also obliges you to explicitly roll back transactions after
errors occur; this would probably mean a rewrite of some tests?

kind regards,

Feike Steenbergen

