Re: [HACKERS] Additional role attributes && superuser review

2014-10-15 Thread Jim Nasby

On 10/15/14, 12:22 AM, Stephen Frost wrote:

   BACKUP:
 pg_start_backup()
 pg_stop_backup()
 pg_switch_xlog()
 pg_create_restore_point()


It seems odd to me that this (presumably) supports PITR but not pg_dump*. I 
realize that most folks probably don't use pg_dump for actual backups, but I 
think it is very common to use it to produce schema-only (maybe with data from 
a few tables as well) dumps for developers. I've certainly wished I could offer 
that ability without going full-blown super-user.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Craig Ringer
On 10/16/2014 01:44 PM, Craig Ringer wrote:
> So the read-filtering policy should apply to all statements. Not just
> SELECT.

Oh, IIRC one wrinkle in the prior discussion about this was that doing
this will prevent the implementation of policies that permit users to
update/delete rows they cannot otherwise see.

That's an argument in favour of only applying a read-filtering policy
where a RETURNING clause is present, but that introduces the "surprise!
the effects of your DELETE changed based on an unrelated clause!" issue.
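
To make the surprise concrete with Fujii's example policies (col < 4 for
SELECT, col < 8 for DELETE), here is a sketch of how the same statement
would behave if the read filter were applied only when RETURNING is present:

DELETE FROM hoge WHERE col = 6;              -- permitted by the DELETE policy, removes one row
DELETE FROM hoge WHERE col = 6 RETURNING *;  -- read filter now applies, removes nothing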

Keep in mind, when considering RETURNING, that users don't always add
this clause directly. PgJDBC will tack a RETURNING clause on the end of
a statement if the user requests generated keys, for example. They will
be very surprised if the behaviour of their DML changes based on whether
or not they asked to get generated keys.

To my mind having behaviour change based on RETURNING is actively wrong,
whereas policies that permit rows to be updated/deleted but not selected
are a nice-to-have at most.

I'd really like to see some more coverage of the details of how these
policies apply to inheritance, both the read- and write- sides of DML
with RETURNING clauses, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve automatic analyze messages for inheritance trees

2014-10-15 Thread Etsuro Fujita

(2014/10/16 11:45), Simon Riggs wrote:

On 6 October 2014 11:07, Etsuro Fujita  wrote:

I noticed that analyze messages shown by autovacuum don't discriminate
between non-inherited cases and inherited cases, as shown in the below
example:

LOG:  automatic analyze of table "postgres.public.pt" system usage: CPU
0.00s/0.01u sec elapsed 0.06 sec
LOG:  automatic analyze of table "postgres.public.pt" system usage: CPU
0.00s/0.02u sec elapsed 0.11 sec

(The first one is for table "postgres.public.pt" and the second one is
for table inheritance tree "postgres.public.pt".)

So, I'd like to propose improving the messages for inherited cases, in
order to easily distinguish such cases from non-inherited cases.  Please
find attached a patch.  I'll add this to the upcoming CF.


Thanks for the suggestion. It seems like a useful addition.

Existing log analysis tools may wish to see the "automatic analyze of table"
prefix on each row.
So it would be good to keep
automatic analyze of table \"%s.%s.%s\"


Agreed.


Can we add some words after this to indicate inheritance? (I have no
suggestions at present)
e.g.
   automatic analyze of table \"%s.%s.%s\" (new words go here)


How about this?

automatic analyze of table \"%s.%s.%s\" as inheritance tree
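
With the existing format kept, the full log line would then presumably read
something like:

LOG:  automatic analyze of table "postgres.public.pt" as inheritance tree system usage: CPU 0.00s/0.02u sec elapsed 0.11 sec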

Thank you for the comment.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element

2014-10-15 Thread Rushabh Lathia
PFA patch for the master branch.

On Thu, Oct 16, 2014 at 11:09 AM, Rushabh Lathia 
wrote:

> Hi All,
>
> pg_dump binary-upgrade fails with a segmentation fault for a type without
> elements.
>
> Consider the following testcase:
>
> rushabh@postgresql$ ./db/bin/psql postgres
> psql (9.5devel)
> Type "help" for help.
>
> postgres=# drop type typ;
> DROP TYPE
> postgres=# create type typ as ();
> CREATE TYPE
> postgres=# \q
> rushabh@postgresql$ ./db/bin/pg_dump postgres --binary-upgrade
> pg_dump: row number 0 is out of range 0..-1
> Segmentation fault (core dumped)
>
> Stack trace:
>
> (gdb) bt
> #0  0x003a2cc375f2 in strtoull_l_internal () from /lib64/libc.so.6
> #1  0x00417a08 in dumpCompositeType (fout=0x1365200,
> tyinfo=0x13b1340) at pg_dump.c:9356
> #2  0x004156a2 in dumpType (fout=0x1365200, tyinfo=0x13b1340) at
> pg_dump.c:8449
> #3  0x00414b08 in dumpDumpableObject (fout=0x1365200,
> dobj=0x13b1340) at pg_dump.c:8135
> #4  0x004041f8 in main (argc=3, argv=0x7fff838ff6e8) at
> pg_dump.c:812
>
> In dumpCompositeType(), the query fetches the elements for the composite type,
> but if there are no elements for the type it returns zero rows. In
> the following code block:
>
> if (binary_upgrade)
> {
> Oid typrelid = atooid(PQgetvalue(res, 0, i_typrelid));
>
> binary_upgrade_set_type_oids_by_type_oid(fout, q,
>  tyinfo->dobj.catId.oid);
> binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
> }
>
> it fetches the typrelid that is required for binary_upgrade. But when the
> query returns zero rows (for a composite type without elements), this fails.
>
> Looking further into the code, I found that rather than fetching typrelid, we
> can use the typrelid already stored in the TypeInfo structure.
>
> Following commit added code related to binary_upgrade:
>
> commit 28f6cab61ab8958b1a7dfb019724687d92722538
> Author: Bruce Momjian 
> Date:   Wed Jan 6 05:18:18 2010 +
>
> binary upgrade:
>
> Preserve relfilenodes for views and composite types --- even though we
> don't store data in, them, they do consume relfilenodes.
>
> Bump catalog version.
>
> PFA patch to fix the issue. I think this needs to be fixed in the back branches
> as well because it affects pg_upgrade. The fix should be backported to PG 9.1,
> as PG 9.0 did not allow creating a type without elements.
>
> postgres=# select version();
>
> version
>
> ---
>  PostgreSQL 9.0.18 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
> (1 row)
> postgres=# create type typ as ();
> ERROR:  syntax error at or near ")"
> LINE 1: create type typ as ();
> ^
>
> Regards,
> Rushabh Lathia
> www.EnterpriseDB.com
>



-- 
Rushabh Lathia
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c56a4cb..c528577 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -9269,7 +9269,6 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 	int			i_attalign;
 	int			i_attisdropped;
 	int			i_attcollation;
-	int			i_typrelid;
 	int			i;
 	int			actual_atts;
 
@@ -9290,8 +9289,7 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 			"pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, "
 		  "a.attlen, a.attalign, a.attisdropped, "
 		  "CASE WHEN a.attcollation <> at.typcollation "
-		  "THEN a.attcollation ELSE 0 END AS attcollation, "
-		  "ct.typrelid "
+		  "THEN a.attcollation ELSE 0 END AS attcollation "
 		  "FROM pg_catalog.pg_type ct "
 "JOIN pg_catalog.pg_attribute a ON a.attrelid = ct.typrelid "
 	"LEFT JOIN pg_catalog.pg_type at ON at.oid = a.atttypid "
@@ -9309,8 +9307,7 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 		appendPQExpBuffer(query, "SELECT a.attname, "
 			"pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, "
 		  "a.attlen, a.attalign, a.attisdropped, "
-		  "0 AS attcollation, "
-		  "ct.typrelid "
+		  "0 AS attcollation "
 	 "FROM pg_catalog.pg_type ct, pg_catalog.pg_attribute a "
 		  "WHERE ct.oid = '%u'::pg_catalog.oid "
 		  "AND a.attrelid = ct.typrelid "
@@ -9328,15 +9325,12 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 	i_attalign = PQfnumber(res, "attalign");
 	i_attisdropped = PQfnumber(res, "attisdropped");
 	i_attcollation = PQfnumber(res, "attcollation");
-	i_typrelid = PQfnumber(res, "typrelid");
 
 	if (dopt->binary_upgrade)
 	{
-		Oid			typrelid = atooid(PQgetvalue(res, 0, i_typrelid));
-
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo->dobj.catId.oid);
-		binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
+		binary_upgrade_set_pg_class_oids(fou

Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Craig Ringer
On 10/16/2014 12:25 PM, Fujii Masao wrote:
> Hi,
> 
> While I was checking the behavior of RLS, I found that the policy for SELECT
> doesn't seem to be applied to RETURNING. Is this intentional? 

This is why I was opposed to having a "SELECT" policy at all. It should
be "VISIBLE", "INSERT", "UPDATE", "DELETE".

I say "VISIBLE" instead of "READ" because I don't think the rows
affected by an UPDATE or DELETE should be affected by whether or not
they have a RETURNING clause. That's IMO nonsensical, and violates the
usual expectations about which clauses can have filtering effects.

So the read-filtering policy should apply to all statements. Not just
SELECT.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element

2014-10-15 Thread Rushabh Lathia
Hi All,

pg_dump binary-upgrade fails with a segmentation fault for a type without
elements.

Consider the following testcase:

rushabh@postgresql$ ./db/bin/psql postgres
psql (9.5devel)
Type "help" for help.

postgres=# drop type typ;
DROP TYPE
postgres=# create type typ as ();
CREATE TYPE
postgres=# \q
rushabh@postgresql$ ./db/bin/pg_dump postgres --binary-upgrade
pg_dump: row number 0 is out of range 0..-1
Segmentation fault (core dumped)

Stack trace:

(gdb) bt
#0  0x003a2cc375f2 in strtoull_l_internal () from /lib64/libc.so.6
#1  0x00417a08 in dumpCompositeType (fout=0x1365200,
tyinfo=0x13b1340) at pg_dump.c:9356
#2  0x004156a2 in dumpType (fout=0x1365200, tyinfo=0x13b1340) at
pg_dump.c:8449
#3  0x00414b08 in dumpDumpableObject (fout=0x1365200,
dobj=0x13b1340) at pg_dump.c:8135
#4  0x004041f8 in main (argc=3, argv=0x7fff838ff6e8) at
pg_dump.c:812

In dumpCompositeType(), the query fetches the elements for the composite type,
but if there are no elements for the type it returns zero rows. In
the following code block:

if (binary_upgrade)
{
Oid typrelid = atooid(PQgetvalue(res, 0, i_typrelid));

binary_upgrade_set_type_oids_by_type_oid(fout, q,
 tyinfo->dobj.catId.oid);
binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
}

it fetches the typrelid that is required for binary_upgrade. But when the query
returns zero rows (for a composite type without elements), this fails.

Looking further into the code, I found that rather than fetching typrelid, we can
use the typrelid already stored in the TypeInfo structure.

Following commit added code related to binary_upgrade:

commit 28f6cab61ab8958b1a7dfb019724687d92722538
Author: Bruce Momjian 
Date:   Wed Jan 6 05:18:18 2010 +

binary upgrade:

Preserve relfilenodes for views and composite types --- even though we
don't store data in, them, they do consume relfilenodes.

Bump catalog version.

PFA patch to fix the issue. I think this needs to be fixed in the back branches
as well because it affects pg_upgrade. The fix should be backported to PG 9.1,
as PG 9.0 did not allow creating a type without elements.

postgres=# select version();

version
---
 PostgreSQL 9.0.18 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
postgres=# create type typ as ();
ERROR:  syntax error at or near ")"
LINE 1: create type typ as ();
^

Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2915329..64d3856 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -9290,7 +9290,6 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 	int			i_attalign;
 	int			i_attisdropped;
 	int			i_attcollation;
-	int			i_typrelid;
 	int			i;
 	int			actual_atts;
 
@@ -9311,8 +9310,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 			"pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, "
 		  "a.attlen, a.attalign, a.attisdropped, "
 		  "CASE WHEN a.attcollation <> at.typcollation "
-		  "THEN a.attcollation ELSE 0 END AS attcollation, "
-		  "ct.typrelid "
+		  "THEN a.attcollation ELSE 0 END AS attcollation "
 		  "FROM pg_catalog.pg_type ct "
 "JOIN pg_catalog.pg_attribute a ON a.attrelid = ct.typrelid "
 	"LEFT JOIN pg_catalog.pg_type at ON at.oid = a.atttypid "
@@ -9330,8 +9328,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 		appendPQExpBuffer(query, "SELECT a.attname, "
 			"pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, "
 		  "a.attlen, a.attalign, a.attisdropped, "
-		  "0 AS attcollation, "
-		  "ct.typrelid "
+		  "0 AS attcollation "
 	 "FROM pg_catalog.pg_type ct, pg_catalog.pg_attribute a "
 		  "WHERE ct.oid = '%u'::pg_catalog.oid "
 		  "AND a.attrelid = ct.typrelid "
@@ -9349,15 +9346,12 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 	i_attalign = PQfnumber(res, "attalign");
 	i_attisdropped = PQfnumber(res, "attisdropped");
 	i_attcollation = PQfnumber(res, "attcollation");
-	i_typrelid = PQfnumber(res, "typrelid");
 
 	if (binary_upgrade)
 	{
-		Oid			typrelid = atooid(PQgetvalue(res, 0, i_typrelid));
-
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo->dobj.catId.oid);
-		binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
+		binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid, false);
 	}
 
 	qtypname = pg_strdup(fmtId(tyinfo->dobj.name));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-15 Thread Amit Kapila
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs  wrote:
>
>
> I've been trying to review this thread with the thought "what does
> this give me?". I am keen to encourage contributions and also keen to
> extend our feature set, but I do not wish to complicate our code base.
> Dilip's developments do seem to be good quality; what I question is
> whether we want this feature.
>
> This patch seems to allow me to run multiple VACUUMs at once. But I
> can already do this, with autovacuum.
>
> Is there anything this patch can do that cannot be already done with
autovacuum?

The difference lies in the fact that vacuumdb (or VACUUM) gives the user
the option to control vacuum activity in cases where autovacuum doesn't
suffice; one example is performing a vacuum via vacuumdb after pg_upgrade
or some other maintenance activity (as mentioned by Jeff upthread).  So I
think in all such cases having a parallel option can give a performance
benefit, as Dilip has already shown upthread by running some tests (with
and without the patch).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Amit Kapila
On Wed, Oct 15, 2014 at 9:34 PM, Robert Haas  wrote:
> On Wed, Oct 15, 2014 at 10:04 AM, Tom Lane  wrote:
> > Amit Kapila  writes:
> >> I have observed that for renaming some of the objects
> >> AccessExclusiveLock is taken on object whereas for
> >> other kind of objects no lock is taken on object before
> >> renaming the object.
> >
> > The usual theory for DDL updates of all types (not just rename)
> > is that an explicit lock is only needed for objects whose catalog
> > representation comprises more than one row.  Otherwise, the implicit
> > locking involved in updating that row is sufficient to serialize
> > different updates.

I am not sure this rule is followed for all DDL; as an example, I
compared FUNCTION and SCHEMA.

A function has entries in pg_proc and pg_depend (schema, language,
return type, etc.), so by the above theory any update should take an
explicit lock, which I think holds good.  A schema, on the other hand,
has entries in pg_namespace and pg_depend (owner), but it doesn't take
an explicit lock during a rename.
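
Concretely, the comparison above amounts to something like this (object names
hypothetical):

ALTER FUNCTION f(int) RENAME TO g;  -- takes AccessExclusiveLock on the function (per the observation above)
ALTER SCHEMA s RENAME TO t;         -- takes no explicit lock on the schema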

Yet another anomaly: for "COMMENT ON <object> ..." there is only
one row in pg_description, yet it still takes an explicit lock to
avoid concurrent activity, which looks right to me.

> That's an interesting point that I hadn't considered, but I'm willing
> to believe that at least some of the differences might also be
> haphazard.

Yeah, I also think so.  Is it important enough that we spend the energy to
find all such differences and fix them?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Fujii Masao
Hi,

While I was checking the behavior of RLS, I found that the policy for SELECT
doesn't seem to be applied to RETURNING. Is this intentional? Please see
the following example.

CREATE ROLE foo LOGIN NOSUPERUSER;
CREATE TABLE hoge AS SELECT col FROM generate_series(1,10) col;
ALTER TABLE hoge ENABLE ROW LEVEL SECURITY;
GRANT SELECT, DELETE ON hoge TO foo;
CREATE POLICY hoge_select_policy ON hoge FOR SELECT TO foo USING (col < 4);
CREATE POLICY hoge_delete_policy ON hoge FOR DELETE TO foo USING (col < 8);
\c - foo
DELETE FROM hoge WHERE col = 6 RETURNING *;

The policy "hoge_select_policy" should disallow the user "foo" to see the row
with "col = 6". But the last DELETE RETURNING returns that row.

One minor suggestion: what about changing the message as follows?
There are two more similar messages in policy.c, and they use the word
"row-policy" instead of "policy". For consistency, I think that
the following should also use the word "row-policy".

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 3627539..df45385 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -551,7 +551,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
 	if (HeapTupleIsValid(rsec_tuple))
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("policy \"%s\" for relation \"%s\" already exists",
+				 errmsg("row-policy \"%s\" for relation \"%s\" already exists",

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-15 Thread Tom Lane
"Brightwell, Adam"  writes:
> The attached patch for review implements a directory permission system that
> allows for providing a directory read/write capability to directories for
> COPY TO/FROM and Generic File Access Functions to non-superusers.

TBH, this sounds like it's adding a lot of mechanism and *significant*
risk of unforeseen security issues in order to solve a problem that we
do not need to solve.  The field demand for such a feature is just about
indistinguishable from zero.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-15 Thread Brightwell, Adam
All,

The attached patch for review implements a directory permission system that
allows for providing a directory read/write capability to directories for
COPY TO/FROM and Generic File Access Functions to non-superusers.  This is
not a complete solution, as it does not currently contain documentation or
regression tests.  However, it is my hope to get some feedback, as I am sure
there are some aspects of it that need to be reworked, so I thought I'd
put it out there for comments/review.

The approach taken is to create "directory aliases" that have a unique name
and path, as well as an associated ACL list.  A superuser can create a new
alias to any directory on the system and then provide READ or WRITE
permissions to any non-superuser.  When a non-superuser then attempts to
execute a COPY TO/FROM or any one of the generic file access functions, a
permission check is performed against the aliases for the user and target
directory.  Superusers are allowed to bypass all of these checks.  All
alias paths must be absolute in order to avoid potential risks.
However, for the generic file access functions, superusers are still allowed
to execute the functions with a relative path, whereas non-superusers are
required to provide an absolute path.

- Implementation Details -

System Catalog:
pg_diralias
 - dirname - the name of the directory alias
 - dirpath - the directory path - must be absolute
 - diracl - the ACL for the directory

Syntax:
CREATE DIRALIAS <name> AS '<path>'
ALTER DIRALIAS <name> AS '<path>'
ALTER DIRALIAS <name> RENAME TO <new_name>
DROP DIRALIAS <name>

This is probably the area where I would really appreciate your thoughts and
recommendations. To GRANT permissions to a directory alias, I had to create
a special variant of GRANT, since READ and WRITE are not reserved keywords
and cause grammar issues.  Therefore, I chose to start with the following
syntax:

GRANT <permission> ON DIRALIAS <name> TO <role>

where <permission> is either READ, WRITE, or ALL.
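
A minimal sketch of the proposed syntax in use (the alias name, path, and role
are hypothetical, and none of this exists in stock PostgreSQL; it is only what
this patch proposes):

-- superuser creates the alias and grants read access to a non-superuser
CREATE DIRALIAS server_logs AS '/var/log/postgresql';
GRANT READ ON DIRALIAS server_logs TO auditor;
-- "auditor" could then use the generic file access functions, e.g.
-- pg_read_file(), or COPY ... FROM against files under that absolute path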

Any comments, suggestions or feedback would be greatly appreciated.

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index b257b02..8cdc5cb 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -41,7 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
 	pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
 	pg_foreign_table.h pg_rowsecurity.h \
 	pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
-	toasting.h indexing.h \
+	pg_diralias.h toasting.h indexing.h \
 )
 
 # location of Catalog.pm
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d30612c..3717bf5 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_conversion.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_diralias.h"
 #include "catalog/pg_default_acl.h"
 #include "catalog/pg_event_trigger.h"
 #include "catalog/pg_extension.h"
@@ -48,6 +49,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "commands/dbcommands.h"
+#include "commands/diralias.h"
 #include "commands/proclang.h"
 #include "commands/tablespace.h"
 #include "foreign/foreign.h"
@@ -3183,6 +3185,190 @@ ExecGrant_Type(InternalGrant *istmt)
 	heap_close(relation, RowExclusiveLock);
 }
 
+/*
+ * ExecuteGrantDirAliasStmt
+ *   handles the execution of the GRANT/REVOKE ON DIRALIAS command.
+ *
+ * stmt - the GrantDirAliasStmt that describes the directory aliases and
+ *permissions to be granted/revoked.
+ */
+void
+ExecuteGrantDirAliasStmt(GrantDirAliasStmt *stmt)
+{
+	Relation		pg_diralias_rel;
+	Oidgrantor;
+	List		   *grantee_ids = NIL;
+	AclMode			permissions;
+	ListCell	   *item;
+
+	/* Must be superuser to grant directory alias permissions */
+	if (!superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to grant directory alias permissions")));
+
+	/*
+	 * Grantor is optional.  If it is not provided then set it to the current
+	 * user.
+	 */
+	if (stmt->grantor)
+		grantor = get_role_oid(stmt->grantor, false);
+	else
+		grantor = GetUserId();
+
+	/* Convert grantee names to oids */
+	foreach(item, stmt->grantees)
+	{
+		PrivGrantee *grantee = (PrivGrantee *) lfirst(item);
+
+		if (grantee->rolname == NULL)
+			grantee_ids = lappend_oid(grantee_ids, ACL_ID_PUBLIC);
+		else
+		{
+			Oid roleid = get_role_oid(grantee->rolname, false);
+			grantee_ids = lappend_oid(grantee_ids, roleid);
+		}
+	}
+
+	permissions = ACL_NO_RIGHTS;
+
+	/* If ALL was provided then set permissions to ACL_ALL_RIGHTS_DIRALIAS */
+	if (stmt->permissions == NIL)
+		permissions = ACL_ALL_RIGHTS_DIRALIAS;
+	else
+	{
+		/* Condense all permissions */
+		foreach(item, stmt->permissions)
+		{
+			AccessPriv *priv = (AccessPriv *) lfirst(item);
+			permissions |= st

Re: [HACKERS] Improve automatic analyze messages for inheritance trees

2014-10-15 Thread Simon Riggs
On 6 October 2014 11:07, Etsuro Fujita  wrote:
> I noticed that analyze messages shown by autovacuum don't discriminate
> between non-inherited cases and inherited cases, as shown in the below
> example:
>
> LOG:  automatic analyze of table "postgres.public.pt" system usage: CPU
> 0.00s/0.01u sec elapsed 0.06 sec
> LOG:  automatic analyze of table "postgres.public.pt" system usage: CPU
> 0.00s/0.02u sec elapsed 0.11 sec
>
> (The first one is for table "postgres.public.pt" and the second one is
> for table inheritance tree "postgres.public.pt".)
>
> So, I'd like to propose improving the messages for inherited cases, in
> order to easily distinguish such cases from non-inherited cases.  Please
> find attached a patch.  I'll add this to the upcoming CF.

Thanks for the suggestion. It seems like a useful addition.

Existing log analysis tools may wish to see the "automatic analyze of table"
prefix on each row.
So it would be good to keep
   automatic analyze of table \"%s.%s.%s\"

Can we add some words after this to indicate inheritance? (I have no
suggestions at present)
e.g.
  automatic analyze of table \"%s.%s.%s\" (new words go here)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-15 Thread Simon Riggs
On 27 September 2014 03:55, Jeff Janes  wrote:
> On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera 
> wrote:
>>
>> Gavin Flower wrote:
>>
>> > Curious: would it be both feasible and useful to have multiple
>> > workers process a 'large' table, without complicating things too
>> > much?  The could each start at a different position in the file.
>>
>> Feasible: no.  Useful: maybe, we don't really know.  (You could just as
>> well have a worker at double the speed, i.e. double vacuum_cost_limit).
>
>
> Vacuum_cost_delay is already 0 by default.  So unless you changed that,
> vacuum_cost_limit does not take effect under vacuumdb.
>
> It is pretty easy for vacuum to be CPU limited, and even easier for analyze
> to be CPU limited (It does a lot of sorting).  I think analyzing is the main
> use case for this patch, to shorten the pg_upgrade window.  At least, that
> is how I anticipate using it.

I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.

This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.

Is there anything this patch can do that cannot be already done with autovacuum?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-10-15 Thread Simon Riggs
On 11 September 2014 16:02, Petr Jelinek  wrote:

>> What about adding something like action_at_recovery_target=pause|shutdown
>> instead of increasing the number of parameters?
>>
>
> That will also increase the number of parameters, as we can't remove the current
> pause one if we want to be backwards compatible. Also there would have to be
> something like action_at_recovery_target=none or off, since the
> default is that pause is on and we need to be able to turn off pause without
> having to turn shutdown on. What's more, I am not sure I see any other actions
> that could be added in the future, as the promote action already works and listen
> (for RO queries) also already works independently of this.

I accept your argument, though I have other thoughts.

If someone specifies

shutdown_at_recovery_target = true
pause_at_recovery_target = true

it gets a little hard to work out what to do; we shouldn't allow such
lack of clarity.

In recovery it's easy to do this:

if (recoveryShutdownAtTarget)
   recoveryPauseAtTarget = false;

but it won't be when these become GUCs, so I think Fujii's suggestion
is a good one.

No other comments on patch, other than good idea.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-10-15 Thread Jeff Davis
On Fri, 2014-08-29 at 10:12 -0400, Tom Lane wrote:
> What about removing the callback per se and just keeping the argument,
> as it were.  That is, a MemoryContext has a field of type "size_t *"
> that is normally NULL, but if set to non-NULL, then we increment the
> pointed-to value for pallocs and decrement for pfrees.

I like that idea; patch attached. Two differences:
  * At block level, not chunk level.
  * I use a uint64, because a size_t is only guaranteed to hold one
allocation, and we need to sum up many allocations.

When unused, it does still appear to have a little overhead in Robert's
test on his power machine. It seems to be between a 0.5-1.0% regression.
There's not much extra code on that path, just a branch, pointer
dereference, and an addition; so I don't think it will get much cheaper
than it is.

This regression seems harder to reproduce on my workstation, or perhaps
it's just noisier.

> One thought in either case is that we don't have to touch the API for
> MemoryContextCreate: rather, the feature can be enabled in a separate
> call after making the context.

That seems fairly awkward to me because the pointer needs to be
inherited from the parent context when not specified. I left the extra
API call in.

The inheritance is awkward anyway, though. If you create a tracked
context as a child of an already-tracked context, allocations in the
newer one won't count against the original. I don't see a way around
that without introducing even more performance problems.

If this patch is unacceptable, my only remaining idea is to track the
memory only for the current context, with no inheritance (thereby also
eliminating the branch). That will require HashAgg to traverse all
the child contexts to check whether the memory limit has been exceeded.
As Tomas pointed out, that could be a lot of work in the case of
array_agg with many groups.

Regards,
Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 438,451  AllocSetContextCreate(MemoryContext parent,
  	  Size initBlockSize,
  	  Size maxBlockSize)
  {
! 	AllocSet	context;
  
  	/* Do the type-independent part of context creation */
! 	context = (AllocSet) MemoryContextCreate(T_AllocSetContext,
! 			 sizeof(AllocSetContext),
! 			 &AllocSetMethods,
! 			 parent,
! 			 name);
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
--- 438,482 
  	  Size initBlockSize,
  	  Size maxBlockSize)
  {
! 	/* inherit 'track_mem' from parent context, if available */
! 	uint64 *track_mem = (parent == NULL) ? NULL : parent->track_mem;
! 
! 	return AllocSetContextCreateTracked(parent, name, minContextSize,
! 		initBlockSize, maxBlockSize,
! 		track_mem);
! }
! 
! /*
!  * AllocSetContextCreateTracked
!  *		Create a new AllocSet context, tracking total memory usage.
!  *
!  * parent: parent context, or NULL if top-level context
!  * name: name of context (for debugging --- string will be copied)
!  * minContextSize: minimum context size
!  * initBlockSize: initial allocation block size
!  * maxBlockSize: maximum allocation block size
!  * track_mem: caller-supplied variable to use for memory tracking
!  */
! MemoryContext
! AllocSetContextCreateTracked(MemoryContext parent,
! 			 const char *name,
! 			 Size minContextSize,
! 			 Size initBlockSize,
! 			 Size maxBlockSize,
! 			 uint64 *track_mem)
! {
! 	AllocSet			set;
! 	MemoryContext		context;
  
  	/* Do the type-independent part of context creation */
! 	context = MemoryContextCreate(T_AllocSetContext,
!   sizeof(AllocSetContext),
!   &AllocSetMethods,
!   parent,
!   name,
!   track_mem);
! 
! 	set = (AllocSet) context;
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
***
*** 459,467  AllocSetContextCreate(MemoryContext parent,
  	if (maxBlockSize < initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	context->initBlockSize = initBlockSize;
! 	context->maxBlockSize = maxBlockSize;
! 	context->nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
--- 490,498 
  	if (maxBlockSize < initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	set->initBlockSize = initBlockSize;
! 	set->maxBlockSize = maxBlockSize;
! 	set->nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
***
*** 477,486  AllocSetContextCreate(MemoryContext parent,
  	 * and actually-allocated sizes of any chunk must be on the same side of
  	 * the limit, else we get confused about whether the chunk is "big".
  	 */
! 	context->allocChunkLimit = ALLOC_CHUNK_LIMIT;
! 	while ((Size) (context->allocChunkLimit + 

Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Wed, Oct 15, 2014 at 8:59 PM, Simon Riggs  wrote:
> On 15 October 2014 21:03, Claudio Freire  wrote:
>
>>> So you're familiar then with this process? So you know that an auditor
>>> would trigger an investigation, resulting in deeper surveillance and
>>> gathering of evidence that ends with various remedial actions, such as
>>> court. How would that process start then, if not this way?
>>
>> I've seen lots of such investigations fail because the evidence wasn't
>> strong enough to link to a particular person, but rather a computer
>> terminal or something like that.
>
> So your solution to the evidence problem is to do nothing? Or do you have
> a better suggestion?
>
> Nothing is certain, apart from doing nothing.

Is solving the evidence problem in scope of the postgresql project?

The solution is to not require evidence in order to be protected from
data theft.

Having evidence is nice: you can punish effective attacks, which is a
deterrent to any attacker, as you pointed out, and may even include
financial compensation. It requires physical security as well as
software security, and I'm not qualified to solve that problem without
help from a lawyer (but I do know you need help from a lawyer to make
sure the evidence you gather is usable).

Not having usable evidence, however, could fail to deter knowledgeable
attackers (remember, in this setting, it would be an inside job, so it
would be a very knowledgeable attacker).

But in any case, if the deterrence isn't enough, and you get attacked,
anything involving redaction as fleshed out in the OP is good for
nothing. The damage has been done already. The feature doesn't
meaningfully slow down extraction of data, so anything you do can only
punish the attacker, not prevent further data theft or damaged
reputation/business.

Something that requires superuser privilege (or specially granted
privilege) in order to gain access to the unredacted value, on the
other hand, would considerably slow down the attacker. From my
proposal, only the second form (unnormalized redacted tuples) would
provide any meaningful data security in this sense, but even in the
other, less limiting form, it would still prevent unauthorized users
from extracting the value: you can no longer do binary search with
unredacted data, only a full brute-force search would work. That's
because the full value id (that I called prefix id, sorry, leftover
from an earlier draft) doesn't relate to the unredacted value, so
sorting comparisons (< <= > >=) don't provide usable information about
value space.
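
As an illustration of the kind of probe this is meant to close off (table and
column names hypothetical, with the column nominally redacted but still usable
in predicates):

-- each comparison halves the search space, so roughly 50 such probes are
-- enough to recover a full 16-digit value
SELECT count(*) FROM customers WHERE card_number > '5000000000000000';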

So, if there is a chance to implement redaction in a way that truly
protects redacted data, it seems worth taking, even if it costs a bit of
performance sometimes. Is avoiding the performance hit worth the risk?

I guess the potential users of such a feature are the only ones
qualified to answer, and the answer has great weight on how the
feature could be implemented.

Well, and of course, the quality of the implementation. If my proposal
has weaknesses I did not realize yet, it may be worthless. But that's
true of all proposals that aim for any meaningful level of security:
it's worth a lengthy look.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > > We lost this patch for the October commitfest, didn't we?
> > 
> > I'm guessing you missed that a new version just got submitted..?
> 
> Which one, reindex schema?  Isn't that a completely different patch?

Err, sorry, wasn't looking closely enough, evidently. :/

> > I'd be fine with today's being added to the october commitfest..
> > 
> > Of course, there's a whole independent discussion to be had about how
> > there wasn't any break between last commitfest and this one, but that
> > probably deserves its own thread.
> 
> It's not the first time that has happened, and honestly I don't see all that
> much cause for concern.  Heikki did move pending patches to the current
> one, and closed a lot of inactive ones as 'returned with feedback'.

Inactive due to lack of review is the concern, but there is also the
point that the break is intended as a way to ensure committers have time to
work on their own patches instead of just working on patches submitted
through the commitfest process.  Now, I think we all end up trying to
balance making progress on our own patches while also providing help to
the commitfest, but that is the situation we were in constantly before
the commitfest process was put in place, and it didn't scale very
well.

If we're always in 'commitfest' mode then we might as well eliminate the
notion of timing them.

> Attentive patch authors should have submitted new versions ... if they
> don't, then someone else with an interest in the patch should do so.
> If no one updates the patches, what do we want them for?

As for this, sure, if there's a review and no response then it's fair to
mark the patch as returned with feedback.  The issue is both when a
patch gets no review and when the commitfest never ends.

Thanks,

Stephen




Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 21:03, Claudio Freire  wrote:

>> So you're familiar then with this process? So you know that an auditor
>> would trigger an investigation, resulting in deeper surveillance and
>> gathering of evidence that ends with various remedial actions, such as
>> court. How would that process start then, if not this way?
>
> I've seen lots of such investigations fail because the evidence wasn't
> strong enough to link to a particular person, but rather a computer
> terminal or something like that.

So your solution to the evidence problem is to do nothing? Or do you have
a better suggestion?

Nothing is certain, apart from doing nothing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Stephen Frost
Alvaro,

On Wednesday, October 15, 2014, Alvaro Herrera 
wrote:

> Alvaro Herrera wrote:
> > Andres Freund wrote:
> >
> > > Having reread the patch just now I basically see two things to
> > > criticize:
> > > a) why isn't this accessible at SQL level? That seems easy to address.
> > > b) Arguably some of this could well be done in separate commits.
> >
> > Fair comments.  I will split it up.
>
> Here's a split version.  The last part is still missing some polish --
> in particular handling for OBJECT_POLICY, and the SQL interface which
> would also allow us to get something in the regression tests.


The OBJECT_POLICY bit is on me to clean up and I'm planning to do so
shortly. I agree that we likely want policies for other objects as well, as a
couple of people have brought up that idea now, and I will investigate it.

I'm planning to address the copy.c comments first and should have a patch
later tonight.

Thanks!

Stephen


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > Having reread the patch just now I basically see two things to
> > criticize:
> > a) why isn't this accessible at SQL level? That seems easy to address.
> > b) Arguably some of this could well be done in separate commits.
> 
> Fair comments.  I will split it up.

Here's a split version.  The last part is still missing some polish --
in particular handling for OBJECT_POLICY, and the SQL interface which
would also allow us to get something in the regression tests.


Note: in this patch series you can find the ObjectTypeMap thing that you
thought was obsolete in the DDL deparse patch ...
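
For what the new columns are meant to enable, here is a sketch of a consumer
(the function and trigger names are hypothetical; "original" and "normal" are
the columns this patch series adds to pg_event_trigger_dropped_objects()):

CREATE FUNCTION report_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        -- a replication consumer could use these flags to decide which
        -- drops to forward to the other side
        RAISE NOTICE 'dropped % (original: %, normal: %)',
            obj.object_identity, obj.original, obj.normal;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER report_drops_trg ON sql_drop
    EXECUTE PROCEDURE report_drops();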

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From 92816868c1c717519a76a83e65cd0b48e3726fbf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 4 Apr 2014 16:05:48 -0300
Subject: [PATCH 1/3] add normal/original flags to
 pg_event_trigger_dropped_objects

---
 doc/src/sgml/func.sgml  | 13 ++
 src/backend/catalog/dependency.c| 21 ++-
 src/backend/commands/event_trigger.c| 16 +---
 src/include/catalog/pg_proc.h   |  2 +-
 src/include/commands/event_trigger.h|  3 ++-
 src/test/regress/expected/event_trigger.out | 40 +
 src/test/regress/sql/event_trigger.sql  | 30 ++
 7 files changed, 114 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..45f3efa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17583,6 +17583,19 @@ FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
 Object sub-id (e.g. attribute number for columns)


+original
+bool
+Flag used to identify the root object of the deletion
+   
+   
+normal
+bool
+
+ Flag indicating that there's a normal dependency relationship
+ in the dependency graph leading to this object
+
+   
+   
 object_type
 text
 Type of the object
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 256486c..6485e3d 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -205,16 +205,25 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	/*
 	 * Keep track of objects for event triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded())
+	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 	{
 		for (i = 0; i < targetObjects->numrefs; i++)
 		{
-			ObjectAddress *thisobj = targetObjects->refs + i;
-
-			if ((!(flags & PERFORM_DELETION_INTERNAL)) &&
-EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			const ObjectAddress *thisobj = &targetObjects->refs[i];
+			const ObjectAddressExtra *extra = &targetObjects->extras[i];
+			bool	original = false;
+			bool	normal = false;
+
+			if (extra->flags & DEPFLAG_ORIGINAL)
+original = true;
+			if (extra->flags & DEPFLAG_NORMAL)
+normal = true;
+			if (extra->flags & DEPFLAG_REVERSE)
+normal = true;
+
+			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
 			{
-EventTriggerSQLDropAddObject(thisobj);
+EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
 	}
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 1b8c94b..0bab971 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -112,6 +112,8 @@ typedef struct SQLDropObject
 	const char *objname;
 	const char *objidentity;
 	const char *objecttype;
+	bool		original;
+	bool		normal;
 	slist_node	next;
 } SQLDropObject;
 
@@ -1105,7 +1107,7 @@ trackDroppedObjectsNeeded(void)
  * Register one object as being dropped by the current command.
  */
 void
-EventTriggerSQLDropAddObject(ObjectAddress *object)
+EventTriggerSQLDropAddObject(const ObjectAddress *object, bool original, bool normal)
 {
 	SQLDropObject *obj;
 	MemoryContext oldcxt;
@@ -1124,6 +1126,8 @@ EventTriggerSQLDropAddObject(ObjectAddress *object)
 
 	obj = palloc0(sizeof(SQLDropObject));
 	obj->address = *object;
+	obj->original = original;
+	obj->normal = normal;
 
 	/*
 	 * Obtain schema names from the object's catalog tuple, if one exists;
@@ -1251,8 +1255,8 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 	{
 		SQLDropObject *obj;
 		int			i = 0;
-		Datum		values[7];
-		bool		nulls[7];
+		Datum		values[9];
+		bool		nulls[9];
 
 		obj = slist_container(SQLDropObject, next, iter.cur);
 
@@ -1268,6 +1272,12 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 		/* objsubid */
 		values[i++] = Int32GetDatum(obj->address.objectSubId);
 
+		/* original */
+		values[i++] = BoolGetDatum(obj->original);
+
+		/* normal */
+		values[i++] = BoolGetDatum(obj->normal);
+
 		/* object_type */
 		values[i++] = C

Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 05:47 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


If we really want to change the name of json_object_two_arg, it
would probably be best to change it NOW in 9.4 before it gets out
into a production release at all.

Doesn't it require initdb?  If so, I think it's too late now.



Yeah, you're right, it would.

OK, forget that.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Alvaro Herrera
Andrew Dunstan wrote:

> If we really want to change the name of json_object_two_arg, it
> would probably be best to change it NOW in 9.4 before it gets out
> into a production release at all.

Doesn't it require initdb?  If so, I think it's too late now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Josh Berkus
On 10/15/2014 02:17 PM, Guillaume Lelarge wrote:
>> > If we don't count the WAL files, though, that eliminates the best way to
>> > detecting when archiving is failing.
>> >
>> >
> WAL files don't give you this directly. You may think it's an issue to get
> a lot of WAL files, but it can just be a spike of changes. Counting .ready
> files makes more sense when you're trying to see if wal archiving is
> failing. And now, using pg_stat_archiver is the way to go (thanks Gabriele
> :) ).

Yeah, a situation where we can't give our users any kind of reasonable
monitoring threshold at all sucks though.  Also, it makes it kind of
hard to allocate a wal partition if it could be 10X the minimum size,
you know?

What happened to the work Heikki was doing on making transaction log
disk usage sane?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Guillaume Lelarge
2014-10-15 23:12 GMT+02:00 Josh Berkus :

> On 10/15/2014 01:25 PM, Guillaume Lelarge wrote:
> > Monitoring is another matter, and I don't really think a monitoring
> > solution should count the WAL files. What actually really matters is the
> > database availability, and that is covered with having enough disk space
> in
> > the WALs partition.
>
> If we don't count the WAL files, though, that eliminates the best way to
> detecting when archiving is failing.
>
>
WAL files don't give you this directly. You may think it's an issue to get
a lot of WAL files, but it can just be a spike of changes. Counting .ready
files makes more sense when you're trying to see if wal archiving is
failing. And now, using pg_stat_archiver is the way to go (thanks Gabriele
:) ).
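
For example, a check along these lines reads the failure counters directly
(a sketch against the pg_stat_archiver view added in 9.4):

-- a growing failed_count, or a stale last_archived_time, is a much more
-- direct signal of archiving trouble than the number of files in pg_xlog
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;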


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Josh Berkus
On 10/15/2014 01:25 PM, Guillaume Lelarge wrote:
> Monitoring is another matter, and I don't really think a monitoring
> solution should count the WAL files. What actually really matters is the
> database availability, and that is covered with having enough disk space in
> the WALs partition.

If we don't count the WAL files, though, that eliminates the best way to
detecting when archiving is failing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 03:54 PM, I wrote:


On 10/15/2014 07:38 AM, Pavel Stehule wrote:




I checked the code, and I have only two small objections - the name
"jsonb_object_two_arg" is not good - maybe "json_object_keys_values"?


It's consistent with the existing json_object_two_arg. In all cases I 
think I kept the names the same except for changing "json" to "jsonb". 
Note that these _two_arg functions are not visible at the SQL level - 
they are only visible in the C code.


I'm happy to be guided by others in changing or keeping these names.



If we really want to change the name of json_object_two_arg, it would 
probably be best to change it NOW in 9.4 before it gets out into a 
production release at all.


Thoughts?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Guillaume Lelarge
2014-10-15 22:11 GMT+02:00 Jeff Janes :

> On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge wrote:
>
>> Hi,
>>
>> As part of our monitoring work for our customers, we stumbled upon an
>> issue with our customers' servers who have a wal_keep_segments setting
>> higher than 0.
>>
>> We have a monitoring script that checks the number of WAL files in the
>> pg_xlog directory, according to the setting of three parameters
>> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
>> We usually add a percentage to the usual formula:
>>
>> greatest(
>>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>>   checkpoint_segments + wal_keep_segments + 1
>> )
>>
>
> I think the first bug is even having this formula in the documentation to
> start with, and in trying to use it.
>
>
I agree. But we have customers asking how to compute the right size for
their WAL file system partitions. Right size is usually a euphemism for
smallest size, and they usually tend to get it wrong, leading to huge
issues. And I'm not even speaking of monitoring, and alerting.
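
For reference, the documented formula can at least be evaluated against a
server's current settings; this is only a sketch of that formula (pre-9.5 GUC
names, and assuming the default 16MB segment size), not a guarantee about the
number of files actually seen:

SELECT greatest(
         (2 + current_setting('checkpoint_completion_target')::float)
           * current_setting('checkpoint_segments')::int + 1,
         current_setting('checkpoint_segments')::int
           + current_setting('wal_keep_segments')::int + 1
       ) AS expected_max_wal_files;
-- multiply by 16 (MB per segment) for a rough lower bound on the partition size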

A way to avoid this issue is probably to erase the formula from the
documentation, and find a new way to explain to them how to size their
WAL partitions.

Monitoring is another matter, and I don't really think a monitoring
solution should count the WAL files. What actually matters is
database availability, and that is covered by having enough disk space in
the WAL partition.

"and will normally not be more than..."
>
> This may be "normal" for a toy system.  I think that the normal state for
> any system worth monitoring is that it has had load spikes at some point in
> the past.
>
>
Agreed.


> So it is the next part of the doc, which describes how many segments it
> climbs back down to upon recovering from a spike, which is the important
> one.  And that doesn't mention wal_keep_segments at all, which surely
> cannot be correct.
>
>
Agreed too.


> I will try to independently derive the correct formula from the code, as
> you did, without looking too much at your derivation  first, and see if we
> get the same answer.
>
>
Thanks. I look forward to reading what you find.

What seems clear to me right now is that no one has a sane explanation of
the formula. Though yours definitely made sense, it didn't seem to be what
the code does.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Andres Freund
On 2014-10-15 13:41:25 -0400, Robert Haas wrote:
> On Wed, Oct 15, 2014 at 2:03 AM, Andres Freund  wrote:
> > The solution I'm thinking of is to essentially move away from hazard
> > pointers and store something like a generation counter per
> > backend. Which is updated less often, and in combination with other
> > operations. When a backend need to clean up and sees that there's a
> > straggler with a old generation it sends that backend a signal to ensure
> > it sets the latest generation.
> 
> It's possible that might work ... but on the timescale we're talking
> about here, that's asking the garbage collecting process to wait for
> practically geologic time.

I think it depends on what we're tying the generation increase to. We
very well could add an implicit generation increment to, say, lwlock
acquisition - which are full barriers anyway. And I don't think we'll
normally have a that high rate of garbage collection anyway - there
should be plenty of headroom.

> Back when I first wrote this code, I spent a fair amount of time
> looking at existing work in the area of lock-free hash tables.
> Essentially all of the work I was able to find on this topic assumes a
> threaded model - or more precisely, it assumes that shared memory
> allocation is cheap and easy and you'll have no problem getting as
> much of it as you need whenever you need it.

FWIW, I think many of the solutions that are actually used in practice
don't rely on that heavily. I know at least some that require memory to
be reserved beforehand, in special contexts.

> This assumption often
> isn't even spelled out explicitly: it's just assumed that that's the
> programming environment you're working in.  Finding highly parallel
> algorithms that don't rely on memory allocation as a primitive is
> hard.  Hazard pointers are one of the few techniques I found that
> seems like it can work in our architecture.  I'm quite reluctant to
> leave aside techniques that have been proven to work well in favor of
> inventing something entirely novel to PostgreSQL.

I don't think something like generation numbers is really that new and
unproven - it's essentially a more trivial version of RCU. Which is used
quite heavily, and under different names.

That said, I'm far from convinced that they are the solution - they just
were the first thing that came to my mind thinking about the problem.

> That having been said, there is some literature on generation numbers,
> and I think something like that could be made to work.  It does have
> some significant disadvantages, though.  One is that a single process
> which fails to update its generation number prevents all reclamation,
> system-wide.  In my algorithm, a process whose execution is
> suspended while a hazard pointer is set prevents recycling of only one
> of many garbage lists.  A process searching for a reusable element can
> most likely find some other garbage list to reclaim instead.

Hm. Couldn't a similar scheme with several lists be used with generation
numbers?

> Also, a generation number implies that we update the value
> periodically, rather than before and after each critical section.

Hm. Don't think it necessarily has to do that.

> Anything that forces garbage collection to be postponed longer than
> absolutely necessary seems to me likely to be a loser.  It might be a
> little faster as long as we have free elements to allocate, but as
> soon as we're out and have to wait longer than we otherwise would for
> garbage collection, and all system activity halts as a result, even
> for a few milliseconds, it's going to be a whole lot slower.  Or at
> least, I think.

I think it really depends on the user of the facility. If the generation
were e.g. also tied to lwlocks I couldn't see bufmgr.c outrunning that
realistically.

> That having been said, I don't know what to do about the fact that the
> fence is too expensive.

I'm far from sure that it's the problem at hand here - the reason I'm
wondering about the barriers is primarily that the buffer mapping hash
table is one of the top profile entries in in all concurrent
workloads. The top one often even, after removing some locking
bottleneck. Nearly all of the time is spent there due to cache
misses. While I think we can get the table smaller and more efficient
for the same NBuffers value, realistically we need to cope with much
bigger NBuffers values.

Since cache misses are a problem that we're going to have to deal with,
restricting the processor's tool for efficiently dealing with that (out
of order execution, SMT), doesn't seem like a wise choice.

> I don't know that we've really established
> that that's the true root of the problem rather than some other
> pedestrian optimization failure.

Absolutely.

> But the existing code is using an
> atomic operation to acquire a spinlock, then releasing it, walking the
> bucket chain, and then using another atomic operation to acquire a
> spinlock and then releasing it again.

Well, we don

Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Jeff Janes
On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge 
wrote:

> Hi,
>
> As part of our monitoring work for our customers, we stumbled upon an
> issue with our customers' servers who have a wal_keep_segments setting
> higher than 0.
>
> We have a monitoring script that checks the number of WAL files in the
> pg_xlog directory, according to the setting of three parameters
> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
> We usually add a percentage to the usual formula:
>
> greatest(
>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>   checkpoint_segments + wal_keep_segments + 1
> )
>

I think the first bug is even having this formula in the documentation to
start with, and in trying to use it.

"and will normally not be more than..."

This may be "normal" for a toy system.  I think that the normal state for
any system worth monitoring is that it has had load spikes at some point in
the past.

So it is the next part of the doc, which describes how many segments it
climbs back down to upon recovering from a spike, which is the important
one.  And that doesn't mention wal_keep_segments at all, which surely
cannot be correct.

I will try to independently derive the correct formula from the code, as
you did, without looking too much at your derivation  first, and see if we
get the same answer.

Cheers,

Jeff


Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Wed, Oct 15, 2014 at 4:59 PM, Simon Riggs  wrote:
> On 15 October 2014 20:41, Claudio Freire  wrote:
>> On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs  wrote:
>>> On 10 October 2014 16:45, Rod Taylor  wrote:
>>> Redaction prevents accidental information loss only, forcing any loss
>>> that occurs to be explicit. It ensures that loss of information can be
>>> tied clearly back to an individual, like an ink packet that stains the
>>> fingers of a thief.
>>
>> That is not true.
>>
>> It can only be tied to a session. That's very far from an individual
>> in court terms, if you ask a lawyer.
>>
>> You need a helluva lot more to tie that to an individual.
>
> So you're familiar then with this process? So you know that an auditor
> would trigger an investigation, resulting in deeper surveillance and
> gathering of evidence that ends with various remedial actions, such as
> court. How would that process start then, if not this way?


I've seen lots of such investigations fail because the evidence wasn't
strong enough to link the act to a particular person, only to a computer
terminal or something like that.

Unless you also physically restrict access to such a terminal to a
single person through other means (which is quite uncommon practice
except perhaps in banks), that evidence is little more than circumstantial.

But you'd have to ask a lawyer in your country to be sure. I can only
speak for my own experiences in my own country which is probably not
yours nor has the same laws. Law is a complex beast.

So you really want actual information security in addition to the
deterrent you speak of. I'm not saying the deterrent is bad, only that
it's not good enough on its own.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 20:41, Claudio Freire  wrote:
> On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs  wrote:
>> On 10 October 2014 16:45, Rod Taylor  wrote:
>> Redaction prevents accidental information loss only, forcing any loss
>> that occurs to be explicit. It ensures that loss of information can be
>> tied clearly back to an individual, like an ink packet that stains the
>> fingers of a thief.
>
> That is not true.
>
> It can only be tied to a session. That's very far from an individual
> in court terms, if you ask a lawyer.
>
> You need a helluva lot more to tie that to an individual.

So you're familiar then with this process? So you know that an auditor
would trigger an investigation, resulting in deeper surveillance and
gathering of evidence that ends with various remedial actions, such as
court. How would that process start then, if not this way?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 07:38 AM, Pavel Stehule wrote:



2014-10-13 17:22 GMT+02:00 Andrew Dunstan:



On 10/13/2014 09:37 AM, Andrew Dunstan wrote:


On 09/26/2014 04:54 PM, Andrew Dunstan wrote:


Here is a patch for the generator and aggregate functions
for jsonb that we didn't manage to get done in time for
9.4. They are all equivalents of the similarly names json
functions. Included are

to_jsonb
jsonb_build_object
jsonb_build_array
jsonb_object
jsonb_agg
jsonb_object_agg


Still to come: documentation.

Adding to the next commitfest.



Revised patch to fix compiler warnings.


And again, initializing an incompletely initialized variable, as
found by Pavel Stehule.


I checked the code, and I have only two small objections - the name 
"jsonb_object_two_arg" is not good - maybe "json_object_keys_values"?


It's consistent with the existing json_object_two_arg. In all cases I 
think I kept the names the same except for changing "json" to "jsonb". 
Note that these _two_arg functions are not visible at the SQL level - 
they are only visible in the C code.


I'm happy to be guided by others in changing or keeping these names.



Next: there are no tests for to_jsonb function.




Oh, my bad. I'll add some.

Thank for the review.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Alvaro Herrera
Andres Freund wrote:

> Having reread the patch just now I basically see two things to
> criticize:
> a) why isn't this accessible at SQL level? That seems easy to address.
> b) Arguably some of this could well be done in separate commits.

Fair comments.  I will split it up.

FWIW, I spent some time today fighting with this stuff but the
OBJECT_POLICY stuff has been causing me some trouble.  I'm not sure that
what's there currently in get_object_address is completely sane -- for
one thing I'm unsure that tables are the only object kind in the system
that are subject to policies.  In that case, we have a shortage of
abstraction here, it seems, which will need some additional fixes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Andres Freund
On 2014-10-04 21:12:24 -0400, Robert Haas wrote:
> On Fri, Oct 3, 2014 at 4:58 PM, Alvaro Herrera  
> wrote:
> > Robert Haas wrote:
> >> I'm not really very convinced that it's a good idea to expose this
> >> instead of just figuring out a way to parse the object identity.
> >
> > That's the first thing I tried.  But it's not pretty: you have to
> > extract schema names by splitting at a period (and what if a schema name
> > contains a period?),
> 
> Please tell me that the literals are escaped if necessary.  If so,
> this is pretty easy.  quote_literal() is not a hard transformation to
> reverse, and splitting on a unquoted period is not hard...

> > split out on ON for certain object types,
> 
> ...nor is splitting on any other fixed text string, such as " ON ".

I'm not following here. Maybe just because I'm misunderstanding your
position.

The patch imo consists out of the following parts:
1) Addition of dependency information to the dropped object list
2) Actual get_object_address() handling for default values - the current
   behaviour looks pretty borked to me.
3) The reverse of getObjectTypeDescription()
4) getObjectIdentityParts() - a slightly more detailed version of
   getObjectIdentity() that requires less string parsing
5) drop even trigger support for a few types.

Are you saying you want to add a function to do 4) via parsing inside
postgres or are you suggesting to do that in every user of this
facility?

If the former, why would it be preferable to do string parsing if we
have access to the source data? That part of the patch looks trivial to
me?

If the latter, I don't see the advantage either - this is complicated
enough, why should different users repeat the work?


Am I misunderstanding you here?


Having reread the patch just now I basically see two things to
criticize:
a) why isn't this accessible at SQL level? That seems easy to address.
b) Arguably some of this could well be done in separate commits.

> > It's just not sane to try to parse such text strings.
> 
> But this is a pretty ridiculous argument.  We have an existing parser
> that does it just fine

Which happens to be the part of postgres that's copied most often. So
it's certainly not something that appears to be trivial.

>,and a special-purpose parser that does just
> that (and not all of the other stuff that the main parser does) would
> be a great deal simpler.

That parser also happens to be far from trivial (if we're talking about
parseNameAndArgTypes(), which just solves one of the cases).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs  wrote:
> On 10 October 2014 16:45, Rod Taylor  wrote:
> Redaction prevents accidental information loss only, forcing any loss
> that occurs to be explicit. It ensures that loss of information can be
> tied clearly back to an individual, like an ink packet that stains the
> fingers of a thief.

That is not true.

It can only be tied to a session. That's very far from an individual
in court terms, if you ask a lawyer.

You need a helluva lot more to tie that to an individual.

> Redaction clearly relies completely on auditing before it can have any
> additional effect. And the effectiveness of redaction needs to be
> understood next to Rod's example.

It forces you to audit all of the queries issued by the otherwise trusted user.

That is, I believe, a far from optimal design. When you have to audit
everything, you end up auditing nothing: a haystack of false positives
can easily hide the needle that is the true positive.

What you want, is something that allows selective auditing of
leak-prone queries.

But we've seen that joining is already a leak-prone query, so clearly
you cannot allow simple joining if you want the above.

What I propose needs a schema change and some preparation from the
DBA. But how can you call that too much to ask and not say the same of
thorough auditing?

So what I propose is to require explicit separation of concepts at
the schema level.

On Sat, Oct 11, 2014 at 10:43 AM, Bruce Momjian  wrote:
> For example, for a credit card type, you would output the last four
> digits, but is there any value to storing the non-visible digits?  You
> can check the checksum of the digits, but that can be done on input and
> doesn't require the storage of the digits.  Is there some function we
> could provide that would make that data type useful?  Could we provide
> comparison functions with delays or increasing delays?

Basically, as said above, the point is to provide a data type that is
nigh-useless.

Imagine a redacted card number as a tuple (full_value_id, suffix).
The suffix is in cleartext, and full_value_id is just an id pointing to a
lookup table for the type.

Regular users can read any redacted_number column, but will only get
the id (useless unless they already know what value it stands for) and the
suffix. The output format for that type would be "<id> suffix" and would serve
the purpose in the OP: it can be joined (equal value = equal id).
Moreover, the type can be designed in one of two ways: equal values
get equal ids, or salted values, where even equal values generated
from different computations (ie: not copied) have different ids. This
second mode would be the most secure, albeit perhaps a tad harder to
use.

But it would allow joining and everything. Only users that have access
to the lookup table would be allowed to resolve the full value, with a
plain (not SECURITY DEFINER) function like:

extract_full_value(redacted_number)

Then you can audit all queries against the lookup table, and you have
rather strong security IMHO.

This can all be done without any new features to postgres. Maybe you
can add syntactic sugar, but you don't really need anything on the
core to accomplish the above.

The syntactic sugar can take the form of a new data type family (like
enum?) where you specify the redaction function, redacted data type,
output format, and from there everything else works automagically, with
a

extract_full(any) -> any

function that somehow knows what to do.


On Wed, Oct 15, 2014 at 3:57 PM, Simon Riggs  wrote:
> On 15 October 2014 19:46, Robert Haas  wrote:
>
>>> In IT terms, we're looking at controlling and reducing improper access
>>> to data by an otherwise Trusted person. The only problem is that some
>>> actions on data items are allowed, others are not.
>>
>> Sure, I don't disagree with any of that as a general principle.  I
>> just think we should look for some ways of shoring up your proposal
>> against some of the more obvious attacks, so as to have more good and
>> less bad.
>
> Suggestions welcome. I'm not in a rush to implement this, so we have
> time to mull it over.

Does the above work for your intended purposes?

Hard to know from what you've posted so far, but I believe it does.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Tom Lane
... and here is a draft patch for the timezone abbreviation data files.

I changed all the abbreviations for which the parent zone had used more
than one GMT offset since 1970.  That seemed like a good cutoff to avoid
wasting cycles on ancient history, especially since the IANA people
themselves don't make any large promises about the quality of their data
before 1970.

Although zones in the Russian Federation are the majority of zones that
had abbreviation changes, a quick look at this patch shows that they're
hardly the only ones.  We've been sticking our heads in the sand about
this problem for quite a while :-(

regards, tom lane

diff --git a/src/timezone/tznames/America.txt b/src/timezone/tznames/America.txt
index 54b51fe..9e62732 100644
*** a/src/timezone/tznames/America.txt
--- b/src/timezone/tznames/America.txt
*************** AMT    -14400    # Amazon Time
*** 47,53 ****
   # (America/Cuiaba)
   # (America/Manaus)
   # (America/Porto_Velho)
! ART    -10800    # Argentina Time
   # (America/Argentina/Buenos_Aires)
   # (America/Argentina/Cordoba)
   # (America/Argentina/Tucuman)
--- 47,53 ----
   # (America/Cuiaba)
   # (America/Manaus)
   # (America/Porto_Velho)
! ART    America/Argentina/Buenos_Aires  # Argentina Time
   # (America/Argentina/Buenos_Aires)
   # (America/Argentina/Cordoba)
   # (America/Argentina/Tucuman)
*************** ART    -10800    # Argentina Time
*** 58,64 ****
   # (America/Argentina/Mendoza)
   # (America/Argentina/Rio_Gallegos)
   # (America/Argentina/Ushuaia)
! ARST    -7200 D  # Argentina Summer Time
  # CONFLICT! AST is not unique
  # Other timezones:
  #  - AST: Arabic Standard Time (Asia)
--- 58,64 ----
   # (America/Argentina/Mendoza)
   # (America/Argentina/Rio_Gallegos)
   # (America/Argentina/Ushuaia)
! ARST    America/Argentina/Buenos_Aires  # Argentina Summer Time
  # CONFLICT! AST is not unique
  # Other timezones:
  #  - AST: Arabic Standard Time (Asia)
*************** GMT    0    # Greenwich Mean Time
*** 228,234 ****
   # (Etc/GMT)
   # (Europe/Dublin)
   # (Europe/London)
! GYT    -14400    # Guyana Time
   # (America/Guyana)
  HADT   -32400 D  # Hawaii-Aleutian Daylight Time
   # (America/Adak)
--- 228,234 ----
   # (Etc/GMT)
   # (Europe/Dublin)
   # (Europe/London)
! GYT    America/Guyana  # Guyana Time
   # (America/Guyana)
  HADT   -32400 D  # Hawaii-Aleutian Daylight Time
   # (America/Adak)
*************** PST    -28800    # Pacific Standard Time
*** 285,299 ****
   # (Pacific/Pitcairn)
  PYST   -10800 D  # Paraguay Summer Time
   # (America/Asuncion)
! PYT    -14400    # Paraguay Time
   # (America/Asuncion)
! SRT    -10800    # Suriname Time
   # (America/Paramaribo)
  UYST   -7200 D  # Uruguay Summer Time
   # (America/Montevideo)
  UYT    -10800    # Uruguay Time
   # (America/Montevideo)
! VET    -16200    # Venezuela Time (caution: this used to mean -14400)
   # (America/Caracas)
  WGST   -7200 D  # Western Greenland Summer Time
   # (America/Godthab)
--- 285,299 ----
   # (Pacific/Pitcairn)
  PYST   -10800 D  # Paraguay Summer Time
   # (America/Asuncion)
! PYT    America/Asuncion  # Paraguay Time
   # (America/Asuncion)
! SRT    America/Paramaribo  # Suriname Time
   # (America/Paramaribo)
  UYST   -7200 D  # Uruguay Summer Time
   # (America/Montevideo)
  UYT    -10800    # Uruguay Time
   # (America/Montevideo)
! VET    America/Caracas  # Venezuela Time
   # (America/Caracas)
  WGST   -7200 D  # Western Greenland Summer Time
   # (America/Godthab)
diff --git a/src/timezone/tznames/Antarctica.txt b/src/timezone/tznames/Antarctica.txt
index 5a03250..2359020 100644
*** a/src/timezone/tznames/Antarctica.txt
--- b/src/timezone/tznames/Antarctica.txt
*************** CLST   -10800 D  # Chile Summer Time
*** 16,26 ****
  CLT    -14400    # Chile Time
   # (America/Santiago)
   # (Antarctica/Palmer)
! DAVT    25200    # Davis Time (Antarctica)
   # (Antarctica/Davis)
  DDUT    36000    # Dumont-d`Urville Time (Antarctica)
   # (Antarctica/DumontDUrville)
! MAWT    18000

Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 19:46, Robert Haas  wrote:

>> In IT terms, we're looking at controlling and reducing improper access
>> to data by an otherwise Trusted person. The only problem is that some
>> actions on data items are allowed, others are not.
>
> Sure, I don't disagree with any of that as a general principle.  I
> just think we should look for some ways of shoring up your proposal
> against some of the more obvious attacks, so as to have more good and
> less bad.

Suggestions welcome. I'm not in a rush to implement this, so we have
time to mull it over.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 17:03, Robert Haas  wrote:

> Well, I'm fervently in agreement with you on one point: the first
> version of all this needs to be as simple as possible, or the time to
> get to the first version will be longer than we can afford to wait.  I
> think what we're discussing here is which things are important enough
> that it makes sense to have them in the first version, and which
> things can wait.

I'm guessing we might differ slightly on what constitutes "as simple as possible".

Something usable, with severe restrictions, is actually better than we
have now. I understand the journey this work represents, so don't be
embarrassed by submitting things with heuristics and good-enoughs in
it. Our mentor, Mr.Lane, achieved much by spreading work over many
releases, leaving others to join in the task.

Might I gently enquire what the "something usable" is that we are going
to see in this release? I'm not up on current plans.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 4:04 AM, Simon Riggs  wrote:
> On 14 October 2014 17:43, Robert Haas  wrote:
>> On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs  wrote:
>>> As soon as you issue the above query, you have clearly indicated your
>>> intention to steal. Receiving information is no longer accidental, it
>>> is an explicit act that is logged in the auditing system against your
>>> name. This is sufficient to bury you in court and it is now a real
>>> deterrent. Redaction has worked.
>>
>> To me, this feels thin.  It's true that this might be good enough for
>> some users, but I wouldn't bet on it being good enough for very many
>> users, and I really hope there's a better option.  We have an existing
>> method of doing data redaction via security barrier views.
>
> I agree with "thin". There is a leak in the design, so let me coin the
> phrase "imprecise security". Of course, the leaks reduce the value of
> such a feature; they just don't reduce it all the way to zero.
>
> Security barrier views or views of any kind don't do the required job.
>
> We are not able to easily classify people as Trusted or Untrusted.
>
> We're seeking to differentiate between the right to use a column for
> queries and the right to see the value itself. Or put another way, you
> can read the book, you just can't photocopy it and take the copy home.
> Or, you can try on the new clothes to see if they fit, but you can't
> take them home for free. Both of those examples have imprecise
> security measures in place to control and reduce negative behaviours
> and in every other industry this is known as "security".
>
> In IT terms, we're looking at controlling and reducing improper access
> to data by an otherwise Trusted person. The only problem is that some
> actions on data items are allowed, others are not.

Sure, I don't disagree with any of that as a general principle.  I
just think we should look for some ways of shoring up your proposal
against some of the more obvious attacks, so as to have more good and
less bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Jeff Janes
On Wed, Oct 15, 2014 at 6:22 AM, Lucas Lersch  wrote:

> So is it a possible normal behavior that running tpcc for 10min only
> access 50% of the database? Furthermore, is there a guideline of parameters
> for tpcc (# of warehouses, execution time, operations weight)?
>
>
I'm not familiar with your benchmarking tool.  With the one I am most
familiar with, pgbench, if you run it against a database which is too big
to fit in memory, it can take a very long time to touch each page once,
because the constant random disk reads makes it run very slowly.  Maybe
that is something to consider here--how many transactions were actually
executed during your 10 min run?

Also, the tool might build tables that are only used under certain run
options.  Perhaps you just aren't choosing the options which invoke usage
of those tables.  Since you have the trace data, it should be pretty easy
to count how many distinct blocks are accessed from each relation, and
compare that to the size of the relations to see which relations are unused
or lightly used.

Cheers,

Jeff


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 2:03 AM, Andres Freund  wrote:
> On 2014-10-14 17:53:10 -0400, Robert Haas wrote:
>> On Tue, Oct 14, 2014 at 4:42 PM, Andres Freund  
>> wrote:
>> >> The code in CHashSearch shows the problem there: you need to STORE the
>> >> hazard pointer before you begin to do the LOAD operations to scan the
>> >> bucket, and you must finish all of those LOADs before you STORE the
>> >> NULL hazard pointer.  A read or write barrier won't provide store/load
>> >> or load/store ordering.
>> >
>> > I'm not sure I understand the problem here - but a read + write barrier
>> > should suffice? To avoid falling back to two full barriers, we'd need a
>> > separate define pg_read_write_barrier(), but that's not a problem. IIUC
>> > that should allow us to avoid emitting any actual code on x86.
>>
>> Well, thinking about x86 specifically, it definitely needs at least
>> one mfence, after setting the hazard pointer and before beginning to
>> read the bucket chain.
>
> Yes, I can see that now. I do wonder if that's not going to prove quite
> expensive... I think I can see some ways around needing that. But I
> think that needs some benchmarking first - no need to build a even more
> complex solution if not needed.
>
> The solution I'm thinking of is to essentially move away from hazard
> pointers and store something like a generation counter per
> backend. Which is updated less often, and in combination with other
> operations. When a backend need to clean up and sees that there's a
> straggler with a old generation it sends that backend a signal to ensure
> it sets the latest generation.

It's possible that might work ... but on the timescale we're talking
about here, that's asking the garbage collecting process to wait for
practically geologic time.

Back when I first wrote this code, I spent a fair amount of time
looking at existing work in the area of lock-free hash tables.
Essentially all of the work I was able to find on this topic assumes a
threaded model - or more precisely, it assumes that shared memory
allocation is cheap and easy and you'll have no problem getting as
much of it as you need whenever you need it.  This assumption often
isn't even spelled out explicitly: it's just assumed that that's the
programming environment you're working in.  Finding highly parallel
algorithms that don't rely on memory allocation as a primitive is
hard.  Hazard pointers are one of the few techniques I found that
seems like it can work in our architecture.  I'm quite reluctant to
leave aside techniques that have been proven to work well in favor of
inventing something entirely novel to PostgreSQL.

That having been said, there is some literature on generation numbers,
and I think something like that could be made to work.  It does have
some significant disadvantages, though.  One is that a single process
which fails to update its generation number prevents all reclamation,
system-wide.  In my algorithm, a process whose execution is
suspended while a hazard pointer is set prevents recycling of only one
of many garbage lists.  A process searching for a reusable element can
most likely find some other garbage list to reclaim instead.  Also,
a generation number implies that we update the value periodically,
rather than before and after each critical section.  Anything that
forces garbage collection to be postponed longer than absolutely
necessary seems to me likely to be a loser.  It might be a little
faster as long as we have free elements to allocate, but as soon as
we're out and have to wait longer than we otherwise would for garbage
collection, and all system activity halts as a result, even for a few
milliseconds, it's going to be a whole lot slower.  Or at least, I
think.

That having been said, I don't know what to do about the fact that the
fence is too expensive.  I don't know that we've really established
that that's the true root of the problem rather than some other
pedestrian optimization failure.  But the existing code is using an
atomic operation to acquire a spinlock, then releasing it, walking the
bucket chain, and then using another atomic operation to acquire a
spinlock and then releasing it again.  Surely a pure fence shouldn't
cost more than a spinlock cycle?  Even with arguably one extra cache
line touch, that seems like it ought to be a win.  But my intuitions
in this area are shaky.
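
To spell out the ordering constraint for the archives, here is a
stripped-down sketch (this is not the actual CHash code; the reclamation
side and some retry details are omitted):

#include "postgres.h"
#include "storage/barrier.h"

/* Sketch of the hazard-pointer publish step; identifiers are invented. */
typedef struct HashNode
{
    uint32      hashcode;
    struct HashNode *next;
} HashNode;

extern HashNode *volatile my_hazard_pointer;    /* one slot per backend */

static HashNode *
search_bucket(HashNode *volatile *bucket_head, uint32 hashcode)
{
    HashNode   *head;
    HashNode   *node;

    for (;;)
    {
        head = *bucket_head;
        my_hazard_pointer = head;   /* STORE: publish what we are about to read */
        pg_memory_barrier();        /* store/load fence; an mfence on x86 */
        if (*bucket_head == head)   /* LOAD: head unchanged, so it is protected */
            break;
    }

    for (node = head; node != NULL; node = node->next)     /* more LOADs */
    {
        if (node->hashcode == hashcode)
            break;
    }

    my_hazard_pointer = NULL;       /* after this, reclamation may recycle nodes */
    return node;
}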

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Ryan Johnson

On 15/10/2014 10:32 AM, Ants Aasma wrote:

On Tue, Oct 14, 2014 at 6:19 PM, Robert Haas  wrote:

With regard for using a hash table for the buffer mapping lock I'm
doubtful that any form of separate chaining is the right one. We
currently have a quite noticeable problem with the number of cache
misses in the buffer mapping hash (and also the heavyweight lock mgr) -
if we stay with hashes that's only going to be fundamentally lower than
dynahash if we change the type of hashing. I've had good, *preliminary*,
results using an open addressing + linear probing approach.

I'm very skeptical of open addressing + linear probing.  Such hash
tables tend to degrade over time, because you have to leave tombstones
behind to ensure that future scans advance far enough.  There's really
no way to recover without rebuilding the whole hash table, and
eventually it degrades to linear search.  If we're spending too much
time walking hash chains, I think the solution is to increase the
number of buckets so that the chains get shorter.

What about cuckoo hashing? There was a recent paper on how to do fine
grained locking with cuckoo hashes. [1]

I'm imagining a bucketized cuckoo hash with 5 item buckets (5-way
associativity). This allows us to fit the bucket onto 2 regular sized
cache lines and have 8 bytes left over. Buckets would be protected by
seqlocks stored in the extra space. On the read side we would only
need 2 read barriers (basically free on x86), and we are guaranteed to
have an answer by fetching two pairs of cache lines. We can trade
memory bandwidth for latency by issuing prefetches (once we add
primitives for this). Alternatively we can trade insert speed for
lookup speed by using asymmetrically sized tables.

[1] https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf
Actually, I'd go with second-chance hashing [2], same number of hash 
functions but it's more stable (no infinite loops, for example). Most 
probably the techniques from [1] would apply equally well.


[2] 
http://www.eecs.harvard.edu/~michaelm/postscripts/infocom_hardware_submit.pdf


Ryan



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Ants Aasma
On Tue, Oct 14, 2014 at 6:19 PM, Robert Haas  wrote:
>> With regard for using a hash table for the buffer mapping lock I'm
>> doubtful that any form of separate chaining is the right one. We
>> currently have a quite noticeable problem with the number of cache
>> misses in the buffer mapping hash (and also the heavyweight lock mgr) -
>> if we stay with hashes that's only going to be fundamentally lower than
>> dynahash if we change the type of hashing. I've had good, *preliminary*,
>> results using an open addressing + linear probing approach.
>
> I'm very skeptical of open addressing + linear probing.  Such hash
> tables tend to degrade over time, because you have to leave tombstones
> behind to ensure that future scans advance far enough.  There's really
> no way to recover without rebuilding the whole hash table, and
> eventually it degrades to linear search.  If we're spending too much
> time walking hash chains, I think the solution is to increase the
> number of buckets so that the chains get shorter.

What about cuckoo hashing? There was a recent paper on how to do fine
grained locking with cuckoo hashes. [1]

I'm imagining a bucketized cuckoo hash with 5 item buckets (5-way
associativity). This allows us to fit the bucket onto 2 regular sized
cache lines and have 8 bytes left over. Buckets would be protected by
seqlocks stored in the extra space. On the read side we would only
need 2 read barriers (basically free on x86), and we are guaranteed to
have an answer by fetching two pairs of cache lines. We can trade
memory bandwidth for latency by issuing prefetches (once we add
primitives for this). Alternatively we can trade insert speed for
lookup speed by using asymmetrically sized tables.

[1] https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf
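
For concreteness, roughly the bucket layout and read protocol I mean
(a sketch only, not tested; assuming BufferTag is 20 bytes as in current
sources, so 5 * 24 + 8 = 128 bytes, i.e. two cache lines):

#include "postgres.h"
#include "storage/barrier.h"
#include "storage/buf_internals.h"

/* Sketch of a 5-way bucket with a seqlock in the spare 8 bytes. */
typedef struct CuckooBucket
{
    BufferTag   tags[5];        /* 5 x 20 bytes */
    int32       ids[5];         /* 5 x 4 bytes: buffer ids */
    uint32      seq;            /* seqlock: odd while a writer is in progress */
    uint32      unused;
} CuckooBucket;                 /* 128 bytes = 2 cache lines */

/* Reader: the two read barriers mentioned above; retry if a writer was active. */
static void
read_bucket(CuckooBucket *b, CuckooBucket *snapshot)
{
    uint32      s1;
    uint32      s2;

    do
    {
        s1 = b->seq;
        pg_read_barrier();
        *snapshot = *b;         /* fetch both cache lines */
        pg_read_barrier();
        s2 = b->seq;
    } while (s1 != s2 || (s1 & 1));
}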

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Fujii Masao
On Fri, Aug 8, 2014 at 4:08 PM, Guillaume Lelarge
 wrote:
> Hi,
>
> As part of our monitoring work for our customers, we stumbled upon an issue
> with our customers' servers who have a wal_keep_segments setting higher than
> 0.
>
> We have a monitoring script that checks the number of WAL files in the
> pg_xlog directory, according to the setting of three parameters
> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
> We usually add a percentage to the usual formula:
>
> greatest(
>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>   checkpoint_segments + wal_keep_segments + 1
> )
>
> And we have lots of alerts from the script for customers who set their
> wal_keep_segments setting higher than 0.
>
> So we started to question this sentence of the documentation:
>
> There will always be at least one WAL segment file, and will normally not be
> more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or
> checkpoint_segments + wal_keep_segments + 1 files.
>
> (http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>
> While doing some tests, it appears it would be more something like:
>
> wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments
> + 1
>
> But after reading the source code (src/backend/access/transam/xlog.c), the
> right formula seems to be:
>
> wal_keep_segments + 2 * checkpoint_segments + 1
>
> Here is how we went to this formula...
>
> CreateCheckPoint(..) is responsible, among other things, for deleting and
> recycling old WAL files. From src/backend/access/transam/xlog.c, master
> branch, line 8363:
>
> /*
>  * Delete old log files (those no longer needed even for previous
>  * checkpoint or the standbys in XLOG streaming).
>  */
> if (_logSegNo)
> {
> KeepLogSeg(recptr, &_logSegNo);
> _logSegNo--;
> RemoveOldXlogFiles(_logSegNo, recptr);
> }
>
> KeepLogSeg(...) function takes care of wal_keep_segments. From
> src/backend/access/transam/xlog.c, master branch, line 8792:
>
> /* compute limit for wal_keep_segments first */
> if (wal_keep_segments > 0)
> {
> /* avoid underflow, don't go below 1 */
> if (segno <= wal_keep_segments)
> segno = 1;
> else
> segno = segno - wal_keep_segments;
> }
>
> IOW, the segment number (segno) is decremented according to the setting of
> wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via
> _logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can
> remove or recycle all files before this segment number. This function gets
> the number of WAL files to recycle with the XLOGfileslop constant, which is
> defined as:
>
> /*
>  * XLOGfileslop is the maximum number of preallocated future XLOG segments.
>  * When we are done with an old XLOG segment file, we will recycle it as a
>  * future XLOG segment as long as there aren't already XLOGfileslop future
>  * segments; else we'll delete it.  This could be made a separate GUC
>  * variable, but at present I think it's sufficient to hardwire it as
>  * 2*CheckPointSegments+1.  Under normal conditions, a checkpoint will free
>  * no more than 2*CheckPointSegments log segments, and we want to recycle
> all
>  * of them; the +1 allows boundary cases to happen without wasting a
>  * delete/create-segment cycle.
>  */
> #define XLOGfileslop(2*CheckPointSegments + 1)
>
> (in src/backend/access/transam/xlog.c, master branch, line 100)
>
> IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL
> file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence
> the formula:
>
> wal_keep_segments + 2 * checkpoint_segments + 1
>
> And this is what we usually find in our customers' servers. We may find more
> WAL files, depending on the write activity of the cluster, but in average,
> we get this number of WAL files.
>
> AFAICT, the documentation is wrong about the usual number of WAL files in
> the pg_xlog directory. But I may be wrong, in which case, the documentation
> isn't clear enough for me, and should be fixed so that others can't
> misinterpret it like I may have done.
>
> Any comments? did I miss something, or should we fix the documentation?

I think you're right. The correct formula for the number of WAL files in
pg_xlog seems to be

(3 + checkpoint_completion_target) * checkpoint_segments + 1

or

wal_keep_segments + 2 * checkpoint_segments + 1
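
For illustration (settings invented just for the example): with
checkpoint_segments = 10, checkpoint_completion_target = 0.9 and
wal_keep_segments = 32, the documented bound is

    greatest((2 + 0.9) * 10 + 1, 10 + 32 + 1) = greatest(30, 43) = 43 files

while the larger of the two formulas above is

    greatest((3 + 0.9) * 10 + 1, 32 + 2 * 10 + 1) = greatest(40, 53) = 53 files

which matches the kind of steady-state counts that trip monitoring checks
based on the documented formula.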


Why? At the end of checkpoint, the WAL files which were generated since the
start of previous checkpoint cannot be removed and must remain in pg_xlog.
The number of them is

(1 + checkpoint_completion_target) * checkpoint_segments

or

wal_keep_segments

Also, at the end of a checkpoint, as you pointed out, if there are
enough old WAL files, 2 * checkpoint_segments + 1 WAL files will be
recycled. Then checkpoint_segments WAL files will be consumed till the end of
next checkpoint. But since there are already 2 * checkpoint_segments + 1
recycled WAL files, no

Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 10:04 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> I have observed that for renaming some of the objects
>> AccessExclusiveLock is taken on object whereas for
>> other kind of objects no lock is taken on object before
>> renaming the object.
>
> The usual theory for DDL updates of all types (not just rename)
> is that an explicit lock is only needed for objects whose catalog
> representation comprises more than one row.  Otherwise, the implicit
> locking involved in updating that row is sufficient to serialize
> different updates.

That's an interesting point that I hadn't considered, but I'm willing
to believe that at least some of the differences might also be
haphazard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs  wrote:
> On 15 October 2014 14:46, Robert Haas  wrote:
>>> When my family goes to a restaurant, any member of the party may ask
>>> for a table and the request is granted for the whole family. But the
>>> lock is released only when I pay the bill. Once we have the table, any
>>> stragglers know we have locked the table and they just come sit at the
>>> table without needing to make their own lock request to the Maitre D',
>>> though they clearly cache the knowledge that we have the table locked.
>
>> Hmm, interesting idea.  Suppose, though, that the child process
>> requests a lock that can't immediately be granted, because the catalog
>> it's trying to access is locked in AccessExclusiveLock mode by an
>> unrelated transaction.  The unrelated transaction, in turn, is blocked
>> trying to acquire some resource, which the top level parallelism
>> process.  Assuming the top level parallelism process is waiting for
>> the child (or will eventually wait), this is a deadlock, but without
>> some modification to the deadlock detector, it can't see one of the
>> edges.
>
> Family disputes are fairly easily resolved ;-)
>
> The first and basic point is that in most cases the parent should
> already hold the required locks. This can only happen for briefly held
> locks and/or more complex stuff. In the first case, getting
> parallelism to work without that complex stuff would be useful. I'd be
> happy if the first version simply throws an error if a child can't
> acquire a lock immediately. Don't overthink the first version. Knowing
> you'll disagree, lets take a further step...

Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait.  I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait.  I also think we are in agreement that at least SOME
thought about lock management is needed; the question we're trying to
hash out is whether what I'm proposing to try to do here is
significantly more ambitious than what's really necessary for V1.
Which is a good question.

> Second point, the relationship between parent and children is clear.
> If we do a deadlock detection, we should be able to search for that as
> a special case, since we will know that we are a child and that such a
> situation might occur. So just add in an edge so the rest of the
> deadlock code works fine.
>
> If that doesn't work, use a heuristic. If parent is waiting when child
> does deadlock test, assume its a deadlock and abort the child
> speculatively just in case. You can work out how to do that better in
> the future, since it won't happen that often.

Well, the deadlock checker needs to work not only when one of the
parallel processes invokes it, but also when somebody else invokes it.
For example, suppose we have parallel processes A and B.  A holds lock
1 and awaits lock 2, while B awaits lock 3.  No problem!  Now
process X, which is holding lock 2 tries to grab lock 1.  X must be
able to detect the deadlock.  Now, that's not necessarily a problem
with what you said: it just means that the parent-child relationship
has to be clear from the contents of shared memory.

Which is pretty much the direction I'm aiming with the incomplete
patch I posted.  For the sake of simplicity, I want to assume that
every process in a locking group waits for every other process in the
same locking group, even though that might not be technically true in
every case.   If the parent is waiting for a lock and the guy who has
that lock is waiting for a parallel worker in the same group, my plan
(which I think matches your suggestion) is to call that a deadlock.
There are a few sticky points here: sometimes, the deadlock checker
doesn't actually abort transactions, but just rearranges the wait
queue, so something at least somewhat sensible needs to happen in that
case.

The thing that I'm aiming to do in the patch which I think you are
suggesting might not be necessary is to make it possible for the child
go ahead and request AccessShareLock on the scan relation even though
the parent might already hold some other lock (perhaps even
AccessExclusiveLock).   I want to make the lock manager smart enough
to recognize that those locks are mutually non-conflicting because of
the fact that the two processes are in close cooperation.  Clearly, we
could get by without that, but it adds complexity in other places: the
parent has to never release its lock (even if killed) until the child
is done with the relation; and the scan code itself needs to be
conditional, passing NoLock from children and some other mode in the
parent.  That's all manageable, but it looks to me like doing the
necessary surgery on the lock manager isn't actually going to be that
hard; most of the necessary logic

Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Alvaro Herrera
Stephen Frost wrote:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > We lost this patch for the October commitfest, didn't we?
> 
> I'm guessing you missed that a new version just got submitted..?

Which one, reindex schema?  Isn't that a completely different patch?

> I'd be fine with today's being added to the october commitfest..
> 
> Of course, there's a whole independent discussion to be had about how
> there wasn't any break between last commitfest and this one, but that
> probably deserves its own thread.

It's not the first time that has happened, and honestly I don't see all that
much cause for concern.  Heikki did move pending patches to the current
one, and closed a lot of inactive ones as 'returned with feedback'.
Attentive patch authors should have submitted new versions ... if they
don't, then someone else with an interest in the patch should do so.
If no one updates the patches, what do we want them for?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
I got the following numbers from my tpcc database:

Data size: ~6059MB
Index size: ~1390MB
Total size: ~7400MB

Even considering index-only scans, the ratio of around 50% of the database
pages being accessed seems unrealistic to me.



On Wed, Oct 15, 2014 at 3:50 PM, Stephen Frost  wrote:

> * Lucas Lersch (lucasler...@gmail.com) wrote:
> > So is it a possible normal behavior that running tpcc for 10min only
> access
> > 50% of the database? Furthermore, is there a guideline of parameters for
> > tpcc (# of warehouses, execution time, operations weight)?
>
> Depends- you may be aware that we support index-only scans in certain
> situations.  This means that only the index page for a given relation
> (and the visibility map) are accessed, and the heap is not.
>
> Thanks,
>
> Stephen
>



-- 
Lucas Lersch


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> We lost this patch for the October commitfest, didn't we?

I'm guessing you missed that a new version just got submitted..?

I'd be fine with today's being added to the october commitfest..

Of course, there's a whole independent discussion to be had about how
there wasn't any break between last commitfest and this one, but that
probably deserves its own thread.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Alvaro Herrera
Stephen Frost wrote:

> > >> Yes, I will update the patch.
> > > 
> > > Still planning to do this..?
> > > 
> > > Marking this back to waiting-for-author.
> > 
> > Yes, but probably not for this commitfest unfortunately.
> 
> Fair enough, I'll mark it 'returned with feedback'.

We lost this patch for the October commitfest, didn't we?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-10-15 Thread Sawada Masahiko
On Mon, Oct 13, 2014 at 11:16 PM, Robert Haas  wrote:
> On Sun, Oct 12, 2014 at 1:27 PM, Stephen Frost  wrote:
>> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>>> Sawada Masahiko wrote:
>>> > Attached WIP patch adds new syntax REINDEX SCHEMA which reindexes
>>> > all tables of the specified schema.
>>> > There is syntax for reindexing a specified index, per table and per
>>> > database,
>>> > but we cannot do reindexing per schema for now.
>>>
>>> It seems doubtful that there really is much use for this feature, but if
>>> there is, I think a better syntax precedent is the new ALTER TABLE ALL
>>> IN TABLESPACE thingy, rather than your proposed REINDEX SCHEMA.
>>> Something like REINDEX TABLE ALL IN SCHEMA perhaps.
>>
>> Yeah, I tend to agree that we should be looking at the 'ALL IN
>> TABLESPACE' and 'ALL IN SCHEMA' type of commands to keep things
>> consistent.  This might be an alternative for the vacuum / analyze /
>> reindex database commands also..
>
> Urgh.  I don't have a problem with that syntax in general, but it
> clashes pretty awfully with what we're already doing for REINDEX
> otherwise.
>

Attached are the latest versions of the patches.
I changed the syntax to REINDEX ALL IN SCHEMA, but I still feel a little
uncomfortable with it,
as Robert mentioned.

Anyway, you can apply these patches in numerical order to use the
REINDEX ALL IN SCHEMA feature and the "-S/--schema" option in reindexdb.

000_reindex_all_in_schema_v2.patch : adds the REINDEX ALL IN SCHEMA feature
001_Add_schema_option_to_reindexdb_v1.patch : adds reindexdb
"-S/--schema" support

Please review and comment.

Regards,

---
Sawada Masahiko


000_reindex_all_in_schema_v2.patch
Description: Binary data


001_Add_schema_option_to_reindexdb_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 14:46, Robert Haas  wrote:

>> When my family goes to a restaurant, any member of the party may ask
>> for a table and the request is granted for the whole family. But the
>> lock is released only when I pay the bill. Once we have the table, any
>> stragglers know we have locked the table and they just come sit at the
>> table without needing to make their own lock request to the Maitre D',
>> though they clearly cache the knowledge that we have the table locked.

> Hmm, interesting idea.  Suppose, though, that the child process
> requests a lock that can't immediately be granted, because the catalog
> it's trying to access is locked in AccessExclusiveLock mode by an
> unrelated transaction.  The unrelated transaction, in turn, is blocked
> trying to acquire some resource, which the top level parallelism
> process.  Assuming the top level parallelism process is waiting for
> the child (or will eventually wait), this is a deadlock, but without
> some modification to the deadlock detector, it can't see one of the
> edges.

Family disputes are fairly easily resolved ;-)

The first and basic point is that in most cases the parent should
already hold the required locks. This can only happen for briefly held
locks and/or more complex stuff. In the first case, getting
parallelism to work without that complex stuff would be useful. I'd be
happy if the first version simply throws an error if a child can't
acquire a lock immediately. Don't overthink the first version. Knowing
you'll disagree, lets take a further step...

Second point, the relationship between parent and children is clear.
If we do a deadlock detection, we should be able to search for that as
a special case, since we will know that we are a child and that such a
situation might occur. So just add in an edge so the rest of the
deadlock code works fine.

If that doesn't work, use a heuristic. If parent is waiting when child
does deadlock test, assume its a deadlock and abort the child
speculatively just in case. You can work out how to do that better in
the future, since it won't happen that often.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Tom Lane
Amit Kapila  writes:
> I have observed that for renaming some of the objects
> AccessExclusiveLock is taken on object whereas for
> other kind of objects no lock is taken on object before
> renaming the object.

The usual theory for DDL updates of all types (not just rename)
is that an explicit lock is only needed for objects whose catalog
representation comprises more than one row.  Otherwise, the implicit
locking involved in updating that row is sufficient to serialize
different updates.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

2014-10-15 Thread Kouhei Kaigai
> > On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
> > > Hello,
> > >
> > > I recently got a trouble on development of my extension that
> > > utilizes the shared buffer when it released each buffer page.
> > >
> > > This extension transfers contents of the shared buffers to GPU
> > > device using DMA feature, then kicks a device kernel code.
> >
> > Wow, that sounds crazy.
> >
> > > Once backend/extension calls ReadBuffer(), resowner.c tracks which
> > > buffer was referenced by the current resource owner, to ensure these
> > > buffers being released at end of the transaction.
> > > However, it seems to me implementation of resowner.c didn't assume
> > > many buffers are referenced by a particular resource owner
> simultaneously.
> > > It manages the buffer index using an expandable array, then looks up
> > > the target buffer by sequential walk but from the tail because
> > > recently pinned buffer tends to be released first.
> > > It made a trouble in my case. My extension pinned multiple thousands
> > > buffers, so owner->buffers[] were enlarged and takes expensive cost
> > > to walk on.
> > > In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
> > > total during hash-joining 2M rows; even though hash-joining itself
> > > takes less than 16 seconds.
> > >
> > > What is the best way to solve the problem?
> >
> > How about creating a separate ResourceOwner for these buffer pins, and
> > doing a wholesale ResourceOwnerRelease() on it when you're done?
> >
> Let me clarify your idea.
> 
> 1. Create a separate ResourceOwner under the CurrentResourceOwner.
> 2. Switch CurrentResourceOwner to the self-constructed resource owner
>during ReadBuffer() on thousands buffers.
> 3. Switch back to the original CurrentResourceOwner.
> 4. Kick DMA and run GPU device kernel (actual job running...) 5. Switch
> CurrentResourceOwner to the self-constructed resource owner
>again, during ReleaseBuffer() in reverse order.
> 6. Switch back to the original CurrentResourceOwner, then calls
>ResourceOwnerDelete().
> 
> Hmm... at this moment, I cannot find something not easy to implement.
> I'd like to try this idea, then report it later.
> 
Let me share the result.

The above approach eliminated my problem. An individual resource owner
for each set of shared buffers, plus ReleaseBuffer() in reverse order, reduced
the time to release pinned buffers from 36 sec to 67.3 msec when I run hash-joining
on 20M records; that involves at most 59168 buffers pinned concurrently
(32 asynchronous requests, each request packing 1849 buffers).

Thanks for the suggestion!
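
For anyone wanting to do the same thing, the pattern in the steps quoted
above boils down to roughly this (a minimal sketch, not the actual extension
code; rel, first_block and nbufs are placeholders and error handling is
omitted):

    ResourceOwner oldowner = CurrentResourceOwner;
    ResourceOwner bufowner = ResourceOwnerCreate(oldowner, "bulk buffer pins");
    Buffer       *bufs = palloc(sizeof(Buffer) * nbufs);
    int           i;

    /* steps 1-3: pin the buffers under the private owner */
    CurrentResourceOwner = bufowner;
    for (i = 0; i < nbufs; i++)
        bufs[i] = ReadBuffer(rel, first_block + i);
    CurrentResourceOwner = oldowner;

    /* step 4: kick DMA and run the device kernel here */

    /* step 5: unpin in reverse order under the same owner */
    CurrentResourceOwner = bufowner;
    for (i = nbufs - 1; i >= 0; i--)
        ReleaseBuffer(bufs[i]);
    CurrentResourceOwner = oldowner;

    /* step 6: the owner no longer tracks anything, so it can be deleted */
    ResourceOwnerDelete(bufowner);
    pfree(bufs);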


postgres=# explain analyze select * from t0 natural join t1 natural join t2;
INFO:  time to release buffers: 67.289ms

QUERY PLAN

---
Custom (GpuHashJoin)  (cost=3468.00..471640.11 rows=19740924 width=139) (actual 
time=193.086..5913.459 rows=2000 loops=1)
   pseudo scan tlist: 1:(t0.bid), 2:(t0.aid), 3:, 4:, 
5:, 6:, 7:, 8:, 9:, 11:, 
12:[t2.bid], 10:, 13:[t1.aid]
   hash clause 1: (t0.bid = t2.bid)
   hash clause 2: (t0.aid = t1.aid)
   ->  Custom (GpuScan) on t0  (cost=500.00..467167.24 rows=2024 width=73) 
(actual time=7.757..1056.426 rows=2000 loops=1)
   ->  Custom (MultiHash)  (cost=734.00..734.00 rows=4 width=37) (actual 
time=23.382..23.382 rows=4 loops=1)
 hash keys: bid
 ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=37) (actual 
time=0.007..5.124 rows=4 loops=1)
 ->  Custom (MultiHash)  (cost=734.00..734.00 rows=4 width=37) 
(actual time=11.919..11.919 rows=4 loops=1)
   hash keys: aid
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37) 
(actual time=0.010..5.299 rows=4 loops=1)
 Planning time: 0.904 ms
 Execution time: 6667.986 ms
(13 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Tom Lane
Michael Meskes  writes:
> On 15.10.2014 00:26, Tom Lane wrote:
>> * I've not touched ecpg except for cosmetic changes to keep the struct
>> definitions in sync, and to fix the previously-mentioned bogus free()
>> attempt.  I doubt that it would be worth teaching ecpg how to access the
>> zic timezone database --- the problem of configuring where to find those
>> files seems like more trouble than it's worth given the lack of
>> complaints.  I'm not sure what we should do about the obsolete timezone
>> abbreviations in its table.

> Maybe we should just remove them for the new release. Yes, that might
> break some applications, but then the server doesn't know these either,
> so the applications might break anyway.

The same thought had occurred to me.  Probably the main use of the
datetime parsing code in ecpg is for interpreting outputs from the
server, and (at least by default) the server doesn't use timezone
abbreviations when printing timestamps.  So maybe that's largely
dead code anyhow.  I would not propose back-patching such a change,
but we could try it in 9.5 and see if anyone complains.

A less drastic remedy would be to remove just those abbreviations
whose meaning has actually changed over time.  Eventually that
might be all of them ... but in the meantime, we could at least
argue that we weren't breaking any case that worked well before.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Stephen Frost
* Lucas Lersch (lucasler...@gmail.com) wrote:
> So is it a possible normal behavior that running tpcc for 10min only access
> 50% of the database? Furthermore, is there a guideline of parameters for
> tpcc (# of warehouses, execution time, operations weight)?

Depends - you may be aware that we support index-only scans in certain
situations.  This means that only the index pages for a given relation
(and the visibility map) are accessed, and the heap is not.
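
One way to check whether that is what is happening (table and index names
here are just examples from a typical TPC-C schema, not necessarily yours):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT w_id FROM warehouse WHERE w_id BETWEEN 1 AND 10;

An "Index Only Scan using ... on warehouse" node reporting "Heap Fetches: 0"
means the heap pages for that table were never requested at all.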

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 4:18 AM, Simon Riggs  wrote:
> On 15 October 2014 05:13, Tom Lane  wrote:
>> Robert Haas  writes:
>>> For parallelism, I think we need a concept of group locking.  That is,
>>> suppose we have a user backend and N worker backends collaborating to
>>> execute some query.  For the sake of argument, let's say it's a
>>> parallel CLUSTER, requiring a full table lock.  We need all of the
>>> backends to be able to lock the table even though at least one of them
>>> holds AccessExclusiveLock.  This suggests that the backends should all
>>> be members of a locking group, and that locks within the same locking
>>> group should be regarded as mutually non-conflicting.
>>
>> In the background worker case, I imagined that the foreground process
>> would hold a lock and the background processes would just assume they
>> could access the table without holding locks of their own.  Aren't
>> you building a mountain where a molehill would do?
>
> Yeh. Locks should be made in the name of the main transaction and
> released by it.
>
> When my family goes to a restaurant, any member of the party may ask
> for a table and the request is granted for the whole family. But the
> lock is released only when I pay the bill. Once we have the table, any
> stragglers know we have locked the table and they just come sit at the
> table without needing to make their own lock request to the Maitre D',
> though they clearly cache the knowledge that we have the table locked.
>
> So all lock requests held until EOX should be made in the name of the
> top level process. Any child process wanting a lock should request it,
> but on discovering it is already held at parent level should just
> update the local lock table. Transient locks, like catalog locks can
> be made and released locally; I think there is more detail there but
> it shouldn't affect the generalisation.

Hmm, interesting idea.  Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction.  The unrelated transaction, in turn, is blocked
trying to acquire some resource, which the top level parallelism
process holds.  Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.

Figuring out what to do about that is really the heart of this
project, I think, and there are a variety of designs possible.  One of
the early ideas that I had was to have the parallel workers directly
twiddle the main process's PGPROC and lock table state.  In other
words, instead of taking locks using their own PGPROCs, everybody uses
a single PGPROC.  I made several attempts at getting designs along
these lines off the ground, but it got complicated and invasive: (1)
The processes need to coordinate to make sure that you don't have two
people twiddling the lock state at the same time; (2) The existing
data structures won't support more than one process waiting at a time,
but there's no reason why one parallel worker couldn't be trying to
lock one catalog while another one is trying to lock a different
catalog; (3) On a related note, when a lock wait ends, you can't just
wake up the process that owns the PGPROC, but rather the one that's
actually waiting; (4) the LWLockReleaseAll algorithm just falls apart
in this environment, as far as I can see.

The alternative design which I've been experimenting with is to have
each process use its own PGPROC and PROCLOCK structures, but to tag
each PROCLOCK with not only the owning PGPROC but also the group
leader's PGPROC.  This has not been entirely smooth sailing, but it
seems to break much less code than trying to have everybody use one
PGPROC.  Most of the changes that seem to be needed to make things
work are pretty well-isolated; rather than totally rearranging the
lock manager, you're just adding extra code that runs only in the
parallel case.
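
To make that a bit more concrete, the core of the idea is roughly the
following (an illustration only, not code from the patch; the groupLeader
fields on PROCLOCK and PGPROC are hypothetical):

    /*
     * Illustration: a held PROCLOCK whose (hypothetical) groupLeader matches
     * the requesting backend's group leader is treated as non-conflicting;
     * everything else goes through the usual conflict-mask test.
     */
    static bool
    GroupLockConflicts(PROCLOCK *proclock, PGPROC *requester, LOCKMASK conflictMask)
    {
        if (proclock->groupLeader != NULL &&
            proclock->groupLeader == requester->groupLeader)
            return false;       /* same locking group: never a conflict */

        return (proclock->holdMask & conflictMask) != 0;
    }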

I'm definitely open to the idea that there's a better, simpler design
out there, but I haven't been able to think of one that doesn't break
deadlock detection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 01:53 AM, Craig Ringer wrote:

On 10/15/2014 12:53 PM, Noah Misch wrote:

Windows Server 2003 isn't even EOL yet.  I'd welcome a buildfarm member with
that OS and a modern toolchain.

It's possible to run multiple buildfarm animals on a single Windows
instance, each with a different toolchain.



Indeed, and even using the same buildroot. That's been built into the 
buildfarm for a long time. For example, nightjar and friarbird do this.




There's the chance of interactions that can't occur if each SDK is used
in isolation on its own machine, but it should be pretty minimal.



Make sure each has a distinct base_port.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
So is it a possible normal behavior that running tpcc for 10min only access
50% of the database? Furthermore, is there a guideline of parameters for
tpcc (# of warehouses, execution time, operations weight)?

On Wed, Oct 15, 2014 at 3:09 PM, Simon Riggs  wrote:

> On 15 October 2014 13:44, Lucas Lersch  wrote:
>
> > I am recording the BufferDesc.tag.blockNum for the buffer along with the
> > spcNode, dbNode, relNode, also present in the tag.
>
> The TPC-C I/O is random, so if you run it for longer you should see a
> wider set.
>
> Caching isn't possible as a way to improve txn rates.
>
> Check that you're touching all tables.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Lucas Lersch


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 15 October 2014 13:44, Lucas Lersch  wrote:

> I am recording the BufferDesc.tag.blockNum for the buffer along with the
> spcNode, dbNode, relNode, also present in the tag.

The TPC-C I/O is random, so if you run it for longer you should see a wider set.

Caching isn't possible as a way to improve txn rates.

Check that you're touching all tables.
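
A quick way to do that from SQL, using the standard statistics views, is
something like:

    SELECT pg_stat_reset();
    -- ... run the benchmark ...
    SELECT relname,
           heap_blks_read + heap_blks_hit AS heap_accesses,
           idx_blks_read  + idx_blks_hit  AS idx_accesses
    FROM pg_statio_user_tables
    ORDER BY relname;

Tables showing (near) zero accesses after the run are ones the workload
never touched.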

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
I am recording the BufferDesc.tag.blockNum for the buffer along with the
spcNode, dbNode, relNode, also present in the tag.

On Wed, Oct 15, 2014 at 2:27 PM, Simon Riggs  wrote:

> On 15 October 2014 12:49, Lucas Lersch  wrote:
> > Sorry for taking so long to answer. I am sending attached the patch with
> the
> > changes I did to pgsql code. I followed the steps for compiling and
> > installing pgsql from:
> > http://www.postgresql.org/docs/current/static/install-short.html
>
> Are you recording the bufferid or the blockid?
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Lucas Lersch


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Michael Meskes
On 15.10.2014 00:26, Tom Lane wrote:
> * I've not touched ecpg except for cosmetic changes to keep the struct
> definitions in sync, and to fix the previously-mentioned bogus free()
> attempt.  I doubt that it would be worth teaching ecpg how to access the
> zic timezone database --- the problem of configuring where to find those
> files seems like more trouble than it's worth given the lack of
> complaints.  I'm not sure what we should do about the obsolete timezone
> abbreviations in its table.

Maybe we should just remove them for the new release. Yes, that might
break some applications, but then the server doesn't know these either,
so the applications might break anyway.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 15 October 2014 12:49, Lucas Lersch  wrote:
> Sorry for taking so long to answer. I am sending attached the patch with the
> changes I did to pgsql code. I followed the steps for compiling and
> installing pgsql from:
> http://www.postgresql.org/docs/current/static/install-short.html

Are you recording the bufferid or the blockid?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: Access method extendability

2014-10-15 Thread Alexander Korotkov
Hackers,

Postgres was initially designed to support access method extendability.
This extendability survives to the present day. However, it is mostly internal,
in-core extendability. One can quite easily add a new access method to the
PostgreSQL core. But anyone who tries to implement an access method as an
external module will be faced with the following difficulties:

   1. The need to insert directly into pg_am, because there is no "CREATE ACCESS
   METHOD" command, and no support for dependencies between the am and opclasses
   etc.
   2. A module can't define xlog records, so a new am would not be WAL-logged.

The first problem is purely mechanical. Nothing prevents us from implementing
"CREATE ACCESS METHOD" and "DROP ACCESS METHOD" commands and supporting all
required dependencies.

The WAL problem is a bit more complex. According to previous discussions, we
don't want to let extensions declare their own xlog records. If we let them,
then the recovery process will depend on extensions, which seriously undermines
reliability. The solution is to implement some generic xlog record which is
able to represent the difference between blocks in some general manner.

3 patches are attached:

   1. CREATE/DROP ACCESS METHOD commands. With support of dependencies.
   2. New generic xlog record type.
   3. A Bloom contrib module as an example of usage of the previous two features.
   This module was posted a few years ago by Teodor Sigaev. Now it's wrapped as
   an extension and WAL-logged.

The patches are in a WIP state: no documentation and very few comments.
However, I believe they are enough to do some general concept review.

Some notes about the generic xlog format. Generic xlog represents the
difference between pages as a set of operations of two kinds:

   1. Move memory inside the page. An optional flag zeroes the gap at the
   previous memory location.
   2. Copy a fragment of memory from the xlog record to the page. As an option,
   bitwise logical operations are supported as well as plain copy.

Generic xlog can open a page in two modes:

   1. Create mode: the page is zeroed regardless of its lsn.
   2. Update mode: the page is updated only if its lsn is lower than the record lsn.

Usually, the xlog record is filled in a critical section where memory allocation
is prohibited. Thus, the user has to initialize it beforehand with knowledge of
the page count, total operation count and total length of data.

--
With best regards,
Alexander Korotkov.


create-am.1.patch.gz
Description: GNU Zip compressed data


generic-xlog.1.patch.gz
Description: GNU Zip compressed data


bloom-contrib.1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-15 Thread MauMau

From: "MauMau" 
Thank you very much.  I didn't anticipate such a difficult, complicated 
cause.  The user agreed to try the patch tonight.  I'll report back the 
result as soon as I get it from him.


The test ran successfully without hanging for 24 hours.  It was run with your 
patch + the following:


BTW, in LWLockWaitForVar(), the first line of the following code fragment 
is not necessary, because lwWaitLink is set to head immediately.  I think 
it would be good to eliminate as much unnecessary code as possible from 
the spinlock section.


 proc->lwWaitLink = NULL;

 /* waiters are added to the front of the queue */
 proc->lwWaitLink = lock->head;



Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
Sorry for taking so long to answer. I am sending attached the patch with
the changes I did to pgsql code. I followed the steps for compiling and
installing pgsql from:
http://www.postgresql.org/docs/current/static/install-short.html

In summary, the page_id of the page being released in ReleaseBuffer() and
ReleaseAndReadBuffer() is written to the file /usr/local/pgsql/data/trace.
This file is created manually.

I have also created a PrivateDirtyFlag for each backend, in analogy to the
PrivateRefCount. I use this to keep track if the current backend performed
an update operation in a page in the buffer pool or simply a read operation
(it is not relevant now). The trace file consists of one line for each
ReleaseBuffer() or ReleaseAndReadBuffer() call. The line has the format:

operation,tblSpace,dbNode,relNode,blockNumber

Once the trace file is complete after the execution of the tpcc benchmark,
I use the following bash script to get only unique pages:

cut -d',' -f2-5 trace | sort -n -t',' -k1 -k2 -k3 -k4 | uniq

Today I realized that I was making a mistake in executing the
oltpbenchmark application. Of the 64 warehouses created for tpcc, only 1
was being accessed (the 14k distinct pages that I mentioned). I increased
the "terminal" option of the tpcc benchmark from 1 to 64, resulting in one
terminal for each warehouse.

This provided me with a higher number of distinct pages being
accessed. Unfortunately, from the 800k pages in the database (64
warehouses), executing tpcc for 10min resulted in 400k distinct pages being
accessed. This number is much better than the previous results, but I think
it is still not realistic.

I would like to thank you guys for all the attention given to my problem :)


On Wed, Oct 15, 2014 at 9:49 AM, Simon Riggs  wrote:

> On 14 October 2014 17:08, Lucas Lersch  wrote:
>
> > Unfortunately, in the generated trace with over 2 million buffer
> requests,
> > only ~14k different pages are being accessed, out of the 800k of the
> whole
> > database. Am I missing something here?
>
> We can't tell what you're doing just by knowing the number of unique
> items in your list.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Lucas Lersch
[patch attachment: diff against postgresql-9.3.5 — truncated; only config.log hunks were reproduced here and are omitted]

Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Pavel Stehule
2014-10-13 17:22 GMT+02:00 Andrew Dunstan :

>
> On 10/13/2014 09:37 AM, Andrew Dunstan wrote:
>
>>
>> On 09/26/2014 04:54 PM, Andrew Dunstan wrote:
>>
>>>
>>> Here is a patch for the generator and aggregate functions for jsonb that
>>> we didn't manage to get done in time for 9.4. They are all equivalents of
>>> the similarly names json functions. Included are
>>>
>>> to_jsonb
>>> jsonb_build_object
>>> jsonb_build_array
>>> jsonb_object
>>> jsonb_agg
>>> jsonb_object_agg
>>>
>>>
>>> Still to come: documentation.
>>>
>>> Adding to the next commitfest.
>>>
>>
>>
>> Revised patch to fix compiler warnings.
>>
>>
> And again, initializing an incompletely initialized variable, as found by
> Pavel Stehule.
>

I checked the code, and I have only two small objections. First, the name
"jsonb_object_two_arg" is not good - maybe "json_object_keys_values"?

Next: there are no tests for the to_jsonb function.
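
For reference, since the new functions are meant to mirror their existing
json_* counterparts, usage should look roughly like this (a sketch only,
assuming behaviour analogous to the json versions):

    SELECT to_jsonb(42), to_jsonb('hello'::text);
    SELECT jsonb_build_object('a', 1, 'b', true);
    SELECT jsonb_build_array(1, 2, 'three'::text);
    SELECT jsonb_object(ARRAY['a','b'], ARRAY['1','2']);  -- the "two arg" form
    SELECT jsonb_agg(x), jsonb_object_agg(x::text, x)
      FROM generate_series(1, 3) AS x;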

Regards

Pavel


>
> cheers
>
> andrew
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


[HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Amit Kapila
I have observed that when renaming some kinds of objects an
AccessExclusiveLock is taken on the object, whereas for
other kinds of objects no lock is taken on the object before
renaming it.

The objects that are renamed via AlterObjectRename_internal()
take the lock (during the get_object_address() call) whereas for
other objects there is no lock.  I think there is an exception to this,
i.e. for renaming a table, a lock is also taken.  Refer to the function below:


ExecRenameStmt()
{

..
..
case OBJECT_AGGREGATE:
case OBJECT_COLLATION:
case OBJECT_CONVERSION:
case OBJECT_EVENT_TRIGGER:
case OBJECT_FDW:
...

address = get_object_address(stmt->renameType,
stmt->object, stmt->objarg,
&relation,
AccessExclusiveLock, false);

Assert(relation == NULL);

catalog = heap_open(address.classId, RowExclusiveLock);

AlterObjectRename_internal(catalog,
  address.objectId,
  stmt->newname);
..

}

Is there a reason for different locking strategy?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-10-15 Thread David Rowley
On Thu, Oct 9, 2014 at 12:40 AM, Andres Freund 
wrote:

> On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> > Ok, so I've been hacking away at this for a couple of evenings and I
> think
> > I have a working prototype finally!
>
> Cool!
>
>
Patch attached.


> > So it seems it's not quite as efficient as join removal at planning time,
> > but still a big win when it's possible to perform the join skipping.
>
> Have you checked where the overhead is? Is it really just the additional
> node that the tuples are passed through?
>
>
I've not checked this yet, but I'd assume that it has to be from the extra
node. I'll run some profiles soon.


> Have you measured the overhead of the plan/execution time checks over
> master?
>

I did a bit of benchmarking last night, but this was mostly for testing
that I've not added too much overhead on the nest loop code.
For the merge and hashjoin code I've managed to keep the special skipping
code in the EXEC_MJ_INITIALIZE_OUTER and HJ_BUILD_HASHTABLE part of the
main switch statement, so the extra checks should only be performed on the
first call of the node when skips are not possible. For nested loop I can't
see any other way but to pay the small price of setting the skipflags and
checking if there are any skip flags on every call to the node.

I tested the overhead of this on my laptop by creating 2 tables with 1
million rows each, joining them on an INT column, where each value of the
int column was unique. I seem to have added about a 2% overhead to this. :(
Which I was quite surprised at, given it's just 101 million extra
settings of the skip flags and checks that the skip flags are not empty,
but perhaps the extra local variable is causing something else to not make
it into a register.   At the moment I can't quite see another way to do it,
but I guess it may not be the end of the world as the chances of having to
perform a nest loop join on 2 tables of that size is probably not that
high.

Test case:
create table t3 (id int primary key);
create table t2 (id int primary key, t3_id int not null references t3);
create table t1 (id int primary key, t2_id int not null references t2);
insert into t3 select x.x from generate_series(1,100) x(x);
insert into t2 select x.x,x.x from generate_series(1,100) x(x);
insert into t1 select x.x,x.x from generate_series(1,100) x(x);
vacuum;
set enable_hashjoin = off;
set enable_mergejoin = off;

select count(*) from t1 inner join t2 on t1.id=t2.id;

Unpatched:
duration: 120 s
number of transactions actually processed: 45
latency average: 2666.667 ms
tps = 0.371901 (including connections establishing)
tps = 0.371965 (excluding connections establishing)

Patched:
Master
duration: 120 s
number of transactions actually processed: 44
latency average: 2727.273 ms
tps = 0.363933 (including connections establishing)
tps = 0.363987 (excluding connections establishing)

102.19%

Of course if we do the join on the column that has the foreign key, then
this is much faster.

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id;
  count
-
 100
(1 row)


Time: 105.206 ms

The explain analyze from the above query looks like:
test=# explain (analyze, costs off, timing off) select count(*) from t1
inner join t2 on t1.t2_id=t2.id;
QUERY PLAN
--
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=100 loops=1)
 ->  Seq Scan on t1 (actual rows=100 loops=1)
 ->  Index Only Scan using t2_pkey on t2 (never executed)
   Index Cond: (id = t1.t2_id)
   Heap Fetches: 0
 Execution time: 124.990 ms
(7 rows)

As you can see the scan on t2 never occurred.

I've so far only managed to come up with 1 useful regression test for this
new code. It's not possible to tell if the removal has taken place at plan
time, as the plan looks the same as if it didn't get removed. The only way
to tell us from the explain analyze, but the problem is that the execution
time is shown and can't be removed. Perhaps I should modify the explain to
tag join nodes that the planner has marked to say skipping may be possible?
But this is not really ideal as it's only the join nodes that know about
skipping and they just don't bother executing the child nodes, so it's
really up to the executor to decide which child nodes don't get called, so
to add something to explain might require making it more smart than it
needs to be.

Right now I'm not quite sure if I should modify any costs. Quite possibly
hash joins could have the costing reduced a little to try to encourage
hashjoins over merge joins, as with merge joins we can't skip sort
operations, but with hash joins we can skip the hash table build.

Regards

David Rowley


inner_join_removals_2014-10-15_0f3f1ea.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http

Re: [HACKERS] proposal: plpgsql - Assert statement

2014-10-15 Thread Ali Akbar
2014-09-30 10:04 GMT+07:00 Jim Nasby :

> On 9/17/14, 7:40 PM, Jan Wieck wrote:
>
>> Exactly. Doing something like
>>
>>  ASSERT (select count(*) from foo
>>  where fk not in (select pk from bar)) = 0;
>>
>> is a perfectly fine, arbitrary boolean expression. It will probably work
>> well in a development environment too. And I am very sure that it will not
>> scale well once that code gets deployed. And I know how DBAs react to the
>> guaranteed following performance problem. They will disable ALL assert ...
>> or was there some sort of assert class system proposed that I missed?
>>
>
Actually, as with, for example, Java or C, in production systems
usually all asserts are disabled for performance (in Java removed by the JIT,
in C we define NDEBUG).


>  We're also putting too much weight on the term "assert" here. C-style
> asserts are generally not nearly as useful in a database as general
> sanity-checking or error handling, especially if you're trying to use the
> database to enforce data sanity.
>

+1.
Without any query capability, assert will become much less useful. If we
cannot query in an assert, we will code:

-- perform some query
ASSERT WHEN some_check_on_query_result;

.. and disabling the query in a production system will become another problem.

My wish-list for "asserts" is:
>
> - Works at a SQL level
> - Unique/clear way to identify asserts (so you're not guessing where the
> assert came from)
>
+1


> - Allows for over-riding individual asserts (so if you need to do
> something you're "not supposed to do" you still have the protection of all
> other asserts)
> - Less verbose than IF THEN RAISE END IF
>
+1
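
To illustrate that last point, this is the kind of thing being compared
(the ASSERT syntax below is only illustrative, not the patch's final form):

    -- today, inside a plpgsql function:
    IF (SELECT count(*) FROM foo WHERE fk NOT IN (SELECT pk FROM bar)) <> 0 THEN
        RAISE EXCEPTION 'foo contains rows with no matching bar';
    END IF;

    -- versus something like:
    ASSERT (SELECT count(*) FROM foo WHERE fk NOT IN (SELECT pk FROM bar)) = 0;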

-- 
Ali Akbar


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wed, Oct 15, 2014 at 2:18 PM, Atri Sharma  wrote:




>
> On Wednesday, October 15, 2014, Marti Raudsepp  wrote:
>
>> Hi
>>
>> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma 
>> wrote:
>> > Please find attached a patch which implements support for UPDATE table1
>> > SET(*)=...
>>
>> I presume you haven't read Tom Lane's proposal and discussion about
>> multiple column assignment in UPDATE:
>> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
>> (Assigning all columns was also discussed there)
>>
>> And there's a WIP patch:
>> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
>>
>>
>>
> Thanks for the links, but this patch only targets the SET(*) case, which, if
> I understand correctly, the patch you mentioned doesn't directly handle (the
> targets of the two patches are different).
>
>
Digging more, I figured out that the patch I posted builds on top of Tom's
patch, since that one did not add the whole-row case.

Regards,

Atri


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wednesday, October 15, 2014, Marti Raudsepp  wrote:

> Hi
>
> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma  > wrote:
> > Please find attached a patch which implements support for UPDATE table1
> > SET(*)=...
>
> I presume you haven't read Tom Lane's proposal and discussion about
> multiple column assignment in UPDATE:
> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
> (Assigning all columns was also discussed there)
>
> And there's a WIP patch:
> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
>
>
>
Thanks for the links, but this patch only targets the SET(*) case, which, if
I understand correctly, the patch you mentioned doesn't directly handle (the
targets of the two patches are different).

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Marti Raudsepp
Hi

On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma  wrote:
> Please find attached a patch which implements support for UPDATE table1
> SET(*)=...

I presume you haven't read Tom Lane's proposal and discussion about
multiple column assignment in UPDATE:
http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
(Assigning all columns was also discussed there)

And there's a WIP patch:
http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 05:13, Tom Lane  wrote:
> Robert Haas  writes:
>> For parallelism, I think we need a concept of group locking.  That is,
>> suppose we have a user backend and N worker backends collaborating to
>> execute some query.  For the sake of argument, let's say it's a
>> parallel CLUSTER, requiring a full table lock.  We need all of the
>> backends to be able to lock the table even though at least one of them
>> holds AccessExclusiveLock.  This suggests that the backends should all
>> be members of a locking group, and that locks within the same locking
>> group should be regarded as mutually non-conflicting.
>
> In the background worker case, I imagined that the foreground process
> would hold a lock and the background processes would just assume they
> could access the table without holding locks of their own.  Aren't
> you building a mountain where a molehill would do?

Yeh. Locks should be made in the name of the main transaction and
released by it.

When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.

So all lock requests held until EOX should be made in the name of the
top level process. Any child process wanting a lock should request it,
but on discovering it is already held at parent level should just
update the local lock table. Transient locks, like catalog locks can
be made and released locally; I think there is more detail there but
it shouldn't affect the generalisation.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 14 October 2014 17:43, Robert Haas  wrote:
> On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs  wrote:
>> As soon as you issue the above query, you have clearly indicated your
>> intention to steal. Receiving information is no longer accidental, it
>> is an explicit act that is logged in the auditing system against your
>> name. This is sufficient to bury you in court and it is now a real
>> deterrent. Redaction has worked.
>
> To me, this feels thin.  It's true that this might be good enough for
> some users, but I wouldn't bet on it being good enough for very many
> users, and I really hope there's a better option.  We have an existing
> method of doing data redaction via security barrier views.

I agree with "thin". There is a leak in the design, so let me coin the
phrase "imprecise security". Of course, the leaks reduce the value of
such a feature; they just don't reduce it all the way to zero.

Security barrier views or views of any kind don't do the required job.

We are not able to easily classify people as Trusted or Untrusted.

We're seeking to differentiate between the right to use a column for
queries and the right to see the value itself. Or put another way, you
can read the book, you just can't photocopy it and take the copy home.
Or, you can try on the new clothes to see if they fit, but you can't
take them home for free. Both of those examples have imprecise
security measures in place to control and reduce negative behaviours
and in every other industry this is known as "security".

In IT terms, we're looking at controlling and reducing improper access
to data by an otherwise Trusted person. The only problem is that some
actions on data items are allowed, others are not.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
Hi All,

Please find attached a patch which implements support for UPDATE table1
SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1
SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing UPDATE from a record variable of the same
type as the table e.g. update foo set (*) = (select foorec.*) where ...;

The design is simple. It basically expands the * at the transformation stage,
does the necessary type checking and adds the result to the parse tree. This
allows normal execution for the rest of the stages.
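
For illustration, the two supported forms look like this (table and column
names are made up; semantics are as described above):

    CREATE TABLE foo (a int, b text, c timestamptz);
    CREATE TABLE foo_staging (a int, b text, c timestamptz);

    -- literal form
    UPDATE foo SET (*) = (1, 'one', now()) WHERE a = 0;

    -- sub-select form, e.g. refreshing a row from a staging copy
    UPDATE foo
       SET (*) = (SELECT s.a, s.b, s.c FROM foo_staging s WHERE s.a = foo.a)
     WHERE EXISTS (SELECT 1 FROM foo_staging s WHERE s.a = foo.a);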


Thoughts/Comments?

Regards,

Atri


updatestar_ver1.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 14 October 2014 17:08, Lucas Lersch  wrote:

> Unfortunately, in the generated trace with over 2 million buffer requests,
> only ~14k different pages are being accessed, out of the 800k of the whole
> database. Am I missing something here?

We can't tell what you're doing just by knowing the number of unique
items in your list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-15 Thread Amit Kapila
On Tue, Oct 14, 2014 at 3:32 PM, Andres Freund 
wrote:
> On 2014-10-14 15:24:57 +0530, Amit Kapila wrote:
> > After that I observed that contention for LW_SHARED has reduced
> > for this load, but it didn't help much in terms of performance, so I
again
> > rechecked the profile and this time most of the contention is moved
> > to spinlock used in dynahash for buf mapping tables, please refer
> > the profile (for 128 client count; Read only load) below:
> >
> > bgreclaimer patch + wait free lw_shared acquisition patches -
> >
--
>
> This profile is without -O2 again. I really don't think it makes much
> sense to draw much inference from an unoptimized build.

Profile data with -O2 is below.  This shows that the top
contributors are calls to BufTableLookup and the spinlock contention caused
by BufTableInsert and BufTableDelete.  To resolve the spinlock
contention, a patch like the above might prove to be useful (although
I still have to evaluate it).  I would also like to collect
LWLOCK_STATS data before proceeding further.
Do you have any other ideas?

   11.17%  swapper  [unknown]   [H] 0x011e0328

+   4.62% postgres  postgres[.]
hash_search_with_hash_value
+   4.35%  pgbench  [kernel.kallsyms]   [k] .find_busiest_group

+   3.71% postgres  postgres[.] s_lock

2.56% postgres  [unknown]   [H] 0x01500120

+   2.23%  pgbench  [kernel.kallsyms]   [k] .idle_cpu

+   1.97% postgres  postgres[.] LWLockAttemptLock

+   1.73% postgres  postgres[.] LWLockRelease

+   1.47% postgres  [kernel.kallsyms]   [k]
.__copy_tofrom_user_power7
+   1.44% postgres  postgres[.] GetSnapshotData

+   1.28% postgres  postgres[.] _bt_compare

+   1.04%  swapper  [kernel.kallsyms]   [k] .int_sqrt

+   1.04% postgres  postgres[.] AllocSetAlloc

+   0.97%  pgbench  [kernel.kallsyms]   [k] .default_wake_function





Detailed Data

-   4.62% postgres  postgres[.]
hash_search_with_hash_value
   - hash_search_with_hash_value

  - 2.19% BufTableLookup

 - 2.15% BufTableLookup

  ReadBuffer_common

- ReadBufferExtended

   - 1.32% _bt_relandgetbuf


 - 0.73% BufTableDelete

 - 0.71% BufTableDelete

  ReadBuffer_common

  ReadBufferExtended

  - 0.69% BufTableInsert

 - 0.68% BufTableInsert

  ReadBuffer_common

  ReadBufferExtended

0.66% hash_search_with_hash_value

-   4.35%  pgbench  [kernel.kallsyms]   [k] .find_busiest_group

   - .find_busiest_group

  - 4.28% .find_busiest_group

 - 4.26% .load_balance

- 4.26% .idle_balance

   - .__schedule

  - 4.26% .schedule_hrtimeout_range_clock

   .do_select

   .core_sys_select

-   3.71% postgres  postgres[.] s_lock

   - s_lock

  - 3.19% hash_search_with_hash_value

 - 3.18% hash_search_with_hash_value

- 1.60% BufTableInsert

 ReadBuffer_common

   - ReadBufferExtended

- 1.57% BufTableDelete

 ReadBuffer_common

   - ReadBufferExtended

  - 0.93% index_fetch_heap



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com