Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Fujii Masao
On Fri, Aug 30, 2013 at 1:43 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Fri, Aug 30, 2013 at 8:25 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Hi,

 Attached patch adds new GUC parameter 'compress_backup_block'.
 When this parameter is enabled, the server just compresses FPW
 (full-page-writes) in WAL by using pglz_compress() before inserting it
 to the WAL buffers. Then, the compressed FPW is decompressed
 in recovery. This is very simple patch.

 The purpose of this patch is to reduce WAL size.
 Under heavy write load, the server needs to write a large amount of
 WAL, and this is likely to be a bottleneck. What's worse, in
 replication, a large amount of WAL has a harmful effect not only on
 WAL writing in the master, but also on WAL streaming and WAL writing
 in the standby. We would also need to spend more money on storage to
 hold such a large amount of data.
 I'd like to alleviate these harmful situations by reducing WAL size.

 My idea is very simple: just compress FPWs, because FPWs are
 a big part of WAL. I used pglz_compress() as the compression method,
 but you might think another method is better. We can add
 something like an FPW-compression hook for that later. The patch
 adds a new GUC parameter, but I'm thinking of merging it into the
 full_page_writes parameter to avoid increasing the number of GUCs.
 That is, I'm thinking of changing full_page_writes so that it can
 accept a new value, 'compress'.
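
 For reference, the merged setting might look like this (hypothetical
 syntax; the 'compress' value is only proposed in this thread, not
 implemented):

```
# postgresql.conf (proposed, not yet implemented)
full_page_writes = 'compress'   # accepted values would be: on | off | compress
```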

 I measured how much WAL this patch can reduce, by using pgbench.

 * Server spec
   CPU: 8core, Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
   Mem: 16GB
   Disk: 500GB SSD Samsung 840

 * Benchmark
   pgbench -c 32 -j 4 -T 900 -M prepared
   scaling factor: 100

   checkpoint_segments = 1024
   checkpoint_timeout = 5min
   (every checkpoint during the benchmark was triggered by checkpoint_timeout)

 * Result
   [tps]
   1386.8 (compress_backup_block = off)
   1627.7 (compress_backup_block = on)

   [the amount of WAL generated during running pgbench]
   4302 MB (compress_backup_block = off)
   1521 MB (compress_backup_block = on)

 This is really nice data.

 I think, if you want, you can try one of the tests Heikki posted
 for one of my other patches, which is here:
 http://www.postgresql.org/message-id/51366323.8070...@vmware.com

 Also, if possible, with fewer clients (1, 2, 4) and maybe with more
 frequent checkpoints.

 This is just to show the benefits of this idea with other kinds of workload.

Yep, I will do more tests.

 I think we can do these tests later as well. I mentioned it because
 sometime back (probably 6 months ago), one of my colleagues tried
 exactly the same idea of using a compression method (LZ and a few
 others) for FPW, but it turned out that even though the WAL size was
 reduced, performance went down, which is not the case in the data you
 have shown even though you used an SSD. He might have made some
 mistake, as he was not very experienced, but I think it's still good
 to check various workloads.

I'd appreciate it if you could test the patch with an HDD. I currently have no machine with an HDD.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Behaviour of take over the synchronous replication

2013-08-30 Thread Sawada Masahiko
On Wed, Aug 28, 2013 at 10:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Aug 27, 2013 at 4:51 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Sun, Aug 25, 2013 at 3:21 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Sat, Aug 24, 2013 at 2:46 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Sat, Aug 24, 2013 at 3:14 AM, Josh Berkus j...@agliodbs.com wrote:
 On 08/23/2013 12:42 AM, Sawada Masahiko wrote:
 in case (a), the priority is clear, so I think that re-taking over
 is correct behaviour.
 OTOH, in case (b), even if AAA and BBB are set to the same priority,
 server AAA steals SYNC replication.
 I think it is better that server BBB continues behaving as the SYNC
 standby, and AAA becomes a potential server.

 So, you're saying that:

 1) synchronous_standby_names = '*'

 2) replica 'BBB' is the current sync standby

 3) replica 'AAA' comes online

 4) replica 'AAA' grabs sync status

 ?
 I'm sorry for the confusion.
 It means that:

 1) synchronous_standby_names = '*'

 2) replica 'AAA' is the current sync standby

 3) replica 'BBB' is the current async standby (potential sync standby)

 4) replica 'AAA' fails. After that, replica 'BBB' is the current sync standby.

 5) replica 'AAA' comes online

 6) replica 'AAA' grabs sync status




 If that's the case, I'm not really sure that's undesirable behavior.
 One could argue fairly persuasively that if you care about the
 precedence order of sync replicas, you shouldn't use '*'. And the rule
 of "if using '*', the lowest-sorted replica name has sync" is actually a
 predictable, easy-to-understand rule.

 So if you want to make this a feature request, you'll need to come up
 with an argument as to why the current behavior is bad. Otherwise,
 you're just asking us to document it better (which is a good idea).
 It does not depend on the name of the standby server. That is, the
 standby server which was connected to the master server during the
 initial replication configuration has top priority, even if the
 priorities of the two servers are the same.

 What is happening here is that in the case of '*', as the priority of
 both is the same, the system will choose whichever comes first in the
 list of registered standbys (the list is maintained in the structure
 WalSndCtl).
 Each standby is registered with WalSndCtl when a new WALSender is
 started, in the function InitWalSenderSlot().
 As 'AAA' was registered first, it becomes the preferred sync standby
 even if the priorities of both are the same.
 When 'AAA' goes down, its slot entry is marked as free (by setting
 pid = 0 in the function WalSndKill);
 when 'AAA' comes back again, it gets that free slot entry and
 again becomes the preferred sync standby.

 Now if we want to fix this as you are suggesting (which I don't think
 is necessary), we might need to change WalSndKill and some other
 places so that whenever any standby goes down, the slots of the
 already-registered standbys are changed.
 The user must remember which standby server connected to the master
 server first.
 I think this behaviour confuses users.
 So I think we need to modify this behaviour, or document that when '*'
 is used the priority of the servers is not the same (modifying the
 manual is also fine).

 Here the user has done the settings (setting synchronous_standby_names =
 '*'), after which he has no control over which standby becomes the
 sync standby, so ideally he should not complain.

 It might be the case that for some users the current behavior is good
 enough, which means that with '*', whichever standby became the sync
 standby first stays the sync standby as long as it is alive.

 I'm thinking that it is not necessary to change WalSndKill.
 For example, we could add a value (e.g., sync_standby) which records
 which walsender is the active SYNC rep.
 If sync_standby is already set and that walsender is active, the
 server doesn't look for another active standby.
 Only if sync_standby is not set, or the recorded walsender is
 inactive, does the server look for which server should be the active
 SYNC rep.
 This way, we also avoid searching for the active SYNC rep every time
 SyncRepReleaseWaiters() is called.
For the '*' case it will be okay, but when the user has given proper
names, then even if there is an active sync rep, it has to be changed
based on priority.

I think how to provide a fix so that the behavior changes to what you
describe is second-priority work; first is to show the value of the
use case. Do you really know whether people actually set up '*' as the
configuration, and if yes, are they annoyed with the current behavior?
I have thought about it, but could not imagine a scenario where people
would use '*' in their production configurations; maybe it is useful
in test labs.

Thank you for your feedback.
I have implemented a patch which changes how priority is put on each
walsender, based on what I suggested.
I added a sync_standby value into WalSndCtl. This value records which
walsender is the active sync rep.
This patch also handles the case where the user has given proper names.

Regards,

---
Sawada Masahiko


Sync_standby_priority_v1.patch

Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Fujii Masao
On Fri, Aug 30, 2013 at 2:32 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 (2013/08/30 11:55), Fujii Masao wrote:

 * Benchmark
pgbench -c 32 -j 4 -T 900 -M prepared
scaling factor: 100

checkpoint_segments = 1024
checkpoint_timeout = 5min
    (every checkpoint during the benchmark was triggered by
 checkpoint_timeout)

 Did you execute a manual checkpoint before starting the benchmark?

Yes.

 Reading only your message, it seems three checkpoints occurred during
 the benchmark. But if you did not execute a manual checkpoint, the
 result would be different.

 You had better clarify this point for a more transparent evaluation.

What I executed was:

-
CHECKPOINT
SELECT pg_current_xlog_location()
pgbench -c 32 -j 4 -T 900 -M prepared -r -P 10
SELECT pg_current_xlog_location()
SELECT pg_xlog_location_diff() -- calculate the diff of the above locations
-

I repeated this several times to eliminate the noise.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Peter Geoghegan
On Thu, Aug 29, 2013 at 10:55 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I suppose that the cost of the random I/O involved would
 probably dominate just as with compress_backup_block = off. That said,
 you've used an SSD here, so perhaps not.

 Oh, maybe my description was confusing. full_page_writes was enabled
 while running the benchmark even if compress_backup_block = off.
 I've not merged those two parameters yet. So even in
 compress_backup_block = off, random I/O would not be increased in recovery.

I understood it that way. I just meant that it could be that the
random I/O was so expensive that the additional cost of decompressing
the FPIs looked insignificant in comparison. If that was the case, the
increase in recovery time would be modest.

-- 
Peter Geoghegan




Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Fujii Masao
On Fri, Aug 30, 2013 at 2:37 PM, Nikhil Sontakke nikkh...@gmail.com wrote:
 Hi Fujii-san,

 I must be missing something really trivial, but why not try to compress all
 types of WAL blocks and not just FPW?

The size of non-FPW WAL records is small compared to that of FPWs.
I thought that compressing such small WAL records would not have a
big effect on the reduction of WAL size. Rather, compressing every
WAL record might cause a large performance overhead.

Also, focusing on FPWs keeps the patch very simple. We can add
compression of other WAL records later if we want.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Heikki Linnakangas

On 30.08.2013 05:55, Fujii Masao wrote:

* Result
   [tps]
   1386.8 (compress_backup_block = off)
   1627.7 (compress_backup_block = on)


It would be good to check how much of this effect comes from reducing 
the amount of data that needs to be CRC'd, because there has been some 
talk of replacing the current CRC-32 algorithm with something faster. 
See 
http://www.postgresql.org/message-id/20130829223004.gd4...@awork2.anarazel.de. 
It might even be beneficial to use one routine for full-page-writes, 
which are generally much larger than other WAL records, and another 
routine for smaller records. As long as they both produce the same CRC, 
of course.


Speeding up the CRC calculation obviously won't help with the WAL volume 
per se, ie. you still generate the same amount of WAL that needs to be 
shipped in replication. But then again, if all you want to do is to 
reduce the volume, you could just compress the whole WAL stream.


- Heikki




[HACKERS] Add pgbench option: CHECKPOINT before starting benchmark

2013-08-30 Thread KONDO Mitsumasa
Hi,

I add a checkpoint option to pgbench.

pgbench is a simple and useful benchmark for every user. However, the
result of a benchmark changes greatly depending on the situation: whether
a checkpoint is executing, the number of dirty buffers in shared_buffers,
and so on. For such problems, it is customary to carry out a checkpoint
before starting the benchmark. But making such a script takes time, like
the following:

psql -U postgres -d pgbench -p 5432 -c "CHECKPOINT"
pgbench -T 600 -c 12 -j 4 -U postgres -d pgbench -p 5432

However, this script has a problem: it executes CHECKPOINT - VACUUM -
benchmark. If relations have a lot of dirty pages, VACUUM generates dirty
buffers in shared_buffers, and that will cause a heavy checkpoint during
the run.

I think pgbench would be an easier and more accurate benchmark tool for
everyone this way, so I run a checkpoint before starting the benchmark.

This patch's output is here.
-
[mitsu-ko@localhost pgbench]$ ./pgbench
starting vacuum...end.
starting checkpoint...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
tps = 312.851958 (including connections establishing)
tps = 364.524478 (excluding connections establishing)
-
It executes VACUUM - CHECKPOINT - benchmark.
I think this is the ideal setup for a more accurate benchmark.

My patch's option definition is here.
[mitsu-ko@localhost pgbench]$ ./pgbench --help
~
-N, --no-checkpoint  do not run CHECKPOINT after initialization
~
In the latest committed pgbench, -N is "--skip-some-updates: skip updates
of pgbench_tellers and pgbench_branches". But I cannot understand why -N
was chosen for that option, so I moved it to -u, and made -N the "do not
run CHECKPOINT" option.

What do you think?
-- 
Mitsumasa KONDO
NTT Open Source Software Center
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..523c278 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -347,6 +347,7 @@ usage(void)
 		 "-i, --initialize invokes initialization mode\n"
 		 "-F, --fillfactor=NUM set fill factor\n"
 		 "-n, --no-vacuum  do not run VACUUM after initialization\n"
+		 "-N, --no-checkpoint  do not run CHECKPOINT after initialization\n"
 		 "-q, --quiet  quiet logging (one message each 5 seconds)\n"
 		 "-s, --scale=NUM  scaling factor\n"
 		 "--foreign-keys   create foreign key constraints between tables\n"
@@ -366,7 +367,8 @@ usage(void)
 		  "protocol for submitting queries "
 		   "(default: simple)\n"
 		 "-n, --no-vacuum  do not run VACUUM before tests\n"
-		 "-N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
+		 "-N, --no-checkpoint  do not run CHECKPOINT before tests\n"
+		 "-u, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
 		 "-P, --progress=NUM   show thread progress report every NUM seconds\n"
 		 "-r, --report-latencies   report average latency per command\n"
 		 "-R, --rate=SPEC  target rate in transactions per second\n"
@@ -1520,7 +1522,7 @@ disconnect_all(CState *state, int length)
 
 /* create tables and setup data */
 static void
-init(bool is_no_vacuum)
+init(bool is_no_vacuum, bool is_no_checkpoint)
 {
 /* The scale factor at/beyond which 32bit integers are incapable of storing
  * 64bit values.
@@ -1775,6 +1777,12 @@ init(bool is_no_vacuum)
 		}
 	}
 
+	/* checkpoint */
+	if (!is_no_checkpoint)
+	{
+		fprintf(stderr, "checkpoint...\n");
+		executeStatement(con, "checkpoint");
+	}
 
 	fprintf(stderr, "done.\n");
 	PQfinish(con);
@@ -2213,6 +2221,7 @@ main(int argc, char **argv)
 		{"jobs", required_argument, NULL, 'j'},
 		{"log", no_argument, NULL, 'l'},
 		{"no-vacuum", no_argument, NULL, 'n'},
+		{"no-checkpoint", no_argument, NULL, 'N'},
 		{"port", required_argument, NULL, 'p'},
 		{"progress", required_argument, NULL, 'P'},
 		{"protocol", required_argument, NULL, 'M'},
@@ -2220,7 +2229,7 @@ main(int argc, char **argv)
 		{"report-latencies", no_argument, NULL, 'r'},
 		{"scale", required_argument, NULL, 's'},
 		{"select-only", no_argument, NULL, 'S'},
-		{"skip-some-updates", no_argument, NULL, 'N'},
+		{"skip-some-updates", no_argument, NULL, 'u'},
 		{"time", required_argument, NULL, 'T'},
 		{"transactions", required_argument, NULL, 't'},
 		{"username", required_argument, NULL, 'U'},
@@ -2240,6 +2249,7 @@ main(int argc, char **argv)
 	int			nclients = 1;	/* default number of simulated clients */
 	int			nthreads = 1;	/* default number of threads */
 	int			is_init_mode = 0;		/* initialize mode? */
+	int			is_no_checkpoint = 0;		/* no checkpoint at all before testing? */
 	int			is_no_vacuum = 0;		/* no vacuum at all before testing? */
 	int			

Re: [HACKERS] Add pgbench option: CHECKPOINT before starting benchmark

2013-08-30 Thread Mark Kirkwood
On 30/08/13 19:54, KONDO Mitsumasa wrote:
 Hi,
 
 I add checkpoint option to pgbench.
 
 [...]
 
 What do you think?
 


+1

I have been using a script to add CHECKPOINT before pgbench runs for
ages...adding the option to pgbench is a great idea (wish I had thought
of it)!

Regards

Mark






Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Cédric Villemain
Le jeudi 29 août 2013 18:42:13 Stephen Frost a écrit :
 On Thursday, August 29, 2013, Andres Freund wrote:
  If you don't want your installation to use it, tell you ops people not
  to do so. They are superusers, they need to have the ability to follow
  some rules you make up internally.
 
 The ops people are the ones that will be upset with this, because the
 DBAs will be modifying configs which ops rightfully claim as theirs. If
 they have a way to prevent it then perhaps it's not terrible, but they'd
 also need to know to disable this new feature. As for ALTER DATABASE, I
 would be happier with encouraging use of that (or providing an ALTER
 CLUSTER) for those things it makes sense and works for, and removing
 those from postgresql.conf. I still feel things like listen_addresses
 shouldn't be changed through this.

ALTER ROLE ALL may be good enough to handle every GUC that we can also remove
from postgresql.conf (I suppose all GUCs needing only a reload, not a restart).
It may need some improvement to handle changing the default for ALL and adding
new roles.

  The energy wasted in a good part of this massive 550+ messages thread is
  truly saddening. We all (c|sh)ould have spent that time making PG more
  awesome instead.
 
 Perhaps not understood by all, but keeping PG awesome involves more than
 adding every feature proposed- it also means saying no sometimes; to
 features, to new GUCs, even to micro-optimizations when they're overly
 complicated and offer only minimal or questionable improvements.

Agreed; the current feature and proposal does not include pg_reload, and it
introduces a full machinery we absolutely don't need.

Grammar can be added later, when the feature is stable.

So far, we can achieve the goal by using adminpack, a file_fdw, or a
config_fdw. IMHO it is the job of an FDW to be able to handle atomic writes
or anything like that.

I've commented on one of the proposed patches, adding some helpers to
validate GUC changes; I claimed this part was good enough to be added
without ALTER SYSTEM (so a contrib module can use it).

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] Add pgbench option: CHECKPOINT before starting benchmark

2013-08-30 Thread Fabien COELHO



My patches option difinition is here.
[mitsu-ko@localhost pgbench]$ ./pgbench --help
~
-N, --no-checkpoint  do not run CHECKPOINT after initialization
~
In the latest committed pgbench, -N is "--skip-some-updates: skip updates of
pgbench_tellers and pgbench_branches". But I cannot understand why -N is this
option, so I set this option to -u, and made -N the "do not run CHECKPOINT"
option.

What do you think?


Although I agree that -N is not really meaningful, ISTM that changing 
option names is to be avoided if people are used to them. In this instance, 
I'm not sure that many people use this option, so maybe this is not an 
issue.


The patch should also update the sgml documentation.

--
Fabien.




Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions

2013-08-30 Thread Bjorn Munch
On 29/08 21.17, MauMau wrote:
 
 Thanks.  I believe PostgreSQL runs successfully on Solaris 10 and later,
 because the binaries are published on the community site:
 
 http://ftp.postgresql.org/pub/binary/v9.3beta2/solaris/

Sorry, I didn't notice this thread earlier. Yes, I am building those
binaries and I have had no problems building PostgreSQL and running
the regression tests on Solaris 11. It would be quite unusual for
something to work on Solaris 10 but not 11, but of course it could
happen. There may have been a case or two which I now can't remember.

- Bjorn Munch




Re: [HACKERS] Statement timeout logging

2013-08-30 Thread Thom Brown
On 6 June 2013 17:28, Pavel Stehule pavel.steh...@gmail.com wrote:

 2013/6/6 Thom Brown t...@linux.com:
  Hi,
 
  When a statement is cancelled due to it running for long enough for
  statement_timeout to take effect, it logs a message:
 
  ERROR:  canceling statement due to statement timeout
 
  However, it doesn't log what the timeout was at the time of the
  cancellation.  This may be set in postgresql.conf, the database, or on
  the role, but unless log_line_prefix is set to show the database name
  and the user name, there's no reliable way of finding out what context
  the configuration applied from.  Setting log_duration won't help
  either because that only logs the duration of completed queries.
 
  Should we output the statement_timeout value when a query is cancelled?

 +1

 we use same feature in GoodData. Our long queries are cancelled by
 users and we should to known how much a users would to wait.


It seems there are a couple more errors that could share this sort of
information too:

canceling authentication due to timeout
canceling statement due to lock timeout

Admittedly the first of those two isn't really an issue.
-- 
Thom


Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Kohei KaiGai
2013/8/29 Josh Berkus j...@agliodbs.com:
 Kaigai,

 Although I didn't work on this task myself, my senior colleague said that
 we calculated the possible bandwidth of leakage as evidence when we shipped
 a supercomputer system with the MAC feature for customers who worry about
 security. I'm not sure how to measure it. However, if we assume a human can
 run up to 5 queries per second, and he needs 2-3 seconds to identify a
 particular integer value less than 1, it means the bandwidth of this covert
 channel is less than 5 bps. I'm not sure whether an enterprise-grade DBMS
 has to care about this amount of covert channel.

 Why are you assuming a human needs to do it?  Given the explain vector,
 I could write a rather simple python or perl script which would find
 values by EXPLAIN leakage, at 1000 explain plans per minute.

 It's one thing to say we can't solve this covert channel issue right
 now in this patch, but saying we don't plan to solve it at all is
 likely to doom the patch.

 I'm not sure what the solution would be, exactly.  Deny permission for
 EXPLAIN on certain tables?

 Surely someone in the security community has discussed this?

The security community considers covert channels a hard problem to
solve; they are nearly impossible to eliminate.
See the summary in Wikipedia:
  http://en.wikipedia.org/wiki/Covert_channel

Countermeasures against covert channels are not required in middle or
entry class security evaluations, which are what is usually required
for enterprise grade; they are required only for products designed for
military grade.
The reason why their priority is relatively low is that the threat of
information leakage via a covert channel has limited bandwidth in
comparison to the main channel.

I also follow this standpoint; it is a reasonable balance between
functionality and strictness under limited resources.
Even if we could close a certain channel, we can never close all the
other channels, such as signalling by namespace contention on table
creation as a covert channel. Also, I don't know of a major commercial
DBMS that handles these scenarios well.

Of course, it should be described in the documentation, so that users
do not apply these security features in areas beyond their capability.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Kohei KaiGai
2013/8/29 David Fetter da...@fetter.org:
 On Thu, Aug 29, 2013 at 10:05:14AM -0400, Tom Lane wrote:
 Alexander Korotkov aekorot...@gmail.com writes:
  On Wed, Aug 28, 2013 at 4:17 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  It is out of scope for this feature. We usually calls this type
  of information leakage covert channel; that is not avoidable in
  principle.

  I think there is another covert channel much more serious than
  constrains. You can gather information about hidden data by
  reading query plans.

 I'm not convinced by this argument that covert channels are out of
 scope.  That would be a fine justification for, say, a thesis
 topic.  However, what we're talking about here is a real-world
 feature that will be of no real-world use if it can't stand up
 against rather obvious attack techniques.  I'm not interested in
 carrying the maintenance and runtime overhead of a feature that's
 only of academic value.

 Looking at the real-world perspective, what covert channels do our
 competitors in the space currently claim to do anything about?

I'm not sure whether the minor DBMSs designed for extremely secure
environments have already been certified. (If they had such
functionality, they would seek certification for promotion.)

Oracle lists some of their certified products:
http://www.oracle.com/technetwork/topics/security/security-evaluations-099357.html

However, these are based on the protection profile for basic robustness,
which is designed for environments where we don't care about covert
channels.

 This would represent the bar we need to clear at least as far as
 documenting what we do (do the access constraint before anything else,
 e.g.) or why we don't do things (disabling EXPLAIN, e.g.).

+1. I'd like to add description about this scenario.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Kohei KaiGai
2013/8/29 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 That would close only one covert channel.  Others were already pointed out
 upthread, and I'll bet there are more ...

 Mind you, fundamentally this is no different from allowing INSERT
 permission on a table but denying SELECT, or denying SELECT on certain
 columns.  In either case, covert channels for some data are available.

 Certainly.  But INSERT's purpose in life is not to prevent people from
 inferring what data is in the table.  What we have to ask here is whether
 a row level security feature that doesn't deal with these real-world
 attack techniques is worth having.

I think we should clearly note that the row-level security feature does
not have the capability to control information leakage via covert
channels, which have very limited bandwidth, even though it does control
information leakage and manipulation via the main channel.
It depends on the user's environment and expectations. If they need an
RDBMS with military-grade security features, it is not recommendable.
However, it is a recommended solution for a regular enterprise-grade
environment. Everything depends on the user's environment, the threats,
and the worth of the values to be protected.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 The OPs people are the ones that will be upset with this because the DBAs
 will be modifying configs which OPs rightfully claim as theirs.

If that's the problem you want to solve, there's no technical solution
that will put you at ease. That's a people and trust problem.

I don't think typical (or less typical) organisations should drive our
technical choices too much, and I'm pretty confident they didn't in the
past: pg_hba.conf is a file not because it's meant for this or that team,
but because it makes sense technically to manage the settings that allow
using some resources *outside* of said resources.

We currently have no way that I know of to disable ALTER ROLE SET and
ALTER DATABASE SET effects, why do we need to provide that feature for
ALTER SYSTEM SET so much?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Andres Freund
Hi,

On 2013-08-30 10:20:48 +0200, Cédric Villemain wrote:
   The energy wasted in a good part of this massive 550+ messages thread is
   truly saddening. We all (c|sh)ould have spent that time making PG more
   awesome instead.
  
  Perhaps not understood by all, but keeping PG awesome involves more than
  adding every feature proposed- it also means saying no sometimes; to
  features, to new GUCs, even to micro-optimizations when they're overly
  complicated and offer only minimal or questionable improvements.
 
 Agreed, the current feature and proposal does not include pg_reload, and it
 introduces a full machinery we absolutely don't need.

The complexity in the last version of the patch I looked at wasn't in
the added grammar or pg_reload() (well, it didn't have that). It was the
logic to read (from memory)/write the config file and validate the
GUCs. That's needed even if you put it into some module. And it requires
support from guc.c/guc-file.l.

 Grammar can be added later when the feature is stable.

Could you explain the advantages of this? It will require users to get
used to different interfaces and we will end up maintaining both just
about forever. And I don't see the grammar being that contentious?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 Stephen Frost sfr...@snowman.net writes:
  The OPs people are the ones that will be upset with this because the DBAs
  will be modifying configs which OPs rightfully claim as theirs.
 
 If that's the problem you want to solve, there's no technical solution
 that will put you at ease. That's a people and trust problem.

I really just don't buy that- I've already put forward suggestions for
how to deal with it, but no one here seems to understand the
distinction.  Modifying listen_addresses through ALTER SYSTEM is akin to
 ISC/bind allowing changes to its listen_addresses equivalent through
dynamic DNS updates.  Would it be possible to implement?  Sure.  Does it
make any sense?  Certainly not.

 I don't think typical (or less typical) organisation should drive our
 technical choices too much, and I'm pretty confident they didn't in the
 past: pg_hba.conf is a file not because it's meant for this or that team
 but because it makes sense technically to manage the settings to allow
 using some resources *outside* of said resources.

I'm all for moving pg_hba.conf into the database properly, actually.  We
already handle permissions and user access in the DB and that's one of
the DBA's responsibilities.  The same goes for pg_ident.conf.

 We currently have no way that I know of to disable ALTER ROLE SET and
 ALTER DATABASE SET effects, why do we need to provide that feature for
 ALTER SYSTEM SET so much?

Because we've got crap mixed into postgresql.conf which are bootstrap
configs needed to get the system started.  Those things, in my view
anyway, fall much more into the category of resources which should be
managed outside the database than pg_hba.conf.

Thanks,

Stephen




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Cédric Villemain (ced...@2ndquadrant.com) wrote:
 ALTER ROLE ALL may be good enough to handle every GUC that we can also
 remove from postgresql.conf (I suppose all GUCs needing only a reload,
 not a restart). It may need some improvement to handle changing the
 default for ALL and adding new roles.

Yes, one of the issues with the existing system is that you can't
specify a default to be applied to new roles.  Also, there are
parameters which are not per-role yet which it probably makes sense to
be in the database and we'd need a way to deal with that.  Although, at
the same time, considering it role based does make for a nice
distinction.  Unfortunately, those clamoring for this will argue that
it wouldn't replace adminpack and that they couldn't use it to modify
their /etc/network/interfaces file, which is the obvious next step to
all of this.

 Grammar can be added later when the feature is stable.

I tend to agree w/ Andres on this point- the grammar isn't really the
contentious part.  I think I see where you were going with this in that
excluding the grammar makes it able to live as a module, but that's a
different consideration.

 I've commented one of the proposed patch adding some helpers to validate GUC 
 change, I claimed this part was good enough to be added without ALTER SYSTEM 
 (so a contrib can use it).

Perhaps..

Thanks,

Stephen




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-08-30 10:20:48 +0200, Cédric Villemain wrote:
  Grammar can be added later when the feature is stable.
 
 Could you explain the advantages of this? It will require users to get
 used to different interfaces and we will end up maintaining both just
 about forever. And I don't see the grammar being that contentious?

The different interfaces issue already exists, to some extent, as
adminpack does exist already and this is clearly a different interface
from that.  That said, as I mentioned just now elsewhere, I agree that
the grammar itself (being rather simple anyway...) isn't the contentious
part of this.

Thanks,

Stephen




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Andres Freund
On 2013-08-30 08:48:21 -0400, Stephen Frost wrote:
 * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
  Stephen Frost sfr...@snowman.net writes:
   The OPs people are the ones that will be upset with this because the DBAs
   will be modifying configs which OPs rightfully claim as theirs.
  
  If that's the problem you want to solve, there's no technical solution
  that will put you at ease. That's a people and trust problem.
 
 I really just don't buy that- I've already put forward suggestions for
 how to deal with it, but no one here seems to understand the
 distinction.  Modifying listen_addresses through ALTER SYSTEM is akin to
  ISC/bind allowing changes to its listen_addresses equivalent through
 dynamic DNS updates.  Would it be possible to implement?  Sure.  Does it
 make any sense?  Certainly not.

I very much want to change stuff like wal_level, listen_addresses and
shared_buffers via ALTER SYSTEM. Configuration variables like that
(PGC_POSTMASTER stuff mostly) are the prime reason why you actually need
to change postgresql.conf instead of changing per user/database
settings.
And you don't even need to do anything special to implement it. Because
it's already there.

  We currently have no way that I know of to disable ALTER ROLE SET and
  ALTER DATABASE SET effects, why do we need to provide that feature for
  ALTER SYSTEM SET so much?
 
 Because we've got crap mixed into postgresql.conf which are bootstrap
 configs needed to get the system started.  Those things, in my view
 anyway, fall much more into the category of resources which should be
 managed outside the database than pg_hba.conf.

I think the problem with your position in this thread is that you want
to overhaul the way our configuration works in a pretty radical
way. Which is fair enough, there certainly are deficiencies. But it's
not the topic of this thread.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Andres Freund
On 2013-08-29 21:26:48 -0400, Stephen Frost wrote:
  Sure, you can construct a scenario where this matters.  The ops guys
  have sudo postgres pg_ctl access but adminpack isn't installed and
  they have no other way to modify the configuration file.  But that's
  just bizarre.  And if that's really the environment you have, then you
  can install a loadable module that grabs ProcessUtility_hook and uses
  it to forbid ALTER SYSTEM on that machine.  Hell, we can ship such a
  thing in contrib.  Problem solved.  But it's surely too obscure a
  combination of circumstances to justify disabling this by default.
 
 It's not the OPs guy that I'm worried about using ALTER SYSTEM- I don't
 expect them to have any clue about it or care about it, except where it
 can be used to modify things under /etc which they, rightfully, consider
 their domain.

I think for the scenarios you describe it makes far, far more sense
to add the ability to easily monitor for two things:
* the on-disk configuration isn't the same as the currently loaded one (not
  trivially possible yet)
* Configuration variables only come from locations that are approved for
  in your scenario (Already possible, we might want to make it even easier)
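The first monitoring idea above can be sketched roughly as follows. The parser is a deliberately simplified, hypothetical stand-in for real postgresql.conf parsing (no include files, quoting edge cases, or continuations), and the setting values are invented:

```python
def parse_conf(text):
    """Parse a postgresql.conf-style 'name = value' file (simplified:
    ignores include directives, quoting edge cases, and continuations)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip().strip("'")
    return settings

def config_drift(on_disk, loaded):
    """Settings whose on-disk value differs from the currently loaded one."""
    return {name: (loaded.get(name), value)
            for name, value in on_disk.items()
            if loaded.get(name) != value}

conf = """
shared_buffers = '512MB'   # edited, but the server was not restarted
work_mem = '4MB'
"""
# In practice "loaded" would come from pg_settings; these values are invented.
loaded = {"shared_buffers": "128MB", "work_mem": "4MB"}
print(config_drift(parse_conf(conf), loaded))
# {'shared_buffers': ('128MB', '512MB')}
```

A check_postgres-style probe built on this comparison would flag exactly the "edited but never reloaded" state the thread is worried about.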

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 I was one who sent a bug report - this error is not too dangerous, but it
 is hidden and difficult to find if you don't know what can happen.
 Same as the bug with plpgsql and SQL identifier collisions: if you
 understand it, you can protect yourself well and simply; if not, it is a
 magic error. So I still think the best solutions are

 a) a warning when detect ORDER BY in variadic aggregates

Such a warning would never be tolerated by users, because it would appear
even when the query is perfectly correct.

 b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
 it only in WITHIN GROUP syntax where is safe ,

And we're *not* inventing randomly different syntax for variadic
aggregates.  That ship sailed when we did it this way for regular
functions.

regards, tom lane




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Robert Haas
On Thu, Aug 29, 2013 at 9:12 PM, Stephen Frost sfr...@snowman.net wrote:
 You will not find me arguing to allow that in normal operation, or most
 direct-catalog hacking.  I'd be much happier if we had all the ALTER,
 etc, options necessary to prevent any need to ever touch the catalogs.
 Similarly, it'd be nice to have more permission options which reduce
 the need for anyone to be superuser.

Sure, I agree.  But you can't burden this patch with the job of
conforming to what you and I think the project policy ought to be, but
isn't.  Right now, as things stand today, the superuser is one step
short of God Almighty.  The only way in which we even attempt to
restrict the superuser is to try to keep her from hijacking the
postgres login account.  But we don't even try to do that very
thoroughly, as has recently been pointed out on other threads; there
are several plausible attack vectors for her to do just that.  This
patch fits right into that existing philosophy.  If we take this
patch, and I think we should, and later change our policy, then the
permissions around this may require adjustment to fit the new policy.
Fine!  But that policy change is properly a separate discussion.

 Now, you can argue that people are more likely to render the database
 nonfunctional by changing GUC settings than they are to do it by
 corrupting the system catalogs, but I'm not sure I believe it.  We
 can, after all, validate that any setting a user supplies is a valid
 value for that setting before writing it out to the configuration
 file.  It might still make the system fail to start if - for example -
 it causes too many semaphores to be reserved, or something like that.
 But why should we think that such mistakes will be common?  If
 anything, it sounds less error-prone to me than hand-editing the
 configuration file, where typing something like on_exit_error=maybe
 will make the server fail to start.  We can eliminate that whole
 category of error, at least for people who choose to use the new
 tools.

 That category of error is much more likely to get caught by the sysadmin
 doing the restart after the update..

If you edit postgresql.conf manually today, you will have no warning
of ANY sort of error you might make.  With a feature like this, we can
catch a large subset of things you might do wrong *before we even
modify the file*.  Yes, there will be some that slip through, but an
imperfect mechanism for catching mistakes is better than no mechanism
at all.
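As a rough illustration of validate-before-write (the validator is a hypothetical, loose imitation of guc.c-style boolean parsing, not the server's actual code, and the in-memory "auto.conf" is a stand-in for a real file), the on_exit_error=maybe typo from above gets rejected before anything is written:

```python
# Hypothetical boolean GUC validator; the real server validates against
# each parameter's registered type before accepting a value.
BOOL_TRUE = {"on", "true", "yes", "1"}
BOOL_FALSE = {"off", "false", "no", "0"}

def validate_bool_guc(name, value):
    v = value.strip().lower()
    if v not in BOOL_TRUE and v not in BOOL_FALSE:
        raise ValueError(f'invalid value for parameter "{name}": "{value}"')
    return v

def alter_system_set(auto_conf_lines, name, value):
    """Validate first, then rewrite the (hypothetical) auto.conf contents.
    A typo like on_exit_error=maybe is rejected before the file changes."""
    v = validate_bool_guc(name, value)
    kept = [l for l in auto_conf_lines if not l.startswith(name + " ")]
    return kept + [f"{name} = '{v}'"]

print(alter_system_set([], "on_exit_error", "off"))
try:
    alter_system_set([], "on_exit_error", "maybe")
except ValueError as e:
    print(e)   # rejected up front; hand-editing would only fail at restart
```

The contrast with hand-editing is the point: the error surfaces at SET time, not at the next restart.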

 ALTER SYSTEM doesn't provide any
 way to restart the DB to make sure it worked.
 That, in turn, encourages
 modification of the config parameters w/o a restart, which is a pretty
 bad idea, imv.

vi doesn't provide any way to restart the DB to make sure it worked,
either, and never will.  However, ALTER SYSTEM could be extended in
exactly that way: we could have ALTER SYSTEM RESTART.  Of course some
people are likely to argue that's a bad idea, so we'd have to have
that discussion and think carefully about the issues, but there's
nothing intrinsic that restricts us from giving ALTER SYSTEM any
arbitrary set of capabilities we might want it to have.  Even if we
don't ever do that, we're no worse off than today.

Well, with one exception.  If we make it easier to modify the
configuration file, then people are more likely to do it, and thus
more likely to do it wrong.  But you could use that argument against
any change that improves ease-of-use.  And ease-of-use is a feature,
not a bug.

 If you're using the term dangerous to refer to a security exposure,
 I concede there is some incremental exposure from allowing this by
 default.  But it's not a terribly large additional exposure.  It's
 already the case that if adminpack is available the super-user can do
 whatever he or she wants, because she or he can write to arbitrary
 files inside the data directory.

 This is only true if -contrib is installed, which a *lot* of admins
 refuse to do, specifically because of such insane modules as this.
 Therefore, I would argue that it's not nearly as small a change wrt
 exposure as you believe.  Strictly speaking, I don't believe it's a huge
 security hole or something, but I do think it's going to surprise admins
 who aren't following the lists.

If we take the patch, it will be in the release notes, just like any
other feature.  I suspect there will be more people *pleasantly*
surprised than anything else.  If there's anything I've learned about
ease-of-use as a 10+-year veteran of PostgreSQL, it's that being able
to do everything via a database connection is really, really, really
convenient.  That's why foreign data wrappers are awesome.  Instead of
installing a foreign data wrapper for Oracle and pointing it at an
Oracle server and then sucking data down over the link to store or
process or whatever, you put all that logic in a client which knows
how to talk to both PostgreSQL and Oracle.  It turns out, though, that
having that capability *in the server* is really 

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-08-30 08:48:21 -0400, Stephen Frost wrote:
  I really just don't buy that- I've already put forward suggestions for
  how to deal with it, but no one here seems to understand the
  distinction.  Modifying listen_addresses through ALTER SYSTEM is akin to
  ISC/bind allowing changes to its listen_addresses equivalent through
  dynamic DNS updates.  Would it be possible to implement?  Sure.  Does it
  make any sense?  Certainly not.
 
 I very much want to change stuff like wal_level, listen_addresses and
 shared_buffers via ALTER SYSTEM. Configuration variables like that
 (PGC_POSTMASTER stuff mostly) are the prime reason why you actually need
 to change postgresql.conf instead of changing per user/database
 settings.

wal_level and shared_buffers I can buy, but listen_addresses?  The most
typical change there is going from localhost - '*', but you've got to
be on the box to do that.  Anything else and you're going to need to be
adding interfaces to the box anyway and hacking around in
/etc/network/interfaces or what-have-you.

  Because we've got crap mixed into postgresql.conf which are bootstrap
  configs needed to get the system started.  Those things, in my view
  anyway, fall much more into the category of resources which should be
  managed outside the database than pg_hba.conf.
 
 I think the problem with your position in this thread is that you want
 to overhaul the way our configuration works in a pretty radical
 way. Which is fair enough, there certainly are deficiencies. But it's
 not the topic of this thread.

You and Robert both seem to be of the opinion that this hack which
brings postgresql.conf into the database via ALTER SYSTEM is a-ok
because it's moving us forward in someone's mind, but it really is
developing a system configuration management system which *looks* like a
first-class citizen when it actually falls woefully short of that.

There is absolutely no question in my mind that this will be a huge
support pain, from the first "ALTER SYSTEM SET shared_buffers = blah;
SHOW shared_buffers;" to the "why can't my database start?!?  it's
complaining it can't allocate memory but I keep changing postgresql.conf
and nothing works!"  I'm simply not convinced that this is moving us
forward nor that we will end up with more benefit than pain from it.

Thanks,

Stephen




Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)

2013-08-30 Thread k...@rice.edu
On Fri, Aug 30, 2013 at 03:22:37AM +0300, Ants Aasma wrote:
 On Fri, Aug 30, 2013 at 3:02 AM, Andres Freund and...@2ndquadrant.com wrote:
  I am not sure hot cache large buffer performance is really the
  interesting case. Most of the XLogInsert()s are pretty small in the
  common workloads. I vaguely recall trying 8 and getting worse
  performance on many workloads, but that might have been a problem of my
  implementation.
 
 Slice-by-8 doesn't have any overhead for small buffers besides the
 lookup tables, so it most likely the cache misses that were the issue.
 Murmur3, CityHash and SpookyHash don't have any lookup tables and are
 excellent with small keys. Especially CityHash, 64 byte hash is quoted
 at 9ns.
 
  The reason I'd like to go for a faster CRC32 implementation as a first
  step is that it's easy. Easy to verify, easy to analyze, easy to
  backout. I personally don't have enough interest/time in the 9.4 cycle
  to purse conversion to a different algorithm (I find the idea of using
  different ones on 32/64bit pretty bad), but I obviously won't stop
  somebody else ;)
 
 I might give it a shot later this cycle as I have familiarized my self
 with the problem domain anyway. I understand the appeal of staying
 with what we have, but this would cap the speedup at 4x and has large
 caveats with the extra lookup tables. A 28x speedup might be worth the
 extra effort.
 
 Regards,
 Ants Aasma
 

You may want to also check out xxhash with a BSD License and very fast
32-bit performance as well:

http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html
http://code.google.com/p/xxhash/

FWIW I agree that a much faster function would be better for CPU overhead.
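For readers unfamiliar with table-driven CRCs, here is a byte-at-a-time sketch of the scheme under discussion, checked against zlib's CRC-32. Slice-by-8 generalizes it to eight 256-entry tables folding 8 bytes per iteration; those larger tables are the cache-pressure concern for small inputs. (The WAL's CRC variant differs in detail; this uses the zlib polynomial purely so the result is verifiable.)

```python
import zlib

# One 256-entry lookup table for reflected CRC-32 (polynomial 0xEDB88320,
# the zlib variant).
def make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()

def crc32_by_byte(data, crc=0):
    """Byte-at-a-time table lookup: one table access per input byte.
    Slice-by-8 uses eight such tables (8 KB total) to fold 8 bytes per
    iteration -- faster on large buffers, but those 8 KB compete for
    cache on short, frequent XLogInsert()-sized inputs."""
    c = crc ^ 0xFFFFFFFF
    for b in data:
        c = TABLE[(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF

assert crc32_by_byte(b"full-page image") == zlib.crc32(b"full-page image")
```

Hash functions like xxhash or CityHash avoid the tables entirely, which is why they shine on small keys, at the cost of giving up CRC's burst-error guarantees.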

Regards,
Ken




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-08-29 21:26:48 -0400, Stephen Frost wrote:
  It's not the OPs guy that I'm worried about using ALTER SYSTEM- I don't
  expect them to have any clue about it or care about it, except where it
  can be used to modify things under /etc which they, rightfully, consider
  their domain.
 
 I think for the scenarios you describe it makes far, far more sense
 to add the ability to easily monitor for two things:
 * the on-disk configuration isn't the same as the currently loaded one (not
   trivially possible yet)

That would certainly help.  I don't know that it needs to be technically
'trivial', but at least having check_postgres.pl or something which can
alert on it would be an improvement.  Clearly, that would be valuable
today (assuming it doesn't already exist somewhere..  it might).

 * Configuration variables only come from locations that are approved for
   in your scenario (Already possible, we might want to make it even easier)

That's an interesting notion; do you have something specific in mind?
The easiest, imv anyway, would be that options set in postgresql.conf
can't be overridden, but that gets us into the bootstrap problem that
people seem to be concerned about.  It would also be a change to how
postgresql.conf is parsed today which some people would be annoyed by.
Having some configuration option which says what can be modified by
alter system doesn't strike me as a terribly good solution either.

Thanks,

Stephen




Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-08-30 Thread Tom Lane
wangs...@highgo.com.cn writes:
In order to implement ENABLE/DISABLE CONSTRAINT NAME, I made a few
modifications to the code.
First, people often build the constraints while building the table, and
inserting data must then follow a certain order; people usually want to
insert the data without being affected by foreign keys or checks.
Second, a check or foreign key constraint wastes much time while
inserting data into the table.
For the above reasons, I implemented this command.

Uh ... why not just drop the constraint, and re-add it later if you want
it again?  This seems like adding a lot of mechanism (and possible bugs)
for a rather marginal use-case.

regards, tom lane




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Andres Freund
On 2013-08-30 09:19:42 -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2013-08-30 08:48:21 -0400, Stephen Frost wrote:
   I really just don't buy that- I've already put forward suggestions for
   how to deal with it, but no one here seems to understand the
   distinction.  Modifying listen_addresses through ALTER SYSTEM is akin to
   ISC/bind allowing changes to its listen_addresses equivalent through
   dynamic DNS updates.  Would it be possible to implement?  Sure.  Does it
   make any sense?  Certainly not.
  
  I very much want to change stuff like wal_level, listen_addresses and
  shared_buffers via ALTER SYSTEM. Configuration variables like that
  (PGC_POSTMASTER stuff mostly) are the prime reason why you actually need
  to change postgresql.conf instead of changing per user/database
  settings.
 
 wal_level and shared_buffers I can buy, but listen_addresses?  The most
 typical change there is going from localhost - '*', but you've got to
 be on the box to do that.  Anything else and you're going to need to be
 adding interfaces to the box anyway and hacking around in
 /etc/network/interfaces or what-have-you.

Even if it requires to be on the box locally, I only need libpq for
it. And it's not infrequent to allow additional, already configured
interfaces. And even if not, what's the point of prohibiting
listen_interfaces specifically? When the other very interesting
variables have the same dangers?
Doing this on a variable-by-variable basis will a) be a ridiculous
amount of effort, b) confuse users who may not share our judgement of
individual variables.

   Because we've got crap mixed into postgresql.conf which are bootstrap
   configs needed to get the system started.  Those things, in my view
   anyway, fall much more into the category of resources which should be
   managed outside the database than pg_hba.conf.
  
  I think the problem with your position in this thread is that you want
  to overhaul the way our configuration works in a pretty radical
  way. Which is fair enough, there certainly are deficiencies. But it's
  not the topic of this thread.
 
 You and Robert both seem to be of the opinion that this hack which
 brings postgresql.conf into the database via ALTER SYSTEM is a-ok
 because it's moving us forward in someone's mind, but it really is
 developing a system configuration management system which *looks* like a
 first-class citizen when it actually falls woefully short of that.

It's what plenty of people want and it doesn't hurt people who do not
want it. Yes. I think that's a step forward.

 There is absolutely no question in my mind that this will be a huge
 support pain, from the first "ALTER SYSTEM SET shared_buffers = blah;
 SHOW shared_buffers;" to the "why can't my database start?!?  it's
 complaining it can't allocate memory but I keep changing postgresql.conf
 and nothing works!"  I'm simply not convinced that this is moving us
 forward nor that we will end up with more benefit than pain from it.

That will not show the changed shared_buffers. And it (afair) will throw
a WARNING that shared_buffers couldn't be adjusted at this point.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Robert Haas
On Thu, Aug 29, 2013 at 9:26 PM, Stephen Frost sfr...@snowman.net wrote:
 In the first place, modifying postgresql.conf and not immediately
 restarting the server to test your changes is probably the single most
 basic DBA error imaginable.

 You're not modifying postgresql.conf with ALTER SYSTEM, now are you?
 Admins are going to realize the need to restart or at least reload the
 service after updating such, but a DBA who has focused mainly on
 normalization, query optimization, etc, isn't going to have the same
 experience that a sysadmin has.

I refuse to reject any feature on the basis of the possibility that
people might use it without reading the documentation.  Nearly
nothing anyone will ever propose is so idiot-proof that we can't
conceive of a scenario in which someone shoots their foot off with it.
The question is whether the usefulness of the feature exceeds, by a
sufficient amount, the harm it's likely to do, and I think the answer
is clearly yes.

 A DBA updating a GUC, something they are likely to do frequently
 day-in and day-out, isn't necessairly going to consider that it's a
 reboot-needing change.  Indeed, it's not unreasonable to consider that
 we may, some day, actually be able to change or increase
 shared_buffers on the fly (or error in the trying); I'd think your
 work with the dynamic backends would actually make that likely to
 happen sooner rather than later.

I wouldn't hold my breath.  We have way too many separate,
variable-length data structures in shared memory.  Increasing
shared_buffers means making the lwlock array bigger, making the buffer
header array bigger, allocating more space for the buffer mapping
hash, etc.  I doubt we'd accept a design that involves each of those
data structures being a separate shared memory segment, but if they're
all in the same segment one after another, then you can't easily
extend them on the fly.
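A toy model (all per-entry sizes are invented) of why those back-to-back variable-length arrays resist in-place growth: enlarging the first array moves the start offset of everything after it.

```python
def layout(nbuffers, sizes=(("buffer headers", 64),
                            ("lwlocks", 16),
                            ("buffer blocks", 8192))):
    """Toy contiguous shared-memory layout (made-up per-entry sizes):
    each array's start offset depends on everything placed before it."""
    offsets, off = {}, 0
    for name, per_buffer in sizes:
        offsets[name] = off
        off += nbuffers * per_buffer
    return offsets

small, big = layout(1000), layout(2000)
# Growing nbuffers relocates every array after the first, so an in-place
# resize would invalidate any pointer into the old layout:
moved = [name for name in small if small[name] != big[name]]
print(moved)  # ['lwlocks', 'buffer blocks']
```

Separate segments per array would avoid the relocation, but then you inherit the unmap/remap failure modes described above.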

There are also a lot of problems around unmapping-and-remapping a
segment.  If the remap fails, it's going to be at least a FATAL, but
if you had any shared memory state in the unmapped segment (like a
buffer pin) that has to be unwound, you have to PANIC; there's no
other way to fix it.  My initial design for dynamic shared memory
allowed for an unbounded number of dynamic shared memory segments by
growing the control segment on the fly.  However, this introduced
PANIC hazards in corner cases that I couldn't get rid of, so now
there's a fairly generous but fixed-at-startup-time limit on how many
segments you can have.  In practice I don't think this matters much,
but it was a sobering reminder that the main shared memory segment,
with all of its inflexibility, has important reliability properties
that are hard to replicate in more dynamic scenarios.

 It's not the OPs guy that I'm worried about using ALTER SYSTEM- I don't
 expect them to have any clue about it or care about it, except where it
 can be used to modify things under /etc which they, rightfully, consider
 their domain.

Under the currently-proposed design, it can't be used to do any such
thing.  It can only be used to modify some auto.conf file which lives
in $PGDATA.  It's therefore no different from the ops perspective than
ALTER DATABASE or ALTER USER - except that it allows all settings to
be changed rather than only a subset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread David Johnston
Tom Lane-2 wrote
 Pavel Stehule pavel.stehule@ writes:
 I was one who sent a bug report - this error is not too dangerous, but it
 is hidden and difficult to find if you don't know what can happen.
 Same as the bug with plpgsql and SQL identifier collisions: if you
 understand it, you can protect yourself well and simply; if not, it is a
 magic error. So I still think the best solutions are
 
 a) a warning when detect ORDER BY in variadic aggregates
 
 Such a warning would never be tolerated by users, because it would appear
 even when the query is perfectly correct.
 
 b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
 it only in WITHIN GROUP syntax where is safe ,
 
 And we're *not* inventing randomly different syntax for variadic
 aggregates.  That ship sailed when we did it this way for regular
 functions.

In the example case the problem is that ORDER BY constant is a valid, if
not-very-useful, construct.  Can we warn on this specific usage and thus
mitigate many of the potential avenues of mis-use?

If we alter syntax for mitigation purposes I'd want to consider requiring
parentheses around the columns that belong to the ORDER BY instead of using
the full extended syntax of WITHIN GROUP.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769106.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Robert Haas
On Fri, Aug 30, 2013 at 8:53 AM, Stephen Frost sfr...@snowman.net wrote:
 Yes, one of the issues with the existing system is that you can't
 specify a default to be applied to new roles.  Also, there are
 parameters which are not per-role yet which it probably makes sense to
 be in the database and we'd need a way to deal with that.  Although, at
 the same time, considering it role based does make for a nice
 distinction.  Unfortunately, those clamoring for this will argue that
 it wouldn't replace adminpack and that they couldn't use it to modify
 their /etc/network/interfaces file, which is the obvious next step to
 all of this.

This is a straw-man.  adminpack doesn't allow reading or writing files
outside of the configured data and log directories, as a security
precaution.  But suppose it did.  If your permissions on
/etc/network/interfaces allow the postgres user to modify it, then you
pretty much deserve exactly what you get.  Likening this to the
feature being proposed is silly.  What is being asked for here is the
ability to easily modify system-wide settings from the SQL prompt,
just as today you can modify settings per-user or per-database.
That's not the same thing as rewriting the entire system
configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-08-30 09:19:42 -0400, Stephen Frost wrote:
  wal_level and shared_buffers I can buy, but listen_addresses?  The most
  typical change there is going from localhost - '*', but you've got to
  be on the box to do that.  Anything else and you're going to need to be
  adding interfaces to the box anyway and hacking around in
  /etc/network/interfaces or what-have-you.
 
 Even if it requires being on the box locally, I only need libpq for
 it. And it's not infrequent to allow additional, already configured
 interfaces. And even if not, what's the point of prohibiting
 listen_interfaces specifically, when the other very interesting
 variables have the same dangers?

I'd like to see those dangers removed from the other very interesting
variables.  We're making progress towards that, for example with
shared_buffers.  I've commented on that already up-thread.  Hell, you
could even make things such that PG would start w/ a misconfigured
listen_addresses- but if we don't like that then I would argue that it
shouldn't be included here.  I'm not entirely sure how wal_level has the
same danger as the others..

 Doing this on a variable-by-variable basis will a) be a ridiculous
 amount of effort, b) confuse users who may not share our judgement of
 individual variables.

I don't think the effort involved is nearly as much as you claim, and we
already have the issue that users don't like our choices about what can
be reconfigured on the fly and what can't (perhaps they understand that
there are technical challenges for some of them, but that doesn't make
them agree with those choices).

  You and Robert both seem to be of the opinion that this hack which
  brings postgresql.conf into the database via ALTER SYSTEM is a-ok
  because it's moving us forward in someone's mind, but it really is
  developing a system configuration management system which *looks* like a
  first-class citizen when it actually falls woefully short of that.
 
 It's what plenty of people want and it doesn't hurt people who do not
 want it. Yes. I think that's a step forward.

It will be quite interesting to see if people decide they really wanted
this once they actually *get* it.

  There is absolutely no question in my mind that this will be a huge
  support pain, from the first ALTER SYSTEM SET shared_buffers = blah;
  SHOW shared_buffers; to the why can't my database start?!?  it's
  complaining it can't allocate memory but I keep changing postgresql.conf
  and nothing works!  I'm simply not convinced that this is moving us
  forward nor that we will end up with more benefit than pain from it.
 
 That will not show the changed shared_buffers. And it (afair) will throw
 a WARNING that shared_buffers couldn't be adjusted at this point.

Not showing the change is what I was getting at.  As has been said
elsewhere, throwing a warning on every interesting invocation of a
command can speak to it not being exactly 'ready'.

Thanks,

Stephen




Re: [HACKERS] Add pgbench option: CHECKPOINT before starting benchmark

2013-08-30 Thread Tom Lane
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes:
 My patch's option definition is here.
 [mitsu-ko@localhost pgbench]$ ./pgbench --help
 ~
 -N, --no-checkpoint  do not run CHECKPOINT after initialization
 ~
 In the latest committed pgbench, -N is --skip-some-updates  skip updates of
 pgbench_tellers and pgbench_branches. But I cannot understand why -N was
 chosen for that option, so I reassigned it to -u and made -N the
 do-not-run-CHECKPOINT option.

 What do you think?

I think it's a bad idea to change the meaning of a pre-existing option
letter, and a worse idea to change pgbench's default behavior.  Instead,
have the option be --checkpoint  do a CHECKPOINT after initialization.

It doesn't look like there's any free single-letter option that goes
nicely with checkpoint either, but that's life.

regards, tom lane




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Andres Freund
On 2013-08-30 09:43:01 -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2013-08-30 09:19:42 -0400, Stephen Frost wrote:
   wal_level and shared_buffers I can buy, but listen_addresses?  The most
   typical change there is going from localhost - '*', but you've got to
   be on the box to do that.  Anything else and you're going to need to be
   adding interfaces to the box anyway and hacking around in
   /etc/network/interfaces or what-have-you.
  
  Even if it requires being on the box locally, I only need libpq for
  it. And it's not infrequent to allow additional, already configured
  interfaces. And even if not, what's the point of prohibiting
  listen_interfaces specifically, when the other very interesting
  variables have the same dangers?
 
 I'd like to see those dangers removed from the other very interesting
 variables.  We're making progress towards that, for example with
 shared_buffers.  I've commented on that already up-thread.  Hell, you
 could even make things such that PG would start w/ a misconfigured
 listen_addresses- but if we don't like that then I would argue that it
 shouldn't be included here.

Yes, we're making progress. But that's an independent thing, partially
constrained by implementation complexity, partially constrained by cross
platform worries.
You're free to work on reducing the dangers around other variables, but
I don't understand in the very least why that should stop this feature.

 I'm not entirely sure how wal_level has the same danger as the
 others..

Try setting max_wal_senders = 10 and wal_level = minimal.

   There is absolutely no question in my mind that this will be a huge
   support pain, from the first ALTER SYSTEM SET shared_buffers = blah;
   SHOW shared_buffers; to the why can't my database start?!?  it's
   complaining it can't allocate memory but I keep changing postgresql.conf
   and nothing works!  I'm simply not convinced that this is moving us
   forward nor that we will end up with more benefit than pain from it.
  
  That will not show the changed shared_buffers. And it (afair) will throw
  a WARNING that shared_buffers couldn't be adjusted at this point.
 
 Not showing the change is what I was getting at.  As has been said
 elsewhere, throwing a warning on every interesting invocation of a
 command can speak to it not being exactly 'ready'.

Then PG already isn't ready, because it has behaved that way for a long
time. Set some PGC_POSTMASTER variable and reload the configuration.

Guiding a user to perform a required further action doesn't imply
non-readiness in the least.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 there's a fairly generous but fixed-at-startup-time limit on how many
 segments you can have.  In practice I don't think this matters much,
 but it was a sobering reminder that the main shared memory segment,
 with all of its inflexibility, has important reliability properties
 that are hard to replicate in more dynamic scenarios.

 Why wouldn't it be possible to have the same arrangement for
 shared_buffers, where you have more entries than you 'need' at startup
but which then allows you to add more shared segments later?  I do see
that we would need an additional bit of indirection to handle that,
which might be too much overhead, but the concept seems possible.  Isn't
that more-or-less how the kernel handles dynamic memory..?

 Under the currently-proposed design, it can't be used to do any such
 thing.  It can only be used to modify some auto.conf file which lives
 in $PGDATA.  It's therefore no different from the ops perspective than
 ALTER DATABASE or ALTER USER - except that it allows all settings to
 be changed rather than only a subset.

Claiming that modifying a file *included from a file in /etc* doesn't
modify things under /etc is disingenuous, imv.

Thanks,

Stephen




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Robert Haas
On Fri, Aug 30, 2013 at 9:19 AM, Stephen Frost sfr...@snowman.net wrote:
 You and Robert both seem to be of the opinion that this hack which
 brings postgresql.conf into the database via ALTER SYSTEM is a-ok
 because it's moving us forward in someone's mind, but it really is
 developing a system configuration management system which *looks* like a
 first-class citizen when it actually falls woefully short of that.

It's not a system configuration management system, and it doesn't
pretend to be.  It's an analogue of ALTER USER and ALTER DATABASE that
papers over their shortcomings, and a safer alternative to adminpack's
kludgy way of enabling the same functionality.

 There is absolutely no question in my mind that this will be a huge
 support pain, from the first ALTER SYSTEM SET shared_buffers = blah;
 SHOW shared_buffers;

They'd have the same problem with ALTER USER SET work_mem = blah.
You're acting like ALTER commands that don't take immediate effect are
something brand-new, but we've had them for years.

  to the why can't my database start?!?  it's
 complaining it can't allocate memory but I keep changing postgresql.conf
 and nothing works!

As of 9.3, that failure mode doesn't really happen any more, unless
you maybe set shared_buffers equal to 100% of system memory.

 I'm simply not convinced that this is moving us
 forward nor that we will end up with more benefit than pain from it.

Fair enough, but I'm not convinced that we'll derive any pain at all
from it.  The existing similar features haven't been a notable source
of complaints AFAIK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Andres Freund
On 2013-08-30 06:34:47 -0700, David Johnston wrote:
 Tom Lane-2 wrote
  I was one who sent a bug report - this error is not too dangerous, but it
  is hidden, and difficult to find, if you don't know what can be happen.
  Same as bug with plpgsql and SQL identifier collisions. If you
  understand,
  then you can protect self well and  simply. If not, then it is a magic
  error. So still I am thing so best solution is
 
  a) a warning when detect ORDER BY in variadic aggregates
 
  Such a warning would never be tolerated by users, because it would appear
  even when the query is perfectly correct.
 
  b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
  it only in WITHIN GROUP syntax where is safe ,
 
  And we're *not* inventing randomly different syntax for variadic
  aggregates.  That ship sailed when we did it this way for regular
  functions.

 In the example case the problem is that ORDER BY constant is a valid, if
 not-very-useful, construct.  Can we warn on this specific usage and thus
 mitigate many of the potential avenues of mis-use?

That doesn't help against something like »SELECT string_agg(somecol
ORDER BY bar, separator)« where separator is a column.

 If we alter syntax for mitigation purposes I'd want to consider requiring
 parentheses around the columns that belong to the ORDER BY instead of using
 the full extended syntax of WITHIN GROUP.

I think that ship has sailed. The syntax is there and it's not going
away. Requiring different syntaxes for variadic/nonvariadic usages is
going to be a way much bigger pitfall for users.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 Technically trivial in the sense that it should be queryable from SQL
 without having to write code in an untrusted PL ;).

hah.

 I guess storing the file modification date along with the file/location a
 GUC is originating from would be good enough. Then you could write a query
 using pg_stat_file() to make sure they are up2date.

Wouldn't you want to actually look at the GUC and see if the value is
different also?  Just knowing that postgresql.conf changed doesn't mean
you want every value to say it's different...  It's entirely possible
that the file was rewritten or touch'd but the configuration is
identical.

   * Configuration variables only come from locations that are approved for
     in your scenario (already possible, we might want to make it even easier)
  
  That's an interesting notion; do you have something specific in mind..?
  The easiest, imv anyway, would be that options set in postgresql.conf
  can't be overridden, but that gets us into the bootstrap problem that
  people seem to be concerned about.  It would also be a change to how
  postgresql.conf is parsed today which some people would be annoyed by.
  Having some configuration option which says what can be modified by
  alter system doesn't strike me as a terribly good solution either.
 
 I think changing the precedence of options in postgresql.conf has about
 zero chance.

Sadly, you're probably right.

 That would make it possible to easily write a query, working across
 installations, that warns about any values stored in auto.conf, even if
 they are overwritten by a per-user config or similar.

I had the impression that 'approved for' above meant something which
actually *enforced* it rather than just another monitoring check..

Thanks,

Stephen




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Robert Haas
On Fri, Aug 30, 2013 at 9:49 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 there's a fairly generous but fixed-at-startup-time limit on how many
 segments you can have.  In practice I don't think this matters much,
 but it was a sobering reminder that the main shared memory segment,
 with all of its inflexibility, has important reliability properties
 that are hard to replicate in more dynamic scenarios.

 Why wouldn't it be possible to have the same arrangement for
 shared_buffers, where you have more entries than you 'need' at startup
 but which then allows you to add more shared segments later?  I do see
 that we would need an additional bit of indirection to handle that,
 which might be too much overhead, but the concept seems possible.  Isn't
 that more-or-less how the kernel handles dynamic memory..?

Yeah, I think that something like that would be possible, but the
synchronization would be tricky, and it wouldn't work at all with
System V shared memory or Windows shared memory, which apparently
can't be resized after creation.  Also, it would rely on the system
administrator having a sensible setting for
max_on_the_fly_shared_buffers and only having the wrong setting for
initial_shared_buffers_until_i_change_it_later, which might not cover
a very high percentage of the cases that arise in practice.

 Under the currently-proposed design, it can't be used to do any such
 thing.  It can only be used to modify some auto.conf file which lives
 in $PGDATA.  It's therefore no different from the ops perspective than
 ALTER DATABASE or ALTER USER - except that it allows all settings to
 be changed rather than only a subset.

 Claiming that modifying a file *included from a file in /etc* doesn't
 modify things under /etc is disingenuous, imv.

I think you're getting way too hung up on the fact that the proposed
auto.conf will be stored as a flat file.  From your comments upthread,
I gather that you'd be rejoicing if it were a table.  The only reason
we're not doing that is because of the possibility that something in
auto.conf might keep the server from starting.  I don't think that's
gonna be very common now that we're mostly out from under the System V
shared memory limits, but we'll see.  As you point out, it might also
be worth fixing: maybe we can find some way to arrange things so that
the server will always be able to start up regardless of how badly
messed-up auto.conf is... but that's a different patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Aug 30, 2013 at 8:53 AM, Stephen Frost sfr...@snowman.net wrote:
  Yes, one of the issues with the existing system is that you can't
  specify a default to be applied to new roles.  Also, there are
  parameters which are not per-role yet which it probably makes sense to
  be in the database and we'd need a way to deal with that.  Although, at
  the same time, considering it role based does make for a nice
  distinction.  Unfortunately, those clamoring for this will argue that
  it wouldn't replace adminpack and that they couldn't use it to modify
  their /etc/network/interfaces file, which is the obvious next step to
  all of this.
 
 This is a straw-man.  

It was intended to be and I wasn't particularly expecting a response to
the sarcasm, at the same time..

 Likening this to the
 feature being proposed is silly.  What is being asked for here is the
 ability to easily modify system-wide settings from the SQL prompt,
 just as today you can modify settings per-user or per-database.

/etc/network/interfaces would be considered part of the system by
some.. :)  PG isn't an OS, yet..

 That's not the same thing as rewriting the entire system
 configuration.

And here you're using the *other* meaning of system, no?  Glad there
won't be any confusion here! ;)  (yes, I'm kidding..).

Thanks,

Stephen




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Tom Lane
David Johnston pol...@yahoo.com writes:
 If we alter syntax for mitigation purposes I'd want to consider requiring
 parentheses around the columns that belong to the ORDER BY instead of
 using the full extended syntax of WITHIN GROUP.

Unfortunately, that ORDER BY syntax is specified by the SQL standard,
and they didn't say anything about parentheses.  We don't get to
require parens there.

The particular case that's standardized is only array_agg():

array aggregate function ::=
  ARRAY_AGG
  left paren value expression [ ORDER BY sort specification list ]
  right paren

but, as we customarily do, we didn't restrict the feature to be used only
with that aggregate.

regards, tom lane




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread David Johnston
Andres Freund-3 wrote
 On 2013-08-30 06:34:47 -0700, David Johnston wrote:
 Tom Lane-2 wrote
  I was one who sent a bug report - this error is not too dangerous, but
 it
  is hidden, and difficult to find, if you don't know what can be
 happen.
  Same as bug with plpgsql and SQL identifier collisions. If you
  understand,
  then you can protect self well and  simply. If not, then it is a magic
  error. So still I am thing so best solution is
 
  a) a warning when detect ORDER BY in variadic aggregates
 
  Such a warning would never be tolerated by users, because it would
 appear
  even when the query is perfectly correct.
 
  b) disallow ORDER BY in variadic aggregates in classic syntax, and
 enable
  it only in WITHIN GROUP syntax where is safe ,
 
  And we're *not* inventing randomly different syntax for variadic
  aggregates.  That ship sailed when we did it this way for regular
  functions.

 In the example case the problem is that ORDER BY constant is a valid, if
 not-very-useful, construct.  Can we warn on this specific usage and thus
 mitigate many of the potential avenues of mis-use?
 
 That doesn't help against something like »SELECT string_agg(somecol
 ORDER BY bar, separator)« where separator is a column.
 
 If we alter syntax for mitigation purposes I'd want to consider requiring
 parentheses around the columns that belong to the ORDER BY instead of
 using
 the full extended syntax of WITHIN GROUP.
 
 I think that ship has sailed. The syntax is there and it's not going
 away. Requiring different syntaxes for variadic/nonvariadic usages is
 going to be a way much bigger pitfall for users.

Neither suggestion (nor any suggestion I would imagine) is going to solve
the problem.  The goal is to minimize the size of the exposure.

For the second suggestion, ORDER BY (col1, col2), the syntax would be added
and recommended so that those using it would have less to worry about.  This
would apply to ALL invocations, not just variadic ones.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
Tom, all,

  I'm not one to give up a fight (I hope that's something y'all like
  about me ;), but in this case I'm gonna have to concede.  Clearly, I'm
  in the minority about this, at least on the lists and among the active
  hackers.  Let me just say that I hope all the happy users of this will
  outweigh the complaints.  I suppose there's only one way to find out.

Thanks!

Stephen

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Robert Haas (robertmh...@gmail.com) wrote:
  I think you're getting way too hung up on the fact that the proposed
  auto.conf will be stored as a flat file.  From your comments upthread,
  I gather that you'd be rejoicing if it were a table.  
 
  I'd be happy if it was a table which managed an *independent* set of
  parameters from those used to bootstrap the system, but no one seems to
  like breaking up the options between things that can be sanely modified
  without other OS changes and things which require OS support.
 
 I agree with Robert's comments upthread that if the new facility can't do
 everything that can be done today by editing postgresql.conf, it's not
 going to be adequate.  So I'm not in favor of having two sets of
 parameters.  It's also not clear to me that we can make a reliable
 distinction between parameters that can prevent a server restart vs those
 that can't; or at least, the set of the latter will be much smaller than
 one could wish.
 
   regards, tom lane




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 I think you're getting way too hung up on the fact that the proposed
 auto.conf will be stored as a flat file.  From your comments upthread,
 I gather that you'd be rejoicing if it were a table.  

I'd be happy if it was a table which managed an *independent* set of
parameters from those used to bootstrap the system, but no one seems to
like breaking up the options between things that can be sanely modified
without other OS changes and things which require OS support.

Thanks,

Stephen




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Pavel Stehule
2013/8/30 David Johnston pol...@yahoo.com

 Andres Freund-3 wrote
  On 2013-08-30 06:34:47 -0700, David Johnston wrote:
  Tom Lane-2 wrote
   I was one who sent a bug report - this error is not too dangerous,
 but
  it
   is hidden, and difficult to find, if you don't know what can be
  happen.
   Same as bug with plpgsql and SQL identifier collisions. If you
   understand,
   then you can protect self well and  simply. If not, then it is a
 magic
   error. So still I am thing so best solution is
  
   a) a warning when detect ORDER BY in variadic aggregates
  
   Such a warning would never be tolerated by users, because it would
  appear
   even when the query is perfectly correct.
  
   b) disallow ORDER BY in variadic aggregates in classic syntax, and
  enable
   it only in WITHIN GROUP syntax where is safe ,
  
   And we're *not* inventing randomly different syntax for variadic
   aggregates.  That ship sailed when we did it this way for regular
   functions.
 
  In the example case the problem is that ORDER BY constant is a valid, if
  not-very-useful, construct.  Can we warn on this specific usage and thus
  mitigate many of the potential avenues of mis-use?
 
  That doesn't help against something like »SELECT string_agg(somecol
  ORDER BY bar, separator)« where separator is a column.
 
  If we alter syntax for mitigation purposes I'd want to consider
 requiring
  parentheses around the columns that belong to the ORDER BY instead of
  using
  the full extended syntax of WITHIN GROUP.
 
  I think that ship has sailed. The syntax is there and it's not going
  away. Requiring different syntaxes for variadic/nonvariadic usages is
  going to be a way much bigger pitfall for users.

 Neither suggestion (nor any suggestion I would imagine) is going to solve
 the problem.  The goal is to minimize the size of the exposure.

 For the second ORDER BY (col1, col2) suggestion it would be added and
 recommended so those using that syntax would have less to worry about.
  This
 would apply to ALL invocations, not just variadic.


You probably cannot use this syntax, because (col1, col2) is the same as
ROW(col1, col2), and that syntax is already supported. So there is a
collision. I had the same idea, but I don't think it is possible.

Regards

Pavel



 David J.






 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-30 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 I think you're getting way too hung up on the fact that the proposed
 auto.conf will be stored as a flat file.  From your comments upthread,
 I gather that you'd be rejoicing if it were a table.  

 I'd be happy if it was a table which managed an *independent* set of
 parameters from those used to bootstrap the system, but no one seems to
 like breaking up the options between things that can be sanely modified
 without other OS changes and things which require OS support.

I agree with Robert's comments upthread that if the new facility can't do
everything that can be done today by editing postgresql.conf, it's not
going to be adequate.  So I'm not in favor of having two sets of
parameters.  It's also not clear to me that we can make a reliable
distinction between parameters that can prevent a server restart vs those
that can't; or at least, the set of the latter will be much smaller than
one could wish.

regards, tom lane




[HACKERS] Add database to PGXACT / per database vacuuming

2013-08-30 Thread Andres Freund
Hi,

For the logical decoding patch I added support for pegging
RecentGlobalXmin (and GetOldestXmin) to a lower value. To avoid causing
undue bloat & CPU overhead (hot pruning is friggin expensive) I split
RecentGlobalXmin into RecentGlobalXmin and RecentGlobalDataXmin, where
the latter is the xmin horizon used for non-shared, non-catalog
tables. That removed almost all overhead I could measure.

During that I was tinkering with the idea of reusing that split to
vacuum/prune user tables in a per-db fashion. In a very quick and hacky
test that sped up the aggregate performance of concurrent pgbenches in
different databases by about 30%. So, somewhat worthwhile ;).

The problem with that is that GetSnapshotData, which computes
RecentGlobalXmin, only looks at the PGXACT structures and not PGPROC
which contains the database oid. This is a recently added optimization
which made GetSnapshotData() quite a bit faster & scalable, which is
important given the frequency it's called at.

What about moving/copying the database oid from PGPROC to PGXACT?
Currently a single PGXACT is 12 bytes which means we a) have several
entries in a single cacheline b) have ugly sharing because we will have
PGXACTs split over more than one cacheline.
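To make the arithmetic concrete, a back-of-the-envelope sketch (the struct below is a simplified stand-in for PGXACT, not the real definition; the 12-byte size and 64-byte cacheline are the figures from the discussion above):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the ~12-byte PGXACT (not the real struct). */
typedef struct FakePGXACT
{
    uint32_t xid;         /* TransactionId */
    uint32_t xmin;        /* TransactionId */
    uint8_t  vacuumFlags;
    uint8_t  overflowed;
    uint8_t  delayChkpt;
    uint8_t  nxids;
} FakePGXACT;             /* 12 bytes */

#define CACHELINE 64

/* With 12-byte entries, five fit in one 64-byte line... */
int entries_per_line(void)
{
    return CACHELINE / (int) sizeof(FakePGXACT);
}

/* ...and the entry at index i may straddle a cacheline boundary,
 * which is the "ugly sharing" point (b) above. */
int straddles_cacheline(int i)
{
    int start = i * (int) sizeof(FakePGXACT);
    int end   = start + (int) sizeof(FakePGXACT) - 1;
    return (start / CACHELINE) != (end / CACHELINE);
}
```

Adding a 4-byte database Oid would pad the entry to 16 bytes: exactly four per line and no straddling, at the cost of fewer entries per cacheline scanned during GetSnapshotData().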

Comments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v5

2013-08-30 Thread Andres Freund
Hi,

I've attached a couple of the preliminary patches to $subject which I've
recently cleaned up in the hope that we can continue improving on those
in a piecemeal fashion.
I am preparing submission of a newer version of the major patch but
unfortunately progress on that is slower than I'd like...

In order of their chance of being applied individually, they are:

0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints 
are done
* benefits hot standby startup
0003 wal_decoding: Allow walsenders to connect to a specific database
* biggest problem is how to specify the connection we connect
  to. Currently with the patch walsender connects to a database if it's
  not named "replication" (via dbname). Perhaps it's better to invent a
  replication_dbname parameter?
0006 wal_decoding: copydir: move fsync_fname to fd.[c.h] and make it public
* Pretty trivial and boring.
0007 wal_decoding: Add information about a table's primary key to struct 
RelationData
* Could be used in the matview refresh code
0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new 
maximum for CommandCounterIncrement

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 3442c3a4e44c5a64efbe651b745a6f86f69cfdab Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 02/13] wal_decoding: Introduce InvalidCommandId and declare
 that to be the new maximum for CommandCounterIncrement

This is useful to be able to represent a CommandId that's invalid. There was no
such value before.

This decreases the possible number of subtransactions by one which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction (spare timetravel
access, but that's new anyway).
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 	(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-	 errmsg("cannot have more than 2^32-1 commands in a transaction")));
+	 errmsg("cannot have more than 2^32-2 commands in a transaction")));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.3.251.g1462b67
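Since InvalidCommandId is defined as ~(CommandId)0, i.e. 2^32-1, the last usable command id becomes 2^32-2, matching the updated error message. A minimal illustration of the overflow check (simplified shapes, not the real xact.c code; the caller would report the error):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CommandId;

#define FirstCommandId   ((CommandId) 0)
#define InvalidCommandId (~(CommandId) 0)   /* 0xFFFFFFFF */

/* Mimics CommandCounterIncrement's overflow guard: returns 1 on
 * success, 0 if the counter would reach InvalidCommandId, in which
 * case the counter is backed off so later increments fail too. */
int advance_command_id(CommandId *cur)
{
    *cur += 1;
    if (*cur == InvalidCommandId)
    {
        *cur -= 1;
        return 0;
    }
    return 1;
}
```

So the reserved sentinel costs exactly one value at the top of the range, which is why the commit message calls the reduction unproblematic.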

From ac48fc2f5c5f0031494cfabb0bca46f0bbef47d2 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 03/13] wal_decoding: Allow walsenders to connect to a
 specific database

Currently the decision whether to connect to a database or not is made by
checking whether the passed dbname parameter is "replication". Unfortunately
this makes it impossible to connect to a database named
"replication"... Possibly it would be better to use a separate connection
parameter like replication_dbname=xxx?

This is useful for future walsender commands which need database interaction.
---
 doc/src/sgml/protocol.sgml |  5 +++-
 src/backend/postmaster/postmaster.c| 13 +--
 .../libpqwalreceiver/libpqwalreceiver.c|  4 ++--
 src/backend/replication/walsender.c| 27 ++
 src/backend/utils/init/postinit.c  |  5 
 src/bin/pg_basebackup/pg_basebackup.c  |  4 ++--
 src/bin/pg_basebackup/pg_receivexlog.c |  4 ++--
 src/bin/pg_basebackup/receivelog.c |  4 ++--
 8 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..51b4435 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1304,7 +1304,10 @@ To initiate streaming replication, the frontend sends the
 <literal>replication</> parameter in the startup message. This tells the
 backend to go into walsender mode, wherein a small set of replication commands
 can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+used in walsender mode. A <literal>dbname</> of <literal>replication</> will
+start a walsender not connected to any database, specifying any other database
+will connect to 

Re: [HACKERS] dynamic shared memory

2013-08-30 Thread Andres Freund
Hi,

On 2013-08-28 15:20:57 -0400, Robert Haas wrote:
  That way any corruption in that area will prevent restarts without
  reboot unless you use ipcrm, or such, right?
 
 The way I've designed it, no.  If what we expect to be the control
 segment doesn't exist or doesn't conform to our expectations, we just
 assume that it's not really the control segment after all - e.g.
 someone rebooted, clearing all the segments, and then an unrelated
 process (malicious, perhaps, or just a completely different cluster)
 reused the same name.  This is similar to what we do for the main
 shared memory segment.

The case I am mostly wondering about is some process crashing and
overwriting random memory. We need to be pretty sure that we'll never
fail partially through cleaning up old segments because they are
corrupted or because we died halfway through our last cleanup attempt.

  I think we want that during development, but I'd rather not go there
  when releasing. After all, we don't support a manual choice between
  anonymous mmap/sysv shmem either.

 That's true, but that decision has not been uncontroversial - e.g. the
 NetBSD guys don't like it, because they have a big performance
 difference between those two types of memory.  We have to balance the
 possible harm of one more setting against the benefit of letting
 people do what they want without needing to recompile or modify code.

But then, it made them fix the issue afaik :P

  In addition, I've included an implementation based on mmap of a plain
  file.  As compared with a true shared memory implementation, this
  obviously has the disadvantage that the OS may be more likely to
  decide to write back dirty pages to disk, which could hurt
  performance.  However, I believe it's worthy of inclusion all the
  same, because there are a variety of situations in which it might be
  more convenient than one of the other implementations.  One is
  debugging.
 
  Hm. Not sure what's the advantage over a corefile here.

 You can look at it while the server's running.

That's what debuggers are for.

  On MacOS X, for example, there seems to be no way to list
  POSIX shared memory segments, and no easy way to inspect the contents
  of either POSIX or System V shared memory segments.

  Shouldn't we ourselves know which segments are around?

 Sure, that's the point of the control segment.  But listing a
 directory is a lot easier than figuring out what the current control
 segment contents are.

But without a good amount of tooling - like in a debugger... - it's not
very interesting to look at those files either way? The mere presence of
a segment doesn't tell you much and the contents won't be easily
readable.

  Another use case is working around an administrator-imposed or
  OS-imposed shared memory limit.  If you're not allowed to allocate
  shared memory, but you are allowed to create files, then this
  implementation will let you use whatever facilities we build on top
  of dynamic shared memory anyway.
 
  I don't think we should try to work around limits like that.

 I do.  There's probably someone, somewhere in the world who thinks
 that operating system shared memory limits are a good idea, but I have
 not met any such person.

"Let's drive users away from sysv shmem" is the only one I heard so far ;)

 I would never advocate deliberately trying to circumvent a
 carefully-considered OS-level policy decision about resource
 utilization, but I don't think that's the dynamic here.  I think if we
 insist on predetermining the dynamic shared memory implementation
 based on the OS, we'll just be inconveniencing people needlessly, or
 flat-out making things not work. [...]

But using file-backed memory will *suck* performancewise. Why should we
ever want to offer that to a user? That's what I was arguing about
primarily.
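For reference, the file-backed variant being debated boils down to a plain POSIX mmap of a regular file (a sketch of the mechanism only, not the patch's actual code; the path and size are caller-supplied):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a regular file as shared memory.  Any process mapping the same
 * file sees the same bytes; the performance concern raised above is
 * that the kernel may write dirty pages back to disk. */
void *map_file_segment(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) size) < 0)
    {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : p;
}
```

The upside is that it needs no shm limits at all, only a writable directory; the downside is exactly the writeback behavior Andres objects to.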

 If we're SURE
 that a Linux user will prefer posix to sysv or mmap or none in
 100% of cases, and that a NetBSD user will always prefer sysv over
 mmap or none in 100% of cases, then, OK, sure, let's bake it in.
 But I'm not that sure.

I think posix shmem will be preferred to sysv shmem if present, in just
about any relevant case. I don't know of any system with lower limits on
posix shmem than on sysv.

 I think this case is roughly similar
 to wal_sync_method: there really shouldn't be a performance or
 reliability difference between the ~6 ways of flushing a file to disk,
 but as it turns out, there is, so we have an option.

Well, most of them actually give different guarantees, so it makes sense
to have differing performance...

  Why do we want to expose something unreliable as preferred_address to
  the external interface? I haven't read the code yet, so I might be
  missing something here.

 I shared your opinion that preferred_address is never going to be
 reliable, although FWIW Noah thinks it can be made reliable with a
 large-enough hammer.

I think we need to have the arguments for that on list then. Those are
pretty damn fundamental 

Re: [HACKERS] Add database to PGXACT / per database vacuuming

2013-08-30 Thread Heikki Linnakangas

On 30.08.2013 19:01, Andres Freund wrote:

For the logical decoding patch I added support for pegging
RecentGlobalXmin (and GetOldestXmin) to a lower value. To avoid causing
undue bloat & CPU overhead (hot pruning is friggin expensive) I split
RecentGlobalXmin into RecentGlobalXmin and RecentGlobalDataXmin, where
the latter is the xmin horizon used for non-shared, non-catalog
tables. That removed almost all overhead I could measure.

During that I was tinkering with the idea of reusing that split to
vacuum/prune user tables in a per-db fashion. In a very quick and hacky
test that sped up the aggregate performance of concurrent pgbenches in
different databases by about 30%. So, somewhat worthwhile ;).

The problem with that is that GetSnapshotData, which computes
RecentGlobalXmin, only looks at the PGXACT structures and not PGPROC
which contains the database oid. This is a recently added optimization
which made GetSnapshotData() quite a bit faster & scalable, which is
important given the frequency it's called at.


Hmm, so you're creating a version of GetSnapshotData() that only takes 
into account backends in the same backend?



What about moving/copying the database oid from PGPROC to PGXACT?


Might be worthwhile.


Currently a single PGXACT is 12 bytes which means we a) have several
entries in a single cacheline b) have ugly sharing because we will have
PGXACTs split over more than one cacheline.


I can't get excited about either of these arguments, though. The reason 
for having separate PGXACT structs is that they are as small as 
possible, so that you can fit as many of them as possible in as few 
cache lines as possible. Whether one PGXACT crosses a cache line or not 
is not important, because when taking a snapshot, you scan through all 
of them.


I don't know how big an impact adding the database oid would have, on 
the case that the PGPROC/PGXACT split was done in the first place. In 
the worst case it will make taking a snapshot 1/3 slower under 
contention. That needs to be tested.


One idea is to have a separate PGXACT array for each database? Well, 
that might be difficult, but something similar, like group all PGXACTs 
for one database together, and keep a separate lookup array for where 
the entries for each database begins.
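The lookup-array idea can be sketched like this (hypothetical names and sizes; the real arrays would live in shared memory and be maintained under the ProcArray lock):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical index over a PGXACT array grouped by database:
 * entries for database dbs[i] occupy slots [begin[i], begin[i+1]). */
typedef struct DbXactIndex
{
    int ndbs;
    Oid dbs[16];        /* database oids, one per group */
    int begin[17];      /* begin[ndbs] is the total entry count */
} DbXactIndex;

/* Find the slot range holding one database's PGXACTs;
 * returns 0 if that database has no entries. */
int db_xact_range(const DbXactIndex *ix, Oid db, int *lo, int *hi)
{
    for (int i = 0; i < ix->ndbs; i++)
    {
        if (ix->dbs[i] == db)
        {
            *lo = ix->begin[i];
            *hi = ix->begin[i + 1];
            return 1;
        }
    }
    return 0;
}
```

A per-database snapshot would then scan only its own slot range; as Andres notes downthread, shared relations still force a scan of all entries, which limits the win.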


- Heikki




Re: [HACKERS] Add database to PGXACT / per database vacuuming

2013-08-30 Thread Heikki Linnakangas

On 30.08.2013 21:07, Heikki Linnakangas wrote:

On 30.08.2013 19:01, Andres Freund wrote:

For the logical decoding patch I added support for pegging
RecentGlobalXmin (and GetOldestXmin) to a lower value. To avoid causing
undue bloat & CPU overhead (hot pruning is friggin expensive) I split
RecentGlobalXmin into RecentGlobalXmin and RecentGlobalDataXmin, where
the latter is the xmin horizon used for non-shared, non-catalog
tables. That removed almost all overhead I could measure.

During that I was tinkering with the idea of reusing that split to
vacuum/prune user tables in a per-db fashion. In a very quick and hacky
test that sped up the aggregate performance of concurrent pgbenches in
different databases by about 30%. So, somewhat worthwhile ;).

The problem with that is that GetSnapshotData, which computes
RecentGlobalXmin, only looks at the PGXACT structures and not PGPROC
which contains the database oid. This is a recently added optimization
which made GetSnapshotData() quite a bit faster & scalable, which is
important given the frequency it's called at.


Hmm, so you're creating a version of GetSnapshotData() that only takes
into account backends in the same backend?


I mean, only takes account backends in the same database?

- Heikki




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Josh Berkus
On 08/30/2013 03:05 AM, Kohei KaiGai wrote:

 Surely someone in the security community has discussed this?

 Security community considers covert channel is a hard to solve problem;
 nearly impossible to eliminate.
 Let's see the summary in wikipedia:
   http://en.wikipedia.org/wiki/Covert_channel

Well, in each of the cases covered in that article, the given technology
(OSI, TCP, etc.) takes specific provisions to limit the ability of
attackers to discover information via the covert channel.

However, we have yet to talk about taking any such provisions with
Postgres.  If we commit this patch, arguably we'll have a row-level
security feature which only protects data from well-behaved users, which
seems counterproductive.

So, arguments in favor of this patch:
a) it's as good as Oracle's security features, giving us check-box
compliance.
b) it allows securing individual rows against attackers with limited
technical knowledge or limited database access, and could be very
hardened in combination with intelligent access control.
c) it is an improvement on techniques like Veil (is it)?
d) we plan to continue improving it and closing covert channels, or
limiting their bandwidth.

Arguments against:
m) covert channels are currently broad enough to make it trivially
circumventable (are they?)
n) overhead and code maintenance required is substantial

So, determinative questions:

1) are the security mechanisms supplied by this patch superior in some
way to approaches like Veil for multi-tenant applications?  (this is a
serious question, as multi-tenant applications are far less concerned
about covert channels)

2) do we plan to reduce the accessibility of data via covert channels
over successive releases?  How?

3) will accepting this patch allow our users in the Government space to
more freely adopt PostgreSQL?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Add database to PGXACT / per database vacuuming

2013-08-30 Thread Andres Freund
On 2013-08-30 21:07:04 +0300, Heikki Linnakangas wrote:
 On 30.08.2013 19:01, Andres Freund wrote:
 For the logical decoding patch I added support for pegging
 RecentGlobalXmin (and GetOldestXmin) to a lower value. To avoid causing
 undue bloat & CPU overhead (hot pruning is friggin expensive) I split
 RecentGlobalXmin into RecentGlobalXmin and RecentGlobalDataXmin, where
 the latter is the xmin horizon used for non-shared, non-catalog
 tables. That removed almost all overhead I could measure.
 
 During that I was tinkering with the idea of reusing that split to
 vacuum/prune user tables in a per-db fashion. In a very quick and hacky
 test that sped up the aggregate performance of concurrent pgbenches in
 different databases by about 30%. So, somewhat worthwhile ;).
 
 The problem with that is that GetSnapshotData, which computes
 RecentGlobalXmin, only looks at the PGXACT structures and not PGPROC
 which contains the database oid. This is a recently added optimization
 which made GetSnapshotData() quite a bit faster & scalable, which is
 important given the frequency it's called at.
 
 Hmm, so you're creating a version of GetSnapshotData() that only takes into
 account backends in the same backend?

You can see what I did for logical decoding in 
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blobdiff;f=src/backend/storage/ipc/procarray.c;h=11aa1f5a71196a61e31b711e0a044b2a5927a6cc;hp=9bf0989c9206b5e07053587f517d5e9a2322a628;hb=edcf0939072ebe68969560a7d54a26c123b279b4;hpb=ff4fa81665798642719c11c779d0518ef6611373

So, basically I compute the normal RecentGlobalXmin, and then just
subtract the logical xmin which is computed elsewhere to get the
catalog xmin.

What I'd done with the prototype of $topic (lost it, but I am going to
hack it together again) was just to compute RecentGlobalXmin (for
non-catalog, non-shared tables) at the same time with
RecentGlobalDataXmin (for everything else) by just not lowering
RecentGlobalDataXmin if pgxact->dboid != MyDatabaseId.
So, the snapshot itself was the same, but because RecentGlobalDataXmin
is independent of the other databases, vacuum & pruning can clean up way
more, leading to a smaller database and higher performance.
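In pseudo-C, the described computation amounts to something like this (names follow the mail; this is a sketch of the idea, not the actual procarray.c code, and it ignores TransactionId wraparound, which the real code handles via TransactionIdPrecedes):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t Oid;

/* Sketch of a PGXACT entry after copying the database oid into it. */
typedef struct SketchPGXACT
{
    TransactionId xmin;     /* 0 means no snapshot held */
    Oid           dboid;
} SketchPGXACT;

/* One pass computes both horizons: RecentGlobalXmin is lowered by
 * every backend; RecentGlobalDataXmin (used for non-shared,
 * non-catalog tables) is NOT lowered by backends in other databases,
 * so per-database vacuum/pruning can be more aggressive. */
void compute_horizons(const SketchPGXACT *xacts, int n,
                      Oid MyDatabaseId, TransactionId latest,
                      TransactionId *globalXmin, TransactionId *dataXmin)
{
    *globalXmin = latest;
    *dataXmin = latest;
    for (int i = 0; i < n; i++)
    {
        TransactionId xmin = xacts[i].xmin;
        if (xmin == 0)
            continue;
        if (xmin < *globalXmin)
            *globalXmin = xmin;
        if (xacts[i].dboid == MyDatabaseId && xmin < *dataXmin)
            *dataXmin = xmin;
    }
}
```

The snapshot contents stay unchanged; only the cleanup horizon for this database's ordinary tables moves forward past other databases' old snapshots.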

 Currently a single PGXACT is 12 bytes which means we a) have several
 entries in a single cacheline b) have ugly sharing because we will have
 PGXACTs split over more than one cacheline.
 
 I can't get excited about either of these arguments, though. The reason for
 having separate PGXACT structs is that they are as small as possible, so
 that you can fit as many of them as possible in as few cache lines as
 possible. Whether one PGXACT crosses a cache line or not is not important,
 because when taking a snapshot, you scan through all of them.

The problem with that is that we actually write to PGXACT pretty
frequently (at least ->xid, ->xmin, ->nxids, ->delayChkpt). As soon as
you factor that in, sharing cachelines between backends can hurt. Even
a plain GetSnapshotData() will write to MyPgXact->xmin...

 I don't know how big an impact adding the database oid would have, on the
 case that the PGPROC/PGXACT split was done in the first place. In the worst
 case it will make taking a snapshot 1/3 slower under contention. That needs
 to be tested.

Yes, definitely. I am basically wondering whether somebody has/sees
fundamental problems with it that would make it pointless to investigate.

 One idea is to have a separate PGXACT array for each database? Well, that
 might be difficult, but something similar, like group all PGXACTs for one
 database together, and keep a separate lookup array for where the entries
 for each database begins.

Given that we will have to search all PGXACT entries anyway because of
shared relations for the foreseeable future, I can't see that being
really beneficial.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Freezing without write I/O

2013-08-30 Thread Andres Freund
Hi Heikki,

On 2013-08-27 18:56:15 +0300, Heikki Linnakangas wrote:
 Here's an updated patch. The race conditions I mentioned above have been
 fixed.

Thanks for posting the new version!

I have a quick question: The reason I'd asked about the status of the
patch was that I was thinking about the state of the forensic freezing
patch. After a quick look at your proposal, we still need to freeze in
some situations (old & new data on the same page basically), so I'd say
it still makes sense to apply the forensic freezing patch, right?

Differing Opinions?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-08-30 Thread Fabien COELHO



Uh ... why not just drop the constraint, and re-add it later if you want
it again?


My 0.02€: maybe because you must keep track of the constraint details to 
do so, it is significantly more error-prone than disable/enable 
when the bookkeeping is done by the system and everything is in a 
transaction... If the ENABLE is automatically done on the next COMMIT, 
that would be even better.


This seems like adding a lot of mechanism (and possible bugs) for a 
rather marginal use-case.


That is possible!

--
Fabien.


Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Stephen Frost
Josh,

* Josh Berkus (j...@agliodbs.com) wrote:
 On 08/30/2013 03:05 AM, Kohei KaiGai wrote:
  Security community considers covert channel is a hard to solve problem;
  nearly impossible to eliminate.

While impossible to eliminate, we should certainly consider cases like
this where we can do better and fix them.  RLS certainly brings another
level of consideration to the overall PG security environment by
requiring we think about security on a row level rather than just a
table or column level.

We have issues with covert channels even without RLS though and holding
up RLS because it doesn't fix all the covert channels isn't sensible.
Column-level privileges have a similar problem, where you can read the
default value for a column using the catalogs.  Perhaps the default
isn't sensitive (you'd certainly hope not), but it's still an issue.  It
wouldn't surprise me to find that there are ways to abuse a multi-column
index which includes both a column you can manipulate and one you don't
have access to read to determine something about the hidden column
(maybe you have access to the 2nd field in the index and you can
encourage an in-order index traversal and then look at filtered rows, or
just work out a way to do timing attacks to determine the btree depth).

 Well, in each of the cases covered in that article, the given technology
 (OSI, TCP, etc.) takes specific provisions to limit the ability of
 attackers to discover information via the covert channel.

The work we've done around secure views would lend credence to our
attempts at taking specific provisions as well; sadly, PG is slightly
more complicated than TCP.  We do what we can and we've got a great
community which will point out where we can do better- and we work on it
and improve over time.  Hell, when roles were first added we had a
*massive* security hole because we didn't check to make sure we weren't
overrunning the length of the GUC.  It was a mistake and we should have
done better, but that doesn't mean adding roles was the wrong decision.

 However, we have yet to talk about taking any such provisions with
 Postgres.  If we commit this patch, arguably we'll have a row-level
 security feature which only protects data from well-behaved users, which
 seems counterproductive.

I would argue both that we *have* been taking provisions to avoid
obvious and big covert channels, and that this patch adds value even
if it doesn't protect the system perfectly from malicious users.  We're
all certainly aware of the ability for an attacker to cause major
problems to a PG system if they can issue arbitrary SQL and our
permissions system doesn't do much to protect us.  A single query which
doesn't require any privileges could cause havoc on the system (massive
on-disk temp file, which could be shared with pg_xlog causing the system
to PANIC, massive CPU load if they can execute multiple commands in
parallel...).  Not to mention the default installation of pl/pgsql and
anonymous functions.

I could see many a web app (things like LedgerSMB) which could benefit
from having more fine-grained in-database control because they already
authenticate to the database as the user and have a static or at least
controlled set of queries which they run.  Today, any of those kinds of
systems have to implement their own RLS (though sometimes it's done
through independent tables for each customer or similar, rather than as
conditionals added to queries).

 a) it's as good as Oracle's security features, giving us check-box
 compliance.

I'd argue that this is definitely much more than 'check-box' compliance.

 b) it allows securing individual rows against attackers with limited
 technical knowledge or limited database access, and could be very
 hardened in combination with intelligent access control.
 c) it is an improvement on techniques like Veil (is it)?
 d) we plan to continue improving it and closing covert channels, or
 limiting their bandwidth.
 
 Arguments against:
 m) covert channels are currently broad enough to make it trivially
 circumventable (are they?)

There are some which are and likely deserve to be fixed.  Do they all
need to be addressed before this patch goes in?  I'd argue 'no'.

 n) overhead and code maintenance required is substantial
 
 So, determinative questions:
 
 1) are the security mechanisms supplied by this patch superior in some
 way to approaches like Veil for multi-tenant applications?  (this is a
 serious question, as multi-tenant applications are far less concerned
 about covert channels)

I'd argue 'yes' if just for the fact that it'd be simpler and easier to
use, both because it's in core and because you're using an actual
grammar instead of function calls- but this RLS does more than just
that, it's going to cause us to improve things that Veil probably can't
fix and simply ignores today.

 2) do we plan to reduce the accessibility of data via covert channels
 over successive releases?  How?

By discovering them and fixing them as we 

Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 We have issues with covert channels even without RLS though and holding
 up RLS because it doesn't fix all the covert channels isn't sensible.

I think it's entirely sensible to question whether we should reject (not
hold up) RLS if it has major covert-channel problems.

The scenario I'm worried about is where somebody says "hey, Postgres has
RLS now, I can rely on that to hide my sooper sekrit data from other users
in the same database", and later they have a security breach through some
covert-channel attack.  Are they going to blame themselves?  No, they're
gonna blame Postgres.  Or consider the case where some bozo publishes
a method for such an attack and uses it to badmouth us as insecure.

I don't think we need the headaches that will result from promising
(or at least appearing to promise) something we can't deliver.  Nor am
I convinced that we're really doing users any favors by providing such a
feature.  They'd be *far* better advised to put their critical data in a
separate database.

In short, "we can check some check-box" is a really, really bad reason
to accept a security-related feature.  If we're going to put up with
all the downsides of RLS, I want the end result to be something that's
actually secure, not something that gives the illusion of security.
And right now, I do not believe we can get past the illusion stage,
ever (certainly not in a release or two).

regards, tom lane




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Josh Berkus
On 08/30/2013 12:43 PM, Tom Lane wrote:
 In short, "we can check some check-box" is a really, really bad reason
 to accept a security-related feature.  If we're going to put up with
 all the downsides of RLS, I want the end result to be something that's
 actually secure, not something that gives the illusion of security.
 And right now, I do not believe we can get past the illusion stage,
 ever (certainly not in a release or two).

Can you be more explicit about all the downsides of RLS?  I was just
looking over the patch, which is less than 5000 lines.  While it's not
small, we have larger patches in the CF.  So what's the specific
downsides of this?

The reason I brought up multi-tenant applications (MTA), BTW, is that
this is the other major potential utility of RLS, and for such users the
covert channel limitations are acceptable (as long as we publish them).
 That is, for a thing-as-a-service application, users are not assumed
to have unlimited access to the SQL command line anyway; RLS is employed
just to limit the damage they can do if they get access, and to limit
the disclosure if some app programmer makes a mistake.

Right now, the primary tool for doing row filtering for MTA is Veil,
which has numerous and well-known limitations.  If RLS has fewer
limitations, or is easier to deploy, maintain, and/or understand, then
it's a valuable feature for that user base, even if it doesn't address
the covert channels we've brought up at all.

That is, if RLS is your *second* level of defense, instead of your
primary defense, covert channels are not a make-or-break issue. It just
has to be better than what we had before.

Note that I have NOT done an evaluation of Veil vs. RLS for MTA at this
point.  I'm hoping someone else will ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 08/30/2013 12:43 PM, Tom Lane wrote:
 In short, "we can check some check-box" is a really, really bad reason
 to accept a security-related feature.  If we're going to put up with
 all the downsides of RLS, I want the end result to be something that's
 actually secure, not something that gives the illusion of security.

 Can you be more explicit about "all the downsides of RLS"?  I was just
 looking over the patch, which is less than 5000 lines.  While it's not
 small, we have larger patches in the CF.  So what are the specific
 downsides of this?

I think it's going to be an ongoing maintenance headache and an endless
source of security bugs, even disregarding covert-channel issues.  I have
pretty much zero faith in the planner changes, in particular, and would
still not have a lot if they were adequately documented, which they
absolutely are not.  The whole thing depends on nowhere-clearly-stated
assumptions that plan-time transformations will never allow an RLS check
to be bypassed.  I foresee future planner work breaking this in
non-obvious ways on a regular basis (even granting the assumption that
it's bulletproof today, which is at best unproven).

 The reason I brought up multi-tenant applications (MTA), BTW, is that
 this is the other major potential utility of RLS, and for such users the
 covert channel limitations are acceptable (as long as we publish them).

[ shrug... ]  You might've noticed I work for a multi-tenant shop now.
I'm still not excited about this.

 That is, if RLS is your *second* level of defense, instead of your
 primary defense, covert channels are not a make-or-break issue. It just
 has to be better than what we had before.

Yeah, that's a fair point.  I'm just not convinced that it's enough better
to justify the maintenance burden we'll be taking on.  I'm not thrilled
about the "every bug is a security bug" angle, either.

regards, tom lane




Re: [HACKERS] What happens at BIND time? (pg_upgrade issue)

2013-08-30 Thread Josh Berkus
On 08/28/2013 11:44 AM, Josh Berkus wrote:
 Tom,
 
 Does the backend's memory usage climb, or hold steady?  If the former,
 I'd bet on client failure to release resources, eg not closing the
 portals when done with them.  A memory map from MemoryContextStats
 would help determine exactly what's leaking.
 
 FS cache usage increases through the test run, as you'd expect, but the
 amount of pinned memory actually remains pretty much constant -- and has
 the same usage in both 8.4 (where the BIND issue doesn't happen) and
 9.3b2 (where it does).

So, this just got a lot stranger.

What we've been testing here is upgrading from 8.4 to 9.X via
pg_upgrade, because that's what they have to do in production for time
reasons.  We recently confirmed that this issue affects 9.0 also.

So I finally got results back from the test where we do dump/restore (to
9.3b2) instead of pg_upgrade ... and the BIND regression does not occur.
 So this now has something to do with pg_upgrade, not just BIND and Java.

cc'd Bruce for that reason.

We'll be rerunning these tests next week, comparing a 9.3 with the issue
to a 9.3 without it under oprofile etc.  Suggestions on what we should
look for are welcome.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  We have issues with covert channels even without RLS though and holding
  up RLS because it doesn't fix all the covert channels isn't sensible.
 
 I think it's entirely sensible to question whether we should reject (not
 hold up) RLS if it has major covert-channel problems.

Rejecting RLS because we've suddenly realized that covert channels
exist is foolishness.  It's akin to rejecting the ability to add stored
procedures because we don't protect prosrc from people who don't own or
can't execute the function.

 The scenario I'm worried about is where somebody says "hey, Postgres has
 RLS now, I can rely on that to hide my sooper sekrit data from other users
 in the same database", and later they have a security breach through some
 covert-channel attack.  Are they going to blame themselves?  No, they're
 gonna blame Postgres.  Or consider the case where some bozo publishes
 a method for such an attack and uses it to badmouth us as insecure.

In my experience, issues are discovered, patched, and released as
security updates.  Does adding RLS mean we might have more of those?
Sure, as did column level privileges and as does *any* such increase in
the granularity of what we can provide security-wise.

 I don't think we need the headaches that will result from promising
 (or at least appearing to promise) something we can't deliver.  Nor am
 I convinced that we're really doing users any favors by providing such a
 feature.  They'd be *far* better advised to put their critical data in a
 separate database.

We've barely got cross-database queries with FDWs.  The notion that
pushing RLS-like complexity into those, which each individual user
would need to figure out how to do themselves (and most would likely
get it far wrong, and much worse than what we'd implement), is somehow
better for our users is just ridiculous.

 In short, "we can check some check-box" is a really, really bad reason
 to accept a security-related feature.  If we're going to put up with
 all the downsides of RLS, I want the end result to be something that's
 actually secure, not something that gives the illusion of security.
 And right now, I do not believe we can get past the illusion stage,
 ever (certainly not in a release or two).

I'm not arguing for this because it fulfills some check-box; the
question about if it would help a given set of clients (ones which I no
longer have any direct relationship with, as it turns out) adopt PG was
asked and I answered it as best I could.

I certainly think we need to get past the 'illusion' stage also.  I'm
certainly more optimistic about that than you are but I also understand
it's not going to be perfect in the first release, but I do think it'll
be better than the 'illusion' stage.  It'll get there because we'll
continue to discuss it, people will test it, etc; as one hopes happens
with all new features, but this even more than others.

Thanks,

Stephen




[HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Tom Lane
I noticed this while poking at the variadic-aggregates issue:

regression=# create function nth_value_def(anyelement, integer = 1) returns 
anyelement language internal window immutable strict as 'window_nth_value';
CREATE FUNCTION
regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s;
The connection to the server was lost. Attempting reset: Failed.

The reason this crashes is that the planner doesn't apply
default-insertion to WindowFunc nodes, only to FuncExprs.  We could make
it do that, probably, but that seems to me like a feature addition.
I think a more reasonable approach for back-patching is to fix function
creation to disallow attaching defaults to window functions (or
aggregates, for that matter, which would have the same issue if CREATE
AGGREGATE had the syntax option to specify defaults).  ProcedureCreate
seems like an appropriate place, since it already contains a lot of sanity
checks of this sort.

regards, tom lane




[HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
For many years now, MySQL has a feature called INSERT IGNORE [1];
SQLite has INSERT ON CONFLICT IGNORE [2]; SQL Server has an option
called IGNORE_DUP_KEY and Oracle has a hint called
IGNORE_ROW_ON_DUPKEY_INDEX (they acknowledge that it's a bit odd that
a hint changes the semantics of a DML statement - of
course, MERGE has always been able to do something approximately equivalent).

Each of these features has their respective systems simply not insert
certain tuples that will result in a duplicate key violation, while
potentially proceeding with the insertion of other, non-violating
tuples.

The attached WIP patch implements this for Postgres, with a few
notable differences:

1) The patch is only interested in unique index violations
(specifically, violations of amcanunique access method unique
indexes); it will not do anything with NULL constraint violations, as
the MySQL feature does, for example. It also doesn't care about
exclusion constraints, even though it's currently possible to defer
exclusion constraint enforcement in a way that is analogous to how
it's done for btree indexes.

2) The clause that I've proposed is, as you'll have already gathered,
spelt slightly differently to any of these systems. I'm not
particularly attached to how I've spelt it. I've spelt it this way so
as to avoid suggesting that it's fully compatible with the MySQL
feature. I don't think we want weird NULLness behavior, but I may be
mistaken. If we want that additional behavior, we probably want it as
an optional adjunct, or perhaps an entirely independent clause to what
I've specified here.

3) RETURNING is expanded - RETURNING REJECTS * is now possible where
that makes sense.

It is possible for clients to interrogate the wire protocol to see the
number of rows affected (this can be done with a PQcmdTuples() call by
libpq clients, for example), and see how that compares with the number
of tuples they tried to insert. In addition, a client can use
RETURNING to see what rows were successfully inserted. Here is a
simple example session that shows this usage:

postgres=# create table tab(id int4 primary key, val text);
CREATE TABLE
postgres=# insert into tab(id, val) values(1, 'val'), (3, 'val'), (4, 'val');
INSERT 0 3
postgres=# insert into tab(id, val) values(1, 'nosert'), (2,
'nosert'), (3, 'nosert') on duplicate key ignore returning id;
 id
----
  2
(1 row)

INSERT 0 1
postgres=# select xmin, * from tab;
  xmin  | id | val
--------+----+--------
 580843 |  1 | val
 580843 |  3 | val
 580843 |  4 | val
 580844 |  2 | nosert
(4 rows)

If this all happened in two separate, concurrent sessions, the
transaction with xid 580844 might well have blocked on the outcome of
580843 in the usual way (ultimately inserting or doing nothing as
appropriate for each tuple), just as you'd expect. Note, however, that
the xid of the noserter is only seen here for the tuple that it
actually successfully inserted - in effect, the noserter session does
not lock tuples that it did not originally insert. That's consistent
with MySQL's INSERT IGNORE, for example. However, some people might
prefer it if this did share lock rows that prevented tuples from being
inserted, or if that was at least an option. That would be something
quite a bit closer to a fully-fledged upsert - I imagine we'd have to
lock the old tuple using EvalPlanQual, and if a submission did that
correctly then I think we'd be well on our way to solving all the
problems.

RETURNING REJECTS is a way of getting the inverse of the affected tuples
of an ordinary INSERT...RETURNING statement - rather than returning
only the rows successfully inserted, we return only the rows that were
not successfully inserted (i.e. tuples where either a BEFORE trigger
returned NULL or we didn't proceed with insertion). This is only
meaningful in the context of INSERT...ON DUPLICATE KEY IGNORE. Sure,
clients could figure this out for themselves using vanilla RETURNING
*, but I think that this addition is justified by the fact that it's
generally expected that the number of rejects will be relatively
small. People are going to want to use this feature in an ad-hoc
manner within a psql session, and supporting this will particularly
help there. It might be nice to find a way to indicate the exact
constraint violated per row returned, but I preferred to wait and see
if people thought it was a good idea generally.
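To make the inverse concrete, here is a hypothetical session continuing the example above.  This uses the syntax proposed in the patch (nothing here exists in released Postgres), with ids 1, 3 and 4 already present and the noserted id 2 in place:

```sql
-- RETURNING REJECTS * projects only the rows skipped because of a
-- duplicate key; only (5, 'new') is actually inserted.
INSERT INTO tab(id, val)
VALUES (1, 'again'), (5, 'new'), (3, 'again')
ON DUPLICATE KEY IGNORE
RETURNING REJECTS *;
-- would return the (1, 'again') and (3, 'again') rows
```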

Use-cases
=========

The use-cases for this feature are fairly obvious. It can be used as a
way of merging tuples easily, if clients can live with the present
limitations.

In addition to that, the feature may be of value to clients of the
logical change-set generation infrastructure (such as the proposed BDR
contrib module[3]), if and when that infrastructure is finally
committed. Currently, the BDR prototype simply balks at INSERT
conflicts. This is because the amount of subtransactions/xids
generated to do anything else is considered prohibitive. If Postgres
is ever going to solve the kinds of problems 

Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Robert Haas
On Fri, Aug 30, 2013 at 6:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I noticed this while poking at the variadic-aggregates issue:

 regression=# create function nth_value_def(anyelement, integer = 1) returns 
 anyelement language internal window immutable strict as 'window_nth_value';
 CREATE FUNCTION
 regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
 FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s;
 The connection to the server was lost. Attempting reset: Failed.

 The reason this crashes is that the planner doesn't apply
 default-insertion to WindowFunc nodes, only to FuncExprs.  We could make
 it do that, probably, but that seems to me like a feature addition.
 I think a more reasonable approach for back-patching is to fix function
 creation to disallow attaching defaults to window functions (or
 aggregates, for that matter, which would have the same issue if CREATE
 AGGREGATE had the syntax option to specify defaults).  ProcedureCreate
 seems like an appropriate place, since it already contains a lot of sanity
 checks of this sort.

I'm not sure I agree.  Under that approach, any functions that have
already been created like that will still crash the server.  A
malicious user could create a function like this now and wait to
crontab it until the day he's leaving the company.  Or there are more
accidental scenarios as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Compression of full-page-writes

2013-08-30 Thread Robert Haas
On Thu, Aug 29, 2013 at 10:55 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Attached patch adds new GUC parameter 'compress_backup_block'.

I think this is a great idea.

(This is not to disagree with any of the suggestions made on this
thread for further investigation, all of which I think I basically
agree with.  But I just wanted to voice general support for the
general idea, regardless of what specifically we end up with.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] operator precedence issues

2013-08-30 Thread Merlin Moncure
Hackers,

The operator precedence rules seem hard wired to not be able to be
worked around, not matter what.  The pain point for me has always been
the division operator -- once in a while I end up in a situation where
I want to override it so that it wraps the divisor with NULLIF.  There
is no way I can see to do that: custom operator (for example '//')
names evaluate in different precedence order which is a non-starter
essentially.  That I'm ok with given the reasoning in the docs, but
I'm really scratching my head over this rule (via
http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE):

When a schema-qualified operator name is used in the OPERATOR syntax,
as for example in:
SELECT 3 OPERATOR(pg_catalog.+) 4;
the OPERATOR construct is taken to have the default precedence shown
in Table 4-2 for any other operator. This is true no matter which
specific operator appears inside OPERATOR().

That rule seems intentionally designed to make it impossible to
override mathematical behaviors.  Mainly curious -- was that
intentional?
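For illustration, this is the kind of substitute operator being described, and why the default precedence for "any other operator" makes it a non-starter (a sketch; the operator and function names are invented):

```sql
CREATE FUNCTION div_nullif(numeric, numeric) RETURNS numeric
    AS 'SELECT $1 / NULLIF($2, 0)' LANGUAGE sql IMMUTABLE;

CREATE OPERATOR // (leftarg = numeric, rightarg = numeric,
                    procedure = div_nullif);

-- '//' falls into the "any other operator" precedence class, which
-- binds more loosely than '+' and '-', so grouping changes:
--   SELECT 6.0 /  2.0 + 1.0;   -- (6.0 / 2.0) + 1.0
--   SELECT 6.0 // 2.0 + 1.0;   --  6.0 // (2.0 + 1.0)
```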

merlin




Re: [HACKERS] Window functions can be created with defaults, but they don't work

2013-08-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Aug 30, 2013 at 6:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The reason this crashes is that the planner doesn't apply
 default-insertion to WindowFunc nodes, only to FuncExprs.

 I'm not sure I agree.  Under that approach, any functions that have
 already been created like that will still crash the server.  A
 malicious user could create a function like this now and wait to
 crontab it until the day he's leaving the company.  Or there are more
 accidental scenarios as well.

The crash is only possible because the underlying internal-language
function doesn't sanity-check its input enough to catch the case of too
few arguments.  As such, it's not that different from hundreds of other
cases where a superuser can cause a crash by misdeclaring the arguments to
an internal-language function.  So I don't find your argument compelling.
I'd even say this was user error, except that it's not obvious that this
case shouldn't work.

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Josh Berkus
On 08/30/2013 03:09 PM, Peter Geoghegan wrote:
 The attached WIP patch implements this for Postgres, with a few
 notable differences:

Thank you for addressing this.  If nobody is going to hack out MERGE, we
could really use at least this feature.

 3) RETURNING is expanded - RETURNING REJECTS * is now possible where
 that makes sense.

Oh, nifty!  OK, now I can *really* use this feature.

 This patch is a spin-off from a broader effort to implement
 INSERT...ON DUPLICATE KEY UPDATE (upsert). During the 2012 developer

Yeah, I was wondering when we'd get to that.  Obviously there will be
users clamoring for it ...

 Unlike some other systems like DB2, we have always allowed BEFORE ROW
 triggers to execute arbitrary SQL. I've frequently thought this was a
 bit of a wart (e.g. [10]), and certainly not supportive of sensible,
 idiomatic trigger use, but there isn't much we can do about it at this
 stage. Right now, BEFORE ROW triggers will still fire when the new
 code decides to not do the insert. It certainly wouldn't be acceptable
 to allow before triggers to run *after* the first phase of speculative
 insertion, because they might try and access an index with an
 exclusive locked buffer, resulting in backend self-deadlock. Besides,
 we cannot really know *what* to lock until after the before triggers
 fire. That leaves us with two options, as far as I can tell:
 
 1. Just have users live with those semantics. This is what the patch
 does presently, and anyone who is using triggers in what I consider to
 be a sensible way doesn't have to care. For everyone else, it's a
 gotcha that they have to deal with, to be noted prominently.

+1.  We already allow BEFORE triggers to violate referential integrity.
 I don't think that allowing them to behave oddly around INSERT ...
IGNORE is a problem compared to that.  We just need to document it, so
that users know that their BEFORE code will fire even if the INSERT is
being ignored, and that a BEFORE trigger can cause an INSERT ... IGNORE
to error out.
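A sketch of the gotcha worth documenting (the trigger here is hypothetical, and ON DUPLICATE KEY IGNORE is only the proposed syntax):

```sql
CREATE FUNCTION note_before() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'BEFORE trigger fired for id %', NEW.id;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER tab_before BEFORE INSERT ON tab
    FOR EACH ROW EXECUTE PROCEDURE note_before();

-- Under the proposed semantics, the NOTICE fires for every row in the
-- VALUES list, including rows that end up being ignored as duplicates,
-- and an error raised here aborts the whole statement.
```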

 I have done no performance testing to date. Reviewers will want to pay
 attention to the performance implications, particularly in the regular
 insert case; it's conceivable that I've regressed things, though I
 don't specifically suspect that I have.

Yeah, we'll also want to document the performance overhead for the bulk
loading case.  I know I'll want to use this syntax as a primitive form
of MERGE, and I'll want to know what it costs me.

Does this work with multiple VALUES rows?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
On Fri, Aug 30, 2013 at 3:40 PM, Josh Berkus j...@agliodbs.com wrote:
 Does this work with multiple VALUES rows?

Yes.

-- 
Peter Geoghegan




Re: [HACKERS] operator precedence issues

2013-08-30 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 The operator precedence rules seem hard wired to not be able to be
 worked around, no matter what.

That's right.  In the first place, bison is incapable of doing anything
other than hard-wired operator precedence.  In the second, if we did
try to allow catalog-driven precedence, it would require catalog lookups
during the raw parser phase, which isn't going to work for a number
of implementation reasons; but worse than the implementation troubles
is that the grammar would then become fundamentally ambiguous, eg there
could be multiple correct parsings of A+B*C depending on what data types
A,B,C have.  So precedence is hard-wired based on the operator name.

 I'm really scratching my head over this rule (via
 http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE):

 When a schema-qualified operator name is used in the OPERATOR syntax,
 as for example in:
 SELECT 3 OPERATOR(pg_catalog.+) 4;
 the OPERATOR construct is taken to have the default precedence shown
 in Table 4-2 for any other operator. This is true no matter which
 specific operator appears inside OPERATOR().

Yeah.  I'd rather have said that it's the same precedence as for the
undecorated operator name, but again bison doesn't really have a way
to do that.

regards, tom lane




Re: [HACKERS] operator precedence issues

2013-08-30 Thread Andres Freund
Hi,

On 2013-08-30 17:35:04 -0500, Merlin Moncure wrote:
 When a schema-qualified operator name is used in the OPERATOR syntax,
 as for example in:
 SELECT 3 OPERATOR(pg_catalog.+) 4;
 the OPERATOR construct is taken to have the default precedence shown
 in Table 4-2 for any other operator. This is true no matter which
 specific operator appears inside OPERATOR().
 
 That rule seems intentionally designed to make it impossible to
 override mathematical behaviors.  Mainly curious -- was that
 intentional?

You can change your search_path to include your schema before an
explicitly listed pg_catalog, AFAIR.  Not nice, but should work...
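Roughly, the workaround being suggested looks like this (a sketch; the schema and function names are invented, and note that an operator only masks the pg_catalog one for exactly matching argument types):

```sql
CREATE SCHEMA safemath;

CREATE FUNCTION safemath.div_nullif(numeric, numeric) RETURNS numeric
    AS 'SELECT $1 / NULLIF($2, 0)' LANGUAGE sql IMMUTABLE;

-- A schema-qualified operator with the same name and argument types
-- masks pg_catalog's version when its schema appears earlier in the
-- search_path:
CREATE OPERATOR safemath./ (leftarg = numeric, rightarg = numeric,
                            procedure = safemath.div_nullif);

SET search_path = safemath, pg_catalog, public;

SELECT 10.0 / 0.0;  -- resolves to safemath./ and yields NULL
```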

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Andres Freund
Hi,

This is awesome.

On 2013-08-30 15:09:59 -0700, Peter Geoghegan wrote:
 1) The patch is only interested in unique index violations
 (specifically, violations of amcanunique access method unique
 indexes); it will not do anything with NULL constraint violations, as
 the MySQL feature does, for example. It also doesn't care about
 exclusion constraints, even though it's currently possible to defer
 exclusion constraint enforcement in a way that is analogous to how
 it's done for btree indexes.

All that seems sane to me. I very, very much do not want it to deal with
NOT NULL violations.

 [syntax]

Haven't thought about syntax at all so far, that's food for another day.

 Use-cases
 =========
 ...
 In addition to that, the feature may be of value to clients of the
 logical change-set generation infrastructure (such as the proposed BDR
 contrib module[3]), if and when that infrastructure is finally
 committed. Currently, the BDR prototype simply balks at INSERT
 conflicts. This is because the amount of subtransactions/xids
 generated to do anything else is considered prohibitive. If Postgres
 is ever going to solve the kinds of problems that BDR proposes to
 solve well (except for the relatively narrow use-case where balking at
 INSERT conflicts is acceptable), it needs something like what I've
 proposed here. The only alternative is probably to replay a
 transaction without using subtransactions, and if that fails, remember
 that fact and wrap every single change in a subxact. This is really
 going to be painful for large transactions.

Exactly. In completely unscientific tests it reduces speed to less than
1/10 of the version without subxacts. Without a single conflict that is.

 Andres privately said some weeks back (prior to my sharing of this
 information, and even prior to writing most of the code posted here)
 that he'd like to see an INSERT...ON DUPLICATE KEY LOCK. I suppose
 that he must have meant that he'd like to see shared locks obtained on
 rows that blocked a noserter from actually inserting some of their
 proposed tuples, for the purposes of conflict resolution (which, as
 I've said, this doesn't do). Perhaps he'd like to comment on these
 issues here.

Ok, so what we would like to do is basically to follow a protocol
(simplified) like:

1) replay INSERT, using ON DUPLICATE IGNORE
2) if INSERT succeeded, be happy, no conflict (another INSERT with the
   same key might conflict though)
3) if the INSERT failed, lock tuple for UPDATE
4) if the tuple got deleted by another session, goto 1
5) check whether local tuple or the already inserted tuple wins conflict 
resolution
6) UPDATE or skip accordingly

If there's a variant of INSERT ... ON DUPLICATE that gets a FOR UPDATE
lock on the existing row this gets simpler because there's no chance
anymore we need to loop or look for other versions of the row.

Makes sense?
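In (hypothetical) SQL terms, one iteration of that loop for a single replayed row might look like the following.  It assumes both the proposed ON DUPLICATE KEY IGNORE syntax and table/column names invented for illustration:

```sql
-- Steps 1/2: try the insert; a reject means a conflicting row exists.
INSERT INTO t (key, payload) VALUES (42, 'remote')
ON DUPLICATE KEY IGNORE
RETURNING REJECTS *;

-- Step 3: lock the existing row for conflict resolution.
SELECT * FROM t WHERE key = 42 FOR UPDATE;

-- Step 4: if that SELECT finds no row (deleted concurrently), retry
-- from step 1.  Steps 5/6: compare local and remote versions, then
-- UPDATE t SET payload = ... WHERE key = 42, or skip the change.
```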

 Mechanism
 =========

 [...]

 From a high level, the patch works by adding something that is
 tentatively internally referred to as "speculative insertion" in
 lower-level code comments. The basic approach is:

 * Within ExecInsert, when inserting into a regular heap relation from
 a tuple table slot (after BEFORE triggers fire but before inserting
 the heap tuple), there is an extra function call to a new function,
 ExecLockIndexTuples().
 
 * For each applicable amcanunique unique index, ExecLockIndexTuples()
 calls a new am method, amlock. For each unique index, we end up
 acquiring a write buffer lock in a similar fashion to _bt_doinsert.
 When we return from the new amlock function, we report whether or not
 there was a duplicate without having inserted any index tuples or
 having otherwise done something that requires a heap tuple to be
 present. I like to call this point the "point of no return", where
 individual unique indexes commit themselves to insertion (which is to
 say, the point after which a unique constraint violation becomes
 impossible but before actual index tuple insertion in _bt_doinsert
 [9]). Another duplicate key cannot be inserted by anyone else until
 that write lock is released, which it won't be until insertion or
 IGNORE, for the same reason why that's currently true once control
 reaches the point of no return. A pagesplit might need to happen
 before insertion proper of that index tuple, for example, and that's
 okay.
 
 * If we have total consensus among all unique indexes, we proceed with
 a heap insertion of the tuple formed from the tuple table slot
 ExecInsert() was called in respect of. Yes, this means that
 potentially multiple exclusive buffer locks are held for the duration
 of the heap tuple insertion. If not, we release the locks early and
 return NULL from ExecInsert() before heap insertion ever happens (and
 certainly before index tuple insertion ever happens).
 
 * If and when we get to ExecInsertIndexTuples(), and ultimately to
 _bt_doinsert, the new _bt_doinsert knows that for any unique indexes
 it sees, that the process has 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
On Fri, Aug 30, 2013 at 5:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 This is awesome.

Thanks.

 All that seems sane to me. I very, very much do not want it to deal with
 NOT NULL violations.

Sure. But there's nothing stopping us from doing that as a totally
orthogonal thing. Not that I personally find it to be terribly
compelling or anything. It's an easy addition, but I'm certainly not
going to try and mandate it as integral to what I've done here if that
doesn't suit you.

 Ok, so what we would like to do is basically to follow a protocol
 (simplified) like:

 1) replay INSERT, using ON DUPLICATE IGNORE
 2) if INSERT succeeded, be happy, no conflict (another INSERT with the
same key might conflict though)
 3) if the INSERT failed, lock tuple for UPDATE
 4) if the tuple got deleted by another session, goto 1
 5) check whether local tuple or the already inserted tuple wins conflict 
 resolution
 6) UPDATE or skip accordingly

Right, I thought that's what you meant. I'll have to ponder it
further. However, I don't think locking the old tuple(s) is mandatory
to make this a useful feature - as I've noted, MySQL doesn't do that.
That said, I really want to support that, perhaps as another DUPLICATE
KEY variant. As I've noted, if we had that, we almost wouldn't need
upsert - loosely speaking it would be mere syntactic sugar on top of
what you require.

 If there's a variant of INSERT ... ON DUPLICATE that gets a FOR UPDATE
 lock on the existing row this gets simpler because there's no chance
 anymore we need to loop or look for other versions of the row.

 Makes sense?

Perfect sense.

 Puh. It took me some time to even start to understand what you're doing
 here...

 While I ponder on it some more, could you explain whether I understood
 something correctly? Consider the following scenario:

 CREATE TABLE foo(a int UNIQUE, b int UNIQUE);

 S1: INSERT INTO foo(0, 0);
 S1: BEGIN;
 S1: INSERT INTO foo(1, 1);
 S1: SELECT pg_sleep(3600);
 S2: BEGIN;
 S2: INSERT INTO foo(2, 1);
 S3: SELECT * FROM foo WHERE a = 0;

 Won't S2 in this case exclusively lock a page in foo_a_key (the one
 where 2 will reside on) for 3600s, while it's waiting for S1 to finish
 while getting the speculative insertion into foo_b_key?

Yes. The way the patch currently handles that case is unacceptable. It
needs to be more concerned about holding an exclusive lock on
earlier-locked indexes when locking the second and subsequent indexes
iff it blocks on another transaction in the course of locking the
second and subsequent indexes.

 ISTM we could use something like a new lock level to make that work more
 sensibly, basically something like LW_FOR_UPDATE which does not conflict
 with _SHARED but conflicts with _EXCLUSIVE.

I'll ponder it further. There are probably a number of options.
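
For reference, the LW_FOR_UPDATE idea amounts to a conflict matrix like the following toy sketch (hypothetical semantics, not PostgreSQL's lwlock code; whether FOR_UPDATE self-conflicts is an assumption here, Andres only specified its behavior against _SHARED and _EXCLUSIVE):

```python
# Toy conflict matrix for a hypothetical LW_FOR_UPDATE lock level that
# does not conflict with LW_SHARED but does conflict with LW_EXCLUSIVE.
# Illustrative only; not PostgreSQL's actual lwlock implementation.

SHARED, FOR_UPDATE, EXCLUSIVE = "shared", "for_update", "exclusive"

CONFLICTS = {
    (SHARED, SHARED): False,
    (SHARED, FOR_UPDATE): False,     # readers are not blocked
    (SHARED, EXCLUSIVE): True,
    (FOR_UPDATE, FOR_UPDATE): True,  # assumed: one holder at a time
    (FOR_UPDATE, EXCLUSIVE): True,   # writers are blocked
    (EXCLUSIVE, EXCLUSIVE): True,
}

def conflicts(a, b):
    # The matrix is symmetric; look up either ordering.
    if (a, b) in CONFLICTS:
        return CONFLICTS[(a, b)]
    return CONFLICTS[(b, a)]

print(conflicts(SHARED, FOR_UPDATE))     # False
print(conflicts(EXCLUSIVE, FOR_UPDATE))  # True
```

The point of such a level is exactly the scenario above: a speculative inserter could hold FOR_UPDATE on the index page while waiting, letting readers like S3 proceed but still fencing off concurrent writers.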

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Alvaro Herrera
Uh, the pg_dump part checks for version 80400, shouldn't it be 90400?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Uh, the pg_dump part checks for version 80400, shouldn't it be 90400?

The reasoning there is that 8.4 is where we added
pg_get_function_arguments(), so this dumping code should work against that
server version or later.  (Oh, memo to self: test that.)  It's true that
pre-9.4 servers are not going to produce any useful extra info by using
pg_get_function_arguments() on an aggregate; but the general idea of this
patch is to make the aggregate-related code look more like the
plain-function-related code, so using the same version cutoffs in both
cases seemed to make sense.

regards, tom lane




Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Kohei KaiGai
2013/8/30 Josh Berkus j...@agliodbs.com:
 On 08/30/2013 03:05 AM, Kohei KaiGai wrote:

 Surely someone in the security community has discussed this?

 The security community considers covert channels a hard-to-solve
 problem; they are nearly impossible to eliminate.
 See the summary on Wikipedia:
   http://en.wikipedia.org/wiki/Covert_channel

 Well, in each of the cases covered in that article, the given technology
 (OSI, TCP, etc.) takes specific provisions to limit the ability of
 attackers to discover information via the covert channel.

 However, we have yet to talk about taking any such provisions with
 Postgres.  If we commit this patch, arguably we'll have a row-level
 security feature which only protects data from well-behaved users, which
 seems counterproductive.

The point we shouldn't forget is that information leakage via a covert
channel poses a lesser threat than leakage via the main channel, because
its bandwidth is much smaller.
The security community also concludes that covert channels are an
unavoidable fact of nature as long as humans can observe system behavior
and infer things from it; thus, security evaluation criteria neither
require eliminating covert channels nor pay attention to them for
products installed in environments with basic robustness (that is,
non-military, regular enterprise usage).
I don't think PostgreSQL is aiming to be a military-grade secure
database system. To become one, it would have to sacrifice many things
(performance, usability, its open-source nature, ...) for security, and
that is not a realistic goal.

 So, arguments in favor of this patch:
 a) it's as good as Oracle's security features, giving us check-box
 compliance.
 b) it allows securing individual rows against attackers with limited
 technical knowledge or limited database access, and could be very
 hardened in combination with intelligent access control.

Even if an attacker has enough knowledge, the rate at which they can
infer hidden values is very limited because of the much smaller
bandwidth of covert channels.

 c) it is an improvement on techniques like Veil (is it)?
 d) we plan to continue improving it and closing covert channels, or
 limiting their bandwidth.

 Arguments against:
 m) covert channels are currently broad enough to make it trivially
 circumventable (are they?)
 n) overhead and code maintenance required is substantial

 So, determinative questions:

 1) are the security mechanisms supplied by this patch superior in some
 way to approaches like Veil for multi-tenant applications?  (this is a
 serious question, as multi-tenant applications are far less concerned
 about covert channels)

Yes. This RLS implementation targets environments that have no
requirement for a particular upper bound on covert-channel bandwidth.
It centralizes the place where we have to enforce access control on the
main channel, so it fits multi-tenant applications well.

 2) do we plan to reduce the accessibility of data via covert channels
 over successive releases?  How?

Fewer covert channels are better than many, if we can close them at
reasonable cost in terms of run-time performance, code complexity, and
so on.
In general, though, it is impossible to eliminate them entirely, and
they pose a lesser threat because of their smaller bandwidth. So I don't
think this is an issue worth spending much effort to solve.

 3) will accepting this patch allow our users in the Government space to
 more freely adopt PostgreSQL?

Government has two spaces. Most government environments have
requirements similar to those of enterprise-grade systems. Military
environments, on the other hand, often require an upper bound on
covert-channel bandwidth, as in the story I introduced upthread; but
they are a minority, and they have the budget to develop special-purpose
database systems designed with security as the first priority.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] dynamic shared memory

2013-08-30 Thread Amit Kapila
On Fri, Aug 30, 2013 at 9:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2013-08-28 15:20:57 -0400, Robert Haas wrote:
  That way any corruption in that area will prevent restarts without
  reboot unless you use ipcrm, or such, right?

 The way I've designed it, no.  If what we expect to be the control
 segment doesn't exist or doesn't conform to our expectations, we just
 assume that it's not really the control segment after all - e.g.
 someone rebooted, clearing all the segments, and then an unrelated
 process (malicious, perhaps, or just a completely different cluster)
 reused the same name.  This is similar to what we do for the main
 shared memory segment.
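
The "validate, don't trust" approach described here can be sketched roughly as follows (hypothetical magic value and header layout, purely illustrative of the idea, not the patch's actual on-disk format):

```python
# Sketch of control-segment validation: treat the segment as ours only
# if every header field passes sanity checks; otherwise assume it is
# leftover garbage from a reboot or belongs to an unrelated process.
# DSM_MAGIC and the header layout are made up for illustration.
import struct

DSM_MAGIC = 0x50C0FFEE        # hypothetical magic value
MAX_ITEMS = 1024              # hypothetical sanity bound

def control_segment_is_ours(raw: bytes) -> bool:
    if len(raw) < 8:          # too short to hold our header at all
        return False
    magic, nitems = struct.unpack_from("<II", raw, 0)
    if magic != DSM_MAGIC:
        return False          # someone else's segment, or zeroed memory
    if nitems > MAX_ITEMS:
        return False          # plausible magic but corrupt contents
    return True

# A segment full of zeroes (e.g. freshly recreated after a reboot)
# fails the magic check and is simply ignored:
print(control_segment_is_ours(b"\x00" * 16))  # False
```

This is also what Andres's concern translates to: the validation must be strict enough that a half-written or scribbled-on header is rejected outright rather than causing cleanup to fail partway through.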

 The case I am mostly wondering about is some process crashing and
 overwriting random memory. We need to be pretty sure that we'll never
 fail partially through cleaning up old segments because they are
 corrupted or because we died halfway through our last cleanup attempt.

  I think we want that during development, but I'd rather not go there
  when releasing. After all, we don't support a manual choice between
  anonymous mmap/sysv shmem either.

 That's true, but that decision has not been uncontroversial - e.g. the
 NetBSD guys don't like it, because they have a big performance
 difference between those two types of memory.  We have to balance the
 possible harm of one more setting against the benefit of letting
 people do what they want without needing to recompile or modify code.

 But then, it made them fix the issue afaik :P

  In addition, I've included an implementation based on mmap of a plain
  file.  As compared with a true shared memory implementation, this
  obviously has the disadvantage that the OS may be more likely to
  decide to write back dirty pages to disk, which could hurt
  performance.  However, I believe it's worthy of inclusion all the
  same, because there are a variety of situations in which it might be
  more convenient than one of the other implementations.  One is
  debugging.
 
  Hm. Not sure what's the advantage over a corefile here.

 You can look at it while the server's running.

 That's what debuggers are for.

  On MacOS X, for example, there seems to be no way to list
  POSIX shared memory segments, and no easy way to inspect the contents
  of either POSIX or System V shared memory segments.

  Shouldn't we ourselves know which segments are around?

 Sure, that's the point of the control segment.  But listing a
 directory is a lot easier than figuring out what the current control
 segment contents are.

 But without a good amount of tooling - like in a debugger... - it's not
 very interesting to look at those files either way? The mere presence of
 a segment doesn't tell you much and the contents won't be easily
 readable.

  Another use case is working around an administrator-imposed or
  OS-imposed shared memory limit.  If you're not allowed to allocate
  shared memory, but you are allowed to create files, then this
  implementation will let you use whatever facilities we build on top
  of dynamic shared memory anyway.
 
  I don't think we should try to work around limits like that.

 I do.  There's probably someone, somewhere in the world who thinks
 that operating system shared memory limits are a good idea, but I have
 not met any such person.

 Let's drive users away from sysv shmem is the only one I heard so far ;)

 I would never advocate deliberately trying to circumvent a
 carefully-considered OS-level policy decision about resource
 utilization, but I don't think that's the dynamic here.  I think if we
 insist on predetermining the dynamic shared memory implementation
 based on the OS, we'll just be inconveniencing people needlessly, or
 flat-out making things not work. [...]

 But using file-backed memory will *suck* performancewise. Why should we
 ever want to offer that to a user? That's what I was arguing about
 primarily.

 If we're SURE
 that a Linux user will prefer posix to sysv or mmap or none in
 100% of cases, and that a NetBSD user will always prefer sysv over
 mmap or none in 100% of cases, then, OK, sure, let's bake it in.
 But I'm not that sure.

 I think posix shmem will be preferred to sysv shmem if present, in just
 about any relevant case. I don't know of any system with lower limits on
 posix shmem than on sysv.

 I think this case is roughly similar
 to wal_sync_method: there really shouldn't be a performance or
 reliability difference between the ~6 ways of flushing a file to disk,
 but as it turns out, there is, so we have an option.

 Well, most of them actually give different guarantees, so it makes sense
 to have differing performance...

  Why do we want to expose something unreliable as preferred_address to
  the external interface? I haven't read the code yet, so I might be
  missing something here.

 I shared your opinion that preferred_address is never going to be
 reliable, although FWIW Noah thinks it can be made reliable with a
 large-enough 

Re: [HACKERS] [v9.4] row level security

2013-08-30 Thread Kohei KaiGai
2013/8/30 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 On 08/30/2013 12:43 PM, Tom Lane wrote:
 In short, we can check some check-box is a really, really bad reason
 to accept a security-related feature.  If we're going to put up with
 all the downsides of RLS, I want the end result to be something that's
 actually secure, not something that gives the illusion of security.

 Can you be more explicit about all the downsides of RLS?  I was just
 looking over the patch, which is less than 5000 lines.  While it's not
 small, we have larger patches in the CF.  So what's the specific
 downsides of this?

 I think it's going to be an ongoing maintenance headache and an endless
 source of security bugs, even disregarding covert-channel issues.  I have
 pretty much zero faith in the planner changes, in particular, and would
 still not have a lot if they were adequately documented, which they
 absolutely are not.  The whole thing depends on nowhere-clearly-stated
 assumptions that plan-time transformations will never allow an RLS check
 to be bypassed.  I foresee future planner work breaking this in
 non-obvious ways on a regular basis (even granting the assumption that
 it's bulletproof today, which is at best unproven).

In general, we adopt and enhance features as long as PostgreSQL keeps
evolving; no feature can ever be free from bugs or maintenance,
regardless of its nature.
The latter half seems a bit unfair to me, because any feature, not only
RLS, may conflict with some future work. Also, if you have a tangible
planner enhancement plan, could you tell us which plans are in progress?
It is impossible to adjust my implementation to address an abstract
concern about future conflicts.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

