date:20130828

Re: [HACKERS] [v9.4] row level security

2013-08-28 Thread Oleg Bartunov

btw, there is serious problem with row-level security and constraints. For
example, user with low security level could use unique constraint to know
about existence of a row with higher security. I don't know, what is the
best practice to avoid this.

On Wed, Aug 28, 2013 at 1:37 AM, Greg Smith g...@2ndquadrant.com wrote:

On 7/20/13 10:08 AM, Kohei KaiGai wrote:

Hmm. I didn't have this idea. It seems to me fair enough and kills
necessity to enhance RangeTblEntry and getrelid() indeed.
I try to fix up this implementation according to your suggestion.

How is that going? I'm going to do a serious review of this myself over
the next few weeks. I have a good chunk of time set aside for it as part
of a larger project. I'm hoping to get more people here involved in that
effort too, starting in the November CF if that works out.

I've been trying to catch up with your larger plan for this feature for
9.4. You made this comment earlier:

Also, I'd like to have discussion for this feature in earlier half of
v9.4 to keep time for the remaining features, such as check on
writer-side, integration with selinux, and so on

Is any of that code around yet? I see that you have split your
submissions so that a smaller program can be reviewed today. I'd like to
start taking a look at the next step too though. For the project I'm
starting to work on here, getting the integration with labeling also done
is a very important thing to target for 9.4. It would be nice to see how
that fits together today, even if the code for it isn't being reviewed
heavily yet.

I don't quite understand yet what's missing on the writer side. If you
could help explain what's missing there, I would like to read about that.

--
Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/**mailpref/pgsql-hackershttp://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Dimitri Fontaine

Peter Eisentraut pete...@gmx.net writes:
 make -C pg_upgrade_support all

Do we have something automated to easily test pg_upgrade?

My memories of how pg_upgrade works with extensions makes me believe
that I don't have anything special to do when those extensions have been
made available through a template: the scripts are not used. That said,
we want to retain some new dependencies…

The contrib/pg_upgrade/IMPLEMENTATION file is silent about upgrading
extensions…

I fixed the pg_upgrade_support compiling in my branch here, please find
the patch-on-patch attached. I also checked that the whole of contribs
now build fine and didn't find any other problem.

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

commit 71e19099ab935413211e0e1b63529bc71eff7bd9
Author: Dimitri Fontaine dimi...@2ndquadrant.fr
Date:   Wed Aug 28 11:40:27 2013 +0200

Fix compiling contrib/pg_upgrade_support

diff --git a/contrib/pg_upgrade_support/pg_upgrade_support.c b/contrib/pg_upgrade_support/pg_upgrade_support.c
index 99e64c4..efb75a7 100644
--- a/contrib/pg_upgrade_support/pg_upgrade_support.c
+++ b/contrib/pg_upgrade_support/pg_upgrade_support.c
@@ -190,7 +190,8 @@ create_empty_extension(PG_FUNCTION_ARGS)
 		 text_to_cstring(extVersion),
 		 extConfig,
 		 extCondition,
-		 requiredExtensions);
+		 requiredExtensions,
+		 InvalidOid);
 
 	PG_RETURN_VOID();
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [v9.4] row level security

2013-08-28 Thread Kohei KaiGai

2013/8/28 Oleg Bartunov obartu...@gmail.com:
 btw, there is serious problem with row-level security and constraints. For
 example, user with low security level could use unique constraint to know
 about existence of a row with higher security.  I don't know, what is the
 best practice to avoid this.

It is out of scope for this feature. We usually calls this type of information
leakage covert channel; that is not avoidable in principle.
However, its significance is minor, because attacker must know identical
data to be here, or must have proving for each possible values.
Its solution is simple. DBA should not use value to be confidential as unique
key. If needed, our recommendation is alternative key, instead of natural key,
because its value itself does not have worth.

I should add a note of caution onto the documentation according to
the previous consensus, however, I noticed it had gone from the sgml files
while I was unaware. So, let me add description on the documentation.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-28 Thread Andres Freund

On 2013-08-28 13:58:08 +0900, Michael Paquier wrote:
 On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
  I have been working a little bit more on this patch for the next
  commit fest. Compared to the previous version, I have removed the part
  of the code where process running REINDEX CONCURRENTLY was waiting for
  transactions holding a snapshot older than the snapshot xmin of
  process running REINDEX CONCURRENTLY at the validation and swap phase.
  At the validation phase, there was a risk that the newly-validated
  index might not contain deleted tuples before the snapshot used for
  validation was taken. I tried to break the code in this area by
  playing with multiple sessions but couldn't. Feel free to try the code
  and break it if you can!
 
  Hm. Do you have any justifications for removing those waits besides I
  couldn't break it? The logic for the concurrent indexing is pretty
  intricate and we've got it wrong a couple of times without noticing bugs
  for a long while, so I am really uncomfortable with statements like this.
 Note that the waits on relation locks are not removed, only the wait
 phases involving old snapshots.
 
 During swap phase, process was waiting for transactions with older
 snapshots than the one taken by transaction doing the swap as they
 might hold the old index information. I think that we can get rid of
 it thanks to the MVCC snapshots as other backends are now able to see
 what is the correct index information to fetch.

I don't see MVCC snapshots guaranteeing that. The only thing changed due
to them is that other backends see a self consistent picture of the
catalog (i.e. not either, neither or both versions of a tuple as
earlier). It's still can be out of date. And we rely on those not being
out of date.

I need to look into the patch for more details.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Valgrind Memcheck support

2013-08-28 Thread Andres Freund

On 2013-08-27 23:46:23 -0400, Noah Misch wrote:
 On Tue, Aug 27, 2013 at 04:14:27PM +0200, Andres Freund wrote:
  On 2013-06-09 17:25:59 -0400, Noah Misch wrote:
   ***
   *** 846,851  exec_simple_query(const char *query_string)
   --- 847,856 
 
 TRACE_POSTGRESQL_QUERY_START(query_string);
 
   + #ifdef USE_VALGRIND
   + VALGRIND_PRINTF(statement: %s\n, query_string);
   + #endif
   + 
  
  Is there a special reason for adding more logging here? I find this
  makes the instrumentation much less useful since reports easily get
  burried in those traces. What's the advantage of doing this instead of
  log_statement=...? Especially as that location afaics won't help for the
  extended protocol?
 
 I typically used valgrind --log-file=  To determine via log_statement
 which SQL statement caused a particular Valgrind error, I would match by
 timestamp; this was easier.  In retrospect, log_statement would have sufficed
 given both Valgrind and PostgreSQL logging to stderr.  Emitting the message in
 exec_simple_query() and not in exec_execute_message() is indeed indefensible.

It's an understandable mistake, nothing indefensible. We don't really
test the extended protocol :(

 That being said, could you tell me more about your workflow where the extra
 messages cause a problem?  Do you just typically diagnose each Valgrind error
 without first isolating the pertinent SQL statement?

There's basically two scenarios where I found it annoying:
a) I manually reproduce some issue, potentially involving several psql
shells. All those fire up autocompletion and such queries which bloat
the log while I actually only want to analyze something specific.
b) I don't actually look for anything triggered by an SQL statement
itself, but am looking for issues in a walsender, background worker,
whatever. I still need some foreground activity to trigger the behaviour
I want to see, but once more, valgrind's logs hide the error.

Also, I sometimes use valgrind's --db-command/--vgdb which gives you
enough context from the backtrace itself since you can just get the
statement from there.

I vote for just removing that VALGRIND_PRINTF - it doesn't give you
anything you cannot easily achieve otherwise...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Deprecating RULES

2013-08-28 Thread Tom Lane

Darren Duncan dar...@darrenduncan.net writes:
 That's a really old post/thread, and I'm not arguing for any kind of action 
 related to RULEs, please disregard the message. -- Darren Duncan

Oh, my fault --- for some reason my mail reader popped it up as an unread
message, and I failed to notice the date.  My apologies.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Behaviour of take over the synchronous replication

2013-08-28 Thread Amit Kapila

On Tue, Aug 27, 2013 at 4:51 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sun, Aug 25, 2013 at 3:21 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Sat, Aug 24, 2013 at 2:46 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Sat, Aug 24, 2013 at 3:14 AM, Josh Berkus j...@agliodbs.com wrote:
 On 08/23/2013 12:42 AM, Sawada Masahiko wrote:
 in case (a), those priority is clear. So I think that re-taking over
 is correct behaviour.
 OHOT, in case (b), even if AAA and BBB are set same priority, AAA
 server steals SYNC replication.
 I think it is better that BBB server continue behaviour SYNC standby,
 and AAA should become potential server.

 So, you're saying that:

 1) synchronous_standby_names = '*'

 2) replica 'BBB' is the current sync standby

 3) replica 'AAA' comes online

 4) replica 'AAA' grabs sync status

 ?
 I'm sorry that you are confuse.
 It means that

 1) synchronous_standby_names = '*'

 2) replica 'AAA' is the current sync standby

 3) replica 'BBB' is the current async standby (potential sync standby)

 4) replica 'AAA' fail. after that, replica 'BBB' is current sync standby.

 5) replica 'AAA' comes online

 6) replica 'AAA' grabs sync status




 If that's the case, I'm not really sure that's undesirable behavior.
 One could argue fairly persuasively that if you care about the
 precendence order of sync replicas, you shouldn't use '*'. And the rule
 of if using *, the lowest-sorted replica name has sync is actually a
 predictable, easy-to-understand rule.

 So if you want to make this a feature request, you'll need to come up
 with an argument as to why the current behavior is bad. Otherwise,
 you're just asking us to document it better (which is a good idea).
 It is not depend on name of standby server. That is, The standby server,
 which was connected to the master server during initial configration
 replication, is top priority even if priority of two server are same.

 What is happening here is that incase of '*' as priority of both are
 same, system will choose whichever
 comes in list of registered standby's first (list is maintained in
 structure WalSndCtl).
 Each standby is registered with WalSndCtl when a new WALSender is
 started in function InitWalSenderSlot().
 As 'AAA' has been registered first it becomes preferred sync standby
 even if priorities of both are same.
 When 'AAA' goes down, it marks that Slot entry as free (by setting
 pid=0 in function WalSndKill),
 now when 'AAA' comes back again, it gets that free Slot entry and
 again becomes preferred sync standby.

 Now if we want to fix as you are suggesting which I don't think is
 necessary, we might need to change WalSndKill and some other place so
 that whenever any standby goes down, it changes slots for already
 registered standby's.
 User must remember that which standby server connected to master server at
 first.
 I think that this behavior confuse user.
 so I think that we need to modify this behaviour or if '*' is used, priority
 of server is not same (modifying manual is also good).

 Here user has done the settings (setting synchronous_standby_names =
 '*'), after which he will not have any control which standby will
 become sync standby, so ideally he should not complain.

 It might be case that for some users current behavior is good enough
 which means that with '*' whichever standby has become sync standby
 first, it will be the sync standby always if alive.

 I'm thinking that it is not necessary to change WalSndKill.
 For example, we add the value (e.g., sync_standby) which have that
 which wal sender is active SYNC rep.
 And if sync_standby is already set and it is active, server doesn't
 looking for active standby.
 Only if sync_standby is not set and it is inactive, server looking for
 that which server is active SYNC rep.
 If so, we also prevent to find active SYNC rep whenever
 SyncRepReleaseWaiters() is called.
   For '*' case, it will be okay, but when the user has given proper
names, in that case even if there is any active Sync
   Rep, it has to be changed based on priority.

   I think here how to provide a fix, so that behavior gets changed to
what you describe is a second priority work, first
   is to show the value of use-case. Do you really know where people
actually setup using '*' as configuration and if
   yes, are they annoyed with current behavior?

   I have thought about it, but could imagine a scenario where people
will be using '*' in their production
   configurations, may be it will be useful in test labs.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] split postmaster's checkDataDir to src/common

2013-08-28 Thread Alvaro Herrera

Tom Lane wrote:

 What exactly is the argument for pushing this into 9.3?  Since we
 are past rc1, we should treat that branch as released.  If you wouldn't
 back-patch into all supported branches, you shouldn't be patching 9.3
 either.

This is to fix the stats_temp_directory issue that the directory is too
public.  We had agreed that it wasn't a problem before 9.3, but it is in
that branch because we changed the code to write so many files now
instead of just one.

If there's agreement to tighten the permission check in HEAD only, then
I will only commit to that branch.  But it seems a bit odd to me
postpone the tightening.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Master-slave visibility order

2013-08-28 Thread Ants Aasma

I'm currently implementing commit sequence number (CSN) based
snapshots and I hit a design decision that I would like to resolve
before I have too much code to rewrite.

The issue is commit visibility ordering on slaves. As a couple of
threads on hackers have already noted, currently commit order on
slaves can differ from what is seen on master. This arises from the
fact that on master commit visibility is determined by the order of
ProcArrayLock acquisition by ProcArrayEndTransaction(). On slaves
commit visibility is exactly the order of commit records in WAL.
Because XLogInsert() in RecordTransactionCommit() is not interlocked
with ProcArrayEndTransaction() these orders can differ. In case of
mixed sync and async transactions they in fact are quite likely to
differ due to the durability wait in RecordTransactionCommit().

It's not possible to change master commit order to match WAL order
because then either async transactions must either wait behind sync
transactions before returning losing the point of async; or async
transactions must return without becoming visible, changing user
visible semantics; or sync transactions must become visible before
they become durable, again changing user visible semantics.

As it's not possible to change master commit order, the slave
visibility order must change for the orders to be consistent. WAL
currently doesn't have the information to reconstruct master commit
order. Either we need to add a new WAL record for the commit order
(only necessary when wal_level=hot_standby) or add a side channel to
replication connections to communicate commit order information.

One more consideration here is the wish expressed by several hackers
that commit record LSNs could be used as CSNs. One of the most
interesting benefits of this is the property of LSNs being the same
over the whole cluster, meaning that it would be relatively simple to
create cluster wide consistent snapshots.

I currently see the following courses of action:

1. Do nothing about the inconsistency, use a transient global counter
for master commit order and commit record LSN for slaves.
   Pro: doesn't change any semantics
   Con: we are not making any progress towards cluster wide snapshots
or even serializable transactions on slaves.

2. Create a new WAL record type that is inserted when a transaction
becomes visible. LSN of this record determines transaction visibility
order. Async transactions can be optimized to skip this record. This
record does not need to be flushed.
   Pro: cluster wide consistency, replication method agnostic
   Con: one extra WAL record insertion per writing transaction. (32
bytes of WAL per tx)

3. Use a transient global counter on master, send xid-csn pairs to
slave via a side channel on the replication connection.
   Pro: Less overhead than WAL records
   Con: replication protocol needs (possibly invasive) changes, WAL
shipping based replication can't use this mechanism, lots of extra
code required.

4. Make the choice between 1 and 2 user configurable (it seems to me
that it could even be changed without a restart).

Thoughts?

Regards,
Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] split postmaster's checkDataDir to src/common

2013-08-28 Thread Tom Lane

Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 What exactly is the argument for pushing this into 9.3?  Since we
 are past rc1, we should treat that branch as released.  If you wouldn't
 back-patch into all supported branches, you shouldn't be patching 9.3
 either.

 This is to fix the stats_temp_directory issue that the directory is too
 public.  We had agreed that it wasn't a problem before 9.3, but it is in
 that branch because we changed the code to write so many files now
 instead of just one.

To my mind, the new worry about 9.3 was the sloppy file deletion code,
which we've already cleaned up.  The remaining argument for tightening
the directory ownership/permissions checking was a security issue: should
we allow a risk that random people could see or alter the stats files.
That's not different in 9.3 than it was before; splitting up the data
into multiple files doesn't seem (to me) to change that risk any.

In fact, I thought that the end of the discussion left us with doubts
about whether we wanted to change this at all.  Andres pointed out for
instance that we might need a lock file in the directory to prevent
multiple PG instances from trying to share one temp directory.  I don't
know that we need to go that far, but if we don't think we need to defend
against that case, do we need defenses here at all?  In the end this is a
you break it, you get to keep both pieces issue, with not all that
severe consequences even if the DBA does mess it up.  There are other
subdirectories, such as pg_xlog, that we also allow people to relocate,
but for which the consequences of a bad config would be far worse than
they are for the stats dir.  We don't go out of our way to prevent
config foulups for other subdirectories, so why this one?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Alvaro Herrera

Dimitri Fontaine wrote:
 Peter Eisentraut pete...@gmx.net writes:
  make -C pg_upgrade_support all
 
 Do we have something automated to easily test pg_upgrade?

make check in contrib/pg_upgrade should do the trick.


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Dimitri Fontaine

Alvaro Herrera alvhe...@2ndquadrant.com writes:
 make check in contrib/pg_upgrade should do the trick.

PASSED

Even after I added extension to the serial_schedule. I don't know if I
need to do anything specific on that area, will wait about some feedback
on that before sending a new version of the patch. Meanwhile my branch
is updated, and I sent the patch-on-patch too.

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Spinlock implementation on x86_64 (was Re: [HACKERS] Better LWLocks with compare-and-swap (9.4))

2013-08-28 Thread Heikki Linnakangas


On 21.05.2013 00:20, Heikki Linnakangas wrote:

On 16.05.2013 01:08, Daniel Farina wrote:

On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

pgbench -S is such a workload. With 9.3beta1, I'm seeing this
profile, when
I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux
machine:

- 64.09% postgres postgres [.] tas
- tas
- 99.83% s_lock
- 53.22% LWLockAcquire
+ 99.87% GetSnapshotData
- 46.78% LWLockRelease
GetSnapshotData
+ GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of
ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
surprisingly steep drop in performance once you go beyond 29 clients
(attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My
theory
is that after that point all the cores are busy, and processes start
to be
sometimes context switched while holding the spinlock, which kills
performance.


I have, I also used linux perf to come to this conclusion, and my
determination was similar: a system was undergoing increasingly heavy
load, in this case with processes number of processors. It was
also a phase-change type of event: at one moment everything would be
going great, but once a critical threshold was hit, s_lock would
consume enormous amount of CPU time. I figured preemption while in
the spinlock was to blame at the time, given the extreme nature


Stop the press! I'm getting the same speedup on that 32-core box I got
with the compare-and-swap patch, from this one-liner:

--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -200,6 +200,8 @@ typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

+#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))
+
static __inline__ int
tas(volatile slock_t *lock)
{

So, on this system, doing a non-locked test before the locked xchg
instruction while spinning, is a very good idea. That contradicts the
testing that was done earlier when the x86-64 implementation was added,
as we have this comment in the tas() implementation:


/*
* On Opteron, using a non-locking test before the locking instruction
* is a huge loss. On EM64T, it appears to be a wash or small loss,
* so we needn't bother to try to distinguish the sub-architectures.
*/


On my test system, the non-locking test is a big win. I tested this
because I was reading this article from Intel:

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/.
It says very explicitly that the non-locking test is a good idea:


Spinning on volatile read vs. spin on lock attempt

One common mistake made by developers developing their own spin-wait
loops is attempting to spin on an atomic instruction instead of
spinning on a volatile read. Spinning on a dirty read instead of
attempting to acquire a lock consumes less time and resources. This
allows an application to only attempt to acquire a lock only when it
is free.


Now, I'm not sure what to do about this. If we put the non-locking test
in there, according to the previous testing that would be a huge loss on
Opterons.


I think we should apply the attached patch to put a non-locked test in 
TAS_SPIN() on x86_64.


Tom tested this back in 2011 on a 32-core Opteron [1] and found little 
difference. And on a 80-core Xeon (160 with hyperthreading) system, he 
saw a quite significant gain from it [2]. Reading the thread, I don't 
understand why it wasn't applied to x86_64 back then. Maybe it was for 
the fear of having a negative impact on performance on old Opterons, but 
I strongly suspect that the performance would be OK even on old Opterons 
if you only do the non-locked test in TAS_SPIN() and not in TAS(). Back 
when the comment about poor performance with a non-locking test on 
Opteron was written, we used the same TAS() macro in the first and while 
spinning.


I tried to find some sort of an authoritative guide from AMD that would 
say how spinlocks should be implemented, but didn't find any. The Intel 
guide I linked above says we should apply the patch. Linux uses a ticket 
spinlock thingie nowadays, which is slightly different, but if I'm 
reading the older pre-ticket version correctly, they used to do the 
same. Ie. non-locking test while spinning, but not before the first 
attempt to grab the lock. Same in FreeBSD.


So, my plan is to apply the attached non-locked-tas-spin-x86_64.patch to 
master. But I would love to get feedback from people running different 
x86_64 hardware.


[1] http://www.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/25989.1314734...@sss.pgh.pa.us

- Heikki
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
  For my part, I'd honestly rather have the first things found be what's
  picked and later things be ignored.  If you want it controlled by ALTER
  SYSTEM, then don't set it in postgresql.conf.  The problem with that is
  there's no bootstrap mechanism without actually modifying such configs
  or making the configs be in auto.conf to begin with, neither of which is
  ideal, imv.
 
  I really hate the idea that someone could configure 'X' in
  postgresql.conf and because the auto.conf line is later in the file,
  it's able to override the original setting.  Does not strike me as
  intuitive at all.
 
 This is currently how include mechanism works in postgresql.conf,
 changing that for this special case can be costly and moreover the
 specs for this patch were layout from beginning that way.

Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
settings, you should move the include_auto line to the top of
postgresql.conf, but I don't think that should be the default.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Spinlock implementation on x86_64 (was Re: [HACKERS] Better LWLocks with compare-and-swap (9.4))

2013-08-28 Thread Tom Lane

Heikki Linnakangas hlinnakan...@vmware.com writes:
 So, my plan is to apply the attached non-locked-tas-spin-x86_64.patch to 
 master. But I would love to get feedback from people running different 
 x86_64 hardware.

Surely this patch should update the existing comment at line 209?  Or at
least point out that a non-locked test in TAS_SPIN is not the same as a
non-locked test in tas() itself.

Other than the commenting, I have no objection to this.  I think you're
probably right that the old tests in which this looked like a bad idea
were adding the unlocked test to tas() not only the spin case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GetTransactionSnapshot() in enum.c

2013-08-28 Thread Robert Haas

On Mon, Aug 26, 2013 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 19, 2013 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, I notice that the MVCC-catalog-scans patch summarily asserts that
 RenumberEnumType no longer poses any concurrency hazards.  I doubt that's
 true: isn't it still possible that pg_enum rows acquired through the
 syscaches will have inconsistent enumsortorder values, if they were
 read at different times?  If you want to examine enumsortorder, you really
 need to be comparing rows you know were read with the *same* snapshot.

 Good point, I missed that.  Here's a proposed comment patch.

 Looks sane to me.

Thanks for the review.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hstore: Query speedups with Gin index

2013-08-28 Thread Bruce Momjian

On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
  On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith blakesmi...@gmail.com wrote:
  The combined entry is used to support contains (@) queries, and the key
  only item is used to support key contains (?) queries. This change seems
  to help especially with hstore keys that have high cardinalities. Downsides
  of this change is that it requires an index rebuild, and the index will be
  larger in size.
 
  Index rebuild would be a problem only for minor releases,
 
 That's completely false; people have expected major releases to be
 on-disk-compatible for several years now.  While there probably will be
 future releases in which we are willing to break storage compatibility,
 a contrib module doesn't get to dictate that.
 
 What might be a practical solution, especially if this isn't always a
 win (which seems likely given the index-bloat risk), is to make hstore
 offer two different GIN index opclasses, one that works the traditional
 way and one that works this way.
 
 Another thing that needs to be taken into account here is Oleg and
 Teodor's in-progress work on extending hstore:
 https://www.pgcon.org/2013/schedule/events/518.en.html
 I'm not sure if this patch would conflict with that at all, but it
 needs to be considered.

We can disallow in-place upgrades for clusters that use certain contrib
modules --- we have done that in the past.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hstore: Query speedups with Gin index

2013-08-28 Thread Andres Freund

On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
 On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
  Michael Paquier michael.paqu...@gmail.com writes:
   On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith blakesmi...@gmail.com 
   wrote:
   The combined entry is used to support contains (@) queries, and the 
   key
   only item is used to support key contains (?) queries. This change 
   seems
   to help especially with hstore keys that have high cardinalities. 
   Downsides
   of this change is that it requires an index rebuild, and the index will 
   be
   larger in size.
  
   Index rebuild would be a problem only for minor releases,
  
  That's completely false; people have expected major releases to be
  on-disk-compatible for several years now.  While there probably will be
  future releases in which we are willing to break storage compatibility,
  a contrib module doesn't get to dictate that.
  
  What might be a practical solution, especially if this isn't always a
  win (which seems likely given the index-bloat risk), is to make hstore
  offer two different GIN index opclasses, one that works the traditional
  way and one that works this way.
  
  Another thing that needs to be taken into account here is Oleg and
  Teodor's in-progress work on extending hstore:
  https://www.pgcon.org/2013/schedule/events/518.en.html
  I'm not sure if this patch would conflict with that at all, but it
  needs to be considered.
 
 We can disallow in-place upgrades for clusters that use certain contrib
 modules --- we have done that in the past.

But that really cannot be acceptable for hstore. The probably most
widely used extension there is.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote:
  BgwHandleStatus GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, 
  pid_t *pid);
  BgwHandleStatus WaitForBackgroundWorkerStartup(BackgroundWorkerHandle 
  *handle, pid_t *pid);

 OK, here's a patch that API.  I renamed the constants a bit, because a
 process that has stopped is not necessarily gone; it could be
 configured for restart.  But we can say that it is stopped, at the
 moment.

 I'm not sure that this API is an improvement.  But I think it's OK, if
 you prefer it.

 Thanks, looks more readable to me. I like code that tells me what it
 does without having to look in other places ;)

Well, fair enough.  But you might have to look around a bit to figure
out that you now have two functions each of which can return 3 out of
a possible 4 values.  But whatever.

 +  functionRegisterDynamicBackgroundWorker(typeBackgroundWorker
 +  *worker, BackgroundWorkerHandle **handle/type)/function.  Unlike
 +  functionRegisterBackgroundWorker/, which can only be called from 
 within
 +  the postmaster, functionRegisterDynamicBackgroundWorker/function must 
 be
 +  called from a regular backend.
   /para

 s/which can only be called from within the posmaster/during initial
 startup, via shared_preload_libraries/?

This seems like a possible doc clarification not intimately related to
the patch at hand.  The only reason that paragraph is getting reflowed
is because of the additional argument to
RegisterDynamicBackgroundWorker().  On the substance, I could go
either way on whether your proposed text is better than what's there
now.  It seems like it might need a bit of wordsmithing, anyway.

 s/STATED/STARTED/

Good catch.

  bool
 -RegisterDynamicBackgroundWorker(BackgroundWorker *worker)
 +RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 + 
 BackgroundWorkerHandle **handle)
  {

 Hm. Not this patches fault, but We seem to allow bgw_start_time ==
 BgWorkerStart_PostmasterStart here which doesn't make sense...

I can add a check for that.  I agree that it's a separate patch.

 Maybe add something like Assert(hanlde-slot  max_worker_processes);?

Sure.

 Shouldn't that loop have a CHECK_FOR_INTERRUPTS()?

Yep.


 Theoretically this would unset set_latch_on_sigusr1 if it was previously
 already set to 'true'. If you feel defensive you could safe the previous
 value...

Probably a good plan.

 So, besides those I don't see much left to be done. I haven't tested
 EXEC_BACKEND, but I don't anything specific to that here.

I certainly can't promise that the code is bug-free.  But I think it's
probably better to get this into the tree and let people start playing
around with it than to continue to maintain it in my private sandbox.
At this point it's just infrastructure anyway, though there seems to
be a good deal of interest in it from more than just myself...  and I
do think it's a useful step forward from where we are today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 2:04 PM, Robert Haas robertmh...@gmail.com wrote:
 I certainly can't promise that the code is bug-free.  But I think it's
 probably better to get this into the tree and let people start playing
 around with it than to continue to maintain it in my private sandbox.
 At this point it's just infrastructure anyway, though there seems to
 be a good deal of interest in it from more than just myself...  and I
 do think it's a useful step forward from where we are today.

Committed with fixes for the issues you mentioned, plus some updates
to out-of-date comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 2:04 PM, Robert Haas robertmh...@gmail.com wrote:
 Hm. Not this patches fault, but We seem to allow bgw_start_time ==
 BgWorkerStart_PostmasterStart here which doesn't make sense...

 I can add a check for that.  I agree that it's a separate patch.

On third thought, is there really any point to such a check?  I mean,
no background worker is going to start before it's registered, and by
the time any background worker is registered, we'll be passed the time
indicated by all of those constants: BgWorkerStart_PostmasterStart,
BgWorkerStart_ConsistentState, BgWorkerStart_RecoveryFinished.

I think we should view that field as fixing the earliest time at which
the worker should be started, rather than the exact time at which it
must be started.  Otherwise no value is sensible.  And if we take that
approach, then for a dynamic background worker, any value is OK.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Bruce Momjian (br...@momjian.us) wrote:
 On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
   I really hate the idea that someone could configure 'X' in
   postgresql.conf and because the auto.conf line is later in the file,
   it's able to override the original setting.  Does not strike me as
   intuitive at all.
  
  This is currently how include mechanism works in postgresql.conf,
  changing that for this special case can be costly and moreover the
  specs for this patch were layout from beginning that way.
 
 Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
 settings, you should move the include_auto line to the top of
 postgresql.conf, but I don't think that should be the default.

While I appreciate that there are bootstrap-type issues with this, I
really don't like this idea of later stuff can just override earlier
stuff.

include files and conf.d-style options are for breaking the config up,
not to allow you to override options because a file came later than an
earlier file.  Our particular implementation of config-file reading
happens to lend itself to later-definition-wins, but that's really
counter-intuitive for anyone unfamiliar with PG, imv.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Tom Lane

Stephen Frost sfr...@snowman.net writes:
 While I appreciate that there are bootstrap-type issues with this, I
 really don't like this idea of later stuff can just override earlier
 stuff.

 include files and conf.d-style options are for breaking the config up,
 not to allow you to override options because a file came later than an
 earlier file.  Our particular implementation of config-file reading
 happens to lend itself to later-definition-wins, but that's really
 counter-intuitive for anyone unfamiliar with PG, imv.

I don't follow this argument at all.  Do you know any software with text
config files that will act differently from this if the same setting is
listed twice?  Last one wins is certainly what I'd expect.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Andres Freund

On 2013-08-28 14:04:59 -0400, Robert Haas wrote:
  +  functionRegisterDynamicBackgroundWorker(typeBackgroundWorker
  +  *worker, BackgroundWorkerHandle **handle/type)/function.  Unlike
  +  functionRegisterBackgroundWorker/, which can only be called from 
  within
  +  the postmaster, functionRegisterDynamicBackgroundWorker/function 
  must be
  +  called from a regular backend.
/para
 
  s/which can only be called from within the posmaster/during initial
  startup, via shared_preload_libraries/?
 
 This seems like a possible doc clarification not intimately related to
 the patch at hand.  The only reason that paragraph is getting reflowed
 is because of the additional argument to
 RegisterDynamicBackgroundWorker().  On the substance, I could go
 either way on whether your proposed text is better than what's there
 now.

I agree it's unrelated. I just noticed because it was in the diff ;)

 It seems like it might need a bit of wordsmithing, anyway.

Yes, looks like it could use some. It's not going to be me doing it
tho...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] What happens at BIND time?

2013-08-28 Thread Josh Berkus

Tom,

 Does the backend's memory usage climb, or hold steady?  If the former,
 I'd bet on client failure to release resources, eg not closing the
 portals when done with them.  A memory map from MemoryContextStats
 would help determine exactly what's leaking.

FS cache usage increases through the test run, as you'd expect, but the
amount of pinned memory actually remains pretty much constant -- and has
the same usage in both 8.4 (where the BIND issue doesn't happen) and
9.3b2 (where it does).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Wed, Aug 28, 2013 at 02:30:41PM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
I really hate the idea that someone could configure 'X' in
postgresql.conf and because the auto.conf line is later in the file,
it's able to override the original setting.  Does not strike me as
intuitive at all.
   
   This is currently how include mechanism works in postgresql.conf,
   changing that for this special case can be costly and moreover the
   specs for this patch were layout from beginning that way.
  
  Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
  settings, you should move the include_auto line to the top of
  postgresql.conf, but I don't think that should be the default.
 
 While I appreciate that there are bootstrap-type issues with this, I
 really don't like this idea of later stuff can just override earlier
 stuff.
 
 include files and conf.d-style options are for breaking the config up,
 not to allow you to override options because a file came later than an
 earlier file.  Our particular implementation of config-file reading
 happens to lend itself to later-definition-wins, but that's really
 counter-intuitive for anyone unfamiliar with PG, imv.

Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.

I think changing behavior to first-seen would only add to confusion. 
What we really need is a WARNING when a later postgresql.conf setting
overrides an earlier one, and ALTER SYSTEM SET's config file could
behave the same, so you would know right away when you were overriding
something in postgresql.conf that appeared earlier.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  While I appreciate that there are bootstrap-type issues with this, I
  really don't like this idea of later stuff can just override earlier
  stuff.
 
  include files and conf.d-style options are for breaking the config up,
  not to allow you to override options because a file came later than an
  earlier file.  Our particular implementation of config-file reading
  happens to lend itself to later-definition-wins, but that's really
  counter-intuitive for anyone unfamiliar with PG, imv.
 
 I don't follow this argument at all.  Do you know any software with text
 config files that will act differently from this if the same setting is
 listed twice?  Last one wins is certainly what I'd expect.

Have you tried having the same mount specified in multiple files in
fstab.d..?  Also, aiui (not a big exim fan personally), duplicate
definitions in an exim4/conf.d would result in an error.  Playing around
in apache2/sites-enabled, it appears to be first wins wrt virtual
hosts.

There's a number of cases where only a single value is being set and
subseqeunt files are 'additive' (eg: ld.so.conf.d), so they don't have
this issue.  Similar to that are script directories, which simply run a
set of scripts in the $DIR.d and, as it's the scripts themselves which
are the 'parameters', and being files in a directory, you can't
duplicate them.  Others (eg: pam.d) define the file name to be an
enclosing context, again preventing duplication by using the filename
itself.

There are counter-examples also; sysctl.d will replace what's already
been set, so perhaps it simply depends on an individual's experience.
For my part, I'd much prefer an error or warning saying you've got
duplicate definitions of X than a last-wins approach, though perhaps
that's just because I like things to be very explicit and clear.
Allowing multiple definitions of the same parameter to be valid isn't
'clear' to me.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Bruce Momjian (br...@momjian.us) wrote:
 Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.

Yeah, true.

 I think changing behavior to first-seen would only add to confusion. 
 What we really need is a WARNING when a later postgresql.conf setting
 overrides an earlier one, and ALTER SYSTEM SET's config file could
 behave the same, so you would know right away when you were overriding
 something in postgresql.conf that appeared earlier.

A warning would be nice.  If only everyone read the log files. :/

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] dynamic shared memory

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 10:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 [just sending an email which sat in my outbox for two weeks]

Thanks for taking a look.

 Nice to see this coming. I think it will actually be interesting for
 quite some things outside parallel query, but we'll see.

Yeah, I hope so.  The applications may be somewhat limited by the fact
that there are apparently fairly small limits to how many shared
memory segments you can map at the same time.  I believe on one system
I looked at (some version of HP-UX?) the limit was 11.  So we won't be
able to go nuts with this: using it definitely introduces all kinds of
failure modes that we don't have it today.  But it will also let us do
some pretty cool things that we CAN'T do today.

 To help solve these problems, I invented something called the dynamic
 shared memory control segment.  This is a dynamic shared memory
 segment created at startup (or reinitialization) time by the
 postmaster before any user process are created.  It is used to store a
 list of the identities of all the other dynamic shared memory segments
 we have outstanding and the reference count of each.  If the
 postmaster goes through a crash-and-reset cycle, it scans the control
 segment and removes all the other segments mentioned there, and then
 recreates the control segment itself.  If the postmaster is killed off
 (e.g. kill -9) and restarted, it locates the old control segment and
 proceeds similarly.

 That way any corruption in that area will prevent restarts without
 reboot unless you use ipcrm, or such, right?

The way I've designed it, no.  If what we expect to be the control
segment doesn't exist or doesn't conform to our expectations, we just
assume that it's not really the control segment after all - e.g.
someone rebooted, clearing all the segments, and then an unrelated
process (malicious, perhaps, or just a completely different cluster)
reused the same name.  This is similar to what we do for the main
shared memory segment.

 Creating a shared memory segment is a somewhat operating-system
 dependent task.  I decided that it would be smart to support several
 different implementations and to let the user choose which one they'd
 like to use via a new GUC, dynamic_shared_memory_type.

 I think we want that during development, but I'd rather not go there
 when releasing. After all, we don't support a manual choice between
 anonymous mmap/sysv shmem either.

That's true, but that decision has not been uncontroversial - e.g. the
NetBSD guys don't like it, because they have a big performance
difference between those two types of memory.  We have to balance the
possible harm of one more setting against the benefit of letting
people do what they want without needing to recompile or modify code.

 In addition, I've included an implementation based on mmap of a plain
 file.  As compared with a true shared memory implementation, this
 obviously has the disadvantage that the OS may be more likely to
 decide to write back dirty pages to disk, which could hurt
 performance.  However, I believe it's worthy of inclusion all the
 same, because there are a variety of situations in which it might be
 more convenient than one of the other implementations.  One is
 debugging.

 Hm. Not sure what's the advantage over a corefile here.

You can look at it while the server's running.

 On MacOS X, for example, there seems to be no way to list
 POSIX shared memory segments, and no easy way to inspect the contents
 of either POSIX or System V shared memory segments.

 Shouldn't we ourselves know which segments are around?

Sure, that's the point of the control segment.  But listing a
directory is a lot easier than figuring out what the current control
segment contents are.

 Another use case
 is working around an administrator-imposed or OS-imposed shared memory
 limit.  If you're not allowed to allocate shared memory, but you are
 allowed to create files, then this implementation will let you use
 whatever facilities we build on top of dynamic shared memory anyway.

 I don't think we should try to work around limits like that.

I do.  There's probably someone, somewhere in the world who thinks
that operating system shared memory limits are a good idea, but I have
not met any such person.  There are multiple ways to create shared
memory, and they all have different limits.  Normally, System V limits
are small, POSIX limits are large, and the inherited-anonymous-mapping
trick we're now using for the main shared memory segment has no limits
at all.  It's very common to run into a system where you can allocate
huge numbers of gigabytes of backend-private memory, but if you try to
allocate 64MB of *shared* memory, you get the axe - or maybe not,
depending on which API you use to create it.

I would never advocate deliberately trying to circumvent a
carefully-considered OS-level policy decision about resource
utilization, but I don't think that's the dynamic here.  I think if we

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Wed, Aug 28, 2013 at 03:15:14PM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.
 
 Yeah, true.
 
  I think changing behavior to first-seen would only add to confusion. 
  What we really need is a WARNING when a later postgresql.conf setting
  overrides an earlier one, and ALTER SYSTEM SET's config file could
  behave the same, so you would know right away when you were overriding
  something in postgresql.conf that appeared earlier.
 
 A warning would be nice.  If only everyone read the log files. :/

I would expect ALTER SYSTEM SET to return a WARNING in such cases too.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Alvaro Herrera

Stephen Frost escribió:

 There are counter-examples also; sysctl.d will replace what's already
 been set, so perhaps it simply depends on an individual's experience.
 For my part, I'd much prefer an error or warning saying you've got
 duplicate definitions of X than a last-wins approach, though perhaps
 that's just because I like things to be very explicit and clear.
 Allowing multiple definitions of the same parameter to be valid isn't
 'clear' to me.

One of the elements that had to be in place for this whole thing of
multiple configuration files to be allowed was for pg_settings to
include precise information about, for each setting, where is its value
coming from.  I doubt any of the systems you mentioned have comparable
functionality.  If the admin has trouble figuring it out, he needs only
look into that view.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 what is magical?

 Stored procedures - we talk about this technology was a originally simple
 script moved from client side to server side.

 so if I write on client side

 BEGIN;
   SELECT 1,2;
   SELECT 2;
   SELECT 3,4;
 END;

 then I expect results

 1,2
 2
 3,4

The biggest problem with this idea is that people will do it by
accident with unacceptable frequency.  During the decade or so I
worked as a web programmer, I made this mistake a number of times, and
judging by the comments on this thread, Josh Berkus has made it with
some regularity as well.  If experienced PostgreSQL hackers who know
the system inside and out make such mistakes with some regularity, I
think we can anticipate that novices will make them even more often.

And, TBH, as others have said here, I find the requirement to use
PERFORM rather than SELECT rather ridiculous.  The clash with CTEs has
been there since we added CTEs, and I've hit it more than once.  Yeah,
you can work around it, but it's annoying.  And why annoy people?  So
+1 from me for de-requiring the use of PERFORM (though I think we
should definitely continue to accept that syntax, for backward
compatibility).

At the end of the day, procedural languages in PostgreSQL are
pluggable.  So if we someday have the ability to return extra result
sets on the fly, and if Pavel doesn't like the syntax we choose to use
in PL/pgsql, he can (and, given previous history, very possibly will!)
publish his own PL with different syntax.  But I'm with the crowd that
says that's not the right decision for PL/pgsql.

Also, even if we did adopt Pavel's proposed meaning for SELECT 1,2,
we still have a problem to solve, which is what the user should write
when they want to run a query and ignore the results.  The PERFORM
solution was adequate at a time when all select queries started with
SELECT, but now they can start with WITH or VALUES or TABLE as well,
and while VALUES and TABLE may be ignorable, WITH certainly isn't.
Requiring people to use silly workarounds like selecting into an
otherwise-pointless dummy variable is not cool.  If we reserve the
undecorated-SELECT syntax to mean something else, then we've got to
come up with some other way of solving David's original problem, and I
don't think there are going to be many elegant options.

Finally, I'd like to note that it's been longstanding frustration of
mine that the PERFORM-SELECT transformation is leaky.  For example,
consider:

rhaas=# do $$begin perform amazingly_well(); end;$$;
ERROR:  function amazingly_well() does not exist
LINE 1: SELECT amazingly_well()
   ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.
QUERY:  SELECT amazingly_well()
CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM

Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
it shows up in the error message.  How confusing!  With a big enough
hammer we could perhaps paper over this problem a bit more thoroughly,
but since I've never liked the syntax to begin with, I advance this as
another argument for killing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Merlin Moncure

On Wed, Aug 28, 2013 at 2:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 what is magical?

 Stored procedures - we talk about this technology was a originally simple
 script moved from client side to server side.

 so if I write on client side

 BEGIN;
   SELECT 1,2;
   SELECT 2;
   SELECT 3,4;
 END;

 then I expect results

 1,2
 2
 3,4

 The biggest problem with this idea is that people will do it by
 accident with unacceptable frequency.  During the decade or so I
 worked as a web programmer, I made this mistake a number of times, and
 judging by the comments on this thread, Josh Berkus has made it with
 some regularity as well.  If experienced PostgreSQL hackers who know
 the system inside and out make such mistakes with some regularity, I
 think we can anticipate that novices will make them even more often.

 And, TBH, as others have said here, I find the requirement to use
 PERFORM rather than SELECT rather ridiculous.  The clash with CTEs has
 been there since we added CTEs, and I've hit it more than once.  Yeah,
 you can work around it, but it's annoying.  And why annoy people?  So
 +1 from me for de-requiring the use of PERFORM (though I think we
 should definitely continue to accept that syntax, for backward
 compatibility).

 At the end of the day, procedural languages in PostgreSQL are
 pluggable.  So if we someday have the ability to return extra result
 sets on the fly, and if Pavel doesn't like the syntax we choose to use
 in PL/pgsql, he can (and, given previous history, very possibly will!)
 publish his own PL with different syntax.  But I'm with the crowd that
 says that's not the right decision for PL/pgsql.

 Also, even if we did adopt Pavel's proposed meaning for SELECT 1,2,
 we still have a problem to solve, which is what the user should write
 when they want to run a query and ignore the results.  The PERFORM
 solution was adequate at a time when all select queries started with
 SELECT, but now they can start with WITH or VALUES or TABLE as well,
 and while VALUES and TABLE may be ignorable, WITH certainly isn't.
 Requiring people to use silly workarounds like selecting into an
 otherwise-pointless dummy variable is not cool.  If we reserve the
 undecorated-SELECT syntax to mean something else, then we've got to
 come up with some other way of solving David's original problem, and I
 don't think there are going to be many elegant options.

 Finally, I'd like to note that it's been longstanding frustration of
 mine that the PERFORM-SELECT transformation is leaky.  For example,
 consider:

 rhaas=# do $$begin perform amazingly_well(); end;$$;
 ERROR:  function amazingly_well() does not exist
 LINE 1: SELECT amazingly_well()
^
 HINT:  No function matches the given name and argument types. You
 might need to add explicit type casts.
 QUERY:  SELECT amazingly_well()
 CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM

 Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
 it shows up in the error message.  How confusing!  With a big enough
 hammer we could perhaps paper over this problem a bit more thoroughly,
 but since I've never liked the syntax to begin with, I advance this as
 another argument for killing it.

Right.  Another pain point for me is that I frequently have to
'up-convert' functions from sql to pgsql (and sometimes the other way
too). The perform requirement turns that into a headache.  It looks
like we are mostly ok on Oracle compatibility too.

I'm a fan of David's 'YIELD' syntax concept as a line of analysis for
'mid procedure set returning' when we get there.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Hannu Krosing

On 08/28/2013 09:59 PM, Robert Haas wrote:
 On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 what is magical?

 Stored procedures - we talk about this technology was a originally simple
 script moved from client side to server side.

 so if I write on client side

 BEGIN;
   SELECT 1,2;
   SELECT 2;
   SELECT 3,4;
 END;

 then I expect results

 1,2
 2
 3,4
 The biggest problem with this idea is that people will do it by
 accident with unacceptable frequency.  During the decade or so I
 worked as a web programmer, I made this mistake a number of times, and
 judging by the comments on this thread, Josh Berkus has made it with
 some regularity as well.  If experienced PostgreSQL hackers who know
 the system inside and out make such mistakes with some regularity, I
 think we can anticipate that novices will make them even more often.
Usually yo test your queries fom psql prompt and then copy/paste
into your function.
As ignoring the results need no conscious effort at psql prompt, it
will always be a mild surprise that you have to jump through hoops
to do it in pl/pgsql.
And I can easily do this for example in pl/python - just do not assign
the result from plpy.execute() and they get ignored with no extra
effort whatsoever.


 ...
 Finally, I'd like to note that it's been longstanding frustration of
 mine that the PERFORM-SELECT transformation is leaky.  For example,
 consider:

 rhaas=# do $$begin perform amazingly_well(); end;$$;
 ERROR:  function amazingly_well() does not exist
 LINE 1: SELECT amazingly_well()
^
 HINT:  No function matches the given name and argument types. You
 might need to add explicit type casts.
 QUERY:  SELECT amazingly_well()
 CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM

 Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
 it shows up in the error message.  How confusing!  With a big enough
 hammer we could perhaps paper over this problem a bit more thoroughly,
 but since I've never liked the syntax to begin with, I advance this as
 another argument for killing it.

Totally agree.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clarification on materialized view restriction needed

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 1:40 AM, Ashutosh Bapat
ashutosh.ba...@enterprisedb.com wrote:
 I would be good, if this set gets documented, lest users will be confused.
 Can you point me to relevant sections of document? I can add this
 documentation.

I think it's your job to look at the documentation and determine where
this would best fit, not Noah's to go decide that for you.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Valgrind Memcheck support

2013-08-28 Thread Noah Misch

On Wed, Aug 28, 2013 at 03:16:14PM +0200, Andres Freund wrote:
 On 2013-08-27 23:46:23 -0400, Noah Misch wrote:
  On Tue, Aug 27, 2013 at 04:14:27PM +0200, Andres Freund wrote:
   On 2013-06-09 17:25:59 -0400, Noah Misch wrote:
***
*** 846,851  exec_simple_query(const char *query_string)
--- 847,856 
  
TRACE_POSTGRESQL_QUERY_START(query_string);
  
+ #ifdef USE_VALGRIND
+   VALGRIND_PRINTF(statement: %s\n, query_string);
+ #endif
+ 
   
   Is there a special reason for adding more logging here? I find this
   makes the instrumentation much less useful since reports easily get
   burried in those traces. What's the advantage of doing this instead of
   log_statement=...? Especially as that location afaics won't help for the
   extended protocol?

  That being said, could you tell me more about your workflow where the extra
  messages cause a problem?  Do you just typically diagnose each Valgrind 
  error
  without first isolating the pertinent SQL statement?
 
 There's basically two scenarios where I found it annoying:
 a) I manually reproduce some issue, potentially involving several psql
 shells. All those fire up autocompletion and such queries which bloat
 the log while I actually only want to analyze something specific.
 b) I don't actually look for anything triggered by an SQL statement
 itself, but am looking for issues in a walsender, background worker,
 whatever. I still need some foreground activity to trigger the behaviour
 I want to see, but once more, valgrind's logs hide the error.
 
 Also, I sometimes use valgrind's --db-command/--vgdb which gives you
 enough context from the backtrace itself since you can just get the
 statement from there.

Gotcha.  My largest use case was running make installcheck in search of
longstanding bugs.  The runs were unattended, and associating each Valgrind
error with its proximate command was a basic need.

 I vote for just removing that VALGRIND_PRINTF - it doesn't give you
 anything you cannot easily achieve otherwise...

I'd like to see a buildfarm member running make installcheck under Valgrind,
so I'd like the code to fit the needs thereof without patching beyond
pg_config_manual.h.  Perhaps having the buildfarm member do valgrind postgres
--log-statement=all 2combined-logfile is good enough.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Hadi Moshayedi

Hello,

 int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
 cleaner code
 numeric sum 6490 ms (7224 ms) --  10% faster
 numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegv's whenever I
wanted to try to run an aggregate function. gdb was telling that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point, after the
reboot the sigsegv's went away. I want to look into this and find the
reason, I think I have missed something here. Any thoughts about why
this would happen?

--Hadi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Pavel Stehule

2013/8/29 Hadi Moshayedi h...@moshayedi.net

 Hello,

  int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error
 ..
  cleaner code
  numeric sum 6490 ms (7224 ms) --  10% faster
  numeric avg 6487 ms (12023 ms) -- 46% faster

 I also got very similar results.

 On the other hand, initially I was receiving sigsegv's whenever I
 wanted to try to run an aggregate function. gdb was telling that this
 was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
 find the reason, I had to reboot my computer at some point, after the
 reboot the sigsegv's went away. I want to look into this and find the
 reason, I think I have missed something here. Any thoughts about why
 this would happen?


I found a few bugs, that I fixed. There was a issue with empty sets. Other
issues I didn't find.

Regards

Pavel



 --Hadi