Re: [HACKERS] [COMMITTERS] pgsql: doc: Fix typos

2016-04-23 Thread Fabien COELHO


Hello Peter,


http://git.postgresql.org/pg/commitdiff/b87b2f4bda1a3b98f8dea867b8bc419ace7a9ea9
doc/src/sgml/ref/pgbench.sgml | 8 


 -   Here is example outputs:
 +   Here is example output:

I think that something is missing now on this line. Should it rather be 
"Here is an example output", if singular is better? It looks awkward to me 
without some article.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-04-23 Thread Peter Geoghegan
On Sat, Apr 23, 2016 at 8:58 PM, Bruce Momjian  wrote:
> Why is the parallelism variable called "max_parallel_degree"?  Is that a
> descriptive name?  What does "degree" mean?  Why is it not called
> "max_parallel_workers"?

I also think that "max_parallel_workers" works better.

While certain other systems have a concept of a "maximum parallel
degree", it is not the same. Those other systems disable parallelism
altogether when "max parallel degree" is 1, whereas Postgres uses 1
parallel worker along with a leader process. And so, parallelism
*will* still be used on Postgres. That's a pretty significant
difference.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-04-23 Thread Amit Kapila
On Sun, Apr 24, 2016 at 9:28 AM, Bruce Momjian  wrote:
>
> Why is the parallelism variable called "max_parallel_degree"?  Is that a
> descriptive name?  What does "degree" mean?

It is meant to denote the amount of parallelism at the node level.

>
>   Why is it not called
> "max_parallel_workers"?
>

Degree of Parallelism is a term used in many other popular databases for a
similar purpose, so I think that is another reason to prefer
max_parallel_degree.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Rename max_parallel_degree?

2016-04-23 Thread Bruce Momjian
Why is the parallelism variable called "max_parallel_degree"?  Is that a
descriptive name?  What does "degree" mean?  Why is it not called
"max_parallel_workers"?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-23 Thread Bruce Momjian
On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander  wrote:
> > Note that we have not marked them as deprecated. We're just giving warnings
> > that they will be deprecated.
> 
> But I think what is being said here is that maybe they won't be
> deprecated, at least not any time soon.  And therefore maybe we
> shouldn't say so.
> 
> Frankly, I think that's right.  It is one thing to say that the new
> method is preferred - +1 for that.  But the old method is going to
> continue to be used by many people for a long time, and in some cases
> will be superior.  That's not something we can deprecate, unless I'm
> misunderstanding the situation.

I agree with Robert.  On the one hand we are saying pg_stop_backup()
doesn't work well in psql because you get those two file contents as output
that you have to write out yourself, and on the other hand we are saying we
are going to deprecate the method that does work well in psql?  I must be
missing something too, as that makes no sense.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Andrew Dunstan



On 04/23/2016 06:33 PM, Tom Lane wrote:

I wrote:

Andrew Dunstan  writes:

On 04/23/2016 05:30 PM, Christian Ullrich wrote:

In this case, I would prefer this:

#ifdef WIN32_ONLY_COMPILER
-typedef int pid_t;
+typedef intptr_t pid_t;
#endif

That's a change that will have a pretty wide effect. Everything up to
now has been pretty low risk, but this worries me rather more. Maybe
it's safe, but I'd like to hear others' comments.

Yeah, it makes me a bit nervous too.

One other thought: even if this is safe for HEAD, I think we could
*not* back-patch it into 9.5, because it would amount to an ABI
break on Windows anywhere that pid_t is used in globally visible
structs or function signatures.  (Maybe there are no such places,
but I doubt it.)  So we'd need to go with the messy-cast solution
for 9.5.




It's not that messy. I'm inclined just to make minimal changes to 
pg_basebackup.c and be done with it. I don't think a compiler warning is 
worth doing more for.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-23 Thread Andres Freund
On 2016-04-15 16:53:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> >> I think the bottom line is that we misdesigned the WAL representation
> >> by assuming that this sort of info could always be piggybacked on a
> >> transaction commit record.  It's time to fix that.
>
> > I think we got to piggyback it onto a commit record, as long as there's
> > one.
>
> No objection to that part.  What I'm saying is that when there isn't one,
> the answer is a new record type, not forcing xid assignment.  It might
> look almost like a commit record, but it shouldn't imply that there
> was a transaction.

Here's a patch doing so.  Note that, after putting the record into RM_XACT_ID
first, I've decided to make it a RM_STANDBY_ID type record. I think it's
likely that this is going to be needed beyond transaction commits, and
it generally seems to fit better into standby.c; even if it's a bit
awkward that commit records contain similar data. To avoid duplicating
the *desc.c code, I've exposed standby_desc_invalidations() to also be
used by xactdesc.c.

It fixes the problem at hand, but essentially it's just luck that the
patch is sufficient. The first layer of the issue is that queued
invalidation messages aren't sent; but for vm_extend() there's another
side to it. Namely vm_extend() does

/*
 * Send a shared-inval message to force other backends to close any smgr
 * references they may have for this rel, which we are about to change.
 * This is a useful optimization because it means that backends don't 
have
 * to keep checking for creation or extension of the file, which happens
 * infrequently.
 */
CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

but CacheInvalidateSmgr() is non-transactional, as its comment explains:
 *
 * Note: because these messages are nontransactional, they won't be captured
 * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
 * should happen in low-level smgr.c routines, which are executed while
 * replaying WAL as well as when creating it.
 *

As far as I can see, vm_extend() is the only current caller forgetting
that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
smgr.c.

The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum trigger a CacheInvalidateHeapTuple(),
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation(), thereby effectively
doing a transactional CacheInvalidateSmgr().  But that seems more than
fragile.

ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation.  Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.
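
For illustration, a rough sketch (not the actual patch; surrounding code
abbreviated) of what that replacement in vm_extend() could look like:

	/*
	 * Sketch only: queue a transactional relcache invalidation instead of
	 * the non-transactional smgr one.  A relcache rebuild closes the smgr
	 * relation as a side effect (RelationClearRelation() calls
	 * RelationCloseSmgr()), so other backends still drop their stale smgr
	 * references, and the message can now also reach standbys via the
	 * commit record or the new standby invalidation record.
	 */
	CacheInvalidateRelcache(rel);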

Comments?

- Andres
From b2295700fb9e1d01b03bdebca3c71692941144c9 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 23 Apr 2016 19:18:00 -0700
Subject: [PATCH] Emit invalidations to standby for transactions without xid.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

So far, when a transaction with pending invalidations, but without an
assigned xid, committed, we simply ignored those invalidation
messages. That's problematic, because those are actually sent for a
reason.

Known symptoms of this include that existing sessions on a hot-standby
replica sometimes fail to notice new concurrently built indexes and
visibility map updates.

The solution is to WAL log such invalidations in transactions without an
xid. We considered alternatively force-assigning an xid, but that'd be
problematic for vacuum, which might be run in systems with few xids.

Important: This adds a new WAL record, but as the patch has to be
back-patched, we can't bump the WAL page magic. This means that standbys
have to be updated before primaries; otherwise
"PANIC: standby_redo: unknown op code 32" errors can be encountered.

XXX:

Reported-By: Васильев Дмитрий, Masahiko Sawada
Discussion:
CAB-SwXY6oH=9twbkxjtgr4uc1nqt-vpyatxcseme62adwyk...@mail.gmail.com
CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w...@mail.gmail.com
---
 src/backend/access/rmgrdesc/standbydesc.c   | 54 +
 src/backend/access/rmgrdesc/xactdesc.c  | 30 ++
 src/backend/access/transam/xact.c   | 18 +
 src/backend/replication/logical/decode.c|  9 +
 src/backend/replication/logical/reorderbuffer.c | 53 +++-
 src/backend/storage/ipc/standby.c   | 35 
 src/backend/utils/cache/inval.c |  5 ++-
 src/include/replication/reorderbuffer.h |  2 +
 src/include/storage/standby.h   |  2 +
 src/include/storage/standbydefs.h   | 21 ++
 10 files changed, 181 ins

Re: [HACKERS] xlc atomics

2016-04-23 Thread Noah Misch
On Mon, Feb 15, 2016 at 02:02:41PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > That commit (0d32d2e) permitted things to compile and usually pass tests, 
> > but
> > I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
> > sixteen duplicate-catalog-OID failures.
> 
> I'd been wondering about those ...
> 
> > These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
> > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> > 5765-J08)" for ppc64le.  The problem is generic-xlc.h
> > pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> > __compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
> > own s_lock.h, its references, and other projects' usage:
> 
> Nice catch!
> 
> > This patch's test case would have failed about half the time under today's
> > generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the 
> > bug
> > 99% of the time would run far longer, hence this compromise.
> 
> Sounds like a reasonable compromise to me, although I wonder about the
> value of it if we stick it into pgbench's TAP tests.  How many of the
> slower buildfarm members are running the TAP tests?  Certainly mine are
> not.

I missed a second synchronization bug in generic-xlc.h, but the pgbench test
suite caught it promptly after commit 008608b.  Buildfarm members mandrill and
hornet failed[1] or deadlocked within two runs.  The bug is that the
pg_atomic_compare_exchange_*() specifications grant "full barrier semantics",
but generic-xlc.h provided only the semantics of an acquire barrier.  Commit
008608b exposed that older problem; its LWLockWaitListUnlock() relies on
pg_atomic_compare_exchange_u32() (via pg_atomic_fetch_and_u32()) for release
barrier semantics.

The fix is to issue "sync" before each compare-and-swap, in addition to the
"isync" after it.  This is consistent with the corresponding GCC atomics.  The
pgbench test suite survived dozens of runs so patched.

[1] 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-04-11%2003%3A14%3A13
diff --git a/src/include/port/atomics/generic-xlc.h b/src/include/port/atomics/generic-xlc.h
index f24e3af..f4fd2f3 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -40,12 +40,22 @@ static inline bool
 pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 									uint32 *expected, uint32 newval)
 {
+	bool		ret;
+
+	/*
+	 * atomics.h specifies sequential consistency ("full barrier semantics")
+	 * for this interface.  Since "lwsync" provides acquire/release
+	 * consistency only, do not use it here.  GCC atomics observe the same
+	 * restriction; see its rs6000_pre_atomic_barrier().
+	 */
+	__asm__ __volatile__ (" sync \n" ::: "memory");
+
 	/*
 	 * XXX: __compare_and_swap is defined to take signed parameters, but that
 	 * shouldn't matter since we don't perform any arithmetic operations.
 	 */
-	bool		ret = __compare_and_swap((volatile int*)&ptr->value,
-										 (int *)expected, (int)newval);
+	ret = __compare_and_swap((volatile int*)&ptr->value,
+							 (int *)expected, (int)newval);
 
 	/*
 	 * xlc's documentation tells us:
@@ -63,6 +73,10 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 static inline uint32
 pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 {
+	/*
+	 * __fetch_and_add() emits a leading "sync" and trailing "isync", thereby
+	 * providing sequential consistency.  This is undocumented.
+	 */
 	return __fetch_and_add((volatile int *)&ptr->value, add_);
 }
 
@@ -73,8 +87,12 @@ static inline bool
 pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
 									uint64 *expected, uint64 newval)
 {
-	bool		ret = __compare_and_swaplp((volatile long*)&ptr->value,
-										   (long *)expected, (long)newval);
+	bool		ret;
+
+	__asm__ __volatile__ (" sync \n" ::: "memory");
+
+	ret = __compare_and_swaplp((volatile long*)&ptr->value,
+							   (long *)expected, (long)newval);
 
 	__isync();
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-23 Thread Noah Misch
On Fri, Apr 22, 2016 at 08:24:53PM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Tue, Apr 05, 2016 at 05:50:18PM -0400, Stephen Frost wrote:
> > > Subject: [PATCH 4/5] In pg_dump, include pg_catalog and extension ACLs, if
> > >  changed
> > > 
> > > Now that all of the infrastructure exists, add in the ability to
> > > dump out the ACLs of the objects inside of pg_catalog or the ACLs
> > > for objects which are members of extensions, but only if they have
> > > been changed from their original values.
> > 
> > I wrote the attached test script to verify which types of ACLs dump/reload
> > covers.  Based on the commit message, I expected these results:
> > 
> >   Dumped: type, class, attribute, proc, largeobject_metadata,
> >   foreign_data_wrapper, foreign_server,
> >   language(in-extension), namespace(in-extension)
> >   Not dumped: database, tablespace,
> >   language(from-initdb), namespace(from-initdb)
> > 
> > Did I interpret the commit message correctly?  The script gave these 
> > results:
> > 
> >   Dumped: type, class, attribute, namespace(from-initdb)
> >   Not dumped: database, tablespace, proc, language(from-initdb)
> 
> You interpreted the commit message correctly and in a number of cases
> the correct results are generated, but there's an issue in the WHERE
> clause being used for some of the object types.

I'm wondering whether it would be a slightly better design to treat
language(from-initdb) like language(in-extension).  Likewise for namespaces.
The code apparently already exists to dump from-initdb namespace ACLs.  Is
there a user interface design reason to want to distinguish the from-initdb
and in-extension cases?

> That's relatively straight-forward to fix, but I'm going to put
> together a series of TAP tests to go with these fixes.  While I tested
> various options and bits and pieces of the code while developing, this
> really needs a proper test suite that runs through all of these
> permutations with each change.

Sounds great.  Thanks.

> I'm planning to have that done by the end of the weekend.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-23 Thread Noah Misch
On Fri, Apr 22, 2016 at 12:31:41PM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Wed, Apr 20, 2016 at 10:50:21PM -0400, Stephen Frost wrote:
> > > I'm certainly open to improving these issues now if we agree that they
> > > should be fixed for 9.6.  If we don't want to include such changes in 9.6
> > > then I will propose then for post-9.6.
> > 
> > Folks run clusters with ~1000 databases; we previously accepted at least one
> > complex performance improvement[1] based on that use case.  On the faster of
> > the two machines I tested, the present thread's commits slowed "pg_dumpall
> > --schema-only --binary-upgrade" by 1-2s per database.  That doubles pg_dump
> > runtime against the installcheck regression database.  A run against a 
> > cluster
> > of one hundred empty databases slowed fifteen-fold, from 8.6s to 131s.
> > "pg_upgrade -j50" probably will keep things tolerable for the 1000-database
> > case, but the performance regression remains jarring.  I think we should not
> > release 9.6 with pg_dump performance as it stands today.
> 
> After looking through the code a bit, I realized that there are a lot of
> object types which don't have ACLs at all but which exist in pg_catalog
> and were being analyzed because the bitmask for pg_catalog included ACLs
> and therefore was non-zero.
> 
> Clearing that bit for object types which don't have ACLs improved the
> performance for empty databases quite a bit (from about 3s to a bit
> under 1s on my laptop).  That's a 42-line patch, with comment lines
> being half of that, which I'll push once I've looked into the other
> concerns which were brought up on this thread.
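
For context, a hypothetical sketch of the kind of bitmask adjustment being
described (component names follow pg_dump's dump-component machinery; the
condition itself is illustrative):

	/*
	 * When computing the default dump mask for an object in pg_catalog,
	 * drop the ACL component for object types that cannot carry ACLs, so
	 * the per-object ACL queries are skipped entirely.
	 */
	if (!object_type_has_acls)		/* hypothetical predicate */
		dobj->dump &= ~DUMP_COMPONENT_ACL;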

That's good news.

> Much of the remaining inefficiency is how we query for column
> information one table at a time (that appears to be around 300ms of the
> 900ms or so total time).  I'm certainly interested in improving that but
> that would require adding more complex data structures to pg_dump than
> what we use currently (we'd want to grab all of the columns we care
> about in an entire schema and store it locally and then provide a way to
> look up that information, etc), so I'm not sure it'd be appropriate to
> do now.

I'm not sure, either; I'd need to see more to decide.  If I were you, I would
draft a patch to the point where the community can see the complexity and the
performance change.  That should be enough to inform the choice among moving
forward with the proposed complexity, soliciting other designs, reverting the
original changes, or accepting for 9.6 the slowdown as it stands at that time.
Other people may have more definite opinions already.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> On 04/23/2016 05:30 PM, Christian Ullrich wrote:
>>> In this case, I would prefer this:
>>> 
>>> #ifdef WIN32_ONLY_COMPILER
>>> -typedef int pid_t;
>>> +typedef intptr_t pid_t;
>>> #endif

>> That's a change that will have a pretty wide effect. Everything up to 
>> now has been pretty low risk, but this worries me rather more. Maybe 
>> it's safe, but I'd like to hear others' comments.

> Yeah, it makes me a bit nervous too.

One other thought: even if this is safe for HEAD, I think we could
*not* back-patch it into 9.5, because it would amount to an ABI
break on Windows anywhere that pid_t is used in globally visible
structs or function signatures.  (Maybe there are no such places,
but I doubt it.)  So we'd need to go with the messy-cast solution
for 9.5.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Tom Lane
Andrew Dunstan  writes:
> On 04/23/2016 05:30 PM, Christian Ullrich wrote:
>> In this case, I would prefer this:
>>
>> #ifdef WIN32_ONLY_COMPILER
>> -typedef int pid_t;
>> +typedef intptr_t pid_t;
>> #endif

> That's a change that will have a pretty wide effect. Everything up to 
> now has been pretty low risk, but this worries me rather more. Maybe 
> it's safe, but I'd like to hear others' comments.

Yeah, it makes me a bit nervous too.  We know that most of the code is
agnostic as to the width of pid_t, because it works on Unixen where
pid_t is 64bit.  But I'm less sure about whether the Windows-specific
parts are equally flexible.

On the other hand, wouldn't you expect to get compiler warnings for
any code that does get affected?  The main hazard would be incorrect
printf format flags (cf 994f11257 for a recent example), and I'd
certainly expect any C compiler worth its salt to whine about
inconsistencies there.
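
For example, as a hedged illustration rather than code from the tree: with a
pid_t variable "pid", a bare printf("... %d", pid) would become wrong on
Win64 once pid_t widens, and should draw a format warning; the usual fix is
an explicit cast to a type with a known format:

	/* cast keeps the format string correct regardless of pid_t's width */
	printf("server process PID: %lld\n", (long long) pid);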

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Andrew Dunstan



On 04/23/2016 05:30 PM, Christian Ullrich wrote:

* Andrew Dunstan wrote:


On 04/22/2016 01:21 AM, Michael Paquier wrote:



5. It also complains about us casting a pid_t to a HANDLE in
pg_basebackup.c. Not sure what to do about that.

The thing that's being cast is not a PID, but a HANDLE to a process.
pid_t is a typedef for int (in port/win32.h), therefore is always 32
bits, while HANDLE is actually void*. However, Microsoft guarantees
that kernel32 HANDLEs (this includes those to threads and processes)
fit into 32 bits on AMD64.



Yes, when casting things this way I think that a comment would be fine
in the code. We could do that as separate patches actually.


We are already casting the pid_t to HANDLE and still getting a warning.
Apparently we need to do something on win64 like

(HANDLE) ((int64) bgchild)


Ah, OK, it warns about a cast to a larger type because the value might 
get sign extended. Not unreasonable.


In this case, I would prefer this:

diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index ba8cf9d..b4086f1 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -256,7 +256,7 @@ typedef int gid_t;
 typedef long key_t;

 #ifdef WIN32_ONLY_COMPILER
-typedef int pid_t;
+typedef intptr_t pid_t;
 #endif

 /*

With this change, pg_basebackup -X stream works the same when built 
for 64 and 32 bits.







That's a change that will have a pretty wide effect. Everything up to 
now has been pretty low risk, but this worries me rather more. Maybe 
it's safe, but I'd like to hear others' comments.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Christian Ullrich

* Andrew Dunstan wrote:


On 04/22/2016 01:21 AM, Michael Paquier wrote:



5. It also complains about us casting a pid_t to a HANDLE in
pg_basebackup.c. Not sure what to do about that.

The thing that's being cast is not a PID, but a HANDLE to a process.
pid_t is a typedef for int (in port/win32.h), therefore is always 32
bits, while HANDLE is actually void*. However, Microsoft guarantees
that kernel32 HANDLEs (this includes those to threads and processes)
fit into 32 bits on AMD64.



Yes, when casting things this way I think that a comment would be fine
in the code. We could do that as separate patches actually.


We are already casting the pid_t to HANDLE and still getting a warning.
Apparently we need to do something on win64 like

(HANDLE) ((int64) bgchild)


Ah, OK, it warns about a cast to a larger type because the value might 
get sign extended. Not unreasonable.


In this case, I would prefer this:

diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index ba8cf9d..b4086f1 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -256,7 +256,7 @@ typedef int gid_t;
 typedef long key_t;

 #ifdef WIN32_ONLY_COMPILER
-typedef int pid_t;
+typedef intptr_t pid_t;
 #endif

 /*

With this change, pg_basebackup -X stream works the same when built for 
64 and 32 bits.


--
Christian




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-23 Thread Andrew Dunstan



On 04/22/2016 01:21 AM, Michael Paquier wrote:

5. It also complains about us casting a pid_t to a HANDLE in
pg_basebackup.c. Not sure what to do about that.

The thing that's being cast is not a PID, but a HANDLE to a process. pid_t is a 
typedef for int (in port/win32.h), therefore is always 32 bits, while HANDLE is 
actually void*. However, Microsoft guarantees that kernel32 HANDLEs (this 
includes those to threads and processes) fit into 32 bits on AMD64.

Source: 
https://msdn.microsoft.com/en-us/library/windows/desktop/ee872017(v=vs.85).aspx,
 third bullet point.

So we can simply silence the warning by casting explicitly.

Yes, when casting things this way I think that a comment would be fine
in the code. We could do that as separate patches actually.



We are already casting the pid_t to HANDLE and still getting a warning. 
Apparently we need to do something on win64 like


(HANDLE) ((int64) bgchild)


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-23 Thread Noah Misch
On Fri, Apr 22, 2016 at 02:03:01PM -0700, Jeff Janes wrote:
> On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch  wrote:
> > Could you describe the test case in sufficient detail for Teodor to 
> > reproduce
> > your results?
> 
> [detailed description and attachments]

Thanks.

> The more I think about it, the more I think gin is just an innocent
> bystander, for which I just happen to have a particularly demanding
> test.  I think something about snapshots and wrap-around may be
> broken.

Based on your review, I think the next step on $subject is for Teodor to
either commit gin_alone_cleanup-4.patch or to tell us what he'd like to see or
do before committing that patch.

While the failures during your crash recovery testing may not even be 9.6
regressions, they suggest the presence of serious bugs.  At a minimum, the
community should confirm they are not 9.6 regressions before releasing 9.6.  I
have added an open item to that effect.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-04-23 Thread Bruce Momjian
On Thu, Mar 31, 2016 at 05:46:41PM +0200, Emre Hasegeli wrote:
> > Thank you, pushed
> 
> Thank you for working on this.
> 
> I noticed that have overlooked this:
> 
>  static RectBox *
> -initRectBox()
> +initRectBox(void)
>  {

Done, thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-04-23 Thread Bruce Momjian
On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
> On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian  wrote:
> >
> > I kind of agreed with Tom about just aborting transactions that held
> > snapshots for too long, and liked the idea this could be set per
> > session, but the idea that we abort only if a backend actually touches
> > the old data is very nice.  I can see why the patch author worked hard
> > to do that.
> >
> > How does/did Oracle handle this?
> >
> 
> IIRC then Oracle gives this error when the space in undo tablespace (aka
> rollback segment) is low.  When the rollback segment gets full, it overwrites
> the changed data which might be required by some old snapshot and when that 
> old
> snapshot statement tries to access the data (which is already overwritten), it
> gets "snapshot too old" error.  Assuming there is enough space in rollback
> segment, Oracle seems to provide a way via Alter System set undo_retention =
> . 
> 
> Now, if the above understanding of mine is correct, then I think the current
> implementation done by Kevin is closer to what Oracle provides.

But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment?  If the
latter, it seems Kevin's approach is better, because you would actually
have to access old data that was there to be canceled, not just
any data that could have been overwritten based on the xid.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction.  Once the time expires, we apply the WAL.  Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied?  If the former, does
Kevin's new code allow us to do the latter?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-23 Thread Tom Lane
Amit Kapila  writes:
> The main point for this improvement is that the handling for the GUC s_s_names
> is not similar to what we do for other somewhat similar GUCs, which
> causes inefficiency in a non-hot code path (less used code).

This is not about efficiency, this is about correctness.  The proposed
v7 patch is flat out not acceptable, not now and not for 9.7 either,
because it introduces a GUC assign hook that can easily fail (eg, through
out-of-memory for the copy step).  Assign hook functions need to be
incapable of failure.  I do not see any good reason why this one cannot
satisfy that requirement, either.  It just needs to make use of the
"extra" mechanism to pass back an already-suitably-long-lived result from
check_synchronous_standby_names.  See check_timezone_abbreviations/
assign_timezone_abbreviations for a model to follow.  You are going to
need to find a way to package the parse result into a single malloc'd
blob, though, because that's as much as guc.c can keep track of for an
"extra" value.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-23 Thread Amit Kapila
On Sat, Apr 23, 2016 at 5:20 PM, Michael Paquier 
wrote:
>
> On Sat, Apr 23, 2016 at 7:44 PM, Amit Kapila 
wrote:
> > On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI
> >  wrote:
> >>
> >>
> >> assign_s_s_names causes SEGV when it is called without calling
> >> check_s_s_names. I think that's not the case for this variable
> >> because it is unresettable amid a session. It is very uneasy for
> >> me but I don't see a proper means to reset
> >> syncrep_parse_result.
> >>
> >
> > Is it because syncrep_parse_result is not freed after creating a copy
of it
> > in assign_synchronous_standby_names()?  If it so, then I think we need
to
> > call SyncRepFreeConfig(syncrep_parse_result); in
> > assign_synchronous_standby_names at below place:
> >
> > + /* Copy the parsed config into TopMemoryContext if exists */
> >
> > + if (syncrep_parse_result)
> >
> > + SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);
> >
> > Could you please explain how to trigger the scenario where you have seen
> > SEGV?
>
> Seeing this discussion moving on, I am wondering if we should not
> discuss those improvements for 9.7.
>

The main point for this improvement is that the handling for the GUC s_s_names
is not similar to what we do for other somewhat similar GUCs, which
causes inefficiency in a non-hot code path (less used code).  So, we can
push this improvement to 9.7, but OTOH we can also consider it a
non-beta-blocker issue and see if we can make this code path better in the
meantime.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-23 Thread Amit Kapila
On Tue, Apr 19, 2016 at 8:41 PM, Kevin Grittner  wrote:
>
> On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila 
wrote:
> > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund 
wrote:
> >>
> >> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> >> > That is more controversial than the potential ~2% regression for
> >> > old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay
releasing
> >> > that way, and Andres[4] is not.
> >>
> >> FWIW, I could be kinda convinced that it's temporarily ok, if there'd
be
> >> a clear proposal on the table how to solve the scalability issue around
> >> MaintainOldSnapshotTimeMapping().
> >
> > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> > takes EXCLUSIVE LWLock which seems to be a probable reason for a
performance
> > regression.  Now, here the question is do we need to acquire that lock
if
> > xmin is not changed since the last time value of
> > oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal
to
> > oldSnapshotControl->latest_xmin?
> > If we don't need it for above cases, I think it can address the
performance
> > regression to a good degree for read-only workloads when the feature is
> > enabled.
>
> Thanks, Amit -- I think something along those lines is the right
> solution to the scaling issues when the feature is enabled.
>

I have tried the attached patch along the above lines, and it seems to
address the performance regression to a good degree when the feature is
enabled at a moderate client count like 32, but more still needs to be done
for a somewhat higher client count like 64.
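
Roughly, the idea is an early exit of this shape before taking the
exclusive lock (a sketch only; field, lock, and function names in the
attached patch may differ):

	TransactionId latest_xmin;

	/* read (and possibly advance) the last-recorded xmin under a spinlock */
	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
	latest_xmin = oldSnapshotControl->latest_xmin;
	if (TransactionIdFollows(xmin, latest_xmin))
		oldSnapshotControl->latest_xmin = xmin;
	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);

	/*
	 * If xmin has not advanced past what was already recorded, the time map
	 * cannot need updating, so skip the exclusive LWLock that causes the
	 * contention in read-only workloads.
	 */
	if (!TransactionIdFollows(xmin, latest_xmin))
		return;

	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
	/* ... existing time-map maintenance ... */
	LWLockRelease(OldSnapshotTimeMapLock);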


Performance data is the median of three 5-minute runs of a read-only workload:
pgbench -c $client_count -j $client_count -T 300 -M prepared -S postgres

o_s_t - old_snapshot_threshold

Client_Count/Patch_Ver     32        64
HEAD  (o_s_t = -1)         354077    552063
HEAD  (o_s_t = 1)           92809     55847
Patch (o_s_t = 1)          319759    191741


If you think that the attached patch is correct functionality-wise, then I
think we can go ahead with it and then investigate what more can be
improved.  I think the newly introduced spinlocks might be the reason for
the performance degradation at higher client counts; if that turns out to be
true, then I think we can replace them with atomics once Andres's patch
completing the 64-bit atomics implementation is committed.

Machine details used for performance testing:
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
NUMA node2 CPU(s):     96-143
NUMA node3 CPU(s):     144-191



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


old_snapshot_threshold_perf_issue_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-23 Thread Michael Paquier
On Sat, Apr 23, 2016 at 7:44 PM, Amit Kapila  wrote:
> On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI
>  wrote:
>>
>>
>> assign_s_s_names causes SEGV when it is called without calling
> >> check_s_s_names. I think that's not the case for this variable
>> because it is unresettable amid a session. It is very uneasy for
>> me but I don't see a proper means to reset
>> syncrep_parse_result.
>>
>
> Is it because syncrep_parse_result is not freed after creating a copy of it
> in assign_synchronous_standby_names()?  If so, then I think we need to
> call SyncRepFreeConfig(syncrep_parse_result); in
> assign_synchronous_standby_names at below place:
>
> + /* Copy the parsed config into TopMemoryContext if exists */
>
> + if (syncrep_parse_result)
>
> + SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);
>
> Could you please explain how to trigger the scenario where you have seen
> SEGV?

Seeing this discussion moving on, I am wondering if we should not
discuss those improvements for 9.7. We are getting close to beta 1,
and this is clearly not a bug, and it's not like HEAD is broken. So I
think that we should not take the risk of making the code unstable at
this stage.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-23 Thread Amit Kapila
On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>
> assign_s_s_names causes SEGV when it is called without calling
> check_s_s_names. I think that's not the case for this variable
> because it is unresettable amid a session. It is very uneasy for
> me but I don't see a proper means to reset
> syncrep_parse_result.
>

Is it because syncrep_parse_result is not freed after creating a copy of it
in assign_synchronous_standby_names()?  If so, then I think we need to
call SyncRepFreeConfig(syncrep_parse_result); in
assign_synchronous_standby_names at below place:

+ /* Copy the parsed config into TopMemoryContext if exists */

+ if (syncrep_parse_result)

+ SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);

Could you please explain how to trigger the scenario where you have seen
SEGV?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-23 Thread Gavin Flower

On 22/04/16 17:36, Amit Kapila wrote:
On Fri, Apr 22, 2016 at 1:31 AM, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:


On 22/04/16 06:07, Robert Haas wrote:

On Thu, Apr 21, 2016 at 1:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

Robert Haas <robertmh...@gmail.com> writes:

On Wed, Apr 20, 2016 at 2:28 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

Andres Freund <and...@anarazel.de> writes:

max_parallel_degree currently defaults to 0. 
I think we should enable

it by default for at least the beta period.
Otherwise we're primarily
going to get reports back after the release.

So, I suggest that the only sensible non-zero values
here are probably
"1" or "2", given a default pool of 8 worker processes
system-wide.
Andres told me yesterday he'd vote for "2".  Any other
opinions?

It has to be at least 2 for beta purposes, else you are
not testing
situations with more than one worker process at all, which
would be
rather a large omission no?

That's what Andres, thought, too.  From my point of view, the big
thing is to be using workers at all.  It is of course possible
that
there could be some bugs where a single worker is not enough, but
there's a lot of types of bug where even one worker would probably
find the problem.  But I'm OK with changing the default to 2.

I'm curious.

Why not 4?


IIUC, the idea to change max_parallel_degree for beta is to catch any 
bugs in the parallelism code, not to do any performance testing of it.  
So, I think either 1 or 2 should be sufficient to hit the bugs if 
there are any.  Do you have any reason to think that we might miss 
some category of bugs if we don't use a higher max_parallel_degree?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com 
No.  Just felt that 4 would not be too great for the type of processor 
chips used on servers to handle.


For complications, such as race conditions and implied logical 
assumptions - I tend to think of 0, 1, 2, 3, many.


Essentially just a gut feeling that 4 might reveal more corner cases.


Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-04-23 Thread Amit Kapila
On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian  wrote:
>
> On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote:
> > 2. Without this feature, you can kill sessions or transactions to
> > control bloat, but this feature is properly thought of as a way to
> > avoid bloat *without* killing sessions or transactions.  You can let
> > the session live, without having it generate bloat, just so long as it
> > doesn't try to touch any data that has been recently modified.  We
> > have no other feature in PostgreSQL that does something like that.
>
> I kind of agreed with Tom about just aborting transactions that held
> snapshots for too long, and liked the idea this could be set per
> session, but the idea that we abort only if a backend actually touches
> the old data is very nice.  I can see why the patch author worked hard
> to do that.
>
> How does/did Oracle handle this?
>

IIRC then Oracle gives this error when the space in undo tablespace (aka
rollback segment) is low.  When the rollback segment gets full, it
overwrites the changed data which might be required by some old snapshot
and when that old snapshot statement tries to access the data (which is
already overwritten), it gets "snapshot too old" error.  Assuming there is
enough space in rollback segment, Oracle seems to provide a way via Alter
System set undo_retention = .

Now, if the above understanding of mine is correct, then I think the
current implementation done by Kevin is closer to what Oracle provides.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com