Re: [HACKERS] FIPS mode?

2017-06-23 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Jun 24, 2017 at 12:56 PM, Curtis Ruck wrote:
>> If I clean this up some and follow the style guide, what is the likelihood of
>> getting this included in the Red Hat packages, since Red Hat ships a
>> certified FIPS implementation?

> So they are applying a custom patch to it already?

Don't believe so.  It's been a few years since I was at Red Hat, but
my recollection is that their approach was that it was a system-wide
configuration choice changing libc's behavior, and there were only very
minor fixes required to PG's behavior, all of which got propagated
upstream (see, eg, commit 01824385a).  It sounds like Curtis is trying
to enable FIPS mode inside Postgres within a system where it isn't enabled
globally, which according to my recollection has basically nothing to do
with complying with the actual federal security standard.

regards, tom lane




Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-23 Thread Noah Misch
On Wed, Jun 21, 2017 at 10:52:50PM -0700, Noah Misch wrote:
> On Thu, Jun 15, 2017 at 09:04:44AM +0100, Simon Riggs wrote:
> > On 15 June 2017 at 02:59, Noah Misch  wrote:
> > 
> > > Formally, the causative commit is the one that removed the superuser()
> > > test, namely 25fff40.
> > >
> > > [Action required within three days.  This is a generic notification.]
> > >
> > > The above-described topic is currently a PostgreSQL 10 open item.
> > > Simon, since you committed the patch believed to have created it, you
> > > own this open item.  If some other commit is more relevant or if this
> > > does not belong as a v10 open item, please let us know.  Otherwise,
> > > please observe the policy on open item ownership[1] and send a status
> > > update within three calendar days of this message.  Include a date for
> > > your subsequent status update.  Testers may discover new open items at
> > > any time, and I want to plan to get them all fixed well in advance of
> > > shipping v10.  Consequently, I will appreciate your efforts toward
> > > speedy resolution.  Thanks.
> > >
> > > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> > 
> > Thanks for letting me know. I'm on leave at present, so won't be able
> > to get to this until June 20, though will make it a priority for then.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-25 06:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] FIPS mode?

2017-06-23 Thread Michael Paquier
On Sat, Jun 24, 2017 at 12:56 PM, Curtis Ruck wrote:
> I've got a requirement for enabling FIPS support in our environment.
> Looking at PostgreSQL's be-secure-openssl.c and mucking with it, it seems
> fairly straightforward to just add a few ifdefs and enable FIPS with a new
> configure flag and a new postgresql.conf configuration setting.
>
> If I clean this up some and follow the style guide, what is the likelihood of
> getting this included in the Red Hat packages, since Red Hat ships a
> certified FIPS implementation?

So they are applying a custom patch to it already?

> For what it's worth, I've got FIPS_mode_set(1) working and PostgreSQL
> seems to function properly.  I'd just like to see this in upstream so I
> don't end up maintaining a long-lived branch.
>
> Looking at scope, logically it seems mostly confined to libpq and
> be-secure-openssl.c, though I'd expect pgcrypto to be affected.

Yes, I would imagine that this is located in be-secure-openssl.c and
fe-secure-openssl.c, as everything should be done when initializing the
SSL context.

Here is a manual about patch submission:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

As things are now, the next version where new features are accepted
will be 11, with a commit fest beginning in September. Here is the
website where patches need to be registered for review and possible
integration into the tree:
https://commitfest.postgresql.org/
-- 
Michael




Re: [HACKERS] Causal reads take II

2017-06-23 Thread Thomas Munro
On Fri, Jun 23, 2017 at 11:48 PM, Thomas Munro wrote:
> Apply the patch after first applying a small bug fix for replication
> lag tracking[4].  Then:

That bug fix was committed, so now causal-reads-v17.patch can be
applied directly on top of master.

> 1.  Set up some streaming replicas.
> 2.  Stick causal_reads_max_replay_lag = 2s (or any time you like) in
> the primary's postgresql.conf.
> 3.  Set causal_reads = on in some transactions on various nodes.
> 4.  Try to break it!

Someone asked me off-list how to set this up quickly and easily for
testing.  Here is a shell script that will start up a primary server
(port 5432) and 3 replicas (ports 5441 to 5443).  Set the two paths at
the top of the file before running it.  Log in with psql postgres [-p
], then SET causal_reads = on to test its effect.
causal_reads_max_replay_lag is set to 2s, and depending on your
hardware you might find that stuff like CREATE TABLE big_table AS
SELECT generate_series(1, 1000) or a large COPY data load causes
replicas to be kicked out of the set after a while; you can also pause
replay on the replicas with SELECT pg_wal_replay_pause() and
pg_wal_replay_resume(), kill -STOP/-CONT or -9 the walreceiver
processes to simulate various failure modes, or run the replicas
remotely and unplug the network.  SELECT application_name, replay_lag,
causal_reads_state FROM pg_stat_replication to see the current
situation, and also monitor the primary's LOG messages about
transitions.  You should find that the
"read-your-writes-or-fail-explicitly" guarantee is upheld no matter
what you do, and furthermore that failing or lagging replicas don't
hold the primary up very long: in the worst case
causal_reads_lease_time for lost contact, and in the best case the
time to exchange a couple of messages with the standby to tell it its
lease is revoked and it should start raising an error.  You might find
test-causal-reads.c[1] useful for testing.

> Maybe it needs a better name.

Ok, how about this: the feature could be called "synchronous replay".
The new column in pg_stat_replication could be called sync_replay
(like the other sync_XXX columns).  The GUCs could be called
synchronous_replay, synchronous_replay_max_lag and
synchronous_replay_lease_time.  The language in log messages could
refer to standbys "joining the synchronous replay set".

Restating the purpose of the feature with that terminology:  If
synchronous_replay is set to on, then you see the effects of all
synchronous_replay = on transactions that committed before your
transaction began, or an error is raised if that is not possible on
the current node.  This allows applications to direct read-only
queries to read-only replicas for load balancing without seeing stale
data.  Is that clearer?

Restating the relationship with synchronous replication with that
terminology: while synchronous_commit and synchronous_standby_names
are concerned with distributed durability, synchronous_replay is
concerned with distributed visibility.  While the former prevents
commits from returning if the configured level of durability isn't met
(for example "must be flushed on master + any 2 standbys"), the latter
will simply drop any standbys from the synchronous replay set if they
fail or lag more than synchronous_replay_max_lag.  It is reasonable to
want to use both features at once:  my policy on distributed
durability might be that I want all transactions to be flushed to disk
on master + any of three servers before I report information to users,
and my policy on distributed visibility might be that I want to be
able to run read-only queries on any of my six read-only replicas, but
don't want to wait for any that lag by more than 1 second.

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D3NF%3D7eLkVR2fefVF9bg6RxpZXoQFmOP3RWE4r4iuO7vg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


test-causal-reads.sh
Description: Bourne shell script



[HACKERS] FIPS mode?

2017-06-23 Thread Curtis Ruck
I've got a requirement for enabling FIPS support in our environment.
Looking at PostgreSQL's be-secure-openssl.c and mucking with it, it seems
fairly straightforward to just add a few ifdefs and enable FIPS with a new
configure flag and a new postgresql.conf configuration setting.

If I clean this up some and follow the style guide, what is the likelihood of
getting this included in the Red Hat packages, since Red Hat ships a
certified FIPS implementation?

For what it's worth, I've got FIPS_mode_set(1) working and PostgreSQL
seems to function properly.  I'd just like to see this in upstream so I
don't end up maintaining a long-lived branch.

Looking at scope, logically it seems mostly confined to libpq and
be-secure-openssl.c, though I'd expect pgcrypto to be affected.
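
For illustration, a minimal sketch of what such a hook might look like in
the SSL initialization path, assuming a hypothetical USE_FIPS configure
symbol and a hypothetical fips_mode GUC (neither exists today):

    #ifdef USE_FIPS
        /*
         * Hypothetical: switch OpenSSL into FIPS mode before any SSL
         * context is created.  FIPS_mode_set() returns nonzero on success.
         */
        if (fips_mode && !FIPS_mode_set(1))
            ereport(FATAL,
                    (errmsg("could not enable FIPS mode: %s",
                            ERR_error_string(ERR_get_error(), NULL))));
    #endif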


Re: [HACKERS] Broken hint bits (freeze)

2017-06-23 Thread Amit Kapila
On Fri, Jun 23, 2017 at 8:18 PM, Bruce Momjian  wrote:
> On Fri, Jun 23, 2017 at 08:10:17AM +0530, Amit Kapila wrote:
>> On Wed, Jun 21, 2017 at 10:03 PM, Bruce Momjian  wrote:
>> > On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
>> >> On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila wrote:
>> >> > Hmm.  I think we need something that requires less effort, because
>> >> > not all users will be as knowledgeable as you are, so if they make any
>> >> > mistakes in copying the file manually, it can lead to problems.  How
>> >> > about issuing a notification (XLogArchiveNotifySeg) in the shutdown
>> >> > checkpoint if archiving is enabled?
>> >> >
>> >>
>> >> I have thought more about the above solution, and it seems risky to
>> >> notify the archiver for incomplete WAL segments (which will be possible
>> >> in this case, as there is no guarantee that the checkpoint record will
>> >> fill the segment).  So it seems to me we should update the documentation
>> >> unless you or someone else has a solution to this problem.
>> >
>> > The over-arching question is how do we tell users to verify that the WAL
>> > has been replayed on the standby?  I am thinking we would say that for
>> > streaming replication, the "Latest checkpoint location" should match on
>> > the primary and standby, while for log shipping, the standbys should be
>> > exactly one WAL file less than the primary.
>> >
>>
>> I am not sure if we can say "standbys should be exactly one WAL file
>> less than the primary", because a checkpoint can create a few more WAL
>> segments for future use.  I think to make this work the user needs to
>> carefully copy just the next WAL segment (the one after the last file on
>> the standby), which will contain the checkpoint record.  Ideally, there
>> should be some way, either in the form of a tool or functionality in the
>> database server, with which this last file can be copied, but in the
>> absence of that we can at least document this fact.
>
> I was not clear.  I was not saying there can be only one extra WAL file.
> I am saying the "Latest checkpoint location" should be one WAL file
> farther on the master.  I think the big problem is that we need a full
> replay of that WAL file, not just having it one less than the master.
>

If the user has properly shut down, then that last file should contain
only the checkpoint record; is it safe to proceed with the upgrade
without actually copying that file?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Broken hint bits (freeze)

2017-06-23 Thread Amit Kapila
On Fri, Jun 23, 2017 at 8:47 PM, Sergey Burladyan  wrote:
> Bruce Momjian  writes:
>
>> On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
>> > On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila wrote:
>> > > Hmm.  I think we need something that requires less effort, because
>> > > not all users will be as knowledgeable as you are, so if they make any
>> > > mistakes in copying the file manually, it can lead to problems.  How
>> > > about issuing a notification (XLogArchiveNotifySeg) in the shutdown
>> > > checkpoint if archiving is enabled?
>> > >
>> >
>> > I have thought more about the above solution, and it seems risky to
>> > notify the archiver for incomplete WAL segments (which will be possible
>> > in this case, as there is no guarantee that the checkpoint record will
>> > fill the segment).  So it seems to me we should update the documentation
>> > unless you or someone else has a solution to this problem.
>
>> As far as I know this is the only remaining open issue.  Sergey, please
>> verify.  I appreciate the work everyone has done to improve this, and
>> all the existing fixes have been pushed to all supported branches.  :-)
>
> Yes, thank you all for your help!
>
> Yes, this is the last issue with checkpoints that I know of: how to ensure
> that the standby syncs all shared buffers to disk on shutdown.
>

I think if we had a command like ALTER SYSTEM FLUSH SHARED BUFFERS, it
would have been helpful for what you need here.  You could have run it
before shutdown.

> I am thinking about enforcing a restartpoint on shutdown, like this in
> src/backend/access/transam/xlog.c:
>
> -    if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
> -        XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo))
> -    {
> +    if (!(flags & CHECKPOINT_IS_SHUTDOWN) &&
> +        (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
> +         XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo)))
> +    {
>
> But I have still not read the source and am not sure about this solution.
>

It might serve your purpose, but I think it will not be safe to
always perform a restartpoint at shutdown.  It would delete WAL files
that should be deleted only after the actual checkpoint record is
received from the master.

>
> PS:
> I successfully upgraded last night from 9.2 to 9.4 and found another issue :-)
>
> It is about hash indexes and promote:
> 1. create master
> 2. create standby from it
> 3. create unlogged table and hash index like:
>  create unlogged table test (id int primary key, v text);
>  create index on test using hash (id);
> 4. stop master
> 5. promote standby
>
> Now, if you try to upgrade this newly promoted master, pg_upgrade will stop
> on this hash index:
> error while creating link for relation "public.test_id_idx"
> ("s/9.2/base/16384/16393" to "m/9.4/base/16422/16393"): No such file or
> directory
> Failure, exiting
>

I am not sure if this is a problem, because in the version you are
upgrading from, hash indexes are not WAL-logged and their creation will
not be replicated on the standby, so the error seems to be expected.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-23 Thread Amit Kapila
On Thu, Jun 22, 2017 at 6:54 AM, Andres Freund  wrote:
> On 2017-06-21 00:57:32 -0700, jasrajd wrote:
>> We are also seeing contention on the WALWriteLock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPS or
>> bandwidth. Is there more work being done in this area?
>

That should not happen if the writes from various backends are
combined in some way.  However, it is not very clear what exactly you
have done as part of taking the flush calls out of WALWriteLock.  Can you
share a patch or some details about how you have done it and how you
have measured the contention you are seeing?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Tom Lane
Peter Eisentraut  writes:
> Do you want to take a look at moving those elog calls around a bit?  That
> should do it.

It would be a good idea to have some clarity on *why* that should do it.

Looking at the original report's log, but without having actually
reproduced the problem, I guess what is happening is this:

1. Subscription worker process (23117) gets a duplicate key conflict while
trying to apply an update, and in consequence it exits.  (Is that supposed
to happen?)

2. Publication server process (23124) doesn't notice client connection
loss right away.  By chance, the next thing it tries to send to the client
is the debug output from LogicalIncreaseRestartDecodingForSlot.  Then it
detects loss of connection (at 2017-06-21 14:55:12.033) and FATAL's out.
But since the spinlock stuff has no tracking infrastructure, we don't
know we are still holding the replication slot mutex.

3. Process exit cleanup does know that it's supposed to release the
replication slot, so it tries to take the mutex spinlock ... again.
Eventually that times out and we get the "stuck spinlock" panic.

All correct so far?

So, okay, the proximate cause of the crash is a blatant violation of the
rule that spinlocks may only be held across straight-line code segments.
But I'm wondering about the client exit having occurred in the first
place.  Why is that, and how would one ever recover?  It sure looks
like this isn't the first subscription worker process that has tried
and failed to apply the update.  If our attitude towards this situation is
that it's okay to fork-bomb your server with worker processes continually
respawning and making no progress, well, I don't think that's good enough.
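
(For illustration, the usual straight-line pattern is to copy whatever
the message needs while holding the mutex, and to call elog() only after
the lock is released; a sketch, using the replication slot's fields:

    XLogRecPtr  restart_lsn;

    SpinLockAcquire(&slot->mutex);
    restart_lsn = slot->data.restart_lsn;   /* straight-line code only */
    SpinLockRelease(&slot->mutex);

    /* Safe here: elog may block on the client socket, no spinlock held. */
    elog(DEBUG1, "restart_lsn is %X/%X",
         (uint32) (restart_lsn >> 32), (uint32) restart_lsn);

Anything that can block or error out, including sending a DEBUG message
to the client, has to happen outside the acquire/release pair.)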

regards, tom lane




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-23 Thread Peter Eisentraut
On 6/21/17 22:47, Peter Eisentraut wrote:
> On 6/20/17 22:44, Noah Misch wrote:
>>> A patch has been posted, and it's being reviewed.  Next update Monday.
>>
>> This PostgreSQL 10 open item is past due for your status update.  Kindly
>> send a status update within 24 hours, and include a date for your
>> subsequent status update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
I'm not sure how to proceed here.  Nobody else seems terribly
interested, and this is really a minor issue, so I don't want to muck
> around with the locking endlessly.  Let's say, if there are no new
> insights by Friday, I'll pick one of the proposed patches or just close
> it without any patch.

After some comments, a new patch has been posted, and I'll give it until
Monday for review.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-23 Thread Peter Eisentraut
On 6/22/17 03:26, Masahiko Sawada wrote:
> Thank you for the patch. Some comments and questions.
> DropSubscriptionRelState requests ExclusiveLock, but why is it not
> ShareRowExclusiveLock?

fixed

> I tested the DROP SUBSCRIPTION case, but even with this patch, "tuple
> concurrently updated" still occurs.
> 
> @@ -405,7 +405,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
>   }
>   heap_endscan(scan);
> 
> - heap_close(rel, RowExclusiveLock);
> + heap_close(rel, lockmode);
>  }
> 
> I think we should not release the table lock here; should it be
> heap_close(rel, NoLock) instead? After changing it that way in my
> environment, DROP SUBSCRIPTION seems to work fine.

fixed

> Also, ALTER SUBSCRIPTION SET PUBLICATION seems the same. Even with
> Petr's patch, the same error still occurred during ALTER SUBSCRIPTION
> SET PUBLICATION. Currently it acquires RowExclusiveLock on
> pg_subscription_rel, but as long as both DDL and the table sync worker
> can modify the same record in pg_subscription, this error can
> happen.

fixed

> On the other hand if we take a strong lock on pg_subscription
> during DDL the deadlock risk will be increased.

The original problem was that DROP TABLE locked things in the order 1)
user table, 2) pg_subscription_rel, whereas a full-database VACUUM would
lock things in the opposite order.  I think this was a pretty wide
window if you have many tables.  In this case, however, we are only
dealing with pg_subscription and pg_subscription_rel, and I think even
VACUUM would always process them in the same order.

Could you please try the attached patch to see if it addresses your test
scenarios?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fc50b9e287e8c3c29be21d82c07c39023cbf3882 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 23 Jun 2017 21:26:46 -0400
Subject: [PATCH v2] Parametrize locking in RemoveSubscriptionRel

Revising the change in 521fd4795e3ec3d0b263b62e5eb58e1557be9c86, we need
to take a stronger lock when removing entries from pg_subscription_rel
during ALTER/DROP SUBSCRIPTION.  Otherwise, users might see "tuple
concurrently updated" when table sync workers make updates at the same
time as DDL commands are run.  For DROP TABLE, we keep the lower lock
level that was put in place by the above commit, so that DROP TABLE does
not deadlock with whole-database VACUUM.
---
 src/backend/catalog/heap.c| 2 +-
 src/backend/catalog/pg_subscription.c | 6 +++---
 src/backend/commands/subscriptioncmds.c   | 4 ++--
 src/include/catalog/pg_subscription_rel.h | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index a376b99f1e..8a96bbece6 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1844,7 +1844,7 @@ heap_drop_with_catalog(Oid relid)
/*
 * Remove any associated relation synchronization states.
 */
-   RemoveSubscriptionRel(InvalidOid, relid);
+   RemoveSubscriptionRel(InvalidOid, relid, RowExclusiveLock);
 
/*
 * Forget any ON COMMIT action for the rel
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c69c461b62..1d80191ca8 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -369,7 +369,7 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn,
  * subscription, or for a particular relation, or both.
  */
 void
-RemoveSubscriptionRel(Oid subid, Oid relid)
+RemoveSubscriptionRel(Oid subid, Oid relid, LOCKMODE lockmode)
 {
Relationrel;
HeapScanDesc scan;
@@ -377,7 +377,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
HeapTuple   tup;
int nkeys = 0;
 
-   rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+   rel = heap_open(SubscriptionRelRelationId, lockmode);
 
if (OidIsValid(subid))
{
@@ -405,7 +405,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
}
heap_endscan(scan);
 
-   heap_close(rel, RowExclusiveLock);
+   heap_close(rel, NoLock);
 }
 
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9cbd36f646..57954c978b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -595,7 +595,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
{
char   *namespace;
 
-   RemoveSubscriptionRel(sub->oid, relid);
+   RemoveSubscriptionRel(sub->oid, relid, ShareRowExclusiveLock);
 
logicalrep_worker_stop(sub->oid, relid);
 
@@ -910,7 +910,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
 
/* 

Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Peter Eisentraut
On 6/23/17 16:15, Andres Freund wrote:
> On 2017-06-23 13:26:58 -0400, Alvaro Herrera wrote:
>> Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
>> some elog(DEBUG1) calls with the slot spinlock held.  That's probably
>> uncool.
> 
> It definitely is not cool, rather daft even (it's probably me who wrote
> that).  No idea why I've done it like that, makes no sense.

Do you want to take a look at moving those elog calls around a bit?  That
should do it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-23 Thread Stephen Frost
Alvaro, all,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Yugo Nagata wrote:
> 
> > I tried to make it. Patch attached. 
> > 
> > It is easy and simple. Although I haven't tried it for the autovacuum worker,
> > I confirmed I can change another process's parameters without affecting others.
> > Do you want this in PG?
> 
> One advantage of this gadget is that you can signal backends that you
> own without being superuser, so +1 for the general idea.  (Please do
> create a fresh thread so that this can be appropriately linked to in
> commitfest app, though.)

Well, that wouldn't quite work with the proposed patch because the
proposed patch REVOKEs EXECUTE from PUBLIC...

I'm trying to figure out how it's actually useful to be able to signal a
backend that you own about a config change that you *aren't* able to
make without being a superuser.  Now, if you could tell the other
backend to use a new value for some USERSET GUC, then *that* would be
useful and interesting.

I haven't got any particularly great ideas for how to make that happen
though.

> You need a much bigger patch for the autovacuum worker.  The use case I
> had in mind is that you have a maintenance window and can run fast
> vacuuming during it, but if the table is large and vacuum can't finish
> within that window then you want vacuum to slow down, without stopping
> it completely.  But implementing this requires juggling the
> stdVacuumCostDelay/Limit values during the reload, which are currently
> read at start of vacuuming only; the working values are overwritten from
> those during a rebalance.

Being able to change those values during a vacuum run would certainly
be useful; I've had cases where I would have liked to do exactly that.
That's largely independent of this, though.

Thanks!

Stephen




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Peter Eisentraut
On 6/23/17 16:10, Peter Eisentraut wrote:
>> Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
>> some elog(DEBUG1) calls with the slot spinlock held.  That's probably
>> uncool.

> Removing the call you pointed out doesn't make a difference, but it's
> possibly something similar.

Correction: The fault actually is in that function.  I had only looked
at the "if" branch, but it was the "else" branch that triggered the
problem in this run.  If I remove those elog calls, everything works
smoothly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Small bug in replication lag tracking

2017-06-23 Thread Thomas Munro
On Sat, Jun 24, 2017 at 6:07 AM, Simon Riggs  wrote:
> On 23 June 2017 at 08:18, Simon Riggs  wrote:
>> On 23 June 2017 at 06:45, Thomas Munro  wrote:
>>
>>> I discovered a thinko in the new replication lag interpolation code
>>> that can cause a strange number to be reported occasionally.
>>
>> Thanks, will review.
>
> I've pushed this, but I think we should leave the code alone after
> this and wait for user feedback.

Thanks!  Yeah, I haven't heard any feedback about this, which I've
been interpreting as good news...  but if anyone's looking for
something to beta test, reports would be most welcome.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Can ICU be used for a database's default sort order?

2017-06-23 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut wrote:
>> 1) Associate by name only.  That is, you can create a database with any
>> COLLATION "foo" that you want, and it's only checked when you first
>> connect to or do anything in the database.
>> 
>> 2) Create shared collations.  Then we'd need a way to manage having a
>> mix of shared and non-shared collations around.
>> 
>> There are significant pros and cons to all of these ideas.  Some people
>> I talked to appeared to prefer the shared collations approach.

> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

I'm not very clear on how you'd bootstrap template1 into anything
other than C locale in the second approach.  With our existing
libc-based stuff, it's possible to define what the database's locale
is before there are any catalogs.  It's not apparent how to do that with
a collation-based solution.

In my mind, collations are just a SQL-syntax wrapper for locales that
are really defined one level down.  I think we'd be well advised to
carry that same approach into the database properties, because otherwise
we have circularities to deal with. So I'm imagining something more like

create database encoding 'utf8' lc_collate 'icu-en_US' lc_ctype ...

where lc_collate is just a string that we know how to interpret, the
same as now.

We could optionally reduce the amount of notation involved by merging the
lc_collate and lc_ctype parameters into one, say

create database encoding 'utf8' locale 'icu-en_US' ...

I'm not too clear on how this would play with other libc locale
functionality (lc_monetary and so on), but we'd have to deal with
that question anyway.

regards, tom lane




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-23 Thread Michael Paquier
On Sat, Jun 24, 2017 at 5:07 AM, Alvaro Herrera wrote:
> Yugo Nagata wrote:
>
>> I tried to make it. Patch attached.
>>
>> It is easy and simple. Although I haven't tried it for the autovacuum worker,
>> I confirmed I can change another process's parameters without affecting others.
>> Do you want this in PG?

Just browsing the patch...

+    if (r == SIGNAL_BACKEND_NOSUPERUSER)
+        ereport(WARNING,
+                (errmsg("must be a superuser to terminate superuser process")));
+
+    if (r == SIGNAL_BACKEND_NOPERMISSION)
+        ereport(WARNING,
+                (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
Both messages are incorrect. This is not trying to terminate a process.

+Datum
+pg_reload_conf_pid(PG_FUNCTION_ARGS)
I think that a name closer to pg_reload_backend would be better.

Documentation is needed as well.

> One advantage of this gadget is that you can signal backends that you
> own without being superuser, so +1 for the general idea.  (Please do
> create a fresh thread so that this can be appropriately linked to in
> commitfest app, though.)

That would be nice.

> You need a much bigger patch for the autovacuum worker.  The use case I
> had in mind is that you have a maintenance window and can run fast
> vacuuming during it, but if the table is large and vacuum can't finish
> within that window then you want vacuum to slow down, without stopping
> it completely.  But implementing this requires juggling the
> stdVacuumCostDelay/Limit values during the reload, which are currently
> read at start of vacuuming only; the working values are overwritten from
> those during a rebalance.

Yeah, that's independent from the patch above.
-- 
Michael




Re: [HACKERS] Can ICU be used for a database's default sort order?

2017-06-23 Thread Peter Geoghegan
On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut wrote:
> It's something I hope to address soon.

I hope you do. I think that we'd realize significant benefits by
having ICU become the de facto standard collation provider, one that most
users get without even realizing it. As things stand, you have to make
a point of specifying an ICU collation as your per-column collation
within every CREATE TABLE. That's a significant barrier to adoption.

> 1) Associate by name only.  That is, you can create a database with any
> COLLATION "foo" that you want, and it's only checked when you first
> connect to or do anything in the database.
>
> 2) Create shared collations.  Then we'd need a way to manage having a
> mix of shared and non-shared collations around.
>
> There are significant pros and cons to all of these ideas.  Some people
> I talked to appeared to prefer the shared collations approach.

I strongly prefer the second approach. The only downside that occurs
to me is that that approach requires more code. Is there something
that I've missed?

-- 
Peter Geoghegan




Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-23 Thread Tom Lane
I wrote:
>> One question that I've got is why the ICU portion refuses to load
>> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
>> Surely this is completely wrong?  I should think that what we load into
>> pg_collation ought to be independent of template1's encoding, the same
>> as it is for libc collations, and the right place to be making a test
>> like that is where somebody attempts to use an ICU collation.

Pursuant to the second part of that: I checked on what happens if you
try to use an ICU collation in a database with a not-supported-by-ICU
encoding.  We have to cope with that scenario even with the current
(broken IMO) initdb behavior, because even if template1 has a supported
encoding, it's possible to create another database that doesn't.
It does fail more or less cleanly; you get an "encoding "foo" not
supported by ICU" message at runtime (out of get_encoding_name_for_icu).
But that's quite a bit unlike the behavior for libc collations: with
those, you get an error in collation name lookup, along the lines of

collation "en_DK.utf8" for encoding "SQL_ASCII" does not exist

The attached proposed patch makes the behavior for ICU collations the
same, by dint of injecting the is_encoding_supported_by_icu() check
into collation name lookup.

regards, tom lane

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 64f6fee..029a132 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
*** OpfamilyIsVisible(Oid opfid)
*** 1915,1923 
--- 1915,1974 
  }
  
  /*
+  * lookup_collation
+  *		If there's a collation of the given name/namespace, and it works
+  *		with the given encoding, return its OID.  Else return InvalidOid.
+  */
+ static Oid
+ lookup_collation(const char *collname, Oid collnamespace, int32 encoding)
+ {
+ 	Oid			collid;
+ 	HeapTuple	colltup;
+ 	Form_pg_collation collform;
+ 
+ 	/* Check for encoding-specific entry (exact match) */
+ 	collid = GetSysCacheOid3(COLLNAMEENCNSP,
+ 			 PointerGetDatum(collname),
+ 			 Int32GetDatum(encoding),
+ 			 ObjectIdGetDatum(collnamespace));
+ 	if (OidIsValid(collid))
+ 		return collid;
+ 
+ 	/*
+ 	 * Check for any-encoding entry.  This takes a bit more work: while libc
+ 	 * collations with collencoding = -1 do work with all encodings, ICU
+ 	 * collations only work with certain encodings, so we have to check that
+ 	 * aspect before deciding it's a match.
+ 	 */
+ 	colltup = SearchSysCache3(COLLNAMEENCNSP,
+ 			  PointerGetDatum(collname),
+ 			  Int32GetDatum(-1),
+ 			  ObjectIdGetDatum(collnamespace));
+ 	if (!HeapTupleIsValid(colltup))
+ 		return InvalidOid;
+ 	collform = (Form_pg_collation) GETSTRUCT(colltup);
+ 	if (collform->collprovider == COLLPROVIDER_ICU)
+ 	{
+ 		if (is_encoding_supported_by_icu(encoding))
+ 			collid = HeapTupleGetOid(colltup);
+ 		else
+ 			collid = InvalidOid;
+ 	}
+ 	else
+ 	{
+ 		collid = HeapTupleGetOid(colltup);
+ 	}
+ 	ReleaseSysCache(colltup);
+ 	return collid;
+ }
+ 
+ /*
   * CollationGetCollid
   *		Try to resolve an unqualified collation name.
   *		Returns OID if collation found in search path, else InvalidOid.
+  *
+  * Note that this will only find collations that work with the current
+  * database's encoding.
   */
  Oid
  CollationGetCollid(const char *collname)
*** CollationGetCollid(const char *collname)
*** 1935,1953 
  		if (namespaceId == myTempNamespace)
  			continue;			/* do not look in temp namespace */
  
! 		/* Check for database-encoding-specific entry */
! 		collid = GetSysCacheOid3(COLLNAMEENCNSP,
!  PointerGetDatum(collname),
!  Int32GetDatum(dbencoding),
!  ObjectIdGetDatum(namespaceId));
! 		if (OidIsValid(collid))
! 			return collid;
! 
! 		/* Check for any-encoding entry */
! 		collid = GetSysCacheOid3(COLLNAMEENCNSP,
!  PointerGetDatum(collname),
!  Int32GetDatum(-1),
!  ObjectIdGetDatum(namespaceId));
  		if (OidIsValid(collid))
  			return collid;
  	}
--- 1986,1992 
  		if (namespaceId == myTempNamespace)
  			continue;			/* do not look in temp namespace */
  
! 		collid = lookup_collation(collname, namespaceId, dbencoding);
  		if (OidIsValid(collid))
  			return collid;
  	}
*** CollationGetCollid(const char *collname)
*** 1961,1966 
--- 2000,2008 
   *		Determine whether a collation (identified by OID) is visible in the
   *		current search path.  Visible means "would be found by searching
   *		for the unqualified collation name".
+  *
+  * Note that only collations that work with the current database's encoding
+  * will be considered visible.
   */
  bool
  CollationIsVisible(Oid collid)
*** CollationIsVisible(Oid collid)
*** 1990,1998 
  	{
  		/*
  		 * If it is in the path, it might still not be visible; it could be
! 		 * hidden by another conversion of the same name earlier in the path.
! 		 * So we must do a slow 

Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Tom Lane
Peter Eisentraut  writes:
> On 6/23/17 13:26, Alvaro Herrera wrote:
>> Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
>> some elog(DEBUG1) calls with the slot spinlock held.  That's probably
>> uncool.

> I can reproduce the issue with --client-min-messages=debug2 or debug1,
> but it doesn't appear with the default settings.  I don't always get the
> "stuck spinlock" message, but it hangs badly pretty reliably after two
> or three rounds of erroring.

> Removing the call you pointed out doesn't make a difference, but it's
> possibly something similar.

Maybe some function invoked in a debug ereport accesses shared memory
or takes locks (eg, for catalog access)?

regards, tom lane




Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-23 Thread Tom Lane
I wrote:
> One question that I've got is why the ICU portion refuses to load
> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
> Surely this is completely wrong?  I should think that what we load into
> pg_collation ought to be independent of template1's encoding, the same
> as it is for libc collations, and the right place to be making a test
> like that is where somebody attempts to use an ICU collation.  But
> I've not tried to test it.

So I did test that, and found out the presumable reason why that's there:
icu_from_uchar() falls over if the database encoding is unsupported, and
we use that to convert ICU "display names" for use as comments for the
ICU collations.  But that's not very much less wrongheaded, because it will
allow non-ASCII characters into the initial database contents, which is
absolutely not acceptable.  We assume we can bit-copy the contents of
template0 and it will be valid in any encoding.

Therefore, I think the right thing to do is remove that test and change
get_icu_locale_comment() so that it rejects non-ASCII text, making the
encoding conversion trivial, as in the attached patch.

On my Fedora 25 laptop, the only collations that go without a comment
in this approach are the "nb" ones (Norwegian Bokmål).  As I recall,
that locale is a second-class citizen for other reasons already,
precisely because of its loony insistence on a non-ASCII name even
when we're asking for an Anglicized version.

I'm inclined to add a test to reject non-ASCII in the ICU locale names as
well as the comments.  We've had to do that for libc locale names, and
this experience shows that the ICU locale maintainers don't have their
heads screwed on any straighter.  But this patch doesn't do that.

regards, tom lane

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 1c43f0b..6febe63 100644
*** a/src/backend/commands/collationcmds.c
--- b/src/backend/commands/collationcmds.c
*** get_icu_language_tag(const char *localen
*** 431,437 
  
  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string.
   */
  static char *
  get_icu_locale_comment(const char *localename)
--- 431,439 
  
  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string, or NULL if we can't get a comment
!  * or find that it's not all ASCII.  (We can *not* accept non-ASCII
!  * comments, because the contents of template0 must be encoding-agnostic.)
   */
  static char *
  get_icu_locale_comment(const char *localename)
*** get_icu_locale_comment(const char *local
*** 439,444 
--- 441,447 
  	UErrorCode	status;
  	UChar		displayname[128];
  	int32		len_uchar;
+ 	int32		i;
  	char	   *result;
  
  	status = U_ZERO_ERROR;
*** get_icu_locale_comment(const char *local
*** 446,456 
  	displayname, lengthof(displayname),
  					&status);
  	if (U_FAILURE(status))
! 		ereport(ERROR,
! 				(errmsg("could not get display name for locale \"%s\": %s",
! 						localename, u_errorName(status))));
  
! 	icu_from_uchar(&result, displayname, len_uchar);
  
  	return result;
  }
--- 449,468 
  	displayname, lengthof(displayname),
  					&status);
  	if (U_FAILURE(status))
! 		return NULL;			/* no good reason to raise an error */
  
! 	/* Check for non-ASCII comment */
! 	for (i = 0; i < len_uchar; i++)
! 	{
! 		if (displayname[i] > 127)
! 			return NULL;
! 	}
! 
! 	/* OK, transcribe */
! 	result = palloc(len_uchar + 1);
! 	for (i = 0; i < len_uchar; i++)
! 		result[i] = displayname[i];
! 	result[len_uchar] = '\0';
  
  	return result;
  }
*** pg_import_system_collations(PG_FUNCTION_
*** 642,655 
  
  	/* Load collations known to ICU */
  #ifdef USE_ICU
- 	if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
- 	{
- 		ereport(NOTICE,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-  errmsg("encoding \"%s\" not supported by ICU",
- 		pg_encoding_to_char(GetDatabaseEncoding()))));
- 	}
- 	else
  	{
  		int			i;
  
--- 654,659 
*** pg_import_system_collations(PG_FUNCTION_
*** 661,666 
--- 665,671 
  		{
  			const char *name;
  			char	   *langtag;
+ 			char	   *icucomment;
  			const char *collcollate;
  			UEnumeration *en;
  			UErrorCode	status;
*** pg_import_system_collations(PG_FUNCTION_
*** 686,693 
  
  CommandCounterIncrement();
  
! CreateComments(collid, CollationRelationId, 0,
! 			   get_icu_locale_comment(name));
  			}
  
  			/*
--- 691,700 
  
  CommandCounterIncrement();
  
! icucomment = get_icu_locale_comment(name);
! if (icucomment)
! 	CreateComments(collid, CollationRelationId, 0,
!    icucomment);
  			}
  
  			/*
*** pg_import_system_collations(PG_FUNCTION_
*** 720,727 
  
  	CommandCounterIncrement();
  
! 	CreateComments(collid, 

Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Andres Freund
On 2017-06-23 13:26:58 -0400, Alvaro Herrera wrote:
> Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
> some elog(DEBUG1) calls with the slot spinlock held.  That's probably
> uncool.

It definitely is not cool, rather daft even (it's probably me who wrote
that).  No idea why I've done it like that, makes no sense.

Andres Freund




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Peter Eisentraut
On 6/23/17 13:26, Alvaro Herrera wrote:
> Albe Laurenz wrote:
>> Peter Eisentraut wrote:
>>> On 6/21/17 09:02, Albe Laurenz wrote:
 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: Broken pipe
 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker for subscription "reprec" has started
 DEBUG:  received replication command: IDENTIFY_SYSTEM
 DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at ReplicationSlotRelease, slot.c:394
 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was terminated by signal 6: Aborted
 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active server processes
 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: Broken pipe
 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; reinitializing
>>>
>>> I can't reproduce that.  I let it loop around for about 10 minutes and
>>> it was fine.
>>>
>>> I notice that you have some debug settings on.  Could you share your
>>> exact setup steps from initdb, as well as configure options, just in
>>> case one of these settings is causing a problem?
> 
> Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
> some elog(DEBUG1) calls with the slot spinlock held.  That's probably
> uncool.

I can reproduce the issue with --client-min-messages=debug2 or debug1,
but it doesn't appear with the default settings.  I don't always get the
"stuck spinlock" message, but it hangs badly pretty reliably after two
or three rounds of erroring.

Removing the call you pointed out doesn't make a difference, but it's
possibly something similar.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-23 Thread Alvaro Herrera
Yugo Nagata wrote:

> I tried to make it. Patch attached. 
> 
> It is easy and simple. Although I haven't tried it for the autovacuum worker,
> I confirmed I can change another process's parameters without affecting others.
> Do you want this in PG?

One advantage of this gadget is that you can signal backends that you
own without being superuser, so +1 for the general idea.  (Please do
create a fresh thread so that this can be appropriately linked to in
commitfest app, though.)

You need a much bigger patch for the autovacuum worker.  The use case I
had in mind is that you have a maintenance window and can run fast
vacuuming during it, but if the table is large and vacuum can't finish
within that window then you want vacuum to slow down, without stopping
it completely.  But implementing this requires juggling the
stdVacuumCostDelay/Limit values during the reload, which are currently
read at start of vacuuming only; the working values are overwritten from
those during a rebalance.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Same expression more than once in partition key

2017-06-23 Thread Tom Lane
Peter Eisentraut  writes:
> We also allow the same column more than once in an index.  We probably
> don't have to be more strict here.

There actually are valid uses for the same column more than once in
an index, eg if you use a different operator class for each instance.
I find it hard to envision a similar use-case in partitioning though.

regards, tom lane




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Peter Eisentraut
On 6/23/17 13:14, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
>>> Tom Lane wrote:
 Peter Eisentraut  writes:
> Any thoughts about keeping datumIsEqual() as a first check?  I did some
> light performance tests, but it was inconclusive.

 Seems like it would tend to be a win if, in fact, the values are
 usually equal.  If they're usually not, then it's a loser.  Do
 we have any feeling for which case is more common?
>>
>> Seems like a premature optimization to me - if you care about
>> performance and do this frequently, you're not going to end up using
>> FULL.  If we want to performance optimize, it'd probably better to
>> lookup candidate keys and use those if available.
> 
> I can get behind that argument.

Thanks for the feedback.  I have committed it after removing the
datumIsEqual() call.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-23 Thread Peter Eisentraut
On 6/23/17 11:15, Tatsuo Ishii wrote:
>> For clarity, I personally prefer to keep all the source text files
>> in the repository, especially so that we can detect changes to
>> them.  But since we decided that at least most of them should not be
>> there (for license reasons), I just don't see a reason to keep only
>> the rest, even though they have no such restriction.
> 
> So are you saying that if n of m authority files are not kept because
> of license issues, then the remaining m-n authority files should not be
> kept as well?  What's the benefit for us of doing so?

I don't have a clear opinion on this particular issue, but I think we
should have clarity on why particular files or code exist.  So why do
these files exist and the others don't?  Is it just the license?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Same expression more than once in partition key

2017-06-23 Thread Peter Eisentraut
On 6/23/17 09:54, Tom Lane wrote:
>> However, I can use same expression more than once in partition key.
> 
> ... and so, I can't get excited about extending it to expressions.
> Where would you stop --- for example, are i and (i) equal?  What
> about (i) and (case when true then i else j end)?  In principle,
> the system could recognize both those identities, but it seems
> silly to expend code on it just for nagware purposes.

We also allow the same column more than once in an index.  We probably
don't have to be more strict here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgjdbc logical replication client throwing exception

2017-06-23 Thread Peter Eisentraut
On 6/23/17 06:11, sanyam jain wrote:
> I'm trying to create something like pg_recvlogical.c in Java using
> pgjdbc.  It's working for small transactions, but when I do a big
> transaction the client throws the exception below:

What does the server log say?  If nothing interesting, turn up debugging.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Can ICU be used for a database's default sort order?

2017-06-23 Thread Peter Eisentraut
On 6/22/17 23:10, Peter Geoghegan wrote:
> On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane  wrote:
>> Is there some way I'm missing, or is this just a not-done-yet feature?
> 
> It's a not-done-yet feature.

It's something I hope to address soon.

The main definitional challenge is how to associate a pg_database entry
with a collation.

What we currently effectively do is duplicate the fields of pg_collation
in pg_database.  But I imagine over time we'll add more properties in
pg_collation, along with additional ALTER COLLATION commands etc., so
duplicating all of that would be a significant amount of code
complication and result in a puzzling user interface.

Ideally, I'd like to see CREATE DATABASE ... COLLATION "foo".  But the
problem is of course that collations are per-database objects.  Possible
solutions:

1) Associate by name only.  That is, you can create a database with any
COLLATION "foo" that you want, and it's only checked when you first
connect to or do anything in the database.

2) Create shared collations.  Then we'd need a way to manage having a
mix of shared and non-shared collations around.

There are significant pros and cons to all of these ideas.  Some people
I talked to appeared to prefer the shared collations approach.

Other ideas?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-23 Thread Andres Freund
On 2017-06-23 19:21:57 +0100, Simon Riggs wrote:
> On 23 June 2017 at 08:23, Simon Riggs  wrote:
> > On 23 June 2017 at 08:21, Andres Freund  wrote:
> >> On 2017-06-07 10:17:31 -0700, Andres Freund wrote:
> >>> On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
> >>> > Simon Riggs  writes:
> >>> > > So rearranged code a little to keep it lean.
> >>> >
> >>> > Didn't you break it with that?  As it now stands, the memcpy will
> >>> > copy the nonzero value.
> >>>
> >>> I've not seen a fix and/or alleviating comment about this so far.  Did I
> >>> miss something?
> >>
> >> Simon, FWIW, I plan to either revert or fix this up soon-ish.  Unless
> >> you're going to actually respond on this thread?
> >
> > Sorry, I've only just seen Tom's reply. Will fix.
> 
> I don't see a bug. Perhaps I'm tired and can't see it yet.
> 
> Will fix if you thwack me with the explanation.

Wasn't my complaint, but here we go:

Previous code:

/*
 * Ignore the SubXID array if it has overflowed, unless the snapshot was
 * taken during recovey - in that case, top-level XIDs are in subxip as
 * well, and we mustn't lose them.
 */
if (serialized_snapshot.suboverflowed && !snapshot->takenDuringRecovery)
serialized_snapshot.subxcnt = 0;

/* Copy struct to possibly-unaligned buffer */
memcpy(start_address,
       &serialized_snapshot, sizeof(SerializedSnapshotData));

i.e. if suboverflowed, start_address would contain subxcnt = 0.

New code:


/* Copy struct to possibly-unaligned buffer */
memcpy(start_address,
       &serialized_snapshot, sizeof(SerializedSnapshotData));

/* Copy XID array */
if (snapshot->xcnt > 0)
memcpy((TransactionId *) (start_address +
                          sizeof(SerializedSnapshotData)),
       snapshot->xip, snapshot->xcnt * sizeof(TransactionId));

/*
 * Copy SubXID array. Don't bother to copy it if it had overflowed,
 * though, because it's not used anywhere in that case. Except if it's a
 * snapshot taken during recovery; all the top-level XIDs are in subxip as
 * well in that case, so we mustn't lose them.
 */
if (serialized_snapshot.suboverflowed && !snapshot->takenDuringRecovery)
serialized_snapshot.subxcnt = 0;

Here the copy is done before subxcnt = 0.
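
A minimal sketch of the fix implied here (ordering only, not the committed
patch) is to zero subxcnt in the local struct before it is copied out, as
the old code effectively did:

/* Decide first whether to ignore the SubXID array ... */
if (serialized_snapshot.suboverflowed && !snapshot->takenDuringRecovery)
    serialized_snapshot.subxcnt = 0;

/* ... then copy struct to possibly-unaligned buffer */
memcpy(start_address,
       &serialized_snapshot, sizeof(SerializedSnapshotData));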

- Andres




Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-23 Thread Simon Riggs
On 23 June 2017 at 08:23, Simon Riggs  wrote:
> On 23 June 2017 at 08:21, Andres Freund  wrote:
>> On 2017-06-07 10:17:31 -0700, Andres Freund wrote:
>>> On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
>>> > Simon Riggs  writes:
>>> > > So rearranged code a little to keep it lean.
>>> >
>>> > Didn't you break it with that?  As it now stands, the memcpy will
>>> > copy the nonzero value.
>>>
>>> I've not seen a fix and/or alleviating comment about this so far.  Did I
>>> miss something?
>>
>> Simon, FWIW, I plan to either revert or fix this up soon-ish.  Unless
>> you're going to actually respond on this thread?
>
> Sorry, I've only just seen Tom's reply. Will fix.

I don't see a bug. Perhaps I'm tired and can't see it yet.

Will fix if you thwack me with the explanation.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Small bug in replication lag tracking

2017-06-23 Thread Simon Riggs
On 23 June 2017 at 08:18, Simon Riggs  wrote:
> On 23 June 2017 at 06:45, Thomas Munro  wrote:
>
>> I discovered a thinko in the new replication lag interpolation code
>> that can cause a strange number to be reported occasionally.
>
> Thanks, will review.

I've pushed this, but I think we should leave the code alone after
this and wait for user feedback.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Code quality issues in ICU patch

2017-06-23 Thread David Fetter
On Fri, Jun 23, 2017 at 12:31:40PM -0400, Tom Lane wrote:
> icu_to_uchar() and icu_from_uchar(), and perhaps other places, are
> touchingly naive about integer overflow hazards in buffer size
> calculations.  I call particular attention to this bit in
> icu_from_uchar():
> 
>   len_result = UCNV_GET_MAX_BYTES_FOR_STRING(len_uchar, 
> ucnv_getMaxCharSize(icu_converter));
> 
> The ICU man pages say that that macro is defined as
> 
> #define UCNV_GET_MAX_BYTES_FOR_STRING(length, maxCharSize)
> (((int32_t)(length)+10)*(int32_t)(maxCharSize))
> 
> which means that getting this to overflow (resulting in
> probably-exploitable memory overruns) would be about as hard as
> taking candy from a baby.

So it kicks off really loud and persistent alarms, and isn't as easy
as you thought, even taking this into account?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-23 Thread Simon Riggs
On 21 June 2017 at 15:18, Simon Riggs  wrote:
> On 21 June 2017 at 16:15, Yugo Nagata  wrote:
>> On Wed, 21 Jun 2017 19:08:35 +0530
>> Kuntal Ghosh  wrote:
>>
>>> On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
>>> >
>>> > Attached is a patch for the documentation fix.
>>> >
>>> Please attach the patch as well. :-)
>>
>> I'm sorry, I forgot it. I attached this.
>
> Thanks, will apply.

Done, thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Alvaro Herrera
Albe Laurenz wrote:
> Peter Eisentraut wrote:
> > On 6/21/17 09:02, Albe Laurenz wrote:
> >> 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
> >> Broken pipe
> >> 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
> >> 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply 
> >> worker for subscription "reprec" has started
> >> DEBUG:  received replication command: IDENTIFY_SYSTEM
> >> DEBUG:  received replication command: START_REPLICATION SLOT "reprec" 
> >> LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
> >> 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
> >> ReplicationSlotRelease, slot.c:394
> >> 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
> >> terminated by signal 6: Aborted
> >> 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active 
> >> server processes
> >> 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
> >> Broken pipe
> >> 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes 
> >> terminated; reinitializing
> > 
> > I can't reproduce that.  I let it loop around for about 10 minutes and
> > it was fine.
> > 
> > I notice that you have some debug settings on.  Could you share your
> > exact setup steps from initdb, as well as configure options, just in
> > case one of these settings is causing a problem?

Hmm, so for instance in LogicalIncreaseRestartDecodingForSlot() we have
some elog(DEBUG1) calls with the slot spinlock held.  That's probably
uncool.
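
The usual pattern to avoid that is to copy what you need while holding the
lock and log afterwards, e.g. (a sketch only; field names follow slot.h):

    XLogRecPtr  restart_lsn;

    SpinLockAcquire(&slot->mutex);
    restart_lsn = slot->data.restart_lsn;   /* copy under the lock */
    SpinLockRelease(&slot->mutex);

    elog(DEBUG1, "restart_lsn candidate is %X/%X",
         (uint32) (restart_lsn >> 32), (uint32) restart_lsn);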

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
> > Tom Lane wrote:
> > > Peter Eisentraut  writes:
> > > > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > > > light performance tests, but it was inconclusive.
> > > 
> > > Seems like it would tend to be a win if, in fact, the values are
> > > usually equal.  If they're usually not, then it's a loser.  Do
> > > we have any feeling for which case is more common?
> 
> Seems like a premature optimization to me - if you care about
> performance and do this frequently, you're not going to end up using
> FULL.  If we want to performance optimize, it'd probably better to
> lookup candidate keys and use those if available.

I can get behind that argument.

> > who in their right minds would use floating point columns as part of
> > replica identity ...?
> 
> Since this is FULL, it'll be all columns...

Yeah, I was thinking you shouldn't have floating point columns if you're
going to use FULL as identity.  But you're tautologically right: doing
the wrong thing is likely not the right thing to do.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Andres Freund
On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > > light performance tests, but it was inconclusive.
> > 
> > Seems like it would tend to be a win if, in fact, the values are
> > usually equal.  If they're usually not, then it's a loser.  Do
> > we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL.  If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.


> Though, thinking about it, maybe the datumIsEqual test would give the
> wrong answer for floating point values, and there'd be no fallback to
> equality with the logic I propose.  But then maybe that's all
> right ---

I don't think it'd be ok, we shouldn't just do the wrong thing because
we think it's unlikely to happen.


> who in their right minds would use floating point columns as part of
> replica identity ...?

Since this is FULL, it'll be all columns...


- Andres




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:
> > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > light performance tests, but it was inconclusive.
> 
> Seems like it would tend to be a win if, in fact, the values are
> usually equal.  If they're usually not, then it's a loser.  Do
> we have any feeling for which case is more common?

What about keeping the datumIsEqual test for fixed length pass-by-value
types (I'm mostly thinking about fixed-width integers here ...) and
always use the full blown equality comparison for anything more
elaborate than that?

Though, thinking about it, maybe the datumIsEqual test would give the
wrong answer for floating point values, and there'd be no fallback to
equality with the logic I propose.  But then maybe that's all right ---
who in their right minds would use floating point columns as part of
replica identity ...?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-23 Thread Alvaro Herrera
Masahiko Sawada wrote:
> On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
>  wrote:
> > On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
> >  wrote:
> >> Initially, I had naively set wal_consistency_check = all before running
> >> make installcheck and then had to wait for a long time to confirm that WAL
> >> generated by the gin test indeed caused consistency check failure on the
> >> standby with the v1 patch.
> >
> > wal_consistency_check = gin would have saved you a lot of I/O.
> >
> >> But I can see Sawada-san's point that there should be some way for
> >> developers writing code that better had gone through WAL consistency
> >> checking facility to do it without much hassle.  But then again, it may
> >> not be that frequent to need that.
> 
> Yeah, it should be optional. I imagined providing such an option of
> pg_regress or TAP test for the developers.

As far as I know it is possible to have third-party modules that extend
the buildfarm client script so that it runs additional tests beyond the
standard ones.  You could have a custom module installed in some
powerful machine of yours that runs the WAL consistency check and report
the results to the buildfarm.  A single animal running that test should
be enough, right?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Code quality issues in ICU patch

2017-06-23 Thread Tom Lane
icu_to_uchar() and icu_from_uchar(), and perhaps other places, are
touchingly naive about integer overflow hazards in buffer size
calculations.  I call particular attention to this bit in
icu_from_uchar():

len_result = UCNV_GET_MAX_BYTES_FOR_STRING(len_uchar, 
ucnv_getMaxCharSize(icu_converter));

The ICU man pages say that that macro is defined as

#define UCNV_GET_MAX_BYTES_FOR_STRING(length, maxCharSize)  
(((int32_t)(length)+10)*(int32_t)(maxCharSize))

which means that getting this to overflow (resulting in
probably-exploitable memory overruns) would be about as hard as taking
candy from a baby.
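
One defensive shape for that calculation (a hedged sketch, not the actual
fix) is to do the check in 64-bit arithmetic before invoking the macro:

    int32       maxcharsize = ucnv_getMaxCharSize(icu_converter);

    if (len_uchar < 0 ||
        ((int64) len_uchar + 10) * maxcharsize > (int64) MaxAllocSize)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("string is too long for encoding conversion")));

    len_result = UCNV_GET_MAX_BYTES_FOR_STRING(len_uchar, maxcharsize);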

I also notice that the general approach to handling ICU-reported
error conditions is like

    if (U_FAILURE(status))
        ereport(ERROR,
                (errmsg("ucnv_fromUChars failed: %s", u_errorName(status))));

This lacks an errcode() setting, which is contrary to project policy,
and the error message violates our message style guidelines.
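
For comparison, a guideline-conforming report might look like this (the
errcode choice here is illustrative only):

    if (U_FAILURE(status))
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("could not convert string from UChar: %s",
                        u_errorName(status))));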

I don't particularly feel like fixing these things myself, but
somebody needs to; the overflow issues in particular are stop-ship
security hazards.

regards, tom lane




Re: [HACKERS] Broken hint bits (freeze)

2017-06-23 Thread Sergey Burladyan
Sergey Burladyan  writes:

> 1. create master
> 2. create standby from it
> 3. create unlogged table and hash index like:
>      create unlogged table test (id int primary key, v text);
>      create index on test using hash (id);
> 4. stop master
> 5. promote standby
>
> now, if you try to upgrade this newly promoted master, pg_upgrade will stop
> on this hash index:
> error while creating link for relation "public.test_id_idx"
> ("s/9.2/base/16384/16393" to "m/9.4/base/16422/16393"): No such file or
> directory
> Failure, exiting
>
> I touched this file (s/9.2/base/16384/16393) and reran pg_upgrade from
> scratch, and it completed successfully.

Missed test script for it.

-- 
Sergey Burladyan



test_hash.sh
Description: Bourne shell script



Re: [HACKERS] Broken hint bits (freeze)

2017-06-23 Thread Sergey Burladyan
Bruce Momjian  writes:

> On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
> > On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  
> > wrote:
> > > Hmm.  I think we need something that works with lesser effort because
> > > not all users will be as knowledgeable as you are, so if they make any
> > > mistakes in copying the file manually, it can lead to problems.  How
> > > about issuing a notification (XLogArchiveNotifySeg) in shutdown
> > > checkpoint if archiving is enabled?
> > >
> > 
> > I have thought more about the above solution and it seems risky to
> > notify archiver for incomplete WAL segments (which will be possible in
> > this case as there is no guarantee that Checkpoint record will fill
> > the segment).  So, it seems to me we should update the document unless
> > you or someone has some solution to this problem.

> As far as I know this is the only remaining open issue.  Sergey, please
> verify.  I appreciate the work everyone has done to improve this, and
> all the existing fixes have been pushed to all supported branches.  :-)

Yes, thank you all for your help!

Yes, this is the last issue with checkpoints that I know of: how to ensure
that the standby syncs all shared buffers to disk on shutdown.

I am thinking about enforcing a restartpoint on shutdown, like:
src/backend/access/transam/xlog.c
-   8639     if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
-   8640         XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo))
-   8641     {
+   8639     if (!(flags & CHECKPOINT_IS_SHUTDOWN) && (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
+   8640         XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo)))
+   8641     {

But I have still not read the source and am not sure about this solution.


PS:
I successfully upgraded last night from 9.2 to 9.4 and found another issue :-)

It is about hash index and promote:
1. create master
2. create standby from it
3. create unlogged table and hash index like:
     create unlogged table test (id int primary key, v text);
     create index on test using hash (id);
4. stop master
5. promote standby

now, if you try to upgrade this newly promoted master, pg_upgrade will stop
on this hash index:
error while creating link for relation "public.test_id_idx"
("s/9.2/base/16384/16393" to "m/9.4/base/16422/16393"): No such file or
directory
Failure, exiting

I touched this file (s/9.2/base/16384/16393) and reran pg_upgrade from
scratch, and it completed successfully.

-- 
Sergey Burladyan




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-23 Thread Tatsuo Ishii
> For clarity, I personally prefer to keep all the source text files
> in the repository, especially so that we can detect changes to
> them. But since we decided that at least most of them are not to be
> there (for a license reason), I just don't see a reason to
> keep only the rest even without the restriction.

So are you saying that if n of the m authority files are not kept because
of license issues, then the remaining m-n authority files should not be
kept either?  What's the benefit for us in doing so?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Same expression more than once in partition key

2017-06-23 Thread Yugo Nagata
On Fri, 23 Jun 2017 09:54:17 -0400
Tom Lane  wrote:

> Yugo Nagata  writes:
> > When we create a range partitioned table, we cannot use
> > a column more than once in the partition key.
> 
> >  postgres=# create table t (i int) partition by range(i,i);
> >  ERROR:  column "i" appears more than once in partition key
> 
> AFAIK, that's just a simple test for obvious mistakes; it's not a
> property that's essential for correctness.

Thanks. This is what I want to know.

> 
> > However, I can use same expression more than once in partition key.
> 
> ... and so, I can't get excited about extending it to expressions.
> Where would you stop --- for example, are i and (i) equal?  What
> about (i) and (case when true then i else j end)?  In principle,
> the system could recognize both those identities, but it seems
> silly to expend code on it just for nagware purposes.
> 

Agreed.


>   regards, tom lane


-- 
Yugo Nagata 




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-23 Thread Yugo Nagata
On Thu, 22 Jun 2017 14:08:30 +0900
Michael Paquier  wrote:

> On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagata  wrote:
> > On Thu, 22 Jun 2017 12:05:19 +0900
> > Michael Paquier  wrote:
> >> signal-able). Different thought, but I'd love to see a SQL function
> >> that allows triggering SIGHUP on a specific process, like an
> >> autovacuum worker to change its cost parameters. There is
> >> pg_reload_conf() to do so but that's system-wide.
> >
> > For example, is that like this?
> >
> > =# alter system set autovacuum_vacuum_cost_delay to 10;
> > =# select pg_reload_conf();
> > =# alter system reset autovacuum_vacuum_cost_delay;
> 
> No need to reset the parameter afterwards as this makes it sensible to
> updates afterwards, but you have the idea. Note that this is rather
> recent, as autovacuum listens to SIGHUP only since a75fb9b3.

I tried to make it. Patch attached. 

It is easy and simple. Although I haven't tried it for an autovacuum worker,
I confirmed I can change another process's parameters without affecting others.
Do you want this in PG?


[session A (PID:18375)]
=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
--
 20ms
(1 row)


[session B]
postgres=# alter system set autovacuum_vacuum_cost_delay to 10;
ALTER SYSTEM
postgres=# select pg_reload_conf(18375);
 pg_reload_conf 

 t
(1 row)

postgres=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
--
 20ms
(1 row)


[session A again]
postgres=# show autovacuum_vacuum_cost_delay;
 autovacuum_vacuum_cost_delay 
--
 10ms
(1 row)



> -- 
> Michael


-- 
Yugo Nagata 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0fdad0c..d79faf7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1128,6 +1128,7 @@ REVOKE EXECUTE ON FUNCTION pg_wal_replay_pause() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_wal_replay_resume() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_rotate_logfile() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_reload_conf() FROM public;
+REVOKE EXECUTE ON FUNCTION pg_reload_conf(int) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_current_logfile() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_current_logfile(text) FROM public;
 
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 62341b8..cc173ba 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -339,6 +339,36 @@ pg_reload_conf(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(true);
 }
 
+/*
+ * Signal to reload the database configuration for a specified backend
+ *
+ * Permission checking for this function is managed through the normal
+ * GRANT system.
+ */
+Datum
+pg_reload_conf_pid(PG_FUNCTION_ARGS)
+{
+	int			pid = PG_GETARG_INT32(0);
+	int			r = pg_signal_backend(pid, SIGHUP);
+
+	if (r == SIGNAL_BACKEND_NOSUPERUSER)
+		ereport(WARNING,
+(errmsg("must be a superuser to terminate superuser process")));
+
+	if (r == SIGNAL_BACKEND_NOPERMISSION)
+		ereport(WARNING,
+ (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
+
+	if (r)
+	{
+		ereport(WARNING,
+(errmsg("failed to send signal to backend: %d", pid)));
+		PG_RETURN_BOOL(false);
+	}
+
+	PG_RETURN_BOOL(true);
+}
+
 
 /*
  * Rotate log file
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1191b4a..7258f15 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3255,6 +3255,8 @@ DESCR("true if wal replay is paused");
 
 DATA(insert OID = 2621 ( pg_reload_conf			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_reload_conf _null_ _null_ _null_ ));
 DESCR("reload configuration files");
+DATA(insert OID = 3409 ( pg_reload_conf			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_reload_conf_pid _null_ _null_ _null_ ));
+DESCR("reload configuration files");
 DATA(insert OID = 2622 ( pg_rotate_logfile		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_rotate_logfile _null_ _null_ _null_ ));
 DESCR("rotate log file");
DATA(insert OID = 3800 ( pg_current_logfile		PGNSP PGUID 12 1 0 0 0 f f f f f f v s 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_current_logfile _null_ _null_ _null_ ));



Re: [HACKERS] Broken hint bits (freeze)

2017-06-23 Thread Bruce Momjian
On Fri, Jun 23, 2017 at 08:10:17AM +0530, Amit Kapila wrote:
> On Wed, Jun 21, 2017 at 10:03 PM, Bruce Momjian  wrote:
> > On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
> >> On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  
> >> wrote:
> >> > Hmm.  I think we need something that works with lesser effort because
> >> > not all users will be as knowledgeable as you are, so if they make any
> >> > mistakes in copying the file manually, it can lead to problems.  How
> >> > about issuing a notification (XLogArchiveNotifySeg) in shutdown
> >> > checkpoint if archiving is enabled?
> >> >
> >>
> >> I have thought more about the above solution and it seems risky to
> >> notify archiver for incomplete WAL segments (which will be possible in
> >> this case as there is no guarantee that Checkpoint record will fill
> >> the segment).  So, it seems to me we should update the document unless
> >> you or someone has some solution to this problem.
> >
> > The over-arching question is how do we tell users to verify that the WAL
> > has been replayed on the standby?  I am thinking we would say that for
> > streaming replication, the "Latest checkpoint location" should match on
> > the primary and standby, while for log shipping, the standbys should be
> > exactly one WAL file less than the primary.
> >
> 
> I am not sure if we can say "standbys should be exactly one WAL file
> less than the primary" because checkpoint can create few more WAL
> segments for future use.  I think to make this work user needs to
> carefully just copy the next WAL segment (next to the last file in
> standby) which will contain checkpoint record.  Ideally, there should
> be some way either in form of a tool or a functionality in the
> database server with which this last file can be copied but I think in
> the absence of that we can at least document this fact.

I was not clear.  I was not saying there can be only one extra WAL file.
I am saying the "Latest checkpoint location" should be one WAL file
farther on the master.  I think the big problem is that we need a full
replay of that WAL file, not just having it one less than the master.  I
have no idea how do explain that.  It is easy for streaming replication
since the "Latest checkpoint location" should match, which is simple.

Also, we need something that can be backpatched.  

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Pluggable storage

2017-06-23 Thread Tom Lane
Tomas Vondra  writes:
> It would be really great if you could explain why BitmapScans are 
> dubious, instead of just labeling them as dubious. (I guess you mean 
> Bitmap Heap Scans, right?)

The two things I'm aware of are (a) the assumption you noted, that
fetching tuples in TID sort order is a reasonably efficient thing,
and (b) the "offset" part of a TID can't exceed MaxHeapTuplesPerPage
--- see data structure in tidbitmap.c.  The latter issue means that
you don't really have a full six bytes to play with in a TID, only
about five.

I don't think (b) would be terribly hard to fix if we had a motivation to,
but I wonder whether there aren't other places that also know this about
TIDs.

regards, tom lane




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Tom Lane
Peter Eisentraut  writes:
> Any thoughts about keeping datumAsEqual() as a first check?  I did some
> light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal.  If they're usually not, then it's a loser.  Do
we have any feeling for which case is more common?

regards, tom lane




Re: [HACKERS] Pluggable storage

2017-06-23 Thread Tomas Vondra

Hi,

On 6/23/17 10:38 AM, Teodor Sigaev wrote:
>> 1. Table AM with a 6-byte TID.
>> 2. Table AM with a custom locator format, which could be TID-like.
>> 3. Table AM with no locators.
>
> Currently TID has its own type in system catalog. Seems, it's possible
> that storage claims type of TID which it uses. Then, AM could claim it
> too, so the core based on that information could solve the question
> about AM-storage compatibility. Storage could also claim that it hasn't
> TID type at all so it couldn't be used with any access method, use case:
> compressed column oriented storage.

Isn't the fact that TID is an existing type defined in the system catalogs
a fairly insignificant detail? I mean, we could just as easily define a
new 64-bit locator data type, and use that instead, for example.

The main issue here is that we assume things about the TID contents,
i.e. that it contains page/offset etc. And Bitmap nodes rely on that, to
some extent - e.g. when prefetching data.

> As I remember, only GIN depends on TID format, other indexes use it as
> opaque type. Except, at least, btree and GiST - they believe that
> internal pointers are the same as outer (to heap)

I think you're probably right - GIN does compress the posting lists by
exploiting the TID redundancy (that it's a page/offset structure), and I
don't think there are other index AMs doing that.

But I'm not sure we can simply rely on that - it's possible people will
try to improve other index types (e.g. by adding similar compression to
other index types). Moreover we now support extensions defining custom
index AMs, and we have no clue what people may do in those.

So this would clearly require some sort of flag for each index AM.

> Another dubious part - BitmapScan.

It would be really great if you could explain why BitmapScans are
dubious, instead of just labeling them as dubious. (I guess you mean
Bitmap Heap Scans, right?)

I see no conceptual issue with bitmap scans on arbitrary locator types,
as long as there's sufficient locality information encoded in the value.
What I mean by that is that for any two locator values A and B:

   (1) if (A < B) then (A is stored before B)

   (2) if (A is close to B) then (A is stored close to B)

Without these features it's probably futile to try to do bitmap scans,
because the bitmap would not result in a mostly sequential access
pattern, and things like prefetch would not be very efficient, I think.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Same expression more than once in partition key

2017-06-23 Thread Tom Lane
Yugo Nagata  writes:
> When we create a range partitioned table, we cannot use
> a column more than once in the partition key.

>  postgres=# create table t (i int) partition by range(i,i);
>  ERROR:  column "i" appears more than once in partition key

AFAIK, that's just a simple test for obvious mistakes; it's not a
property that's essential for correctness.

> However, I can use same expression more than once in partition key.

... and so, I can't get excited about extending it to expressions.
Where would you stop --- for example, are i and (i) equal?  What
about (i) and (case when true then i else j end)?  In principle,
the system could recognize both those identities, but it seems
silly to expend code on it just for nagware purposes.

regards, tom lane




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-23 Thread Andrew Gierth
> "Noah" == Noah Misch  writes:

 Noah> This PostgreSQL 10 open item is past due for your status update.
 Noah> Kindly send a status update within 24 hours,

oops, sorry! I forgot to include a date in the last one, and in fact a
personal matter delayed things anyway. I expect to have this wrapped up
by 23:59 BST on the 24th.

-- 
Andrew.




[HACKERS] Removing deprecation changing automatically type of opaque functions

2017-06-23 Thread Michael Paquier
Hi all,

Per bug #14706, it has been discussed the possibility to remove the
transformation of opaque functions happening for example when
declaring a trigger:
https://www.postgresql.org/message-id/cab7npqtc-7u1k2vz3qlxvsyaxbvncrc9zmuvyf4-wuzj8zf...@mail.gmail.com

As far as I can see, the deprecation has been present since 2002 via commit
a2a3192, and has been kept around to allow restoring old dump files.
On HEAD pg_dump supports servers down to 8.0, which was released in
2005. So there are already a couple of years in-between.
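
For context, the transformation in question is what makes old dump contents
like the following still load (names here are illustrative); CREATE TRIGGER
silently changes the function's return type and emits a NOTICE:

    CREATE FUNCTION my_trig() RETURNS opaque
        AS $$ BEGIN RETURN NEW; END; $$ LANGUAGE plpgsql;  -- pre-7.3 style
    CREATE TRIGGER t BEFORE INSERT ON tab
        FOR EACH ROW EXECUTE PROCEDURE my_trig();
    -- NOTICE: return type of "my_trig" changed from "opaque" to "trigger"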

This is of course not a topic for 10, just for 11, when its development
opens.

Thoughts?
-- 
Michael




Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Peter Eisentraut
On 6/20/17 00:10, Andres Freund wrote:
> On 2017-06-20 11:52:10 +0900, Tatsuo Ishii wrote:
>> If my understanding is correct, it would not be easy to fix, no?
>>
>>> We might be able to refine that, but there is a general problem that
>>> without an index and an operator class, we are just doing our random
>>> best to match the values.
> 
> I don't see the problem as being big.  We should just look up the
> default btree opclass and use the relevant operator.  That's a how a
> number of things already work.

Patch for that.

Any thoughts about keeping datumAsEqual() as a first check?  I did some
light performance tests, but it was inconclusive.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 00ea950753c070d440818c71d06e2bbc53b6c294 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 23 Jun 2017 08:55:13 -0400
Subject: [PATCH] Fix replication with replica identity full

The comparison with the target rows on the subscriber side was done with
datumIsEqual(), which can have false negatives.  For instance, it didn't
work reliably for text columns.  So use the equality operator provided
by the type cache as fallback.

Also add more user documentation about replica identity requirements.

Reported-by: Tatsuo Ishii 
---
 doc/src/sgml/logical-replication.sgml  | 28 +++-
 src/backend/executor/execReplication.c | 22 +-
 src/test/subscription/t/001_rep_changes.pl | 23 ---
 3 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 92ec175af1..fa8ae536d9 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -110,11 +110,29 @@ Publication
Publications can choose to limit the changes they produce to
any combination of INSERT, UPDATE, and
DELETE, similar to how triggers are fired by
-   particular event types.  If a table without a REPLICA
-   IDENTITY is added to a publication that
-   replicates UPDATE or DELETE
-   operations then subsequent UPDATE
-   or DELETE operations will fail on the publisher.
+   particular event types.  By default, all operation types are replicated.
+  
+
+  
+   A published table must have a replica identity configured in
+   order to be able to replicate UPDATE
+   and DELETE operations, so that appropriate rows to
+   update or delete can be identified on the subscriber side.  By default,
+   this is the primary key, if there is one.  Another unique index (with
+   certain additional requirements) can also be set to be the replica
+   identity.  If the table does not have any suitable key, then it can be set
+   to replica identity full, which means the entire row becomes
+   the key.  This, however, is very inefficient and should only be used as a
+   fallback if no other solution is possible.  If a replica identity other
+   than full is set on the publisher side, a replica identity
+   comprising the same or fewer columns must also be set on the subscriber
+   side.  See  for details on
+   how to set the replica identity.  If a table without a replica identity is
+   added to a publication that replicates UPDATE
+   or DELETE operations then
+   subsequent UPDATE or DELETE
+   operations will cause an error on the publisher.  INSERT
+   operations can proceed regardless of any replica identity.
   
 
   
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 6dae79a8f0..97da47a1e2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -24,12 +24,14 @@
 #include "parser/parsetree.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 #include "utils/tqual.h"
 
 
@@ -245,9 +247,27 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, 
TupleTableSlot *slot)
continue;
 
att = desc->attrs[attrnum];
+
+   /*
+	 * To compare for equality, first try datumIsEqual().  If that returns
+* false, try the equality operator.
+*/
if (!datumIsEqual(values[attrnum], slot->tts_values[attrnum],
  att->attbyval, att->attlen))
-   return false;
+   {
+   TypeCacheEntry *typentry;
+
+			typentry = lookup_type_cache(att->atttypid, TYPECACHE_EQ_OPR_FINFO);
+   if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+   ereport(ERROR,
+   

Re: [HACKERS] Causal reads take II

2017-06-23 Thread Thomas Munro
On Wed, May 24, 2017 at 3:58 PM, Thomas Munro
 wrote:
>> On Mon, May 22, 2017 at 4:10 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> I'm wondering about status of this patch and how can I try it out?
>
> I ran into a problem while doing this, and it may take a couple more
> days to fix it since I am at pgcon this week.  More soon.

Apologies for the extended delay.  Here is the rebased patch, now with
a couple of improvements (see below).  To recap, this is the third
part of the original patch series[1], which had these components:

1.  synchronous_commit = remote_apply, committed in PostgreSQL 9.6
2.  replication lag tracking, committed in PostgreSQL 10
3.  causal_reads, the remaining part, hereby proposed for PostgreSQL 11

The goal is to allow applications to move arbitrary read-only
transactions to physical replica databases and still know that they
can see all preceding write transactions or get an error.  It's
something like regular synchronous replication with synchronous_commit
= remote_apply, except that it limits the impact on the primary and
handles failure transitions with defined semantics.

The inspiration for this kind of distributed read-follows-write
consistency using read leases was a system called Comdb2[2][3], whose
designer encouraged me to try to extend Postgres's streaming
replication to do something similar.  Read leases can also be found in
some consensus systems like Google Megastore, albeit in more ambitious
form IIUC.  The name is inspired by a MySQL Galera feature
(approximately the same feature but the approach is completely
different; Galera adds read latency, whereas this patch does not).
Maybe it needs a better name.

Is this is a feature that people want to see in PostgreSQL?

IMPROVEMENTS IN V17

The GUC to enable the feature is now called
"causal_reads_max_replay_lag".  Standbys listed in
causal_reads_standby_names whose pg_stat_replication.replay_lag
doesn't exceed that time are "available" for causal reads and will be
waited for by the primary when committing.  When they exceed that
threshold they are briefly in "revoking" state and then "unavailable",
and when they return to an acceptable level they are briefly in
"joining" state before reaching "available".  CR states appear in
pg_stat_replication and transitions are logged at LOG level.

A new GUC called "causal_reads_lease_time" controls the lifetime of
read leases sent from the primary to the standby.  This affects the
frequency of lease replacement messages, and more importantly affects
the worst case of commit stall that can be introduced if connectivity
to a standby is lost and we have to wait for the last sent lease to
expire.  In the previous version, one single GUC controlled both
maximum tolerated replay lag and lease lifetime, which was good from
the point of view that fewer GUCs are better, but bad because it had
to be set fairly high when doing both jobs to be conservative about
clock skew.  The lease lifetime must be at least 4 x maximum tolerable
clock skew.  After the recent botching of a leap-second transition on
a popular public NTP network (TL;DR OpenNTP is not a good choice of
implementation to add to a public time server pool) I came to the
conclusion that I wouldn't want to recommend a default max clock skew
under 1.25s, to allow for some servers to be confused about leap
seconds for a while or to be running different smearing algorithms.  A
reasonable causal_reads_lease_time recommendation for people who don't
know much about the quality of their time source might therefore be
5s.  I think it's reasonable to want to set the maximum tolerable
replay lag to lower time than that, or in fact as low as you like,
depending on your workload and hardware.  Therefore I decided to split
the old "causal_reads_timeout" GUC into "causal_reads_max_replay_lag"
and "causal_reads_lease_time".

This new version introduces fast lease revocation.  Whenever the
primary decides that a standby is not keeping up, it kicks it out of
the set of CR-available standbys and revokes its lease, so that anyone
trying to run causal reads transactions there will start receiving a
new error.  In the previous version, it always did that by blocking
commits while waiting for the most recently sent lease to expire,
which I now call "slow revocation" because it could take several
seconds.  Now it blocks commits only until the standby acknowledges
that it is no longer available for causal reads OR the lease expires:
ideally that takes the time of a network a round trip.  Slow
revocation is still needed in various failure cases such as lost
connectivity.

TESTING

Apply the patch after first applying a small bug fix for replication
lag tracking[4].  Then:

1.  Set up some streaming replicas.
2.  Stick causal_reads_max_replay_lag = 2s (or any time you like) in
the primary's postgresql.conf.
3.  Set causal_reads = on in some transactions on various nodes.
4.  Try to break it!
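
For example, steps 2 and 3 boil down to (values illustrative):

    # primary postgresql.conf (step 2)
    causal_reads_max_replay_lag = 2s
    causal_reads_lease_time = 5s

    -- any session that needs read-follows-write semantics (step 3)
    SET causal_reads = on;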

As long as your 

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-23 Thread Andrew Dunstan


On 06/23/2017 12:11 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 06/22/2017 10:24 AM, Tom Lane wrote:
>>> That earlier request is still valid.  Also, if you can reproduce the
>>> symptom that lorikeet just showed and get a stack trace from the
>>> (hypothetical) postmaster core dump, that would be hugely valuable.
>> See attached log and stacktrace
> The stacktrace seems to be from the parallel-query-leader backend.
> Was there another core file?
>
> The lack of any indication in the postmaster log that the postmaster saw
> the parallel worker's crash sure looks the same as lorikeet's failure.
> But if the postmaster didn't dump core, then we're still at a loss as to
> why it didn't respond.
>
>   



Rerunning with some different settings to see if I can get separate cores.

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Albe Laurenz
Peter Eisentraut wrote:
> On 6/21/17 09:02, Albe Laurenz wrote:
>> 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
>> Broken pipe
>> 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
>> 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker 
>> for subscription "reprec" has started
>> DEBUG:  received replication command: IDENTIFY_SYSTEM
>> DEBUG:  received replication command: START_REPLICATION SLOT "reprec" 
>> LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
>> 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
>> ReplicationSlotRelease, slot.c:394
>> 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
>> terminated by signal 6: Aborted
>> 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active 
>> server processes
>> 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
>> Broken pipe
>> 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
>> reinitializing
> 
> I can't reproduce that.  I let it loop around for about 10 minutes and
> it was fine.
> 
> I notice that you have some debug settings on.  Could you share your
> exact setup steps from initdb, as well as configure options, just in
> case one of these settings is causing a problem?

System is Linux, RHEL 6.9, Kernel 2.6.32-696.1.1.el6.x86_64.

Configure options:

CFLAGS='-Wall -O0 -g' ./configure --prefix=/home/laurenz/pg10 \
--sysconfdir=/magwien/etc \
--docdir=/home/laurenz/pg10/doc \
--mandir=/home/laurenz/pg10/man \
--with-perl \
--with-tcl \
--with-tclconfig=/usr/lib64 \
--with-python \
--with-openssl \
--with-pam \
--with-gssapi \
--enable-nls=de \
--enable-thread-safety \
--with-libxml \
--with-libxslt \
--enable-debug \
--enable-cassert \
--with-ldap

$ initdb -A trust -E UTF8 --locale=de_DE.UTF-8 --lc-messages=C -T german -U 
postgres -W -k dbhome

$ grep -v '^[[:space:]]*#' postgresql.conf | grep -v '^$' | sed -e 's/#.*$//'

port = 5432
max_connections = 100
password_encryption = scram-sha-256
shared_buffers = 128MB
dynamic_shared_memory_type = posix
wal_level = logical
log_destination = 'stderr'
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_size = 0
client_min_messages = debug2
log_min_messages = error
log_connections = off
log_disconnections = off
log_timezone = 'Europe/Vienna'
datestyle = 'iso, dmy'
timezone = 'Europe/Vienna'
lc_messages = 'C'
lc_monetary = 'de_DE.UTF-8'
lc_numeric = 'de_DE.UTF-8'
lc_time = 'de_DE.UTF-8'
default_text_search_config = 'pg_catalog.german'

The crash happens reliably here.

Yours,
Laurenz Albe



[HACKERS] pgjdbc logical replication client throwing exception

2017-06-23 Thread sanyam jain
Hi,

I'm trying to create something like pg_recvlogical.c in java using pgjdbc.
It's working for small transactions, but when I do a big transaction the
client throws the exception below:


org.postgresql.util.PSQLException: Database connection failed when reading from 
copy
at 
org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1028)
at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:41)
at 
org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:150)
at 
org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:119)
at 
org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:73)
at XLogApplier.main(XLogApplier.java:63)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at 
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at 
org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:191)
at org.postgresql.core.PGStream.receive(PGStream.java:495)
at org.postgresql.core.PGStream.receive(PGStream.java:479)
at 
org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1161)
at 
org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1026)
... 5 more


The PG server reports a broken pipe.
Can someone please tell me what could be wrong? It's happening only when I
add millions of rows.

Thanks,
Sanyam Jain



Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-23 Thread Amit Langote
Thanks for the review again.

On 2017/06/22 19:55, Ashutosh Bapat wrote:
> On Thu, Jun 15, 2017 at 4:06 PM, Amit Langote
>  wrote:
>>
>> Anyway, I tried to implement the refactoring in patch 0002, which is not
>> all of the patch 0001 that Jeevan posted.  Please take a look.  I wondered
>> if we should emit a NOTICE when an individual leaf partition validation
>> can be skipped?
> 
> Yes. We emit an INFO for the table being attached. We want to do the
> same for the child. Or NOTICE in both places.

Actually, I meant INFO.

>> No point in adding a new test if the answer to that is
>> no, I'd think.

Updated the patch 0002 so that an INFO message is printed for each leaf
partition and a test for the same.

>> Attaching here 0001 which fixes the bug (unchanged from the previous
>> version) and 0002 which implements the refactoring (and the feature to
>> look at the individual leaf partitions' constraints to see if validation
>> can be skipped.)
> 
> Comments on 0001 patch.
> + *
> + * We request an exclusive lock on all the partitions, because we may
> + * decide later in this function to scan them to validate the new
> + * partition constraint.
> Does that mean that we may not scan the partitions later, in which the 
> stronger
> lock we took was not needed. Is that right?

Yes.  I wrote that comment thinking only about the deadlock hazard which
had then occurred to me, so the text somehow ended up being reflective of
that thinking.  Please see the new comment, which hopefully is more
informative.

> Don't we need an exclusive lock to
> make sure that the constraints are not changed while we are validating those?

If I understand your question correctly, you meant to ask if we don't need
the strongest lock on individual partitions while looking at their
constraints to prove that we don't need to scan them.  We do and we do
take the strongest lock on individual partitions even today in the second
call to find_all_inheritors().  We're trying to eliminate the second call
here.

With the current code, we take AccessShareLock in the first call when
checking the circularity of inheritance.  Then if attachRel doesn't have
the constraint to avoid the scan, we decide to look at individual
partitions (their rows, not constraints, as of now) when we take
AccessExclusiveLock.  That might cause a deadlock (was able to reproduce
one using the debugger).

> if (!skip_validate)
> May be we should turn this into "else" condition of the "if" just above.

Yes, done.

> +/*
> + * We already collected the list of partitions, including the table
> + * named in the command itself, which should appear at the head of 
> the
> + * list.
> + */
> +Assert(list_length(attachRel_children) >= 1 &&
> +linitial_oid(attachRel_children) == RelationGetRelid(attachRel));
> Probably the Assert should be moved next to find_all_inheritors(). But I don't
> see much value in this comment and the Assert at this place.

I agree, removed both.

> 
> +/* Skip the original table if it's partitioned. */
> +if (part_relid == RelationGetRelid(attachRel) &&
> +attachRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +continue;
> +
> 
> There isn't much point in checking this for every child in the loop. Instead,
> we should remove attachRel from the attachRel_children if there are more than
> one elements in the list (which means the attachRel is partitioned, may be
> verify that by Asserting).

Rearranged code considering these comments.

> Comments on 0002 patch.
> Thanks for the refactoring. The code looks really good now.

Thanks.

> The name skipPartConstraintValidation() looks very specific to the case at
> hand. The function is really checking whether existing constraints on the 
> table
> can imply the given constraints (which happen to be partition constraints). 
> How
> about PartConstraintImpliedByRelConstraint()? The idea is to pick a general
> name so that the function can be used for purposes other than skipping
> validation scan in future.

I liked this idea, so done.

>  * This basically returns if the partrel's existing constraints, which
> returns "true". Add "otherwise returns false".
> 
> if (constr != NULL)
> {
> ...
> }
> return false;
> May be you should return false when constr == NULL (I prefer !constr, but
> others may have different preferences.). That should save us one level of
> indentation. At the end just return whatever predicate_implied_by() returns.

Good suggestion, done.
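
For reference, a rough sketch of the helper's shape after these review
comments (a sketch only; constraint preprocessing is elided, and
predicate_implied_by() does the real work):

/* in tablecmds.c -- sketch, not the actual patch */
static bool
PartConstraintImpliedByRelConstraint(Relation partrel,
                                     List *partConstraint)
{
    TupleConstr *constr = RelationGetDescr(partrel)->constr;
    List       *existConstraint = NIL;
    int         i;

    if (constr == NULL)
        return false;           /* no existing constraints to work with */

    for (i = 0; i < constr->num_check; i++)
    {
        Node       *cexpr = stringToNode(constr->check[i].ccbin);

        /* the real patch also canonicalizes the expression here */
        existConstraint = lappend(existConstraint, cexpr);
    }

    /* true if the existing constraints imply the partition constraint */
    return predicate_implied_by(partConstraint, existConstraint);
}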

Attached updated patches.

Thanks,
Amit
From ee034a3bcb25a8b516220636134fe3ed38796cfe Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 14 Jun 2017 11:32:01 +0900
Subject: [PATCH 1/2] Cope with differing attnos in ATExecAttachPartition code

If the table being attached has different attnos from the parent for
the partitioning columns 

Re: [HACKERS] Pluggable storage

2017-06-23 Thread Teodor Sigaev

> 1. Table AM with a 6-byte TID.
> 2. Table AM with a custom locator format, which could be TID-like.
> 3. Table AM with no locators.

Currently TID has its own type in system catalog. Seems, it's possible that
storage claims type of TID which it uses. Then, AM could claim it too, so the
core based on that information could solve the question about AM-storage
compatibility. Storage could also claim that it hasn't TID type at all so it
couldn't be used with any access method, use case: compressed column oriented
storage.

As I remember, only GIN depends on TID format, other indexes use it as opaque
type. Except, at least, btree and GiST - they believe that internal pointers
are the same as outer (to heap).

Another dubious part - BitmapScan.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)

2017-06-23 Thread Stas Kelvich

> On 23 Jun 2017, at 00:08, Andres Freund  wrote:
> 
> Hi,
> 
> At pgcon some people were talking about the difficulty of instrumenting
> the time actually spent waiting for lwlocks and related measurements.
> I'd mentioned that linux these days provides infrastructure to measure
> such things in unmodified binaries.
> 
> Attached is a prototype of a script that measures the time spent inside
> PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and
> stacktrace.  That allows one to generate nice flame graphs showing which
> part of the code waits how long for lwlocks.
> 
> The attached script clearly needs improvements, but I thought I'd show
> some of the results it can get.  To run it you need the the python
> library of the 'bcc' project [1], and a sufficiently new kernel.  Some
> distributions, e.g. newer debian versions, package this as python-bpfcc
> and similar.
> 
> The output of the script can be turned into a flamegraph with
> https://github.com/brendangregg/FlameGraph 's flamegraph.pl.
> 
> Here's a few examples of a pgbench run. The number is the number of
> clients, sync/async indicates synchronous_commit on/off.  All the
> numbers here were generated with the libpq & pgbench batch patches
> applied and in use, but that's just because that was the state of my
> working tree.
> 
> http://anarazel.de/t/2017-06-22/pgsemwait_8_sync.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_8_async.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_64_sync.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
> 
> A bonus, not that representative one is the wait for a pgbench readonly
> run after the above, with autovacuum previously disabled:
> http://anarazel.de/t/2017-06-22/pgsemwait_64_select.svg
> interesting to see how the backends never end up having to
> flush WAL themselves, or at least not in a manner triggering contention.
> 
> I plan to write a few more of these, because they're hugely useful to
> understand actual locking behaviour. Among them:
> - time between Acquire/Release of lwlocks, grouped by lwlock
> - time between Acquire/Release of lwlocks, grouped by stack
> - callstack of acquirer and waker of lwlocks, grouped by caller stack,
>   waiter stack
> - ...
> 
> I think it might be interesting to collect a few of these somewhere
> centrally once halfway mature.  Maybe in src/tools or such.

Wow, that’s extremely helpful, thanks a lot.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-23 Thread Kuntal Ghosh
On Fri, Jun 23, 2017 at 3:01 AM, Thomas Munro
 wrote:
> On Thu, Jun 22, 2017 at 4:29 AM, Kuntal Ghosh
>  wrote:
>> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
>>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>>  wrote:
> IMHO, it's not a good idea to use a DSM call to verify the DSA handle.
>
 Okay. Is there any particular scenario you have in mind where this may fail?
>>>
>>> It's not about failure, but about the abstraction.  When we are using
>>> the DSA we should not directly access the DSM which is under DSA.
>>>
>> Okay. I thought I'd found at least one usage of
>> dsm_find_mapping() in the code. :-)
>>
>> But, I've some more doubts.
>> 1. When should we use dsm_find_mapping()? (The first few lines of
>> dsm_attach are the same as dsm_find_mapping().)
>> 2. As a user of dsa, how should we check whether my dsa handle is
>> already attached? I guess this is required because, if a user tries to
>> re-attach a dsa handle,  it's punishing the user by throwing an error
>> and the user wants to avoid such errors.
>
> I thought about this when designing the DSA API.  I couldn't think of
> any good reason to provide an 'am-I-already-attached?' function
> equivalent to dsm_find_mapping.  It seemed to me that the client code
> shouldn't ever be in any doubt about whether it's attached, and that
> wilfully or absent-mindedly throwing away dsa_area pointers and having
> to ask for them again doesn't seem like a very good design.  I suspect
> the same applies to dsm_find_mapping, and I don't see any callers in
> the source tree or indeed anywhere on the internet (based on a quick
> Google search).  But I could be missing something.
>
Thanks a lot for the clarification.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken O(n^2) avoidance in wal segment recycling.

2017-06-23 Thread Michael Paquier
On Fri, Jun 23, 2017 at 3:25 PM, Michael Paquier
 wrote:
> On Thu, Jun 22, 2017 at 6:10 AM, Andres Freund  wrote:
>> We've not heard any complaints about this afaik, but it's not something
>> that's easily diagnosable as being a problem.  Therefore I suspect we
>> should fix and backpatch this?
>
> Agreed. I have just poked at this problem and have finished with the
> attached. Logically it is not complicated, as the argument values used
> by the callers of RemoveXlogFile() are never updated when scanning
> pg_wal. Surely this patch needs an extra pair of eyes.

Andres pointed out to me offline that a bad copy-paste re-introduced the
problem fixed by fb886c15. So it is important to properly track the segment
name of the switch point as well as the segment from which the recycling
begins. There was also a second mistake in RemoveOldXlogFiles(), caused by
the same copy-pasto, that kept unnecessary segments around.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a6314a642..fd35cd311c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -878,7 +878,8 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
 static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
-static void RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
+static void RemoveXlogFile(const char *segname, XLogSegNo *endlogSegNo,
+		   XLogSegNo recycleSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
@@ -3829,10 +3830,18 @@ UpdateLastRemovedPtr(char *filename)
 static void
 RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 {
+	XLogSegNo	endlogSegNo;
+	XLogSegNo	recycleSegNo;
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		lastoff[MAXFNAMELEN];
 
+	/*
+	 * Initialize info about where to try to recycle to.
+	 */
+	XLByteToSeg(endptr, endlogSegNo);
+	recycleSegNo = XLOGfileslop(PriorRedoPtr);
+
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
 		ereport(ERROR,
@@ -3875,7 +3884,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 /* Update the last removed location in shared memory first */
 UpdateLastRemovedPtr(xlde->d_name);
 
-RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+RemoveXlogFile(xlde->d_name, &endlogSegNo, recycleSegNo);
 			}
 		}
 	}
@@ -3905,8 +3914,16 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 	struct dirent *xlde;
 	char		switchseg[MAXFNAMELEN];
 	XLogSegNo	endLogSegNo;
+	XLogSegNo	switchLogSegNo;
+	XLogSegNo	recycleSegNo;
 
-	XLByteToPrevSeg(switchpoint, endLogSegNo);
+	/*
+	 * Initialize info about where to begin the work. This will recycle,
+	 * somewhat arbitrarily, 10 future segments.
+	 */
+	XLByteToPrevSeg(switchpoint, switchLogSegNo);
+	XLByteToSeg(switchpoint, endLogSegNo);
+	recycleSegNo = endLogSegNo + 10;
 
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
@@ -3918,7 +3935,7 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 	/*
 	 * Construct a filename of the last segment to be kept.
 	 */
-	XLogFileName(switchseg, newTLI, endLogSegNo);
+	XLogFileName(switchseg, newTLI, switchLogSegNo);
 
 	elog(DEBUG2, "attempting to remove WAL segments newer than log file %s",
 		 switchseg);
@@ -3944,7 +3961,7 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 			 * - but seems safer to let them be archived and removed later.
 			 */
 			if (!XLogArchiveIsReady(xlde->d_name))
-RemoveXlogFile(xlde->d_name, InvalidXLogRecPtr, switchpoint);
+RemoveXlogFile(xlde->d_name, &endLogSegNo, recycleSegNo);
 		}
 	}
 
@@ -3954,31 +3971,22 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
- * whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
+ * segname is the name of the segment to recycle or remove. endlogSegNo
+ * is the segment number of the current (or recent) end of WAL. recycleSegNo
+ * is the segment number to recycle up to.
+ *
+ * endlogSegNo gets incremented if the segment is recycled so that it is not
+ * checked again by future calls of this function.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogSegNo *endlogSegNo,
+			   XLogSegNo recycleSegNo)
 {
 	char		path[MAXPGPATH];
 #ifdef WIN32
 	char		newpath[MAXPGPATH];
 #endif
 	struct stat statbuf;

Re: [HACKERS] Multi column range partition table

2017-06-23 Thread Dean Rasheed
On 23 June 2017 at 08:01, Ashutosh Bapat
 wrote:
> The way we have designed our syntax, we don't have a way to tell that
> p3 comes after p2 and they have no gap between those. But I don't
> think that's your question. What you are struggling with is a way to
> specify a lower bound (10, +infinity) so that anything with i1 > 10
> would go to partition 3.
>

I think actually there is a fundamental problem here, which arises
because UNBOUNDED has 2 different meanings depending on context, and
thus it is not possible in general to specify the start of one range
to be equal to the end of the previous range, as is necessary to get
contiguous non-overlapping ranges.

Note that this isn't just a problem for floating point datatypes
either, it also applies to other types such as strings. For example,
given a partition over (text, int) types defined with the following
values:

  FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)

which is equivalent to

  FROM ('a', -INFINITY) TO ('b', +INFINITY)

where should the next range start?

Even if you were to find a way to specify "the next string after 'b'",
it wouldn't exactly be pretty. The problem is that the above partition
corresponds to "all the strings starting with 'a', plus the string
'b', which is pretty ugly. A neater way to define the pair of ranges
in this case would be:

  FROM ('a', -INFINITY) TO ('b', -INFINITY)
  FROM ('b', -INFINITY) TO ('c', -INFINITY)

since then all strings starting with 'a' would fall into the first
partition and all the strings starting with 'b' would fall into the
second one.
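
Written out as DDL, that pair would look like this (hypothetical syntax,
since per-column -INFINITY cannot currently be spelled in a partition
bound):

  CREATE TABLE strs (a text, b int) PARTITION BY RANGE (a, b);
  CREATE TABLE strs_a PARTITION OF strs
      FOR VALUES FROM ('a', -INFINITY) TO ('b', -INFINITY);
  CREATE TABLE strs_b PARTITION OF strs
      FOR VALUES FROM ('b', -INFINITY) TO ('c', -INFINITY);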

Currently, when there are 2 partition columns, the partition
constraint is defined as

  (a is not null) and (b is not null)
  and
  (a > al or (a = al and b >= bl))
  and
  (a < au or (a = au and b < bu))

if the upper bound bu were allowed to be -INFINITY (something that
should probably be forbidden unless the previous column's upper bound
were finite), then this would simplify to

  (a is not null) and (b is not null)
  and
  (a > al or (a = al and b >= bl))
  and
  (a < au)

and in the example above, where bl is -INFINITY, it would further simplify to

  (a is not null) and (b is not null)
  and
  (a >= al)
  and
  (a < au)

There would also be a similar simplification possible if the lower
bound of a partition column were allowed to be +INFINITY.

So, I think that having UNBOUNDED represent both -INFINITY and
+INFINITY depending on context is a design flaw, and that we need to
allow both -INFINITY and +INFINITY as upper and lower bounds (provided
they are preceded by a column with a finite bound). I think that, in
general, that's the only way to allow contiguous non-overlapping
partitions to be defined on multiple columns.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-23 Thread Masahiko Sawada
On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
 wrote:
> On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
>  wrote:
>> Initially, I had naively set wal_consistency_check = all before running
>> make installcheck and then had to wait for a long time to confirm that WAL
>> generated by the gin test indeed caused consistency check failure on the
>> standby with the v1 patch.
>
> wal_consistency_check = gin would have saved you a lot of I/O.
>
>> But I can see Sawada-san's point that there should be some way for
>> developers writing code that had better go through the WAL consistency
>> checking facility to do so without much hassle.  But then again, the
>> need for that may not be all that frequent.

Yeah, it should be optional. I imagined providing such an option in
pg_regress or the TAP tests for developers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-23 Thread Simon Riggs
On 23 June 2017 at 08:21, Andres Freund  wrote:
> On 2017-06-07 10:17:31 -0700, Andres Freund wrote:
>> On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
>> > Simon Riggs  writes:
>> > > So rearranged code a little to keep it lean.
>> >
>> > Didn't you break it with that?  As it now stands, the memcpy will
>> > copy the nonzero value.
>>
>> I've not seen a fix and/or alleviating comment about this so far.  Did I
>> miss something?
>
> Simon, FWIW, I plan to either revert or fix this up soon-ish.  Unless
> you're going to actually respond on this thread?

Sorry, I've only just seen Tom's reply. Will fix.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-23 Thread Andres Freund
On 2017-06-07 10:17:31 -0700, Andres Freund wrote:
> On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
> > Simon Riggs  writes:
> > > So rearranged code a little to keep it lean.
> > 
> > Didn't you break it with that?  As it now stands, the memcpy will
> > copy the nonzero value.
> 
> I've not seen a fix and/or alleviating comment about this so far.  Did I
> miss something?

Simon, FWIW, I plan to either revert or fix this up soon-ish.  Unless
you're going to actually respond on this thread?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small bug in replication lag tracking

2017-06-23 Thread Simon Riggs
On 23 June 2017 at 06:45, Thomas Munro  wrote:

> I discovered a thinko in the new replication lag interpolation code
> that can cause a strange number to be reported occasionally.

Thanks, will review.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Same expression more than once in partition key

2017-06-23 Thread Yugo Nagata
On Fri, 23 Jun 2017 15:57:54 +0900
Yugo Nagata  wrote:

> Hi,
> 
> When we create a range partitioned table, we cannot use
> a column more than once in the partition key.
> 
>  postgres=# create table t (i int) partition by range(i,i);
>  ERROR:  column "i" appears more than once in partition key
> 
> However, I can use same expression more than once in partition key.
> 
> postgres=# create table t (i int) partition by range((i),(i));
> CREATE TABLE
> 
> Although this can be blocked by the attached patch, for example,
> the following is still possible.

I forgot to attach it.

> 
> postgres=# create table t (i int) partition by range((i),i);
> CREATE TABLE
> 
> Can these definitions be a problem in the internals of partitioning?
> If not, we might not need to check for columns that appear more than once
> in the partition key.
> 
> Regards,
> 
> 
> 
> -- 
> Yugo Nagata 


-- 
Yugo Nagata 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d9c769..dc4540b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13171,6 +13171,19 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 		PartitionElem *pelem = castNode(PartitionElem, lfirst(l));
 		ListCell   *lc;
 
+		if (pelem->expr)
+		{
+			/* Copy, to avoid scribbling on the input */
+			pelem = copyObject(pelem);
+
+			/* Now do parse transformation of the expression */
+			pelem->expr = transformExpr(pstate, pelem->expr,
+		EXPR_KIND_PARTITION_EXPRESSION);
+
+			/* we have to fix its collations too */
+			assign_expr_collations(pstate, pelem->expr);
+		}
+
 		/* Check for PARTITION BY ... (foo, foo) */
 		foreach(lc, newspec->partParams)
 		{
@@ -13183,19 +13196,11 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 		 errmsg("column \"%s\" appears more than once in partition key",
 pelem->name),
 		 parser_errposition(pstate, pelem->location)));
-		}
-
-		if (pelem->expr)
-		{
-			/* Copy, to avoid scribbling on the input */
-			pelem = copyObject(pelem);
-
-			/* Now do parse transformation of the expression */
-			pelem->expr = transformExpr(pstate, pelem->expr,
-		EXPR_KIND_PARTITION_EXPRESSION);
-
-			/* we have to fix its collations too */
-			assign_expr_collations(pstate, pelem->expr);
+			else if (pelem->expr && pparam->expr && equal(pelem->expr, pparam->expr))
+ereport(ERROR,
+		(errcode(ERRCODE_DUPLICATE_COLUMN),
+		 errmsg("same expression appears more than once in partition key"),
+		 parser_errposition(pstate, pelem->location)));
 		}
 
 		newspec->partParams = lappend(newspec->partParams, pelem);
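
For reference, given the equal() comparison above, a hypothetical session
with the patch applied would behave like this:

  CREATE TABLE t1 (i int) PARTITION BY RANGE ((i), (i));     -- now rejected
  CREATE TABLE t2 (i int) PARTITION BY RANGE ((i), i);       -- still allowed
  CREATE TABLE t3 (i int) PARTITION BY RANGE ((i), (i + 0)); -- still allowed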

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi column range partition table

2017-06-23 Thread Ashutosh Bapat
On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote
 wrote:
> On 2017/06/22 20:48, amul sul wrote:
>> Hi,
>>
>> While working on the another patch, I came across the case where
>> I need an auto-generated partition for a multi-column range partitioned
>> table having following range bound:
>>
>> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
>> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
>> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
>> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
>> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
>>
>> In this, the lower bound of each partition is the upper bound of the
>> previous partition.
>>
>> While trying to create p3 partition with (10, UNBOUNDED) to (20, 10) bound,
>> got an overlap partition error.
>>
>> Here is the SQL to reproduced this error:
>>
>> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
>> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
>> UNBOUNDED) TO (10, 10);
>> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
>> (10, UNBOUNDED);
>> CREATE TABLE p3   PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 
>> 10);
>>
>> ERROR:  partition "p3" would overlap partition "tab1_p_10_10"
>>
>> This happened because of UNBOUNDED handling, where it means negative infinity
>> if it is in the FROM clause.  Wondering, can't we explicitly treat this as
>> a positive infinite value?

The way we have designed our syntax, we don't have a way to tell that
p3 comes after p2 and that there is no gap between them. But I don't
think that's your question. What you are struggling with is a way to
specify a lower bound (10, +infinity) so that anything with i1 > 10
would go to partition 3.

>
> No, we cannot.  What would be greater than (or equal to) +infinity?
> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
> because 9890148 is not >= +infinity.  It will accept only the rows where
> the first column is > 10 (second column is not checked in that case).
>
> You will have to define p3 as follows:
>
> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);

That's not exactly the same as specifying (10, +infinity) in case i1 is a
float. A user cannot precisely tell what would be the acceptable
value just greater than 10.

An UNBOUNDED in the lower bound is always considered as -infinity for
that data type. There is no way to specify a lower bound which has
+infinity in it. +infinity as a lower bound for the first key may not
make sense (although it means that the partition will always be
empty), but it does make sense for keys after the first, as Amul has
explained below.

The question is: do we have support for that, and if not, will we
consider it for v10 or v11, and how?
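
Restated as a sketch against the example table from upthread:

  -- Expressible today: UNBOUNDED in TO means +infinity for that column.
  CREATE TABLE p2 PARTITION OF range_parted
      FOR VALUES FROM (10, 10) TO (10, UNBOUNDED);
  -- Not expressible: a lower bound of (10, +infinity), i.e. "everything
  -- with i1 > 10"; UNBOUNDED in FROM always means -infinity.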

>
> It's fine to use the previous partition's upper bound as the lower bound
> of the current partition, if the former does not contain an UNBOUNDED value,
> because whereas a finite value divides the range into two parts (assigned
> to the two partitions respectively), an UNBOUNDED value does not.  The
> latter represents an abstract end of the range (either on the positive
> side or the negative).

Not exactly, for the second key onwards.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Same expression more than once in partition key

2017-06-23 Thread Yugo Nagata
Hi,

When we create a range partitioned table, we cannot use
a column more than once in the partition key.

 postgres=# create table t (i int) partition by range(i,i);
 ERROR:  column "i" appears more than once in partition key

However, I can use same expression more than once in partition key.

postgres=# create table t (i int) partition by range((i),(i));
CREATE TABLE

Although this can be blocked by the attached patch, for example,
the following is still possible.

postgres=# create table t (i int) partition by range((i),i);
CREATE TABLE

Can these definitions be a problem in the internals of partitioning?
If not, we might not need to check for columns that appear more than once
in the partition key.

Regards,



-- 
Yugo Nagata 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in README.dependencies

2017-06-23 Thread Ashutosh Bapat
On Fri, Jun 23, 2017 at 2:58 AM, Alvaro Herrera
 wrote:
> Ashutosh Bapat wrote:
>> On Mon, Jun 5, 2017 at 8:22 AM, atorikoshi
>>  wrote:
>> > Hi,
>> >
>> > I found below formula to compute selectivities, but
>> > I think the last Probability 'P(b=?)' should be 'P(c=?)'.
>> >
>> >>  P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?))
>> >
>> >
>> > Attached patch fixes it, and it also adds some spaces
>> > following another formula which is on line 86 and
>> > computes P(a=?, b=?).
>>
>> Agree. Also using "d" for "degree of functional dependence (b=>c) as
>> well is confusing. We are already using "d" for "degree of functional
>> dependence (a=>b). Here' patch to use "d'" instead of "d".
>
> Since the surrounding text uses single quotes to talk about each letter,
> I thought it was better to use a new letter (e) so that we don't require
> the "prime" notation, which would end up being either inconsistent,
> confusing, stupid-looking, or combinations thereof.

Makes sense.

>
> Also, your proposed text had a slight mistake: it's not (b=>c) that
> d' is for, but (a,b=>c).

Sorry for that.

>
> Pushed with those corrections.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in README.dependencies

2017-06-23 Thread atorikoshi


On 2017/06/23 6:28, Alvaro Herrera wrote:

> Ashutosh Bapat wrote:
>> On Mon, Jun 5, 2017 at 8:22 AM, atorikoshi
>>  wrote:
>>> Hi,
>>>
>>> I found below formula to compute selectivities, but
>>> I think the last Probability 'P(b=?)' should be 'P(c=?)'.
>>>
>>>>  P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?))
>>>
>>> Attached patch fixes it, and it also adds some spaces
>>> following another formula which is on line 86 and
>>> computes P(a=?, b=?).
>>
>> Agree. Also using "d" for "degree of functional dependence" (b=>c) as
>> well is confusing. We are already using "d" for "degree of functional
>> dependence" (a=>b). Here's a patch to use "d'" instead of "d".
>
> Since the surrounding text uses single quotes to talk about each letter,
> I thought it was better to use a new letter (e) so that we don't require
> the "prime" notation, which would end up being either inconsistent,
> confusing, stupid-looking, or combinations thereof.
>
> Also, your proposed text had a slight mistake: it's not (b=>c) that
> d' is for, but (a,b=>c).
>
> Pushed with those corrections.
>
> Thanks for the reports and patches!



Thanks!

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-23 Thread Michael Paquier
On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
 wrote:
> Initially, I had naively set wal_consistency_check = all before running
> make installcheck and then had to wait for a long time to confirm that WAL
> generated by the gin test indeed caused consistency check failure on the
> standby with the v1 patch.

wal_consistency_check = gin would have saved you a lot of I/O.
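
For instance, since wal_consistency_check is superuser-settable, the mask
can be narrowed at run time (a sketch, not from the original mail):

  ALTER SYSTEM SET wal_consistency_check = 'gin';
  SELECT pg_reload_conf();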

> But I can see Sawada-san's point that there should be some way for
> developers writing code that had better go through the WAL consistency
> checking facility to do so without much hassle.  But then again, the
> need for that may not be all that frequent.

Yeah, I have my own set of generic scripts for that. There could be a
set of modules outside the main check-world, though the risk that those
would end up rotting is high...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken O(n^2) avoidance in wal segment recycling.

2017-06-23 Thread Michael Paquier
On Thu, Jun 22, 2017 at 6:10 AM, Andres Freund  wrote:
> We've not heard any complaints about this afaik, but it's not something
> that's easily diagnosable as being a problem.  Therefore I suspect we
> should fix and backpatch this?

Agreed. I have just poked at this problem and have finished with the
attached. Logically it is not complicated, as the argument values used
by the callers of RemoveXlogFile() are never updated when scanning
pg_wal. Surely this patch needs an extra pair of eyes.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a6314a642..d594401b6e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -878,7 +878,8 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
 static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
-static void RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
+static void RemoveXlogFile(const char *segname, XLogSegNo *endlogSegNo,
+		   XLogSegNo recycleSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
@@ -3829,10 +3830,18 @@ UpdateLastRemovedPtr(char *filename)
 static void
 RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 {
+	XLogSegNo	endlogSegNo;
+	XLogSegNo	recycleSegNo;
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		lastoff[MAXFNAMELEN];
 
+	/*
+	 * Initialize info about where to try to recycle to.
+	 */
+	XLByteToPrevSeg(endptr, endlogSegNo);
+	recycleSegNo = XLOGfileslop(PriorRedoPtr);
+
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
 		ereport(ERROR,
@@ -3875,7 +3884,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 /* Update the last removed location in shared memory first */
 UpdateLastRemovedPtr(xlde->d_name);
 
-RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+RemoveXlogFile(xlde->d_name, &endlogSegNo, recycleSegNo);
 			}
 		}
 	}
@@ -3905,8 +3914,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 	struct dirent *xlde;
 	char		switchseg[MAXFNAMELEN];
 	XLogSegNo	endLogSegNo;
+	XLogSegNo	recycleSegNo;
 
+	/*
+	 * Initialize info about where to begin the work. This will recycle,
+	 * somewhat arbitrarily, 10 future segments.
+	 */
 	XLByteToPrevSeg(switchpoint, endLogSegNo);
+	recycleSegNo = endLogSegNo + 10;
 
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
@@ -3944,7 +3959,7 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 			 * - but seems safer to let them be archived and removed later.
 			 */
 			if (!XLogArchiveIsReady(xlde->d_name))
-RemoveXlogFile(xlde->d_name, InvalidXLogRecPtr, switchpoint);
+RemoveXlogFile(xlde->d_name, &endLogSegNo, recycleSegNo);
 		}
 	}
 
@@ -3954,31 +3969,22 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
- * whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
+ * segname is the name of the segment to recycle or remove. endlogSegNo
+ * is the segment number of the current (or recent) end of WAL. recycleSegNo
+ * is the segment number to recycle up to.
+ *
+ * endlogSegNo gets incremented if the segment is recycled so that it is not
+ * checked again by future calls of this function.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogSegNo *endlogSegNo,
+			   XLogSegNo recycleSegNo)
 {
 	char		path[MAXPGPATH];
 #ifdef WIN32
 	char		newpath[MAXPGPATH];
 #endif
 	struct stat statbuf;
-	XLogSegNo	endlogSegNo;
-	XLogSegNo	recycleSegNo;
-
-	/*
-	 * Initialize info about where to try to recycle to.
-	 */
-	XLByteToSeg(endptr, endlogSegNo);
-	if (PriorRedoPtr == InvalidXLogRecPtr)
-		recycleSegNo = endlogSegNo + 10;
-	else
-		recycleSegNo = XLOGfileslop(PriorRedoPtr);
 
 	snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
 
@@ -3987,9 +3993,9 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	 * segment. Only recycle normal files, pg_standby for example can create
 	 * symbolic links pointing to a separate archive directory.
 	 */
-	if (endlogSegNo <= recycleSegNo &&
+	if (*endlogSegNo <= recycleSegNo &&
 		lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
-		InstallXLogFileSegment(&endlogSegNo, path,
+		InstallXLogFileSegment(endlogSegNo, path,
 			   true, recycleSegNo, true))
 	{
 		ereport(DEBUG2,
@@ -3997,7 +4003,7 @@ RemoveXlogFile(const char 

Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-23 Thread Amit Langote
On 2017/06/23 15:07, Michael Paquier wrote:
> On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada  
> wrote:
>> Thank you for updating the patch. It looks good to me.
>> BTW I'm inclined to have a regression test case where doing 'make
>> check' to the streaming replication environment with
>> wal_consistency_check on standby server so that we can detect a bug
>> around the wal.
> 
> This would be very costly. A single run of the main regression tests
> generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm
> members would suffer on that. *cough*

Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.

But I can see Sawada-san's point that there should be some way for
developers writing code that had better go through the WAL consistency
checking facility to do so without much hassle.  But then again, the
need for that may not be all that frequent.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-23 Thread Michael Paquier
On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada  wrote:
> Thank you for updating the patch. It looks good to me.
> BTW I'm inclined to have a regression test case that runs 'make
> check' against a streaming replication environment with
> wal_consistency_check enabled on the standby server, so that we can
> detect bugs around the WAL.

This would be very costly. A single run of the main regression tests
generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm
members would suffer on that. *cough*
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi column range partition table

2017-06-23 Thread Amit Langote
On 2017/06/23 13:42, amul sul wrote:
> On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote wrote:
>> On 2017/06/22 20:48, amul sul wrote:
>>> This happened because of UNBOUNDED handling, where it means negative infinity
>>> if it is in the FROM clause.  Wondering, can't we explicitly treat this as
>>> a positive infinite value?
>>
>> No, we cannot.  What would be greater than (or equal to) +infinity?
>> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
>> because 9890148 is not >= +infinity.  It will accept only the rows where
>> the first column is > 10 (second column is not checked in that case).
>>
>> You will have to define p3 as follows:
>>
>> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (11, UNBOUNDED) TO (20, 
>> 10);
>>
> What if the partition key column is FLOAT ?

I would say use a value such that btfloat4cmp (or btfloat8cmp) will
consider it to be greater than 10.

Of course, we can't write what I just said in the user-level
documentation, because the fact that we use a system- or user-defined btree
comparison proc (btfloat4/8cmp) for partitioning may be irrelevant to
users.  Although, we do mention in the documentation that partitioning
uses the semantics specified by the btree operator class.  In any case,
defining your partitioning or indexing on raw float type column(s) is
prone to semantic caveats, I'd think.
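
A hypothetical session illustrating the caveat: values arbitrarily close to
a float bound still route by btfloat8cmp ordering, and there is no finite
"next value after 10" with which to start an adjacent partition.

  CREATE TABLE ft (x float8) PARTITION BY RANGE (x);
  CREATE TABLE ft_lo PARTITION OF ft FOR VALUES FROM (0) TO (10);
  INSERT INTO ft VALUES (9.999999999999998);  -- accepted, routed to ft_lo
  INSERT INTO ft VALUES (10);  -- ERROR: no partition of relation "ft" found for row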

Also, there was interesting exchange on this topic during the patch
development [1].  Excerpt:

"Same for ranges of floating-point numbers, which are also probably an
unlikely candidate for a partitioning key anyway."

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaucSqQ%3DdJFhaojSpb1706MQYo1Tfn_3tWv6CVWhAOdrQ%40mail.gmail.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers