Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Mahendranath Gurram
Hi Thomas,



Thanks for taking the time to explain things.

> Postgres extensions can't rely on backends inheriting the postmaster's
> memory map like this (other than the main shared memory areas which
> the core code looks after). For one thing, new backends aren't
> created with fork() on all platforms (mainly Windows AFAIK, but also
> any build with EXEC_BACKEND defined). The next problem is that dsa.c
> and dsm.c work with reference counts that will be wrong if you try to
> use memory map inheritance like this. Another problem is that the
> postmaster isn't allowed to create a DSM segment: if it's working for
> you then I think you must be building with asserts disabled?

Now I understand: I took the wrong approach.


> That's simply not how DSA is
> designed to be used: you need to create DSA areas in a non-postmaster
> backend, and then attach explicitly from every other backend that
> wants to access the area. Each backend needs to get its own dsa_area
> object (either by creating or attaching).

Initially I tried to design it the same way.
I mean, I created a background worker and created the DSA in it.

I tried to attach/detach the same dsa/dsm in all the backends (Postgres
clients/connections) during backend (client/connection) init/destroy.

I didn't find any triggers or callbacks during backend init/close in which
to attach/detach the dsa/dsm.  Hence, I took this approach.

If Postgres has any such triggers/callbacks available, please let me know;
that would be of great help to me.

Anyway, now I understand that I took a wrong approach to using DSA. I'll
try to figure out some other way to build my in-memory index over Postgres.



Once again, thanks a lot for taking the time to help me.

@Dilip, thank you for your response :)

Thanks & Best Regards,
-Mahi
Teamwork divides the task and multiplies the success.








On Wed, 21 Jun 2017 04:26:45 +0530, Thomas Munro
thomas.mu...@enterprisedb.com wrote:


> On Sat, Jun 17, 2017 at 1:17 AM, Mahi Gurram teckym...@gmail.com wrote:
>>> 3. Whether you are the backend that created it or a backend that
>>> attached to it, I think you'll need to store the dsa_area in a global
>>> variable for your UDFs to access. Note that the dsa_area object will
>>> be different in each backend: there is no point in storing that
>>> address itself in shared memory, as you have it, as you certainly
>>> can't use it in any other backend. In other words, each backend that
>>> attached has its own dsa_area object that it can use to access the
>>> common dynamic shared memory area.
>>
>> In the case of forked processes, the OS actually does share the pages
>> initially, because fork implements copy-on-write semantics, which means
>> that provided none of the processes modifies the pages, they both point
>> to the same address and the same data.
>>
>> Based on the above theory, assume I have created a dsa_area object in
>> the postmaster process (_PG_init) as a global variable; all the
>> backends/forked processes are able to access/share the same dsa_area
>> object and its members.
>>
>> Hence, theoretically, the code should work without any issues, but I'm
>> not sure why it is not working as expected :(
>>
>> I tried debugging by putting prints, and observed the below things:
>> 1. The dsa_area_control address is different between the postmaster
>> process and the backends.
>> 2. After restarting, they seem to be the same, and hence it is working
>> after that.
>
> Postgres extensions can't rely on backends inheriting the postmaster's
> memory map like this (other than the main shared memory areas which
> the core code looks after). For one thing, new backends aren't
> created with fork() on all platforms (mainly Windows AFAIK, but also
> any build with EXEC_BACKEND defined). The next problem is that dsa.c
> and dsm.c work with reference counts that will be wrong if you try to
> use memory map inheritance like this. Another problem is that the
> postmaster isn't allowed to create a DSM segment: if it's working for
> you then I think you must be building with asserts disabled?
>
> I'm not sure exactly why you're seeing the symptoms you're seeing
> (working on one flavour of Unix and not another, and then working
> after a crash-restart -- I guess it has something to do with
> coincidences of mapped addresses). That's simply not how DSA is
> designed to be used: you need to create DSA areas in a non-postmaster
> backend, and then attach explicitly from every other backend that
> wants to access the area. Each backend needs to get its own dsa_area
> object (either by creating or attaching).
>
> --
> Thomas Munro
> http://www.enterprisedb.com








Re: [HACKERS] Logical decoding on standby

2017-06-20 Thread sanyam jain
Hi,

In this patch, in walsender.c, sendTimeLineIsHistoric is set to true when
state->currTLI and ThisTimeLineID are equal:

sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;

Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less than
ThisTimeLineID?
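
That is, the change I tried was:

sendTimeLineIsHistoric = state->currTLI < ThisTimeLineID;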


When I applied the timeline-following patch alone, pg_recvlogical quits
during the startup phase, but when I made the above change, pg_recvlogical
works, although timeline following still doesn't.


Thanks,

Sanyam Jain


From: pgsql-hackers-ow...@postgresql.org  
on behalf of Robert Haas 
Sent: Wednesday, April 5, 2017 3:25:50 PM
To: Andres Freund
Cc: Craig Ringer; Simon Riggs; Thom Brown; Michael Paquier; Petr Jelinek; 
PostgreSQL Hackers
Subject: Re: [HACKERS] Logical decoding on standby

On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund  wrote:
> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>> On 5 April 2017 at 04:19, Andres Freund  wrote:
>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> >> I'm much happier with this. I'm still fixing some issues in the tests
>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> >> reviewed in their proper context now.
>> >
>> > To me this very clearly is too late for v10, and now should be moved to
>> > the next CF.
>>
>> I tend to agree that it's late in the piece. It's still worth cleaning
>> it up into a state ready for early pg11 though.
>
> Totally agreed.

Based on this exchange, marked as "Moved to next CF".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-20 Thread Amit Kapila
On Wed, Jun 21, 2017 at 12:15 AM, Peter Eisentraut
 wrote:
> On 6/20/17 09:23, Amit Kapila wrote:
>> To avoid that why can't we use the same icu path for executing uconv
>> as we are using for linking?
>
> Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or
> something like that.
>

That's what I had in mind as well.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-20 Thread Michael Paquier
On Tue, Jun 20, 2017 at 3:24 AM, Daniel Gustafsson  wrote:
> The message is stored in a new shmem area which is checked when the session is
> aborted.  To keep things simple a small buffer is kept per backend for the
> message.  If deemed too costly, keeping a central buffer from which slabs are
> allocated can be done (but seemed rather complicated for little gain compared
> to the quite moderate memory spend.)

I think that you are right to take the approach with a per-backend
slot. This will avoid complications related to entry removals and
locking issues. There would be scaling issues as well if a lot of
backends get signaled heavily.

+#define MAX_CANCEL_MSG 128
That looks like enough.

+   LWLockAcquire(BackendCancelLock, LW_EXCLUSIVE);
+
+   strlcpy(slot->message, message, sizeof(slot->message));
+   slot->len = strlen(message);
Why not use one spinlock per slot and save it in BackendCancelShmemStruct?

+   pid_t   pid = PG_GETARG_INT32(0);
+   char   *msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+   PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg));
It would be more solid to add some error handling for messages that
are too long, or at least truncate the message if too long.
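
For reference, roughly what I have in mind, as a sketch only (the struct
and field names here are invented, not taken from the patch):

typedef struct BackendCancelSlot
{
    slock_t     mutex;          /* protects the two fields below */
    int         len;
    char        message[MAX_CANCEL_MSG];
} BackendCancelSlot;

/* writer side; strlcpy() truncates safely if the message is too long */
SpinLockAcquire(&slot->mutex);
strlcpy(slot->message, message, sizeof(slot->message));
slot->len = strlen(slot->message);
SpinLockRelease(&slot->mutex);

(Each slot's mutex would be initialized with SpinLockInit() when the
shared memory area is created.)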
-- 
Michael




[HACKERS] CREATE SUBSCRIPTION log noise

2017-06-20 Thread Jeff Janes
I think this should go away:

ereport(NOTICE,
  (errmsg("created replication slot \"%s\" on
publisher",
  slotname)));


It doesn't appear to be contingent on anything other than the content of
the command you just gave it.  I don't think we need a NOTICE saying that
it did the thing I just told it to do -- that should be implicit in the
lack of an ERROR.

Cheers,

Jeff


Re: [HACKERS] Missing comment for ResultRelInfo in execnodes.h

2017-06-20 Thread Etsuro Fujita

On 2017/06/21 3:30, Peter Eisentraut wrote:
> On 6/20/17 05:50, Etsuro Fujita wrote:
>> Here is a small patch to add a comment on its new member PartitionRoot.
>
> The existing comment style is kind of unusual.  How about the attached
> to clean it up a bit?

+1 for that change.

Best regards,
Etsuro Fujita





Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-20 Thread Noah Misch
On Thu, Jun 15, 2017 at 11:40:52PM -0400, Peter Eisentraut wrote:
> On 6/13/17 15:49, Peter Eisentraut wrote:
> > On 6/13/17 02:33, Noah Misch wrote:
> >>> Steps to reproduce -
> >>> X cluster -> create 100 tables , publish all tables  (create publication 
> >>> pub
> >>> for all tables);
> >>> Y Cluster -> create 100 tables ,create subscription(create subscription 
> >>> sub
> >>> connection 'user=centos host=localhost' publication pub;
> >>> Y cluster ->drop subscription - drop subscription sub;
> >>>
> >>> check the log file on Y cluster.
> >>>
> >>> Sometimes, I have seen this error at the psql prompt, and the drop
> >>> subscription operation failed on the first attempt.
> >>>
> >>> postgres=# drop subscription sub;
> >>> ERROR:  tuple concurrently updated
> >>> postgres=# drop subscription sub;
> >>> NOTICE:  dropped replication slot "sub" on publisher
> >>> DROP SUBSCRIPTION
> >>
> >> [Action required within three days.  This is a generic notification.]
> > 
> > It's being worked on.  Let's see by Thursday.
> 
> A patch has been posted, and it's being reviewed.  Next update Monday.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




[HACKERS] Broken link to DocBook XSL Stylesheets

2017-06-20 Thread Masahiko Sawada
Hi,

In the J.2 Tool Sets section of the documentation, there is a link to the
DocBook XSL Stylesheets, but that link seems to be no longer available; I
got a 404 error.

J.2. Tool Sets


As an updated link, I guess we can use the following.


Any thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [BUGS] Beta 10 parser error for CREATE STATISTICS IF NOT EXISTS

2017-06-20 Thread Amit Langote
On 2017/06/21 10:15, Amit Langote wrote:
> On 2017/06/21 9:42, Bruno Wolff III wrote:
>> I'm not seeing an obvious error in my attempt to use CREATE STATISTICS IF
>> NOT EXISTS. Given this is new, maybe there is a bug in the parser.
>>
>> Sample output:
>> psql (10beta1)
>> Type "help" for help.
>>
>> o365logs=# select version();
>>                                   version
>> ------------------------------------------------------------------------
>>  PostgreSQL 10beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
>> 20150623 (Red Hat 4.8.5-11), 64-bit
>> (1 row)
>>
>> o365logs=# CREATE STATISTICS IF NOT EXISTS logs_corrtest (dependencies) ON
>> record_type, operation FROM logs;
>> ERROR:  syntax error at or near "NOT"
>> LINE 1: CREATE STATISTICS IF NOT EXISTS logs_corrtest (dependencies)...
> 
> Looks like a documentation bug if the authors of the feature actually
> meant to implement the following syntax:
> 
>   CREATE [ IF NOT EXISTS ] STATISTICS
> 
> create if not exists statistics words_stats on a, b from words;
> CREATE STATISTICS
> 
> create if not exists statistics words_stats on a, b from words;
> NOTICE:  statistics object "words_stats" already exists, skipping
> CREATE STATISTICS
> 
> If that's really what's intended, it seems a bit inconsistent with most
> other commands and with DROP STATISTICS [ IF EXISTS ] itself.

Here is a patch, just in case, that changes the grammar to accept the
following syntax instead of the current one:

  CREATE STATISTICS [ IF NOT EXISTS ] ...

Also added a test.  Documentation already displays the above syntax, so no
update needed there.

Thanks,
Amit
From a4d331f2be74ad4e0c9a30f3984c2988aa541c3d Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 21 Jun 2017 10:47:06 +0900
Subject: [PATCH] Fix the syntax of IF NOT EXISTS variant of CREATE STATISTICS

The norm seems to be CREATE  IF NOT EXISTS, not
CREATE IF NOT EXISTS , which the original code implemented.
---
 src/backend/parser/gram.y   | 23 +--
 src/test/regress/expected/stats_ext.out |  5 +
 src/test/regress/sql/stats_ext.sql  |  5 +
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ada95e5bc3..34e07c87a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3846,15 +3846,26 @@ ExistingIndex:   USING INDEX index_name 
{ $$ = $3; }
  */
 
 CreateStatsStmt:
-   CREATE opt_if_not_exists STATISTICS any_name
+   CREATE STATISTICS any_name
opt_name_list ON expr_list FROM from_list
{
                                        CreateStatsStmt *n = makeNode(CreateStatsStmt);
-   n->defnames = $4;
-   n->stat_types = $5;
-   n->exprs = $7;
-   n->relations = $9;
-   n->if_not_exists = $2;
+   n->defnames = $3;
+   n->stat_types = $4;
+   n->exprs = $6;
+   n->relations = $8;
+   n->if_not_exists = false;
+   $$ = (Node *)n;
+   }
+   | CREATE STATISTICS IF_P NOT EXISTS any_name
+ opt_name_list ON expr_list FROM from_list
+   {
+                                       CreateStatsStmt *n = makeNode(CreateStatsStmt);
+   n->defnames = $6;
+   n->stat_types = $7;
+   n->exprs = $9;
+   n->relations = $11;
+   n->if_not_exists = true;
$$ = (Node *)n;
}
;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 5fd1244bcb..97551e9df6 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -34,6 +34,11 @@ ERROR:  unrecognized statistic type "unrecognized"
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);
 CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 DROP STATISTICS ab1_a_b_stats;
+-- Check the IF NOT EXISTS syntax
+CREATE STATISTICS IF NOT EXISTS ab1_a_b_stats ON a, b FROM ab1;
+CREATE STATISTICS IF NOT EXISTS ab1_a_b_stats ON a, b FROM ab1;
+NOTICE:  statistics object "ab1_a_b_stats" already exists, skipping

Re: [HACKERS] Proposal for CSN based snapshots

2017-06-20 Thread Michael Paquier
On Wed, Aug 24, 2016 at 5:54 PM, Heikki Linnakangas  wrote:
> And here are the results on the 72 core machine (thanks again, Alexander!).
> The test setup was the same as on the 32-core machine, except that I ran it
> with more clients since the system has more CPU cores. In summary, in the
> best case, the patch increases throughput by about 10%. That peak is with 64
> clients. Interestingly, as the number of clients increases further, the gain
> evaporates, and the CSN version actually performs worse than unpatched
> master. I don't know why that is. One theory is that by eliminating one
> bottleneck, we're now hitting another bottleneck which doesn't degrade as
> gracefully when there's contention.
>
> Full results are available at
> https://hlinnaka.iki.fi/temp/csn-4-72core-results/.

There has not been much activity on this thread for some time, and I
mentioned my intentions to some developers at the last PGCon. I am
planning to study further the work that has been done here, with the
envisaged goal of presenting a patch for the first CF of PG11. Lots of
fun ahead.
-- 
Michael




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-20 Thread Amit Langote
On 2017/06/20 20:37, Amit Kapila wrote:
> On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
>  wrote:
>> On 2017/06/19 23:31, Tom Lane wrote:
>>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
>>> then don't trust it, but assume all of the page is valid data".
>>
>> Actually, such a check is already in place in the tool, whose condition
>> looks like:
>>
>> if (PageGetPageSize(header) == BLCKSZ &&
>> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
>> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
>> header->pd_lower >= SizeOfPageHeaderData &&
>> header->pd_lower <= header->pd_upper &&
>> header->pd_upper <= header->pd_special &&
>> header->pd_special <= BLCKSZ &&
>> header->pd_special == MAXALIGN(header->pd_special) && ...
>>
>> which even GIN metapage passes, making it an eligible data page and hence
>> for omitting the hole between pd_lower and pd_upper.
>>
> 
> Won't checking for GIN_META in header->pd_flags give you what you want?

GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
which still requires including GIN's private header.
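
For reference, the check would have to look something like this sketch,
which is exactly what forces the include:

#include "access/gin_private.h"

/* GIN keeps its flags in the page's opaque area, not in pd_flags */
if (GinPageGetOpaque(page)->flags & GIN_META)
    /* treat the whole page as valid data */ ;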

Thanks,
Amit





Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-20 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
>> Yeah, I thought it would work fine with Makefile-using Windows toolchains.
>> But people who use MSVC need something else, no?

> Are there that many anyway who care?

Well, *I* don't care, but I thought we wanted to support Windows-using
developers as reasonably first-class citizens.

regards, tom lane




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-20 Thread Michael Paquier
On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
> Yeah, I thought it would work fine with Makefile-using Windows toolchains.
> But people who use MSVC need something else, no?

Are there that many anyway who care? Personally I already fallback to
Linux when it comes to indentation of Windows patches.
-- 
Michael




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-20 Thread Michael Paquier
On Wed, Jun 21, 2017 at 4:04 AM, Álvaro Hernández Tortosa
 wrote:
> In the coming weeks, and once my PR for pgjdbc has been added, I will
> work towards another patch to implement channel binding. Should be
> reasonably easy now, thanks to this.

So you basically have an equivalent of OpenSSL stuff in java, right?
- SSL_get_peer_certificate to get the X509 pointer of the server.
- X509_digest to hash it.
- OBJ_find_sigid_algs and X509_get_signature_nid to guess the
signature algorithm of a certificate. I think that this part can be
tricky depending on the SSL implementation, but I have designed a
generic API for this purpose.
That's all it took me to get end-point to work. Plus the error
handling of course.
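
For the archives, the client-side sequence looks roughly like this (a
sketch only; error handling and X509_free() omitted):

X509   *cert = SSL_get_peer_certificate(ssl);  /* server certificate */
int     sig_nid = X509_get_signature_nid(cert);
int     dig_nid, pkey_nid;
unsigned char hash[EVP_MAX_MD_SIZE];
unsigned int len;

OBJ_find_sigid_algs(sig_nid, &dig_nid, &pkey_nid);

/* per RFC 5929, MD5 and SHA-1 signatures are hashed with SHA-256 */
if (dig_nid == NID_md5 || dig_nid == NID_sha1)
    dig_nid = NID_sha256;

X509_digest(cert, EVP_get_digestbynid(dig_nid), hash, &len);
/* "hash" is then base64-encoded and sent as the binding data */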
-- 
Michael




Re: [HACKERS] visual studio 2017 build support

2017-06-20 Thread Haribabu Kommi
On Mon, Apr 24, 2017 at 5:50 PM, Ideriha, Takeshi <
ideriha.take...@jp.fujitsu.com> wrote:

> Hi
>
>
>
> I’ve noticed src/tools/msvc/README also needs some fixes together with your
> patch.
>
> The README description hasn’t been updated since VS 2012.
>

Thanks for the review. Here I attached an updated patch with README update.

Regards,
Hari Babu
Fujitsu Australia


0001-VS-2017-build-support-to-PostgreSQL.patch
Description: Binary data



Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-20 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/20/2017 04:17 PM, Tom Lane wrote:
>> There's no Windows support yet, and I won't be writing that,
>> but I hope someone else will.

> What extra support do you think it needs? I have built it on Cygwin
> without touching a line of code, will probably be able to to the same on
> Mingw.

Yeah, I thought it would work fine with Makefile-using Windows toolchains.
But people who use MSVC need something else, no?

regards, tom lane




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-20 Thread Andrew Dunstan


On 06/20/2017 04:17 PM, Tom Lane wrote:
> I've set up a repo on our git server for the new improved version
> of pg_bsd_indent.  Please test it for portability to your favorite
> platform(s) by doing
>
> git clone https://git.postgresql.org/git/pg_bsd_indent.git
> cd pg_bsd_indent
> make -s all
> make check
>
> Note you will need a PG installation in your PATH; see
> README.pg_bsd_indent for details.
>
> There's no Windows support yet, and I won't be writing that,
> but I hope someone else will.


What extra support do you think it needs? I have built it on Cygwin
without touching a line of code, and will probably be able to do the same
on MinGW.

cheers

andrew

-- 
Andrew Dunstan                         https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-20 Thread Peter Eisentraut
On 6/19/17 22:54, Masahiko Sawada wrote:
>> It seems to me we could just take a stronger lock around
>> RemoveSubscriptionRel(), so that workers can't write in there concurrently.
> 
> Since we reduced the lock level of updating pg_subscription_rel by
> commit 521fd4795e3e the same deadlock issue will appear if we just
> take a stronger lock level.

I was thinking about a more refined approach, like in the attached
patch.  It just changes the locking when in DropSubscription(), so that
that doesn't fail if workers are doing stuff concurrently.  Everything
else stays the same.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c101b6da52dd4f9f041390a937b337f20d212e5c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 20 Jun 2017 19:06:42 -0400
Subject: [PATCH] WIP Parametrize locking in RemoveSubscriptionRel

---
 src/backend/catalog/heap.c                | 2 +-
 src/backend/catalog/pg_subscription.c     | 6 +++---
 src/backend/commands/subscriptioncmds.c   | 4 ++--
 src/include/catalog/pg_subscription_rel.h | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4e5b79ef94..643e4bb1d5 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1844,7 +1844,7 @@ heap_drop_with_catalog(Oid relid)
/*
 * Remove any associated relation synchronization states.
 */
-   RemoveSubscriptionRel(InvalidOid, relid);
+   RemoveSubscriptionRel(InvalidOid, relid, RowExclusiveLock);
 
/*
 * Forget any ON COMMIT action for the rel
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c69c461b62..88b81920c8 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -369,7 +369,7 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn,
  * subscription, or for a particular relation, or both.
  */
 void
-RemoveSubscriptionRel(Oid subid, Oid relid)
+RemoveSubscriptionRel(Oid subid, Oid relid, LOCKMODE lockmode)
 {
    Relation    rel;
HeapScanDesc scan;
@@ -377,7 +377,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
HeapTuple   tup;
int nkeys = 0;
 
-   rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
+   rel = heap_open(SubscriptionRelRelationId, lockmode);
 
if (OidIsValid(subid))
{
@@ -405,7 +405,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
}
heap_endscan(scan);
 
-   heap_close(rel, RowExclusiveLock);
+   heap_close(rel, lockmode);
 }
 
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5aae7b6f91..02cb7c47b8 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -595,7 +595,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
{
char   *namespace;
 
-   RemoveSubscriptionRel(sub->oid, relid);
+   RemoveSubscriptionRel(sub->oid, relid, RowExclusiveLock);
 
logicalrep_worker_stop(sub->oid, relid);
 
@@ -910,7 +910,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
 
/* Remove any associated relation synchronization states. */
-   RemoveSubscriptionRel(subid, InvalidOid);
+   RemoveSubscriptionRel(subid, InvalidOid, ExclusiveLock);
 
/* Kill the apply worker so that the slot becomes accessible. */
logicalrep_worker_stop(subid, InvalidOid);
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index f5f6191676..b2cadb4b13 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -74,7 +74,7 @@ extern Oid SetSubscriptionRelState(Oid subid, Oid relid, char state,
                                   XLogRecPtr sublsn, bool update_only);
 extern char GetSubscriptionRelState(Oid subid, Oid relid,
                                   XLogRecPtr *sublsn, bool missing_ok);
-extern void RemoveSubscriptionRel(Oid subid, Oid relid);
+extern void RemoveSubscriptionRel(Oid subid, Oid relid, LOCKMODE lockmode);
 
 extern List *GetSubscriptionRelations(Oid subid);
 extern List *GetSubscriptionNotReadyRelations(Oid subid);
-- 
2.13.1




Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Thomas Munro
On Sat, Jun 17, 2017 at 1:17 AM, Mahi Gurram  wrote:
>> 3.  Whether you are the backend that created it or a backend that
>> attached to it, I think you'll need to store the dsa_area in a global
>> variable for your UDFs to access.  Note that the dsa_area object will
>> be different in each backend: there is no point in storing that
>> address itself in shared memory, as you have it, as you certainly
>> can't use it in any other backend. In other words, each backend that
>> attached has its own dsa_area object that it can use to access the
>> common dynamic shared memory area.
>
>
> In the case of forked processes, the OS actually does share the pages
> initially, because fork implements copy-on-write semantics, which means
> that provided none of the processes modifies the pages, they both point
> to the same address and the same data.
>
> Based on the above theory, assume I have created a dsa_area object in the
> postmaster process (_PG_init) as a global variable; all the
> backends/forked processes are able to access/share the same dsa_area
> object and its members.
>
> Hence, theoretically, the code should work without any issues, but I'm
> not sure why it is not working as expected :(
>
> I tried debugging by putting prints, and observed the below things:
> 1. The dsa_area_control address is different between the postmaster
> process and the backends.
> 2. After restarting, they seem to be the same, and hence it is working
> after that.

Postgres extensions can't rely on backends inheriting the postmaster's
memory map like this (other than the main shared memory areas which
the core code looks after).  For one thing, new backends aren't
created with fork() on all platforms (mainly Windows AFAIK, but also
any build with EXEC_BACKEND defined).  The next problem is that dsa.c
and dsm.c work with reference counts that will be wrong if you try to
use memory map inheritance like this.  Another problem is that the
postmaster isn't allowed to create a DSM segment: if it's working for
you then I think you must be building with asserts disabled?

I'm not sure exactly why you're seeing the symptoms you're seeing
(working on one flavour of Unix and not another, and then working
after a crash-restart -- I guess it has something to do with
coincidences of mapped address).  That's simply not how DSA is
designed to be used: you need to create DSA areas in a non-postmaster
backend, and then attach explicitly from every other backend that
wants to access the area.  Each backend needs to get its own dsa_area
object (either by creating or attaching).
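
For what it's worth, the pattern looks roughly like this (a sketch only;
my_tranche_id and the way the handle gets published to other backends are
up to you):

/* in the creating backend, e.g. a background worker */
dsa_area   *area = dsa_create(my_tranche_id);
dsa_handle  handle = dsa_get_handle(area);
dsa_pin(area);   /* keep the area alive even with no backends attached */
/* publish "handle" somewhere other backends can find it */

/* in every other backend, once, before first use */
dsa_area   *area2 = dsa_attach(handle);
dsa_pointer p = dsa_allocate(area2, size);
void       *mem = dsa_get_address(area2, p);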

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

2017-06-20 Thread David Rowley
On 20 June 2017 at 07:49, Tom Lane  wrote:
> I'm not totally satisfied that there isn't any case where the smallest
> selectivity hack is appropriate.  In the example you're showing here,
> the FK columns are independent so that we get more or less the right
> answer with or without the FK.  But in some quick testing I could not
> produce a counterexample proving that that heuristic is helpful;
> so for now let's can it.
>
> Thanks, and sorry again for the delay.

Many thanks for taking the time on this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Is exec_simple_check_node still doing anything?

2017-06-20 Thread Tom Lane
Robert Haas  writes:
> I'm a little mystified by exec_simple_check_node().
> ...
> Did that, possibly, remove the last way in which a simple expression
> could be could become non-simple?  If so, between that and the new
> hasTargetSRFs test, it might now be impossible for
> exec_simple_check_node() to fail.

I think you might be right.  The other way that I'm aware of that
could cause interesting things to happen is if someone redefines
a SQL function that had been inlined in the originally-compiled
version of the expression.  However, it should be the case that
inline_function() will refuse to inline if the new definition
contains anything "scary", so that the expression as seen by
plpgsql is still simple; any non-simplicity will just be hidden
under a function call.

In fact, I suspect we could get rid of exec_simple_recheck_plan
altogether.  It could use a bit more study, but the empty-rtable
check plus the other checks in exec_simple_check_plan (particularly,
hasAggs, hasWindowFuncs, hasTargetSRFs, hasSubLinks) seem like
they are enough to guarantee that what comes out of the planner
will be "simple".

If I recall things correctly, originally there were only the
post-planning simplicity checks that are now embodied in
exec_simple_recheck_plan/exec_simple_check_node.  I think I added
on the pre-planning checks in exec_simple_check_plan in order to
try to save some planning cycles.  Since the SRF checks were
clearly still necessary at the time, I didn't think hard about
whether any of the other post-planning checks could be got rid of.

regards, tom lane




Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread David G. Johnston
On Tue, Jun 20, 2017 at 1:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jun 20, 2017 at 2:34 PM, Peter Eisentraut
>>  wrote:
>>> This was not a typo, this was intentional.
>
>> To me, Julien's change seems to make it easier to understand, but
>> maybe I'm misunderstanding what it's trying to say in the first place.
>
> I think there are two subtly different things here: the sequence counters
> will continue to be attached to the identity columns, and the counters
> will continue in sequence (ie, no implied ALTER SEQUENCE RESTART).
> The existing text seems to mean the latter but Julien's change makes it
> about the former.
>
> It could use rewording for clarity, just not this rewording.

Tom, I'm not following your train of thought:

Given one sequenced IDENTITY column in each table (some rows were
deleted from each already):
tbl1
4
5
6

tbl2
10
11
12

The point being made is that inserting all rows from tbl1 into tbl2
will result in tbl2 having:
10
11
12
13
14
15

and not:
10
11
12
4
5
6

That the sequence counters continue to be attached seems like it can
and should be taken for granted.  Inserting to a table likewise should
never cause an ALTER SEQUENCE RESTART to occur and so can and should
be taken for granted.

ISTM that both the existing text and Julien's variation say this, just
differently.  I think the word "continue" is part of the problem.  I
also think that not being explicit about the fact that it's tbl2's
sequence that supplies the new values - instead of using the values
already present in tbl1 - requires one to think harder about the
scenario than need be.

David J.




[HACKERS] Is exec_simple_check_node still doing anything?

2017-06-20 Thread Robert Haas
I'm a little mystified by exec_simple_check_node().  The regression
tests seem not to exercise it.  It can only be reached when
exec_simple_recheck_plan() finds no other reason to reject the plan,
and the only case it seems to reject is the one where there's a
set-returning function buried in there someplace.  But then it seems
like hasTargetSRFs would have been true and we would have given up
before making a plan in the first place.  Of course, that only
protects us when originally forming the plan; they don't account for
later changes -- and the code comments claim that an expression which
was originally simple can become non-simple:

 * It is possible though unlikely for a simple expression to become non-simple
 * (consider for example redefining a trivial view).

But I can't quite figure that one out.  If we're selecting from a
trivial view, then the range table won't be empty and the expression
won't be simple in the first place.  The check for a non-empty range
table didn't exist when this comment was originally added
(95f6d2d20921b7c2dbec29bf2706fd9448208aa6, 2007); it was added in a
subsequent redesign (e6faf910d75027bdce7cd0f2033db4e912592bcc; 2011).
Did that, possibly, remove the last way in which a simple expression
could be could become non-simple?  If so, between that and the new
hasTargetSRFs test, it might now be impossible for
exec_simple_check_node() to fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 20, 2017 at 3:18 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Should we add that to the opr_sanity tests?

>> Yeah, I was wondering about that too.  I can imagine that someday
>> there will be prosecdef built-in functions ... but probably, there
>> would never be so many that maintaining the expected-results list
>> would be hard.

> And if it is, then we remove the test.

Right.  I'll go make it so.

regards, tom lane




Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 20, 2017 at 2:34 PM, Peter Eisentraut
>  wrote:
>> This was not a typo, this was intentional.

> To me, Julien's change seems to make it easier to understand, but
> maybe I'm misunderstanding what it's trying to say in the first place.

I think there are two subtly different things here: the sequence counters
will continue to be attached to the identity columns, and the counters
will continue in sequence (ie, no implied ALTER SEQUENCE RESTART).
The existing text seems to mean the latter but Julien's change makes it
about the former.

It could use rewording for clarity, just not this rewording.

regards, tom lane




Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-20 Thread Marco Atzeri

On 18/06/2017 16:48, Tom Lane wrote:
> Marco Atzeri  writes:
>> Building on Cygwin latest 10 beta1 or head source,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists
>
> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?
>
>   regards, tom lane



$ locale -a |grep -i ja
ja_JP
ja_JP
ja_JP.utf8
ja_JP.ujis
ja_JP@cjknarrow
ja_JP.utf8@cjknarrow
japanese
japanese.euc
japanese.sjis

$ locale -a |grep -i "euc"
japanese.euc
korean.euc

Regards
Marco




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-20 Thread Satyanarayana Narlapuram
Agree with David on the general usefulness of this channel. Not that Azure
has this implementation or proposal today, but as a managed service this
channel of communication is worthwhile. For example, the DBA / service can
set a policy that if a certain session exceeds its resource usage, the DBA
can kill it and communicate the reason: for example, too many locks, a lot
of IO activity, large open transactions, etc. The messages will help
application developers tune their workloads appropriately.

Thanks,
Satya

-Original Message-
From: David G. Johnston [mailto:david.g.johns...@gmail.com] 
Sent: Tuesday, June 20, 2017 12:44 PM
To: Alvaro Herrera 
Cc: Satyanarayana Narlapuram ; Pavel 
Stehule ; Daniel Gustafsson ; 
PostgreSQL mailing lists 
Subject: Re: [HACKERS] Optional message to user when terminating/cancelling 
backend

On Tue, Jun 20, 2017 at 11:54 AM, Alvaro Herrera  
wrote:
> Satyanarayana Narlapuram wrote:

> Unless you have a lot of users running psql manually, I don't see how 
> this is actually very useful or actionable.  What would the user do 
> with the information?  Hopefully your users already trust that you'd 
> keep the downtime to the minimum possible.
>

Why wouldn't this be useful in application logs?  Spurious dropped connections 
during application execution would be alarming.  Seeing a message from the DBA 
when looking into those would be a painless and quick way to alleviate stress.

pg_cancel_backend(, 'please try not to leave sessions in an "idle in 
transaction" state...') would also seem like a useful message to communicate; 
to user or application.  Sure, some of this can, and maybe would also need to, 
be done out-of-band but this communication channel seems worthy enough to at 
least evaluate the provided implementation.

David J.



Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread J Chapman Flack
On 06/20/2017 12:33 PM, Merlin Moncure wrote:

> postgres=# create table ints (n int);
> CREATE TABLE
> postgres=# insert into ints values (1);
> INSERT 0 1
> postgres=# insert into ints values (2);
> INSERT 0 1
> 
> T1: BEGIN
> T1: UPDATE ints SET n = n + 1;
> T2: BEGIN
> T2: DELETE FROM ints where n = 2; -- blocks
> T1: COMMIT; -- T2 frees
> T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
> T2: COMMIT:

For me (in PG 9.5 at $work), at the instant of the commit in T1,
T2 says:
ERROR:  could not serialize access due to concurrent update

Is it indicated what PG version Michael Malis is using?
Is it clear that transaction_isolation was set to serializable?

I don't actually see that claim in the linked post. I see the
example (about halfway down, under "Skipped Modification"), but
it doesn't claim that transaction_isolation was set to serializable
at the time, unless I skimmed over it somehow. It seems more of a
demonstration of what can happen under a different isolation setting.

-Chap




[HACKERS] Re-indent HEAD tomorrow?

2017-06-20 Thread Tom Lane
Barring objections, I'd like to reindent HEAD with the new version
of pg_bsd_indent (and correspondingly updated pgindent script)
tomorrow, say around 1800 UTC.

regards, tom lane




[HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-20 Thread Tom Lane
I've set up a repo on our git server for the new improved version
of pg_bsd_indent.  Please test it for portability to your favorite
platform(s) by doing

git clone https://git.postgresql.org/git/pg_bsd_indent.git
cd pg_bsd_indent
make -s all
make check

Note you will need a PG installation in your PATH; see
README.pg_bsd_indent for details.

There's no Windows support yet, and I won't be writing that,
but I hope someone else will.

BTW, HEAD's version of pgindent doesn't play with this yet, so
don't overwrite your old copy of pg_bsd_indent quite yet.

regards, tom lane




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-20 Thread Andres Freund
Hi,

On 2017-06-19 20:24:43 +0200, Daniel Gustafsson wrote:
> When terminating, or cancelling, a backend it’s currently not possible to let
> the signalled session know *why* it was dropped.  This has nagged me in the
> past and now it happened to come up again, so I took a stab at this.  The
> attached patch implements the ability to pass an optional text message to the
> signalled session which is included in the error message:
> 
>   SELECT pg_terminate_backend( [, message]);
>   SELECT pg_cancel_backend( [, message]);
> 
> Right now the message is simply appended on the error message, not sure if
> errdetail or errhint would be better? Calling:
> 
>   select pg_terminate_backend(, 'server rebooting');
> 
> ..leads to:
> 
>   FATAL:  terminating connection due to administrator command: "server 
> rebooting"
> 
> Omitting the message invokes the command just like today.

For extensions it'd also be useful if it were possible to overwrite the
error code.  E.g. for citus there's a distributed deadlock detector,
running out of process because there's no way to interrupt lock waits
locally, and we have to do some ugly hacking to generate proper error
messages and codes from another session.

- Andres




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Merlin Moncure
On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston
 wrote:
> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack  wrote:
>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>> in both 'read committed' and 'read uncommitted'.
>
> Practically speaking those are a single transaction isolation mode.
>
> https://www.postgresql.org/docs/10/static/transaction-iso.html
>
> I think Merlin has mis-read the article he linked to.  The example
> being used there never claims to be done under serialization and seems
> to describe an example of the perils of relying on the default
> isolation level.

oops -- could be operator error :-)

merlin




Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Robert Haas
On Tue, Jun 20, 2017 at 3:18 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane  wrote:
>>> Hm, patch looks okay, but while eyeballing it I started to wonder
>>> why in the world is pg_get_publication_tables marked prosecdef?
>>> If that has any consequences at all, they're probably bad.
>>> There are exactly no other built-in functions that have that set.
>
>> Should we add that to the opr_sanity tests?
>
> Yeah, I was wondering about that too.  I can imagine that someday
> there will be prosecdef built-in functions ... but probably, there
> would never be so many that maintaining the expected-results list
> would be hard.

And if it is, then we remove the test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-20 Thread David G. Johnston
On Tue, Jun 20, 2017 at 11:54 AM, Alvaro Herrera
 wrote:
> Satyanarayana Narlapuram wrote:

> Unless you have a lot of users running psql manually, I don't see how
> this is actually very useful or actionable.  What would the user do with
> the information?  Hopefully your users already trust that you'd keep the
> downtime to the minimum possible.
>

Why wouldn't this be useful in application logs?  Spurious dropped
connections during application execution would be alarming.  Seeing a
message from the DBA when looking into those would be a painless and
quick way to alleviate stress.

pg_cancel_backend(, 'please try not to leave sessions in an "idle
in transaction" state...') would also seem like a useful message to
communicate; to user or application.  Sure, some of this can, and
maybe would also need to, be done out-of-band but this communication
channel seems worthy enough to at least evaluate the provided
implementation.

David J.




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread David G. Johnston
On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack  wrote:
> I get the reported result (DELETE 0 and a table containing 2 and 3)
> in both 'read committed' and 'read uncommitted'.

Practically speaking those are a single transaction isolation mode.

https://www.postgresql.org/docs/10/static/transaction-iso.html

I think Merlin has mis-read the article he linked to.  The example
being used there never claims to be done under serialization and seems
to describe an example of the perils of relying on the default
isolation level.

David J.




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Chapman Flack
On 06/20/2017 03:08 PM, Chapman Flack wrote:

> For me (in PG 9.5 at $work), at the instant of the commit in T1,
> T2 says:
> ERROR:  could not serialize access due to concurrent update

I get that result in 'serializable' and in 'repeatable read'.

I get the reported result (DELETE 0 and a table containing 2 and 3)
in both 'read committed' and 'read uncommitted'.

-Chap




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-20 Thread Alvaro Herrera
Satyanarayana Narlapuram wrote:
> +1.
> 
> This really helps PostgreSQL Azure service as well. When we are doing
> the upgrades to the service, instead of abruptly terminating the
> sessions we can provide this message.

I think you mean "in addition to" rather than "instead of".

Unless you have a lot of users running psql manually, I don't see how
this is actually very useful or actionable.  What would the user do with
the information?  Hopefully your users already trust that you'd keep the
downtime to the minimum possible.

-- 
Álvaro Herrera                         https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane  wrote:
>> Hm, patch looks okay, but while eyeballing it I started to wonder
>> why in the world is pg_get_publication_tables marked prosecdef?
>> If that has any consequences at all, they're probably bad.
>> There are exactly no other built-in functions that have that set.

> Should we add that to the opr_sanity tests?

Yeah, I was wondering about that too.  I can imagine that someday
there will be prosecdef built-in functions ... but probably, there
would never be so many that maintaining the expected-results list
would be hard.

regards, tom lane




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-20 Thread Álvaro Hernández Tortosa



On 20/06/17 06:11, Michael Paquier wrote:
> On Tue, Jun 6, 2017 at 3:40 PM, Michael Paquier
>  wrote:
>> As far as I can see, there are a couple of things that I still need to
>> work on to make people happy:
>> - Rework the generic APIs for TLS finish and endpoint so that any
>> implementation can use channel binding without inducing any extra code
>> footprint to be-secure.c and fe-secure.c.
>> - Implement endpoint, as Alvaro is saying for JDBC that would be nicer.
>> - Have a couple of tests for channel binding to allow people to test
>> the feature easily. Those will be in src/test/ssl/. It would be nice
>> as well to be able to enforce the channel binding type on the libpq side,
>> which is useful at least for testing. So we are going to need an
>> environment variable for this purpose, and a connection parameter.
>
> Okay, here we go. Attached is a set of four patches:
> - 0001 is some refactoring for the SSL tests so that other test suites in
> src/test/ssl can take advantage of the connection routines. There is
> nothing fancy here.
> - 0002 is the implementation of tls-unique as channel binding. This
> has been largely reworked since the last submission; I have found on the
> way a couple of bugs and some correctness issues.
> - 0003 is a patch to add the connection parameters saslname and
> saslchannelbinding. With support for more SASL mechanisms (PG10 has
> SCRAM-SHA-256, I am adding SCRAM-SHA-256-PLUS here), saslname can be
> used to enforce on the client side the value of the SASL mechanism
> chosen. saslchannelbinding does the same for the channel binding name.
> This is very useful for testing, and a set of tests are added in
> src/test/ssl/ for tls-unique and the SASL mechanisms. The tests cover
> many scenarios, like downgrade attacks for example.
> - 0004 is the implementation of tls-server-end-point, as Alvaro has
> asked. Per RFC 5929, the binding data needs to be a hash of the server
> certificate. If the signature algorithm of the certificate is MD5 or
> SHA-1, then SHA-256 is used. Other signature algos like SHA-384 or 512
> are used to hash the data. The hashed data is then encoded in base64
> and sent to the server for verification. Tests using saslchannelname
> have been added as well. It took me a while to find out that
> OBJ_find_sigid_algs(X509_get_signature_nid(X509*)) needs to be used to
> find out the algorithm of a certificate with OpenSSL.
>
> With the tests directly in the patch, things are easy to run. With
> PG10 stabilization work, of course I don't expect much feedback :)
> But this set of patches looks like the direction we want to go so that
> JDBC and libpq users can take advantage of channel binding with SCRAM.

This is awesome, Michael.

In the coming weeks, and once my PR for pgjdbc has been added, I will
work towards another patch to implement channel binding. Should be
reasonably easy now, thanks to this.

Appreciated!

Álvaro



--

Álvaro Hernández Tortosa







Re: [HACKERS] [GSOC][weekly report 3] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-20 Thread Heikki Linnakangas

On 06/20/2017 06:51 AM, Mengxing Liu wrote:
> But in my benchmark, the throughput decreased by 15% after the modification.
> Can you help me do a quick review to find if there is anything wrong?
> I also attached the flame graph before/after the modification for reference.

Hmm. The hash table ought to speed up the RWConflictExists() function,
right? Where in the flame graph is RWConflictExists()? If it only
accounts for a small amount of the overall runtime, even a drastic speedup
there won't make much difference to the total.

> Here are my questions:
> 1. Is there any function in HTAB like “clear” that can make itself empty
> quickly? In this patch, when releasing a transaction object, I need to
> scan the hash table and use “hash_search” to remove entries one by one.
> It would make the releasing operation slower.

Nope, there is no such function, I'm afraid.
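
(The usual idiom for emptying one is a sequential scan that removes the
entries one at a time; a sketch, with MyEntry/mytab standing in for your
types:

HASH_SEQ_STATUS status;
MyEntry    *entry;

hash_seq_init(&status, mytab);
while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
{
    if (hash_search(mytab, &entry->key, HASH_REMOVE, NULL) == NULL)
        elog(ERROR, "hash table corrupted");
}

Deleting the entry most recently returned is explicitly allowed while the
scan is in progress.)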


In a previous email, I reported that many backends wait for the lock 
“SerializableFinishedListLock”;
If we don't implement functions like “ReleaseOneSerializableXact” quickly, they 
will be the bottleneck.


Yeah, that's probably what's causing the decrease in throughput you're 
seeing.


You might need to also keep the linked lists, and use the hash table to 
only look up particular items in the linked list faster.


I was surprised to see that you're creating a lot of smallish hash 
tables, three for each PredXact entry. I would've expected all the 
PredXact entries to share the same hash tables, i.e. have only three 
hash tables in total. That would be more flexible in allocating 
resources among entries. (It won't help with the quick-release, though)



2. Is the HTAB thread-safe? I would focus on removing some unnecessary locks if 
possible.


Nope, you need to hold a lock while searching/manipulating a HTAB hash 
table. If the hash table gets big and you start to see lock contention, 
you can partition it so that each operation only needs to lock the one 
partition covering the search key.
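
For illustration, a partitioned shared table with dynahash would be set
up roughly like this (a sketch reusing the RWConflictTag above;
NUM_CONFLICT_PARTITIONS, ConflictPartitionLock, init_size and max_size
are hypothetical):

HASHCTL     info;
HTAB       *htab;
uint32      hashcode;
bool        found;

MemSet(&info, 0, sizeof(info));
info.keysize = sizeof(RWConflictTag);
info.entrysize = sizeof(RWConflictHashEntry);
info.num_partitions = NUM_CONFLICT_PARTITIONS;  /* must be a power of 2 */

htab = ShmemInitHash("rw-conflict lookup table",
                     init_size, max_size, &info,
                     HASH_ELEM | HASH_BLOBS | HASH_PARTITION);

/* a search then takes only the lock for the key's partition */
hashcode = get_hash_value(htab, &tag);
LWLockAcquire(ConflictPartitionLock(hashcode), LW_SHARED);
(void) hash_search_with_hash_value(htab, &tag, hashcode, HASH_FIND, &found);
LWLockRelease(ConflictPartitionLock(hashcode));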


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread David G. Johnston
On Tuesday, June 20, 2017, Robert Haas  wrote:

> On Tue, Jun 20, 2017 at 2:34 PM, Peter Eisentraut
> wrote:
> > On 6/18/17 03:16, Julien Rouhaud wrote:
> >> Patch attached.
> >
> > This was not a typo, this was intentional.
>
> To me, Julien's change seems to make it easier to understand, but
> maybe I'm misunderstanding what it's trying to say in the first place.
>
>
Maybe

[...] that are not identity columns in tbl2. Values for
identity columns will come from the sequence generators for
tbl2.

Otherwise I'd drop the word "continue" altogether and just say "but will
use the sequence counters for".

David J.


Re: [HACKERS] UPDATE of partition key

2017-06-20 Thread Robert Haas
On Tue, Jun 20, 2017 at 2:54 AM, Amit Khandekar  wrote:
>> I guess I don't see why it should work like this.  In the INSERT case,
>> we must build withCheckOption objects for each partition because those
>> partitions don't appear in the plan otherwise -- but in the UPDATE
>> case, they're already there, so why do we need to build anything at
>> all?  Similarly for RETURNING projections.  How are the things we need
>> for those cases not already getting built, associated with the
>> relevant resultRelInfos?  Maybe there's a concern if some children got
>> pruned - they could turn out later to be the children into which
>> tuples need to be routed. But the patch makes no distinction
>> between possibly-pruned children and any others.
>
> Yes, only a subset of the partitions appear in the UPDATE subplans. I
> think typically for updates, a very small subset of the total leaf
> partitions will be there in the plans, others would get pruned. IMHO,
> it would not be worth having an optimization where it opens only those
> leaf partitions which are not already there in the subplans. Without
> the optimization, we are able to re-use the INSERT infrastructure
> without additional changes.

Well, that is possible, but certainly not guaranteed.  I mean,
somebody could do a whole-table UPDATE, or an UPDATE that hits a
smattering of rows in every partition; e.g. the table is partitioned
on order number, and you do UPDATE lineitem SET product_code = 'K372B'
WHERE product_code = 'K372'.

Leaving that aside, the point here is that you're rebuilding
withCheckOptions and returningLists that have already been built in
the planner.  That's bad for two reasons.  First, it's inefficient,
especially if there are many partitions.  Second, it will amount to a
functional bug if you get a different answer than the planner did.
Note this comment in the existing code:

/*
 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
 * that we didn't build the withCheckOptionList for each partition within
 * the planner, but simple translation of the varattnos for each partition
 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
 * cases are handled above.
 */

The comment "UPDATE/DELETE cases are handled above" is referring to
the code that initializes the WCOs generated by the planner.  You've
modified the comment in your patch, but it no longer matches the
associated code: your updated comment says that only "DELETEs and
local UPDATES are handled above", but in reality, *all* updates are
still handled above.  And
then they are handled again here.  Similarly for returning lists.
It's certainly not OK for the comment to be inaccurate, but I think
it's also bad to redo the work which the planner has already done,
even if it makes the patch smaller.

Also, I feel like it's probably not correct to use the first result
relation as the nominal relation for building WCOs and returning lists
anyway.  I mean, if the first result relation has a different column
order than the parent relation, isn't this just broken?  If it works
for some reason, the comments don't explain what that reason is.

>> ... I don't understand how you can *not* need a per-leaf-partition
>> mapping.  I mean, maybe you only need the mapping for the *unpruned*
>> leaf partitions
>
> Yes, we need the mapping only for the unpruned leaf partitions, and
> those partitions are available in the per-subplan resultRelInfo's.

OK.

>> but you certainly need a separate mapping for each one of those.
>
> You mean *each* of the leaf partitions ? I didn't get why we would
> need it for each one. The tuple targeted for update belongs to one of
> the per-subplan resultInfos. And this tuple is to be routed to another
> leaf partition. So the reverse mapping is for conversion from the
> source resultRelinfo to the root partition. I am unable to figure out
> a scenario where we would require this reverse mapping for partitions
> on which UPDATE is *not* going to be executed.

I agree - the reverse mapping is only needed for the partitions in
which UPDATE will be executed.

Some other things:

+ * The row was already deleted by a concurrent DELETE. So we don't
+ * have anything to update.

I find this explanation, and the surrounding comments, inadequate.  It
doesn't really explain why we're doing this.  I think it should say
something like this: For a normal UPDATE, the case where the tuple has
been the subject of a concurrent UPDATE or DELETE would be handled by
the EvalPlanQual machinery, but for an UPDATE that we've translated
into a DELETE from this partition and an INSERT into some other
partition, that's not available, because CTID chains can't span
relation boundaries.  We mimic the semantics to a limited extent by
skipping the INSERT if the DELETE fails to find a tuple.  This ensures
that two concurrent attempts to UPDATE the same tuple at the same time
can't turn one tuple into two, and that 

Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread Julien Rouhaud
On 20/06/2017 20:34, Peter Eisentraut wrote:
> On 6/18/17 03:16, Julien Rouhaud wrote:
>> Patch attached.
> 
> This was not a typo, this was intentional.
> 

Oh, sorry.  I'm not a native english speaker, that sounded really weird.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-20 Thread Peter Eisentraut
On 6/20/17 09:23, Amit Kapila wrote:
> To avoid that why can't we use the same icu path for executing uconv
> as we are using for linking?

Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or
something like that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-20 Thread Peter Eisentraut
On 6/19/17 23:36, Ashutosh Sharma wrote:
> In  this case, even if i am trying to use ICU 53 libraries but as
> dll's i am using is from ICU 49 (as my system PATH is pointing to ICU
> 49 bin path) the uconv -V output would be '49.1' instead of '53.1'. In
> such case, the postgres installation would fail because during
> installation it would search for libicu53.dll not libicu49.dll.

I don't think this is how it works.  The result would be that uconv
claims the function is available but it's not in the library -- then you
get a compilation failure.  Or uconv claims the function is not there
but it is -- then you just build without using it.  The .dll files are
only used at run time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread Robert Haas
On Tue, Jun 20, 2017 at 2:34 PM, Peter Eisentraut
 wrote:
> On 6/18/17 03:16, Julien Rouhaud wrote:
>> Patch attached.
>
> This was not a typo, this was intentional.

To me, Julien's change seems to make it easier to understand, but
maybe I'm misunderstanding what it's trying to say in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Robert Haas
On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> If there are no new insights, I plan to proceed with the attached patch
>> tomorrow.  This leaves the existing view and function alone, adds
>> pg_relation_is_publishable() and uses that in psql.
>
> Hm, patch looks okay, but while eyeballing it I started to wonder
> why in the world is pg_get_publication_tables marked prosecdef?
> If that has any consequences at all, they're probably bad.
> There are exactly no other built-in functions that have that set.

Should we add that to the opr_sanity tests?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in insert.sgml

2017-06-20 Thread Peter Eisentraut
On 6/18/17 03:16, Julien Rouhaud wrote:
> Patch attached.

This was not a typo, this was intentional.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in partition.c

2017-06-20 Thread Peter Eisentraut
On 6/19/17 23:02, Masahiko Sawada wrote:
> Hi,
> 
> Attached patch for $subject.
> 
> s/opreator/operator/

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-20 Thread Jeff Janes
On Thu, Jun 15, 2017 at 3:06 PM, Jeff Janes  wrote:

>
> This new patch is simpler than the previous one, and more effective at
> speeding up replication. I assume it would speed up pgbench with
> synchronous_commit turned off (or against unlogged tables) as well, but I
> don't think I have the hardware needed to test that.
>

If I use the 'tpcb-func' script embodied in the attached patch to pgbench,
then I can see the performance difference against unlogged tables using 8
clients on an 8-CPU virtual machine. The normal tpcb-like script has too
much communication overhead, bouncing from pgbench to the postgres backend
7 times per transaction, for the difference to show. I also had to
set autovacuum_vacuum_cost_delay=0, otherwise auto-analyze holds a
snapshot long enough to bloat the HOT chains, which injects a great deal
of variability into the timings.
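
The runs looked roughly like this (a sketch; the scale factor is
illustrative, and it assumes the patch makes the script selectable as a
builtin with -b):

$ pgbench -i -s 100 --unlogged-tables
$ psql -c "ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0"
$ psql -c "SELECT pg_reload_conf()"
$ pgbench -n -c 8 -j 8 -T 300 -b tpcb-func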

Commit 7975c5e0a992ae9 in the 9.6 branch causes a regression of about 10%,
and my patch from the previous email recovers that regression.  It also
gives the same improvement against 10dev HEAD.

Cheers,

Jeff


pgbench_function.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing comment for ResultRelInfo in execnodes.h

2017-06-20 Thread Peter Eisentraut
On 6/20/17 05:50, Etsuro Fujita wrote:
> Here is a small patch to add a comment on its new member PartitionRoot.

The existing comment style is kind of unusual.  How about the attached
to clean it up a bit?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 07c05624689a3b1762a2816aeb2c70072b2382d0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 20 Jun 2017 14:29:48 -0400
Subject: [PATCH] Reformat comments about ResultRelInfo

---
 src/include/nodes/execnodes.h | 80 +++
 1 file changed, 50 insertions(+), 30 deletions(-)

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9c0852853a..7e214deafd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -336,61 +336,81 @@ typedef struct JunkFilter
AttrNumber  jf_junkAttNo;
 } JunkFilter;
 
-/* ----------------
- *   ResultRelInfo information
- *
- * Whenever we update an existing relation, we have to
- * update indices on the relation, and perhaps also fire triggers.
- * The ResultRelInfo class is used to hold all the information needed
- * about a result relation, including indices.. -cim 10/15/89
- *
- * RangeTableIndex          result relation's range table index
- * RelationDesc             relation descriptor for result relation
- * NumIndices               # of indices existing on result relation
- * IndexRelationDescs       array of relation descriptors for indices
- * IndexRelationInfo        array of key/attr info for indices
- * TrigDesc                 triggers to be fired, if any
- * TrigFunctions            cached lookup info for trigger functions
- * TrigWhenExprs            array of trigger WHEN expr states
- * TrigInstrument           optional runtime measurements for triggers
- * FdwRoutine               FDW callback functions, if foreign table
- * FdwState                 available to save private state of FDW
- * usesFdwDirectModify      true when modifying foreign table directly
- * WithCheckOptions         list of WithCheckOption's to be checked
- * WithCheckOptionExprs     list of WithCheckOption expr states
- * ConstraintExprs          array of constraint-checking expr states
- * junkFilter               for removing junk attributes from tuples
- * projectReturning         for computing a RETURNING list
- * onConflictSetProj        for computing ON CONFLICT DO UPDATE SET
- * onConflictSetWhere       list of ON CONFLICT DO UPDATE exprs (qual)
- * PartitionCheck           partition check expression
- * PartitionCheckExpr       partition check expression state
- * ----------------
+/*
+ * ResultRelInfo
+ *
+ * Whenever we update an existing relation, we have to update indexes on the
+ * relation, and perhaps also fire triggers.  ResultRelInfo holds all the
+ * information needed about a result relation, including indexes.
  */
 typedef struct ResultRelInfo
 {
NodeTag type;
+
+   /* result relation's range table index */
Index   ri_RangeTableIndex;
+
+   /* relation descriptor for result relation */
Relationri_RelationDesc;
+
+   /* # of indices existing on result relation */
int ri_NumIndices;
+
+   /* array of relation descriptors for indices */
RelationPtr ri_IndexRelationDescs;
+
+   /* array of key/attr info for indices */
IndexInfo **ri_IndexRelationInfo;
+
+   /* triggers to be fired, if any */
TriggerDesc *ri_TrigDesc;
+
+   /* cached lookup info for trigger functions */
FmgrInfo   *ri_TrigFunctions;
+
+   /* array of trigger WHEN expr states */
ExprState **ri_TrigWhenExprs;
+
+   /* optional runtime measurements for triggers */
Instrumentation *ri_TrigInstrument;
+
+   /* FDW callback functions, if foreign table */
struct FdwRoutine *ri_FdwRoutine;
+
+   /* available to save private state of FDW */
void   *ri_FdwState;
+
+   /* true when modifying foreign table directly */
boolri_usesFdwDirectModify;
+
+   /* list of WithCheckOption's to be checked */
List   *ri_WithCheckOptions;
+
+   /* list of WithCheckOption expr states */
List   *ri_WithCheckOptionExprs;
+
+   /* array of constraint-checking expr states */
ExprState **ri_ConstraintExprs;
+
+   /* for removing 

Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Tom Lane
Merlin Moncure  writes:
> Michael Malis via:
> http://malisper.me/postgres-transactions-arent-fully-isolated/  has
> determined that postgresql transactions are not fully isolated even
> when using the serializable isolation level.

> If I prep a table, ints via:
> postgres=# create table ints (n int);
> CREATE TABLE
> postgres=# insert into ints values (1);
> INSERT 0 1
> postgres=# insert into ints values (2);
> INSERT 0 1

> and then run two concurrent in serializable isolation mode
> transactions like this:
> T1: BEGIN
> T1: UPDATE ints SET n = n + 1;
> T2: BEGIN
> T2: DELETE FROM ints where n = 2; -- blocks
> T1: COMMIT; -- T2 frees
> T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
> T2: COMMIT;

Hm, I get

ERROR:  could not serialize access due to concurrent update

in T2 immediately after T1 commits.  What version are you testing?
Are you sure you selected serializable mode for both xacts?
(I think it would work like this even if only T2 is marked
serializable, but didn't test.)
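
For anyone reproducing this, both sessions have to opt in explicitly,
since the default is READ COMMITTED, e.g.:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- or, for the whole session:
SET default_transaction_isolation = 'serializable';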

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Don't downcase filepath/filename while loading libraries

2017-06-20 Thread Tom Lane
QL Zhuo  writes:
> And, the attached new patch fixes the memory leaks.

Pushed with minor adjustments --- mostly, getting rid of the now-redundant
canonicalize_path() call.  I also updated the documentation.

I notice the documentation formerly said "All library names are converted
to lower case unless double-quoted", which means that formally this isn't
a bug at all, but operating-as-designed-and-documented.  Still, I agree
it's an improvement.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Bruce Momjian

Sorry, this email from June 16 didn't make it to the lists for some odd
reason so I am reposting it now.  I will apply a patch based on this
email shortly.

What is really odd is that I replied to this email already but the
original wasn't posted.  I think it was something about my email reader.

---

On Fri, Jun 16, 2017 at 10:57:33PM +0300, Sergey Burladyan wrote:
> Bruce Momjian  writes:
> > On Fri, Jun 16, 2017 at 04:33:16AM +0300, Sergey Burladyan wrote:
> > > Bruce Momjian  writes:
> > The way pg_upgrade uses rsync, the standby never needs to replay the WAL
> > when it starts up because we already copied the changed system tables
> > and hard linked the user data files.
> 
> Oh, it is my mistake, I had not run the test script completely for current git
> master. In git master it works as expected, but not in previous versions.
> I used this test script and got this result:
> 9.2 -> master: wal_level setting:replica
> 9.2 -> 9.6: wal_level setting:minimal
> 9.2 -> 9.5: wal_level setting:minimal
> 9.2 -> 9.4: Current wal_level setting:minimal

Wow, thank you again for your excellent research.

> From git master, pg_upgrade restarts the new master again after
> pg_resetwal -o, as you said.
> 
> It is from src/bin/pg_upgrade/check.c:176
> void
> issue_warnings(void)
> {
> /* Create dummy large object permissions for old < PG 9.0? */
> if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804)
> {
> start_postmaster(&new_cluster, true);
> new_9_0_populate_pg_largeobject_metadata(&new_cluster, false);
> stop_postmaster(false);
> }
> 
> /* Reindex hash indexes for old < 10.0 */
> if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
> {
> start_postmaster(&new_cluster, true);
> old_9_6_invalidate_hash_indexes(&new_cluster, false);
> stop_postmaster(false);
> }
> }

Yes, that is _exactly_ the right place to look.  Only in PG 10 do we
restart the new cluster to invalidate hash indexes.  In previous
releases we didn't do the restart.

That didn't matter with the old rsync instructions, but now that we have
removed the start/stop before rsync step, the final WAL status of
pg_upgrade matters.

I suggest applying the attached patch 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
new file mode 100644
index 8b9e81e..b79e54a
*** a/src/bin/pg_upgrade/check.c
--- b/src/bin/pg_upgrade/check.c
*** report_clusters_compatible(void)
*** 174,196 
  
  
  void
! issue_warnings(void)
  {
/* Create dummy large object permissions for old < PG 9.0? */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804)
-   {
-   start_postmaster(&new_cluster, true);
new_9_0_populate_pg_largeobject_metadata(&new_cluster, false);
-   stop_postmaster(false);
-   }
  
/* Reindex hash indexes for old < 10.0 */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
-   {
-   start_postmaster(&new_cluster, true);
old_9_6_invalidate_hash_indexes(&new_cluster, false);
!   stop_postmaster(false);
!   }
  }
  
  
--- 174,198 
  
  
  void
! issue_warnings_and_set_wal_level(void)
  {
+   /*
+* We unconditionally start/stop the new server because pg_resetwal -o
+* set wal_level to 'minimum'.  If the user is upgrading standby
+* servers using the rsync instructions, they will need pg_upgrade
+* to write its final WAL record showing wal_level as 'replica'.
+*/
+   start_postmaster(&new_cluster, true);
+ 
/* Create dummy large object permissions for old < PG 9.0? */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804)
new_9_0_populate_pg_largeobject_metadata(&new_cluster, false);
  
/* Reindex hash indexes for old < 10.0 */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
old_9_6_invalidate_hash_indexes(&new_cluster, false);
! 
!   stop_postmaster(false);
  }
  
  
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
new file mode 100644
index ca1aa5c..2a9c397
*** a/src/bin/pg_upgrade/pg_upgrade.c
--- b/src/bin/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 162,168 
	create_script_for_cluster_analyze(&analyze_script_file_name);
	create_script_for_old_cluster_deletion(&deletion_script_file_name);
  
!   issue_warnings();
  

Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Peter Eisentraut
On 6/19/17 21:57, Tom Lane wrote:
> Peter Eisentraut  writes:
>> If there are no new insights, I plan to proceed with the attached patch
>> tomorrow.  This leaves the existing view and function alone, adds
>> pg_relation_is_publishable() and uses that in psql.
> 
> Hm, patch looks okay,

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Sergey Burladyan
Bruce Momjian  writes:

> On Tue, Jun 20, 2017 at 06:42:58PM +0300, Sergey Burladyan wrote:
> > If file at standby in old data directory is different from same file at
> > master, but it have same size, it will be hardlinked into new data
> > directory at standby and does not copied from master.
>
> Only if pg_upgrade created the hardlinks, right?

Yes, I have not tested rsync itself, but I think that you are right.

-- 
Sergey Burladyan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Decimal64 and Decimal128

2017-06-20 Thread Robert Haas
On Mon, Jun 19, 2017 at 3:47 PM, Peter Geoghegan  wrote:
> On Mon, Jun 19, 2017 at 12:19 PM, Robert Haas  wrote:
>> I don't have a specific use case in mind.  However, datumCopy() is
>> sure to be a lot faster when typByVal is true, and see also the
>> documentation changes in commit
>> 8472bf7a73487b0535c95e299773b882f7523463.
>
> Fair enough.
>
> I ask because at one time I informally benchmarked Postgres (using
> pgbench), where int4 (or maybe int8) primary keys were replaced with
> equivalent numeric primary keys. This was a SELECT benchmark. Anyway,
> the conclusion at the time was that it makes surprisingly little
> difference (I think it was ~5%), because cache misses dominate anyway,
> and the page layout doesn't really change (the fan-in didn't change
> *at all* either, at least for this one case, because of alignment
> considerations). I never published this result, because I didn't have
> time to test rigorously, and wasn't sure that there was sufficient
> interest.

People work pretty hard for a 5% performance improvement, so I
wouldn't dismiss that difference as nothing.  However, I think the
difference would probably be larger if you were using the values for
computations (e.g. sum, avg) rather than as PKs.
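
Something along these lines shows that kind of difference directly
(a sketch, in psql):

CREATE TABLE t AS
  SELECT g::int8 AS i, g::numeric AS n FROM generate_series(1, 1000000) g;
\timing on
SELECT sum(i) FROM t;   -- pass-by-value int8
SELECT sum(n) FROM t;   -- pass-by-reference numeric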

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Merlin Moncure
Michael Malis, via
http://malisper.me/postgres-transactions-arent-fully-isolated/, has
determined that postgresql transactions are not fully isolated even
when using the serializable isolation level.

If I prep a table, ints via:
postgres=# create table ints (n int);
CREATE TABLE
postgres=# insert into ints values (1);
INSERT 0 1
postgres=# insert into ints values (2);
INSERT 0 1

and then run two concurrent in serializable isolation mode
transactions like this:
T1: BEGIN
T1: UPDATE ints SET n = n + 1;
T2: BEGIN
T2: DELETE FROM ints where n = 2; -- blocks
T1: COMMIT; -- T2 frees
T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
T2: COMMIT;


My understanding is that for serializable transactions, the result is
correct only if it matches what you would get by playing the
transactions back serially, one after another, in some order when they
overlap.  That is clearly not the case here: T1 then T2 gives
{1,2} -> {2,3} -> {3}, while T2 then T1 gives {1,2} -> {1} -> {2}, so
either serial order ends with one row, yet the interleaving above ends
with two.

I guess the failure occurs because there is some kind of separation
between when the row is initially looked up and when the deletion is
qualified.  This is likely not a problem in practice, but is Michael
right in his assessment that we are not precisely following the spec?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Sergey Burladyan
Amit Kapila  writes:

> > I am not sure about rsync, in my production server I have for example
> > 111 GB in pg_xlog and if I run rsync for pg_xlog it must send ~ 40GB
> > of new WALs I think.
> >
>
> Isn't the difference between old and new is just the last WAL segment
> file?  What is the source of this difference?

Postgres generates WAL files ahead of time, and at the standby too :-(
For example:
=== master ===
$ psql -c 'select pg_current_xlog_insert_location()'
 pg_current_xlog_insert_location 
-
 4ED09/34A74590
(1 row)

$ ls 9.2/main/pg_xlog/ | awk '/4ED090034/,/xxx/ { print }' | wc -l
2262
==

=== standby ===
$ psql -c 'select pg_last_xlog_replay_location()'
 pg_last_xlog_replay_location 
--
 4ED0A/AECFD7B8
(1 row)

postgres@avi-sql29:~$ ls 9.2/main/pg_xlog/ | awk '/4ED0A00AE/,/xxx/ { print 
}' | wc -l
2456
===

See https://www.postgresql.org/docs/9.2/static/wal-configuration.html
> they are recycled (renamed to become the next segments in the numbered 
> sequence)

-- 
Sergey Burladyan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Bruce Momjian
On Tue, Jun 20, 2017 at 06:42:58PM +0300, Sergey Burladyan wrote:
> Bruce Momjian  writes:
> 
> > On Tue, Jun 20, 2017 at 01:10:26PM +0300, Sergey Burladyan wrote:
> > > Only if missing/changed files changed in size, because rsync run with
> > > --size-only it does not copy changed files with same size.
> >
> > I am sorry but I am not understanding.  Step 10.b says:
> >
> > 10.b Make sure the new standby data directories do not exist
> > 
> > Make sure the new standby data directories do not exist or are empty. If
> > initdb was run, delete the standby server data directories.
> >
> > so the _entire_ new data directory is empty before rsync is run, meaning
> > that it is an exact copy of the new master.
> 
> Yes, new data directory at standby is empty, but you missed old data
> directory at standby which is hardlink'ed by rsync into new as at master.

OK, I think I am getting closer to understanding.  Only some files are
hard-linked from the old master to the new master, specifically the user
data files (tables and indexes).

> rsync run with _three_ arguments and with --hard-links option:
> rsync --archive --delete --hard-links --size-only old_pgdata new_pgdata 
> remote_dir
> (remote_dir is parent directory for old and new data at standby)
> 
> In this mode rsync compare not only new_pgdata with new empty data
> directory at standby, but also compare it with old data directory from
> standby and with --size-only it doing this compare only by the file
> existence or file size.

but it is only going to create hard links where hard links already exist
between the old and new masters.  If I am wrong, we are in big trouble
because rsync would not work.

> If file at standby in old data directory is different from same file at
> master, but it have same size, it will be hardlinked into new data
> directory at standby and does not copied from master.

Only if pg_upgrade created the hardlinks, right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-20 Thread Daniel Verite
Andres Freund wrote:

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

Here's an updated version of the patch I made during review,
adding \beginbatch and \endbatch to pgbench.
The performance improvement appears clearly
with a custom script of this kind:
  \beginbatch
 UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
 ..above repeated 1000 times...
  \endbatch

versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch

On localhost on my desktop I tend to see a 30% difference in favor
of the batch mode with that kind of test.
On slower networks there are much bigger differences.

The latest main patch (v10) must also be slightly updated for HEAD,
because of this:
error: patch failed: src/interfaces/libpq/exports.txt:171
v11 attached without any other change.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index c3bd4f9..366f278 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4656,6 +4656,526 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   An example of batch use may be found in the source distribution in
+   src/test/modules/test_libpq/testlibpqbatch.c.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It increases client application complexity
+and extra caution is required to prevent client/server deadlocks but
+can sometimes offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+(ping time) is high, and when many small operations are 
being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would 
take
+30 seconds in network latency alone without batching; with batching it may 
spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching is less useful when information from one operation is required by 
the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
+
+   
+
+   
+
+ The batch API was introduced in PostgreSQL 10.0, but clients using 
PostgreSQL 10.0 version of libpq can
+ use batches on server versions 8.4 and newer. Batching works on any server
+ that supports the v3 extended query protocol.
+
+   
+
+  
+
+  
+   Using batch mode
+
+   
+To issue batches the application must switch
+a connection into batch mode. Enter batch mode with PQenterBatchMode(conn)
 or test
+whether batch mode is active with PQbatchStatus(conn). 
In batch mode only asynchronous operations are permitted, and
+COPY is not recommended as it most likely will trigger 
failure in batch processing. 
+Using any synchronous command execution functions such as 
PQfn,
+PQexec or one of its sibling functions are error 
conditions.
+Functions allowed in batch mode are described in . 
+   
+
+   
+The client uses libpq's asynchronous query functions to dispatch work,
+marking the end of each batch with PQbatchSyncQueue.
+And to get results, it uses PQgetResult and
+PQbatchProcessQueue. It may eventually exit
+batch mode with PQexitBatchMode once all results are
+processed.
+   
+
+   
+
+ It is best to use batch mode with libpq in
+ non-blocking mode. If used 
in
+ blocking mode it is possible for a client/server deadlock to occur. The
+ client will block trying 

Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Sergey Burladyan
Bruce Momjian  writes:

> On Tue, Jun 20, 2017 at 01:10:26PM +0300, Sergey Burladyan wrote:
> > Bruce Momjian  writes:
> > > > Uh, as I understand it the rsync is going to copy the missing WAL file
> > > > from the new master to the standby, right, and I think pg_controldata
> > > > too, so it should be fine.  Have you tested to see if it fails?
> > 
> > It need old WAL files from old version for correct restore heap
> > files. New WAL files from new version does not have this information.
> > 
> > > The point is that we are checking the "Latest checkpoint location" to
> > > make sure all the WAL was replayed.   We are never going to start the
> > > old standby server.  Rsync is going to copy the missing/changed files.
> > 
> > Only if missing/changed files changed in size, because rsync run with
> > --size-only it does not copy changed files with same size.
>
> I am sorry but I am not understanding.  Step 10.b says:
>
>   10.b Make sure the new standby data directories do not exist
>   
>   Make sure the new standby data directories do not exist or are empty. If
>   initdb was run, delete the standby server data directories.
>
> so the _entire_ new data directory is empty before rsync is run, meaning
> that it is an exact copy of the new master.

Yes, the new data directory at the standby is empty, but you missed the old
data directory at the standby, which rsync hard-links into the new one, just
as at the master.

rsync runs with _three_ arguments and with the --hard-links option:
rsync --archive --delete --hard-links --size-only old_pgdata new_pgdata remote_dir
(remote_dir is the parent directory for the old and new data at the standby)

In this mode rsync compares new_pgdata not only with the new empty data
directory at the standby, but also with the old data directory from the
standby, and with --size-only it does this comparison only by file
existence or file size.

If a file in the old data directory at the standby is different from the
same file at the master but has the same size, it will be hard-linked into
the new data directory at the standby and not copied from the master.

-- 
Sergey Burladyan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-20 Thread Tom Lane
I wrote:
> Marco Atzeri  writes:
>> Building on Cygwin latest 10 beta1 or head source,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214 
>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists

> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?

Despite the lack of followup from the OP, I'm pretty troubled by this
report.  It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding.  We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.

I suppose the idea was to see whether we actually needed those defenses,
but since we have here a failure report after less than a month of beta,
it seems clear to me that we do.  I think we need to upgrade
pg_import_system_collations to have all the same logic that was there
before.

Now the hard part of that is that because pg_import_system_collations
isn't using a temporary staging table, but is just inserting directly
into pg_collation, there isn't any way for it to eliminate duplicates
unless it uses if_not_exists behavior all the time.  So there seem to
be two ways to proceed:

1. Drop pg_import_system_collations' if_not_exists argument and just
define it as adding any collations not already known in pg_collation.

2. Significantly rewrite it so that it de-dups the collation set by
hand before trying to insert into pg_collation.

#2 seems like a lot more work, but on the other hand, we might need
most of that logic anyway to get back deterministic alias handling.
However, since I cannot see any real-world use case at all for
if_not_exists = false, I figure we might as well do #1 and take
whatever simplification we can get that way.
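
As a quick sanity check, exact duplicates in a platform's "locale -a"
output can be spotted with something like this (it won't catch aliases
that only collide after canonicalization):

$ locale -a | sort | uniq -d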

I'm willing to do the legwork on this, but before I start, does
anyone have any ideas or objections?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Bruce Momjian
On Tue, Jun 20, 2017 at 01:10:26PM +0300, Sergey Burladyan wrote:
> Bruce Momjian  writes:
> > > Uh, as I understand it the rsync is going to copy the missing WAL file
> > > from the new master to the standby, right, and I think pg_controldata
> > > too, so it should be fine.  Have you tested to see if it fails?
> 
> It need old WAL files from old version for correct restore heap
> files. New WAL files from new version does not have this information.
> 
> > The point is that we are checking the "Latest checkpoint location" to
> > make sure all the WAL was replayed.   We are never going to start the
> > old standby server.  Rsync is going to copy the missing/changed files.
> 
> Only if missing/changed files changed in size, because rsync run with
> --size-only it does not copy changed files with same size.

I am sorry but I am not understanding.  Step 10.b says:

10.b Make sure the new standby data directories do not exist

Make sure the new standby data directories do not exist or are empty. If
initdb was run, delete the standby server data directories.

so the _entire_ new data directory is empty before rsync is run, meaning
that it is an exact copy of the new master.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something is rotten in publication drop

2017-06-20 Thread Peter Eisentraut
On 6/19/17 21:57, Tom Lane wrote:
> but while eyeballing it I started to wonder
> why in the world is pg_get_publication_tables marked prosecdef?
> If that has any consequences at all, they're probably bad.
> There are exactly no other built-in functions that have that set.

That was quite likely a mistake.  I've fixed it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Amit Kapila
On Tue, Jun 20, 2017 at 7:05 PM, Sergey Burladyan  wrote:
> Amit Kapila  writes:
>
>> On Tue, Jun 20, 2017 at 3:40 PM, Sergey Burladyan  
>> wrote:
> I use pg 9.2 and "skipping restartpoint, already performed at" is from
> src/backend/access/transam/xlog.c:8643
> after this statement it return from CreateRestartPoint() and do not run
>8687 CheckPointGuts(lastCheckPoint.redo, flags);
>

You are right, so it will skip restartpoint in such a case.

>> >> > Uh, as I understand it the rsync is going to copy the missing WAL file
>> >> > from the new master to the standby, right, and I think pg_controldata
>> >> > too, so it should be fine.  Have you tested to see if it fails?
>> >
>> > It need old WAL files from old version for correct restore heap
>> > files. New WAL files from new version does not have this information.
>> >
>>
>> So in such a case can we run rsync once before pg_upgrade?
>
> I just copy last WAL from stopped old master into running old standby
> before it shutdown and wait till it replayed. After that standby can
> issue restartpoint at the same location as in stopped master.
>

Hmm.  I think we need something that works with less effort because
not all users will be as knowledgeable as you are, so if they make any
mistakes in copying the file manually, it can lead to problems.  How
about issuing a notification (XLogArchiveNotifySeg) in shutdown
checkpoint if archiving is enabled?
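
Roughly something like this (only a sketch against current master's
XLogSegNo-based APIs; "shutdown" and "checkPointRecPtr" are placeholders
for the shutdown-checkpoint flag and the LSN of the shutdown checkpoint
record, and the exact placement would need care):

if (shutdown && XLogArchivingActive())
{
    XLogSegNo   endSegNo;

    /* notify the archiver about the segment holding the record */
    XLByteToSeg(checkPointRecPtr, endSegNo);
    XLogArchiveNotifySeg(endSegNo);
}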

> I am not sure about rsync, in my production server I have for example
> 111 GB in pg_xlog and if I run rsync for pg_xlog it must send ~ 40GB
> of new WALs I think.
>

Isn't the difference between old and new just the last WAL segment
file?  What is the source of this difference?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Dilip Kumar
On Tue, Jun 20, 2017 at 6:48 PM, Mahendranath Gurram
 wrote:
> The steps you followed are right. May i know in which OS you tried?
> Mac/Linux.
>
> Because,  In Mac, it is working fine. just as expected. But the
> problem(segmentation fault) I'm facing is with linux systems.

Mine is Linux.

CentOS Linux release 7.2.1511


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Sergey Burladyan
Amit Kapila  writes:

> On Tue, Jun 20, 2017 at 3:40 PM, Sergey Burladyan  wrote:
> > Bruce Momjian  writes:
> >
> >> On Mon, Jun 19, 2017 at 10:59:19PM -0400, Bruce Momjian wrote:
> >> > On Tue, Jun 20, 2017 at 03:50:29AM +0300, Sergey Burladyan wrote:
> >> > > 20 июн. 2017 г. 1:21 пользователь "Bruce Momjian"  
> >> > > написал:
> >> > >
> >> > >
> >> > > We are saying that Log-Shipping should match "Latest checkpoint
> >> > > location", but the WAL for that will not be sent to the standby, 
> >> > > so it
> >> > > will not match, but that is OK since the only thing in the 
> >> > > non-shipped
> >> > > WAL file is the checkpoint record.  How should we modify the 
> >> > > wording on
> >> > > this?
> >> > >
> >> > >
> >> > > I am afraid that without this checkpoint record standby cannot make
> >> > > restartpoint
> >> > > and without restartpoint it does not sync shared buffers into disk at
> >> > > shutdown.
> >> >
>
> It seems to me at shutdown time on standby servers we specifically
> make restart points.  See below code in ShutdownXLOG()
>
> ..
> if (RecoveryInProgress())
> CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
> ..
>
> Do you have something else in mind?

Which buffers will this restartpoint save to disk? I think it can
save only buffers with LSN lower than or equal to "Latest checkpoint
location". Buffers with LSN between "Minimum recovery ending
location" and "Latest checkpoint location" will not be saved at all.

I set log_min_messages=debug2 and it is clearer what happened here:
2017-06-20 13:18:32 GMT LOG:  restartpoint starting: xlog
...
2017-06-20 13:18:33 GMT DEBUG:  postmaster received signal 15
2017-06-20 13:18:33 GMT LOG:  received smart shutdown request
2017-06-20 13:18:33 GMT DEBUG:  updated min recovery point to 0/1200
2017-06-20 13:18:33 GMT CONTEXT:  writing block 2967 of relation 
base/16384/16385
2017-06-20 13:18:33 GMT DEBUG:  checkpoint sync: number=1 file=global/12587 
time=0.001 msec
2017-06-20 13:18:33 GMT DEBUG:  checkpoint sync: number=2 file=base/16384/12357 
time=0.000 msec
2017-06-20 13:18:33 GMT DEBUG:  checkpoint sync: number=3 file=base/16384/16385 
time=0.000 msec
2017-06-20 13:18:33 GMT DEBUG:  attempting to remove WAL segments older than 
log file 0001000B
2017-06-20 13:18:33 GMT DEBUG:  recycled transaction log file 
"0001000B"
2017-06-20 13:18:33 GMT DEBUG:  recycled transaction log file 
"0001000A"
2017-06-20 13:18:33 GMT DEBUG:  recycled transaction log file 
"00010009"
2017-06-20 13:18:33 GMT DEBUG:  SlruScanDirectory invoking callback on 
pg_subtrans/
2017-06-20 13:18:33 GMT LOG:  restartpoint complete: wrote 1824 buffers 
(44.5%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=1.389 s, 
sync=0.000 s, total=1.389 s; sync files=3, longest=0.000 s, average=0.000 s
2017-06-20 13:18:33 GMT LOG:  recovery restart point at 0/F008D28
2017-06-20 13:18:33 GMT DETAIL:  last completed transaction was at log time 
2017-06-20 13:18:29.282645+00
2017-06-20 13:18:33 GMT LOG:  shutting down
2017-06-20 13:18:33 GMT DEBUG:  skipping restartpoint, already performed at 
0/F008D28
2017-06-20 13:18:33 GMT LOG:  database system is shut down


I use pg 9.2 and "skipping restartpoint, already performed at" is from
src/backend/access/transam/xlog.c:8643
after this statement it returns from CreateRestartPoint() and does not run
   8687 CheckPointGuts(lastCheckPoint.redo, flags);

> >> > Uh, as I understand it the rsync is going to copy the missing WAL file
> >> > from the new master to the standby, right, and I think pg_controldata
> >> > too, so it should be fine.  Have you tested to see if it fails?
> >
> > It need old WAL files from old version for correct restore heap
> > files. New WAL files from new version does not have this information.
> >
>
> So in such a case can we run rsync once before pg_upgrade?

I just copy the last WAL file from the stopped old master into the running
old standby before it shuts down, and wait till it is replayed. After that
the standby can issue a restartpoint at the same location as in the stopped
master.

I am not sure about rsync; in my production server I have, for example,
111 GB in pg_xlog, and if I run rsync for pg_xlog it must send ~40 GB
of new WAL, I think.

-- 
Sergey Burladyan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default Partition for Range

2017-06-20 Thread Dilip Kumar
On Thu, Jun 15, 2017 at 11:20 AM, Beena Emerson  wrote:
> Hello,
>
> PFA the updated patch.
> This is rebased over v21 patches of list partition.
> (http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg316818.html)

While testing I have noticed segmentation fault for a simple case.

create table test(a int, b int) partition by range(a,b);
create table test1 partition of test DEFAULT;
postgres=# drop table test1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

This is basically crashing in RelationBuildPartitionDesc, so I think
we don't have any test case for a DEFAULT range partition where the
partition key has more than one attribute.  I suggest we add such a
test case, something like the sketch below.
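
Something like this (table names are arbitrary):

create table rtest (a int, b int) partition by range (a, b);
create table rtest_def partition of rtest default;
drop table rtest_def;   -- crashed before; should simply succeed
drop table rtest;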


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-20 Thread Amit Kapila
On Tue, Jun 20, 2017 at 9:06 AM, Ashutosh Sharma  wrote:
> Hi,
>
> On Tue, Jun 20, 2017 at 2:36 AM, Peter Eisentraut
>  wrote:
>> On 6/19/17 00:42, Ashutosh Sharma wrote:
 If we don't find uconv, isn't it better to fall back to non-UTF8
 version rather than saying command not found?
>>> Well, if any of the ICU package is installed on our system then we
>>> will certainly find uconv command. The only case where we can see such
>>> error is when user doesn't have any of the ICU packages installed on
>>> their system and are somehow trying to perform icu enabled build and
>>> in such case the build configuration has to fail which i think is the
>>> expected behaviour. Anyways, it is not uconv that decides whether we
>>> should fallback to non-UTF8 or not. It's the ICU version that decides
>>> whether to stick to UTF8 or fallback to nonUTF8 version. Thanks.
>>
>> How do we know we're running the uconv command from the installation
>> that we will compile against?
>
> Okay, I think, you are talking about the case where we may have
> multiple ICU versions installed on our system and there might be a
> possibility that the uconv command may get executed from the ICU
> version that we are not trying to link with postgres.
>

To avoid that why can't we use the same icu path for executing uconv
as we are using for linking?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Mahendranath Gurram
Hi Dilip,



Thanks for your response.



The steps you followed are right. May i know in which OS you tried? Mac/Linux.



Because,  In Mac, it is working fine. just as expected. But the 
problem(segmentation fault) i'm facing is with linux systems.



Thanks  Best Regards,

-Mahi
Teamwork divides the task and multiplies the success.








 On Tue, 20 Jun 2017 16:47:24 +0530 Dilip Kumar dilipbal...@gmail.com wrote 

On Tue, Jun 20, 2017 at 3:16 PM, Mahendranath Gurram
mahendran...@zohocorp.com wrote:
> Hi Thomas,
>
> Any update on this?
>
> Please let me know how can i proceed further.
>
> Thanks & Best Regards,
> -Mahi

I did not see the code but just tested with your code.

1) Added test_dsa to shared preload library.
2) CREATE EXTENSION test_dsa;
3) executed test_dsa_data_access(1)

postgres=# select test_dsa_data_access(1);
2017-06-20 16:49:39.483 IST [9738] LOG: Data read is == Hello world
2017-06-20 16:49:39.483 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world0
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world1
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world2
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world3
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world4
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world5
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world6
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world7
2017-06-20 16:49:39.484 IST [9907] LOG: BackgroundWorker Shared LinkedList Data read is == Hello world8
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world0
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world1
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world2
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world3
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world4
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world5
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world6
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world7
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world8
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world9
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world10
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world11
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world12
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world13
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world14
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world15
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world16
2017-06-20 16:49:39.485 IST [9738] STATEMENT: select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG: LinkedList Node Data read is == Hello world17
2017-06-20 16:49:39.485 

Re: [HACKERS] Phantom segment upon promotion causing troubles.

2017-06-20 Thread Heikki Linnakangas

On 06/19/2017 10:30 AM, Andres Freund wrote:

Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
weird enough to be interesting.  What he'd observed was that he promoted
some PITR standby, and early clones of that node worked, but later clones
did not, failing to read some segment.

The problem turns out to be the following:  [explanation]


Good detective work!


The minimal fix here is presumably not to use XLByteToPrevSeg() in
RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
serves here - I don't think it's ever needed.


Agreed, I don't see a reason for it either.
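
For reference, a sketch of the two macros as defined in PG 10-era
xlog_internal.h (worth verifying against the actual sources); they differ
only when the LSN falls exactly on a segment boundary:

/* segment that contains the given LSN */
#define XLByteToSeg(xlrp, logSegNo) \
	logSegNo = (xlrp) / XLogSegSize

/* segment containing the byte *before* the LSN: an LSN sitting exactly
 * on a segment boundary maps to the previous segment */
#define XLByteToPrevSeg(xlrp, logSegNo) \
	logSegNo = ((xlrp) - 1) / XLogSegSize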


There seems to be a larger question here though: Why does
XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
at that period?  That seems like a bad idea, especially in more
complicated scenarios where some precursor timeline might live for
longer than it was a parent.  ISTM XLogFileReadAnyTLI() should check
which timeline a segment ought to come from, based on the history.


Yeah. I've had that thought for years as well, but there has never been 
any pressing reason to bite the bullet and rewrite it, so I haven't 
gotten around to it.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-20 Thread Amit Kapila
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
 wrote:
> On 2017/06/19 23:31, Tom Lane wrote:
>> Amit Kapila  writes:
>>> On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
>>>  wrote:
 What are some arguments against setting pd_lower in the GIN metapage as
 follows?
>>
>>> Actually, hash index also has similar code (See _hash_init_metabuffer)
>>> and I see no harm in doing this at similar other places.
>>
>> Seems reasonable.
>
> Here is a patch that does it for the GIN metapage.  (I am not sure if the
> changes to gin_mask() that are included in the patch are really necessary.)
>
 How about porting such a change to the back-branches if we do this at all?
 The reason I'm asking is that a certain backup tool relies on pd_lower
 values of data pages (disk blocks in relation files that are known to have
 a valid PageHeaderData) to be correct to discard the portion of every page
 that supposedly does not contain any useful information.  The assumption
 doesn't hold in the case of GIN metapage, so any GIN indexes contain
 corrupted metapage after recovery (metadata overwritten with zeros).
>>
>> I'm not in favor of back-porting such a change.  Even if we did, it would
>> only affect subsequently-created indexes not existing ones.  That means
>> your tool has to cope with an unset pd_lower in any case --- and will for
>> the foreseeable future, because of pg_upgrade.
>>
>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
>> then don't trust it, but assume all of the page is valid data".
>
> Actually, such a check is already in place in the tool, whose condition
> looks like:
>
> if (PageGetPageSize(header) == BLCKSZ &&
> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
> header->pd_lower >= SizeOfPageHeaderData &&
> header->pd_lower <= header->pd_upper &&
> header->pd_upper <= header->pd_special &&
> header->pd_special <= BLCKSZ &&
> header->pd_special == MAXALIGN(header->pd_special) && ...
>
> which even a GIN metapage passes, making it an eligible data page and hence
> a candidate for omitting the hole between pd_lower and pd_upper.
>

Won't checking for GIN_META in header->pd_flags give you what you want?
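
A minimal sketch of such a check, assuming ginblock.h's layout where GIN
keeps these flags in the page's special area (GinPageOpaqueData.flags)
rather than in pd_flags:

	GinPageOpaque opaque = GinPageGetOpaque(page);

	if (opaque->flags & GIN_META)
	{
		/* metapage: don't trust pd_lower for hole elimination */
	}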



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Amit Kapila
On Tue, Jun 20, 2017 at 3:40 PM, Sergey Burladyan  wrote:
> Bruce Momjian  writes:
>
>> On Mon, Jun 19, 2017 at 10:59:19PM -0400, Bruce Momjian wrote:
>> > On Tue, Jun 20, 2017 at 03:50:29AM +0300, Sergey Burladyan wrote:
>> > > 20 июн. 2017 г. 1:21 пользователь "Bruce Momjian"  
>> > > написал:
>> > >
>> > >
>> > > We are saying that Log-Shipping should match "Latest checkpoint
>> > > location", but the WAL for that will not be sent to the standby, so 
>> > > it
>> > > will not match, but that is OK since the only thing in the 
>> > > non-shipped
>> > > WAL file is the checkpoint record.  How should we modify the wording 
>> > > on
>> > > this?
>> > >
>> > >
>> > > I am afraid that without this checkpoint record standby cannot make
>> > > restartpoint
>> > > and without restartpoint it does not sync shared buffers into disk at
>> > > shutdown.
>> >

It seems to me that at shutdown time on standby servers we specifically
create restartpoints.  See the code below in ShutdownXLOG():

..
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
..

Do you have something else in mind?

>> > Uh, as I understand it the rsync is going to copy the missing WAL file
>> > from the new master to the standby, right, and I think pg_controldata
>> > too, so it should be fine.  Have you tested to see if it fails?
>
> It needs old WAL files from the old version to correctly restore heap
> files. New WAL files from the new version do not have this information.
>

So in such a case can we run rsync once before pg_upgrade?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Dilip Kumar
On Tue, Jun 20, 2017 at 3:16 PM, Mahendranath Gurram
 wrote:
> Hi Thomas,
>
> Any update on this?
>
> Please let me know how I can proceed further.
>
> Thanks & Best Regards,
> -Mahi

I did not review the code, but just tested with it.

1) Added test_dsa to shared_preload_libraries.
2) CREATE EXTENSION test_dsa;
3) Executed test_dsa_data_access(1):

postgres=# select test_dsa_data_access(1);
2017-06-20 16:49:39.483 IST [9738] LOG:  Data read is ==> Hello world
2017-06-20 16:49:39.483 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world0
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world1
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world2
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world3
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world4
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world5
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world6
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world7
2017-06-20 16:49:39.484 IST [9907] LOG:  BackgroundWorker Shared
LinkedList Data read is ==> Hello world8
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world0
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world1
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world2
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world3
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world4
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world5
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world6
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world7
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world8
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world9
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world10
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world11
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world12
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world13
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world14
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world15
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world16
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world17
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world18
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
2017-06-20 16:49:39.485 IST [9738] LOG:  LinkedList Node Data read is
==> Hello world19
2017-06-20 16:49:39.485 IST [9738] STATEMENT:  select test_dsa_data_access(1);
 test_dsa_data_access
----------------------
                    0


I don't see any segmentation fault. Is there some other step which I
have 

Re: [HACKERS] Broken hint bits (freeze)

2017-06-20 Thread Sergey Burladyan
Bruce Momjian  writes:

> On Mon, Jun 19, 2017 at 10:59:19PM -0400, Bruce Momjian wrote:
> > On Tue, Jun 20, 2017 at 03:50:29AM +0300, Sergey Burladyan wrote:
> > > 20 июн. 2017 г. 1:21 пользователь "Bruce Momjian"  
> > > написал: 
> > > 
> > > 
> > > We are saying that Log-Shipping should match "Latest checkpoint
> > > location", but the WAL for that will not be sent to the standby, so it
> > > will not match, but that is OK since the only thing in the non-shipped
> > > WAL file is the checkpoint record.  How should we modify the wording 
> > > on
> > > this?
> > > 
> > > 
> > > I am afraid that without this checkpoint record standby cannot make
> > > restartpoint
> > > and without restartpoint it does not sync shared buffers into disk at
> > > shutdown. 
> > 
> > Uh, as I understand it the rsync is going to copy the missing WAL file
> > from the new master to the standby, right, and I think pg_controldata
> > too, so it should be fine.  Have you tested to see if it fails?

It needs old WAL files from the old version to correctly restore heap
files. New WAL files from the new version do not have this information.

> The point is that we are checking the "Latest checkpoint location" to
> make sure all the WAL was replayed.   We are never going to start the
> old standby server.  Rsync is going to copy the missing/changed files.

Only if missing/changed files differ in size: rsync is run with
--size-only, so it does not copy changed files that have the same size.
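
As an illustration of the hazard (the paths and invocation here are
hypothetical, not the documented pg_upgrade procedure):

	# A block whose hint bits changed has new contents but the same size,
	# so a size-only rsync silently skips it:
	rsync --archive --size-only old_cluster/ standby:/path/to/new_cluster/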

I have this test script, and without copy_last_wal it makes the standby
broken in the first few loops, like:
=== run 1, cnt: 70
=== run 2, cnt: 729450

PS: I thought that with big shared_buffers I could break it more
quickly, but with big shared_buffers I cannot break it at all, hm...

-- 
Sergey Burladyan



test_rsync.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing comment for ResultRelInfo in execnodes.h

2017-06-20 Thread Etsuro Fujita

Here is a small patch to add a comment on its new member PartitionRoot.

Best regards,
Etsuro Fujita
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d33392f..7175a42 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -365,6 +365,7 @@ typedef struct JunkFilter
  * onConflictSetWhere		list of ON CONFLICT DO UPDATE exprs (qual)
  * PartitionCheck			partition check expression
  * PartitionCheckExpr		partition check expression state
+ * PartitionRoot			relation descriptor for root partitioned table
  * 
  */
 typedef struct ResultRelInfo

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-20 Thread Mahendranath Gurram
Hi Thomas,



Any update on this?



Please let me know how I can proceed further.



Thanks & Best Regards,

-Mahi









 On Fri, 16 Jun 2017 18:47:37 +0530 Mahi Gurram teckym...@gmail.com 
wrote 




Hi Thomas,



Thanks for your response and suggestions to change the code.



Now I have modified my code per your suggestions. The dsa_area pointer is no 
longer in shared memory; it is a global variable. I also implemented all your 
code suggestions, but unfortunately no luck: still facing the same behaviour. 
Refer to the attachment for the modified code.



I have some doubts about your response. Please clarify.



I didn't try your code but I see a few different problems here.  Every
backend is creating a new dsa area, and then storing the pointer to it
in shared memory instead of attaching from other backends using the
handle, and there are synchronisation problems.  That isn't going to
work.  Here's what I think you might want to try:

Actually, I'm not creating a dsa_area for every backend. I'm creating it only 
once (in BufferShmemHook).

* I put prints in my _PG_init and BufferShmemHook functions to confirm the 
same.



As far as I know, _PG_init of a shared library/extension is called only 
once (during startup) by the postmaster process, and all the postgres 
backends are forked child processes of the postmaster process.



Since the backends are the postmaster's child processes and are created after 
the shared memory (dsa_area) has been created and attached, the backend/child 
process will receive the shared memory segment in its address space, and as a 
result no shared memory operations like dsa_attach are required to access/use 
the DSA data.



Please correct me if I'm wrong.



3.  Whether you are the backend that created it or a backend that
attached to it, I think you'll need to store the dsa_area in a global
variable for your UDFs to access.  Note that the dsa_area object will
be different in each backend: there is no point in storing that
address itself in shared memory, as you have it, as you certainly
can't use it in any other backend. In other words, each backend that
attached has its own dsa_area object that it can use to access the
common dynamic shared memory area.
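
For reference, a minimal sketch of the explicit create/attach pattern
described above (MySharedState, my_state and the function names are made up
for illustration; the dsa_* and LWLockNewTrancheId calls are the real API):

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/dsa.h"

typedef struct MySharedState
{
	dsa_handle	area_handle;	/* published in main shared memory */
} MySharedState;

static MySharedState *my_state;	/* set up via ShmemInitStruct() */
static dsa_area *my_area;		/* per-backend dsa_area object */

/* In the one backend (e.g. a background worker) that creates the area: */
static void
my_create_area(void)
{
	my_area = dsa_create(LWLockNewTrancheId());
	dsa_pin(my_area);			/* keep the area alive beyond its creator */
	my_state->area_handle = dsa_get_handle(my_area);
}

/* In every other backend, before first use of the area: */
static void
my_attach_area(void)
{
	my_area = dsa_attach(my_state->area_handle);
	dsa_pin_mapping(my_area);	/* stay mapped for the backend's lifetime */
}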

In the case of forked processes, the OS actually does share the pages 
initially, because fork implements copy-on-write semantics, which means that 
provided none of the processes modifies the pages, they both point to the same 
address and the same data.



Based on the above theory, assume I have created the dsa_area object in the 
postmaster process (_PG_init) as a global variable; all the backends/forked 
processes should be able to access/share the same dsa_area object and its 
members.



Hence, theoretically, the code should work without any issues. But I'm not 
sure why it is not working as expected :(



I tried debugging by putting prints, and observed the below things:

1. The dsa_area_control address is different between the postmaster process and the backends.

2. After restarting, they seem to be the same, and hence it is working after that.



2017-06-16 18:08:50.798 IST [9195] LOG:   Inside Postmaster Process, after 
dsa_create() +

2017-06-16 18:08:50.798 IST [9195] LOG:  the address of dsa_area_control is 
0x7f50ddaa6000
2017-06-16 18:08:50.798 IST [9195] LOG:  the dsa_area_handle is 1007561696
2017-06-16 18:11:01.904 IST [9224] LOG:   Inside UDF function in forked 
process +

2017-06-16 18:11:01.904 IST [9224] LOG:  the address of dsa_area_control is 
0x1dac910
2017-06-16 18:11:01.904 IST [9224] LOG:  the dsa_area_handle is 0
2017-06-16 18:11:01.907 IST [9195] LOG:  server process (PID 9224) was 
terminated by signal 11: Segmentation fault

2017-06-16 18:11:01.907 IST [9195] DETAIL:  Failed process was running: select 
test_dsa_data_access(1);

2017-06-16 18:11:01.907 IST [9195] LOG:  terminating any other active server 
processes

2017-06-16 18:11:01.907 IST [9227] FATAL:  the database system is in recovery 
mode

2017-06-16 18:11:01.907 IST [9220] WARNING:  terminating connection because of 
crash of another server process

2017-06-16 18:11:01.907 IST [9220] DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because another 
server process exited abnormally and possibly corrupted shared memory.

2017-06-16 18:11:01.907 IST [9220] HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.

2017-06-16 18:11:01.907 IST [9195] LOG:  all server processes terminated; 
reinitialising

2017-06-16 18:08:50.798 IST [9195] LOG:   Inside Postmaster Process, after 
dsa_create() +

2017-06-16 18:11:01.937 IST [9195] LOG:  the address of dsa_area_control is 
0x7f50ddaa6000
2017-06-16 18:11:01.937 IST [9195] LOG:  the dsa_area_handle is 1833840303
2017-06-16 18:11:01.904 IST [9224] LOG:   Inside UDF function in forked 
process +

2017-06-16 18:12:24.247 IST [9239] LOG:  the address of dsa_area_control is 
0x7f50ddaa6000
2017-06-16 18:12:24.247 IST 

Re: [HACKERS] Rules on table partitions

2017-06-20 Thread Amit Langote
On 2017/06/20 17:51, Dean Rasheed wrote:
> On 20 June 2017 at 03:01, Amit Langote  wrote:
>> On 2017/06/19 20:19, Dean Rasheed wrote:
>>> Perhaps we
>>> should explicitly forbid this for now -- i.e., raise a "not supported"
>>> error when attempting to add a rule to a partition, or attach a table
>>> with rules to a partitioned table.
>>
>> We could do that, but an oft-raised question is how different we should
>> make new partitions from the old-style inheritance child tables?
>>
>> Although a slightly different territory, you will also notice that
>> statement triggers of partitions won't be fired unless they are explicitly
>> named in the query, which is what happens for inheritance in general and
>> hence also for partitions.
>>
>>> Personally, I wouldn't regard adding proper support for rules on
>>> partitions as a high priority, so I'd be OK with it remaining
>>> unsupported unless someone cares enough to implement it, but that
>>> seems preferable to leaving it partially working in this way.
>>
>> Sure, if consensus turns out to be that we prohibit rules, statement
>> triggers, etc. that depend on the relation being explicitly named in the
>> query to be defined on partitions, I could draft up a patch for v10.
>>
> 
> Hmm, perhaps it's OK as-is. The user can always define the
> rules/triggers on both the parent and the children, if that's what
> they want.

I feel that the documentation in this area could clarify some of these
things a bit more (at least any differences that exist between the
old-style inheritance and new partitioning).  I will try to see if I can
come up with a sensible patch for that.

>>> Also, as things stand, it is possible to do the following:
>>>
>>> CREATE TABLE t2(a int, b int) PARTITION BY RANGE(a);
>>> CREATE TABLE t2_p PARTITION OF t2 FOR VALUES FROM (1) TO (10);
>>> CREATE RULE "_RETURN" AS ON SELECT TO t2_p DO INSTEAD SELECT * FROM t2;
>>>
>>> which results in the partition becoming a view that selects from the
>>> parent, which surely ought to be forbidden.
>>
>> Hmm, yes.  The following exercise convinced me.
>>
>> create table r (a int) partition by range (a);
>> create table r1 partition of r for values from (1) to (10);
>> create rule "_RETURN" as on select to r1 do instead select * from r;
>>
>> insert into r values (1);
>> ERROR:  cannot insert into view "r1"
>> HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT
>> trigger or an unconditional ON INSERT DO INSTEAD rule.
>>
>> The error is emitted by CheckValidResultRel() that is called on individual
>> leaf partitions when setting up tuple-routing in ExecInitModifyTable.
>>
>> I agree that we should forbid this case,
> 
> Yeah. Also it would be possible to define the rule to select from a
> non-empty table, leading to rows appearing in the partition, but not
> the parent. Since we normally explicitly forbid the use of a view as a
> table partition, it seems worth closing the loophole in this case.

Oh, that's even worse. :-(

>> so please find attached a patch.
> 
> Thanks, I'll take a look at it later today.

Thanks.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rules on table partitions

2017-06-20 Thread Dean Rasheed
On 20 June 2017 at 03:01, Amit Langote  wrote:
> On 2017/06/19 20:19, Dean Rasheed wrote:
>> Currently we allow rules to be defined on table partitions, but these
>> rules only fire when the partition is accessed directly, not when it
>> is accessed via the parent:
>
> Yeah, the same thing as will happen with an inheritance setup, but I guess
> you are asking whether that's what we want.
>
> With old-style inheritance based setup, you will see exactly the exact
> same result:
>
> Note that inheritance is expanded in the planner, not the rewriter, so the
> rules of partitions (that are added to the query later than the rewriter)
> won't fire, AFAICS.  Same will apply to partitions.
>

Ah yes, you're right. I was misremembering how that worked. In an
old-style inheritance-based setup you would have to define triggers on
the parent table to handle INSERT, and since they would insert
directly into the relevant child, any child INSERT rules would be
fired, but child UPDATE and DELETE rules would not. At least with
built-in partitioning it's consistent in not firing rules on the
partitions for any command.


>> Perhaps we
>> should explicitly forbid this for now -- i.e., raise a "not supported"
>> error when attempting to add a rule to a partition, or attach a table
>> with rules to a partitioned table.
>
> We could do that, but an oft-raised question is how different we should
> make new partitions from the old-style inheritance child tables?
>
> Although a slightly different territory, you will also notice that
> statement triggers of partitions won't be fired unless they are explicitly
> named in the query, which is what happens for inheritance in general and
> hence also for partitions.
>
>> Personally, I wouldn't regard adding proper support for rules on
>> partitions as a high priority, so I'd be OK with it remaining
>> unsupported unless someone cares enough to implement it, but that
>> seems preferable to leaving it partially working in this way.
>
> Sure, if consensus turns out to be that we prohibit rules, statement
> triggers, etc. that depend on the relation being explicitly named in the
> query to be defined on partitions, I could draft up a patch for v10.
>

Hmm, perhaps it's OK as-is. The user can always define the
rules/triggers on both the parent and the children, if that's what
they want.


>> Also, as things stand, it is possible to do the following:
>>
>> CREATE TABLE t2(a int, b int) PARTITION BY RANGE(a);
>> CREATE TABLE t2_p PARTITION OF t2 FOR VALUES FROM (1) TO (10);
>> CREATE RULE "_RETURN" AS ON SELECT TO t2_p DO INSTEAD SELECT * FROM t2;
>>
>> which results in the partition becoming a view that selects from the
>> parent, which surely ought to be forbidden.
>
> Hmm, yes.  The following exercise convinced me.
>
> create table r (a int) partition by range (a);
> create table r1 partition of r for values from (1) to (10);
> create rule "_RETURN" as on select to r1 do instead select * from r;
>
> insert into r values (1);
> ERROR:  cannot insert into view "r1"
> HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT
> trigger or an unconditional ON INSERT DO INSTEAD rule.
>
> The error is emitted by CheckValidResultRel() that is called on individual
> leaf partitions when setting up tuple-routing in ExecInitModifyTable.
>
> I agree that we should forbid this case,

Yeah. Also it would be possible to define the rule to select from a
non-empty table, leading to rows appearing in the partition, but not
the parent. Since we normally explicitly forbid the use of a view as a
table partition, it seems worth closing the loophole in this case.

> so please find attached a patch.

Thanks, I'll take a look at it later today.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG: WHENEVER statement with DO CONTINUE action

2017-06-20 Thread vinayak

Hi Sawada-san,

On 2017/06/20 17:22, Masahiko Sawada wrote:

On Tue, Jun 20, 2017 at 1:51 PM, vinayak
 wrote:


On 2017/06/12 13:09, vinayak wrote:

Hi,

On 2017/06/10 12:23, Vinayak Pokale wrote:

Thank you for your reply

On Jun 9, 2017 5:39 PM, "Michael Meskes"  wrote:

Could you please add a "DO CONTINUE" case to one of the test cases? Or
add a new one? We would need a test case IMO.


Yes I will add test case and send updated patch.

I have added new test case for DO CONTINUE.
Please check the attached patch.

I have added this in Sept. CF
https://commitfest.postgresql.org/14/1173/


I got the following warning by git show --check. I think you should
remove unnecessary whitespace. Also the code indent of
whenever_do_continue.pgc seems to need to be adjusted.

$ git show --check
commit a854aa0130589b7bd43b2c6c1c86651be91b1f59
Author: Vinayak Pokale 
Date:   Mon Jun 12 13:03:21 2017 +0900

 WHENEVER statement DO CONTINUE support

src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:16: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:21: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:24: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:27: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:35: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:37: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:39: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:41: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:47: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:49: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:52: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:54: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:1: new blank
line at EOF.

--
In the whenever_do_continue.pgc file, the following line seems not to be
processed successfully by ecpg; should we fix that?

+
+   exec sql whenever sqlerror continue;
+

Also, you wrote the test case using "WHENEVER sqlerror DO CONTINUE"
action but that seems not to emit sqlerror, so "DO CONTINUE" is not
executed. I think the test case for DO CONTINUE should be a C code
that executes the "continue" clause.

Thank you for testing the patch.
I agree with your comments. I will update the patch.

Regards,
Vinayak Pokale
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG: WHENEVER statement with DO CONTINUE action

2017-06-20 Thread Masahiko Sawada
On Tue, Jun 20, 2017 at 1:51 PM, vinayak
 wrote:
>
>
> On 2017/06/12 13:09, vinayak wrote:
>
> Hi,
>
> On 2017/06/10 12:23, Vinayak Pokale wrote:
>
> Thank you for your reply
>
> On Jun 9, 2017 5:39 PM, "Michael Meskes"  wrote:
>>
>> Could you please add a "DO CONTINUE" case to one of the test cases? Or
>> add a new one? We would need a test case IMO.
>>
> Yes I will add test case and send updated patch.
>
> I have added new test case for DO CONTINUE.
> Please check the attached patch.
>
> I have added this in Sept. CF
> https://commitfest.postgresql.org/14/1173/
>

I got the following warning by git show --check. I think you should
remove unnecessary whitespace. Also the code indent of
whenever_do_continue.pgc seems to need to be adjusted.

$ git show --check
commit a854aa0130589b7bd43b2c6c1c86651be91b1f59
Author: Vinayak Pokale 
Date:   Mon Jun 12 13:03:21 2017 +0900

WHENEVER statement DO CONTINUE support

src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:16: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:21: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:24: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:27: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:35: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:37: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:39: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:41: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:47: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:49: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:52: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:54: trailing
whitespace.
+
src/interfaces/ecpg/test/preproc/whenever_do_continue.pgc:1: new blank
line at EOF.

--
In the whenever_do_continue.pgc file, the following line seems not to be
processed successfully by ecpg; should we fix that?

+
+   exec sql whenever sqlerror continue;
+

Also, you wrote the test case using "WHENEVER sqlerror DO CONTINUE"
action but that seems not to emit sqlerror, so "DO CONTINUE" is not
executed. I think the test case for DO CONTINUE should be a C code
that executes the "continue" clause.
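
For illustration, a rough sketch (not taken from the patch) of a test that
would actually exercise DO CONTINUE, with a made-up table name; with the
patch, ecpg should translate the action into a C "continue;" after each
statement:

	/* assumes <stdio.h> and an established connection */
	int i;

	EXEC SQL WHENEVER SQLERROR DO CONTINUE;

	for (i = 0; i < 10; i++)
	{
		/* duplicate keys make this fail, which should skip the printf */
		EXEC SQL INSERT INTO t (id) VALUES (:i % 3);
		printf("inserted %d\n", i);
	}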

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-20 Thread Amit Langote
On 2017/06/19 23:31, Tom Lane wrote:
> Amit Kapila  writes:
>> On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
>>  wrote:
>>> What are some arguments against setting pd_lower in the GIN metapage as
>>> follows?
> 
>> Actually, hash index also has similar code (See _hash_init_metabuffer)
>> and I see no harm in doing this at similar other places.
> 
> Seems reasonable.

Here is a patch that does it for the GIN metapage.  (I am not sure if the
changes to gin_mask() that are included in the patch are really necessary.)

>>> How about porting such a change to the back-branches if we do this at all?
>>> The reason I'm asking is that a certain backup tool relies on pd_lower
>>> values of data pages (disk blocks in relation files that are known to have
>>> a valid PageHeaderData) to be correct to discard the portion of every page
>>> that supposedly does not contain any useful information.  The assumption
>>> doesn't hold in the case of GIN metapage, so any GIN indexes contain
>>> corrupted metapage after recovery (metadata overwritten with zeros).
> 
> I'm not in favor of back-porting such a change.  Even if we did, it would
> only affect subsequently-created indexes not existing ones.  That means
> your tool has to cope with an unset pd_lower in any case --- and will for
> the foreseeable future, because of pg_upgrade.
> 
> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
> then don't trust it, but assume all of the page is valid data".

Actually, such a check is already in place in the tool, whose condition
looks like:

if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...

which even a GIN metapage passes, making it an eligible data page and hence
a candidate for omitting the hole between pd_lower and pd_upper.

That's because a GIN metapage will always have undergone PageInit() that
sets pd_lower to SizeOfPageHeaderData.  Which means the tool has to look
beyond the standard PageHeaderData to determine whether the area between
pd_lower and pd_upper is really a hole.  Amit K also suggested the same,
but that seems to require either duplicating GIN's private struct
definition (of GinMetaPageData) in the tool or including backend's
gin_private.h, either of which doesn't seem to be a good thing to do in
what is FRONTEND code, but maybe there is no other way.  Am I missing
something?

Thanks,
Amit
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index d03d59da6a..b1755c6f1c 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
 	metadata->nDataPages = 0;
 	metadata->nEntries = 0;
 	metadata->ginVersion = GIN_CURRENT_VERSION;
+
+	/*
+	 * Set pd_lower just past the end of the metadata.  This is not essential
+	 * but it makes the page look compressible to xlog.c.
+	 */
+	((PageHeader) page)->pd_lower =
+		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..95b842bef0 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
 	mask_page_hint_bits(page);
 
 	/*
-	 * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
-	 * we need to apply masking for those pages.
+	 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+	 * the page content.
 	 */
-	if (opaque->flags != GIN_META)
-	{
-		/*
-		 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
-		 * the page content.
-		 */
-		if (opaque->flags & GIN_DELETED)
-			mask_page_content(page);
-		else
-			mask_unused_space(page);
-	}
+	if (opaque->flags & GIN_DELETED)
+		mask_page_content(page);
+	else
+		mask_unused_space(page);
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-20 Thread Amit Langote
On 2017/06/19 22:59, Amit Kapila wrote:
> On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
>  wrote:
>> What are some arguments against setting pd_lower in the GIN metapage as
>> follows?
>>
>> GinMetaPageData *metad = GinPageGetMeta(page);
>>
>> ((PageHeader) page)->pd_lower =
>> ((char *) metad + sizeof(GinMetaPageData)) - (char *) page;
>>
>> I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN.
>>
> 
> Actually, hash index also has similar code (See _hash_init_metabuffer)
> and I see no harm in doing this at similar other places.  This helps
> in compressing the hole in metapage during WAL writing.  I think that
> in itself might not be an argument in favor of doing this because
> there is only one meta page for index and saving ~7K WAL is not huge
> but OTOH I don't see a reason for not doing it.

I agree.  Will write a patch.

>> How about porting such a change to the back-branches if we do this at all?
>>  I couldn't find any discussion in the archives about this.  I read in
>> comments that server versions older than 9.4 didn't set pd_lower correctly
>> in any of GIN index pages, so relying on the pd_lower value of GIN pages
>> is unsafe in general.
>>
>> The reason I'm asking is that a certain backup tool relies on pd_lower
>> values of data pages (disk blocks in relation files that are known to have
>> a valid PageHeaderData) to be correct to discard the portion of every page
>> that supposedly does not contain any useful information.  The assumption
>> doesn't hold in the case of GIN metapage, so any GIN indexes contain
>> corrupted metapage after recovery (metadata overwritten with zeros).
>>
> 
> Why can't you do a special check for metapage identification?  It
> should be easy to add such a check.  I have seen one of such tools
> (pg_filedump) has similar check to skip metapage in certain code
> paths.

I tried to add a metapage check, but getting such code to compile requires
it to include headers like gin_private.h (in PG versions < 10), which in
turn includes other headers that are forbidden to be included in what's
supposed to be FRONTEND code.   (thread at [1] seems relevant in this
regard.)  Another way to fix this might be to copy the GinMetaPageData
struct definition and a few other symbols into the tool's code and make
necessary checks using the same, instead of including gin_private.h.
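
For illustration, a rough sketch of what such a frontend-side duplication
might look like, assuming PG 10's ginblock.h layout (the struct layout and
the GIN_META bit must be verified against the target server version):

	/* Duplicated from ginblock.h; avoids including gin_private.h.
	 * Assumes c.h and storage/bufpage.h are already included. */
	typedef struct GinPageOpaqueData
	{
		BlockNumber rightlink;	/* next page if any */
		OffsetNumber maxoff;	/* number of PostingItems on data pages */
		uint16		flags;		/* page type bits */
	} GinPageOpaqueData;

	#define GIN_META	(1 << 3)

	/* Recognize a GIN metapage by its special-area flags. */
	static bool
	page_is_gin_meta(Page page)
	{
		GinPageOpaqueData *opaque;

		opaque = (GinPageOpaqueData *)
			((char *) page + ((PageHeader) page)->pd_special);
		return (opaque->flags & GIN_META) != 0;
	}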

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZ%3DF%3DGkxV0YEv-A8tb%2BAEGy_Qa7GSiJ8deBKFATnzfEug%40mail.gmail.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-20 Thread Amit Khandekar
On 20 June 2017 at 03:46, Robert Haas  wrote:
> On Thu, Jun 15, 2017 at 1:36 PM, Amit Khandekar  
> wrote:
>> Attached patch v10 fixes the above. In the existing code, where it
>> builds WCO constraints for each leaf partition; with the patch, that
>> code now is applicable to row-movement-updates as well.
>
> I guess I don't see why it should work like this.  In the INSERT case,
> we must build withCheckOption objects for each partition because those
> partitions don't appear in the plan otherwise -- but in the UPDATE
> case, they're already there, so why do we need to build anything at
> all?  Similarly for RETURNING projections.  How are the things we need
> for those cases not already getting built, associated with the
> relevant resultRelInfos?  Maybe there's a concern if some children got
> pruned - they could turn out later to be the children into which
> tuples need to be routed. But the patch makes no distinction
> between possibly-pruned children and any others.

Yes, only a subset of the partitions appear in the UPDATE subplans. I
think typically for updates, a very small subset of the total leaf
partitions will be there in the plans, others would get pruned. IMHO,
it would not be worth having an optimization where it opens only those
leaf partitions which are not already there in the subplans. Without
the optimization, we are able to re-use the INSERT infrastructure
without additional changes.


>
>> There is another issue I discovered. The row-movement works fine if
>> the destination leaf partition has different attribute ordering than
>> the root : the existing insert-tuple-routing mapping handles that. But
>> if the source partition has different ordering w.r.t. the root, it has
>> a problem : there is no mapping in the opposite direction, i.e. from
>> the leaf to root. And we require that because the tuple of source leaf
>> partition needs to be converted to root partition tuple descriptor,
>> since ExecFindPartition() starts with root.
>
> Seems reasonable, but...
>
>> To fix this, I have introduced another mapping array
>> mtstate->mt_resultrel_maps[]. This corresponds to the
>> mtstate->resultRelInfo[]. We don't require per-leaf-partition mapping,
>> because the update result relations are pruned subset of the total
>> leaf partitions.
>
> ... I don't understand how you can *not* need a per-leaf-partition
> mapping.  I mean, maybe you only need the mapping for the *unpruned*
> leaf partitions

Yes, we need the mapping only for the unpruned leaf partitions, and
those partitions are available in the per-subplan resultRelInfo's.

> but you certainly need a separate mapping for each one of those.

You mean *each* of the leaf partitions ? I didn't get why we would
need it for each one. The tuple targeted for update belongs to one of
the per-subplan resultInfos. And this tuple is to be routed to another
leaf partition. So the reverse mapping is for conversion from the
source resultRelinfo to the root partition. I am unable to figure out
a scenario where we would require this reverse mapping for partitions
on which UPDATE is *not* going to be executed.

>
> It's possible to imagine driving the tuple routing off of just the
> partition key attributes, extracted from wherever they are inside the
> tuple at the current level, rather than converting to the root's tuple
> format.  However, that's not totally straightforward because there
> could be multiple levels of partitioning throughout the tree and
> different attributes might be needed at different levels.

Yes, the conversion anyway occurs at each of these levels even for
insert, specifically because there can be different partition
attributes each time. For update, it's only one additional conversion.
But yes, this new mapping would be required for this one single
conversion.
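
For illustration, a rough sketch of building the leaf-to-root ("reverse")
map with the existing tuple-conversion machinery (the relation variables
here are illustrative; the functions are PG 10's access/tupconvert.h API):

	/* Map from the source leaf partition's row type back to the root's;
	 * NULL if the descriptors are physically identical (common case). */
	TupleConversionMap *map =
		convert_tuples_by_name(RelationGetDescr(leaf_rel),
							   RelationGetDescr(root_rel),
							   gettext_noop("could not convert row type"));

	if (map != NULL)
		tuple = do_convert_tuple(tuple, map);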

> Moreover,
> in most cases, the mappings are going to end up being no-ops because
> the column order will be the same, so it's probably not worth
> complicating the code to try to avoid a double conversion that usually
> won't happen.

I agree.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers