Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 12:23 PM, Michael Paquier
 wrote:
> On Mon, Feb 27, 2017 at 3:46 PM, Robert Haas  wrote:
>> Do we, ah, have a CommitFest manager for this final CommitFest?
>
> No victims found yet.
>
>> /me takes two quick steps backward.
>
> Which means that you are volunteering? This takes you out of the crowd :D

Ha!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Documentation improvements for partitioning

2017-02-26 Thread Amit Langote
On 2017/02/27 3:18, Petr Jelinek wrote:
> On 24/02/17 07:15, Robert Haas wrote:
>> On Fri, Feb 24, 2017 at 9:53 AM, Simon Riggs  wrote:
>>> The good news is that logical replication DOES work with partitioning,
>>> but only for a Publication with PUBLISH INSERT, pushing from a normal
>>> table to a partitioned one. Which is useful enough for first release.
>>>
>>> The work on having UPDATE work between partitions can be used to make
>>> updates and deletes work in logical replication. That might yet be
>>> made to work in this release, and I think we should look into it, but
>>> I think it can be left til next release if we try.
>>
>> Are you speaking of the case where you want to replicate from an
>> unpartitioned table to a partitioned table?  I would imagine that if
>> you are replicating between two identical partitioning hierarchies, it
>> would just work.  (If not, that's probably a bug, though I don't know
>> whose bug.)
>>
> 
> Yes, same hierarchies will work, but mainly because one currently has to
> add the partitions themselves to the publication. I guess that's a
> limitation we might have to live with in 10, as adding the whole
> partitioned table should probably work for different hierarchies once we
> enable it, and I am not quite sure that's doable before the start of the
> CF at this point.

If and when we add support to add partitioned tables to publications, I
think it will work by recursing to include the partitions to the same
publication (I see that OpenTableList() in publicationcmds.c calls
find_all_inheritors if recursion is requested by not specifying ONLY).
When a subscription replicates from this publication, it's going to match
changes for individual leaf partitions, not the root parent table.  IOW,
none of the changes applied to a partitioned table are replicated as
changes to the table itself.  So, it seems that adding a partitioned table
to a publication or subscription would simply be a *shorthand* for adding
all the (leaf) partitions that will actually emit and receive changes.
I'm not sure but should adding/removing partitions after the fact cause
their addition/removal from the publication (& subscription)? Maybe we'll
discuss these issues later.
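Just to make the *shorthand* idea concrete, here is a rough C sketch (not
code from any posted patch; the helper name and the lock level are made up)
of expanding a partitioned table into the leaf partitions that would
actually be added, along the lines of the find_all_inheritors() recursion
that OpenTableList() already does:

#include "postgres.h"
#include "catalog/pg_class.h"
#include "catalog/pg_inherits_fn.h"
#include "nodes/pg_list.h"
#include "storage/lockdefs.h"
#include "utils/lsyscache.h"

/*
 * Hypothetical helper: return the leaf partitions of a partitioned table,
 * i.e. the relations that would actually emit and receive changes.
 */
static List *
get_leaf_partition_oids(Oid parentrelid)
{
	List	   *all_rels;
	List	   *leaves = NIL;
	ListCell   *lc;

	/* includes parentrelid itself plus all of its descendants */
	all_rels = find_all_inheritors(parentrelid, AccessShareLock, NULL);

	foreach(lc, all_rels)
	{
		Oid			relid = lfirst_oid(lc);

		/* skip the parent and any intermediate partitioned tables */
		if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
			continue;

		leaves = lappend_oid(leaves, relid);
	}

	return leaves;
}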

By the way, we currently get the following error due to the relkind check
in check_publication_add_relation():

create publication p_pub for table p;
ERROR:  "p" is not a table
DETAIL:  Only tables can be added to publications.

Would it be better to produce an error message that explicitly states that
adding partitioned tables to publications is not supported?  Something like:

create publication p_pub for table p;
ERROR:  table "p" cannot be replicated
DETAIL:  Adding partitioned tables to publications is not supported.

Thanks,
Amit






[HACKERS] Logical replication and inheritance

2017-02-26 Thread Amit Langote
I see that if the table is an inheritance parent, and ONLY is not
specified, the child tables are also added to the publication.

create table parent (a int);
create table child () inherits (parent);
create publication parent_pub for table parent;
\d parent
   Table "public.parent"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
Publications:
"parent_pub"
Number of child tables: 1 (Use \d+ to list them.)

\d child
   Table "public.child"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
Publications:
"parent_pub"
Inherits: parent

If the child table is later removed from the inheritance hierarchy, it
continues to be a part of the publication.

alter table child no inherit parent;
ALTER TABLE
\d child
   Table "public.child"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
Publications:
"parent_pub"

Perhaps that's intentional?

Thanks,
Amit






Re: [HACKERS] pg_dump does not refresh matviews from extensions

2017-02-26 Thread Robert Haas
On Thu, Feb 23, 2017 at 5:11 AM, Jim Nasby  wrote:
> Now that I know that, I guess I'm kinda on the fence about doing it
> automatically, because AFAIK there'd be no way to override that automatic
> behavior. I can't really conceive of any reason you wouldn't want the
> refresh, but since it's not happening today...

Well, it would make pg_dump run longer to do something you may not care about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-26 Thread Michael Paquier
On Mon, Feb 27, 2017 at 3:46 PM, Robert Haas  wrote:
> Do we, ah, have a CommitFest manager for this final CommitFest?

No victims found yet.

> /me takes two quick steps backward.

Which means that you are volunteering? This takes you out of the crowd :D
-- 
Michael




Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 1:24 AM, Tom Lane  wrote:
> * I'm not terribly comfortable about what the permissions levels of the
> GUCs ought to be.  The call permissions check means that you can't use
> either GUC to call a function you couldn't have called anyway.  However
> there's a separate risk of trojan-horse execution, analogous to what a
> blackhat can get by controlling the search_path GUC setting used by a
> SECURITY DEFINER function: the function might intend to invoke some pltcl
> function, but you can get it to invoke some other pltcl function in
> addition to that.  I think this means we had better make pltclu.start_proc
> be SUSET, but from a convenience standpoint it'd be nice if
> pltcl.start_proc were just USERSET.  An argument in favor of that is that
> we don't restrict search_path which is just as dangerous; but on the other
> hand, existing code should be expected to know that it needs to beware of
> search_path, while it wouldn't know that start_proc needs to be locked
> down.  Maybe we'd better make them both SUSET.

Making them SUSET sounds like a usability fail to me.  I'm not sure
how bad the security risks of NOT making them SUSET are, but I think
if we find that SUSET is required for safety then we've squeezed most
of the value out of the feature.
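For anyone following along, a rough sketch (not from any posted patch; the
variable name and default shown are assumptions) of what the two options
mean when the module registers the GUC -- the only difference is the
GucContext handed to DefineCustomStringVariable():

#include "postgres.h"
#include "utils/guc.h"

static char *pltcl_start_proc = NULL;

void
_PG_init(void)
{
	/*
	 * PGC_SUSET: only superusers may change it (safer, less convenient).
	 * PGC_USERSET: any user may set it for their own session.
	 */
	DefineCustomStringVariable("pltcl.start_proc",
							   "Function to call at PL/Tcl interpreter startup.",
							   NULL,
							   &pltcl_start_proc,
							   "",
							   PGC_SUSET,	/* or PGC_USERSET */
							   0,
							   NULL, NULL, NULL);
}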

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 8:33 AM, Amit Kapila  wrote:
> Is there any easy way to find out which way is less expensive?

No.  But that's a separate problem.  I'm just saying we shouldn't
arbitrarily prohibit parallelism for parallel-safe functions.

> Even
> if we find some way or just make a rule that when an outer query uses
> parallelism, then force function call to run serially, how do we
> achieve that I mean in each worker we can ensure that each
> individual statements from a function can run serially (by having a
> check of isparallelworker() in gather node), but having a similar
> check in the master backend is tricky or maybe we don't want to care
> for the same in master backend.  Do you have any suggestions on how to
> make it work?

I don't understand what's wrong with the existing logic in standard_planner().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] DROP FUNCTION of multiple functions

2017-02-26 Thread Michael Paquier
On Sat, Feb 25, 2017 at 10:27 PM, Peter Eisentraut
 wrote:
> Here is a new patch set that addresses your comments.  The structure is
> still the same, just a bunch of things have been renamed based on
> suggestions.

+  <para>
+   Drop multiple functions in one command:
+<programlisting>
+DROP FUNCTION sqrt(integer), sqrt(bigint);
+</programlisting>
  
Perhaps adding as well on the page of DROP OPERATOR an example
dropping multiple operators at once?

-DropPolicyStmt:
-   DROP POLICY name ON any_name opt_drop_behavior
-   {
Oh, nice. I can see that you have decided to bite the bullet. Thanks
for considering that.

I am marking this patch as ready for committer, aka you.
-- 
Michael




Re: [HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 11:17 AM, Michael Paquier
 wrote:
> As the next commit fest beginning soon, there will be no place where
> hackers can park future patches for PG11~. So I have created a Future
> CF named 2017-07 where you can do that.
> Feel free to use it to park your future patches. The next development
> schedule is undecided, so the name of this future CF may change. Or
> not.

Do we, ah, have a CommitFest manager for this final CommitFest?

/me takes two quick steps backward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 10:30 AM, Dilip Kumar  wrote:
> On Sun, Feb 26, 2017 at 9:26 PM, Robert Haas  wrote:
>> tbm_free_shared_area because the two iterators share one copy of the
>> underlying iteration arrays, and the TBM code isn't smart enough to
>> avoid freeing them twice.  You're going to have to come up with a
>> better solution to that problem; nodeBitmapHeapScan.c shouldn't know
>> about the way the underlying storage details are managed.  (Maybe you
>> need to reference-count the iterator arrays?)
>
> Yeah, I also think the current way doesn't look so clean. Currently, these
> arrays are just integer arrays; maybe we can use the first slot of the
> array for a reference count, or convert them to a structure which has space
> for a reference count and an integer array. What do you suggest?

Maybe something like typedef struct { int refcnt; SomeType
somename[FLEXIBLE_ARRAY_MEMBER]; } SomeOtherType; would be a good
approach.
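Spelled out as a compilable sketch (the names are just the placeholders
above, and palloc/pfree stand in for however the shared arrays are really
allocated and freed), the idea is to prepend a reference count so that only
the last user frees the storage:

#include "postgres.h"

typedef int SomeType;			/* placeholder for the iteration entries */

typedef struct SomeOtherType
{
	int			refcnt;			/* iterators still using this array */
	SomeType	somename[FLEXIBLE_ARRAY_MEMBER];
} SomeOtherType;

/* Allocate room for nentries elements, starting with one reference. */
static SomeOtherType *
some_array_create(int nentries)
{
	SomeOtherType *arr;

	arr = (SomeOtherType *) palloc0(offsetof(SomeOtherType, somename) +
									nentries * sizeof(SomeType));
	arr->refcnt = 1;
	return arr;
}

/* Drop one reference; free the storage only when the last one is gone. */
static void
some_array_release(SomeOtherType *arr)
{
	Assert(arr->refcnt > 0);
	if (--arr->refcnt == 0)
		pfree(arr);
}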

>> +if (node->inited)
>> +goto start_iterate;
>>
>> My first programming teacher told me not to use goto.  I've
>> occasionally violated that rule, but I need a better reason than you
>> have here.  It looks very easy to avoid.
>
> Yes, this can be avoided; I was just trying to get rid of multi-level
> if nesting and ended up with the "goto".

That's what I figured.

>> +pbms_set_parallel(outerPlanState(node));
>>
>> I think this should be a flag in the plan, and the planner should set
>> it correctly, instead of having it be a flag in the executor that the
>> executor sets.  Also, the flag itself should really be called
>> something that involves the word "shared" rather than "parallel",
>> because the bitmap will not be created in parallel, but it will be
>> shared.
>
> Earlier, I thought that it would not be a good idea to set that flag in
> the BitmapIndexScan path because the same path can be used either under a
> ParallelBitmapHeapPath or under a normal BitmapHeapPath.  But now, after
> putting some thought into it, I realised that we can do that in create_plan.
> Therefore, I will change this.

Cool.

>> pbms_is_leader() looks horrifically inefficient.  Every worker will
>> reacquire the spinlock for every tuple.  You should only have to enter
>> this spinlock-acquiring loop for the first tuple.  After that, either
>> you became the leader, did the initialization, and set the state to
>> PBM_FINISHED, or you waited until the pre-existing leader did the
>> same.  You should have a backend-local flag that keeps you from
>> touching the spinlock for every tuple.  I wouldn't be surprised if
>> fixing this nets a noticeable performance increase for this patch at
>> high worker counts.
>
> I think there is some confusion; if you notice, the code below will avoid
> calling pbms_is_leader for every tuple.
> It will be called only the first time. And after that, node->inited will
> be true and it will directly jump to start_iterate for subsequent
> calls.  Am I missing something?
>
>> +if (node->inited)
>> +goto start_iterate;

Oh, OK.  I guess I was just confused, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Robert Haas
On Mon, Feb 27, 2017 at 5:13 AM, Noah Misch  wrote:
> Dear 7b4ac19 authors,
>
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:
>
> 1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
>declare the ps_snapshot_data field to be of type SerializedSnapshotData.
>Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
>SerializedSnapshotData, to assert the variable-length nature.
>
> 2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
>have attached this in SerializedSnapshot-int64-v1.patch.
>
> 3. Change no declarations, and make snapmgr.c memcpy() the
>SerializedSnapshotData through a local buffer.  I have attached this as
>SerializedSnapshot-memcpy-v1.patch.
>
> I like (2) well enough, but I don't see that technique used elsewhere in the
> tree.  (1) is more typical of PostgreSQL, though I personally like it when
> structs can stay private to a file.  (3) is also well-attested, particularly
> in xlog replay code.  I am leaning toward (2).  Other opinions?

In my imagination, we were already doing #3, so I'd probably favor
that approach.
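For anyone skimming the thread, a minimal standalone sketch of what (3)
amounts to -- copy the possibly-unaligned bytes into a properly aligned
local variable before reading any 8-byte field (the struct here is a
stand-in, since the real SerializedSnapshotData is private to snapmgr.c):

#include <stdint.h>
#include <string.h>

/* Stand-in with made-up fields; the real struct lives in snapmgr.c. */
typedef struct SerializedSnapshotStandin
{
	uint32_t	xmin;
	uint64_t	whenTaken;		/* 8-byte field needing 8-byte alignment */
} SerializedSnapshotStandin;

static uint64_t
read_when_taken(const char *start_address)	/* possibly only 4-byte aligned */
{
	SerializedSnapshotStandin local;

	/* copying through an aligned local avoids SIGBUS on strict platforms */
	memcpy(&local, start_address, sizeof(local));
	return local.whenTaken;
}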

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_serial early wraparound

2017-02-26 Thread Thomas Munro
On Mon, Feb 27, 2017 at 7:28 PM, Thomas Munro
 wrote:
> On Wed, Nov 9, 2016 at 11:07 AM, Thomas Munro
>  wrote:
>> The SLRU managed by predicate.c can wrap around and overwrite data if
>> you have more than 1 billion active XIDs.  That's because when SSI was
>> implemented, slru.c was limited to four digit segment names, which
>> implied a page limit that wasn't enough for pg_serial to have space
>> for every possible XID.  We should probably rip that code out, because
>> SLRUs now support five digit segment names.  Something like the
>> attached.  I'll post a test script to demonstrate correct wraparound
>> behaviour in time for one of the later CFs.
>
> Here is a shell script that shows a full rotation through xid space if
> you build PostgreSQL with TEST_OLDSERXID, which you can do by
> uncommenting a line in predicate.c.  On master we see the SLRU
> segments go around the clock twice for each time xid goes around.
> With the patch it goes around just once, adding an extra character to
> the segment name to double the space.

I attached the wrong version.  Here is the right one.

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-slru-wraparound-test.sh
Description: Bourne shell script



Re: [HACKERS] pg_serial early wraparound

2017-02-26 Thread Thomas Munro
On Wed, Nov 9, 2016 at 11:07 AM, Thomas Munro
 wrote:
> The SLRU managed by predicate.c can wrap around and overwrite data if
> you have more than 1 billion active XIDs.  That's because when SSI was
> implemented, slru.c was limited to four digit segment names, which
> implied a page limit that wasn't enough for pg_serial to have space
> for every possible XID.  We should probably rip that code out, because
> SLRUs now support five digit segment names.  Something like the
> attached.  I'll post a test script to demonstrate correct wraparound
behaviour in time for one of the later CFs.

Here is a shell script that shows a full rotation through xid space if
you build PostgreSQL with TEST_OLDSERXID, which you can do by
uncommenting a line in predicate.c.  On master we see the SLRU
segments go around the clock twice for each time xid goes around.
With the patch it goes around just once, adding an extra character to
the segment name to double the space.

By the way, I think the real number of xids it can hold today is
(65536 * 32 * 8192) / sizeof(uint64) = 2^31 xids, not 2^30 as
indicated by an existing comment.  So I think there is actually enough
space and wrapping is probably harmless, but it seems cleaner and
simpler not to do that and to rip out the scary warning code, so I'll
add this to the CF.
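For the record, the arithmetic above as a tiny standalone check (the
constants are just the ones quoted in the previous paragraph, not read from
the tree):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* 65536 segments * 32 pages/segment * 8192 bytes/page, 8 bytes/entry */
	uint64_t	xids = (UINT64_C(65536) * 32 * 8192) / sizeof(uint64_t);

	printf("capacity = %llu xids, 2^31 = %llu\n",
		   (unsigned long long) xids,
		   (unsigned long long) (UINT64_C(1) << 31));
	return 0;
}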

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-slru-wraparound-test.sh
Description: Bourne shell script



Re: [HACKERS] increasing the default WAL segment size

2017-02-26 Thread Beena Emerson
Hello,

Since the following commit prevents the patch from applying cleanly,
I have attached a rebased patch 02.  Patch 01 does not have any problems.
commit 9e3755ecb2d058f7d123dd35a2e1784006190962
Author: Tom Lane 
Date:   Sat Feb 25 16:12:24 2017 -0500

Remove useless duplicate inclusions of system header files.


-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v2.patch
Description: Binary data



Re: [HACKERS] PGSERVICEFILE as a connection string parameter

2017-02-26 Thread David Fetter
On Sun, Feb 26, 2017 at 10:03:40PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-02-27 14:43:49 +0900, Michael Paquier wrote:
> > I bumped into a case where it would have been rather useful to
> > specify a service file path in a connection string with a service
> > name. In my case, I have finished by setting up PGSERVICEFILE, but
> > now like PGPASSFILE I think that being able to define the service
> > file available as well as a connection parameter would be useful
> > as well.
> >
> > I am not planning to work on that immediately (there is one day
> > left for the last CF of PG10!), but I was wondering if people
> > would be interested in something like that.
> 
> Hm - I'm not sure that's a good idea. service files are a libpq
> feature, but connection strings are a bit more universal than just
> libpq...

You bring up a salient point.  What say we make pg_services a little
more universal?  I'm guessing that the Java port wouldn't be too
complicated.  It's already well defined.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] PGSERVICEFILE as a connection string parameter

2017-02-26 Thread Andres Freund
Hi,

On 2017-02-27 14:43:49 +0900, Michael Paquier wrote:
> I bumped into a case where it would have been rather useful to specify
> a service file path in a connection string with a service name. In my
> case, I have finished by setting up PGSERVICEFILE, but now like
> PGPASSFILE I think that being able to define the service file
> available as well as a connection parameter would be useful as well.
>
> I am not planning to work on that immediately (there is one day left
> for the last CF of PG10!), but I was wondering if people would be
> interested in something like that.

Hm - I'm not sure that's a good idea. service files are a libpq feature,
but connection strings are a bit more universal than just libpq...

Regards,

Andres




Re: [HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Amit Kapila
On Mon, Feb 27, 2017 at 5:13 AM, Noah Misch  wrote:
> Dear 7b4ac19 authors,
>
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:
>
> 1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
>declare the ps_snapshot_data field to be of type SerializedSnapshotData.
>Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
>SerializedSnapshotData, to assert the variable-length nature.
>
> 2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
>have attached this in SerializedSnapshot-int64-v1.patch.
>
> 3. Change no declarations, and make snapmgr.c memcpy() the
>SerializedSnapshotData through a local buffer.  I have attached this as
>SerializedSnapshot-memcpy-v1.patch.
>
> I like (2) well enough, but I don't see that technique used elsewhere in the
> tree.
>

You seem to have forgotten to change all the callers of
Serialize/RestoreSnapshot to int64* which is causing below warnings on
my m/c:

1>src/backend/access/transam/parallel.c(334): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(338): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(1072): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(1078): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'


>  (1) is more typical of PostgreSQL, though I personally like it when
> structs can stay private to a file.  (3) is also well-attested, particularly
> in xlog replay code.  I am leaning toward (2).  Other opinions?
>

Your Approach-2 patch looks like a sane idea to me, if we decide to do
it some other way, I can write a patch unless you prefer to do it by
yourself.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Creation of "Future" commit fest, named 2017-07

2017-02-26 Thread Michael Paquier
Hi all,

As the next commit fest beginning soon, there will be no place where
hackers can park future patches for PG11~. So I have created a Future
CF named 2017-07 where you can do that.
Feel free to use it to park your future patches. The next development
schedule is undecided, so the name of this future CF may change. Or
not.
Thanks,
-- 
Michael




[HACKERS] PGSERVICEFILE as a connection string parameter

2017-02-26 Thread Michael Paquier
Hi all,

I bumped into a case where it would have been rather useful to specify
a service file path in a connection string with a service name. In my
case, I have finished by setting up PGSERVICEFILE, but now like
PGPASSFILE I think that being able to define the service file
available as well as a connection parameter would be useful as well.

I am not planning to work on that immediately (there is one day left
for the last CF of PG10!), but I was wondering if people would be
interested in something like that.
So, opinions?
-- 
Michael




Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-02-26 Thread Michael Paquier
On Sun, Feb 26, 2017 at 8:24 AM, Michael Paquier
 wrote:
> To be consistent with archive_command and restore_command I'd rather
> not do that. The command called can decide by itself what to do by
> looking at the shape of the argument string.

Just before the CF begins, I have taken some time to build up a patch
that implements this --end-segment-command, with %f as placeholder.
The patch is registered here:
https://commitfest.postgresql.org/13/1040/
Comments are welcome.
-- 
Michael


pgreceivewal-endseg-cmd.patch
Description: Binary data



Re: [HACKERS] Logical Replication and Character encoding

2017-02-26 Thread Kyotaro HORIGUCHI
Sorry for the absence.

At Fri, 24 Feb 2017 02:43:14 +, "Shinoda, Noriyoshi" 
 wrote in 

> >From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com] 
> >Sent: Friday, February 24, 2017 1:32 AM
> >To: Petr Jelinek ; Kyotaro HORIGUCHI 
> >
> >Cc: cr...@2ndquadrant.com; Shinoda, Noriyoshi ; 
> >pgsql-hackers@postgresql.org
> >Subject: Re: [HACKERS] Logical Replication and Character encoding
> >
> >On 2/17/17 10:14, Peter Eisentraut wrote:
> >> Well, it is sort of a libpq connection, and a proper libpq client 
> >> should set the client encoding, and a proper libpq server should do 
> >> encoding conversion accordingly.  If we just play along with this, it 
> >> all works correctly.
> >> 
> >> Other output plugins are free to ignore the encoding settings (just 
> >> like libpq can send binary data in some cases).
> >> 
> >> The attached patch puts it all together.
> >
> >committed
..
> However, in the case of PUBLICATION(UTF-8) and SUBSCRIPTION(EUC_JP) 
> environment, the following error was output and the process went down.
...
> LOG:  starting logical replication worker for subscription "sub1"
> LOG:  logical replication apply for subscription "sub1" has started
> ERROR:  insufficient data left in message
> LOG:  worker process: logical replication worker for subscription 16439 (PID 
> 22583) exited with exit code 1

Yeah, the patch sends the converted string with the length of the
original string. Encoding conversion usually changes the length of
a string; I doubt that the reverse case was working correctly.

As a result, pq_sendstring is not usable for this case, since we
don't have the true length of the string to be sent. So my first
patch did the same thing using pg_server_to_client() explicitly.

That being said, I think a more important thing is the consensus
about the policy for logical replication between databases with
different encodings, which is to refuse the connection. The reason
for that is that it surely breaks BDR usage for some combinations
of encodings.


Anyway the attached patch fixes the current bug about encoding in
logical replication.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index bc6e9b5..da81a2d 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,9 +443,13 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+			outputstr = pg_server_to_client(outputstr, strlen(outputstr));
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
-		pq_sendstring(out, outputstr);	/* data */
+		appendBinaryStringInfo(out, outputstr, len); /* data */
 
 		pfree(outputstr);
 



Re: [HACKERS] Speedup twophase transactions

2017-02-26 Thread Nikhil Sontakke
Hi Michael,

Thanks for taking a look at the patch.



> twophase.c: In function ‘PrepareRedoAdd’:
> twophase.c:2539:20: warning: variable ‘gxact’ set but not used
> [-Wunused-but-set-variable]
>   GlobalTransaction gxact;
> There is a warning at compilation.
>
>
Will fix.


> The comment at the top of PrescanPreparedTransactions() needs to be
> updated. Not only does this routine look for the contents of
> pg_twophase, but it does also look at the shared memory contents, at
> least those ones marked with inredo and not on_disk.
>
>
Oh yes. Will change comments at top of
StandbyRecoverPreparedTransactions(), RecoverPreparedTransactions() as well.


> +   ereport(WARNING,
> +   (errmsg("removing future two-phase state data from
> memory \"%u\"",
> +   xid)));
> +   PrepareRedoRemove(xid);
> +   continue
> Those are not normal (partially because unlink is atomic, but not
> durable)... But they match the correct coding pattern regarding
> incorrect 2PC entries... I'd really like to see those switched to a
> FATAL with unlink() made durable for those calls.
>
>
Hmm, not sure what exactly we need to do here. If you look at the prior
checks, there we already skip on-disk entries. So, typically, the entries
that we encounter here will be in shmem only.


> +   /* Deconstruct header */
> +   hdr = (TwoPhaseFileHeader *) buf;
> +   Assert(TransactionIdEquals(hdr->xid, xid));
> +
> +   if (TransactionIdPrecedes(xid, result))
> +   result = xid;
> This portion is repeated three times and could be easily refactored.
> You could just have a routine that returns the oldes transaction ID
> used, and ignore the result for StandbyRecoverPreparedTransactions()
> by casting a (void) to let any kind of static analyzer understand that
> we don't care about the result for example. Handling for subxids is
> necessary as well depending on the code path. Spliting things into a
> could of sub-routines may be more readable as well. There are really a
> couple of small parts that can be gathered and strengthened.
>
>
I will see if we can reduce this to a couple of function calls.
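As one possible shape for that (purely illustrative -- the helper name is
made up and TwoPhaseFileHeader stays private to twophase.c), the repeated
snippet could collapse into something like:

/*
 * Hypothetical helper: deconstruct a two-phase state header, sanity-check
 * it against the expected XID, and fold the XID into the running minimum
 * that the callers track.
 */
static TransactionId
ProcessTwoPhaseHeader(char *buf, TransactionId xid, TransactionId result)
{
	TwoPhaseFileHeader *hdr = (TwoPhaseFileHeader *) buf;

	Assert(TransactionIdEquals(hdr->xid, xid));

	if (TransactionIdPrecedes(xid, result))
		result = xid;

	return result;
}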


> +   /*
> +* Recreate its GXACT and dummy PGPROC
> +*/
> +   gxactnew = MarkAsPreparing(xid, gid,
> +   hdr->prepared_at,
> +   hdr->owner, hdr->database,
> +   gxact->prepare_start_lsn,
> +   gxact->prepare_end_lsn);
> MarkAsPreparing() does not need to be extended with two new arguments.
> RecoverPreparedTransactions() is used only at the end of recovery,
> where it is not necessary to look at the 2PC state files from the
> records. In this code path inredo is also set to false :)
>
>
That's not true. We will have entries with inredo set at the end of
recovery as well. In fact, the MarkAsPreparing() call
from RecoverPreparedTransactions() is the one which will remove these
inredo entries and convert them into regular entries. We have optimized the
recovery code path as well.


> +   {
> +   /*
> +* Entry could be on disk. Call with giveWarning=false
> +* since it can be expected during replay.
> +*/
> +   RemoveTwoPhaseFile(xid, false);
> +   }
> This would be removed at the end of recovery anyway as a stale entry,
> so that's not necessary.
>
>
Ok, will remove this.


> +   /* Delete TwoPhaseState gxact entry and/or 2PC file. */
> +   PrepareRedoRemove(parsed.twophase_xid);
> Both things should not be present, no? If the file is pushed to disk
> it means that the checkpoint horizon has already moved.
>
>
PREPARE in redo, followed by a checkpoint, followed by a COMMIT/ROLLBACK.
We can have both the bits set in this case.



> -   ereport(ERROR,
> +   /* It's ok to find an entry in the redo/recovery case */
> +   if (!gxact->inredo)
> +   ereport(ERROR,
> (errcode(ERRCODE_DUPLICATE_OBJECT),
>  errmsg("transaction identifier \"%s\" is already in
> use",
> gid)));
> +   else
> +   {
> +   found = true;
> +   break;
> +   }
> I would not have thought so.
>

Since we are using the TwoPhaseState structure to track redo entries, at
the end of recovery we will find existing entries. Please see my comments
above for RecoverPreparedTransactions().


> MarkAsPreparing and MarkAsPreparingInRedo really share the same code.
> What if the setup of the dummy PGPROC entry is made conditional?
>

I thought it was cleaner this way. We can definitely add a bunch of
if-else branches in MarkAsPreparing(), but it won't look pretty.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel bitmap heap scan

2017-02-26 Thread Dilip Kumar
On Sun, Feb 26, 2017 at 9:26 PM, Robert Haas  wrote:

Thanks for the review, I will work on these. There are some comments I
need suggestions.

> tbm_free_shared_area because the two iterators share one copy of the
> underlying iteration arrays, and the TBM code isn't smart enough to
> avoid freeing them twice.  You're going to have to come up with a
> better solution to that problem; nodeBitmapHeapScan.c shouldn't know
> about the way the underlying storage details are managed.  (Maybe you
> need to reference-count the iterator arrays?)

Yeah, I also think the current way doesn't look so clean. Currently, these
arrays are just integer arrays; maybe we can use the first slot of the
array for a reference count, or convert them to a structure which has space
for a reference count and an integer array. What do you suggest?

>
> +if (node->inited)
> +goto start_iterate;
>
> My first programming teacher told me not to use goto.  I've
> occasionally violated that rule, but I need a better reason than you
> have here.  It looks very easy to avoid.

Yes, this can be avoided; I was just trying to get rid of multi-level
if nesting and ended up with the "goto".

>
> +pbms_set_parallel(outerPlanState(node));
>
> I think this should be a flag in the plan, and the planner should set
> it correctly, instead of having it be a flag in the executor that the
> executor sets.  Also, the flag itself should really be called
> something that involves the word "shared" rather than "parallel",
> because the bitmap will not be created in parallel, but it will be
> shared.

Earlier, I thought that it would not be a good idea to set that flag in
the BitmapIndexScan path because the same path can be used either under a
ParallelBitmapHeapPath or under a normal BitmapHeapPath.  But now, after
putting some thought into it, I realised that we can do that in create_plan.
Therefore, I will change this.

>
> Have you checked whether this patch causes any regression in the
> non-parallel case?  It adds a bunch of "if" statements, so it might.
> Hopefully not, but it needs to be carefully tested.

During the initial patch development, I have tested this aspect also
but never published any results for the same. I will do that along
with my next patch and post the results.
>
> pbms_is_leader() looks horrifically inefficient.  Every worker will
> reacquire the spinlock for every tuple.  You should only have to enter
> this spinlock-acquiring loop for the first tuple.  After that, either
> you became the leader, did the initialization, and set the state to
> PBM_FINISHED, or you waited until the pre-existing leader did the
> same.  You should have a backend-local flag that keeps you from
> touching the spinlock for every tuple.  I wouldn't be surprised if
> fixing this nets a noticeable performance increase for this patch at
> high worker counts.

I think there is some confusion; if you notice, the code below will avoid
calling pbms_is_leader for every tuple.
It will be called only the first time. And after that, node->inited will
be true and it will directly jump to start_iterate for subsequent
calls.  Am I missing something?

> +if (node->inited)
> +goto start_iterate;


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-26 Thread Amit Langote
On 2017/02/27 13:35, Ashutosh Bapat wrote:
> On Mon, Feb 27, 2017 at 8:08 AM, Amit Langote
>  wrote:
>> On 2017/02/26 5:30, Simon Riggs wrote:
>>> On 23 February 2017 at 16:33, Simon Riggs  wrote:
>>>
  I'll be happy to review
>>>
>>> Patch looks OK so far, but fails on a partition that has partitions,
>>> probably because of the way we test relkind in the call to
>>> StoreCatalogInheritance1().
>>
>> Thanks for the review.
>>
>> I could not reproduce the failure you are seeing; could you perhaps share
>> the failing test case?  Here's mine that seems to work as expected:
>>
>> create table p (a int, b char) partition by list (a);
>> create table p1 (a int, b char) partition by list (b);
>> alter table p attach partition p1 for values in (1);
>>
>> -- add a partition to p1
>> create table p1a (like p1);
>> alter table p1 attach partition p1a for values in ('a');
>>
>> create table p2 partition of p for values in (2);
>>
>> \d+ p
>> 
>> Partition key: LIST (a)
>> Partitions: p1 FOR VALUES IN (1),
>> p2 FOR VALUES IN (2)
>>
>> -- this works (remember that p1 is a partitioned table)
>> drop table p1;
>> DROP TABLE
>>
>> \d+ p
>> 
>> Partition key: LIST (a)
>> Partitions: p2 FOR VALUES IN (2)
>>
>>> Please add a test for that so we can check automatically.
>>
>> OK, done.
> 
> Isn't list_range_parted a multilevel partitioned table? It gets dropped
> in the testcases. So, I guess, we already have a testcase there.

I thought Simon meant the test case where a partition that is itself
partitioned is dropped.  At least that's what I took from "... fails *on*
partition that has partitions".  So in the example I posted, drop table p1.

Anyway, there might be the confusion that *only* the root level
partitioned table is of RELKIND_PARTITIONED_TABLE.  That's not true - any
partitioned table (even one that's a partition) is of that relkind.  So
the condition in the call to StoreCatalogInheritance1() is correct.  The
following hunk:

@@ -10744,7 +10756,9 @@ CreateInheritance(Relation child_rel, Relation
parent_rel)
 StoreCatalogInheritance1(RelationGetRelid(child_rel),
  RelationGetRelid(parent_rel),
  inhseqno + 1,
- catalogRelation);
+ catalogRelation,
+ parent_rel->rd_rel->relkind ==
+RELKIND_PARTITIONED_TABLE);

Thanks,
Amit






Re: [HACKERS] Proposal : Parallel Merge Join

2017-02-26 Thread Amit Kapila
On Sun, Feb 26, 2017 at 12:01 PM, Robert Haas  wrote:
> On Fri, Feb 24, 2017 at 3:54 PM, Amit Kapila  wrote:
>> I agree in some cases it could be better, but I think benefits are not
>> completely clear, so probably we can leave it as of now and if later
>> any one comes with a clear use case or can see the benefits of such
>> path then we can include it.
>
> TBH, I think Dilip had the right idea here.  cheapest_total_inner
> isn't anything magical; it's just that there's no reason to use
> anything but the cheapest path for a relation when forming a join path
> unless that first path lacks some property that you need, like having
> the right sortkeys or being parallel-safe.  Since there are lots of
> join paths that just want to do things in the cheapest way possible,
> we identify the cheapest path and hang on to it; when that's not what
> we need, we go find the path or paths that have the properties we
> want.  I'm not sure why this case should be an exception.  You could
> argue that if the cheapest parallel-safe path is already more
> expensive then the parallel join may not pay off, but that's hard to
> say; it depends on what happens higher up in the plan tree.
>

Okay, but in that case don't you think it is better to consider the
parallel safety of cheapest_total_inner only when we don't find any
cheap parallel_safe innerpath by reducing the sort keys?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] A typo in mcxt.c

2017-02-26 Thread Kyotaro HORIGUCHI
I'm happy to learn such a thing. mcxt.c is a very stable part of the
code, so those like me who don't know such things rarely come across it.

At Thu, 23 Feb 2017 22:59:51 -0500, Tom Lane  wrote in 
<31880.1487908...@sss.pgh.pa.us>
> Andres Freund  writes:
> > On 2017-02-23 14:26:07 -0600, Jim Nasby wrote:
> >> On 2/23/17 6:38 AM, Thomas Munro wrote:
> >>> That is an archaic way of contracting the same words differently:
> 
> >> Given the number of non-native English speakers we have, it's probably 
> >> worth
> >> changing it...
> 
> > I'm a non-native speaker and I actually like discovering new language
> > "features" every now and then. I think as long as it's not inhibiting
> > understanding to much - which doesn't seem to be the case here - it's
> > fine to keep things like this.
> 
> While I don't recall it specifically, git blame says that comment is mine.
> I'm pretty sure it's not a typo, but that the allusion to Hamlet was
> intentional.  I think it's good to have a bit of levity and external
> references in our comments; cuts down on the boredom of reading totally
> dry code.
> 
> (But see commit d2783bee3 for one hazard of this sort of thing.)
> 
>   regards, tom lane

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-26 Thread Ashutosh Bapat
On Mon, Feb 27, 2017 at 8:08 AM, Amit Langote
 wrote:
> On 2017/02/26 5:30, Simon Riggs wrote:
>> On 23 February 2017 at 16:33, Simon Riggs  wrote:
>>
>>>  I'll be happy to review
>>
>> Patch looks OK so far, but fails on a partition that has partitions,
>> probably because of the way we test relkind in the call to
>> StoreCatalogInheritance1().
>
> Thanks for the review.
>
> I could not reproduce the failure you are seeing; could you perhaps share
> the failing test case?  Here's mine that seems to work as expected:
>
> create table p (a int, b char) partition by list (a);
> create table p1 (a int, b char) partition by list (b);
> alter table p attach partition p1 for values in (1);
>
> -- add a partition to p1
> create table p1a (like p1);
> alter table p1 attach partition p1a for values in ('a');
>
> create table p2 partition of p for values in (2);
>
> \d+ p
> 
> Partition key: LIST (a)
> Partitions: p1 FOR VALUES IN (1),
> p2 FOR VALUES IN (2)
>
> -- this works (remember that p1 is a partitioned table)
> drop table p1;
> DROP TABLE
>
> \d+ p
> 
> Partition key: LIST (a)
> Partitions: p2 FOR VALUES IN (2)
>
>> Please add a test for that so we can check automatically.
>
> OK, done.

Isn't list_range_parted a multilevel partitioned table? It gets dropped
in the testcases. So, I guess, we already have a testcase there.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Kyotaro HORIGUCHI
At Mon, 27 Feb 2017 10:43:39 +0900, Michael Paquier  
wrote in 
> On Mon, Feb 27, 2017 at 10:20 AM, Tom Lane  wrote:
> > Michael Paquier  writes:
> >> On Mon, Feb 27, 2017 at 10:12 AM, Tom Lane  wrote:
> >>> BTW ... can anyone explain to me the reason why we offer to complete
> >>> CREATE OBJECT with the names of existing objects of that kind?
> >
> >> Isn't that to facilitate commands appended after CREATE SCHEMA? Say
> >> table foo is in schema1, and creating it in schema2 gets easier with
> >> tab completion?
> >
> > Seems like pretty much of a stretch.  I've never done anything like
> > that, have you?
> 
> Never, but that was the only reason I could think about. I recall
> reading something else on -hackers but I cannot put my finger on it,
> nor does a lookup at the archives help... Perhaps that's the one I
> just mentioned as well.

I suppose it is for suggesting what kind of word should come
there, or avoiding silence for a tab. Or for symmetry with other
types of manipulation, like DROP. Another possibility is creating
multiple objects with similar names, say CREATE TABLE
employee_x1, CREATE TABLE employee_x2. Just trying to complete
existing *schema* is another possible objective.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] error detail when partition not found

2017-02-26 Thread Amit Langote
Thanks for the review.

On 2017/02/27 2:39, Robert Haas wrote:
> On Tue, Feb 21, 2017 at 7:28 AM, Amit Langote
>  wrote:
>> Simon pointed out in a nearby thread [0] that the detail part of
>> partition-not-found error should show just the partition keys.  I posted a
>> patch on that thread [1], but to avoid confusion being caused by multitude
>> of patches over there I'm re-posting it here.
> 
> Thanks.  GetPartitionFailureData seems like a strange name for a
> datatype, particularly the "Get" part.  How about
> PartitionRoutingFailureInfo?  Or just two out parameters.

Went with two out parameters instead of a new struct.

> Spelling: BuildSlotPartitinKeyDescription (in comment).

Fixed.

> ExecBuildSlotPartitionKeyDescription could have a comment saying that
> it's LIKE BuildIndexValueDescription() instead of copy-and-pasting the
> comments. And maybe BuildIndexValueDescription() could also get a
> comment saying that if we change anything there, we should check
> whether ExecBuildSlotPartitionKeyDescription() needs a similar change.

OK, I modified the comments, although I kept the comments a bit
different.

Updated patch is attached.

Thanks,
Amit
From e8da812092c8cdd9d847467fbad668afbb0f3df0 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 13 Feb 2017 19:52:19 +0900
Subject: [PATCH] Show only the partition key upon failing to find a partition

Currently, the whole row is shown without column names.  Instead,
adopt a style similar to _bt_check_unique() in ExecFindPartition()
and show the failing key in format (key1, ...)=(val1, ...).

The key description shown in the error message is now built by the
new ExecBuildSlotPartitionKeyDescription(), which works along the
lines of BuildIndexValueDescription(), using similar rules about
what columns of the partition key to include depending on the user's
privileges to view the same.

A bunch of relevant tests are added.
---
 src/backend/access/index/genam.c |   4 ++
 src/backend/catalog/partition.c  |  30 
 src/backend/executor/execMain.c  | 132 ++-
 src/backend/utils/adt/ruleutils.c|  37 +++---
 src/include/catalog/partition.h  |   8 ++-
 src/include/utils/ruleutils.h|   2 +
 src/test/regress/expected/insert.out |  38 --
 src/test/regress/sql/insert.sql  |  30 
 8 files changed, 231 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c4a393f34e..cb50b751c7 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -166,6 +166,10 @@ IndexScanEnd(IndexScanDesc scan)
  * The passed-in values/nulls arrays are the "raw" input to the index AM,
  * e.g. results of FormIndexDatum --- this is not necessarily what is stored
  * in the index, but it's what the user perceives to be stored.
+ *
+ * Note: if we change anything there, we should check whether
+ * ExecBuildSlotPartitionKeyDescription() in execMain.c needs a similar
+ * change.
  */
 char *
 BuildIndexValueDescription(Relation indexRelation,
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 4bcef58763..e01ef864f0 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -140,13 +140,6 @@ static int partition_bound_bsearch(PartitionKey key,
 		PartitionBoundInfo boundinfo,
 		void *probe, bool probe_is_bound, bool *is_equal);
 
-/* Support get_partition_for_tuple() */
-static void FormPartitionKeyDatum(PartitionDispatch pd,
-	  TupleTableSlot *slot,
-	  EState *estate,
-	  Datum *values,
-	  bool *isnull);
-
 /*
  * RelationBuildPartitionDesc
  *		Form rel's partition descriptor
@@ -1608,7 +1601,7 @@ generate_partition_qual(Relation rel)
  * the heap tuple passed in.
  * 
  */
-static void
+void
 FormPartitionKeyDatum(PartitionDispatch pd,
 	  TupleTableSlot *slot,
 	  EState *estate,
@@ -1672,7 +1665,8 @@ int
 get_partition_for_tuple(PartitionDispatch *pd,
 		TupleTableSlot *slot,
 		EState *estate,
-		Oid *failed_at)
+		PartitionDispatchData **failed_at,
+		TupleTableSlot **failed_slot)
 {
 	PartitionDispatch parent;
 	Datum		values[PARTITION_MAX_KEYS];
@@ -1693,13 +1687,6 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		TupleTableSlot *myslot = parent->tupslot;
 		TupleConversionMap *map = parent->tupmap;
 
-		/* Quick exit */
-		if (partdesc->nparts == 0)
-		{
-			*failed_at = RelationGetRelid(parent->reldesc);
-			return -1;
-		}
-
 		if (myslot != NULL && map != NULL)
 		{
 			HeapTuple	tuple = ExecFetchSlotTuple(slot);
@@ -1710,6 +1697,14 @@ get_partition_for_tuple(PartitionDispatch *pd,
 			slot = myslot;
 		}
 
+		/* Quick exit */
+		if (partdesc->nparts == 0)
+		{
+			*failed_at = parent;
+			*failed_slot = slot;
+			return -1;
+		}
+
 		/*
 		 * Extract partition key from tuple. Expression evaluation machinery
 		 * that 

Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-26 Thread Amit Kapila
On Sun, Feb 26, 2017 at 4:14 PM, Robert Haas  wrote:
> On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila  wrote:
>> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar  wrote:
>>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila  
>>> wrote:
 Sure, but that should only happen if the function is *not* declared as
 parallel safe (aka in parallel safe functions, we should not generate
 parallel plans).
>>>
>>> So basically we want to put a restriction that parallel-safe function
>>> can not use the parallel query? This will work but it seems too
>>> restrictive to me. Because by marking function parallel safe we enable
>>> it to be used with the outer parallel query that is fine. But, that
>>> should not restrict the function from using the parallel query if it's
>>> used with the other outer query which is not having the parallel
>>> plan(or function is executed directly).
>>
>> I think if the user is explicitly marking a function as parallel-safe,
>> then it doesn't make much sense to allow parallel query in such
>> functions as it won't be feasible for the planner (or at least it will
>> be quite expensive) to detect the same.  By the way, if the user has
>> any such expectation from a function, then he can mark the function as
>> parallel-restricted or parallel-unsafe.
>
> However, if a query is parallel-safe, it might not end up getting run
> in parallel.  In that case, it could still benefit from parallelism
> internally.  I think we want to allow that.  For example, suppose you
> run a query like:
>
> SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;
>
> If sometab isn't very big, it's probably better to use a non-parallel
> plan for this query, because then somewhat_expensive_function() can
> still use parallelism internally, which might be better. However, if
> sometab is large enough, then it might be better to parallelize the
> whole query using a Partial/FinalizeAggregate and force each call to
> somewhat_expensive_function() to run serially.
>

Is there any easy way to find out which way is less expensive?  Even
if we find some way, or just make a rule that when an outer query uses
parallelism we force function calls to run serially, how do we
achieve that?  I mean, in each worker we can ensure that each
individual statement from a function runs serially (by having a
check of IsParallelWorker() in the gather node), but having a similar
check in the master backend is tricky, or maybe we don't want to care
about it in the master backend.  Do you have any suggestions on how to
make it work?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] dropping partitioned tables without CASCADE

2017-02-26 Thread Amit Langote
On 2017/02/26 5:30, Simon Riggs wrote:
> On 23 February 2017 at 16:33, Simon Riggs  wrote:
> 
>>  I'll be happy to review
> 
> Patch looks OK so far, but fails on a partition that has partitions,
> probably because of the way we test relkind in the call to
> StoreCatalogInheritance1().

Thanks for the review.

I could not reproduce the failure you are seeing; could you perhaps share
the failing test case?  Here's mine that seems to work as expected:

create table p (a int, b char) partition by list (a);
create table p1 (a int, b char) partition by list (b);
alter table p attach partition p1 for values in (1);

-- add a partition to p1
create table p1a (like p1);
alter table p1 attach partition p1a for values in ('a');

create table p2 partition of p for values in (2);

\d+ p

Partition key: LIST (a)
Partitions: p1 FOR VALUES IN (1),
p2 FOR VALUES IN (2)

-- this works (remember that p1 is a partitioned table)
drop table p1;
DROP TABLE

\d+ p

Partition key: LIST (a)
Partitions: p2 FOR VALUES IN (2)

> Please add a test for that so we can check automatically.

OK, done.

Thanks,
Amit
From 419768af093d1d7b4bd9f57e2c9481227499aa1c Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between a inheritance
parent and child when creating the child.  That means one must
specify CASCADE to drop the parent table if a child table exists.
When creating partitions as inheritance children, create auto
dependency instead, so that partitions get dropped automatically
when the parent is dropped i.e., without specifying CASCADE.
---
 src/backend/commands/tablecmds.c   | 30 ---
 src/test/regress/expected/alter_table.out  | 10 
 src/test/regress/expected/create_table.out | 38 --
 src/test/regress/expected/inherit.out  | 22 ++---
 src/test/regress/expected/insert.out   |  7 ++
 src/test/regress/expected/update.out   |  7 +-
 src/test/regress/sql/alter_table.sql   | 10 
 src/test/regress/sql/create_table.sql  | 23 --
 src/test/regress/sql/inherit.sql   |  4 ++--
 src/test/regress/sql/insert.sql|  7 ++
 src/test/regress/sql/update.sql|  2 +-
 11 files changed, 87 insertions(+), 73 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..3753d66c9e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation);
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition);
 static int	findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
    Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -725,7 +727,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		  typaddress);
 
 	/* Store inheritance information for new rel. */
-	StoreCatalogInheritance(relationId, inheritOids);
+	StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
 
 	/*
 	 * We must bump the command counter to make the newly-created relation
@@ -2240,7 +2242,8 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
  * supers is a list of the OIDs of the new relation's direct ancestors.
  */
 static void
-StoreCatalogInheritance(Oid relationId, List *supers)
+StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition)
 {
 	Relation	relation;
 	int16		seqNumber;
@@ -2270,7 +2273,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
 	{
 		Oid			parentOid = lfirst_oid(entry);
 
-		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation);
+		StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
+ child_is_partition);
 		seqNumber++;
 	}
 
@@ -2283,7 +2287,8 @@ StoreCatalogInheritance(Oid relationId, List *supers)
  */
 static void
 StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation)
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition)
 {
 	TupleDesc	desc = RelationGetDescr(inhRelation);
 	Datum		values[Natts_pg_inherits];
@@ 

Re: [HACKERS] Instability in select_parallel regression test

2017-02-26 Thread Amit Kapila
On Sun, Feb 26, 2017 at 11:15 PM, Robert Haas  wrote:
> On Mon, Feb 20, 2017 at 7:52 AM, Amit Kapila  wrote:
>
>> The main point is if we
>> keep any loose end in this area, then there is a chance that the
>> regression test select_parallel can still fail, if not now, then in
>> future.  Another way could be that we can try to minimize the race
>> condition here and then adjust the select_parallel as suggested above
>> so that we don't see this failure.
>
> My guess is that if we apply the fix I suggested above, it'll be good
> enough.  If that turns out not to be true, then I guess we'll have to
> deal with that, but why not do the easy thing first?
>

Okay, that is also a sensible approach.  Your patch looks good to me,
though I have not tested it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] timeouts in PostgresNode::psql

2017-02-26 Thread Craig Ringer
Amended patch attached after a few Perl-related comments I got on
private mail. Instead of

$exc_save !~ /^$timeout_exception.*/

I've updated to:

$exc_save !~ /^\Q$timeout_exception\E/

i.e. don't do an unnecessary wildcard match at the end, and disable
metachar interpretation in the substituted range.

Still needs applying to pg9.6 and pg10.
From f96b78c19c09fda83b26b75cca500dd35c849002 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 24 Feb 2017 11:43:30 +0800
Subject: [PATCH] Fix timeouts in PostgresNode::psql

Newer Perl or IPC::Run versions default to appending the filename to string
exceptions, e.g. the exception

psql timed out

 is thrown as

psql timed out at /usr/share/perl5/vendor_perl/IPC/Run.pm line 2961.

To handle this, match exceptions with !~ rather than ne.
---
 src/test/perl/PostgresNode.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9b712eb..bd627b2 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1116,7 +1116,7 @@ sub psql
 			# IPC::Run::run threw an exception. re-throw unless it's a
 			# timeout, which we'll handle by testing is_expired
 			die $exc_save
-			  if (blessed($exc_save) || $exc_save ne $timeout_exception);
+			  if (blessed($exc_save) || $exc_save !~ /^\Q$timeout_exception\E/);
 
 			$ret = undef;
 
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Thomas Munro
On Mon, Feb 27, 2017 at 2:18 PM, Noah Misch  wrote:
> I wondered the same thing; if nothing else, why don't protosciurus and
> castoroides fail the same way?  They do use older compilers, "Sun C 5.10
> SunOS_sparc 2009/06/03" and gcc 3.4.3.  I have "Sun C 5.12 SunOS_sparc
> 2011/11/16" and gcc 4.9.2, both of which are alignment-sensitive in several
> configurations, according to the attached test program.  However, in a 32-bit
> build with this Sun C, I don't get alignment-related bus errors.  (Those
> animals build 64-bit, so this isn't the full story.)

It's been a while but I seem to recall that Sun C defaulted to a
-xmemalign setting that tolerated misaligned reads in 32 bit builds,
but used a different default on 64 bit builds.  "Solaris Application
Programming" 5.8.5 seems to confirm this:  "For 32-bit applications,
since Sun Studio 9, the default is for the compiler to assume 8-byte
alignment and to trap and correct any misaligned data accesses.  For
64-bit applications, the compiler assumes 8-byte alignment, but the
application will SIGBUS on a misaligned access."

Yet castoroides seems to be building with -m64, so that's not the
explanation.  Could it be correctly aligned by coincidence?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Michael Paquier
On Mon, Feb 27, 2017 at 10:20 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Mon, Feb 27, 2017 at 10:12 AM, Tom Lane  wrote:
>>> BTW ... can anyone explain to me the reason why we offer to complete
>>> CREATE OBJECT with the names of existing objects of that kind?
>
>> Isn't that to facilitate commands appended after CREATE SCHEMA? Say
>> table foo is in schema1, and creating it in schema2 gets easier with
>> tab completion?
>
> Seems like pretty much of a stretch.  I've never done anything like
> that, have you?

Never, but that was the only reason I could think of. I recall
reading something else on -hackers but I cannot put my finger on it,
nor does a search of the archives help... Perhaps that's the one I
just mentioned as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Feb 27, 2017 at 10:12 AM, Tom Lane  wrote:
>> BTW ... can anyone explain to me the reason why we offer to complete
>> CREATE OBJECT with the names of existing objects of that kind?

> Isn't that to facilitate commands appended after CREATE SCHEMA? Say
> table foo is in schema1, and creating it in schema2 gets easier with
> tab completion?

Seems like pretty much of a stretch.  I've never done anything like
that, have you?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Noah Misch
On Sun, Feb 26, 2017 at 07:53:15PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > Dear 7b4ac19 authors,
> > Field ps_snapshot_data usually receives four-byte alignment within
> > ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> > The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> > s10s_u11wos_24a SPARC", building with gcc 4.9.2.
> 
> It's a little distressing that the buildfarm didn't find this already.
> Is there some reason why it's specific to that particular compiler,
> rather than generic to alignment-picky 64-bit machines?

I wondered the same thing; if nothing else, why don't protosciurus and
castoroides fail the same way?  They do use older compilers, "Sun C 5.10
SunOS_sparc 2009/06/03" and gcc 3.4.3.  I have "Sun C 5.12 SunOS_sparc
2011/11/16" and gcc 4.9.2, both of which are alignment-sensitive in several
configurations, according to the attached test program.  However, in a 32-bit
build with this Sun C, I don't get alignment-related bus errors.  (Those
animals build 64-bit, so this isn't the full story.)

> In general, though, I agree that using a char[] member to represent
> anything that has any alignment requirement at all is seriously bad
> coding style that is almost certain to fail eventually.
> 
> A solution you didn't mention is to change the ParallelIndexScanDescData
> field to be a pointer, perhaps "struct SerializedSnapshotData *", while
> leaving that struct opaque so far as relscan.h is concerned.  This could
> avoid the need to use the unsafe blind casts that I'm sure must be
> involved in accesses to that field at present.

That, too, would be reasonable.
/* run with: for n in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do ./a.out $n; 
done */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
char foo[] = "abcdefghijklmnopqrstuvwxyz";
if (argc != 2)
{
puts("bad usage");
return 1;
}

printf("%llx\n", *(long long *)(foo + atoi(argv[1])));
return 0;
}
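
For comparison, a separate sketch (not part of the attachment above): the
usual workaround is to memcpy() the possibly-misaligned bytes into an
aligned local before using them, which is also the essence of the
memcpy-based fix attached elsewhere in this thread.  This variant runs
cleanly at every offset:

/*
 * Companion sketch, for comparison only: copying through an aligned
 * local with memcpy() avoids the misaligned load entirely.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char		foo[] = "abcdefghijklmnopqrstuvwxyz";
	long long	val;
	int			i;

	for (i = 0; i < 16; i++)
	{
		memcpy(&val, foo + i, sizeof(val));	/* safe at any alignment */
		printf("%llx\n", val);
	}
	return 0;
}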

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Michael Paquier
On Mon, Feb 27, 2017 at 10:12 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Mon, Feb 27, 2017 at 5:21 AM, Tom Lane  wrote:
>>> So I'd be a whole lot happier if it didn't do that.  Can we really not
>>> add the desired features in a more localized fashion?
>
>> As "if not exists" is defined after the object type if would not be
>> that complicated to add completion for IE/INE after the object type
>> with a set of THING_* flags in words_after_create. One missing piece
>> would be to add completion for the objects themselves after IE or INE
>> have been entered by the user, but I would think that tweaking the
>> checks on words_after_create[i] would be doable as well. And that
>> would be localized.
>
> BTW ... can anyone explain to me the reason why we offer to complete
> CREATE OBJECT with the names of existing objects of that kind?
> That seems pretty darn stupid.  I can see offering the names of existing
> schemas there, if the object type is one that has schema-qualified names,
> but completing with an existing object name is just setting up to fail
> isn't it?

Isn't that to facilitate commands appended after CREATE SCHEMA? Say
table foo is in schema1, and creating it in schema2 gets easier with
tab completion?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Feb 27, 2017 at 5:21 AM, Tom Lane  wrote:
>> So I'd be a whole lot happier if it didn't do that.  Can we really not
>> add the desired features in a more localized fashion?

> As "if not exists" is defined after the object type if would not be
> that complicated to add completion for IE/INE after the object type
> with a set of THING_* flags in words_after_create. One missing piece
> would be to add completion for the objects themselves after IE or INE
> have been entered by the user, but I would think that tweaking the
> checks on words_after_create[i] would be doable as well. And that
> would be localized.

BTW ... can anyone explain to me the reason why we offer to complete
CREATE OBJECT with the names of existing objects of that kind?
That seems pretty darn stupid.  I can see offering the names of existing
schemas there, if the object type is one that has schema-qualified names,
but completing with an existing object name is just setting up to fail
isn't it?

If we dropped that behavior, seems like it would become much easier
to plug in IF NOT EXISTS at those spots.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2017-02-26 Thread Michael Paquier
On Sun, Feb 26, 2017 at 5:56 PM, Robert Haas  wrote:
> On Sat, Feb 25, 2017 at 7:28 PM, Michael Paquier
>  wrote:
>> - pg_basebackup -R generates recovery.conf.auto.
>
> Does anything cause that file to get read?
>
> Wouldn't it be better to just append to postgresql.conf.auto?

Yeah, that would be cleaner than having the backend look for an extra
hardcoded path. Looking at pg_basebackup.c, it would actually not be
difficult to append data to an existing file: look for the file in the
tar stream, and when it is found, save its content for later and bypass
it. Once the tar stream is written, just take the data saved previously
and append the parameters at the end of it.
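
As a very rough sketch of the first step -- purely illustrative, not
pg_basebackup code, and ignoring header checksums, long names and other
ustar details -- recognizing the member of interest in the streamed
archive could look something like this:

/*
 * Purely illustrative sketch, not pg_basebackup code: spot a member in a
 * streamed ustar archive (512-byte header blocks) so its contents can be
 * stashed and re-emitted later with the extra parameters appended.
 */
#include <stdlib.h>
#include <string.h>

/*
 * If the 512-byte header block names "target", return the member's data
 * size (so the caller knows how many data blocks to stash); else -1.
 */
static long
tar_member_size_if_match(const char *header, const char *target)
{
	char		name[101];

	memcpy(name, header, 100);		/* name field: offset 0, 100 bytes */
	name[100] = '\0';

	if (strcmp(name, target) != 0)
		return -1;

	/* size field: offset 124, 12 bytes, octal, NUL/space terminated */
	return strtol(header + 124, NULL, 8);
}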
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Tom Lane
Noah Misch  writes:
> Dear 7b4ac19 authors,
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.

It's a little distressing that the buildfarm didn't find this already.
Is there some reason why it's specific to that particular compiler,
rather than generic to alignment-picky 64-bit machines?

In general, though, I agree that using a char[] member to represent
anything that has any alignment requirement at all is seriously bad
coding style that is almost certain to fail eventually.

A solution you didn't mention is to change the ParallelIndexScanDescData
field to be a pointer, perhaps "struct SerializedSnapshotData *", while
leaving that struct opaque so far as relscan.h is concerned.  This could
avoid the need to use the unsafe blind casts that I'm sure must be
involved in accesses to that field at present.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Michael Paquier
On Mon, Feb 27, 2017 at 5:21 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Yeah, maybe, but we'd need a committer to take more of an interest in
>> this patch series.  Personally, I'm wondering why we need a series of
>> 19 patches to add tab completion support for IF NOT EXISTS.  The
>> feature which is the subject of this thread arrives in patch 0017, and
>> a lot of the patches which come before that seem to change a lot of
>> stuff without actually improving much that would really benefit users.
>
> FWIW, one reason this committer hasn't jumped in is that we already
> rewrote tab-complete.c pretty completely in 9.6.  If we accept a patch
> that completely rewrites it again, we're going to be faced with
> maintaining three fundamentally different implementations for the next
> three-plus years (until 9.5 dies).  Admittedly, we don't back-patch
> fixes in tab-complete.c every week, but a look at the git history says
> we do need to do that several times a year.

Indeed, having worked on the 9.6 refactoring a bit as well... I'll
vote for not doing this again as HEAD is in a more readable shape
compared to the pre-9.5 era, and I am not convinced that it is worth
the trouble. There are a couple of things that can be extracted from
this set of patches, but I would vote for not doing the same level of
refactoring.

> Also, the nature of the primary refactoring (changing the big else-chain
> into standalone ifs, if I read it correctly) is particularly bad from a
> back-patching standpoint because all you have to do is insert an "else",
> or fail to insert one, to silently and almost completely break either
> one or the other branch.  And I don't really understand why that's a good
> idea anyway: surely we can return at most one set of completions, so how
> is turning the function into a lot of independent actions a win?
>
> So I'd be a whole lot happier if it didn't do that.  Can we really not
> add the desired features in a more localized fashion?

As "if not exists" is defined after the object type if would not be
that complicated to add completion for IE/INE after the object type
with a set of THING_* flags in words_after_create. One missing piece
would be to add completion for the objects themselves after IE or INE
have been entered by the user, but I would think that tweaking the
checks on words_after_create[i] would be doable as well. And that
would be localized.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-02-26 Thread Tatsuo Ishii
> On Wed, Feb 22, 2017 at 11:50:44AM +0900, Tatsuo Ishii wrote:
>> Last year I have proposed an enhancement regarding behavior of the
>> statement timeout in extended queries.
>> 
>> https://www.postgresql.org/message-id/20160528.220442.1489791680347556026.t-ishii%40sraoss.co.jp
>> 
>> IMO the current behavior is counter intuitive and I would like to
>> change it toward PostgreSQL 10.0.
>> 
>> For example, suppose that the timeout is set to 4 seconds and the
>> first query takes 2 seconds and the second query takes 3 seconds. Then
>> the statement timeout is triggered if a sync message is sent to
>> backend after the second query.
>> 
>> Moreover, log_duration or log_min_duration_statement shows that each
>> query took 2 or 3 seconds of course, which is not very consistent with
>> the statement timeout IMO.
>> 
>> Attached patch tries to change the behavior, by checking statement
>> timeout against each phase of an extended query.
>> 
>> To test the patch, I have created a small tool called "pgproto", which
>> can issue arbitrary sequence of frontend/backend message, reading from a
>> text file.
>> 
>> https://github.com/tatsuo-ishii/pgproto
>> (to build the program, you need C compiler and libpq)
> 
> Does it seem reasonable to start making this into a regression test
> and/or fuzz test for the protocol itself?

I personally think the regression tests ought to include tests for
the extended query protocol, and pgproto could be a useful tool to
implement that. Of course, if we are going in that direction, pgproto
needs to be a contrib module first.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SerializedSnapshotData alignment

2017-02-26 Thread Noah Misch
Dear 7b4ac19 authors,

Field ps_snapshot_data usually receives four-byte alignment within
ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:

1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
   declare the ps_snapshot_data field to be of type SerializedSnapshotData.
   Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
   SerializedSnapshotData, to assert the variable-length nature.

2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
   have attached this in SerializedSnapshot-int64-v1.patch.

3. Change no declarations, and make snapmgr.c memcpy() the
   SerializedSnapshotData through a local buffer.  I have attached this as
   SerializedSnapshot-memcpy-v1.patch.

I like (2) well enough, but I don't see that technique used elsewhere in the
tree.  (1) is more typical of PostgreSQL, though I personally like it when
structs can stay private to a file.  (3) is also well-attested, particularly
in xlog replay code.  I am leaning toward (2).  Other opinions?

Field phs_snapshot_data happens to receive eight-byte alignment within
ParallelHeapScanDescData; otherwise, such scans would fail the same way.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f232c84..a46a73e 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2013,7 +2013,7 @@ EstimateSnapshotSpace(Snapshot snap)
  * memory location at start_address.
  */
 void
-SerializeSnapshot(Snapshot snapshot, char *start_address)
+SerializeSnapshot(Snapshot snapshot, int64 *start_address)
 {
SerializedSnapshotData *serialized_snapshot;
 
@@ -2069,7 +2069,7 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
  * to 0.  The returned snapshot has the copied flag set.
  */
 Snapshot
-RestoreSnapshot(char *start_address)
+RestoreSnapshot(int64 *start_address)
 {
SerializedSnapshotData *serialized_snapshot;
 	Size		size;
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce3ca8d..3a5c5c9 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -38,7 +38,7 @@ typedef struct ParallelHeapScanDescData
 	slock_t		phs_mutex;		/* mutual exclusion for block number fields */
BlockNumber phs_startblock; /* starting block number */
BlockNumber phs_cblock; /* current block number */
-   charphs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
+   int64   phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
 }  ParallelHeapScanDescData;
 
 typedef struct HeapScanDescData
@@ -138,7 +138,7 @@ typedef struct ParallelIndexScanDescData
Oid ps_relid;
Oid ps_indexid;
 	Size		ps_offset;		/* Offset in bytes of am specific structure */
-   charps_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
+   int64   ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
 } ParallelIndexScanDescData;
 
 /* Struct for heap-or-index scans of system tables */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index ab95366..4b5feb7 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -106,8 +106,8 @@ extern void TeardownHistoricSnapshot(bool is_error);
 extern bool HistoricSnapshotActive(void);
 
 extern Size EstimateSnapshotSpace(Snapshot snapshot);
-extern void SerializeSnapshot(Snapshot snapshot, char *start_address);
-extern Snapshot RestoreSnapshot(char *start_address);
+extern void SerializeSnapshot(Snapshot snapshot, int64 *start_address);
+extern Snapshot RestoreSnapshot(int64 *start_address);
 extern void RestoreTransactionSnapshot(Snapshot snapshot, void *master_pgproc);
 
 #endif   /* SNAPMGR_H */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f232c84..e6d7a19 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2015,34 +2015,35 @@ EstimateSnapshotSpace(Snapshot snap)
 void
 SerializeSnapshot(Snapshot snapshot, char *start_address)
 {
-   SerializedSnapshotData *serialized_snapshot;
+   SerializedSnapshotData serialized_snapshot;
 
Assert(snapshot->subxcnt >= 0);
 
-   serialized_snapshot = (SerializedSnapshotData *) start_address;
-
/* Copy all required fields */
-   serialized_snapshot->xmin = snapshot->xmin;
-   serialized_snapshot->xmax = snapshot->xmax;
-   serialized_snapshot->xcnt = snapshot->xcnt;
-   serialized_snapshot->subxcnt = snapshot->subxcnt;
-   serialized_snapshot->suboverflowed = snapshot->suboverflowed;
-	serialized_snapshot->takenDuringRecovery = snapshot->takenDuringRecovery;
-   serialized_snapshot->curcid = snapshot->curcid;
-   

Re: [HACKERS] Proposal for changes to recovery.conf API

2017-02-26 Thread Josh Berkus
On 02/26/2017 12:55 AM, Robert Haas wrote:
> On Wed, Jan 11, 2017 at 11:23 PM, Simon Riggs  wrote:
>>> I think the issue was that some people didn't want configuration files
>>> in the data directory.  By removing recovery.conf we accomplish that.
>>> Signal/trigger files are not configuration (or at least it's much easier
>>> to argue that), so I think having them in the data directory is fine.
>>
>> There were a considerable number of people that pushed to make the
>> data directory non-user writable, which is where the signal directory
>> came from.
> 
> Specifically, it's a problem for Debian's packaging conventions,
> right?  The data directory can contain anything that the server itself
> will write, but configuration files that are written for the server to
> read are supposed to go in some external location dictated by Debian's
> packaging policy.
> 
> Things like trigger files aren't configuration files per se, so maybe
> it's OK if those still get written into the data directory.  Even if
> not, that seems like a separate patch.  In my view, based on Michael's
> description of what the current patch version does, it's a clear step
> forward.  Other steps can be taken at another time, if required.
> 

From the perspective of containerized Postgres, you want config files to
go into one (non-writeable) directory, and anything which is writeable
by the DB server to go into another directory (and preferably, a single
directory).

A trigger file (that is, assuming an empty one, and recovery config
merged with pg.conf) would thus be writeable, non-configuration data
which goes in the data directory.

Users manually writing the trigger file doesn't show up as a problem
since, in a containerized environment, they can't.  It's either written
by postgres itself, or by management software which runs as the postgres
user.

-- 
Josh Berkus
Containers & Databases Oh My!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-26 Thread Michael Paquier
On Mon, Feb 27, 2017 at 3:12 AM, Robert Haas  wrote:
> Thanks!  Committed.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-02-26 Thread Michael Paquier
On Fri, Feb 24, 2017 at 11:58 AM, Jim Nasby  wrote:
> Why not provide % replacements that contain that info? pg_receivexlog has a
> much better shot at doing that correctly than some random user tool...
>
> (%f could certainly be useful for other things)

(was unfortunately sent off-list, thanks Jim!)
%f maps to what archive_command and restore_command use, so for
consistency this makes sense, at least to me. And I don't believe that
all users will actually need this argument; see my use case upthread.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-02-26 Thread David Fetter
On Wed, Feb 22, 2017 at 11:50:44AM +0900, Tatsuo Ishii wrote:
> Last year I have proposed an enhancement regarding behavior of the
> statement timeout in extended queries.
> 
> https://www.postgresql.org/message-id/20160528.220442.1489791680347556026.t-ishii%40sraoss.co.jp
> 
> IMO the current behavior is counter intuitive and I would like to
> change it toward PostgreSQL 10.0.
> 
> For example, suppose that the timeout is set to 4 seconds and the
> first query takes 2 seconds and the second query takes 3 seconds. Then
> the statement timeout is triggered if a sync message is sent to
> backend after the second query.
> 
> Moreover, log_duration or log_min_duration_statement shows that each
> query took 2 or 3 seconds of course, which is not very consistent with
> the statement timeout IMO.
> 
> Attached patch tries to change the behavior, by checking statement
> timeout against each phase of an extended query.
> 
> To test the patch, I have created a small tool called "pgproto", which
> can issue arbitrary sequence of frontend/backend message, reading from a
> text file.
> 
> https://github.com/tatsuo-ishii/pgproto
> (to build the program, you need C compiler and libpq)

Does it seem reasonable to start making this into a regression test
and/or fuzz test for the protocol itself?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-26 Thread Andrew Dunstan


On 02/26/2017 03:26 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> This works for the btree_gin case. However, there's a difficulty for
>> btree_gist - in the picksplit routine the cmp function is passed to
>> qsort() so there is no chance to pass it an flinfo to set up the call to
>> the real comparison routine. Implementing a custom sort routine to work
>> around the problem seems a bridge too far. I can't think of an
>> alternative off hand.
> We already have qsort_arg ... can't you change it to use that?
>
>   


Yes, I wasn't aware of that; it looks like exactly what I need. Thanks.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-26 Thread Tom Lane
Magnus Hagander  writes:
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck 
> wrote:
>> ISTM the consensus is that there should be no output in regular mode,
>> but a message should be displayed in verbose and progress mode.

> Agreed, and applied as one patch.

Is there an argument for back-patching this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-26 Thread Magnus Hagander
On Sun, Feb 26, 2017 at 9:53 PM, Michael Banck 
wrote:

> Hi,
>
> Am Sonntag, den 26.02.2017, 21:32 +0100 schrieb Magnus Hagander:
>
> > On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
> >  wrote:
>
> > Agreed, and applied as one patch. Except I noticed you also fixed a
> > couple of entries which were missing the progname in the messages -- I
> > broke those out to a separate patch instead.
>
> Thanks!
>
> > Made a small change to "using as much I/O as available" rather than
> > "as possible", which I think is a better wording, along with the
> > change of the idle wording I suggested before. (but feel free to point
> > it out to me if that's wrong).
>
> LGTM, I apparently missed your suggestion when I re-read the thread.
>
> I am just wondering whether this could/should be back-patched, maybe? It
> is not a bug fix, of course, but OTOH is rather small and probably
> helpful to some users on current releases.
>
>
Good point. We should definitely back-patch the documentation updates.

Not 100% sure about the others, as it's a small behaviour change. But since
it's only in verbose mode, I doubt it is very likely to break anybody's
scripts that rely on particular output.

What do others think?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-26 Thread Michael Banck
Hi,

Am Sonntag, den 26.02.2017, 21:32 +0100 schrieb Magnus Hagander:

> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
>  wrote:

> Agreed, and applied as one patch. Except I noticed you also fixed a
> couple of entries which were missing the progname in the messages -- I
> broke those out to a separate patch instead.

Thanks!

> Made a small change to "using as much I/O as available" rather than
> "as possible", which I think is a better wording, along with the
> change of the idle wording I suggested before. (but feel free to point
> it out to me if that's wrong). 

LGTM, I apparently missed your suggestion when I re-read the thread.

I am just wondering whether this could/should be back-patched, maybe? It
is not a bug fix, of course, but OTOH it is rather small and probably
helpful to some users on current releases.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-26 Thread Magnus Hagander
On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck 
wrote:

> Hi,
>
> Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
> > On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> >  wrote:
> > > I'd rather have a --quiet mode instead.  If you're running it by hand,
> > > you're likely to omit the switch, whereas when writing the cron job
> > > you're going to notice lack of switch even before you let the job run
> > > once.
> >
> > Well, that might've been a better way to design it, but changing it
> > now would break backward compatibility and I'm not really sure that's
> > a good idea.  Even if it is, it's a separate concern from whether or
> > not in the less-quiet mode we should point out that we're waiting for
> > a checkpoint on the server side.
>
> ISTM the consensus is that there should be no output in regular mode,
> but a message should be displayed in verbose and progress mode.
>
> So I went forth and also added a message in progress mode (unless
> verbose messages are requested anyway).
>
> Regarding the documentation, I tried to clarify the difference between
> the checkpoint types a bit more, but I think any further action is
> probably a larger rewrite of the documentation on this topic.
>
> So attached are two patches, I've split it up in the documentation and
> the code output part. I'll add it as one commitfest entry in the
> "Clients" section though, as it's not really a big patch, unless
> somebody thinks it should have a second entry in "Documentation"?


Agreed, and applied as one patch. Except I noticed you also fixed a couple
of entries which were missing the progname in the messages -- I broke those
out to a separate patch instead.

Made a small change to "using as much I/O as available" rather than "as
possible", which I think is a better wording, along with the change of the
idle wording I suggested before. (but feel free to point it out to me if
that's wrong).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-26 Thread Tom Lane
Andrew Dunstan  writes:
> This works for the btree_gin case. However, there's a difficulty for
> btree_gist - in the picksplit routine the cmp function is passed to
> qsort() so there is no chance to pass it an flinfo to set up the call to
> the real comparison routine. Implementing a custom sort routine to work
> around the problem seems a bridge too far. I can't think of an
> alternative off hand.

We already have qsort_arg ... can't you change it to use that?
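
For illustration, a rough sketch with made-up names (not the actual
btree_gist code): qsort_arg() hands its extra pointer to the comparator,
so the flinfo can ride along and the comparator can invoke the type's
real comparison function:

/*
 * Rough illustrative sketch with made-up names, not the actual
 * btree_gist code: the FmgrInfo set up by the caller travels through
 * qsort_arg()'s "arg" parameter into the comparator.
 */
#include "postgres.h"
#include "fmgr.h"

typedef struct SortItem
{
	Datum		key;
	int			pos;
} SortItem;

static int
sortitem_cmp(const void *a, const void *b, void *arg)
{
	const SortItem *ia = (const SortItem *) a;
	const SortItem *ib = (const SortItem *) b;
	FmgrInfo   *flinfo = (FmgrInfo *) arg;

	/* call the datatype's real comparison function via the passed flinfo */
	return DatumGetInt32(FunctionCall2(flinfo, ia->key, ib->key));
}

static void
sort_items(SortItem *items, int nitems, FmgrInfo *flinfo)
{
	/* qsort_arg() is declared in port.h, which postgres.h pulls in */
	qsort_arg(items, nitems, sizeof(SortItem), sortitem_cmp, flinfo);
}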

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Tom Lane
Robert Haas  writes:
> Yeah, maybe, but we'd need a committer to take more of an interest in
> this patch series.  Personally, I'm wondering why we need a series of
> 19 patches to add tab completion support for IF NOT EXISTS.  The
> feature which is the subject of this thread arrives in patch 0017, and
> a lot of the patches which come before that seem to change a lot of
> stuff without actually improving much that would really benefit users.

FWIW, one reason this committer hasn't jumped in is that we already
rewrote tab-complete.c pretty completely in 9.6.  If we accept a patch
that completely rewrites it again, we're going to be faced with
maintaining three fundamentally different implementations for the next
three-plus years (until 9.5 dies).  Admittedly, we don't back-patch
fixes in tab-complete.c every week, but a look at the git history says
we do need to do that several times a year.

Also, the nature of the primary refactoring (changing the big else-chain
into standalone ifs, if I read it correctly) is particularly bad from a
back-patching standpoint because all you have to do is insert an "else",
or fail to insert one, to silently and almost completely break either
one or the other branch.  And I don't really understand why that's a good
idea anyway: surely we can return at most one set of completions, so how
is turning the function into a lot of independent actions a win?

So I'd be a whole lot happier if it didn't do that.  Can we really not
add the desired features in a more localized fashion?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-26 Thread Andrew Dunstan


On 02/25/2017 01:39 PM, Andrew Dunstan wrote:
>
> On 02/25/2017 01:34 PM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> On 02/25/2017 12:04 PM, Tom Lane wrote:
 I think it'd be better to leave DirectFunctionCallN alone and just invent
 a small number of CallerFInfoFunctionCallN support functions (maybe N=1
 and N=2 would be enough, at least for now).
>>> See attached.
>> Yeah, I like this better, except that instead of
>>
>> + * The callee should not look at anything except the fn_mcxt and fn_extra.
>> + * Anything else is likely to be bogus.
>>
>> maybe
>>
>> + * It's recommended that the callee only use the fn_extra and fn_mcxt
>> + * fields, as other fields will typically describe the calling function
>> + * not the callee.  Conversely, the calling function should not have
>> + * used fn_extra, unless its use is known compatible with the callee's.
>>
>>  
>
> OK, Works for me. Thanks.
>


This works for the btree_gin case. However, there's a difficulty for
btree_gist - in the picksplit routine the cmp function is passed to
qsort(), so there is no chance to pass it an flinfo to set up the call to
the real comparison routine. Implementing a custom sort routine to work
around the problem seems a bridge too far. I can't think of an
alternative offhand.

cheers

andrew





-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-02-26 Thread Corey Huinker
On Sun, Feb 26, 2017 at 2:47 AM, Fabien COELHO  wrote:

>
> Hello Corey,
>
> About v18: Patch applies, make check ok, psql tap tests ok.
>
>
> ISTM that contrary to the documentation "\elif something" is not evaluated
> in all cases, and the resulting code is harder to understand with a nested
> switch and condition structure:
>
>   switch
>   default
> read
> if
>switch
>
> I wish (this is a personal taste) it could avoid switch-nesting on the
> very same value. It should also conform to the documentation.
>

I wasn't too happy with it, but I figured it would spark discussion. I
succeeded.


>
> If there is no compelling reason for the switch-nesting, I would suggest
> to move the read_boolean_expression before the swich, to deal with error
> immediately there, and then to have just one switch.
>

I thought about doing it that way. However, in the case of:


\set x invalid
\if true
\else
\elif :x
\endif


The error has already "happened" at line 4, char 5; no matter what
expression follows, you will get an error.  But because
read_boolean_expression() can emit errors, you would see the error saying
"invalid" isn't a valid boolean expression, and then see another error
saying that the \elif was out of place. If we suppress
read_boolean_expression()'s error reporting, then we have to re-create that
error message ourselves, or accept that the error message for invalid
\elifs is inconsistent with the one for invalid \ifs.

Similar to your suggestion below, we could encapsulate the first switch
into a function valid_elif_context(ConditionalStack), which might make the
code cleaner, but would make it harder to see that all switch cases are
covered between the two. That might be a tradeoff we want to make.


>
> Alternatively if the structure must really be kept, then deal with errors
> in a first switch, read value *after* switch and deal with other errors
> there, then start a second switch, and adjust the documentation accordingly?
>
>   switch
> errors
>   read
>   if
> errors
>   // no error
>   switch
>


How would the documentation have to change?

> Also, the %R documentation has single line marker '^' before not executed
> '@', but it is the reverse in the code.


Noted and fixed in the next patch, good catch.


Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-02-26 Thread Tom Lane
Andrew Dunstan  writes:
> [ we should borrow plv8's start_proc idea for pltcl ]

So after thinking about this for awhile, I propose the following
concrete spec for replacing pltcl's autoload-unknown behavior:

* Invent two GUCs, pltcl.start_proc and pltclu.start_proc, which default
to empty strings but can be set to the name (possibly schema-qualified)
of a parameterless function that must be of the corresponding language.
When so set, the specified function is called once just after creation of
any pltcl or pltclu interpreter.

* The function is called as the current SQL user, who must have
permissions to call it.  (This decision is more or less forced by
the fact that pltcl interpreters are per-userid; we want whatever
initialization gets done to be done in the new interpreter, and it
would be very weird and probably a security hole if we weren't
running as the same SQL userid that owns the interpreter.)

* Pre-call error conditions (no such function, wrong language, or no
permissions) result in a WARNING but the original operation continues.
(Making these WARNING not ERROR is possibly debatable, but it looks
like that's what plv8 does.)

* If the function itself throws an error, we do a transaction abort,
but the Tcl interpreter remains in existence and is deemed usable
for later operations.  So a failing start_proc can't lock you out
of pltcl operations altogether.

* I'm not terribly comfortable about what the permissions levels of the
GUCs ought to be.  The call permissions check means that you can't use
either GUC to call a function you couldn't have called anyway.  However
there's a separate risk of trojan-horse execution, analogous to what a
blackhat can get by controlling the search_path GUC setting used by a
SECURITY DEFINER function: the function might intend to invoke some pltcl
function, but you can get it to invoke some other pltcl function in
addition to that.  I think this means we had better make pltclu.start_proc
be SUSET, but from a convenience standpoint it'd be nice if
pltcl.start_proc were just USERSET.  An argument in favor of that is that
we don't restrict search_path which is just as dangerous; but on the other
hand, existing code should be expected to know that it needs to beware of
search_path, while it wouldn't know that start_proc needs to be locked
down.  Maybe we'd better make them both SUSET.
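
(For concreteness, a hypothetical sketch of how pltcl might register one of
the proposed knobs -- the context value, SUSET vs. USERSET, being exactly
the open question above:)

/*
 * Hypothetical sketch only; the GUC name comes from the proposal above
 * and the context choice is the point under discussion.
 */
#include "postgres.h"
#include "utils/guc.h"

static char *pltcl_start_proc = NULL;

void
_PG_init(void)
{
	DefineCustomStringVariable("pltcl.start_proc",
							   "Tcl function to call at interpreter creation.",
							   NULL,
							   &pltcl_start_proc,
							   "",
							   PGC_SUSET,	/* or PGC_USERSET, per above */
							   0,
							   NULL, NULL, NULL);
}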

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Pavel Stehule
2017-02-26 19:43 GMT+01:00 Robert Haas :

> On Wed, Feb 22, 2017 at 12:38 AM, Pavel Stehule 
> wrote:
> > Now first patch is broken :(
> >
> > It is pretty sensitive to any changes. Isn't possible to commit first
> four
> > patches first and separately maybe out of commitfest window?
>
> Yeah, maybe, but we'd need a committer to take more of an interest in
> this patch series.  Personally, I'm wondering why we need a series of
> 19 patches to add tab completion support for IF NOT EXISTS.  The
> feature which is the subject of this thread arrives in patch 0017, and
> a lot of the patches which come before that seem to change a lot of
> stuff without actually improving much that would really benefit users.
>
> 0001 seems like a lot of churn for no real benefit that I can immediately
> see.
> 0002 is a real feature, and probably a good one, though unrelated to
> the subject of this thread.  In the process, it changes many lines of
> code in fairly mechanical ways; does it need to do that?
> 0003 is infrastructure.
> 0004 adds a README.  Do we really need that?  It seems to be
> explaining things which are mostly fairly clear from just looking at
> the code.  If we add a README, we have to update it when we change
> things.  That's worthwhile if it helps people write code better, I'm
> not sure if it will do that.
>

It needs to be separated into a refactoring part and a new-features part.
The refactoring looks good and I am sure it makes sense.

About the README - it describes fundamental things, which should be
stable. With the last changes and this set of patches, the tab completion
is not trivial, and I am sure any documentation is better than nothing. Not
all developers have years of experience with PostgreSQL hacking.



> 0005 extends 0002.
> 0006 prevents incorrect completions in obscure circumstances.
> 0007 adds some kind of tab completion for CREATE RULE; I'm fuzzy on the
> details.
> 0008 improves tab completion after EXPLAIN.
> 0009-0014 uses the infrastructure from 0003 to improve tab completion
> for various commands.  They say they're merely simplifying tab
> completion for those things, but actually they're extending it to some
> obscure situations that aren't currently covered.
> 0015 adds completion for magic keywords like CURRENT_USER when role
> commands are used.
> 0016 refactors tab completion for ALTER DEFAULT PRIVILEGES, possibly
> improving it somehow.
> 0017 implements the titular feature.
> 0018 adds optional debugging output.
> 0019 improves things for CREATE OR REPLACE completion.
>
> Phew.  That's a lot of work for relatively obscure improvements to tab
> completion.  I grant that the result is probably better, but it's a
> lot of code change for what we get out of it.  I'm not saying we
> should reject it on that basis, but it may be the reason why nobody's
> jumped in to work on getting this committed.
>

These patches are big - but in the end they clean up the tab-completion
code and open the door to smarter completion.

Some features can be interesting for users too - repeatedly typing IF
EXISTS, IF NOT EXISTS or OR REPLACE is really annoying - especially now
that other parts of tab completion are friendly enough.

Could a solution be to split this set of patches into more independent
parts? We should start with the refactoring. The other patches can be
processed individually - with individual discussion.

Regards

Pavel



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-26 Thread Michael Banck
Hi,

Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>  wrote:
> > I'd rather have a --quiet mode instead.  If you're running it by hand,
> > you're likely to omit the switch, whereas when writing the cron job
> > you're going to notice lack of switch even before you let the job run
> > once.
> 
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's
> a good idea.  Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

So I went forth and also added a message in progress mode (unless
verbose messages are requested anyway).

Regarding the documentation, I tried to clarify the difference between
the checkpoint types a bit more, but I think any further action is
probably a larger rewrite of the documentation on this topic.

So attached are two patches, I've split it up in the documentation and
the code output part. I'll add it as one commitfest entry in the
"Clients" section though, as it's not really a big patch, unless
somebody thinks it should have a second entry in "Documentation"?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
From bcbe19855f9f94eadf9e47a7f3b9a920a7f2a616 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 26 Feb 2017 18:06:40 +0100
Subject: [PATCH 1/2] Documentation updates regarding checkpoints for
 basebackups.

Mention that fast and immediate checkpoints are the same, and add a paragraph to
the pg_basebackup documentation about the checkpoint taken on the remote server.
---
 doc/src/sgml/backup.sgml|  3 ++-
 doc/src/sgml/ref/pg_basebackup.sgml | 10 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 5f009ee..9485d87 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -862,7 +862,8 @@ SELECT pg_start_backup('label', false, false);
  ).  This is
  usually what you want, because it minimizes the impact on query
  processing.  If you want to start the backup as soon as
- possible, change the second parameter to true.
+ possible, change the second parameter to true, which will
+ issue an immediate checkpoint using as much I/O as possible.
 
 
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c9dd62c..c197630 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -419,7 +419,7 @@ PostgreSQL documentation
   --checkpoint=fast|spread
   

-Sets checkpoint mode to fast or spread (default) (see ).
+Sets checkpoint mode to fast (immediate) or spread (default) (see ).

   
  
@@ -660,6 +660,14 @@ PostgreSQL documentation
   Notes
 
   
+   At the beginning of the backup, a checkpoint needs to be written on the
+   server the backup is taken from.  Especially if the option
+   --checkpoint=fast is not used, this can take some time
+   during which pg_basebackup will be idle on the
+   server it is running on.
+  
+
+  
The backup will include all files in the data directory and tablespaces,
including the configuration files and any additional files placed in the
directory by third parties, except certain temporary files managed by
-- 
2.1.4

From 1e4051dff9710382b6b4f63373a304c6ce70c4ac Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 26 Feb 2017 20:23:21 +0100
Subject: [PATCH 2/2] Mention initial checkpoint in pg_basebackup for
 verbose/progress output.

Before the actual data directory contents are streamed, a checkpoint is
taken on the remote server. Especially if no fast checkpoint is
requested, this can take quite a while during which the pg_basebackup
command apparently sits idle doing nothing.

To alert the user that work is being done on the remote server, mention
the checkpoint if verbose or progress output has been requested.  As
pg_basebackup does not output anything during regular operation, no
additional output is printed in this case.

Also harmonize some other verbose messages in passing.
---
 src/bin/pg_basebackup/pg_basebackup.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index bc997dc..4b75e76 100644
--- 

Re: [HACKERS] PUBLICATIONS and pg_dump

2017-02-26 Thread Petr Jelinek
On 08/02/17 05:02, Stephen Frost wrote:
> Peter,
> 
> On Tue, Feb 7, 2017 at 22:49 Peter Eisentraut
>  > wrote:
> 
> On 2/7/17 3:19 PM, Stephen Frost wrote:
> > I understand that this is a bit complicated, but I would have thought
> > we'd do something similar to what is done for DEFAULT PRIVILEGES,
> where
> > we include the "global" default privileges when we are doing a dump of
> > "everything", but if we're dumping a specific schema then we only
> > include the default privileges directly associated with that schema.
> >
> > Perhaps we need to include publications which are specific to a
> > particular table, but the current logic of, essentially, "always
> include
> > all publications" does not seem to make a lot of sense to me.
> 
> I think it would be sensible to refine it along those lines.
> 
> 
> Great!  I've added it to the open items list for PG10. 
> 

Yeah, that was an oversight in the initial patch; publications and their
membership were supposed to be dumped only when no table filter is used.
I mistakenly made it check for data_only instead of using the
selectDumpableObject machinery.

Fix attached.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 907dcca71712558ba954a28cdf65ccdd77473bfe Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sun, 26 Feb 2017 20:12:26 +0100
Subject: [PATCH] Don't dump publications with pg_dump -t

---
 src/bin/pg_dump/pg_dump.c| 29 +
 src/bin/pg_dump/t/002_pg_dump.pl | 17 +
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7273ec8..8f7245d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1613,6 +1613,23 @@ selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
 }
 
 /*
+ * selectDumpablePublicationTable: policy-setting subroutine
+ *		Mark a publication table as to be dumped or not
+ *
 * Publication tables have schemas but those should be ignored in decision
 * making, as publications are only dumped when we are dumping everything.
+ */
+static void
+selectDumpablePublicationTable(DumpableObject *dobj, Archive *fout)
+{
+	if (checkExtensionMembership(dobj, fout))
+		return;	/* extension membership overrides all else */
+
+	dobj->dump = fout->dopt->include_everything ?
+		DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
+}
+
+/*
  * selectDumpableObject: policy-setting subroutine
  *		Mark a generic dumpable object as to be dumped or not
  *
@@ -3389,6 +3406,9 @@ getPublications(Archive *fout)
 		if (strlen(pubinfo[i].rolname) == 0)
 			write_msg(NULL, "WARNING: owner of publication \"%s\" appears to be invalid\n",
 	  pubinfo[i].dobj.name);
+
+		/* Decide whether we want to dump it */
+		selectDumpableObject(&(pubinfo[i].dobj), fout);
 	}
 	PQclear(res);
 
@@ -3402,11 +3422,10 @@ getPublications(Archive *fout)
 static void
 dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 {
-	DumpOptions *dopt = fout->dopt;
 	PQExpBuffer delq;
 	PQExpBuffer query;
 
-	if (dopt->dataOnly)
+	if (!(pubinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
 		return;
 
 	delq = createPQExpBuffer();
@@ -3534,6 +3553,9 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 			pubrinfo[j].dobj.namespace = tbinfo->dobj.namespace;
 			pubrinfo[j].pubname = pg_strdup(PQgetvalue(res, j, i_pubname));
 			pubrinfo[j].pubtable = tbinfo;
+
+			/* Decide whether we want to dump it */
+			selectDumpablePublicationTable(&(pubrinfo[j].dobj), fout);
 		}
 		PQclear(res);
 	}
@@ -3547,12 +3569,11 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 static void
 dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 {
-	DumpOptions *dopt = fout->dopt;
 	TableInfo  *tbinfo = pubrinfo->pubtable;
 	PQExpBuffer query;
 	char	   *tag;
 
-	if (dopt->dataOnly)
+	if (!(pubrinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
 		return;
 
 	tag = psprintf("%s %s", pubrinfo->pubname, tbinfo->dobj.name);
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index f73bf89..40509a4 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -2242,14 +2242,14 @@ qr/CREATE TRANSFORM FOR integer LANGUAGE sql \(FROM SQL WITH FUNCTION pg_catalog
 			exclude_test_table   => 1,
 			no_privs => 1,
 			no_owner => 1,
-			only_dump_test_schema=> 1,
-			only_dump_test_table => 1,
 			pg_dumpall_dbprivs   => 1,
 			schema_only  => 1,
-			section_post_data=> 1,
-			test_schema_plus_blobs   => 1, },
+			section_post_data=> 1, },
 		unlike => {
 			section_pre_data => 1,
+			only_dump_test_table => 1,
+			test_schema_plus_blobs   => 1,
+			

Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 12:38 AM, Pavel Stehule  wrote:
> Now first patch is broken :(
>
> It is pretty sensitive to any changes. Isn't possible to commit first four
> patches first and separately maybe out of commitfest window?

Yeah, maybe, but we'd need a committer to take more of an interest in
this patch series.  Personally, I'm wondering why we need a series of
19 patches to add tab completion support for IF NOT EXISTS.  The
feature which is the subject of this thread arrives in patch 0017, and
a lot of the patches which come before that seem to change a lot of
stuff without actually improving much that would really benefit users.

0001 seems like a lot of churn for no real benefit that I can immediately see.
0002 is a real feature, and probably a good one, though unrelated to
the subject of this thread.  In the process, it changes many lines of
code in fairly mechanical ways; does it need to do that?
0003 is infrastructure.
0004 adds a README.  Do we really need that?  It seems to be
explaining things which are mostly fairly clear from just looking at
the code.  If we add a README, we have to update it when we change
things.  That's worthwhile if it helps people write code better, I'm
not sure if it will do that.
0005 extends 0002.
0006 prevents incorrect completions in obscure circumstances.
0007 adds some kind of tab completion for CREATE RULE; I'm fuzzy on the details.
0008 improves tab completion after EXPLAIN.
0009-0014 uses the infrastructure from 0003 to improve tab completion
for various commands.  They say they're merely simplifying tab
completion for those things, but actually they're extending it to some
obscure situations that aren't currently covered.
0015 adds completion for magic keywords like CURRENT_USER when role
commands are used.
0016 refactors tab completion for ALTER DEFAULT PRIVILEGES, possibly
improving it somehow.
0017 implements the titular feature.
0018 adds optional debugging output.
0019 improves things for CREATE OR REPLACE completion.

Phew.  That's a lot of work for relatively obscure improvements to tab
completion.  I grant that the result is probably better, but it's a
lot of code change for what we get out of it.  I'm not saying we
should reject it on that basis, but it may be the reason why nobody's
jumped in to work on getting this committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation improvements for partitioning

2017-02-26 Thread Petr Jelinek
On 24/02/17 07:15, Robert Haas wrote:
> On Fri, Feb 24, 2017 at 9:53 AM, Simon Riggs  wrote:
>> The good news is that logical replication DOES work with partitioning,
>> but only for a Publication with PUBLISH INSERT, pushing from a normal
>> table to a partitioned one. Which is useful enough for first release.
>>
>> The work on having UPDATE work between partitions can be used to make
>> updates and deletes work in logical replication. That might yet be
>> made to work in this release, and I think we should look into it, but
>> I think it can be left til next release if we try.
> 
> Are you speaking of the case where you want to replicate from an
> unpartitioned table to a partitioned table?  I would imagine that if
> you are replicating between two identical partitioning hierarchies, it
> would just work.  (If not, that's probably a bug, though I don't know
> whose bug.)
> 

Yes same hierarchies will work but mainly because one has to add
partitions themselves to publication currently. I guess that's the
limitation we might have to live with in 10 as adding the whole
partitioned table should probably work for different hierarchies when we
enable it and I am not quite sure that's doable before start of the CF
at this point.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-26 Thread Robert Haas
On Mon, Feb 20, 2017 at 11:52 AM, Michael Paquier
 wrote:
> On Thu, Feb 16, 2017 at 10:55 AM, Michael Paquier
>  wrote:
>> It is possible to get a test easily in this area by abusing the
>> fact that multiple -d switches defined in psql make it use only the
>> last value. By looking at psql() in PostgresNode.pm you would see what
>> I mean as -d is defined by default. Or we could just enforce
>> PGHOST/PGPORT as there is a method to get the unix domain directory.
>> Not sure which one of those two methods is most elegant though. I
>> would tend for just using PGHOST and PGPORT.
>
> What do you think about the patch attached then? As env{PGHOST} is set
> once and for all in INIT for each test run, I have gone with the
> approach of building connection strings and feed that to psql -d. I
> have reused 001_stream_rep.pl so as connections are done on already
> existing nodes, making the test really cheap. Here is how the tests
> look:
> # testing connection parameter target_session_attrs
> ok 5 - connect to node master if mode "read-write" and master,standby_1 listed
> ok 6 - connect to node master if mode "read-write" and standby_1,master listed
> ok 7 - connect to node master if mode "any" and master,standby_1 listed
> ok 8 - connect to node standby_1 if mode "any" and standby_1,master listed
>
> I have registered an entry in the CF here:
> https://commitfest.postgresql.org/13/1014/

Thanks!  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-02-26 Thread Tom Lane
I wrote:
> Now, we could try to fix this bug, and add the regression test coverage
> that the code clearly lacks, and upgrade the documentation about it from
> its currently very sad state.  But I think the right answer is just to
> remove the feature altogether.

BTW, I tried to poke into what it would take to write some regression test
coverage, and immediately hit a show-stopper:

$ pltcl_loadmod --help
can't find package Pgtcl
while executing
"package require Pgtcl"
(file "/home/postgres/testversion/bin/pltcl_loadmod" line 10)

That is, these scripts depend on the old Tcl client library, which
we removed from our core distro in 2004 (cf commit 41fa9e9ba).
So we don't even have a way of creating self-contained tests for them.

At this point I think there's no question that src/pl/tcl/modules/
needs to go away.  There might be some argument for retaining the
"autoload the unknown module" startup behavior in pltcl proper, but
I think that Andrew Dunstan's proposal of calling an initialization
function is a far cleaner solution.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Peter Geoghegan
On Sun, Feb 26, 2017 at 9:48 AM, Tom Lane  wrote:
> [ shrug... ]  If you're excited enough about it to do the work, I won't
> stand in your way.  But I don't find it to be a stop-ship issue.

I'll add it to my todo list for Postgres 10.

I think it's worth being consistent about a restriction like this, as
Robert said. Given that fixing this issue will not affect the machine
code generated by compilers for the majority of platforms we support,
doing so seems entirely worthwhile to me.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Feb 26, 2017 at 9:07 AM, Tom Lane  wrote:
>> Having said that, I'm not sure it's worth the trouble of changing.
>> The platforms where there's a difference are probably not muscular
>> enough that anyone would ever get past 16TB in a temp file anyhow.

> As things stand, a 64-bit Windows installation would have any CLUSTER
> of a table that exceeds 16TiB fail, possibly pretty horribly (I
> haven't thought through the consequences much). This is made more
> likely by the fact that we've made tuplesort faster in the past few
> releases (gains which the MAX_KILOBYTES restriction won't impinge on
> too much, particularly in Postgres 10). I find that unacceptable, at
> least for Postgres 10.

[ shrug... ]  If you're excited enough about it to do the work, I won't
stand in your way.  But I don't find it to be a stop-ship issue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Instability in select_parallel regression test

2017-02-26 Thread Robert Haas
On Mon, Feb 20, 2017 at 7:52 AM, Amit Kapila  wrote:
> On Sun, Feb 19, 2017 at 8:32 PM, Robert Haas  wrote:
>> On Sun, Feb 19, 2017 at 6:50 PM, Amit Kapila  wrote:
>>> To close the remaining gap, don't you think we can check slot->in_use
>>> flag when generation number for handle and slot are same.
>>
>> That doesn't completely fix it either, because
>> ForgetBackgroundWorker() also does
>> BackgroundWorkerData->parallel_terminate_count++, which we might also
>> fail to see, which would cause RegisterDynamicBackgroundWorker() to
>> bail out early.  There are CPU ordering effects to think about here,
>> not just the order in which the operations are actually performed.
>>
>
> Sure, I think we can attempt to fix that as well by adding write
> memory barrier in ForgetBackgroundWorker().

I don't think so.

> The main point is if we
> keep any loose end in this area, then there is a chance that the
> regression test select_parallel can still fail, if not now, then in
> future.  Another way could be that we can try to minimize the race
> condition here and then adjust the select_parallel as suggested above
> so that we don't see this failure.

My guess is that if we apply the fix I suggested above, it'll be good
enough.  If that turns out not to be true, then I guess we'll have to
deal with that, but why not do the easy thing first?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Robert Haas
On Sun, Feb 26, 2017 at 10:37 PM, Tom Lane  wrote:
> Having said that, I'm not sure it's worth the trouble of changing.
> The platforms where there's a difference are probably not muscular
> enough that anyone would ever get past 16TB in a temp file anyhow.

Is this comment a reflection of your generally-low opinion about
Windows, or some specific limitation that I don't know about?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error detail when partition not found

2017-02-26 Thread Robert Haas
On Tue, Feb 21, 2017 at 7:28 AM, Amit Langote
 wrote:
> Simon pointed out in a nearby thread [0] that the detail part of
> partition-not-found error should show just the partition keys.  I posted a
> patch on that thread [1], but to avoid confusion being caused by multitude
> of patches over there I'm re-posting it here.

Thanks.  GetPartitionFailureData seems like a strange name for a
datatype, particularly the "Get" part.  How about
PartitionRoutingFailureInfo?  Or just two out parameters.
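
If it stays a struct, a sketch of the renamed shape might be as small as the following; the field names are guesses for illustration, not taken from the posted patch, which is why two out parameters might be just as clear:

	/* hypothetical shape; the actual patch may carry different fields */
	typedef struct PartitionRoutingFailureInfo
	{
		char	   *keyDescription; /* partition key values, as text */
		Oid			failedRelid;	/* parent relation with no matching partition */
	} PartitionRoutingFailureInfo;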

Spelling: BuildSlotPartitinKeyDescription (in comment).

ExecBuildSlotPartitionKeyDescription could have a comment saying that
it's LIKE BuildIndexValueDescription() instead of copy-and-pasting the
comments. And maybe BuildIndexValueDescription() could also get a
comment saying that if we change anything there, we should check
whether ExecBuildSlotPartitionKeyDescription() needs a similar change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Peter Geoghegan
On Sun, Feb 26, 2017 at 9:07 AM, Tom Lane  wrote:
> Yeah.  This code is far older than our willingness to assume that every
> platform can support int64, and I'm pretty sure that use of "long" was
> just a compromise to get the widest values we could use portably and
> without a lot of notational hassle.  (There are some similar choices in
> the area of memory usage, particularly calculations related to work_mem.)

I'm glad that you pointed out the history with work_mem calculations
specifically, since I have found this confusing in the past. I was
about to ask "what about 64-bit Windows?", but then remembered that we
don't actually support large allocations on that platform (this is why
MAX_KILOBYTES exists).

> Having said that, I'm not sure it's worth the trouble of changing.
> The platforms where there's a difference are probably not muscular
> enough that anyone would ever get past 16TB in a temp file anyhow.

As things stand, a 64-bit Windows installation would have any CLUSTER
of a table that exceeds 16TiB fail, possibly pretty horribly (I
haven't thought through the consequences much). This is made more
likely by the fact that we've made tuplesort faster in the past few
releases (gains which the MAX_KILOBYTES restriction won't impinge on
too much, particularly in Postgres 10). I find that unacceptable, at
least for Postgres 10.
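
To spell out where the 16TiB figure comes from (back-of-the-envelope arithmetic only, not code from logtape.c): on LLP64 platforms such as 64-bit Windows, long is 32 bits, so the largest block number that fits is 2^31 - 1, and with the default 8kB block size that caps a temp file at just under 16TiB:

	/* illustrative arithmetic; 8192 is the default BLCKSZ */
	long		max_block = 2147483647L;					/* LONG_MAX when long is 32 bits */
	long long	max_bytes = (long long) max_block * 8192;	/* ~17.6e12 bytes, just under 16TiB */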

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-02-26 Thread Robert Haas
On Mon, Feb 20, 2017 at 10:54 AM, Ashutosh Bapat
 wrote:
> On Sun, Feb 19, 2017 at 2:33 PM, Robert Haas  wrote:
>> On Fri, Feb 17, 2017 at 11:44 AM, Ashutosh Bapat
>>  wrote:
>>> That's true for a partitioned table, but not necessarily for every
>>> append relation. Amit's patch is generic for all append relations. If
>>> the child plans are joins or subquery segments of set operations, I
>>> doubt if the same logic works. It may be better if we throw as many
>>> workers (or some function "summing" those up) as specified by those
>>> subplans. I guess, we have to use different logic for append relations
>>> which are base relations and append relations which are not base
>>> relations.
>>
>> Well, I for one do not believe that if somebody writes a UNION ALL
>> with 100 branches, they should get 100 (or 99) workers.  Generally
>> speaking, the sweet spot for parallel workers on queries we've tested
>> so far has been between 1 and 4.  It's straining credulity to believe
>> that the number that's correct for parallel append is more than an
>> order of magnitude larger.  Since increasing resource commitment by
>> the logarithm of the problem size has worked reasonably well for table
>> scans, I believe we should pursue a similar approach here.
>
> Thanks for that explanation. It makes sense. So, something like this
> would work: total number of workers = some function of log(sum of
> sizes of relations). The number of workers allotted to each segment
> are restricted to the the number of workers chosen by the planner
> while planning that segment. The patch takes care of the limit right
> now. It needs to incorporate the calculation for total number of
> workers for append.

log(sum of sizes of relations) isn't well-defined for a UNION ALL query.
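
For reference, the logarithmic scaling Robert mentions for table scans can be sketched roughly as follows; the threshold parameter and the tripling factor are illustrative assumptions, not the planner's exact code:

	/* each additional worker requires roughly 3x more data to justify it */
	static int
	workers_for_relation_size(double rel_pages, double threshold, int max_workers)
	{
		int			workers = 0;

		while (rel_pages >= threshold && workers < max_workers)
		{
			workers++;
			threshold *= 3;
		}
		return workers;
	}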

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tab completion for partitioning

2017-02-26 Thread Robert Haas
On Mon, Feb 20, 2017 at 8:37 AM, Amit Langote
 wrote:
> Thanks for taking a look.  Hm, I think the second part seems to be
> needless duplication.  So, I changed it to match using TailMatches2("FOR",
> "VALUES") and kept just one instance of it.  The first part is matching
> and completing two different commands (ATTACH PARTITION partition_name and
> PARTITION OF parent_name), so that seems fine.

I agree.  Committed this version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-02-26 Thread Michael Banck
Hi,

Am Sonntag, den 26.02.2017, 22:25 +0530 schrieb Robert Haas:
> On Tue, Feb 21, 2017 at 3:47 PM, Michael Banck
>  wrote:
> > So I am proposing the attached patch, which sends the base tablespace
> > first, and then all the other external tablespaces afterwards, thus
> > having base_backup be the first file in the tar in all cases. Does
> > anybody see a problem with that?
> 
> Please add this to commitfest.postgresql.org so it doesn't get forgotten.

Right, I was going to do that and have done so now, thanks for the
reminder.

Also, attached is a slightly changed patch which expands on the reason
of the change in the comment.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
From 9829ad0da1950b0c928c8b2adfc93a98271930d1 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 26 Feb 2017 17:59:38 +0100
Subject: [PATCH] Reorder tablespaces for streaming basebackups.

The replication protocol documentation appears to express that the main
tablespace is the first to be sent; however, it is actually the last one. This
makes the backup_label file (which gets prepended to the main tablespace)
inconveniently end up in the middle of the basebackup stream if other
tablespaces are present.

Change this so that the main tablespace is the first to be sent, ensuring that
backup_label is the first file in the stream.
---
 src/backend/replication/basebackup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 09ecc15..b806205 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -230,10 +230,11 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		else
 			statrelpath = pgstat_stat_directory;
 
-		/* Add a node for the base directory at the end */
+		/* Add a node for the base directory at the beginning.  This way, the
+		 * backup_label file is always the first file to be sent. */
 		ti = palloc0(sizeof(tablespaceinfo));
 		ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
-		tablespaces = lappend(tablespaces, ti);
+		tablespaces = lcons(ti, tablespaces);
 
 		/* Send tablespace header */
 		SendBackupHeader(tablespaces);
-- 
2.1.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Tom Lane
Peter Geoghegan  writes:
> logtape.c stores block numbers on disk. These block numbers are
> represented in memory as being of type long.

Yeah.  This code is far older than our willingness to assume that every
platform can support int64, and I'm pretty sure that use of "long" was
just a compromise to get the widest values we could use portably and
without a lot of notational hassle.  (There are some similar choices in
the area of memory usage, particularly calculations related to work_mem.)

Having said that, I'm not sure it's worth the trouble of changing.
The platforms where there's a difference are probably not muscular
enough that anyone would ever get past 16TB in a temp file anyhow.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-02-26 Thread Robert Haas
On Tue, Feb 21, 2017 at 3:47 PM, Michael Banck
 wrote:
> currently, the backup_label and (I think) the tablespace_map files are
> (by design) conveniently located at the beginning of the main tablespace
> tarball when making a basebackup. However, (by accident or also by
> design?) the main tablespace is the last tarball[1] to be streamed via
> the BASE_BACKUP replication protocol command.
>
> For pg_basebackup, this is not a real problem, as either each tablespace
> is its own tarfile (in tar format mode), or the files are extracted
> anyway (in plain mode).
>
> However, third party tools using the BASE_BACKUP command might want to
> extract the backup_label, e.g. in order to figure out the START WAL
> LOCATION. If they make a big tarball for the whole cluster potentially
> including all external tablespaces, then the backup_label file is
> somewhere in the middle of it and it takes a long time for tar to
> extract it.
>
> So I am proposing the attached patch, which sends the base tablespace
> first, and then all the other external tablespaces afterwards, thus
> having base_backup be the first file in the tar in all cases. Does
> anybody see a problem with that?

Please add this to commitfest.postgresql.org so it doesn't get forgotten.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby  wrote:
> Certainly easier, but I don't think it'd be better. Matviews really aren't
> the same thing as tables. Off-hand (without reviewing the patch), update and
> delete counts certainly wouldn't make any sense. "Insert" counts might, in
> as much as it's how many rows have been added by refreshes. You'd want a
> refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents.  So I think all the same counters that make sense for
regular tables are also sensible here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 11:49 PM, Mithun Cy  wrote:
> Hi all thanks,
> I have tried to fix all of the comments given above with some more
> code cleanups.

While reading this patch tonight, I realized a serious problem with
the entire approach, which is that this patch is supposing that we can
read relation blocks for every database from a single worker that's
not connected to any database.  I realize that I suggested that
approach, but now I think it's broken, because the patch isn't taking
any locks on the relations whose pages it is reading, and that is
definitely going to break things.  While autoprewarm is busy sucking
blocks into the shared buffer cache, somebody could be, for example,
dropping one of those relations.  DropRelFileNodesAllBuffers and
friends expect that nobody is going to be concurrently reading blocks
back into the buffer cache because they hold AccessExclusiveLock, and
they assume that anybody else who is touching it will hold at least
AccessShareLock.  But this violates that assumption, and probably some
others.

This is not easy to fix.  The lock has to be taken based on the
relation OID, not the relfilenode, but we don't have the relation OID
in the dump file, and recording it there won't help, because the
relfilenode can change under us if the relation is rewritten with
CLUSTER or VACUUM FULL or relevant forms of ALTER TABLE.  I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases.  Also, I am pretty sure it's no good to take locks before
recovery reaches a consistent state.  I'm not sure off-hand whether
crash-recovery will notice conflicting locks, but even if it does,
blocking crash recovery in order to prewarm stuff is bad news.
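
To make the locking requirement concrete, the per-block sequence a database-connected worker would need looks roughly like the sketch below; the functions named are real backend APIs, but this is only the general shape, not the patch's code, and it assumes the dump entry can be mapped back to a relation OID:

	static void
	prewarm_one_block(Oid tablespace_oid, Oid filenode,
					  ForkNumber forknum, BlockNumber blocknum)
	{
		Oid			relid = RelidByRelfilenode(tablespace_oid, filenode);

		if (OidIsValid(relid))
		{
			/* lock by relation OID so a concurrent DROP or rewrite can't bite us */
			Relation	rel = try_relation_open(relid, AccessShareLock);

			if (rel != NULL)
			{
				if (blocknum < RelationGetNumberOfBlocksInFork(rel, forknum))
				{
					Buffer		buf = ReadBufferExtended(rel, forknum, blocknum,
														 RBM_NORMAL, NULL);

					ReleaseBuffer(buf);
				}
				relation_close(rel, AccessShareLock);
			}
		}
	}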

Here are some other review comments (which may not matter unless we
can think up a solution to the problems above).

- I think auto_pg_prewarm.c is an unnecessarily long name.  How about
autoprewarm.c?

- It looks like you ran pgindent over this without adding the new
typedefs to your typedefs.list, so things like the last line of each
typedef is formatted incorrectly.

- ReadBufferForPrewarm isn't a good name for this interface.  You need
to give it a generic name (and header comment) that describes what it
does, rather than why you added it.  Others might want to also use
this interface.  Actually, an even better idea might be to adjust
ReadBufferWithoutRelcache() to serve your need here.  That function's
header comment seems to contemplate that somebody might want to add a
relpersistence argument someday; perhaps that day has arrived.

- have_free_buffer's comment shouldn't mention autoprewarm, but it
should mention that this is a lockless test, so the results might be
slightly stale.  See similar comments in various other backend
functions for an example of how to write this.

- next_task could be static, and with such a generic name, it really
MUST be static to avoid namespace conflicts.

- load_block() has a race condition.  The relation could be dropped
after you check smgrexists() and before you access the relation.
Also, the fork number validity check looks useless; it should never
fail.

- I suggest renaming the file that stores the blocks to
autoprewarm.blocks or something like that.  Calling it just
"autopgprewarm" doesn't seem very clear.

- I don't see any reason for the dump file to contain a header record
with an expected record count.  When rereading the file, you can just
read until EOF; there's no real need to know the record count before
you start.

- You should test for multiple flags like this: if ((buf_state &
(BM_VALID|BM_TAG_VALID|BM_PERSISTENT)) != 0).  However, I also think
the test is wrong. Even if the buffer isn't BM_VALID, that's not
really a reason not to include it in the dump file.  Same with
BM_PERSISTENT.  I think the reason for the latter restriction is that
you're always calling load_block() with RELPERSISTENCE_PERMANENT, but
that's not a good idea either.  If the relation were made unlogged
after you wrote the dump file, then on reload it you'd incorrectly set
BM_PERMANENT on the reloaded blocks.
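
One way to avoid hard-coding RELPERSISTENCE_PERMANENT would be to record the persistence in each dump entry; a sketch of what such a record might carry (the struct and field names are invented for illustration, not taken from the patch):

	typedef struct BlockInfoRecord
	{
		Oid			database;		/* database the block belongs to */
		Oid			tablespace;		/* tablespace OID */
		Oid			filenode;		/* relfilenode at dump time */
		ForkNumber	forknum;		/* main, fsm, vm, ... */
		BlockNumber blocknum;		/* block within the fork */
		char		relpersistence; /* 'p' or 'u', taken from the buffer header */
	} BlockInfoRecord;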

- elog() should not be used for user-facing messages, but rather only
for internal messages that we don't expect to get generated.  Also,
the messages you've picked don't conform to the project's message
style guidelines.

- The error handling loop around load_block() suggests that you're
expecting some reads to fail, which I guess is because you could be
trying to read blocks from a relation that's been rewritten under a
different relfilenode, or partially or entirely truncated.  But I
don't think it's very reasonable to just let ReadBufferWhatever() spew
error messages into the log and hope users don't panic.  People will
expect an automatic prewarm solution to handle any such cases quietly,
not bleat loudly in the log.  I suspect that this error-trapping code
isn't entirely correct, 

Re: [HACKERS] Parallel bitmap heap scan

2017-02-26 Thread Robert Haas
On Tue, Feb 21, 2017 at 10:20 AM, Dilip Kumar  wrote:
> 0001:
> 1. I have got a new interface, "tbm_free_shared_area(dsa_area *dsa,
> dsa_pointer dp)"  which will be responsible for freeing all the shared
> members (pagetable, ptpage and ptchunk).  Actually, we can not do this
> in tbm_free itself because the only leader will have a local tbm with
> reference to all these pointers and our parallel bitmap leader may not
> necessarily be the actual backend.

I'm not entirely sure about the calling convention for
tbm_free_shared_area() but the rest seems OK.

> 2. Now tbm_free is not freeing any of the shared members which can be
> accessed by other worker so tbm_free is safe to call from
> ExecEndBitmapHeapScan without any safety check or ref count.

That also seems fine. We ended up with something very similar in the
Parallel Index Scan patch.

> 0002:
> 1. We don't need ExecShutdownBitmapHeapScan anymore because now we are
> not freeing any shared member in ExecEndBitmapHeapScan.
> 2. In ExecReScanBitmapHeapScan we will call tbm_free_shared_area to
> free the shared members of the TBM.
> 3. After that, we will free TBMSharedIteratorState what we allocated
> using tbm_prepare_shared_iterate.

Check.  But I think tbm_free_shared_area() should also free the object
itself, instead of making the caller do that separately.

+	if (DsaPointerIsValid(node->pstate->tbmiterator))
+	{
+		/* First we free the shared TBM members using the shared state */
+		tbm_free_shared_area(dsa, node->pstate->tbmiterator);
+		dsa_free(dsa, node->pstate->tbmiterator);
+	}
+
+	if (DsaPointerIsValid(node->pstate->prefetch_iterator))
+		dsa_free(dsa, node->pstate->prefetch_iterator);

The fact that these cases aren't symmetric suggests that your
abstraction is leaky.  I'm guessing that you can't call
tbm_free_shared_area because the two iterators share one copy of the
underlying iteration arrays, and the TBM code isn't smart enough to
avoid freeing them twice.  You're going to have to come up with a
better solution to that problem; nodeBitmapHeapScan.c shouldn't know
about the way the underlying storage details are managed.  (Maybe you
need to reference-count the iterator arrays?)
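
A reference count is one plausible shape for that; the sketch below is illustrative only, with invented names, and simply shows the last user of the shared arrays being the one to free them:

	typedef struct TBMSharedArrays
	{
		pg_atomic_uint32 refcount;	/* iterators still attached */
		dsa_pointer	ptpages;		/* shared page-entry array */
		dsa_pointer	ptchunks;		/* shared chunk-entry array */
	} TBMSharedArrays;

	static void
	tbm_detach_shared_arrays(dsa_area *dsa, TBMSharedArrays *arrays, dsa_pointer self)
	{
		if (pg_atomic_sub_fetch_u32(&arrays->refcount, 1) == 0)
		{
			dsa_free(dsa, arrays->ptpages);
			dsa_free(dsa, arrays->ptchunks);
			dsa_free(dsa, self);	/* the struct itself lives in DSA too */
		}
	}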

+if (node->inited)
+goto start_iterate;

My first programming teacher told me not to use goto.  I've
occasionally violated that rule, but I need a better reason than you
have here.  It looks very easy to avoid.

+pbms_set_parallel(outerPlanState(node));

I think this should be a flag in the plan, and the planner should set
it correctly, instead of having it be a flag in the executor that the
executor sets.  Also, the flag itself should really be called
something that involves the word "shared" rather than "parallel",
because the bitmap will not be created in parallel, but it will be
shared.

Have you checked whether this patch causes any regression in the
non-parallel case?  It adds a bunch of "if" statements, so it might.
Hopefully not, but it needs to be carefully tested.

@@ -48,10 +48,11 @@
 #include "utils/snapmgr.h"
 #include "utils/tqual.h"

-
 static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);

Unnecessary.

+static bool pbms_is_leader(ParallelBitmapState *pbminfo);
+static void pbms_set_parallel(PlanState *node);

I don't think this "pbms" terminology is very good.  It's dissimilar
to the way other functions in this file are named.  Maybe
BitmapShouldInitializeSharedState().

I think that some of the bits that this function makes conditional on
pstate should be factored out into inline functions.  Like:

-	if (node->prefetch_target >= node->prefetch_maximum)
-		 /* don't increase any further */ ;
-	else if (node->prefetch_target >= node->prefetch_maximum / 2)
-		node->prefetch_target = node->prefetch_maximum;
-	else if (node->prefetch_target > 0)
-		node->prefetch_target *= 2;
-	else
-		node->prefetch_target++;
+	if (!pstate)
+	{
+		if (node->prefetch_target >= node->prefetch_maximum)
+			 /* don't increase any further */ ;
+		else if (node->prefetch_target >= node->prefetch_maximum / 2)
+			node->prefetch_target = node->prefetch_maximum;
+		else if (node->prefetch_target > 0)
+			node->prefetch_target *= 2;
+		else
+			node->prefetch_target++;
+	}
+	else if (pstate->prefetch_target < node->prefetch_maximum)
+	{
+		SpinLockAcquire(&pstate->prefetch_mutex);
+		if (pstate->prefetch_target >= node->prefetch_maximum)
+			 /* don't increase any further */ ;
+		else if (pstate->prefetch_target >=
+				 node->prefetch_maximum / 2)
+			pstate->prefetch_target = 

Re: [HACKERS] Logical replication existing data copy

2017-02-26 Thread Erik Rijkers

On 2017-02-26 10:53, Erik Rijkers wrote:


Not yet perfect, but we're getting there...


Sorry, I made a mistake: I was running the newest patches on master but 
the older versions on the replica (or more precisely: I didn't properly 
shut down the replica, so the older version remained up and running during 
subsequent testing).


So the 'DROP SUBSCRIPTION' hang errors mentioned in my last email are 
hopefully wrong.


I'll get back when I've repeated these tests. This will take some hours 
(at least).


Sorry to cause you these palpitations, perhaps unnecessarily...


Erik Rijkers





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in TPC-H Q18

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 11:23 AM, Kuntal Ghosh
 wrote:
> While conducting the experiments for parallelism, Rafia came across a
> hang in Q18 when plan uses partial and finalize hash aggregate. This
> could be seen on both scale factors - 20 and 300, on setting work_mem
> high enough so that the query uses hash aggregate. It seems that
> commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve the
> issue completely.

Andres, any thoughts?  Isn't the same issue that we were discussing
https://www.postgresql.org/message-id/CA+TgmoYNO8qouPVO=1q2axzuxe942d_t5bcvzd9ikoc9tb3...@mail.gmail.com
over a month ago?

To me, it seems like a big problem that we have large, unfixed
performance regressions with simplehash four months after it went in.
I hate to suggest ripping the whole thing out, and it seems like
overkill, but it's pretty clear to me that the current state of things
is unacceptable, and that we're going to have a lot of unhappy users
if we ship it the way that it is right now.  I want to point out that
the kinds of problems we're hitting here are exactly what I told you I
was worried about before it went in - that the average-case
performance would be better but that there would be
all-too-easy-to-hit cases where things got much worse because the
whole thing degenerated into a linear search.  Not only did that
happen, but there seem to be multiple ways of producing it without
half trying, of which b81b5a96f424531b97cdd1dba97d9d1b9c9d372e fixed
only one.  Something that's 5-10% faster in common cases but 2x or 10x
slower when things go wrong is not an improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-02-26 Thread Robert Haas
On Thu, Feb 23, 2017 at 4:47 AM, Pritam Baral  wrote:
> The topic has been previously discussed[0] on the -performance mailing list,
> about four years ago.
>
> In that thread, Tom suggested[0] the planner could be made to "expand
> "intcol <@
> 'x,y'::int4range" into "intcol between x and y", using something similar
> to the
> index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
>
> This patch tries to do exactly that.

Please add your patch to https://commitfest.postgresql.org/ so it
doesn't get overlooked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-26 Thread Robert Haas
On Wed, Feb 22, 2017 at 10:25 AM, Rafia Sabih
 wrote:
> 1. Allow parallelism for queries in PL functions by passing
> CURSOR_OPT_PARALLEL_OK instead of 0 to exec_prepare_plan called from
> exec_stmt_execsql or exec_stmt_dynexecute. Similarly, pass
> CURSOR_OPT_PARALLEL_OK instead of 0 to SPI_execute and exec_run_select
> called from exec_stmt_dynexecute. CURSOR_OPT_PARALLEL_OK is passed to
> these functions after checking if the statement is not trigger, since
> in that case using parallelism may not be efficient.
>
> 2. In ExecutePlan there is an additional check to see if the query is
> coming from SQL or PL functions and is having a parallel plan. In that
> case we ignore the check of numberTuples since it is always one for
> these functions and existing checks restrict parallelism for these
> cases. Though, I understand this may not be the most elegant way to do
> this, and would be pleased to know some better alternative.

I think I see the problem that you're trying to solve, but I agree
that this doesn't seem all that elegant.  The reason why we have that
numberTuples check is because we're afraid that we might be in a
context like the extended-query protocol, where the caller can ask for
1 tuple, and then later ask for another tuple.  That won't work,
because once we shut down the workers we can't reliably generate the
rest of the query results.  However, I think it would probably work
fine to let somebody ask for less than the full number of tuples if
it's certain that they won't later ask for any more.

So maybe what we ought to do is allow CURSOR_OPT_PARALLEL_OK to be set
any time we know that ExecutorRun() will be called for the QueryDesc
at most once rather than (as at present) only where we know it will be
executed only once with a tuple-count of zero.  Then we could change
things in ExecutePlan so that it doesn't disable parallel query when
the tuple-count is non-zero, but does take an extra argument "bool
execute_only_once", and it disables parallel execution if that is not
true.  Also, if ExecutorRun() is called a second time for the same
QueryDesc when execute_only_once is specified as true, it should
elog(ERROR, ...).  Then exec_execute_message(), for example, can pass
that argument as false when the tuple-count is non-zero, but other
places that are going to fetch a limited number of rows could pass it
as true even though they also pass a row-count.

I'm not sure if that's exactly right, but something along those lines
seems like it should work.
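
In code, the guard might be no more than the sketch below; the execute_once argument and the already_executed flag are assumptions drawn from the description above, not an existing interface:

	void
	ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
				uint64 count, bool execute_once)
	{
		if (execute_once && queryDesc->already_executed)
			elog(ERROR, "can't re-execute query flagged for single execution");
		queryDesc->already_executed = true;

		/*
		 * ... existing body; parallel mode can stay enabled even when count > 0,
		 * because the caller has promised not to come back for more rows ...
		 */
	}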

I think that a final patch for this functionality should involve
adding CURSOR_OPT_PARALLEL_OK to appropriate places in each PL, plus
maybe some infrastructure changes like the ones mentioned above.
Maybe it can be divided into two patches, one to make the
infrastructure changes and a second to add CURSOR_OPT_PARALLEL_OK to
more places.

+	/* Allow parallelism if the query is read-only */
+	if(read_only)
+		plan.cursor_options = CURSOR_OPT_PARALLEL_OK;
+	else
+		plan.cursor_options = 0;

I don't think this can ever be right.  If the only thing we are
worried about is the query being read-only, then we could just pass
CURSOR_OPT_PARALLEL_OK everywhere and planner.c would figure it out
without any help from us.  But that's not the problem.  The problem is
that the PL may be using a function like SPI_prepare_cursor(), where
it's going to later use SPI_cursor_fetch() or similar.  Parallel query
can't handle cursor operations.  Whatever changes we make to spi.c
should involve passing CURSOR_OPT_PARALLEL_OK everywhere that we can
be sure there will be no cursor operations and not anywhere that we
might have cursor operations.   Cursor operations - or more
specifically anything that might try to suspend execution of the query
and resume later - are the problem.  Things that will already cause
the tests in standard_planner() to disable parallelism don't need to
be rechecked elsewhere:

	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
		IsUnderPostmaster &&
		dynamic_shared_memory_type != DSM_IMPL_NONE &&
		parse->commandType == CMD_SELECT &&
		!parse->hasModifyingCTE &&
		max_parallel_workers_per_gather > 0 &&
		!IsParallelWorker() &&
		!IsolationIsSerializable())
	{
		/* all the cheap tests pass, so scan the query tree */
		glob->maxParallelHazard = max_parallel_hazard(parse);
		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
	}
	else
	{
		/* skip the query tree scan, just assume it's unsafe */
		glob->maxParallelHazard = PROPARALLEL_UNSAFE;
		glob->parallelModeOK = false;
	}

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-26 Thread Robins Tharakan
On 26 February 2017 at 21:37, Robert Haas  wrote:

>
> How's that not a bug?  I mean, it's reasonable for someone to want to
> restrict the superuser in a cloud environment, but if they restrict it
> so much that you can't take a backup with standard tools, I'd say they
> should also patch the tools, though maybe a better idea would be to
> restrict the superuser a bit less.
>
They 'should', and I completely agree; but that isn't the reality today.
Thus the patch.

Now it's understandable for the community to say 'not my problem' but
I'd say that depends on what it considers 'my problem'. If half the people
manage their installations on a cloud solution, it unwillingly becomes one.

Unrelated, it still does allow a non-superuser to take a dump sans the
password, which doesn't seem like a bad idea, since the patch
doesn't do anything more than dump from pg_roles, which is anyway
available to non-superusers.

I'll wait to see whether more people object similarly.
-
robins


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-02-26 Thread Robert Haas
On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila  wrote:
> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar  wrote:
>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila  wrote:
>>> Sure, but that should only happen if the function is *not* declared as
>>> parallel safe (aka in parallel safe functions, we should not generate
>>> parallel plans).
>>
>> So basically we want to put a restriction that parallel-safe function
>> can not use the parallel query? This will work but it seems too
>> restrictive to me. Because by marking function parallel safe we enable
>> it to be used with the outer parallel query that is fine. But, that
>> should not restrict the function from using the parallel query if it's
>> used with the other outer query which is not having the parallel
>> plan(or function is executed directly).
>
> I think if the user is explicitly marking a function as parallel-safe,
> then it doesn't make much sense to allow parallel query in such
> functions as it won't be feasible for the planner (or at least it will
> be quite expensive) to detect the same.  By the way, if the user has
> any such expectation from a function, then he can mark the function as
> parallel-restricted or parallel-unsafe.

However, if a query is parallel-safe, it might not end up getting run
in parallel.  In that case, it could still benefit from parallelism
internally.  I think we want to allow that.  For example, suppose you
run a query like:

SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;

If sometab isn't very big, it's probably better to use a non-parallel
plan for this query, because then somewhat_expensive_function() can
still use parallelism internally, which might be better. However, if
sometab is large enough, then it might be better to parallelize the
whole query using a Partial/FinalizeAggregate and force each call to
somewhat_expensive_function() to run serially.  So I don't think a
hard-and-fast rule that parallel-safe functions can't use parallelism
internally is a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-26 Thread Robert Haas
On Sun, Feb 26, 2017 at 3:43 PM, Robins Tharakan  wrote:
> To confirm, this did originate by trying to accommodate a fork. But what
> I can say is that this doesn't appear to be a bug; what they call
> Super-User isn't effectively one.

How's that not a bug?  I mean, it's reasonable for someone to want to
restrict the superuser in a cloud environment, but if they restrict it
so much that you can't take a backup with standard tools, I'd say they
should also patch the tools, though maybe a better idea would be to
restrict the superuser a bit less.

My basic concern here is that I don't want half of our tools to end up
with special-purpose flags that serve only to unbreak Amazon RDS.
That can't be a good solution to anything.  It will lead to extra work
for us and confusion for users about whether they should be using
them.  People are going to see this --avoid-pgauthid and wonder why
it's there.  And the next time Amazon RDS breaks something, we'll get
a different flag someplace else to fix that problem.  If we call them
all --unbreak-amazon-rds instead of things like --avoid-pgauthid, then
it will be clear when they need to be used, but why would we accept
the job of working around the defects in Amazon's fork of PG?  I'm
already responsible for helping maintain one fork of PostgreSQL, but
I'm not under any illusion that I get to do that by putting changes
that make that easier into the community code base.  Plus, for that
work, I get paid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make subquery alias optional in FROM clause

2017-02-26 Thread Matthew Woodcraft
On 2017-02-24 07:25, Robert Haas wrote:
> I don't think it's only Oracle that allows omitting the
> alias; I think there are a number of other systems that behave
> similarly.

SQLite, for example.

Making conversions from SQLite to Postgres easier is a Good Thing.
"subquery in FROM must have an alias" has caused me inconvenience doing
that as recently as last week.

-M-




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-26 Thread Robins Tharakan
On 26 February 2017 at 19:26, Robert Haas  wrote:

> I am a little surprised that this patch has gotten such a good
> reception.  We haven't in the past been all that willing to accept
> core changes for the benefit of forks of PostgreSQL; extensions, sure,
> but forks?  Maybe we should take the view that Amazon has broken this
> and Amazon ought to fix it, rather than making it our job to (try to)
> work around their bugs.
>
>
(Disclaimer: I work at said company, although I don't represent them
in any way. This patch is in my personal capacity.)

To confirm, this did originate by trying to accommodate a fork. But what
I can say is that this doesn't appear to be a bug; what they call
Super-User isn't effectively one.

Personally, I think it would be wise to also consider that this fork has
a very large user-base and for that user-base, this 'is' Postgres. Further,
case-by-case exceptions still should be considered for important issues
(here, this relates to lock-in).

Either way, I could pull back the patch if more people object.
-
robins


Re: [HACKERS] bytea_output output of base64

2017-02-26 Thread Robert Haas
On Sun, Feb 26, 2017 at 5:19 AM, Peter Eisentraut
 wrote:
>> Is there a reason we chose hex over base64?
>
> The reason we changed from the old format to hex was for performance.
> We didn't consider base64 at the time, but hex would probably still have
> been faster.

I might be remembering the discussion incorrectly, but I thought the
issue was that it was really hard to predict how large a buffer we
needed for the "escape" format, because the answer depends on the
number of backslashes in the string.  With the "hex" format, we could
predict the size of the output just based on the size of the input
(which we know) rather than having it depend on the contents of the
input (which we don't).  So I would think that any format that has the
property that we can know the exact output size based on the input
size would be equally good.  And base64 would seem to have some
advantage because the output would be substantially more compact.
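
Both formats satisfy the predictable-size property; for a bytea of n bytes the output lengths work out as below (a quick sketch, ignoring any line wrapping a base64 variant might or might not add):

	size_t
	hex_output_len(size_t n)
	{
		return 2 + 2 * n;			/* "\x" prefix plus two hex digits per byte */
	}

	size_t
	base64_output_len(size_t n)
	{
		return 4 * ((n + 2) / 3);	/* four output characters per three input bytes */
	}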

That having been said, I do agree with Tom's point that we already
have one more bytea_output format than would be ideal.  To justify
implementing base64 as a third choice, it would have to not only be
better than hex, but enough better to justify the migration pain.  I'm
not sure whether it could clear that bar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-26 Thread Erik Rijkers

On 2017-02-26 01:45, Petr Jelinek wrote:

Again, much better... :

-- out_20170226_0724.txt
 25 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0751.txt
 25 -- pgbench -c 4 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0819.txt
 25 -- pgbench -c 8 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0844.txt
 25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0912.txt
 25 -- pgbench -c 32 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0944.txt
 25 -- scale  5 clients  1   INIT_WAIT  0CLEAN_ONLY=
 25 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
 25 -- All is well.

 but not perfect: with the next scale up (pgbench scale 25) I got:

-- out_20170226_1001.txt
  3 -- scale  25 clients  1   INIT_WAIT  0CLEAN_ONLY=
  3 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
  2 -- All is well.
  1 -- Not good, but breaking out of wait (waited more than 60s)

It looks like something got stuck at DROP SUBSCRIPTION again which, I 
think, derives from this line:

echo "drop subscription if exists sub1"  | psql -qXp $port2

I don't know exactly what is useful/useless to report; below is the 
state of some tables/views (note that this is from 31 minutes after the 
fact (see 'duration' in the first query)), and a backtrace :



$ ./view.sh
select current_setting('port') as port;
 port
--
 6973
(1 row)

select
  rpad(now()::text,19) as now
, pid   as pid
, application_name  as app
, state as state
, wait_eventas wt_evt
, wait_event_type   as wt_evt_type
, date_trunc('second', query_start::timestamp)  as query_start
, substring((now() - query_start)::text, 1, position('.' in (now() - 
query_start)::text)-1) as duration

, query
from pg_stat_activity
where query !~ 'pg_stat_activity'
;
 now |  pid  | app   
  | state  | wt_evt | wt_evt_type | query_start 
| duration |  query

-+---+-+++-+-+--+--
 2017-02-26 10:42:43 | 28232 | logical replication worker 31929  
  | active | relation   | Lock| 
|  |
 2017-02-26 10:42:43 | 28237 | logical replication worker 31929 sync 
31906 || LogicalSyncStateChange | IPC |  
   |  |
 2017-02-26 10:42:43 | 28242 | logical replication worker 31929 sync 
31909 || transactionid  | Lock|  
   |  |
 2017-02-26 10:42:43 | 32023 | psql  
  | active | BgWorkerShutdown   | IPC | 2017-02-26 10:10:52 
| 00:31:51 | drop subscription if exists sub1

(4 rows)

select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | 
client_hostname | client_port | backend_start | backend_xmin | state | 
sent_location | write_location | flush_location | replay_location | 
sync_priority | sync_state

-+--+-+--+-+-+-+---+--+---+---+++-+---+
(0 rows)

select * from pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |  
last_msg_send_time   | last_msg_receipt_time | 
latest_end_lsn |latest_end_time

---+-+---+---+--+---+---++---
 31929 | sub1| 28242 | 31909 |  | 2017-02-26 
10:07:05.723093+01 | 2017-02-26 10:07:05.723093+01 || 
2017-02-26 10:07:05.723093+01
 31929 | sub1| 28237 | 31906 |  | 2017-02-26 
10:07:04.721229+01 | 2017-02-26 10:07:04.721229+01 || 
2017-02-26 10:07:04.721229+01
 31929 | sub1| 28232 |   | 1/73497468   |
   | 2017-02-26 10:07:47.781883+01 | 1/59A73EF8 | 2017-02-26 
10:07:04.720595+01

(3 rows)

select * from pg_subscription;
 subdbid | subname | subowner | subenabled | subconninfo | subslotname | 
subpublications

-+-+--++-+-+-
   16384 | sub1|   10 | t  | port=6972   | sub1| 
{pub1}

(1 row)

select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate |  srsublsn
-+-++
   31929 |   31912 | i  |
   31929 |   31917 | i  |
   31929 |   31909 | d  |
   31929 |   31906 | w  | 1/73498F90
(4 rows)

Dunno if a backtrace is useful

$ gdb -pid 32023  (from the DROP SUBSCRIPTION 

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-02-26 Thread Robert Haas
On Thu, Feb 23, 2017 at 9:40 PM, Magnus Hagander  wrote:
> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
> making there, then we lock ourselves into one "type of policy".

That's not really true.  We can add other policies - or extensibility
- later.  A more accurate statement, ISTM, would be that initially we
only support one type of policy.  But that's fine; more can be added
later.

> Wouldn't this one, along with some other scenarios, be better provided by
> the "run command at end of segment" function that we've talked about before?
> And then that external command could implement whatever aging logic would be
> appropriate for the environment?

I don't think it's bad to have that, but I don't understand the
resistance to having a policy that by default lets us keep as much WAL
as will fit within our space budget.  That seems like an eminently
sensible thing to want.  Ideally, I'd like to be able to recover any
backup, however old, from that point forward to a point of my
choosing.  But if I run out of disk space, removing the oldest WAL
files I have is more sensible than not accepting new ones.  Sure, I'll
have less ability to go back in time, but I'm less likely to need the
data from 3 or 4 backups ago than I am to need the most recent data.
I'm only going to go back to an older backup if I can't recover from
the most recent one, or if I need some data that was removed or
corrupted some time ago.  It's good to have that ability for as long
as it is sustainable, but when I have to pick, I want the new stuff.
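
To make the shape of such a policy concrete, here is a minimal sketch,
assuming a plain directory of completed WAL segments; it is
illustrative only (not pg_receivexlog's actual code, and all names are
invented): keep the newest segments and unlink the oldest until the
directory fits a byte budget.  It relies on the fact that completed
segment file names (24 hex digits) sort chronologically.

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_SEGS 8192

static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Remove oldest completed segments until the directory fits the budget. */
static void
enforce_wal_budget(const char *dir, long long budget_bytes)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    char       *names[MAX_SEGS];
    int         n = 0;
    long long   total = 0;

    if (d == NULL)
        return;

    while ((de = readdir(d)) != NULL)
    {
        char        path[4096];
        struct stat st;

        /* Completed segments only: names are exactly 24 hex digits. */
        if (strlen(de->d_name) != 24 ||
            strspn(de->d_name, "0123456789ABCDEF") != 24)
            continue;

        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (stat(path, &st) == 0 && n < MAX_SEGS)
        {
            names[n++] = strdup(de->d_name);
            total += st.st_size;
        }
    }
    closedir(d);

    /* Segment names sort chronologically, so index 0 is the oldest. */
    qsort(names, n, sizeof(char *), cmp_names);

    for (int i = 0; i < n && total > budget_bytes; i++)
    {
        char        path[4096];
        struct stat st;

        snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        if (stat(path, &st) == 0 && unlink(path) == 0)
            total -= st.st_size;
    }

    for (int i = 0; i < n; i++)
        free(names[i]);
}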

I think we're actually desperately in need of smarter WAL management
tools in core not just in this respect but in a whole bunch of places,
and I think size is an excellent thing for those tools to be
considering.

When Heikki implemented min_wal_size and max_wal_size
(88e982302684246e8af785e78a467ac37c76dee9, February 2015) there was
quite a bit of discussion about how nice it would be to have a HARD
limit on WAL size rather than a soft limit.  When the system gets too
close to the hard limit, processes trying to write WAL slow or stop
until a checkpoint can be completed, allowing for the removal of WAL.
Heroku also previously advocated for such a system, to replace their
ad-hoc system of SIGSTOPping backends for short periods of time (!) to
accomplish the same thing.

When replication slots were added
(858ec11858a914d4c380971985709b6d6b7dd6fc, January 2014) we talked
about how nice it would be if there were a facility to detect when a
replication slot (or combination of slots) was forcing the retention
of too much WAL and, when some threshold is exceeded, disable WAL
retention for those slots to prevent disk space exhaustion.

When pg_archivecleanup was added (ca65f2190ae20b8bba9aa66e4cab1982b95d109f,
24bfbb5857a1e7ae227b526e64e540752c3b1fe3, June 2010) it documented
that it was really only smart enough to handle the case of an archive
for the benefit of a single standby, and it didn't do anything to help
you if that one standby got far enough behind to fill up the disk.

In all of these cases, we're still waiting for something smarter to come
along.  This is an enormous practical problem.  "pg_xlog filled up" is
a reasonably common cause of production outages, and "archive
directory filled up" is a reasonably common cause of "pg_xlog filled
up".  I don't mind having a mode where we give the user the tools with
which to build their own solution to these problems, but we shouldn't
ignore the likelihood that many people are likely to want the same
policies, and I'd rather have those commonly-used policies
well-implemented in core than implemented at highly varying levels of
quality in individual installations.

All IMHO, of course...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should logtape.c blocks be of type long?

2017-02-26 Thread Robert Haas
On Sun, Feb 26, 2017 at 2:44 AM, Peter Geoghegan  wrote:
> I tend to be suspicious of use of the type "long" in general, because
> in general one should assume that it is no wider than "int". This
> calls into question why any code that uses "long" didn't just use
> "int", at least in my mind.

Yeah.  Using things that are guaranteed to be the size we want them to
be (and the same size on all platforms) seems like a good plan.
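
For example (a generic illustration, not a change to logtape.c),
fixed-width types from <stdint.h>, or PostgreSQL's own int64, state the
intent directly, while long is only 32 bits wide on LLP64 platforms
such as 64-bit Windows:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /*
     * On LP64 platforms (most 64-bit Unix systems) long is 8 bytes, but
     * on LLP64 platforms (64-bit Windows) it is only 4 bytes.  int64_t
     * is 8 bytes everywhere, so a block count that might exceed 2^31-1
     * is better declared with a fixed-width type than with long.
     */
    int64_t     blocknum = (int64_t) INT32_MAX + 1;

    printf("sizeof(long) = %zu, sizeof(int64_t) = %zu\n",
           sizeof(long), sizeof(int64_t));
    printf("block %lld is still representable\n", (long long) blocknum);
    return 0;
}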

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2017-02-26 Thread Robert Haas
On Sat, Feb 25, 2017 at 7:28 PM, Michael Paquier
 wrote:
> - pg_basebackup -R generates recovery.conf.auto.

Does anything cause that file to get read?

Wouldn't it be better to just append to postgresql.conf.auto?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2017-02-26 Thread Robert Haas
On Wed, Jan 11, 2017 at 11:23 PM, Simon Riggs  wrote:
>> I think the issue was that some people didn't want configuration files
>> in the data directory.  By removing recovery.conf we accomplish that.
>> Signal/trigger files are not configuration (or at least it's much easier
>> to argue that), so I think having them in the data directory is fine.
>
> There were a considerable number of people that pushed to make the
> data directory non-user writable, which is where the signal directory
> came from.

Specifically, it's a problem for Debian's packaging conventions,
right?  The data directory can contain anything that the server itself
will write, but configuration files that are written for the server to
read are supposed to go in some external location dictated by Debian's
packaging policy.

Things like trigger files aren't configuration files per se, so maybe
it's OK if those still get written into the data directory.  Even if
not, that seems like a separate patch.  In my view, based on Michael's
description of what the current patch version does, it's a clear step
forward.  Other steps can be taken at another time, if required.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-02-26 Thread Robert Haas
On Sat, Feb 25, 2017 at 10:50 AM, Pavan Deolasee
 wrote:
> On Fri, Feb 24, 2017 at 9:47 PM, Robert Haas  wrote:
>> And there are no bugs, right?  :-)
>
> Yeah yeah absolutely nothing. Just like any other feature committed to
> Postgres so far ;-)

Fair point, but I've already said why I think the stakes for this
particular feature are pretty high.

> I need to polish the chain conversion patch a bit and also add missing
> support for redo, hash indexes etc. Support for hash indexes will need
> overloading of ip_posid bits in the index tuple (since there are no free
> bits left in hash tuples). I plan to work on that next and submit a fully
> functional patch, hopefully before the commit-fest starts.
>
> (I have mentioned the idea of overloading ip_posid bits a few times now and
> haven't heard any objection so far. Well, that could either mean that nobody
> has read those emails seriously or there is general acceptance to that
> idea.. I am assuming latter :-))

I'm not sure about that.  I'm not really sure I have an opinion on
that yet, without seeing the patch.  The discussion upthread was a bit
vague:

"One idea is to free up 3 bits from ip_posid knowing that OffsetNumber
can never really need more than 13 bits with the other constraints in
place."

Not sure exactly what "the other constraints" are.

/me goes off, tries to figure it out.

If I'm reading the definition of MaxIndexTuplesPerPage correctly, it
thinks that the minimum number of bytes per index tuple is at least
16: I think sizeof(IndexTupleData) will be 8, so when you add 1 and
MAXALIGN, you get to 12, and then ItemIdData is another 4.  So an 8k
page (2^13 bits) could have, on a platform with MAXIMUM_ALIGNOF == 4,
as many as 2^9 tuples.  To store more than 2^13 tuples, we'd need a
block size > 128k, but it seems 32k is the most we support.  So that
seems OK, if I haven't gotten confused about the logic.

I suppose the only other point of concern about stealing some bits
there is that it might make some operations a little more expensive,
because they've got to start masking out the high bits.  But that's
*probably* negligible.
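
To make the bit-stealing idea concrete, here is a hypothetical sketch
of what masking out three high bits of ip_posid could look like; the
macro and function names are invented for the example and are not the
patch's actual definitions.

#include <assert.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

/*
 * With at least 16 bytes per index tuple (MAXALIGN(sizeof(IndexTupleData)
 * + 1) = 12 plus a 4-byte ItemIdData), even a 32k page holds at most
 * 32768 / 16 = 2048 line pointers, so an offset number never needs more
 * than 13 bits and the top 3 bits of ip_posid could be stolen for flags.
 */
#define HYPO_OFFSET_BITS    13
#define HYPO_OFFSET_MASK    ((OffsetNumber) ((1 << HYPO_OFFSET_BITS) - 1))  /* 0x1FFF */
#define HYPO_FLAG_MASK      ((OffsetNumber) ~HYPO_OFFSET_MASK)              /* 0xE000 */

/* Extract the real offset, masking out the stolen flag bits. */
static inline OffsetNumber
hypo_get_offset(OffsetNumber ip_posid)
{
    return (OffsetNumber) (ip_posid & HYPO_OFFSET_MASK);
}

/* Set a stolen flag bit without disturbing the offset. */
static inline OffsetNumber
hypo_set_flag(OffsetNumber ip_posid, OffsetNumber flag)
{
    assert((flag & HYPO_OFFSET_MASK) == 0);
    return (OffsetNumber) (ip_posid | flag);
}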

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-26 Thread Robert Haas
On Sun, Feb 19, 2017 at 12:24 AM, Robins Tharakan  wrote:
> I would like to work on a patch to accommodate restricted environments (such
> as AWS RDS Postgres) which don't allow pg_authid access since their
> definition of Superuser is just a regular user with extra permissions.
>
> Would you consider a patch to add a flag to work around this restriction,
> Or, do you prefer that this be maintained outside core?

I am a little surprised that this patch has gotten such a good
reception.  We haven't in the past been all that willing to accept
core changes for the benefit of forks of PostgreSQL; extensions, sure,
but forks?  Maybe we should take the view that Amazon has broken this
and Amazon ought to fix it, rather than making it our job to (try to)
work around their bugs.

On the other hand, maybe that approach is narrow-minded.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reduce lock levels for reloptions

2017-02-26 Thread Robert Haas
On Sat, Feb 25, 2017 at 11:24 PM, Simon Riggs  wrote:
> Patch to reduce lock levels of various relation-level options,
> following on from earlier work by Fabrizio, with additional work and
> documentation by me.

I have reviewed this patch and I think it looks reasonable.  I think
that the behavior will be that overlapping transactions may or may not
pick up the new values, depending on whether they already have a lock
on the relation and whether they happen to have a relcache flush for
some other reason.  New transactions started after the value is
altered via DDL should always see the new value, because they will
always do AcceptInvalidationMessages() after locking the relation, and
if they haven't picked up the value by that point, they will
see it then.  If that sounds right, maybe we should document someplace
in the SGML documentation that new values of reloptions which can be
changed without AEL might or might not be seen by overlapping
transactions; perhaps the lock levels for each reloption should also
be documented (since, as evidenced by this and previous patches,
people obviously do care about what those levels are).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside ExecEndGather)

2017-02-26 Thread Robert Haas
On Sat, Feb 25, 2017 at 10:00 AM, Kohei KaiGai  wrote:
> I tried to add a description how custom-scan callbacks performs under the
> executor, and when invoked roughly.
> However, it is fundamentally not easy for most of people because it assumes
> FDW/CSP author understand the overall behavior of optimizer / executor, not
> only APIs specifications.

I think the statements about Shutdown only being useful in parallel
mode aren't quite striking the right tone.  In theory a node can hold
any kind of resources and find it worthwhile to release them at
shutdown time.  I think we'll eventually find it useful to run
shutdown callbacks in other cases as well - e.g. when a Limit has been
filled.  It's probably true that the undeniable need for this today is
in the parallel query case, but I'd like to have this text read in a
way that doesn't assert that this is the only possible use case,
because I don't think it is.

I rewrote the documentation along those lines a bit and committed this.
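
For FDW/CSP authors, here is a minimal sketch of the kind of callback
being described, assuming the hook is exposed as ShutdownCustomScan in
CustomExecMethods; the state struct, field names, and the resource
being released are invented for illustration only.

#include "postgres.h"

#include <unistd.h>

#include "nodes/extensible.h"

/* Hypothetical per-node state; only the css field is required by the API. */
typedef struct MyScanState
{
    CustomScanState css;        /* must be the first field */
    int             tempfd;     /* hypothetical resource held by the node */
} MyScanState;

static void
my_shutdown_custom_scan(CustomScanState *node)
{
    MyScanState *mss = (MyScanState *) node;

    /*
     * Release resources that should not linger until ExecEndNode().  The
     * pressing case today is parallel query, where this runs before the
     * DSM segment is torn down, but the same callback could eventually be
     * invoked in other early-completion cases, such as a Limit above the
     * node having been satisfied.
     */
    if (mss->tempfd >= 0)
    {
        close(mss->tempfd);
        mss->tempfd = -1;
    }
}

/* Other required callbacks (BeginCustomScan etc.) omitted for brevity. */
static const CustomExecMethods my_exec_methods = {
    .CustomName = "my_custom_scan",
    .ShutdownCustomScan = my_shutdown_custom_scan,
};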

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers