[HACKERS]

2016-10-25 Thread Masahiko Sawada
On Fri, Oct 21, 2016 at 2:38 PM, Ashutosh Bapat
 wrote:
> On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas  wrote:
>> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote
>>  wrote:
>>> However, when I briefly read the description in "Transaction Management in
>>> the R* Distributed Database Management System (C. Mohan et al)" [2], it
>>> seems that what Ashutosh is saying might be a correct way to proceed after
>>> all:
>>
>> I think Ashutosh is mostly right, but I think there's a lot of room to
>> doubt whether the design of this patch is good enough that we should
>> adopt it.
>>
>> Consider two possible designs.  In design #1, the leader performs the
>> commit locally and then tries to send COMMIT PREPARED to every standby
>> server afterward, and only then acknowledges the commit to the client.
>> In design #2, the leader performs the commit locally and then
>> acknowledges the commit to the client at once, leaving the task of
>> running COMMIT PREPARED to some background process.  Design #2
>> involves a race condition, because it's possible that the background
>> process might not complete COMMIT PREPARED on every node before the
>> user submits the next query, and that query might then fail to see
>> supposedly-committed changes.  This can't happen in design #1.  On the
>> other hand, there's always the possibility that the leader's session
>> is forcibly killed, even perhaps by pulling the plug.  If the
>> background process contemplated by design #2 is well-designed, it can
>> recover and finish sending COMMIT PREPARED to each relevant server
>> after the next restart.  In design #1, that background process doesn't
>> necessarily exist, so inevitably there is a possibility of orphaning
>> prepared transactions on the remote servers, which is not good. Even
>> if the DBA notices them, it won't be easy to figure out whether to
>> commit them or roll them back.
>>
>> I think this thought experiment shows that, on the one hand, there is
>> a point to waiting for commits on the foreign servers, because it can
>> avoid the anomaly of not seeing the effects of your own commits.  On
>> the other hand, it's ridiculous to suppose that every case can be
>> handled by waiting, because that just isn't true.  You can't be sure
>> that you'll be able to wait long enough for COMMIT PREPARED to
>> complete, and even if that works out, you may not want to wait
>> indefinitely for a dead server.  Waiting for a ROLLBACK PREPARED has
>> no value whatsoever unless the system design is such that failing to
>> wait for it results in the ROLLBACK PREPARED never getting performed
>> -- which is a pretty poor excuse.
>>
>> Moreover, there are good reasons to think that doing this kind of
>> cleanup work in the post-commit hooks is never going to be acceptable.
>> Generally, the post-commit hooks need to be no-fail, because it's too
>> late to throw an ERROR.  But there's very little hope that a
>> connection to a remote server can be no-fail; anything that involves a
>> network connection is, by definition, prone to failure.  We can try to
>> guarantee that every single bit of code that runs in the path that
>> sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an
>> ERROR, but that's going to be quite difficult to do: even palloc() can
>> throw an error.  And what about interrupts?  We don't want to be stuck
>> inside this code for a long time without any hope of the user
>> recovering control of the session by pressing ^C, but of course the
>> way that works is it throws an ERROR, which we can't handle here.  We
>> fixed a similar issue for synchronous replication in
>> 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a
>> WARNING in that case and then return control to the user.  But if we
>> do that here, all of the code that every FDW emits has to be aware of
>> that rule and follow it, and it just adds to the list of ways that the
>> user backend can escape this code without having cleaned up all of the
>> prepared transactions on the remote side.
>
> Hmm, IIRC, my patch, and possibly the patch by Masahiko-san and Vinayak,
> tries to resolve prepared transactions in post-commit code. I agree
> with you here that it should be avoided and the backend should take
> over the job of resolving transactions.
>
>>
>> It seems to me that the only way to really make this feature robust is
>> to have a background worker as part of the equation.  The background
>> worker launches at startup and looks around for local state that tells
>> it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
>> operations pending that weren't completed during the last server
>> lifetime, whether because of a local crash or remote unavailability.
>> It attempts to complete those and retries periodically.  When a new
>> transaction needs this type of coordination, it adds the necessary
>> crash-proof state and then signals the 
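For context, registering such a resolver as a background worker at server start would look roughly like the sketch below.  This is a hypothetical illustration only (names such as fdw_xact_resolver_main are made up and not from any posted patch), using the 9.6 background worker API:

/* Hypothetical sketch only, not from any posted patch. */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"

PG_MODULE_MAGIC;

void		_PG_init(void);
void		fdw_xact_resolver_main(Datum main_arg);		/* hypothetical name */

void
fdw_xact_resolver_main(Datum main_arg)
{
	/*
	 * Look for crash-proof local state describing COMMIT PREPARED or
	 * ROLLBACK PREPARED operations that were never completed, try to finish
	 * them, and retry periodically until every remote server is resolved.
	 */
	proc_exit(0);
}

void
_PG_init(void)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	/* needs shared memory and a database connection to read its state */
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	/* start once recovery has finished; restart if it dies */
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = 5;	/* seconds */
	worker.bgw_main = fdw_xact_resolver_main;
	snprintf(worker.bgw_name, BGW_MAXLEN, "fdw transaction resolver");

	RegisterBackgroundWorker(&worker);
}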

[HACKERS] [bug fix] Stats collector is not restarted on the standby

2016-10-25 Thread Tsunakawa, Takayuki
Hello,

If the stats collector is forcibly terminated on the standby in a streaming
replication configuration, it won't be restarted until the standby is promoted
to primary.  The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get 
stale data with the following message.

LOG:  using stale statistics instead of current ones because stats collector is 
not responding
STATEMENT:  select * from pg_stat_user_tables

Regards
Takayuki Tsunakawa



stats_collector_not_restarted.patch
Description: stats_collector_not_restarted.patch



[HACKERS] Minor typo in reorderbuffer.c

2016-10-25 Thread vinayak

Hi,

Attached patch fixes a typo in reorderbuffer.c

s/messsage/message/g

Regards,

Vinayak Pokale

NTT Open Source Software Center


typo-reorderbuffer-c.patch
Description: application/download



Re: [HACKERS] Unused variable in postgres_fdw/deparse.c

2016-10-25 Thread Ashutosh Bapat
On Wed, Oct 26, 2016 at 6:19 AM, Kyotaro HORIGUCHI
 wrote:
> Hello, the compiler complains about an unused variable when building
> postgres_fdw without assertions (i.e. without --enable-cassert).
>
> deparse.c: In function ‘deparseFromExpr’:
> deparse.c:1029:14: warning: unused variable ‘foreignrel’ [-Wunused-variable]
>   RelOptInfo *foreignrel = context->foreignrel;
>   ^~
>
> The attached patch removes it and moves the reference into the assertion below it.
>

Thanks for the patch, and sorry for missing this in the review.

The patch applies but seems to have trailing whitespace (CR line endings):
patch -p1 < /mnt/hgfs/tmp/postgre_fdw_delete_unused_var.patch
(Stripping trailing CRs from patch.)
patching file contrib/postgres_fdw/deparse.c
But that's removed by my "patch" command.

It compiles cleanly and make check in contrib/postgres_fdw does not
show any failure.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-25 Thread Karl O. Pinc
On Tue, 25 Oct 2016 22:30:48 -0500
"Karl O. Pinc"  wrote:

> Since pg_log_file may contain only one line, and that
> line may be either the file name of the csv log file or
> the file name of the stderr log file, it's impossible
> to tell whether that single file is in csv or stderr
> format.  I suppose it might be possible based on the file
> name suffix, but Unix expressly ignores file name
> extensions and it seems unwise to force dependence
> on them.  Perhaps each line could begin with
> the "type" of the file, either 'csv' or 'stderr',
> followed by a space and the file name?  
> 
> In other words,
> as long as you're making the content of pg_log_file
> a data structure that contains more than just a single
> file name you may as well make that data structure
> something well-defined, easily parseable in shell, extensible,
> and informative.

While you're at it, it wouldn't hurt to provide another
function that tells you the format of the file returned
by pg_current_logfile(), since pg_current_logfile()
called without arguments could return either a stderr
formatted file or a csvlog formatted file.

Or leave it for the future.  Just a thought.


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/26 12:09, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote
>  wrote:
>> On 2016/10/26 11:41, Amit Kapila wrote:
>>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>>>  wrote:

 Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
 table is specified, but I thought this may be fine initially.
>>>
>>> Okay, I don't want to add anything to your existing work unless it is
>>> important.  However, I think there should be some agreement on which
>>> of the restrictions are okay for first version of patch.  This can
>>> avoid such questions in future from other reviewers.
>>
>> OK, so I assume you don't find this particular restriction problematic in
>> long term.
> 
> I think you can keep it as you have in patch.  After posting your
> updated patches, please do send a list of restrictions which this
> patch is imposing based on the argument that for first version they
> are not essential.

OK, agreed that it will be better to have all such restrictions and
limitations of the first version listed in one place, rather than being
scattered across different emails where they might have been mentioned and
discussed.

I will try to include such a list when posting the latest set of patches.

Thanks,
Amit






Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-25 Thread Karl O. Pinc
Another new version of a doc patch to the v6 patch.

More better English.  *sigh*

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v6.diff.patchv4
Description: Binary data



Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Amit Kapila
On Mon, Oct 24, 2016 at 12:25 PM, Michael Paquier
 wrote:
> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila  wrote:
>>
>> I think what you are saying is not completely right, because we do
>> update minRecoveryPoint when we don't perform a new restart point.
>> When we perform restart point, then it assumes that flushing the
>> buffers will take care of updating minRecoveryPoint.
>
> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.
> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
> when playing with different stop locations at recovery.
>

That point holds good only for cases where we try to update the minimum
recovery point beyond what is required (as earlier, when we were thinking
of updating it unconditionally); however, what is being discussed here is
to update it only if it has not already been updated by the flush of
buffers.  I think that is okay, but I feel Kyotaro-San's fix is a good fix
for the problem and we don't need to add more code (an additional update
of the control file) to fix it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Amit Kapila
On Tue, Oct 25, 2016 at 10:43 PM, Robert Haas  wrote:
> On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila  wrote:
>>
>> You are right that it will include additional WAL than strictly
>> necessary, but that can happen today as well because minrecoverypoint
>> could be updated after you have established restart point for
>> do_pg_start_backup().  Do you want to say that even without patch in
>> some cases we are including additional WAL in backup than what is
>> strictly necessary, so it is better to improve the situation?
>
> In that case, including additional WAL is *necessary*.  If the minimum
> recovery point is updated after do_pg_start_backup(), pages bearing
> LSNs newer than the old minimum recovery point could conceivably have
> been copied as part of the backup,
>

I might be missing something here, but I think this theory doesn't
hold true with respect to the current code, because we always update the
minimum recovery point to the last record being replayed, not just to the
LSN of the page being flushed.  This is done to avoid repeated updates of
the control file.  Considering that, it seems to me that the patch by
Kyotaro-San is a reasonable fix, provided we update comments at all
appropriate places.  I can try to update the comments, if the proposed
fix is acceptable to you.

> and therefore we need to replay up
> through the new minimum recovery point to guarantee consistency.
>
>>> I can think of two solutions that would be "tighter":
>>>
>>> 1. When performing a restartpoint, update the minimum recovery point
>>> to just beyond the checkpoint record.  I think this can't hurt anyone
>>> who is actually restarting recovery on the same machine, because
>>> that's exactly the point where they are going to start recovery.  A
>>> minimum recovery point which precedes the point at which they will
>>> start recovery is no better than one which is equal to the point at
>>> which they will start recovery.
>>
>> I think this will work but will cause an unnecessary control file
>> update for the cases when the buffer flush will anyway do it.  It can work
>> if we try to do it only when required (minrecoverypoint is less than
>> lastcheckpoint) after the flush of buffers.
>
> Sure, that doesn't sound too hard to implement.
>

If you are inclined towards this solution, then I think what we need
to do is change the API UpdateMinRecoveryPoint() such that its
second parameter can take three values: 0 means update
minRecoveryPoint to the passed lsn if minRecoveryPoint < lsn; 1 means
update minRecoveryPoint to the latest replayed point if minRecoveryPoint <
lsn, the same as the current behaviour with *force* = false; 2 means the
same behaviour as the current *force* = true.  Also, we need to pass a
currentTLI parameter (lastCheckPoint.ThisTimeLineID) to this API to update
minRecoveryPointTLI.  I have not tried this, but I think something along
these lines should work.
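For illustration, the suggested signature might look roughly like this; the mode names below are invented, and only the three behaviours come from the description above:

/* Hypothetical sketch of the suggested change in xlog.c, not a patch. */
typedef enum
{
	MINRP_UPDATE_TO_LSN = 0,	/* set minRecoveryPoint to 'lsn' if behind */
	MINRP_UPDATE_TO_REPLAYED = 1,	/* set it to the latest replayed point if
					 * it is behind 'lsn' (today's force = false) */
	MINRP_FORCE = 2			/* unconditional update (today's force = true) */
} UpdateMinRecoveryPointMode;

static void
UpdateMinRecoveryPoint(XLogRecPtr lsn, UpdateMinRecoveryPointMode mode,
					   TimeLineID currentTLI)
{
	/* ... adjust ControlFile->minRecoveryPoint and minRecoveryPointTLI ... */
}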

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-25 Thread Karl O. Pinc
On Tue, 25 Oct 2016 22:53:41 -0500
"Karl O. Pinc"  wrote:

> On Tue, 25 Oct 2016 22:30:48 -0500
> "Karl O. Pinc"  wrote:
>  
> > Hope to provide more feedback soon.  

Er, attached is yet another doc patch to the v6 patch.
Sorry about that.

This changes the detailed documentation of pg_current_logfile().

Instead of saying that return values are undefined,
I've documented that NULL is returned, and I
reworded and shortened my previous wording.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v6.diff.patchv3
Description: Binary data



Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-25 Thread Karl O. Pinc
On Tue, 25 Oct 2016 22:30:48 -0500
"Karl O. Pinc"  wrote:
 
> Hope to provide more feedback soon.

Before I forget:

"make check" fails, due to oid issues with pg_current_logfile().

You're writing Unix eol characters into pg_log_file.  (I think.)
Does this matter on MS Windows?  (I'm not up on MS Windows,
and haven't put any thought into this at all.  But didn't
want to forget about the potential issue.)

Now that pg_log_file contains multiple lines shouldn't
it be called pg_log_files?

In the docs, other functions that take optional arguments
show up as multiple rows.  Attached is a new version of
my patch to the v6 patch which fixes this and supplies
a slightly better short description of pg_log_filename().

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v6.diff.patchv2
Description: Binary data



Re: [HACKERS] patch proposal

2016-10-25 Thread Venkata B Nagothi
Attached is the patch which introduces the new parameter
"recovery_target_incomplete" -

Currently this patch addresses two scenarios -

*Scenario 1 :*

Provide options to the DBA when the recovery target is not reached and
recovery has stopped mid-way due to unavailability of WAL.

When performing recovery to a specific recovery target, the
"recovery_target_incomplete" parameter can be used to either promote, pause
or shut down when recovery does not reach the specified recovery target and
instead reaches end-of-WAL.
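For illustration, usage in recovery.conf would look roughly like this; everything except the proposed parameter is just example context, and the exact syntax is whatever the attached patch implements:

# sketch only
restore_command = 'cp /mnt/archive/%f %p'
recovery_target_time = '2016-08-26 08:29:30'
recovery_target_incomplete = 'shutdown'   # proposed; also 'promote' or 'pause'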


*Scenario 2 :*

Generate errors and hints when the specified recovery target is prior to the
backup's current position (xid, time and lsn). This behaviour is integrated
with the parameters "recovery_target_time", "recovery_target_lsn" and
"recovery_target_xid".

I would like to know whether the approach this patch takes sounds OK.

My other comments are inline

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi 
wrote:

>
> On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost 
> wrote:
>
>> * Venkata B Nagothi (nag1...@gmail.com) wrote:
>> > On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost 
>> wrote:
>> > > * Venkata B Nagothi (nag1...@gmail.com) wrote:
>> > > > This behaviour will be similar to that of
>> recovery_target="immediate" and
>> > > > can be aliased.
>> > >
>> > > I don't believe we're really going at this the right way.  Clearly,
>> > > there will be cases where we'd like promotion at the end of the WAL
>> > > stream (as we currently have) even if the recovery point is not found,
>> > > but if the new option's "promote" is the same as "immediate" then we
>> > > don't have that.
>> >
>> > My apologies for the confusion. Correction - I meant, "promote" option
>> > would promote the database once the consistent point is reached at the
>> > end-of-the-WAL.
>>
>> "consistent point" and "end-of-the-WAL" are not the same.
>>
>> > > recovery_target = immediate, other
>> > >
>> > > recovery_action_target_found = promote, pause, shutdown
>> >
>> > This is currently supported by the existing parameter called
>> > "recovery_target_action"
>>
>> Right, sure, we could possibly use that, or create an alias, etc.
>>
>> > > recovery_action_target_not_found = promote, pause, shutdown
>> >
>> > This is exactly what newly proposed parameter will do.
>>
>> Then it isn't the same as 'recovery_target = immediate'.
>>
>> > > One question to ask is if we need to support an option for xid and
>> time
>> > > related to when we realize that we won't find the recovery target.  If
>> > > consistency is reached at a time which is later than the recovery
>> target
>> > > for time, what then?  Do we go through the rest of the WAL and perform
>> > > the "not found" action at the end of the WAL stream?  If we use that
>> > > approach, then at least all of the recovery target types are handled
>> the
>> > > same, but I can definitely see cases where an administrator might
>> prefer
>> > > an "error" option.
>> >
>> > Currently, this situation is handled by recovery_target_inclusive
>> > parameter.
>>
>> No, that's not the same.
>>
>> > Administrator can choose to stop the recovery at any consistent
>> > point before or after the specified recovery target. Is this what you
>> were
>> > referring to ?
>>
>> Not quite.
>>
>> > I might need a bit of clarification here, the parameter i am proposing
>> > would be effective only if the specified recovery target is not reached
>> and
>> > may not be effective if the recovery goes beyond recovery target point.
>>
>> Ok, let's consider this scenario:
>>
>> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
>> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
>> recovery_target is not set
>> recovery_target_time = 2016-08-26 08:29:30
>> recovery_target_inclusive = false
>>
>> In such a case, we will necessarily commit transactions which happened
>> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
>> reached consistency.  We could possibly detect that case and throw an
>> error instead.  The same could happen with xid.
>>
>
> Yes, currently, PG just recovers until the consistency is reached. It
> makes sense to throw an error instead.
>

This has not been addressed yet. Currently, the patch only considers
generating an error if the recovery target position is prior to the current
backup's position, and this is irrespective of whether backup_label is
present or not.
I will share my updates on how this can be addressed.


>
>
>> Working through more scenarios would be useful, I believe.  For example,
>> what if the xid or time specified happened before the backup started?
>>
>> Basically, what I was trying to get at is that we might be able to
>> detect that we'll never find a given recovery target point without
>> actually replaying all of WAL and wondering if we should provide options
>> to control what to do in such a case.
>>
>
> Ah, Yes, I got the point and I agree. Thanks for the clarification.
>
> Yes, 

Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-25 Thread Karl O. Pinc
On Tue, 18 Oct 2016 14:18:36 +0200
Gilles Darold  wrote:

> Here is the v6 of the patch, here is the description of the
> pg_current_logfile() function, I have tried to keep thing as simple as
> possible:
> 
> pg_current_logfile( [ destination text ] )
> -

Attached is a doc patch to apply on top of your v6 patch.

Changes are, roughly:

Put pg_log_file in alphabetical order in the file name listing
and section body.

Put pg_current_logfile in alphabetical order in the function
name listing and section body.

1 argument functions don't seem to have a parameter name
when listed in documentation tables, just a data type,
so I got rid of the parameter name for pg_current_logfile().

Cleaned up the wording and provided more detail.

Added hyperlink to log_destination config parameter.

Added markup to various system values.  The markup does
not stand out very well in the html docs, but that's a different
issue and should be fixed by changing the markup processing.
(I used the markup found in the log_destination()
config parameter docs.)

pg_current_logfile does not seem related to pg_listening_channels
or pg_notification_queue_usage so I moved the latter 2 index
entries closer to their text.


I also have a couple of preliminary comments.

It seems like the code is testing for whether the
logfile is a CSV file by looking for a '.csv' suffix
on the end of the file name.  This seems a little fragile.
Shouldn't it be looking at something set internally when the
log_destination config declaration is parsed?

Since pg_log_file may contain only one line, and that
line may be either the file name of the csv log file or
the file name of the stderr log file, it's impossible
to tell whether that single file is in csv or stderr
format.  I suppose it might be possible based on the file
name suffix, but Unix expressly ignores file name
extensions and it seems unwise to force dependence
on them.  Perhaps each line could begin with
the "type" of the file, either 'csv' or 'stderr',
followed by a space and the file name?  

In other words,
as long as you're making the content of pg_log_file
a data structure that contains more than just a single
file name you may as well make that data structure
something well-defined, easily parseable in shell, extensible,
and informative.
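For example, with that scheme the file might then contain something like
(file names here are only illustrative):

stderr pg_log/postgresql-2016-10-25_000000.log
csv pg_log/postgresql-2016-10-25_000000.csv

which a shell script could split on the first space to get both the
format and the path.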

Hope to provide more feedback soon.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d1a5b96..54e10af 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15443,6 +15443,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
+   pg_current_logfile(text)
+   text
+   log file in use by the logging collector
+  
+
+  
pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15480,12 +15486,6 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
-   pg_current_logfile( destination text)
-   text
-   current log file used by the logging collector
-  
-
-  
session_user
name
session user name
@@ -15667,6 +15667,27 @@ SET search_path TO schema , schema, ..

 

+pg_current_logfile
+   
+
+   
+pg_current_logfile returns, as text
+the name of the log file currently in use by the logging collector.
+Log collection must be active or the return value is undefined.  When
+the  configuration parameter
+contains both csvlog and stderr
+pg_current_logfile returns the stderr filename.
+When  does not
+contain stderr the csv filename is returned.  When it
+does not contains csvlog the stderr filename is
+returned.  To request a specific file format supply either
+csvlog or stderr as the value of the
+optional, type text, parameter. The return value is
+undefined when the log format requested is not a configured log
+destination.
+   
+
+   
 pg_my_temp_schema

 
@@ -15692,23 +15713,6 @@ SET search_path TO schema , schema, ..
 pg_notification_queue_usage

 
-   
-pg_current_logfile
-   
-
-   
-pg_current_logfile returns the name of the
-current log file used by the logging collector, as text.
-Log collection must be active or the return value is undefined. When
-csvlog is used as log destination, the csv filename is returned, when
-it is set to stderr, the stderr filename is returned. When both are
-used, it returns the stderr filename.
-There is an optional parameter of type text to determines
-the log filename to return following the log destination, values can
-be 'csvlog' or 'stderr'. When the log format asked is not used as log
-destination then the return value is undefined.
-   
-

 pg_listening_channels returns a set of names of
 asynchronous 

Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote
 wrote:
> On 2016/10/26 11:41, Amit Kapila wrote:
>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>>  wrote:
 1.
 @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
 {
 ..
 + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 + ereport(ERROR,
 + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
 + errmsg("cannot copy from partitioned table \"%s\"",
 + RelationGetRelationName(rel)),
 + errhint("Try the COPY (SELECT ...) TO variant.")));
 ..
 }

 Why is this restriction?  Won't it be useful to allow it for the cases
 when user wants to copy the data of all the partitions?
>>>
>>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
>>> table is specified, but I thought this may be fine initially.
>>>
>>
>> Okay, I don't want to add anything to your existing work unless it is
>> important.  However, I think there should be some agreement on which
>> of the restrictions are okay for first version of patch.  This can
>> avoid such questions in future from other reviewers.
>
> OK, so I assume you don't find this particular restriction problematic in
> long term.
>

I think you can keep it as you have it in the patch.  After posting your
updated patches, please do send a list of the restrictions which this
patch is imposing based on the argument that for the first version they
are not essential.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/26 11:41, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>  wrote:
>>> 1.
>>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>>> {
>>> ..
>>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> + errmsg("cannot copy from partitioned table \"%s\"",
>>> + RelationGetRelationName(rel)),
>>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>>> ..
>>> }
>>>
>>> Why is this restriction?  Won't it be useful to allow it for the cases
>>> when user wants to copy the data of all the partitions?
>>
>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
>> table is specified, but I thought this may be fine initially.
>>
> 
> Okay, I don't want to add anything to your existing work unless it is
> important.  However, I think there should be some agreement on which
> of the restrictions are okay for first version of patch.  This can
> avoid such questions in future from other reviewers.

OK, so I assume you don't find this particular restriction problematic in
the long term.

>>> 2.
>>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>>> + partnatts > 1)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> + errmsg("cannot list partition using more than one column")));
>>>
>>> /cannot list/cannot use list
>>
>> Actually "list partition" works here as a verb, as in "to list partition".
>>
> 
> I am not an expert on this matter, so probably someone with a better
> grip on it can comment.  Are we using something similar in any other error
> message?

In fact, I changed it to the current text after Robert suggested the same [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaPxXJ14eDVia514UiuQAXyZGqfbz8Qg3G4a8Rz2gKF7w%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
 wrote:
>> 1.
>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>> {
>> ..
>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> + errmsg("cannot copy from partitioned table \"%s\"",
>> + RelationGetRelationName(rel)),
>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>> ..
>> }
>>
>> Why is this restriction?  Won't it be useful to allow it for the cases
>> when user wants to copy the data of all the partitions?
>
> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
> table is specified, but I thought this may be fine initially.
>

Okay, I don't want to add anything to your existing work unless it is
important.  However, I think there should be some agreement on which
of the restrictions are okay for the first version of the patch.  This can
avoid such questions from other reviewers in the future.


>> 2.
>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>> + partnatts > 1)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> + errmsg("cannot list partition using more than one column")));
>>
>> /cannot list/cannot use list
>
> Actually "list partition" works here as a verb, as in "to list partition".
>

I am not an expert on this matter, so probably someone with a better
grip on it can comment.  Are we using something similar in any other error
message?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-25 Thread Dilip Kumar
On Tue, Oct 25, 2016 at 10:35 PM, Alvaro Herrera
 wrote:
> We don't promise order of execution (which is why we can afford to sort
> on cost), but I think it makes sense to keep a rough ordering based on
> cost, and not let this push-down affect those ordering decisions too
> much.

Ok, Thanks for clarification.

>
> I think it's fine to push-down quals that are within the same order of
> magnitude of the cost of a pushable condition, while keeping any other
> much-costlier qual (which could otherwise be pushed down) together with
> non-pushable conditions; this would sort-of guarantee within-order-of-
> magnitude order of execution of quals.
>
> Hopefully an example clarifies what I mean.  Let's suppose you have
> three quals, where qual2 is non-pushable but 1 and 3 are.  cost(1)=10,
> cost(2)=11, cost(3)=12.  Currently, they are executed in that order.
>
> If you were to compare costs in the straightforward way, you would push
> only 1 (because 3 is costlier than 2 which is not push-down-able).  With
> fuzzy comparisons, you'd push both 1 and 3, because cost of 3 is close
> enough to that of qual 2.
>
> But if cost(3)=100 then only push qual 1, and let qual 3 be evaluated
> together with 2 outside the scan node.

After putting more thought into this, IMHO it need not be so
complicated. Currently we are talking about pushing down only "var op
const", and the cost of all such functions is very low and fixed at "1".

Do we really need to take care of a user-defined function which is
declared with a very low cost?
I ask because while building index conditions we also don't take care of
such things. Index conditions will always be evaluated first, and only
then will the filter be applied.

Am I missing something?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] 9.6, background worker processes, and PL/Java

2016-10-25 Thread Chapman Flack
On 10/25/16 18:56, Chapman Flack wrote:

> If pooled, and tied to the backend that started them, do they need
> to do anything special to detect when the leader has executed
> SET ROLE or SET SESSION AUTHORIZATION?

Let me guess ... such information is *not* synchronized across workers,
and that'd be why the manual says "functions must be marked PARALLEL
RESTRICTED if they access ... client connection state ..."?

That's probably a resounding 'no' for declaring any PL/Java function
SAFE, then.

And if changing "the transaction state even temporarily (e.g. a PL/pgsql
function which establishes an EXCEPTION block to catch errors)" is enough
to require UNSAFE, then it may be that RESTRICTED is off limits too, as
there are places PL/Java does that internally.

I take it that example refers not to just any use of PG_TRY/PG_CATCH,
but only to those uses where an internal subtransaction is used to
allow execution to continue?
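For reference, that pattern (an error-trapping internal subtransaction,
as PL/pgSQL's EXCEPTION blocks use) looks roughly like the sketch below.
This is a simplified illustration of the general backend idiom, not
PL/Java's actual code:

/* Simplified sketch of trapping an error inside a subtransaction. */
MemoryContext oldcontext = CurrentMemoryContext;
ResourceOwner oldowner = CurrentResourceOwner;

BeginInternalSubTransaction(NULL);
MemoryContextSwitchTo(oldcontext);

PG_TRY();
{
	/* ... run the operation that might fail ... */

	/* commit the inner transaction and return to the outer state */
	ReleaseCurrentSubTransaction();
	MemoryContextSwitchTo(oldcontext);
	CurrentResourceOwner = oldowner;
}
PG_CATCH();
{
	ErrorData  *edata;

	/* save the error data before cleaning up */
	MemoryContextSwitchTo(oldcontext);
	edata = CopyErrorData();
	FlushErrorState();

	/* roll back only the subtransaction; the surrounding xact continues */
	RollbackAndReleaseCurrentSubTransaction();
	MemoryContextSwitchTo(oldcontext);
	CurrentResourceOwner = oldowner;

	/* ... handle or re-throw edata ... */
}
PG_END_TRY();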

If a person writes a function in some language (SQL, for example),
declares it PARALLEL SAFE but is lying because it calls another
function (in Java, say) that is PARALLEL UNSAFE or RESTRICTED,
does PostgreSQL detect or prevent that, or is it just considered
an unfortunate mistake by the goofball who declared the first
function safe?

And if that's not already prevented, could it be worth adding
code in the PL/Java call handler to detect such a situation and
make sure it ends in a meaningful ereport and not something worse?

-Chap




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/25 15:58, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
>  wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
>>> Hmm, do we ever fire triggers on the parent for operations on a child
>>> table?  Note this thread, which seems possibly relevant:
>>>
>>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>>
>> The answer to your question is no.
>>
>> The thread you quoted discusses statement-level triggers and the
>> conclusion is that they don't work as desired for UPDATE and DELETE on
>> inheritance tables.  As things stand, only UPDATE or DELETE on the parent
>> affects the child tables and it's proposed there that the statement-level
>> triggers on the parent and also on any child tables affected should be
>> fired in that case.
>>
> 
> Doesn't that imply that the statement level triggers should be fired
> for all the tables that get changed for statement?  If so, then in
> your case it should never fire for parent table, which means we could
> disallow statement level triggers as well on parent tables?

I may have misunderstood statement-level triggers, but don't they apply to
tables *specified* as the target table in the statement, instead of those
*changed* by resulting actions?

Now in case of inheritance, unless ONLY is specified, all tables in the
hierarchy including the parent are *implicitly* specified to be affected
by an UPDATE or DELETE operation.  So, if some or all of those tables have
any statement-level triggers defined, they should get fired.  That was the
conclusion of that thread, but that TODO item still remains [1].

I am not (or no longer) sure how that argument affects INSERT on
partitioned tables with tuple-routing though.  Are partitions at all
levels *implicitly specified to be affected* when we say INSERT INTO
root_partitioned_table?

> Some of the other things that we might want to consider disallowing on
> parent table could be:
> a. Policy on table_name

Perhaps.  Since the parent table(s) of a partition hierarchy themselves
contain no rows, it might not make sense to continue to allow creating
row-level security policies on them.

> b. Alter table has many clauses, are all of those allowed and will it
> make sense to allow them?

Currently, we only disallow the following with partitioned parent tables
as far as alter table is concerned.

- cannot change inheritance by ALTER TABLE partitioned_table INHERIT ...

- cannot let them be regular inheritance parents either - that is, the
  following is disallowed: ALTER TABLE some_table INHERIT partitioned_table

- cannot create UNIQUE, PRIMARY KEY, FOREIGN KEY, EXCLUDE constraints

- cannot drop column involved in the partitioning key

Most other forms that affect attributes and constraints follow the regular
inheritance behavior (recursion) with certain exceptions such as:

- cannot add/drop an attribute or check constraint to/from *only*
  the parent

- cannot add/drop NOT NULL constraint to/from *only* the parent

Thoughts?

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote

Thanks for the review!

On 2016/10/25 20:32, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
>  wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
> 
>> Attached revised patches.
> 
> Few assorted review comments for 0001-Catalog*:
> 
> 
> 1.
> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
> {
> ..
> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("cannot copy from partitioned table \"%s\"",
> + RelationGetRelationName(rel)),
> + errhint("Try the COPY (SELECT ...) TO variant.")));
> ..
> }
> 
> Why is this restriction?  Won't it be useful to allow it for the cases
> when user wants to copy the data of all the partitions?

Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
table is specified, but I thought this may be fine initially.

> 2.
> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
> + partnatts > 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("cannot list partition using more than one column")));
> 
> /cannot list/cannot use list

Actually "list partition" works here as a verb, as in "to list partition".

> 3.
> @@ -77,7 +77,7 @@ typedef enum DependencyType
>   DEPENDENCY_INTERNAL = 'i',
>   DEPENDENCY_EXTENSION = 'e',
>   DEPENDENCY_AUTO_EXTENSION = 'x',
> - DEPENDENCY_PIN = 'p'
> + DEPENDENCY_PIN = 'p',
>  } DependencyType;
> Why is this change required?

Looks like a leftover hunk from previous revision.  Will fix.

> 4.
> @@ -0,0 +1,69 @@
> +/*-
> + *
> + * pg_partitioned_table.h
> + *  definition of the system "partitioned table" relation
> + *  along with the relation's initial contents.
> + *
> + *
> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> 
> Copyright year should be 2016.

Oops, will fix.

> 5.
> +/*
> + * PartitionSpec - partition key definition including the strategy
> + *
> + * 'strategy' partition strategy name ('list', 'range', etc.)
> 
> etc. in above comment seems to be unnecessary.

Will fix, although I thought that the list is as yet incomplete.

> 6.
> + {PartitionedRelationId, /* PARTEDRELID */
> 
> Here PARTEDRELID sounds inconvenient, how about PARTRELID?

Agreed.  There used to be another catalog which had used up PARTRELID, but
that's no longer an issue.

I will include these changes in the next version of the patches, which I
will post soon in reply to [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com






[HACKERS] Unused variable in postgres_fdw/deparse.c

2016-10-25 Thread Kyotaro HORIGUCHI
Hello, the compiler complains about an unused variable when building
postgres_fdw without assertions (i.e. without --enable-cassert).

deparse.c: In function ‘deparseFromExpr’:
deparse.c:1029:14: warning: unused variable ‘foreignrel’ [-Wunused-variable]
  RelOptInfo *foreignrel = context->foreignrel;
  ^~

The attached patch removes it and moves the reference into the assertion below it.

regards,

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8da8c11..450693a 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1026,11 +1026,10 @@ static void
 deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
-	RelOptInfo *foreignrel = context->foreignrel;
 	RelOptInfo *scanrel = context->scanrel;
 
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
-	Assert(foreignrel->reloptkind != RELOPT_UPPER_REL ||
+	Assert(context->foreignrel->reloptkind != RELOPT_UPPER_REL ||
 		   scanrel->reloptkind == RELOPT_JOINREL ||
 		   scanrel->reloptkind == RELOPT_BASEREL);
 



Re: [HACKERS] Wraparound warning

2016-10-25 Thread Noah Misch
On Tue, Oct 25, 2016 at 01:32:04PM -0400, Bruce Momjian wrote:
> On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
> > On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian  wrote:
> > > LOG:  MultiXact member wraparound protections are now enabled
> > >
> > > I thought that was going to be removed at some point, no?

Latest proposal on when to remove it:
https://www.postgresql.org/message-id/flat/CA%2BTgmobOoo9a9tGbvXgnvZC_p_15sVNHu2mV9usbB_OLGRZSmw%40mail.gmail.com

Since we have outstanding multixact bugs, that proposal's clock of "a couple
of years" has not begun to tick.

> > Maybe what we should do (since this message obviously annoys everyone)
> 
> It is a persistent reminder of our engineering failure.  I guess it
> should annoy us, and maybe that's OK.

Yep.

> > is change things so that, starting in v10, a message is logged at
> > startup if those protections are NOT enabled, and nothing is logged if
> > they are enabled.  Keep the message for the case where protections are
> > enabled later, after startup time.

I continue to find the message good as-is, per the thread I linked above.




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Jim Nasby

On 10/25/16 4:48 PM, Tomas Vondra wrote:

The main issue that bugs me is the name of the Gen allocator, but I
don't have any good naming ideas :( The basic characteristic of Gen is
that it does not reuse space released by pfree(), relying on the fact
that the whole block will become free. That should be reflected in the
name somehow, I guess.
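For anyone skimming, that behaviour boils down to something like this toy
sketch (not the patch's actual code; the real allocator also avoids
releasing the block it is still allocating from):

typedef struct GenBlock
{
	char	   *freeptr;		/* next unused byte in the block */
	char	   *endptr;			/* end of the block */
	int			nchunks;		/* chunks handed out from this block */
	int			nfreed;			/* of those, how many were pfree'd */
} GenBlock;

static void *
gen_alloc(GenBlock *block, Size size)
{
	char	   *p = block->freeptr;

	if (p + size > block->endptr)
		return NULL;			/* caller grabs a brand new block instead */
	block->freeptr += size;		/* freed space is never handed out again */
	block->nchunks++;
	return p;
}

static bool
gen_free(GenBlock *block, void *chunk)
{
	(void) chunk;				/* the chunk itself goes on no freelist */
	block->nfreed++;
	/* true means the whole block is dead and can be released at once */
	return block->nfreed == block->nchunks;
}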


OneTime? OneUse? OneShot? AllocOnce?

OneHitWonder? ;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




[HACKERS] 9.6, background worker processes, and PL/Java

2016-10-25 Thread Chapman Flack
Hi,

I have a report of a PL/Java crash in 9.6 where the stack trace
suggests it was trying to initialize in a background worker
process (not sure why that even happened, yet), and by my first
glance, it seems to have crashed dereferencing MyProcPort, which
I am guessing a BGW might not always have (?).

So, as I try to get up to speed on this PostgreSQL feature, it
seems to me that I have up to three different cases that I may
need to make PL/Java detect and respond appropriately to. (If
you see me veering into any misconceptions, please let me know.)

1. A worker explicitly created with Register... or RegisterDynamic...
   that has not called ...InitializeConnection... and so isn't
   any particular user or connected to any database.

2. A worker explicitly created that has called ...Initialize...
   and therefore is connected to some database as some user.
   (So, is there a MyProcPort in this case?)

3. A worker implicitly created for a parallel query plan (and therefore
   associated with a database and a user). Does this have a MyProcPort?


Case 1, I think I at most need to detect and ereport. It is hard to
imagine how it could even arise, as without a database connection
there's no pg_extension, pg_language, or pg_proc, but I suppose it
could happen if someone misguidedly puts libpljava in
shared_preload_libraries, or some other bgw code inexplicably loads
it. It's a non-useful case as PL/Java has nothing to do without
a database connection and sqlj schema.

Case 2 might be worth supporting, but I may need to account for
anything that differs in this environment from a normal connected
backend.

Case 3 seems most likely. It should only be possible by invoking
a declared Java function that somebody marked parallel-safe, right?
In the parallel-unsafe or -restricted cases, PL/Java can only find
itself invoked within the leader process?

Such a leader process can only be a normal backend? Or perhaps also
a case-2 explicitly created BGW that is executing a query?

My main question is, what state do I need to examine at startup
in order to distinguish these cases? Do I detect I'm in a BGW by
a non-null MyBgworkerEntry? If it's there, do I detect whether
I have a database and an identity by checking for a MyProcPort,
or some other way?
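A sketch of the kind of test I mean is below; the globals are real 9.6
backend globals, but whether they are the sanctioned way to distinguish
these cases is exactly the open question:

/* Sketch only: one conceivable way to classify the three cases. */
#include "postgres.h"
#include "miscadmin.h"				/* MyDatabaseId, MyProcPort */
#include "postmaster/bgworker.h"	/* MyBgworkerEntry */
#include "access/parallel.h"		/* IsParallelWorker() */

typedef enum
{
	PLJ_NORMAL_BACKEND,
	PLJ_BGW_NO_DATABASE,		/* case 1 */
	PLJ_BGW_WITH_DATABASE,		/* case 2 */
	PLJ_PARALLEL_WORKER			/* case 3 */
} PLJavaContext;				/* hypothetical names, for illustration */

static PLJavaContext
classify_process(void)
{
	/* parallel workers are background workers too, so test this first */
	if (IsParallelWorker())
		return PLJ_PARALLEL_WORKER;
	if (MyBgworkerEntry != NULL)
		return OidIsValid(MyDatabaseId) ? PLJ_BGW_WITH_DATABASE
										: PLJ_BGW_NO_DATABASE;
	return PLJ_NORMAL_BACKEND;	/* MyProcPort is only set for these */
}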

As for declaring functions parallel-unsafe, -restricted, or -safe,
I assume there should be no problems with PL/Java functions with
the default designation of unsafe. There should be no essential
problem if someone declares a function -restricted - provided PL/Java
itself can be audited to make sure it doesn't do any of the things
restricted functions can't do - as it will only be running in the
leader process anyway.

Even if somebody marks a PL/Java function safe (though it's hard to
imagine a good case for that), it shouldn't really break anything; as the
workers are separate processes, this should be safe. Any imagined
speed advantage of the parallel query is likely to evaporate while
the several processes load their own JVMs, but nothing should
outright break.

That leads me to:

Are BGWs for parallel queries born fresh for each query, or do they
get pooled and reused?

If pooled, can they be reused across backends/database connections/
identities, or only by the backend that created them?

If reusable across contexts, that's a dealbreaker and I'd have to
have PL/Java reject any parallel-safe declaration, but a pool tied
to a connection should be ok (and better yet, allow amortizing the
JVM startup cost).

If pooled, and tied to the backend that started them, do they need
to do anything special to detect when the leader has executed
SET ROLE or SET SESSION AUTHORIZATION?

If all of this is covered to death in some document I obviously
haven't read, please feel free to point me to it.

Thanks!
-Chap




Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Jim Nasby

On 10/22/16 12:38 PM, Tom Lane wrote:

Jim Nasby  writes:

> On 10/21/16 7:43 PM, Tom Lane wrote:

>> Alvaro Herrera  writes:

>>> Agreed.  The problem is how to install it without breaking pg_upgrade.

> It can't look up relation names?

It can't shove 64 bytes into a page that has < 64 bytes free space.


Yeah, that would need to be a special case, but unless the page pointers 
are horked you'd be able to tell if there was a name in there.



>> Well, that's the first problem.  The second problem is how to cope with
>> RENAME TABLE.

> If the name was only encoded in the first block of the relation I'm not
> sure how bad this would be; are there any facilities to change the name
> back on a rollback?

No.  How would that work in crash situations?  (And keep in mind that the


Well, if FPIs were enabled you'd get the old page back. Even without
that, recovery could remember rename records it has seen that didn't commit.



argument for this hinges entirely on its being 100% trustworthy after a
crash.)


I don't think this needs to be 100% reliable to be useful, especially if 
we went with Andreas' suggestion to store both the old and new name (and 
storing the OID as well isn't a bad idea). If you're ever at the point 
of looking at this info you're already in deep trouble and should 
already be taking everything with a grain of salt anyway.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Jim Nasby

On 10/1/16 7:34 PM, Tomas Vondra wrote:

+/* otherwise add it to the proper freelist bin */
Looks like something went missing... :)



Ummm? The patch contains this:

+/* otherwise add it to the proper freelist bin */
+if (set->freelist[block->nfree])
+set->freelist[block->nfree]->prev = block;
+
+block->next = set->freelist[block->nfree];
+set->freelist[block->nfree] = block;

Which does exactly the thing it should do. Or what is missing?


What's confusing is the "otherwise" right at the beginning of the function:

+static void
+add_to_freelist(Slab set, SlabBlock block)
+{
+   /* otherwise add it to the proper freelist bin */
+   if (set->freelist[block->nfree])
+   set->freelist[block->nfree]->prev = block;
+
+   block->next = set->freelist[block->nfree];
+   set->freelist[block->nfree] = block;
+}

Otherwise what? What's the other option?


(Haven't looked at the newer patch, so maybe this isn't an issue anymore.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving RLS planning

2016-10-25 Thread David Rowley
On 26 October 2016 at 10:58, Tom Lane  wrote:
> The alternative I'm now thinking about pursuing is to get rid of the
> conversion of RLS quals to subqueries.  Instead, we can label individual
> qual clauses with security precedence markings.  Concretely, suppose we
> add an "int security_level" field to struct RestrictInfo.  The semantics
> of this would be that a qual with a lower security_level value must be
> evaluated before a qual with a higher security_level value, unless the
> latter qual is leakproof.  (It would likely also behoove us to add a
> "leakproof" bool field to struct RestrictInfo, to avoid duplicate
> leakproof-ness checks on quals.  But that's just an optimization.)

I wonder if there will be a need for more security_levels in the
future; otherwise perhaps a more generic field could be added, like "int
evaluation_flags".
In [1] there are still things to work out, but I mentioned that
I'd like to improve equivalence classes to handle more than just
simple equality, which seems to require "optional" RestrictInfos. It
would be nice if we could store all this in one field as a set of
bits.

[1] 
https://www.postgresql.org/message-id/cakjs1f9fpdlkm6%3dsuzagwuch3otbspk6k0yt8-a1hgjfapl...@mail.gmail.com
-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Does it make sense to add a -q (quiet) flag to initdb?

2016-10-25 Thread Jim Nasby

On 10/25/16 11:26 AM, Joshua D. Drake wrote:

Per: https://www.commandprompt.com/blog/can_i_make_initdb_quiet/

This was a question that was asked on #postgresql. Obviously we found a
work around but I wonder if it makes sense to add a -q to solve some of
these issues? (I could see it being useful with automation).


Well, there's always pg_ctl initdb -s (not sure why it's -s instead of 
the more common -q...). ISTM it'd be better to point people in that 
direction, but a silent option to initdb certainly wouldn't hurt (and 
might simplify pg_ctl as well...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improving RLS planning

2016-10-25 Thread Tom Lane
Currently, we don't produce very good plans when row-level security
is enabled.  An example is that, given

create table t1 (pk1 int primary key, label text);
create table t2 (pk2 int primary key, fk int references t1);

then for

select * from t1, t2 where pk1 = fk and pk2 = 42;

you would ordinarily get a cheap plan like

 Nested Loop
   ->  Index Scan using t2_pkey on t2
 Index Cond: (pk2 = 42)
   ->  Index Scan using t1_pkey on t1
 Index Cond: (pk1 = t2.fk)

But stick an RLS policy on t1, and that degrades to a seqscan, eg

 Nested Loop
   Join Filter: (t1.pk1 = t2.fk)
   ->  Index Scan using t2_pkey on t2
 Index Cond: (pk2 = 42)
   ->  Seq Scan on t1
 Filter: (label = 'public'::text)

The reason for this is that we implement RLS by turning the reference
to t1 into a sub-SELECT, and the planner's recursive invocation of
subquery_planner produces only a seqscan path for t1, there not being
any reason visible in the subquery for it to do differently.
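
(That is, as far as the planner is concerned the query might as well have
been written as

select * from (select * from t1 where label = 'public') t1, t2
where pk1 = fk and pk2 = 42;

so the seqscan inside the subquery is the only path it ever sees for t1.)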

I have been thinking about improving this by allowing subquery_planner
to generate parameterized paths; but the more I think about that the
less satisfied I am with it.  It will be quite expensive and probably
will still fail to find desirable plans in many cases.  (I've not given
up on parameterized subquery paths altogether --- I just feel it'd be a
brute-force and not very effective way of dealing with RLS.)

The alternative I'm now thinking about pursuing is to get rid of the
conversion of RLS quals to subqueries.  Instead, we can label individual
qual clauses with security precedence markings.  Concretely, suppose we
add an "int security_level" field to struct RestrictInfo.  The semantics
of this would be that a qual with a lower security_level value must be
evaluated before a qual with a higher security_level value, unless the
latter qual is leakproof.  (It would likely also behoove us to add a
"leakproof" bool field to struct RestrictInfo, to avoid duplicate
leakproof-ness checks on quals.  But that's just an optimization.)

In the initial implementation, quals coming from a RangeTblEntry's
securityQuals field would have security_level 0, quals coming from
anywhere else would have security_level 1; except that if we know
there are no security quals anywhere (ie not Query->hasRowSecurity),
we could give all quals security_level 0.  (I think this exception
may be worth making because there's no need to test leakproofness
for a qual with security level 0; it could never be a candidate
for security delay anyway.)

Having done that much, I think all we need in order to get rid of
RLS subqueries, and just stick RLS quals into their relation's
baserestrictinfo list, are two rules:

1. When selecting potential indexquals, a RestrictInfo can be considered
for indexqual use only if it is leakproof or has security_level <= the
minimum among the table's baserestrictinfo clauses.

2. In order_qual_clauses, sort first by security_level and second by cost.
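
In comparator form, rule 2 amounts to roughly this (a sketch only; the
struct and function names are invented for illustration):

/* order quals by (security_level, cost) */
static int
qual_item_cmp(const QualItem *a, const QualItem *b)
{
    if (a->security_level != b->security_level)
        return (a->security_level < b->security_level) ? -1 : +1;
    if (a->cost != b->cost)
        return (a->cost < b->cost) ? -1 : +1;
    return 0;
}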

This would already be enough of a win to be worth doing.  I think though
that this mechanism can be extended to also allow getting rid of the
restriction that security-barrier views can't be flattened.  The idea
would be to make sure that quals coming from above the SB view are given
higher security_level values than quals within the SB view.  We'd need
some extra mechanism to make that possible --- perhaps an additional kind
of node within jointree nests to show where there had been a
security-barrier boundary, and then some smarts in distribute_qual_to_rels
to prevent pushing upper quals down past a lower qual of strictly lesser
security level.  But that can come later.  (We do not need such smarts
to fix the RLS problem, because in the initial version, quals with lower
security level than another qual could only exist at the baserel level.)

In short, I'm proposing to throw away the entire existing implementation
for planning of RLS and SB views, and start over.

There are some corner cases I've not entirely worked out, in particular
what security_level to assign to quals generated from EquivalenceClasses.
A safe but not optimal answer would be to assign them the maximum
security_level of any source clause of the EC.  Maybe it's not worth
working harder than that, because most equality operators are leakproof
anyway, so that it wouldn't matter what level we assigned them.

Before I start implementing this, can anyone see a fatal flaw in the
design?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup stream xlog to tar

2016-10-25 Thread Michael Paquier
On Wed, Oct 26, 2016 at 2:00 AM, Magnus Hagander  wrote:
> Thanks, applied and pushed.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Tomas Vondra

On 10/23/2016 05:26 PM, Petr Jelinek wrote:

On 23/10/16 16:26, Tomas Vondra wrote:

On 10/22/2016 08:30 PM, Tomas Vondra wrote:
...
Moreover, the slab/gen allocators proposed here seem like a better
fit for reorderbuffer, e.g. because they release memory. I haven't
looked at sb_alloc too closely, but I think it behaves more like
AllocSet in this regard (i.e. keeping the memory indefinitely).



For reorderbuffer, from what I've seen in practice, I'd prefer
proper freeing to a 5% performance gain, as I've seen walsenders taking GBs
of memory due to reorderbuffer allocations that are never properly
freed.



Right.

>

About your actual patch. I do like both the Slab and the Gen allocators
and think that we should proceed with them for the moment. You
definitely need to rename the Gen one (don't ask me to what though) as
it sounds like "generic" and do some finishing touches but I think it's
the way to go. I don't see any point in GenSlab anymore.



Attached is a v5 of the patch that does this, i.e. throws away the 
GenSlab allocator and modifies reorderbuffer in two steps.


First (0001) it adds Slab allocator for TXN/Change allocations, and 
keeps the local slab cache for TupleBuf allocations (with a separate 
AllocSet context).


Then (in 0002) it adds the Gen allocator for TupleBuf, removing the last 
bits of the local slab cache.


I do think this version is as simple as it gets - there's not much 
more we could simplify / remove.


The main issue that bugs me is the name of the Gen allocator, but I 
don't have any good naming ideas :( The basic characteristic of Gen is 
that it does not reuse space released by pfree(), relying on the fact 
that the whole block will become free. That should be reflected in the 
name somehow, I guess.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v5.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wraparound warning

2016-10-25 Thread Robert Haas
On Tue, Oct 25, 2016 at 1:32 PM, Bruce Momjian  wrote:
> On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
>> On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian  wrote:
>> > Do we still need to report the wraparound warning on server startup?
>> >
>> > LOG:  MultiXact member wraparound protections are now enabled
>> >
>> > I thought that was going to be removed at some point, no?
>>
>> If you start with a 9.3.small cluster and do a pg_upgrade to 10, you
>> could end up not seeing that message and not having those protections
>> enabled, and it would be useful to be able to tell that from looking
>> at the log.
>
> Uh, I am confused.  I am seeing this on a fresh PG 10 initdb'ed cluster,
> not one pg_upgraded.  Is that expected?

Yes, that is expected.  If you initdb with a version >= 9.5, you will
always see that message on every startup.  However, if you initdb with
an older version, let things get messed up, and then pg_upgrade, you
might fail to see it.  Not seeing it would indicate that your cluster
is in a bad situation, which is something you might want to know
about.

>> is change things so that, starting in v10, a message is logged at
>> startup if those protections are NOT enabled, and nothing is logged if
>> they are enabled.  Keep the message for the case where protections are
>> enabled later, after startup time.
>
> How are those protections enabled/disabled?

When you start the cluster, they are disabled.  We then try to enable
them every time SetOffsetVacuumLimit() is called.  It tries to
determine the oldest multixact offset by examining what it thinks is
the oldest multixact.  If that works, then it enables the protections.
If that fails because the "oldest multixact" value is corrupted and
that multixact doesn't actually exist, then it forces autovacuum to
run in emergency mode, which will eventually correct the problem.
Once the problem has been corrected, the next call to
SetOffsetVacuumLimit() will enable the protections.

Normally, your cluster is OK, so the very first call to that function
enables the protections right at startup time.  Maybe we should skip
emitting the message in that case, so that the "normal" case doesn't
generate extra log chatter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump: Simplify internal archive version handling

2016-10-25 Thread Peter Eisentraut
On 10/13/16 9:13 AM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> I propose the attached patch to clean up some redundancy and confusion
>> in pg_dump.
> 
> Looks sane, though I'd suggest coding the access macros with shifts
> not multiplies/divides.

Done.

> Since these numbers are purely internal and not stored in this form in the
> file, I wonder whether we shouldn't drop the rightmost zero byte, ie make
> it just  not 00.  That would save at least
> one divide/shift when accessing.

Done and committed.  Thanks.
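
For anyone following along, the shift-based style of access macro looks
roughly like this (illustrative only, not the exact macros committed):

/* pack (major, minor, rev) into one int and pull the pieces back out */
#define MAKE_ARCHIVE_VERSION(major, minor, rev) \
    (((major) << 16) | ((minor) << 8) | (rev))
#define ARCHIVE_MAJOR(version)  (((version) >> 16) & 0xff)
#define ARCHIVE_MINOR(version)  (((version) >> 8) & 0xff)
#define ARCHIVE_REV(version)    ((version) & 0xff)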

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2016 at 2:31 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> What if the subsequent dataloss was in fact a symptom of the first
>> outage?  Is in theory possible for data to appear visible but then be
>> eaten up as the transactions making the data visible get voided out by
>> some other mechanic?  I had to pull a quick restart the first time and
>> everything looked ok -- or so I thought.   What I think was actually
>> happening is that data started to slip into the void.   It's like
>> randomly sys catalogs were dropping off. I bet other data was, too.  I
>> can pull older backups and verify that.  It's as if some creeping xmin
>> was snuffing everything out.
>
> Might be interesting to look at age(xmin) in a few different system
> catalogs.  I think you can ignore entries with age = 2147483647;
> those should be frozen rows.  But if you see entries with very large
> ages that are not that, it'd be suspicious.

nothing really stands out.

The damage did re-occur after a dump/restore -- not sure about a
cluster-level rebuild.  No problems previous to that.  This suggests
that, if this theory holds, the damage would have to have been below
the database level -- perhaps in clog.  Maybe hint bits and clog did
not agree as to commit or delete status, for example.  clog has plenty
of history leading past the problem barrier:
-rwx-- 1 postgres postgres 256K Jul 10 16:21 
-rwx-- 1 postgres postgres 256K Jul 21 12:39 0001
-rwx-- 1 postgres postgres 256K Jul 21 13:19 0002
-rwx-- 1 postgres postgres 256K Jul 21 13:59 0003


Confirmation of problem re-occurrence will come in a few days.  I'm
much more likely to believe in a 6+ sigma occurrence (storage, freak bug,
etc.) should it prove the problem goes away post rebuild.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Tom Lane
Merlin Moncure  writes:
> What if the subsequent dataloss was in fact a symptom of the first
> outage?  Is in theory possible for data to appear visible but then be
> eaten up as the transactions making the data visible get voided out by
> some other mechanic?  I had to pull a quick restart the first time and
> everything looked ok -- or so I thought.   What I think was actually
> happening is that data started to slip into the void.   It's like
> randomly sys catalogs were dropping off. I bet other data was, too.  I
> can pull older backups and verify that.  It's as if some creeping xmin
> was snuffing everything out.

Might be interesting to look at age(xmin) in a few different system
catalogs.  I think you can ignore entries with age = 2147483647;
those should be frozen rows.  But if you see entries with very large
ages that are not that, it'd be suspicious.
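
For instance, something along these lines, repeated for a few catalogs:

SELECT age(xmin), count(*)
FROM pg_class
GROUP BY 1
ORDER BY 1 DESC
LIMIT 10;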

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2016 at 12:57 PM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>
>> After last night, I rebuilt the cluster, turning on checksums, turning
>> on synchronous commit (it was off) and added a standby replica.  This
>> should help narrow the problem down should it re-occur; if storage is
>> bad (note, other database on same machine is doing 10x write activity
>> and is fine) or something is scribbling on shared memory (my guess
>> here)  then checksums should be popped, right?
>
> Not really sure about that.  As I recall we compute the CRC on the
> buffer's way out, based on the then-current contents, so if something
> scribbles on the buffer while it's waiting to be evicted, the CRC
> computation would include the new (corrupted) bytes rather than the
> original ones -- see FlushBuffer.

Huh. I have a new theory on this.  Dealing with the reconstituted
database, I'm finding more things -- functions and such -- that are
simply gone and had to be rebuilt; they escaped notice as they were
not in primary code paths.  Recall that the original outage
manifested as queries getting stuck, possibly on a spinlock (we don't
know for sure).  After that, things started to randomly disappear,
possibly from system catalogs (but I now need to go back and verify
older data, I think).  There were three autovac processes running.

What if the subsequent data loss was in fact a symptom of the first
outage?  Is it in theory possible for data to appear visible but then be
eaten up as the transactions making the data visible get voided out by
some other mechanism?  I had to pull a quick restart the first time and
everything looked ok -- or so I thought.   What I think was actually
happening is that data had started to slip into the void.   It's as if
sys catalogs were randomly dropping off. I bet other data was, too.  I
can pull older backups and verify that.  It's as if some creeping xmin
was snuffing everything out.

The confirmation of this should be obvious -- if that's indeed the
case, the backup and restored cluster should no longer present data
loss. Given that I was getting that every 1-2 days, we should be able
to figure that out pretty soon.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Alvaro Herrera
Merlin Moncure wrote:

> After last night, I rebuilt the cluster, turning on checksums, turning
> on synchronous commit (it was off) and added a standby replica.  This
> should help narrow the problem down should it re-occur; if storage is
> bad (note, other database on same machine is doing 10x write activity
> and is fine) or something is scribbling on shared memory (my guess
> here)  then checksums should be popped, right?

Not really sure about that.  As I recall we compute the CRC on the
buffer's way out, based on the then-current contents, so if something
scribbles on the buffer while it's waiting to be evicted, the CRC
computation would include the new (corrupted) bytes rather than the
original ones -- see FlushBuffer.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure  wrote:
>
>> > Corruption struck again.
>> > This time got another case of view busted -- attempting to create
>> > gives missing 'type' error.
>>
>> Call it a hunch -- I think the problem is in pl/sh.
>
> I've heard that before.

well, yeah, previously I had an issue where the database crashed
during a heavy concurrent pl/sh-based load.   However the problems
went away when I refactored the code.   Anyways, I looked at the code
and couldn't see anything obviously wrong, so who knows?  All I know is
my production database is exploding continuously and I'm looking for
answers.  The only other extension in heavy use on this server is
postgres_fdw.

The other database on the cluster is fine, which kind of suggests we
are not facing clog or WAL type problems.

After last night, I rebuilt the cluster, turning on checksums, turning
on synchronous commit (it was off), and adding a standby replica.  This
should help narrow the problem down should it re-occur; if storage is
bad (note: the other database on the same machine is doing 10x the write
activity and is fine) or something is scribbling on shared memory (my
guess here), then checksums should be popped, right?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wraparound warning

2016-10-25 Thread Bruce Momjian
On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
> On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian  wrote:
> > Do we still need to report the wraparound warning on server startup?
> >
> > LOG:  MultiXact member wraparound protections are now enabled
> >
> > I thought that was going to be removed at some point, no?
> 
> If you start with a 9.3.small cluster and do a pg_upgrade to 10, you
> could end up not seeing that message and not having those protections
> enabled, and it would be useful to be able to tell that from looking
> at the log.

Uh, I am confused.  I am seeing this on a fresh PG 10 initdb'ed cluster,
not one pg_upgraded.  Is that expected?

> Maybe what we should do (since this message obviously annoys everyone)

It is a persistent reminder of our engineering failure.  I guess it
should annoy us, and maybe that's OK.

> is change things so that, starting in v10, a message is logged at
> startup if those protections are NOT enabled, and nothing is logged if
> they are enabled.  Keep the message for the case where protections are
> enabled later, after startup time.

How are those protections enabled/disabled?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spoonbill - rare buildfarm failures in test_shm_mq_pipelined()

2016-10-25 Thread Tom Lane
Stefan Kaltenbrunner  writes:
> Spoonbill is very rarely (ie once every few months) failing like this:

Yeah, I've been wondering about that ...

> Any ideas on what is causing this?

No, but it would sure be interesting to get a stack trace showing where
the SIGFPE is happening.

Could you change the ereport(ERROR) in FloatExceptionHandler to
ereport(PANIC) and run that single test over and over till you
get a panic?
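
That is, make the handler report roughly this (same message, just
promoted to PANIC so that the process aborts and leaves a core file):

ereport(PANIC,
        (errcode(ERRCODE_FLOATING_POINT_EXCEPTION),
         errmsg("floating-point exception"),
         errdetail("An invalid floating-point operation was signaled. "
                   "This probably means an out-of-range result or an "
                   "invalid operation, such as division by zero.")));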

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-25 Thread Robert Haas
On Mon, Oct 24, 2016 at 4:15 PM, Peter Eisentraut
 wrote:
> On 10/24/16 11:57 AM, Robert Haas wrote:
>> Today, since the host part can't include a
>> port specifier, it's regarded as part of the IP address, and I think
>> it would probably be a bad idea to change that, as I believe Victor's
>> patch would.  He seems to have it in mind that we could allow things
>> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which might be
>> helpful for the future but doesn't avoid changing the meaning of
>> connection strings that work today.
>
> Let's keep in mind here that the decision to allow database names to
> contain a connection parameter substructure has caused some security
> issues.  Let's not create more levels of ambiguity and the need to pass
> around override flags.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-25 Thread Robert Haas
On Mon, Oct 24, 2016 at 4:40 PM, Alvaro Herrera
 wrote:
> Umm, my recollection regarding IPv6 parsing in the URI syntax is that
> those must appear inside square brackets -- it's not valid to have the
> IPv6 address outside brackets, and the port number is necessarily
> outside the brackets.  So there's no ambiguity.  If the current patch is
> allowing IPv6 address to appear outside of brackets, that seems like a
> bug to me.

I agree.

> The string conninfo spec should not accept port numbers in the "host"
> part.  (Does it?)

Not currently, but the proposed patch would cause it to do so.

>> So now I think that to make this work correctly, we're going to need
>> to change both the URL parser and also add parsing for the host and
>> port.  Let's say the user says this:
>>
>> postgresql://[1::2]:3,[4::5],[6::7]::8/
>
> (There's a double colon before 8, I suppose that's just a typo.)

Oh, yeah.  Right.

>> It is not obvious what it means if there are multiple ports but the
>> number doesn't equal the number of hosts.
>
> I think we should reject the case of differing number of elements and
> neither host nor port is a singleton, as an error.  The suggestion to
> ignore some parts seems too error-prone.

OK, can do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wraparound warning

2016-10-25 Thread Robert Haas
On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian  wrote:
> Do we still need to report the wraparound warning on server startup?
>
> LOG:  MultiXact member wraparound protections are now enabled
>
> I thought that was going to be removed at some point, no?

If you start with a 9.3.small cluster and do a pg_upgrade to 10, you
could end up not seeing that message and not having those protections
enabled, and it would be useful to be able to tell that from looking
at the log.

Maybe what we should do (since this message obviously annoys everyone)
is change things so that, starting in v10, a message is logged at
startup if those protections are NOT enabled, and nothing is logged if
they are enabled.  Keep the message for the case where protections are
enabled later, after startup time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-25 Thread Tom Lane
Alvaro Herrera  writes:
> BTW, should we cost push-down-able quals differently, say discount some
> fraction of the cost, to reflect the fact that they are cheaper to run?
> However, since the decision of which ones to push down depends on the
> cost, and the cost would depend on which ones we push down, it looks
> rather messy.

I don't think our cost model is anywhere near refined enough to make it
worth trying to distinguish that.  (Frankly, I'm pretty skeptical of this
entire patch being worth the trouble...)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] spoonbill - rare buildfarm failures in test_shm_mq_pipelined()

2016-10-25 Thread Stefan Kaltenbrunner
Spoonbill is very rarely (ie once every few months) failing like this:

[2016-08-29 18:15:35.273 CEST:57c45f88.52d4:8] LOG:  statement: SELECT
test_shm_mq_pipelined(16384, (select
string_agg(chr(32+(random()*95)::int), '') from
generate_series(1,27)), 200, 3);
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:9] ERROR:  floating-point
exception
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:10] DETAIL:  An invalid
floating-point operation was signaled. This probably means an
out-of-range result or an invalid operation, such as division by zero.
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:11] STATEMENT:  SELECT
test_shm_mq_pipelined(16384, (select
string_agg(chr(32+(random()*95)::int), '') from
generate_series(1,27)), 200, 3);


Some examples:


http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-10-22%2023%3A00%3A06
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-08-29%2011%3A00%3A06
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-06-21%2023%3A00%3A06


Any ideas on what is causing this? IIRC we had issues with that specific
test on spoonbill (and other sparc-based boxes) before, so maybe we
failed to fix the issue completely...




Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Robert Haas
On Mon, Oct 24, 2016 at 2:55 AM, Michael Paquier
 wrote:
> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.
> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
> when playing with different stop locations at recovery.

I can't understand what point you're driving at here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Robert Haas
On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila  wrote:
>> The consensus solution on this thread seems to be that we should have
>> pg_do_stop_backup() return the last-replayed XLOG location as the
>> backup end point.  If the control file has been updated with a newer
>> redo location, then the associated checkpoint record has surely been
>> replayed, so we'll definitely have enough WAL to replay that
>> checkpoint record (and we don't need to replay anything else, because
>> we're supposing that this is the case where the minimum recovery point
>> precedes the redo location).  Although this will work, it might
>> include more WAL in the backup than is strictly necessary.  If the
>> additional WAL replayed beyond the minimum recovery point does NOT
>> include a checkpoint, we don't need any of it; if it does, we need
>> only the portion up to and including the last checkpoint record, and
>> not anything beyond that.
>
> You are right that it will include more WAL than strictly
> necessary, but that can happen today as well because minrecoverypoint
> could be updated after you have established the restart point for
> do_pg_start_backup().  Are you saying that even without the patch we
> sometimes include more WAL in the backup than is strictly necessary,
> so it is better to improve the situation?

In that case, including additional WAL is *necessary*.  If the minimum
recovery point is updated after do_pg_start_backup(), pages bearing
LSNs newer than the old minimum recovery point could conceivably have
been copied as part of the backup, and therefore we need to replay up
through the new minimum recovery point to guarantee consistency.

>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> I think this will work but will cause an unnecessary control file
> update in cases where the buffer flush would do it anyway.  It can work
> if we try to do it only when required (minrecoverypoint is less than
> the last checkpoint) after flushing buffers.

Sure, that doesn't sound too hard to implement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Robert Haas
On Sun, Oct 23, 2016 at 11:48 PM, Kyotaro HORIGUCHI
 wrote:
>> I can think of two solutions that would be "tighter":
>>
>> 1. When performing a restartpoint, update the minimum recovery point
>> to just beyond the checkpoint record.  I think this can't hurt anyone
>> who is actually restarting recovery on the same machine, because
>> that's exactly the point where they are going to start recovery.  A
>> minimum recovery point which precedes the point at which they will
>> start recovery is no better than one which is equal to the point at
>> which they will start recovery.
>
> This is a candidate that I thought of. But I avoided changing
> the behavior of minRecoveryPoint, which (it seems to me)
> describes only buffer state. So a checkpoint with no updates doesn't
> advance minRecoveryPoint, as I understand it.

I don't really know what you mean by this.  minRecoveryPoint is the
point up to which we need to replay WAL in order to achieve
consistency - the distinction between "buffer state" and something
else is artificial.  My point is that the amount of WAL we need to
replay can never be negative, but that's what a minimum recovery point
prior to the end of the checkpoint record used for start-of-REDO
implies.

>> 2. In perform_base_backup(), if the endptr returned by
>> do_pg_stop_backup() precedes the end of the checkpoint record returned
>> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
>> that's not so easy: we can't just say if (endptr < starptr) startptr =
>> endptr; because startptr is the *start* of the checkpoint record, not
>> the end.  I suspect somebody could figure out a solution to this
>> problem, though.
>
> Yes, and the reason I rejected it was that it is not logical,
> maybe the same meaning to it would cause another problem later.

I don't know why this isn't logical.  If you want to say it's going to
cause another problem later, you should say what problem you think it
will cause, not just guess that there might be one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-25 Thread Alvaro Herrera
Dilip Kumar wrote:

> #2. Currently quals are ordered based on cost (refer
> order_qual_clauses), But once we pushdown some of the quals, then
> those quals will always be executed first. Can this create problem ?

We don't promise order of execution (which is why we can afford to sort
on cost), but I think it makes sense to keep a rough ordering based on
cost, and not let this push-down affect those ordering decisions too
much.

I think it's fine to push down quals that are within the same order of
magnitude of the cost of a non-pushable condition, while keeping any other
much-costlier qual (which could otherwise be pushed down) together with
the non-pushable conditions; this would sort-of guarantee within-order-of-
magnitude order of execution of quals.

Hopefully an example clarifies what I mean.  Let's suppose you have
three quals, where qual2 is non-pushable but 1 and 3 are.  cost(1)=10,
cost(2)=11, cost(3)=12.  Currently, they are executed in that order.

If you were to compare costs in the straightforward way, you would push
only 1 (because 3 is costlier than 2 which is not push-down-able).  With
fuzzy comparisons, you'd push both 1 and 3, because cost of 3 is close
enough to that of qual 2.

But if cost(3)=100 then only push qual 1, and let qual 3 be evaluated
together with 2 outside the scan node.
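
In pseudo-code, the push-down test I have in mind is roughly this (just a
sketch, with the fuzz factor picked out of thin air):

/* sketch only; Cost is the planner's cost type */
#define PUSHDOWN_FUZZ_FACTOR 10.0

static bool
ok_to_push_down(Cost qual_cost, Cost cheapest_nonpushable_cost)
{
    /* push down unless the qual is much costlier than the cheapest
     * qual that has to stay behind anyway */
    return qual_cost <= cheapest_nonpushable_cost * PUSHDOWN_FUZZ_FACTOR;
}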


BTW, should we cost push-down-able quals differently, say discount some
fraction of the cost, to reflect the fact that they are cheaper to run?
However, since the decision of which ones to push down depends on the
cost, and the cost would depend on which ones we push down, it looks
rather messy.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup stream xlog to tar

2016-10-25 Thread Magnus Hagander
On Tue, Oct 25, 2016 at 2:52 PM, Michael Paquier 
wrote:

> On Tue, Oct 25, 2016 at 7:12 PM, Magnus Hagander 
> wrote:
> > On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >>
> >> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander 
> >> wrote:
> >> +   if (format == 'p')
> >> +   stream.walmethod = CreateWalDirectoryMethod(param->xlog,
> do_sync);
> >> +   else
> >> +   stream.walmethod = CreateWalTarMethod(param->xlog,
> >> compresslevel, do_sync);
> >> LogStreamerMain() exits immediately once it is done, but I think that
> >> we had better be tidy here and clean up the WAL methods that have been
> >> allocated. I am thinking here about a potentially retry method on
> >> failure, though the best shot in this area would be with
> >> ReceiveXlogStream().
> >
> > Wouldn't the same be needed in pg_receivexlog.c in that case?
>
> Oops, missed that. Thanks for the extra checks. Attached is an updated
> patch.
>

Thanks, applied and pushed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-25 Thread Dilip Kumar
On Fri, Oct 14, 2016 at 11:54 AM, Andres Freund  wrote:
> I don't think it's a good idea to do this under the content lock in any
> case. But luckily I don't think we have to do so at all.
>
> Due to pagemode - which is used by pretty much everything iterating over
> heaps, and definitely seqscans - the key test already happens without
> the content lock held, in heapgettup_pagemode():

Yes, you are right. Now, after this clarification, it's clear that
even if we push down the key, we are not evaluating it under the buffer
content lock.

As of now, I have done my further analysis considering only pushing
down 'var op const' quals. Below are my observations.

#1. If we process the qual in the executor, all temporary memory
allocations are made in the "per_tuple_context", which gets reset after
each tuple is processed. On the other hand, if we do that in the heap
code, they will be made in the "per_query_context". This restricts us
from pushing down any qual which needs to do memory allocations, such as
one referencing a "toastable" field.

Is there any other way to handle this?

#2. Currently quals are ordered based on cost (see
order_qual_clauses), but once we push down some of the quals, those
quals will always be executed first. Can this create a problem?

Consider the example below:

create or replace function f1(anyelement) returns bool as
$$
begin
raise notice '%', $1;
return true;
end;
$$ LANGUAGE plpgsql cost 0.01;

select * from t3 where a>1 and f1(b);

In this case "f1(b)" will always be executed as first qual (cost is
set very low by user) hence it will raise notice for all the tuple.
But if we pushdown "a>1" qual to heap then only qualified tuple will
be passed to "f1(b)".

Is it behaviour change ?

I know that, it can also impact the performance, because when user has
given some function with very low cost then that should be executed
first as it may discard most of the tuple.

One solution to this can be..
1. Only pushdown if either all quals are of same cost.
2. Pushdown all quals until we find first non pushable qual (this way
we can maintain the same qual execution order what is there in
existing system).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Does it make sense to add a -q (quiet) flag to initdb?

2016-10-25 Thread Joshua D. Drake

Hello,

Per: https://www.commandprompt.com/blog/can_i_make_initdb_quiet/

This was a question that was asked on #postgresql. Obviously we found a 
workaround, but I wonder if it makes sense to add a -q to solve some of 
these issues? (I could see it being useful with automation).


Sincerely,

JD
--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reload config instructions

2016-10-25 Thread Bruce Momjian
On Sat, Oct 22, 2016 at 11:39:40AM -0400, Bruce Momjian wrote:
> I found our postgresql.conf and pg_hba.conf config files had
> inconsistent comment instructions about reloading, and didn't mention
> SELECT pg_reload_conf().
> 
> The attached doc patch fixes this.

Patch applied to head.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wraparound warning

2016-10-25 Thread Bruce Momjian
Do we still need to report the wraparound warning on server startup?

LOG:  MultiXact member wraparound protections are now enabled

I thought that was going to be removed at some point, no?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Recommend wrappers of PG_DETOAST_DATUM_PACKED()

2016-10-25 Thread Tom Lane
Noah Misch  writes:
> When commit 3e23b68dac006e8deb0afa327e855258df8de064 introduced single-byte
> varlena headers, its fmgr.h changes presented PG_GETARG_TEXT_PP() and
> PG_GETARG_TEXT_P() as equals.  Its postgres.h changes presented
> PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the exceptional case.  Let's
> firmly recommend PG_GETARG_TEXT_PP() over PG_GETARG_TEXT_P(); likewise for
> other ...PP() macros.  This shaves cycles and brings consistency of style.

+1

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Recommend wrappers of PG_DETOAST_DATUM_PACKED()

2016-10-25 Thread Noah Misch
When commit 3e23b68dac006e8deb0afa327e855258df8de064 introduced single-byte
varlena headers, its fmgr.h changes presented PG_GETARG_TEXT_PP() and
PG_GETARG_TEXT_P() as equals.  Its postgres.h changes presented
PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the exceptional case.  Let's
firmly recommend PG_GETARG_TEXT_PP() over PG_GETARG_TEXT_P(); likewise for
other ...PP() macros.  This shaves cycles and brings consistency of style.

If the attached policy-setting patch seems reasonable, I will follow it with a
mechanical patch adopting the recommended macros throughout the tree.  (If any
code does want the alignment it gets from an obsolescent macro, I will add a
comment to that code.)
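
For example, a varlena-consuming function written with the recommended
macros looks like this (a made-up function, purely to illustrate the style):

Datum
my_octet_length(PG_FUNCTION_ARGS)
{
    /* the ...PP() form may hand back a datum with a 1-byte header ... */
    text       *t = PG_GETARG_TEXT_PP(0);

    /* ... so use the _ANY macros, which cope with either header size */
    PG_RETURN_INT32(VARSIZE_ANY_EXHDR(t));
}

With PG_GETARG_TEXT_P(0) one would write VARSIZE(t) - VARHDRSZ instead,
at the price of a possible detoast-and-copy done only to obtain a
4-byte header.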

Thanks,
nm
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 0878418..2ae0b64 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -243,13 +243,9 @@ extern struct varlena *pg_detoast_datum_packed(struct 
varlena * datum);
 /* and this if you can handle 1-byte-header datums: */
 #define PG_GETARG_VARLENA_PP(n) PG_DETOAST_DATUM_PACKED(PG_GETARG_DATUM(n))
 /* DatumGetFoo macros for varlena types will typically look like this: */
-#define DatumGetByteaP(X)  ((bytea *) PG_DETOAST_DATUM(X))
 #define DatumGetByteaPP(X) ((bytea *) 
PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetTextP(X)   ((text *) PG_DETOAST_DATUM(X))
 #define DatumGetTextPP(X)  ((text *) 
PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetBpCharP(X) ((BpChar *) PG_DETOAST_DATUM(X))
 #define DatumGetBpCharPP(X)((BpChar *) 
PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetVarCharP(X)((VarChar *) 
PG_DETOAST_DATUM(X))
 #define DatumGetVarCharPP(X)   ((VarChar *) PG_DETOAST_DATUM_PACKED(X))
 #define DatumGetHeapTupleHeader(X) ((HeapTupleHeader) PG_DETOAST_DATUM(X))
 /* And we also offer variants that return an OK-to-write copy */
@@ -264,13 +260,9 @@ extern struct varlena *pg_detoast_datum_packed(struct 
varlena * datum);
 #define DatumGetBpCharPSlice(X,m,n) ((BpChar *) PG_DETOAST_DATUM_SLICE(X,m,n))
 #define DatumGetVarCharPSlice(X,m,n) ((VarChar *) 
PG_DETOAST_DATUM_SLICE(X,m,n))
 /* GETARG macros for varlena types will typically look like this: */
-#define PG_GETARG_BYTEA_P(n)   DatumGetByteaP(PG_GETARG_DATUM(n))
 #define PG_GETARG_BYTEA_PP(n)  DatumGetByteaPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_TEXT_P(n)
DatumGetTextP(PG_GETARG_DATUM(n))
 #define PG_GETARG_TEXT_PP(n)   DatumGetTextPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_BPCHAR_P(n)  DatumGetBpCharP(PG_GETARG_DATUM(n))
 #define PG_GETARG_BPCHAR_PP(n) DatumGetBpCharPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_VARCHAR_P(n) DatumGetVarCharP(PG_GETARG_DATUM(n))
 #define PG_GETARG_VARCHAR_PP(n)
DatumGetVarCharPP(PG_GETARG_DATUM(n))
 #define PG_GETARG_HEAPTUPLEHEADER(n)   
DatumGetHeapTupleHeader(PG_GETARG_DATUM(n))
 /* And we also offer variants that return an OK-to-write copy */
@@ -284,6 +276,21 @@ extern struct varlena *pg_detoast_datum_packed(struct 
varlena * datum);
 #define PG_GETARG_TEXT_P_SLICE(n,a,b)  
DatumGetTextPSlice(PG_GETARG_DATUM(n),a,b)
 #define PG_GETARG_BPCHAR_P_SLICE(n,a,b) 
DatumGetBpCharPSlice(PG_GETARG_DATUM(n),a,b)
 #define PG_GETARG_VARCHAR_P_SLICE(n,a,b) 
DatumGetVarCharPSlice(PG_GETARG_DATUM(n),a,b)
+/*
+ * Obsolescent variants that guarantee INT alignment for the return value.
+ * Few operations on these particular types need alignment, mainly operations
+ * that cast the VARDATA pointer to a type like int16[].  Most code should use
+ * the ...PP(X) counterpart.  Nonetheless, these appear frequently in code
+ * predating the PostgreSQL 8.3 introduction of the ...PP(X) variants.
+ */
+#define DatumGetByteaP(X)  ((bytea *) PG_DETOAST_DATUM(X))
+#define DatumGetTextP(X)   ((text *) PG_DETOAST_DATUM(X))
+#define DatumGetBpCharP(X) ((BpChar *) PG_DETOAST_DATUM(X))
+#define DatumGetVarCharP(X)((VarChar *) 
PG_DETOAST_DATUM(X))
+#define PG_GETARG_BYTEA_P(n)   DatumGetByteaP(PG_GETARG_DATUM(n))
+#define PG_GETARG_TEXT_P(n)
DatumGetTextP(PG_GETARG_DATUM(n))
+#define PG_GETARG_BPCHAR_P(n)  DatumGetBpCharP(PG_GETARG_DATUM(n))
+#define PG_GETARG_VARCHAR_P(n) DatumGetVarCharP(PG_GETARG_DATUM(n))
 
 /* To return a NULL do this: */
 #define PG_RETURN_NULL()  \
diff --git a/src/include/postgres.h b/src/include/postgres.h
index be4d0d6..5844f78 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -287,20 +287,18 @@ typedef struct
 /* Externally visible macros */
 
 /*
- * VARDATA, VARSIZE, and SET_VARSIZE are the recommended API for most code
- * for varlena datatypes.  Note that they only work on untoasted,
- * 4-byte-header Datums!
- *
- * Code that wants to use 1-byte-header values without detoasting should
- * use 

Re: [HACKERS] [PATCH] pgpassfile connection option

2016-10-25 Thread Julian Markwort

On 10/16/2016 12:09 PM, Fabien COELHO wrote:
Patch applies cleanly, make check ok... however AFAICS it only means 
that it compiles but it is not tested in any way... This is 
annoying. Well I'm not sure whether other options are tested either, 
but they should be. 
Thanks for taking the time to review my patch! I haven't found much 
regarding the testing of connection options. However since there isn't 
anything fancy going on in PasswordFromFile() I'm not too worried about 
that.


The documentation needs to be updated. 

I've written a couple lines now.
I aligned the definition of the connection option and the environment 
variable with that of other (conn.opt) pairs and added mention 
of the different options to the doc of the "Password File".


The reported password file is wrong. It is even more funny if 
~/.pgpass contains the right password: the pgpassfile option *is* 
taken into account, ./foo is read and it fails, but the error message 
is not updated and points to the wrong file. The error message stuff 
should be updated to look for the pgpassfile connection string option... 
That was indeed an error on my side; I hadn't updated the error messages 
to inform you which file has been used.


So attached is an updated version of the patch.

I'd like to ask for some input on how to handle invalid files - right 
now no message is shown and the user just gets a password prompt as a 
result; however, I think a message when the custom pgpassfile hasn't 
been found would be helpful.
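
Something along these lines is what I have in mind (a rough sketch only;
exact wording and placement still to be decided):

/* sketch: warn if an explicitly requested pgpassfile is unusable */
if (conn->pgpassfile != NULL && conn->pgpassfile[0] != '\0')
{
    FILE       *fp = fopen(conn->pgpassfile, "r");

    if (fp == NULL)
        fprintf(stderr,
                libpq_gettext("WARNING: password file \"%s\" could not be opened\n"),
                conn->pgpassfile);
    else
        fclose(fp);
}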


Kind regards,
Julian

--

Julian Markwort
Westphalian Wilhelms-University in Münster
julian(dot)markwort(at)uni-muenster(dot)de

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4e34f00..1bd5597 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -930,7 +930,7 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 Note that authentication is likely to fail if host
 is not the name of the server at network address hostaddr.
 Also, note that host rather than hostaddr
-is used to identify the connection in ~/.pgpass (see
+is used to identify the connection in a password file (see
 ).

 
@@ -986,6 +986,16 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  pgpassfile
+  
+  
+Specifies the name of the file used to lookup passwords.
+Defaults to the password file (see ).
+  
+  
+ 
+
  
   connect_timeout
   
@@ -6862,9 +6872,8 @@ myEventProc(PGEventId evtId, void *evtInfo, void 
*passThrough)
   
PGPASSFILE
   
-  PGPASSFILE specifies the name of the password file to
-  use for lookups.  If not set, it defaults to ~/.pgpass
-  (see ).
+  PGPASSFILE behaves the same as the  connection parameter.
  
 
 
@@ -7136,13 +7145,15 @@ myEventProc(PGEventId evtId, void *evtInfo, void 
*passThrough)
   
 
   
-   The file .pgpass in a user's home directory or the
-   file referenced by PGPASSFILE can contain passwords to
+   The file .pgpass in a user's home directory can 
contain passwords to
be used if the connection requires a password (and no password has been
specified  otherwise). On Microsoft Windows the file is named
%APPDATA%\postgresql\pgpass.conf (where
%APPDATA% refers to the Application Data subdirectory in
the user's profile).
+   Alternatively, a password file can be specified
+   using the connection parameter 
+   or the environment variable PGPASSFILE.
   
 
   
diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index f3a9e5a..db1de26 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -184,6 +184,10 @@ static const internalPQconninfoOption PQconninfoOptions[] 
= {
"Database-Password", "*", 20,
offsetof(struct pg_conn, pgpass)},
 
+   {"pgpassfile", "PGPASSFILE", NULL, NULL,
+   "Database-Password-File", "", 64,
+   offsetof(struct pg_conn, pgpassfile)},
+
{"connect_timeout", "PGCONNECT_TIMEOUT", NULL, NULL,
"Connect-timeout", "", 10,  /* strlen(INT32_MAX) == 
10 */
offsetof(struct pg_conn, connect_timeout)},
@@ -375,7 +379,7 @@ static int parseServiceFile(const char *serviceFile,
 bool *group_found);
 static char *pwdfMatchesString(char *buf, char *token);
 static char *PasswordFromFile(char *hostname, char *port, char *dbname,
-char *username);
+char *username, char *pgpassfile);
 static bool getPgPassFilename(char *pgpassfile);
 static void dot_pg_pass_warning(PGconn *conn);
 static void default_threadlock(int acquire);
@@ -806,8 +810,9 @@ connectOptions2(PGconn *conn)
{
if (conn->pgpass)
free(conn->pgpass);
+   /* We'll pass conn->pgpassfile regardless of it's contents - 
checks happen in 

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-10-25 Thread Michael Paquier
On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada  wrote:
> Attached latest patch.
> Please review it.

Okay, so let's move on with this patch...

+ 
+ The keyword ANY is omissible, but note that there is
+ not compatibility between PostgreSQL version 10 and
+ 9.6 or before. For example, 1 (s1, s2) is the same as the
+ configuration with FIRST and 
+ num_sync equal to 1 in PostgreSQL 9.6
+ or before.  On the other hand, It's the same as the configuration with
+ ANY and num_sync equal to
+ 1 in PostgreSQL 10 or later.
+
This paragraph could be reworded:
If FIRST or ANY are not specified, this parameter behaves as ANY. Note
that this grammar is incompatible with PostgreSQL 9.6, where specifying
no keyword is equivalent to specifying FIRST.
In short, there is no real need to specify num_sync as this behavior
has not changed, and it is not necessary to mention
pre-9.6 versions as the multi-sync grammar was added in 9.6.

-Specifying more than one standby name can allow very high availability.
Why remove this sentence?

+The keyword ANY, coupeld with an interger number N,
s/coupeld/coupled/ and s/interger/integer/, for a double hit in one
line, still...

+The keyword ANY, coupeld with an interger number N,
+chooses N standbys in a set of standbys with the same, lowest,
+priority and makes transaction commit when WAL records are received
+those N standbys.
This could be reworded more simply, for example: The keyword ANY,
coupled with an integer number N, makes transaction commits wait until
WAL records are received from N connected standbys among those defined
in the list of synchronous_standby_names.
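As a usage sketch of that wording (the standby names are placeholders),
a quorum of two out of three standbys would be configured as:

ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';
SELECT pg_reload_conf();
-- A commit now waits until any two of s1, s2 and s3 have received its WAL.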

+s2 and s3 wil be considered as synchronous standby
s/wil/will/

+  when standby is considered as a condidate of quorum commit.
s/condidate/candidate/

[... stopping here ...] Please be more careful with the documentation
and comment grammar. There are other things in the patch..

A bunch of comments at the top of syncrep.c need to be updated.

+extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
+extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);
Those two should be static routines in syncrep.c, let's keep the whole
logic about quorum and higher-priority definition only there and not
bother the callers of them about that.

+   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+   return SyncRepGetSyncStandbysPriority(am_sync);
+   else /* SYNC_REP_QUORUM */
+   return SyncRepGetSyncStandbysQuorum(am_sync);
Both routines share the same logic to detect whether a WAL sender can
be selected as a candidate for sync evaluation; still, given the
different selection they perform, I agree that it is better to keep
them separate.

+   /* In quroum method, all sync standby priorities are always 1 */
+   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
+   priority = 1;
Honestly, I don't understand why you are enforcing that. Priority can
be important for users who want to switch from ANY to FIRST and see
immediately which standbys would become sync or potential.

else if (list_member_int(sync_standbys, i))
-   values[7] = CStringGetTextDatum("sync");
+   values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
The comment at the top of this code block needs to be refreshed.

The representation given to the user in pg_stat_replication is not
enough IMO. For example, imagine a cluster with 4 standbys:
=# select application_name, sync_priority, sync_state from pg_stat_replication ;
 application_name | sync_priority | sync_state
------------------+---------------+------------
 node_5433        |             0 | async
 node_5434        |             0 | async
 node_5435        |             0 | async
 node_5436        |             0 | async
(4 rows)

If FIRST N is used, it is easy for the user to understand which nodes
are in sync:
=# alter system set synchronous_standby_names = 'FIRST 2 (node_5433,
node_5434, node_5435)';
ALTER SYSTEM
=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# select application_name, sync_priority, sync_state from pg_stat_replication ;
 application_name | sync_priority | sync_state
------------------+---------------+------------
 node_5433        |             1 | sync
 node_5434        |             2 | sync
 node_5435        |             3 | potential
 node_5436        |             0 | async
(4 rows)

In this case it is easy to understand that two nodes are required to be in sync.

When using ANY similarly for three nodes, here is what
pg_stat_replication shows:
=# select application_name, sync_priority, sync_state from pg_stat_replication ;
 application_name | sync_priority | sync_state

Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-25 Thread Albe Laurenz
I wrote:
> Anyway, I have prepared a patch along the lines you suggest.

It occurred to me that the documentation still suggests that you should
add a declaration to a C function; I have fixed that too.

I'll add the patch to the next commitfest.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup stream xlog to tar

2016-10-25 Thread Michael Paquier
On Tue, Oct 25, 2016 at 7:12 PM, Magnus Hagander  wrote:
> On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier 
> wrote:
>>
>> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander 
>> wrote:
>> +   if (format == 'p')
>> +   stream.walmethod = CreateWalDirectoryMethod(param->xlog, do_sync);
>> +   else
>> +   stream.walmethod = CreateWalTarMethod(param->xlog,
>> compresslevel, do_sync);
>> LogStreamerMain() exits immediately once it is done, but I think that
>> we had better be tidy here and clean up the WAL methods that have been
>> allocated. I am thinking here about a potentially retry method on
>> failure, though the best shot in this area would be with
>> ReceiveXlogStream().
>
> Wouldn't the same be needed in pg_receivexlog.c in that case?

Oops, missed that. Thanks for the extra checks. Attached is an updated patch.
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 16cab97..e2875df 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -495,6 +495,13 @@ LogStreamerMain(logstreamer_param *param)
 	}
 
 	PQfinish(param->bgconn);
+
+	if (format == 'p')
+		FreeWalDirectoryMethod();
+	else
+		FreeWalTarMethod();
+	pg_free(stream.walmethod);
+
 	return 0;
 }
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index bbdf96e..99445e6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -352,6 +352,10 @@ StreamLog(void)
 	}
 
 	PQfinish(conn);
+
+	FreeWalDirectoryMethod();
+	pg_free(stream.walmethod);
+
 	conn = NULL;
 }
 
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index d1dc046..656622f 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -299,6 +299,13 @@ CreateWalDirectoryMethod(const char *basedir, bool sync)
 	return method;
 }
 
+void
+FreeWalDirectoryMethod(void)
+{
+	pg_free(dir_data->basedir);
+	pg_free(dir_data);
+}
+
 
 /*-
  * WalTarMethod - write wal to a tar file containing pg_xlog contents
@@ -611,6 +618,9 @@ tar_sync(Walfile f)
 	Assert(f != NULL);
 	tar_clear_error();
 
+	if (!tar_data->sync)
+		return 0;
+
 	/*
 	 * Always sync the whole tarfile, because that's all we can do. This makes
 	 * no sense on compressed files, so just ignore those.
@@ -842,7 +852,8 @@ tar_finish(void)
 #endif
 
 	/* sync the empty blocks as well, since they're after the last file */
-	fsync(tar_data->fd);
+	if (tar_data->sync)
+		fsync(tar_data->fd);
 
 	if (close(tar_data->fd) != 0)
 		return false;
@@ -890,3 +901,14 @@ CreateWalTarMethod(const char *tarbase, int compression, bool sync)
 
 	return method;
 }
+
+void
+FreeWalTarMethod(void)
+{
+	pg_free(tar_data->tarfilename);
+#ifdef HAVE_LIBZ
+	if (tar_data->compression)
+		 pg_free(tar_data->zlibOut);
+#endif
+	pg_free(tar_data);
+}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 0c8eac7..8cea8ff 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -43,3 +43,7 @@ struct WalWriteMethod
  */
 WalWriteMethod *CreateWalDirectoryMethod(const char *basedir, bool sync);
 WalWriteMethod *CreateWalTarMethod(const char *tarbase, int compression, bool sync);
+
+/* Cleanup routines for previously-created methods */
+void FreeWalDirectoryMethod(void);
+void FreeWalTarMethod(void);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-25 Thread Peter Eisentraut
On 10/25/16 1:38 AM, Haribabu Kommi wrote:
> Here I attached the first version of the patch that supports both EUI-48
> and EUI-64 MAC addresses with a single datatype called macaddr. This is a
> variable-length datatype, similar to inet, that can store both 6- and
> 8-byte addresses. A variable-length type is used so that, if MAC addresses
> are enhanced in the future, this type can still support them without
> breaking on-disk compatibility.

Since the world knows the 6-byte variant as MAC-48, shouldn't it be
renamed to macaddr48 or even mac48?

> Currently the patch lacks of documentation. Comments?

For patches like this, it would be good if you included a mock commit
message so that someone who comes in late knows what's going on.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-10-25 Thread Etsuro Fujita

On 2016/10/25 18:58, Ashutosh Bapat wrote:

You wrote:

13. The comment below is missing the main purpose i.e. creating a unique
alias, in case the relation gets converted into a subquery. Lowest or
highest
relid will create a unique alias at given level of join and that would be
more
future proof. If we decide to consider paths for every join order,
following
comment will no more be true.
+/*
+ * Set the relation index.  This is defined as the position of this
+ * joinrel in the join_rel_list list plus the length of the rtable
list.
+ * Note that since this joinrel is at the end of the list when we are
+ * called, we can get the position by list_length.
+ */
+fpinfo->relation_index =
+list_length(root->parse->rtable) +
list_length(root->join_rel_list);


I wrote:

I don't agree on that point.  As I said before, the approach using the
lowest/highest relid would make a remote query having nested subqueries
difficult to read.  And even if we decided to consider paths for every join
order, we could define the relation_index safely, by modifying
postgresGetForeignJoinPaths so that it (1) sets the relation_index the
proposed way at the first time it is called and (2) avoids setting the
relation_index after the first call, by checking to see
fpinfo->relation_index > 0.



OK. Although, the alias which contains a number, which apparently
doesn't show any relation with the relation (no pun intended :)) being
deparsed, won't make much sense in the deparsed query. But I am fine
leaving this to the committer's judgement.


Fine with me.


May be you want to add an
assert here making sure that relation_index is set only once.
Something like Assert(fpinfo->relation_index == 0) just before setting
it.


Will do.


15. While deparsing every Var, we are descending the RelOptInfo hierarchy.



Right.  I think that not only avoids creating redundant data to find the
alias of a given Var node but also would be able to find it in optimal cost.



Instead of that, we should probably gather all the alias information once
and
store it in context as an array indexed by relid. While deparsing a Var we
can
get the targetlist and alias for a given relation by using var->varno as
index.
And then search for given Var node in the targetlist to deparse the column
name
by its position in the targetlist. For the relations that are not
converted
into subqueries, this array will not hold any information and the Var will
be
deparsed as it's being done today.



Sorry, I don't understand this part.  How does that work when creating a
remote query having nested subqueries?  Is that able to search for a given
Var node efficiently?



For every relation that is deparsed as a subquery, we will create a
separate context. Each such context will have an array described
above. This array will contain the targetlist and aliases for all the
relations, covered by that relation, that are required to be deparsed
as subqueries. These arrays bubble up to topmost relation that is not
required to be deparsed as subquery. When a relation is required to be
deparsed as a subquery, its immediate upper relation sets the alias
and targetlist chosen for that relation at all the indexes
corresponding to all the base relations covered by that relation.


Thanks for the explanation!


For
example, let's assume that a relation (1, 2, 3) is required to be
deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
(thus the other joining relation being (4,5)). While deparsing for
relation (1,2,3,4,5), the context will contain a 5 element array, with
positions 1, 2, 3 filled by same targetlist and alias whereas
positions 4 and 5 will not be filled as those relations are not
deparsed as subqueries.


Sorry, I don't understand this.  In that case, the immediate upper 
relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases 
for *the join relation (1, 2, 3) somewhere*, not the targetlist and 
aliases for each of the component relations 1, 2, and 3, because the 
join relation is deparsed as a subquery.  Maybe I'm missing something, 
though.



Let's assume in relation (1, 2, 3), (1, 3) in
turn requires subquery but (2) does not. Thus the context created
while deparsing (1, 2, 3) will have a 3 element array with positions 1
and 3 containing the same targetlist and alias, where as position 2
will be empty.



When deparsing a Var node with varno = N and varattno =
m, if the nth position in the array in the context is empty, that Var
node will be deparsed as rN..


What happens when deparsing eg, a Var with varno = 2 at the topmost 
relation (1, 2, 3, 4, 5)?  The second position of the array is empty, 
but the join relation (1, 2, 3) is deparsed as a subquery, so the Var 
should be deparsed as an alias of an output column of the subquery at 
the topmost relation, I think.



But if that position has
alias sZ, then we search for that Var node in the targetlist and if
it's found at kth position in the targetlist, 

Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
 wrote:
> On 2016/10/05 2:12, Robert Haas wrote:

> Attached revised patches.

Few assorted review comments for 0001-Catalog*:


1.
@@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
{
..
+ else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from partitioned table \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant.")));
..
}

Why this restriction?  Won't it be useful to allow it for cases where the
user wants to copy the data of all the partitions?  (A sketch of the
COPY (SELECT ...) workaround suggested by the hint appears after these
comments.)


2.
+ if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
+ partnatts > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("cannot list partition using more than one column")));

/cannot list/cannot use list

3.
@@ -77,7 +77,7 @@ typedef enum DependencyType
  DEPENDENCY_INTERNAL = 'i',
  DEPENDENCY_EXTENSION = 'e',
  DEPENDENCY_AUTO_EXTENSION = 'x',
- DEPENDENCY_PIN = 'p'
+ DEPENDENCY_PIN = 'p',
 } DependencyType;

Why is this change required?

4.
@@ -0,0 +1,69 @@
+/*-
+ *
+ * pg_partitioned_table.h
+ *  definition of the system "partitioned table" relation
+ *  along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

Copyright year should be 2016.

5.
+/*
+ * PartitionSpec - partition key definition including the strategy
+ *
+ * 'strategy' partition strategy name ('list', 'range', etc.)

The "etc." in the above comment seems unnecessary.

6.
+ {PartitionedRelationId, /* PARTEDRELID */

Here PARTEDRELID sounds inconvenient, how about PARTRELID?
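Coming back to comment 1: the hint in the quoted error message points at
the COPY (SELECT ...) TO form, which for a partitioned parent would look
roughly like this (the table name is illustrative):

-- Instead of "COPY measurement TO ...", which the patch rejects for a
-- partitioned parent, copy out the result of a query over it:
COPY (SELECT * FROM measurement) TO STDOUT WITH (FORMAT csv);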



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2016-10-25 Thread Kyotaro HORIGUCHI
Hi, this is the 7th patch to make instrumentation work.

Explain analyze shows the following result by the previous patch set .

| Aggregate  (cost=820.25..820.26 rows=1 width=8) (actual time=4324.676..4324.676 rows=1 loops=1)
|   ->  Append  (cost=0.00..791.00 rows=11701 width=4) (actual time=0.910..3663.882 rows=400 loops=1)
| ->  Foreign Scan on ft10  (cost=100.00..197.75 rows=2925 width=4)
|   (never executed)
| ->  Foreign Scan on ft20  (cost=100.00..197.75 rows=2925 width=4)
|   (never executed)
| ->  Foreign Scan on ft30  (cost=100.00..197.75 rows=2925 width=4)
|   (never executed)
| ->  Foreign Scan on ft40  (cost=100.00..197.75 rows=2925 width=4)
|   (never executed)
| ->  Seq Scan on pf0  (cost=0.00..0.00 rows=1 width=4)
|  (actual time=0.004..0.004 rows=0 loops=1)

The current instrumentation code assumes that a request to a node
always either returns a tuple or reaches the end of tuples. This async
framework has two points at which underlying nodes are executed:
ExecAsyncRequest and ExecAsyncEventLoop. So I'm not sure whether this
is appropriate, but anyway it seems to show sane numbers.

| Aggregate  (cost=820.25..820.26 rows=1 width=8) (actual time=4571.205..4571.206 rows=1 loops=1)
|   ->  Append  (cost=0.00..791.00 rows=11701 width=4) (actual time=1.362..3893.114 rows=400 loops=1)
| ->  Foreign Scan on ft10  (cost=100.00..197.75 rows=2925 width=4)
|(actual time=1.056..770.863 rows=100 loops=1)
| ->  Foreign Scan on ft20  (cost=100.00..197.75 rows=2925 width=4)
|(actual time=0.461..767.840 rows=100 loops=1)
| ->  Foreign Scan on ft30  (cost=100.00..197.75 rows=2925 width=4)
|(actual time=0.474..782.547 rows=100 loops=1)
| ->  Foreign Scan on ft40  (cost=100.00..197.75 rows=2925 width=4)
|(actual time=0.156..765.920 rows=100 loops=1)
| ->  Seq Scan on pf0  (cost=0.00..0.00 rows=1 width=4) (never executed)


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 35c60a46f49aab72d492c798ff7eb8fc0e672250 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 25 Oct 2016 19:04:04 +0900
Subject: [PATCH 7/7] Add instrumentation to async execution

Make explain analyze give sane result when async execution has taken
place.
---
 src/backend/executor/execAsync.c  | 19 +++
 src/backend/executor/instrument.c |  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
index 40e3f67..588ba18 100644
--- a/src/backend/executor/execAsync.c
+++ b/src/backend/executor/execAsync.c
@@ -46,6 +46,9 @@ ExecAsyncRequest(EState *estate, PlanState *requestor, int request_index,
 	PendingAsyncRequest *areq = NULL;
 	int		nasync = estate->es_num_pending_async;
 
+	if (requestee->instrument)
+		InstrStartNode(requestee->instrument);
+
 	/*
 	 * If the number of pending asynchronous nodes exceeds the number of
 	 * available slots in the es_pending_async array, expand the array.
@@ -121,11 +124,17 @@ ExecAsyncRequest(EState *estate, PlanState *requestor, int request_index,
 	if (areq->state == ASYNC_COMPLETE)
 	{
 		Assert(areq->result == NULL || IsA(areq->result, TupleTableSlot));
+
 		ExecAsyncResponse(estate, areq);
+		if (areq->requestee->instrument)
+			InstrStopNode(requestee->instrument,
+		  TupIsNull((TupleTableSlot*)areq->result) ? 0.0 : 1.0);
 
 		return;
 	}
 
+	if (areq->requestee->instrument)
+		InstrStopNode(requestee->instrument, 0);
 	/* No result available now, make this node pending */
 	estate->es_num_pending_async++;
 }
@@ -193,6 +202,9 @@ ExecAsyncEventLoop(EState *estate, PlanState *requestor, long timeout)
 		{
 			PendingAsyncRequest *areq = estate->es_pending_async[i];
 
+			if (areq->requestee->instrument)
+InstrStartNode(areq->requestee->instrument);
+
 			/* Skip it if not pending. */
 			if (areq->state == ASYNC_CALLBACK_PENDING)
 			{
@@ -211,7 +223,14 @@ ExecAsyncEventLoop(EState *estate, PlanState *requestor, long timeout)
 if (requestor == areq->requestor)
 	requestor_done = true;
 ExecAsyncResponse(estate, areq);
+
+if (areq->requestee->instrument)
+	InstrStopNode(areq->requestee->instrument,
+  TupIsNull((TupleTableSlot*)areq->result) ?
+  0.0 : 1.0);
 			}
+			else if (areq->requestee->instrument)
+InstrStopNode(areq->requestee->instrument, 0);
 		}
 
 		/* If any node completed, compact the array. */
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2614bf4..6a22a15 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -102,7 +102,7 @@ InstrStopNode(Instrumentation *instr, double nTuples)
 			 , >bufusage_start);
 
 	/* Is this the first tuple 

Re: [HACKERS] pg_basebackup stream xlog to tar

2016-10-25 Thread Magnus Hagander
On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier 
wrote:

> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander 
> wrote:
> > It also broke the tests and invalidated some documentation. But it was
> easy
> > enough to fix.
> >
> > I've now applied this, so next time you get to do the merging :P Joking
> > aside, please review and let me know if you can spot something I messed
> up
> > in the final merge.
>
> Just had another look at it..
> +static int
> +tar_fsync(Walfile f)
> +{
> +   Assert(f != NULL);
> +   tar_clear_error();
> +
> +   /*
> +* Always sync the whole tarfile, because that's all we can do. This
> makes
> +* no sense on compressed files, so just ignore those.
> +*/
> +   if (tar_data->compression)
> +   return 0;
> +
> +   return fsync(tar_data->fd);
> +}
> fsync() should not be called here if --no-sync is used.
>
> +   /* sync the empty blocks as well, since they're after the last file */
> +   fsync(tar_data->fd);
> Similarly, the fsync call of tar_finish() should not happen when
> --no-sync is used.
>

Yeah, agreed.



> +   if (format == 'p')
> +   stream.walmethod = CreateWalDirectoryMethod(param->xlog, do_sync);
> +   else
> +   stream.walmethod = CreateWalTarMethod(param->xlog,
> compresslevel, do_sync);
> LogStreamerMain() exits immediately once it is done, but I think that
> we had better be tidy here and clean up the WAL methods that have been
> allocated. I am thinking here about a potential retry method on
> failure, though the best shot in this area would be with
> ReceiveXlogStream().
>


Wouldn't the same be needed in pg_receivexlog.c in that case?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-10-25 Thread Ashutosh Bapat
>
>> 13. The comment below is missing the main purpose i.e. creating a unique
>> alias, in case the relation gets converted into a subquery. Lowest or
>> highest
>> relid will create a unique alias at given level of join and that would be
>> more
>> future proof. If we decide to consider paths for every join order,
>> following
>> comment will no more be true.
>> +/*
>> + * Set the relation index.  This is defined as the position of this
>> + * joinrel in the join_rel_list list plus the length of the rtable
>> list.
>> + * Note that since this joinrel is at the end of the list when we are
>> + * called, we can get the position by list_length.
>> + */
>> +fpinfo->relation_index =
>> +list_length(root->parse->rtable) +
>> list_length(root->join_rel_list);
>
>
> I don't agree on that point.  As I said before, the approach using the
> lowest/highest relid would make a remote query having nested subqueries
> difficult to read.  And even if we decided to consider paths for every join
> order, we could define the relation_index safely, by modifying
> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
> proposed way at the first time it is called and (2) avoids setting the
> relation_index after the first call, by checking to see
> fpinfo->relation_index > 0.

OK. Although, the alias which contains a number, which apparently
doesn't show any relation with the relation (no pun intended :)) being
deparsed, won't make much sense in the deparsed query. But I am fine
leaving this to the committer's judgement. May be you want to add an
assert here making sure that relation_index is set only once.
Something like Assert(fpinfo->relation_index == 0) just before setting
it.

>
>> 14. The function name talks about non-vars but the Assert few lines below
>> asserts that every node there is a Var. Need better naming.
>> +reltarget_has_non_vars(RelOptInfo *foreignrel)
>> +{
>> +ListCell   *lc;
>> +
>> +foreach(lc, foreignrel->reltarget->exprs)
>> +{
>> +Var   *var = (Var *) lfirst(lc);
>> +
>> +Assert(IsA(var, Var));
>> And also an explanation for the claim
>> + * Note: currently deparseExplicitTargetList can't properly handle such
>> Vars.
>
>
> Will remove this function for the reason described in #12.
>
>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
>
>
> Right.  I think that not only avoids creating redundant data to find the
> alias of a given Var node but also would be able to find it in optimal cost.
>
>> Instead of that, we should probably gather all the alias information once
>> and
>> store it in context as an array indexed by relid. While deparsing a Var we
>> can
>> get the targetlist and alias for a given relation by using var->varno as
>> index.
>> And then search for given Var node in the targetlist to deparse the column
>> name
>> by its position in the targetlist. For the relations that are not
>> converted
>> into subqueries, this array will not hold any information and the Var will
>> be
>> deparsed as it's being done today.
>
>
> Sorry, I don't understand this part.  How does that work when creating a
> remote query having nested subqueries?  Is that able to search for a given
> Var node efficiently?

For every relation that is deparsed as a subquery, we will create a
separate context. Each such context will have an array described
above. This array will contain the targetlist and aliases for all the
relations, covered by that relation, that are required to be deparsed
as subqueries. These arrays bubble up to topmost relation that is not
required to be deparsed as subquery. When a relation is required to be
deparsed as a subquery, its immediate upper relation sets the alias
and targetlist chosen for that relation at all the indexes
corresponding to all the base relations covered by that relation. For
example, let's assume that a relation (1, 2, 3) is required to be
deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
(thus the other joining relation being (4,5)). While deparsing for
relation (1,2,3,4,5), the context will contain a 5 element array, with
positions 1, 2, 3 filled by same targetlist and alias whereas
positions 4 and 5 will not be filled as those relations are not
deparsed as subqueries. Let's assume in relation (1, 2, 3), (1, 3) in
turn requires subquery but (2) does not. Thus the context created
while deparsing (1, 2, 3) will have a 3 element array with positions 1
and 3 containing the same targetlist and alias, where as position 2
will be empty. When deparsing a Var node with varno = N and varattno =
m, if the nth position in the array in the context is empty, that Var
node will be deparsed as rN.. But if that position has
alias sZ, then we search for that Var node in the targetlist and if
it's found at kth position in the targetlist, we will deparse it as
sZ.ck. The search in the targetlist can be performed using
tlist_member, and then fetching the 

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2016-10-25 Thread Peter Moser

On 27.07.2016 at 16:09 Robert Haas wrote:

On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös  wrote:

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.


I think that it is great that you want to contribute your work to
PostgreSQL.  I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow.  Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you.



Hi hackers,

thank you for your feedback.

We are aware that contributing to PostgreSQL is a long way with a lot
of work.  We are committed to go all the way and do the work as
discussed in the community.

We had some internal discussions about the project, looking also at
some other patches to better understand whether the patch is 
work-in-progress or ready for commitfest.




If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open



We decided to follow your recommendation and add the patch to the 
commitfest.



Looking forward for your feedback,
Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2016-10-25 Thread Kyotaro HORIGUCHI
This is the version rebased onto the current master (patches up to 0004),
with the resowner stuff added (0005) and unlikely() (0006).

At Tue, 18 Oct 2016 10:30:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20161018.103051.30820907.horiguchi.kyot...@lab.ntt.co.jp>
> > > - Errors in the executor can leak the WaitEventSet.  Probably we need
> > > to modify ResourceOwners to be able to own WaitEventSets.
> 
> WaitEventSet itself is not leaked but epoll-fd should be closed
> at failure. This seems doable with TRY-CATCHing in
> ExecAsyncEventLoop. (not yet)

Haha, that was a silly thing to say. The wait event set can continue
to live after a timeout, and any error can happen on the way after
that. I added a resource owner entry for wait event sets and attach
the ones created in ExecAsyncEventWait to TopTransactionResourceOwner.
WaitLatchOrSocket currently doesn't do so, so as not to change the
current behavior. WaitEventSet doesn't have an identifier usable by
resowner.c, so for now I use its address (pointer value) for that
purpose. Patch 0005 does that.

> I measured performance and had the following result.
> 
> t0  - SELECT sum(a) FROM ;
> pl  - SELECT sum(a) FROM <4 local children>;
> pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
> 
> The result is written as "time (std dev )"
> 
> sync
>   t0: 3820.33 (  1.88)
>   pl: 1608.59 ( 12.06)
>  pf0: 7928.29 ( 46.58)
>  pf1: 8023.16 ( 26.43)
> 
> async
>   t0: 3806.31 (  4.49)0.4% faster (should be error)
>   pl: 1629.17 (  0.29)1.3% slower
>  pf0: 6447.07 ( 25.19)   18.7% faster
>  pf1: 1876.80 ( 47.13)   76.6% faster
> 
> t0 is not affected since the ExecProcNode stuff has gone.
> 
> pl is getting a bit slower. (almost the same to simple seqscan of
> the previous patch) This should be a misprediction penalty.

Using the likely() macro in ExecAppend seems to have shaken off the
degradation.

sync
  t0: 3919.49 (  5.95)
  pl: 1637.95 (  0.75)
 pf0: 8304.20 ( 43.94)
 pf1: 8222.09 ( 28.20)

async
  t0: 3885.84 ( 40.20)  0.86% faster (should be error but stable on my env..)
  pl: 1617.20 (  3.51)  1.26% faster (ditto)
 pf0: 6680.95 (478.72)  19.5% faster
 pf1: 1886.87 ( 36.25)  77.1% faster

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 25eba7e506228ab087e8b743efb039286a8251c4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 12 Oct 2016 12:46:10 +0900
Subject: [PATCH 1/6] robert's 2nd framework

---
 contrib/postgres_fdw/postgres_fdw.c |  49 
 src/backend/executor/Makefile   |   4 +-
 src/backend/executor/README |  43 +++
 src/backend/executor/execAmi.c  |   5 +
 src/backend/executor/execAsync.c| 462 
 src/backend/executor/nodeAppend.c   | 162 ++-
 src/backend/executor/nodeForeignscan.c  |  49 
 src/backend/nodes/copyfuncs.c   |   1 +
 src/backend/nodes/outfuncs.c|   1 +
 src/backend/nodes/readfuncs.c   |   1 +
 src/backend/optimizer/plan/createplan.c |  45 +++-
 src/include/executor/execAsync.h|  29 ++
 src/include/executor/nodeAppend.h   |   3 +
 src/include/executor/nodeForeignscan.h  |   7 +
 src/include/foreign/fdwapi.h|  15 ++
 src/include/nodes/execnodes.h   |  57 +++-
 src/include/nodes/plannodes.h   |   1 +
 17 files changed, 909 insertions(+), 25 deletions(-)
 create mode 100644 src/backend/executor/execAsync.c
 create mode 100644 src/include/executor/execAsync.h

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 906d6e6..c480945 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/execAsync.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -349,6 +350,14 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 			 UpperRelationKind stage,
 			 RelOptInfo *input_rel,
 			 RelOptInfo *output_rel);
+static bool postgresIsForeignPathAsyncCapable(ForeignPath *path);
+static void postgresForeignAsyncRequest(EState *estate,
+			PendingAsyncRequest *areq);
+static void postgresForeignAsyncConfigureWait(EState *estate,
+  PendingAsyncRequest *areq,
+  bool reinit);
+static void postgresForeignAsyncNotify(EState *estate,
+		   PendingAsyncRequest *areq);
 
 /*
  * Helper functions
@@ -468,6 +477,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for async execution */
+	routine->IsForeignPathAsyncCapable = postgresIsForeignPathAsyncCapable;
+	routine->ForeignAsyncRequest = postgresForeignAsyncRequest;
+	

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-25 Thread Amit Kapila
On Tue, Oct 25, 2016 at 2:10 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>
>
>> It is not obvious what it means if there are multiple ports but the
>> number doesn't equal the number of hosts.
>
> I think we should reject the case of differing number of elements and
> neither host nor port is a singleton, as an error.  The suggestion to
> ignore some parts seems too error-prone.
>

+1. I also think returning an error is the better option in the above case.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Transaction management overhaul is necessary?

2016-10-25 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> >> This was because psqlODBC starts and ends a subtransaction for each
> >> SQL statement by default to implement statement-level rollback.  And
> >> PostgreSQL creates one CurTransactionContext memory context, which is
> >> 8KB, for each subtransaction and retain them until the top transaction
> ends.
> 
> Surely that's where to start then. Find a way to pool and re-use, fully
> release, or otherwise be done with transaction contexts for released
> savepoints.

Yes, I'll investigate this.  Any reference information would be appreciated on 
why the CurTransactionContexts had to be retained, and whether it's difficult 
to circumvent.


> You can control transaction level rollback in psqlODBC directly. You do
> not need to fall back to the old protocol. Check the driver options.

That driver option is Protocol=7.4-1.  The name is misleading, as the driver 
now ignores version part (7.4), and interprets 1 as transaction-rollback.


> Right. We can't just fire off each statement wrapped in SAVEPOINT and RELEASE
> SAVEPOINT because we need to get the result of the statement and decide
> whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It only requires
> two round trips if you shove the SAVEPOINT in with the intended statement,
> but it's still messy.
> 
> I'd like to see an alternative statement with semantics more akin to COMMIT
> - which automatically into ROLLBACK if the tx is aborted.
> COMMIT SAVEPOINT would be too confusing since it's not truly committed.
> I don't know what to call it. But basically something that does RELEASE
> SAVEPOINT [named savepoint] unless the subxact is in aborted state, in which
> case it does ROLLBACK TO [named savepoint].
> Bonus points for letting it remember the last savepoint created and use
> that.
> 
> Furthermore, we should really add it on the protocol level so drivers can
> send subtransaction control messages more compactly, without needing to
> go through the parser etc, and without massively spamming the logs. For
> this purpose savepoint names would be internally generated so the driver
> wouldn't have to send them. We'd log savepoint boundaries when transaction
> logging was enabled. Since the client would send the first such protocol
> request we could do it on the sly without a protocol version bump; clients
> could just check server version and not use the new messages for older
> servers. If they send it to an older server they get a protocol error, which
> is fine.

I'm simply thinking of proposing a new GUC, something like "SET auto_rollback = 
{none | statement | transaction}", where none is the default and traditional 
behavior.
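For reference, a minimal SQL sketch of the per-statement subtransaction
dance discussed above, as a driver like psqlODBC would issue it (the table
and savepoint names are purely illustrative):

BEGIN;
SAVEPOINT stmt1;                        -- issued by the driver before the statement
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
RELEASE SAVEPOINT stmt1;                -- on success
-- ROLLBACK TO SAVEPOINT stmt1;         -- on error, undoes only this statement
COMMIT;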


> > You should to implement a CALL statement - that can be independent on
> > outer transaction. The behave inside procedure called by CALL
> > statement should be same like client side - and there you can controll
> > transactions explicitly without nesting.
> 
> I agree that'd be desirable. Top level "procedures" are necessary for this,
> really.
> 
> This would also enable us to return multiple result sets.
> 
> We'd probably have to start at least one small read-only tx for the initial
> cache access to look up the proc and set everything up, but if we don't
> allocate xids local transactions are super cheap.

OK, that would be a very big challenge... I can't imagine how difficult it will 
be now.  But supporting the stored procedure with CALL statement would be a 
wall to overcome.


> However, I think trying to tackle the memory context bloat reported upthread
> would be a more effective starting point since it immediately targets the

Yes, I think I'll address this.  Maybe I'll start different threads for each 
topic:

1. Memory context bloat
2. Statement-level rollback
3. Stored procedures where transactions can be ended and started

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_file_settings view patch

2016-10-25 Thread Alvaro Herrera
Haribabu Kommi wrote:
> On Mon, Oct 3, 2016 at 3:51 PM, Michael Paquier 
> wrote:

> Yes, I agree that adding these JSONB utility functions for this view
> is overkill, but I thought they might be useful for some users if the
> column is of JSONB type instead of an array.

Peter Eisentraut said he'd like JSON better:
https://www.postgresql.org/message-id/5547db0a.2020...@gmx.net
I asked twice about the use of JSON, suggesting an array instead:
https://www.postgresql.org/message-id/20151204163147.GZ2763@alvherre.pgsql
https://www.postgresql.org/message-id/20160201215714.GA98800@alvherre.pgsql

I now think that we should figure out what it is that we want before we
continue to request it to be changed over and over.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
 wrote:
> On 2016/10/05 2:12, Robert Haas wrote:
>> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
>>  wrote:
 Even if we leave the empty relfilenode around for now -- in the long
 run I think it should die -- I think we should prohibit the creation
 of subsidiary object on the parent which is only sensible if it has
 rows - e.g. indexes.  It makes no sense to disallow non-inheritable
 constraints while allowing indexes, and it could box us into a corner
 later.
>>>
>>> I agree.  So we must prevent from the get-go the creation of following
>>> objects on parent tables (aka RELKIND_PARTITIONED_TABLE relations):
>>>
>>> * Indexes
>>> * Row triggers (?)
>>
>> Hmm, do we ever fire triggers on the parent for operations on a child
>> table?  Note this thread, which seems possibly relevant:
>>
>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>
> The answer to your question is no.
>
> The thread you quoted discusses statement-level triggers and the
> conclusion is that they don't work as desired for UPDATE and DELETE on
> inheritance tables.  As things stand, only UPDATE or DELETE on the parent
> affects the child tables and it's proposed there that the statement-level
> triggers on the parent and also on any child tables affected should be
> fired in that case.
>

Doesn't that imply that statement-level triggers should be fired for
all the tables that get changed by the statement?  If so, then in your
case they would never fire for the parent table, which means we could
disallow statement-level triggers on parent tables as well?

Some of the other things that we might want to consider disallowing on
the parent table could be:
a. Policy on table_name
b. ALTER TABLE has many clauses; are all of those allowed, and does it
make sense to allow them?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planet postgresql issue

2016-10-25 Thread hubert depesz lubaczewski
On Tue, Oct 25, 2016 at 08:36:22AM +0200, hubert depesz lubaczewski wrote:
> Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and
> as far as I can tell, it works OK.

Magnus is looking into the problem now. Seems to be something related to
networking in the box that hosts planet.

Best regards,

depesz



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planet postgresql issue

2016-10-25 Thread Fabien COELHO



I got a email about issues with reading feed URL. [...]


Same here. [...]


Same issue here. [...]


Same.

I would guess that the feed loader has some connectivity problems.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planet postgresql issue

2016-10-25 Thread Michael Paquier
On Tue, Oct 25, 2016 at 3:36 PM, hubert depesz lubaczewski
 wrote:
> On Tue, Oct 25, 2016 at 08:28:00AM +0200, Pavel Stehule wrote:
>> Hi
>>
>> I got a email about issues with reading feed URL.
>>
>> I checked manually URL and it looks well. http://okbob.blogspot.com/
>> feeds/posts/default
>
> Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and
> as far as I can tell, it works OK.

Same issue here. Glad I am not crazy yet.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-25 Thread Michael Paquier
On Tue, Oct 25, 2016 at 11:07 AM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 24 Oct 2016 15:55:58 +0900, Michael Paquier 
>  wrote in 
> 
>> Anyway, we can clearly reject 1. in the light of
>> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
>> when playing with different stop locations at recovery.
>
> | * If the last checkpoint record we've replayed is already our last
> | * restartpoint, we can't perform a new restart point. We still update
> | * minRecoveryPoint in that case, so that if this is a shutdown restart
> | * point, we won't start up earlier than before.
> ...
> | * We don't explicitly advance minRecoveryPoint when we do create a
> | * restartpoint. It's assumed that flushing the buffers will do that as a
> | * side-effect.
>
> The second sentence seems to me as "we *expect* minRecoveryPoint
> to be updated anyway even if we don't do that here". Though it is a bit
> different in reality.
>
> skipped checkpoints - advance minRecoveryPoint to the checkpoint
>
> I'm failing to make a consistent model for the code around here
> in my mind..

Hm? If the last checkpoint record replayed is the last restart point,
no restart point is created *but* minRecoveryPoint is updated to
prevent the case where read only queries are allowed earlier than the
next startup point. On the other hand, if a restart point is created,
this code does not update minRecoveryPoint and it is assumed that the
next buffer flush will do it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planet postgresql issue

2016-10-25 Thread hubert depesz lubaczewski
On Tue, Oct 25, 2016 at 08:28:00AM +0200, Pavel Stehule wrote:
> Hi
> 
> I got a email about issues with reading feed URL.
> 
> I checked manually URL and it looks well. http://okbob.blogspot.com/
> feeds/posts/default

Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and
as far as I can tell, it works OK.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] planet postgresql issue

2016-10-25 Thread Pavel Stehule
Hi

I got a email about issues with reading feed URL.

I checked manually URL and it looks well. http://okbob.blogspot.com/
feeds/posts/default

Regards

Pavel


Re: [HACKERS] pg_hba_file_settings view patch

2016-10-25 Thread Haribabu Kommi
On Mon, Oct 3, 2016 at 3:51 PM, Michael Paquier 
wrote:

> On Mon, Sep 5, 2016 at 4:09 PM, Haribabu Kommi 
> wrote:
> > On Sun, Sep 4, 2016 at 1:44 AM, Simon Riggs 
> wrote:
> >> On 15 August 2016 at 12:17, Haribabu Kommi 
> >> wrote:
> >>
> >> > comments?
> >>
> >> This looks like a good feature contribution, thanks.
> >>
> >> At present the patch doesn't apply cleanly, please rebase.
> >
> >
> > Rebased patch is attached.
>
> Moved to next CF as there is a patch and no reviews.
>
> +   push_jsonb_string_key(, "map");
> +   push_jsonb_string_value(, hba->usermap);
> [...]
> +
> + options
> + jsonb
> + Configuration options set for authentication method
> +
> Why is it an advantage to use jsonb here instead of a simple array
> made of name=value? If they were nested I'd see a case for it but it
> seems to me that as presented this is just an overkill. In short, I
> think that this patch needs a bit of rework, so I am marking it as
> returned with feedback.
>


Yes, I agree that adding these JSONB utility functions for this view
is overkill, but I thought they might be useful for some users if the
column is of JSONB type instead of an array.

If anyone else feels the same way, I can update the patch to use an
array datatype.
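For concreteness, a minimal sketch of the two representations being
debated (the option names are illustrative, "map" being one key the patch
emits):

-- jsonb, as in the current patch:
SELECT '{"map": "mymap", "clientcert": "1"}'::jsonb      AS options_jsonb,
-- versus a plain name=value text array:
       ARRAY['map=mymap', 'clientcert=1']::text[]        AS options_array;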

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths

2016-10-25 Thread Tatsuro Yamada

Hi Ashutosh,


You are right. Every call except that one is using NIL, so better to
fix it. The pattern was repeated in the recent aggregate pushdown
support. Here's patch to fix create_foreignscan_path() call in
add_foreign_grouping_paths() as well.


Thanks for the review!

Tatsuro Yamada
NTT Open Source Software Center




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers