date:20170501

On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
> I agree with adding running, I think that's good thing even for the per
> transaction tracking and snapshot exports - we could use the newly added
> field to get rid of the issue we have with 'snapshot too large' when
> there were many aborted transactions while we waited for running ones to
> finish.

I'm not sure of that - what I was proposing would only track this for
the ->running substructure.  How'd that help?

> But, I still think we need to restart the tracking after new
> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
> that we assigned to transactions at the end of SnapBuildCommitTxn might
> be corrupted otherwise as they were built before we knew one of the
> supposedly running txes was actually already committed and that
> transaction might have done catalog changes.

I'm afraid you're right.  But I think this is even more complicated: The
argument in your version that this can only happen once, seems to also
be holey: Just imagine a pg_usleep(3000 * 100) right before
ProcArrayEndTransaction() and enjoy the picture.

Wonder if we should just (re-)add a stage between SNAPBUILD_START and
SNAPBUILD_FULL_SNAPSHOT.  Enter SNAPBUILD_BUILD_INITIAL_SNAPSHOT at the
first xl_running_xacts, wait for all transactions to end with my
approach, while populating SnapBuild->committed, only then start
collecting changes for transactions (i.e. return true from
SnapBuildProcessChange()), return true once all xacts have finished
again.  That'd presumably be a bit easier to understand, more robust -
and slower.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-05-01 Thread Kang Yuzhe

On Sun, Apr 30, 2017 at 10:47 AM, Andrew Borodin  wrote:
> Hi, Kang and everyone in this thread.
>
> I'm planning to present the online course "Hacking PostgreSQL: data
> access methods in action and under the hood" on edX on June 1st. It's
> not announced yet, links will be available later.
> This course I'm describing information that was crucial for me to
> start hacking. Currently, my knowledge of technologies behind Postgres
> is quite limited, though I know the border of my knowledge quite well.
>
> Chances are that description from my point of view will not be helpful
> in some cases: before starting contributing to Postgres I had already
> held PhD in CS for database technology and I had already implemented 3
> different commercial DBMS (all in different technologies, PLs,
> paradigms, focuses, different prbolems being solved). And still,
> production of minimally viable contribution took 3 months (I was
> hacking for an hour a day, mostly at evenings).
> That's why I decided that it worth talking about how to get there
> before I'm already there. It's quite easy to forget that some concepts
> are really hard before you get them.
>
> The course will cover:
> 1. Major differences of Postgres from others
> 2. Dev tools as I use them
> 3. Concept of paged memory, latches and paged data structures
> 4. WAL, recovery, replication
> 5. Concurrency and locking in B-trees
> 6. GiST internals
> 7. Extensions
> 8. Text search and some of GIN
> 9. Postgres community mechanics
> Every topic will consist of two parts: 1 - video lectures on YouTube
> (in English and Russian, BTW my English is far from perfect) with
> references to docs and other resources, 2 - practical tasks where you
> change code slightly and observe differences (this part is mostly to
> help the student to observe easy entry points).
>

Thanks Andrey in advance. I am looking forward to meetingyou  there at Edx.
Regards,
Zeray


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

2017-05-01 Thread Petr Jelinek

On 01/05/17 10:03, Andres Freund wrote:
> On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
>> I agree with adding running, I think that's good thing even for the per
>> transaction tracking and snapshot exports - we could use the newly added
>> field to get rid of the issue we have with 'snapshot too large' when
>> there were many aborted transactions while we waited for running ones to
>> finish.
> 
> I'm not sure of that - what I was proposing would only track this for
> the ->running substructure.  How'd that help?
> 

Well not as is, but it's a building block for it.

> 
>> But, I still think we need to restart the tracking after new
>> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
>> that we assigned to transactions at the end of SnapBuildCommitTxn might
>> be corrupted otherwise as they were built before we knew one of the
>> supposedly running txes was actually already committed and that
>> transaction might have done catalog changes.
> 
> I'm afraid you're right.  But I think this is even more complicated: The
> argument in your version that this can only happen once, seems to also
> be holey: Just imagine a pg_usleep(3000 * 100) right before
> ProcArrayEndTransaction() and enjoy the picture.
>

Well yes, transaction can in theory have written commit/abort xlog
record and stayed in proc for more than single xl_running_xacts write.
But then the condition which we test that the new xl_running_xacts has
bigger xmin than the previously tracked one's xmax would not be
satisfied and we would not enter the relevant code path yet. So I think
we should not be able to get any xids we didn't see. But we have to
restart tracking from beginning (after first checking if we didn't
already see anything that the xl_running_xacts considers as running),
that's what my code did.

> Wonder if we should just (re-)add a stage between SNAPBUILD_START and
> SNAPBUILD_FULL_SNAPSHOT.  Enter SNAPBUILD_BUILD_INITIAL_SNAPSHOT at the
> first xl_running_xacts, wait for all transactions to end with my
> approach, while populating SnapBuild->committed, only then start
> collecting changes for transactions (i.e. return true from
> SnapBuildProcessChange()), return true once all xacts have finished
> again.  That'd presumably be a bit easier to understand, more robust -
> and slower.
> 

That would also work, but per above, I don't understand why it's needed.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RFC: ALTER SYSTEM [...] COMMENT

2017-05-01 Thread Amit Kapila

On Wed, Apr 26, 2017 at 10:33 PM, Joshua D. Drake  
wrote:
> I propose:
>
> Add a column to pg_settings comment(text)
> Change the grammar to allow:
>
> ALTER SYSTEM SET configuration_parameter { TO | = } { value | 'value' |
> DEFAULT } COMMENT 'comment'
>
> Example:
>
> ALTER SYSTEM SET maintenance_work_mem TO '1GB' COMMENT IS 'Increased to
> allow autovacuum to be more efficient';
>
> Potential issues:
>
> Does not use existing comment functionality. Alternate solution which would
> decrease functionality is:
>
> COMMENT ON SETTING setting IS 'comment';
>

I think syntax wise, it makes sense to have both the forms in some way
as proposed by you.  I think providing COMMENT along with ALTER SYSTEM
has the advantage that user can specify it when it specifies the
setting and by providing a separate command facilitates to use it even
when we just want to add/change the comments for a particular setting.

I think Robert [1] and I [2] have made some comments when this topic
was previously discussed which are worth looking if you or someone is
planning to write a patch for this feature.

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmoZBDLhDexHyTJ%3DH0xZt7-6M%3Dh%2Bv2Xi0Myj7Q37QGZQb4g%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1%2B%3DovYQqYGHcX2OBU3mk%2BhSHjFDpvmrHCos1Vgj8_b6vg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some review comments on logical rep code

On Sat, Apr 29, 2017 at 12:33 AM, Noah Misch  wrote:
>> I think the patch that Fujii Masao has proposed has found general
>> agreement.  I would recommend that he commits it as he sees fit.
>
> This is not a conforming status update, because it does not specify a date for
> your next update.

If Peter thinks that the issues fixed by this patch don't really need
to be corrected, then we could interpret this as a statement that it
should not be listed as an open item but that he does not object to
someone else fixing it anyway.

If these are real issues, then Peter should take responsibility for
them because they were created by his commit.  If Fujii Masao is
willing to help by committing the patch in question, great; but if
not, then Peter should be responsible for committing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-05-01 Thread Amit Kapila

On Fri, Apr 28, 2017 at 10:02 PM, Dmitriy Sarafannikov
 wrote:
>
> What I'm thinking of is the regular indexscan that's done internally
> by get_actual_variable_range, not whatever ends up getting chosen as
> the plan for the user query.  I had supposed that that would kill
> dead index entries as it went, but maybe that's not happening for
> some reason.
>
>
> Really, this happens as you said. Index entries are marked as dead.
> But after this, backends spends cpu time on skip this killed entries
> in _bt_checkkeys :
>

If that is the case, then how would using SnapshotAny solve this
problem.  We get the value from index first and then check its
visibility in heap, so if time is spent in _bt_checkkeys, why would
using a different kind of Snapshot solve the problem?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PG 10 release notes

On Tue, Apr 25, 2017 at 11:01 AM, Bruce Momjian  wrote:
>> Or the ability of logical decoding to follow timeline switches.
>
> I didn't think logical decoding was really more than a proof-of-concept
> until now.

/me searches for jaw on floor.

It sounds like you don't understand how logical decoding works.  There
are plugins -- fairly widely used, I think -- like
https://github.com/confluentinc/bottledwater-pg and
https://github.com/eulerto/wal2json which use the in-core
infrastructure to do very nifty things, much like there are foreign
data wrappers other than postgres_fdw.  Even test_decoding is (perhaps
regrettably) being used to build production solutions.  The point is
that most of the logic is in core; test_decoding or bottlewater or
wal2json are just small plugins that tap into that infrastructure.

I would not in any way refer to logical decoding as being only a proof
of concept, even before logical replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-01 Thread David Rowley

On 20 April 2017 at 07:29, Euler Taveira  wrote:
> 2017-04-19 1:32 GMT-03:00 Michael Paquier :
>>
>> I vote for "location" -> "lsn". I would expect complains about the
>> current inconsistency at some point, and the function names have been
>> already changed for this release..

OK, so I've created a draft patch which does this.

In summary of what it does

1. Renames all wal_location functions to wal_lsn.
2. Renames all system view columns to use "lsn" instead of "location"
3. Rename function parameters to use "lsn" instead of "location".
4. Rename function parameters "wal_position" to "lsn". (Not mentioned
before, but seems consistency was high on the list from earlier
comments on the thread)
5. Change documentation to reflect the above changes.
6. Fix bug where docs claimed return type of
pg_logical_slot_peek_changes.location was text, when it was pg_lsn
(maybe apply separately?)
7. Change some places in the func.sgml where it was referring to the
lsn as a "position" rather than "location". (We want consistency here)

Is this touching all places which were mentioned by everyone?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

location2lsn_rename.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SCRAM in the PG 10 release notes

On Tue, Apr 25, 2017 at 10:16 PM, Bruce Momjian  wrote:
> Well, we could add "MD5 users are encouraged to switch to
> SCRAM-SHA-256".  Now whether we want to list this as something on the
> SCRAM-SHA-256 description, or mention it as an incompatibility, or
> under Migration.  I am not clear that MD5 is in such terrible shape that
> this is warranted.

I think it's warranted.  The continuing use of MD5 has been a headache
for some EnterpriseDB customers who have compliance requirements which
they must meet.  It's not that they themselves necessarily know or
care whether MD5 is secure, although in some cases they do; it's that
if they use it, they will be breaking laws or regulations to which
their business or agency is subject.  I imagine customers of other
PostgreSQL companies have similar issues.  But leaving that aside, the
advantage of SCRAM isn't merely that it uses a better algorithm to
hash the password.  It has other advantages also, like not being
vulnerable to replay attacks.  If you're doing password
authentication, you should really be using SCRAM, and encouraging
people to move to SCRAM after upgrading is a good idea.

That having been said, SCRAM is a wire protocol break.  You will not
be able to upgrade to SCRAM unless and until the drivers you use to
connect to the database add support for it.  The only such driver
that's part of libpq; other drivers that have reimplemented the
PostgreSQL wire protocol will have to be updated with SCRAM support
before it will be possible to use SCRAM with those drivers.  I think
this should be mentioned in the release notes, too.  I also think it
would be great if somebody would put together a wiki page listing all
the popular drivers and (1) whether they use libpq or reimplement the
wire protocol, and (2) if the latter, the status of any efforts to
implement SCRAM, and (3) if those efforts have been completed, the
version from which they support SCRAM.  Then, I think we should reach
out to all of the maintainers of those driver authors who aren't
moving to support SCRAM and encourage them to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

On Mon, May 1, 2017 at 12:18 AM, Amit Langote
 wrote:
> Attached updated patch.

Committed, except for this bit:

+A statement-level trigger defined on partitioned tables is fired only
+once for the table itself, not once for every table in the partitioning
+hierarchy.  However, row-level triggers of any affected leaf partitions
+will be fired.

The first sentence there has a number of issues.  Grammatically, there
is an agreement problem: trigger is singular, but partitioned table is
plural, and one trigger isn't defined across multiple tables.  It
would have to say something like "A statement-level trigger defined on
a partitioned table".  But even with that correction, it's not really
saying what you want it to say.  Nobody would expect that the same
statement-level trigger would be fired multiple times.  The issue is
whether statement-level triggers on the partitions themselves will be
fired, not the statement-level trigger on the partitioned table.
Also, if this applies to inheritance as well as partitioning, then why
mention only partitioning?  I think we might want to document
something more here, but not like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PG 10 release notes

On Tue, Apr 25, 2017 at 9:56 PM, Bruce Momjian  wrote:
> First, I don't think RFC references belong in the release notes, let
> alone RFC links.

Why not?  I think RFC references are a great thing to include in the
release notes, and including links seems helpful, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

Noah Misch  writes:
> On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
>> ... I was wondering about adding
>> a loop that simply runs for like 30s and then quits or such, but who
>> knows.

> If the probabilistic test catches the bug even 5% of the time in typical
> configurations, the buildfarm will rapidly identify any regression.  I'd
> choose a 7s test that detects the bug 5% of the time over a 30s test that
> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> half the time.  In its case, only two buildfarm members were able to
> demonstrate the original bug, so 5% detection would have been too low.)

30sec is kind of a big lump from a buildfarm standpoint, especially if
you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
with individual tests that run for ~ 1sec.

(This is top-of-mind for me right now because I've been looking around
for ways to speed up the regression tests.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] vcregress support for single TAP tests



On 04/28/2017 08:54 AM, Andrew Dunstan wrote:
>
> On 04/26/2017 10:32 PM, Peter Eisentraut wrote:
>> On 4/23/17 17:09, Andrew Dunstan wrote:
>>> Here's a patch that will allow calling vcregress.pl to run a single TAP
>>> test set. It would work like this:
>>>
>>>
>>> vcregress.pl src/test/recover true
>>>
>>>
>>> The second argument if true (in the perl sense, of course) would trigger
>>> a temp install before running the tests. It defaults to off, in an
>>> attempt to minimize the unnecessary running of installs.
>> Seems kind of weird to have that functionality only for the tap tests,
>> though.
>>
>
>
> Yeah, you're right, I think we'd be much better mimicking what the
> makefiles do and honouring the NO_TEMP_INSTALL environment variable.
> Revised patch that works that way attached.
>
> So to run an arbitrary TAP test set, you would do (for example)
>
> vcregress.pl taptest src/test/recover
>
>


In the absence of further comments I'm going to apply this and
back-patch it so we can get a significant improvement in how the
buildfarm reports results from TAP tests, as well as increased coverage,
on Windows/MSVC machines.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread Tomas Vondra


On 05/01/2017 06:22 AM, Pavel Stehule wrote:



2017-05-01 1:21 GMT+02:00 Andres Freund mailto:and...@anarazel.de>>:

On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> why we cannot to introduce GUC option - enable_cteoptfence ?

Doesn't really solve the issue, and we've generally shied away from GUCs
that influence behaviour after a few bad experiences.  What if you want
one CTE inlined, but another one not?


It change behave in same sense like enable_nestloop, enable_hashjoin,
... with same limits.



Those (and also the other enable_*) GUCs are a great example why we 
should not use GUCs for tweaking planner behavior, except perhaps for 
the purpose of investigation. It's an extremely blunt tool.


You typically want to affect just a single node in the query plan (say, 
one join), but those options don't allow you to do that. It's all or 
nothing thing.


Even if you're OK with affecting the whole query, it's a separate 
control channel - it's not embedded in the query, the user has to set it 
somehow. So you either set it for the whole session (affecting all the 
other queries that don't really need it), or you set it before each 
query. Which however sucks for a number of reasons, e.g. if you have a 
slow query in the log, how do you know with what GUC values it was 
executed? (You don't, and there's no way to find out.)


Exactly the same issues would affect this new GUC. It would be 
impossible to use multiple CTEs in the query with different fencing 
behavior, and it would be just as difficult to investigate.


So no more planner-affecting GUCs, please, particularly if we expect 
regular users to use them.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes



On 05/01/2017 08:46 AM, Tom Lane wrote:
> Noah Misch  writes:
>> On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
>>> ... I was wondering about adding
>>> a loop that simply runs for like 30s and then quits or such, but who
>>> knows.
>> If the probabilistic test catches the bug even 5% of the time in typical
>> configurations, the buildfarm will rapidly identify any regression.  I'd
>> choose a 7s test that detects the bug 5% of the time over a 30s test that
>> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
>> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
>> half the time.  In its case, only two buildfarm members were able to
>> demonstrate the original bug, so 5% detection would have been too low.)
> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> with individual tests that run for ~ 1sec.
>
> (This is top-of-mind for me right now because I've been looking around
> for ways to speed up the regression tests.)
>
>   


Yes, me too. We're getting a bit lazy about that - see thread nearby
that will let us avoid unnecessary temp installs among other things.


cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in prepared statement cache invalidation?

On Fri, Apr 28, 2017 at 3:01 AM, Konstantin Knizhnik
 wrote:
> I find out that now Postgres correctly invalidates prepared plans which
> directly depend on altered relation, but doesn't invalidate plans having
> transitive (indirect) dependencies.

I think the problem here is that replanning and recompiling are two
different things.  Here's a slightly different example:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table bar (a text, b int);
CREATE TABLE
rhaas=# create function quux() returns void as $$declare f foo; begin
select * from foo into f limit 1; raise notice '%', f; end$$ language
plpgsql;
CREATE FUNCTION
rhaas=# insert into foo values (1, 'one');
INSERT 0 1
rhaas=# insert into bar values ('two', 2);
INSERT 0 1
rhaas=# select quux();
NOTICE:  (1,one)
 quux
--

(1 row)

rhaas=# begin;
BEGIN
rhaas=# alter table foo rename to snazzy;
ALTER TABLE
rhaas=# alter table bar rename to foo;
ALTER TABLE
rhaas=# select quux();
ERROR:  invalid input syntax for integer: "two"
CONTEXT:  PL/pgSQL function quux() line 1 at SQL statement

So what has happened here is that the query "SELECT * FROM foo" inside
of quux() got replanned, but the variable f declared by quux() as
being type foo did not get changed to match the new definition of type
foo.  The plancache is working just fine here; it correctly identifies
each statement that needs re-planning and performs that re-planning as
required.  The problem is that this only affects the statements
themselves, not other PL-created data structures like local variables.
The PL's cache of compiled functions is the issue, not the plancache.
I think that's also the problem in your test case.  Your test case
doesn't explicitly involve a variable but I think there must be
something like that under the hood.

This problem has been discussed before but nobody's done anything
about it.  The problem is a bit tricky because the core system doesn't
know anything about the function caches maintained by individual PLs.
I suppose ideally there'd be a way for a PL to say "if the definition
of X changes, please tell me to recompile function Y".  That probably
wouldn't be perfect because the PL might not be able to figure out
everything on which they actually depend; that might be
Turing-complete in some cases.  But even a partial solution would
probably be welcomed by users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining



On 05/01/2017 09:05 AM, Tomas Vondra wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund > >:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away
>> from GUCs
>> that influence behaviour after a few bad experiences.  What if
>> you want
>> one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we
> should not use GUCs for tweaking planner behavior, except perhaps for
> the purpose of investigation. It's an extremely blunt tool.
>
> You typically want to affect just a single node in the query plan
> (say, one join), but those options don't allow you to do that. It's
> all or nothing thing.
>
> Even if you're OK with affecting the whole query, it's a separate
> control channel - it's not embedded in the query, the user has to set
> it somehow. So you either set it for the whole session (affecting all
> the other queries that don't really need it), or you set it before
> each query. Which however sucks for a number of reasons, e.g. if you
> have a slow query in the log, how do you know with what GUC values it
> was executed? (You don't, and there's no way to find out.)
>
> Exactly the same issues would affect this new GUC. It would be
> impossible to use multiple CTEs in the query with different fencing
> behavior, and it would be just as difficult to investigate.
>
> So no more planner-affecting GUCs, please, particularly if we expect
> regular users to use them.
>
>


+1

I still see users wanting to use the enable_foo settings in production.

Having had years of telling users that CTEs are an optimization fence it
doesn't seem at all nice for us to turn around and change our mind about
that. I have relied on it in the past and I'm sure I'm very far from
alone in that.

Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?

WITH INLINE mycte AS ( ...)

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PG 10 release notes

2017-05-01 Thread Joshua D. Drake


On 05/01/2017 05:40 AM, Robert Haas wrote:

On Tue, Apr 25, 2017 at 9:56 PM, Bruce Momjian  wrote:

First, I don't think RFC references belong in the release notes, let
alone RFC links.


Why not?  I think RFC references are a great thing to include in the
release notes, and including links seems helpful, too.


I could see not including RFC references in a PR but release notes seem 
to be the perfect place for them.


JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] check with serial

The other day I wanted to run "make check" but with the serial schedule.
This wasn't as easy as it should have been. Although we now have
installcheck-parallel we don't have check-serial. Should we have that?
Alternatively, should we allow a SCHEDULE=foo argument for the "check"
target which defaults to parallel?


cheers


andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

On 1 May 2017 at 21:22, Andrew Dunstan  wrote:

> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.
>
> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
>
> WITH INLINE mycte AS ( ...)

I'd rather reverse that so we behave like other implementations by
default, and have extension syntax for our no-inline query hint. And
yes, that's what it is, because we'd only inline when we could produce
semantically equivalent results anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Description of create_singleton_array()

2017-05-01 Thread Neha Khatri

Is it intentional to have the existing $SUBJECT.

The commit 33f43725

updated
the function text_to_array() such that it does not directly invoke
create_singleton_array(). But $SUBJECT was not updated.

If it is not intentional then is it fine to update the description like
attached.

Regards,
Neha


correct-desc-create-singleton-array.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread David Fetter

On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
> > So no more planner-affecting GUCs, please, particularly if we expect
> > regular users to use them.
> 
> +1
> 
> I still see users wanting to use the enable_foo settings in production.
> 
> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.

You are certainly not alone, but I believe that in this you're missing
the vast majority (we hope) of PostgreSQL users.  These are the users
who have yet to adopt PostgreSQL, and have the quite reasonable
expectation that ordinary-looking grammar *isn't* an optimization
fence.

> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
> 
> WITH INLINE mycte AS ( ...)

+1 for a decorator, -1 for this one.

We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of

WITH FENCED/* Somewhat fuzzy.  What fence? */
or
WITH AT_MOST_ONCE  /* Clearer, but not super precise */
or
WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the 
docs in hand */

or something along that line.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PG 10 release notes

On Mon, Apr 24, 2017 at 11:37 PM, Bruce Momjian  wrote:
>> I think that might also be because you skipped a few things that should
>> get their own entries.  I've not yet made a pass through your draft (and
>> won't for some days), but a quick search shows the draft to e.g miss:
>> b30d3ea824c5ccba43d3e942704f20686e7dbab8 - Add a macro templatized hashtable.
>> 75ae538bc3168bf44475240d4e0487ee2f3bb376 - Use more efficient hashtable for 
>> tidbitmap.c to speed up bitmap scans.
>> 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5 - Use more efficient hashtable for 
>> execGrouping.c to speed up hash aggregation.
>> fc4b3dea2950e4f6081f1ed2380f82c9efd672e0 - User narrower representative 
>> tuples in the hash-agg hashtable.
>> 8ed3f11bb045ad7a3607690be668dbd5b3cc31d7 - Perform one only projection to 
>> compute agg arguments.
>> (not that this needs to five entries)
>
> I remember seeing those and those are normally details I do not put in
> the release notes as there isn't a clear user experience change except
> "Postgres is faster".  Yeah, a bummer, and I can change my filter, but
> it would require discussion.

I'm pretty sure this is not the first year in which your policy of
excluding certain performance-related items has met with opposition.
I agree that there are some improvements that are sufficiently small
and boring that they do not merit a mention, but (1) that's also true
of non-performance items and (2) the fact that people keep complaining
about performance items you excluded constitutes a discussion of
changing your filter.

My own opinion is that the filter should not be particularly different
for performance items vs. non-performance.  The question ought to be
whether the change is significant enough that users are likely to
care.  If you've got several people saying "hey, you forgot NN in
the release notes!" it is probably a good bet that the change is
significant enough that users will care about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread Andreas Karlsson


On 05/01/2017 04:17 PM, David Fetter wrote:

Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?

WITH INLINE mycte AS ( ...)


+1 for a decorator, -1 for this one.


I am not sure I like decorators since this means adding an ad hoc query 
hint directly into the SQL syntax which is something which I requires 
serious consideration.



We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of

WITH FENCED/* Somewhat fuzzy.  What fence? */
or
WITH AT_MOST_ONCE  /* Clearer, but not super precise */
or
WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the 
docs in hand */

or something along that line.


What about WITH MATERIALIZED, borrowing from the MySQL terminology 
"materialized subquery"?


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread David G. Johnston

On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson  wrote:

> On 05/01/2017 04:17 PM, David Fetter wrote:
>
>> Maybe we could allow a "decorator" that would tell the planner the CTE
>>> could be inlined?
>>>
>>> WITH INLINE mycte AS ( ...)
>>>
>>
>> +1 for a decorator, -1 for this one.
>>
>
> I am not sure I like decorators since this means adding an ad hoc query
> hint directly into the SQL syntax which is something which I requires
> serious consideration.

Given that we already have
"
prevent optimization
"
syntax why do we need a decorator on the CTE?

>
>
> We already have an explicit optimization fence with OFFSET 0, and I
>> think making optimization fences explicit is how we should continue.
>> I'd be more in favor of something along the lines of
>>
>> WITH FENCED/* Somewhat fuzzy.  What fence? */
>> or
>> WITH AT_MOST_ONCE  /* Clearer, but not super precise */
>> or
>> WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without
>> the docs in hand */
>>
>> or something along that line.
>>
>
> What about WITH MATERIALIZED, borrowing from the MySQL terminology
> "materialized subquery"?

I would shorten that to "WITH MAT" except that I don't think that having
two way to introduce an optimization fence is worthwhile.

If we don't optimize SRFs-in-target-list, and thus avoid multiple function
evaluation for (composite_col).*, I believe a significant number of
intentional optimization fence uses will be safe but behavioral changes.

David J.

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Apr 28, 2017 at 3:03 AM, Antonin Houska  wrote:
> I think this is not generic enough because the result of the Append plan can
> be joined to another relation. As such a join can duplicate the
> already-aggregated values, the aggregates should not be finalized below the
> top-level plan.

If the grouping key matches the partition key, then it's correct to
push the entire aggregate down, and there's probably a large
performance advantage from avoiding aggregating twice.  If the two
don't match, then pushing the aggregate down necessarily involves a
"partial" and a "finalize" stage, which may or may not be cheaper than
doing the aggregation all at once.  If you have lots of 2-row groups
with 1 row in the first branch of the append and 1 row in the second
branch of the append, breaking the aggregate into two steps is
probably going to be a loser.  If the overall number of groups is
small, it's probably going to win.  But when the grouping key matches
the partition key, so that two-stage aggregation isn't required, I
suspect the pushdown should almost always win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some review comments on logical rep code

I have committed this version.

I have omitted all the talk about 2PC.  There are discussions ongoing
about changing the transaction behavior of CREATE SUBSCRIPTION, which
might interfere with that.  If someone wants to rebase and propose the
parts about 2PC separately, I don't object, but it can be handled
separately.

I will close this open item now, since I think we have covered
everything that was originally reported.  Please start new threads if
there are any issues that were missed or that have been discovered
during the discussion.


On 4/23/17 22:18, Masahiko Sawada wrote:
> On Sat, Apr 22, 2017 at 4:26 AM, Fujii Masao  wrote:
>> On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada  
>> wrote:
>>> On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao  wrote:
 On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada  
 wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>  wrote:
>> Hi,
>>
>> Thank you for the revised version.
>>
>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada 
>>  wrote in 
>> 
>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada 
>>>  wrote:
 On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
  wrote:
> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada 
>  wrote in 
> 
>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao  
>> wrote:
>> 1.
>>>
>>> In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>> value from the argument, instead of DatumGetObjectId().
>>
>> Attached 001 patch fixes it.
>
> Hmm. I looked at the launcher side and found that it assigns bare
> integer to bgw_main_arg. It also should use Int32GetDatum.

 Yeah, I agreed. Will fix it.
>>
>> Thanks.
>>
>> 2.
>>>
>>> No one resets on_commit_launcher_wakeup flag to false.
>>
>> Attached 002 patch makes on_commit_launcher_wakeup turn off after 
>> woke
>> up the launcher process.
>
> It is omitting the abort case. Maybe it should be
> AtEOXact_ApplyLauncher(bool commit)?

 Should we wake up the launcher process even when abort?
>>
>> No, I meant that on_commit_launcher_wakeup should be just reset
>> without waking up launcher on abort.
>
> I understood. Sounds reasonable.

 ROLLBACK PREPARED should not wake up the launcher, but not even with the 
 patch.
>>>
>>> Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
>>> to a prepared transaction. Even COMMIT PREPARED might wake up the
>>> launcher process but might not wake up it. ROLLBACK PREPARED is also
>>> the same. For example, in the following situation we wake up the
>>> launcher at COMMIT, not at COMMIT PREPARED.
>>>
>>> (BTW, CreateSubscription, is the only one function that sets
>>> on_commit_launcher_wakeup for now, cannot be called in a transaction
>>> block. So it doesn't actually happen that we wake up the launcher when
>>> a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
>>> SUBSCRIPTION ENABLE, we need to deal with it.)
>>
>> We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
>> within the transaction block.
> 
> Oops, I'd missed it.
> 
>>
>>>
>>> BEGIN;
>>> ALTER SUBSCRIPTION hoge_sub ENABLE;
>>> PREPARE TRANSACTION 'g';
>>> BEGIN;
>>> SELECT 1;
>>> COMMIT;  -- wake up the launcher at this point.
>>> COMMIT PREPARED 'g';
>>>
>>> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>>> not a big deal to the launcher process actually. Compared with the
>>> complexly of changing the logic so that on_commit_launcher_wakeup
>>> corresponds to prepared transaction, we might want to accept this
>>> behavior.
>>
>> on_commit_launcher_wakeup needs to be recoreded in 2PC state
>> file to handle this issue properly. But I agree with you, that would
>> be overkill for small gain. So I'm thinking to add the following
>> comment into your patch and push it. Thought?
>>
>> -
>> Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC 
>> case.
>> For example, COMMIT PREPARED on the transaction enabling the flag
>> doesn't wake up
>> the logical replication launcher if ROLLBACK on another transaction
>> runs before it.
>> To handle this case properly, the flag needs to be recorded in 2PC
>> state file so that
>> COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>> the launcher. However this is overkill for small gain and false wakeup
>> of the launcher
>> is not so harmful (probably we can live with that), so we do nothing
>> here for this issue.
>> -
>>
> 
> Agreed.
> 
> I added this comment to the patch and attached it.
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
> 
> 
> 
> 


-- 
Peter Eisentraut  http:/

Re: [HACKERS] CTE inlining

2017-05-01 Thread Andreas Karlsson

On 05/01/2017 04:33 PM, David G. Johnston wrote:
> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson  I am not sure I like decorators since this means adding an ad hoc
> query hint directly into the SQL syntax which is something which I
> requires serious consideration.
>
> Given that we already have
> "
> prevent optimization
> "
> syntax why do we need a decorator on the CTE?

I do not think I follow. Me and some other people here would ideally 
allow CTEs to be inlined by default. Some people today use CTEs as 
optimization fences, to for example control join order, and the 
suggestion here is to add new syntax for CTEs to allow them to 
selectively be used as optimization fences.

> I would shorten that to "WITH MAT" except that I don't think that
> having two way to introduce an optimization fence is worthwhile.

You mean OFFSET 0? I have never been a fan of using it as an 
optimization fence. I do not think OFFSET 0 conveys clearly enough to 
the reader that is is an optimization fence.

Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some review comments on logical rep code

On 4/29/17 00:33, Noah Misch wrote:
> On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:
>> On 4/28/17 01:01, Noah Misch wrote:
>>> On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
 On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch  wrote:
> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>> Pushed. Thanks!
>
> Does this close the open item, or is there more to do?

 There is only one item remaining, and the patch is attached on
 here[1]. I guess reviewing that patch is almost done.

 [1] 
 https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
>>>
>>> Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
>>> update.  Kindly send a status update within 24 hours, and include a date for
>>> your subsequent status update.  Refer to the policy on open item ownership:
>>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> I think the patch that Fujii Masao has proposed has found general
>> agreement.  I would recommend that he commits it as he sees fit.
> 
> This is not a conforming status update, because it does not specify a date for
> your next update.

Everything related to this is fixed now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread Corey Huinker

On Mon, May 1, 2017 at 10:26 AM, Andreas Karlsson  wrote:

> What about WITH MATERIALIZED, borrowing from the MySQL terminology
> "materialized subquery"?


+1, you beat me to it.

Re: [HACKERS] CTE inlining

2017-05-01 Thread David G. Johnston

On Mon, May 1, 2017 at 7:43 AM, Andreas Karlsson  wrote:

> On 05/01/2017 04:33 PM, David G. Johnston wrote:
> > On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson  > I am not sure I like decorators since this means adding an ad hoc
> > query hint directly into the SQL syntax which is something which I
> > requires serious consideration.
>
> > I would shorten that to "WITH MAT" except that I don't think that
> > having two way to introduce an optimization fence is worthwhile.
>
> You mean OFFSET 0? I have never been a fan of using it as an optimization
> fence. I do not think OFFSET 0 conveys clearly enough to the reader that is
> is an optimization fence.

I think I was being too literal in my thinking.  Proposing that we
effectively do away with OFFSET 0 and instead recommend "WITH MAT name AS
()" for subqueries requiring standalone evaluation is something I can agree
with.  I too have always though of OFFSET 0 as hack-ish which is why my SRF
usage examples have always used CTEs.

David J.

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

On Mon, May 1, 2017 at 12:51 AM, Amit Langote
 wrote:
> What we could document now is that partitioned tables don't allow
> specifying triggers that reference transition tables.  Although, I am
> wondering where this note really belongs - the partitioning chapter, the
> triggers chapter or the CREATE TRIGGER reference page?  Maybe, Kevin and
> Thomas have something to say about that.  If it turns out that the
> partitioning chapter is a good place, here is a documentation patch.

I think that before we document this behavior, we'd better make sure
we understand exactly what the behavior is, and we'd better make sure
it's correct.  Currently, triggers that involve transition tables are
altogether prohibited when the root relation is partitioned, but are
allowed in inheritance cases.  However, the actual behavior appears to
be buggy.  Here's what happens when I define a parent and a child and
update all the rows:

rhaas=# CREATE FUNCTION t() RETURNS trigger
rhaas-# LANGUAGE plpgsql
rhaas-# AS $$declare q record; begin raise notice 'table %',
tg_table_name; for q in select * from old loop raise notice 'table %
got value %', tg_table_name, q.a; end loop; return null; end;$$;
CREATE FUNCTION
rhaas=# CREATE TABLE p (a int, b text);
CREATE TABLE
rhaas=# CREATE TABLE p1 () INHERITS (p);
CREATE TABLE
rhaas=# CREATE TRIGGER x AFTER UPDATE ON p REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE t();
CREATE TRIGGER
rhaas=# INSERT INTO p VALUES (0, 'zero');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (1, 'one');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (2, 'two');
INSERT 0 1
rhaas=# UPDATE p SET b = 'whatever';
NOTICE:  table p
NOTICE:  table p got value 0
UPDATE 3

Only the rows in the parent show up in the transition table.  But now
look what happens if I add an unrelated trigger that also uses
transition tables to the children:

rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS
$$begin null; end$$;
CREATE FUNCTION
rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u();
CREATE TRIGGER
rhaas=# UPDATE p SET b = 'whatever';
NOTICE:  table p
NOTICE:  table p got value 0
NOTICE:  table p got value 1
NOTICE:  table p got value 2
UPDATE 3

It seems pretty clear to me that this is busted.  The existence of
trigger x1 on p1 shouldn't affect whether trigger x on p sees changes
to p1's rows in its transition tables.  Either all changes to any
descendants of p should be captured by the transition tables, or only
changes to the root table should be captured.  If we do the former,
the restriction against using transition tables in triggers on
partitioned tables should be removed, I would think.  If we do the
latter, then what we should document is not that partitioned tables
have a restriction that doesn't apply to inheritance but rather that
the restriction on the partitioned case flows from the fact that only
the parent's changes are captured, and the parent is always empty in
the partitioning case.  In deciding between these two cases, we should
consider the case where the inheritance children have extra columns
and/or different column orderings.

Adding this as an open item.  Kevin?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining



On 05/01/2017 10:17 AM, David Fetter wrote:
> On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
>>> So no more planner-affecting GUCs, please, particularly if we expect
>>> regular users to use them.
>> +1
>>
>> I still see users wanting to use the enable_foo settings in production.
>>
>> Having had years of telling users that CTEs are an optimization fence it
>> doesn't seem at all nice for us to turn around and change our mind about
>> that. I have relied on it in the past and I'm sure I'm very far from
>> alone in that.
> You are certainly not alone, but I believe that in this you're missing
> the vast majority (we hope) of PostgreSQL users.  These are the users
> who have yet to adopt PostgreSQL, and have the quite reasonable
> expectation that ordinary-looking grammar *isn't* an optimization
> fence.
>



I am not in favor of seriously inconveniencing a significant number of
our existing users for the sake of a bunch of people who don't now and
might never in the future use Postgres. I think the bar for silent
behaviour changes needs to be a bit higher than this one is.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PQhost may return socket dir for network connection

On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
 wrote:
> As the subject, PQhost() seems to be forgeting about the case
> where only hostaddr is specified in a connection string.

I suspect that may have been intentional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] .pgpass's behavior has changed

On Fri, Apr 28, 2017 at 3:54 AM, Kyotaro HORIGUCHI
 wrote:
> But the above also leaves a bug so I sent another patch to fix
> it. The attched patch restores the 9.6's beavior of looking up
> .pgpass file in the same manner to the aother patch.

Thanks for catching this.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cached plans and statement generalization

On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> Any comments and suggestions for future improvement of this patch are
> welcome.
>

+PG_TRY();
+{
+query = parse_analyze_varparams(parse_tree,
+query_string,
+¶m_types,
+&num_params);
+}
+PG_CATCH();
+{
+/*
+ * In case of analyze errors revert back to original query
processing
+ * and disable autoprepare for this query to avoid such
problems in future.
+ */
+FlushErrorState();
+if (snapshot_set) {
+PopActiveSnapshot();
+}
+entry->disable_autoprepare = true;
+undo_query_plan_changes(parse_tree, const_param_list);
+MemoryContextSwitchTo(old_context);
+return false;
+}
+PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+/* Convert literal value to parameter value */
+switch (const_param->literal->val.type)
+{
+  /*
+   * Convert from integer literal
+   */
+  case T_Integer:
+switch (ptype) {
+  case INT8OID:
+params->params[paramno].value =
Int64GetDatum((int64)const_param->literal->val.val.ival);
+break;
+  case INT4OID:
+params->params[paramno].value =
Int32GetDatum((int32)const_param->literal->val.val.ival);
+break;
+  case INT2OID:
+if (const_param->literal->val.val.ival < SHRT_MIN
+|| const_param->literal->val.val.ival > SHRT_MAX)
+{
+ereport(ERROR,
+
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+}
+params->params[paramno].value =
Int16GetDatum((int16)const_param->literal->val.val.ival);
+break;
+  case FLOAT4OID:
+params->params[paramno].value =
Float4GetDatum((float)const_param->literal->val.val.ival);
+break;
+  case FLOAT8OID:
+params->params[paramno].value =
Float8GetDatum((double)const_param->literal->val.val.ival);
+break;
+  case INT4RANGEOID:
+sprintf(buf, "[%ld,%ld]",
const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+break;
+  default:
+pg_lltoa(const_param->literal->val.val.ival, buf);
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+}
+break;
+  case T_Null:
+params->params[paramno].isnull = true;
+break;
+  default:
+/*
+ * Convert from string literal
+ */
+getTypeInputInfo(ptype, &typinput, &typioparam);
+params->params[paramno].value =
OidInputFunctionCall(typinput, const_param->literal->val.val.str,
typioparam, -1);
+}

I don't see something with a bunch of hard-coded rules for particular type
OIDs having any chance of being acceptable.

This patch seems to duplicate a large amount of existing code.  That would
be a good thing to avoid.

It could use a visit from the style police and a spell-checker, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] scram and \password

On Thu, Apr 27, 2017 at 3:22 AM, Heikki Linnakangas  wrote:
> You could argue, that since we need to document how to avoid the query and
> the blocking, we might as well always require the application to run the
> "show password_encryption" query before calling PQencryptPasswordConn(). But
> I'd like to offer the convenience for the majority of applications that
> don't mind blocking.

I still think that's borrowing trouble.  It just seems like too
critical of a thing to have a default -- if the convenience logic gets
it wrong and encrypts the password in a manner not intended by the
user, that could (a) amount to a security vulnerability or (b) lock
you out of your account.  If you ask your significant other "where do
you want to go to dinner?" and can't get a clear answer out of them
after some period of time, it's probably OK to assume they don't care
that much and you can just pick something.  If you ask the
commander-in-chief "which country should we fire the missiles at?" and
you don't get a clear and unambiguous answer, just picking something
is not a very good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PQhost may return socket dir for network connection

Robert Haas  writes:
> On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
>  wrote:
>> As the subject, PQhost() seems to be forgeting about the case
>> where only hostaddr is specified in a connection string.

> I suspect that may have been intentional.

See commits 11003eb55 and 40cb21f70 for recent history in this area.
Further back there's more history around host vs. hostaddr.  We've
gone back and forth on this in the past, including an ultimately
reverted attempt to add "PQhostaddr()", so it'd be a good idea to
study those past threads before proposing a change here.

Having said that, the behavior stated in $subject does sound wrong.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
> >> ... I was wondering about adding
> >> a loop that simply runs for like 30s and then quits or such, but who
> >> knows.
> 
> > If the probabilistic test catches the bug even 5% of the time in typical
> > configurations, the buildfarm will rapidly identify any regression.  I'd
> > choose a 7s test that detects the bug 5% of the time over a 30s test that
> > detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> > for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> > half the time.  In its case, only two buildfarm members were able to
> > demonstrate the original bug, so 5% detection would have been too low.)
> 
> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> with individual tests that run for ~ 1sec.

I was more thinking of pgench -T$XX, rather than constant number of
iterations.  I currently can reproduce the issues within like 3-4
minutes, so 5s is probably not quite sufficient to get decent coverage.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-01 Thread Kevin Grittner

On Mon, May 1, 2017 at 10:01 AM, Robert Haas  wrote:

> It seems pretty clear to me that this is busted.

I don't think you actually tested anything that is dependent on any
of my patches there.

> Adding this as an open item.  Kevin?

It will take some time to establish what legacy behavior is and how
the new transition tables are impacted.  My first reaction is that a
trigger on the parent should fire for any related action on a child
(unless maybe the trigger is defined with an ONLY keyword???) using
the TupleDesc of the parent.  Note that the SQL spec mandates that
even in a AFTER EACH ROW trigger the transition tables must
represent all rows affected by the STATEMENT.  I think that this
should be independent of triggers fired at the row level.  I think
the rules should be similar for updateable views.

This will take some time to investigate, discuss and produce a
patch.  I think best case is Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)

On 4/27/17 08:41, Michael Paquier wrote:
> +$node_slave->promote;
> +$node_slave->poll_query_until('postgres',
> +   "SELECT NOT pg_is_in_recovery()")
> +  or die "Timed out while waiting for promotion of standby";
> 
> This reminds me that we should really switch PostgresNode::promote to
> use the wait mode of pg_ctl promote, and remove all those polling
> queries...

I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.

However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish.  This is probably a mistake.  Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.

Possible patches for this attached.

Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-pg_ctl-Make-failure-to-complete-operation-a-nonzero-.patch
Description: invalid/octet-stream

0002-Remove-unnecessary-pg_is_in_recovery-calls-in-tests.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

Andres Freund  writes:
> On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
>> 30sec is kind of a big lump from a buildfarm standpoint, especially if
>> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
>> with individual tests that run for ~ 1sec.

> I was more thinking of pgench -T$XX, rather than constant number of
> iterations.  I currently can reproduce the issues within like 3-4
> minutes, so 5s is probably not quite sufficient to get decent coverage.

Adding a five-minute pgbench run to the buildfarm sequence is definitely
going to get you ridden out of town on a rail.  But quite aside from the
question of whether we can afford the cycles, it seems like the wrong
approach.  IMO the buildfarm is mainly for verifying portability, not for
trying to prove that race-like conditions don't exist.  In most situations
we're going out of our way to ensure reproduceability of tests we add to
the buildfarm sequence; but it seems like this is looking for
irreproducible results.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> >> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> >> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> >> with individual tests that run for ~ 1sec.
> 
> > I was more thinking of pgench -T$XX, rather than constant number of
> > iterations.  I currently can reproduce the issues within like 3-4
> > minutes, so 5s is probably not quite sufficient to get decent coverage.
> 
> Adding a five-minute pgbench run to the buildfarm sequence is definitely
> going to get you ridden out of town on a rail.

Right - that was referring to Noah's comment upthread:

On 2017-04-29 14:42:01 -0700, Noah Misch wrote:
> If the probabilistic test catches the bug even 5% of the time in typical
> configurations, the buildfarm will rapidly identify any regression.  I'd
> choose a 7s test that detects the bug 5% of the time over a 30s test that
> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> half the time.  In its case, only two buildfarm members were able to
> demonstrate the original bug, so 5% detection would have been too low.)

and I suspect that you'd not find these within 5s within sufficient
time, because the detection rate would be too low.



> But quite aside from the question of whether we can afford the cycles,
> it seems like the wrong approach.  IMO the buildfarm is mainly for
> verifying portability, not for trying to prove that race-like
> conditions don't exist.  In most situations we're going out of our way
> to ensure reproduceability of tests we add to the buildfarm sequence;
> but it seems like this is looking for irreproducible results.

Yea, I wondered about that upthread as well.  But the tests are quite
useful nonetheless.  Wonder about adding them simply as a separate
target.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner  wrote:
> On Mon, May 1, 2017 at 10:01 AM, Robert Haas  wrote:
>> It seems pretty clear to me that this is busted.
>
> I don't think you actually tested anything that is dependent on any
> of my patches there.

I was testing which rows show up in a transition table, so I assumed
that was related to the transition tables patch.  Note that this is
not about which triggers are fired, just about how inheritance
interacts with transition tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

Andres Freund  writes:
> On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
>> But quite aside from the question of whether we can afford the cycles,
>> it seems like the wrong approach.  IMO the buildfarm is mainly for
>> verifying portability, not for trying to prove that race-like
>> conditions don't exist.  In most situations we're going out of our way
>> to ensure reproduceability of tests we add to the buildfarm sequence;
>> but it seems like this is looking for irreproducible results.

> Yea, I wondered about that upthread as well.  But the tests are quite
> useful nonetheless.  Wonder about adding them simply as a separate
> target.

I have no objection to adding more tests as a non-default target.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PQhost may return socket dir for network connection

On Mon, May 1, 2017 at 12:06 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
>>  wrote:
>>> As the subject, PQhost() seems to be forgeting about the case
>>> where only hostaddr is specified in a connection string.
>
>> I suspect that may have been intentional.
>
> See commits 11003eb55 and 40cb21f70 for recent history in this area.
> Further back there's more history around host vs. hostaddr.  We've
> gone back and forth on this in the past, including an ultimately
> reverted attempt to add "PQhostaddr()", so it'd be a good idea to
> study those past threads before proposing a change here.
>
> Having said that, the behavior stated in $subject does sound wrong.

I'm not sure.  My understanding of the relationship between host and
hostaddr is that hostaddr overrides our notion of where to find host,
but not our notion of the host to which we're connecting.  Under that
definition, the current behavior as described by Kyotaro sounds
correct.  When you say host=X hostaddr=Y, we act as though we're
connecting to X but try to connect to IP Y.  When you just specify
hostaddr=Y, we act as if we're trying to connect to the default host
(which happens to be a socket address in that example) but actually
use the specified IP address.  That's consistent.

Now, against that,
https://www.postgresql.org/docs/9.6/static/libpq-connect.html says:

--
If hostaddr is specified without host, the value for hostaddr gives
the server network address. The connection attempt will fail if the
authentication method requires a host name.
--

And PQhost() asks for the host name, so you could argue that it too
should "fail" in this situation.  But it has no way to report failure,
so that's kind of problematic.
https://www.postgresql.org/docs/9.6/static/libpq-status.html says:

--
Without either a host name or host address, libpq will connect using a
local Unix-domain socket; or on machines without Unix-domain sockets,
it will attempt to connect to localhost.
--

But that's just about where to make the connection, not what we should
consider the hostname to be for purposes other than calling
connect(2).

Kyotaro Horiguchi argues that the current behavior is "not useful" and
that's probably true, but I don't think it's the job of an API to try
as hard as possible to do something useful in every case.  It's more
important that the behavior is predictable and understandable.  In
short, if we're going to change the behavior of PQhost() here, then
there should be a documentation change to go with it, because the
current documentation around the interaction between host and hostaddr
does not make it at all clear that the current behavior is wrong, at
least not as far as I can see.  To the contrary, it suggests that if
you use hostaddr without host, you may find the results surprising or
even unfortunate:

--
Using hostaddr instead of host allows the application to avoid a host
name look-up, which might be important in applications with time
constraints. However, a host name is required for GSSAPI or SSPI
authentication methods, as well as for verify-full SSL certificate
verification. ... Note that authentication is likely to fail if host
is not the name of the server at network address hostaddr.
--

The overall impression that the documentation leaves me with is that
you are expected to use only host unless you care about saving name
lookups; then, use both host and hostaddr; if you want to use just
hostaddr you can try it, but it'll fail to work properly if you then
try to do something that needs a host name.  Calling PQhost() is,
perhaps, one such thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-01 Thread Kevin Grittner

On Mon, May 1, 2017 at 11:53 AM, Robert Haas  wrote:
> On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner  wrote:
>> On Mon, May 1, 2017 at 10:01 AM, Robert Haas  wrote:
>>> It seems pretty clear to me that this is busted.
>>
>> I don't think you actually tested anything that is dependent on any
>> of my patches there.
>
> I was testing which rows show up in a transition table, so I assumed
> that was related to the transition tables patch.  Note that this is
> not about which triggers are fired, just about how inheritance
> interacts with transition tables.

Yeah, I got confused a bit there, comparing to the updateable views case.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-05-01 Thread Pierre Ducroquet


On Friday, April 7, 2017 3:12:58 AM CEST, Kyotaro HORIGUCHI wrote:

Hi, Pierre.

Maybe you're the winner:p

At Thu, 06 Apr 2017 12:34:09 +0200, Pierre Ducroquet 
 wrote in <1714428.BHRm6e8A2D@peanuts2>

On Thursday, April 6, 2017 2:00:55 PM CEST Kyotaro HORIGUCHI wrote: ...


That being said, it is a different matter if the behavior is
preferable. The discussion on the behavior is continued here.

https://www.postgresql.org/message-id/20170406.160844.120459562.horiguchi.kyot...@lab.ntt.co.jp

Right now, there is a conflict between pg_basebackup and the 
server since they 
do not allow the same behaviour. I can submit a patch either 
way, but I won't 
decide what is the right way to do it.
I know tricks will allow to work around that issue, I found 
them hopefully and 
I guess most people affected by this issue would be able to 
find and use them, 
but nevertheless being able to build a server that can no longer be base-

backuped does not seem right.


regards,


Hi

I didn't have much time to spend on that issue until today, and I found a 
way to solve it that seems acceptable to me.


The biggest drawback will be that if the backup is interrupted, previous 
tablespaces already copied will stay on disk, but since that issue could 
already happen, I don't think it's too much a drawback.
The patch simply delays the empty folder checking until we start reading 
the tablespace tarball. The first entry of the tarball being the 
PG_MAJOR_CATVER folder, that can not be found otherwise, there is no real 
alternative to that.


I will submit this patch in the current commit fest.


Regards

Pierre
From f3e6b078d4159c90237d966c56289cb59b54ede1 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Mon, 1 May 2017 11:38:52 +0200
Subject: [PATCH] Allow a pg_basebackup when a tablespace is shared between two
 versions

When a tablespace folder is shared between two PostgreSQL versions,
pg_basebackup fails because its tablespace folder checking is stricter
than what is done in the server.
That behaviour makes it possible to create clusters that will then be
complicated to replicate whithout playing with symlinks.

This patch fixes this by delaying the tablespace folder verification.
The folder name is using the PG catalog version, that can not be
obtained from the server. It is a compile-time constant that is not
exposed publicly.
The fix is thus to simply delay the checking of folders and use the
folder name from the tablespace tarball.
---
 src/bin/pg_basebackup/pg_basebackup.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e2a2ebb30f..2e687a2880 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1306,6 +1306,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 	pgoff_t		current_len_left = 0;
 	int			current_padding = 0;
 	bool		basetablespace;
+	bool		firstfile = 1;
 	char	   *copybuf = NULL;
 	FILE	   *file = NULL;
 
@@ -1399,7 +1400,15 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 	 * Directory
 	 */
 	filename[strlen(filename) - 1] = '\0';		/* Remove trailing slash */
-	if (mkdir(filename, S_IRWXU) != 0)
+	if (firstfile && !basetablespace)
+	{
+		/*
+		 * The first file in the tablespace is its main folder, whose name can not be guessed (PG_MAJORVER_CATVER)
+		 * So we must check here that this folder can be created or is empty.
+		 */
+		verify_dir_is_empty_or_create(filename, &made_tablespace_dirs, &found_tablespace_dirs);
+	}
+	else if (mkdir(filename, S_IRWXU) != 0)
 	{
 		/*
 		 * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
@@ -1530,6 +1539,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 continue;
 			}
 		}		/* continuing data in existing file */
+		firstfile = 0;			/* mark that we are done with the first file of the tarball */
 	}			/* loop over all data blocks */
 	progress_report(rownum, filename, true);
 
@@ -1844,18 +1854,6 @@ BaseBackup(void)
 	for (i = 0; i < PQntuples(res); i++)
 	{
 		totalsize += atol(PQgetvalue(res, i, 2));
-
-		/*
-		 * Verify tablespace directories are empty. Don't bother with the
-		 * first once since it can be relocated, and it will be checked before
-		 * we do anything anyway.
-		 */
-		if (format == 'p' && !PQgetisnull(res, i, 1))
-		{
-			char	   *path = (char *) get_tablespace_mapping(PQgetvalue(res, i, 1));
-
-			verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs);
-		}
 	}
 
 	/*
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PQhost may return socket dir for network connection

Robert Haas  writes:
> On Mon, May 1, 2017 at 12:06 PM, Tom Lane  wrote:
>> Having said that, the behavior stated in $subject does sound wrong.

> I'm not sure.  My understanding of the relationship between host and
> hostaddr is that hostaddr overrides our notion of where to find host,
> but not our notion of the host to which we're connecting.  Under that
> definition, the current behavior as described by Kyotaro sounds
> correct.

Perhaps.  But hostaddr also forces us to believe that we're making an
IP connection, so it still seems pretty dubious to return a socket
path.  The true situation is that we're connecting to an IP host that
we do not know the name of.

I notice that one of the recent changes was made to avoid situations where
PQhost() would return NULL and thereby provoke a crash if the application
wasn't expecting that (which is not unreasonable of it, since the PQhost()
documentation mentions no such hazard).  So I would not want to see us
return NULL in this case.

And I believe we already considered and rejected the idea of having it
return the hostaddr string, back in some of the older discussions.
(We could revisit that decision, no doubt, but let's go back and see
what the reasoning was first.)

But maybe returning an empty string ("") would be OK?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: logical replication and PANIC during shutdown checkpoint in publisher

On 4/30/17 20:52, Noah Misch wrote:
> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please thoroughly reacquaint yourself with the policy
> on open item ownership[1] and then reply immediately.  If I do not hear from
> you by 2017-05-02 01:00 UTC, I will transfer this item to release management
> team ownership without further notice.

I'm reviewing this now and will report tomorrow.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19


On 2017-04-19 15:59, Andrew Dunstan wrote:



I have released version 4.19 of the PostgreSQL Buildfarm client. It can
be downloaded from



I don't know if it's only me or if others have noticed this also but I 
have the buildfarm client set up like this:


/home/pgbf
/home/pgbf/build-farm-4
/home/pgbf/buildfarm@ -> build-farm-4.19/
/home/pgbf/buildroot

and then I have a cron job that does this:

x y * * * cd /home/pgbf/buildfarm && ./run_build.pl REL9_6_STABLE

and when it tries to run I get the following error:

cannot create 
/usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log: 
No such file or directory

gmake is not GNU Make - please fix config file at ./run_build.pl line 343.

if I manually create the directory loach.lastrun-logs the script then 
continues normally.


I am no perl guru so I can't figure out really what is going on here.  I 
think that the directory 
/usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs needs to be 
created before the PGBuild/Utils.pm sub run_log tries to write to it.


/Mikael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 05/01/2017 02:13 PM, Mikael Kjellström wrote:
> On 2017-04-19 15:59, Andrew Dunstan wrote:
>
>
>> I have released version 4.19 of the PostgreSQL Buildfarm client. It can
>> be downloaded from
>> 
>>
>
> I don't know if it's only me or if others have noticed this also but I
> have the buildfarm client set up like this:
>
> /home/pgbf
> /home/pgbf/build-farm-4
> /home/pgbf/buildfarm@ -> build-farm-4.19/
> /home/pgbf/buildroot
>
> and then I have a cron job that does this:
>
> x y * * * cd /home/pgbf/buildfarm && ./run_build.pl REL9_6_STABLE
>
> and when it tries to run I get the following error:
>
> cannot create
> /usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log:
> No such file or directory
> gmake is not GNU Make - please fix config file at ./run_build.pl line
> 343.
>
> if I manually create the directory loach.lastrun-logs the script then
> continues normally.
>
> I am no perl guru so I can't figure out really what is going on here. 
> I think that the directory
> /usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs needs to be
> created before the PGBuild/Utils.pm sub run_log tries to write to it.
>


I will check on it. Thanks for the report.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On Fri, Apr 28, 2017 at 9:28 AM, Robert Haas  wrote:
> On Thu, Apr 27, 2017 at 4:08 AM, Petr Jelinek
>  wrote:
>> Back when writing the original patch set, I was also playing with the
>> idea of having CREATE SUBSCRIPTION do multiple committed steps in
>> similar fashion to CREATE INDEX CONCURRENTLY but that leaves mess behind
>> on failure which also wasn't very popular outcome.

There is no inherent reason why the CREATE INDEX CONCURRENTLY style of
using multiple transactions makes it necessary to leave a mess behind
in the event of an error or hard crash. Is someone going to get around
to fixing the problem for CREATE INDEX CONCURRENTLY (e.g., having
extra steps to drop the useless index during recovery)? IIRC, this was
always the plan.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On 2017-05-01 11:22:47 -0700, Peter Geoghegan wrote:
> On Fri, Apr 28, 2017 at 9:28 AM, Robert Haas  wrote:
> > On Thu, Apr 27, 2017 at 4:08 AM, Petr Jelinek
> >  wrote:
> >> Back when writing the original patch set, I was also playing with the
> >> idea of having CREATE SUBSCRIPTION do multiple committed steps in
> >> similar fashion to CREATE INDEX CONCURRENTLY but that leaves mess behind
> >> on failure which also wasn't very popular outcome.
> 
> There is no inherent reason why the CREATE INDEX CONCURRENTLY style of
> using multiple transactions makes it necessary to leave a mess behind
> in the event of an error or hard crash. Is someone going to get around
> to fixing the problem for CREATE INDEX CONCURRENTLY (e.g., having
> extra steps to drop the useless index during recovery)? IIRC, this was
> always the plan.

Doing catalog changes in recovery is frought with problems. Essentially
requires starting one worker per database, before allowing access.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 05/01/2017 02:13 PM, Mikael Kjellström wrote:
> On 2017-04-19 15:59, Andrew Dunstan wrote:
>
>
>> I have released version 4.19 of the PostgreSQL Buildfarm client. It can
>> be downloaded from
>> 
>>
>
> I don't know if it's only me or if others have noticed this also but I
> have the buildfarm client set up like this:
>
> /home/pgbf
> /home/pgbf/build-farm-4
> /home/pgbf/buildfarm@ -> build-farm-4.19/
> /home/pgbf/buildroot
>
> and then I have a cron job that does this:
>
> x y * * * cd /home/pgbf/buildfarm && ./run_build.pl REL9_6_STABLE
>
> and when it tries to run I get the following error:
>
> cannot create
> /usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log:
> No such file or directory
> gmake is not GNU Make - please fix config file at ./run_build.pl line
> 343.
>
> if I manually create the directory loach.lastrun-logs the script then
> continues normally.
>
> I am no perl guru so I can't figure out really what is going on here. 
> I think that the directory
> /usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs needs to be
> created before the PGBuild/Utils.pm sub run_log tries to write to it.
>
>


OK, that's a bug. Mea culpa.

the quick fix is this patch:


diff --git a/run_build.pl b/run_build.pl
index aeb8966..822b4de 100755
--- a/run_build.pl
+++ b/run_build.pl
@@ -1008,7 +1008,8 @@ sub writelog
 
 sub check_make
 {
-my @out = run_log("$make -v");
+   # don't use run_log here - it's too early in the process
+my @out = `$make -v 2>&1`;
 return undef unless ($? == 0 && grep {/GNU Make/} @out);
 return 'OK';
 }



cheers

andrew




-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On Mon, May 1, 2017 at 11:24 AM, Andres Freund  wrote:
> Doing catalog changes in recovery is frought with problems. Essentially
> requires starting one worker per database, before allowing access.

Do you think it's worth just covering the case where you get an error,
such as a duplicate violation? I imagine that that's the much more
common case.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On 2017-05-01 11:31:53 -0700, Peter Geoghegan wrote:
> On Mon, May 1, 2017 at 11:24 AM, Andres Freund  wrote:
> > Doing catalog changes in recovery is frought with problems. Essentially
> > requires starting one worker per database, before allowing access.
> 
> Do you think it's worth just covering the case where you get an error,
> such as a duplicate violation? I imagine that that's the much more
> common case.

What exactly are you proposing to do?  You mean catching errors in the
creating backend, if it didn't crash?  That doesn't strike me as a good
idea, because it'll push down the likelihood of the issue below where
people will see it, but it'll still be likely enough for it to create
problems.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19




On 2017-05-01 20:25, Andrew Dunstan wrote:

OK, that's a bug. Mea culpa.

the quick fix is this patch:


diff --git a/run_build.pl b/run_build.pl
index aeb8966..822b4de 100755
--- a/run_build.pl
+++ b/run_build.pl
@@ -1008,7 +1008,8 @@ sub writelog

 sub check_make
 {
-my @out = run_log("$make -v");
+   # don't use run_log here - it's too early in the process
+my @out = `$make -v 2>&1`;
 return undef unless ($? == 0 && grep {/GNU Make/} @out);
 return 'OK';
 }


Nope, that didn't do it.

If I zap REL9_6_STABLE and then run:

./run_build.pl REL9_6_STABLE

I get the following output:

cannot create 
/usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log: 
No such file or directory
cannot create 
/usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log: 
No such file or directory
cannot create 
/usr/home/pgbf/buildroot/REL9_6_STABLE/loach.lastrun-logs/lastcomand.log: 
No such file or directory

Buildfarm member loach failed on REL9_6_STABLE stage pgsql-Git

/Mikael



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19




On 2017-05-01 20:44, Mikael Kjellström wrote:

Nope, that didn't do it.


Or well.  It fixed the check_make bug but not the other bug with that 
the loach.lastrun-logs-directory isn't created before trying to write to it.


/Mikael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 05/01/2017 02:46 PM, Mikael Kjellström wrote:
>
>
> On 2017-05-01 20:44, Mikael Kjellström wrote:
>> Nope, that didn't do it.
>
> Or well.  It fixed the check_make bug but not the other bug with that
> the loach.lastrun-logs-directory isn't created before trying to write
> to it.
>


OK, coming up with a more comprehensive fix.

The obvious workaround for now is to create the directory and dont zap
it or its parents. You should only have to do it once (per branch)

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 2017-05-01 20:56, Andrew Dunstan wrote:

OK, coming up with a more comprehensive fix.


Ok.


The obvious workaround for now is to create the directory and dont zap
it or its parents. You should only have to do it once (per branch)


Yes, I know.  That is what I have been doing so far.  But if you want to 
rerun a branch from scratch it's much easier to just do:


rm -rf buildroot/HEAD
or
rm -rf buildroot/REL9_6_STABLE

and then the buildfarm script should do the right thing and create all 
the directories that it needs.


/Mikael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On Mon, May 1, 2017 at 11:37 AM, Andres Freund  wrote:
> What exactly are you proposing to do?  You mean catching errors in the
> creating backend, if it didn't crash?

That is what I meant, though I'm not actually proposing to do anything.

> That doesn't strike me as a good
> idea, because it'll push down the likelihood of the issue below where
> people will see it, but it'll still be likely enough for it to create
> problems.

I was concerned about that too. I have a hard time defending changes
like this to myself, but it doesn't hurt to ask.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 05/01/2017 03:00 PM, Mikael Kjellström wrote:
>
> On 2017-05-01 20:56, Andrew Dunstan wrote:
>> OK, coming up with a more comprehensive fix.
>
> Ok.
>
>> The obvious workaround for now is to create the directory and dont zap
>> it or its parents. You should only have to do it once (per branch)
>
> Yes, I know.  That is what I have been doing so far.  But if you want
> to rerun a branch from scratch it's much easier to just do:
>
> rm -rf buildroot/HEAD
> or
> rm -rf buildroot/REL9_6_STABLE
>
> and then the buildfarm script should do the right thing and create all
> the directories that it needs.
>


Not sure I understand what "rerun a branch" from scratch means. If you
zap the branch directory you lose all its state. That's generally a bad
thing.

Anyway, this patch should fix it for all uses. It creates the directory
if it doesn't exist.

diff --git a/PGBuild/Utils.pm b/PGBuild/Utils.pm
index 91c1362..175eaa7 100644
--- a/PGBuild/Utils.pm
+++ b/PGBuild/Utils.pm
@@ -14,6 +14,8 @@ See accompanying License file for license details
 use strict;
 use warnings;
 
+use File::Path;
+
 use Exporter   ();
 our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);
 
@@ -30,8 +32,9 @@ use vars qw($VERSION); $VERSION = 'REL_4.19';
 sub run_log
 {
 my $command = shift;
-my $file=
- 
"$main::branch_root/$main::st_prefix$main::logdirname/lastcomand.log";
+   my $filedir =
"$main::branch_root/$main::st_prefix$main::logdirname";
+   mkpath($filedir);
+   my $file= "$filedir/lastcomand.log";
 unlink $file;
 system("$command > $file 2>&1");
 my @loglines;


cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 2017-05-01 11:09:44 +0200, Petr Jelinek wrote:
> On 01/05/17 10:03, Andres Freund wrote:
> > On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:

> >> But, I still think we need to restart the tracking after new
> >> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
> >> that we assigned to transactions at the end of SnapBuildCommitTxn might
> >> be corrupted otherwise as they were built before we knew one of the
> >> supposedly running txes was actually already committed and that
> >> transaction might have done catalog changes.
> > 
> > I'm afraid you're right.  But I think this is even more complicated: The
> > argument in your version that this can only happen once, seems to also
> > be holey: Just imagine a pg_usleep(3000 * 100) right before
> > ProcArrayEndTransaction() and enjoy the picture.

> Well yes, transaction can in theory have written commit/abort xlog
> record and stayed in proc for more than single xl_running_xacts write.
> But then the condition which we test that the new xl_running_xacts has
> bigger xmin than the previously tracked one's xmax would not be
> satisfied and we would not enter the relevant code path yet. So I think
> we should not be able to get any xids we didn't see. But we have to
> restart tracking from beginning (after first checking if we didn't
> already see anything that the xl_running_xacts considers as running),
> that's what my code did.

But to get that correct, we'd have to not only track ->committed, but
also somehow maintain ->aborted, and not just for the transactions in
the original set of running transactions.  That'd be fairly complicated
and large.  The reason I was trying - and it's definitely not correct as
I had proposed - to use the original running_xacts record is that that
only required tracking as many transaction statuses as in the first
xl_running_xacts.  Am I missing something?



The probabilistic tests catch the issues here fairly quickly, btw, if
you run with synchronous_commit=on, while pgbench is running, because
the WAL flushes make this more likely.  Runs this query:

SELECT account_count, teller_count, account_sum - teller_sum s
FROM
(
SELECT count(*) account_count, SUM(abalance) account_sum
FROM pgbench_accounts
) a,
(
SELECT count(*) teller_count, SUM(tbalance) teller_sum
FROM pgbench_tellers
) t

which, for my scale, should always return:
┌─┬─┬───┐
│a│  t  │ s │
├─┼─┼───┤
│ 200 │ 200 │ 0 │
└─┴─┴───┘
but with my POC patch occasionally returns things like:
┌─┬─┬───┐
│a│  t  │   s   │
├─┼─┼───┤
│ 200 │ 212 │ 37358 │
└─┴─┴───┘

which obviously shouldn't be the case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19



On 2017-05-01 21:10, Andrew Dunstan wrote:

Not sure I understand what "rerun a branch" from scratch means. If you
zap the branch directory you lose all its state. That's generally a bad
thing.


I mean like the first time when you set up the buildfarm client / 
branch.  And it's not something I recomend doing all the time but some 
times when the buildfarm client doesn't clean up after it self and there 
are a lot of crap left it's "easier" to just zap everything and start 
over to get it running again.




Anyway, this patch should fix it for all uses. It creates the directory
if it doesn't exist.

diff --git a/PGBuild/Utils.pm b/PGBuild/Utils.pm
index 91c1362..175eaa7 100644
--- a/PGBuild/Utils.pm
+++ b/PGBuild/Utils.pm
@@ -14,6 +14,8 @@ See accompanying License file for license details
 use strict;
 use warnings;

+use File::Path;
+
 use Exporter   ();
 our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);

@@ -30,8 +32,9 @@ use vars qw($VERSION); $VERSION = 'REL_4.19';
 sub run_log
 {
 my $command = shift;
-my $file=
-
"$main::branch_root/$main::st_prefix$main::logdirname/lastcomand.log";
+   my $filedir =
"$main::branch_root/$main::st_prefix$main::logdirname";
+   mkpath($filedir);
+   my $file= "$filedir/lastcomand.log";
 unlink $file;
 system("$command > $file 2>&1");
 my @loglines;


Ok.

Thanks.  Will try it out.

/Mikael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19




On 2017-05-01 21:19, Mikael Kjellström wrote:


Thanks.  Will try it out.


Just wanted to report that I've tried it and it works as expected.

Thanks for the really fast fixes.

/Mikael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

On Fri, Apr 28, 2017 at 9:02 PM, Peter Geoghegan  wrote:
> I'd like to hear feedback on the general idea, and what the
> user-visible interface ought to look like. The non-deterministic false
> negatives may need to be considered by the user visible interface,
> which is the main reason I mention it.

Bloom filters are one of those things that come up on this mailing
list incredibly frequently but rarely get used in committed code; thus
far, contrib/bloom is the only example we've got, and not for lack of
other proposals.  One problem is that Bloom filters assume you can get
n independent hash functions for a given value, which we have not got.
That problem would need to be solved somehow.  If you only have one
hash function, the size of the required bloom filter probably gets
very large.

When hashing index and heap tuples, do you propose to include the heap
TID in the data getting hashed?  I think that would be a good idea,
because otherwise you're only verifying that every heap tuple has an
index pointer pointing at something, not that every heap tuple has an
index tuple pointing at the right thing.

I wonder if it's also worth having a zero-error mode, even if it runs
for a long time.  Scan the heap, and probe the index for the value
computed from each heap tuple.  Maybe that's so awful that nobody
would ever use it, but I'm not sure.  It might actually be simpler to
implement than what you have in mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Description of create_singleton_array()

Neha Khatri  writes:
> Is it intentional to have the existing $SUBJECT.
> The commit 33f43725
> 
> updated
> the function text_to_array() such that it does not directly invoke
> create_singleton_array(). But $SUBJECT was not updated.

Yeah, that was pretty sloppy.

> If it is not intentional then is it fine to update the description like
> attached.

Well, now that we've been burnt once by the specific call site moving,
I think we should learn from experience and not have this say where
it's called from.  That's a lousy substitute for defining the API
expectations explicitly, anyway.

Your proposed patch tries to improve that, but the result isn't
necessarily a "1-D array" --- it's a one-element array, with possibly
a higher number of dimensions than 1.  (Not really sure why we thought
flexibility in the number of dimensions was useful, but there it is.)

Actually, the thing that's more important to specify is that the function
insists on using the caller's fcinfo->flinfo->fn_extra.  The usage in
text_to_array[_internal] is on the hairy edge of being broken: if that
function were using fn_extra for some other purpose in other code paths,
you could get a core dump or worse from the conflict, because it's
possible for fldsep to vary from empty to non-empty within a single
sequence of calls.  That's especially nasty because that would be far from
a mainstream usage, so such a bug could go undetected for a long time.

I wonder if we wouldn't be better off to get rid of this function entirely.
It seems like it's not providing any real increment of simplicity over a
direct call to construct_md_array, since text_to_array could perfectly
well hard-wire the array element storage properties, as we do in very many
other places.  And it's a bug waiting to happen, looks like.

I pushed an update to the header comment, but now I'm thinking maybe we
should just get rid of it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PQhost may return socket dir for network connection

On Mon, May 1, 2017 at 1:36 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, May 1, 2017 at 12:06 PM, Tom Lane  wrote:
>>> Having said that, the behavior stated in $subject does sound wrong.
>
>> I'm not sure.  My understanding of the relationship between host and
>> hostaddr is that hostaddr overrides our notion of where to find host,
>> but not our notion of the host to which we're connecting.  Under that
>> definition, the current behavior as described by Kyotaro sounds
>> correct.
>
> Perhaps.  But hostaddr also forces us to believe that we're making an
> IP connection, so it still seems pretty dubious to return a socket
> path.  The true situation is that we're connecting to an IP host that
> we do not know the name of.

Yes, I think that's a reasonable interpretation.

> I notice that one of the recent changes was made to avoid situations where
> PQhost() would return NULL and thereby provoke a crash if the application
> wasn't expecting that (which is not unreasonable of it, since the PQhost()
> documentation mentions no such hazard).  So I would not want to see us
> return NULL in this case.
>
> And I believe we already considered and rejected the idea of having it
> return the hostaddr string, back in some of the older discussions.
> (We could revisit that decision, no doubt, but let's go back and see
> what the reasoning was first.)
>
> But maybe returning an empty string ("") would be OK?

Yeah, that might be OK.  But I'd be inclined not to back-patch any
behavior changes we make in this area unless it's clear that 9.6
regressed relative to previous releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

Hi,

The thread below
http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
describes an issue in logical decoding that arises because
xl_running_xacts' contents aren't necessarily coherent with the contents
of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
don't have any interlock against the procarray. That means
xl_running_xacts can contain transactions assumed to be running, that
already have their commit/abort records WAL logged.

I think that's not just problematic in logical decoding, but also
Hot-Standby. Consider the following:

ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY. In that
case it'll populate the KnownAssignedXids machinery using
KnownAssignedXidsAdd().

Once STANDBY_SNAPSHOT_READY, CheckRecoveryConsistency() will signal
postmaster to allow connections.

For HS, a snapshot will be built by GetSnapshotData() using
KnownAssignedXidsGetAndSetXmin(). That in turn uses the transactions
currently known to be running, to populate the snapshot.

Now, if transactions have committed before (in the "earlier LSN" sense)
the xl_running_xacts record, ExpireTreeKnownAssignedTransactionIds() in
xact_redo_commit() will already have run. Which means we'll assume
already committed transactions are still running. In other words, the
snapshot is corrupted.

Luckily this'll self-correct over time, fixed by
ExpireOldKnownAssignedTransactionIds().

Am I missing something that protects against the above scenario?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

On Mon, May 1, 2017 at 12:46 PM, Robert Haas  wrote:
> Bloom filters are one of those things that come up on this mailing
> list incredibly frequently but rarely get used in committed code; thus
> far, contrib/bloom is the only example we've got, and not for lack of
> other proposals.

They certainly are a fashionable data structure, but it's not as if
they're a new idea. The math behind them is very well understood. They
solve one narrow class of problem very well.

> One problem is that Bloom filters assume you can get
> n independent hash functions for a given value, which we have not got.
> That problem would need to be solved somehow.  If you only have one
> hash function, the size of the required bloom filter probably gets
> very large.

I don't think that that's a problem, because you really only need 2
hash functions [1], which we have already (recall that Andres added
Murmur hash to Postgres 10). It's an area that I'd certainly need to
do more research on if I'm to go forward with bloom filters, but I'm
pretty confident that there is a robust solution to the practical
problem of not having arbitrary many hash functions at hand. (I think
that you rarely need all that many independent hash functions, in any
case.)

It isn't that hard to evaluate whether or not an implementation has
things right, at least for a variety of typical cases. We know how to
objectively evaluate a hash function while making only some pretty
mild assumptions.

> When hashing index and heap tuples, do you propose to include the heap
> TID in the data getting hashed?  I think that would be a good idea,
> because otherwise you're only verifying that every heap tuple has an
> index pointer pointing at something, not that every heap tuple has an
> index tuple pointing at the right thing.

Yes -- I definitely want to hash the heap TID from each IndexTuple.

> I wonder if it's also worth having a zero-error mode, even if it runs
> for a long time.  Scan the heap, and probe the index for the value
> computed from each heap tuple.  Maybe that's so awful that nobody
> would ever use it, but I'm not sure.  It might actually be simpler to
> implement than what you have in mind.

It's easy if you don't mind that the implementation will be an ad-hoc
nested loop join. I guess I could do that too, if only because it
won't be that hard, and that's really what you want when you know you
have corruption. Performance will probably be prohibitively poor when
verification needs to be run in any kind of routine way, which is a
problem if that's the only way it can work. My sense is that
verification needs to be reasonably low overhead, and it needs to
perform pretty consistently, even if you only use it for
stress-testing new features.

To reiterate, another thing that makes a bloom filter attractive is
how it simplifies resource management relative to an approach
involving sorting or a hash table. There are a bunch of edge cases
that I don't have to worry about around resource management (e.g., a
subset of very wide outlier IndexTuples, or two indexes that are of
very different sizes associated with the same table that need to
receive an even share of memory).

As I said, even if I was totally willing to duplicate the effort that
went into respecting work_mem as a budget within places like
tuplesort.c, having as little infrastructure code as possible is a
specific goal for amcheck.

[1] https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Faster pg_timezone_names view

I've been casting about for low-hanging fruit for making the regression
tests run faster.  One thing I noticed is that this query in sysviews.sql
is one of the slowest queries in the whole suite:

select count(distinct utc_offset) >= 24 as ok from pg_timezone_names;

Reading pg_timezone_names takes upwards of 300 msec on my workstation,
and north of 3 seconds on some of my slower buildfarm critters.  I'd
always figured that, since it's reading the whole of the timezone data
directory tree, it's just naturally got to be slow.  However, I chanced
to try strace'ing a scan of pg_timezone_names, and what I saw was just
horrid.  We do something like this for *each* timezone data file:

stat("/usr/share/zoneinfo/posix/America/Iqaluit", {st_mode=S_IFREG|0644, 
st_size=2000, ...}) = 0
open("/usr/share/zoneinfo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 20
getdents(20, /* 68 entries */, 32768)   = 1968
close(20)   = 0
open("/usr/share/zoneinfo/posix", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 
20
getdents(20, /* 62 entries */, 32768)   = 1776
close(20)   = 0
open("/usr/share/zoneinfo/posix/America", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 20
getdents(20, /* 146 entries */, 32768)  = 4696
close(20)   = 0
open("/usr/share/zoneinfo/posix/America/Iqaluit", O_RDONLY) = 20
read(20, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 
54968) = 2000
close(20)   = 0
open("/usr/share/zoneinfo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 20
getdents(20, /* 68 entries */, 32768)   = 1968
close(20)   = 0
open("/usr/share/zoneinfo/posixrules", O_RDONLY) = 20
read(20, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 
54968) = 3519
close(20)   = 0

That is, having found a data file by dint of searching the directory tree,
we *repeat the search* from the root of the timezone tree.  And then we
read the posixrules file, in addition to the target zoneinfo file itself.
And just to add insult to injury, we search the directory path down to
posixrules, though fortunately that's just one level down.

There are a couple of things going on here.  One is that pg_open_tzfile()
performs the repeat directory searches we're seeing above, because it is
tasked with accepting a case-insensitive spelling of a timezone name and
returning the correctly-cased zone name as seen in the file system.
That's necessary, and not too expensive, when performing something like
a "SET timezone" command, because we don't insist that the user spell
the zone name canonically in SET.  But it's pretty silly in the context
of a directory scan, because we already know the canonical spelling of
the file path: we just read it from the filesystem.

The second problem is that we don't cache the result of reading
posixrules, even though we need it for practically every zone load.
This seems like a performance bug in the upstream IANA library,
though I'm not sure that they care about use-cases like this one.
(They might also object that they don't want to cache data that
could conceivably change on-disk; but we already cache timezone
data at other levels, so there's no reason not to do it here.)

The attached patch adjusts both of these things.  On my workstation
it reduces the runtime of the pg_timezone_names view by better than
a factor of 3.

If I were to commit this, I'd want to back-patch, because experience
has shown that we don't want any cross-branch variation in the timezone
code files; absorbing upstream fixes is painful enough without that.
But it seems like a pretty safe and helpful thing to back-patch.

Thoughts, objections?

regards, tom lane

diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 154124e..a2faa82 100644
*** a/src/timezone/localtime.c
--- b/src/timezone/localtime.c
*** static const pg_time_t time_t_min = MINV
*** 55,60 
--- 55,67 
  static const pg_time_t time_t_max = MAXVAL(pg_time_t, TYPE_BIT(pg_time_t));
  
  /*
+  * We cache the result of trying to load the TZDEFRULES zone here.
+  * tzdefrules_loaded is 0 if not tried yet, +1 if good, -1 if failed.
+  */
+ static struct state tzdefrules_s;
+ static int	tzdefrules_loaded = 0;
+ 
+ /*
   * The DST rules to use if TZ has no rules and we can't load TZDEFRULES.
   * We default to US rules as of 1999-08-17.
   * POSIX 1003.1 section 8.1.1 says that the default DST rules are
*** tzparse(const char *name, struct state *
*** 942,948 
  		charcnt = stdlen + 1;
  		if (sizeof sp->chars < charcnt)
  			return false;
! 		load_ok = tzload(TZDEFRULES, NULL, sp, false) == 0;
  	}
  	if (!load_ok)
  		sp->leapcnt = 0;		/* so, we're off a little */
--- 949,969 
  		charcnt = stdlen + 1;
  		if (sizeof sp->chars < charcnt)
  			return false;
! 
! 		/*
! 		 * This bit also differs from the IANA code, which doesn't make

Re: [HACKERS] A design for amcheck heapam verification

On Fri, Apr 28, 2017 at 6:02 PM, Peter Geoghegan  wrote:
>  - Is committed, and committed before RecentGlobalXmin.

Actually, I guess amcheck would need to use its own scan's snapshot
xmin instead. This is true because it cares about visibility in a way
that's "backwards" relative to existing code that tests something
against RecentGlobalXmin. Is there any existing thing that works that
way?

If it's not clear what I mean: existing code that cares about
RecentGlobalXmin is using it as a *conservative* point before which
every snapshot sees every transaction as committed/aborted (and
therefore nobody can care if that other backend hot prunes dead tuples
from before then, or whatever it is). Whereas, amcheck needs to care
about the possibility that *anyone else* decided that pruning or
whatever is okay, based on generic criteria, and not what amcheck
happened to see as RecentGlobalXmin during snapshot acquisition.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

2017-05-01 Thread Greg Stark

On 1 May 2017 at 19:24, Andres Freund  wrote:
>> There is no inherent reason why the CREATE INDEX CONCURRENTLY style of
>> using multiple transactions makes it necessary to leave a mess behind
>> in the event of an error or hard crash. Is someone going to get around
>> to fixing the problem for CREATE INDEX CONCURRENTLY (e.g., having
>> extra steps to drop the useless index during recovery)? IIRC, this was
>> always the plan.
>
> Doing catalog changes in recovery is frought with problems. Essentially
> requires starting one worker per database, before allowing access.

The "plan" was to add more layers PG_TRY and transactions so that if
there was an error during building the index all the remnants of the
failed index build got cleaned up. But when I went tried to actually
do it the problem seemed to metastatize and it was going to require
two or three layers of messy nested PG_TRY and extra transactions.
Perhaps there's a cleaner way to structure it and I should look again.

I don't recall ever having a plan to do anything in recovery. I think
we did talk about why it was hard to mark hash indexes invalid during
recovery which was probably the same problem.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

2017-05-01 Thread Greg Stark

On 1 May 2017 at 20:46, Robert Haas  wrote:
>  One problem is that Bloom filters assume you can get
> n independent hash functions for a given value, which we have not got.
> That problem would need to be solved somehow.  If you only have one
> hash function, the size of the required bloom filter probably gets
> very large.

There's a simple formula to calculate the optimal number of hash
functions and size of the filter given a target false positive rate.

But I don't think this is as big of a problem as you imagine.

a) we don't really only have one hash function, we have a 32-bit hash
function and we could expand that to a larger bit size if we wanted.
Bloom filters are never 2^32 size bit arrays for obvious reasons. If
you have a 1kbit sized bloom filter that only needs 10 bits per index
so you have three fully independent hash functions available already.
If we changed to a 64-bit or 128-bit hash function then you could have
enough bits available to have a larger set of hash functions and a
larger array.

b) you can get a poor man's universal hash out of hash_any or hash_int
by just tweaking the input value in a way that doesn't interact in a
simple way with the hash function. Even something as simple has xoring
it with a random number (i.e. a vector of random numbers that identify
your randomly chosen distinct "hash functions") seems to work fine.

However for future-proofing security hardening I think Postgres should
really implement a real mathematically rigorous Universal Hashing
scheme which provides a family of hash functions from which to pick
randomly. This prevents users from being able to generate data that
would intentionally perform poorly in hash data structures for
example. But it also means you have a whole family of hash functions
to pick for bloom filters.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

On 4/25/17 21:47, Michael Paquier wrote:
> Attached is an updated patch to reflect that.

I edited this a bit, here is a new version.

A variant approach would be to prohibit *all* new commands after
entering the "stopping" state, just let running commands run.  That way
we don't have to pick which individual commands are at risk.  I'm not
sure that we have covered everything here.

More reviews please.  Also, this is a possible backpatching candidate.

Also, if someone has a test script for reproducing the original issue,
please share it, or run it against this patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v3-0001-Prevent-panic-during-shutdown-checkpoint.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

On 5/1/17 13:02, Tom Lane wrote:
> Andres Freund  writes:
>> On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
>>> But quite aside from the question of whether we can afford the cycles,
>>> it seems like the wrong approach.  IMO the buildfarm is mainly for
>>> verifying portability, not for trying to prove that race-like
>>> conditions don't exist.  In most situations we're going out of our way
>>> to ensure reproduceability of tests we add to the buildfarm sequence;
>>> but it seems like this is looking for irreproducible results.
> 
>> Yea, I wondered about that upthread as well.  But the tests are quite
>> useful nonetheless.  Wonder about adding them simply as a separate
>> target.
> 
> I have no objection to adding more tests as a non-default target.

Well, the problem with nondefault targets is that they are hard to find
if you don't know them, and then they will rot.

Sure, we need a way to distinguish different classes of tests, but lets
think about the bigger scheme, too.  Ideas welcome.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-05-01 Thread Stephen Frost

Amit,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> Sorry about the delay.

No worries, I'm just back from being in NY and will take a look at this
tomorrow (wrt the open item, I'll provide a status tomorrow).

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Description of create_singleton_array()

2017-05-01 Thread Neha Khatri

On Tue, May 2, 2017 at 5:47 AM, Tom Lane  wrote:

>
>
> Well, now that we've been burnt once by the specific call site moving,
> I think we should learn from experience and not have this say where
> it's called from.  That's a lousy substitute for defining the API
> expectations explicitly, anyway.
>

Agreed.

>
> Your proposed patch tries to improve that, but the result isn't
> necessarily a "1-D array" --- it's a one-element array, with possibly
> a higher number of dimensions than 1.  (Not really sure why we thought
> flexibility in the number of dimensions was useful, but there it is.)
>

Ah, yes. The function in itself has the capability to allow m-D array where
'm' could be more than 1. I kind of missed this fact because the only
caller of the function, attempts to create a 1-D array.

>
> I wonder if we wouldn't be better off to get rid of this function entirely.
> It seems like it's not providing any real increment of simplicity over a
> direct call to construct_md_array, since text_to_array could perfectly
> well hard-wire the array element storage properties, as we do in very many
> other places.  And it's a bug waiting to happen, looks like.

Yesterday, while prying into the definition of this function it did occur
to me that whether there is an additional benefit of this function over
construct_md_array. Yes, it looked like that construct_md_array could be
used in lieu of create_singleton_array.

Regards,
Neha

Re: [HACKERS] A design for amcheck heapam verification

On Mon, May 1, 2017 at 2:10 PM, Peter Geoghegan  wrote:
> Actually, I guess amcheck would need to use its own scan's snapshot
> xmin instead. This is true because it cares about visibility in a way
> that's "backwards" relative to existing code that tests something
> against RecentGlobalXmin. Is there any existing thing that works that
> way?

Looks like pg_visibility has a similar set of concerns, and so
sometimes calls GetOldestXmin() to "recompute" what it calls
OldestXmin (which I gather is like RecentGlobalXmin, but comes from
calling GetOldestXmin() at least once). This happens within
pg_visibility's collect_corrupt_items(). So, I could either follow
that approach, or, more conservatively, call GetOldestXmin()
immediately after each "amcheck whole index scan" finishes, for use
later on, when we go to the heap. Within the heap, we expect that any
committed tuple whose xmin precedes FooIndex.OldestXmin should be
present in that index's bloom filter. Of course, when there are
multiple indexes, we might only arrive at the heap much later. (I
guess we'd also want to check if the MVCC Snapshot's xmin preceded
FooIndex.OldestXmin, and set that as FooIndex.OldestXmin when that
happened to be the case.)

Anyone have an opinion on any of this? Offhand, I think that calling
GetOldestXmin() once per index when its "amcheck whole index scan"
finishes would be safe, and yet provide appreciably better test
coverage than only expecting things visible to our original MVCC
snapshot to be present in the index. I don't see a great reason to be
more aggressive and call GetOldestXmin() more often than once per
whole index scan, though.

--
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

2017-05-01 Thread Ilya Shkuratov

30.04.2017, 08:58, "Craig Ringer" : On 30 Apr. 2017 13:28, "Andres Freund" wrote:On 2017-04-30 00:28:46 -0400, Tom Lane wrote:> There's already a pretty large hill to climb here in the way of> breaking peoples' expectations about CTEs being optimization> fences. Breaking the documented semantics about CTEs being> single-evaluation seems to me to be an absolute non-starter. If all referenced functions are non-volatile, I don't quite see theproblem? Personally I believe we'll have to offer a properanti-inlining workaround anyway, and in that case there's really nothingthat should stop us from inlining CTE without volatile functions twice? Exactly. The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever? I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right? We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent. Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls. Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead. Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done. Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.So we don't break peoples' expectations, nor documented semantics.

Re: [HACKERS] A design for amcheck heapam verification

On Mon, May 1, 2017 at 4:28 PM, Peter Geoghegan  wrote:
> Anyone have an opinion on any of this? Offhand, I think that calling
> GetOldestXmin() once per index when its "amcheck whole index scan"
> finishes would be safe, and yet provide appreciably better test
> coverage than only expecting things visible to our original MVCC
> snapshot to be present in the index. I don't see a great reason to be
> more aggressive and call GetOldestXmin() more often than once per
> whole index scan, though.

Wait, that's wrong, because in general RecentGlobalXmin may advance at
any time as new snapshots are acquired by other backends. The only
thing that we know for sure is that our MVCC snapshot is an interlock
against things being recycled that the snapshot needs to see
(according to MVCC semantics). And, we don't just have heap pruning to
worry about -- we also have nbtree's LP_DEAD based recycling to worry
about, before and during the amcheck full index scan (actually, this
is probably the main source of problematic recycling for our
verification protocol).

So, I think that we could call GetOldestXmin() once, provided we were
willing to recheck in the style of pg_visibility if and when there was
an apparent violation that might be explained as caused by concurrent
LP_DEAD recycling within nbtree. That seems complicated enough that
I'll never be able to convince myself that it's worthwhile before
actually trying to write the code.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

Peter Geoghegan  writes:
> If it's not clear what I mean: existing code that cares about
> RecentGlobalXmin is using it as a *conservative* point before which
> every snapshot sees every transaction as committed/aborted (and
> therefore nobody can care if that other backend hot prunes dead tuples
> from before then, or whatever it is). Whereas, amcheck needs to care
> about the possibility that *anyone else* decided that pruning or
> whatever is okay, based on generic criteria, and not what amcheck
> happened to see as RecentGlobalXmin during snapshot acquisition.

ISTM if you want to do that you have an inherent race condition.
That is, no matter what you do, the moment after you look the currently
oldest open transaction could commit, allowing some other session's
view of RecentGlobalXmin to move past what you think it is, so that
that session could start pruning stuff.

Maybe you can fix this by assuming that your own session's advertised xmin
is a safe upper bound on everybody else's RecentGlobalXmin.  But I'm not
sure if that rule does what you want.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

On Mon, May 1, 2017 at 6:20 PM, Tom Lane  wrote:
> Maybe you can fix this by assuming that your own session's advertised xmin
> is a safe upper bound on everybody else's RecentGlobalXmin.  But I'm not
> sure if that rule does what you want.

That's what you might ultimately need to fall back on (that, or
perhaps repeated calls to GetOldestXmin() to recheck, in the style of
pg_visibility). It's useful to do rechecking, rather than just
starting with the MVCC snapshot's xmin because you might be able to
determine that the absence of some index tuple in the index (by which
I mean its bloom filter) *still* cannot be explained by concurrent
recycling. The conclusion that there is a real problem might never
have been reached without this extra complexity.

I'm not saying that it's worthwhile to add this complexity, rather
than just starting with the MVCC snapshot's xmin in the first place --
I really don't have an opinion either way just yet.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication in the same cluster

On 4/30/17 20:31, Andres Freund wrote:
> On 2017-04-26 23:41:51 +0200, Petr Jelinek wrote:
>> Yes that's result of how logical replication slots work, the transaction
>> that needs to finish is your transaction. It can be worked around by
>> creating the slot manually via the SQL interface for example and create
>> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
> 
> Is there any chance the creation of the slot could be moved ahead, to
> before an xid has been assigned?

The trend has rather been to do most of the stuff before creating the
slot, so that all the error checking is done.  See for example
dcb39c37c1d3b90115e1501af8efb7af59c341c3.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A design for amcheck heapam verification

On Mon, May 1, 2017 at 9:20 PM, Tom Lane  wrote:
> ISTM if you want to do that you have an inherent race condition.
> That is, no matter what you do, the moment after you look the currently
> oldest open transaction could commit, allowing some other session's
> view of RecentGlobalXmin to move past what you think it is, so that
> that session could start pruning stuff.

It can't prune the stuff we care about if we've got a shared content
lock on the target buffer.  That's the trick pg_visibility uses:

   /*
 * Time has passed since we computed
OldestXmin, so it's
 * possible that this tuple is
all-visible in reality even
 * though it doesn't appear so based on our
 * previously-computed value.  Let's
compute a new value so we
 * can be certain whether there is a problem.
 *
 * From a concurrency point of view,
it sort of sucks to
 * retake ProcArrayLock here while
we're holding the buffer
 * exclusively locked, but it should
be safe against
 * deadlocks, because surely
GetOldestXmin() should never take
 * a buffer lock. And this shouldn't
happen often, so it's
 * worth being careful so as to avoid
false positives.
 */

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

On 1 May 2017 at 22:26, Andreas Karlsson  wrote:

> I am not sure I like decorators since this means adding an ad hoc query hint
> directly into the SQL syntax which is something which I requires serious
> consideration.

And mangling the semantics of existing syntax doesn't?

That's what we do right now so we can pretend we don't have query
hints while still having query hints.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

On Sun, Apr 30, 2017 at 5:54 PM, Tomas Vondra
 wrote:
> But I keep running into people who face serious performance issues exactly
> because not realizing this, and using CTEs as named subqueries. And when I
> tell them "optimization fence" they react "Whaaat?"
>
> If I had to make up some numbers, I'd say the "What?" group is about 10x
> the group of people who intentionally rely on CTEs being optimization
> fences.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

On 1 May 2017 at 21:05, Tomas Vondra  wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund > >:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away from
>> GUCs
>> that influence behaviour after a few bad experiences.  What if you
>> want
>> one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we should
> not use GUCs for tweaking planner behavior, except perhaps for the purpose
> of investigation. It's an extremely blunt tool.

Definitely agree. You can't use them to affect only a portion of a
query. In fact there's no way to automatically scope them to one query
at all. They're also very big hammers, and as we introduce new types
of plan nodes they won't offer comprehensive control without being
amended. They're a tool of last resort for dealing with problems.

> Exactly the same issues would affect this new GUC. It would be impossible to
> use multiple CTEs in the query with different fencing behavior, and it would
> be just as difficult to investigate.

Yeah, I think a GUC is a non-starter.

If we want fence behaviour, we should require people to declare their
desire for fence behaviour, rather than treating it as a sort of
hint-as-a-bug that we grandfather in because we're so desperate not to
admit we have hints.

Before forming a strong view on this, I strongly everyone stick your
head outside the postgresql.org lists for a while. In my experience
even regular postgres users I see pop up on places like Stack Overflow
tend to react to this behaviour with "WTF?!" and wonder why we haven't
fixed this limitation yet, viewing it as a bug not a feature.

The same logic being applied here should've prevented us from ever introducing:

* inlining of SQL functions
* inlining of views
* inlining of subqueries

... but somehow, this one is different.

We're not even removing functionality. You can still use the OFFSET 0
hack. If you need a nonzero offset that's fine, because we don't
inline over OFFSET anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Small patch for pg_basebackup argument parsing

On Sat, Apr 22, 2017 at 12:46 PM, Pierre Ducroquet  wrote:
>> Here are the general guidelines about patch submission:
>> https://wiki.postgresql.org/wiki/Submitting_a_Patch
>> And the best thing would be to register it to the next commit fest so
>> as it does not get lost:
>> https://commitfest.postgresql.org/
>
> Thank you, I added an entry in the next commit fest.

Isn't (strtol_endptr != optarg + strlen(optarg))) just a really
inefficient way of writing (strtol_endptr != '\0')?

I would be inclined to write tests like if (standby_message_timeout <
0 || strtol_endptr != optarg + strlen(optarg)) in the other order;
that is, test strtol_endptr first, and only test the other conditions
if that test passes.

I would probably also just use endptr rather than strtol_endptr, for brevity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] subscription worker doesn't start immediately on eabled

On 4/27/17 21:36, Masahiko Sawada wrote:
> That makes sense to me. Since NOCONNECT option changes some default
> values including ENABLED to false I think we should apply it also when
> NOCONNECT is specified?

That's not necessary, because if NOCONNECT is specified, then "enabled"
will be set accordingly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] subscription worker doesn't start immediately on eabled

On 4/26/17 11:51, Fujii Masao wrote:
> On Wed, Apr 26, 2017 at 4:03 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 25 Apr 2017 14:45:03 -0400, Peter Eisentraut 
>>  wrote in 
>> <3d6a1bd0-08ce-301d-3336-ec9f623a3...@2ndquadrant.com>
>>> On 4/6/17 08:24, Kyotaro HORIGUCHI wrote:
 The attached patch wakes up launcher when a subscription is
 enabled. This fails when a subscription is enabled immedaitely
 after disabling but it won't be a matter.
>>>
>>> committed, thanks
>>
>> Thanks!
> 
> This patch makes me think that CREATE SUBSCRIPTION should also wake up
> the launcher only when ENABLE is specified. Patch attached. Thought?

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-01 Thread Michael Paquier

On Tue, May 2, 2017 at 7:07 AM, Peter Eisentraut
 wrote:
> On 4/25/17 21:47, Michael Paquier wrote:
>> Attached is an updated patch to reflect that.
>
> I edited this a bit, here is a new version.

Thanks, looks fine for me.

> A variant approach would be to prohibit *all* new commands after
> entering the "stopping" state, just let running commands run.  That way
> we don't have to pick which individual commands are at risk.  I'm not
> sure that we have covered everything here.

It seems to me that everything is covered. We are taking about
creation and dropping of slots here, where standby snapshots can be
created and SQL queries can be run when doing a tablesync meaning that
FPWs could be taken in the context of the WAL sender. Blocking all
commands would be surely safer I agree, but I see no reason to block
things more than necessary.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapbuild woes

2017-05-01 Thread Noah Misch

On Mon, May 01, 2017 at 12:32:07PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> >> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> >> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> >> with individual tests that run for ~ 1sec.
> 
> > I was more thinking of pgench -T$XX, rather than constant number of
> > iterations.  I currently can reproduce the issues within like 3-4
> > minutes, so 5s is probably not quite sufficient to get decent coverage.

You might hit the race faster by adding a dedicated stress test function to
regress.c.

> IMO the buildfarm is mainly for verifying portability, not for
> trying to prove that race-like conditions don't exist.

Perhaps so, but it has excelled at both tasks.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] vcregress support for single TAP tests

2017-05-01 Thread Vaishnavi Prabakaran

On Mon, May 1, 2017 at 11:01 PM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> In the absence of further comments I'm going to apply this and
> back-patch it so we can get a significant improvement in how the
> buildfarm reports results from TAP tests, as well as increased coverage,
> on Windows/MSVC machines.
>
>
I see vcregress is not processing the second argument passed to it. For
example, "vcregress.bat check serial" runs the regress test in parallel
mode.
Though this issue is present even before this patch is applied, am writing
here because this patch touches the argument passing to sub-routine logic.
Attached is the one fix I find it working for me.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Argument-passing-to-sub-routine-corrected-in-vcregre.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-05-01 Thread Noah Misch

On Fri, Apr 28, 2017 at 05:07:31AM +, Noah Misch wrote:
> On Wed, Apr 26, 2017 at 11:17:05AM +1200, Thomas Munro wrote:
> > My colleague Prabhat Sahu reported off list that transition tables
> > don't work for views.  I probably should have thought about that when
> > I fixed something similar for partitioned tables, and after some
> > experimentation I see that this is also broken for foreign tables.
> > 
> > For foreign tables using postgres_fdw, I see that transition tables
> > capture new rows for INSERT but capture nothing for DELETE and UPDATE.
> > 
> > For views, aside from the question of transition tables, I noticed
> > that statement triggers don't seem to fire at all with updatable
> > views.  Surely they should -- isn't that a separate bug?
> > 
> > Example:
> > 
> >   create table my_table (i int);
> >   create view my_view as select * from my_table;
> >   create function my_trigger_function() returns trigger language plpgsql as
> > $$ begin raise warning 'hello world'; return null; end; $$;
> >   create trigger my_trigger after insert or update or delete on my_view
> > for each statement execute procedure my_trigger_function();
> >   insert into my_view values (42);
> > 
> > ... and the world remains ungreeted.
> > 
> > As for transition tables, there are probably meaningful ways to
> > support those for both views and foreign tables at least in certain
> > cases, as future feature enhancements.  For now, do you agree that we
> > should reject such triggers as unsupported?  See attached.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts

2017-05-01 Thread Simon Riggs

On 1 May 2017 at 22:38, Andres Freund  wrote:

> The thread below 
> http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
> describes an issue in logical decoding that arises because
> xl_running_xacts' contents aren't necessarily coherent with the contents
> of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
> don't have any interlock against the procarray.  That means
> xl_running_xacts can contain transactions assumed to be running, that
> already have their commit/abort records WAL logged.

That is known/by design.

> I think that's not just problematic in logical decoding, but also
> Hot-Standby.  Consider the following:
>
> ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
> suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY.

Yes

> In that
> case it'll populate the KnownAssignedXids machinery using
> KnownAssignedXidsAdd().

No, because of this section of code in ProcArrayApplyRecoveryInfo()

/*
 * The running-xacts snapshot can contain xids that were still visible
 * in the procarray when the snapshot was taken, but were already
 * WAL-logged as completed. They're not running anymore, so ignore
 * them.
 */
 if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
 continue;

> Am I missing something that protects against the above scenario?

It does seem so, yes. The locking issues in Hot Standby are complex,
but they do seem to be correct, thanks for reviewing them.

This is documented in multiple locations, including what I thought was
your own comment in LogStandbySnapshot().

What I suggest is that with logical decoding in mind we do this
1. Inject a new record XLOG_SNAPSHOT_START at the start of
LogStandbySnapshot(). We start logical decoding from there.
2. Record any transactions that end
3. Now the full XLOG_RUNNING_XACTS record arrives. We apply all xacts
that are seen as running, minus any ended between 1 and 3

This avoids the problems for the race but without holding locks while
we log XLOG_RUNNING_XACTS, something that was considered painful for
Hot Standby.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

On 2 May 2017 at 10:45, Craig Ringer  wrote:

> If we want fence behaviour, we should require people to declare their
> desire for fence behaviour, rather than treating it as a sort of
> hint-as-a-bug that we grandfather in because we're so desperate not to
> admit we have hints.

Answers to my question here https://dba.stackexchange.com/q/27425/7788
suggest that there's no justification in the standard for treating it
as a mandatory fence.

Here https://www.postgresql.org/message-id/29918.1320244...@sss.pgh.pa.us
Tom says

"CTEs are also treated as optimization fences; this is not so much an
optimizer limitation as to keep the semantics sane when the CTE contains
a writable query."

In my view, if we address that by walking the query tree for volatile
functions, we should be perfectly fine to inline single-reference
CTEs. Is that too simplistic?

Is the sticking point simply this docs text:

"A useful property of WITH queries is that they are evaluated only
once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH queries. Thus,
expensive calculations that are needed in multiple places can be
placed within a WITH query to avoid redundant work. Another possible
application is to prevent unwanted multiple evaluations of functions
with side-effects. However, the other side of this coin is that the
optimizer is less able to push restrictions from the parent query down
into a WITH query than an ordinary subquery. The WITH query will
generally be evaluated as written, without suppression of rows that
the parent query might discard afterwards. (But, as mentioned above,
evaluation might stop early if the reference(s) to the query demand
only a limited number of rows.)"

?

... because to me, that seems to offer a great deal of wiggle room in
"less able", "generally", etc, even if we treated the docs as a hard
specification of future behaviour. (Which we don't, since we change
things and document the changes).

We may also stop evaluation early, so there's already no guarantee the
full query is run.

Is there any *technical* rather than policy reason we could not safely
inline a CTE term referenced by only one query, where the CTE term is
a SELECT, contains no volatile function calls, is not RECURSIVE, and
does not its self reference another non-inlineable CTE term?

Josh Kupershmidt, in a comment on my blog post on this topic some time
ago (https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/),
pointed out that

"One notable exception to the “(non-writeable) CTEs are always
materialized” rule is that if a non-writeable CTE node is not
referenced anywhere, it won’t actually be evaluated. For example, this
query returns 1 instead of bombing out:

WITH not_executed AS (SELECT 1/0),

 executed AS (SELECT 1)

SELECT * FROM executed;"

Prior discussions:

* https://www.postgresql.org/message-id/201209191305.44674...@kavod.com
* http://archives.postgresql.org/pgsql-performance/2011-10/msg00208.php
* https://www.postgresql.org/message-id/29918.1320244...@sss.pgh.pa.us

Relevant posts where users get confused by our behaviour:

* https://dba.stackexchange.com/q/127828/7788
* https://news.ycombinator.com/item?id=7023907
* https://dba.stackexchange.com/q/84760/7788
* https://dba.stackexchange.com/q/97393/7788
* http://stackoverflow.com/q/20403792/398670
* http://stackoverflow.com/q/33731068/398670
* 

Also, comments on my post:

* https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential hot-standby bug around xacts committed but in xl_running_xacts