Re: [HACKERS] Restricting maximum keep segments by repslots

2017-11-05 Thread Craig Ringer
On 31 October 2017 at 17:43, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hello, this is a rebased version.
>
> Along with the rebase, the meaning of the monitoring values has
> changed.
>
> In the previous version, the "live" column mysteriously predicted
> whether the necessary segments would be kept or lost by the next
> checkpoint, and the "distance" column offered a still more mysterious
> value.

Would it make sense to teach xlogreader how to fetch from WAL archive,
too? That way if there's an archive, slots could continue to be used
even after we purge from local pg_xlog, albeit at a performance cost.

I'm thinking of this mainly for logical slots.
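Roughly the shape I have in mind, as an illustrative sketch only: it
wraps the existing local page reader, and restore_segment_from_archive()
is an invented placeholder for RestoreArchivedFile()-style machinery. A
real patch would also have to cope with the local reader ERRORing on a
missing segment rather than returning -1.

#include "postgres.h"
#include "access/xlogreader.h"
#include "access/xlogutils.h"

/* Hypothetical helper: pull a segment back from the WAL archive. */
extern bool restore_segment_from_archive(XLogRecPtr recptr);

static int
archive_aware_read_page(XLogReaderState *state,
                        XLogRecPtr targetPagePtr, int reqLen,
                        XLogRecPtr targetRecPtr, char *readBuf,
                        TimeLineID *pageTLI)
{
    int         nread = read_local_xlog_page(state, targetPagePtr, reqLen,
                                             targetRecPtr, readBuf, pageTLI);

    if (nread < 0 && restore_segment_from_archive(targetPagePtr))
    {
        /* Segment restored from the archive; retry the local read. */
        nread = read_local_xlog_page(state, targetPagePtr, reqLen,
                                     targetRecPtr, readBuf, pageTLI);
    }

    return nread;
}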

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-03 Thread Craig Ringer
On 3 November 2017 at 12:41, David Rowley <david.row...@2ndquadrant.com> wrote:
> On 3 November 2017 at 03:26, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> On 2 November 2017 at 22:22, David Rowley <david.row...@2ndquadrant.com> 
>> wrote:
>>> Maybe, but the new implementation is not going to do well with places
>>> where we perform lcons(). Probably many of those places could be
>>> changed to lappend(), but I bet there's plenty that need prepend.
>>
>> Yeah, and it's IMO significant that pretty much every nontrivial
>> system with ADTs offers multiple implementations of list data
>> structures, often wrapped with a common API.
>>
>> Java's Collections, the STL, you name it.
>
> I've never really looked at much Java, but I've done quite a bit of
> dotnet stuff in my time, and it has a common interface, ICollection,
> implemented by a bunch of data-structure classes, something like
>
> ConcurrentDictionary (hash table)
> Dictionary (hash table)
> HashSet (hash table)
> LinkedList (some kinda linked list)
> List (Arrays, probably)
> SortedDictionary (bst, I think)
> SortedList (no idea, perhaps a btree)
> SortedSet
> Bag (no idea, but does not care about any order)
>
> Probably there are more, but the point is that they obviously don't
> believe there's a one-size-fits-all type. I don't think we should do
> that either. However, I've proved nothing on the performance front
> yet, so that should probably be my next step.

Right. If you've done C# you've done cleaned-up, modernized Java.

C# and Java both have an advantage we don't, namely JIT compilation,
so they can afford to do more with common interfaces and then do runtime
specialization. As a pure C project we don't really have that option,
so it's unlikely to be efficient for us to have a fully common
interface and dynamic dispatch. Even C++ only gets away with its
common interfaces in the STL because it does compile-time
specialization via templates, non-virtual methods, etc. There's only
so much you can perpetrate with macros and rely on the compiler to
clean up before it just gets awful.

So I imagine if we do land up with a couple of collections we'd
probably have an AList, alcons, alappend, etc. Bit of a pain, but
hardly the end of the world.
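To make that concrete, here's a rough sketch of the shape I mean. None
of these names exist today, and a real version would need the usual
Node integration, iteration macros, and so on:

#include "postgres.h"

/* Hypothetical array-backed counterpart to List. */
typedef struct AList
{
    int         length;         /* elements currently in use */
    int         capacity;       /* allocated slots */
    void      **elements;       /* palloc'd array, grown by doubling */
} AList;

static AList *
alnew(void)
{
    AList      *list = palloc(sizeof(AList));

    list->length = 0;
    list->capacity = 8;
    list->elements = palloc(list->capacity * sizeof(void *));
    return list;
}

static AList *
alappend(AList *list, void *datum)
{
    if (list->length == list->capacity)
    {
        list->capacity *= 2;
        list->elements = repalloc(list->elements,
                                  list->capacity * sizeof(void *));
    }
    list->elements[list->length++] = datum;
    return list;
}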

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] SSL and Encryption

2017-11-02 Thread Craig Ringer
On 3 November 2017 at 11:16, chiru r <chir...@gmail.com> wrote:
> Hi ,
>
> Please suggest the best cipher suite to configure OpenSSL for the
> PostgreSQL server and client.
>
> How can we use an encryption algorithm other than MD5 to encrypt
> passwords in PostgreSQL?

This is probably off topic for pgsql-hackers.

For password crypto please go read the SCRAM thread and the PostgreSQL
10 release notes.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Client Connection redirection support for PostgreSQL

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 14:02, Satyanarayana Narlapuram
<satyanarayana.narlapu...@microsoft.com> wrote:
> Proposal:
>
> Add the ability to the PostgreSQL server instance to route the traffic to a
> different server instance based on the rules defined in the server’s pg_hba.conf
> configuration file. At a high level this enables offloading the user
> requests to a different server instance based on the rules defined in the
> pg_hba.conf configuration file.

pg_hba.conf is "host-based access [control]". I'm not sure it's really
the right place.

> Some of the interesting scenarios this enables include, but are not
> limited to: rerouting traffic based on the client hosts, users,
> databases, etc. specified; redirecting read-only query traffic to the
> hot standby replicas; and multi-master scenarios.

When this has come up before, one of the issues has been determining
what exactly should constitute "read only" vs "read write" for the
purposes of redirecting work.

There are a bunch of issues there. If you're doing "read only" txns
and then do something "read write" and get redirected, the destination
doesn't have your prepared statements, any WITH HOLD cursors, temp
tables, etc you were working with. Strangeness ensues.

But we do now have session-intent support, so we could possibly do it
at the session level.

Backends used just for a redirect would be pretty expensive though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] proposal: schema variables

2017-11-02 Thread Craig Ringer
On 26 October 2017 at 15:21, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Hi,
>
> I propose a new database object - a variable.

Didn't we have a pretty long discussion about this already? Yeah:

https://www.postgresql.org/message-id/flat/CAMsr%2BYF0G8_FehQyFS8gSfnEer9OPsMOvpfniDJOVGQzJzHzsw%40mail.gmail.com#camsr+yf0g8_fehqyfs8gsfneer9opsmovpfnidjovgqzjzh...@mail.gmail.com

It'd be nice if you summarised any outcomes from that discussion and
addressed them, rather than treating this as a new topic.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Custom compression methods

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 17:41, Ildus Kurbangaliev
<i.kurbangal...@postgrespro.ru> wrote:

> In this patch, compression methods are usable for MAIN and EXTENDED
> storage, as in the current implementation in Postgres. Just, instead
> of only LZ4, you can specify any other compression method.

We've had this discussion before.

Please read the "pluggable compression support" thread. See you in a
few days ;) sorry, it's kinda long.

https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de

IIRC there were some concerns about what happened with pg_upgrade,
with consuming precious toast bits, and a few other things.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 22:22, David Rowley <david.row...@2ndquadrant.com> wrote:
> On 3 November 2017 at 03:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> We've jacked up the List API and driven a new implementation underneath
>> once before.  Maybe it's time to do that again.
>
> Maybe, but the new implementation is not going to do well with places
> where we perform lcons(). Probably many of those places could be
> changed to lappend(), but I bet there's plenty that need prepend.

Yeah, and it's IMO significant that pretty much every nontrivial
system with ADTs offers multiple implementations of list data
structures, often wrapped with a common API.

Java's Collections, the STL, you name it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 22:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
> David Rowley <david.row...@2ndquadrant.com> writes:
>> Comments on the design are welcome, but I was too late to the
>> commitfest, so there are other priorities. However, if you have a
>> strong opinion, feel free to voice it.
>
> I do not like replacing Lists piecemeal; that's a recipe for ongoing
> API breakage and back-patching pain.  Plus we'll then have *four*
> different linked-list implementations in the backend, which sure
> seems like too many.
>
> We've jacked up the List API and driven a new implementation underneath
> once before.  Maybe it's time to do that again.

I know some systems use hybrid linked array-lists, where linked list
cells are multi-element.

https://en.wikipedia.org/wiki/Unrolled_linked_list

I don't have much experience with them myself.
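For reference, the basic shape is just a linked list whose cells carry
small inline arrays (illustrative only, not a concrete proposal):

#define UNROLLED_CHUNK 8

typedef struct UnrolledCell
{
    int         nused;                      /* slots used in this cell */
    void       *items[UNROLLED_CHUNK];      /* inline element storage */
    struct UnrolledCell *next;              /* following cell, or NULL */
} UnrolledCell;

/* Pointer-chasing and per-cell allocation overhead are cut by roughly a
 * factor of UNROLLED_CHUNK; the cost is fiddlier insert/delete logic. */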

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Mapping MERGE onto CTEs (Re: MERGE SQL Statement for PG11)

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 01:14, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> Nico Williams wrote:
>
>> As an aside, I'd like to be able to control which CTEs are view-like and
>> which are table-like.  In SQLite3, for example, they are all view-like,
>> and the optimizer will act accordingly, whereas in PG they are all
>> table-like, and thus optimizer barriers.
>
> There was a short and easy to grasp (OK, maybe not) discussion on the
> topic of CTEs acting differently.  I think the consensus is that for
> CTEs that are read-only and do not use functions that aren't immutable,
> they may be considered for inlining.
> https://www.postgresql.org/message-id/5351711493487...@web53g.yandex.ru

Yep. All theoretical though; I don't think anyone (myself included)
has stumped up a patch.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Statement-level rollback

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 13:59, Vladimir Sitnikov
<sitnikov.vladi...@gmail.com> wrote:

> The performance overhead for a "SELECT" statement (no columns, just
> select) over localhost is 36±4 us vs 38±3 us (the savepoint is
> pipelined along with the user-provided query). That is, the network
> overhead is close to negligible.

Yep. Not for psqlODBC or other libpq-based drivers that can't pipeline
queries though.

> In other words, "savepoint; insert;savepoint; insert;savepoint;
> insert;savepoint; insert;savepoint; insert;" would allocate xids and might
> blow up backend's memory.

That's what RELEASE SAVEPOINT is for; psqlODBC does exactly that.

> Adding protocol messages would blow up pgbouncer and similar things, so it
> makes sense to refrain from new messages unless absolutely required.

Yeah, it'd affect proxies, true. But it'd let us get rid of a lot of
very ugly log spam too. And unlike some of the prior protocol tweaks
I've been interested in, it'd be client-initiated so it should be
pretty safe.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-01 Thread Craig Ringer
On 1 November 2017 at 01:55, Peter Geoghegan <p...@bowt.ie> wrote:

> The problem here is: Iff the first statement uses ON CONFLICT
> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
> different semantics for the remaining updates and deletes in the
> second version of the query? You've removed what seems like a neat
> adjunct to the MERGE, but it actually changes everything else too when
> using READ COMMITTED.

Would these concerns be alleviated by adding some kind of Pg-specific
decoration that constrained concurrency-safe MERGEs?

So your first statement would be

 MERGE CONCURRENTLY ...

and when you removed the WHEN NOT MATCHED clause it'd ERROR because
that's no longer able to be done with the same concurrency-safe
semantics?

I don't know if this would be helpful TBH, or if it would negate
Simon's compatibility goals. Just another idea.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Statement-level rollback

2017-11-01 Thread Craig Ringer
On 2 November 2017 at 09:33, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> If you turned the autocommit setting off, then this code would
> effectively silently do nothing, and that is obviously quite bad.

Right.

The example often cited is some variant of (note the deliberate typo):

BEGIN;
CREATTE TABLE t2 AS SELECT * FROM t1;
DROP TABLE t1;
ALTER TABLE t2 RENAME TO t1;
COMMIT;

Right now, we do the right thing here: the typo'd CREATTE fails and
aborts the whole transaction. With statement-level rollback as the
default, the failed statement would be silently skipped, DROP TABLE
would still run, and you'd have just dropped t1 and all your data. Oops.


On a related note, psql's -v ON_ERROR_STOP=1 is horrible,
hard-to-discover UI, and one of the top FAQs on Stack Overflow is some
variant of "I'm getting random and incomprehensible errors restoring a
dump, wtf?". So I'd really love to make it the default, but we'd face
similar issues where a SQL script that's currently correct instead
produces dangerously wrong results with ON_ERROR_STOP=1.

> In principle, a backend-based solution that drivers just have to opt
> into would save a lot of duplication.  But the drivers that care or
> require it according to their standards presumably already implement
> this behavior in some other way, so it comes back to whether there is a
> performance or other efficiency gain here.

There definitely would be over SQL-level savepoints. They're horrible
for performance, especially since libpq can't yet pipeline work so you
need three round-trips for each successful statement: SAVEPOINT,
statement, RELEASE SAVEPOINT. It produces massive log spam too.

What about if we add protocol-level savepoint support? Two new messages:

BeginUnnamedSavepoint

and

EndUnnamedSavepoint

where the latter does a rollback-to-last-unnamed-savepoint if the txn
state is bad, or a release-last-unnamed-savepoint if the txn state is
ok. That means the driver doesn't have to wait for the result of the
statement. It knows the conn state and query outcome from our prior
messages, and knows that as a result of this message any failed state
has been rolled back.

This would, with appropriate libpq support, give people who want
statement level error handling pretty much what they want. And we
could expose it in psql too. No GUCs needed, no fun surprises for
apps. psqlODBC could adopt it to replace its current slow and
super-log-spammy statement rollback model.

Because we'd know it was a special savepoint used for statement level
rollback we might still have some optimisation opportunities.

Downside is that it needs support in each client driver.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-11-01 Thread Craig Ringer
On 2 November 2017 at 10:01, Robert Haas <robertmh...@gmail.com> wrote:

> I think that still leaves a fair number of scenarios to consider, and
> the error handling by itself seems pretty thorny.  Plus it's kind of a
> weird mode and, like Craig, I'm not really sure what it gets you.
> Maybe if somebody has the use case where this would help, they should
> just do:
>
> CREATE TEMP TABLE x AS SELECT * FROM t2 WHERE ...;
> DECLARE x CURSOR FOR SELECT * FROM x;

That forces materialization, and I'm guessing part of Tomas's goal
here is to prevent the need to materialize into a temp table /
tuplestore / etc.

It's not clear to me why an unbounded portal fetch, using the TCP
socket windows and buffers for flow control, isn't sufficient.

Tomas, can you explain the use case a bit more?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] proposal: extend shm_mq to support more use cases

2017-11-01 Thread Craig Ringer
On 1 November 2017 at 21:24, Ildus Kurbangaliev
<i.kurbangal...@postgrespro.ru> wrote:
> Hello! Apparently the current version of shm_mq supports only one sender
> and one receiver. I think it could be very useful to add the ability to
> change senders and receivers. That could be achieved by adding methods
> that remove the sender or receiver from an mq.

That'd put the complexity penalty on existing shm_mq users. You'd have
to find a way to make it work cheaply for existing 1:1 lightweight
uses.

I'm using shm_mq pretty extensively now, and mostly in one-to-many
two-way patterns between a persistent endpoint and a pool of transient
peers. It's not simple to get right; there are a lot of traps
introduced by the queue's design for one-shot contexts where
everything is created together and destroyed together.

I think it'd make a lot of sense to make this easier, but I'd be
inclined to do it by layering on top of shm_mq rather than extending
it.

One thing that would make such patterns much easier would be a way to
safely reset a queue for re-use in a race-free way that didn't need
the use of any external synchronisation primitives. So once the
initial queues are set up, a client can attach, exchange messages on a
queue pair, and detach. And the persistent endpoint process can reset
the queues for re-use. Attempting to attach to a busy queue, or one
recently detached from, should fail gracefully.

Right now it's pretty much necessary to have a per-queue spinlock or
similar that you take before you overwrite a queue and re-create it,
plus in-use flags for both the persistent-side and transient-side
peers. The transient-side flag is how the persistent side knows for
sure there's no transient peer still attached to the queue when it
overwrites it, and how the transient side knows not to attempt to
attach to a queue that's already been used and detached by someone
else, because that Assert()s. The persistent-side flag is how the
transient side knows the queue exists and is ready to be attached to,
so it doesn't try to attach to a queue that's in the process of being
overwritten.
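A sketch of the sort of bookkeeping I mean; MqSlot and its fields are
invented names for illustration, not an existing API:

#include "postgres.h"
#include "storage/shm_mq.h"
#include "storage/spin.h"

/* Invented wrapper: what each caller must build around shm_mq today. */
typedef struct MqSlot
{
    slock_t     mutex;              /* protects the flags below */
    bool        persistent_in_use;  /* queue created, safe to attach to */
    bool        transient_in_use;   /* a transient peer may be attached */
    char        mq_space[1024];     /* memory the shm_mq lives in */
} MqSlot;

/*
 * Persistent side: only overwrite the queue once no transient peer can
 * possibly still be attached.
 */
static bool
mqslot_try_reset(MqSlot *slot)
{
    bool        ok;

    SpinLockAcquire(&slot->mutex);
    ok = !slot->transient_in_use;
    if (ok)
        slot->persistent_in_use = false;    /* claim the slot for reinit */
    SpinLockRelease(&slot->mutex);

    if (ok)
    {
        /* The returned shm_mq pointer would be handed to
         * shm_mq_set_receiver()/shm_mq_attach() by the peers. */
        shm_mq_create(slot->mq_space, sizeof(slot->mq_space));

        SpinLockAcquire(&slot->mutex);
        slot->persistent_in_use = true;     /* open for attachment again */
        SpinLockRelease(&slot->mutex);
    }

    return ok;
}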

The issues I've had are:

* When one side detaches it can't tell if the other side is still
attached, detached, or has never attached. So you need extra
bookkeeping to find out "once I've detached, is it possible the other
peer could've attached and not yet detached" and ensure that no peer
could be attached when you zero and overwrite a queue.

* Attempting to set the receive/send proc and attach to an
already-attached queue Assert()s; there's no "try and fail
gracefully". So you need extra bookkeeping to record "this queue slot
has been used up, detached from, and needs to be reset". You can't
just try a queue until you find a free one, or retry your queue slot
until it's reset, or whatever.

* Receive/send proc is tracked by PGPROC not pid, and those slots are
re-used much faster and more predictably. So failures where you
attempt to set yourself as sender/receiver for a queue that's already
had a sender/receiver set can produce confusing errors.


I'd like to:

* Zero the PGPROC* for the sender when it detaches, and the receiver
when it detaches

* When the queue is marked as peer-detached but the local PGPROC is
still attached, have a function that takes the queue spinlock and
resets the queue to not-detached state with the local proc still
attached, ready for re-use by attaching a new peer.

* (At least optionally) return failure code when attempting to set
sender or receiver on a detached queue, not assert. So you can wait
and try again later when the queue is reset and ready for re-use, or
scan to find another free queue slot to attach to.

If there's a need to keep the sender and receiver PGPROC after detach
we could instead make the detached bool into a flag of
RECEIVER_DETACHED|SENDER_DETACHED. However, this adds
read-modify-write cycles to what's otherwise a simple set, so the
spinlock must be taken on detach, or an atomic must be used. So I'd rather
just blindly clear the PGPROC pointer for whichever side is detaching.

These changes would make using shm_mq persistently MUCH easier,
without imposing significant cost on existing users. And it'd make it
way simpler to build a layer on top for a 1:m 2-way comms system like
Ildus is talking about.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Craig Ringer
On 1 November 2017 at 11:49, Andres Freund <and...@anarazel.de> wrote:

> Right. It'd probably be good to be a bit more adaptive here. But it's
> hard to do with posix_fadvise - we'd need an operation that actually
> notifies us of IO completion.  If we were using, say, asynchronous
> direct IO, we could initiate the request and regularly check how many
> blocks ahead of the current window are already completed and adjust the
> queue based on that, rather than just firing off fadvises and hoping for
> the best.

In case it's of interest, I did some looking into using Linux's AIO
support in Pg a while ago, when chasing some issues around fsync
retries and handling of I/O errors.

It was a pretty serious dead end; it was clear that fsync support in
AIO is not only incomplete but inconsistent across kernel versions,
let alone other platforms.

But I see your name in the relevant threads, so you know that. To save
others the time, see:

* https://lwn.net/Articles/724198/
* https://lwn.net/Articles/671649/

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Dynamic result sets from procedures

2017-10-31 Thread Craig Ringer
On 1 November 2017 at 05:08, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
> DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
> DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
> $$;
>
> CALL pdrstest1();

FWIW, this is similar to the model already used by PgJDBC to emulate
multiple result sets, though the current support in the driver is
rather crude. It detects a REFCURSOR in an output parameter / result
set and transparently FETCHes the result set, making it look to the
client app like it's a nested result set.

This shouldn't conflict with what you're doing because the driver does
not follow the JDBC standard behaviour of using
Statement.getMoreResults() and Statement.getResultSet() for multiple
result sets. That's currently only used by PgJDBC when fetching result
sets from batch query executions. Instead, the multiple result set
emulation requires the caller to 'getObject' the 'refcursor' field's
result-object, then cast it to ResultSet, and treat it as a new
(nested) result set.

True multiple result sets would be exposed in PgJDBC via getMoreResults().

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-10-31 Thread Craig Ringer
> Now, I agree this is somewhat more limited than I hoped for, but OTOH it
> still solves the issue I initially aimed for (processing large query
> results in efficient way).

I don't quite understand this part.

We already send results to the client in a stream unless it's
something we have to materialize, in which case a cursor won't help
anyway.

If the client wants to fetch in chunks it can use a portal and limited
size fetches. That shouldn't (?) be parallel-unsafe, since nothing
else can happen in the middle anyway.
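From libpq that looks something like this (a sketch only; big_table is
a placeholder and error handling is elided):

#include <libpq-fe.h>

/* Fetch a large result in chunks through an explicit portal rather
 * than materializing the whole thing client-side in one go. */
static void
fetch_in_chunks(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT * FROM big_table"));

    for (;;)
    {
        PGresult   *res = PQexec(conn, "FETCH 1000 FROM c");
        int         nrows = PQntuples(res);

        /* ... process nrows rows here ... */
        PQclear(res);
        if (nrows == 0)
            break;
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
}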

But in most cases the client can just fetch all, and let the socket
buffering take care of things, reading results only when it wants
them, and letting the server block when the windows are full.

That's not to say that SQL-level cursor support wouldn't be nice. I'm
just trying to better understand what it's solving.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Reading timeline from pg_control on replication slave

2017-10-29 Thread Craig Ringer
On 28 October 2017 at 06:09, Michael Paquier  wrote:
> On Fri, Oct 27, 2017 at 1:04 AM, Andrey Borodin  wrote:
>> I'm working on backups from a replication slave in WAL-G [0].
>> Backups used to use the result of pg_walfile_name(pg_start_backup(...)). The
>> call to pg_start_backup() works nicely, but "pg_walfile_name() cannot be
>> executed during recovery."
>> This function takes an LSN as argument and reads the TimeLineID from global state.
>> So I made a function[1] that, if on a replica, reads the timeline from the
>> pg_control file and formats the WAL file name as if it was produced by
>> pg_walfile_name(lsn).
>
> ThisTimeLineID is not something you can rely on for standby backends
> as it is not set during recovery.

That's not much of a concern really; you just have to ensure you call
GetXLogReplayRecPtr and set ThisTimeLineID.

(I'd quite like ThisTimeLineID to go away as a global. It's messy and
confusing, and I'd much rather it be fetched when needed).

However, that doesn't negate the rest of the issues you raised.



Re: [HACKERS] logical decoding of two-phase transactions

2017-10-29 Thread Craig Ringer
On 28 October 2017 at 03:53, Sokolov Yura <y.soko...@postgrespro.ru> wrote:
> On 2017-10-26 22:01, Sokolov Yura wrote:

> Small improvement compared to v7:
> - twophase_gid is written with alignment padding in the XactLogCommitRecord
> and XactLogAbortRecord.

I think Nikhils has done some significant work on this patch.
Hopefully he'll be able to share it.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Timeline ID in backup_label file

2017-10-25 Thread Craig Ringer
On 26 October 2017 at 02:50, Michael Paquier <michael.paqu...@gmail.com> wrote:
> Hi all,
>
> Lately, in order to extract some information from a backup_label file
> in python I have found myself doing something like the following to
> get a timeline number (feel free to reuse that code, especially the
> regex pattern):
> pattern = re.compile(r'^START WAL LOCATION: '
>                      r'([0-9A-F]+/[0-9A-F]+).*\(file ([0-9A-F]+)\)')
> m = pattern.match(backup_label_line)
> if m:
>     current_lsn = m.group(1)
>     current_segment = m.group(2)
>     current_tli = str(int(current_segment[:8], 16))
>
> That's more or less similar to what read_backup_label()@xlog.c does
> using sscanf(); still, I keep wondering why we need to go through this
> much complication to get the origin timeline number of a base backup,
> information which is IMO as important as the start LSN when working on
> backup methods.
>
> I would find it interesting to add at the bottom of the backup_label file
> a new field called "START TIMELINE: %d" to put this information in a
> more understandable shape. Any opinions?

Strong "yes" from me.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 16:16, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 23 October 2017 at 08:30, John Lumby <johnlu...@hotmail.com> wrote:
>
>> All works but not perfectly --  at COMMIT,  resource_owner issues
>> relcache reference leak messages about relation scans not closed
>> and also about  snapshot still active. I guess that the CREATE has
>> switched resource_owner and pushed a snapshot,  but I did not
>> debug in detail.
>
> A lot more work is required than what's done by PG_CATCH to return to
> a queryable state. I've been down this path myself, and it's not fun.

Ignore me, Tom's example is probably more relevant to you since it
applies to subtransactions, not top-level query state.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 08:30, John Lumby <johnlu...@hotmail.com> wrote:

> All works but not perfectly --  at COMMIT,  resource_owner issues
> relcache reference leak messages about relation scans not closed
> and also about  snapshot still active. I guess that the CREATE has
> switched resource_owner and pushed a snapshot,  but I did not
> debug in detail.

A lot more work is required than what's done by PG_CATCH to return to
a queryable state. I've been down this path myself, and it's not fun.

Take a look at all the extra work done on the error handling path in
PostgresMain.

At some point I'd really like to expose that in a more general way so
it can be used from background workers. Right now AFAICS most
background workers have to cope with errors with a proc_exit(1) and
getting restarted to try again. Not ideal.
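The bare bones of what that looks like, as a sketch only; PostgresMain
does considerably more (interrupt holdoffs, timeout handling, and so
on), so treat this as the minimum shape rather than a complete recipe:

#include "postgres.h"
#include "access/xact.h"
#include "miscadmin.h"
#include "utils/memutils.h"

static void
worker_main_loop(void)
{
    MemoryContext loop_ctx = AllocSetContextCreate(TopMemoryContext,
                                                   "worker loop",
                                                   ALLOCSET_DEFAULT_SIZES);

    for (;;)
    {
        PG_TRY();
        {
            MemoryContextSwitchTo(loop_ctx);
            StartTransactionCommand();
            /* ... do one unit of work here ... */
            CommitTransactionCommand();
        }
        PG_CATCH();
        {
            /* Log the error, roll back, and clear the error state. */
            EmitErrorReport();
            AbortCurrentTransaction();
            FlushErrorState();
            MemoryContextSwitchTo(loop_ctx);
        }
        PG_END_TRY();

        MemoryContextReset(loop_ctx);
        CHECK_FOR_INTERRUPTS();
    }
}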

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 05:44, Eric Ridge <eeb...@gmail.com> wrote:
>> On Oct 22, 2017, at 3:24 PM, Peter Geoghegan <p...@bowt.ie> wrote:

>> Again, you'll probably need to put this low level requirement into
>> context if you want sound advice from this list.
>
> I'm just thinking out loud here, but the context is likely something along 
> the lines of externally storing all transaction ids, and periodically asking 
> Postgres if they're known-to-be-aborted-by-all-transactions -- one at a time.

I think Peter is asking "why?".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?

2017-10-17 Thread Craig Ringer
On 18 October 2017 at 02:01, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> On Wed, Sep 6, 2017 at 4:42 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>>
>> We're currently blocking writing queries on the standby even if they are
>> only modifying the contents of foreign tables.  But do we have serious
>> reasons for that?
>> Keeping FDW sharding in mind, making FDW tables writable from the standby
>> would be good to prevent a single-master bottleneck.
>> I wrote a simple patch enabling writing to foreign tables from standbys.
>> At first glance it works for me.
>
>
> No interest yet, but no objections too :-)
> I'm going to add this to next commitfest.

Superficially at least, it sounds like a good idea.

We should only need a virtual xid when we're working with foreign
tables since we don't do any local heap changes.

How's it work with savepoints?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Query got Killed with CTE.

2017-10-17 Thread Craig Ringer
On 17 October 2017 at 21:18, Prabhat Sahu <prabhat.s...@enterprisedb.com>
wrote:

> Hi,
>
> While querying with a CTE against PG HEAD, I found that the query got
> killed with the error logs below.
> -- Machine Configuration: (d1.xlarge) CPUs: 8, RAM: 16GB, Size: 640GB
>
> postgres=# with x as (select 5 c1 from generate_series(1,100) x)
> select * from x x1 join x x2 using(c1);
> Killed
> 2017-10-17 14:12:33.558 BST [949] LOG:  could not send data to client:
> Broken pipe
> 2017-10-17 14:12:33.558 BST [949] STATEMENT:  with x as (select 5 c1 from
> generate_series(1,100) x) select * from x x1 join x x2 using(c1);
> 2017-10-17 14:12:33.559 BST [949] FATAL:  connection to client lost
> 2017-10-17 14:12:33.559 BST [949] STATEMENT:  with x as (select 5 c1 from
> generate_series(1,100) x) select * from x x1 join x x2 using(c1);
>
>
You produced a huge cross-product by the looks, and psql ran out of RAM
buffering the result. The OOM killer fired (check 'dmesg' to confirm) and
killed psql. The server noticed psql going away, and reported the fact.

None of this is surprising. What's the problem here?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Craig Ringer
On 17 October 2017 at 22:39, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
>> On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote:
>>> So: where did you get the existing binaries?  If it's from some vendor
>>> packaging system, what you should do is fetch the package source, add
>>> the patch to the probably-nonempty set of patches the vendor is applying,
>>> and rebuild your own custom package version.  If you haven't done that
>>> before, it's a good skill to acquire ...
>
>> I'm familiar with that process; but, these are PG10 binaries from PGDG for
>> centos6 x64.
>
> Well, I'm pretty sure Devrim builds those using the RPM build process.
> I'd have grabbed his SRPM and proceeded as above.

Yep, or unpack the tarball and apply the patch, then build with the
same configure options as you find in the rpm spec. That can be easier
when iterating tests and builds.

Since the patches are separate, you can skip the tarball and clone the
same tag from git instead. Then apply the rpm patches as separate
commits. That's typically what I'll do; it makes it easier to keep track.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Determine state of cluster (HA)

2017-10-16 Thread Craig Ringer
On 17 October 2017 at 01:02, Joshua D. Drake <j...@commandprompt.com> wrote:
> On 10/15/2017 07:39 PM, Craig Ringer wrote:
>>
>> On 13 October 2017 at 08:50, Joshua D. Drake <j...@commandprompt.com> wrote:
>>>
>>> -Hackers,
>>>
>>> I had a long call with a firm developing front end proxy/cache/HA for
>>> Postgres today. Essentially the software is a replacement for PGPool in
>>> entirety but also supports analytics etc... When I was asking them about
>>> pain points they talked about the below and I was wondering if this is a
>>> problem we would like to solve.
>>
>>
>> IMO: no one node knows the full state of the system, or can know it.
>
>
> That isn't exactly true. We do know if our replication state is current, but
> only from the master, which is part of the problem.

Sure. But unless you have a perfectly-reliable, latency-ignoring
wormhole link between master and standby, the standby always has
imperfect knowledge of the master. More importantly, it can never know
for sure how old its knowledge is.

https://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-WAL-RECEIVER-VIEW
already does about the best we can probably do. In particular
last_msg_send_time and last_msg_receipt_time, used in combination with
latest_end_lsn and latest_end_time.

>> That said, I do think it'd be very desirable for us to introduce a
>> greater link from a standby to master:
>>
>> - Get info about master. We should finish merging recovery.conf into
>> postgresql.conf.
>
>
> Definitely.

There's a patch from Abhijit Menon-Sen you could adopt for PostgreSQL
11 for that.


>>> 1.  The dblink call doesn't have a way to specify a timeout, so we have
>>> to
>>> use Java futures to control how long this may take to a reasonable amount
>>> of
>>> time;
>>
>>
>> statement_timeout doesn't work?
>
>
> That would be a workaround, definitely, but I think it would be better to
> say: ALTER SYSTEM SET PROMOTE TIMEOUT '120' (example; let's not get off into
> the weeds :P) and if the standby can't receive a ping/ack within 120 seconds
> it will promote itself.

I'm confused by this. I thought you were talking about timeouts
querying status of an upstream over dblink. Not automatic
self-promotion.

I'm really not a fan of Pg standbys self-promoting without working
with an external co-ordinator that handles STONITH/fencing. It's a
recipe for disaster. That's what I was saying upthread, that
implementing bits and pieces here can be quite dangerous.

This also takes it well outside what you were talking about, improving
the ability to detect Pg's state, and into having it become its own
co-ordinator for HA actions.

So lets go back to the original question. What's missing that
statement_timeout doesn't provide for querying remote servers for
their status over dblink?

If you want a nicer way to say "look up whatever your conninfo in
recovery.conf is, connect to it, get me some info on it and return it,
possibly daisy-chaining up a chain of replicas if you reach the
master" ... that's fine. But it's a different thing.

>> Er, yes? I don't understand what you are getting at here.
>
>
> Yes, I will need to go back to them on this one. I think what they mean is
> that if we have a connection that is getting closed it doesn't return why it
> is closing. It just throws an error.

Yes, we do.  From
https://www.postgresql.org/docs/current/static/errcodes-appendix.html:

Class 57 — Operator Intervention
  57000 operator_intervention
  57014 query_canceled
  57P01 admin_shutdown
  57P02 crash_shutdown
  57P03 cannot_connect_now
  57P04 database_dropped

Maybe they want more granularity in terms of what reasons are given
and what errors are reported. That's fine. But please provide
specifics.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Determine state of cluster (HA)

2017-10-15 Thread Craig Ringer
On 13 October 2017 at 08:50, Joshua D. Drake  wrote:
> -Hackers,
>
> I had a long call with a firm developing front end proxy/cache/HA for
> Postgres today. Essentially the software is a replacement for PGPool in
> entirety but also supports analytics etc... When I was asking them about
> pain points they talked about the below and I was wondering if this is a
> problem we would like to solve.

IMO: no one node knows the full state of the system, or can know it.

I'd love PostgreSQL to help users more with scaling, HA, etc. But I
think it's a big job. We'd need:

- a node topology of some kind, communicated between nodes
- heartbeat and monitoring
- failover coordination
- pooling/proxying
- STONITH/fencing
- etc.

That said, I do think it'd be very desirable for us to introduce a
greater link from a standby to master:

- Get info about master. We should finish merging recovery.conf into
postgresql.conf.

> b. Attempt to connect to the host directly, if not...
> c. use the slave and use the hostname via dblink to connect to the master,
> as the hostname, i.e. select * from dblink('" + connInfo + "
> dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr inet).
> This is necessary in the event the hostname used in the recovery.conf file
> is not resolvable from the outside.

OK, so "connect directly" here means from some 3rd party, the one
doing the querying of the replica.

> 1.  The dblink call doesn't have a way to specify a timeout, so we have to
> use Java futures to control how long this may take to a reasonable amount of
> time;

statement_timeout doesn't work?

If not, that sounds like a sensible, separate feature to add. Patches welcome!

> 2.  NAT mapping may result in us detecting IP ranges that are not accessible
> to the application nodes.

PostgreSQL can't do anything about this one.

> 3.  there is no easy way to monitor for state changes as they happen,
> allowing faster failovers, everything has to be polled based on events;

It'd be pretty simple to write a function that sleeps in the backend
until it's promoted. I don't know off the top of my head if we set all
proc latches when we promote, but if we don't it's probably harmless
and somewhat useful to do so.

Either way, you'd do long-polling. Call the function and let the
connection block until something interesting happens. Use TCP
keepalives to make sure you notice if it dies. Have the function
return when the state changes.
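A sketch of such a function, assuming PG10-era APIs; the function name
is invented and the CREATE FUNCTION boilerplate is elided:

#include "postgres.h"
#include "access/xlog.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/ipc.h"
#include "storage/latch.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(wait_for_promotion);

/* Block in the backend until this server leaves recovery. */
Datum
wait_for_promotion(PG_FUNCTION_ARGS)
{
    while (RecoveryInProgress())
    {
        int         rc = WaitLatch(MyLatch,
                                   WL_LATCH_SET | WL_TIMEOUT |
                                   WL_POSTMASTER_DEATH,
                                   1000L,   /* recheck once a second */
                                   PG_WAIT_EXTENSION);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);

        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }

    PG_RETURN_VOID();
}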

> 4.  It doesn't support cascading replication very well, although we could
> augment the logic to allow us to map the relationship between nodes.
> 5.  There is no way to connect to a db node with something akin to
> SQL-Server's "application intent" flags, to allow a connection to be
> rejected if we wish it to be a read/write connection.  This helps detect the
> state of the node directly without having to ask any further questions of
> the node, and makes it easier to "stall" during connection until a proper
> connection can be made.

That sounds desirable, and a good step toward eventually being able to
transparently re-route read/write queries from replica to master, which
is where I'd like to land up eventually.

Again, that'd be a sensible patch to submit, quite separately to the
other topics.

> 6.  The master, on shutdown, will not actually close and disable connections
> as it shuts down; instead, it will issue an error that it is shutting down
> as it does so.

Er, yes? I don't understand what you are getting at here.

Can you describe expected vs actual behaviour in more detail?



Re: [HACKERS] Slow synchronous logical replication

2017-10-12 Thread Craig Ringer
On 12 October 2017 at 16:09, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
>

> Is the CREATE TABLE and INSERT done in the same transaction?
>
> No. The table was created in a separate transaction.
> Moreover, the same effect will take place if the table is created before the
> start of replication.
> The problem in this case seems to be caused by the spilling of the decoded
> transaction to file by ReorderBufferSerializeTXN.

Yeah. That's known to perform sub-optimally, and it also uses way more
memory than it should.

Your design compounds that by spilling transactions it will then
discard, and doing so multiple times.

To make your design viable you likely need some kind of cache of
serialized reorder buffer transactions, where you don't rebuild one if
it's already been generated. And likely a fair bit of optimisation on
the serialisation.

Or you might want a table- and even a row-filter that can be run
during decoding, before appending to the ReorderBuffer, to let you
skip changes early. Right now this can only be done at the transaction
level, based on replication origin. Of course, if you do this you
can't do the caching thing.

> Unfortunately it is not quite clear how to make the WAL sender smarter and
> let it skip transactions not affecting its publication.

You'd need more hooks to be implemented by the output plugin.

> I'm really not sure that it is possible to skip over WAL. But the particular
> problem with invalidation records etc. can be solved by always processing
> those records in the WAL sender.
> I.e. if a backend is inserting an invalidation record or some other record
> which should always be processed by the WAL sender, it can always promote
> the LSN of this record to the WAL sender.
> So the WAL sender will skip only those WAL records which are safe to skip
> (insert/update/delete records not affecting this publication).

That sounds like a giant layering violation too.

I suggest focusing on reducing the amount of work done when reading
WAL, not trying to jump over whole ranges of WAL.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Craig Ringer
On 12 October 2017 at 06:46, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> I've been using
>
> --with-extra-version=+git`date +%Y%m%d`"~"`git rev-parse --short HEAD`
>
> for my local builds for some time, and I've not experienced any such
> problems.

Interesting.

I've seen issues with a number of tools. The one I can remember most
clearly is check_postgres.pl. Nobody's going to argue that this is
pretty code, but last time I tested (9.4-era, admittedly) it exploded
messily with extra-version.

> However, using the various numeric reporting options is clearly better
> if you want to do version comparisons.  The "extra version" stuff should
> be mainly for labeling.

The trouble there is that we lack numeric version in some (IMO
crucial) places where we only have the string version:

- In the startup packet. We send server_version, but not
server_version_num, as GUC_REPORT. So if a client wants
server_version_num it has to do another round trip to query for it.

- In pg_config, where we don't expose any --version-num only --version.
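To illustrate the asymmetry from libpq (a sketch; connection setup and
error checks are elided):

#include <stdio.h>
#include <libpq-fe.h>

static void
show_versions(PGconn *conn)
{
    /* GUC_REPORT: arrives with the startup response, no query needed. */
    printf("server_version     = %s\n",
           PQparameterStatus(conn, "server_version"));

    /* libpq derives this by parsing the string client-side; that's
     * exactly the parsing that other drivers tend to get wrong. */
    printf("PQserverVersion    = %d\n", PQserverVersion(conn));

    /* Getting server_version_num itself costs an extra round trip. */
    PGresult   *res = PQexec(conn, "SHOW server_version_num");

    printf("server_version_num = %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);
}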

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Craig Ringer
On 12 October 2017 at 00:57, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

> The reason for such behavior is obvious: the WAL sender has to decode the
> huge transaction generated by the insert although it has no relation to this
> publication.

It does. Though I wouldn't expect anywhere near the kind of drop you
report, and haven't observed it here.

Is the CREATE TABLE and INSERT done in the same transaction? Because
that's a known pathological case for logical replication, it has to do
a LOT of extra work when it's in a transaction that has done DDL. I'm
sure there's room for optimisation there, but the general
recommendation for now is "don't do that".

> Filtering of insert records of this transaction is done only inside output
> plug-in.

Only partly true. The output plugin can register a transaction origin
filter and use that to say it's entirely uninterested in a
transaction. But this only works based on filtering by origins. Not
tables.

I imagine we could call another hook in output plugins, "do you care
about this table", and use it to skip some more work for tuples that
particular decoding session isn't interested in. Skip adding them to
the reorder buffer, etc. No such hook currently exists, but it'd be an
interesting patch for Pg11 if you feel like working on it.
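For contrast, the origin filter that does exist today looks like this
in an output plugin (a minimal sketch; the begin/change/commit
callbacks are elided, and the per-table hook imagined above would be a
new invention along the same lines):

#include "postgres.h"
#include "fmgr.h"
#include "replication/logical.h"
#include "replication/origin.h"

PG_MODULE_MAGIC;

/* Returning true skips every change replayed from a foreign origin. */
static bool
my_filter_by_origin(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
    return origin_id != InvalidRepOriginId;
}

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
    cb->filter_by_origin_cb = my_filter_by_origin;
}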

> Unfortunately it is not quite clear how to make the WAL sender smarter and
> let it skip transactions not affecting its publication.

As noted, it already can do so by origin. Mostly. We cannot totally
skip over WAL, since we need to process various invalidations etc. See
ReorderBufferSkip.

It's not so simple by table since we don't know early enough whether
the xact affects tables of interest or not. But you could definitely
do some selective skipping. Making it efficient could be the
challenge.

> One of the possible solutions is to let the backend inform the WAL sender
> about the smallest LSN it should wait for (the backend knows which table is
> affected by the current operation, and so which publications are interested
> in this operation, and so can point the WAL sender to the proper LSN without
> decoding a huge part of WAL).
> But it seems to be not so easy to implement.

Sounds like confusing layering violations to me.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Craig Ringer
On 11 October 2017 at 11:44, Jeremy Schneider <schnei...@ardentperf.com> wrote:
> On Sun, Oct 1, 2017 at 8:10 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>> configure --with-extra-version=whateveryouwant
>
> I see that this build option has been around since 9.4; is anyone
> using it to mark patched production builds?  EnterpriseDB or
> 2ndQuadrant? How about the cloud providers?

We started using it for BDR, but unfortunately too much software
explodes spectacularly when you use it, due to simplistic/buggy
version parsing.

This led me to push for wider adoption of server_version_num - in
particular, exposing it as an initial connection parameter via
GUC_REPORT, and exposing it in pg_config. After a few attempts I've
given up on that as a lost cause now, though I still disagree with the
rationale.

Right now, if you use it, various 3rd party software will fail in a
variety of exciting ways, some more subtle than others.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Craig Ringer
On 9 October 2017 at 15:37, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>>
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
>
> It will be not so easy to provide a reproducing scenario, because it
> actually involves many components (postgres_fdw, pg_pathman,
> pg_shardman, LR, ...)

So simplify it to a test case that doesn't.

> I have checked the syncrepl.c file, particularly the SyncRepGetSyncRecPtr
> function. Each WAL sender independently calculates the minimal LSN among all
> synchronous replicas and wakes up backends waiting for this LSN. It means
> that a transaction performing an update of data in one shard will actually
> wait for confirmation from the replication channels for all shards.

That's expected for the current sync rep design, yes. Because it's
based on lsn, and was designed for physical rep where there's no
question about whether we're sending some data to some peers and not
others.

So all backends will wait for the slowest-responding peer, including
peers that don't need to actually do anything for this xact. You could
possibly hack around that by having the output plugin advance the slot
position when it sees that it just processed an empty xact.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Latches API on windows

2017-10-09 Thread Craig Ringer
On 9 October 2017 at 21:26, Abbas Butt <abbas.b...@enterprisedb.com> wrote:
> Hi,
> I am working on a contrib module that uses RegisterDynamicBackgroundWorker
> API
> to create a couple of worker processes. For synchronization between the
> background worker processes I am using InitSharedLatch, SetLatch, WaitLatch
> APIs.
> One of the processes is supposed to wait for the latch, the other is
> supposed to set it.
> The system works perfectly fine as long as it's run on Linux; however, when
> tried
> on Windows, it fails giving the error:
> ResetEvent failed: error code 6
> Error code 6 means invalid handle. Debugging reveals that the handle
> contains
> a valid value, however it seems that the handle is not accessible (was not
> created)
> in the process that is calling ResetEvent.
>
> Debugging the issue lead me to the following comment on top of
> InitSharedLatch:
>
>  * InitSharedLatch needs to be called in postmaster before forking child
>  * processes, usually right after allocating the shared memory block
>  * containing the latch with ShmemInitStruct. (The Unix implementation
>  * doesn't actually require that, but the Windows one does.)
>
> In my case this is not true: I am calling InitSharedLatch in _PG_init,
> which gets called at CREATE EXTENSION time.
> My question: is there a way to get the latches API to work on Windows
> the way it works on Linux?

I suspect you'd need to do it by having your extension load via
shared_preload_libraries, registering its latch in shmem_startup_hook.

But ... that's an off-the-cuff guess.
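Something like this, as a sketch (MySharedState and the other names are
placeholders; it's the usual pg_stat_statements-style pattern):

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

typedef struct MySharedState
{
    Latch       worker_latch;   /* latch shared by the two workers */
} MySharedState;

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static MySharedState *my_state = NULL;

/* Runs while shared memory is created, before children fork, which is
 * what the Windows latch implementation requires. */
static void
my_shmem_startup(void)
{
    bool        found;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    my_state = ShmemInitStruct("my_extension state",
                               sizeof(MySharedState), &found);
    if (!found)
        InitSharedLatch(&my_state->worker_latch);
    LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
    if (!process_shared_preload_libraries_in_progress)
        return;

    RequestAddinShmemSpace(sizeof(MySharedState));

    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = my_shmem_startup;
}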

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-08 Thread Craig Ringer
On 8 October 2017 at 02:37, Daniele Varrazzo <daniele.varra...@gmail.com> wrote:
> Hello,
>
> we have been reported, and I have experienced a couple of times,
> pg_repack breaking logical replication.
>
> - https://github.com/reorg/pg_repack/issues/135
> - https://github.com/2ndQuadrant/pglogical/issues/113

Yeah, I was going to say I've seen reports of this with pglogical, but
I see you've linked to them.

I haven't had a chance to look into it though, and haven't had a
suitable reproducible test case.

> In the above issue #113, Petr Jelinek commented:
>
>> From a quick look at pg_repack, the way it does table rewrite is almost
>> guaranteed to break logical decoding unless there are zero unconsumed
>> changes for a given table, as it does not build the necessary mappings
>> info for logical decoding that standard heap rewrite in postgres does.
>
> unfortunately he didn't follow up to further details requests.

At a guess he's referring to src/backend/access/heap/rewriteheap.c.

I'd explain better if I understood what was going on myself, but I
haven't really understood the logical decoding parts of that code.

> - Is Petr diagnosis right and freezing of logical replication is to be
> blamed to missing mapping?
> - Can you suggest a test to reproduce the issue reliably?
> - What are mapped relations anyway?

I can't immediately give you the answers you seek, but start by
studying src/backend/access/heap/rewriteheap.c, notably
logical_end_heap_rewrite, logical_rewrite_heap_tuple and
logical_begin_heap_rewrite.

At a wild "I haven't read any of the relevant code in detail yet" stab
in the dark, pg_repack is failing to do the bookkeeping required by
logical decoding around relfilenode changes, cmin/cmax, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slow synchronous logical replication

2017-10-08 Thread Craig Ringer
On 8 October 2017 at 03:58, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

> The question was about logical replication mechanism in mainstream version
> of Postgres.

I think it'd be helpful if you provided reproduction instructions,
test programs, etc, making it very clear when things are / aren't
related to your changes.

> I think that most people are using asynchronous logical replication, and
> synchronous LR is something exotic and not well tested or investigated.
> It will be great if I am wrong :)

I doubt it's widely used. That said, a lot of people use synchronous
replication with BDR and pglogical, which are ancestors of the core
logical rep code and design.

I think you actually need to collect some proper timings and
diagnostics here, rather than hand-waving about it being "slow". A
good starting point might be setting some custom 'perf' tracepoints,
or adding some 'elog()'ing for timestamps. Then scrape the results and
build a latency graph.
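
For the elog() route, even something as crude as this untested sketch
(hypothetical names, hung off an output plugin's commit callback) is
enough to scrape per-transaction latencies out of the server log:

#include "postgres.h"
#include "replication/logical.h"
#include "utils/timestamp.h"

static void
my_commit_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
			 XLogRecPtr commit_lsn)
{
	/*
	 * The log_line_prefix timestamp (%m) gives you the decode time; the
	 * delta against the origin commit time below is your latency.
	 */
	elog(LOG, "decoded commit of xid %u, committed on origin at %s",
		 txn->xid, timestamptz_to_str(txn->commit_time));
}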

That said, if I had to guess why it's slow, I'd say that you're facing
a number of factors:

* By default, logical replication in PostgreSQL does not do an
immediate flush to disk after downstream commit. In the interests of
faster apply performance it instead delays sending flush confirmations
until the next time WAL is flushed out. See the docs for CREATE
SUBSCRIPTION, notably the synchronous_commit option. This will
obviously greatly increase latencies on sync commit.

* Logical decoding doesn't *start* streaming a transaction until the
origin node finishes the xact and writes a COMMIT, then the xlogreader
picks it up.

* As a consequence of the above, a big xact holds up commit
confirmations of smaller ones by a LOT more than is the case for
streaming physical replication.

Hopefully that gives you something to look into, anyway. Maybe you'll
be inspired to work on parallelized logical decoding :)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Still another race condition in recovery TAP tests

2017-10-06 Thread Craig Ringer
On 6 October 2017 at 14:03, Noah Misch <n...@leadboat.com> wrote:
> On Fri, Sep 08, 2017 at 10:32:03PM -0400, Tom Lane wrote:
>> (I do kinda wonder why we rolled our own RecursiveCopy; surely there's
>> a better implementation in CPAN?)
>
> Fewer people will test as we grow the list of modules they must first install.

Meh, I don't buy that. At worst, all we have to do is provide a script
that fetches them, from distro repos if possible, and failing that
from CPAN.

With cpanminus, that's pretty darn simple too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-05 Thread Craig Ringer
On 6 October 2017 at 10:52, Pavel Stehule <pavel.steh...@gmail.com> wrote:

> It is better to work on GLOBAL TEMP tables.
>
> Current TEMP tables, if you create them in every session, have pretty
> significant overhead - with a possible risk of performance loss (system
> catalog bloat).
>
> Pretty significant performance issues of my customers are related to temp
> table usage (under high load)

I've seen the same thing too. Especially when combined with logical
decoding, where IIRC we mark transactions as having catalog changes
due to temp tables.

Sometimes the catalog bloat can be truly horrible when a user has
hundreds of plpgsql functions that all like to make temp tables.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PATCH: Expose generate_qualified_relation_name functionality

2017-10-05 Thread Craig Ringer
I'm regularly surprised that the only non-static function we seem to
have for getting a relation name by oid is get_rel_name. It doesn't
handle schema qualification at all, and it returns NULL rather than
ERROR'ing.

Doing it correctly involves interacting with the syscache, calling
RelationIsVisible and calling get_namespace_name if needed, then
passing the result to quote_qualified_identifier.

so here's what I propose to do:

Add get_qualified_relation_name(Oid relid, bool force_qualify, bool
missing_ok) to ruleutils.c and builtins.h.

(Unless there's somewhere better? I wanted to put it in lsyscache.c
alongside get_rel_name, but it uses quote_qualified_identifier,
StringInfo, etc, so it doesn't fit well in lsyscache.c.)

Replace generate_qualified_relation_name in ruleutils.c with a call to
get_qualified_relation_name(relid, true, false) .

Add comments to RelationGetRelationName, get_rel_name and regclassout
pointing to get_qualified_relation_name.

The only part I don't like here is the two boolean arguments, since
they're ugly to read. But inventing a new flags field seemed a bit
heavy for such a trivial task.

I had a quick look through the codebase for places where this pattern
is repeated and found astonishingly few. In most places we just use
get_rel_name and hope the user can figure out any ambiguity.

I did notice that in heap_copy_data and AlterTableMoveAll we manually
schema-qualify a relation name and fail to call
quote_qualified_identifier, so it's unclear if we mean a relation
named "my.relation" or the relation "relation" in schema "my". But in
diagnostic output, meh, whatever.

Arguably regclassout does the same thing as
get_qualified_relation_name, but I didn't replace it because it cares
about IsBootstrapProcessingMode().

(Prompted by https://dba.stackexchange.com/q/187788/7788)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 011cd6e08d6d29854db637d6beb8709615f376cb Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Fri, 6 Oct 2017 10:48:17 +0800
Subject: [PATCH v1] Add get_qualified_relation_name

It was surprisingly hard to just get a table name in PostgreSQL at the
C level.

To simplify that, expose the functionality of generate_qualified_relation_name
and generate_relation_name to callers in the wider codebase and to extensions
in the form of a new get_qualified_relation_name function.
---
 src/backend/utils/adt/regproc.c |  2 +
 src/backend/utils/adt/ruleutils.c   | 81 +++--
 src/backend/utils/cache/lsyscache.c |  5 ++-
 src/include/utils/builtins.h|  2 +
 src/include/utils/rel.h |  4 ++
 5 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 6fe81fa..691efa8 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -967,6 +967,8 @@ to_regclass(PG_FUNCTION_ARGS)
 
 /*
  * regclassout		- converts class OID to "class_name"
+ *
+ * (See get_qualified_relation_name for a direct-callable version).
  */
 Datum
 regclassout(PG_FUNCTION_ARGS)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index eb01f35..0eccc02 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -10483,6 +10483,9 @@ quote_qualified_identifier(const char *qualifier,
  *
  * This differs from the underlying get_rel_name() function in that it will
  * throw error instead of silently returning NULL if the OID is bad.
+ *
+ * The returned relation name is not guaranteed to be unique outside the
+ * schema, so it is frequently preferable to use get_qualified_relation_name.
  */
 static char *
 get_relation_name(Oid relid)
@@ -10495,6 +10498,56 @@ get_relation_name(Oid relid)
 }
 
 /*
+ * get_qualified_relation_name
+ *
+ * Get a relation name as a palloc'd string in the current memory context.
+ *
+ * If the relation is not on the current search path, or if force_qualify is
+ * true, the namespace for the relation is prepended.
+ *
+ * ERROR's on a missing relation unless missing_ok is set, in which case returns
+ * NULL.
+ *
+ * The quoting rules used are the same as those for regclass quoting: each
+ * component is quoted if necessary to prevent ambiguity.
+ */
+char *
+get_qualified_relation_name(Oid relid, bool force_qualify, bool missing_ok)
+{
+	HeapTuple   tp;
+	Form_pg_class reltup;
+	char   *relname;
+	char   *nspname = NULL;
+	char   *result;
+
+	tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	if (!HeapTupleIsValid(tp))
+	{
+		if (missing_ok)
+			return NULL;
+		else
+			elog(ERROR, "cache lookup failed for relation %u", relid);
+	}
+
+	reltup = (Form_pg_class) GETSTRUCT(tp);
+	relname = NameStr(reltup->relname);
+
+	if (force_qualify || !RelationIsVisible(relid))
+	{
+		nspname = get_namespace_name

Re: [HACKERS] fork()-safety, thread-safety

2017-10-05 Thread Craig Ringer
On 6 October 2017 at 08:06, Andres Freund <and...@anarazel.de> wrote:
> On 2017-10-06 07:59:40 +0800, Craig Ringer wrote:
>> The only thing that gets me excited about a threaded postgres is the
>> ability to have a PL/Java, PL/Mono etc that don't suck. We could do
>> some really cool things that just aren't practical right now.
>
> Faster parallelism with a lot less reinventing the wheel. Easier backend
> / session separation. Shared caches.

Yeah. We have a pretty major NIH problem in PostgreSQL, and I agree
that adopting threading and some commonplace tools would sure help us
reduce that burden a bit.

I would really miss shared-nothing-by-default though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fork()-safety, thread-safety

2017-10-05 Thread Craig Ringer
On 6 October 2017 at 06:49, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
>> On 2017-10-05 17:31:07 -0500, Nico Williams wrote:
>>> You don't think eliminating a large difference between handling of WIN32
>>> vs. POSIX is a good reason?
>
>> I seems like you'd not really get a much reduced set of differences,
>> just a *different* set of differences. After investing time.
>
> Yeah -- unless we're prepared to drop threadless systems altogether,
> this doesn't seem like it does much for maintainability.  It might even
> be a net negative on that score, due to reducing the amount of testing
> the now-legacy code path would get.
>
> If there were reason to think we'd get a large performance benefit,
> or some other concrete win, it might be worth putting time into this.
> But I see no reason to believe that.
>
> (There's certainly an argument to be made that no-one cares about
> platforms without thread support anymore.  But I'm unconvinced that
> rewriting existing code that works fine is the most productive
> way to exploit such a choice if we were to make it.)

The only thing that gets me excited about a threaded postgres is the
ability to have a PL/Java, PL/Mono etc that don't suck. We could do
some really cool things that just aren't practical right now.

Not compelling to a wide audience, really.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgresql gives error that role goes not exists while it exists

2017-10-03 Thread Craig Ringer
On 3 October 2017 at 20:47, Euler Taveira  wrote:
>
> 2017-10-03 5:49 GMT-03:00 Nick Dro :
> > Can someone assists with the issue posted on StackOverflow?
> >
> > https://stackoverflow.com/questions/46540537/postgresql-9-3-creation-of-group-role-causes-permission-problems
> >
> >
> > Creation of a new group role causes PostgreSQL to think that login roles do
> > not exist. I think it's a bug? Or at least a wrong error message.
> >
> I'm not sure. I bet a dime that the role was created as "Iris" and you
> are trying to assign "iris" (they are different). If you list the
> roles, we can confirm that.
>

... and the reason we don't emit a HINT here is that the exact same
HINT would apply in any context involving identifiers, so we'd just
flood the logs. It'd be spurious in most cases.

We could only emit a useful HINT if we actually went and looked in the
relevant catalog for a different-cased version. Which is pretty
expensive.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Off-by-one error in logical slot resource retention

2017-10-01 Thread Craig Ringer
On 2 October 2017 at 05:27, Daniel Gustafsson <dan...@yesql.se> wrote:

> > On 15 Sep 2017, at 13:26, Daniel Gustafsson <dan...@yesql.se> wrote:
> >
> >> On 01 Sep 2017, at 14:28, Aleksander Alekseev <
> a.aleks...@postgrespro.ru> wrote:
> >>
> >> The following review has been posted through the commitfest application:
> >> make installcheck-world:  not tested
> >> Implements feature:   not tested
> >> Spec compliant:   not tested
> >> Documentation:not tested
> >>
> >> Hi Craig,
> >>
> >> I'm afraid patches 0002 and 0003 don't apply anymore. Could you please
> resolve the conflicts?
> >>
> >> The new status of this patch is: Waiting on Author
> >
> > Have you had a chance to look at rebasing this patch so we can get it
> further
> > in the process?
>
> Since this patch has been in Waiting for Author state for the duration of
> the
> commitfest without moving, I’m marking it Returned with Feedback.  If
> there is
> still interest in pursuing this patch, please re-submit it to the next
> commitfest with the comments addressed.
>

Thanks. I'll revisit it next CF.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_prepared_xact_status

2017-10-01 Thread Craig Ringer
On 30 September 2017 at 14:10, konstantin knizhnik <
k.knizh...@postgrespro.ru> wrote:


> So I do not see any troubles caused by adding this functions. And it can
> really be helpful for DBA in some cases.
>

If it's self-contained and exposes existing functionality, then I'm not
opposed, I just don't really see the point. I'm not seeing where it'd come
in useful.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_prepared_xact_status

2017-10-01 Thread Craig Ringer
On 2 October 2017 at 08:09, Robert Haas <robertmh...@gmail.com> wrote:

> On Sat, Sep 30, 2017 at 2:10 AM, konstantin knizhnik
> <k.knizh...@postgrespro.ru> wrote:
> > txid_status() also is not always able to return the status of a transaction
> > (if wraparound happens).
> > But it is still considered one of the key features of 10 (transaction
> > traceability...).
>
> Not by me.  It's a feature, though, for sure.


Same here. It's nice, and obviously I wanted it since I submitted it, but
it's not a key feature by any stretch.

Even if Pg also reported the xid to the client when assigned, it'd still be
a nice utility, not a huge headline feature.


> It's also a LOT more
> stable than what you're proposing.  Even on a busy system, it takes a
> while to go through 200 million transactions; you probably can't
> realistically do that in under an hour, and you'll probably raise the
> 200 million transaction limit if you're anywhere close to that rate.
> In practice, you'll almost always be able to look up transactions for
> several days, and often weeks or months.


Not necessarily, since it doesn't hold down or delay clog truncation, so if
the system is really eagerly VACUUMed it might discard things sooner.

At the moment though we're quite lazy about clog truncation, mainly because
we're very lazy about vacuuming template1 and template0 (which we should
autofreeze, really) so they tend to hold down global datfrozenxid. If we
get better about that, then we might need some way to ask Pg to keep extra
clog. But for now it works well enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Craig Ringer
On 29 September 2017 at 15:57, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:


> So you are saying that the PostgreSQL 2PC mechanism is not complete and the
> user needs to maintain some extra information to make it work?
>

No, it provides what's needed for an implementation of what in XA terms is
a local resource manager (LRM). What it does not provide is infrastructure
to make postgres itself into a global transaction manager (GTM) for
co-ordinating multiple LRMs.

It sounds like you're trying to build a GTM using PostgreSQL's existing LRM
book-keeping as your authorative data store, right?


> The problems with 2PC arise when the coordinator node is not available but
> is expected to recover in the future.
> In this case we may not have enough information to decide whether
> to abort or commit a prepared transaction.
> But that is a different story. We need to use 3PC or some other protocol to
> prevent such situations.


In that case the node sits and waits patiently for the GTM (or in more
complex architectures, *a* valid voting quorum of GTMs) to be reachable
again. Likely using a protocol like Raft, Paxos, 3PC etc to co-ordinate.

It can't do anything else, since if it unilaterally commits or rolls back
it might later find out that the nodes on the other side of the network
partition or whatever made the opposite decision and, boom!

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_prepared_xact_status

2017-09-29 Thread Craig Ringer
On 29 September 2017 at 15:57, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> The idea of the pg_prepared_xact_status function is that it allows getting
> the status of a 2PC transaction without any additional requirements on GIDs
> or any other additional information about the participants of the 2PC
> transaction.
>
>
This sounds kind-of like 1/4 of a distributed transaction resolver, without
a way to make it reliable enough to build the other 3/4.

To make this practical I think you'd need a way to retain historic GIDs +
their outcomes, and a way to prune that information only once an
application knows all interested participants consider the transaction
finalized.

I'd be all for a global xid status function if there were a good way to
manage resource retention. But it's fuzzy enough for txid_status, which
isn't really making any firm promises, just improving on the prior state of
"no f'ing idea what happened to that tx, sorry". 2PC consumers will want
absolute guarantees, not "dunno, sorry".

(Ahem, HeuristicMixedException raises its hand. You, sit down! You can only
get that if you mix 2PC and !2PC xacts).

I can see it being useful for Pg to be able to report a stream of GIDs +
outcomes for applications to consume. But unless you have either:

* some kind of app-controlled retention horizon and a way to multiplex it
for >1 app (like slots do); or

* a node registry where Pg its self implements a full built-in transaction
resolver;

then I think it's probably not going to get far.

I could see a full DTC resolver in postgres one day, once we have things
like working in-core logical rep based multi-master with 2PC support. But
that's a looong way off.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-26 Thread Craig Ringer
On 26 September 2017 at 22:14, Magnus Hagander <mag...@hagander.net> wrote:

>
>
> On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez <a...@ongres.com> wrote:
>
>>
>>
>>
>> But what about earlier versions? Any chance it could be backported
>> down to 9.4? If that would be acceptable, I could probably help/do that...
>
>
> The likelihood is zero if you mean backported into core of earlier
> versions.
>

Right. We don't add features to back branches.


>
> If you mean backported as a standalone extension that could be installed
> on a previous version, probably. I'm not sure if it relies on any internals
> not present before that would make it harder, but it would probably at
> least be possible.
>
>
All the pub/sub stuff is new and hooked into syscache etc. So you'd be
doing a bunch of emulation/shims using user catalogs. Not impossible, but
probably irritating and verbose. And you'd have none of the DDL required to
manage it, so you'd need SQL-function equivalents.

I suspect you'd be better off tweaking pglogical to speak the same protocol
as pg10, since the pgoutput protocol is an evolution of pglogical's
protocol. Then using pglogical on older versions.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-26 Thread Craig Ringer
On 26 September 2017 at 15:26, Alvaro Hernandez <a...@ongres.com> wrote:

>
> That's better than nothing. But as interoperable as json may be,
> people still need to talk the (binary) replication protocol to use it.
>

No, they don't.

They can use the SQL interface to logical decoding.

We could enhance that with a few small changes to make it a lot more useful
too. Most importantly, allow a long-polling model, where you can wait if
there's nothing to consume rather than getting an immediate empty
result-set.

I expect the walsender protocol to be dealt with by client drivers, not
user applications, much like you never really deal with the regular libpq
protocol in your app. PgJDBC and psycopg2 already support it. It'd be nice
if libpq offered some helper interfaces for C apps, and I'd help review any
such patch.

Nothing against binary output, but as you noted, we can likely use pgoutput
for that to some extent at least. I think the pressing need is json, going
by the zillion plugins out there for it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-26 Thread Craig Ringer
On 26 September 2017 at 14:08, Alvaro Hernandez <a...@ongres.com> wrote:


>
>>
> OK, let me try to do that. I believe data integration is a priority.


Definitely agree so far.


> - If you want to develop your own output plugin, then your market is
> reduced as you have to exclude all the managed solutions (until, and only
> if, you would convince them to include your plugin... highly unlikely, very
> difficult). And probably another % of enterprise environments which will
> hesitate to link your own plugin to the running production database. Last
> but not least, you need to compile and test (and testing this is far from
> easy) on multiple OSes/architectures.
>

Right. You probably don't want one output plugin per application.

This doesn't mean it has to be in-core, just flexible and share-able by
many, and adopted by cloud vendors. Like say PostGIS.


>
> - If you stick to in-core plugins, then you need to support at least three
> different output formats if you want to support 9.4+: test_decoding (and
> pray it works!), pgoutput, and the "new" in-core plugin that was proposed
> at the beginning of this thread, if that would see the light.
>

The only practical way will IMO be for whatever new plugin there is to also
have an out-of-core version maintained for older Pg versions, where it can
be installed.


> But only in-core plugins help for general-purpose solutions.


I still don't agree there. If there's enough need/interest/adoption you can
get cloud vendors on board, they'll feel the customer pressure. It's not
our job to create that pressure and do their work for them.

I see nothing wrong with a plugin starting off out of core and being
adopted+adapted later, assuming it's done well.

That said, I'm all in favour of a generic json output plugin that shares
infrastructure with logical replication, so people who are on inflexible
environments have a fallback option. I just don't care to write it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Craig Ringer
On 26 September 2017 at 01:48, Joshua D. Drake <j...@commandprompt.com> wrote:

> On 09/25/2017 10:43 AM, Andres Freund wrote:
>
>> On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote:
>>
>>> On 09/25/2017 10:32 AM, Petr Jelinek wrote:
>>>
>>>> On 25/09/17 19:26, Tom Lane wrote:
>>>>
>>>>> Alvaro Hernandez <a...@ongres.com> writes:
>>>>>
>>>>
>>>
>>>> There are already about 3 million output plugins out there, so I think we
>>>> did a reasonable job there. The fact that the vast majority of them are
>>>> various json ones gives a reasonable hint that we should have that one in
>>>> core though.
>>>>
>>>
>>> And I am sure that 2ndQuadrant would be happy to add it to their version
>>> of
>>> Postgres and maintain it themselves.
>>>
>>> https://www.2ndquadrant.com/en/resources/2ndqpostgres/
>>>
>>
>> This doesn't seem like a good way to argue.
>>
>>
> Sorry, that wasn't supposed to be negative. My point was that 2ndQuadrant
> has a distribution of Postgres that they support.


For what it's worth, it's mostly things that didn't make it into core for a
release, things that got knocked back, backports, etc.

I for one don't want to carry a big delta from core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Craig Ringer
On 26 September 2017 at 01:53, Andres Freund <and...@anarazel.de> wrote:

> On 2017-09-25 13:50:29 -0400, Tom Lane wrote:
> > Andres Freund <and...@anarazel.de> writes:
> > >> On 25/09/17 19:26, Tom Lane wrote:
> > >>> The problem with this type of argument is that it leads directly to
> the
> > >>> conclusion that every feature users want must be in core.
> >
> > > ... I don't think that should mean that there's no possible output
> > > plugin that could/should be integrated into core.
> >
> > Yeah, my point is just that the argument needs to be about why a
> > *particular* plugin is valuable enough to justify adding it to the
> > core developers' maintenance workload.
>
> +1
>
>
Yep, and that goes for plugins like pglogical too.

I agree we need a json plugin, it's pretty clear that's a widespread need.

But I don't buy the whole argument about "hosted postgres" issues. The
hosted solutions suppliers can simply use 3rd party plugins, like some
already do with PostGIS. Trying to push things into core is offloading work onto us
to make their lives easier and I don't much care for that.

Personally I'd be more friendly toward Amazon / Google / etc wanting us to
include things for their convenience if they actually usefully contributed
to development and maintenance of Pg.

-- 
 Craig Ringer


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Craig Ringer
On 24 September 2017 at 07:41, Euler Taveira <eu...@timbira.com.br> wrote:


> It is difficult to
> coordinate a change like that having only one-way communication.
>

I really think we need to fix that at some point, such that:

* Downstream connections can send CopyData messages *up* the COPY BOTH
protocol, where they're passed to a hook on the output plugin; and

* Output plugins can hook the walsender's event loop (latch set, etc) and
send their own messages without being driven by a logical decoding event.

I wanted to do that some time ago but ran into some issues and time
constraints. Because of the need to support older versions I'm now
committed to an approach using direct libpq connections and function calls
instead, but it seems like a real shame to do that when the replication
protocol connection is *right there*...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Craig Ringer
On 23 September 2017 at 06:28, Gregory Brail <gregbr...@google.com> wrote:


> Would the community support the development of another plugin that is
> distributed as part of "contrib" that addresses these issues?
>

Petr Jelinek and I tried just that with pglogical. Our submission was
knocked back with the complaint that there was no in-core user of the code,
and it couldn't be evaluated usefully without an in-core consumer/receiver.

It's possible we'd make more progress if we tried again now, since we could
probably write a test suite using the TAP test framework and a small
src/test/modules consumer. But now we'd probably instead get blocked with
the complaint that the output plugin used for logical replication should be
sufficient for any reasonable need. I anticipate that we'd have some
disagreements about what a reasonable need is, but ... *shrug*.

I personally think we _should_ have such a thing, and that it should be
separate to the logical replication plugin to allow us to evolve that
without worrying about out of core dependencies etc.

There's some common functionality that needs factoring out into the logical
decoding framework, like some sort of relation metadata cache, some concept
of "replication sets" or a set of tables to include/exclude, etc. Doing
that is non-trivial work, but it's unlikely that two plugins with similar
and overlapping implementations of such things would be accepted; in that
case I'd be firmly in the "no" camp too.

Code in Pg has a cost, and we do have to justify that cost when we drop
things in contrib/. It's not a free slush pile. So a solid argument does
need to be made for why having this module living in github/whatever isn't
good enough.

> I'd be happy to submit a patch, or GitHub repo, or whatever works best as
> an example. (Also, although Transicator uses protobuf, I'm happy to have it
> output a simple binary format as well.)
>

PostgreSQL tends to be very, very conservative about dependencies and
favours not-invented-here rather heavily. Optional dependencies are
accepted sometimes when they can be neatly isolated to one portion of the
codebase and/or abstracted away, so it's not impossible you'd get
acceptance for something like protocol buffers. But there's pretty much
zero chance you'll get it as a hard dependency, you'll need a simple text
and binary protocol too.

At which point the question will arise, why aren't these 3 separate output
plugins? The text one, the binary one for in-core and the protobuf one to
be maintained out of core.

That's a pretty sensible question. The answer is that they'll all need to
share quite a bit of common infrastructure. But if that's infrastructure
all plugins need, shouldn't it be pushed "up" into the logical decoding
layer's supporting framework? Patches welcome for the next major release
cycle.

Thus, that's where I think you should actually start. Extract (and where
necessary generalize) key parts of your code that should be provided by
postgres itself, not implemented by each plugin.
plugins can share it and yours can be simpler. Eventually to the point
where output plugins are often simple format wrappers.
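
To illustrate the "simple format wrapper" end state, this is roughly
the minimum an output plugin needs today - an untested skeleton, all
names hypothetical:

#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"

PG_MODULE_MAGIC;

static void
my_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
	OutputPluginPrepareWrite(ctx, true);
	appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
	OutputPluginWrite(ctx, true);
}

static void
my_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
		  Relation relation, ReorderBufferChange *change)
{
	OutputPluginPrepareWrite(ctx, true);
	/* all the real work - metadata caches, filtering, type output - goes here */
	appendStringInfoString(ctx->out, "CHANGE");
	OutputPluginWrite(ctx, true);
}

static void
my_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
		  XLogRecPtr commit_lsn)
{
	OutputPluginPrepareWrite(ctx, true);
	appendStringInfo(ctx->out, "COMMIT %u", txn->xid);
	OutputPluginWrite(ctx, true);
}

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
	cb->begin_cb = my_begin;
	cb->change_cb = my_change;
	cb->commit_cb = my_commit;
}

Everything interesting lives in my_change(), and that's exactly the
part that wants the shared infrastructure discussed above.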

You might want to look at

* pglogical's output plugin; and
* bottled-water

for ideas about things that would benefit from shared infrastructure, and
ways to generalize it. I will be very happy to help there as time permits.


> As a side note, doing this would also help make logical decoding a
> useful feature for customers of Amazon and Google's built-in Postgres
> hosting options.
>

Colour me totally unconvinced there. Either, or both, can simply bless
out-of-tree plugins as it is; after all, they can and do patch the core
server freely too.

It'd *help* encourage them both to pick the same plugin, but that's about
it. And only if the plugin could satisfy their various constraints about no
true superuser access, etc.

I guess I'm a bit frustrated, because *I tried this*, and where was anyone
from Google or Amazon then? But now there's a new home-invented plugin that
we should adopt, ignoring any of the existing ones. Why?


https://github.com/apigee-labs/transicator/tree/master/pgoutput
>

No README?

Why did this need to be invented, rather than using an existing plugin?

I don't mind, I mean, it's great that you're using the plugin
infrastructure and using postgres. I'm just curious what bottled-water,
pglogical, etc lacked, what made you go your own way?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Craig Ringer
On 21 September 2017 at 16:56, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
> > On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> > <michael.paqu...@gmail.com> wrote:
> >> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
> >>> The bug can happen in PostgreSQL 9.1 or higher, where non-exclusive
> >>> backups were introduced, so we should back-patch to all
> >>> supported versions.
> >>
> >> There is a typo here, right? Non-exclusive backups were introduced
> >> in 9.6. Why would a back-patch further down be needed?
> >
> > I think the non-exclusive backup infrastructure was introduced
> > in 9.1 for pg_basebackup. I've not yet checked that it can be
> > reproduced using pg_basebackup in PG 9.1, but I thought it could happen
> > as far as I checked the code.
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.
>
> >> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> >> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
> >> +   XLogCtl->Insert.nonExclusiveBackups--;
> >> Hm, no, I don't agree. I think that instead you should just leave
> >> do_pg_abort_backup() immediately if sessionBackupState is set to
> >> SESSION_BACKUP_NONE. This variable is the link between the global
> >> counters and the session stopping the backup so I don't think that we
> >> should touch this assertion of this counter. I think that this method
> >> would be safe as well for backup start as pg_start_backup_callback
> >> takes care of any cleanup. Also because the counters are incremented
> >> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
> >> sessionBackupState is updated just after leaving the block.
> >
> > I think that the assertion failure still can happen if the process
> > aborts after decremented the counter and before setting to
> > SESSION_BACKUP_NONE. Am I missing something?
>
> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
> the cleanup at this moment. So this happens when waiting for the
> archives to be done, and the session flag is set to NONE at this
> point.
>

Another one to watch out for is that elog(...) and ereport(...) invoke
CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when
combined with assertion checking and various exit cleanup hooks.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-20 Thread Craig Ringer
On 21 September 2017 at 05:50, Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:

> On Thu, Sep 21, 2017 at 12:59 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Wed, Sep 20, 2017 at 5:54 AM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> >> By the way, dsa.c really needs a cross-reference to shm_toc.c and vice
> >> versa. With a hint as to when each is appropriate.
> >
> > /me blinks.
> >
> > Aren't those almost-entirely-unrelated facilities?
>
> I think I see what Craig means.
>
> 1.  A DSM segment works if you know how much space you'll need up
> front so that you can size it. shm_toc provides a way to exchange
> pointers into it with other backends in the form of shm_toc keys
> (perhaps implicitly, in the form of well known keys or a convention
> like executor node ID -> shm_toc key).  Examples: Fixed sized state
> for parallel-aware executor nodes, and fixed size parallel executor
> infrastructure.
>
> 2.  A DSA area is good if you don't know how much space you'll need
> yet.  dsa_pointer provides a way to exchange pointers into it with
> other backends.  Examples: A shared cache, an in-memory database
> object like Gaddam Sai Ram's graph index extension, variable sized
> state for parallel-aware executor nodes, the shared record typmod
> registry stuff.
>
> Perhaps confusingly we also support DSA areas inside DSM segments,
> there are DSM segments inside DSA areas.  We also use DSM segments as
> a kind of shared resource cleanup mechanism, and don't yet provide an
> equivalent for DSA.  I haven't proposed anything like that because I
> feel like there may be a better abstraction of reliable scoped cleanup
> waiting to be discovered (as I think Craig was also getting at).
>

Well said, and what I would've wanted to say if I could've figured it out
well enough to express it.

Hence needing some kind of README or cross reference to help people know
which facility/facilities are suitable for their needs... and actually
discover them.

(A hint on RequestAddinShmemSpace etc pointing to DSM + DSA would be good
too)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-20 Thread Craig Ringer
On 20 September 2017 at 17:52, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 20 September 2017 at 16:55, Thomas Munro <thomas.mu...@enterprisedb.com
> > wrote:
>
>> On Wed, Sep 20, 2017 at 6:14 PM, Gaddam Sai Ram
>> <gaddamsaira...@zohocorp.com> wrote:
>> > Thank you very much! That fixed my issue! :)
>> > I was in an assumption that pinning the area will increase its lifetime
>> but
>> > yeah after taking memory context into consideration its working fine!
>>
>> So far the success rate in confusing people who first try to make
>> long-lived DSA areas and DSM segments is 100%.  Basically, this is all
>> designed to ensure automatic cleanup of resources in short-lived
>> scopes.
>>
>
> 90% ;)
>
> I got it working with no significant issues for a long lived segment used
> to store a pool of shm_mq pairs used for a sort of "connection listener"
> bgworker. Though I only used DSM+ToC, not DSA.
>
>
By the way, dsa.c really needs a cross-reference to shm_toc.c and vice
versa. With a hint as to when each is appropriate.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-20 Thread Craig Ringer
On 20 September 2017 at 16:55, Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:

> On Wed, Sep 20, 2017 at 6:14 PM, Gaddam Sai Ram
> <gaddamsaira...@zohocorp.com> wrote:
> > Thank you very much! That fixed my issue! :)
> > I was under the assumption that pinning the area would increase its
> > lifetime, but yeah, after taking the memory context into consideration
> > it's working fine!
>
> So far the success rate in confusing people who first try to make
> long-lived DSA areas and DSM segments is 100%.  Basically, this is all
> designed to ensure automatic cleanup of resources in short-lived
> scopes.
>

90% ;)

I got it working with no significant issues for a long lived segment used
to store a pool of shm_mq pairs used for a sort of "connection listener"
bgworker. Though I only used DSM+ToC, not DSA. But TBH that may well be
luck, as I tend to routinely use memory contexts scoped to the operational
lifetime of a subsystem, making most problems like this just vanish without
my realising they were there in the first place. Usually.

I pretty much shamelessly cribbed from test_shm_mq for the ToC stuff
though. It's simple enough when you read it in use, but I'd be lucky to do
it without an example.

I had lots more problems with shm_mq than DSM. shm_mq is very obviously
designed for short-lived scopes, and falls down badly if you have a pool of
queues you want to re-use after the peer detaches. You have to track "in
use" flags separately to the shm_mq's own, because it doesn't clear its
stored PGPROC entries for receiver/sender on detach. Once you know neither
sender nor receiver is still attached, you can memset() the area and create
a new queue in it.

You can't just reset the queue for a new peer, and have to do quite a dance
to make sure it's safe to detach from, overwrite, re-create and re-attach to.
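
For the record, the recycling dance ends up looking something like
this untested sketch (names hypothetical), once you've proven to
yourself that both peers are gone:

#include "postgres.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

#define MY_QUEUE_SIZE 8192

typedef struct MyMqSlot
{
	bool		in_use;					/* our bookkeeping, not shm_mq's */
	char		space[MY_QUEUE_SIZE];	/* the shm_mq itself lives here */
} MyMqSlot;

static shm_mq *
recycle_queue(MyMqSlot *slot)
{
	shm_mq	   *mq;

	Assert(!slot->in_use);				/* caller verified both peers detached */
	memset(slot->space, 0, MY_QUEUE_SIZE);	/* stale PGPROC entries and all */
	mq = shm_mq_create(slot->space, MY_QUEUE_SIZE);
	shm_mq_set_receiver(mq, MyProc);	/* we're the new receiver... */
	slot->in_use = true;
	return mq;							/* ...the connecting peer sets the sender */
}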


> Good luck for your graph project.  I think you're going to have to
> expend a lot of energy trying to avoid memory leaks if your DSA lives
> as long as the database cluster, since error paths won't automatically
> free any memory you allocated in it.


Yeah, that's going to be hard. You might land up having lots and lots of
little DSM segments.



> There is a kind of garbage collection for palloc'd memory and
> also for other resources like file handles, but if you're using a big
> long lived DSA area you have nothing like that.


We need, IMO, a DSA-backed hierarchical MemoryContext system.

We can't use the exact MemoryContext API as-is due to the need for far
pointers though :(

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Utilities for quoting literals and identifiers in Pg TAP tests

2017-09-19 Thread Craig Ringer
Hi all

Here's a little utility class I wrote for value and identifier quoting for
use in TAP tests.

Might be handy for others.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


PGValues.pm
Description: Perl program

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Craig Ringer
On 20 September 2017 at 12:16, Craig Ringer <cr...@2ndquadrant.com> wrote:


> The thought I had in mind upthread was to get rid of logicalrep slots
>> in favor of expanding the underlying bgworker slot with some additional
>> fields that would carry whatever extra info we need about a logicalrep
>> worker.  Such fields could also be repurposed for additional info about
>> other kinds of bgworkers, when (not if) we find out we need that.
>>
>
> That sounds OK to me personally for in-core logical rep, but it's really
> Petr and Peter who need to have a say here, not me.
>
>
Actually, I take that back. It'd bloat struct BackgroundWorker
significantly, and lead to arguments whenever logical replication needed
new fields, which it surely will. Every bgworker would pay the price. If we
added some kind of union to struct BackgroundWorker, other worker types
could later use the same space, offsetting the cost somewhat. But that'd be
no use to out-of-core workers, workers that don't actually need the extra
room, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Craig Ringer
On 20 September 2017 at 12:06, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Wed, Sep 20, 2017 at 9:23 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Craig Ringer <cr...@2ndquadrant.com> writes:
> >> On 19 September 2017 at 18:04, Petr Jelinek <
> petr.jeli...@2ndquadrant.com>
> >> wrote:
> >>> If you are asking why they are not identified by the
> >>> BackgroundWorkerHandle, then it's because it's a private struct and can't
> >>> be shared with other processes, so there is no way to link the logical
> >>> worker info with bgworker directly.
> >
> >> I really want BackgroundWorkerHandle to be public, strong +1 from me.
> >
> > I'm confused about what you think that would accomplish.  AFAICS, the
> > point of BackgroundWorkerHandle is to allow the creator/requestor of
> > a bgworker to verify whether or not the slot in question is still
> > "owned" by its request.
> >
>
> Right, but can we avoid maintaining additional information (in_use,
> generation, ...) in LogicalRepWorker which is similar to the bgworker
> machinery (which in turn can also avoid all the housekeeping for those
> variables) if we have access to BackgroundWorkerHandle?
>

As far as I can see, probably not.

Because struct BackgroundWorker only has a single by-value Datum argument,
you can't pass much more information to a bgworker than "here's a slot
number to look up what you need to know to configure yourself". That can be
a ToC position for a shm_mq, or some kind of shmem array, but you need
something. You can't pass a pointer because of ASLR and EXEC_BACKEND. If
the launcher lives significantly longer than its workers, where a worker
dying isn't fatal to the launcher, it needs to be able to re-use slots in
the fixed-size shmem array workers use to self-configure. Which means some
kind of generation counter, like we have in BackgroundWorkerHandle, and an
in_use flag.
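
Concretely, the pattern looks something like this untested sketch (all
names hypothetical):

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/* launcher side: point the new worker at slot 'slot' of our shmem array */
static bool
launch_worker_for_slot(int slot, BackgroundWorkerHandle **handle)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "my worker %d", slot);
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_ext");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
	worker.bgw_main_arg = Int32GetDatum(slot);	/* the one by-value arg */
	worker.bgw_notify_pid = MyProcPid;

	return RegisterDynamicBackgroundWorker(&worker, handle);
}

/* worker side: everything else has to be looked up via that slot */
void
my_worker_main(Datum main_arg)
{
	int			slot = DatumGetInt32(main_arg);

	/* attach to shmem, then read my_shared_array[slot] to configure */
}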

Knowing a logical worker's bgworker has started isn't
enough. WaitForReplicationWorkerAttach needs to know it has come up
successfully as a logical replication worker. To do that, it needs to know
that the entry it's looking at in LogicalRepWorker isn't for some prior
logical replication worker that previously had the same LogicalRepWorker
slot, though not necessarily the same BackgroundWorker slot. There's no 1:1
mapping of LogicalRepWorker slot to BackgroundWorker slot to rely on.

So unless we're willing to extend struct BackgroundWorker with some kind of
union that can be used to store extra info for logical rep workers and
whatever else we might later want, I don't see LogicalRepWorker going away.
If we do that, it'll make extending the shmem for logical replication
workers harder since it'll grow the entry for every background worker, not
just logical replication workers.

Parallel query gets away without most of this because as far as I can see
parallel workers don't need to enumerate each other or look up each others'
state, which logical rep can need to do for things like initial table sync.
It doesn't appear to re-launch workers unless it has a clean slate and can
clobber the entire parallel DSM context, either. So it doesn't need to
worry about whether the worker it's talking to is the one it just launched,
or some old worker that hasn't gone away yet. The DSM segment acts as a
generation counter of sorts, with all workers in the same generation.


If we wanted to do away with LogicalRepWorker shmem slots, I suspect we'd
have to broker all inter-worker communications via shm_mq pairs, where
worker A asks the launcher for the status of worker B, or to block until
worker B does X, or whatever. Do everything over shm_mq messaging not shmem
variables. That's a pretty major change, and makes the launcher responsible
for all synchronisation and IPC. It also wouldn't work properly unless we
nuked the DSM segment containing all the queues whenever any worker died
and restarted the whole lot, which would be terribly costly in terms of
restarting transaction replays etc. Or we'd have to keep some kind of
per-shm_mq-pair "in use" flag so we knew when to nuke and reset the queue
pair for a new worker to connect, at which point we're halfway back to
where we started...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Craig Ringer
On 20 September 2017 at 11:53, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Craig Ringer <cr...@2ndquadrant.com> writes:
> > On 19 September 2017 at 18:04, Petr Jelinek <
> petr.jeli...@2ndquadrant.com>
> > wrote:
> >> If you are asking why they are not identified by the
> >> BackgroundWorkerHandle, then it's because it's a private struct and can't
> >> be shared with other processes, so there is no way to link the logical
> >> worker info with bgworker directly.
>
> > I really want BackgroundWorkerHandle to be public, strong +1 from me.
>
> I'm confused about what you think that would accomplish.  AFAICS, the
> point of BackgroundWorkerHandle is to allow the creator/requestor of
> a bgworker to verify whether or not the slot in question is still
> "owned" by its request.  This is necessarily not useful to any other
> process, since they didn't make the request.
>

I'm using shm_mq in a sort of "connection accepter" role, where a pool of
shm_mq's are attached to by a long lived bgworker. Other backends, both
bgworkers and normal user backends, can find a free slot and attach to it
to talk to the long lived bgworker. These other backends are not
necessarily started by the long lived worker, so it doesn't have a
BackgroundWorkerHandle for them.

Currently, if the long-lived bgworker dies and a peer attempts to attach to
the slot, it'll hang forever in shm_mq_wait_internal, because it cannot
use shm_mq_set_handle as described in
https://www.postgresql.org/message-id/E1XbwOs-0002Fd-H9%40gemulon.postgresql.org
to protect itself.

See also thread
https://www.postgresql.org/message-id/CAMsr%2BYHmm%3D01LsuEYR6YdZ8CLGfNK_fgdgi%2BQXUjF%2BJeLPvZQg%40mail.gmail.com
.

If the BackgroundWorkerHandle for the long-lived bgworker is copied to a
small static control shmem segment, the connecting workers can use that to
reliably bail out if the long-lived worker dies.


> The thought I had in mind upthread was to get rid of logicalrep slots
> in favor of expanding the underlying bgworker slot with some additional
> fields that would carry whatever extra info we need about a logicalrep
> worker.  Such fields could also be repurposed for additional info about
> other kinds of bgworkers, when (not if) we find out we need that.
>

That sounds OK to me personally for in-core logical rep, but it's really
Petr and Peter who need to have a say here, not me.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Show backtrace when tap tests fail

2017-09-19 Thread Craig Ringer
On 20 September 2017 at 06:36, David Steele <da...@pgmasters.net> wrote:

>
> I just use:
>
> $SIG{__DIE__} = sub {Carp::confess @_};
>

That's what I patched into my TestLib.pm too, until I learned of
Carp::Always.

I'd rather have Carp::Always, but it's definitely an OK fallback.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Craig Ringer
On 19 September 2017 at 20:33, Amit Kapila <amit.kapil...@gmail.com> wrote:


> Yeah, but you could have used the way we do it for parallel query, where
> we set up a DSM and share all such information.  You can check the logic
> of execParallel.c and parallel.c to see how we do all such stuff for
> parallel query.


Parallel query has a very clearly scoped lifetime, which seems to help. The
parallel workers are started by a leader, whose lifetime entirely
encompasses that of the workers. They're strictly children of the leader
process.

In logical replication, the logical replication manager doesn't start the
walsenders, they're started by the postmaster in response to inbound
connections.

But the logical replication launcher does start the managers, and the
managers start the apply workers (IIRC). If we don't mind the whole thing
dying and restarting if the launcher dies, or workers for a db dying if a
database manager dies, then using dsm segments and detach notifications
does seem viable.

IIRC when we did something similar in pglogical we ran into problems with
(a) inability to handle an ERROR in a bgworker without exiting and being
restarted by the postmaster; and (b) the postmaster remembering bgworker
registrations across crash restart with no way to tell it not to. Maybe
Petr remembers the details?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Craig Ringer
On 19 September 2017 at 18:04, Petr Jelinek <petr.jeli...@2ndquadrant.com>
wrote:


>
> If you are asking why they are not identified by the
> BackgroundWorkerHandle, then it's because it's a private struct and can't
> be shared with other processes, so there is no way to link the logical
> worker info with bgworker directly.


I really want BackgroundWorkerHandle to be public, strong +1 from me.

It'd be very beneficial when working with shm_mq, too. Right now you cannot
pass a BackgroundWorkerHandle through shmem to another process, either via
a static shmem region or via shm_mq. This means you can't supply it to
shm_mq_attach and have shm_mq handle lifetime issues for you based on the
worker handle.

TBH I think there's a fair bit of work needed in the bgworker
infrastructure, but making BackgroundWorkerHandle public is a small and
simple start that'd be a big help.

At some point we'll also want to be able to enumerate background workers
and get handles for existing workers. Also, let background workers recover
from errors without exiting, which means factoring a bunch of stuff out of
PostgresMain. But both of those are bigger jobs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-09-14 Thread Craig Ringer
On 14 September 2017 at 19:57, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:


>
> Are you planning to work on the review comments from Robert and Moon
> Insung, and supply a new patch? I just had a quick glance at this
> mail thread (after a long time) and could understand Robert's concern
> to some extent. I think he is trying to say that if a tuple is
> frozen (committed|invalid) then it shouldn't be shown as COMMITTED and
> INVALID together; in fact, it should just be displayed as a FROZEN tuple.


Yes, I'd like to, and should have time for it in this CF.

My plan is to emit raw flags by default, so FROZEN wouldn't be shown at all,
only COMMITTED|INVALID. If the bool to decode combined flags is set, then
it'll show things like FROZEN, and hide COMMITTED|INVALID. Similar for
other combos.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-13 Thread Craig Ringer
On 13 September 2017 at 13:44, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:


> Thanks for explaining. Will change this too in next version.
>
>
Thank you, a lot, for picking up this patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-12 Thread Craig Ringer
On 13 September 2017 at 13:06, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:

>
>
> On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <and...@anarazel.de> wrote:
>
>>
>>
>>
>> > Am failing to see the benefit in allowing user to set
>> > PQBatchAutoFlush(true|false) property? Is it really needed?
>>
>> I'm inclined not to introduce that for now. If somebody comes up with a
>> convincing usecase and numbers, we can add it later. Libpq API is set in
>> stone, so I'd rather not introduce unnecessary stuff...
>>
>>
> Thanks for reviewing the patch and yes ok.
>
>
>>
>>
>> > +   
>> > +Much like asynchronous query mode, there is no performance
>> disadvantage to
>> > +using batching and pipelining. It increases client application
>> complexity
>> > +and extra caution is required to prevent client/server deadlocks
>> but
>> > +can sometimes offer considerable performance improvements.
>> > +   
>>
>> That's not necessarily true, is it? Unless you count always doing
>> batches of exactly size 1.
>>
>
> Client application complexity is increased in batch mode,because
> application needs to remember the query queue status. Results processing
> can be done at anytime, so the application needs to know till what query,
> the results are consumed.
>
>

Yep. Also, the client/server deadlocks at issue here are a buffer
management issue, and deadlock is probably not exactly the right word. Your
app has to process replies from the server while it's sending queries,
otherwise it can get into a state where it has no room left in its send
buffer, but the server isn't consuming its receive buffer because the
server's send buffer is full. To allow the system to make progress, the
client must read from the client receive buffer.

This isn't an issue when using libpq normally.

PgJDBC has similar issues with its batch mode, but in PgJDBC it's much
worse because there's no non-blocking send available. In libpq you can at
least set your sending socket to non-blocking.
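
For illustration, the client pump loop has to look something like the
sketch below. Only stock libpq non-blocking primitives are used;
have_queries_to_send(), have_results_pending(), handle_result(),
dispatch_next_query() and handle_fatal() are application-defined
placeholders, and the patch's batch entry points (PQbatchSyncQueue and
friends) are elided since their names are still under discussion:

#include <sys/select.h>
#include <libpq-fe.h>

static void
pump(PGconn *conn)
{
    PQsetnonblocking(conn, 1);

    while (have_queries_to_send() || have_results_pending())
    {
        int     sock = PQsocket(conn);
        fd_set  rfds, wfds;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &rfds);
        if (PQflush(conn) == 1 || have_queries_to_send())
            FD_SET(sock, &wfds);

        select(sock + 1, &rfds, &wfds, NULL, NULL);

        /*
         * Drain the receive side first. This is what lets the server
         * empty its send buffer and get back to reading our queries.
         */
        if (PQconsumeInput(conn) == 0)
            handle_fatal(PQerrorMessage(conn));     /* app-defined */
        while (!PQisBusy(conn))
        {
            PGresult   *res = PQgetResult(conn);

            if (res == NULL)
                break;              /* caught up for now */
            handle_result(res);     /* app-defined; PQclear()s res */
        }

        /* Only then try to push more work down the pipe */
        if (have_queries_to_send())
            dispatch_next_query(conn);
    }
}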



>
> > +   
>> > +Use batches when your application does lots of small
>> > +INSERT, UPDATE and
>> > +DELETE operations that can't easily be
>> transformed into
>> > +operations on sets or into a
>> > +COPY
>> operation.
>> > +   
>>
>> Aren't SELECTs also a major beneficiarry of this?
>>
>
Yes, many individual SELECTs that cannot be assembled into a single more
efficient query would definitely also benefit.


> Hmm, though SELECTs also benefit from batch mode, doing multiple selects
> in batch mode will fill up the memory rapidly and might not be as
> beneficial as other operations listed.
>

Depends on the SELECT. With wide results you'll get less benefit, but even
then you can gain if you're on a high latency network. With "n+1" patterns
and similar, you'll see huge gains.


> Maybe note that multiple batches can be "in flight"?
>> I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
>> have a great idea, but we might want to rename...
>>
>>
> This function not only does error handling, but also sends the "Sync"
> message to backend. In batch mode, "Sync" message is not sent with every
> query but will
> be sent only via this function to mark the end of implicit transaction.
> Renamed it to PQbatchCommitQueue. Kindly let me know if you think of any
> other better name.
>

I really do not like calling it "commit" as that conflates with a database
commit.

A batch can embed multiple BEGINs and COMMITs, so it's entirely possible for
an earlier part of the batch to succeed and commit, then a later part to
fail. So that name is IMO wrong.


>>
>> > +
>> > + 
>> > +  PQbatchSyncQueue
>> > +  
>> > +   PQbatchSyncQueue
>> > +  
>> > + 
>>
>> I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
>> to mostly make sense from a protocol POV.
>>
>>
> Renamed to PQbatchCommitQueue.
>
>
Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?


>
>> > + *   Put an idle connection in batch mode. Commands submitted after
>> this
>> > + *   can be pipelined on the connection, there's no requirement to
>> wait for
>> > + *   one to finish before the next is dispatched.
>> > + *
>> > + *   Queuing of new query or syncing during COPY is not allowed.
>>
>> +"a"?
>>
>
> Hmm, Can you explain the question please. I don't understand.
>

s/of new query/of a new query/


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2017-09-06 Thread Craig Ringer
On 14 August 2017 at 11:56, Craig Ringer <cr...@2ndquadrant.com> wrote:

>
> I don't want to block failover slots on decoding on standby just because
> decoding on standby would be nice to have.
>

However, during discussion with Thomas Munro a point has come up that does
block failover slots as currently envisioned - silent timeline divergence.
It's a solid reason why the current design and implementation are
insufficient to solve the problem. This issue exists both with the original
failover slots and with the model Robert and I were discussing.

Say a decoding client has replayed from master up to commit of xid 42 at
1/1000 and confirmed flush, then a failover slots standby of the master is
promoted. The standby has only received WAL from the failed master up to
1/500 with most recent xid 20. Now the standby does some other new xacts,
pushing xid up to 30 at 1/1000 then continuing to insert until xid 50 at
lsn 1/2000.

Then the logical client reconnects. The logical client will connect to the
failover slot fine, and start replay. But it'll ask for replay to start at
1/1000. The standby will happily fast-forward the slot (as it should), and
start replay after 1/1000.

But now we have silent divergence in timelines. The logical replica has
received and committed xacts 20...42 at lsn 1/500 through 1/1000, but these
are not present on the promoted master. And the replica has skipped over
the new-master's xids 20...30 with lsns 1/500 through 1/1000, so they're
present on the new master but not the replica.

IMO, this shows that not including the timeline in replication origins was
a bit of a mistake, since we'd trivially detect this if they were included
- but it's a bit late now.  And anyway, detection would just mean logical
rep would break, which doesn't help much.

The simplest fix, but rather limited, is to require that failover
candidates be in synchronous_standby_names, and delay ReorderBufferCommit
sending the actual commit message until all peers in s_s_n confirm flush of
the commit lsn. But that's not much good if you want sync rep for your
logical connections too, and is generally a hack.

A more general solution requires that masters be told which peers are
failover candidates, so they can ensure ordering between logical decoding
and physical failover candidates. Which effectively adds another kind of
sync rep, where we do "wait for physical failover candidates to flush, and
only then allow logical decoding". This actually seems pretty practical
with the design Robert and I discussed, but it's definitely an expansion in
scope.

Alternately, we could require the decoding clients to keep an eye on the
flush/replay positions of all failover candidates and delay commit+confirm
of decoded xacts until the upstream's failover candidates have received and
flushed up to that lsn. That starts to look a lot like a decoding-on-standby
based model for logical failover, where the downstream maintains
slots on each failover candidate upstream.

So yeah. More work needed here. Even if we suddenly decided the original
failover slots model was OK, it's not sufficient to fully solve the problem.

(It's something I'd thought about for BDR failover, but never applied to failover
slots: the problem of detecting or preventing divergence when the logical
client is ahead of physical receive at the time the physical standby is
promoted.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Fix drop replication slot blocking instead of returning error

2017-08-29 Thread Craig Ringer
On 29 August 2017 at 22:02, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2017-08-29 13:42:05 +0200, Simone Gotti wrote:
> > On Tue, Aug 29, 2017 at 12:13 PM, Alvaro Herrera
> > <alvhe...@2ndquadrant.com> wrote:
> > >
> >
> > Hi Alvaro,
> >
> > > Simone Gotti wrote:
> > > > Hi all,
> > > >
> > > > I noticed that in postgres 10beta3, calling pg_drop_replication_slot
> on an
> > > > active slot will block until it's released instead of returning an
> error
> > > > like
> > > > done in pg 9.6. Since this is a change in the previous behavior and
> the docs
> > > > wasn't changed I made a patch to restore the previous behavior.
> > >
> > > Changing that behavior was the entire point of the cited commit.
> >
> > Sorry, I was thinking that the new behavior was needed for internal
> > future functions since the doc wasn't changed.
>
> FWIW, I also don't think it's ok to just change the behaviour
> unconditionally and without a replacement for existing behaviour.


Seems like it just needs a new argument, nowait DEFAULT false.


 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Custom allocators in libpq

2017-08-28 Thread Craig Ringer
On 29 August 2017 at 05:15, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> > On 8/28/17 15:11, Tom Lane wrote:
> >> ... but it seems like you're giving up a lot of the possible uses if
> >> you don't make it apply uniformly.  I admit I'm not sure how we'd handle
> >> the initial creation of a connection object with a custom malloc.  The
> >> obvious solution of requiring the functions to be specified at PQconnect
> >> time seems to require Yet Another PQconnect Variant, which is not very
> >> appetizing.
>
> > I would have expected a separate function just to register the callback
> > functions, before doing anything else with libpq.  Similar to libxml:
> > http://xmlsoft.org/xmlmem.html
>
> I really don't much care for libxml's solution, because it implies
> global variables, with the attendant thread-safety issues.  That's
> especially bad if you want a passthrough such as a memory context
> pointer, since it's quite likely that different call sites would
> need different passthrough values, even assuming that a single set
> of callback functions would suffice for an entire application.
> That latter assumption isn't so pleasant either.  One could expect
> that by using such a solution, postgres_fdw could be expected to
> break, say, a libpq-based DBI library inside plperl.


Yeah, the 'register a malloc() function pointer in a global via a
registration function call' approach seems fine and dandy until you find
yourself with an app that, via shared library loads, has more than one
different user of libpq with its own ideas about memory allocation.

RTLD_LOCAL can help, but may introduce other issues.

So there doesn't seem to be much way around another PQconnect variant. Yay? We
could switch to a struct-passing argument model, but by the time you add
the necessary "nfields" argument to allow you to know how much of the
struct you can safely access, etc, just adding new connect functions starts
to look good in comparison.

Which reminds me, it kind of stinks that PQconnectdbParams and PQpingParams
accept key and value char* arrays, but PQconninfoParse produces
a PQconninfoOption*. This makes it seriously annoying when you want to
parse a connstring, make some transformations and pass it to a connect
function. I pretty much always just put the user's original connstring in
'dbname' and set expand_dbname = true instead.

It might make sense to have any new function accept PQconninfoOption*. Or a
variant of PQconninfoParse that populates k/v arrays with 'n' extra fields
allocated and zeroed on return, I guess.
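
For the record, the dance currently looks something like this (a sketch
with error handling omitted; the injected application_name is just an
example transformation):

#include <stdlib.h>
#include <libpq-fe.h>

static PGconn *
connect_transformed(const char *connstr)
{
    char       *err = NULL;
    PQconninfoOption *opts = PQconninfoParse(connstr, &err);
    const char **keys;
    const char **vals;
    int         i, j, n = 0;
    PGconn     *conn;

    for (i = 0; opts[i].keyword != NULL; i++)
        if (opts[i].val != NULL)
            n++;

    /* room for one injected option plus the NULL terminators */
    keys = malloc(sizeof(char *) * (n + 2));
    vals = malloc(sizeof(char *) * (n + 2));

    for (i = 0, j = 0; opts[i].keyword != NULL; i++)
    {
        if (opts[i].val == NULL)
            continue;
        keys[j] = opts[i].keyword;
        vals[j] = opts[i].val;
        j++;
    }
    keys[j] = "application_name";   /* the "transformation" */
    vals[j] = "my_tool";
    keys[j + 1] = NULL;
    vals[j + 1] = NULL;

    conn = PQconnectdbParams(keys, vals, 0);

    free(keys);
    free(vals);
    PQconninfoFree(opts);
    return conn;
}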

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Make pg_regress print a connstring with sockdir

2017-08-28 Thread Craig Ringer
On 28 August 2017 at 19:45, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Craig Ringer <cr...@2ndquadrant.com> writes:
> > It's a pain having to find the postmaster command line to get the port
> > pg_regress started a server on. We print the port in the pg_regress
> output,
> > why not the socket directory / host?
>
> I'm not following the point here.  The test postmaster isn't really
> going to be around long enough to connect to it manually.  If you
> want to do that, you should be using "installcheck", and then the
> problem doesn't arise.
>
> The reason for printing the port number, if memory serves, is to
> aid in debugging port selection conflicts.  That doesn't really
> apply for temporary socket directories; we're expecting libc to
> avoid any conflicts there.
>

I'm frequently debugging postmasters that are around long enough.
Deadlocks, etc.

It's also way easier to debug shmem related issues with a live postmaster
vs a core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] psql --batch

2017-08-28 Thread Craig Ringer
On 28 August 2017 at 16:23, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> This doesn't really address the original issue though, that it's far from
>> obvious how to easily and correctly script psql.
>>
>
> That is another interesting argument. I understood that the issue was
> having to type these options, but now it is also to remember which one are
> relevant and wanted, which is a little different and more justifiable as an
> option.
>
> On that account, ISTM that '|' as a field separator is debatable, that
> pager should be turned off... and maybe a few other things.
>


Good point re pager, though it's turned off automatically when stdout isn't
a terminal, so in practice that'll only matter if you're using something
like 'expect' that uses a pty. Still worth doing.

I agree that | is a bit iffy, but so's anything really. Null bytes aren't
usable for all scripts, and anything else can also appear in the data
itself. No easy answers there. In cases where I expect that to be an issue I
sometimes use \COPY ... TO STDOUT WITH (FORMAT CSV) though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] psql --batch

2017-08-28 Thread Craig Ringer
On 28 August 2017 at 15:34, Pavel Stehule <pavel.steh...@gmail.com> wrote:

>
>
> 2017-08-28 9:33 GMT+02:00 Fabien COELHO <coe...@cri.ensmp.fr>:
>
>>
>> ISTM that the real pain is the "-v ON_ERROR_STOP=1" which I occasionally
>>>> encountered as well, the other one letter ones are not too bad. Maybe it
>>>> would be enough to have a shortcut for this one, say "-B"?
>>>>
>>>
>>> I agree with last sentence. I don't think so -qAtX are expected always,
>>> but
>>> "-v ON_ERRORS_STOP=1" is pretty often.
>>>
>>
>> Yep. I often "\set" that in the script.
>>
>> What do you think about long option "--on-errors-stop" ?
>>>
>>
>> It does not really relieve the typing pain Craig is complaining about,
>> but it would be ok as the long option version if -B is added, and it is
>> auto-completion friendly.
>
>
>
This doesn't really address the original issue though, that it's far from
obvious how to easily and correctly script psql.

I guess there's always the option of a docs patch for that. *shrug*

I'll see what others have to say.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Make pg_regress print a connstring with sockdir

2017-08-28 Thread Craig Ringer
On 28 August 2017 at 15:19, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Aug 28, 2017 at 4:07 PM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > == starting postmaster==
> > running with PID 30235; connect with:
> >   psql "host='/tmp/pg_regress-j74yFE' port=50848 dbname='regression'"
> > == creating database "regression" ==
>
> Sorry if my words were confusing and have cost you three minutes of
> development. I like better the one-line version :)
> Now a socket path could be quite long. I can live with that personally.
>

I'm not fussed, I just think we should show it one way or the other.

One nice thing about the two line form is that you can
double-click/middle-click to open a new psql in the pg_regress session
pretty much instantly.




-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Make pg_regress print a connstring with sockdir

2017-08-28 Thread Craig Ringer
== starting postmaster==
running with PID 30235; connect with:
  psql "host='/tmp/pg_regress-j74yFE' port=50848 dbname='regression'"
== creating database "regression" ==


On 28 August 2017 at 14:08, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Aug 28, 2017 at 2:28 PM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > It's a pain having to find the postmaster command line to get the port
> > pg_regress started a server on. We print the port in the pg_regress
> output,
> > why not the socket directory / host?
> >
> > How about
> > running on 'port=50848 host=/tmp/pg_regress-UMrcT3' with PID 16409
> >
> > per the attached?
> >
> > If you'd prefer nicer wording at the expense of two lines, maybe
> >
> > running with PID 16409
> > connection string: 'port=50848 host=/tmp/blah'
>
> Yeah, I think that this is a good idea.
> --
> Michael
>



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From c3613d4526ba04fb18011e2af1923b9a167effec Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Mon, 28 Aug 2017 13:28:05 +0800
Subject: [PATCH v2] Show sockdir/hostname in pg_regress startup output

---
 src/test/regress/pg_regress.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..4cccefc 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2438,8 +2438,10 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 #else
 #define ULONGPID(x) (unsigned long) (x)
 #endif
-		printf(_("running on port %d with PID %lu\n"),
-			   port, ULONGPID(postmaster_pid));
+		printf(_("running with PID %lu; connect with:\n"),
+ 			   ULONGPID(postmaster_pid));
+		printf(gettext_noop("  psql \"host='%s' port=%d dbname='%s'\"\n"),
+			   hostname ? hostname : sockdir, port, dblist->str);
 	}
 	else
 	{
-- 
2.9.5




Re: [HACKERS] psql --batch

2017-08-28 Thread Craig Ringer
On 28 August 2017 at 14:56, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> I find myself regurgitating the incantation
>>
>> psql -qAtX -v ON_ERROR_STOP=1
>>
>> quite a bit. It's not ... super friendly.
>>
>> It strikes me that we could possibly benefit from a 'psql --batch' option.
>>
>> Thoughts?
>>
>
> The link between -qAtX and "batch" is not that fully obvious, especially
> the unaligned tuples-only part. If so, why not some -F  as well?
>
>
q: quiet

Pretty much always wanted for a batch mode run of anything.

A: unaligned tuples

Because alignment is pretty much never useful when you're parsing result
sets with scripting (splitting, cut, etc) and just makes everything harder.
The alignment psql uses isn't fixed, after all.

t: tuples-only

Headers just make everything more annoying to parse, and force you to do
extra work to skip them. If you're running batch code you know the headers
because you used a column-list form SELECT, or should've. You're unlikely
to be ingesting them and using them to split up the tuple anyway. I think
this one is a bit more arguable than the first two, though, as I can at
least think of some cases where you might want it.

X: skip .psqlrc

Reliable, portable scripted psql shouldn't be using the local .psqlrc IMO.
It's likely to just break things in exciting ways. But I can see it being
reasonable to require this option to be supplied separately and just
document it as "recommended" with --batch.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Make pg_regress print a connstring with sockdir

2017-08-27 Thread Craig Ringer
Hi all

It's a pain having to find the postmaster command line to get the port
pg_regress started a server on. We print the port in the pg_regress output,
why not the socket directory / host?

How about

running on 'port=50848 host=/tmp/pg_regress-UMrcT3' with PID 16409

per the attached?

If you'd prefer nicer wording at the expense of two lines, maybe

running with PID 16409
connection string: 'port=50848 host=/tmp/blah'

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5b242d9af911a88c084a74bf49904b94117347c1 Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Mon, 28 Aug 2017 13:28:05 +0800
Subject: [PATCH v1] Show sockdir/hostname in pg_regress startup output

---
 src/test/regress/pg_regress.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..11931db 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2438,8 +2438,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 #else
 #define ULONGPID(x) (unsigned long) (x)
 #endif
-		printf(_("running on port %d with PID %lu\n"),
-			   port, ULONGPID(postmaster_pid));
+		printf(_("running on 'port=%d host=%s' with PID %lu\n"),
+			   port, hostname ? hostname : sockdir,
+ 			   ULONGPID(postmaster_pid));
 	}
 	else
 	{
-- 
2.9.5




[HACKERS] psql --batch

2017-08-27 Thread Craig Ringer
Hi all

I find myself regurgitating the incantation

psql -qAtX -v ON_ERROR_STOP=1

quite a bit. It's not ... super friendly.

It strikes me that we could possibly benefit from a 'psql --batch' option.

Thoughts?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Separate log file for extension

2017-08-27 Thread Craig Ringer
On 25 August 2017 at 15:12, Antonin Houska <a...@cybertec.at> wrote:

> Attached is a draft patch to allow extension to write log messages to a
> separate file.


I like the idea a lot. I'm not so sure about the approach.

How will this play with syslog? csvlog? etc?

I wonder if a level of indirection is appropriate here, where extensions
(or postgres subsystems, even) provide a log stream label. Then the logging
backend takes care of using that appropriately for the logging mechanism in
use; for logging to file that'd generally be separate files.  Same for
CSVlog. Other mechanisms could be left as stubs initially.

So the outcome would be the same, just without the assumption of specific
file name and output mechanism baked in.
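
To sketch the shape of that indirection (every name below is
hypothetical; nothing like this exists yet):

/* HYPOTHETICAL sketch only; none of these functions exist today */
static int  my_stream;

void
_PG_init(void)
{
    /*
     * The logging backend maps the label onto whatever suits the log
     * destination in use: a separate file, a CSV column, a syslog
     * facility, ...
     */
    my_stream = RegisterLogStream("my_extension");
}

...
ereport_stream(my_stream, LOG,
               (errmsg("background job finished")));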

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-21 Thread Craig Ringer
On 21 August 2017 at 21:44, Robert Haas <robertmh...@gmail.com> wrote:


> While this would work, I don't really see the need for it given the
> availability of nonblocking operations.  See mq_putmessage() for an
> example.
>

Makes sense, and I'm not especially concerned. If the expected solution to
such usage is to use non-blocking calls, that's fine with me.

I partly wanted to put this out there to help the next person looking into
it. Or myself, when I've forgotten and go looking again ;). But also, to
ensure that this was in fact fully expected behaviour, not an oversight in
applying shm_mq to non-bgworker endpoints.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-20 Thread Craig Ringer
On 21 August 2017 at 10:57, Craig Ringer <cr...@2ndquadrant.com> wrote:

> Hi all
>
> I've noticed a possible bug / design limitation where shm_mq_wait_internal
> can sleep in a latch wait forever, and the postmaster gets stuck waiting for
> the bgworker the wait is running in to exit.
>
> This happens when the shm_mq does not have an associated bgworker handle
> registered because the other end is not known at mq creation time or is a
> normal backend not a bgworker. So a BGW handle cannot be passed.
>
> shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
> interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
> way; it just merrily resets its latch and keeps looping.
>
> It will bail out correctly on SIGQUIT.
>
> If the proc waiting to attach was known at queue creation time and was a
> bgworker, we'd pass a bgworker handle and the mq would notice it failed to
> start and stop waiting. There's only a problem if no bgworker handle can be
> supplied.
>
> The underlying problem is that CHECK_FOR_INTERRUPTS() doesn't care about
> SIGTERM or have any way to test for it. And we don't have any global
> management of SIGTERM like we do SIGQUIT so the shm_mq_wait_internal loop
> can't test for it.
>
> The only ways I can see to fix this are:
>
> * Generalize SIGTERM handling across postgres, so there's a global
> "got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
> loop, and every backend's signal handler must set it. Lots of churn.
>
> * In a proc's signal handler, use globals set before entry and after exit
> from shm_mq operations to detect if we're currently in shm_mq and promote
> SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
> CHECK_FOR_INTERRUPTS() will notice when the handler returns.
>
> * Allow passing of a *bool that tests for SIGTERM, or a function pointer
> called on each iteration to test whether looping should continue, to be
> passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
> that instead. Provide a shm_mq_set_handle equivalent for it too.
>
> Any objections to the last approach?
>

BTW, you can work around it in extension code for existing versions with
something like this in your bgworker:

volatile bool in_shm_mq = false;

void
my_handle_sigterm(SIGNAL_ARGS)
{
    ...

    if (in_shm_mq)
    {
        /*
         * shm_mq can get stuck in shm_mq_wait_internal on SIGTERM; see
         * https://www.postgresql.org/message-id/CAMsr+YHmm=01lsueyr6ydz8clgfnk_fgdgi+qxujf+jelpv...@mail.gmail.com
         *
         * To work around this we keep track of whether we're in shm_mq
         * and generate a fake interrupt for CHECK_FOR_INTERRUPTS() to
         * process if so.
         *
         * The guard around in_shm_mq may not be necessary, but without
         * it any SIGTERM will likely cause ereport(FATAL) with
         * "terminating connection due to administrator command",
         * which isn't ideal.
         */
        InterruptPending = true;
        ProcDiePending = true;
    }
}

static inline shm_mq_handle *
myext_shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)
{
    shm_mq_handle *ret;

    in_shm_mq = true;
    ret = shm_mq_attach(mq, seg, handle);
    in_shm_mq = false;

    return ret;
}

/* repeat the wrapper for shm_mq_receive, shm_mq_send, shm_mq_sendv and
   shm_mq_wait_for_attach */


You can alternatively use non-blocking sends and sleep on your own
latch, doing the same work as shm_mq_wait_internal yourself.
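
Something like this sketch: got_sigterm is assumed to be set by your
SIGTERM handler (which must also SetLatch(MyLatch)), and mqh, len and
data are as in the wrappers above.

for (;;)
{
    shm_mq_result res = shm_mq_send(mqh, len, data, true);     /* nowait */
    int         rc;

    if (res == SHM_MQ_SUCCESS)
        break;
    if (res == SHM_MQ_DETACHED)
        ereport(ERROR,
                (errmsg("peer detached from shared memory queue")));

    /* SHM_MQ_WOULD_BLOCK: sleep until woken, then retry */
    rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0,
                   PG_WAIT_EXTENSION);
    ResetLatch(MyLatch);
    if (rc & WL_POSTMASTER_DEATH)
        proc_exit(1);
    if (got_sigterm)
        proc_exit(0);
    CHECK_FOR_INTERRUPTS();
}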

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-20 Thread Craig Ringer
Hi all

I've noticed a possible bug / design limitation where shm_mq_wait_internal
can sleep in a latch wait forever, and the postmaster gets stuck waiting for
the bgworker the wait is running in to exit.

This happens when the shm_mq does not have an associated bgworker handle
registered because the other end is not known at mq creation time or is a
normal backend not a bgworker. So a BGW handle cannot be passed.

shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
way; it just merrily resets its latch and keeps looping.

It will bail out correctly on SIGQUIT.

If the proc waiting to attach was known at queue creation time and was a
bgworker, we'd pass a bgworker handle and the mq would notice it failed to
start and stop waiting. There's only a problem if no bgworker handle can be
supplied.

The underlying problem is that CHECK_FOR_INTERRUPTS() doesn't care about
SIGTERM or have any way to test for it. And we don't have any global
management of SIGTERM like we do SIGQUIT so the shm_mq_wait_internal loop
can't test for it.

The only ways I can see to fix this are:

* Generalize SIGTERM handling across postgres, so there's a global
"got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
loop, and every backend's signal handler must set it. Lots of churn.

* In a proc's signal handler, use globals set before entry and after exit
from shm_mq operations to detect if we're currently in shm_mq and promote
SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
CHECK_FOR_INTERRUPTS() will notice when the handler returns.

* Allow passing of a *bool that tests for SIGTERM, or a function pointer
called on each iteration to test whether looping should continue, to be
passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
that instead. Provide a shm_mq_set_handle equivalent for it too.

Any objections to the last approach?
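
For the third option, the API could be as simple as this strawman
(nothing here exists today; names are placeholders):

typedef bool (*shm_mq_keep_waiting_hook) (void *arg);

extern shm_mq_handle *shm_mq_attach_with_hook(shm_mq *mq, dsm_segment *seg,
                                              shm_mq_keep_waiting_hook hook,
                                              void *hook_arg);

/* analogous to shm_mq_set_handle, for queues attached without a hook */
extern void shm_mq_set_keep_waiting_hook(shm_mq_handle *mqh,
                                         shm_mq_keep_waiting_hook hook,
                                         void *hook_arg);

/*
 * shm_mq_wait_internal would call hook(hook_arg) each time its latch
 * wait returns, and give up (as it does now when a bgworker fails to
 * start) as soon as the hook returns false, e.g. because the caller's
 * SIGTERM flag got set.
 */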

--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Craig Ringer
On 21 August 2017 at 10:36, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Aug 21, 2017 at 11:30 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund <and...@anarazel.de>
> wrote:
> >> We currently still have the guideline that code should fit into an 80
> >> character window. But an increasing amount of the code, and code
> >> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> >> write this email). And I mean outside of accepted "exceptions" like
> >> error messages.  And there's less need for such a relatively tight limit
> >> these days.  Perhaps we should up the guideline to 90 or 100 chars?
> >
> > Or maybe we should go the other way and get a little more rigorous
> > about enforcing that limit.  I realize 80 has nothing on its side but
> > tradition, but I'm a traditionalist -- and I still do use 80 character
> > windows a lot of the time.
>
> +1. FWIW, I find the non-truncation of some error messages a bit
> annoying when reading them. And having an 80-character limit makes things
> readable. For long URLs this enforcement makes little sense as those
> strings cannot be split, but more effort could be done.
>

The main argument for not wrapping error messages is grep-ableness.

Personally I'd be fine with 100 or so, but when I'm using buffers side by
side, or when I'm working in poor conditions where I've set my terminal to
"giant old people text" sizes, I remember the advantages of a width limit.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-20 Thread Craig Ringer
On 20 August 2017 at 10:10, MauMau <maumau...@gmail.com> wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

While I generally agree with Tom on this, I think there are some useful
ideas to examine.

Allow a UDF to emit multiple result sets that can then be incorporated into
an outer query. IMO it'd be fine to support this by returning a wide row of
REFCURSORs and then allow FETCH to be used in a subquery.

The UDF would need to be invoked before the rest of the query was planned,
so the planner could learn the structure of the cursor's result sets.

Or some higher level concept could be introduced, like it was for
aggregates and window functions, where one call can be made to get the
output structure and some stats estimates, and another call (or series) to
get the rows.

I guess you're going two steps further than that, seeking a more integrated
model where the plugin can generate paths and participate more actively in
planning, and where you can optionally make it the default so you don't
need a SQL function call to access it.

If you want to pursue that, I suggest you start small and go step-by-step.
Things like:

* Allow FETCH ... to be used in subqueries with explicitly
listed output relation structure, like calling a function that returns
record

* Allow pre-execution of parts of a query that produce refcursors used in
subqueries, then finish planning the outer query once the cursor output
types are known

* A construct that can inject arbitrary virtual relations into the
namespace at parse-time, so you don't have to do the dance with refcursors.
(Like WITH).

* Construct that can supply stats estimates for the virtual relations

So try to build it in stages.

You could also potentially use the FDW interface.


> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

Why? What does this offer over the app or client tool wrapping its queries
in "SELECT cypher_graph('')" ?


> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  To take the query example in my first mail, which joins a
> relational table and the result of a graph query.  The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

Here, what you need is a way to define a set of virtual relations on a
per-query basis, where you can get stats estimates for the relations during
planning.

I guess what you're imagining is something more sophisticated where you're
generating some kind of sub-plan candidates, like the path model. With some
kind of interaction so the sub-planner for the other model could know to
generate a different sub-plan based on the context of the outer plan. I
have no idea how that could work. But I think you have about zero chance of
achieving what you want by going straight there. Focus on small incremental
steps, preferably ones you can find other uses for too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Function to move the position of a replication slot

2017-08-16 Thread Craig Ringer
On 17 August 2017 at 09:33, Andres Freund <and...@anarazel.de> wrote:

> On 2017-08-16 21:25:48 -0400, Robert Haas wrote:
> > On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <and...@anarazel.de>
> wrote:
> > > I think we should constrain the API to only allow later LSNs than
> > > currently in the slot, rather than arbitrary ones. That's why I was
> > > thinking of "forward".  I'm not convinced it's a good / safe idea to
> > > allow arbitrary values to be set.
> >
> > Maybe I shouldn't play the devil's advocate here, but isn't a feature
> > like this by definition only for people who Know What They Are Doing?
> > If so, why not let them back the slot up?  I'm sure that will work out
> > just fine.  They Know What They Are Doing.
>
> I have yet to hear a reason for allowing to move things backward etc. So
> I'm not sure what the benefit would be. But more importantly I'd like to
> make this available to non-superusers at some point, and there I think
> it's more important that they can't do bad things. The reason for
> allowing it for non-superusers is that I think it's quite a useful
> function to be used by an automated system. E.g. to ensure enough, but
> not too much, WAL is available for a tertiary standby both on the actual
> primary and a failover node.
>

I strongly agree.

If you really need to move a physical slot back (why?) you can do it with
an extension that uses the low level APIs. But I can't see why you would
want to.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-16 Thread Craig Ringer
On 16 August 2017 at 23:14, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Aug 15, 2017 at 4:36 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> > You might say that people investigating issues in this area of code
> should
> > be aware of how HEAP_XMIN_FROZEN is defined, and perhaps you're right ...
>
> Yes, I think that's what I would say.  I mean, if you happen to NOT
> know that committed|invalid == frozen, but you DO know what committed
> means and what invalid means, then you're going to be *really*
> confused when you see committed and invalid set on the same tuple.
> Showing you frozen has got to be clearer.
>
> Now, I agree with you that a test like (enumval_tup->t_data &
> HEAP_XMIN_COMMITTED) could be confusing to someone who doesn't realize
> that HEAP_XMIN_FROZEN & HEAP_XMIN_COMMITTED == HEAP_XMIN_COMMITTED,
> but I think that's just one of those things that unfortunately is
> going to require adequate knowledge for people investigating issues.
> If there's an action item there, it might be to try to come up with a
> way to make the source code clearer.
>
>
For other multi-purpose flags we have macros, and I think it'd make sense
to use them here too.

Eschew direct use of HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID and
HEAP_XMIN_FROZEN in tests. Instead, consistently use HeapXminIsFrozen(),
HeapXminIsCommitted(), and HeapXminIsInvalid() or something like that.
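
Something like this sketch, built on the existing flag definitions in
htup_details.h (the names are a strawman):

#define HeapXminIsFrozen(infomask) \
    (((infomask) & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
#define HeapXminIsCommitted(infomask) \
    (((infomask) & HEAP_XMIN_FROZEN) == HEAP_XMIN_COMMITTED)
#define HeapXminIsInvalid(infomask) \
    (((infomask) & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID)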

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Function to move the position of a replication slot

2017-08-16 Thread Craig Ringer
On 17 August 2017 at 07:30, Michael Paquier <michael.paqu...@gmail.com>
wrote:

>
> Definitely agreed on that. Any move function would need to check if
> the WAL position given by caller is already newer than what's
> available in the local pg_wal (minimum of all other slots), with a
> shared lock that would need to be taken by xlog.c when recycling past
> segments. A forward function works on a single entry, which should be
> disabled at the moment of the update. It looks dangerous to me to do
> such an operation if there is a consumer of a slot currently on it.
>
>
pg_advance_replication_slot(...)

ERRORs on logical slots for now. Physical slots only.

Forward-only.

Future work to allow it to use the logical decoding infrastructure to
fast-forward a slot by reading only catalog change information and
invalidations, either via a dummy output plugin that filters out all xacts,
or by lower level use of the decoding code.

Reasonable?
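
In sketch form, the guards would be something like the below, where
moveto is the caller-supplied target LSN and the usual slot acquisition
and locking are assumed to have happened already:

if (SlotIsLogical(slot))
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot advance a logical replication slot")));

if (moveto <= slot->data.restart_lsn)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("replication slots may only be advanced forward")));

slot->data.restart_lsn = moveto;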

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-15 Thread Craig Ringer
On 16 August 2017 at 03:42, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

>
>
> On 08/15/2017 07:54 PM, Robert Haas wrote:
>
>> On Tue, Aug 15, 2017 at 9:59 AM, Tomas Vondra
>> <tomas.von...@2ndquadrant.com> wrote:
>>
>>> I don't think so -- the "committed" and "invalid" meanings are
>>>> effectively canceled when the "frozen" mask is present.
>>>>
>>>> I mean, "committed" and "invalid" contradict each other...
>>>>
>>>
>>> FWIW I agree with Craig - the functions should output the masks raw,
>>> without
>>> any filtering. The reason is that when you're investigating data
>>> corruption
>>> or unexpected behavior, all this is very useful when reasoning about what
>>> might (not) have happened.
>>>
>>> Or at least make the filtering optional.
>>>
>>
>> I don't think "filtering" is the right way to think about it.  It's
>> just labeling each combination of bits with the meaning appropriate to
>> that combination of bits.
>>
>> I mean, if you were displaying the contents of a CLOG entry, would you
>> want the value 3 to be displayed as COMMITTED ABORTED SUBCOMMITTED
>> because TRANSACTION_STATUS_COMMITTED|TRANSACTION_STATUS_ABORTED ==
>> TRANSACTION_STATUS_SUB_COMMITTED?
>>
>> I realize that you may be used to thinking of the HEAP_XMIN_COMMITTED
>> and HEAP_XMAX_COMMITTED bits as two separate bits, but that's not
>> really true any more.  They're a 2-bit field that can have one of four
>> values: committed, aborted, frozen, or none of the above.
>>
>>
> All I'm saying is that having the complete information (knowing which bits
> are actually set in the bitmask) is valuable when reasoning about how you
> might have gotten to the current state. Which I think is what Craig is
> after.
>
> What I think we should not do is interpret the bitmasks (omitting some of
> the information) assuming all the bits were set correctly.


I agree, and the patch already does half of this: it can output just the
raw bit flags, or it can interpret them to show HEAP_XMIN_FROZEN etc.

So the required change, which seems to have broad agreement, is to have the
"interpret the bits" mode show only HEAP_XMIN_FROZEN when it sees
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID, etc. We can retain raw-flags output
as-is for when seriously bogus state is suspected.

Any takers?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Craig Ringer
On 15 August 2017 at 10:16, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Aug 15, 2017 at 11:10 AM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > Ooh, this finally gives us a path toward case-insensitive default
> database
> > collation via CLDR caseLevel.
> >
> > http://userguide.icu-project.org/collation
> > http://www.unicode.org/reports/tr35/tr35-collation.html#Algorithm_Case
> >
> > That *definitely* should be documented and exposed by initdb.
>
> The addition of an interface to initdb smells like an item for v11~,
> but the documentation for 10 could be improved in this sense?
>

Yeah, not suggesting changing it for Pg10, way too late.

It's also not enough for a case-insensitive DB by itself, since we still do
binary compares for equality. There'd need to be deeper surgery to make it
work. So I'm getting prematurely excited here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Craig Ringer
On 10 August 2017 at 06:49, Peter Geoghegan <p...@bowt.ie> wrote:

> There are actually very many customizations to collations that are
> possible beyond what the "stock" ICU collations provide (whatever
> "stock" means). Some of these are really cool, and I can imagine use
> cases where they are very compelling that have nothing to do with
> internationalization (such customizations are how we should eventually
> implement case-insensitive collations, once the infrastructure for
> doing that without breaking hashing is in place).
>
> I'd like to give a demo on what is already possible, but not currently
> documented. I didn't see anyone else comment on this, including Peter
> E (maybe I missed that?). We should improve the documentation in this
> area, to get this into the hands of users.
>
> Say we're unhappy that numbers come first, which we see here:
>
>
Ooh, this finally gives us a path toward case-insensitive default database
collation via CLDR caseLevel.

http://userguide.icu-project.org/collation

http://www.unicode.org/reports/tr35/tr35-collation.html#Algorithm_Case

That *definitely* should be documented and exposed by initdb.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-08-14 Thread Craig Ringer
On 22 March 2017 at 01:17, Robert Haas <robertmh...@gmail.com> wrote:

> On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > Maybe someone can think of a clever way for an extension to insert a
> > wait for a user-supplied LSN *before* acquiring a snapshot so it can
> > work for the higher levels, or maybe the hooks should go into core
> > PostgreSQL so that the extension can exist as an external project not
> > requiring a patched PostgreSQL installation, or maybe this should be
> > done with new core syntax that extends transaction commands.  Do other
> > people have views on this?
>
> IMHO, trying to do this using a function-based interface is a really
> bad idea for exactly the reasons you mention.  I don't see why we'd
> resist the idea of core syntax here; transactions are a core part of
> PostgreSQL.
>
> There is, of course, the question of whether making LSNs such a
> user-visible thing is a good idea in the first place, but that's a
> separate question from issue of what syntax for such a thing is best.


(I know this is old, but):

That ship sailed a long time ago, unfortunately; they're all over
pg_stat_replication and pg_replication_slots and so on. They're already
routinely used for monitoring replication lag in bytes, waiting for a peer
to catch up, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-14 Thread Craig Ringer
On 15 August 2017 at 09:11, Moon Insung <moon_insung...@lab.ntt.co.jp>
wrote:

> Dear Craig Ringer
>
> First, thank you for implementing the necessary function.
>
> but, I have some questions.
>
> question 1) vacuum freeze hint bits
>
> If run a vacuum freeze, bits in the infomask will be 0x0300.
> in this case, if output the value of the infomask in the code you modified,
> HEAP_XMIN_COMMITTED(0x0100), HEAP_XMIN_INVALID(0x0200), and
> HEAP_XMIN_FROZEN(0x0300) are all output as hint bits.
>
> is it normal to output these values?
>
> if look at htup_details.h code,
>
> #define HeapTupleHeaderXminInvalid(tup) \
> ( \
>   ((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
>     HEAP_XMIN_INVALID \
> )
> #define HeapTupleHeaderSetXminCommitted(tup) \
> ( \
>   AssertMacro(!HeapTupleHeaderXminInvalid(tup)), \
>   ((tup)->t_infomask |= HEAP_XMIN_COMMITTED) \
> )
>
> HEAP_XMIN_INVALID and HEAP_XMIN_COMMITTED cannot be written simultaneously.

Both bits are set; those macros just test in a way that excludes the special
meaning of both bits being set at once, i.e. "frozen".

I was reluctant to filter out HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID
when we detect that a tuple is frozen, because that could well be misleading
when debugging.

If you think that is useful, then I suggest you add an option so that when
it's outputting the interpreted mask not the raw mask, it suppresses output
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID if HEAP_XMIN_FROZEN.
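
i.e., in the interpreted-output path, something like this sketch, where
emit() stands in for however the function accumulates flag names:

if ((t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
{
    /* suppress HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID */
    emit("HEAP_XMIN_FROZEN");
}
else if (t_infomask & HEAP_XMIN_COMMITTED)
    emit("HEAP_XMIN_COMMITTED");
else if (t_infomask & HEAP_XMIN_INVALID)
    emit("HEAP_XMIN_INVALID");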

> question 2) xmax lock hint bits
>
> similar to the vacuum freeze question..
>
> Assume that the infomask has a bit of 0x0050.
>
> In this case, if run on the code that you modified,
> HEAP_XMAX_KEYSHR_LOCK(0x0010), HEAP_XMAX_EXCL_LOCK(0x0040), and
> HEAP_XMAX_IS_LOCKED_ONLY are all output as hint bits.
>
> if look at htup_details.h code,
>
> #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_SHR_LOCK)
> #define HEAP_XMAX_IS_EXCL_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_EXCL_LOCK)
> #define HEAP_XMAX_IS_KEYSHR_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_KEYSHR_LOCK)
>
> It is divided into two hint bits, so I think this part needs to be fixed.
>

It's the same issue as above, with the same answer IMO.

If we're showing the raw mask bits we should show the raw mask bits only.

But if we're showing combined bits and masks too, I guess we should filter
out the raw bits when matched by some mask.

I'm not entirely convinced by that, since I think hiding information could
create more confusion than it fixes. I welcome others' views here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2017-08-13 Thread Craig Ringer
ding along with
the master slots it's actually replaying from. If a client isn't "told"
about a promotion candidate, decoding will break when we fail over. If a
client cannot connect to a promotion candidate, catalog_xmin will fall
behind on master until the replica is discarded (and its physical slot
dropped) or the client regains access. Every different logical decoding
client application must implement all this logic and management separately.

It may be possible to implement failover-slots like functionality based on
decoding on standby in an app transparent way, by having the replica
monitor slot states on the master and self-advance its own slots by
loopback decoding connection. Or the master could maintain an inventory of
replicas and make decoding connections to them where it advances their
slots after the masters' slots are advanced by an app. But either way, why
would we want to do this? Why actually decode WAL and use the logical
decoding machinery when we *know* the state of the system because only the
master is writeable?

The way I see it, to provide failover slots functionality we'd land up with
something quite similar to what Robert and I just discussed, but the slot
advance would be implemented using decoding (on standby) instead of
directly setting slot state. What benefit does that offer?

I don't want to block failover slots on decoding on standby just because
decoding on standby would be nice to have.


> - How much divergence are we going to accept between infrastructure for
>   logical failover, and logical failover via failover slots (or however
>   we're naming this)? Again, I'm probably a lot closer to zero than
>   craig is.



We don't have logical failover, let alone mature, tested logical failover
that covers most of Pg's available functionality. Nor much of a design for
it AFAIK. There is no logical failover to diverge from, and I don't want to
block physical failover support on that.

But, putting that aside to look at the details of how logical failover
might work, what sort of commonality do you expect to see? Physical
failover is by WAL replication using archive recovery/streaming, managed
via recovery.conf, with unilateral promotion by trigger file/command. The
admin is expected to ensure that any clients and cascading replicas get
redirected to the promoted node and the old one is fenced - and we don't
care if that's done by IP redirection or connstring updates or what. Per
the proposal Robert and I discussed, logical slots will be managed by
having the walsender/walreceiver exchange slot state information that
cascades up/down the replication tree via mirror slot creations.

How's logical replica promotion going to work? Here's one possible way, of
many: the promotion-candidate logical replica consumes an unfiltered xact
stream that contains changes from all nodes, not just its immediate
upstream. Downstreams of the master can maintain direct connections to the
promotion candidate and manage their own slots directly, sending flush
confirmations for slots on the promotion candidate as they see their
decoding sessions on the replica decode commits for LSNs the clients sent
flush confirmations to the master for. On promotion, the master's
downstreams would be reconfigured to connect to the node-id of the newly
promoted master and would begin decoding from it in catchup mode, where
they receive the commits from the old master via the new master, until they
reach the new master's end-of-wal at time of promotion. With some tweaks
like a logical WAL message recording the moment of promotion, it's not that
different to the client-managed physical failover model.

It can also be converted to a more transparent failover-slots like model by
having the promotion candidate physical replica clone slots from its
upstream, but advance them by loopback decoding - not necessarily actual
network loopback. It'd use a filter that discards data and only sees the
commit XIDs + LSNs. It'd send confirmations on the slots when the local
slot processed a commit for which the upstream's copy of the slot had a
confirmation for that lsn. On promotion, replicas would connect with new
replorigins (0) and let decoding start at the slot positions on the
replica. The master->replica slot state reporting can be done via the
walsender too, just as proposed for the physical case, though no
replica->master reporting would be needed for logical failover.

So despite my initial expectations they can be moderately similar in broad
structure. But I don't think there's going to be much actual code overlap
beyond minor things like both wanting a way to query slot state on the
upstream. Both *could* use decoding on standby to advance slot positions,
but for the physical case that's just a slower (and unfinished) way to do
what we already have, wheras it's necessary for logical failover.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Thoughts on unit testing?

2017-08-13 Thread Craig Ringer
On 14 August 2017 at 03:19, Peter Geoghegan <p...@bowt.ie> wrote:

> On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > The current regression tests, isolation tests and TAP tests are very
> > good (though I admit my experience with TAP is limited), but IMHO we
> > are lacking support for C-level unit testing.  Complicated, fiddly
> > things with many states, interactions, edge cases etc can be hard to
> > get full test coverage on from the outside.  Consider
> > src/backend/utils/mmgr/freepage.c as a case in point.
>
> It is my understanding that much of the benefit of unit testing comes
> from maintainability. It's something that goes well with design by
> contract. Writing unit tests is a forcing function. It requires
> testable units, making the units more composable. The programmer must
> be very deliberate about state, and must delineate code as testable
> units. Unit tests are often supposed to be minimal, to serve as a kind
> of design document.


This is why I personally only find true unit tests useful as part of large
software packages like Pg when they cover functional units that can be
fairly well isolated.

However, I'm not sure that Thomas meant unit tests in the formal sense
(isolated and mocked), but rather test procedures written in C to exercise
portions of Pg's code that are not easily or conveniently tested from SQL.

Pg lacks the strict module delineation needed to make formal unit testing
practical, as you have noted. That doesn't mean test frameworks couldn't be
useful. I routinely write tests for reasonably isolated functionality in my
code by writing throwaway SQL-callable functions to exercise it and
pg_regress tests to run them. It's not strict unit testing as the rest of
Pg's APIs aren't mocked away, but it's very practical small-unit
integration testing that helps catch issues.
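
For instance, a minimal sketch of such a throwaway wrapper (the function
and the check inside it are invented for illustration):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(test_freelist_roundtrip);

    /*
     * SQL-callable wrapper: exercise some internal code and elog(ERROR)
     * on failure, so the pg_regress test just SELECTs it and diffs the
     * (empty) output against the expected file.
     */
    Datum
    test_freelist_roundtrip(PG_FUNCTION_ARGS)
    {
        int expected = 42;
        int actual = 42;    /* call the code under test here */

        if (actual != expected)
            elog(ERROR, "roundtrip returned %d, expected %d",
                 actual, expected);

        PG_RETURN_VOID();
    }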

I wouldn't mind having an easier and nicer way to do that built in to Pg,
but don't have many ideas about practical, low-maintenance ways to achieve
it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2017-08-10 Thread Craig Ringer
[The ASCII sequence diagram here arrived mangled; the recoverable part of
it, in prose: while slot "X" is still being created on the master, a
client trying to use it gets "ERROR: slot X still being created on master,
not ready". Once master "A" marks the slot permanent, "B" sees up-mirror
slot "X" in the slot list from "A", marks its copy permanent, and copies
the state. "C" in turn sees that up-mirror slot "X" on "B" got marked
permanent (because it appears in B's slot listings), marks it permanent on
"C", and copies the state. Slot "X" is now persistent and (when decoding
on standby is supported) can be used for decoding on standby.]


(also available as
https://gist.github.com/ringerc/d4a8fe97f5fd332d8b883d596d61e257 )

To actually use the slot once decoding on standby is supported: a decoding
client on "C" can consume xacts and cause slot "X" to advance its
catalog_xmin, confirmed_flush_lsn, etc. The walreceiver on "C" will tell
the walsender on "B" about the new slot state, and it'll get synced
up-tree: B then tells A.

Since the slot is already marked permanent, state won't get copied back
down-tree; that only happens once, when the slot is first fully created on
the master.
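
For concreteness, the slot-state feedback sent up-tree might carry
something like the following (entirely hypothetical; no such message
exists in the protocol today):

    #include "postgres.h"
    #include "access/xlogdefs.h"    /* XLogRecPtr */

    /* Hypothetical slot-state feedback payload, sent C -> B -> A */
    typedef struct SlotStateFeedback
    {
        NameData        slot_name;
        XLogRecPtr      restart_lsn;
        XLogRecPtr      confirmed_flush_lsn;
        TransactionId   catalog_xmin;
    } SlotStateFeedback;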

Some node "D" can exist as a phys rep of "C". If C fails and is replace
with D, admin can promote the down-mirror slot on "D" to an owned slot.


Make sense?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Thoughts on unit testing?

2017-08-10 Thread Craig Ringer
On 11 August 2017 at 07:13, Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:

> On Fri, Aug 11, 2017 at 10:35 AM, Andres Freund <and...@anarazel.de>
> wrote:
> > On 2017-08-11 09:53:23 +1200, Thomas Munro wrote:
> >> One idea that keeps coming back to me is that we could probably extend
> >> our existing regression tests to cover C tests with automatic
> >> discovery/minimal boilerplate.
> >
> > What's your definition of boilerplate here? All the "expected" state
> > tests in C unit tests is plenty boilerplate...
>
> I mean close to zero effort required to create and run new tests for
> primitive C code.  Just create a .test.c file and type "TEST(my_math,
> factorial) { EXPECT_EQ(6, factorial(3)); }" and it'll run when you
> "make check" and print out nicely tabulated results and every build
> farm member will run it.


The closest we come now is src/test/modules/ with an extension that
exposes SQL functions that in turn run the C tests, one SQL function per C
test.

Definitely not brief or simple, but that's what I've been doing.

Lots of those unit frameworks support auto-generation of harness code. I
wonder how practical it'd be to teach them to generate an extension like
this and the .sql to run it?
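
The generated SQL half could be as trivial as this (a sketch; the module
and function names are placeholders):

    -- sql/test_mymodule.sql, run by pg_regress via the module's Makefile
    CREATE EXTENSION test_mymodule;
    SELECT test_factorial();    -- each wrapper elog(ERROR)s on failure
    SELECT test_freelist();
    DROP EXTENSION test_mymodule;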

It won't work if you want to test anything static or file-private, but
then pretty much no approach will.

(Side note: I'd really like to get away from relying so heavily on
'static' and move toward compiler-level symbol visibility
(https://gcc.gnu.org/wiki/Visibility: -fvisibility=hidden and
__attribute__((dllexport))). It'd make it easier not to forget needed
PGDLLEXPORTs, and it'd let us hide symbols we consider really internal but
still need to share across a few files, etc.)
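
A minimal sketch of the usual pattern from that GCC wiki page (the PG_API
macro name is invented):

    /*
     * Build with -fvisibility=hidden, then mark only the intended API
     * as visible. Everything else stays hidden even when it's extern
     * across several files in the same module.
     */
    #if defined(_WIN32) || defined(__CYGWIN__)
    #define PG_API __declspec(dllexport)
    #else
    #define PG_API __attribute__((visibility("default")))
    #endif

    PG_API void exported_entry_point(void);    /* visible to extensions */
    extern void shared_internal_helper(void);  /* cross-file, but hidden */
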
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-08-10 Thread Craig Ringer
On 10 August 2017 at 23:25, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Aug 7, 2017 at 2:06 AM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > I think so - specifically, that it's a leftover from a revision where the
> > xid limit was advanced before clog truncation.
> >
> > I'll be finding time in the next couple of days to look more closely and
> > ensure that's all it is.
>
> A couple of days having gone by without hearing further on this, I'm
> going to assume for the moment that my analysis is correct and just
> push a commit to remove this assertion.  That's not intended to
> discourage you from spending further time on this - it would be bad to
> be wrong about this - but doing something we all agree is wrong can't
> be better than doing something that I, at least, think is correct.
>


Thanks.

It keeps getting bumped down by other things, but it's on the stack so it
won't get lost.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2017-08-10 Thread Craig Ringer
On 9 August 2017 at 23:42, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Aug 8, 2017 at 4:00 AM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> >> - When a standby connects to a master, it can optionally supply a list
> >> of slot names that it cares about.
> >
> > Wouldn't that immediately exclude use for PITR and snapshot recovery? I
> have
> > people right now who want the ability to promote a PITR-recovered
> snapshot
> > into place of a logical replication master and have downstream peers
> replay
> > from it. It's more complex than that, as there's a resync process
> required
> > to recover changes the failed node had sent to other peers but isn't
> > available in the WAL archive, but that's the gist.
> >
> > If you have a 5TB database do you want to run an extra replica or two
> > because PostgreSQL can't preserve slots without a running, live replica?
> > Your SAN snapshots + WAL archiving have been fine for everything else so
> > far.
>
> OK, so what you're basically saying here is that you want to encode
> the failover information in the write-ahead log rather than passing it
> at the protocol level, so that if you replay the write-ahead log on a
> time delay you get the same final state that you would have gotten if
> you had replayed it immediately.  I hadn't thought about that
> potential advantage, and I can see that it might be an advantage for
> some reason, but I don't yet understand what the reason is.  How would
> you imagine using any version of this feature in a PITR scenario?  If
> you PITR the master back to an earlier point in time, I don't see how
> you're going to manage without resyncing the replicas, at which point
> you may as well just drop the old slot and create a new one anyway.
>


I've realised that it's possible to work around it in app-space anyway.
You create a new slot on a node before you snapshot it, and you don't drop
this slot until you discard the snapshot. The existence of this slot
ensures that any WAL generated by the node (and replayed by PITR after
restore) cannot clobber the needed catalog_xmin. If we xlog catalog_xmin
advances, or have some other safeguard in place (which we need for logical
decoding on standby to be safe anyway), then we can fail gracefully if the
user does something dumb.
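
Concretely, the workaround is just this (the slot name and plugin choice
are arbitrary):

    -- Before taking the SAN snapshot / base backup of the node:
    SELECT pg_create_logical_replication_slot('snapshot_guard',
                                              'test_decoding');
    -- ... take the snapshot ...
    -- Only once the snapshot is discarded (or the restored copy has
    -- been promoted and resynced):
    SELECT pg_drop_replication_slot('snapshot_guard');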

So no need to care about this.

(What I wrote previously on this was):

You definitely can't just PITR restore and pick up where you left off.

You need a higher level protocol between replicas to recover. For example,
in a multi-master configuration, this can be something like (simplified):

* Use the timeline history file to find the lsn at which we diverged from
our "future self", the failed node
* Connect to the peer and do logical decoding, with a replication origin
filter for "originating from me", for xacts from the divergence lsn up to
the peer's current end-of-wal.
* Reset peer's replication origin for us to our new end-of-wal, and resume
replication

To enable that to be possible, since we can't rewind slots once confirmed
advanced, maintain a backup slot on the peer corresponding to the
point-in-time at which a snapshot was taken.
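
A rough sketch of the decoding and origin-reset steps above, as run
against the peer (the 'filter-origin' option is hypothetical - real
origin filtering lives in an output plugin's filter_by_origin callback -
and the LSNs and names are placeholders):

    -- Replay only the failed node's own changes, from the divergence
    -- LSN (found via the timeline history file) to the peer's end of
    -- WAL, using the backup slot kept for this snapshot:
    SELECT lsn, data
    FROM pg_logical_slot_get_changes(
        'backup_slot_for_node_a', NULL, NULL,
        'filter-origin', 'node_a');    -- hypothetical plugin option

    -- Then reset the peer's origin tracking for us to our new
    -- end-of-wal, as read on the restored node:
    SELECT pg_replication_origin_advance('node_a', '0/1234ABCD');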

For most other situations there is little benefit vs just re-creating the
slot before you permit user-initiated write xacts to begin on the restored
node.

I can accept an argument that "we" as pgsql-hackers do not consider this
something worth caring about, should that be the case. It's niche enough
that you could argue it doesn't have to be supportable in stock postgres.


> Maybe you're thinking of a scenario where we PITR the master and also
> use PITR to rewind the replica to a slightly earlier point?


That can work, but it must be done in lock-step. You have to pause apply
on both ends for long enough to snapshot both, otherwise the replication
origins on one end get out of sync with the slots on the other.

Interesting, but I really hope nobody's going to need to do it.


> But I
> can't quite follow what you're thinking about.  Can you explain
> further?
>

Gladly.

I've been up to my eyeballs in this for years now, and sometimes it becomes
quite hard to see the outside perspective, so thanks for your patience.


>
> > Requiring live replication connections could also be an issue for service
> > interruptions, surely?  Unless you persist needed knowledge in the
> physical
> > replication slot used by the standby to master connection, so the master
> can
> > tell the difference between "downstream went away for while but will come
> > back" and "downstream is gone forever, toss out its resources."
>
> I don't think the master needs to retain any resources on behalf of
> the failover slot.  If the slot has been updated by feedback from the
> assoc

Re: [HACKERS] Walsender timeouts and large transactions

2017-08-09 Thread Craig Ringer
On 9 August 2017 at 21:23, Petr Jelinek <petr.jeli...@2ndquadrant.com>
wrote:

> On 02/08/17 19:35, Yura Sokolov wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:   not tested
> >
> > There is no check for (last_reply_timestamp <= 0 ||
> > wal_sender_timeout <= 0) as in other places (in
> > WalSndKeepaliveIfNecessary, for example).
> >
> > I don't think moving the update of 'now' down to the end of the loop
> > body is correct: there are calls to ProcessConfigFile,
> > SyncRepInitConfig, and ProcessRepliesIfAny that can take
> > non-negligible time. That could lead to oversleeping due to a larger
> > computed sleeptime. Though I could be mistaken.
> >
> > I'm not sure about moving `if (!pq_is_send_pending())` in the loop
> > body to after WalSndKeepaliveIfNecessary. Is it necessary? It looks
> > harmless, at least.
> >
>
> We also need to do actual timeout handling so that the timeout is not
> deferred to the end of the transaction (which is why I moved `if
> (!pq_is_send_pending())` under the WalSndCheckTimeOut() and
> WalSndKeepaliveIfNecessary() calls).
>
> > Could the patch be reduced to a check after the first
> > `if (!pq_is_send_pending())`? Like:
> >
> >      if (!pq_is_send_pending())
> > -        return;
> > +    {
> > +        if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> > +        {
> > +            CHECK_FOR_INTERRUPTS();
> > +            return;
> > +        }
> > +        if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
> > +                                               wal_sender_timeout / 2))
> > +            return;
> > +    }
> >
> > If not, what problem prevents it?
>
> We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
> so that it's possible to stop walsender while it's processing large
> transaction.
>
>
Is there any chance of getting this bugfix into Pg 10?

We've just cut back branches, so it'd be a sensible time.
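
For anyone skimming the thread, the loop ordering being argued for, in
outline (a sketch of the intent only, not the actual patch):

    /* Sketch: intended flow of WalSndWriteData()'s flush loop */
    for (;;)
    {
        TimestampTz now;

        /* allow the walsender to be stopped even mid-transaction */
        CHECK_FOR_INTERRUPTS();

        /* ... process replies, reload config if changed ... */

        now = GetCurrentTimestamp();
        WalSndCheckTimeOut(now);            /* enforce wal_sender_timeout */
        WalSndKeepaliveIfNecessary(now);    /* even mid-transaction       */

        /* only bail out once timeout/keepalive handling has run */
        if (!pq_is_send_pending())
            return;

        /* ... compute sleeptime from 'now' and wait on the latch ... */
    }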



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

