On Fri, Sep 12, 2014 at 12:48 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 10, 2014 at 11:40 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> Currently two nodes can only have the same priority if they have the
>> same application_name, so we could for example add a new connstring
>> parameter called, let's say application_group, to define groups of
>> nodes that will have the same priority (if a node does not define
>> application_group, it defaults to application_name, if app_name is
>> NULL, well we don't care much it cannot be a sync candidate). That's a
>> first idea that we could use to control groups of nodes. And we could
>> switch syncrep.c to use application_group in s_s_names instead of
>> application_name. That would be backward-compatible, and could open
>> the door for more improvements for quorum commits as we could control
>> groups node nodes. Well this is a super-set of what application_name
>> can already do, but there is no problem to identify single nodes of
>> the same data center and how much they could be late in replication,
>> so I think that this would be really user-friendly. An idea similar to
>> that would be a base work for the next thing... See below.
>
> In general, I think the user's requirement for what synchronous
> standbys could need to acknowledge a commit could be an arbitrary
> Boolean expression - well, probably no NOT, but any amount of AND and
> OR that you want to use. Can someone want A OR (((B AND C) OR (D AND
> E)) AND F)? Maybe! Based on previous discussions, it seems not
> unlikely that as soon as we decide we don't want to support that,
> someone will tell us they can't live without it. In general, though,
> I'd expect the two common patterns to be more or less what you've set
> forth above: any K servers from set X plus any L servers from set Y
> plus any M servers from set Z, etc. However, I'm not confident it's
> right to control this by adding more configuration on the client side.
> I think it would be better to stick with the idea that each client
> specifies an application_name, and then the master specifies the
> policy in some way. One advantage of that is that you can change the
> rules in ONE place - the master - rather than potentially having to
> update every client.

OK. I see your point.
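
To make the master-side policy idea a bit more concrete, here is a minimal sketch of the "any K servers from set X plus any L servers from set Y" pattern. It is plain Python for illustration only, not syncrep.c code, and the `policy` layout and the `commit_acknowledged` helper are hypothetical names made up for this example.

```python
# Hypothetical master-side policy: each entry pairs a set of
# application_names with the number of acknowledgements required from it.
def commit_acknowledged(policy, acked):
    """Return True once every set has enough acknowledging standbys.

    policy: list of (names, required) pairs, defined only on the master.
    acked:  set of application_names that have confirmed the flush.
    """
    return all(len(names & acked) >= required for names, required in policy)


# Any 1 server from set X plus any 2 servers from set Y.
policy = [({"x1", "x2"}, 1), ({"y1", "y2", "y3"}, 2)]
print(commit_acknowledged(policy, {"x2", "y1"}))        # False: only one ack from Y
print(commit_acknowledged(policy, {"x2", "y1", "y3"}))  # True: both sets satisfied
```

Since the standbys only report an application_name, changing the rule means editing `policy` in one place.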
Now, what about the following assumptions (restrictions of a sort, meant to simplify setting up syncrep and the parametrization of this feature):
- Nodes belong to the same set (or group) if they have the same priority, i.e. the same application_name.
- One node cannot be part of two sets. That's obvious...

The current patch has its own merit, but it fails in the case you and Heikki are describing: wait for k nodes in set 1 (nodes with the lowest priority value), l nodes in set 2 (nodes with the second-lowest priority value), etc. What it does is this: if for example we have a set of nodes with priorities {0,1,1,2,2,3,3}, backends will wait for flush_position from the first s_s_num nodes. By setting s_s_num to 3, we'll wait for {0,1,1}, to 4 {0,1,1,2}, etc.

Now what about this: instead of waiting for the nodes in "absolute" order the way the current patch does, let's do it in a "relative" way. By that I mean that a backend waits for flush_position confirmation from only *1* node among a set of nodes having the same priority. So by using s_s_num = 3, we'll wait for {0, "one node with 1", "one node with 2"}, and you can guess the rest.

Another point is that we can keep the s_s_num behavior as it is now:
- if set to -1, we rely on the default behavior of s_s_names (empty means all nodes are async, at least one entry means that we need to wait for a node)
- if set to 0, all nodes are forced to be async
- if set to n > 1, we have to wait for one node in each set of the n lowest priority values.

I think enough users would be happy with those improvements, and that would help improve the coverage of the test cases that Heikki and you envisioned.

By the way, as the CF is running low on time, I am going to mark this patch as "Returned with Feedback" since I have received enough feedback. I am still planning to work on this for the next CF, so it would be great to reach an agreement on what can be done for this feature, to avoid blind progress.
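
The difference between the "absolute" and "relative" ways of waiting described above can be sketched as follows. This is plain Python for illustration, not code from the patch: `absolute_wait` and `relative_wait` are made-up names, each function returns the priority values of the nodes a backend would wait for, and the special settings -1 and 0 are left out.

```python
def absolute_wait(priorities, s_s_num):
    """Current patch: wait for the first s_s_num nodes in priority order."""
    return sorted(priorities)[:s_s_num]


def relative_wait(priorities, s_s_num):
    """Proposal: wait for one node in each of the s_s_num lowest priority values."""
    return sorted(set(priorities))[:s_s_num]


nodes = [0, 1, 1, 2, 2, 3, 3]
print(absolute_wait(nodes, 3))  # [0, 1, 1]
print(relative_wait(nodes, 3))  # [0, 1, 2] -> one node from each of three sets
```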
In particular, I see some merit in the last idea, which we could still extend by allowing values of the form "k,l,m" in s_s_num to let the user decide: wait for 3 sets, with k nodes in set 1, l nodes in set 2 and m nodes in set 3. A GUC parameter holding a list of integer values is not that user-friendly though, so I think I'd stick with having only one node per set.

Thoughts?
--
Michael