David,
This can be resolved by requiring that, for any transaction to succeed, the
entrypoint database must receive acknowledgements from n/2 + 0.5 (rounded up
to the nearest integer) databases, where n is the total number in the
replicant set. The following cases are shown as examples:
  Total number of databases   Number required to accept transaction
  -------------------------   -------------------------------------
              2                                 2
              3                                 2
              4                                 3
              5                                 3
              6                                 4
              7                                 4
              8                                 5
This would prevent two replicant sub-sets from forming, because it is
impossible for both sets to have over 50% of the databases.
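As a rough illustration of the rule above, the following Python sketch
computes the required number of acknowledgements and checks that no split
of the set into two groups lets both groups accept transactions. The
function names quorum and can_commit are made up for this example; for
whole numbers n, n // 2 + 1 gives the same value as n/2 + 0.5 rounded up.

```python
def quorum(n: int) -> int:
    """Acknowledgements needed: a strict majority of n databases."""
    if n < 1:
        raise ValueError("the set must contain at least one database")
    # For integer n this equals n/2 + 0.5 rounded up to the nearest integer.
    return n // 2 + 1


def can_commit(acks: int, n: int) -> bool:
    """True if `acks` acknowledgements are enough in a set of n databases."""
    return acks >= quorum(n)


if __name__ == "__main__":
    for n in range(2, 9):
        print(f"databases: {n}  required: {quorum(n)}")

    # Split n databases into groups of size a and n - a: both groups can
    # never reach the majority at the same time.
    for n in range(2, 9):
        for a in range(0, n + 1):
            assert not (can_commit(a, n) and can_commit(n - a, n))
```

With an even split (for example 2 and 2 out of 4) neither side reaches the
majority, so both sides refuse writes until the network heals.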
Applications would be able to detect when a database has dropped out of the
replicant set, because the database could report a state of "Unable to
obtain majority consensus". This would allow applications to differentiate
between a database that is out of the set, where writing to other databases
in the set could yield a successful result, and "Unable to commit due to
conflict", where trying other databases is pointless.
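A minimal sketch of how an application might act on those two states is
shown below. Everything here is hypothetical: the CommitStatus enum, the
submit function, and the idea that each database is a callable returning a
status are assumptions made for the example, not an existing PostgreSQL
interface.

```python
from enum import Enum
from typing import Callable, Sequence


class CommitStatus(Enum):
    # Hypothetical states a database could report back to the application.
    COMMITTED = "committed"
    NO_MAJORITY = "unable to obtain majority consensus"
    CONFLICT = "unable to commit due to conflict"


def submit(txn: str,
           databases: Sequence[Callable[[str], CommitStatus]]) -> CommitStatus:
    """Try each database in turn until one gives a definite answer."""
    for db in databases:
        status = db(txn)
        if status is CommitStatus.NO_MAJORITY:
            # This database is out of the set; another one may still succeed.
            continue
        # COMMITTED or CONFLICT: trying other databases is pointless.
        return status
    # Every database tried (if any) was unable to reach a majority.
    return CommitStatus.NO_MAJORITY
```

For example, given one database that is cut off and one that is healthy,
submit("T1", [cut_off, healthy]) would move past the first and return the
status reported by the second.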
Al
- Original Message -
From: David Walker [EMAIL PROTECTED]
To: Al Sutton [EMAIL PROTECTED]; Darren Johnson
[EMAIL PROTECTED]
Cc: Bruce Momjian [EMAIL PROTECTED]; Jan Wieck
[EMAIL PROTECTED]; [EMAIL PROTECTED];
PostgreSQL-development [EMAIL PROTECTED]
Sent: Sunday, December 15, 2002 2:29 PM
Subject: Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Another concern I have with multi-master systems is what happens if the
network splits in 2 so that 2 master systems are taking commits for 2
separate sets of clients. It seems to me that re-syncing the 2 databases
once the network heals would be a very complex, or even impossible, task.
On Sunday 15 December 2002 04:16 am, Al Sutton wrote:
Many thanks for the explanation. Could you explain to me how the order of
the writesets is determined in the following scenario:
If a transaction takes 50ms to reach one database from another, then for a
specific data element (called X) the following timeline occurs:
  at 0ms, T1(X) is written to system A;
  at 10ms, T2(X) is written to system B;
where T1(X) and T2(X) conflict.
My concern is that if the Group Communication Daemon (gcd) is operating on
each database, a successful result for T1(X) will be returned to the client
talking to database A because T2(X) has not reached it, and thus no
conflict is known about, and a successful result is returned to the client
submitting T2(X) to database B because it is not aware of T1(X). This would
mean that the two clients believe both T1(X) and T2(X) completed
successfully, yet they cannot both have done so because of the conflict.
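The concern can be made concrete with a small simulation. This is only a
sketch of the scenario as described, under the assumption that each
database decides purely on the writes it has already seen; it is not a
model of how any real replication system behaves. The names
locally_visible and naive_commit are invented for the example.

```python
LATENCY_MS = 50  # time for a write to travel from one database to the other

# (transaction, system it was submitted to, submission time in ms)
WRITES = [("T1", "A", 0), ("T2", "B", 10)]


def locally_visible(writes, system, now_ms):
    """Names of the writes to X that `system` knows about at time now_ms."""
    visible = []
    for name, origin, at_ms in writes:
        # A write is seen at once where it was submitted, and only after
        # the network latency on the other system.
        arrival = at_ms if origin == system else at_ms + LATENCY_MS
        if arrival <= now_ms:
            visible.append(name)
    return visible


def naive_commit(writes, name, system, at_ms):
    """Report success if no other write to X is visible locally yet."""
    others = [w for w in locally_visible(writes, system, at_ms) if w != name]
    return not others


if __name__ == "__main__":
    print("T1 on A succeeds:", naive_commit(WRITES, "T1", "A", 0))
    print("T2 on B succeeds:", naive_commit(WRITES, "T2", "B", 10))
```

Both calls report success, because T2(X) only reaches system A at 60ms and
T1(X) only reaches system B at 50ms, after each local decision has been
made.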
Thanks,
Al.
- Original Message -
From: Darren Johnson [EMAIL PROTECTED]
To: Al Sutton [EMAIL PROTECTED]
Cc: Bruce Momjian [EMAIL PROTECTED]; Jan Wieck
[EMAIL PROTECTED]; [EMAIL PROTECTED];
PostgreSQL-development [EMAIL PROTECTED]
Sent: Saturday, December 14, 2002 6:48 PM
Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
b) The Group Communication blob will consist of a number of processes which
need to talk to all of the others to interrogate them for changes which may
conflict with the current write that is being handled, and then issue the
transaction response. This is basically the two-phase commit solution with
the phases moved into the group communication process.
I can see the possibility of using solution b and having fewer group
communication processes than databases as an attempt to simplify things,
but this would mean the loss of a number of databases if the machine
running the group communication process for the set of databases is lost.
The group communication system doesn't just run on one system. For
Postgres-R using Spread there is actually a Spread daemon that runs on each
database server. It has nothing to do with detecting the conflicts. Its job
is to deliver messages in a total order for writesets or simple order for
commits, aborts, joins, etc.
The detection of conflicts will be done at the database level, by a backend
process. The basic concept is that if all databases get the writesets
(changes) in the exact same order, apply them in a consistent order, and
avoid conflicts, then one-copy serialization is achieved (one copy of the
database replicated across all databases in the replica).
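To illustrate why a total order is enough, here is a small Python sketch in
which every database applies the same ordered list of writesets with the
same deterministic rule. The rule used here (a writeset carries the version
of the element it read and is aborted if that version is stale) is an
assumption chosen for the example and is not a description of the actual
Postgres-R conflict detection; apply_in_total_order is a made-up name.

```python
def apply_in_total_order(writesets):
    """Apply writesets in delivery order; return each transaction's outcome.

    Each writeset is (transaction, element, version the transaction read).
    """
    versions = {}  # element -> current committed version
    outcomes = {}
    for txn, element, read_version in writesets:
        current = versions.get(element, 0)
        if read_version == current:
            # Nothing has changed the element since it was read: commit.
            versions[element] = current + 1
            outcomes[txn] = "commit"
        else:
            # An earlier writeset in the total order already changed it.
            outcomes[txn] = "abort"
    return outcomes


if __name__ == "__main__":
    # The group communication layer delivers the same order everywhere.
    delivered = [("T1", "X", 0), ("T2", "X", 0)]
    database_a = apply_in_total_order(delivered)
    database_b = apply_in_total_order(delivered)
    print(database_a)
    print(database_a == database_b)
```

Because both databases see T1 before T2 and apply the same rule, they
independently reach the same decision (T1 commits, T2 aborts) without
having to interrogate each other.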
I hope that explains the group communication system's responsibility.
Darren