Hello Bruce,

Bruce Momjian wrote:
Sorry, I forgot who was involved in that discussion.

Well, at least that means I didn't annoy you to death last time ;-)

With the other two I'm unsure.. I see it's very hard to find helpful positive formulations...

Yea, that's where I got stuck --- that the positives were harder to
understand.

Okay, understood.

Sorry, I meant that a master that is modifying data is slowed down by
other masters to an extent that doesn't happen in other cases (e.g. with
slaves).  Is the current "No inter-server locking delay" OK?

Yes, sort of. I know what you meant, but I find it hard to understand. And with regard to anything except lazy or eager replication, it does not make any sense. It's pretty moot to say anything about "inter-server locking delays" for "statement-based replication middleware": you don't know whether it's lazy or eager. And all the other solutions you mention are single-master or no replication at all. When there's only one master, it's pretty obvious that there can't be any inter-(master-)server locking delay. (Well, it's also very obvious that a single master never 'conflicts' with itself...)

Given you want to cover existing solutions, one could say that (AFAIK) all statement-based replication solutions are eager. But in that case the dot would be wrong, because the middleware would need to wait for at least an absolute majority of servers to confirm the commit. That leads to excessive locking as well, just as you say for "synchronous multi-master replication", because it's a property inherent to eager multi-master replication, as we correctly explain above the feature matrix.
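
To illustrate what "wait for at least an absolute majority" means, here is a rough sketch in Python. It's purely hypothetical, of course; the server objects and their send() / wait_for_commit_ack() methods are invented for illustration and don't correspond to any real middleware:

class CommitFailed(Exception):
    pass

def eager_commit(servers, transaction):
    # Statement-based middleware: forward the statements to every server.
    for server in servers:
        server.send(transaction)

    # An eager commit must block until an absolute majority has
    # confirmed it -- that waiting is the inter-server delay in question.
    majority = len(servers) // 2 + 1
    confirmations = 0
    for server in servers:
        if server.wait_for_commit_ack():
            confirmations += 1
            if confirmations >= majority:
                return  # durable on an absolute majority of servers
    raise CommitFailed("no absolute majority confirmed the commit")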

multi-master replication" as well as "statement-based replication middleware" should not have a dot, because those as well slow down other masters. In the async case at different points in time, yes, but all master have to write the data, which slows them down.

Yea, that is why I have the new text about locking.

To me this makes it sound like "statement-based replication" could be faster than "synchronous multi-master replication". That's absolute nonsense, since the two don't compare. Or to put it another way: most "statement-based replication" solutions are "synchronous multi-master replication" as well.

[ In that sense, stating that "PostgreSQL does not offer this kind of replication" under "Synchronous Multi-Master Replication" is wrong. As is the assumption that all of those solutions send "data changes". You should probably clarify that to say "tuple-based, eager multi-master replication", because that's what you are talking about. ]

If you are comparing eager, statement-based, multi-master replication (like PgCluster) with eager, tuple-based, multi-master replication (like Postgres-R), the former can't possibly be faster than the latter. I.e. it certainly doesn't have fewer (locking?) delays.

which is the reason we don't support it yet.

Uhm.. PgCluster *is* a synchronous multi-master replication solution. It is also middleware, and it does statement-based replication. Which dots of the matrix do you think apply to it?

I don't consider PgCluster middleware because the servers have to
cooperate with the middleware.

Okay, then take Sequoia: a statement-based, middleware, synchronous (thus eager) multi-master replication solution.

(I've never liked the term "middleware" in that chapter. It's solely a question of implementation and does not have much to do with other concepts of replication.)

And I am told it is much slower for
writes than a single server which supports my "locking" item, though it
is more "waiting for other masters" that is the delay, I think.

Uh.. with the dot there, you are saying that "statement-based middleware" does *not* have any inter-server locking delay.

What's the difference between "waiting for other masters" and "locking delay"? What exactly do you consider a lock? Why should it be locking when using binary-tuple replication, but not when using statement-based replication?

I don't assume the disk failover has mirrored disks.  It can, just like a
single server can, but it isn't part of the backend process, and I
assume a RAID card that has RAM that can cache writes.

In that case, you'd lose the "master failure will never lose data" property, no? Or do you trust the write-back cache and the connection to the NAS so much as to assume it never fails?

I don't think the network is an issue considering many use NAS anyway.

I think you are comparing an enterprise NAS to a low-cost, commodity-hardware clustered filesystem. Take the same amount of money and the same number of mirrors and you'll get comparable performance.

Agreed.  In the one case you are relying on another server, and in the
NAS case you are relying on a black box server.  I think the big
difference is that the other server is a separate entity, while the NAS
is a shared item.

Correct; thus the former is a kind of single-master replication, while the latter cannot be considered replication (lacking a replica). It's rather a way to enhance the reliability of your single-master database server system.

There is no dot there so I am saying "statement based replication
solution" requires conflict resolution.  Agreed you could do it without
conflict resolution and it is kind of independent.  How should we deal
with this?

Maybe a third state: 'n/a'?

Good idea, or "~".  How would middleware avoid conflicts, i.e. how would
it know that two incoming queries were in conflict?

A majority of servers rejecting or blocking the query? In case a minority blocks, the majority would win and apply the transaction, while the minority would have to replay it? I don't know; probably most solutions do something simpler, like aborting the transaction even if only one server fails. Much simpler, and sufficient for most cases.

(Why do you ask me? I'm advocating internal, tuple-level replication with Postgres-R, not a statement-based one :-) )
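
Just to illustrate the kind of voting I mean, a rough Python sketch (again purely hypothetical; try_apply(), abort_conflicting() and the rest are invented names, not the API of Sequoia, PgCluster or any other solution):

def resolve_by_majority(servers, statement):
    # Every server tries to apply the statement; some may reject or
    # block it because of a conflicting local transaction.
    accepting = [s for s in servers if s.try_apply(statement)]
    rejecting = [s for s in servers if s not in accepting]

    if len(accepting) > len(servers) // 2:
        # The majority wins: the blocking minority aborts its
        # conflicting transactions and replays this one.
        for server in rejecting:
            server.abort_conflicting()
            server.replay(statement)
        return True

    # No absolute majority: abort everywhere -- the simpler behaviour
    # most solutions probably implement, even on a single failure.
    for server in accepting:
        server.rollback(statement)
    return False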

I did move it below and removed it from the chart because as you say how
to replicate to the slaves is an independent issue.

Okay, I like that better, thanks.

With regard to replication, there's another feature I think is worth mentioning: dynamic addition or removal of nodes (masters or slaves). But that's purely implementation-dependent, so it probably doesn't fit into the matrix.

Yea, I had that but found you could add/remove slaves easily in most
cases.

Hm.. you're right.

Another interesting property I'm missing is the existence of single points of failure.

Ah, yea, but then you get into power and fire issues.

Which is what high availability is all about, no?

But well, again, all kinds of replication (which excludes the NAS) can theoretically be spread across the continent. So it might be pretty useless to add dots for that.

Regards

Markus
