Re: [HACKERS] Standard replication interface?
On Thu, 2002-08-15 at 15:36, Tom Lane wrote:
> Well, I am, but I'm only speaking for myself here:

Fair enough.

> I think there is room for several replication solutions for Postgres (three or four, maybe).

If the ideal solution count is merely one, with a maybe on two, then I tend to concur that any specification along these lines would *mostly* be a waste. On the other hand, if we can count three or more possible replication solutions, IMHO, there seemingly would be merit in providing some sort of de facto monitoring interface. Seems the current difficulty is forecasting the future in this regard. Perhaps other core developers would care to chime in and share their vision?

> CVS tree. So assuming that the Postgres-R project gets to the point of usefulness, I'd vote in favor of integrating it.

On the other hand, I guess I should ask. Do the developers foresee immediate usability from this project or are we looking at something that's a year+ away? I don't think I have a problem helping guide what could be an interim solution if the interim window were large enough. In theory, monitoring tools developed between now and the closing of the window could largely continue to function without change. That, of course, assumes that even the end-run solutions would implement the interface as well. The return on such a concept is that it allows generic monitoring tools to mature while providing value now and in the future. The end result should be a stronger, more powerful tool base which matures while other technologies are still being developed.

Another question along this line is, once something rolls into a core position, does that obsolete all other existing implementations or merely become the de facto choice in a bag of solutions? Tom seems to hint at the latter. If the answer is the former then that seemingly argues not to worry about this...unless the window for usefulness and/or inclusion is rather large.
> As for the point at hand: I'm fairly dubious that a common monitoring API will be very useful, considering how different the possible

Well, all replication scenarios have a lot in common. They should, after all, they are all doing the same thing. Since the different strategies for accomplishing replication are well understood, it seems well within reason to assume that someone can put their brain around this. I can also imagine that the specification includes requirements as well as optional facilities. Certainly capability queries would further iron out any gaps between differing solutions/implementations.

> replication approaches are. If Greg can prove me wrong, fine. But I don't want to see us artificially constraining replication solutions by insisting that they meet some prespecified API.

Hmmm. I'm not sure how it would act as a constraining force. To me, this implies that any such specification would fail to evolve and could not be revised based on feedback. IMO, most specifications are regarded as living documents. While I can see that some specifications are set in stone, I certainly am not so bold as to assert my crystal ball even came with batteries. ;) That is, I assume some level of revision to an initial specification would be required following real-world use.

Regards,
Greg Copeland
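Greg's idea of a specification with required and optional facilities, plus capability queries, could look roughly like the Python sketch below. Every facility name in it (`list_nodes`, `list_tables`, `slave_lag`, `load_balance`) is hypothetical and comes from neither PostgreSQL nor any replication project; the only point is that a generic monitor could ask an implementation what it supports before relying on it.

```python
# Hypothetical facility names, invented purely for illustration.
REQUIRED = frozenset({"list_nodes", "list_tables"})
OPTIONAL = frozenset({"slave_lag", "load_balance"})


def check_conformance(advertised):
    """Compare the facilities an implementation advertises with the spec."""
    advertised = set(advertised)
    missing = REQUIRED - advertised
    return {
        "conforms": not missing,                      # all required facilities present
        "missing": sorted(missing),                   # required but not advertised
        "optional": sorted(advertised & OPTIONAL),    # optional facilities on offer
        "unknown": sorted(advertised - REQUIRED - OPTIONAL),
    }


# An asynchronous master/slave scheme might advertise lag but not balancing.
async_style = check_conformance({"list_nodes", "list_tables", "slave_lag"})
# An implementation that omits a required facility does not conform.
partial_style = check_conformance({"list_nodes", "load_balance"})
```

Under this assumed design a monitoring tool would only display the optional statistics an implementation reports, which is one way a single specification might span quite different replication mechanisms.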
Re: [HACKERS] Standard replication interface?
Greg Copeland [EMAIL PROTECTED] writes:
> I guess I should ask. Do the developers foresee immediate usability from [Postgres-R] or are we looking at something that's a year+ away?

Darren Johnson would be the man to answer that, but from what he said at OSCON it sounded like we'd be seeing something useful by the end of the year, with all the usual caveats about time actually being available to work on it.

>> As for the point at hand: I'm fairly dubious that a common monitoring API will be very useful, considering how different the possible
> Well, all replication scenarios have a lot in common. They should, after all, they are all doing the same thing.

The end goal is approximately the same, but the mechanisms are totally different, and that means that what you want to monitor is totally different.

Perhaps the problem is that you're using the wrong word, and that what you would like to standardize is not monitoring but administrative functions. For example, I'd classify selecting tables to be replicated as an admin task. Monitoring to me means something like "how much data is in the queue to be pushed out to slave X?", which is a question that already presupposes a heck of a lot about the implementation.

I could agree with a set of guidelines that say stuff like "if your mechanism is capable of selecting individual tables to replicate, then here's the preferred way to control that feature". But I'm not sure that there's enough common functionality for monitoring (in the above sense) to be worth standardizing.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Standard replication interface?
Well, that's a different issue. ;) I initially wanted to get feedback to see if anyone else thought the concept might hold some merit. I take it from your answer you think it might...but are scratching your head wondering exactly what it entails...

Greg

On Wed, 2002-08-14 at 22:47, Tom Lane wrote:
> Greg Copeland [EMAIL PROTECTED] writes:
>> ... it occurred to me that a predefined set of views and/or tables for all replication implementations may be worthwhile.
> Do we understand replication well enough to define such a set of views? I sure don't ...
> regards, tom lane
Re: [HACKERS] Standard replication interface?
On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
> Reading about the pgmonitor thread and mention of gborg made me wonder about replication and ready ability to uniformly monitor it. Just as pg_stat* tables exist to allow for statistic gathering and monitoring in a uniform fashion, it occurred to me that a predefined set of views and/or tables for all replication implementations may be worthwhile. That way, no matter what replication method/tool is being used, as long as it conforms to the defined replication interfaces, generic monitoring tools can be used to keep an eye on things.

That sounds like the cart is before the horse. You need to know what sort of replication scheme you might ever have before you could know the statistics that you might want to know.

There are different sorts of replication schemes under consideration. For instance, rserv uses an asynchronous master/slave approach, which relies on slaves that are almost dumb as chickens. (Not quite. There is some data about the state of replication in the slave database; but most of it is in the master.) Postgres-R, on the other hand, contemplates a distributed model wherein different database machines participate in a pool.

So for rserv-style replication, you want to know (for instance) average slave-update times, and whether slaves are getting behind, and by how much, and such. Balancing of inserts, however, is not relevant, because you can't do that. Postgres-R will have the opposite need: you'll want to know what sort of load balancing you're getting, but time-to-replicate is not relevant, because a commit on one machine is necessarily a commit everywhere (that's why it's eager replication).

You probably could design a set of statistics that would cover all cases, but only after you know what the cases were.
A

--
Andrew Sullivan, Liberty RMS
87 Mowat Avenue, Toronto, Ontario, Canada M6K 3E3
[EMAIL PROTECTED]
+1 416 646 3304 x110
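Andrew's contrast between the two schemes can be made concrete with a small Python sketch. It assumes a single hypothetical statistics record in which each scheme fills in only the fields that make sense for it; the class, field names and numbers are all invented for illustration and do not describe how rserv or Postgres-R actually report anything.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReplStats:
    """Hypothetical union of statistics; unused fields stay None."""
    scheme: str
    # Meaningful for asynchronous master/slave replication only.
    slave_lag_seconds: Optional[Dict[str, float]] = None
    # Meaningful for eager, pooled replication only.
    write_share: Optional[Dict[str, float]] = None


def report(stats: ReplStats) -> List[str]:
    """Render only the statistics that the scheme actually provides."""
    lines = [f"scheme: {stats.scheme}"]
    if stats.slave_lag_seconds:
        worst = max(stats.slave_lag_seconds, key=stats.slave_lag_seconds.get)
        lag = stats.slave_lag_seconds[worst]
        lines.append(f"most lagged slave: {worst} ({lag:.1f}s behind)")
    if stats.write_share:
        busiest = max(stats.write_share, key=stats.write_share.get)
        share = stats.write_share[busiest]
        lines.append(f"busiest node: {busiest} ({share:.0%} of writes)")
    return lines


# Made-up figures, used only to exercise the two code paths.
async_report = report(ReplStats("async master/slave",
                                slave_lag_seconds={"s1": 2.0, "s2": 7.5}))
eager_report = report(ReplStats("eager pool",
                                write_share={"n1": 0.25, "n2": 0.75}))
```

The sketch also illustrates Andrew's caveat: the set of fields can only be written down once the kinds of replication to be covered are known.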
Re: [HACKERS] Standard replication interface?
On Thu, 2002-08-15 at 09:47, Andrew Sullivan wrote:
> On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
>> That way, no matter what replication method/tool is being used, as long as it conforms to the defined replication interfaces, generic monitoring tools can be used to keep an eye on things.
> That sounds like the cart is before the horse. You need to know what sort of replication scheme you might ever have before you could know the statistics that you might want to know.

Hmmm. Never heard an inquiry for interest in a concept described as putting the cart before the horse. Considering this is pretty much how things get developed in the real world, I'm not sure what you feel is so special about replication.

First step is always identify the need. I'm attempting to do so. Not sure what you'd consider the first step to be but I can assure you, regardless of this concept seeing the light of day, it is the first step. The horse is correctly positioned in front of the cart.

I also stress that I'm talking about a statistical replication interface. It occurred to me that you might have been confused on this matter. That is, a set of tables and views will allow for the replication process to be uniformly *monitored*. I am not talking about a set of interfaces which all manner of replication must perform its job through (interface with databases for replication).

> There are different sorts of replication schemes under consideration.

Yep. Thus it would seemingly be ideal to have a specification which different implementations would seek to implement. Off of the top of my head and for starters, a table and/or view which could be queried that returns the tables that are being replicated sounds good to me. Same thing for the list of databases, the servers involved and their associated role (master, slave, peer).

Without such a concept, there will be no standardized way to monitor your replication. As such, chances are one of two things will happen.
One, a single replication method will be championed and fair tools will develop to support it, while all others are bastards. Two, quality tools to monitor replication will never materialize because each method for monitoring is specific to the different types of implementations. Resources will constantly be spread amongst a variety of well meaning projects.

--Greg
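The starting point Greg lists (replicated tables, plus the servers involved and their roles) can be sketched as two small relations that any generic tool could query. The Python below uses an in-memory SQLite database only as a stand-in so that it runs anywhere; the table and column names (`repl_node`, `repl_table`, and so on) are hypothetical and are not part of PostgreSQL or of any replication project.

```python
import sqlite3

# Hypothetical schema: names are invented for illustration only.
SCHEMA = """
CREATE TABLE repl_node (
    node TEXT PRIMARY KEY,
    role TEXT NOT NULL CHECK (role IN ('master', 'slave', 'peer'))
);
CREATE TABLE repl_table (
    dbname    TEXT NOT NULL,
    tablename TEXT NOT NULL,
    PRIMARY KEY (dbname, tablename)
);
"""


def make_demo_db():
    """Build an in-memory database populated with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO repl_node VALUES (?, ?)",
                     [("db1", "master"), ("db2", "slave"), ("db3", "slave")])
    conn.executemany("INSERT INTO repl_table VALUES (?, ?)",
                     [("sales", "orders"), ("sales", "customers")])
    conn.commit()
    return conn


def replicated_tables(conn):
    """Return (database, table) pairs currently being replicated."""
    query = "SELECT dbname, tablename FROM repl_table ORDER BY dbname, tablename"
    return conn.execute(query).fetchall()


def nodes_by_role(conn):
    """Group the participating servers by their replication role."""
    roles = {}
    for node, role in conn.execute("SELECT node, role FROM repl_node ORDER BY node"):
        roles.setdefault(role, []).append(node)
    return roles


conn = make_demo_db()
tables = replicated_tables(conn)
roles = nodes_by_role(conn)
```

A monitoring tool written against such relations would not need to know which replication method populated them, which is the uniformity Greg is arguing for.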
Re: [HACKERS] Standard replication interface?
On Thu, 2002-08-15 at 09:53, Neil Conway wrote:
> That's exactly what I was going to say -- I'd prefer that any interested parties concentrate on producing a *really good* replication implementation, which might eventually be integrated into PostgreSQL itself. Producing a generic API for something that really doesn't need genericity sounds like a waste of time, IMHO.
> Cheers, Neil

Somehow I get the impression that I've been completely misunderstood. Somehow, people seem to have only read the subject and skipped the body explaining the concept.

In what way would providing a generic interface to *monitor* be a waste of time? In what way would that prevent someone from producing a *really good* replication implementation? I utterly fail to see the connection.

Regards,
Greg Copeland
Re: [HACKERS] Standard replication interface?
> As I said -- I don't really see the need for a bunch of replication implementations, and therefore I don't see the need for a generic API to make the whole mess (slightly) more manageable.

I see. So the intention of the core developers is to have one and only one replication solution?

Greg
Re: [HACKERS] Standard replication interface?
Greg Copeland [EMAIL PROTECTED] writes:
>> As I said -- I don't really see the need for a bunch of replication implementations, and therefore I don't see the need for a generic API to make the whole mess (slightly) more manageable.
> I see. So the intention of the core developers is to have one and only one replication solution?

Not being a core developer, I can't comment on their intentions.

That said, I _personally_ don't see the need for more than one or two replication implementations. You might need more than one if you wanted to do both lazy and eager replication, for example. But you certainly don't need 5 or 6 or however many implementations exist at the moment.

I think the reason there are a lot of different implementations at the moment is that each one has some pretty serious problems. So rather than trying to reduce the problem by making it slightly easier for the different replication solutions to inter-operate, I think it's a better idea to solve the problem outright by improving one of the existing replication projects to the point at which it is ready for widespread production usage.

Cheers, Neil

--
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC
Re: [HACKERS] Standard replication interface?
On Thu, 2002-08-15 at 13:18, Neil Conway wrote:
> That said, I _personally_ don't see the need for more than one or two replication implementations. You might need more than one if you wanted to do both lazy and eager replication, for example. But you certainly don't need 5 or 6 or however many implementations exist at the moment.

Fair enough. Thank you for offering a complete explanation. Your argument certainly made sense. I wasn't aware of any single serious effort underway which sought to finally put replication to bed, let alone integrated into the core code base.

Sign,
Greg Copeland
Re: [HACKERS] Standard replication interface?
>> As I said -- I don't really see the need for a bunch of replication implementations, and therefore I don't see the need for a generic API to make the whole mess (slightly) more manageable.
> I see. So the intention of the core developers is to have one and only one replication solution?

If the various solutions may be folded down into a smaller set of programs, perhaps, ultimately, into _one_ program, that would surely be easier to manage, in the codebase, than having five or six such programs.

If one program can do the job that needs to be done, and it has not been _clearly_ established that that is _not_ possible, then I'd think it rather silly to have a bunch of replication solutions that need to be updated any time a relevant change goes into the database engine.

I'd be surprised if, in the end, there truly _needed_ to be more than about two approaches.

Should the team plan to _have_ a mess? I'd think not.

--
(concatenate 'string cbbrowne ntlug.org)
http://cbbrowne.com/info/linuxdistributions.html
We don't understand the software, and sometimes we don't understand the hardware, but we can *see* the blinking lights! -- Unknown
Re: [HACKERS] Standard replication interface?
Neil Conway [EMAIL PROTECTED] writes:
> Greg Copeland [EMAIL PROTECTED] writes:
>> I see. So the intention of the core developers is to have one and only one replication solution?
> Not being a core developer, I can't comment on their intentions.

Well, I am, but I'm only speaking for myself here: I think there's definitely a need for at least two replication implementations: sync and async. The space of requirements is wide enough that there's not a one-size-fits-all solution. You might care to look at Darren Johnson's OSCON slides for more about this:
http://conferences.oreillynet.com/cs/os2002/view/e_sess/3280

I think there is room for several replication solutions for Postgres (three or four, maybe).

It's difficult to say what will wind up in our core distribution. A tightly linked implementation like Postgres-R is really impractical as an add-on: you need enough mods of the core code that it'd be a nightmare to try to maintain if it's not integrated into the regular CVS tree. So assuming that the Postgres-R project gets to the point of usefulness, I'd vote in favor of integrating it. On the other hand, it's possible to do good stuff without touching the core code at all (cf. PostgreSQL Inc's rserv) and in that case there may or may not be any interest in integrating the code. It's really gonna depend mostly on the wishes of the people who develop the replication solutions, I think.

I can foresee a time when there are one or two replication solutions that are included in the base distribution and others are available separately. In fact, counting contrib/rserv that more or less describes the state of affairs today. What we need is more work on the available solutions to improve their quality and general usefulness.

As for the point at hand: I'm fairly dubious that a common monitoring API will be very useful, considering how different the possible replication approaches are. If Greg can prove me wrong, fine.
But I don't want to see us artificially constraining replication solutions by insisting that they meet some prespecified API.

regards, tom lane
Re: [HACKERS] Standard replication interface?
Greg Copeland [EMAIL PROTECTED] writes:
> ... it occurred to me that a predefined set of views and/or tables for all replication implementations may be worthwhile.

Do we understand replication well enough to define such a set of views? I sure don't ...

regards, tom lane