Re: [HACKERS] Standard replication interface?

2002-08-16 Thread Greg Copeland

On Thu, 2002-08-15 at 15:36, Tom Lane wrote:
> Well, I am, but I'm only speaking for myself here:
>

Fair enough.

> I think there is room for several replication solutions for Postgres
> (three or four, maybe).

If the ideal solution count is merely one with a maybe on two then I
tend to concur that any specification along these lines would *mostly*
be a waste.  On the other hand, if we can count three or more possible
replication solutions, IMHO, there seemingly would be merit in providing
some sort of de facto monitoring interface.

Seems the current difficulty is forecasting the future in this regard. 
Perhaps other core developers would care to chime in and share their
vision?

> CVS tree.  So assuming that the Postgres-R project gets to the point
> of usefulness, I'd vote in favor of integrating it.  On the other hand,

I guess I should ask.  Do the developers foresee immediate usability
from this project or are we looking at something that's a year+ away?  I
don't think I have a problem helping guide what could be an interim
solution if the interim window were large enough.  In theory, monitoring
tools developed between now and the closing of the window could largely
continue to function without change.  That, of course, assumes that even
the end-run solutions would implement the interface as well.

The return on such a concept is that it allows generic monitoring tools
to mature while providing value now and in the future.  The end result
should be a stronger, more powerful tool base which matures while other
technologies are still being developed.

Another question along this line is, once something rolls into a core
position, does that obsolete all other existing implementations or
merely become the de facto choice in a bag of solutions?  Tom seems to
hint at the latter.  If the answer is the former then that seemingly
argues not to worry about this...unless the window for usefulness and/or
inclusion is rather large.

> As for the point at hand: I'm fairly dubious that a common monitoring
> API will be very useful, considering how different the possible

Well, all replication scenarios have a lot in common.  They should;
after all, they are all doing the same thing.  Since the different
strategies for accomplishing replication are well understood, it seems
well within reason to assume that someone can wrap their brain around
this.

I can also imagine that the specification includes requirements as well
as optional facilities.  Certainly capability queries would further iron
out any gaps between differing solutions/implementations.
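
[Editorial sketch, not part of the original message.] One way to picture
required facilities, optional facilities, and capability queries is the
small Python model below.  Every name in it (the facility names,
`Implementation`, `usable_optional`) is hypothetical and chosen only for
illustration; nothing here is defined by PostgreSQL, rserv, or Postgres-R.

```python
from dataclasses import dataclass

# Hypothetical facility names; nothing here is defined by PostgreSQL,
# rserv, or Postgres-R.
REQUIRED = frozenset({"list_tables", "list_servers"})
OPTIONAL = frozenset({"queue_depth", "load_share"})


@dataclass(frozen=True)
class Implementation:
    name: str
    capabilities: frozenset

    def conforms(self) -> bool:
        """True when every required facility is offered."""
        return REQUIRED <= self.capabilities

    def supports(self, facility: str) -> bool:
        """Capability query: is this one facility offered?"""
        return facility in self.capabilities


def usable_optional(impl: Implementation) -> list:
    """Optional facilities a generic tool may rely on for this implementation."""
    return sorted(OPTIONAL & impl.capabilities)


# Example: an implementation offering the required set plus one optional facility.
example = Implementation("async-example", REQUIRED | {"queue_depth"})
```

A generic tool would presumably call `conforms()` once and then branch on
`supports(...)` for anything optional, rather than assuming every
implementation exposes the same data.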

> replication approaches are.  If Greg can prove me wrong, fine.  But
> I don't want to see us artificially constraining replication solutions
> by insisting that they meet some prespecified API.

Hmmm.  I'm not sure how it would act as a constraining force.  To me,
this implies that any such specification would fail to evolve and could
not be revised based on feedback.  IMO, most specifications are regarded
as living documents.  While I can see that some specifications are set
in stone, I certainly am not so bold as to assert my crystal ball even
came with batteries.  ;)  That is, I assume some level of revision to an
initial specification would be required following real-world use.


Regards,

Greg Copeland 







Re: [HACKERS] Standard replication interface?

2002-08-16 Thread Tom Lane

Greg Copeland [EMAIL PROTECTED] writes:
> I guess I should ask.  Do the developers foresee immediate usability
> from [Postgres-R] or are we looking at something that's a year+ away?

Darren Johnson would be the man to answer that, but from what he said
at OSCON it sounded like we'd be seeing something useful by the end of
the year, with all the usual caveats about time actually being available
to work on it.

>> As for the point at hand: I'm fairly dubious that a common monitoring
>> API will be very useful, considering how different the possible

> Well, all replication scenarios have a lot in common.  They should,
> after all, they are all doing the same thing.

The end goal is approximately the same, but the mechanisms are totally
different, and that means that what you want to monitor is totally
different.

Perhaps the problem is that you're using the wrong word, and that what
you would like to standardize is not monitoring but administrative
functions.  For example, I'd classify selecting tables to be replicated
as an admin task.  Monitoring to me means something like "how much data
is in the queue to be pushed out to slave X?", which is a question that
already presupposes a heck of a lot about the implementation.

I could agree with a set of guidelines that say stuff like "if your
mechanism is capable of selecting individual tables to replicate,
then here's the preferred way to control that feature".  But I'm not
sure that there's enough common functionality for monitoring (in the
above sense) to be worth standardizing.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

Well, that's a different issue.  ;)

I initially wanted to get feedback to see if anyone else thought the
concept might hold some merit.

I take it from your answer you think it might...but are scratching your
head wondering exactly what it entails...

Greg


On Wed, 2002-08-14 at 22:47, Tom Lane wrote:
> Greg Copeland [EMAIL PROTECTED] writes:
> > ... it occurred to me that a predefined set of views
> > and/or tables for all replication implementations may be worthwhile.
>
> Do we understand replication well enough to define such a set of views?
> I sure don't ...
>
>   regards, tom lane






Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Andrew Sullivan

On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:

> Reading about the pgmonitor thread and mention of gborg made me wonder
> about replication and ready ability to uniformly monitor it.  Just as
> pg_stat* tables exist to allow for statistic gathering and monitoring in
> a uniform fashion, it occurred to me that a predefined set of views
> and/or tables for all replication implementations may be worthwhile.
> That way, no matter what replication method/tool is being used, as long
> as it conforms to the defined replication interfaces, generic monitoring
> tools can be used to keep an eye on things.

That sounds like the cart is before the horse.  You need to know what
sort of replication scheme you might ever have before you could
know the statistics that you might want to know.

There are different sorts of replication schemes under consideration. 
For instance, rserv uses an asynchronous master/slave approach, which
relies on slaves that are almost dumb as chickens.  (Not quite. 
There is some data about the state of replication in the slave
database; but most of it is in the master.)  Postgres-R, on the other
hand, contemplates a distributed model wherein different database
machines participate in a pool.

So for rserv-style replication, you want to know (for instance)
average slave-update times, and whether slaves are getting behind,
and by how much, and such.  Balancing of inserts, however, is not
relevant, because you can't do that.

Postgres-R will have the opposite need: you'll want to know what sort
of load balancing you're getting, but time-to-replicate is not
relevant, because a commit on one machine is necessarily a commit
everywhere (that's why it's eager replication).

You probably could design a set of statistics that would cover all
cases, but only after you know what the cases were.
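
[Editorial sketch, not part of the original message.] The contrast above
can be made concrete with a toy Python example.  It assumes, purely for
illustration, that an asynchronous master/slave setup can report a change
sequence number per server and that an eager pool can report write counts
per node; neither assumption is taken from rserv or Postgres-R.  Each
report fills in only the statistic that applies and leaves the other as
None.

```python
from typing import Dict


def async_stats(master_seq: int, slave_seqs: Dict[str, int]) -> dict:
    """Lag per slave, counted in changes not yet applied (hypothetical measure)."""
    lag = {name: master_seq - seq for name, seq in slave_seqs.items()}
    # Insert balancing does not apply to master/slave, so it is left as None.
    return {"lag": lag, "load_share": None}


def eager_stats(writes_per_node: Dict[str, int]) -> dict:
    """Fraction of writes handled by each node in the pool."""
    total = sum(writes_per_node.values())
    share = {name: (count / total if total else 0.0)
             for name, count in writes_per_node.items()}
    # Time-to-replicate does not apply to eager replication, so lag is None.
    return {"lag": None, "load_share": share}
```

A combined statistics set would have to tolerate the None entries, which
is one reading of the point that the cases need to be known before the
set can be designed.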

A

-- 

Andrew Sullivan   87 Mowat Avenue 
Liberty RMS   Toronto, Ontario Canada
[EMAIL PROTECTED]  M6K 3E3
 +1 416 646 3304 x110





Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 09:47, Andrew Sullivan wrote:
> On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
> > That way, no matter what replication method/tool is being used, as long
> > as it conforms to the defined replication interfaces, generic monitoring
> > tools can be used to keep an eye on things.
>
> That sounds like the cart is before the horse.  You need to know what
> sort of replication scheme you might ever have before you could
> know the statistics that you might want to know.

Hmmm.  Never heard of an inquiry for interest in a concept as putting
the cart before the horse.  Considering this is pretty much how things
get developed in the real world, I'm not sure what you feel is so
special about replication.

The first step is always to identify the need.  I'm attempting to do so.
Not sure what you'd consider the first step to be but I can assure you,
regardless of this concept seeing the light of day, it is the first
step.  The horse is correctly positioned in front of the cart.

I also stress that I'm talking about a statistical replication
interface.  It occurred to me that you might have been confused on this
matter.  That is, a set of tables and views will allow for the
replication process to be uniformly *monitored*.  I am not talking about
a set of interfaces through which all manner of replication must perform
its job (interface with databases for replication).

>
> There are different sorts of replication schemes under consideration.

Yep.  Thus it would seemingly be ideal to have a specification which
different implementations would seek to implement.  Off of the top of my
head and for starters, a table and/or view which can be queried and that
returns the tables that are being replicated sounds good to me.  Same
thing for the list of databases, the servers involved and their
associated role (master, slave, peer).
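
[Editorial sketch, not part of the original message.] To illustrate what
such tables and views might look like, here is a minimal runnable Python
example.  The relation names (`repl_servers`, `repl_tables`,
`repl_databases`) and their columns are hypothetical, and SQLite is used
only as an in-memory stand-in so the sketch runs anywhere; it is not a
claim about how any PostgreSQL replication tool stores its state.

```python
import sqlite3

# Hypothetical names throughout: repl_servers, repl_tables, repl_databases
# and their columns are illustrative, not part of PostgreSQL or any tool.
SCHEMA = """
CREATE TABLE repl_servers (
    server_name TEXT PRIMARY KEY,
    role        TEXT NOT NULL CHECK (role IN ('master', 'slave', 'peer'))
);
CREATE TABLE repl_tables (
    database_name TEXT NOT NULL,
    table_name    TEXT NOT NULL,
    PRIMARY KEY (database_name, table_name)
);
CREATE VIEW repl_databases AS
    SELECT DISTINCT database_name FROM repl_tables;
"""


def build_catalog() -> sqlite3.Connection:
    """Create the stand-in catalog and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO repl_servers VALUES (?, ?)",
                     [("db1", "master"), ("db2", "slave")])
    conn.executemany("INSERT INTO repl_tables VALUES (?, ?)",
                     [("sales", "orders"), ("sales", "customers"),
                      ("hr", "staff")])
    conn.commit()
    return conn


def replicated_tables(conn: sqlite3.Connection) -> list:
    """All (database, table) pairs under replication, in sorted order."""
    query = ("SELECT database_name, table_name FROM repl_tables "
             "ORDER BY database_name, table_name")
    return conn.execute(query).fetchall()


def servers_by_role(conn: sqlite3.Connection, role: str) -> list:
    """Names of the servers holding the given role."""
    query = ("SELECT server_name FROM repl_servers "
             "WHERE role = ? ORDER BY server_name")
    return [row[0] for row in conn.execute(query, (role,))]
```

A generic monitoring tool written against relations like these would only
need to know their names and columns, not which replication method
populated them.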

Without such a concept, there will be no standardized way to monitor
your replication.  As such, chances are one of two things will happen.
One, a single replication method will be championed and fair tools will
develop to support it, while all others are left as bastards.  Two,
quality tools to monitor replication will never materialize because each
method for monitoring is specific to a different type of implementation.
Resources will constantly be spread amongst a variety of well meaning
projects.


--Greg







Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 09:53, Neil Conway wrote:
> That's exactly what I was going to say -- I'd prefer that any
> interested parties concentrate on producing a *really good*
> replication implementation, which might eventually be integrated into
> PostgreSQL itself.
>
> Producing a generic API for something that really doesn't need
> genericity sounds like a waste of time, IMHO.
>
> Cheers,
>
> Neil


Somehow I get the impression that I've been completely misunderstood.
Somehow, people seem to have only read the subject and skipped the body
explaining the concept.

In what way would providing a generic interface to *monitor* be a waste
of time?  In what way would that prevent someone from producing a
*really good* replication implementation?  I utterly fail to see the
connection.

Regards,
Greg Copeland






Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

> As I said -- I don't really see the need for a bunch of replication
> implementations, and therefore I don't see the need for a generic API
> to make the whole mess (slightly) more manageable.

I see.  So the intention of the core developers is to have one and only
one replication solution?

Greg








Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Neil Conway

Greg Copeland [EMAIL PROTECTED] writes:
> > As I said -- I don't really see the need for a bunch of replication
> > implementations, and therefore I don't see the need for a generic API
> > to make the whole mess (slightly) more manageable.
>
> I see.  So the intention of the core developers is to have one and only
> one replication solution?

Not being a core developer, I can't comment on their intentions.

That said, I _personally_ don't see the need for more than one or two
replication implementations. You might need more than one if you
wanted to do both lazy and eager replication, for example. But you
certainly don't need 5 or 6 or however many implementations exist at
the moment.

I think the reason there are a lot of different implementations at the
moment is that each one has some pretty serious problems. So rather
than trying to reduce the problem by making it slightly easier for the
different replication solutions to inter-operate, I think it's a
better idea to solve the problem outright by improving one of the
existing replication projects to the point at which it is ready for
widespread production usage.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC





Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 13:18, Neil Conway wrote:
> That said, I _personally_ don't see the need for more than one or two
> replication implementations. You might need more than one if you
> wanted to do both lazy and eager replication, for example. But you
> certainly don't need 5 or 6 or however many implementations exist at
> the moment.

Fair enough.  Thank you for offering a complete explanation.

Your argument certainly made sense.  I wasn't aware of any single
serious effort underway which sought to finally put replication to bed,
let alone integrate it into the core code base.

Sign,

Greg Copeland





Re: [HACKERS] Standard replication interface?

2002-08-15 Thread cbbrowne

> > As I said -- I don't really see the need for a bunch of replication
> > implementations, and therefore I don't see the need for a generic API
> > to make the whole mess (slightly) more manageable.
>
> I see.  So the intention of the core developers is to have one and only
> one replication solution?

If the various solutions may be folded down into a smaller set of programs, 
perhaps, ultimately, into _one_ program, that would surely be easier to 
manage, in the codebase, than having five or six such programs.

If one program can do the job that needs to be done, and it has not been 
_clearly_ established that that is _not_ possible, then I'd think it rather 
silly to have a bunch of replication solutions that need to be updated any 
time a relevant change goes into the database engine.

I'd be surprised if, in the end, there truly _needed_ to be more than about 
two approaches.

Should the team plan to _have_ a mess?  I'd think not.
--
(concatenate 'string cbbrowne ntlug.org)
http://cbbrowne.com/info/linuxdistributions.html
We don't understand the  software, and sometimes we don't  understand
the hardware, but we can *see* the blinking lights!  -- Unknown






Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Tom Lane

Neil Conway [EMAIL PROTECTED] writes:
> Greg Copeland [EMAIL PROTECTED] writes:
>> I see.  So the intention of the core developers is to have one and only
>> one replication solution?

> Not being a core developer, I can't comment on their intentions.

Well, I am, but I'm only speaking for myself here:

I think there's definitely a need for at least two replication
implementations: sync and async.  The space of requirements is wide
enough that there's not a one-size-fits-all solution.  You might care
to look at Darren Johnson's OSCON slides for more about this:
http://conferences.oreillynet.com/cs/os2002/view/e_sess/3280
I think there is room for several replication solutions for Postgres
(three or four, maybe).

It's difficult to say what will wind up in our core distribution.
A tightly linked implementation like Postgres-R is really impractical
as an add-on: you need enough mods of the core code that it'd be a
nightmare to try to maintain if it's not integrated into the regular
CVS tree.  So assuming that the Postgres-R project gets to the point
of usefulness, I'd vote in favor of integrating it.  On the other hand,
it's possible to do good stuff without touching the core code at all
(cf. PostgreSQL Inc's rserv) and in that case there may or may not be
any interest in integrating the code.  It's really gonna depend mostly
on the wishes of the people who develop the replication solutions,
I think.

I can foresee a time when there are one or two replication solutions
that are included in the base distribution and others are available
separately.  In fact, counting contrib/rserv that more or less describes
the state of affairs today.  What we need is more work on the available
solutions to improve their quality and general usefulness.

As for the point at hand: I'm fairly dubious that a common monitoring
API will be very useful, considering how different the possible
replication approaches are.  If Greg can prove me wrong, fine.  But
I don't want to see us artificially constraining replication solutions
by insisting that they meet some prespecified API.

regards, tom lane




Re: [HACKERS] Standard replication interface?

2002-08-14 Thread Tom Lane

Greg Copeland [EMAIL PROTECTED] writes:
> ... it occurred to me that a predefined set of views
> and/or tables for all replication implementations may be worthwhile.

Do we understand replication well enough to define such a set of views?
I sure don't ...

regards, tom lane
