Re: [HACKERS] replication hooks

2008-06-01 Thread James Mansion

Marko Kreen wrote:

There is this tiny matter of replicating schema changes asynchronously,
but I suspect nobody actually cares.  A few random points about that:
  
I'm not sure I follow you - the Sybase 'warm standby' replication of
everything is really useful for business continuity.  The per-table rep
is more effective for publishing reference data, but is painful to
maintain.

Not having something that automagically reps a complete copy including
DDL (except for temp tables) is a major weakness IMO.

James


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication hooks

2008-05-30 Thread Robert Hodges
Hi Marko,

Replication requirements vary widely of course, but DDL support is shared by 
such a wide range of use cases that it is very difficult to see how any real 
solution would fail to include it.  This extends to change extraction APIs, 
however defined.  The question of what DDL to replicate is also quite 
clear: all of it, with as few exceptions as possible.

For instance, it is almost impossible to set up and manage replicated systems 
easily if you cannot propagate schema changes in serialized order along with 
other updates from applications.  The inconvenience of using alternative 
mechanisms like the SLONY 'execute script' is considerable and breaks most 
commonly used database management tools.
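Why schema changes need to travel in the same serialized stream as application updates can be shown with a toy replay loop. This is a hypothetical Python model, not Slony or PostgreSQL code: `ToyReplica`, `replay` and the event tuples are assumptions made purely for illustration. The replica applies one ordered log that mixes DDL and DML; moving the `ADD COLUMN` behind the insert that depends on it (as can happen when DDL is shipped out of band) makes the replay fail.

```python
class ToyReplica:
    """Hypothetical replica that only tracks table columns and rows."""

    def __init__(self):
        self.tables = {}  # table name -> {"cols": set of names, "rows": list}

    def apply(self, event):
        kind, table, payload = event
        if kind == "CREATE":
            self.tables[table] = {"cols": set(payload), "rows": []}
        elif kind == "ADD COLUMN":
            self.tables[table]["cols"].add(payload)
        elif kind == "INSERT":
            # Reject rows that reference columns the replica does not have yet.
            unknown = set(payload) - self.tables[table]["cols"]
            if unknown:
                raise ValueError(f"unknown column(s) {sorted(unknown)} in {table}")
            self.tables[table]["rows"].append(payload)
        else:
            raise ValueError(f"unsupported event {kind}")


def replay(log):
    """Apply events strictly in log order, as a serialized replica would."""
    replica = ToyReplica()
    for event in log:
        replica.apply(event)
    return replica


# One serialized stream: DDL interleaved with application updates.
log = [
    ("CREATE", "orders", ["id"]),
    ("INSERT", "orders", {"id": 1}),
    ("ADD COLUMN", "orders", "status"),
    ("INSERT", "orders", {"id": 2, "status": "new"}),
]

replica = replay(log)
print(len(replica.tables["orders"]["rows"]))  # 2

# The same events with the DDL applied after the data that needs it.
out_of_order = [log[0], log[1], log[3], log[2]]
try:
    replay(out_of_order)
except ValueError as err:
    print(err)  # unknown column(s) ['status'] in orders
```

A real system would of course carry far richer events, but the failure mode is the same: the replica can only stay correct if it sees the schema change at the same point in the stream where the master made it.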

That said, SLONY at least serializes the changes.  Non-serialized approaches 
lead to serious outages and can get you into distributed consensus problems, 
such as when is it 'safe' to change schema across different instances.  These 
are very hard to solve practically and tend to run into known impossibility 
results like Brewer's Conjecture, which holds that it is impossible to keep 
distributed databases consistent while also remaining open for updates and 
handling network partitions.

I'll post back later on the question of the API.  The key is to do something 
simple that avoids the problems discussed by Andrew and ties it accurately to 
use cases.  However, this requires a more prepared response than my hastily 
written post from last night.

Cheers, Robert

On 5/29/08 9:05 PM, Marko Kreen [EMAIL PROTECTED] wrote:

On 5/29/08, Andrew Sullivan [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
   people are starting to get religion on this issue I would strongly
   advocate a parallel effort to put in a change-set extraction API
   that would allow construction of comprehensive master/slave
   replication.

 You know, I gave a talk in Ottawa just last week about how the last
  effort to develop a comprehensive API for replication failed.  I had
  some ideas about why, the main one of which is something like this:
  Big features with a roadmap have not historically worked, so unless
  we're willing to change the way we work, we won't get that.

  I don't think an API is what's needed.  It's clear proposals for
  particular features that can be delivered in small pieces.  That's what
  the current proposal offers.  I think any kind of row-based approach
  such as what you're proposing would need that kind of proposal too.

  That isn't to say that I think an API is impossible or undesirable.
  It is to say that the last few times we tried, it went nowhere; and
  that I don't think the circumstances have changed.

I think the issue is simpler: an API for synchronous replication is
undesirable, as it would be too complex and hinder future development
(as I explained above).

And the API for asynchronous replication is already there - triggers,
txid functions for queueing.

There is this tiny matter of replicating schema changes asynchronously,
but I suspect nobody actually cares.  A few random points about that:

- The task cannot even be clearly defined (on a technical level: how
  the events should be represented).
- Any schema changes need to be carefully prepared anyway.  Whether
  to apply them to one or more servers does not make much difference.
- A major plus of an async replica is the ability to actually have a
  different schema on slaves.
- People _do_ care about the exact schema in a single place: failover servers.
- But for failover servers we also want synchronous replication.

So if we have synchronous WAL-based replication for failover servers,
the interest in hooks to log schema changes will decrease even more.

--
marko




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


[HACKERS] replication hooks

2008-05-29 Thread Marko Kreen
On 5/29/08, Andrew Sullivan [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
   people are starting to get religion on this issue I would strongly
   advocate a parallel effort to put in a change-set extraction API
   that would allow construction of comprehensive master/slave
   replication.

 You know, I gave a talk in Ottawa just last week about how the last
  effort to develop a comprehensive API for replication failed.  I had
  some ideas about why, the main one of which is something like this:
  Big features with a roadmap have not historically worked, so unless
  we're willing to change the way we work, we won't get that.

  I don't think an API is what's needed.  It's clear proposals for
  particular features that can be delivered in small pieces.  That's what
  the current proposal offers.  I think any kind of row-based approach
  such as what you're proposing would need that kind of proposal too.

  That isn't to say that I think an API is impossible or undesirable.
  It is to say that the last few times we tried, it went nowhere; and
  that I don't think the circumstances have changed.

I think the issue is simpler: an API for synchronous replication is
undesirable, as it would be too complex and hinder future development
(as I explained above).

And the API for asynchronous replication is already there - triggers,
txid functions for queueing.
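The claim that triggers plus the txid functions already give asynchronous replication a queueing API can be illustrated with a toy model. This is a Python simulation, not PostgreSQL code: it assumes a logging trigger has stored each change together with the writing transaction's txid, and it cuts the queue into batches by comparing two snapshots, roughly in the spirit of queueing systems such as PgQ. The `Snapshot` class, `batch_between` and the sample txids are hypothetical; inside the database the visibility test would be done with functions such as `txid_visible_in_snapshot()`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Toy model of a txid snapshot: xmin, xmax and in-progress txids."""
    xmin: int
    xmax: int
    xip: frozenset = field(default_factory=frozenset)

    def visible(self, txid: int) -> bool:
        # Committed before the snapshot: older than xmin, or below xmax
        # and not listed as still in progress.
        if txid < self.xmin:
            return True
        return txid < self.xmax and txid not in self.xip


def batch_between(events, prev: Snapshot, cur: Snapshot):
    """Events whose transaction became visible between two snapshots."""
    return [ev for ev in events
            if cur.visible(ev[0]) and not prev.visible(ev[0])]


# Queue rows as a hypothetical logging trigger might write them: (txid, change).
events = [(100, "INSERT a"), (101, "INSERT b"),
          (102, "UPDATE a"), (103, "DELETE b")]

s0 = Snapshot(xmin=100, xmax=100)                        # nothing visible yet
s1 = Snapshot(xmin=101, xmax=103, xip=frozenset({101}))  # 101 still running
s2 = Snapshot(xmin=104, xmax=104)                        # all committed

print(batch_between(events, s0, s1))  # [(100, 'INSERT a'), (102, 'UPDATE a')]
print(batch_between(events, s1, s2))  # [(101, 'INSERT b'), (103, 'DELETE b')]
```

Transaction 101 got its txid before 102 but committed after it, so it lands in the second batch: each event is delivered exactly once, in the order in which it became visible rather than in txid order.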

There is this tiny matter of replicating schema changes asynchronously,
but I suspect nobody actually cares.  A few random points about that:

- The task cannot even be clearly defined (on a technical level: how
  the events should be represented).
- Any schema changes need to be carefully prepared anyway.  Whether
  to apply them to one or more servers does not make much difference.
- A major plus of an async replica is the ability to actually have a
  different schema on slaves.
- People _do_ care about the exact schema in a single place: failover servers.
- But for failover servers we also want synchronous replication.

So if we have synchronous WAL-based replication for failover servers,
the interest in hooks to log schema changes will decrease even more.

-- 
marko



Re: [HACKERS] replication hooks

2008-05-29 Thread Andrew Sullivan
On Thu, May 29, 2008 at 11:05:09PM +0300, Marko Kreen wrote:

 There is this tiny matter of replicating schema changes asynchronously,
 but I suspect nobody actually cares.  

I know that Slony's users call this their number one irritant, so I
have my doubts nobody cares.  But maybe nobody cares enough.

 - The task cannot even be clearly defined (on a technical level: how
   the events should be represented).

Really?  I've been in discussions where different people had clear
(but, alas, different) ideas of how to represent them.

 - Any schema changes need to be carefully prepared anyway.  Whether
   to apply them to one or more servers does not make much difference.

One problem that designers of replication systems have is that they're
already thinking in the Serious Database Application world.  But I
have recently had the pleasure of being reminded how many users of
database systems neither know nor care to know any of the details of
the underlying system.  They already know how to make schema changes:
log into database, and start typing ALTER TABLE. . .  You or I
agreeing that more careful preparation than that is important will not
change their mind.  This is part of the reason MySQL looks so good:
you can just do these things.  If it doesn't work out later, well,
you don't know that when your ALTER TABLE just works.  

 - A major plus of an async replica is the ability to actually have a
   different schema on slaves.

I agree.

 - People _do_ care about the exact schema in a single place: failover servers.

Yeah, but not only there.  One of the things I was hoping to have
nailed down in the hooks discussion was, in fact, the use cases.
Half the time, people have such a clear idea of what _they_ want from
their replication that they come to believe replication means that.  

Another thing I like about the current proposal is that it is very
clear about what it is (and isn't) aiming for.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [HACKERS] replication hooks

2008-05-29 Thread Marko Kreen
On 5/29/08, Andrew Sullivan [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 11:05:09PM +0300, Marko Kreen wrote:
   There is this tiny matter of replicating schema changes asynchronously,
   but I suspect nobody actually cares.

 I know that Slony's users call this their number one irritant, so I
  have my doubts nobody cares.  But maybe nobody cares enough.

Oh, users of course like their lives to be as easy as possible
and all tools to be do-as-I-wish-complete.

I meant that no developer is interested after looking at the task's
complexity and the resulting payoff.

   - The task cannot even be clearly defined (on a technical level: how
 the events should be represented).

 Really?  I've been in discussions where different people had clear
  (but, alas, different) ideas of how to represent them.

Yeah.  The main problem is that unless you do WAL-based replication,
you cannot achieve transparency.  So you need to pick a few use cases
and tailor your solution for them, which gets uninteresting very fast
- users _will_ stumble upon spacial cases, and if they expect everything
to just work, the resulting conversation won't be funny.

   - Any schema changes need to be carefully prepared anyway.  Whether
 to apply them to one or more servers does not make much difference.

 One problem that designers of replication systems have is that they're
  already thinking in the Serious Database Application world.  But I
  have recently had the pleasure of being reminded how many users of
  database systems neither know nor care to know any of the details of
  the underlying system.  They already know how to make schema changes:
  log into database, and start typing ALTER TABLE. . .  You or I
  agreeing that more careful preparation than that is important will not
  change their mind.  This is part of the reason MySQL looks so good:
  you can just do these things.  If it doesn't work out later, well,
  you don't know that when your ALTER TABLE just works.

Simple - use WAL-based replication.

Although it's not so simple, as we don't currently provide it.  The existing
PITR hooks expect users to write their own replication, which is not
a user-friendly approach...

Hopefully this will be fixed in 8.4.

   - People _do_ care about the exact schema in a single place: failover servers.

 Yeah, but not only there.  One of the things I was hoping to have
  nailed down in the hooks discussion was, in fact, the use cases.
  Half the time, people have such a clear idea of what _they_ want from
  their replication that they come to believe replication means that.

The main problem with replica-hooks-discuss list was lack of focus.

There are various replication methods - single-master, multi-master,
asynchronous, synchronous, WAL-based, trigger-based, changeset-based.

Each combination wants different hooks; putting them all together
makes people stop caring.

E.g., setting the topic to schema change logging for async trigger-based
replication would be better, but even there, there are various usage
scenarios that may not be compatible, and if people don't see a chance of
common hooks, they don't bother.  Actually, I suspect this task is solvable;
the main problem is that it's pretty low on anyone's priority list.

  Another thing I like about the current proposal is that it is very
  clear about what it is (and isn't) aiming for.

Yes.  And we can skip the common hooks discussion. ;)

-- 
marko



Re: [HACKERS] replication hooks

2008-05-29 Thread Greg Sabino Mullane


 Yeah.  The main problem is that unless you do WAL-based replication,
 you cannot achieve transparency.  So you need to pick a few use cases
 and tailor your solution for them, which gets uninteresting very fast
 - users _will_ stumble upon spacial cases,

Isn't that what PostGIS is for?

g,d,r

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200805291840
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Replication hooks discussion

2006-10-02 Thread José Orlando Pereira
On Friday 29 September 2006 20:02, Andrew Sullivan wrote:
 At the beginning of the month, in
 http://archives.postgresql.org/pgsql-hackers/2006-09/msg00453.php,
 I said that I'd be willing to try to do any sort of co-ordination,
 document writing, &c. for a project that might define common back-end
 resources necessary for the various kinds of replication systems
 people seem to want.

 There seems to be a widespread agreement that there is more than one
 sort of replication facilities that are desired, and that none of the
 systems on offer satisfies all of those desires.  There also seems to
 be a hope that we could come to some sort of agreement on what the
 necessary conditions for any of these facilities are.  If we could,
 then we could build the necessary framework to provide those
 conditions, and it could be made available in the back end without
 every replication project having to be shipped with the main
 PostgreSQL code.

We at the GORDA project strongly agree with this approach. I'll try to 
summarize our proposals on the new list.

Regards,

-- 
Jose Orlando Pereira
