Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Hannu Krosing
On Thu, 2012-05-03 at 00:58 -0500, Jim Nasby wrote:
 On 4/29/12 6:03 AM, Simon Riggs wrote:
  The DML-WITH-LIMIT-1 is required to do single logical updates on tables
with non-unique rows.
And as for any logical updates we will have huge performance problem
when doing UPDATE or DELETE on large table with no indexes, but
fortunately this problem is on slave, not master;)
  While that is possible, I would favour the do-nothing approach. By
  making the default replication mode = none, we then require a PK to be
  assigned before allowing replication mode = on for a table. Trying to
  replicate tables without PKs is a problem that can wait basically.
 
 
 Something that an in-core method might be able to do that an external one
 can't would be to support a method of uniquely identifying rows in tables
 with no PKs. A gross example (that undoubtedly wouldn't work in the real
 world) would be using TIDs.
 A real-world implementation might be based on a hidden serial column.

For logical replication we don't really need to uniquely identify such rows -
it should be sufficient if we just update exactly one of the matching rows.

The way to do this is to put all fields of the OLD.* tuple in the WHERE
clause and then update just one matching row.

IIRC updating (or deleting) CURRENT OF a cursor is currently supported
only in pl/pgsql so this needs to be done using a plpgsql cursor.

If the table has no indexes or index lookup returns lots of rows, then
this is bound to be slow, but in this case it was probably slow on
master too :)
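The approach above - match a row on every field of the OLD.* tuple, then change exactly one of the duplicates - can be sketched outside SQL. The Python toy below (hypothetical helper names, not PostgreSQL code) illustrates the semantics a plpgsql cursor with UPDATE ... WHERE CURRENT OF would provide:

```python
# Toy model of "update exactly one matching row" for tables with
# non-unique rows. Names here are illustrative, not PostgreSQL APIs.

def apply_logical_update(table, old_row, new_row):
    """Match a row on ALL of its OLD.* fields; change only the first hit."""
    for i, row in enumerate(table):
        if row == old_row:            # WHERE clause listing every OLD.* field
            table[i] = dict(new_row)
            return True               # stop after one row, like WHERE CURRENT OF
    return False                      # no match: the change does not apply

table = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "x"},   # exact duplicate: no PK to tell the rows apart
    {"a": 2, "b": "y"},
]
apply_logical_update(table, {"a": 1, "b": "x"}, {"a": 1, "b": "z"})
# only the first of the two duplicates is changed; the second stays as-is
```

Without an index covering the scanned columns this is a sequential scan, which matches the point above: it would have been slow on the master too.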

 -- 
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Robert Haas
On Fri, May 4, 2012 at 8:32 AM, Hannu Krosing ha...@krosing.net wrote:
 For logical replication we don't really need to uniquely identify such rows -
 it should be sufficient if we just update exactly one of the matching rows.

 The way to do this is to put all fields of the OLD.* tuple in the WHERE
 clause and then update just one matching row.

 IIRC updating (or deleting) CURRENT OF a cursor is currently supported
 only in pl/pgsql so this needs to be done using a plpgsql cursor.

 If the table has no indexes or index lookup returns lots of rows, then
 this is bound to be slow, but in this case it was probably slow on
 master too :)

I was about to write a reply saying exactly this, but you said it
better than I would have been able to manage.

I think this is all exactly right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Robert Haas
On Thu, May 3, 2012 at 8:22 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 05/01/2012 09:09 AM, Robert Haas wrote:

 I think we ought to be sharing and debugging designs in
 public, not internally within 2ndQuadrant - or any other company, or
 any other mailing list other than this one.

 OK.  You go first.

Are you requesting more transparency in general, asking for my
thoughts on logical replication specifically, or something else?

I try pretty hard not to go off and do large amounts of work in a
vacuum.  If something is more than a couple days work, I post the
design on hackers and wait for feedback before writing a line of code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Simon Riggs
On 4 May 2012 14:01, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 4, 2012 at 8:32 AM, Hannu Krosing ha...@krosing.net wrote:
 For logical replication we don't really need to uniquely identify such rows -
 it should be sufficient if we just update exactly one of the matching rows.

 The way to do this is to put all fields of the OLD.* tuple in the WHERE
 clause and then update just one matching row.

 IIRC updating (or deleting) CURRENT OF a cursor is currently supported
 only in pl/pgsql so this needs to be done using a plpgsql cursor.

 If the table has no indexes or index lookup returns lots of rows, then
 this is bound to be slow, but in this case it was probably slow on
 master too :)

 I was about to write a reply saying exactly this, but you said it
 better than I would have been able to manage.

 I think this is all exactly right.

Yes, but it's not a high priority for inclusion. Many things like this
will need to wait behind the really critical additional features.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Greg Smith

On 05/04/2012 09:03 AM, Robert Haas wrote:

I try pretty hard not to go off and do large amounts of work in a
vacuum.  If something is more than a couple days work, I post the
design on hackers and wait for feedback before writing a line of code.


That is an excellent luxury to have.  You've worked very hard to earn 
it.  Not everyone is in the position where that's possible though.



Are you requesting more transparency in general, asking for my
thoughts on logical replication specifically, or something else?


The straw man argument here would require 100% transparency on 
everything you do in regards to PostgreSQL and related software.  Before 
doing any development on any code, first post here to ask for design 
review.  And if someone asks you to work on a program that isn't open 
source from day one, refuse unless you can operate that transparently.


That standard is nice if you can pull it off.  But I don't give you a 
hard time if you have to make some compromises from that ideal to keep 
yourself gainfully employed.  You do a ton of good work for the 
PostgreSQL community in a transparent way, so I assume that you're doing 
the best you can.  I would like to see that assumption presumed on our 
side, too.


Here are the individual straw men in this area I'd like to see put out 
of their misery:


You're developing things in secret:  if that's the case, we're pretty 
bad at it, given the history I outlined at 
http://archives.postgresql.org/message-id/4f9b1b6c.5010...@2ndquadrant.com


That discussion didn't happen in the right place:  it's not our fault 
that the cluster-hackers list exists.  Go joust at getting that list 
shut down and their meeting during PGCon canceled if you think it's 
unproductive for discussions to happen there.  I've been trying to 
bridge that gap for over two years now; note how many times I appear in 
the edit history at 
http://wiki.postgresql.org/index.php?title=ClusterFeatures&action=history


You might do too much development in the wrong direction and not build 
the right thing:  and?  Yes, there are people who develop into a corner 
and end up doing unproductive work as a result.  And there are others 
who submit things and give up when faced with feedback on them.  Last 
time I checked, there wasn't anyone who flat-out rejects on-list 
feedback working for 2ndQuadrant.   Instead, I see features that go 
through extensive and numerous review cycles based on what we hear back.


Designs should be presented on-list before doing any development: 
this is not always practical for those of us who are doing feature 
development.  Some feature sponsors are still getting used to open 
development.  If we have a private development milestone date to hit *in 
order to get more funding for public PostgreSQL work*, which is often 
the case here, we try not to miss it.  We'd be bad community members to 
do so.  And sometimes that involves building a proof of concept or 
prototype here first, then submitting it to the community once it's 
moved onto being a proven concept.  Since the community has a clear set 
of guidelines for how and when to submit new features, we make sure the 
development plans line up with them.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Robert Haas
On Fri, May 4, 2012 at 11:06 AM, Greg Smith g...@2ndquadrant.com wrote:
 The straw man argument here would require 100% transparency on everything
 you do in regards to PostgreSQL and related software.  Before doing any
 development on any code, first post here to ask for design review.  And if
 someone asks you to work on a program that isn't open source from day one,
 refuse unless you can operate that transparently.

Well, look.  At the end of the day, I don't really care whether you
post your designs before writing code or not - unless it turns out
that we get to the end of the development cycle, a gigantic patch
shows up at the last minute, it gets rejected because people aren't
satisfied with the design, and then massive bitching ensues because
the author(s) put a lot of work into that patch.  Then I care, because
now the fact that no design consensus was sought at the outset has
been transformed into a defect in the community process, which does in
fact have defects, but that isn't one of them.  We all know that
design review is going to have to happen at some point, and if there's
not an adequate opportunity to do that before the code is written then
it will happen after the code is written.  If that means the code has
to be thrown out, then that's the risk you take by writing the code
first.  As long as everybody understands that, do it in whatever order
you like.

I think the real straw man here is the idea that it will somehow save
time to skip the design phase and start writing code.  I have never
worked on a project, open source or otherwise, where that was true,
and I believe that any textbook on software engineering you pick up is
likely to tell you that in fact exactly the opposite is the case.

Obviously, there are times when you need to write some throw-away code
just to see how things shake out, and I do that all the time, and it
makes complete sense, and I'm not knocking it.  But if any of that
code makes it into the committed patch, I count that as unusually
lucky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Andres Freund
Hi Robert, Hi all,

On Friday, May 04, 2012 06:29:33 PM Robert Haas wrote:
 On Fri, May 4, 2012 at 11:06 AM, Greg Smith g...@2ndquadrant.com wrote:
  The straw man argument here would require 100% transparency on everything
  you do in regards to PostgreSQL and related software.  Before doing any
  development on any code, first post here to ask for design review.  And
  if someone asks you to work on a program that isn't open source from day
  one, refuse unless you can operate that transparently.
 
 Well, look.  At the end of the day, I don't really care whether you
 post your designs before writing code or not - unless it turns out
 that we get to the end of the development cycle, a gigantic patch
 shows up at the last minute, it gets rejected because people aren't
 satisfied with the design, and then massive bitching ensues because
 the author(s) put a lot of work into that patch.  Then I care, because
 now the fact that no design consensus was sought at the outset has
 been transformed into a defect in the community process, which does in
 fact have defects, but that isn't one of them.  We all know that
 design review is going to have to happen at some point, and if there's
 not an adequate opportunity to do that before the code is written then
 it will happen after the code is written.  If that means the code has
 to be thrown out, then that's the risk you take by writing the code
 first.  As long as everybody understands that, do it in whatever order
 you like.
In my understanding - as the person doing quite a bit of the coding atm - the
point is to provide a very minimal *early* prototype to have a sensible basis
for design decisions/discussions. On one side that's useful to get a feeling
for the problems involved. On the other side, doing design discussions without
an underlying basic patch & design on -hackers tends to go in the direction of
feature creep and bikeshedding. It also helps against "this is impossible"
claims.
Parts of this thread and related ones are a somewhat good example of this.

The plan is to show the early prototype around pgcon and, a holiday and some
cleanup later, send design documents and split-up patches (of that prototype)
to -hackers. I/We aim to have individual, independently usable parts of the
patch submitted to the first 9.3 commitfest.

I definitely do not wish to screw anyone over doing this or such. And I am
sure that's the same with others working on this. Even if sometimes emotions
get in the way/into play.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-05-04 Thread Robert Haas
On Fri, May 4, 2012 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 In my understanding - as the person doing quite a bit of the coding atm - the
 point is to provide a very minimal *early* prototype to have a sensible basis
 for design decisions/discussions. On one side that's useful to get a feeling
 for the problems involved. On the other side, doing design discussions without
 an underlying basic patch & design on -hackers tends to go in the direction
 of feature creep and bikeshedding. It also helps against "this is impossible"
 claims.
 Parts of this thread and related ones are a somewhat good example of this.

 The plan is to show the early prototype around pgcon and, a holiday and some
 cleanup later, send design documents and split-up patches (of that prototype)
 to -hackers. I/We aim to have individual, independently usable parts of the
 patch submitted to the first 9.3 commitfest.

 I definitely do not wish to screw anyone over doing this or such. And I am
 sure that's the same with others working on this. Even if sometimes emotions
 get in the way/into play.

Thanks.  I think we are basically all on the same page here.  As I
said upthread, I am not trying to accuse anyone of doing anything
wrong, and as you and Simon and Greg have all said, you're not trying
to do anything wrong.  So, no problem.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-05-03 Thread Josh Berkus
On 5/2/12 10:58 PM, Jim Nasby wrote:
 On 4/29/12 6:03 AM, Simon Riggs wrote:
 The DML-WITH-LIMIT-1 is required to do single logical updates on tables
   with non-unique rows.
   And as for any logical updates we will have huge performance problem
   when doing UPDATE or DELETE on large table with no indexes, but
   fortunately this problem is on slave, not master;)
 While that is possible, I would favour the do-nothing approach. By
 making the default replication mode = none, we then require a PK to be
 assigned before allowing replication mode = on for a table. Trying to
 replicate tables without PKs is a problem that can wait basically.

 
 Something that an in-core method might be able to do that an external one
 can't would be to support a method of uniquely identifying rows in
 tables with no PKs. A gross example (that undoubtedly wouldn't work in
 the real world) would be using TIDs. A real-world implementation might
 be based on a hidden serial column.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-05-03 Thread Josh Berkus

 Something that an in-core method might be able to do that an external one
 can't would be to support a method of uniquely identifying rows in
 tables with no PKs. A gross example (that undoubtedly wouldn't work in
 the real world) would be using TIDs. A real-world implementation might
 be based on a hidden serial column.

Realistically you need more than a serial for MM replication.  For each
row-version, you need:

serverID of last update
serialID of row
timestamp of last update

... and note that this would have to include deleted rows as well.

Currently Bucardo does this by using several fields, but you could put
together one 128-bit field which contains all of this information.  Or
you could do an Instagram and compress it into 64 bits, but that would
require limiting the problem space in a way you probably can't for
a general tool.
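As a rough sketch of the 64-bit variant: the three identifiers listed above (timestamp, server ID, per-row serial) can be bit-packed into a single integer. The 41/13/10 field split and the epoch below are assumptions borrowed from Instagram's published sharded-ID scheme, not something specified in this thread:

```python
# Hypothetical 64-bit row identity: 41 bits of milliseconds since a custom
# epoch, 13 bits of server ID, 10 bits of per-millisecond serial. The field
# widths (and the epoch) are illustrative assumptions, not a proposal.

EPOCH_MS = 1_300_000_000_000  # arbitrary custom epoch (March 2011)

def pack_row_id(ts_ms, server_id, serial):
    assert 0 <= server_id < (1 << 13) and 0 <= serial < (1 << 10)
    return ((ts_ms - EPOCH_MS) << 23) | (server_id << 10) | serial

def unpack_row_id(row_id):
    return ((row_id >> 23) + EPOCH_MS,   # timestamp of last update
            (row_id >> 10) & 0x1FFF,     # serverID of last update
            row_id & 0x3FF)              # serialID of row

rid = pack_row_id(1_336_000_000_000, 42, 7)
assert unpack_row_id(rid) == (1_336_000_000_000, 42, 7)
```

A general tool probably cannot accept the resulting limits (8192 servers, 1024 rows per millisecond per server), which is exactly why the 128-bit version is the safer default.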


I do agree that depending on user-defined PKs raises a whole host of
issues which we'd rather just sidestep, though.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-05-03 Thread Simon Riggs
On Thu, May 3, 2012 at 6:03 PM, Josh Berkus j...@agliodbs.com wrote:

 I do agree that depending on user-defined PKs raises a whole host of
 issues which we'd rather just sidestep, though.

What do you have in mind instead?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-05-03 Thread Greg Smith

On 05/01/2012 09:09 AM, Robert Haas wrote:

I think we ought to be sharing and debugging designs in
public, not internally within 2ndQuadrant - or any other company, or
any other mailing list other than this one.


OK.  You go first.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [HACKERS] Future In-Core Replication

2012-05-02 Thread Jim Nasby

On 4/29/12 6:03 AM, Simon Riggs wrote:

The DML-WITH-LIMIT-1 is required to do single logical updates on tables
  with non-unique rows.
  And as for any logical updates we will have huge performance problem
  when doing UPDATE or DELETE on large table with no indexes, but
  fortunately this problem is on slave, not master;)

While that is possible, I would favour the do-nothing approach. By
making the default replication mode = none, we then require a PK to be
assigned before allowing replication mode = on for a table. Trying to
replicate tables without PKs is a problem that can wait basically.



Something that an in-core method might be able to do that an external one can't
would be to support a method of uniquely identifying rows in tables with no
PKs. A gross example (that undoubtedly wouldn't work in the real world) would
be using TIDs. A real-world implementation might be based on a hidden serial
column.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Future In-Core Replication

2012-05-01 Thread Simon Riggs
On Tue, May 1, 2012 at 1:10 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Those are the basic requirements that I am trying to address. There
 are a great many important details, but the core of this is probably
 what I would call logical replication, that is shipping changes to
 other nodes in a way that does not tie us to the same physical
 representation that recovery/streaming replication does now. Of
 course, non-physical replication can take many forms.

 Guessing from "shipping changes to other nodes", you seem to
 implicitly aim at asynchronous replication? If so, I am afraid it will
 force users to pay some cost to migrate from existing applications.

The main plan is to use the existing streaming mechanism. That allows
changes/LCRs to be shipped synchronously to some nodes and
asynchronously to others.

We need a mechanism that works across continents, so eager replication
is not the most likely candidate. So the focus would be on shipping
changes after they have been made using lazy replication.

It sounds a bit strange but we can have synchronous lazy replication,
if you wish it.

None of the above presumes anything about the content or role of the
LCRs being shipped.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-05-01 Thread Robert Haas
On Mon, Apr 30, 2012 at 6:43 PM, Josh Berkus j...@agliodbs.com wrote:
 Well, this *is* the purpose of the cluster-hackers group, to add backend
 support which would make external replication systems easier to build
 and more efficient.  So far the only real feature to come out of that
 has been the Command Triggers, but if you read the TODO list of that
 group you'll see that it's a laundry list of things replication systems
 need support for in the backend.

Right, but the cluster-hackers group has essentially zero mailing list
traffic and seems to be making no progress on getting any of the
features they need into the backend, with the exception, as you say,
of Command Triggers.  So to say, oh, well, we don't need a new effort
to do this, we have cluster-hackers is just sticking our head in the
sand.  We obviously do need more of an effort than we've heretofore
had, and if Simon/2Q are willing to step it up a few notches, kudos to
them.  Hopefully I'll get a chance to participate as well.

One thing I am a bit uncomfortable with about this whole discussion is
that it seems like a lot of the design and research is happening
off-list, with intent to report results back here later.  While that
is obviously fine at some level, keeping the design under wraps until
the code is written is a strategy that's had notoriously poor success
in this community, and I hope we're not in for a repeat of that
strategy.  I think we ought to be sharing and debugging designs in
public, not internally within 2ndQuadrant - or any other company, or
any other mailing list other than this one.  Nobody enjoys doing a lot
of work on a patch and then having it get killed because Tom Lane - or
someone else - points out some critical flaw in the design, and the
way to avoid that is to try to flush out the flaws in public before
too much work goes into it.  On the flip side I certainly understand
that sometimes you need to take some time to get your house in order
before you can present a coherent public proposal, so don't take this as
an accusation that anyone is doing anything wrong, just a possible
concern.

 What puts me off about this let's start from first principles approach
 is that in our community we have years of experience (in a couple cases,
 over a decade) with logical-level replication.  It seems like the last
 thing we need is Yet Another PostgreSQL Replication System, started over
 from scratch and years away from being production quality.  Don't we
 have enough external replication systems with not enough developers
 behind them?

No.  We have, after over a decade of work, no meaningful core support
for the parts of replication that need core support in order to not
suck.  People are switching from logical replication of various types
to SR in droves, not because they like the draconian restrictions that
HS/SR impose, but rather because it has superior performance and
reliability characteristics.  It has those characteristics because it
has core support.  You cannot build an airplane by bolting wings onto
a car.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-05-01 Thread Simon Riggs
On Tue, May 1, 2012 at 2:09 PM, Robert Haas robertmh...@gmail.com wrote:

 One thing I am a bit uncomfortable with about this whole discussion is
 that it seems like a lot of the design and research is happening
 off-list, with intent to report results back here later.  While that
 is obviously fine at some level, keeping the design under wraps until
 the code is written is a strategy that's had notoriously poor success
 in this community, and I hope we're not in for a repeat of that
 strategy.  I think we ought to be sharing and debugging designs in
 public, not internally within 2ndQuadrant - or any other company, or
 any other mailing list other than this one.  Nobody enjoys doing a lot
 of work on a patch and then having it get killed because Tom Lane - or
 someone else - points out some critical flaw in the design, and the
 way to avoid that is to try to flush out the flaws in public before
 too much work goes into it.  On the flip side I certainly understand
 that sometimes you need to take some time to get your house in order
 before you can present a coherent public proposal, so don't take this as
 an accusation that anyone is doing anything wrong, just a possible
 concern.

I'm aware of the difficulties here. It's a bootstrap. If we start by
discussing it then we receive the predictable "that'll never work"
replies. If we start by building a prototype then we get the "you're
hiding it" replies. Neither is ideal.  I'm also well aware of the "if
only you'd shown me your design sooner, I could have saved you the
time" line, which is the stock response from every architect I ever
met. The truth is, as I'm sure you might guess, that I don't know
everything, so making a few mistakes early on allows us to avoid the
time-wasting annoyance of early stage ideas.

All I can say is that I've done this a few times before and the
process is easier once we have a little structure in place, with
details like which bike shed we are painting and what the meaning of
the phrase "to paint" entails.

My hope is that we can present a basic set of facts and measurements
that will act as a platform for challenge and discussion of
requirements and ideas.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Dave Page
On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Apr 26 11:10:09 -0300 2012:
 On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

  I will also be organising a small-medium sized Future of In-Core
  Replication meeting in Ottawa on Wed 16 May, 6-10pm.

 Thanks for such rapid response. I've put up a wiki page and will be
 adding names as they come through

 http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting

 How is this not redundant with the Cluster Summit?
 http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

 ... oh, you're also already enlisted in that one.  Sigh.

My understanding is that the agenda for the cluster meeting is almost
entirely dedicated to Postgres-XC.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Simon Riggs
On Mon, Apr 30, 2012 at 7:35 AM, Dave Page dp...@pgadmin.org wrote:
 On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Apr 26 11:10:09 -0300 2012:
 On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

  I will also be organising a small-medium sized Future of In-Core
  Replication meeting in Ottawa on Wed 16 May, 6-10pm.

 Thanks for such rapid response. I've put up a wiki page and will be
 adding names as they come through

 http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting

 How is this not redundant with the Cluster Summit?
 http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

 ... oh, you're also already enlisted in that one.  Sigh.

 My understanding is that the agenda for the cluster meeting is almost
 entirely dedicated to Postgres-XC.

Yes. Regrettably, there wasn't a 90 minute slot available to discuss
this at the cluster meeting, the dev meeting agenda is fairly full and
my presentation on replication was rejected for the main conference.
As a result, the additional meeting was the only way left open to me
to initiate detailed discussion.

The additional meeting will give us 240 minutes of discussion and
allows us to bring in some technical users as well.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Atri Sharma
On Mon, Apr 30, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Apr 30, 2012 at 7:35 AM, Dave Page dp...@pgadmin.org wrote:
 On Sun, Apr 29, 2012 at 11:20 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Apr 26 11:10:09 -0300 2012:
 On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

  I will also be organising a small-medium sized Future of In-Core
  Replication meeting in Ottawa on Wed 16 May, 6-10pm.

 Thanks for such rapid response. I've put up a wiki page and will be
 adding names as they come through

 http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting

 How is this not redundant with the Cluster Summit?
 http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

 ... oh, you're also already enlisted in that one.  Sigh.

 My understanding is that the agenda for the cluster meeting is almost
 entirely dedicated to Postgres-XC.

 Yes. Regrettably, there wasn't a 90 minute slot available to discuss
 this at the cluster meeting, the dev meeting agenda is fairly full and
 my presentation on replication was rejected for the main conference.
 As a result, the additional meeting was the only way left open to me
 to initiate detailed discussion.

 The additional meeting will give us 240 minutes of discussion and
 allows us to bring in some technical users as well.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Hi All,

Will the live streaming/recording of the meeting be available?

Atri

-- 
Regards,

Atri
l'apprenant



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Bruce Momjian
On Thu, Apr 26, 2012 at 01:41:33PM +0100, Simon Riggs wrote:
 Some people have talked about the need for multi-master replication,
 whereby 2+ databases communicate changes to one another. This topic
 has been discussed in some depth in Computer Science academic papers,
 most notably "The Dangers of Replication and a Solution" by the late
 Jim Gray. I've further studied this to the point where I have a
 mathematical model of this that allows me to predict what our likely
 success will be from implementing that. Without meaning to worry you,
 MM replication alone is not a solution for large data or the general
 case. For the general case, single master replication will continue to
 be the most viable option. For large and distributed data sets, some
 form of partitioning/sharding is required simply because full
 multi-master replication just isn't viable at both volume and scale.
 So my take on this is that MM is desirable, but is not the only thing
 we need - we also need partial/filtered replication to make large
 systems practical. Hence why I've been calling this the
 Bi-Directional Replication project. I'm aware that paragraph alone
 requires lots of explanation, which I hope to do both in writing and
 in person at the forthcoming developer conference.

I would love to see a layout of exactly where these things make sense,
similar to what we do at the bottom of our documentation for High
Availability, Load Balancing, and Replication:


http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html

Users and developers just can't seem to get the calculus of where things
make sense into their heads, me included.

For example, you said that MM replication alone is not a solution for
large data or the general case.  Why is that?  Is the goal of your work
really to do logical replication, which allows for major version
upgrades?  Is that the defining feature?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Merlin Moncure
On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote:
 For example, you said that MM replication alone is not a solution for
 large data or the general case.  Why is that?  Is the goal of your work
 really to do logical replication, which allows for major version
 upgrades?  Is that the defining feature?

TBH, I don't think MM replication belongs in the database at all.
Ditto any replication solution that implements 'eventual consistency'
such that after the fact conflict resolution is required.  In an SQL
database, when a transaction commits, it should remain so.  It belongs
in the application layer.

merlin



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Simon Riggs
On Mon, Apr 30, 2012 at 6:38 PM, Bruce Momjian br...@momjian.us wrote:

 I would love to see a layout of exactly where these things make sense,
 similar to what we do at the bottom of our documentation for High
 Availability, Load Balancing, and Replication:

        
 http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html

 Users and developers just can't seem to get the calculus of where things
 make sense into their heads, me included.

 For example, you said that MM replication alone is not a solution for
 large data or the general case.  Why is that?  Is the goal of your work
 really to do logical replication, which allows for major version
 upgrades?  Is that the defining feature?

Good question.

The use case, its breadth and utility are always the first place I
start. I'm in the middle of writing a presentation that explains this
from first principles and will be discussing that at the PgCon
meeting. It's taken a long time to articulate that rather than make
leaps of assumption and belief.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Robert Haas
On Mon, Apr 30, 2012 at 2:33 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote:
 For example, you said that MM replication alone is not a solution for
 large data or the general case.  Why is that?  Is the goal of your work
 really to do logical replication, which allows for major version
 upgrades?  Is that the defining feature?

 TBH, I don't think MM replication belongs in the database at all.
 Ditto any replication solution that implements 'eventual consistency'
 such that after the fact conflict resolution is required.  In an SQL
 database, when a transaction commits, it should remain so.  It belongs
 in the application layer.

I basically agree, at least in the medium term.  The logical
replication solutions we have today generally seem to work by watching
the inserts, updates, and deletes go by and writing the changed tuples
to a side table.  This is not very performant, because it amounts to
writing the data four times: we have to write WAL for the original
change, write the data files for the original change, write more WAL
for the change records, and then write those data files.  Since all
large database solutions are eventually I/O-bound, this is not great.
Writing and flushing a separate replication log in parallel to WAL
would get us down to three writes, and extracting tuple data from the
existing WAL would get us down to two writes, which is as well as we
ever know how to do.
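
The side-table capture described above can be sketched as a row-level
trigger. This is only a minimal illustration of the general pattern, not
the implementation of any system named in this thread; all table and
function names are hypothetical:

```sql
-- Sketch of trigger-based change capture into a side table.  Every
-- captured change costs extra WAL and heap writes on top of the
-- original DML, which is the write amplification discussed above.
CREATE TABLE replication_log (
    id       bigserial PRIMARY KEY,
    relname  text   NOT NULL,
    op       "char" NOT NULL,       -- 'I', 'U' or 'D'
    row_data text
);

CREATE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO replication_log (relname, op, row_data)
        VALUES (TG_TABLE_NAME, 'D', OLD::text);
        RETURN OLD;
    END IF;
    INSERT INTO replication_log (relname, op, row_data)
    VALUES (TG_TABLE_NAME, left(TG_OP, 1), NEW::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capture_some_table
AFTER INSERT OR UPDATE OR DELETE ON some_table
FOR EACH ROW EXECUTE PROCEDURE capture_change();
```

A replication daemon would then poll replication_log and forward rows to
the subscribers, which is where the third and fourth writes come from.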

If we just had that much in core - that is, the ability to efficiently
extract tuple inserts, updates, and deletes on a logical level - it
would be much easier to build a good logical replication system around
PostgreSQL than it is today, and the existing systems could be adapted
to deliver higher performance by making use of the new infrastructure.
 The other half of the changes - applying the updates - is relatively
straightforward, and it wouldn't bother me to leave that in user-land,
especially in the MMR case, where you have to deal with conflict
resolution rules that may be much simpler to express in a higher-level
language than they would be in C.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Bruce Momjian
On Mon, Apr 30, 2012 at 07:55:00PM +0100, Simon Riggs wrote:
 On Mon, Apr 30, 2012 at 6:38 PM, Bruce Momjian br...@momjian.us wrote:
 
  I would love to see a layout of exactly where these things make sense,
  similar to what we do at the bottom of our documentation for High
  Availability, Load Balancing, and Replication:
 
         
  http://www.postgresql.org/docs/9.1/static/different-replication-solutions.html
 
  Users and developers just can't seem to get the calculus of where things
  make sense into their heads, me included.
 
  For example, you said that MM replication alone is not a solution for
  large data or the general case.  Why is that?  Is the goal of your work
  really to do logical replication, which allows for major version
  upgrades?  Is that the defining feature?
 
 Good question.
 
 The use case, its breadth and utility are always the first place I
 start. I'm in the middle of writing a presentation that explains this
 from first principles and will be discussing that at the PgCon
 meeting. It's taken a long time to articulate that rather than make
 leaps of assumption and belief.

Yep, it is the assumption and belief that always confuses me.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 The other half of the changes - applying the updates - is
 relatively straightforward, and it wouldn't bother me to leave
 that in user-land, especially in the MMR case, where you have to
 deal with conflict resolution rules that may be much simpler to
 express in a higher-level language than they would be in C.
 
I've developed successful MMR more than once, and while I wouldn't
say it's exactly been implemented in the database, it hasn't exactly
been in application space either.  The most successful
implementations I've worked with have been a layer just outside the
database, of which application software was completely unaware.  The
database wasn't aware of the coordination per se; but it *did* need
to provide transaction information in a clean way, and the
declarations of how data was distributed were in the database.  In
my experience a declarative definition of data distribution has
always been sufficient, and certainly cleaner to deal with than
imperative coding would be.
 
YMMV.
 
-Kevin



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Merlin Moncure
On Mon, Apr 30, 2012 at 2:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Apr 30, 2012 at 2:33 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Apr 30, 2012 at 12:38 PM, Bruce Momjian br...@momjian.us wrote:
 For example, you said that MM replication alone is not a solution for
 large data or the general case.  Why is that?  Is the goal of your work
 really to do logical replication, which allows for major version
 upgrades?  Is that the defining feature?

 TBH, I don't think MM replication belongs in the database at all.
 Ditto any replication solution that implements 'eventual consistency'
 such that after the fact conflict resolution is required.  In an SQL
 database, when a transaction commits, it should remain so.  It belongs
 in the application layer.

 I basically agree, at least in the medium term.  The logical
 replication solutions we have today generally seem to work by watching
 the inserts, updates, and deletes go by and writing the changed tuples
 to a side table.  This is not very performant, because it amounts to
 writing the data four times: we have to write WAL for the original
 change, write the data files for the original change, write more WAL
 for the change records, and then write those data files.  Since all
 large database solutions are eventually I/O-bound, this is not great.
 Writing and flushing a separate replication log in parallel to WAL
 would get us down to three writes, and extracting tuple data from the
 existing WAL would get us down to two writes, which is as well as we
 ever know how to do.

 If we just had that much in core - that is, the ability to efficiently
 extract tuple inserts, updates, and deletes on a logical level - it
 would be much easier to build a good logical replication system around
 PostgreSQL than it is today, and the existing systems could be adapted
 to deliver higher performance by making use of the new infrastructure.
  The other half of the changes - applying the updates - is relatively
 straightforward, and it wouldn't bother me to leave that in user-land,
 especially in the MMR case, where you have to deal with conflict
 resolution rules that may be much simpler to express in a higher-level
 language than they would be in C.

Yeah -- here at $work the SQL Server team (once in a while we cross
no-man's land and converse) has some fancy technology that sits
directly on top of the transaction log and exposes an API that you can
use to peek into the river of data running through the log and do
stuff with it.  In our case, they use it to triage extracts from about
100 or so distributed databases into a centralized store in a
relatively realtime fashion.  HS/SR simply can't do that and there
would be tremendous value in something that could.
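
For historical context only: the kind of "peek into the log" API Merlin
describes eventually arrived in PostgreSQL 9.4 as logical decoding, well
after this thread. A minimal sketch using the built-in test_decoding
plugin (the slot name is arbitrary; requires wal_level = logical):

```sql
-- Create a logical replication slot that decodes WAL into text tuples.
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');

-- ... run some DML in another session ...

-- Peek at the decoded change stream without consuming it:
SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);

-- Consume the changes and advance the slot:
SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);

SELECT pg_drop_replication_slot('my_slot');
```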

merlin



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Josh Berkus

 If we just had that much in core - that is, the ability to efficiently
 extra tuple inserts, updates, and deletes on a logical level - it
 would be much easier to build a good logical replication system around
 PostgreSQL than it is today, and the existing systems could be adapted
 to deliver higher performance by making use of the new infrastructure.

Well, this *is* the purpose of the cluster-hackers group, to add backend
support which would make external replication systems easier to build
and more efficient.  So far the only real feature to come out of that
has been the Command Triggers, but if you read the TODO list of that
group you'll see that it's a laundry list of things replication systems
need support for in the backend.

http://wiki.postgresql.org/wiki/ClusterFeatures
http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

What puts me off about this "let's start from first principles" approach
is that in our community we have years of experience (in a couple cases,
over a decade) with logical-level replication.  It seems like the last
thing we need is Yet Another PostgreSQL Replication System, started over
from scratch and years away from being production quality.  Don't we
have enough external replication systems with not enough developers
behind them?

Even if improving an existing replication system proves to be
impossible, it would make more sense to start with an analysis of the
strengths and deficiencies of Slony, Londiste, Bucardo, etc., than with
some kind of clean-room approach.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Simon Riggs
On Mon, Apr 30, 2012 at 11:43 PM, Josh Berkus j...@agliodbs.com wrote:

 Well, this *is* the purpose of the cluster-hackers group

Well, I tried all available means to discuss my ideas before
organising an external meeting. You can think of the InCore meeting as
an extension of the cluster hackers meeting if you wish.

 What puts me off about this "let's start from first principles" approach
 is that in our community we have years of experience (in a couple cases,
 over a decade) with logical-level replication.  It seems like the last
 thing we need is Yet Another PostgreSQL Replication System, started over
 from scratch and years away from being production quality.  Don't we
 have enough external replication systems with not enough developers
 behind them?

I've tried very hard to express my admiration and respect for
developers of many replication systems both personally when we meet in
person and on list. And I've mentioned that I'd like to include as
many ideas as possible in an in-core approach. Yes, we have many
external replication systems. Many, many people have expressed the
desire for more coherent features in core.

I believe we can achieve production quality code in 1, maybe 2
releases. That is only possible by building on what we already have
and reusing the concepts, experience and perhaps even code from other
projects. We are closer to that than your statement allows. Yes, we
have enough external replication systems and that is one reason why we
need to put things in core and not just create another external
system.

 Even if improving an existing replication system proves to be
 impossible, it would make more sense to start with an analysis of the
 strengths and deficiencies of Slony, Londiste, Bucardo, etc., than with
 some kind of clean-room approach.

I'm not sure why you think I would *not* be starting with that analysis.

I look forward to discussing this in person, where I'm sure it will be easier.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Josh Berkus

 Well, this *is* the purpose of the cluster-hackers group
 
 Well, I tried all available means to discuss my ideas before
 organising an external meeting. You can think of the InCore meeting as
 an extension of the cluster hackers meeting if you wish.

That comment wasn't for you, it was for other folks who didn't seem to
be aware that some of these ideas had already been discussed.  You, of
course, know because you were there.

And, if you need a 3-hour meeting there's no question that you need the
evening thing.  There's just not enough time in anyone's schedule.

 I'm not sure why you think I would *not* be starting with that analysis.

That wasn't the impression I got from your prior emails to hackers.
Thanks for the clarification!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-04-30 Thread Tatsuo Ishii
 Those are the basic requirements that I am trying to address. There
 are a great many important details, but the core of this is probably
 what I would call logical replication, that is shipping changes to
 other nodes in a way that does not tie us to the same physical
 representation that recovery/streaming replication does now. Of
 course, non-physical replication can take many forms.

Guessing from "shipping changes to other nodes", you seem to
implicitly aim at asynchronous replication? If so, I am afraid it will
force users to pay some cost to migrate from existing applications.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Future In-Core Replication

2012-04-29 Thread Simon Riggs
On Sat, Apr 28, 2012 at 8:40 PM, Hannu Krosing ha...@krosing.net wrote:

 As to what LCRs should contain, it will probably be logical equivalents
 of INSERT, UPDATE ... LIMIT 1, DELETE ... LIMIT 1, TRUNCATE and all DDL.

Yeh

 I would even go as far as propose a variant for DML-WITH-LIMIT-1 to be
 added to postgresql's SQL syntax so that the LCRs could be converted to
 SQL text for some tasks and thus should be easy to process using generic
 text-based tools.
 The DML-WITH-LIMIT-1 is required to do single logical updates on tables
 with non-unique rows.
 And as for any logical updates we will have huge performance problem
 when doing UPDATE or DELETE on large table with no indexes, but
 fortunately this problem is on slave, not master ;)

While that is possible, I would favour the do-nothing approach. By
making the default replication mode = none, we then require a PK to be
assigned before allowing replication mode = on for a table. Trying to
replicate tables without PKs is a problem that can wait basically.
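
As a historical footnote (this postdates the thread): the per-table
requirement Simon sketches here is close in spirit to REPLICA IDENTITY,
which later landed in PostgreSQL 9.4. A brief sketch, with hypothetical
table and index names:

```sql
-- REPLICA IDENTITY controls what old-row information is logged for
-- UPDATE/DELETE so that logical replication can identify the row.
ALTER TABLE t REPLICA IDENTITY DEFAULT;            -- use the primary key
ALTER TABLE t REPLICA IDENTITY USING INDEX t_uidx; -- a unique index instead
ALTER TABLE t REPLICA IDENTITY FULL;               -- log the whole old row (no PK needed)
```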

 Generating and shipping the LCR-s at WAL-generation time or perhaps even
 a bit earlier will have a huge performance benefit of not doing double
 writing of captured events on the master which currently is needed for
 several reasons, the main one being the determining of which
 transactions do commit and in what order. (this can't be solved on master
 without a local event log table as we don't have commit/rollback
 triggers)

 If we delegate that part out of the master then this alone enables us to
 be almost as fast as WAL based replica in most cases, even when we have
 different logical structure on slaves.

Agreed

 How the LCRs are produced and how they are applied is a subject for
 debate and measurement. We're lucky enough to have a variety of
 mechanisms to compare, Slony 1.0/2.0, Slony 2.2/Londiste/Bucardo and
 its worth adding WAL translation there also. My initial thought is
 that WAL translation has many positive aspects to it and we are
 investigating. There are also some variants on those themes, such as
 the one you discussed above.

 You probably won't recognise this as such, but I hope that people
 might see that I'm hoping to build Slony 3.0, Londiste++ etc. At some
 point, we'll all say "that's not Slony", but we'll also say (Josh
 already did) "that's not binary replication". But it will be the
 descendant of all.

 If we get efficient and flexible logical change event generation on the
 master, then I'm sure the current trigger-based logical replication
 providers will switch (for full replication) or at least add and extra
 LCR-source . It may still make sense to leave some flexibility to the
 master side, so the some decisions - possibly even complex ones - could
 be made when generating the LCR-s

 What I would like is to have some of it exposed to userspace via
 function which could be used by developers to push their own LCRs.

Yes, I see that one coming.

That use case is not something I'm focused on, but I do recognise
others wish to pursue that.

The bit I'm not sure about is whether we have custom handler code as well.

 As mentioned above, a significant part of this approach can be prototyped
 from user-level triggers as soon as we have triggers on commit and
 rollback, even though at slightly reduced performance. That is, it
 will still have the trigger overhead, but we can omit all the extra
 writing and then re-reading and event-table management on the master.

Agreed.

 Wanting to play with Streaming Logical Replication (as opposed to
 current Chunked Logical Replication) is also one of the reasons that I
 complained when the command triggers patch was kicked out from 9.2.

Yeh. It's clear that project needs to move forwards quickly in 9.3 if
we are to make this Just Work in the way we hope.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-29 Thread Hannu Krosing
On Sun, 2012-04-29 at 12:03 +0100, Simon Riggs wrote:
 On Sat, Apr 28, 2012 at 8:40 PM, Hannu Krosing ha...@krosing.net wrote:
 
  As to what LCRs should contain, it will probably be logical equivalents
  of INSERT, UPDATE ... LIMIT 1, DELETE ... LIMIT 1, TRUNCATE and all DDL.
 
 Yeh
 
  I would even go as far as propose a variant for DML-WITH-LIMIT-1 to be
  added to postgresql's SQL syntax so that the LCRs could be converted to
  SQL text for some tasks and thus should be easy to process using generic
  text-based tools.
  The DML-WITH-LIMIT-1 is required to do single logical updates on tables
  with non-unique rows.
  And as for any logical updates we will have huge performance problem
  when doing UPDATE or DELETE on large table with no indexes, but
  fortunately this problem is on slave, not master ;)
 
 While that is possible, I would favour the do-nothing approach. By
 making the default replication mode = none, we then require a PK to be
 assigned before allowing replication mode = on for a table. Trying to
 replicate tables without PKs is a problem that can wait basically.

While this is a good approach in most cases, there is a large use case
for PK-less / indexless tables such as large log tables, where you may
want to do INSERT-only replication, perhaps with some automatic
partitioning on logdate. Allowing this is probably something to look at
in the first release, even though I'm not sure what would happen on
violation of this insert-only policy.
Should it 
* refuse to continue and rollback the transaction (probably not)
* fail silently
* succeed but log the change locally 
* succeed with some special flags so the other side can treat it specially
without having to look up stuff in the system catalog
* (if we mark the unique / pk fields in some special way anyway, then
the previous one is free :)
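
An insert-only log table of the kind described can be sketched as
follows. This uses declarative partitioning from PostgreSQL 10, which
postdates this thread; all names are hypothetical:

```sql
-- PK-less, indexless log table, range-partitioned on logdate.
CREATE TABLE app_log (
    logdate date NOT NULL,
    message text
) PARTITION BY RANGE (logdate);

CREATE TABLE app_log_2012_04 PARTITION OF app_log
    FOR VALUES FROM ('2012-04-01') TO ('2012-05-01');

-- INSERT-only use: rows route to the matching partition automatically.
INSERT INTO app_log VALUES ('2012-04-29', 'replicated log line');
```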


-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] Future In-Core Replication

2012-04-29 Thread Simon Riggs
On Sun, Apr 29, 2012 at 6:20 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On Sun, 2012-04-29 at 12:03 +0100, Simon Riggs wrote:
 On Sat, Apr 28, 2012 at 8:40 PM, Hannu Krosing ha...@krosing.net wrote:

   As to what LCRs should contain, it will probably be logical equivalents
  of INSERT, UPDATE ... LIMIT 1, DELETE ... LIMIT 1, TRUNCATE and all DDL.

 Yeh

  I would even go as far as propose a variant for DML-WITH-LIMIT-1 to be
  added to postgresql's SQL syntax so that the LCRs could be converted to
  SQL text for some tasks and thus should be easy to process using generic
  text-based tools.
  The DML-WITH-LIMIT-1 is required to do single logical updates on tables
  with non-unique rows.
  And as for any logical updates we will have huge performance problem
  when doing UPDATE or DELETE on large table with no indexes, but
  fortunately this problem is on slave, not master ;)

 While that is possible, I would favour the do-nothing approach. By
 making the default replication mode = none, we then require a PK to be
 assigned before allowing replication mode = on for a table. Trying to
 replicate tables without PKs is a problem that can wait basically.

 While this is a good approach in most cases, there is a large use case
 for pk-less / indexless tables in large logfiles, where you may want to
 do INSERT only replication, perhaps with some automatic partitioning on
 logdate. Allowing this is probably something to look at in the first
 release, even though I'm not sure what would happen on violation of this
 insert-only policy.
 Should it
 * refuse to continue and rollback the transaction (probably not)
 * fail silently
 * succeed but log the change locally
 * succeed with some special flags so the other side can treat it specially
 without having to look up stuff in the system catalog
 * (if we mark the unique / pk fields in some special way anyway, then
 the previous one is free :)

OK, I think an insert-only replication mode would allow that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-29 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue abr 26 11:10:09 -0300 2012:
 On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
  I will also be organising a small-medium sized Future of In-Core
  Replication meeting in Ottawa on Wed 16 May, 6-10pm.
 
 Thanks for such rapid response. I've put up a wiki page and will be
 adding names as they come through
 
 http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting

How is this not redundant with the Cluster Summit?
http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit

... oh, you're also already enlisted in that one.  Sigh.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Future In-Core Replication

2012-04-28 Thread Simon Riggs
On Fri, Apr 27, 2012 at 11:50 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 What I'm hoping to do is to build a basic prototype of logical
 replication using WAL translation, so we can inspect it to see what
 the downsides are. It's an extremely non-trivial problem and so I
 expect there to be mountains to climb. There are other routes to
 logical replication, with messages marshalled in a similar way to
 Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
 measurements to be made and discussions to be had.

 I'll note that the latest version of Slony ...has made a substantial change 
 to its data
 representation

The basic model I'm working to is that logical replication will ship
Logical Change Records (LCRs) using the same transport mechanism that
we built for WAL.

How the LCRs are produced and how they are applied is a subject for
debate and measurement. We're lucky enough to have a variety of
mechanisms to compare, Slony 1.0/2.0, Slony 2.2/Londiste/Bucardo and
its worth adding WAL translation there also. My initial thought is
that WAL translation has many positive aspects to it and we are
investigating. There are also some variants on those themes, such as
the one you discussed above.

You probably won't recognise this as such, but I hope that people
might see that I'm hoping to build Slony 3.0, Londiste++ etc. At some
point, we'll all say "that's not Slony", but we'll also say (Josh
already did) "that's not binary replication". But it will be the
descendant of all.

Backwards compatibility is not a goal, please note, but only because
that will complicate matters intensely.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-28 Thread Hannu Krosing
On Sat, 2012-04-28 at 09:36 +0100, Simon Riggs wrote:
 On Fri, Apr 27, 2012 at 11:50 PM, Christopher Browne cbbro...@gmail.com 
 wrote:
  On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
  What I'm hoping to do is to build a basic prototype of logical
  replication using WAL translation, so we can inspect it to see what
  the downsides are. It's an extremely non-trivial problem and so I
  expect there to be mountains to climb. There are other routes to
  logical replication, with messages marshalled in a similar way to
  Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
  measurements to be made and discussions to be had.
 
  I'll note that the latest version of Slony ...has made a substantial change 
  to its data
  representation
 
 The basic model I'm working to is that logical replication will ship
 Logical Change Records (LCRs) using the same transport mechanism that
 we built for WAL.

One likely outcome of this LCR approach is that you will be shipping
changes as they are made, and on the slave you have to either apply
them in N parallel transactions, committing each transaction when the
LCR for the corresponding transaction says so, or collect the LCRs
before applying and then apply and commit the committed-on-master
transactions in commit order, throwing away the aborted ones.

The optimal approach will probably be some combination of these:
collect and apply the short transactions, and start replay in a
separate transaction if the commit does not arrive within N ms.
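The collect-then-apply strategy above can be sketched roughly as
follows. This is an illustration only; the LCR record shape (the
xid/kind/data fields) is invented here, not a proposed wire format.

```python
# Sketch: buffer LCRs per transaction, replay committed transactions in
# commit order, and throw away aborted ones. Record shape is invented.
def apply_in_commit_order(lcr_stream):
    pending = {}   # xid -> buffered change records
    applied = []   # stands in for "execute against the slave"
    for lcr in lcr_stream:
        xid, kind = lcr["xid"], lcr["kind"]
        if kind == "change":
            pending.setdefault(xid, []).append(lcr["data"])
        elif kind == "commit":
            # the commit LCR fixes this transaction's position in the
            # global commit order, so replay its changes now
            applied.extend(pending.pop(xid, []))
        elif kind == "abort":
            pending.pop(xid, None)   # aborted on master: discard

    return applied

stream = [
    {"xid": 1, "kind": "change", "data": "INSERT a"},
    {"xid": 2, "kind": "change", "data": "INSERT b"},
    {"xid": 2, "kind": "commit"},   # xid 2 commits first
    {"xid": 1, "kind": "abort"},    # xid 1 rolled back on master
]
print(apply_in_commit_order(stream))  # ['INSERT b']
```

The N-parallel-transactions variant would instead start applying each
xid's changes immediately in its own open transaction and commit that
transaction only when the commit LCR arrives.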

As to what LCRs should contain, they will probably be logical
equivalents of INSERT, UPDATE ... LIMIT 1, DELETE ... LIMIT 1,
TRUNCATE and all DDL.

The DDL could actually stay raw (as in LCRs for system tables) on the
generator side, as hopefully the rule that system tables can't have
triggers does not apply when generating the LCRs on the WAL path.
If we need to go back to ALTER TABLE ... commands, then this is
probably wisest to leave for the client. The client here could also be
some xReader-like middleman.

I would even go as far as to propose a DML-WITH-LIMIT-1 variant be
added to PostgreSQL's SQL syntax, so that the LCRs could be converted
to SQL text for some tasks and thus be easy to process using generic
text-based tools.
The DML-WITH-LIMIT-1 is required to do single logical updates on
tables with non-unique rows.
And as with any logical update, we will have a huge performance
problem when doing UPDATE or DELETE on a large table with no indexes,
but fortunately this problem is on the slave, not the master ;)
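A rough sketch of the intended DML-WITH-LIMIT-1 semantics, assuming
rows arrive as plain dicts: match on every column of the OLD tuple and
change exactly one of the possibly duplicate matching rows. All names
here are illustrative, not a proposed implementation.

```python
def update_limit_1(table, old, new):
    """Logical UPDATE ... WHERE <all OLD columns match> LIMIT 1:
    update exactly one of the matching rows, even if several rows
    are physically identical."""
    for i, row in enumerate(table):
        if row == old:
            table[i] = dict(new)
            return 1          # exactly one row touched
    return 0                  # no match: apply should flag a conflict

# Two physically identical rows; only one of them is updated.
t = [{"a": 1, "b": 2}, {"a": 1, "b": 2}]
n = update_limit_1(t, {"a": 1, "b": 2}, {"a": 1, "b": 9})
print(n, t)  # 1 [{'a': 1, 'b': 9}, {'a': 1, 'b': 2}]
```

Without an index covering the OLD columns this scan is O(n) per
change, which is exactly the no-index performance problem noted above,
borne by the slave rather than the master.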

Generating and shipping the LCRs at WAL-generation time, or perhaps
even a bit earlier, will have a huge performance benefit: it avoids
the double writing of captured events on the master which is currently
needed for several reasons, the main one being determining which
transactions commit and in what order. (This can't be solved on the
master without a local event log table, as we don't have
commit/rollback triggers.)

If we delegate that part away from the master, then this alone enables
us to be almost as fast as a WAL-based replica in most cases, even
when we have a different logical structure on the slaves.

 How the LCRs are produced and how they are applied is a subject for
 debate and measurement. We're lucky enough to have a variety of
 mechanisms to compare: Slony 1.0/2.0, Slony 2.2/Londiste/Bucardo, and
 it's worth adding WAL translation there also. My initial thought is
 that WAL translation has many positive aspects to it and we are
 investigating. There are also some variants on those themes, such as
 the one you discussed above.
 
 You probably won't recognise this as such, but I hope that people
 might see that I'm hoping to build Slony 3.0, Londiste++ etc. At some
 point, we'll all say that's not Slony, but we'll also say (Josh
 already did) that's not binary replication. But it will be the
 descendant of all.

If we get efficient and flexible logical change event generation on
the master, then I'm sure the current trigger-based logical
replication providers will switch (for full replication) or at least
add an extra LCR source. It may still make sense to leave some
flexibility on the master side, so that some decisions - possibly even
complex ones - could be made when generating the LCRs.

What I would like is to have some of it exposed to userspace via a
function which developers could use to push their own LCRs.

As mentioned above, a significant part of this approach can be
prototyped with user-level triggers as soon as we have triggers on
commit and rollback, even if at slightly reduced performance. That is,
it will still have the trigger overhead, but we can omit all the extra
writing, re-reading, and event-table management on the master.

Wanting to play with Streaming Logical Replication (as opposed to the
current Chunked Logical Replication) is also one of the reasons I
complained when the command triggers patch was kicked out of 9.2.

 Backwards compatibility is not a goal, please 

Re: [HACKERS] Future In-Core Replication

2012-04-28 Thread Hannu Krosing
On Sat, 2012-04-28 at 21:40 +0200, Hannu Krosing wrote:
 On Sat, 2012-04-28 at 09:36 +0100, Simon Riggs wrote:
  On Fri, Apr 27, 2012 at 11:50 PM, Christopher Browne cbbro...@gmail.com 
  wrote:
   On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs si...@2ndquadrant.com 
   wrote:
   What I'm hoping to do is to build a basic prototype of logical
   replication using WAL translation, so we can inspect it to see what
   the downsides are. It's an extremely non-trivial problem and so I
   expect there to be mountains to climb. There are other routes to
   logical replication, with messages marshalled in a similar way to
   Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
   measurements to be made and discussions to be had.
  
   I'll note that the latest version of Slony ...has made a substantial 
   change to its data
   representation

Btw, Londiste also moved from a Slony-like trigger with partial SQL to
a more general format a few versions ago.

I sure hope they will move to JSON as the data representation now that
we'll have JSON in core in 9.2.
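For illustration, one possible JSON shape for a logical row change.
This is an invented example, not Londiste's (or any tool's) actual
format.

```python
import json

# Invented example of a row change serialized as JSON.
change = {
    "op": "U",                                  # I / U / D
    "table": "public.accounts",
    "old": {"id": 42, "balance": 100},
    "new": {"id": 42, "balance": 150},
}
wire = json.dumps(change, sort_keys=True)        # ship as plain text
decoded = json.loads(wire)                       # generic tools can parse it
print(decoded["op"], decoded["new"]["balance"])  # U 150
```

The point of a self-describing representation like this is that
consumers can filter or transform changes with any JSON-aware tool, no
SQL parsing required.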

-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Simon Riggs
On Fri, Apr 27, 2012 at 1:26 AM, Josh Berkus j...@agliodbs.com wrote:
 Simon,

 I'm beginning to work on advanced additions to in-core replication for
 PostgreSQL.
 ...
 Those are the basic requirements that I am trying to address. There
 are a great many important details, but the core of this is probably
 what I would call logical replication, that is shipping changes to
 other nodes in a way that does not tie us to the same physical
 representation that recovery/streaming replication does now. Of
 course, non-physical replication can take many forms.

 So, I'm a bit confused.  You talk about this as additions to in-core
 replication, but then you talk about implementing logical replication,
 which would NOT be an addition to Binary replication.  Can you explain
 what you mean?

The key point is that there is a specific objective of including
additional features in-core. That places some restrictions, but also
offers some opportunities. Tight integration allows performance
improvements, as well as ease of use etc..

I'm not sure what you mean by would not be an addition to binary
replication. Yes, for reasons most elegantly explained by Robert here
[http://rhaas.blogspot.co.uk/2011/02/case-for-logical-replication.html],
physical/binary replication puts too many restrictions on us and we
cannot solve all of the problems that way. I was unaware of Robert's
post, but it sets the scene clearly.

So the future of in-core replication, IMHO, is some form of
non-physical replication. There are various options there, but
anything that goes in would reuse significant parts of the existing
replication setup that already works so well. Put that another way:
the infrastructure for the secure and efficient transport of
replication messages is already in place. Reuse is also what makes
something useful be achievable in a reasonable timescale.

What we need to consider is the form of those new non-physical WAL
messages, how they are built on the sender and how they are handled at
the receiving end.

What I'm hoping to do is to build a basic prototype of logical
replication using WAL translation, so we can inspect it to see what
the downsides are. It's an extremely non-trivial problem and so I
expect there to be mountains to climb. There are other routes to
logical replication, with messages marshalled in a similar way to
Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
measurements to be made and discussions to be had.

It will take time for people to believe this is possible and longer to
analyse and agree the options.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Simon Riggs
On Fri, Apr 27, 2012 at 12:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 26, 2012 at 6:48 PM, Josh Berkus j...@agliodbs.com wrote:
 So the idea is that you'll present briefly your intentions for 9.3 at
 the developer meeting, and then have this in-depth afterwards?  Sounds
 great.

 I really, really do not want the developer meeting to turn into a
 series of presentations.  That's what we had last year, and it was
 boring and unproductive.  Furthermore, it excludes from the
 conversation everyone who doesn't fit into a room in Ottawa.  I think
 that plans should be presented here, on pgsql-hackers, and the
 developer meeting should be reserved for discussion of issues with
 which everyone who will be there is already familiar.  If a
 presentation is required, the developer meeting is the wrong forum,
 IMHO of course.

I agree.

Obviously, the word presentation is the issue here. If one person
speaks and everybody else is silent that is not a good use of
anybody's time.

On any topic, I expect the introducer of that topic to set the scene,
establish the major questions and encourage discussion.

I would follow that form myself: I want to hear the wisdom of others
and alter my plans accordingly.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Josh Berkus

 What I'm hoping to do is to build a basic prototype of logical
 replication using WAL translation, so we can inspect it to see what
 the downsides are. 

Sounds like Mammoth.  You take a look at that?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Simon Riggs
On Fri, Apr 27, 2012 at 6:59 PM, Josh Berkus j...@agliodbs.com wrote:

 What I'm hoping to do is to build a basic prototype of logical
 replication using WAL translation, so we can inspect it to see what
 the downsides are.

 Sounds like Mammoth.  You take a look at that?

Well, they all sound similar. My info was that Mammoth was not WAL-based.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Joshua D. Drake


On 04/27/2012 11:33 AM, Simon Riggs wrote:
Well, they all sound similar. My info was that Mammoth was not WAL-based. 

Mammoth was transaction log based but not WAL based.



Re: [HACKERS] Future In-Core Replication

2012-04-27 Thread Christopher Browne
On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 What I'm hoping to do is to build a basic prototype of logical
 replication using WAL translation, so we can inspect it to see what
 the downsides are. It's an extremely non-trivial problem and so I
 expect there to be mountains to climb. There are other routes to
 logical replication, with messages marshalled in a similar way to
 Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
 measurements to be made and discussions to be had.

I'll note that the latest version of Slony, expected to be 2.2 (which
generally seems to work, but we're stuck at the moment waiting to get
free cycles to QA it) has made a substantial change to its data
representation.

The triggers used to cook data into a sort of fractional WHERE clause,
transforming an I/U/D into a string that you'd trivially combine with
the string INSERT INTO/UPDATE/DELETE FROM to get the logical update.
If there was a need to do anything fancier, you'd be left having to
write a fractional SQL parser to split the data out by hand.

New in 2.2 is that the log data is split out into an array of text
values, which means that if someone wanted to do some transformation,
such as filtering on value or filtering out columns, they could modify
the application-of-updates code to query for just the data they want
to fiddle with. No parser needed.
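The difference can be sketched like this (the field names are invented
for illustration, not Slony's actual log schema): once the change is
stored as parallel arrays of column names and values, dropping or
filtering a column is plain list manipulation instead of SQL-string
surgery.

```python
# Pre-2.2 style: a fractional SQL string you would have to parse.
old_style = "(id, secret, note) values ('7', 'hunter2', 'hi')"

# 2.2-style array representation (field names invented here):
log_row = {"cols": ["id", "secret", "note"],
           "vals": ["7", "hunter2", "hi"]}

def drop_column(row, col):
    """Filter one column out of an array-style log row - no parser."""
    keep = [(c, v) for c, v in zip(row["cols"], row["vals"]) if c != col]
    return {"cols": [c for c, _ in keep], "vals": [v for _, v in keep]}

print(drop_column(log_row, "secret"))
# {'cols': ['id', 'note'], 'vals': ['7', 'hi']}
```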

It's doubtless worthwhile to take a peek at that to make sure it
informs your data representation appropriately.  It's important to
have data represented in a fashion that is amenable to manipulation,
and that decidedly wasn't the case pre-2.2.

I wonder if a meaningful transport mechanism might involve combining:
a) A trigger that indicates that some data needs to be captured in a
logical form (rather than the presently pretty purely physical form
of WAL)
b) Perhaps a way of capturing logical updates in WAL
c) One of the old ideas that fell through was to try to capture commit
timestamps via triggers.  Doing it directly turned out to be too
controversial to get in.  Perhaps that's something that could be
captured via some process that parses WAL.

Something seems wrong about that, in that it mixes together updates of
multiple forms into WAL, physical *and* logical, and perhaps that
implies that there should be an altogether separate logical updates
log (LUL?). That still involves capturing updates in a duplicative
fashion, e.g. WAL + LUL, which seems somehow wrong. Or perhaps I'm
tilting at a windmill here. With Slony/Londiste/Bucardo, we're
capturing the LUL in some tables, meaning that it gets written both to
the tables' data files and to WAL. Adding a binary LUL eliminates
those table files and the attendant WAL updates, thus providing some
savings.

[Insert a LULCATS joke here...]

Perhaps I've just had too much coffee...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



[HACKERS] Future In-Core Replication

2012-04-26 Thread Simon Riggs
I'm beginning to work on advanced additions to in-core replication for
PostgreSQL.

There are a number of additional features for existing single-master
replication still to achieve, but the key topics to be addressed are
major leaps forward in functionality. I hope to add useful features in
9.3, though realise that many things could take two or even more
release cycles to achieve. (The last set of features took 8 years, so
I'm hoping to do this a little faster).

Some of my 2ndQuadrant colleagues will be committing themselves to the
project also and we hope to work with the community in the normal way
to create new features. I mention this only to say that major skills
and resources will be devoted to this for the next release(s), not
that this is a private project.

Some people have talked about the need for multi-master replication,
whereby 2+ databases communicate changes to one another. This topic
has been discussed in some depth in Computer Science academic papers,
most notably, The Dangers of Replication and a Solution by the late
Jim Gray. I've further studied this to the point where I have a
mathematical model of this that allows me to predict what our likely
success will be from implementing that. Without meaning to worry you,
MM replication alone is not a solution for large data or the general
case. For the general case, single master replication will continue to
be the most viable option. For large and distributed data sets, some
form of partitioning/sharding is required simply because full
multi-master replication just isn't viable at both volume and scale.
So my take on this is that MM is desirable, but is not the only thing
we need - we also need partial/filtered replication to make large
systems practical. Hence I've been calling this the Bi-Directional
Replication project. I'm aware that paragraph alone
requires lots of explanation, which I hope to do both in writing and
in person at the forthcoming developer conference.

My starting point for designs is to focus on a key aspect: massive
change to the code base is not viable and any in-core solution must
look at minimally invasive changes. And of course, if it is in-core we
must also add robust, clear code with reasonable performance that does
not impede non-replication usage.

The use cases we will address are not focused on any one project or
user. I've distilled these points so far from talking to a wide
variety of users, from major enterprises to startups.

1. GEOGRAPHICALLY DISTRIBUTED - Large users require both High
Availability, which necessitates multiple nodes, as well as Disaster
Recovery, which necessitates geographically distributed nodes. So I am
not focused on clustering in the sense of Hadoop or Oracle RAC, since
those technologies require additional technologies to provide DR; my
aim is to arrive at a coherent set of technologies that provide all
that we want. I'm aware that other projects *are* focused on
clustering, so there is even more reason not to simultaneously
reinvent the wheel.

2. COHERENT - With regard to the coherence, I note this thinking is
similar to the way that Oracle replication is evolving, where they
have multiple kinds of in-core replication and various purchased
technologies. We have a similar issue with regard to various external
projects. I very much hope that we can utilise the knowledge, code and
expertise of those other projects in the way we move forwards.

3. ONLINE UPGRADE - highly available distributed systems must have a
mechanism for online upgrade, otherwise they won't stay HA for long.
This challenge must be part of the solution, and incidentally should
be a useful goal in itself.

4. MULTI-MASTER - the ability to update data from a variety of locations

5. WRITE-SCALEABLE - the ability to partition data across nodes in a
way that allows the solution to improve beyond the write rate of a
single node.

Those are the basic requirements that I am trying to address. There
are a great many important details, but the core of this is probably
what I would call logical replication, that is shipping changes to
other nodes in a way that does not tie us to the same physical
representation that recovery/streaming replication does now. Of
course, non-physical replication can take many forms.

The assumption of consistency across nodes is considered optional at
this point, and I hope to support both eagerly consistent and
eventually consistent approaches.

I'm aware that this is a broad topic and many people will want input
on this, and am also aware that will take much time. This post is more
about announcing the project, than discussing specific details.

My strategy for doing this is to come up with some designs and
prototypes of a few things that might be the best way forwards. By
building prototypes we will more quickly be able to address the key
questions before us. So there is currently work on research-based
development to allow wider discussion based upon something more than
just 

Re: [HACKERS] Future In-Core Replication

2012-04-26 Thread Simon Riggs
On Thu, Apr 26, 2012 at 1:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

 I will also be organising a small-medium sized Future of In-Core
 Replication meeting in Ottawa on Wed 16 May, 6-10pm.

Thanks for such rapid response. I've put up a wiki page and will be
adding names as they come through

http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future In-Core Replication

2012-04-26 Thread Jim Nasby

On 4/26/12 7:41 AM, Simon Riggs wrote:

5. WRITE-SCALEABLE - the ability to partition data across nodes in a
way that allows the solution to improve beyond the write rate of a
single node.


It would be valuable to look at READ-SCALEABLE as well; specifically,
a second form of synchronous replication where you can read from a
slave immediately after transaction commit and have the changes be
visible. That ability would make it trivial to spread reads off of the
master database.

My hope is this wouldn't be horribly painful to achieve if we relaxed
the need to fsync the corresponding WAL on the slave; kind of the
opposite of the semi-synchronous mode we have now. My theory is that,
thanks to full page writes, a slave should normally have the necessary
pages to handle a WAL record in cache, so actually applying the WAL
change shouldn't be horribly slow.
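The trade-off can be sketched abstractly (the mode and event names
below are invented for this illustration): today's synchronous modes
acknowledge a commit once the standby has received or flushed the WAL,
while the variant proposed here would acknowledge once the standby has
applied it, even if the fsync has not happened yet.

```python
def can_ack_commit(mode, standby_events):
    """Decide whether the master may acknowledge a synchronous commit.
    'flush' resembles today's synchronous replication; 'apply' is the
    read-your-writes variant discussed above. Names are illustrative."""
    wanted = {"receive": "received", "flush": "flushed", "apply": "applied"}
    return wanted[mode] in standby_events

# Standby has applied the record in memory but not yet fsynced it:
events = {"received", "applied"}
print(can_ack_commit("apply", events), can_ack_commit("flush", events))
# True False
```

In the 'apply' mode a reader on the slave sees the committed change as
soon as commit returns on the master, at the cost of a small window
where an un-fsynced record could be lost on a slave crash.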
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Future In-Core Replication

2012-04-26 Thread Josh Berkus
Simon,

So the idea is that you'll present briefly your intentions for 9.3 at
the developer meeting, and then have this in-depth afterwards?  Sounds
great.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Future In-Core Replication

2012-04-26 Thread Robert Haas
On Thu, Apr 26, 2012 at 6:48 PM, Josh Berkus j...@agliodbs.com wrote:
 So the idea is that you'll present briefly your intentions for 9.3 at
 the developer meeting, and then have this in-depth afterwards?  Sounds
 great.

I really, really do not want the developer meeting to turn into a
series of presentations.  That's what we had last year, and it was
boring and unproductive.  Furthermore, it excludes from the
conversation everyone who doesn't fit into a room in Ottawa.  I think
that plans should be presented here, on pgsql-hackers, and the
developer meeting should be reserved for discussion of issues with
which everyone who will be there is already familiar.  If a
presentation is required, the developer meeting is the wrong forum,
IMHO of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future In-Core Replication

2012-04-26 Thread Josh Berkus
Simon,

 I'm beginning to work on advanced additions to in-core replication for
 PostgreSQL.
...
 Those are the basic requirements that I am trying to address. There
 are a great many important details, but the core of this is probably
 what I would call logical replication, that is shipping changes to
 other nodes in a way that does not tie us to the same physical
 representation that recovery/streaming replication does now. Of
 course, non-physical replication can take many forms.

So, I'm a bit confused.  You talk about this as additions to in-core
replication, but then you talk about implementing logical replication,
which would NOT be an addition to Binary replication.  Can you explain
what you mean?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
