Re: [HACKERS] Hot standby, overflowed snapshots, testing

2009-11-15 Thread Robert Hodges
On 11/15/09 2:25 AM PST, Simon Riggs si...@2ndquadrant.com wrote:

 On Sat, 2009-11-14 at 08:43 -0800, Robert Hodges wrote:
 
 I can help set up automated basic tests for hot standby using 1+1 setups on
Amazon.  I'm already working on tests for warm standby for our commercial
 Tungsten implementation and need to solve the problem of creating tests that
 adapt flexibly across different replication mechanisms.
 
 I didn't leap immediately to say yes for a couple of reasons.
 
I'm easy on this.  We are going to find some hot standby problems from our
own testing no matter what.  At least I hope so.

It does sound to me as if there is a class of errors that would be easiest
to find by putting up a long-running test that throws a lot of different
queries at the server over time.  We have such tests already written in our
Bristlecone tools. 

Cheers, Robert




Re: [HACKERS] Hot standby, overflowed snapshots, testing

2009-11-14 Thread Robert Hodges
Hi Simon and Heikki,

I can help set up automated basic tests for hot standby using 1+1 setups on
Amazon.  I'm already working on tests for warm standby for our commercial
Tungsten implementation and need to solve the problem of creating tests that
adapt flexibly across different replication mechanisms.

It would be nice to add a list of test cases to the write-up on the Hot
Standby wiki (http://wiki.postgresql.org/wiki/Hot_Standby).  I would be
happy to help with that effort.

Cheers, Robert

On 11/13/09 1:43 PM PST, Simon Riggs si...@2ndquadrant.com wrote:

 On Fri, 2009-11-13 at 22:19 +0200, Heikki Linnakangas wrote:
 
 I got the impression earlier that you had some test environment set up
 to test hot standby. Can you share any details of what test cases
 you've run?
 
 Fair question. The Sep 15 submission happened too quickly for us to
 mobilise testers, so the final version was submitted with only manual
 testing by me. Many last-minute major bug fixes meant that the code was
 much less tested than I would have hoped - you found some of those while
 I lay exhausted from the efforts to hit a superimposed and unrealistic
 deadline. I expected us to kick in to fix those but it never happened
 and that was why I was keen to withdraw the patch about a week later.
 
 You've been kicking hell out of it for a while now, rightly so, so I've
 left it a while before commencing another set of changes and more
 testing to follow.
 
 It takes time, and money, to mobilise qualified testers, so that should
 begin again shortly.
 
 I agreed with you at PGday that we shouldn't expect a quick commit.
 There are good reasons for that, but still no panic in my mind about
 skipping this release.
 
 --
  Simon Riggs   www.2ndQuadrant.com
 
 
 




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Robert Hodges
Hi Greg and Fujii, 

Just a point on terminology:  the term semi-synchronous is used differently
by DRBD and by MySQL semi-synchronous replication, which was originally
developed at Google.

In the Google case semi-synchronous replication is a quorum algorithm where
clients receive a commit notification only after at least one of N slaves
has received the replication event.  In the DRBD case semi-synchronous means
that events have reached the slave but are not necessarily durable.  There's
no quorum.  

Of these two usages the Google semi-sync approach is the more interesting
because it avoids the availability problems associated with fully
synchronous operation but gets most of the durability benefits.

Cheers, Robert

On 11/12/09 9:29 PM PST, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote:
 Right, those are the possibilities; all four of them have valid use cases in
 the field and are worth implementing.  I don't like the label
 semi-synchronous replication myself, but it's a valuable feature to
 implement, and that is unfortunately the term other parts of the industry
 use for that approach.
 
 BTW, MySQL and DRBD use the term semi-synchronous:
 http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
 http://www.drbd.org/users-guide/s-replication-protocols.html
 
 Regards,
 
 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center
 
 




Re: [HACKERS] Conflict resolution in Multimaster replication(Postgres-R)

2008-09-03 Thread Robert Hodges
Hi Srinivas,

Multi-master replication in Postgres-R is handled using a process called 
certification that ensures there are no serializability violations.  Look at 
the paper by Kemme and Alonso entitled "Don't Be Lazy, Be Consistent..." 
(http://www.cs.mcgill.ca/~kemme/papers/vldb00.html).  In the first case you 
describe, one transaction must abort if applying both would break 
serializability.

In the second case you describe, you must transmit read sets as well as write 
sets.  The same sort of algorithm is applied as for writes.
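
To make the first case concrete, here is a sketch with a hypothetical table 
foo (the certification outcome is shown in comments):

TA@A, TB@B: begin
TA@A: update foo set value='a' where id=25;  -- executes locally, no global lock
TB@B: update foo set value='b' where id=25;  -- executes locally, no global lock
TA@A: commit  -- writeset broadcast; certified first, so TA wins
TB@B: commit  -- conflicts with TA's certified writeset, so TB aborts

Every site certifies the writesets in the same total order, which is how all 
sites reach the same abort decision without exchanging locks.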

Please email me directly if you want more information.

Thanks, Robert

On 9/3/08 4:02 PM, M2Y [EMAIL PROTECTED] wrote:

Hello,

My basic question is: in multimaster replication, if each site goes
ahead and does the modifications issued by the transaction and then
sends the writeset to others in the group, how can the ACID properties
be maintained?

Details:
Suppose there are two sites in the group, let's say A and B, managing a
database D.  Two transactions TA and TB, started in sites A and B
respectively at nearly the same time, want to update the same row of a
table in the database.  As no locking structures or other concurrency
handling structures are replicated, each site will go ahead, do the
modifications in its own database, and send the writeset.  Since both
writesets contain an update to the same row, will the two transactions
be rolled back, or does something else happen?

A more general question is: for transaction isolation level 4
(serializable), must information such as row locks be transmitted across
sites?  If not, what is the mechanism that provides concurrency with
serializability?

Thanks,
Srinivas




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] Transaction-controlled robustness for replication

2008-08-12 Thread Robert Hodges
Hi Tom,

Could you expand on why logical application of WAL records is impractical in
these cases?  This is what Oracle does.  Moreover once you are into SQL a
lot of other use cases immediately become practical, such as large scale
master/slave set-ups for read scaling.

Thanks, Robert

On 8/12/08 12:40 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Simon Riggs [EMAIL PROTECTED] writes:
 On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
 What is the attraction of logical application of the WAL logs?
 Transmitting to a server with different architecture?

 Yes,

 * different release
 * different encoding
 * different CPU architecture
 * (with the correct transform) a different DBMS

 The notion that the WAL logs will ever be portable across such
 differences is so ... so ... well, it's barely worth laughing at.

 regards, tom lane




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm




Re: [HACKERS] Transaction-controlled robustness for replication

2008-08-12 Thread Robert Hodges
Hi Tom,

Part of this is semantics: I like Simon's logical vs. physical terminology 
because it distinguishes neatly between replication that copies the 
implementation down to OIDs etc. and replication that copies data content, 
including schema changes, but not the implementation.  It seems a noble goal 
to get both to work well, as they are quite complementary.

There are various ways to get information to recapitulate SQL, but 
piggy-backing off WAL record generation has a lot of advantages.  You at least 
have the data structures and don't have to reverse-engineer log information on 
disk.  Of the multiple ways to build capable logical replication solutions, 
this seems to involve the least effort.

My company is currently heads down building a solution for Oracle based on 
reading REDO log files.  It requires a master of Oracle dark arts to decode 
them and is also purely asynchronous.  PostgreSQL will eventually be far better 
as these discussions boil down into designs.

Thanks, Robert

On 8/12/08 8:51 PM, Tom Lane [EMAIL PROTECTED] wrote:

Markus Wanner [EMAIL PROTECTED] writes:
 Robert Hodges wrote:
 Could you expand on why logical application of WAL records is impractical in
 these cases?  This is what Oracle does.  Moreover once you are into SQL a
 lot of other use cases immediately become practical, such as large scale
 master/slave set-ups for read scaling.

 I cannot speak for Tom, but what strikes me as a strange approach here
 is using the WAL for logical application of changes. That's because
 the WAL is quite far away from SQL, and thus from a logical
 representation of the data. Rather, it's pretty physical, meaning it's
 bound to a certain Postgres release and CPU architecture.

Right.  To take just one example: the value of MAXALIGN affects not only
how many tuples you can put on a heap page (thus changing TIDs of tuples,
which fundamentally breaks most of the current types of WAL records)
but how many tuples you can put on an index page (and thus index page
split decisions, and thereby pretty much every single fact about the
contents of upper btree levels).  We need not go into architecture
dependencies that are stronger than that one, though there are many.
As for version independence, who thinks they can WAL-replicate changes
of the system catalogs into a different version with significantly
different system catalogs?
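
To make the TID point concrete, here is a sketch against a hypothetical
table foo (ctid is the system column holding each tuple's physical
address, which is what heap WAL records point at):

SELECT ctid, id FROM foo LIMIT 3;
--  ctid  | id
-- (0,1)  | 25    -- (block number, line pointer): valid only for this
-- (0,2)  | 26    -- exact physical page layout
-- (0,3)  | 27

Change MAXALIGN and every one of those addresses can move.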

You couldn't even begin to make this work with anything approaching the
current level of semantic detail of WAL entries.

What I think Simon was actually driving at was query-shipping, which is
not my idea of WAL at all.  It has some usefulness, but also a bunch
of downsides of its very own, mostly centered around reproducibility.
With the current WAL design I have some faith that the slaves reproduce
the contents of the master.  With any logical replication design that
becomes a faith of the religious kind, because it sure isn't provable.

regards, tom lane



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] Follow-up on replication hooks for PostgreSQL

2008-07-10 Thread Robert Hodges
Hi Marko,

No fear, we definitely will discuss this on pgsql-hackers.  I just wanted to 
make sure that people understood we are still committed to solving this 
problem and will one way or another commit resources to help.

Just to be clear, by logical replication I mean replication based on sending 
SQL or near-SQL (e.g., generic DML events) between servers.   Physical 
replication on the other hand uses internal formats to replicate changes 
without intervening conversion to SQL, for example by shipping WAL records.  
There are advantages to each for different applications.  BTW, I heard this 
nomenclature from Simon Riggs.  It seems quite helpful.

The DDL trigger proposal is interesting and would be a very useful feature 
addition to PostgreSQL.  To execute correctly it may also be necessary to know 
which database you were using at the time the SQL was issued.
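
To illustrate, the proposal might look something like this (purely
hypothetical syntax and names; no such statement exists today):

CREATE DDL TRIGGER replicate_ddl
    AFTER DDL ON TABLE
    EXECUTE PROCEDURE log_ddl_event();

-- log_ddl_event() would receive the object type, the array of object
-- names, and the statement text, and could simply queue the statement
-- along with the current database name for the replicator to apply.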

For our part we are looking for ways to replicate most or all data on a server 
as efficiently as possible.  Generic call-outs at commit time or reading the 
log directly are attractive approaches.  Depending on the implementation you 
can avoid double writes of replicated data on the master host.  Also, it avoids 
the management headache of ensuring that triggers are correctly installed.  It 
seems as if one of these generic approaches could hook into WAL record 
transport.

Cheers, Robert

On 7/10/08 4:56 AM, Marko Kreen [EMAIL PROTECTED] wrote:

On 7/10/08, Robert Hodges [EMAIL PROTECTED] wrote:
  This is a quick update on a promise I made early in June to suggest
 requirements as well as ways to add replication hooks that would support
 logical replication, as opposed to the physical replication work currently
 underway based on NTT's code.

  Well, June was a pretty busy month, so it has taken a while to get back to
 this.  However, we are now beginning to examine options for PostgreSQL
 logical replication.  To make a long story short we are willing to commit
 resources to this problem or fund other people to do it for us.  If you are
 interested please contact me directly.  Meanwhile, we are quite serious
 about this problem and intend to work on helpful additions to PostgreSQL in
 this area.  I will post more as we make progress.

Well, I'm not exactly sure what you are planning.  It's OK to do a
draft design privately, but before actually starting to code, the
design should be discussed on -hackers.

And I'm not exactly sure what you mean by logical replication.  A
way to log DDL statements?  Do you want to log DML also?

FWIW, here's a very rough draft design for functionality that could be used
to make current Slony-I/Londiste-like solutions replicate DDL also.

1. A CREATE DDL TRIGGER statement that allows calling a function
   for all DDL statements.

   The only filtering that makes sense here is filtering by area:
   tables/functions/views/etc.

   It must be possible to create an AFTER trigger.  Whether a BEFORE
   trigger makes sense for DDL or not, I'm not sure.

2. When the function is called, the following information is given:

   - Object type the event was for (table/view/function)
   - Array of object names.
   - SQL statement as text.

   The trigger function can then filter on the object names to decide
   whether it wants to log the event or not.

Trying to make the trigger run on only a subset of events is complex,
and parsing the SQL into pieces for the trigger to understand it better
is also complex; neither is needed.  Unless there is some common
situation where such a simple design fails to work, I would not make
the scheme more complex.

Also, the design should be based on the assumption that the target side
is exactly in sync.  E.g., DROP CASCADE should be replicated as DROP
CASCADE.  We should not make the scheme more complex to survive cases
where the target is not in sync.  That way madness lies.  The effect
should be as if the same SQL statements were applied to the target by
hand, no more, no less.

--
marko



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


[HACKERS] Follow-up on replication hooks for PostgreSQL

2008-07-09 Thread Robert Hodges
Hi everyone,

This is a quick update on a promise I made early in June to suggest 
requirements as well as ways to add replication hooks that would support 
logical replication, as opposed to the physical replication work currently 
underway based on NTT's code.

Well, June was a pretty busy month, so it has taken a while to get back to 
this.  However, we are now beginning to examine options for PostgreSQL logical 
replication.  To make a long story short we are willing to commit resources to 
this problem or fund other people to do it for us.  If you are interested 
please contact me directly.  Meanwhile, we are quite serious about this problem 
and intend to work on helpful additions to PostgreSQL in this area.  I will 
post more as we make progress.

Thanks, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]

P.s., Happy 12th birthday everyone!


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Robert Hodges
Hi Hannu,


On 6/1/08 2:14 PM, Hannu Krosing [EMAIL PROTECTED] wrote:


 As a consequence, I don't see how you can get around doing some sort
 of row-based replication like all the other databases.

 Isn't WAL-based replication some sort of row-based replication?

Yes, in theory.  However, there's a big difference between replicating
physical WAL records and doing logical replication with SQL statements.
Logical replication requires extra information to reconstruct primary keys.
(Somebody tell me if this is already in the WAL; I'm learning the code as
fast as possible but assuming for now it's not.)


  Now that people are starting to get religion on this issue I would
 strongly advocate a parallel effort to put in a change-set extraction
 API that would allow construction of comprehensive master/slave
 replication.

 Triggers.  See PgQ's logtriga()/logutriga().  See slides for Marko
 Kreen's presentation at pgCon08.


Thanks very much for the pointer.  The slides look interesting.
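
For anyone else following along, the trigger-based capture Hannu mentions
looks roughly like this (a sketch from memory, so treat the details as
approximate):

CREATE TRIGGER foo_capture
    AFTER INSERT OR UPDATE OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('replication_queue');

-- logutriga reads the table's structure from the catalogs and writes a
-- logical event (operation plus column data) into the named queue.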

Robert




Re: [HACKERS] Table rewrites vs. pending AFTER triggers

2008-06-02 Thread Robert Hodges
Hi Gokul,

If you are saying that DDL should be auto-commit, yes, this really does
limit some use cases.

Transactional DDL is quite helpful for SQL generators, which need to avoid
leaving the schema half-changed if the application crashes or there's a
problem with the database that causes a command to fail.  SLONY is an
example of such a generator where transactional DDL would be helpful,
though I don't know for a fact that SLONY uses it.  We have used it in the
past for building queues in SQL, which required multiple schema changes
for a single queue.

In sum, it's much easier to implement such tools if you can do a set of
schema changes atomically.  There are no doubt other use cases as well.
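
A minimal sketch of the pattern (a hypothetical queue setup; an error in
any statement rolls the whole batch back):

BEGIN;
CREATE TABLE msg_queue (id serial PRIMARY KEY, payload text NOT NULL);
CREATE FUNCTION msg_enqueue(text) RETURNS void AS
    'INSERT INTO msg_queue (payload) VALUES ($1)' LANGUAGE sql;
COMMIT;  -- either the whole queue definition exists, or none of it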

Cheers, Robert

On 1/2/08 11:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:

 Is there a reason why we allow DDLs inside a transaction and allow them to
 be rolled back?  If we commit the previous transaction as soon as we
 encounter a DDL, and commit the DDL too (without waiting for an explicit
 commit), will it affect some use cases?

 I actually mean to say that DDLs could be declared as self-committing.
 That would get rid of these exceptions.

 Am I missing something?

 Thanks,
 Gokul.

 On Jan 3, 2008 12:02 AM, Andrew Dunstan  [EMAIL PROTECTED] wrote:


 Simon Riggs wrote:
 On Tue, 2008-01-01 at 16:09 -0500, Tom Lane wrote:


 Paranoia would
 suggest forbidding *any* form of ALTER TABLE when there are pending
 trigger events, but maybe that's unnecessarily strong.


 That works for me. Such a combination makes no sense, so banning it is
 the right thing to do.



 +1. Doesn't make much sense to me either.

 cheers

 andrew






--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Robert Hodges
Hi Merlin,

My point here is that with reasonably small extensions to the core you can 
build products that are a lot better than SLONY.   Triggers do not cover DDL, 
among other issues, and it's debatable whether they are the best way to 
implement quorum policies like Google's semi-synchronous replication.  As I 
mentioned separately, this topic deserves another thread, which I promise to 
start.

It is of course possible to meet some of these needs with an appropriate client 
interface to WAL shipping.  There's no a priori reason why built-in PostgreSQL 
slaves need to be the only client.  I would put a vote in for covering this 
possibility in the initial replication design.  We are using a very similar 
approach in our own master/slave replication product.

Thanks, Robert

P.S., No offense intended to Jan Wieck et al.  There are some pretty cool 
things in SLONY.

On 5/29/08 8:16 PM, Merlin Moncure [EMAIL PROTECTED] wrote:

On Thu, May 29, 2008 at 3:05 PM, Robert Hodges
[EMAIL PROTECTED] wrote:
 Third, you can't stop with just this feature.  (This is the BUT part of the
 post.)  The use cases not covered by this feature are actually pretty
 large.  Here are a few that concern me:

 1.) Partial replication.
 2.) WAN replication.
 3.) Bi-directional replication.  (Yes, this is evil but there are problems
 where it is indispensable.)
 4.) Upgrade support.  Aside from database upgrade (how would this ever
 really work between versions?), it would not support zero-downtime app
 upgrades, which depend on bi-directional replication tricks.
 5.) Heterogeneous replication.
 6.) Finally, performance scaling using scale-out over large numbers of
 replicas.  I think it's possible to get tunnel vision on this: it's not a big
 requirement in the PG community because people don't use PG in the first
 place when they want to do this.  They use MySQL, which has very good
 replication for performance scaling, though it's rather weak for
 availability.

These types of things are what Slony is for.  Slony is trigger-based.
This makes it more complex than log shipping style replication, but
provides lots of functionality.

wal shipping based replication is maybe the fastest possible
solution...you are already paying the overhead so it comes virtually
for free from the point of view of the master.

mysql replication is imo nearly worthless from backup standpoint.

merlin



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Robert Hodges
Hi Tom,

Thanks for the reasoned reply.  As you saw from point #2 in my comments, I
think you should do this feature.  I hope this answers Josh Berkus' concern
about my comments.

You make a very interesting comment which seems to go to the heart of this
design approach:

 About the only thing that would make me want to consider row-based
 replication in core would be if we determine that read-only slave
 queries are impractical atop a WAL-log-shipping implementation.

It's possible I'm misunderstanding some of the implementation issues, but it
is striking that the detailed responses to your proposal list a number of
low-level dependencies between master and slave states when replicating WAL
records.  It appears that you are designing a replication mechanism that
works effectively between a master and a relatively small number of nearby
slaves.  This is clearly an important use case but it also seems clear that
the WAL approach is not a general-purpose approach to replication.  In other
words, you'll incrementally get to that limited end point I describe.  This
will still leave a lot to be desired on read scaling, not to mention many
other cases.

Hence my original comments.  However, rather than harp on that further I
will open up a separate thread to describe a relatively small set of
extensions to PostgreSQL that would be enabling for a wide range of
replication applications.  Contrary to popular opinion these extensions are
actually well understood at the theory level and have been implemented as
prototypes as well as in commercial patches multiple times in different
databases.  Those of us who are deeply involved in replication deserve just
condemnation for not stepping up and getting our thoughts out on the table.

Meanwhile, I would be interested in your reaction to these thoughts on the
scope of the real-time WAL approach.  There's obviously tremendous interest
in this feature.  A general description that goes beyond the NTT slides
would be most helpful for further discussions.

Cheers, Robert

P.s., The NTT slides were really great.  Takahiro and Masao deserve
congratulations on an absolutely first-rate presentation.

On 5/29/08 9:09 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Andrew Sullivan [EMAIL PROTECTED] writes:
 On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
 people are starting to get religion on this issue I would strongly
 advocate a parallel effort to put in a change-set extraction API
 that would allow construction of comprehensive master/slave
 replication.

 You know, I gave a talk in Ottawa just last week about how the last
 effort to develop a comprehensive API for replication failed.

 Indeed, core's change of heart on this issue was largely driven by
 Andrew's talk and subsequent discussion.  We had more or less been
 waiting for the various external replication projects to tell us
 what they wanted in this line, and it was only the realization that
 no such thing was likely to happen that forced us to think seriously
 about what could be done within the core project.

 As I said originally, we have no expectation that the proposed features
 will displace the existing replication projects for high end
 replication problems ... and I'd characterize all of Robert's concerns
 as high end problems.  We are happy to let those be solved outside
 the core project.

 About the only thing that would make me want to consider row-based
 replication in core would be if we determine that read-only slave
 queries are impractical atop a WAL-log-shipping implementation.
 Which could happen; in fact I think that's the main risk of the
 proposed development plan.  But I also think that the near-term
 steps of the plan are worth doing anyway, for various other reasons,
 and so we won't be out too much effort if the plan fails.

 regards, tom lane





Re: [HACKERS] replication hooks

2008-05-30 Thread Robert Hodges
Hi Marko,

Replication requirements vary widely of course, but DDL support is shared by 
such a wide range of use cases that it is very difficult to see how any real 
solution would fail to include it.  This extends to change extraction APIs, 
however defined.  The question of what DDL to replicate is also quite clear: 
all of it, with as few exceptions as possible.

For instance, it is almost impossible to set up and manage replicated systems 
easily if you cannot propagate schema changes in serialized order along with 
other updates from applications.  The inconvenience of using alternative 
mechanisms like the SLONY 'execute script' is considerable and breaks most 
commonly used database management tools.

That said, SLONY at least serializes the changes.  Non-serialized approaches 
lead to serious outages and can get you into distributed consensus problems, 
such as deciding when it is 'safe' to change schema across different 
instances.  These are very hard to solve practically and tend to run into 
known impossibility results like Brewer's Conjecture, which holds that it is 
impossible to keep distributed databases consistent while also remaining 
open for updates and handling network partitions.

I'll post back later on the question of the API.  The key is to do something 
simple that avoids the problems discussed by Andrew and ties it accurately to 
use cases.  However, this requires a more prepared response than my hastily 
written post from last night.

Cheers, Robert

On 5/29/08 9:05 PM, Marko Kreen [EMAIL PROTECTED] wrote:

On 5/29/08, Andrew Sullivan [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
   people are starting to get religion on this issue I would strongly
   advocate a parallel effort to put in a change-set extraction API
   that would allow construction of comprehensive master/slave
   replication.

 You know, I gave a talk in Ottawa just last week about how the last
  effort to develop a comprehensive API for replication failed.  I had
  some ideas about why, the main one of which is something like this:
  Big features with a roadmap have not historically worked, so unless
  we're willing to change the way we work, we won't get that.

  I don't think an API is what's needed.  It's clear proposals for
   particular features that can be delivered in small pieces.  That's what
  the current proposal offers.  I think any kind of row-based approach
  such as what you're proposing would need that kind of proposal too.

  That isn't to say that I think an API is impossible or undesirable.
  It is to say that the last few times we tried, it went nowhere; and
  that I don't think the circumstances have changed.

I think the issue is simpler: an API for synchronous replication is
undesirable, since it would be too complex and hinder future development
(as I explained above).

And the API for asynchronous replication is already there: triggers and
txid functions for queueing.

There is this tiny matter of replicating schema changes asynchronously,
but I suspect nobody actually cares.  A few random points about that:

- The task cannot even be clearly defined (on a technical level: how
  the events should be represented).
- Any schema changes need to be carefully prepared anyway.  Whether
  to apply them to one or more servers does not make much difference.
- A major plus of an async replica is the ability to actually have a
  different schema on the slaves.
- People _do_ care about the exact schema in a single place: failover
  servers.
- But for failover servers we also want synchronous replication.

So if we have synchronous WAL-based replication for failover servers,
the interest in hooks to log schema changes will decrease even more.

--
marko




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Robert Hodges
Hi everyone,

First of all, I'm absolutely delighted that the PG community is thinking 
seriously about replication.

Second, having a solid, easy-to-use database availability solution that works 
more or less out of the box would be an enormous benefit to customers.  
Availability is the single biggest problem for customers in my experience and 
as other people have commented, the alternatives are not nice.  It's an 
excellent idea to build off an existing feature: PITR is already pretty useful 
and the proposed features are solid next steps.  The fact that it does not 
solve all problems is not a drawback but means it's likely to get done in a 
reasonable timeframe.

Third, you can't stop with just this feature.  (This is the BUT part of the 
post.)  The use cases not covered by this feature are actually pretty large.  
Here are a few that concern me:

1.) Partial replication.
2.) WAN replication.
3.) Bi-directional replication.  (Yes, this is evil but there are problems 
where it is indispensable.)
4.) Upgrade support.  Aside from database upgrade (how would this ever really 
work between versions?), it would not support zero-downtime app upgrades, which 
depend on bi-directional replication tricks.
5.) Heterogeneous replication.
6.) Finally, performance scaling using scale-out over large numbers of 
replicas.  I think it's possible to get tunnel vision on this: it's not a big 
requirement in the PG community because people don't use PG in the first place 
when they want to do this.  They use MySQL, which has very good replication for 
performance scaling, though it's rather weak for availability.

As a consequence, I don't see how you can get around doing some sort of 
row-based replication like all the other databases.  Now that people are 
starting to get religion on this issue I would strongly advocate a parallel 
effort to put in a change-set extraction API that would allow construction of 
comprehensive master/slave replication.  (Another approach would be to make it 
possible for third party apps to read the logs and regenerate SQL.) There are 
existing models for how to do change set extraction; we have done it several 
times at my company already.  There are also research projects like GORDA that 
have looked fairly comprehensively at this problem.

My company would be quite happy to participate in or even sponsor such an API.  
Between the proposed WAL-based approach and change-set-based replication it's 
not hard to see PG becoming the open source database of choice for a very large 
number of users.

Cheers, Robert

On 5/29/08 6:37 PM, Tom Lane [EMAIL PROTECTED] wrote:

David Fetter [EMAIL PROTECTED] writes:
 On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
 The only question I have is... what does this give us that PITR
 doesn't give us?

 It looks like a wrapper for PITR to me, so the gain would be ease of
 use.

A couple of points about that:

* Yeah, ease of use is a huge concern here.  We're getting beat up
because people have to go find a separate package (and figure out
which one they want), install it, learn how to use it, etc.  It doesn't
help that the most mature package is Slony which is, um, not very
novice-friendly or low-admin-complexity.  I personally got religion
on this about two months ago when Red Hat switched their bugzilla
from Postgres to MySQL because the admins didn't want to deal with Slony
any more.  People want simple.

* The proposed approach is trying to get to real replication
incrementally.  Getting rid of the loss window involved in file-by-file
log shipping is step one, and I suspect that step two is going to be
fixing performance issues in WAL replay to ensure that slaves can keep
up.  After that we'd start thinking about how to let slaves run
read-only queries.  But even without read-only queries, this will be
a useful improvement for HA/backup scenarios.

regards, tom lane




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


[HACKERS] Deterministic locking in PostgreSQL

2008-05-09 Thread Robert Hodges
Hi everyone,

This question may have an obvious answer I have somehow missed, but to what
extent is locking order deterministic in PostgreSQL?  For example, if
requests from multiple transactions arrive in some deterministic order and
acquire locks, can one assume that locks will be granted in the same order
if the requests are repeated at different times or on different servers?

Lock determinism is an important issue for replication algorithms that
depend on database instances to behave as state machines.  Here's a simple
example of the behavior I'm seeking.   Suppose you have transactions T1, T2,
and T3 that execute as shown below.  Each line represents an increment of
time.

T1, T2, T3: begin
T1: update foo set value='x' where id=25;   -- Grabs row lock
T2: update foo set value='y' where id=25;   -- Blocked
T3: update foo set value='z' where id=25;   -- Blocked
T1: update foo set value='x1' where id=25;
T1: commit
T2: commit
T3: commit

T2 and T3 are both blocked until T1 commits.  At that point, is the row lock
granted to T2 and T3 in some deterministic order?  Or can it vary based on
load, lock manager state, etc., so that sometimes you get 'y' and sometimes
'z' as the final result?

If this case turns out to be deterministic, are there other cases that come
to mind that would turn out to be non-deterministic?

Thanks, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm




Re: [HACKERS] Deterministic locking in PostgreSQL

2008-05-09 Thread Robert Hodges
Hi Tom,

First of all thanks for the quick response.  No, the arrival order will not
be deterministic.  Here is how we ensure determinism.

1.) SQL requests are delivered to the replication agent in a specific total
order.  This could occur either because they were already serialized by a
database (master/slave case) or because they were delivered through group
communications (master/master case).

2.) Within replication we use advisory table locks at the middleware level
to guide scheduling of request execution.  This allows non-conflicting SQL
statements to proceed in parallel but blocks those that might conflict.

The reason I asked about determinism in locking is that this algorithm has a
problem with distributed deadlock.  If you look back at the example in the
original post, you get the following:

1: T1, T2, T3: begin
2: T1: update foo set value='x' where id=25;   -- Grabs row lock,
grabs and releases middleware table lock
3: T2: update foo set value='y' where id=25;   -- Grabs middleware
table lock, blocks on row lock
4: T3: update foo set value='z' where id=25;   -- DEADLOCKED
5: T1: update foo set value='x1' where id=25;
6: T1: commit
7: T2: commit
8: T3: commit

At step 3 we deadlock since the request blocks in the database while holding
the middleware table lock.

Our plan to alleviate this problem is to look for requests that block (i.e.,
show up in pg_locks) and release their middleware table lock.  As long as
locks are granted deterministically this allows the next request to
proceed--the ordering is now enforced by the database itself.
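
As a sketch, the check could be as simple as polling for ungranted locks
(these are real pg_locks columns, though the exact predicate will need
tuning):

SELECT pid, locktype, relation::regclass, transactionid
FROM pg_locks
WHERE NOT granted;

Any pid that shows up here is blocked inside the database, so the
middleware can safely release that request's table lock.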

There are some other possible race conditions, such as results of
sub-selects on UPDATE statements, but this optimization will help us avoid a
number of unnecessary failures in master/master replication.  If anything
else about this raises hackles on your neck (or anyone else's for that
matter) please let me know.  It's better to know now.  :)

Cheers, Robert

On 5/9/08 4:53 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Robert Hodges [EMAIL PROTECTED] writes:
 This question may have an obvious answer I have somehow missed, but to what
 extent is locking order deterministic in PostgreSQL?  For example, if
 requests from multiple transactions arrive in some deterministic order and
 acquire locks, can one assume that locks will be granted in the same order
 if the requests are repeated at different times or on different servers?

 Yeah, it should be deterministic given consistent arrival order.

 Lock determinism is an important issue for replication algorithms that
 depend on database instances to behave as state machines.

 However, the idea of depending on a replication algorithm that has race
 conditions gives me the willies ... and that sure sounds like what you
 are describing.  Do not trust your data to the assumption that arrival
 order will be deterministic.

 regards, tom lane



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm




Re: [HACKERS] Table rewrites vs. pending AFTER triggers

2008-03-24 Thread Robert Hodges
Hi,

I'm with David on this one.  Transactional DDL also turns out to be
incredibly helpful for tools that generate and load DDL to extend the
database, for example triggers and control tables to implement reliable
messaging.  You can put the setup in a single transaction, which vastly
simplifies tool implementation.

We have an application at Continuent that depends on exactly this behavior.
I was investigating PostgreSQL semantics just last week and was delighted to
find they appear to be exactly right.  Oracle on the other hand is going to
be a pain...

Cheers,

Robert Hodges

On 1/3/08 12:11 AM, David Fetter [EMAIL PROTECTED] wrote:

 On Thu, Jan 03, 2008 at 01:08:47PM +0530, Gokulakannan Somasundaram wrote:
 On Jan 3, 2008 12:44 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Gokulakannan Somasundaram [EMAIL PROTECTED] writes:
 I actually mean to say that DDLs can be declared as
 self-committing.

 Egad, an Oracle lover in our midst.

 :). True, it's an impact of working more with Oracle. I made the
 suggestion here, because it might reduce some if conditions.

 Most of us think that roll-back-able DDL is one of the best
 features of Postgres, and certainly one of our best selling points
 vis-a-vis Oracle.  Don't expect us to give it up.

  Can you please explain any specific use case where DDLs are
  necessary within a transaction?

 Let's imagine that you want to do a DDL change to a production
 database.  You've tested the change script on a test database, but you
 want to be sure you get *exactly* from the place you were to the place
 you want to be.  With transactional DDL, you know absolutely for sure
 that you've done either the whole change or none of it, i.e. not
 half-way in between :)
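
 For instance, a change script might look like this (a sketch; the table
 and column names are made up):

 BEGIN;
 ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';
 UPDATE orders SET status = 'shipped' WHERE shipped_at IS NOT NULL;
 ALTER TABLE orders DROP COLUMN legacy_flag;
 COMMIT;  -- the whole change or none of it, never half-way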

 Cheers,
 David (a giant fan of transactional DDL)




Re: [HACKERS] Test lab

2007-11-06 Thread Robert Hodges

Hi everyone,

Here are a couple of additions to the performance test lab discussion.  I 
hope you will find these useful.

1.) Test tools.  The Bristlecone testing package I presented at the PG Fall 
2007 Conference is now available at http://bristlecone.continuent.org.  There 
are two main tools: Evaluator and Benchmark.  Evaluator generates a 
CPU-intensive mixed load.  Benchmark generates very specific loads with 
systematically varying parameters.  I have been using Bristlecone to do a 
lot of testing of MySQL and PostgreSQL, since we have middleware that runs 
on both.  I plan to follow Josh's request and run some of the current 
benchmarks to compare 8.2.5 vs. 8.3 performance.  So far most of my tests 
have compared MySQL and PostgreSQL vs. our middleware, but I recently 
started to compare the databases directly.  One initial result: MySQL 
appears to be much faster at streaming very large result sets.


2.) Test hardware.  We have a number of hosts in Grenoble, France that are 
available to help set up a European lab.  We gave away 4 to the postgresql.fr 
folks, but if there's anyone else within driving (or trucking) distance we 
still have at least a dozen 1U rack-mountable Compaq units.  They are in a 
garage and winter will soon be upon the Alps, so we need to try to unload 
them.  Unluckily we overbought hardware in this location, but with luck this 
can be someone else's good fortune.  It probably won't help in the US of A 
due to shipping costs.


Please look at Bristlecone.  It's very early on but I have found these tools 
to be exceedingly useful.  Among other things it should be possible to add 
features that allow us to do regression testing on performance, something 
that is a pain for normal test frameworks.


Cheers, Robert

Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


On Nov 6, 2007, at 9:49 AM, Mark Wong wrote:

 On Tue, 06 Nov 2007 13:15:02 +
 Simon Riggs [EMAIL PROTECTED] wrote:

 On Mon, 2007-11-05 at 14:33 -0800, Mark Wong wrote:

 On 11/4/07, Simon Riggs [EMAIL PROTECTED] wrote:

 Why don't you post a TODO list for TPC-E somewhere, so people can bite
 small pieces off of the list.  I'm sure there's lots of people who can
 help if we do it that way.

 This should be a good start:

 http://osdldbt.sourceforge.net/dbt5/todo.html

 Ah, thanks.

 Not sure what some of the TODOs mean, but I'll see if I have time to
 look at some of the code to see if I can help.

 No worries, just ask when you get to it. ;)  I'm making slow progress
 anyway.  I'll get to them all eventually...

 Mark
