On Mon, Oct 15, 2012 at 4:51 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On Monday, October 15, 2012 10:08:28 PM Christopher Browne wrote:
>> On Mon, Oct 15, 2012 at 3:18 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
>> > On 15 October 2012 19:19, Bruce Momjian <br...@momjian.us> wrote:
>> >> I think Robert is right that if Slony can't use the API, it is unlikely
>> >> any other replication system could use it.
>> >
>> > I don't accept that. Clearly there is a circular dependency, and
>> > someone has to go first - why should the Slony guys invest in adopting
>> > this technology if it is going to necessitate using a forked Postgres
>> > with an uncertain future? That would be (with respect to the Slony
>> > guys) a commercial risk that is fairly heavily concentrated with
>> > Afilias.
>>
>> Yep, there's something a bit too circular there.
>>
>> I'd also not be keen on reimplementing the "Slony integration" over
>> and over if it turns out that the API churns for a while before
>> stabilizing. That shouldn't be misread as "I expect horrible amounts
>> of churn", just that *any* churn comes at a cost. And if anything
>> unfortunate happens, that can easily multiply into a multiplicity of
>> painfulness(es?).
>
> Well, as a crosscheck, could you list your requirements?
>
> Do you need anything more than outputting data in a format compatible to whats
> stored in sl_log_*? You wouldn't have sl_actionseq, everything else should be
> there (Well, you would need to do lookups to get the tableid, but thats not
> really much of a problem). The results would be ordered in complete
> transactions, in commit order.
Hmm. We need to have log data that's in a compatible ordering. We use
sl_actionseq, and can mix data from multiple transactions together; if what
you're providing is, instead, ordered by transaction commit order, followed
by some sequencing within each transaction, then that should be acceptable.

The stylized query on sl_log_* looks like...

  select log_origin, log_txid, log_tableid, log_actionseq,
         log_tablenspname, log_tablerelname, log_cmdtype,
         log_cmdupdncols, log_cmdargs
    from %s.sl_log_%d
   where log_origin = %d

How about I "quibble" about each of these:

a) log_origin - this indicates the node from which the data originates.
   Presumably, this is implicit in a "chunk" of data that is coming in.

b) log_txid - indicating the transaction ID. I presume you've got this
   available. It's less important with the WAL-based scheme, in that we'd
   probably not be using it as a basis for querying, as is the case today
   with Slony.

c) log_tableid - indicating the ID of the table. Are you capturing an OID
   equivalent to this? Or what?

d) log_actionseq - indicating the relative sequence of updates. You don't
   have this, but if you're capturing commit ordering, we don't need it.

e) log_tablenspname, log_tablerelname - some small amount of magic is
   needed to get this. Or perhaps you are already capturing it?

f) log_cmdtype - I/U/D/T - indicating the action
   (insert/update/delete/truncate). Hopefully you have something like this?

g) log_cmdupdncols - for the UPDATE action, the number of updated columns.
   Probably not mandatory; this was a new 2.1 thing...

h) log_cmdargs - the actual data needed to do the I/U/D. The form of this
   matters a fair bit. Before Slony 2.1, this was a portion of a SQL
   statement, omitting the operation (provided in log_cmdtype) and the
   table name (in log_tablerelname et al). In Slony 2.1, this changed to
   a text[] array that essentially consists of pairs of
   [column name, column value] values.
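To make (h) concrete, here's a toy sketch (Python, not Slony code, with
made-up column names and hand-waved quoting/escaping) of turning a
2.1-style log_cmdargs pair array back into a statement. It assumes, for an
UPDATE, that the first log_cmdupdncols pairs are the updated columns and
the remaining pairs identify the row - that's my reading, so treat it as
illustrative only:

```python
# Sketch only: rebuild an UPDATE from a 2.1-style log row, where
# log_cmdargs is a flattened text[] of [column name, column value] pairs.
# Proper identifier/literal quoting is deliberately omitted.

def render_update(nspname, relname, cmdupdncols, cmdargs):
    """Assume the first cmdupdncols pairs are SET columns, the rest keys."""
    pairs = list(zip(cmdargs[0::2], cmdargs[1::2]))
    set_pairs = pairs[:cmdupdncols]
    key_pairs = pairs[cmdupdncols:]
    set_clause = ", ".join("%s = '%s'" % p for p in set_pairs)
    where_clause = " and ".join("%s = '%s'" % p for p in key_pairs)
    return "update %s.%s set %s where %s;" % (
        nspname, relname, set_clause, where_clause)

# Hypothetical row: one updated column, one key column.
stmt = render_update("public", "accounts", 1,
                     ["balance", "100", "id", "42"])
print(stmt)
# update public.accounts set balance = '100' where id = '42';
```

The pre-2.1 representation (a fragment of SQL text) would instead be
applied more or less verbatim, which is part of why the form of the
command data matters so much to us.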
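And to be explicit about the ordering point at the top: here's a toy
illustration (Python, fabricated xids and actions, nothing to do with
actual Slony internals) of the difference between our current
actionseq-based interleaving and a commit-ordered stream:

```python
# Toy illustration: Slony today replays sl_log_* rows sorted by a single
# global log_actionseq, which interleaves concurrent transactions; a
# commit-ordered stream instead emits each transaction whole, in the
# order the transactions committed.

rows = [
    # (xid, actionseq, action) - fabricated data
    (700, 1, "insert into t1"),
    (701, 2, "insert into t2"),
    (700, 3, "update t1"),
    (701, 4, "delete from t2"),
]
commit_order = [701, 700]  # suppose xid 701 committed first

# Slony-style: one global sequence, transactions interleaved.
slony_order = [r[2] for r in sorted(rows, key=lambda r: r[1])]

# Commit-ordered stream: whole transactions, in commit order;
# within a transaction, the original action order is preserved.
stream_order = [r[2]
                for xid in commit_order
                for r in rows if r[0] == xid]

print(slony_order)
print(stream_order)
```

Both orderings should replay to the same end state for non-conflicting
transactions; the point is just that we'd be consuming a differently
shaped stream than the one our queries produce today.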
I see one place, very notable in Slony 2.2, that would also be more
complicated, which is the handling of DDL.

In 2.1 and earlier, we handled DDL as "events", essentially out of band.
This isn't actually correct; it could mix very badly if you had replication
activity mixing with DDL requests. (More detail than you want is in a bug
on this... <http://www.slony.info/bugzilla/show_bug.cgi?id=137>).

In Slony 2.2, we added a third "log table" where DDL gets captured.
sl_log_script has much the same schema as sl_log_{1,2}; it needs to get
"mixed in" in a compatible order.

What I imagine would pointedly complicate life is if a single transaction
contained both DDL and "regular replicable activity." Slony 2.2 mixes this
in using XID + log_actionseq; how this would play out with your log capture
mechanism isn't completely clear to me. That's where I'd expect the
messiest interaction.

> I guess the other tables would stay as they are as they contain the "added
> value" of slony?

A fair bit of Slony is about the "event infrastructure," and some of that
ceases to be as needful. The configuration bits probably remain
interesting.

The parts that seem notably mysterious to me at the moment are:

a) How do we pull result sets (e.g. - sl_log_* data)?

b) How is the command data represented?

c) If we have a need to mix together your 'raw logs' and other material
   (e.g. - our sl_log_script that captures DDL-like changes to be mixed
   back in), how easy|impossible is this?

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers