Very good points, Gwen. I hadn't thought of the Oracle Streams case of
dependencies. I wonder if GoldenGate handles this better?
The tradeoff of these approaches is that each RDBMS has its own proprietary
way of exposing this CDC information. I guess GoldenGate can be a standard
interface across RDBMSs, but
Hi Jonathan,
I agree we can have topic-per-table, but some transactions may span
multiple tables and therefore will get applied partially out-of-order. I
suspect this can be a consistency issue and create a state that
differs from the state in the original database, but I don't have good
With mypipe (MySQL - Kafka) we've had a similar discussion re: topic names
and preserving transactions.
At this point:
- Kafka topic names are configurable, allowing for per-db or per-table topics
- each event carries its transaction ID when it is published into
Kafka
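
As a rough sketch of how a consumer might use those two points, the snippet below derives a per-db or per-table topic name and regroups events by transaction ID. The `ChangeEvent` fields, the `db.table` naming scheme and both helper functions are hypothetical, made up for illustration; they are not mypipe's actual event format or API.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    txid: str   # hypothetical transaction ID attached by the producer
    db: str
    table: str
    op: str     # "insert", "update", or "delete"


def topic_for(event, per_table=True):
    """Build a topic name per table or per database (naming scheme is illustrative)."""
    return f"{event.db}.{event.table}" if per_table else event.db


def group_by_transaction(events):
    """Group events by txid, keeping the order in which transactions first appear."""
    grouped = OrderedDict()
    for event in events:
        grouped.setdefault(event.txid, []).append(event)
    return grouped


# Events as a consumer might see them after merging several per-table topics.
events = [
    ChangeEvent("tx1", "shop", "orders", "insert"),
    ChangeEvent("tx2", "shop", "users", "update"),
    ChangeEvent("tx1", "shop", "order_items", "insert"),
]
transactions = group_by_transaction(events)
```

Note that the transaction ID alone does not tell a consumer when a transaction is complete; that would need something extra, such as an explicit commit marker, which this sketch leaves out.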
Hi Gwen,
As you said, I see Bottled Water and Sqoop managing slightly different use
cases, so I don't see this feature as a Sqoop killer. However, I did have a
question on your comment that the transaction log or CDC approach will have
problems with very large, very active databases.
I get that
Hello Everyone,
I am quite excited about the recent example of replicating PostgreSQL
changes to Kafka. My view of the log compaction feature had always been
a very sceptical one, but now with its great potential exposed to the
wider public, I think it's an awesome feature. Especially when
I feel a need to respond to the Sqoop-killer comment :)
1) Note that most databases have a single transaction log per DB, and in
order to get the correct view of the DB, you need to read it in order
(otherwise transactions will get messed up). This means you are limited to
a single producer