In my experience, option 7 is the easiest route. Just make sure to run the mirror-maker on the consumer side of the WAN; it's an order of magnitude faster that way.
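To make that concrete, here is a minimal sketch of such a deployment, assuming "consumer side" means a host co-located with the target cluster, so that only the fetch from the source crosses the WAN. The hostnames, topic pattern, and file names are made up; the flags are those of the stock kafka-mirror-maker.sh tool:

    # Run next to the target cluster; only the consumer connection
    # (the fetch from the source cluster) goes over the WAN.
    bin/kafka-mirror-maker.sh \
        --consumer.config wan-consumer.properties \
        --producer.config local-producer.properties \
        --whitelist 'shop.*' \
        --num.streams 4

Here wan-consumer.properties points at the source cluster on the far side of the WAN, and local-producer.properties points at the target cluster in the same network.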
If you put receive.buffer.bytes=33554432 and send.buffer.bytes=33554432 in your consumer config and set socket.receive.buffer.bytes=33554432 and socket.send.buffer.bytes=33554432 in the remote brokers' server.properties, you can reliably mirror large volumes across the Atlantic (we do); the properties are spelled out at the end of this mail. It would be much nicer to run the mirror-maker on the producer side of the WAN (enable compression in the mirror-maker and have compressed data on the WAN, with the CPU cost outside the hot path), but like I said, that's an order of magnitude slower for unknown (but reproducible) reasons.

Cheers,
Hagen

On Tue, Sep 12, 2017 at 9:19 PM, Philip Schmitt <philip.schm...@outlook.com> wrote:
> Hi!
>
> We want to reliably produce events into a remote Kafka cluster in (mostly)
> near real-time. We have to provide an at-least-once guarantee.
>
> Examples are a "Customer logged in" event that will be consumed by a data
> warehouse for reporting (numbers should be correct), or a "Customer
> unsubscribed from newsletter" event that determines whether the customer
> gets emails (if she unsubscribes, but the message is lost, she will not be
> happy).
>
> Context:
>
> * We run an ecommerce website on a cluster of up to ten servers and an
> Oracle database.
> * We have a small Kafka cluster at a different site. We have in the past
> had a small number of network issues, where the web servers could not
> reach the other site for maybe an hour.
> * We don't persist all events in the database. If the application is
> restarted, events that occurred before the restart cannot be sent to Kafka.
> The row of a customer might have a newer timestamp, but we couldn't tell
> which columns were changed.
>
> Concerns:
>
> * In case of, for example, a network outage between the web servers
> and the Kafka cluster, we may accumulate thousands of events on each web
> server that cannot be sent to Kafka. If a server is shut down during that
> time, the messages would be lost.
> * If we produce to Kafka from within the application in addition to
> writing to the database, the data may become inconsistent if one of the
> writes fails.
>
> The more I read about Kafka, the more options I see, but I cannot assess
> how well the options might work and what the trade-offs between the
> options are.
>
> 1. produce records directly within the application
> 2. produce records from the Oracle database via Kafka Connect
> 3. produce records from the Oracle database via a CDC solution
> (GoldenGate, Attunity, Striim, others?)
> 4. persist events in log files and produce to Kafka via Elastic
> Logstash/Filebeat
> 5. persist events in log files and produce to Kafka via a Kafka Connect
> source connector
> 6. persist events in a local, embedded database and produce to Kafka
> via an existing source connector
> 7. produce records directly within the application to a new Kafka
> cluster in the same network and mirror to the remote cluster
> 8. ?
>
> These are all the options I could gather so far. Some of the options
> probably won't work for my situation -- for example, Oracle GoldenGate
> might be too expensive -- but I don't want to rule anything out just yet.
>
> How would you approach this, and why? Which options might work? Which
> options would you advise against?
>
> I appreciate any advice. Thank you in advance.
>
> Thanks,
> Philip

--
Hagen Rother
Lead Architect | LiquidM
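P.S. Spelled out as files, the buffer tuning above looks roughly like this (the bootstrap address, group id, and file names are placeholders; the 32 MB values are the ones from the mail):

    # wan-consumer.properties -- MirrorMaker's consumer, fetching across the WAN
    bootstrap.servers=source-kafka.example.com:9092
    group.id=mirror-maker
    receive.buffer.bytes=33554432
    send.buffer.bytes=33554432

    # server.properties on the brokers across the WAN (the source cluster)
    socket.send.buffer.bytes=33554432
    socket.receive.buffer.bytes=33554432

The large buffers matter because the TCP window has to cover the bandwidth-delay product of a high-latency link; with the default buffer sizes (well under 1 MB) the consumer's throughput across the Atlantic is capped far below what the line can actually carry.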