In my experience, option 7 is the easiest route. Just make sure to run the mirror-maker on the consumer side of the WAN; it's an order of magnitude faster that way.
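To make that concrete, here is a minimal sketch of such a deployment, assuming "consumer side" means a host co-located with the target cluster, so that only the fetch from the source crosses the WAN. The hostnames, topic pattern, and file names are made up; the flags are those of the stock kafka-mirror-maker.sh tool:

    # Run next to the target cluster; only the consumer connection
    # (the fetch from the source cluster) goes over the WAN.
    bin/kafka-mirror-maker.sh \
        --consumer.config wan-consumer.properties \
        --producer.config local-producer.properties \
        --whitelist 'shop.*' \
        --num.streams 4

Here wan-consumer.properties points at the source cluster on the far side of the WAN, and local-producer.properties points at the target cluster in the same network.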
If you put receive.buffer.bytes=33554432 and send.buffer.bytes=33554432 in your consumer config and set socket.receive.buffer.bytes=33554432 and socket.send.buffer.bytes=33554432 in the remote brokers' server.properties, you can reliably mirror large volumes across the Atlantic (we do); the properties are spelled out at the end of this mail. It would be much nicer to run the mirror-maker on the producer side of the WAN (enable compression in the mirror-maker and have compressed data on the WAN, with the CPU cost outside the hot path), but like I said, that's an order of magnitude slower for unknown (but reproducible) reasons.

Cheers,
Hagen

On Tue, Sep 12, 2017 at 9:19 PM, Philip Schmitt <philip.schm...@outlook.com> wrote:
> Hi!
>
> We want to reliably produce events into a remote Kafka cluster in (mostly)
> near real-time. We have to provide an at-least-once guarantee.
>
> Examples are a "Customer logged in" event that will be consumed by a data
> warehouse for reporting (numbers should be correct), or a "Customer
> unsubscribed from newsletter" event that determines whether the customer
> gets emails (if she unsubscribes, but the message is lost, she will not be
> happy).
>
> Context:
>
> * We run an ecommerce website on a cluster of up to ten servers and an
> Oracle database.
> * We have a small Kafka cluster at a different site. We have in the past
> had a small number of network issues, where the web servers could not
> reach the other site for maybe an hour.
> * We don't persist all events in the database. If the application is
> restarted, events that occurred before the restart cannot be sent to Kafka.
> The row of a customer might have a newer timestamp, but we couldn't tell
> which columns were changed.
>
> Concerns:
>
> * In case of, for example, a network outage between the web servers
> and the Kafka cluster, we may accumulate thousands of events on each web
> server that cannot be sent to Kafka. If a server is shut down during that
> time, the messages would be lost.
> * If we produce to Kafka from within the application in addition to
> writing to the database, the data may become inconsistent if one of the
> writes fails.
>
> The more I read about Kafka, the more options I see, but I cannot assess
> how well the options might work and what the trade-offs between the
> options are.
>
> 1. produce records directly within the application
> 2. produce records from the Oracle database via Kafka Connect
> 3. produce records from the Oracle database via a CDC solution
> (GoldenGate, Attunity, Striim, others?)
> 4. persist events in log files and produce to Kafka via Elastic
> Logstash/Filebeat
> 5. persist events in log files and produce to Kafka via a Kafka Connect
> source connector
> 6. persist events in a local, embedded database and produce to Kafka
> via an existing source connector
> 7. produce records directly within the application to a new Kafka
> cluster in the same network and mirror to the remote cluster
> 8. ?
>
> These are all the options I could gather so far. Some of the options
> probably won't work for my situation -- for example, Oracle GoldenGate
> might be too expensive -- but I don't want to rule anything out just yet.
>
> How would you approach this, and why? Which options might work? Which
> options would you advise against?
>
> I appreciate any advice. Thank you in advance.
>
> Thanks,
> Philip

--
Hagen Rother
Lead Architect | LiquidM
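P.S. Spelled out as files, the buffer tuning above looks roughly like this (the bootstrap address, group id, and file names are placeholders; the 32 MB values are the ones from the mail):

    # wan-consumer.properties -- MirrorMaker's consumer, fetching across the WAN
    bootstrap.servers=source-kafka.example.com:9092
    group.id=mirror-maker
    receive.buffer.bytes=33554432
    send.buffer.bytes=33554432

    # server.properties on the brokers across the WAN (the source cluster)
    socket.send.buffer.bytes=33554432
    socket.receive.buffer.bytes=33554432

The large buffers matter because the TCP window has to cover the bandwidth-delay product of a high-latency link; with the default buffer sizes (well under 1 MB) the consumer's throughput across the Atlantic is capped far below what the line can actually carry.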