Hi all,

I'm rather new to Samza and trying some things out using Kafka as the
message broker. One use case I'm interested in, which is mentioned in the
documentation, is creating a table-stream join using bootstrap streams.

I'd appreciate some recommendations/thoughts on the possibility of the
changelog and the database going out of sync.

Suppose I have my database push a changelog to Kafka for every
insert/update/delete and then have a Samza job consume this stream as a
bootstrap (+ maybe some other datastream).
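For context, this is roughly the job configuration I have in mind (a
sketch only; the "kafka" system name and the "db-changelog" /
"other-datastream" stream names are placeholders):

```properties
# Hypothetical Samza job config sketch.
task.inputs=kafka.db-changelog,kafka.other-datastream

# Mark the changelog stream as a bootstrap stream, so the job reads
# it fully to the head before processing any other input.
systems.kafka.streams.db-changelog.samza.bootstrap=true

# Always start from the beginning of the (compacted) changelog.
systems.kafka.streams.db-changelog.samza.offset.default=oldest
systems.kafka.streams.db-changelog.samza.reset.offset=true
```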

The only information this job will ever see about the database comes
from reading the Kafka stream containing the changelog (possibly
compacted by Kafka based on key). Losing any of these changelog
messages is therefore not an option, as the job's view of the database
would then be wrong forever. Does this imply that Kafka needs to be
forced to fsync every new message for this changelog topic? Or would
it be better to provide a complete recreation of the changelog stream
from the current contents of the database in case of disaster (all
Kafka nodes losing power at the same time)? Or would it be better to
recreate the database from the changelog (still some data loss, but at
least the database and the changelog stay in sync)?
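For the fsync option, I believe the relevant knobs would look something
like the following (a sketch based on Kafka's per-topic config overrides;
the topic name and sizing are made up):

```shell
# Hypothetical settings for a maximally durable changelog topic:
# compact by key, replicate, and fsync after every message.
kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic db-changelog --partitions 8 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config flush.messages=1
```

My understanding is that fsyncing every message (flush.messages=1) is
costly, and that relying on replication instead (replication factor > 1
plus acks=-1 on the producer) is the more common way to survive the loss
of individual brokers, though not a simultaneous power loss of all nodes.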

Any thoughts/experiences/references are much appreciated.
Regards,
Bart


-- 
Bart De Vylder
+32(0)496/558065
bartdevyl...@gmail.com
