Hi all, I'm rather new to Samza and am trying some things out using Kafka as the message broker. One use case I'm interested in, which is mentioned in the documentation, is a table-stream join using bootstrap streams.
I'd appreciate recommendations/thoughts on the changelog and the database possibly going out of sync. Suppose I have my database push a changelog message to Kafka for every insert/update/delete, and then have a Samza job consume this stream as a bootstrap stream (plus maybe some other data stream). The only information this job will ever see about the database is what it reads from the Kafka topic containing the changelog (possibly compacted by Kafka based on key). So losing any of these changelog messages is not an option, as the job's view of the database would then be wrong forever.

Does this imply that Kafka needs to be forced to fsync every new message for this changelog topic? Or would it be better to keep the ability to completely recreate the changelog stream from the current contents of the database in case of disaster (e.g. all Kafka nodes losing power at the same time)? Or would it be better to recreate the database from the changelog (still some data loss, but at least the database and the changelog would be in sync)?

Any thoughts/experiences/references are much appreciated.

Regards,
Bart

--
Bart De Vylder
+32(0)496/558065
bartdevyl...@gmail.com
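PS: to make the fsync question concrete, this is roughly the configuration I had in mind. It's only a sketch mixing topic-level and producer-level settings; the setting names are from the Kafka documentation as I understand it, and the replica count is just an example:

```properties
# Topic-level override for the changelog topic: force a disk flush (fsync)
# after every single message, instead of relying on the OS flush policy.
flush.messages=1

# Alternative/additional durability via replication rather than fsync:
# a write is only acknowledged once it is on at least this many replicas.
min.insync.replicas=2

# Producer side (the process pushing DB changes to Kafka): wait for all
# in-sync replicas to acknowledge before considering the write successful.
acks=all
```

I'm unsure whether per-message fsync is the intended approach here or whether replication alone is considered sufficient, hence the question.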