Hey! I was hoping I could get some input from people more experienced with
Kafka Streams to determine whether it would be a good fit for my use case.

I have multi-tenant clients submitting data to a Kafka topic that they want
ETL'd to a third-party service. I'd like to batch and group these records by
tenant over a time window, somewhere between 1 and 5 minutes. At the end of
each window, I'd then issue an API request to the third-party service for
each tenant, sending its batch of data over.
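
To make it concrete, here's roughly the shape I'm picturing in the Streams
DSL. This is just an untested sketch: the "tenant-events" topic, the
TenantEvent type, batchSerde, and thirdPartyClient are all placeholders I
made up, and I'm not sure suppress() (which I believe is a newer addition)
is the right way to hold results until the window closes.

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

StreamsBuilder builder = new StreamsBuilder();

KStream<String, TenantEvent> events =
    builder.stream("tenant-events");                   // keyed by tenant id

KTable<Windowed<String>, List<TenantEvent>> batches = events
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
    .aggregate(
        () -> new ArrayList<TenantEvent>(),            // empty batch per tenant/window
        (tenantId, event, batch) -> { batch.add(event); return batch; },
        Materialized.with(Serdes.String(), batchSerde))
    // hold each tenant's batch until its window closes, then emit it once
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()));

batches.toStream().foreach((windowedTenantId, batch) ->
    // the external API call itself isn't covered by Kafka's exactly-once
    // guarantee, so retries/outages would still need handling on our side
    thirdPartyClient.send(windowedTenantId.key(), batch));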

Other points of note:
- Ideally we'd have exactly-once semantics, since sending data multiple times
would typically be bad. But we'd still need to gracefully handle things like
API request errors and service outages.

- We currently use Storm for stream processing, but the long-running time
windows and the potentially large amount of data held in memory make me a
bit nervous about using it for this.

Thoughts? Thanks in advance!
Stephen
