Hi,

I work at a research facility where numerous high-resolution detectors
produce thousands of gigabytes of data every day. We want to build a
highly flexible and performant streaming platform for storing,
transmitting, and routing the data. For example, detector output needs
to end up:

1. In permanent storage systems 
2. In realtime or semi-realtime visualization software
3. In post-processing and analysis software
4. In metrics software

...and possibly more. Now I'm exploring Kafka as an option to back such
a platform; a rough sketch of how I picture the fan-out follows below.
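
From what I understand of Kafka's consumer-group model, all of those
systems could read the same raw-data topic independently. Here is a
minimal sketch of what I have in mind, using kafka-python; the topic,
group, and broker names are placeholders I made up:

    # Fan-out sketch (kafka-python). Each downstream system joins its
    # own consumer group, so every group receives every message on the
    # raw-data topic without interfering with the others.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "detector.raw",                   # placeholder topic name
        group_id="visualization",         # each system uses its own group_id
        bootstrap_servers="broker:9092",  # placeholder broker address
        auto_offset_reset="earliest",
    )
    for record in consumer:
        # Placeholder handling: just report what arrived.
        print(record.topic, record.partition, record.offset, len(record.value))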

Would Kafka be a good fit? I ask because, in the use cases I've come
across, Kafka is mostly used with more lightweight data, geared towards
business events and high-frequency streams of text and scalars. In
other words, *more* but *smaller* messages.

In my situation, we'd be looking at low-frequency but huge messages:
these detectors typically produce one large file at a time. To avoid
flooding the storage, the raw-data topics would need a very short
retention time (hours to days).
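
For concreteness, here is roughly how I imagine creating such a topic,
again as a kafka-python sketch with made-up names and sizes. (I'm aware
that the default message size limit of about 1 MB would also need
raising on the broker and producer side, via message.max.bytes and
max.request.size.)

    # Sketch of a short-retention raw-data topic (kafka-python admin
    # API). All names and numbers are placeholders for illustration.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="broker:9092")
    admin.create_topics([
        NewTopic(
            name="detector.raw",
            num_partitions=8,
            replication_factor=2,
            topic_configs={
                "retention.ms": str(24 * 60 * 60 * 1000),  # drop raw data after ~1 day
                "retention.bytes": str(2 * 1024**4),       # and/or cap each partition (~2 TiB)
                "max.message.bytes": str(100 * 1024**2),   # per-message limit (~100 MiB)
            },
        )
    ])

Since retention.ms and retention.bytes are per-topic settings, each
detector's topic could presumably get its own limits.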

Does anybody have experience in using Kafka in a similar scenario? What are 
your thoughts about the situation I describe? Would we benefit from using Kafka?

I'd highly appreciate any input on this. Many thanks in advance.

Best regards,
Hannes

