I can't answer about Spark or Flink, but as a Druid person, I'll put in a plug for Druid for the "if necessary" case. It can ingest from Kafka, aggregate, and build sketches during ingestion. (It's a whole new ballpark, though, if you're not already using it.)
On Tue, Apr 6, 2021 at 9:56 AM Alex Garland <agarl...@expediagroup.com> wrote:

> Hi
>
> New to DataSketches and looking forward to using, seems like a great library.
>
> My team are evaluating it to profile streaming data (in Kafka) in 5-minute windows. The obvious options for stream processing (given experience within our org) would be either Flink or Spark Streaming.
>
> Two questions:
>
> - Would I be right in thinking that there are not existing integrations as libraries for either of these platforms? Absolutely fine if not, just confirming understanding.
> - Is there any view (from either the maintainers or the wider community) on whether either of those two are easier to integrate with DataSketches? We would also consider other streaming platforms if necessary, but as mentioned wider usage within the org would lean us against that if at all possible.
>
> Many thanks