I'll echo what Ben said -- if a pre-existing solution does what you need, certainly use that.
Having said that, I want to revisit frequent directions in light of the work Charlie did on using it for ridge regression. And when I asked internally I was told that Flink is where at least my company seems to be going for such jobs. So when I get a chance to dive into that, I'll be learning how to do it in Flink.

jon

On Tue, Apr 6, 2021 at 11:26 AM Ben Krug <ben.k...@imply.io> wrote:

> I can't answer about Spark or Flink, but as a druid person, I'll put in a
> plug for druid for the "if necessary" case. It can ingest from kafka and
> aggregate and do sketches during ingestion. (It's a whole new ballpark,
> though, if you're not already using it.)
>
> On Tue, Apr 6, 2021 at 9:56 AM Alex Garland <agarl...@expediagroup.com>
> wrote:
>
>> Hi
>>
>> New to DataSketches and looking forward to using, seems like a great
>> library.
>>
>> My team are evaluating it to profile streaming data (in Kafka) in
>> 5-minute windows. The obvious options for stream processing (given
>> experience within our org) would be either Flink or Spark Streaming.
>>
>> Two questions:
>>
>>    - Would I be right in thinking that there are not existing
>>    integrations as libraries for either of these platforms? Absolutely fine
>>    if not, just confirming understanding.
>>    - Is there any view (from either the maintainers or the wider
>>    community) on whether either of those two are easier to integrate with
>>    DataSketches? We would also consider other streaming platforms if
>>    necessary, but as mentioned wider usage within the org would lean us
>>    against that if at all possible.
>>
>> Many thanks