I'll echo what Ben said -- if a pre-existing solution does what you need,
certainly use that.

Having said that, I want to revisit Frequent Directions in light of the
work Charlie did on using it for ridge regression. When I asked
internally, I was told that Flink is where my company, at least, seems to
be going for such jobs. So when I get a chance to dive into that, I'll be
learning how to do it in Flink.

  jon

On Tue, Apr 6, 2021 at 11:26 AM Ben Krug <ben.k...@imply.io> wrote:

> I can't answer about Spark or Flink, but as a Druid person, I'll put in a
> plug for Druid for the "if necessary" case.  It can ingest from Kafka and
> aggregate and build sketches during ingestion.  (It's a whole new ballpark,
> though, if you're not already using it.)
>
> On Tue, Apr 6, 2021 at 9:56 AM Alex Garland <agarl...@expediagroup.com>
> wrote:
>
>> Hi
>>
>>
>>
>> New to DataSketches and looking forward to using it; it seems like a
>> great library.
>>
>>
>>
>> My team are evaluating it to profile streaming data (in Kafka) in
>> 5-minute windows. The obvious options for stream processing (given
>> experience within our org) would be either Flink or Spark Streaming.
>>
>>
>>
>> Two questions:
>>
>>    - Would I be right in thinking that there are not existing
>>    integrations as libraries for either of these platforms? Absolutely
>>    fine if not, just confirming understanding.
>>    - Is there any view (from either the maintainers or the wider
>>    community) on whether either of those two are easier to integrate with
>>    DataSketches? We would also consider other streaming platforms if
>>    necessary, but as mentioned wider usage within the org would lean us
>>    against that if at all possible.
>>
>>
>>
>> Many thanks
>>
>
