Actually, it looks like a better approach would be to write the counts to a
new topic and then ingest that topic into the DB itself. Is that the right
way to do it?
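Here is a minimal sketch of the "counts topic, then DB" idea. It assumes each record on the counts topic carries a key such as "2018-03-02|country=US" and the latest running count for that key. This is roughly what a Kafka Streams count table emits when written out with toStream().to(...). The sink upserts by key into a table. CountTopicSink and the key format are hypothetical, and a HashMap stands in for the real DB. No Kafka client code is shown.

```java
import java.util.HashMap;
import java.util.Map;

public class CountTopicSink {
    // Hypothetical stand-in for the external DB table: one row per key.
    private final Map<String, Long> table = new HashMap<>();

    // Each update carries the latest running count for its key, so an upsert
    // (overwrite) is enough. Replaying the same records in order ends in the
    // same table state, which makes consumer restarts harmless.
    public void upsert(String key, long count) {
        table.put(key, count);
    }

    public Long get(String key) {
        return table.get(key);
    }

    public int size() {
        return table.size();
    }

    public static void main(String[] args) {
        CountTopicSink sink = new CountTopicSink();
        // Updates in topic order; the US count grows as requests arrive.
        String[] keys = {"2018-03-02|country=US", "2018-03-02|country=CA", "2018-03-02|country=US"};
        long[] counts = {1, 1, 2};
        for (int pass = 0; pass < 2; pass++) { // second pass simulates a replay
            for (int i = 0; i < keys.length; i++) {
                sink.upsert(keys[i], counts[i]);
            }
        }
        System.out.println(sink.get("2018-03-02|country=US")); // prints 2
    }
}
```

The upsert design matters because the counts topic carries updates rather than increments. Summing the records in the DB would double-count, while overwriting by key converges to the final daily value.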

On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum <m...@setfive.com> wrote:

> I am new to Kafka, but I think I have a good use case for it. I am trying
> to build daily counts of requests based on a number of different attributes
> in a high-throughput system (~1 million requests/sec. across all 8
> servers). The attributes have unbounded value sets, and some will span
> hundreds of millions of distinct values. This is my current thought
> process; let me know where I could be more efficient or if there is a
> better way to do it.
>
> I'll create an Avro object "Impression" that holds all the attributes of
> the inbound request. On each request, my application servers will then
> create one and send it to a single Kafka topic.
>
> I'll then have a consumer that creates a stream from the topic. From
> there I'll use windowed aggregation and groupBy to group by the
> attributes for each day. At the end of the day I'd need to read the
> contents of the state store out to an external system for storage. Since I
> won't know all the key values in advance, I'd need something like
> KVStore.all() but for windowed KV stores. It looks like that will be
> possible in 1.1 with this commit:
> https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
>
> Is this the best approach? Or would I be better off using the stream only
> to listen, storing the counts in an external DB like Aerospike, and reading
> them out of it directly at the end of the day?
>
> Thanks for the help!
> Daum
>
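The daily counting described above can be sketched in plain Java. In Kafka Streams the same step would roughly be a groupBy(...), windowedBy(TimeWindows.of(...)), count() chain backed by a window store. This sketch does not use the library. It assigns each request to a one-day tumbling window (UTC, an assumption) and increments a count per "attribute=value" key. It also shows an end-of-day readout of every key in a window, which is the role all()/fetchAll() on a window store would play. DailyAttributeCounts and drainDay are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DailyAttributeCounts {
    // One window per UTC day; inside it, a count per "attribute=value" key.
    private final Map<LocalDate, Map<String, Long>> windows = new TreeMap<>();

    // Assign the request to its one-day tumbling window and increment the
    // count for every attribute value it carries.
    public void record(long epochMillis, Map<String, String> attributes) {
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        Map<String, Long> counts = windows.computeIfAbsent(day, d -> new HashMap<>());
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            counts.merge(e.getKey() + "=" + e.getValue(), 1L, Long::sum);
        }
    }

    // End-of-day readout: return every key counted in that day's window,
    // without knowing the keys in advance, and drop the window so memory
    // does not grow without bound.
    public Map<String, Long> drainDay(LocalDate day) {
        Map<String, Long> counts = windows.remove(day);
        return counts == null ? Collections.<String, Long>emptyMap() : counts;
    }
}
```

With hundreds of millions of distinct values per attribute, a single day's window can get large. Kafka Streams spreads that state across partitions and instances, which this in-memory sketch does not attempt.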
