Use case:
I work for a company that ingests events that come from both real-time
sources (which spike during the day) and historical log data.

We want the real-time data processed in minutes, and the historical log
data processed within hours. The consumer's business logic is the same.

Our current plan is to have two topics and two downstream consumer groups.
The "hot" consumer group for the real-time data would be provisioned at the
90th percentile of inbound message rate, and the "cold" consumer group for
the log data at the 60th percentile, since it's okay if spikes in cold data
take longer to absorb.
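
For concreteness, the baseline looks something like the sketch below: two
plain consumers in separate groups, each subscribed to its own topic and
scaled independently to hit those percentile targets. (Topic names, group
names, and serializers here are made up for illustration; only the cold
worker is shown, and the hot worker differs only in its group.id, topic,
and instance count.)

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ColdEventsWorker {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // One group per topic; the "hot" worker is identical except it
            // uses group.id "events-hot", subscribes to "events-realtime",
            // and runs enough instances to cover the p90 inbound rate.
            props.put("group.id", "events-cold");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events-historical"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> process(r.value()));  // same business logic as the hot worker
                }
            }
        }

        private static void process(String value) {
            // shared business logic
        }
    }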

Priority topics could *potentially* solve this.

However, one problem we hit with a similar priority queueing system built
on a different tech stack was that even a handful of messages in the
priority queues would keep the consumer just busy enough that the cold data
never got processed.

The underlying root cause of the problem was two-fold:
1) The API only returned messages from a single queue at a time, so even if
the consumer requested 1,000 messages, the scheduler would see a message in
the hot queue and immediately return just that one. By the time the
consumer processed it and requested another batch, one more message had
trickled into the hot queue, and the cycle repeated. If instead the API
made sure to return a full batch, filling first from the hot queue and then
topping up from the cold queue, we could still get batch efficiency at the
network / consumer / downstream DB call layers (see the sketch after this
list).
2) On the server side, switching between fetching messages for the
different queues seemed to be expensive. I'm not sure whether that was due
to an inefficient scheduler, lack of memory, or poor I/O management. I
suspect Kafka wouldn't hit this as long as the messages were present in the
page cache, but it's something to keep in mind: how this is implemented
matters from a performance/starvation standpoint.
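
To make point (1) concrete, here's a rough sketch of the fetch behavior we
wished that system had. This is hypothetical scheduler logic, not Kafka
code: fill the response from the hot queue first, then top the same batch
up from the cold queue, so a trickle of hot messages doesn't collapse every
fetch down to a batch of one.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;

    // Hypothetical scheduler-side sketch: fill the batch from the hot queue
    // first, then top the *same* batch up from the cold queue, instead of
    // returning as soon as the hot queue has a single message.
    class PriorityBatchScheduler<T> {

        private final Queue<T> hot = new ArrayDeque<>();
        private final Queue<T> cold = new ArrayDeque<>();

        List<T> fetch(int maxBatchSize) {
            List<T> batch = new ArrayList<>(maxBatchSize);
            drain(hot, batch, maxBatchSize);   // priority data goes first
            drain(cold, batch, maxBatchSize);  // remainder filled from cold data
            return batch;                      // one full batch per round trip
        }

        private void drain(Queue<T> source, List<T> batch, int maxBatchSize) {
            while (batch.size() < maxBatchSize && !source.isEmpty()) {
                batch.add(source.poll());
            }
        }

        void offerHot(T message)  { hot.add(message); }
        void offerCold(T message) { cold.add(message); }
    }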

So from a design standpoint, I think that means that for a priority
queueing design to minimize starvation, the criterion should probably be
"return messages based on priority, but be sure to also keep the consumer
fully occupied".
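
To illustrate that criterion with the existing consumer API (the
pause/resume approach Nick mentions below), a single consumer subscribed to
both topics could pause the cold partitions only while hot traffic is
actually heavy, and resume them as soon as a poll comes back light. The
topic/group names and the threshold below are made up, and this ignores
rebalance handling and local buffering; it's a sketch of the idea, not a
finished implementation.

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;
    import java.util.Set;
    import java.util.stream.Collectors;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PriorityAwareWorker {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "events-unified");  // one group covering hot and cold
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            // Made-up knob: how many hot records per poll counts as
            // "hot traffic is keeping us busy".
            final int hotBusyThreshold = 500;

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("events-realtime", "events-historical"));

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));

                    int hotCount = 0;
                    for (ConsumerRecord<String, String> rec : records) {
                        process(rec.value());  // same business logic for both topics
                        if (rec.topic().equals("events-realtime")) {
                            hotCount++;
                        }
                    }

                    Set<TopicPartition> coldPartitions = consumer.assignment().stream()
                            .filter(tp -> tp.topic().equals("events-historical"))
                            .collect(Collectors.toSet());

                    // Starve the cold topic only while hot volume is actually
                    // high; otherwise keep the consumer fully occupied with
                    // cold data.
                    if (hotCount >= hotBusyThreshold) {
                        consumer.pause(coldPartitions);
                    } else {
                        consumer.resume(coldPartitions);
                    }
                }
            }
        }

        private static void process(String value) {
            // shared business logic
        }
    }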

If done right, this would make our lives much easier operationally (only
one consumer group to manage, not two) and make our consumer usage more
efficient.


On Thu, Jan 17, 2019 at 4:20 AM Tobias Adamson <tob...@stargazer.com.sg>
wrote:

> Use cases: prioritise current data
>
> When processing messages there is sometimes a need to reprocess old data.
> It would be nice to be able to send the old data as messages to a
> separate topic that would only be processed when the current topic
> doesn’t have any messages left to process.
> This would prevent customers from seeing delays in current data processing
> due to message processors being busy processing old data.
>
>
> > On 17 Jan 2019, at 7:55 PM, Tim Ward <tim.w...@origamienergy.com> wrote:
> >
> > Use cases: processing alerts.
> >
> > High priority alerts ("a large chunk of your system has stopped
> providing service, immediate action essential") should be processed before
> low priority alerts ("some minor component has put out a not-very serious
> warning, somebody should probably have a look at it when they get bored"),
> of which there could be a long queue.
> >
> > Urgent alerts (a phone call telling someone "you need to do this now")
> should be processed before non-urgent alerts (a phone call telling someone
> "FYI, such and such is going to happen in a couple of hours").
> >
> > Tim Ward
> >
> > -----Original Message-----
> > From: n...@afshartous.com <n...@afshartous.com>
> > Sent: 17 January 2019 02:52
> > To: users@kafka.apache.org
> > Subject: Prioritized Topics for Kafka
> >
> >
> >
> > Hi all,
> >
> > On the dev list we’ve been discussing a proposed new feature
> (prioritized topics).  In a nutshell, when consuming from a set of topics
> with assigned priorities, consumption from lower-priority topics only
> occurs if there’s no data flowing in from a higher-priority topic.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> >
> > One question is whether there are use cases for the proposed API.  If you
> think this would be useful and have use cases in mind, please reply with
> the use cases.
> >
> > It's also possible to implement prioritization with the existing API by
> using a combination of pausing, resuming, and local buffering.  The
> question, then, is whether it makes sense to introduce the proposed
> higher-level API to make this easier.
> >
> > The responses will be used as input to determine if we move ahead with
> the proposal.  Thanks in advance for input.
> >
> > Cheers,
> > --
> >      Nick
> >
>
>

-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><
