Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-25 Thread Sönke Liebau
Hi Colin,

That is definitely a good option and will cover 90% of all use cases
(probably more).

However, strictly speaking, it only addresses one half of the issue, unless I
am mistaken. The internal behavior of the KafkaConsumer (which partition
the fetcher gets data from next and which buffered data is returned on the
next poll) is not affected by this. So records will only "jump the queue"
once they leave the KafkaConsumer; until then, they have to queue fairly
just like the rest of the messages.
Again, this will be sufficient in most cases, but if you want high-priority
messages to actually jump to the front of the queue, you would probably want
to combine both approaches: have one consumer for the high-priority topics and
one for the rest, both feeding into the same prioritized queue.
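A minimal sketch of the combined approach described above, assuming one thread per consumer (one subscribed to the high-priority topics, one to the rest) and a single processing thread. `PrioritizedRecord` and `SharedPriorityBuffer` are hypothetical names, and plain strings stand in for `ConsumerRecord` values so the example runs without kafka-clients. A sequence number keeps records of equal priority in arrival order, because `PriorityBlockingQueue` itself makes no ordering guarantee among equal elements.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a consumed record. Plain strings replace
// ConsumerRecord so the sketch runs without kafka-clients on the classpath.
final class PrioritizedRecord implements Comparable<PrioritizedRecord> {
    private static final AtomicLong SEQUENCE = new AtomicLong();

    final int priority;   // assumption: 0 = high priority, 1 = everything else
    final long sequence;  // arrival order, keeps FIFO within one priority level
    final String topic;
    final String value;

    PrioritizedRecord(int priority, String topic, String value) {
        this.priority = priority;
        this.sequence = SEQUENCE.getAndIncrement();
        this.topic = topic;
        this.value = value;
    }

    @Override
    public int compareTo(PrioritizedRecord other) {
        int byPriority = Integer.compare(priority, other.priority);
        return byPriority != 0 ? byPriority : Long.compare(sequence, other.sequence);
    }
}

// Shared, thread-safe buffer fed by both consumer threads.
final class SharedPriorityBuffer {
    private final PriorityBlockingQueue<PrioritizedRecord> queue = new PriorityBlockingQueue<>();

    // Called from the poll loop of the consumer subscribed to high-priority topics.
    void offerHigh(String topic, String value) {
        queue.put(new PrioritizedRecord(0, topic, value));
    }

    // Called from the poll loop of the consumer subscribed to all other topics.
    void offerLow(String topic, String value) {
        queue.put(new PrioritizedRecord(1, topic, value));
    }

    // Processing thread: blocks until a record is available.
    PrioritizedRecord take() throws InterruptedException {
        return queue.take();
    }

    // Non-blocking variant; returns null when the buffer is empty.
    PrioritizedRecord poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

In a real application each consumer thread would call `offerHigh` or `offerLow` for every record its `poll()` returns, and the processing thread would loop on `take()`. The queue is unbounded, so some form of backpressure (for example pausing partitions) is still needed.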

Best regards,
Sönke


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-24 Thread Colin McCabe
On Sat, Mar 23, 2019, at 18:41, nathank...@gmail.com wrote:
> Hello Nick,
> 
> I'm extremely new to Kafka, but I was attempting to set up a per-topic 
> priority application, and ended up finding this thread. I'm having 
> difficulty seeing how one can implement it with pause/resume. Would you 
> elaborate?
> 
> Since those operations are per-partition, and when you stop a 
> partition, it attempts to re-balance, I would need to stop all 
> partitions. Even then, it would try to finish the current transactions 
> instead of immediately putting it on hold and processing other topics. 

Hi nathankski,

Calling pause() on a partition doesn't trigger a re-balance or try to finish 
the current transactions.  It just means that you won't get more records for 
that partition until you call resume() on it.

> 
> It also looks like in order to determine if I had received messages 
> from the pri-1 topic, I would need to loop through all records, and 
> ignore those that weren't pri-1 until a poll failed to retrieve any, 
> which seems like it would screw up the other topics.

One way to do this would be to have two threads.  The first thread calls poll() 
on the Kafka consumer.  It puts the records it retrieves into a 
PriorityBlockingQueue.  Records from pri-1 get the highest priority within the queue.

The second thread retrieves records from the queue.  pri-1 records will always 
be pulled out of the PriorityBlockingQueue ahead of any other records, so they 
will be processed first.

If the priority queue gets too big, you pause partitions until thread 2 can 
clear the backlog.  The low-priority partition is paused first.
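The two-thread design above needs a rule for when the poll thread pauses which partitions. Below is a minimal sketch of such a rule, assuming two priority tiers and three illustrative thresholds; `BackpressurePolicy` and its threshold values are hypothetical. The gap between `resumeBelow` and `pauseLowAt` adds hysteresis so partitions do not flap between paused and resumed on every poll.

```java
// Hypothetical backpressure rule for the poll thread in the two-thread design.
// The caller maps each State onto KafkaConsumer.pause(...)/resume(...).
final class BackpressurePolicy {
    enum State { ALL_RUNNING, LOW_PAUSED, ALL_PAUSED }

    private final int pauseLowAt;   // backlog size at which low-priority partitions pause
    private final int pauseAllAt;   // backlog size at which every partition pauses
    private final int resumeBelow;  // backlog size below which everything resumes
    private State state = State.ALL_RUNNING;

    BackpressurePolicy(int pauseLowAt, int pauseAllAt, int resumeBelow) {
        if (!(resumeBelow <= pauseLowAt && pauseLowAt <= pauseAllAt)) {
            throw new IllegalArgumentException("expected resumeBelow <= pauseLowAt <= pauseAllAt");
        }
        this.pauseLowAt = pauseLowAt;
        this.pauseAllAt = pauseAllAt;
        this.resumeBelow = resumeBelow;
    }

    // Call after each poll() with the current priority-queue size.
    State onQueueSize(int size) {
        if (size >= pauseAllAt) {
            state = State.ALL_PAUSED;          // backlog critical: stop fetching entirely
        } else if (size >= pauseLowAt) {
            if (state == State.ALL_RUNNING) {
                state = State.LOW_PAUSED;      // low-priority partitions are paused first
            }
        } else if (size < resumeBelow) {
            state = State.ALL_RUNNING;         // backlog cleared: resume everything
        } else if (state == State.ALL_PAUSED) {
            state = State.LOW_PAUSED;          // recovering: resume high priority first
        }
        return state;
    }
}
```

In the poll thread, `LOW_PAUSED` would translate to `consumer.pause(...)` on the low-priority topics' partitions taken from `consumer.assignment()`, `ALL_PAUSED` to pausing every assigned partition, and `ALL_RUNNING` to `consumer.resume(consumer.paused())`. The poll thread must keep calling `poll()` while partitions are paused; otherwise it can exceed `max.poll.interval.ms` and be removed from the group.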

best,
Colin

> 
> Thank you,
> 
> Nathan
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-24 Thread Sönke Liebau
Hi Nathan,

I have a couple of remarks/questions about your mail, if I may.

First of all, the javadoc for the pause operation of KafkaConsumer states:
"Suspend fetching from the requested partitions. Future calls to
poll(Duration) will not return any records from these partitions until
they have been resumed using resume(Collection). Note that this method
does not affect partition subscription. In particular, it does not cause
a group rebalance when automatic assignment is used." [1]
You mentioned that "those operations" cause a rebalance; could you perhaps
elaborate on that?

Second, you state that "it would try to finish the current transactions",
which confuses me a little as well, since the consumer is not really aware
of transactions in a meaningful way. Or does "transaction" in this case
refer to your last call to poll()?

Have you looked into splitting your subscription across two consumers, one
for high-priority topics and one for low(er)-priority topics? Unless you are
looking for a dynamic, multi-tier priority system across many topics, that
might be your best bet. This works quite well for scenarios where one topic
acts as a control plane (think start/stop processing messages) and a second
topic contains the actual data.
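For the control-plane case mentioned above, the two consumers mostly need to share one piece of state: whether data processing is currently allowed. A minimal sketch, assuming the control topic carries plain `start`/`stop` string commands (an assumption for illustration, not a Kafka convention); `ControlGate` is a hypothetical name and strings stand in for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate shared by a control-topic consumer thread and a data
// consumer thread. Strings stand in for record values.
final class ControlGate {
    private final AtomicBoolean processingEnabled = new AtomicBoolean(true);
    private final List<String> processed = new ArrayList<>(); // stand-in for real work

    // Control thread: apply one command read from the control topic.
    // Assumption: commands are the literal strings "start" and "stop".
    void apply(String command) {
        switch (command) {
            case "start":
                processingEnabled.set(true);
                break;
            case "stop":
                processingEnabled.set(false);
                break;
            default:
                throw new IllegalArgumentException("unknown command: " + command);
        }
    }

    // Data thread: returns false if processing is currently stopped, so the
    // caller can hold the record (e.g. pause its partitions and retry later).
    boolean process(String value) {
        if (!processingEnabled.get()) {
            return false;
        }
        synchronized (processed) {
            processed.add(value);
        }
        return true;
    }

    List<String> processed() {
        synchronized (processed) {
            return new ArrayList<>(processed);
        }
    }
}
```

The check is not atomic with the work itself: a `stop` that arrives while a record is being processed takes effect from the next record, which is usually acceptable for a control plane.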

Best regards,
Sönke

[1]
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#pause-java.util.Collection-

-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-23 Thread nathankski

Hello Nick,

I'm extremely new to Kafka, but I was attempting to set up a per-topic priority 
application, and ended up finding this thread. I'm having difficulty seeing how 
one can implement it with pause/resume. Would you elaborate?

Since those operations are per-partition, and when you stop a partition, it 
attempts to re-balance, I would need to stop all partitions. Even then, it 
would try to finish the current transactions instead of immediately putting it 
on hold and processing other topics. 

It also looks like in order to determine if I had received messages from the 
pri-1 topic, I would need to loop through all records, and ignore those that 
weren't pri-1 until a poll failed to retrieve any, which seems like it would 
screw up the other topics.

Thank you,

Nathan


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-27 Thread nick
Hi Sönke,

Thanks for taking the time to review.  I’ve put KIP-349 into hibernation.  

Thanks also to everyone who participated in the discussion.

Best regards,
--
  Nick


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-25 Thread Sönke Liebau
Hi Nick,

A bit late to the party, sorry. I recently spent some time looking
into this / a similar issue [1].
After some investigation and playing around with settings, I think that
the benefit that could be gained from this is somewhat limited and
probably outweighed by the implementation effort.

The consumer internals are already geared towards treating partitions
fairly, so that no partition has to wait an undue amount of time, and
this can be further tuned for latency over throughput. Additionally,
if this is a large issue for someone, there is always the option of
having a dedicated consumer reading only from the control topic, which
would mean that messages from that topic are received "immediately".
For a Kafka Streams job it would probably make sense to create two
input streams and then merge those as a first step.
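As a rough illustration of tuning for latency over throughput, for example for a dedicated control-topic consumer, the sketch below builds consumer properties using standard Kafka consumer config keys. The specific values and the `LatencyTunedConsumerConfig` helper are assumptions for illustration; sensible values depend on the workload and should be measured. The resulting `Properties` would be passed to `new KafkaConsumer<>(props)`.

```java
import java.util.Properties;

// Hypothetical helper: consumer properties biased toward latency over throughput.
// The keys are standard Kafka consumer configs; the values are illustrative only.
final class LatencyTunedConsumerConfig {
    static Properties lowLatency(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Let the broker answer a fetch as soon as any data is available...
        props.setProperty("fetch.min.bytes", "1");
        // ...and bound how long it may hold the fetch when no data is available.
        props.setProperty("fetch.max.wait.ms", "50");
        // Smaller per-partition fetches and smaller poll() batches make each
        // round trip shorter, so no single busy partition dominates for long.
        props.setProperty("max.partition.fetch.bytes", "65536");
        props.setProperty("max.poll.records", "50");
        return props;
    }
}
```

For the Kafka Streams variant, `KStream#merge` combines two input streams into one, which corresponds to the "create two input streams and then merge those" step.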

I think with these knobs a fairly large amount of flexibility can be
achieved so that there is no urgent need to implement priorities.

So my personal preference would be to set this KIP to dormant for now.

Best regards,
Sönke

[1] 
https://lists.apache.org/thread.html/07b5a9f1232ea139e01c477481a9d77208d8e8b2e55d8608a0271417@%3Cdev.kafka.apache.org%3E


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-24 Thread nick

Hi Colin,

> On Jan 24, 2019, at 12:14 PM, Colin McCabe  wrote:
> 
> Users almost always like the idea of new features, whatever they are.  But 
> that doesn't mean that the feature would necessarily work well or be 
> necessary.

Yes, though we should certainly consider the responses on the user list as 
input (Subject: Prioritized Topics for Kafka).

> If you still want to pursue this, then I suggest gathering a set of use-cases 
> that can't be addressed through the means we discussed here previously.  So, 
> something that can't effectively be addressed through using the pause and 
> resume API.  
 
We’ve discussed this point before.  I accept your point that a user could 
implement this behavior with pause and resume.  This KIP is about creating a 
higher-level API to make it easier to do so.

> Then come up with a concrete proposal that addresses all the questions we 
> have, including about starvation, incremental fetch requests, and so on.

To me it seems like there’s only one outstanding issue here (incremental 
fetch), and we could just pick one of the options.  Starvation is by design.  
I’m not sure what “and so on” references.  

> This could be a lot of work.  If you're looking for a way to make more 
> contributions, I'd recommend getting started with something easier.


Yes, it does.  And after 6 months of (sometimes circular) discussion I’d like to 
either move towards a vote or set the status of this KIP to dormant until (if 
ever) someone else picks it up.

Does anybody else have input on either having a vote or setting the KIP dormant?

Cheers,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-24 Thread Colin McCabe
Hi Nick,

Users almost always like the idea of new features, whatever they are.  But that 
doesn't mean that the feature would necessarily work well or be necessary.

If you still want to pursue this, then I suggest gathering a set of use-cases 
that can't be addressed through the means we discussed here previously.  So, 
something that can't effectively be addressed through using the pause and 
resume API.  Then come up with a concrete proposal that addresses all the 
questions we have, including about starvation, incremental fetch requests, and 
so on.

This could be a lot of work.  If you're looking for a way to make more 
contributions, I'd recommend getting started with something easier.

best,
Colin


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-24 Thread Jan Filipiak


On 24.01.2019 15:51, Thomas Becker wrote:
> Yes, I think this type of strategy interface would be valuable.
> 

Thank you for leaving this here!


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-24 Thread Thomas Becker
Yes, I think this type of strategy interface would be valuable.

On Wed, 2019-01-16 at 15:41 +, Jan Filipiak wrote:
> On 16.01.2019 14:05, Thomas Becker wrote:
>> I'm going to bow out of this discussion since it's been made clear that
>> the feature is not targeted at streams. But for the record, my desire is
>> to have an alternative to the timestamp based message choosing strategy
>> streams currently imposes, and I thought topic prioritization in the
>> consumer could potentially enable that. See
>> https://issues.apache.org/jira/browse/KAFKA-4113
>>
>> -Tommy
>
> Would you be so kind to leave an impression about a MessageChooser
> interface? Might be important for an extra KIP later
>
> Best Jan


--
Tommy Becker
Principal Engineer
Personalized Content Discovery
O +1 919.460.4747
tivo.com



This email and any attachments may contain confidential and privileged material 
for the sole use of the intended recipient. Any review, copying, or 
distribution of this email (or any attachments) by others is prohibited. If you 
are not the intended recipient, please contact the sender immediately and 
permanently delete this email and any attachments. No employee or agent of TiVo 
Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by 
email. Binding agreements with TiVo Inc. may only be made by a signed written 
agreement.


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-24 Thread nick

Hi Colin,

Just bumping this thread to see if you’ve had a chance to review the use-cases 
in the thread on the users list.

Cheers,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-17 Thread nick

Hi Colin,

I agree that the use-cases are important.  Rather than wait though I took some 
initiative and posted the message below to the Kafka user list (Subject: 
Prioritized Topics for Kafka).
Since yesterday there have been 6 replies containing 7 different use-cases and 
very positive feedback.  Please review.

   https://lists.apache.org/list.html?us...@kafka.apache.org 


At this point I feel like we have enough info and would like to try and work 
towards a vote or set the status of the KIP to dormant.  

Cheers,
--
  Nick



> On Jan 16, 2019, at 9:51 PM, n...@afshartous.com wrote:
> 
> Hi all,
> 
> On the dev list we’ve been discussing a proposed new feature (prioritized 
> topics). In a nutshell, when consuming from a set of topics with assigned 
> priorities, consumption from lower-priority topics only occurs if there’s no 
> data flowing in from a higher-priority topic.  
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> 
> One question is whether there are use-cases for the proposed API.  If you 
> think this would be useful and have use-cases in mind, please reply with the 
> use-cases.
> 
> It's also possible to implement prioritization with the existing API by using 
> a combination of pausing, resuming, and local buffering.  The question is 
> then: does it make sense to introduce the proposed higher-level API to make 
> this easier?
> 
> The responses will be used as input to determine if we move ahead with the 
> proposal.  Thanks in advance for input.  
> 
> Cheers,
> --
>  Nick
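The strict-priority rule quoted above (lower-priority topics are consumed only when no higher-priority data is available) can be expressed as a small local buffer, which is also the "local buffering" half of the pause/resume approach. A minimal sketch, assuming integer priorities where 0 is highest and strings as stand-ins for records; `StrictPriorityBuffer` is a hypothetical name. It starves lower tiers by construction, which matches the "starvation is by design" position taken elsewhere in this thread.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local buffer implementing strict priority: a lower tier is
// served only when every higher tier is empty. Priority 0 is the highest.
final class StrictPriorityBuffer {
    // TreeMap iterates keys in ascending order, i.e. highest priority first.
    private final TreeMap<Integer, ArrayDeque<String>> tiers = new TreeMap<>();

    void add(int priority, String value) {
        tiers.computeIfAbsent(priority, p -> new ArrayDeque<>()).addLast(value);
    }

    // Returns the oldest record of the highest non-empty tier, or null if empty.
    String next() {
        for (Map.Entry<Integer, ArrayDeque<String>> tier : tiers.entrySet()) {
            if (!tier.getValue().isEmpty()) {
                return tier.getValue().pollFirst();
            }
        }
        return null;
    }

    // True if any tier strictly above `priority` holds data; a poll loop could
    // use this to decide when to pause that tier's partitions and when to resume.
    boolean higherTierHasData(int priority) {
        for (ArrayDeque<String> queue : tiers.headMap(priority, false).values()) {
            if (!queue.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```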




Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-16 Thread Jan Filipiak


On 16.01.2019 14:05, Thomas Becker wrote:
> I'm going to bow out of this discussion since it's been made clear that
> the feature is not targeted at streams. But for the record, my desire is
> to have an alternative to the timestamp based message choosing strategy
> streams currently imposes, and I thought topic prioritization in the
> consumer could potentially enable that. See
> https://issues.apache.org/jira/browse/KAFKA-4113
>
> -Tommy
>

Would you be so kind as to share your impression of a MessageChooser 
interface? It might be important for a separate KIP later.

Best Jan
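To make the MessageChooser idea concrete, here is a minimal sketch of a pluggable chooser loosely modeled on the `update()`/`choose()` pair of Samza's `MessageChooser` (the Samza interface also has registration and lifecycle methods, omitted here). All names (`Envelope`, `Chooser`, `ComparatorChooser`) are hypothetical. It shows how a timestamp-based choosing strategy, like the one described for Kafka Streams in this thread, and a topic-priority strategy could be two orderings behind one interface.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical envelope; a real integration would wrap ConsumerRecord.
final class Envelope {
    final String topic;
    final long timestamp;
    final String value;

    Envelope(String topic, long timestamp, String value) {
        this.topic = topic;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical pluggable chooser: candidates go in via update(), and
// choose() decides which buffered message is processed next.
interface Chooser {
    void update(Envelope candidate);

    Envelope choose(); // null when nothing is buffered
}

// Strategies differ only in the ordering they impose on buffered candidates.
final class ComparatorChooser implements Chooser {
    private final PriorityQueue<Envelope> buffered;

    ComparatorChooser(Comparator<Envelope> order) {
        this.buffered = new PriorityQueue<>(order);
    }

    @Override
    public void update(Envelope candidate) {
        buffered.add(candidate);
    }

    @Override
    public Envelope choose() {
        return buffered.poll();
    }

    // Oldest timestamp first, similar in spirit to timestamp-based choosing.
    static ComparatorChooser byTimestamp() {
        return new ComparatorChooser(Comparator.comparingLong((Envelope e) -> e.timestamp));
    }

    // Messages from urgentTopic always win; timestamp order within each tier.
    static ComparatorChooser preferTopic(String urgentTopic) {
        Comparator<Envelope> urgentFirst =
                Comparator.comparingInt((Envelope e) -> e.topic.equals(urgentTopic) ? 0 : 1);
        return new ComparatorChooser(urgentFirst.thenComparingLong(e -> e.timestamp));
    }
}
```

Keeping the strategy as a `Comparator` means the buffering and the choosing policy stay separate, which is roughly the orthogonality argued about between MessageChooser and topic prioritization.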


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-16 Thread Thomas Becker
I'm going to bow out of this discussion since it's been made clear that the 
feature is not targeted at Streams. But for the record, my desire is to have an 
alternative to the timestamp-based message choosing strategy Streams currently 
imposes, and I thought topic prioritization in the consumer could potentially 
enable that. See https://issues.apache.org/jira/browse/KAFKA-4113

-Tommy

On Mon, 2019-01-14 at 19:19 -0500, n...@afshartous.com wrote:

> Hi Jan,
>
> As discussed, I’ve adopted the position that MessageChooser is orthogonal to
> topic prioritization and hence outside the scope of KIP-349.
>
> --
>   Nick
>
>> On Jan 14, 2019, at 12:47 AM, Jan Filipiak <jan.filip...@trivago.com> wrote:
>>
>> On 14.01.2019 02:48, n...@afshartous.com wrote:
>>> On reflection, it would be hard to describe the semantics of an API that tried
>>> to address starvation by temporarily disabling prioritization, and then
>>> oscillating back and forth.
>>>
>>> Thus I agree that it makes sense not to try and address starvation to Mathias’
>>> point that this is intended by design.  The KIP has been updated to reflect
>>> this by removing the second method.
>>
>> The semantics of almost everything are hard to describe with only those
>> two tools at hand. Just here to remember yall that Samza already shows
>> us the interface of a powerful enough abstraction to get stuff done :)
>>
>> https://samza.apache.org/learn/documentation/0.12/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html
>>
>> welcome :)






--
Tommy Becker
Principal Engineer
Personalized Content Discovery
O +1 919.460.4747
tivo.com



Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-15 Thread Colin McCabe
On Sun, Jan 13, 2019, at 18:13, n...@afshartous.com wrote:
> Thanks Colin and Matthias.
> 
> > On Jan 12, 2019, at 8:27 PM, Matthias J. Sax  wrote:
> > 
> > Thus, I would suggest limiting this KIP to the consumer only; otherwise,
> > the scope will be too large and this KIP will drag on even longer. If we
> > really want to add this to Kafka Streams, I expect a long and difficult
> > discussion about this by itself, and thus, doing this in a follow up KIP
> > (if there is any demand) seems to be the better approach.
> > 
> 
> Agreed, and my intent is to limit the scope to the consumer.  
> 
> > About the starvation issue: maybe it's a bold claim, but is a potential
> > starvation of a low-priority topic not intended by design if topics have

Yeah, I was thinking this as well.  If you want strict priority behavior, high 
priority tasks should always take priority over low priority ones.

> 
> On reflection, it would be hard to describe the semantics of an API 
> that tried to address starvation by temporarily disabling 
> prioritization, and then oscillating back and forth. 
> Thus I agree that it makes sense not to try to address starvation, per 
> Matthias’ point that this is intended by design.  The KIP has been 
> updated to reflect this by removing the second method.  

Yeah, I agree with that.  The problem is, the actual policy you want is kind of 
complex.  It's probably better handled in the application rather than in Kafka.

> 
> Regarding incremental fetch, Colin, do you have any suggestion on which 
> option to adopt or how to proceed?

I think it makes sense to go back to use-cases again.  So far, all of the 
use-cases we discussed could be handled by pause and resume.  So it makes sense 
to try to figure out what the issue with those APIs is.  Are they not 
well-documented enough?  Is there something higher-level we could build on top 
to make them easier to use?

It would be better to wait until a user comes forward with a case where 
priorities are needed before implementing them, since then we would know more 
about what the API should be, etc.

best,
Colin


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-14 Thread nick

Hi Jan,

As discussed, I’ve adopted the position that MessageChooser is orthogonal to 
topic prioritization and hence outside the scope of KIP-349.
--
  Nick


> On Jan 14, 2019, at 12:47 AM, Jan Filipiak  wrote:
> 
> On 14.01.2019 02:48, n...@afshartous.com  wrote:
> 
>> 
>> On reflection, it would be hard to describe the semantics of an API that 
>> tried to address starvation by temporarily disabling prioritization, and 
>> then oscillating back and forth.
>> Thus I agree that it makes sense not to try to address starvation, per 
>> Matthias’ point that this is intended by design.  The KIP has been updated to 
>> reflect this by removing the second method.
>> 
> 
> The semantics of almost everything are hard to describe with only those 
> two tools at hand. Just here to remind y'all that Samza already shows 
> us the interface of a powerful enough abstraction to get stuff done :)
> 
> https://samza.apache.org/learn/documentation/0.12/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html
>  
> 
> 
> welcome :)






Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-13 Thread Jan Filipiak
On 14.01.2019 02:48, n...@afshartous.com wrote:

>
> On reflection, it would be hard to describe the semantics of an API that 
> tried to address starvation by temporarily disabling prioritization, and then 
> oscillating back and forth.
> Thus I agree that it makes sense not to try to address starvation, per 
> Matthias’ point that this is intended by design.  The KIP has been updated to 
> reflect this by removing the second method.
>

The semantics of almost everything are hard to describe with only those 
two tools at hand. Just here to remind y'all that Samza already shows 
us the interface of a powerful enough abstraction to get stuff done :)

https://samza.apache.org/learn/documentation/0.12/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html

welcome :)


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-13 Thread nick
Thanks Colin and Matthias.

> On Jan 12, 2019, at 8:27 PM, Matthias J. Sax  wrote:
> 
> Thus, I would suggest limiting this KIP to the consumer only; otherwise,
> the scope will be too large and this KIP will drag on even longer. If we
> really want to add this to Kafka Streams, I expect a long and difficult
> discussion about this by itself, and thus, doing this in a follow up KIP
> (if there is any demand) seems to be the better approach.
> 

Agreed, and my intent is to limit the scope to the consumer.  

> About the starvation issue: maybe it's a bold claim, but is a potential
> starvation of a low-priority topic not intended by design if topics have


On reflection, it would be hard to describe the semantics of an API that tried 
to address starvation by temporarily disabling prioritization, and then 
oscillating back and forth. 
Thus I agree that it makes sense not to try to address starvation, per Matthias’ 
point that this is intended by design.  The KIP has been updated to reflect 
this by removing the second method.  

Regarding incremental fetch, Colin, do you have any suggestion on which option 
to adopt or how to proceed?
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-12 Thread Matthias J. Sax
Thanks for the summary Colin.

One remark from my side: I have my doubts that topic priorities make
sense for Kafka Streams (at least not for the DSL).

Thus, I would suggest limiting this KIP to the consumer only; otherwise,
the scope will be too large and this KIP will drag on even longer. If we
really want to add this to Kafka Streams, I expect a long and difficult
discussion about this by itself, and thus, doing this in a follow up KIP
(if there is any demand) seems to be the better approach.

About the starvation issue: maybe it's a bold claim, but is a potential
starvation of a low-priority topic not intended by design if topics have
priorities? Thus, do we really need to address/fix this? To me, it seems
that having topic priorities while also avoiding starvation for
low-priority topics is a contradiction.


-Matthias





On 1/11/19 7:00 AM, Adam Bellemare wrote:
> Hi Colin
> 
> Thanks for the sober second thought - I actually didn't see the
> inconclusive parts of the DISCUSS (I must have missed them when going
> through) so I am grateful you highlighted these. I will have to remove my
> +1 in light of the issues Colin has mentioned, but I will follow the
> discussion more carefully.
> 
> 
> 
> On Thu, Jan 10, 2019 at 5:41 PM Colin McCabe  wrote:
> 
>> Hi all,
>>
>> Just as a quick reminder, this is not really a complete proposal.  There
>> are a bunch of unresolved issues with this KIP.  One example is how this
>> interacts with incremental fetch sessions.  It is not mentioned anywhere in
>> the KIP text.  Previously we discussed some approaches, but there was no
>> clear consensus.
>>
>> Another example is the issue of starvation.  The KIP discusses "an idea"
>> for handling starvation, but the details are very sparse-- just a sentence
>> or two.  At minimum we would need some kind of configuration for the
>> proposed "lag deltas".  It's also not clear that the proposed mechanism
>> would work, since we don't receive lag metrics for partitions that we don't
>> fetch.  But if we do fetch from the partitions, we may receive data, which
>> would cause our policy to not be strict priorities.  Keep in mind, even
>> attempting to fetch 1 byte may cause us to read an entire message, as
>> described in KIP-74.
>>
>> It seems that we don't understand the potential use-cases.  The only
>> use-case referenced by the KIP is this one, by Bala Prassanna:
>>
>>  > We use Kafka to process the asynchronous events of our Document Management
>>  > System such as preview generation, indexing for search etc.
>>  > The traffic gets generated via Web and Desktop Sync application. In such
>>  > cases, we had to prioritize the traffic from web and consume them first.
>>  > But this might lead to the starvation of events from sync if the consumer
>>  > speed is slow and the event rate is high from web.  A solution to handle
>>  > the starvation with a timeout after which the events are consumed normally
>>  > for a specified period of time would be great and help us use our
>>  > resources effectively.
>>
>> Reading this carefully, it seems that the problem is actually starvation,
>> not implementing priorities.  Bala already implemented priorities outside
>> of Kafka.  If you read the discussion on KAFKA-6690, Bala also makes this
>> comment: "We would need this in both Consumer API and Streams API."  The
>> current KIP does not discuss adding priorities to Streams-- only to the
>> basic consumer API.  So it seems clear that KIP-349 does not address Bala's
>> use-case at all.
>>
>> Stepping back a little bit, it seems like a few people have spoken up
>> recently asking for some way to re-order the messages they receive from the
>> Kafka consumer.  For example, ChienHsing Wu has discussed a use-case where
>> he wants to receive messages in a "round robin" order.  All of this is
>> possible by doing some local buffering and using the pause and resume
>> APIs.  Perhaps we should consider better documenting these APIs, and adding
>> some examples.  Or perhaps we should consider some kind of API to do
>> pluggable buffering on the client side.
>>
>> In any case, this needs more discussion.  We need to be clear and definite
>> about what use cases we want to solve, and the tradeoffs we're making to
>> solve them.  For now, I have to reiterate my -1 (binding).
>>
>> Colin
>>
>>
>> On Thu, Jan 10, 2019, at 10:46, Adam Bellemare wrote:
>>> Looks good to me then!
>>>
>>> +1 non-binding
>>>
>>>
>>>
>>&g

Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-11 Thread Adam Bellemare
Hi Colin

Thanks for the sober second thought - I actually didn't see the
inconclusive parts of the DISCUSS (I must have missed them when going
through) so I am grateful you highlighted these. I will have to remove my
+1 in light of the issues Colin has mentioned, but I will follow the
discussion more carefully.



On Thu, Jan 10, 2019 at 5:41 PM Colin McCabe  wrote:

> Hi all,
>
> Just as a quick reminder, this is not really a complete proposal.  There
> are a bunch of unresolved issues with this KIP.  One example is how this
> interacts with incremental fetch sessions.  It is not mentioned anywhere in
> the KIP text.  Previously we discussed some approaches, but there was no
> clear consensus.
>
> Another example is the issue of starvation.  The KIP discusses "an idea"
> for handling starvation, but the details are very sparse-- just a sentence
> or two.  At minimum we would need some kind of configuration for the
> proposed "lag deltas".  It's also not clear that the proposed mechanism
> would work, since we don't receive lag metrics for partitions that we don't
> fetch.  But if we do fetch from the partitions, we may receive data, which
> would cause our policy to not be strict priorities.  Keep in mind, even
> attempting to fetch 1 byte may cause us to read an entire message, as
> described in KIP-74.
>
> It seems that we don't understand the potential use-cases.  The only
> use-case referenced by the KIP is this one, by Bala Prassanna:
>
>  > We use Kafka to process the asynchronous events of our Document Management
>  > System such as preview generation, indexing for search etc.
>  > The traffic gets generated via Web and Desktop Sync application. In such
>  > cases, we had to prioritize the traffic from web and consume them first.
>  > But this might lead to the starvation of events from sync if the consumer
>  > speed is slow and the event rate is high from web.  A solution to handle
>  > the starvation with a timeout after which the events are consumed normally
>  > for a specified period of time would be great and help us use our
>  > resources effectively.
>
> Reading this carefully, it seems that the problem is actually starvation,
> not implementing priorities.  Bala already implemented priorities outside
> of Kafka.  If you read the discussion on KAFKA-6690, Bala also makes this
> comment: "We would need this in both Consumer API and Streams API."  The
> current KIP does not discuss adding priorities to Streams-- only to the
> basic consumer API.  So it seems clear that KIP-349 does not address Bala's
> use-case at all.
>
> Stepping back a little bit, it seems like a few people have spoken up
> recently asking for some way to re-order the messages they receive from the
> Kafka consumer.  For example, ChienHsing Wu has discussed a use-case where
> he wants to receive messages in a "round robin" order.  All of this is
> possible by doing some local buffering and using the pause and resume
> APIs.  Perhaps we should consider better documenting these APIs, and adding
> some examples.  Or perhaps we should consider some kind of API to do
> pluggable buffering on the client side.
>
> In any case, this needs more discussion.  We need to be clear and definite
> about what use cases we want to solve, and the tradeoffs we're making to
> solve them.  For now, I have to reiterate my -1 (binding).
>
> Colin
>
>
> On Thu, Jan 10, 2019, at 10:46, Adam Bellemare wrote:
> > Looks good to me then!
> >
> > +1 non-binding
> >
> >
> >
> > > On Jan 10, 2019, at 1:22 PM, Afshartous, Nick 
> wrote:
> > >
> > >
> > > Hi Adam,
> > >
> > >
> > > This change is only intended for the basic consumer API.
> > >
> > >
> > > Cheers,
> > >
> > > --
> > >
> > >Nick
> > >
> > >
> > > 
> > > From: Adam Bellemare 
> > > Sent: Sunday, January 6, 2019 11:45 AM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [VOTE] KIP-349 Priorities for Source Topics
> > >
> > > Hi Nick
> > >
> > > Is this change only for the basic consumer? How would this affect
> anything with Kafka Streams?
> > >
> > > Thanks
> > >
> > >
> > >> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote:
> > >>
> > >> Bumping again for more votes.
> > >> --
> > >> Nick
> > >>
> > >>
> > >>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.c

Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-10 Thread Colin McCabe
Hi all,

Just as a quick reminder, this is not really a complete proposal.  There are a 
bunch of unresolved issues with this KIP.  One example is how this interacts 
with incremental fetch sessions.  It is not mentioned anywhere in the KIP text.
Previously we discussed some approaches, but there was no clear consensus.

Another example is the issue of starvation.  The KIP discusses "an idea" for 
handling starvation, but the details are very sparse-- just a sentence or two.  
At minimum we would need some kind of configuration for the proposed "lag 
deltas".  It's also not clear that the proposed mechanism would work, since we 
don't receive lag metrics for partitions that we don't fetch.  But if we do 
fetch from the partitions, we may receive data, which would cause our policy to 
not be strict priorities.  Keep in mind, even attempting to fetch 1 byte may 
cause us to read an entire message, as described in KIP-74.

It seems that we don't understand the potential use-cases.  The only use-case 
referenced by the KIP is this one, by Bala Prassanna:

 > We use Kafka to process the asynchronous events of our Document Management 
 > System such as preview generation, indexing for search etc.
 > The traffic gets generated via Web and Desktop Sync application. In such 
 > cases, we had to prioritize the traffic from web and consume them first.  
 > But this might lead to the starvation of events from sync if the consumer 
 > speed is slow and the event rate is high from web.  A solution to handle 
 > the starvation with a timeout after which the events are consumed normally 
 > for a specified period of time would be great and help us use our 
 > resources effectively.

Reading this carefully, it seems that the problem is actually starvation, not 
implementing priorities.  Bala already implemented priorities outside of Kafka.
If you read the discussion on KAFKA-6690, Bala also makes this comment: "We 
would need this in both Consumer API and Streams API."  The current KIP does 
not discuss adding priorities to Streams-- only to the basic consumer API.  So 
it seems clear that KIP-349 does not address Bala's use-case at all.

Stepping back a little bit, it seems like a few people have spoken up recently 
asking for some way to re-order the messages they receive from the Kafka 
consumer.  For example, ChienHsing Wu has discussed a use-case where he wants 
to receive messages in a "round robin" order.  All of this is possible by doing 
some local buffering and using the pause and resume APIs.  Perhaps we should 
consider better documenting these APIs, and adding some examples.  Or perhaps 
we should consider some kind of API to do pluggable buffering on the client 
side.
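A minimal sketch of the "local buffering" idea for a round-robin order, assuming records are grouped by a partition name (a plain String standing in for TopicPartition). `RoundRobinBuffer` is a hypothetical helper, not a Kafka client API; in a real loop it would be filled from the records returned by poll(), and a partition whose backlog grows too large could be paused with KafkaConsumer#pause.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side buffer: records are appended per partition, and
// drain() hands them out one partition at a time in round-robin order
// instead of in the order the fetcher happened to return them.
final class RoundRobinBuffer<T> {
    // LinkedHashMap keeps a stable partition order across rounds.
    private final Map<String, Deque<T>> byPartition = new LinkedHashMap<>();

    void add(String partition, T record) {
        byPartition.computeIfAbsent(partition, p -> new ArrayDeque<>()).addLast(record);
    }

    // Takes at most one record from each non-empty partition per round,
    // repeating until every per-partition buffer is empty.
    List<T> drain() {
        List<T> out = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Deque<T> queue : byPartition.values()) {
                T record = queue.pollFirst();
                if (record != null) {
                    out.add(record);
                    progress = true;
                }
            }
        }
        return out;
    }

    // Records still buffered for one partition; a caller could use this to
    // decide when to pause or resume that partition.
    int buffered(String partition) {
        Deque<T> queue = byPartition.get(partition);
        return queue == null ? 0 : queue.size();
    }
}
```

For example, buffering a1, a2, a3 from "a", b1 from "b" and c1, c2 from "c" drains as a1, b1, c1, a2, c2, a3.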

In any case, this needs more discussion.  We need to be clear and definite 
about what use cases we want to solve, and the tradeoffs we're making to solve 
them.  For now, I have to reiterate my -1 (binding).

Colin


On Thu, Jan 10, 2019, at 10:46, Adam Bellemare wrote:
> Looks good to me then!
> 
> +1 non-binding
> 
> 
> 
> > On Jan 10, 2019, at 1:22 PM, Afshartous, Nick  
> > wrote:
> > 
> > 
> > Hi Adam,
> > 
> > 
> > This change is only intended for the basic consumer API.
> > 
> > 
> > Cheers,
> > 
> > --
> > 
> >    Nick
> > 
> > 
> > 
> > From: Adam Bellemare 
> > Sent: Sunday, January 6, 2019 11:45 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-349 Priorities for Source Topics
> > 
> > Hi Nick
> > 
> > Is this change only for the basic consumer? How would this affect anything 
> > with Kafka Streams?
> > 
> > Thanks
> > 
> > 
> >> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote:
> >> 
> >> Bumping again for more votes.
> >> --
> >> Nick
> >> 
> >> 
> >>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote:
> >>> 
> >>> Bumping this thread for more votes
> >>> 
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics

Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-10 Thread Adam Bellemare
Looks good to me then!

+1 non-binding



> On Jan 10, 2019, at 1:22 PM, Afshartous, Nick  wrote:
> 
> 
> Hi Adam,
> 
> 
> This change is only intended for the basic consumer API.
> 
> 
> Cheers,
> 
> --
> 
>Nick
> 
> 
> 
> From: Adam Bellemare 
> Sent: Sunday, January 6, 2019 11:45 AM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-349 Priorities for Source Topics
> 
> Hi Nick
> 
> Is this change only for the basic consumer? How would this affect anything 
> with Kafka Streams?
> 
> Thanks
> 
> 
>> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote:
>> 
>> Bumping again for more votes.
>> --
>> Nick
>> 
>> 
>>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote:
>>> 
>>> Bumping this thread for more votes
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
>> 
>> 
>> 
>> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-10 Thread Afshartous, Nick

Hi Adam,


This change is only intended for the basic consumer API.


Cheers,

--

Nick



From: Adam Bellemare 
Sent: Sunday, January 6, 2019 11:45 AM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-349 Priorities for Source Topics

Hi Nick

Is this change only for the basic consumer? How would this affect anything with 
Kafka Streams?

Thanks


> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote:
>
> Bumping again for more votes.
> --
>  Nick
>
>
>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote:
>>
>> Bumping this thread for more votes
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
>
>
>
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-06 Thread Adam Bellemare
Hi Nick

Is this change only for the basic consumer? How would this affect anything with 
Kafka Streams?

Thanks


> On Jan 5, 2019, at 10:52 PM, n...@afshartous.com wrote:
> 
> Bumping again for more votes.  
> --
>  Nick
> 
> 
>> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote:
>> 
>> Bumping this thread for more votes
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
> 
> 
> 
> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-05 Thread nick
Bumping again for more votes.  
--
  Nick


> On Dec 26, 2018, at 12:36 PM, n...@afshartous.com wrote:
> 
> Bumping this thread for more votes
> 
>  
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics






Re: [VOTE] KIP-349 Priorities for Source Topics

2018-12-26 Thread nick

Hi All,

Bumping this thread for more votes

  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
 


Cheers,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-27 Thread nick


> On Oct 26, 2018, at 2:00 PM, Colin McCabe  wrote:
>> 
> Priorities won't help for this use-case, right?  If the "web" partition has a 
> higher priority, and data is always available, there will *never* be any 
> events reported for "sync". Priorities don't prevent starvation-- they cause 
> starvation by design, because the high priority partition always takes 
> priority.

Starvation is certainly an issue, though we could include a timeout as Bala 
suggested to address this.  
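Bala's timeout idea can be sketched as a chooser that prefers "high" records but, once the oldest "low" record has waited at least `maxWaitMs`, serves both buffers in arrival order for `fairWindowMs`. This is a hypothetical illustration under assumed names, not the KIP's design; time is passed in explicitly so the policy can be tested deterministically.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical two-level chooser with a starvation timeout (not a Kafka API).
final class TimeoutPriorityChooser<T> {
    private static final class Entry<V> {
        final V value;
        final long arrivedAtMs;
        Entry(V value, long arrivedAtMs) {
            this.value = value;
            this.arrivedAtMs = arrivedAtMs;
        }
    }

    private final Deque<Entry<T>> high = new ArrayDeque<>();
    private final Deque<Entry<T>> low = new ArrayDeque<>();
    private final long maxWaitMs;     // how long a low record may wait before we intervene
    private final long fairWindowMs;  // how long to serve in arrival order afterwards
    private long fairUntilMs = Long.MIN_VALUE;

    TimeoutPriorityChooser(long maxWaitMs, long fairWindowMs) {
        this.maxWaitMs = maxWaitMs;
        this.fairWindowMs = fairWindowMs;
    }

    void addHigh(T value, long nowMs) { high.addLast(new Entry<>(value, nowMs)); }
    void addLow(T value, long nowMs) { low.addLast(new Entry<>(value, nowMs)); }

    // Returns the next record to process, or null if nothing is buffered.
    T next(long nowMs) {
        if (high.isEmpty() && low.isEmpty()) {
            return null;
        }
        // Starvation detected: open (or extend) a window of arrival-order service.
        if (!low.isEmpty() && nowMs - low.peekFirst().arrivedAtMs >= maxWaitMs) {
            fairUntilMs = nowMs + fairWindowMs;
        }
        boolean fair = nowMs < fairUntilMs;
        Deque<Entry<T>> pick;
        if (high.isEmpty()) {
            pick = low;
        } else if (low.isEmpty()) {
            pick = high;
        } else if (fair) {
            // Inside the window: oldest record first, regardless of priority.
            pick = low.peekFirst().arrivedAtMs <= high.peekFirst().arrivedAtMs ? low : high;
        } else {
            pick = high; // normal strict-priority behavior
        }
        return pick.pollFirst().value;
    }
}
```

With `maxWaitMs = 100` and a new "web" record arriving every millisecond, a "sync" record buffered at t=0 is served at t=100 instead of never. The trade-off Matthias raises still applies: during the window the policy is no longer strict priority.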

> In general the best solution would probably be to have a work queue between 
> the consumer and the event handler, and manage the backpressure as 
> appropriate.  This could be done with pause and resume, as Streams does.


I agree that similar semantics could be achieved with a work queue.  What we’re 
voting on is the merits of topic prioritization to make the API more expressive 
and to make it easier for developers to do this.

Thanks Colin for your vote on the KIP and for all your input.  I look forward to 
hearing from others.

Cheers,
--
  Nick




Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-26 Thread Colin McCabe
On Thu, Oct 25, 2018, at 18:16, n...@afshartous.com wrote:
> 
> The reporter of KAFKA-6690 (Bala) replied in the Jira ticket to my 
> request to elaborate on his use-case.  I don’t think he’s on the dev 
> list.  Here’s his response:  
> 
>   Bala:  Sorry about the delay in reply. We use Kafka to process the 
> asynchronous events of our Document Management System such as preview 
> generation, indexing for search etc. The traffic gets generated via Web 
> and Desktop Sync application. In such cases, we had to prioritize the 
> traffic from web and consume them first. But this might lead to the 
> starvation of events from sync if the consumer speed is slow and the 
> event rate is high from web. A solution to handle the starvation with a 
> timeout after which the events are consumed normally for a specified 
> period of time would be great and help us use our resources effectively.

Priorities won't help for this use-case, right?  If the "web" partition has a 
higher priority, and data is always available, there will *never* be any events 
reported for "sync".  Priorities don't prevent starvation-- they cause 
starvation by design, because the high priority partition always takes priority.
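A small illustration of why strict priorities starve the lower level by design: a chooser that always drains "high" first never returns a "low" record while high-priority data keeps arriving. `StrictPriorityChooser` is a hypothetical helper written for this sketch, not a Kafka API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical strict two-level priority chooser over locally buffered records.
final class StrictPriorityChooser<T> {
    private final Deque<T> high = new ArrayDeque<>();
    private final Deque<T> low = new ArrayDeque<>();

    void addHigh(T record) { high.addLast(record); }
    void addLow(T record) { low.addLast(record); }

    // Returns the next record, or null if both buffers are empty.
    // A low-priority record is returned only when no high-priority record
    // is buffered, so a steady high-priority stream starves "low" forever.
    T next() {
        if (!high.isEmpty()) {
            return high.pollFirst();
        }
        return low.pollFirst();
    }
}
```

If one "web" record is added before every call to next(), a single buffered "sync" record is never returned; it comes out only once the web traffic stops.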

In general the best solution would probably be to have a work queue between the 
consumer and the event handler, and manage the backpressure as appropriate.  
This could be done with pause and resume, as Streams does.
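A minimal sketch of the backpressure bookkeeping for such a work queue, assuming partitions are identified by plain strings and that the thresholds `pauseAt`/`resumeAt` are hypothetical tuning knobs. A real loop would map the returned decisions onto KafkaConsumer#pause and KafkaConsumer#resume with TopicPartition objects; since KafkaConsumer is not safe for multi-threaded access, the handler would report completions back to the consumer thread rather than calling the consumer itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides when a partition should be paused or resumed,
// based on how many of its records are still waiting in the local work queue.
// Not thread-safe; intended to be called only from the consumer thread.
final class BackpressureTracker {
    private final int pauseAt;   // pause once this many records are queued
    private final int resumeAt;  // resume once the backlog drains to this level
    private final Map<String, Integer> queued = new HashMap<>();
    private final Set<String> paused = new HashSet<>();

    BackpressureTracker(int pauseAt, int resumeAt) {
        if (resumeAt >= pauseAt) {
            throw new IllegalArgumentException("resumeAt must be below pauseAt");
        }
        this.pauseAt = pauseAt;
        this.resumeAt = resumeAt;
    }

    // After poll(): returns true if the partition should be paused now.
    boolean onFetched(String partition, int count) {
        int backlog = queued.merge(partition, count, Integer::sum);
        return backlog >= pauseAt && paused.add(partition);
    }

    // After a record is handled: returns true if the partition should be resumed now.
    boolean onProcessed(String partition) {
        int backlog = queued.merge(partition, -1, Integer::sum);
        return backlog <= resumeAt && paused.remove(partition);
    }
}
```

The gap between the two thresholds is a design choice to avoid flapping between pause and resume on every record.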

best,
Colin


> 
> --
>   Nick
> 
> 
> 
> 
> > On Oct 18, 2018, at 12:23 PM, n...@afshartous.com wrote:
> > 
> >> On Oct 12, 2018, at 5:06 PM, Colin McCabe  wrote:
> >> 
> >> Maybe there's some really cool use-case that I haven't thought of.  But so 
> >> far I can't really think of any time I would need topic priorities if I 
> >> was muting topics and offloading blocking operations in a reasonable way.  
> >> It would be good to identify use-cases 
> > 
> > 
> > Hi Colin,
> > 
> > How about the use-case where there are multiple streams/topics, and the 
> > intent is to have a single consumer interleave the messages so that higher 
> > priority messages are processed first?
> > That seems to be what the reporter of the associated Jira ticket
> > 
> >   https://issues.apache.org/jira/browse/KAFKA-6690 
> > 
> > 
> > has identified as a use-case he frequently encounters.  I’ve asked him to 
> > elaborate on the dev list though he has not responded yet.
> > 
> > Best,
> > --
> >  Nick
> > 
> > 
> > 
> 
> 
> 
> 
> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-25 Thread nick

The reporter of KAFKA-6690 (Bala) replied in the Jira ticket to my request to 
elaborate on his use-case.  I don’t think he’s on the dev list.  Here’s his 
response:  

  Bala:  Sorry about the delay in reply. We use Kafka to process the 
asynchronous events of our Document Management System such as preview 
generation, indexing for search etc. The traffic gets generated via Web and 
Desktop Sync application. In such cases, we had to prioritize the traffic from 
web and consume them first. But this might lead to the starvation of events 
from sync if the consumer speed is slow and the event rate is high from web. A 
solution to handle the starvation with a timeout after which the events are 
consumed normally for a specified period of time would be great and help us use 
our resources effectively.

--
  Nick




> On Oct 18, 2018, at 12:23 PM, n...@afshartous.com wrote:
> 
>> On Oct 12, 2018, at 5:06 PM, Colin McCabe  wrote:
>> 
>> Maybe there's some really cool use-case that I haven't thought of.  But so 
>> far I can't really think of any time I would need topic priorities if I was 
>> muting topics and offloading blocking operations in a reasonable way.  It 
>> would be good to identify use-cases 
> 
> 
> Hi Colin,
> 
> How about the use-case where there are multiple streams/topics, and the 
> intent is to have a single consumer interleave the messages so that higher 
> priority messages are processed first?
> That seems to be what the reporter of the associated Jira ticket
> 
>   https://issues.apache.org/jira/browse/KAFKA-6690 
> 
> 
> has identified as a use-case he frequently encounters.  I’ve asked him to 
> elaborate on the dev list though he has not responded yet.
> 
> Best,
> --
>  Nick
> 
> 
> 







Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-21 Thread Colin McCabe
On Thu, Oct 18, 2018, at 09:23, n...@afshartous.com wrote:
> 
> 
> > On Oct 12, 2018, at 5:06 PM, Colin McCabe  wrote:
> > 
> > Maybe there's some really cool use-case that I haven't thought of.  But so 
> > far I can't really think of any time I would need topic priorities if I was 
> > muting topics and offloading blocking operations in a reasonable way.  It 
> > would be good to identify use-cases 
> 
> 
> Hi Colin,
> 
> How about the use-case where there are multiple streams/topics, and the 
> intent is to have a single consumer interleave the messages so that 
> higher priority messages are processed first?
> That seems to be what the reporter of the associated Jira ticket
> 
>https://issues.apache.org/jira/browse/KAFKA-6690 
> 
> 
> has identified as a use-case he frequently encounters.  I’ve asked him 
> to elaborate on the dev list though he has not responded yet.
> 
> Best,
> --
>   Nick

Thanks, Nick.  It will be interesting to hear more about that.

best,
Colin

> 
> 
> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-18 Thread nick


> On Oct 12, 2018, at 5:06 PM, Colin McCabe  wrote:
> 
> Maybe there's some really cool use-case that I haven't thought of.  But so 
> far I can't really think of any time I would need topic priorities if I was 
> muting topics and offloading blocking operations in a reasonable way.  It 
> would be good to identify use-cases 


Hi Colin,

How about the use-case where there are multiple streams/topics, and the intent 
is to have a single consumer interleave the messages so that higher priority 
messages are processed first?
That seems to be what the reporter of the associated Jira ticket

   https://issues.apache.org/jira/browse/KAFKA-6690 


has identified as a use-case he frequently encounters.  I’ve asked him to 
elaborate on the dev list though he has not responded yet.
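The interleaving use case can be sketched on the application side as a local queue ordered by topic priority (higher first) and, within one priority, by arrival order. The topic-to-priority map and `PrioritizedQueue` are hypothetical and supplied by the application; nothing here is a Kafka API. If two consumer threads fed the same queue, `java.util.concurrent.PriorityBlockingQueue` would be the thread-safe counterpart to the `PriorityQueue` used here.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical local queue: higher topic priority first, FIFO within a priority.
// Not thread-safe.
final class PrioritizedQueue<T> {
    private static final class Item<V> {
        final V value;
        final int priority;
        final long seq; // arrival order, used to break ties
        Item(V value, int priority, long seq) {
            this.value = value;
            this.priority = priority;
            this.seq = seq;
        }
    }

    private final Map<String, Integer> topicPriority;
    private final PriorityQueue<Item<T>> heap = new PriorityQueue<>(
            Comparator.<Item<T>>comparingInt(i -> i.priority)
                    .reversed()                    // higher priority first
                    .thenComparingLong(i -> i.seq)); // then oldest first
    private long nextSeq = 0;

    PrioritizedQueue(Map<String, Integer> topicPriority) {
        this.topicPriority = new HashMap<>(topicPriority);
    }

    void add(String topic, T record) {
        int priority = topicPriority.getOrDefault(topic, 0); // unknown topics get 0
        heap.add(new Item<>(record, priority, nextSeq++));
    }

    // Returns the highest-priority, oldest record, or null if empty.
    T poll() {
        Item<T> item = heap.poll();
        return item == null ? null : item.value;
    }
}
```

With priorities {web: 10, sync: 1}, adding s1, w1, s2, w2 yields w1, w2, s1, s2.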

Best,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-12 Thread Colin McCabe
On Mon, Oct 8, 2018, at 12:35, Thomas Becker wrote:
> Well my (perhaps flawed) understanding of topic priorities is that lower 
> priority topics are not consumed as long as higher priority ones have 
> unconsumed messages (which means our position < HW). So if I'm doing 
> this manually, I have to make some determination as to whether my high 
> priority topic partitions are at the HW before I can decide if I want to 
> poll the lower priority ones. Right?

Hi Thomas,

You could periodically check the last committed position of various partitions 
using KafkaConsumer#committed.  But this would be very inefficient.  For one 
thing, you'd have to keep waking up your consumer thread all the time to do 
this.
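The manual check Thomas describes ("our position < HW" for the high-priority partitions) reduces to a simple predicate. In this hypothetical sketch the two maps stand in for values a loop might gather with KafkaConsumer#position and KafkaConsumer#endOffsets (the latter reports the high watermark under read_uncommitted isolation); `PriorityGate` is an illustrative name, not a Kafka API. Gathering those values on every iteration is the kind of repeated wake-up cost Colin points out.

```java
import java.util.Map;

// Hypothetical helper: low-priority partitions may be resumed only when every
// high-priority partition has caught up (position >= end offset).
final class PriorityGate {
    private PriorityGate() {}

    static boolean highPriorityCaughtUp(Map<String, Long> positions,
                                        Map<String, Long> endOffsets) {
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            // Assumption: a partition with no known position is treated as offset 0.
            long position = positions.getOrDefault(e.getKey(), 0L);
            if (position < e.getValue()) {
                return false; // unconsumed high-priority messages remain
            }
        }
        return true;
    }
}
```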

The two-consumer solution that I suggested earlier just implies that you have 
two consumers, one for the control data and one for the non-control data.  In 
that case, as long as control data is available, your consumer will always try 
to read it.  It doesn't involve the caller checking committed position using 
KafkaConsumer#committed at any point.

Usually, consumers are reading data that is relatively recent.  If the consumer 
is too slow to keep up with the incoming messages over the long term, the 
system usually gets into a bad state.  I think this is one reason why it's hard 
to think of use-cases for this feature.  If you had a control partition and 
data partition, the data partition wouldn't really block you from getting the 
control messages in a timely fashion.  You almost certainly need to be able to 
keep up with both partitions anyway.  Also, if you have to do some very 
expensive processing on data messages, you should be offloading that processing 
to another thread, rather than doing the expensive thing in your consumer 
thread.  And you can mute a partition while you're processing an expensive 
message from that partition, so it doesn't really block the processing of other 
partitions anyway.
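A rough Python sketch of the offload-and-mute pattern: the polling loop hands each record to a worker thread and mutes its partition until that work completes, so later records from the same partition wait while other partitions keep flowing. Partitions here are plain strings and `run_with_muting` is a hypothetical name; in the Java client the mute and unmute steps would map to `KafkaConsumer#pause` and `KafkaConsumer#resume`.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Deque, Dict, List, Set, Tuple


def run_with_muting(partitions: Dict[str, List[str]], workers: int = 2) -> List[Tuple[str, str]]:
    """Hand each record to a worker thread and mute (pause) its partition
    until that work finishes; other partitions keep being dispatched."""
    pending: Dict[str, Deque[str]] = {p: deque(recs) for p, recs in partitions.items()}
    paused: Set[str] = set()
    in_flight: Dict[str, Future] = {}
    results: List[Tuple[str, str]] = []

    def expensive(partition: str, value: str) -> Tuple[str, str]:
        # Stand-in for slow per-record work (e.g. a remote call).
        return partition, value

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while any(pending.values()) or in_flight:
            # Unmute every partition whose offloaded work has completed.
            for p, fut in list(in_flight.items()):
                if fut.done():
                    results.append(fut.result())
                    del in_flight[p]
                    paused.discard(p)  # Java client: consumer.resume(...)
            dispatched = False
            for p, recs in pending.items():
                if p in paused or not recs:
                    continue
                paused.add(p)  # Java client: consumer.pause(...)
                in_flight[p] = pool.submit(expensive, p, recs.popleft())
                dispatched = True
            if not dispatched and in_flight:
                # Nothing new to hand out: block until some worker finishes.
                wait(list(in_flight.values()), return_when=FIRST_COMPLETED)
    return results
```

One caveat with the real client: the consumer thread still has to keep calling `poll()` while partitions are paused (it just gets no records for them), or the consumer may exceed `max.poll.interval.ms` and be removed from the group.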

Maybe there's some really cool use-case that I haven't thought of.  But so far 
I can't really think of any time I would need topic priorities if I was muting 
topics and offloading blocking operations in a reasonable way.  It would be 
good to identify use-cases because it would motivate choices like how many 
priorities do we want (2? 256?  4 billion?) and what the API would be like, etc.

best,
Colin

> 
> On Fri, 2018-10-05 at 11:34 -0700, Colin McCabe wrote:
> 
> On Fri, Oct 5, 2018, at 10:58, Thomas Becker wrote:
> 
> Colin,
> 
> Would you mind sharing your vision for how this looks with multiple
> consumers? I'm still getting my bearings with the new consumer but it's
> not immediately obvious to me how this would work.
> 
> Hi Thomas,
> 
> I was just responding to the general idea that you would have some kind 
> of control topic that you wanted to read with very low latency, and some 
> kind of set of data topics where the latency requirements are less 
> strict.  In that case, you can just have two consumers: one for the
> low-latency topic, and one for the less low-latency topics.
> 
> There's a lot of things in this picture that are unclear.  Does the data 
> in one set of topics have any relation to the data in the other?  Why do 
> we want a control channel distinct from the data channel?  That's why I 
> asked for clarification on the use-case.
> 
> In particular, it doesn't seem particularly easy to know when you are at 
> the high watermark of a topic.
> 
> KafkaConsumer#committed will return the last committed offset for a 
> partition.  However, I'm not sure I understand why you want this 
> information in this case-- can you expand a bit on this?
> 
> best,
> Colin
> 
> -Tommy
> 
> 
> On Mon, 2018-10-01 at 13:43 -0700, Colin McCabe wrote:
> 
> Hi all,
> 
> I feel like the DISCUSS thread didn't really come to a conclusion, so a
> vote would be premature here.
> 
> In particular, I still don't really understand the use-case for this
> feature.  Can someone give a concrete scenario where you would need
> this?  The control plane / data plane example that is listed in the KIP
> doesn't require this feature.  You can just have one consumer for the
> control plane, and one for the data plane, and do priority that way.
> The discussion feels kind of unfocused since we haven't identified even
> one concrete use-case that needs this feature.
> 
> Unfortunately, this is a feature which consumes server-side memory.  We
> have to store the priorities somehow when doing incremental fetch
> requests.  If we go with an int as suggested, then this is at least 4
> bytes per partition per incremental fetch request.  It also makes it
> more complex and potentially slower to maintain the linked list of
> partitions in the fetch requests.  Before we think about this, I'd like
> to have a concrete use-case in mind, so that we can evaluate the costs
> versus benefits.
> 
> best,

Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-08 Thread Thomas Becker
Well my (perhaps flawed) understanding of topic priorities is that lower 
priority topics are not consumed as long as higher priority ones have 
unconsumed messages (which means our position < HW). So if I'm doing this 
manually, I have to make some determination as to whether my high priority 
topic partitions are at the HW before I can decide if I want to poll the lower 
priority ones. Right?
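The rule described above can be written down as a small function. This is a sketch under assumptions: a higher number means a higher priority, partitions are plain strings, and `positions`/`end_offsets` stand for what the Java client's `KafkaConsumer#position` and `KafkaConsumer#endOffsets` would report (the latter returns the high watermark under `read_uncommitted`). Fetching end offsets requires a broker round trip, so doing this on every loop iteration is not free.

```python
from typing import Dict, Set


def partitions_to_consume(positions: Dict[str, int],
                          end_offsets: Dict[str, int],
                          priority: Dict[str, int]) -> Set[str]:
    """Strict priority: if any partition is behind its end offset, consume
    only partitions at that partition's priority level or higher."""
    lagging = [priority[tp] for tp in positions if positions[tp] < end_offsets[tp]]
    if not lagging:
        return set(positions)  # caught up everywhere: consume everything
    top = max(lagging)
    return {tp for tp in positions if priority[tp] >= top}
```

Higher-priority partitions that are already caught up stay in the returned set, so records arriving on them are still picked up.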

On Fri, 2018-10-05 at 11:34 -0700, Colin McCabe wrote:

On Fri, Oct 5, 2018, at 10:58, Thomas Becker wrote:

Colin,

Would you mind sharing your vision for how this looks with multiple
consumers? I'm still getting my bearings with the new consumer but it's
not immediately obvious to me how this would work.

Hi Thomas,

I was just responding to the general idea that you would have some kind of 
control topic that you wanted to read with very low latency, and some kind of 
set of data topics where the latency requirements are less strict.  In that 
case, you can just have two consumers: one for the low-latency topic, and one 
for the less low-latency topics.

There's a lot of things in this picture that are unclear.  Does the data in one 
set of topics have any relation to the data in the other?  Why do we want a 
control channel distinct from the data channel?  That's why I asked for 
clarification on the use-case.

In particular, it doesn't seem particularly easy to know when you are at the 
high watermark of a topic.

KafkaConsumer#committed will return the last committed offset for a partition.  
However, I'm not sure I understand why you want this information in this case-- 
can you expand a bit on this?

best,
Colin

-Tommy


On Mon, 2018-10-01 at 13:43 -0700, Colin McCabe wrote:


Hi all,

I feel like the DISCUSS thread didn't really come to a conclusion, so a
vote would be premature here.

In particular, I still don't really understand the use-case for this
feature.  Can someone give a concrete scenario where you would need
this?  The control plane / data plane example that is listed in the KIP
doesn't require this feature.  You can just have one consumer for the
control plane, and one for the data plane, and do priority that way.
The discussion feels kind of unfocused since we haven't identified even
one concrete use-case that needs this feature.

Unfortunately, this is a feature which consumes server-side memory.  We
have to store the priorities somehow when doing incremental fetch
requests.  If we go with an int as suggested, then this is at least 4
bytes per partition per incremental fetch request.  It also makes it
more complex and potentially slower to maintain the linked list of
partitions in the fetch requests.  Before we think about this, I'd like
to have a concrete use-case in mind, so that we can evaluate the costs
versus benefits.

best,
Colin




On Mon, Oct 1, 2018, at 07:47, Dongjin Lee wrote:


Great. +1 (non-binding)



On Mon, Oct 1, 2018 at 4:23 AM Matthias J. Sax
<matth...@confluent.io>
wrote:

+1 (binding)

As Dongjin pointed out, the community is working on the upcoming 2.1
release, and thus it might take some time until people find time to
follow up on this and vote.

-Matthias

On 9/30/18 11:11 AM, n...@afshartous.com wrote:

On Sep 30, 2018, at 5:16 AM, Dongjin Lee
<dong...@apache.org> wrote:

1. Your KIP document
https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
lacks a hyperlink to the discussion thread. And I couldn't find the
discussion thread from the mailing archive.

Hi Dongjin,

There has been a discussion thread.  I added this link as a reference

  https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349

to the KIP-349 page

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics

Best,
--
  Nick

--
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare:
www.slideshare.net/dongjinleekr
*

This email and any attachments may contain confidential and privileged

material for the sole use of the intended recipient. Any review,

copying, or 

Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-06 Thread nick


> On Oct 5, 2018, at 2:25 PM, Colin McCabe  wrote:
> 
> It's possible for the change to be 100% backwards compatible, but still not 
> have a separate code path for people who don't want to use this feature, 
> right?  What I am getting at is basically: will this feature increase 
> broker-side memory consumption for people who don't use it?


Hi Colin,

My intent is to leave current consumer semantics and performance and resource 
characteristics intact.  Therefore there should be no additional resources 
necessary for existing code paths.
I would expect that a PR would likely not be accepted if there was a 
degradation of performance for the existing code paths.  

Regarding use-cases, it seems that prioritized topics would be required for 
interleaving messages from multiple topics in order of priority.  For example, 
let’s say each topic represents a real-time news feed (e.g. NY Times, 
Washington Post, Boston Globe), and we’d like to process messages from the 
NY Times first as they arrive.  This could be expressed easily using topic 
prioritization.  With multiple consumers the merging and interleaving of 
messages would have to be done by the caller.  It’s certainly doable, though it 
seems like those who are requesting topic prioritization are asking for a more 
expressive consumer API.  My $0.02.
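For comparison, the caller-side merge mentioned above could look something like this Python sketch: records from one or more consumers go into a heap keyed by topic priority, with an arrival counter keeping FIFO order within a priority. The topic names and priority values are hypothetical (higher number = higher priority is an assumption), and this only reorders records the application has already received; it has no effect on what the consumer fetches.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class PrioritizedBuffer:
    """Caller-side merge of records from several topics: highest topic
    priority first, arrival (FIFO) order within the same priority."""

    def __init__(self, topic_priority: Dict[str, int], default_priority: int = 0):
        self._priority = topic_priority
        self._default = default_priority
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def add(self, topic: str, value: str) -> None:
        prio = self._priority.get(topic, self._default)
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-prio, next(self._seq), topic, value))

    def pop(self) -> Optional[Tuple[str, str]]:
        if not self._heap:
            return None
        _, _, topic, value = heapq.heappop(self._heap)
        return topic, value


buf = PrioritizedBuffer({"nytimes": 2, "washpost": 1})  # bostonglobe -> default 0
for topic, value in [("washpost", "w1"), ("bostonglobe", "b1"), ("nytimes", "n1"),
                     ("washpost", "w2"), ("nytimes", "n2")]:
    buf.add(topic, value)
print([buf.pop()[1] for _ in range(5)])  # ['n1', 'n2', 'w1', 'w2', 'b1']
```

Topics absent from the priority map (here `bostonglobe`) fall back to the default priority.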

Best,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-05 Thread Colin McCabe
On Fri, Oct 5, 2018, at 10:58, Thomas Becker wrote:
> Colin,
> Would you mind sharing your vision for how this looks with multiple 
> consumers? I'm still getting my bearings with the new consumer but it's 
> not immediately obvious to me how this would work.

Hi Thomas,

I was just responding to the general idea that you would have some kind of 
control topic that you wanted to read with very low latency, and some kind of 
set of data topics where the latency requirements are less strict.  In that 
case, you can just have two consumers: one for the low-latency topic, and one 
for the less low-latency topics.

There's a lot of things in this picture that are unclear.  Does the data in one 
set of topics have any relation to the data in the other?  Why do we want a 
control channel distinct from the data channel?  That's why I asked for 
clarification on the use-case.

> In particular, it doesn't seem particularly easy to know when you are at the 
> high watermark of a topic.

KafkaConsumer#committed will return the last committed offset for a partition.  
However, I'm not sure I understand why you want this information in this case-- 
can you expand a bit on this?

best,
Colin


> 
> -Tommy
> 
> On Mon, 2018-10-01 at 13:43 -0700, Colin McCabe wrote:
> 
> Hi all,
> 
> 
> I feel like the DISCUSS thread didn't really come to a conclusion, so a 
> vote would be premature here.
> 
> 
> In particular, I still don't really understand the use-case for this 
> feature.  Can someone give a concrete scenario where you would need 
> this?  The control plane / data plane example that is listed in the KIP 
> doesn't require this feature.  You can just have one consumer for the 
> control plane, and one for the data plane, and do priority that way.  
> The discussion feels kind of unfocused since we haven't identified even 
> one concrete use-case that needs this feature.
> 
> 
> Unfortunately, this is a feature which consumes server-side memory.  We 
> have to store the priorities somehow when doing incremental fetch 
> requests.  If we go with an int as suggested, then this is at least 4 
> bytes per partition per incremental fetch request.  It also makes it 
> more complex and potentially slower to maintain the linked list of 
> partitions in the fetch requests.  Before we think about this, I'd like 
> to have a concrete use-case in mind, so that we can evaluate the costs 
> versus benefits.
> 
> 
> best,
> 
> Colin
> 
> 
> 
> On Mon, Oct 1, 2018, at 07:47, Dongjin Lee wrote:
> 
> Great. +1 (non-binding)
> 
> 
> On Mon, Oct 1, 2018 at 4:23 AM Matthias J. Sax 
> <matth...@confluent.io>
> wrote:
> 
> +1 (binding)
> 
> As Dongjin pointed out, the community is working on the upcoming 2.1
> release, and thus it might take some time until people find time to
> follow up on this and vote.
> 
> -Matthias
> 
> On 9/30/18 11:11 AM, n...@afshartous.com wrote:
> 
> On Sep 30, 2018, at 5:16 AM, Dongjin Lee 
> <dong...@apache.org> wrote:
> 
> 1. Your KIP document
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> lacks a hyperlink to the discussion thread. And I couldn't find the
> discussion thread from the mailing archive.
> 
> Hi Dongjin,
> 
> There has been a discussion thread.  I added this link as a reference
> 
>   https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349
> 
> to the KIP-349 page
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> 
> Best,
> --
>   Nick
> 
> --
> *Dongjin Lee*
> 
> *A hitchhiker in the mathematical world.*
> 
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare:
> www.slideshare.net/dongjinleekr
> *


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-05 Thread Colin McCabe
On Wed, Oct 3, 2018, at 16:01, n...@afshartous.com wrote:
> 
> 
> > On Oct 3, 2018, at 12:41 PM, Colin McCabe  wrote:
> > 
> > Will there be a separate code path for people who don't want to use this 
> > feature?
> 
> 
> Yes, I tried to capture this in the KIP by indicating that this API 
> change is 100% backwards compatible.  Current consumer semantics and 
> performance would be unaffected.  

Hi Nick,

Sorry if I was unclear.  It's possible for the change to be 100% backwards 
compatible, but still not have a separate code path for people who don't want 
to use this feature, right?  What I am getting at is basically: will this 
feature increase broker-side memory consumption for people who don't use it?
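One way to picture the question, purely as a hypothetical sketch (class and method names are invented, and this is not how the broker's fetch-session cache is implemented): if priorities were kept in a side table populated only for non-default values, sessions that never set a priority would carry no extra per-partition state.

```python
from typing import Dict, List


class SessionPartitions:
    """Hypothetical per-fetch-session partition list in which a priority is
    stored only when a client sets a non-default value."""

    DEFAULT_PRIORITY = 0

    def __init__(self) -> None:
        self.partitions: List[str] = []        # fetch order, as today
        self._priorities: Dict[str, int] = {}  # only non-default values live here

    def add(self, partition: str, priority: int = DEFAULT_PRIORITY) -> None:
        self.partitions.append(partition)
        if priority != self.DEFAULT_PRIORITY:
            self._priorities[partition] = priority

    def priority(self, partition: str) -> int:
        return self._priorities.get(partition, self.DEFAULT_PRIORITY)

    def extra_entries(self) -> int:
        """Number of partitions carrying priority state in this session."""
        return len(self._priorities)
```

This only speaks to the memory question; it does not address keeping the session's partition list ordered by priority.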

best,
Colin


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-05 Thread Thomas Becker
Colin,
Would you mind sharing your vision for how this looks with multiple consumers? 
I'm still getting my bearings with the new consumer but it's not immediately 
obvious to me how this would work. In particular, it doesn't seem particularly 
easy to know when you are at the high watermark of a topic.

-Tommy

On Mon, 2018-10-01 at 13:43 -0700, Colin McCabe wrote:

Hi all,


I feel like the DISCUSS thread didn't really come to a conclusion, so a vote 
would be premature here.


In particular, I still don't really understand the use-case for this feature.  
Can someone give a concrete scenario where you would need this?  The control 
plane / data plane example that is listed in the KIP doesn't require this 
feature.  You can just have one consumer for the control plane, and one for the 
data plane, and do priority that way.  The discussion feels kind of unfocused 
since we haven't identified even one concrete use-case that needs this feature.


Unfortunately, this is a feature which consumes server-side memory.  We have to 
store the priorities somehow when doing incremental fetch requests.  If we go 
with an int as suggested, then this is at least 4 bytes per partition per 
incremental fetch request.  It also makes it more complex and potentially 
slower to maintain the linked list of partitions in the fetch requests.  Before 
we think about this, I'd like to have a concrete use-case in mind, so that we 
can evaluate the costs versus benefits.


best,

Colin



On Mon, Oct 1, 2018, at 07:47, Dongjin Lee wrote:

Great. +1 (non-binding)


On Mon, Oct 1, 2018 at 4:23 AM Matthias J. Sax 
<matth...@confluent.io>
wrote:

+1 (binding)

As Dongjin pointed out, the community is working on the upcoming 2.1
release, and thus it might take some time until people find time to
follow up on this and vote.

-Matthias

On 9/30/18 11:11 AM, n...@afshartous.com wrote:

On Sep 30, 2018, at 5:16 AM, Dongjin Lee 
<dong...@apache.org> wrote:

1. Your KIP document
https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
lacks a hyperlink to the discussion thread. And I couldn't find the
discussion thread from the mailing archive.

Hi Dongjin,

There has been a discussion thread.  I added this link as a reference

  https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349

to the KIP-349 page

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics

Best,
--
  Nick

--
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare:
www.slideshare.net/dongjinleekr
*





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-03 Thread nick


> On Oct 3, 2018, at 12:41 PM, Colin McCabe  wrote:
> 
> Will there be a separate code path for people who don't want to use this 
> feature?


Yes, I tried to capture this in the KIP by indicating that this API change is 
100% backwards compatible.  Current consumer semantics and performance would be 
unaffected.  

Best,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-01 Thread Colin McCabe
Hi all,

I feel like the DISCUSS thread didn't really come to a conclusion, so a vote 
would be premature here.

In particular, I still don't really understand the use-case for this feature.  
Can someone give a concrete scenario where you would need this?  The control 
plane / data plane example that is listed in the KIP doesn't require this 
feature.  You can just have one consumer for the control plane, and one for the 
data plane, and do priority that way.  The discussion feels kind of unfocused 
since we haven't identified even one concrete use-case that needs this feature.

Unfortunately, this is a feature which consumes server-side memory.  We have to 
store the priorities somehow when doing incremental fetch requests.  If we go 
with an int as suggested, then this is at least 4 bytes per partition per 
incremental fetch request.  It also makes it more complex and potentially 
slower to maintain the linked list of partitions in the fetch requests.  Before 
we think about this, I'd like to have a concrete use-case in mind, so that we 
can evaluate the costs versus benefits.
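To put rough numbers on that lower bound: with $S$ cached incremental fetch sessions, session $s$ tracking $P_s$ partitions, a 4-byte priority per partition adds at least the amount below. The figures $S = 1000$ and $P_s = 1000$ for every $s$ are purely illustrative, not measurements, and the bound ignores per-entry object overhead, so the real cost would likely be higher.

```latex
M_{\text{prio}} \;\ge\; 4\,\text{bytes} \times \sum_{s=1}^{S} P_s
\;=\; 4 \times 1000 \times 1000\ \text{bytes}
\;=\; 4\,000\,000\ \text{bytes} \;\approx\; 3.8\ \text{MiB}
```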

best,
Colin


On Mon, Oct 1, 2018, at 07:47, Dongjin Lee wrote:
> Great. +1 (non-binding)
> 
> On Mon, Oct 1, 2018 at 4:23 AM Matthias J. Sax 
> wrote:
> 
> > +1 (binding)
> >
> > As Dongjin pointed out, the community is working on the upcoming 2.1
> > release, and thus it might take some time until people find time to
> > follow up on this and vote.
> >
> >
> > -Matthias
> >
> > On 9/30/18 11:11 AM, n...@afshartous.com wrote:
> > >
> > >> On Sep 30, 2018, at 5:16 AM, Dongjin Lee  wrote:
> > >>
> > >> 1. Your KIP document
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> > >> lacks a hyperlink to the discussion thread. And I couldn't find the
> > >> discussion thread from the mailing archive.
> > >
> > >
> > > Hi Dongjin,
> > >
> > > There has been a discussion thread.  I added this link as a reference
> > >
> > >   https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349
> > >
> > > to the KIP-349 page
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> > >
> > >
> > > Best,
> > > --
> > >   Nick
> > >
> > >
> > >
> > >
> >
> > --
> *Dongjin Lee*
> 
> *A hitchhiker in the mathematical world.*
> 
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare:
> www.slideshare.net/dongjinleekr
> *


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-10-01 Thread Dongjin Lee
Great. +1 (non-binding)

On Mon, Oct 1, 2018 at 4:23 AM Matthias J. Sax 
wrote:

> +1 (binding)
>
> As Dongjin pointed out, the community is working on the upcoming 2.1
> release, and thus it might take some time until people find time to
> follow up on this and vote.
>
>
> -Matthias
>
> On 9/30/18 11:11 AM, n...@afshartous.com wrote:
> >
> >> On Sep 30, 2018, at 5:16 AM, Dongjin Lee  wrote:
> >>
> >> 1. Your KIP document
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> >> lacks a hyperlink to the discussion thread. And I couldn't find the
> >> discussion thread from the mailing archive.
> >
> >
> > Hi Dongjin,
> >
> > There has been a discussion thread.  I added this link as a reference
> >
> >   https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349
> >
> > to the KIP-349 page
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> >
> >
> > Best,
> > --
> >   Nick
> >
> >
> >
> >
>
> --
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare:
www.slideshare.net/dongjinleekr
*


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-09-30 Thread Matthias J. Sax
+1 (binding)

As Dongjin pointed out, the community is working on the upcoming 2.1
release, and thus it might take some time until people find time to
follow up on this and vote.


-Matthias

On 9/30/18 11:11 AM, n...@afshartous.com wrote:
> 
>> On Sep 30, 2018, at 5:16 AM, Dongjin Lee  wrote:
>>
>> 1. Your KIP document
>> lacks a hyperlink to the discussion thread. And I couldn't find the
>> discussion thread from the mailing archive.
> 
> 
> Hi Dongjin,
> 
> There has been a discussion thread.  I added this link as a reference
> 
>   https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349 
> 
> 
> to the KIP-349 page
> 
>   
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>  
> 
> 
> Best,
> --
>   Nick
> 
> 
> 
> 





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-09-30 Thread nick

> On Sep 30, 2018, at 5:16 AM, Dongjin Lee  wrote:
> 
> 1. Your KIP document
> lacks a hyperlink to the discussion thread. And I couldn't find the
> discussion thread from the mailing archive.


Hi Dongjin,

There has been a discussion thread.  I added this link as a reference

  https://lists.apache.org/list.html?dev@kafka.apache.org:lte=1M:kip-349 


to the KIP-349 page

  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
 


Best,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-09-30 Thread Dongjin Lee
Hi Nick,

Thanks for your proposal. However, I have two things I would like to point
out:

1. Your KIP document
lacks a hyperlink to the discussion thread, and I couldn't find the
discussion thread in the mailing list archive.[^1] Have you opened a
discussion thread and carried on a discussion? The vote should be initiated
only after the discussion has ended.
2. The committers look busy finalizing version 2.1.0, so it would be much
better to start the discussion thread after 2.1.0 is released.

Best,
Dongjin

[^1]: What I could find were only the following: [1], [2], [3]


On Sat, Sep 29, 2018 at 9:30 PM  wrote:

> Hi All,
>
> At this point, I’d like to call for a vote on KIP-349
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>
> This is the original proposal, sans MessageChooser.
>
> Cheers,
> --
>   Nick
>
>
>
>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare:
www.slideshare.net/dongjinleekr
*


[VOTE] KIP-349 Priorities for Source Topics

2018-09-29 Thread nick
Hi All,

At this point, I’d like to call for a vote on KIP-349

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
 


This is the original proposal, sans MessageChooser.

Cheers,
--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-09-13 Thread Matthias J. Sax
That sounds correct, Colin.

At runtime (we just merged an improvement this week, cf KIP-353), Kafka
Streams synchronizes different topics based on record timestamps.
Records are buffered internally before being processed, and we `pause()`
partitions for which the number of records in the buffer exceeds a
configurable threshold (`buffered.records.per.partition` parameter).
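A simplified Python sketch of the mechanism described above (not the actual Kafka Streams code): records are buffered per partition, a partition is paused once its buffer holds more than `max_buffered` records (playing the role of `buffered.records.per.partition`), resumed once it drains back to the threshold, and the next record to process is the buffered head with the smallest timestamp. The real implementation handles more cases, such as partitions whose buffers are temporarily empty.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Record = Tuple[int, str]  # (timestamp, value)


class PartitionBuffers:
    """Per-partition record buffers with a pause threshold, plus
    timestamp-based choosing of the next record to process."""

    def __init__(self, max_buffered: int):
        self.max_buffered = max_buffered  # stands in for buffered.records.per.partition
        self.buffers: Dict[str, Deque[Record]] = {}
        self.paused: Set[str] = set()

    def add(self, partition: str, records: List[Record]) -> None:
        buf = self.buffers.setdefault(partition, deque())
        buf.extend(records)
        if len(buf) > self.max_buffered:
            self.paused.add(partition)  # Java client: consumer.pause(...)

    def next_record(self) -> Optional[Tuple[str, int, str]]:
        # Choose the partition whose head record has the smallest timestamp
        # (ties broken by partition name, only to keep the sketch deterministic).
        heads = [(buf[0][0], p) for p, buf in self.buffers.items() if buf]
        if not heads:
            return None
        _, p = min(heads)
        ts, value = self.buffers[p].popleft()
        if p in self.paused and len(self.buffers[p]) <= self.max_buffered:
            self.paused.discard(p)  # Java client: consumer.resume(...)
        return p, ts, value
```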

Timestamp-based synchronization (ie, message choosing) is essential for
DSL semantics. A custom MessageChooser might break DSL operator semantics.

Having said this, there might be use cases for which timestamp-based
synchronization is not desired. However, I would assume that this would
be a Processor API level feature, not a DSL level feature.

Hence, offering something similar to the MessageChooser interface at the
Processor API level that is leveraged at runtime might make sense. For
this case, the DSL would plug in its timestamp-based synchronization
strategy. Take this with a grain of salt though. I have not thought this
through and it might actually not be possible to express the needed
timestamp synchronization with a MessageChooser interface.
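As a thought experiment along those lines, here is a hypothetical Python sketch of a pluggable chooser with two strategies: one ordering by topic priority, one by record timestamp. The `update`/`choose` shape is loosely modeled on Samza's `MessageChooser`; every name here is invented for illustration and none of it is a Kafka API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Envelope:
    topic: str
    partition: int
    timestamp: int
    value: str


class MessageChooser(ABC):
    """Hypothetical pluggable chooser; the runtime feeds it envelopes via
    update() and asks choose() which one to process next."""

    @abstractmethod
    def update(self, envelope: Envelope) -> None: ...

    @abstractmethod
    def choose(self) -> Optional[Envelope]: ...


class KeyedChooser(MessageChooser):
    """Chooses the pending envelope with the smallest sort key."""

    def __init__(self, key: Callable[[Envelope], tuple]):
        self._key = key
        self._pending: List[Envelope] = []

    def update(self, envelope: Envelope) -> None:
        self._pending.append(envelope)

    def choose(self) -> Optional[Envelope]:
        if not self._pending:
            return None
        best = min(self._pending, key=self._key)
        self._pending.remove(best)
        return best


def priority_chooser(priorities: Dict[str, int]) -> MessageChooser:
    # Higher topic priority first; timestamp breaks ties (an assumption).
    return KeyedChooser(lambda e: (-priorities.get(e.topic, 0), e.timestamp))


def timestamp_chooser() -> MessageChooser:
    # Roughly what a DSL-style timestamp synchronization would plug in.
    return KeyedChooser(lambda e: (e.timestamp,))
```

The runtime only calls `update` and `choose`, so swapping strategies does not touch the polling code; whether the DSL's timestamp synchronization could really be expressed this way is exactly the open question above.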


-Matthias

On 9/10/18 10:54 AM, Colin McCabe wrote:
> Hmm.  My understanding is that streams doesn't need anything like this since 
> streams pauses topics when it doesn't need more data from them.  (Matthias, 
> can you confirm?)
> 
> best,
> Colin
> 
> 
> On Mon, Aug 20, 2018, at 06:01, Thomas Becker wrote:
>> I agree with Jan. A strategy interface for choosing processing order is 
>> nice, and would hopefully be a step towards getting this in streams.
>>
>> -Tommy
>>
>> On Mon, 2018-08-20 at 12:52 +0200, Jan Filipiak wrote:
>>
>> On 20.08.2018 00:19, Matthias J. Sax wrote:
>>
>> @Nick: A KIP is only accepted if it gets 3 binding votes, ie, votes from
>> committers. If you close the vote before that, the KIP would not be
>> accepted. Note that committers need to pay attention to a lot of KIPs
>> and it can take a while until people can look into it. Thanks for your
>> understanding.
>>
>> @Jan: Can you give a little bit more context on your concerns? It's
>> unclear what you mean atm.
>>
>> Just saying that we should peek at the Samza approach, it's a much more
>> powerful abstraction. We can ship a default MessageChooser
>> that looks at the topic's priority.
>>
>> @Adam: anyone can vote :)
>>
>>
>>
>>
>> -Matthias
>>
>>
>> On 8/19/18 9:58 AM, Adam Bellemare wrote:
>>
>> While I am not sure if I can or can’t vote, my question re: Jan’s 
>> comment is, “should we be implementing it as Samza does?”
>>
>>
>> I am not familiar with the drawbacks of the current approach vs how 
>> Samza does it.
>>
>>
>> On Aug 18, 2018, at 5:06 PM, 
>> n...@afshartous.com wrote:
>>
>>
>>
>> I only saw one vote on KIP-349, just checking to see if anyone else 
>> would like to vote before closing this out.
>>
>> --
>>
>>   Nick
>>
>>
>>
>> On Aug 13, 2018, at 9:19 PM, 
>> n...@afshartous.com wrote:
>>
>>
>>
>> Hi All,
>>
>>
>> Calling for a vote on KIP-349
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>>
>>
>> --
>>
>>  Nick
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 
>>





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-09-10 Thread Colin McCabe
Hmm.  My understanding is that streams doesn't need anything like this since 
streams pauses topics when it doesn't need more data from them.  (Matthias, can 
you confirm?)

best,
Colin


On Mon, Aug 20, 2018, at 06:01, Thomas Becker wrote:
> I agree with Jan. A strategy interface for choosing processing order is 
> nice, and would hopefully be a step towards getting this in streams.
> 
> -Tommy
> 
> On Mon, 2018-08-20 at 12:52 +0200, Jan Filipiak wrote:
> 
> On 20.08.2018 00:19, Matthias J. Sax wrote:
> 
> @Nick: A KIP is only accepted if it gets 3 binding votes, ie, votes from
> committers. If you close the vote before that, the KIP would not be
> accepted. Note that committers need to pay attention to a lot of KIPs
> and it can take a while until people can look into it. Thanks for your
> understanding.
> 
> @Jan: Can you give a little bit more context on your concerns? It's
> unclear what you mean atm.
> 
> Just saying that we should peek at the Samza approach, it's a much more
> powerful abstraction. We can ship a default MessageChooser
> that looks at the topic's priority.
> 
> @Adam: anyone can vote :)
> 
> 
> 
> 
> -Matthias
> 
> 
> On 8/19/18 9:58 AM, Adam Bellemare wrote:
> 
> While I am not sure if I can or can’t vote, my question re: Jan’s 
> comment is, “should we be implementing it as Samza does?”
> 
> 
> I am not familiar with the drawbacks of the current approach vs how 
> Samza does it.
> 
> 
> On Aug 18, 2018, at 5:06 PM, 
> n...@afshartous.com wrote:
> 
> 
> 
> I only saw one vote on KIP-349, just checking to see if anyone else 
> would like to vote before closing this out.
> 
> --
> 
>   Nick
> 
> 
> 
> On Aug 13, 2018, at 9:19 PM, 
> n...@afshartous.com wrote:
> 
> 
> 
> Hi All,
> 
> 
> Calling for a vote on KIP-349
> 
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> 
> 
> --
> 
>  Nick
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-23 Thread Jan Filipiak

also:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization


On 20.08.2018 15:01, Thomas Becker wrote:

> I agree with Jan. A strategy interface for choosing processing order is
> nice, and would hopefully be a step towards getting this in streams.
>
> -Tommy
>
> On Mon, 2018-08-20 at 12:52 +0200, Jan Filipiak wrote:
>
>> On 20.08.2018 00:19, Matthias J. Sax wrote:
>>
>>> @Nick: A KIP is only accepted if it got 3 binding votes, i.e., votes from
>>> committers. If you close the vote before that, the KIP would not be
>>> accepted. Note that committers need to pay attention to a lot of KIPs
>>> and it can take a while until people can look into it. Thanks for your
>>> understanding.
>>>
>>> @Jan: Can you give a little bit more context on your concerns? It's
>>> unclear what you mean atm.
>>
>> Just saying that we should peek at the Samza approach; it's a much more
>> powerful abstraction. We can ship a default MessageChooser that looks at
>> the topic's priority.
>>
>>> @Adam: anyone can vote :)
>>>
>>> -Matthias
>>>
>>> On 8/19/18 9:58 AM, Adam Bellemare wrote:
>>>
>>>> While I am not sure if I can or can’t vote, my question re: Jan’s
>>>> comment is, “should we be implementing it as Samza does?”
>>>>
>>>> I am not familiar with the drawbacks of the current approach vs how
>>>> samza does it.
>>>>
>>>> On Aug 18, 2018, at 5:06 PM, n...@afshartous.com wrote:
>>>>
>>>>> I only saw one vote on KIP-349, just checking to see if anyone else
>>>>> would like to vote before closing this out.
>>>>> --
>>>>>   Nick
>>>>>
>>>>> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Calling for a vote on KIP-349
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>>>>>>
>>>>>> --
>>>>>>  Nick




Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-20 Thread Thomas Becker
I agree with Jan. A strategy interface for choosing processing order is nice, 
and would hopefully be a step towards getting this in streams.

-Tommy

On Mon, 2018-08-20 at 12:52 +0200, Jan Filipiak wrote:

> On 20.08.2018 00:19, Matthias J. Sax wrote:
>
>> @Nick: A KIP is only accepted if it got 3 binding votes, i.e., votes from
>> committers. If you close the vote before that, the KIP would not be
>> accepted. Note that committers need to pay attention to a lot of KIPs
>> and it can take a while until people can look into it. Thanks for your
>> understanding.
>>
>> @Jan: Can you give a little bit more context on your concerns? It's
>> unclear what you mean atm.
>
> Just saying that we should peek at the Samza approach; it's a much more
> powerful abstraction. We can ship a default MessageChooser that looks at
> the topic's priority.
>
>> @Adam: anyone can vote :)
>>
>> -Matthias
>>
>> On 8/19/18 9:58 AM, Adam Bellemare wrote:
>>
>>> While I am not sure if I can or can’t vote, my question re: Jan’s comment is,
>>> “should we be implementing it as Samza does?”
>>>
>>> I am not familiar with the drawbacks of the current approach vs how samza does
>>> it.
>>>
>>> On Aug 18, 2018, at 5:06 PM, n...@afshartous.com wrote:
>>>
>>>> I only saw one vote on KIP-349, just checking to see if anyone else would like
>>>> to vote before closing this out.
>>>> --
>>>>   Nick
>>>>
>>>> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Calling for a vote on KIP-349
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>>>>>
>>>>> --
>>>>>  Nick


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-20 Thread Jan Filipiak



On 20.08.2018 00:19, Matthias J. Sax wrote:

> @Nick: A KIP is only accepted if it got 3 binding votes, i.e., votes from
> committers. If you close the vote before that, the KIP would not be
> accepted. Note that committers need to pay attention to a lot of KIPs
> and it can take a while until people can look into it. Thanks for your
> understanding.
>
> @Jan: Can you give a little bit more context on your concerns? It's
> unclear what you mean atm.

Just saying that we should peek at the Samza approach; it's a much more
powerful abstraction. We can ship a default MessageChooser that looks at
the topic's priority.

> @Adam: anyone can vote :)
>
> -Matthias
>
> On 8/19/18 9:58 AM, Adam Bellemare wrote:
>
>> While I am not sure if I can or can’t vote, my question re: Jan’s comment is,
>> “should we be implementing it as Samza does?”
>>
>> I am not familiar with the drawbacks of the current approach vs how samza does
>> it.
>>
>> On Aug 18, 2018, at 5:06 PM, n...@afshartous.com wrote:
>>
>>> I only saw one vote on KIP-349, just checking to see if anyone else would like
>>> to vote before closing this out.
>>> --
>>>   Nick
>>>
>>> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
>>>
>>>> Hi All,
>>>>
>>>> Calling for a vote on KIP-349
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>>>>
>>>> --
>>>>  Nick











Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-19 Thread Matthias J. Sax
@Nick: A KIP is only accepted if it got 3 binding votes, i.e., votes from
committers. If you close the vote before that, the KIP would not be
accepted. Note that committers need to pay attention to a lot of KIPs
and it can take a while until people can look into it. Thanks for your
understanding.

@Jan: Can you give a little bit more context on your concerns? It's
unclear what you mean atm.

@Adam: anyone can vote :)



-Matthias

On 8/19/18 9:58 AM, Adam Bellemare wrote:
> While I am not sure if I can or can’t vote, my question re: Jan’s comment is, 
> “should we be implementing it as Samza does?” 
> 
> I am not familiar with the drawbacks of the current approach vs how samza 
> does it. 
> 
>> On Aug 18, 2018, at 5:06 PM, n...@afshartous.com wrote:
>>
>>
>> I only saw one vote on KIP-349, just checking to see if anyone else would 
>> like to vote before closing this out.  
>> --
>>  Nick
>>
>>
>>> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
>>>
>>>
>>> Hi All,
>>>
>>> Calling for a vote on KIP-349
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>>>
>>> --
>>> Nick
>>>
>>>
>>>
>>
>>
>>
>>





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-19 Thread Adam Bellemare
While I am not sure if I can or can’t vote, my question re: Jan’s comment is, 
“should we be implementing it as Samza does?” 

I am not familiar with the drawbacks of the current approach vs how samza does 
it. 

> On Aug 18, 2018, at 5:06 PM, n...@afshartous.com wrote:
> 
> 
> I only saw one vote on KIP-349, just checking to see if anyone else would 
> like to vote before closing this out.  
> --
>  Nick
> 
> 
>> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
>> 
>> 
>> Hi All,
>> 
>> Calling for a vote on KIP-349
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>> 
>> --
>> Nick
>> 
>> 
>> 
> 
> 
> 
> 


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-18 Thread nick


I only saw one vote on KIP-349, just checking to see if anyone else would like 
to vote before closing this out.  
--
  Nick


> On Aug 13, 2018, at 9:19 PM, n...@afshartous.com wrote:
> 
> 
> Hi All,
> 
> Calling for a vote on KIP-349
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> 
> --
>  Nick
> 
> 
> 






Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-13 Thread Jan Filipiak

Sorry for missing the discussion

-1 nonbinding

see

https://samza.apache.org/learn/documentation/0.7.0/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html

Best Jan
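The Samza javadoc linked above describes a pluggable `MessageChooser` that is fed incoming messages (roughly, via `update`) and asked for the next message to process (via `choose`). The sketch below is a hypothetical Python analogue of that shape, not Samza's or Kafka's API: `TopicPriorityChooser` is an invented name, envelopes are plain `(topic, payload)` tuples, and the policy is the default Jan suggests, choosing by topic priority, with FIFO order among records of equal priority.

```python
import heapq
import itertools
from abc import ABC, abstractmethod


class MessageChooser(ABC):
    """Hypothetical strategy interface: receive envelopes, pick the next one."""

    @abstractmethod
    def update(self, envelope): ...

    @abstractmethod
    def choose(self): ...


class TopicPriorityChooser(MessageChooser):
    """Default strategy: highest topic priority first, FIFO within a priority."""

    def __init__(self, priorities, default_priority=0):
        self.priorities = priorities
        self.default_priority = default_priority
        self.heap = []
        self.seq = itertools.count()  # arrival counter breaks ties in FIFO order

    def update(self, envelope):
        topic, _payload = envelope
        prio = self.priorities.get(topic, self.default_priority)
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self.heap, (-prio, next(self.seq), envelope))

    def choose(self):
        """Return the next envelope to process, or None if nothing is buffered."""
        if not self.heap:
            return None
        return heapq.heappop(self.heap)[2]


chooser = TopicPriorityChooser({"alerts": 10})
for env in [("logs", "l1"), ("alerts", "a1"), ("logs", "l2"), ("alerts", "a2")]:
    chooser.update(env)
print([chooser.choose() for _ in range(5)])
```

Because the ordering policy lives behind the interface, a different chooser (for example one that looks at timestamps) could be swapped in without touching the consumption loop, which is the extra flexibility Jan is pointing at.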


On 14.08.2018 03:19, n...@afshartous.com wrote:

Hi All,

Calling for a vote on KIP-349

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics

--
   Nick







[VOTE] KIP-349 Priorities for Source Topics

2018-08-13 Thread nick


Hi All,

Calling for a vote on KIP-349

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics

--
  Nick





Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-12 Thread Gwen Shapira
Both use cases (a command queue for a streams job and a work queue in
general) seem reasonable. Thank you for explaining.

On Sun, Aug 12, 2018 at 6:34 AM, Matt Farmer  wrote:

> The work-queue use case is mostly how I see this being used, yes.
>
> In the most generic sense I can see its use in a situation where the
> business dictates that we have to guarantee quality of service for a
> small number of messages while a high volume of messages is being
> processed from a different topic.
>
> For our pipelines at work, this would actually make it possible for us to
> define a command and control topic for some of our streams applications.
> We occasionally have to change the behavior of our streams in reaction to
> system issues or, occasionally, users abusing the system and creating a
> bunch of garbage data that we'd like to skip over. Today, we have to
> either define that behavior in a config setting and restart the
> application (which is what we currently do) or implement some sort of API
> external to streams that the stream pulls state from.
>
> With a CnC topic, we could interleave these into the normal stream
> processing flow and instruct it to alter a state store, for example, with
> the criteria for records to be dropped, without introducing other
> libraries or having to manually synchronize with external state.
>
> On Wed, Aug 8, 2018 at 10:11 PM Gwen Shapira  wrote:
>
> > Can you guys spell it out for me? I just don't really see when I want to
> > subscribe to two topics but not get events from both at the same time.
> > Is this a work-queue type pattern?
> >
> > On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:
> >
> > > Oh, almost forgot, thanks for the KIP - I can see this being a very
> > > useful addition. :)
> > >
> > > On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
> > >
> > > > Is it worth spelling out explicitly what the behavior is when two
> > > > topics have the same priority? I'm a bit fuzzy on how we choose what
> > > > topics to consume from right now, if I'm being honest, so it could be
> > > > useful to outline the current behavior in the background and to spell
> > > > out how that would change (or if it would change) when two topics are
> > > > given the same priority.
> > > >
> > > > Also, how does this play with max.poll.records? Does the consumer read
> > > > from all the topics in priority order until we've hit the number of
> > > > records or the poll timeout? Or does it immediately return the high
> > > > priority records without pulling low priority records?
> > > >
> > > > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Calling for a vote on KIP-349
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics>
> > > > >
> > > > > Cheers,
> > > > > --
> > > > >   Nick
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter  | blog
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-12 Thread Matt Farmer
Thinking about it more, another solution would be a separate consumer
external to the stream, but then we run into timing issues and the
complexity of using a state store as the storage of record. :/

On Sun, Aug 12, 2018 at 9:34 AM Matt Farmer  wrote:

> The work-queue use case is mostly how I see this being used, yes.
>
> In the most generic sense I can see its use in a situation where the
> business dictates that we have to guarantee quality of service for a
> small number of messages while a high volume of messages is being
> processed from a different topic.
>
> For our pipelines at work, this would actually make it possible for us to
> define a command and control topic for some of our streams applications.
> We occasionally have to change the behavior of our streams in reaction to
> system issues or, occasionally, users abusing the system and creating a
> bunch of garbage data that we'd like to skip over. Today, we have to
> either define that behavior in a config setting and restart the
> application (which is what we currently do) or implement some sort of API
> external to streams that the stream pulls state from.
>
> With a CnC topic, we could interleave these into the normal stream
> processing flow and instruct it to alter a state store, for example, with
> the criteria for records to be dropped, without introducing other
> libraries or having to manually synchronize with external state.
>
> On Wed, Aug 8, 2018 at 10:11 PM Gwen Shapira  wrote:
>
>> Can you guys spell it out for me? I just don't really see when I want to
>> subscribe to two topics but not get events from both at the same time.
>> Is this a work-queue type pattern?
>>
>> On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:
>>
>> > Oh, almost forgot, thanks for the KIP - I can see this being a very
>> > useful addition. :)
>> >
>> > On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
>> >
>> > > Is it worth spelling out explicitly what the behavior is when two
>> > > topics have the same priority? I'm a bit fuzzy on how we choose what
>> > > topics to consume from right now, if I'm being honest, so it could be
>> > > useful to outline the current behavior in the background and to spell
>> > > out how that would change (or if it would change) when two topics are
>> > > given the same priority.
>> > >
>> > > Also, how does this play with max.poll.records? Does the consumer read
>> > > from all the topics in priority order until we've hit the number of
>> > > records or the poll timeout? Or does it immediately return the high
>> > > priority records without pulling low priority records?
>> > >
>> > > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
>> > >
>> > >> Hi All,
>> > >>
>> > >> Calling for a vote on KIP-349
>> > >>
>> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>> > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics>
>> > >>
>> > >> Cheers,
>> > >> --
>> > >>   Nick
>>
>> --
>> *Gwen Shapira*
>> Product Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter  | blog
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-12 Thread Matt Farmer
The work-queue use case is mostly how I see this being used, yes.

In the most generic sense I can see its use in a situation where the
business dictates that we have to guarantee quality of service for a
small number of messages while a high volume of messages is being
processed from a different topic.
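To make the quality-of-service argument concrete: if one urgent record arrives behind a backlog of bulk records, processing in arrival order makes it wait for the whole backlog, while a strict topic priority lets it go first. The toy Python model below shows only that ordering effect. The topic names and the `processing_position` helper are hypothetical, and the model ignores fetching, batching, and partitions entirely.

```python
def processing_position(arrivals, record_id, prioritize_urgent):
    """Return the 0-based position at which record_id is processed.

    arrivals: list of (topic, record_id) tuples in arrival order.
    With prioritize_urgent=True, records from the hypothetical "urgent"
    topic move ahead of all others; sorted() is stable, so the relative
    order within each topic is preserved.
    """
    if prioritize_urgent:
        order = sorted(arrivals, key=lambda rec: rec[0] != "urgent")
    else:
        order = list(arrivals)
    return [rid for _, rid in order].index(record_id)


backlog = [("bulk", f"b{i}") for i in range(1000)]
arrivals = backlog + [("urgent", "u1")]
print(processing_position(arrivals, "u1", prioritize_urgent=False))  # 1000
print(processing_position(arrivals, "u1", prioritize_urgent=True))   # 0
```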

For our pipelines at work, this would actually make it possible for us to
define a command and control topic for some of our streams applications.
We occasionally have to change the behavior of our streams in reaction to
system issues or, occasionally, users abusing the system and creating a
bunch of garbage data that we'd like to skip over. Today, we have to
either define that behavior in a config setting and restart the
application (which is what we currently do) or implement some sort of API
external to streams that the stream pulls state from.

With a CnC topic, we could interleave these into the normal stream
processing flow and instruct it to alter a state store, for example, with
the criteria for records to be dropped, without introducing other
libraries or having to manually synchronize with external state.
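A minimal sketch of the command-and-control idea, assuming that pending commands are always applied before the data records they should affect, which is the ordering a prioritized CnC topic would provide. The record shapes, the `drop_user`/`undrop_user` operations and the plain dict standing in for a state store are all hypothetical; in a Kafka Streams application the criteria would presumably live in an actual state store.

```python
def apply_commands(commands, state):
    """Fold control commands into a dict that stands in for a state store."""
    dropped = state.setdefault("dropped_users", set())
    for cmd in commands:
        if cmd["op"] == "drop_user":
            dropped.add(cmd["user"])
        elif cmd["op"] == "undrop_user":
            dropped.discard(cmd["user"])
    return state


def process_events(events, state):
    """Keep only data records that do not match the current drop criteria."""
    dropped = state.get("dropped_users", set())
    return [e for e in events if e["user"] not in dropped]


def process_batch(commands, events, state):
    # Priority ordering: commands first, so they take effect for this batch.
    apply_commands(commands, state)
    return process_events(events, state)


state = {}
events = [{"user": "alice", "v": 1}, {"user": "spammer", "v": 2}]
print(process_batch([{"op": "drop_user", "user": "spammer"}], events, state))
# [{'user': 'alice', 'v': 1}]
```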

On Wed, Aug 8, 2018 at 10:11 PM Gwen Shapira  wrote:

> Can you guys spell it out for me? I just don't really see when I want to
> subscribe to two topics but not get events from both at the same time.
> Is this a work-queue type pattern?
>
> On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:
>
> > Oh, almost forgot, thanks for the KIP - I can see this being a very
> > useful addition. :)
> >
> > On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
> >
> > > Is it worth spelling out explicitly what the behavior is when two
> > > topics have the same priority? I'm a bit fuzzy on how we choose what
> > > topics to consume from right now, if I'm being honest, so it could be
> > > useful to outline the current behavior in the background and to spell
> > > out how that would change (or if it would change) when two topics are
> > > given the same priority.
> > >
> > > Also, how does this play with max.poll.records? Does the consumer read
> > > from all the topics in priority order until we've hit the number of
> > > records or the poll timeout? Or does it immediately return the high
> > > priority records without pulling low priority records?
> > >
> > > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
> > >
> > >> Hi All,
> > >>
> > >> Calling for a vote on KIP-349
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics>
> > >>
> > >> Cheers,
> > >> --
> > >>   Nick
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread Gwen Shapira
Can you guys spell it out for me? I just don't really see when I want to
subscribe to two topics but not get events from both at the same time.
Is this a work-queue type pattern?

On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:

> Oh, almost forgot, thanks for the KIP - I can see this being a very useful
> addition. :)
>
> On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
>
> > Is it worth spelling out explicitly what the behavior is when two topics
> > have the same priority? I'm a bit fuzzy on how we choose what topics to
> > consume from right now, if I'm being honest, so it could be useful to
> > outline the current behavior in the background and to spell out how that
> > would change (or if it would change) when two topics are given the same
> > priority.
> >
> > Also, how does this play with max.poll.records? Does the consumer read
> > from all the topics in priority order until we've hit the number of
> > records or the poll timeout? Or does it immediately return the high
> > priority records without pulling low priority records?
> >
> > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
> >
> >>
> >> Hi All,
> >>
> >> Calling for a vote on KIP-349
> >>
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics>
> >>
> >> Cheers,
> >> --
> >>   Nick
> >>
> >>
> >>
> >>
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread Matt Farmer
Oh, almost forgot, thanks for the KIP - I can see this being a very useful
addition. :)

On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:

> Is it worth spelling out explicitly what the behavior is when two topics
> have the same priority? I'm a bit fuzzy on how we choose what topics to
> consume from right now, if I'm being honest, so it could be useful to
> outline the current behavior in the background and to spell out how that
> would change (or if it would change) when two topics are given the same
> priority.
>
> Also, how does this play with max.poll.records? Does the consumer read
> from all the topics in priority order until we've hit the number of records
> or the poll timeout? Or does it immediately return the high priority
> records without pulling low priority records?
>
> On Wed, Aug 8, 2018 at 8:39 PM  wrote:
>
>>
>> Hi All,
>>
>> Calling for a vote on KIP-349
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
>> >
>>
>> Cheers,
>> --
>>   Nick
>>
>>
>>
>>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread Matt Farmer
Is it worth spelling out explicitly what the behavior is when two topics
have the same priority? I'm a bit fuzzy on how we choose what topics to
consume from right now, if I'm being honest, so it could be useful to
outline the current behavior in the background and to spell out how that
would change (or if it would change) when two topics are given the same
priority.
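One possible answer to the equal-priority question, sketched in Python: strict priority between levels, and round-robin between topics that share a level. This is a hypothetical policy chosen for illustration, not the consumer's actual fetch logic, and `schedule` works on fully known in-memory buffers rather than live partitions.

```python
from collections import deque


def schedule(buffers, priorities):
    """Order records: strict priority across levels, round-robin within a level.

    buffers: dict topic -> list of records (in offset order).
    priorities: dict topic -> int; every topic in buffers must have an entry.
    """
    queues = {t: deque(rs) for t, rs in buffers.items()}
    order = []
    for level in sorted(set(priorities.values()), reverse=True):
        topics = [t for t in queues if priorities[t] == level]
        # Take one record per topic per pass until this level is drained.
        while any(queues[t] for t in topics):
            for t in topics:
                if queues[t]:
                    order.append(queues[t].popleft())
    return order


print(schedule({"a": ["a1", "a2"], "b": ["b1"], "c": ["c1"]}, {"a": 1, "b": 1, "c": 5}))
# ['c1', 'a1', 'b1', 'a2']
```

Round-robin is only one reasonable tie-break; FIFO by offset or by timestamp would be equally valid choices for the KIP to spell out.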

Also, how does this play with max.poll.records? Does the consumer read from
all the topics in priority order until we've hit the number of records or
the poll timeout? Or does it immediately return the high priority records
without pulling low priority records?
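The two readings of `max.poll.records` in the question can be contrasted with a small Python model. Both functions are hypothetical: `poll_fill` walks topics in priority order and tops up the batch with lower-priority records, while `poll_high_first` returns only records from the highest priority level that currently has data. Neither is claimed to be what KIP-349 specifies.

```python
def poll_fill(available, priorities, max_records):
    """Option 1: walk topics in priority order, filling the batch up to max_records."""
    batch = []
    for topic in sorted(available, key=lambda t: -priorities[t]):
        for rec in available[topic]:
            if len(batch) == max_records:
                return batch
            batch.append(rec)
    return batch


def poll_high_first(available, priorities, max_records):
    """Option 2: return only records from the highest priority that has data."""
    nonempty = [t for t in available if available[t]]
    if not nonempty:
        return []
    top = max(priorities[t] for t in nonempty)
    batch = []
    for topic in nonempty:
        if priorities[topic] == top:
            batch.extend(available[topic])
    return batch[:max_records]


available = {"low": ["l1", "l2", "l3"], "high": ["h1"]}
priorities = {"low": 0, "high": 1}
print(poll_fill(available, priorities, 3))        # ['h1', 'l1', 'l2']
print(poll_high_first(available, priorities, 3))  # ['h1']
```

Option 1 keeps batches full and throughput high; option 2 minimises the time before the next high-priority record can be handled, at the cost of smaller batches.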

On Wed, Aug 8, 2018 at 8:39 PM  wrote:

>
> Hi All,
>
> Calling for a vote on KIP-349
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
> >
>
> Cheers,
> --
>   Nick
>
>
>
>


[VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread nick

Hi All,

Calling for a vote on KIP-349

https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics


Cheers,
--
  Nick