Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2020-02-03 Thread George Li
 Hi Stanislav/Colin,

A couple of people ran into issues with auto.leader.rebalance.enable=true in
https://issues.apache.org/jira/browse/KAFKA-4084

I think KIP-491 can help solve that issue.  We have implemented KIP-491
internally, together with another feature ("latest offset") for quickly
bringing up a failed empty node, and have found it quite useful.

Could you take a look at the comments in the ticket, re-evaluate, and provide
your feedback?

Thanks,
George


On Tuesday, September 17, 2019, 07:56:52 AM PDT, Stanislav Kozlovski 
 wrote:  
 
 Hey Harsha,

> If we want to go with making this an option and providing a tool which
abstracts moving the broker to end preferred leader list , it needs to do
it for all the partitions that broker is leader for. As said in the above
comment a broker i.e leader for 1000 partitions we have to this for all the
partitions.  Instead of having a blacklist will help simplify this process
and we can provide monitoring/alerts on such list.

Sorry, I thought that part of the reasoning for not using reassignment was
to optimize the process.

> Do you mind shedding some light what issue you are talking to propose a
KIP for?


The issue I was talking about is the one I quoted in my previous reply. I
understand that you want to have a way of running a "shallow" replica of
sorts - one that is lacking the historical data but has (and continues to
replicate) the latest data. That is the goal of setting the last offsets
for all partitions in replication-offset-checkpoint, right?

Thanks,
Stanislav

On Mon, Sep 16, 2019 at 3:39 PM Satish Duggana 
wrote:

> Hi George,
> Thanks for explaining the usecase for topic level preferred leader
> blacklist. As I mentioned earlier, I am fine with broker level config
> for now.
>
> ~Satish.
>
>
> On Sat, Sep 7, 2019 at 12:29 AM George Li
>  wrote:
> >
> >  Hi,
> >
> > Just want to ping and bubble up the discussion of KIP-491.
> >
> > On a large scale of Kafka clusters with thousands of brokers in many
> clusters.  Frequent hardware failures are common, although the
> reassignments to change the preferred leaders is a workaround, it incurs
> unnecessary additional work than the proposed preferred leader blacklist in
> KIP-491, and hard to scale.
> >
> > I am wondering whether others using Kafka in a big scale running into
> same problem.
> >
> >
> > Satish,
> >
> > Regarding your previous question about whether there is use-case for
> TopicLevel preferred leader "blacklist",  I thought about one use-case:  to
> improve rebalance/reassignment, the large partition will usually cause
> performance/stability issues, planning to change the say the New Replica
> will start with Leader's latest offset(this way the replica is almost
> instantly in the ISR and reassignment completed), and put this partition's
> NewReplica into Preferred Leader "Blacklist" at the Topic Level config for
> that partition. After sometime(retention time), this new replica has caught
> up and ready to serve traffic, update/remove the TopicConfig for this
> partition's preferred leader blacklist.
> >
> > I will update the KIP-491 later for this use case of Topic Level config
> for Preferred Leader Blacklist.
> >
> >
> > Thanks,
> > George
> >
> >    On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li <
> sql_consult...@yahoo.com> wrote:
> >
> >  Hi Colin,
> >
> > > In your example, I think we're comparing apples and oranges.  You
> started by outlining a scenario where "an empty broker... comes up...
> [without] any > leadership[s]."  But then you criticize using reassignment
> to switch the order of preferred replicas because it "would not actually
> switch the leader > automatically."  If the empty broker doesn't have any
> leaderships, there is nothing to be switched, right?
> >
> > Let me explained in details of this particular use case example for
> comparing apples to apples.
> >
> > Let's say a healthy broker hosting 3000 partitions, and of which 1000
> are the preferred leaders (leader count is 1000). There is a hardware
> failure (disk/memory, etc.), and kafka process crashed. We swap this host
> with another host but keep the same broker.id, when this new broker
> coming up, it has no historical data, and we manage to have the current
> last offsets of all partitions set in the replication-offset-checkpoint (if
> we don't set them, it could cause crazy ReplicaFetcher pulling of
> historical data from other brokers and cause cluster high latency and other
> instabilities), so when Kafka is brought up, it is quickly catching up as
> followers in the ISR.  Note, we have auto.leader.rebalance.enable
> disabled, so it's not serving any traffic as leaders (leader count = 0),
> even there are 1000 partitions that this broker is the Preferred Leader.
> >
> > We need to make this broker not serving traffic for a few hours or days
> depending on the SLA of the topic retention requirement until after it's
> having enough historical data.
> >
> >
> > * The 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-17 Thread Stanislav Kozlovski
Hey Harsha,

> If we want to go with making this an option and providing a tool which
abstracts moving the broker to end preferred leader list , it needs to do
it for all the partitions that broker is leader for. As said in the above
comment a broker i.e leader for 1000 partitions we have to this for all the
partitions.  Instead of having a blacklist will help simplify this process
and we can provide monitoring/alerts on such list.

Sorry, I thought that part of the reasoning for not using reassignment was
to optimize the process.

> Do you mind shedding some light what issue you are talking to propose a
KIP for?


The issue I was talking about is the one I quoted in my previous reply. I
understand that you want to have a way of running a "shallow" replica of
sorts - one that is lacking the historical data but has (and continues to
replicate) the latest data. That is the goal of setting the last offsets
for all partitions in replication-offset-checkpoint, right?

Thanks,
Stanislav

On Mon, Sep 16, 2019 at 3:39 PM Satish Duggana 
wrote:

> Hi George,
> Thanks for explaining the usecase for topic level preferred leader
> blacklist. As I mentioned earlier, I am fine with broker level config
> for now.
>
> ~Satish.
>
>
> On Sat, Sep 7, 2019 at 12:29 AM George Li
>  wrote:
> >
> >  Hi,
> >
> > Just want to ping and bubble up the discussion of KIP-491.
> >
> > On a large scale of Kafka clusters with thousands of brokers in many
> clusters.  Frequent hardware failures are common, although the
> reassignments to change the preferred leaders is a workaround, it incurs
> unnecessary additional work than the proposed preferred leader blacklist in
> KIP-491, and hard to scale.
> >
> > I am wondering whether others using Kafka in a big scale running into
> same problem.
> >
> >
> > Satish,
> >
> > Regarding your previous question about whether there is use-case for
> TopicLevel preferred leader "blacklist",  I thought about one use-case:  to
> improve rebalance/reassignment, the large partition will usually cause
> performance/stability issues, planning to change the say the New Replica
> will start with Leader's latest offset(this way the replica is almost
> instantly in the ISR and reassignment completed), and put this partition's
> NewReplica into Preferred Leader "Blacklist" at the Topic Level config for
> that partition. After sometime(retention time), this new replica has caught
> up and ready to serve traffic, update/remove the TopicConfig for this
> partition's preferred leader blacklist.
> >
> > I will update the KIP-491 later for this use case of Topic Level config
> for Preferred Leader Blacklist.
> >
> >
> > Thanks,
> > George
> >
> > On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li <
> sql_consult...@yahoo.com> wrote:
> >
> >   Hi Colin,
> >
> > > In your example, I think we're comparing apples and oranges.  You
> started by outlining a scenario where "an empty broker... comes up...
> [without] any > leadership[s]."  But then you criticize using reassignment
> to switch the order of preferred replicas because it "would not actually
> switch the leader > automatically."  If the empty broker doesn't have any
> leaderships, there is nothing to be switched, right?
> >
> > Let me explained in details of this particular use case example for
> comparing apples to apples.
> >
> > Let's say a healthy broker hosting 3000 partitions, and of which 1000
> are the preferred leaders (leader count is 1000). There is a hardware
> failure (disk/memory, etc.), and kafka process crashed. We swap this host
> with another host but keep the same broker.id, when this new broker
> coming up, it has no historical data, and we manage to have the current
> last offsets of all partitions set in the replication-offset-checkpoint (if
> we don't set them, it could cause crazy ReplicaFetcher pulling of
> historical data from other brokers and cause cluster high latency and other
> instabilities), so when Kafka is brought up, it is quickly catching up as
> followers in the ISR.  Note, we have auto.leader.rebalance.enable
> disabled, so it's not serving any traffic as leaders (leader count = 0),
> even there are 1000 partitions that this broker is the Preferred Leader.
> >
> > We need to make this broker not serving traffic for a few hours or days
> depending on the SLA of the topic retention requirement until after it's
> having enough historical data.
> >
> >
> > * The traditional way using the reassignments to move this broker in
> that 1000 partitions where it's the preferred leader to the end of
> assignment, this is O(N) operation. and from my experience, we can't submit
> all 1000 at the same time, otherwise cause higher latencies even the
> reassignment in this case can complete almost instantly.  After  a few
> hours/days whatever, this broker is ready to serve traffic,  we have to run
> reassignments again to restore that 1000 partitions preferred leaders for
> this broker: O(N) operation.  then run preferred leader 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-16 Thread Satish Duggana
Hi George,
Thanks for explaining the use case for a topic-level preferred leader
blacklist. As I mentioned earlier, I am fine with a broker-level config
for now.

~Satish.


On Sat, Sep 7, 2019 at 12:29 AM George Li
 wrote:
>
>  Hi,
>
> Just want to ping and bubble up the discussion of KIP-491.
>
> On a large scale of Kafka clusters with thousands of brokers in many 
> clusters.  Frequent hardware failures are common, although the reassignments 
> to change the preferred leaders is a workaround, it incurs unnecessary 
> additional work than the proposed preferred leader blacklist in KIP-491, and 
> hard to scale.
>
> I am wondering whether others using Kafka in a big scale running into same 
> problem.
>
>
> Satish,
>
> Regarding your previous question about whether there is use-case for 
> TopicLevel preferred leader "blacklist",  I thought about one use-case:  to 
> improve rebalance/reassignment, the large partition will usually cause 
> performance/stability issues, planning to change the say the New Replica will 
> start with Leader's latest offset(this way the replica is almost instantly in 
> the ISR and reassignment completed), and put this partition's NewReplica into 
> Preferred Leader "Blacklist" at the Topic Level config for that partition. 
> After sometime(retention time), this new replica has caught up and ready to 
> serve traffic, update/remove the TopicConfig for this partition's preferred 
> leader blacklist.
>
> I will update the KIP-491 later for this use case of Topic Level config for 
> Preferred Leader Blacklist.
>
>
> Thanks,
> George
>
> On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li 
>  wrote:
>
>   Hi Colin,
>
> > In your example, I think we're comparing apples and oranges.  You started 
> > by outlining a scenario where "an empty broker... comes up... [without] any 
> > > leadership[s]."  But then you criticize using reassignment to switch the 
> > order of preferred replicas because it "would not actually switch the 
> > leader > automatically."  If the empty broker doesn't have any leaderships, 
> > there is nothing to be switched, right?
>
> Let me explained in details of this particular use case example for comparing 
> apples to apples.
>
> Let's say a healthy broker hosting 3000 partitions, and of which 1000 are the 
> preferred leaders (leader count is 1000). There is a hardware failure 
> (disk/memory, etc.), and kafka process crashed. We swap this host with 
> another host but keep the same broker.id, when this new broker coming up, it 
> has no historical data, and we manage to have the current last offsets of all 
> partitions set in the replication-offset-checkpoint (if we don't set them, it 
> could cause crazy ReplicaFetcher pulling of historical data from other 
> brokers and cause cluster high latency and other instabilities), so when 
> Kafka is brought up, it is quickly catching up as followers in the ISR.  
> Note, we have auto.leader.rebalance.enable  disabled, so it's not serving any 
> traffic as leaders (leader count = 0), even there are 1000 partitions that 
> this broker is the Preferred Leader.
>
> We need to make this broker not serving traffic for a few hours or days 
> depending on the SLA of the topic retention requirement until after it's 
> having enough historical data.
>
>
> * The traditional way using the reassignments to move this broker in that 
> 1000 partitions where it's the preferred leader to the end of  assignment, 
> this is O(N) operation. and from my experience, we can't submit all 1000 at 
> the same time, otherwise cause higher latencies even the reassignment in this 
> case can complete almost instantly.  After  a few hours/days whatever, this 
> broker is ready to serve traffic,  we have to run reassignments again to 
> restore that 1000 partitions preferred leaders for this broker: O(N) 
> operation.  then run preferred leader election O(N) again.  So total 3 x O(N) 
> operations.  The point is since the new empty broker is expected to be the 
> same as the old one in terms of hosting partition/leaders, it would seem 
> unnecessary to do reassignments (ordering of replica) during the broker 
> catching up time.
>
>
>
> * The new feature Preferred Leader "Blacklist":  just need to put a dynamic 
> config to indicate that this broker should be considered leader (preferred 
> leader election or broker failover or unclean leader election) to the lowest 
> priority. NO need to run any reassignments. After a few hours/days, when this 
> broker is ready, remove the dynamic config, and run preferred leader election 
> and this broker will serve traffic for that 1000 original partitions it was 
> the preferred leader. So total  1 x O(N) operation.
>
>
> If auto.leader.rebalance.enable  is enabled,  the Preferred Leader 
> "Blacklist" can be put it before Kafka is started to prevent this broker 
> serving traffic.  In the traditional way of running reassignments, once the 
> broker is up, with auto.leader.rebalance.enable  , if 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-14 Thread Harsha Ch
Hi Stanislav,

Thanks for the comments. The proposal we are making is not about
optimizing the Big-O complexity, but about providing a simpler way of stopping
a broker from becoming leader.  If we go with making this an option and
providing a tool that abstracts moving the broker to the end of the preferred
leader list, the tool needs to do it for all the partitions that the broker
leads. As said in the above comment, for a broker that is the leader of 1000
partitions, we would have to do this for all 1000 partitions.  Having a
blacklist instead will simplify this process, and we can provide
monitoring/alerts on such a list.

"This sounds like a bit of a hack. If that is the concern, why not propose a 
KIP that addresses the specific issue?"

Do you mind shedding some light on which specific issue you think we should
propose a KIP for?

Replication is a challenge when we are bringing up a new node.  If you have a
retention period of 3 days, there is honestly no way to do it via online
replication without taking a hit on latency SLAs.

Is your ask to find a way to fix replication itself when we are bringing up a
new broker with no data?

"Having a blacklist you control still seems like a workaround given that Kafka 
itself knows when the topic retention would allow you to switch that replica to 
a leader"

I am not sure how having a single ZK path that holds a list of brokers makes
anything more complicated.
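
Purely as an illustration of what I mean (the actual path and data format
would be whatever KIP-491 finalizes), the znode could simply hold the
deprioritized broker ids, e.g.:

  /preferred_leader_blacklist
  { "version": 1, "broker_ids": [101, 102] }

Adding or removing a broker is then a single update of that one node, which is
also easy to monitor and alert on.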

Thanks,

Harsha

On Mon, Sep 09, 2019 at 3:55 PM, Stanislav Kozlovski < stanis...@confluent.io > 
wrote:

> 
> 
> 
> I agree with Colin that the same result should be achievable through
> proper abstraction in a tool. Even if that might be "4xO(N)" operations,
> that is still not a lot - it is still classified as O(N)
> 
> 
> 
> Let's say a healthy broker hosting 3000 partitions, and of which 1000 are
> 
> 
>> 
>> 
>> the preferred leaders (leader count is 1000). There is a hardware failure
>> (disk/memory, etc.), and kafka process crashed. We swap this host with
>> another host but keep the same broker.id, when this
>> new broker coming up, it has no historical data, and we manage to have the
>> current last offsets of all partitions set in the
>> replication-offset-checkpoint (if we don't set them, it could cause crazy
>> ReplicaFetcher pulling of historical data from other brokers and cause
>> cluster high latency and other instabilities), so when Kafka is brought
>> up, it is quickly catching up as followers in the ISR. Note, we have
>> auto.leader.rebalance.enable disabled, so it's not serving any traffic as
>> leaders (leader count = 0), even there are 1000 partitions that this
>> broker is the Preferred Leader. We need to make this broker not serving
>> traffic for a few hours or days depending on the SLA of the topic
>> retention requirement until after it's having enough historical data.
>> 
>> 
> 
> 
> 
> This sounds like a bit of a hack. If that is the concern, why not propose
> a KIP that addresses the specific issue? Having a blacklist you control
> still seems like a workaround given that Kafka itself knows when the topic
> retention would allow you to switch that replica to a leader
> 
> 
> 
> I really hope we can come up with a solution that avoids complicating the
> controller and state machine logic further.
> Could you please list out the main drawbacks of abstract this away in the
> reassignments tool (or a new tool)?
> 
> 
> 
> On Mon, Sep 9, 2019 at 7:53 AM Colin McCabe <cmcc...@apache.org> wrote:
> 
> 
>> 
>> 
>> On Sat, Sep 7, 2019, at 09:21, Harsha Chintalapani wrote:
>> 
>> 
>>> 
>>> 
>>> Hi Colin,
>>> Can you give us more details on why you don't want this to be part of the
>>> Kafka core. You are proposing KIP-500 which will take away zookeeper and
>>> writing this interim tools to change the zookeeper metadata doesn't make
>>> sense to me.
>>> 
>>> 
>> 
>> 
>> 
>> Hi Harsha,
>> 
>> 
>> 
>> The reassignment API described in KIP-455, which will be part of Kafka
>> 2.4, doesn't rely on ZooKeeper. This API will stay the same after KIP-500
>> is implemented.
>> 
>> 
>>> 
>>> 
>>> As George pointed out there are
>>> several benefits having it in the system itself instead of asking users to
>>> hack bunch of json files to deal with outage scenario.
>>> 
>>> 
>> 
>> 
>> 
>> In both cases, the user just has to run a shell command, right? In both
>> cases, the user has to remember to undo the command later when they want
>> the broker to be treated normally again. And in both cases, the user
>> should probably be running an external rebalancing tool to avoid having to
>> run these commands manually. :)
>> 
>> 
>> 
>> best,
>> Colin
>> 
>> 
>>> 
>>> 
>>> Thanks,
>>> Harsha
>>> 
>>> 
>>> 
>>> On Fri, Sep 6, 2019 at 4:36 PM George Li <sql_consult...@yahoo.com.invalid>
>>> wrote:
>>> 
>>> 
 
 
 Hi Colin,
 
 
 
 Thanks for the feedback. The "separate set of 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-09 Thread Stanislav Kozlovski
I agree with Colin that the same result should be achievable through proper
abstraction in a tool. Even if that might be "4xO(N)" operations, that is
still not a lot - it is still classified as O(N)

Let's say a healthy broker hosting 3000 partitions, and of which 1000 are
> the preferred leaders (leader count is 1000). There is a hardware failure
> (disk/memory, etc.), and kafka process crashed. We swap this host with
> another host but keep the same broker.id, when this new broker coming up,
> it has no historical data, and we manage to have the current last offsets
> of all partitions set in the replication-offset-checkpoint (if we don't set
> them, it could cause crazy ReplicaFetcher pulling of historical data from
> other brokers and cause cluster high latency and other instabilities), so
> when Kafka is brought up, it is quickly catching up as followers in the
> ISR.  Note, we have auto.leader.rebalance.enable  disabled, so it's not
> serving any traffic as leaders (leader count = 0), even there are 1000
> partitions that this broker is the Preferred Leader.
> We need to make this broker not serving traffic for a few hours or days
> depending on the SLA of the topic retention requirement until after it's
> having enough historical data.


This sounds like a bit of a hack. If that is the concern, why not propose a
KIP that addresses the specific issue? Having a blacklist you control still
seems like a workaround, given that Kafka itself knows when the topic
retention would allow you to switch that replica to a leader.

I really hope we can come up with a solution that avoids complicating the
controller and state machine logic further.
Could you please list out the main drawbacks of abstracting this away in the
reassignments tool (or a new tool)?

On Mon, Sep 9, 2019 at 7:53 AM Colin McCabe  wrote:

> On Sat, Sep 7, 2019, at 09:21, Harsha Chintalapani wrote:
> > Hi Colin,
> >   Can you give us more details on why you don't want this to be
> > part of the Kafka core. You are proposing KIP-500 which will take away
> > zookeeper and writing this interim tools to change the zookeeper
> > metadata doesn't make sense to me.
>
> Hi Harsha,
>
> The reassignment API described in KIP-455, which will be part of Kafka
> 2.4, doesn't rely on ZooKeeper.  This API will stay the same after KIP-500
> is implemented.
>
> > As George pointed out there are
> > several benefits having it in the system itself instead of asking users
> > to hack bunch of json files to deal with outage scenario.
>
> In both cases, the user just has to run a shell command, right?  In both
> cases, the user has to remember to undo the command later when they want
> the broker to be treated normally again.  And in both cases, the user
> should probably be running an external rebalancing tool to avoid having to
> run these commands manually. :)
>
> best,
> Colin
>
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Sep 6, 2019 at 4:36 PM George Li  .invalid>
> > wrote:
> >
> > >  Hi Colin,
> > >
> > > Thanks for the feedback.  The "separate set of metadata about
> blacklists"
> > > in KIP-491 is just the list of broker ids. Usually 1 or 2 or a couple
> in
> > > the cluster.  Should be easier than keeping json files?  e.g. what if
> we
> > > first blacklist broker_id_1, then another broker_id_2 has issues, and
> we
> > > need to write out another json file to restore later (and in which
> order)?
> > >  Using blacklist, we can just add the broker_id_2 to the existing one.
> and
> > > remove whatever broker_id returning to good state without worrying
> how(the
> > > ordering of putting the broker to blacklist) to restore.
> > >
> > > For topic level config,  the blacklist will be tied to
> > > topic/partition(e.g.  Configs:
> > > topic.preferred.leader.blacklist=0:101,102;1:103where 0 & 1 is the
> > > partition#, 101,102,103 are the blacklist broker_ids), and easier to
> > > update/remove, no need for external json files?
> > >
> > >
> > > Thanks,
> > > George
> > >
> > > On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe <
> > > cmcc...@apache.org> wrote:
> > >
> > >  One possibility would be writing a new command-line tool that would
> > > deprioritize a given replica using the new KIP-455 API.  Then it could
> > > write out a JSON files containing the old priorities, which could be
> > > restored when (or if) we needed to do so.  This seems like it might be
> > > simpler and easier to maintain than a separate set of metadata about
> > > blacklists.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Sep 6, 2019, at 11:58, George Li wrote:
> > > >  Hi,
> > > >
> > > > Just want to ping and bubble up the discussion of KIP-491.
> > > >
> > > > On a large scale of Kafka clusters with thousands of brokers in many
> > > > clusters.  Frequent hardware failures are common, although the
> > > > reassignments to change the preferred leaders is a workaround, it
> > > > incurs unnecessary additional work than the proposed preferred leader
> 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-09 Thread Colin McCabe
On Sat, Sep 7, 2019, at 09:21, Harsha Chintalapani wrote:
> Hi Colin,
>   Can you give us more details on why you don't want this to be
> part of the Kafka core. You are proposing KIP-500 which will take away
> zookeeper and writing this interim tools to change the zookeeper 
> metadata doesn't make sense to me.

Hi Harsha,

The reassignment API described in KIP-455, which will be part of Kafka 2.4, 
doesn't rely on ZooKeeper.  This API will stay the same after KIP-500 is 
implemented.
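
For reference, reordering the preferred replica of a single partition with
that API is roughly the following (just a sketch; the topic name and broker
ids are examples):

  import java.util.*;
  import org.apache.kafka.clients.admin.*;
  import org.apache.kafka.common.TopicPartition;

  public class DeprioritizeReplica {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          try (Admin admin = Admin.create(props)) {
              // Move broker 1 to the end of the replica list for my-topic-0,
              // i.e. (1,2,3) -> (2,3,1), without changing the replica set.
              Map<TopicPartition, Optional<NewPartitionReassignment>> change =
                  Collections.singletonMap(
                      new TopicPartition("my-topic", 0),
                      Optional.of(new NewPartitionReassignment(Arrays.asList(2, 3, 1))));
              admin.alterPartitionReassignments(change).all().get();
          }
      }
  }

Because the replica set itself doesn't change, such a reassignment should
complete almost immediately; only the ordering (and hence the preferred
leader) changes.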

> As George pointed out there are
> several benefits having it in the system itself instead of asking users
> to hack bunch of json files to deal with outage scenario.

In both cases, the user just has to run a shell command, right?  In both cases, 
the user has to remember to undo the command later when they want the broker to 
be treated normally again.  And in both cases, the user should probably be 
running an external rebalancing tool to avoid having to run these commands 
manually. :)

best,
Colin

> 
> Thanks,
> Harsha
> 
> On Fri, Sep 6, 2019 at 4:36 PM George Li 
> wrote:
> 
> >  Hi Colin,
> >
> > Thanks for the feedback.  The "separate set of metadata about blacklists"
> > in KIP-491 is just the list of broker ids. Usually 1 or 2 or a couple in
> > the cluster.  Should be easier than keeping json files?  e.g. what if we
> > first blacklist broker_id_1, then another broker_id_2 has issues, and we
> > need to write out another json file to restore later (and in which order)?
> >  Using blacklist, we can just add the broker_id_2 to the existing one. and
> > remove whatever broker_id returning to good state without worrying how(the
> > ordering of putting the broker to blacklist) to restore.
> >
> > For topic level config,  the blacklist will be tied to
> > topic/partition(e.g.  Configs:
> > topic.preferred.leader.blacklist=0:101,102;1:103where 0 & 1 is the
> > partition#, 101,102,103 are the blacklist broker_ids), and easier to
> > update/remove, no need for external json files?
> >
> >
> > Thanks,
> > George
> >
> > On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe <
> > cmcc...@apache.org> wrote:
> >
> >  One possibility would be writing a new command-line tool that would
> > deprioritize a given replica using the new KIP-455 API.  Then it could
> > write out a JSON files containing the old priorities, which could be
> > restored when (or if) we needed to do so.  This seems like it might be
> > simpler and easier to maintain than a separate set of metadata about
> > blacklists.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Sep 6, 2019, at 11:58, George Li wrote:
> > >  Hi,
> > >
> > > Just want to ping and bubble up the discussion of KIP-491.
> > >
> > > On a large scale of Kafka clusters with thousands of brokers in many
> > > clusters.  Frequent hardware failures are common, although the
> > > reassignments to change the preferred leaders is a workaround, it
> > > incurs unnecessary additional work than the proposed preferred leader
> > > blacklist in KIP-491, and hard to scale.
> > >
> > > I am wondering whether others using Kafka in a big scale running into
> > > same problem.
> > >
> > >
> > > Satish,
> > >
> > > Regarding your previous question about whether there is use-case for
> > > TopicLevel preferred leader "blacklist",  I thought about one
> > > use-case:  to improve rebalance/reassignment, the large partition will
> > > usually cause performance/stability issues, planning to change the say
> > > the New Replica will start with Leader's latest offset(this way the
> > > replica is almost instantly in the ISR and reassignment completed), and
> > > put this partition's NewReplica into Preferred Leader "Blacklist" at
> > > the Topic Level config for that partition. After sometime(retention
> > > time), this new replica has caught up and ready to serve traffic,
> > > update/remove the TopicConfig for this partition's preferred leader
> > > blacklist.
> > >
> > > I will update the KIP-491 later for this use case of Topic Level config
> > > for Preferred Leader Blacklist.
> > >
> > >
> > > Thanks,
> > > George
> > >
> > >On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li
> > >  wrote:
> > >
> > >  Hi Colin,
> > >
> > > > In your example, I think we're comparing apples and oranges.  You
> > started by outlining a scenario where "an empty broker... comes up...
> > [without] any > leadership[s]."  But then you criticize using reassignment
> > to switch the order of preferred replicas because it "would not actually
> > switch the leader > automatically."  If the empty broker doesn't have any
> > leaderships, there is nothing to be switched, right?
> > >
> > > Let me explained in details of this particular use case example for
> > > comparing apples to apples.
> > >
> > > Let's say a healthy broker hosting 3000 partitions, and of which 1000
> > > are the preferred leaders (leader count is 1000). There is a hardware
> > > failure (disk/memory, etc.), and kafka process crashed. We 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-07 Thread Harsha Chintalapani
Hi Colin,
  Can you give us more details on why you don't want this to be
part of Kafka core? You are proposing KIP-500, which will take away ZooKeeper,
so writing these interim tools to change the ZooKeeper metadata doesn't make
sense to me. As George pointed out, there are several benefits to having this
in the system itself instead of asking users to hack a bunch of JSON files to
deal with an outage scenario.

Thanks,
Harsha

On Fri, Sep 6, 2019 at 4:36 PM George Li 
wrote:

>  Hi Colin,
>
> Thanks for the feedback.  The "separate set of metadata about blacklists"
> in KIP-491 is just the list of broker ids. Usually 1 or 2 or a couple in
> the cluster.  Should be easier than keeping json files?  e.g. what if we
> first blacklist broker_id_1, then another broker_id_2 has issues, and we
> need to write out another json file to restore later (and in which order)?
>  Using blacklist, we can just add the broker_id_2 to the existing one. and
> remove whatever broker_id returning to good state without worrying how(the
> ordering of putting the broker to blacklist) to restore.
>
> For topic level config,  the blacklist will be tied to
> topic/partition(e.g.  Configs:
> topic.preferred.leader.blacklist=0:101,102;1:103where 0 & 1 is the
> partition#, 101,102,103 are the blacklist broker_ids), and easier to
> update/remove, no need for external json files?
>
>
> Thanks,
> George
>
> On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe <
> cmcc...@apache.org> wrote:
>
>  One possibility would be writing a new command-line tool that would
> deprioritize a given replica using the new KIP-455 API.  Then it could
> write out a JSON files containing the old priorities, which could be
> restored when (or if) we needed to do so.  This seems like it might be
> simpler and easier to maintain than a separate set of metadata about
> blacklists.
>
> best,
> Colin
>
>
> On Fri, Sep 6, 2019, at 11:58, George Li wrote:
> >  Hi,
> >
> > Just want to ping and bubble up the discussion of KIP-491.
> >
> > On a large scale of Kafka clusters with thousands of brokers in many
> > clusters.  Frequent hardware failures are common, although the
> > reassignments to change the preferred leaders is a workaround, it
> > incurs unnecessary additional work than the proposed preferred leader
> > blacklist in KIP-491, and hard to scale.
> >
> > I am wondering whether others using Kafka in a big scale running into
> > same problem.
> >
> >
> > Satish,
> >
> > Regarding your previous question about whether there is use-case for
> > TopicLevel preferred leader "blacklist",  I thought about one
> > use-case:  to improve rebalance/reassignment, the large partition will
> > usually cause performance/stability issues, planning to change the say
> > the New Replica will start with Leader's latest offset(this way the
> > replica is almost instantly in the ISR and reassignment completed), and
> > put this partition's NewReplica into Preferred Leader "Blacklist" at
> > the Topic Level config for that partition. After sometime(retention
> > time), this new replica has caught up and ready to serve traffic,
> > update/remove the TopicConfig for this partition's preferred leader
> > blacklist.
> >
> > I will update the KIP-491 later for this use case of Topic Level config
> > for Preferred Leader Blacklist.
> >
> >
> > Thanks,
> > George
> >
> >On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li
> >  wrote:
> >
> >  Hi Colin,
> >
> > > In your example, I think we're comparing apples and oranges.  You
> started by outlining a scenario where "an empty broker... comes up...
> [without] any > leadership[s]."  But then you criticize using reassignment
> to switch the order of preferred replicas because it "would not actually
> switch the leader > automatically."  If the empty broker doesn't have any
> leaderships, there is nothing to be switched, right?
> >
> > Let me explained in details of this particular use case example for
> > comparing apples to apples.
> >
> > Let's say a healthy broker hosting 3000 partitions, and of which 1000
> > are the preferred leaders (leader count is 1000). There is a hardware
> > failure (disk/memory, etc.), and kafka process crashed. We swap this
> > host with another host but keep the same broker.id, when this new
> > broker coming up, it has no historical data, and we manage to have the
> > current last offsets of all partitions set in
> > the replication-offset-checkpoint (if we don't set them, it could cause
> > crazy ReplicaFetcher pulling of historical data from other brokers and
> > cause cluster high latency and other instabilities), so when Kafka is
> > brought up, it is quickly catching up as followers in the ISR.  Note,
> > we have auto.leader.rebalance.enable  disabled, so it's not serving any
> > traffic as leaders (leader count = 0), even there are 1000 partitions
> > that this broker is the Preferred Leader.
> >
> > We need to make this broker not serving traffic for a few hours or days
> > 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-06 Thread George Li
 Hi Colin,

Thanks for the feedback.  The "separate set of metadata about blacklists" in
KIP-491 is just the list of broker ids, usually one, two, or a handful per
cluster.  That should be easier than keeping JSON files.  For example, what if
we first blacklist broker_id_1, and then another broker, broker_id_2, has
issues and we need to write out another JSON file to restore later (and in
which order)?  With the blacklist, we can simply add broker_id_2 to the
existing list, and remove whichever broker_id returns to a good state, without
worrying about how (i.e. in which order the brokers were blacklisted) to
restore.

For the topic-level config, the blacklist will be tied to the topic/partition
(e.g.  Configs: topic.preferred.leader.blacklist=0:101,102;1:103  where 0 and 1
are the partition numbers and 101, 102, 103 are the blacklisted broker_ids),
and it is easier to update/remove, with no need for external JSON files.
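
As a rough sketch of what that would look like operationally (the exact config
key and value format is still something KIP-491 needs to finalize), it is just
a normal dynamic topic config change, e.g.:

  # deprioritize brokers 101,102 for partition 0 and broker 103 for partition 1
  # (brackets because the value itself contains commas)
  bin/kafka-configs.sh --zookeeper zk:2181 --entity-type topics \
    --entity-name my-topic --alter \
    --add-config 'topic.preferred.leader.blacklist=[0:101,102;1:103]'

  # remove it again once the replicas have caught up
  bin/kafka-configs.sh --zookeeper zk:2181 --entity-type topics \
    --entity-name my-topic --alter \
    --delete-config topic.preferred.leader.blacklist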


Thanks,
George

On Friday, September 6, 2019, 02:20:33 PM PDT, Colin McCabe 
 wrote:  
 
 One possibility would be writing a new command-line tool that would 
deprioritize a given replica using the new KIP-455 API.  Then it could write 
out a JSON files containing the old priorities, which could be restored when 
(or if) we needed to do so.  This seems like it might be simpler and easier to 
maintain than a separate set of metadata about blacklists.

best,
Colin


On Fri, Sep 6, 2019, at 11:58, George Li wrote:
>  Hi, 
> 
> Just want to ping and bubble up the discussion of KIP-491. 
> 
> On a large scale of Kafka clusters with thousands of brokers in many 
> clusters.  Frequent hardware failures are common, although the 
> reassignments to change the preferred leaders is a workaround, it 
> incurs unnecessary additional work than the proposed preferred leader 
> blacklist in KIP-491, and hard to scale. 
> 
> I am wondering whether others using Kafka in a big scale running into 
> same problem. 
> 
> 
> Satish,  
> 
> Regarding your previous question about whether there is use-case for 
> TopicLevel preferred leader "blacklist",  I thought about one 
> use-case:  to improve rebalance/reassignment, the large partition will 
> usually cause performance/stability issues, planning to change the say 
> the New Replica will start with Leader's latest offset(this way the 
> replica is almost instantly in the ISR and reassignment completed), and 
> put this partition's NewReplica into Preferred Leader "Blacklist" at 
> the Topic Level config for that partition. After sometime(retention 
> time), this new replica has caught up and ready to serve traffic, 
> update/remove the TopicConfig for this partition's preferred leader 
> blacklist. 
> 
> I will update the KIP-491 later for this use case of Topic Level config 
> for Preferred Leader Blacklist.
> 
> 
> Thanks,
> George
>  
>    On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li 
>  wrote:  
>  
>  Hi Colin,
> 
> > In your example, I think we're comparing apples and oranges.  You started 
> > by outlining a scenario where "an empty broker... comes up... [without] any 
> > > leadership[s]."  But then you criticize using reassignment to switch the 
> > order of preferred replicas because it "would not actually switch the 
> > leader > automatically."  If the empty broker doesn't have any leaderships, 
> > there is nothing to be switched, right?
> 
> Let me explained in details of this particular use case example for 
> comparing apples to apples. 
> 
> Let's say a healthy broker hosting 3000 partitions, and of which 1000 
> are the preferred leaders (leader count is 1000). There is a hardware 
> failure (disk/memory, etc.), and kafka process crashed. We swap this 
> host with another host but keep the same broker.id, when this new 
> broker coming up, it has no historical data, and we manage to have the 
> current last offsets of all partitions set in 
> the replication-offset-checkpoint (if we don't set them, it could cause 
> crazy ReplicaFetcher pulling of historical data from other brokers and 
> cause cluster high latency and other instabilities), so when Kafka is 
> brought up, it is quickly catching up as followers in the ISR.  Note, 
> we have auto.leader.rebalance.enable  disabled, so it's not serving any 
> traffic as leaders (leader count = 0), even there are 1000 partitions 
> that this broker is the Preferred Leader. 
> 
> We need to make this broker not serving traffic for a few hours or days 
> depending on the SLA of the topic retention requirement until after 
> it's having enough historical data. 
> 
> 
> * The traditional way using the reassignments to move this broker in 
> that 1000 partitions where it's the preferred leader to the end of  
> assignment, this is O(N) operation. and from my experience, we can't 
> submit all 1000 at the same time, otherwise cause higher latencies even 
> the reassignment in this case can complete almost instantly.  After  a 
> few hours/days whatever, this broker is ready to serve traffic,  we 
> have to run reassignments again to restore 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-06 Thread Colin McCabe
One possibility would be writing a new command-line tool that would
deprioritize a given replica using the new KIP-455 API.  Then it could write
out a JSON file containing the old priorities, which could be restored when
(or if) we needed to do so.  This seems like it might be simpler and easier to
maintain than a separate set of metadata about blacklists.
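
For example, the saved "old priorities" could simply be the standard
reassignment JSON with the original replica ordering, which the tool would
re-apply later (topic and replica ids below are just an example):

  {"version":1,
   "partitions":[
     {"topic":"my-topic","partition":0,"replicas":[1,2,3]}
   ]}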

best,
Colin


On Fri, Sep 6, 2019, at 11:58, George Li wrote:
>  Hi, 
> 
> Just want to ping and bubble up the discussion of KIP-491. 
> 
> On a large scale of Kafka clusters with thousands of brokers in many 
> clusters.  Frequent hardware failures are common, although the 
> reassignments to change the preferred leaders is a workaround, it 
> incurs unnecessary additional work than the proposed preferred leader 
> blacklist in KIP-491, and hard to scale. 
> 
> I am wondering whether others using Kafka in a big scale running into 
> same problem. 
> 
> 
> Satish,  
> 
> Regarding your previous question about whether there is use-case for 
> TopicLevel preferred leader "blacklist",  I thought about one 
> use-case:  to improve rebalance/reassignment, the large partition will 
> usually cause performance/stability issues, planning to change the say 
> the New Replica will start with Leader's latest offset(this way the 
> replica is almost instantly in the ISR and reassignment completed), and 
> put this partition's NewReplica into Preferred Leader "Blacklist" at 
> the Topic Level config for that partition. After sometime(retention 
> time), this new replica has caught up and ready to serve traffic, 
> update/remove the TopicConfig for this partition's preferred leader 
> blacklist. 
> 
> I will update the KIP-491 later for this use case of Topic Level config 
> for Preferred Leader Blacklist.
> 
> 
> Thanks,
> George
>  
> On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li 
>  wrote:  
>  
>   Hi Colin,
> 
> > In your example, I think we're comparing apples and oranges.  You started 
> > by outlining a scenario where "an empty broker... comes up... [without] any 
> > > leadership[s]."  But then you criticize using reassignment to switch the 
> > order of preferred replicas because it "would not actually switch the 
> > leader > automatically."  If the empty broker doesn't have any leaderships, 
> > there is nothing to be switched, right?
> 
> Let me explained in details of this particular use case example for 
> comparing apples to apples. 
> 
> Let's say a healthy broker hosting 3000 partitions, and of which 1000 
> are the preferred leaders (leader count is 1000). There is a hardware 
> failure (disk/memory, etc.), and kafka process crashed. We swap this 
> host with another host but keep the same broker.id, when this new 
> broker coming up, it has no historical data, and we manage to have the 
> current last offsets of all partitions set in 
> the replication-offset-checkpoint (if we don't set them, it could cause 
> crazy ReplicaFetcher pulling of historical data from other brokers and 
> cause cluster high latency and other instabilities), so when Kafka is 
> brought up, it is quickly catching up as followers in the ISR.  Note, 
> we have auto.leader.rebalance.enable  disabled, so it's not serving any 
> traffic as leaders (leader count = 0), even there are 1000 partitions 
> that this broker is the Preferred Leader. 
> 
> We need to make this broker not serving traffic for a few hours or days 
> depending on the SLA of the topic retention requirement until after 
> it's having enough historical data. 
> 
> 
> * The traditional way using the reassignments to move this broker in 
> that 1000 partitions where it's the preferred leader to the end of  
> assignment, this is O(N) operation. and from my experience, we can't 
> submit all 1000 at the same time, otherwise cause higher latencies even 
> the reassignment in this case can complete almost instantly.  After  a 
> few hours/days whatever, this broker is ready to serve traffic,  we 
> have to run reassignments again to restore that 1000 partitions 
> preferred leaders for this broker: O(N) operation.  then run preferred 
> leader election O(N) again.  So total 3 x O(N) operations.  The point 
> is since the new empty broker is expected to be the same as the old one 
> in terms of hosting partition/leaders, it would seem unnecessary to do 
> reassignments (ordering of replica) during the broker catching up time. 
> 
> 
> 
> * The new feature Preferred Leader "Blacklist":  just need to put a 
> dynamic config to indicate that this broker should be considered leader 
> (preferred leader election or broker failover or unclean leader 
> election) to the lowest priority. NO need to run any reassignments. 
> After a few hours/days, when this broker is ready, remove the dynamic 
> config, and run preferred leader election and this broker will serve 
> traffic for that 1000 original partitions it was the preferred leader. 
> So total  1 x O(N) operation. 
> 
> 
> If 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-09-06 Thread George Li
 Hi, 

Just want to ping and bubble up the discussion of KIP-491. 

We run Kafka at a large scale, with thousands of brokers across many clusters,
so frequent hardware failures are common. Although reassignments to change the
preferred leaders are a workaround, they incur unnecessary additional work
compared to the preferred leader blacklist proposed in KIP-491, and they are
hard to scale.

I am wondering whether others using Kafka at a large scale are running into
the same problem.


Satish,  

Regarding your previous question about whether there is a use case for a
topic-level preferred leader "blacklist", I thought of one: to improve
rebalance/reassignment of large partitions, which usually cause
performance/stability issues, we plan to have the new replica start from the
leader's latest offset (this way the replica is almost instantly in the ISR
and the reassignment completes), and to put this partition's new replica into
the preferred leader "blacklist" in the topic-level config for that partition.
After some time (the retention time), once this new replica has caught up and
is ready to serve traffic, we update/remove the topic config for this
partition's preferred leader blacklist.

I will update KIP-491 later with this use case for the topic-level config for
the preferred leader blacklist.


Thanks,
George
 
On Wednesday, August 7, 2019, 07:43:55 PM PDT, George Li 
 wrote:  
 
  Hi Colin,

> In your example, I think we're comparing apples and oranges.  You started by 
> outlining a scenario where "an empty broker... comes up... [without] any > 
> leadership[s]."  But then you criticize using reassignment to switch the 
> order of preferred replicas because it "would not actually switch the leader 
> > automatically."  If the empty broker doesn't have any leaderships, there is 
> nothing to be switched, right?

Let me explained in details of this particular use case example for comparing 
apples to apples. 

Let's say a healthy broker hosting 3000 partitions, and of which 1000 are the 
preferred leaders (leader count is 1000). There is a hardware failure 
(disk/memory, etc.), and kafka process crashed. We swap this host with another 
host but keep the same broker.id, when this new broker coming up, it has no 
historical data, and we manage to have the current last offsets of all 
partitions set in the replication-offset-checkpoint (if we don't set them, it 
could cause crazy ReplicaFetcher pulling of historical data from other brokers 
and cause cluster high latency and other instabilities), so when Kafka is 
brought up, it is quickly catching up as followers in the ISR.  Note, we have 
auto.leader.rebalance.enable  disabled, so it's not serving any traffic as 
leaders (leader count = 0), even there are 1000 partitions that this broker is 
the Preferred Leader. 

We need to make this broker not serving traffic for a few hours or days 
depending on the SLA of the topic retention requirement until after it's having 
enough historical data. 


* The traditional way using the reassignments to move this broker in that 1000 
partitions where it's the preferred leader to the end of  assignment, this is 
O(N) operation. and from my experience, we can't submit all 1000 at the same 
time, otherwise cause higher latencies even the reassignment in this case can 
complete almost instantly.  After  a few hours/days whatever, this broker is 
ready to serve traffic,  we have to run reassignments again to restore that 
1000 partitions preferred leaders for this broker: O(N) operation.  then run 
preferred leader election O(N) again.  So total 3 x O(N) operations.  The point 
is since the new empty broker is expected to be the same as the old one in 
terms of hosting partition/leaders, it would seem unnecessary to do 
reassignments (ordering of replica) during the broker catching up time. 



* The new feature Preferred Leader "Blacklist":  just need to put a dynamic 
config to indicate that this broker should be considered leader (preferred 
leader election or broker failover or unclean leader election) to the lowest 
priority. NO need to run any reassignments. After a few hours/days, when this 
broker is ready, remove the dynamic config, and run preferred leader election 
and this broker will serve traffic for that 1000 original partitions it was the 
preferred leader. So total  1 x O(N) operation. 


If auto.leader.rebalance.enable  is enabled,  the Preferred Leader "Blacklist" 
can be put it before Kafka is started to prevent this broker serving traffic.  
In the traditional way of running reassignments, once the broker is up, with 
auto.leader.rebalance.enable  , if leadership starts going to this new empty 
broker, it might have to do preferred leader election after reassignments to 
remove its leaderships. e.g. (1,2,3) => (2,3,1) reassignment only change the 
ordering, 1 remains as the current leader, and needs prefer leader election to 
change to 2 after reassignment. so potentially one more O(N) operation. 

I hope the above 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-07 Thread George Li
 Hi Colin,

> In your example, I think we're comparing apples and oranges.  You started by 
> outlining a scenario where "an empty broker... comes up... [without] any > 
> leadership[s]."  But then you criticize using reassignment to switch the 
> order of preferred replicas because it "would not actually switch the leader 
> > automatically."  If the empty broker doesn't have any leaderships, there is 
> nothing to be switched, right?

Let me explain the details of this particular use case example, so that we
compare apples to apples.

Let's say a healthy broker hosts 3000 partitions, of which 1000 have it as the
preferred leader (leader count is 1000). There is a hardware failure
(disk/memory, etc.) and the Kafka process crashes. We swap this host with
another host but keep the same broker.id. When the new broker comes up, it has
no historical data, and we make sure the current last offsets of all
partitions are set in the replication-offset-checkpoint (if we don't set them,
the ReplicaFetchers would pull huge amounts of historical data from other
brokers and cause high cluster latency and other instabilities), so when Kafka
is brought up, it quickly catches up as a follower in the ISR.  Note that we
have auto.leader.rebalance.enable disabled, so it is not serving any traffic
as a leader (leader count = 0), even though there are 1000 partitions for
which this broker is the preferred leader.
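
(For context on that step: replication-offset-checkpoint is the per-log-dir
checkpoint file the broker keeps, whose format is a version line, an entry
count, and then one "topic partition offset" line per partition, roughly:

  0
  2
  my-topic 0 4500123
  my-topic 1 4387766

so "setting the last offsets" means pre-populating those offsets with the
leaders' current log end offsets before starting the broker. The topic names
and offsets above are just examples.)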

We need to keep this broker from serving traffic for a few hours or days,
depending on the SLA of the topic retention requirement, until it has enough
historical data.


* The traditional way is to use reassignments to move this broker to the end
of the assignment in those 1000 partitions where it is the preferred leader;
this is an O(N) operation. From my experience, we can't submit all 1000 at the
same time without causing higher latencies, even though the reassignments in
this case complete almost instantly.  After a few hours/days, when this broker
is ready to serve traffic, we have to run reassignments again to restore this
broker as the preferred leader of those 1000 partitions: another O(N)
operation.  Then we run preferred leader election, O(N) again.  So in total
3 x O(N) operations.  The point is that since the new empty broker is expected
to be the same as the old one in terms of hosting partitions/leaders, it seems
unnecessary to run reassignments (reordering replicas) while the broker is
catching up.



* With the new Preferred Leader "Blacklist" feature, we just need to put in a
dynamic config indicating that this broker should be considered for leadership
(preferred leader election, broker failover, or unclean leader election) at
the lowest priority. NO reassignments need to be run. After a few hours/days,
when this broker is ready, we remove the dynamic config and run preferred
leader election, and this broker will serve traffic for the 1000 original
partitions where it was the preferred leader. So in total 1 x O(N) operation.
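
(As a rough sketch only, using a hypothetical config name since the KIP has
not finalized it, the whole lifecycle would be two dynamic config changes plus
one election:

  # deprioritize broker 101 while it catches up
  bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
    --entity-default --alter --add-config 'preferred.leader.blacklist=101'

  # days later, once it has enough historical data
  bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
    --entity-default --alter --delete-config preferred.leader.blacklist

followed by a single run of preferred leader election.)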


If auto.leader.rebalance.enable is enabled, the Preferred Leader "Blacklist"
can be put in place before Kafka is started, to prevent this broker from
serving traffic.  With the traditional way of running reassignments, once the
broker is up with auto.leader.rebalance.enable, if leadership starts going to
this new empty broker, we might have to run preferred leader election after
the reassignments to remove its leaderships. E.g. a (1,2,3) => (2,3,1)
reassignment only changes the ordering; 1 remains the current leader and needs
a preferred leader election to switch the leader to 2 after the reassignment.
So that is potentially one more O(N) operation.
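
(That extra step is the usual preferred replica election, e.g.:

  # elections.json: {"partitions":[{"topic":"my-topic","partition":0}]}
  bin/kafka-preferred-replica-election.sh --zookeeper zk:2181 \
    --path-to-json-file elections.json

run over every affected partition; the topic/partition above are just an
example.)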

I hope the above example shows how easy it is to "blacklist" a broker from
serving leadership.  For someone managing a production Kafka cluster, it's
important to react fast to certain alerts and to mitigate/resolve issues.
Given the other use cases I listed in KIP-491, I think this feature can make
the Kafka product easier to manage/operate.

> In general, using an external rebalancing tool like Cruise Control is a good 
> idea to keep things balanced without having deal with manual rebalancing.  > 
> We expect more and more people who have a complex or large cluster will start 
> using tools like this.
> 
> However, if you choose to do manual rebalancing, it shouldn't be that bad.  
> You would save the existing partition ordering before making your changes, 
> then> make your changes (perhaps by running a simple command line tool that 
> switches the order of the replicas).  Then, once you felt like the broker was 
> ready to> serve traffic, you could just re-apply the old ordering which you 
> had saved.


We do have our own rebalancing tool, which has its own criteria like rack
diversity, disk usage, spreading partitions/leaders across all brokers in the
cluster per topic, leadership bytes/bytes-in served per broker, etc.  We can
run reassignments. The point is whether that is really necessary, and whether
there is a more effective, easier, and safer way to do it.

take another use case 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-07 Thread Colin McCabe
On Wed, Aug 7, 2019, at 12:48, George Li wrote:
>  Hi Colin,
> 
> Thanks for your feedbacks.  Comments below:
> > Even if you have a way of blacklisting an entire broker all at once, you 
> >still would need to run a leader election > for each partition where you 
> >want to move the leader off of the blacklisted broker.  So the operation is 
> >still O(N) in > that sense-- you have to do something per partition.
> 
> For a failed broker and swapped with an empty broker, when it comes up, 
> it will not have any leadership, and we would like it to remain not 
> having leaderships for a couple of hours or days. So there is no 
> preferred leader election needed which incurs O(N) operation in this 
> case.  Putting the preferred leader blacklist would safe guard this 
> broker serving traffic during that time. otherwise, if another broker 
> fails(if this broker is the 1st, 2nd in the assignment), or someone 
> runs preferred leader election, this new "empty" broker can still get 
> leaderships. 
> 
> Also running reassignment to change the ordering of preferred leader 
> would not actually switch the leader automatically.  e.g.  (1,2,3) => 
> (2,3,1). unless preferred leader election is run to switch current 
> leader from 1 to 2.  So the operation is at least 2 x O(N).  and then 
> after the broker is back to normal, another 2 x O(N) to rollback. 

Hi George,

Hmm.  I guess I'm still on the fence about this feature.

In your example, I think we're comparing apples and oranges.  You started by 
outlining a scenario where "an empty broker... comes up... [without] any 
leadership[s]."  But then you criticize using reassignment to switch the order 
of preferred replicas because it "would not actually switch the leader 
automatically."  If the empty broker doesn't have any leaderships, there is 
nothing to be switched, right?

> 
> 
> > In general, reassignment will get a lot easier and quicker once KIP-455 is
> > implemented.  Reassignments that just change the order of preferred
> > replicas for a specific partition should complete pretty much instantly.
> >
> > I think it's simpler and easier just to have one source of truth for what
> > the preferred replica is for a partition, rather than two.  So for me,
> > the fact that the replica assignment ordering isn't changed is actually a
> > big disadvantage of this KIP.  If you are a new user (or just an
> > existing user that didn't read all of the documentation) and you just look
> > at the replica assignment, you might be confused by why a particular
> > broker wasn't getting any leaderships, even though it appeared like it
> > should.  More mechanisms mean more complexity for users and developers
> > most of the time.
> 
> 
> I would like to stress the point that running a reassignment to change the
> ordering of the replicas (putting a broker at the end of the partition
> assignment) is unnecessary, because after some time the broker is
> caught up and can start serving traffic, and then reassignments need to be
> run again to "roll back" to the previous state. As I mentioned in
> KIP-491, this is just tedious work.

In general, using an external rebalancing tool like Cruise Control is a good 
idea to keep things balanced without having to deal with manual rebalancing.  We 
expect more and more people who have a complex or large cluster will start 
using tools like this.

However, if you choose to do manual rebalancing, it shouldn't be that bad.  You 
would save the existing partition ordering before making your changes, then 
make your changes (perhaps by running a simple command line tool that switches 
the order of the replicas).  Then, once you felt like the broker was ready to 
serve traffic, you could just re-apply the old ordering which you had saved.

> 
> I agree this might introduce some complexities for users/developers. 
> But if this feature is good, and well documented, it is good for the 
> kafka product/community.  Just like KIP-460 enabling unclean leader 
> election to override TopicLevel/Broker Level config of 
> `unclean.leader.election.enable`
> 
> > I agree that it would be nice if we could treat some brokers differently
> > for the purposes of placing replicas, selecting leaders, etc.  Right now,
> > we don't have any way of implementing that without forking the broker.  I
> > would support a new PlacementPolicy class that would close this gap.  But
> > I don't think this KIP is flexible enough to fill this role.  For example,
> > it can't prevent users from creating new single-replica topics that get
> > put on the "bad" replica.  Perhaps we should reopen the discussion about
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces
> 
> Creating topic with single-replica is beyond what KIP-491 is trying to 
> achieve.  The user needs to take responsibility of doing that. I do see 
> some Samza clients notoriously creating single-replica topics and that 
> got flagged by 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-07 Thread George Li
 Hi Colin,

Thanks for your feedbacks.  Comments below:
> Even if you have a way of blacklisting an entire broker all at once, you
> still would need to run a leader election for each partition where you want
> to move the leader off of the blacklisted broker.  So the operation is still
> O(N) in that sense-- you have to do something per partition.

For a failed broker swapped with an empty broker: when it comes up, it will
not have any leadership, and we would like it to remain without leaderships
for a couple of hours or days. So there is no preferred leader election needed,
which would incur an O(N) operation in this case.  Putting the preferred leader
blacklist in place would safeguard this broker from serving traffic during that
time; otherwise, if another broker fails (if this broker is 1st or 2nd in the
assignment), or someone runs preferred leader election, this new "empty" broker
can still get leaderships.

Also, running a reassignment to change the ordering of the preferred leader would
not actually switch the leader automatically, e.g. (1,2,3) => (2,3,1), unless
preferred leader election is run to switch the current leader from 1 to 2.  So the
operation is at least 2 x O(N), and then after the broker is back to normal,
another 2 x O(N) to roll back.


> In general, reassignment will get a lot easier and quicker once KIP-455 is
> implemented.  Reassignments that just change the order of preferred
> replicas for a specific partition should complete pretty much instantly.
>
> I think it's simpler and easier just to have one source of truth for what
> the preferred replica is for a partition, rather than two.  So for me, the
> fact that the replica assignment ordering isn't changed is actually a big
> disadvantage of this KIP.  If you are a new user (or just an existing user
> that didn't read all of the documentation) and you just look at the replica
> assignment, you might be confused by why a particular broker wasn't getting
> any leaderships, even though it appeared like it should.  More mechanisms
> mean more complexity for users and developers most of the time.


I would like to stress the point that running a reassignment to change the ordering
of the replicas (putting a broker at the end of the partition assignment) is
unnecessary, because after some time the broker is caught up and can start
serving traffic, and then reassignments need to be run again to "roll back" to the
previous state. As I mentioned in KIP-491, this is just tedious work.

I agree this might introduce some complexities for users/developers. But if
this feature is good and well documented, it is good for the Kafka
product/community, just like KIP-460 enabling unclean leader election to
override the Topic-level/Broker-level config of `unclean.leader.election.enable`.

> I agree that it would be nice if we could treat some brokers differently for
> the purposes of placing replicas, selecting leaders, etc.  Right now, we
> don't have any way of implementing that without forking the broker.  I would
> support a new PlacementPolicy class that would close this gap.  But I don't
> think this KIP is flexible enough to fill this role.  For example, it can't
> prevent users from creating new single-replica topics that get put on the
> "bad" replica.  Perhaps we should reopen the discussion about
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces

Creating a topic with a single replica is beyond what KIP-491 is trying to achieve.
The user needs to take responsibility for doing that. I do see some Samza
clients notoriously creating single-replica topics, and that got flagged by alerts,
because a single broker being down or in maintenance will cause offline partitions.
With the KIP-491 preferred leader "blacklist", a single-replica partition will still
have the blacklisted broker serve as leader, because there is no other alternative
replica to be chosen as leader.

Even with a new PlacementPolicy for topic creation/partition expansion, it
still needs the blacklist info (e.g. a ZK path/node, or a broker-level/topic-level
config) to "blacklist" the broker from being preferred leader. Would that be the
same as what KIP-491 is introducing?


Thanks,
George

On Wednesday, August 7, 2019, 11:01:51 AM PDT, Colin McCabe 
 wrote:  
 
 On Fri, Aug 2, 2019, at 20:02, George Li wrote:
>  Hi Colin,
> Thanks for looking into this KIP.  Sorry for the late response. been busy. 
> 
> If a cluster has MANY topic partitions, moving this "blacklist" broker 
> to the end of replica list is still a rather "big" operation, involving 
> submitting reassignments.  The KIP-491 way of blacklist is much 
> simpler/easier and can undo easily without changing the replica 
> assignment ordering. 

Hi George,

Even if you have a way of blacklisting an entire broker all at once, you still 
would need to run a leader election for each partition where you want to move 
the leader off of the blacklisted broker.  So the operation is still O(N) in 
that sense-- you 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-07 Thread Colin McCabe
On Fri, Aug 2, 2019, at 20:02, George Li wrote:
>  Hi Colin,
> Thanks for looking into this KIP.  Sorry for the late response. been busy. 
> 
> If a cluster has MANY topic partitions, moving this "blacklist" broker 
> to the end of replica list is still a rather "big" operation, involving 
> submitting reassignments.  The KIP-491 way of blacklist is much 
> simpler/easier and can undo easily without changing the replica 
> assignment ordering. 

Hi George,

Even if you have a way of blacklisting an entire broker all at once, you still 
would need to run a leader election for each partition where you want to move 
the leader off of the blacklisted broker.  So the operation is still O(N) in 
that sense-- you have to do something per partition.

In general, reassignment will get a lot easier and quicker once KIP-455 is 
implemented.  Reassignments that just change the order of preferred replicas 
for a specific partition should complete pretty much instantly.

I think it's simpler and easier just to have one source of truth for what the 
preferred replica is for a partition, rather than two.  So for me, the fact 
that the replica assignment ordering isn't changed is actually a big 
disadvantage of this KIP.  If you are a new user (or just an existing user that 
didn't read all of the documentation) and you just look at the replica 
assignment, you might be confused by why a particular broker wasn't getting any 
leaderships, even  though it appeared like it should.  More mechanisms mean 
more complexity for users and developers most of the time.

> Major use case for me, a failed broker got swapped with new hardware, 
> and starts up as empty (with latest offset of all partitions), the SLA 
> of retention is 1 day, so before this broker is up to be in-sync for 1 
> day, we would like to blacklist this broker from serving traffic. after 
> 1 day, the blacklist is removed and run preferred leader election.  
> This way, no need to run reassignments before/after.  This is the 
> "temporary" use-case.

What if we just add an option to the reassignment tool to generate a plan to 
move all the leaders off of a specific broker?  The tool could also run a 
leader election as well.  That would be a simple way of doing this without 
adding new mechanisms or broker-side configurations, etc.
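(For illustration only: no such option exists in kafka-reassign-partitions.sh today.
A rough sketch of what such a tool option could do with the AdminClient, assuming the
KIP-455 reassignment API: demote a given broker to the end of every replica list it
appears in, then trigger preferred leader election.)

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.TopicPartitionInfo;

public class DemoteBrokerLeaders {
  public static void main(String[] args) throws Exception {
    int target = 1001;  // hypothetical broker id to demote
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (Admin admin = Admin.create(props)) {
      Set<String> topics = admin.listTopics().names().get();
      Map<TopicPartition, Optional<NewPartitionReassignment>> plan = new HashMap<>();
      Set<TopicPartition> toElect = new HashSet<>();

      for (TopicDescription td : admin.describeTopics(topics).all().get().values()) {
        for (TopicPartitionInfo p : td.partitions()) {
          List<Integer> replicas =
              p.replicas().stream().map(Node::id).collect(Collectors.toList());
          int pos = replicas.indexOf(target);
          if (pos < 0 || pos == replicas.size() - 1) {
            continue;  // target is not a replica, or is already last: nothing to do
          }
          List<Integer> reordered = new ArrayList<>(replicas);
          reordered.remove(pos);
          reordered.add(target);  // same replica set, target moved to lowest priority
          TopicPartition tp = new TopicPartition(td.name(), p.partition());
          plan.put(tp, Optional.of(new NewPartitionReassignment(reordered)));
          toElect.add(tp);
        }
      }
      if (!plan.isEmpty()) {
        admin.alterPartitionReassignments(plan).all().get();
        admin.electLeaders(ElectionType.PREFERRED, toElect).partitions().get();
      }
    }
  }
}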

> 
> There are use-cases that this Preferred Leader "blacklist" can be 
> somewhat permanent, as I explained in the AWS data center instances Vs. 
> on-premises data center bare metal machines (heterogenous hardware), 
> that the AWS broker_ids will be blacklisted.  So new topics created,  
> or existing topic expansion would not make them serve traffic even they 
> could be the preferred leader. 

I agree that it would be nice if we could treat some brokers differently for 
the purposes of placing replicas, selecting leaders, etc.  Right now, we don't 
have any way of implementing that without forking the broker.  I would support 
a new PlacementPolicy class that would close this gap.  But I don't think this 
KIP is flexible enough to fill this role.  For example, it can't prevent users 
from creating new single-replica topics that get put on the "bad" replica.  
Perhaps we should reopen the discussion about 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-201%3A+Rationalising+Policy+interfaces

regards,
Colin

> 
> Please let me know there are more question. 
> 
> 
> Thanks,
> George
> 
> On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe 
>  wrote:  
>  
>  We still want to give the "blacklisted" broker the leadership if 
> nobody else is available.  Therefore, isn't putting a broker on the 
> blacklist pretty much the same as moving it to the last entry in the 
> replicas list and then triggering a preferred leader election?
> 
> If we want this to be undone after a certain amount of time, or under 
> certain conditions, that seems like something that would be more 
> effectively done by an external system, rather than putting all these 
> policies into Kafka.
> 
> best,
> Colin
> 
> 
> On Fri, Jul 19, 2019, at 18:23, George Li wrote:
> >  Hi Satish,
> > Thanks for the reviews and feedbacks.
> > 
> > > > The following is the requirements this KIP is trying to accomplish:
> > > This can be moved to the"Proposed changes" section.
> > 
> > Updated the KIP-491. 
> > 
> > > >>The logic to determine the priority/order of which broker should be
> > > preferred leader should be modified.  The broker in the preferred leader
> > > blacklist should be moved to the end (lowest priority) when
> > > determining leadership.
> > >
> > > I believe there is no change required in the ordering of the preferred
> > > replica list. Brokers in the preferred leader blacklist are skipped
> > > until other brokers int he list are unavailable.
> > 
> > Yes. partition assignment remained the same, replica & ordering. The 
> > blacklist logic can be optimized during implementation. 
> > 
> > > >>The blacklist can be at the broker level. 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-07 Thread Satish Duggana
Hi George,
Thanks for addressing the comments. I do not have any more questions.

On Wed, Aug 7, 2019 at 11:08 AM George Li
 wrote:
>
>  Hi Colin, Satish, Stanislav,
>
> Did I answer all your comments/concerns for KIP-491 ?  Please let me know if 
> you have more questions regarding this feature.  I would like to start coding 
> soon. I hope this feature can get into the open source trunk so every time we 
> upgrade Kafka in our environment, we don't need to cherry pick this.
>
> BTW, I have added below in KIP-491 for auto.leader.rebalance.enable behavior 
> with the new Preferred Leader "Blacklist".
>
> "When auto.leader.rebalance.enable is enabled.  The broker(s) in the 
> preferred leader "blacklist" should be excluded from being elected leaders. "
>
>
> Thanks,
> George
>
> On Friday, August 2, 2019, 08:02:07 PM PDT, George Li 
>  wrote:
>
>   Hi Colin,
> Thanks for looking into this KIP.  Sorry for the late response. been busy.
>
> If a cluster has MANY topic partitions, moving this "blacklist" broker to the 
> end of replica list is still a rather "big" operation, involving submitting 
> reassignments.  The KIP-491 way of blacklist is much simpler/easier and can 
> undo easily without changing the replica assignment ordering.
> Major use case for me, a failed broker got swapped with new hardware, and 
> starts up as empty (with latest offset of all partitions), the SLA of 
> retention is 1 day, so before this broker is up to be in-sync for 1 day, we 
> would like to blacklist this broker from serving traffic. after 1 day, the 
> blacklist is removed and run preferred leader election.  This way, no need to 
> run reassignments before/after.  This is the "temporary" use-case.
>
> There are use-cases that this Preferred Leader "blacklist" can be somewhat 
> permanent, as I explained in the AWS data center instances Vs. on-premises 
> data center bare metal machines (heterogenous hardware), that the AWS 
> broker_ids will be blacklisted.  So new topics created,  or existing topic 
> expansion would not make them serve traffic even they could be the preferred 
> leader.
>
> Please let me know there are more question.
>
>
> Thanks,
> George
>
> On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe 
>  wrote:
>
>  We still want to give the "blacklisted" broker the leadership if nobody else 
> is available.  Therefore, isn't putting a broker on the blacklist pretty much 
> the same as moving it to the last entry in the replicas list and then 
> triggering a preferred leader election?
>
> If we want this to be undone after a certain amount of time, or under certain 
> conditions, that seems like something that would be more effectively done by 
> an external system, rather than putting all these policies into Kafka.
>
> best,
> Colin
>
>
> On Fri, Jul 19, 2019, at 18:23, George Li wrote:
> >  Hi Satish,
> > Thanks for the reviews and feedbacks.
> >
> > > > The following is the requirements this KIP is trying to accomplish:
> > > This can be moved to the"Proposed changes" section.
> >
> > Updated the KIP-491.
> >
> > > >>The logic to determine the priority/order of which broker should be
> > > preferred leader should be modified.  The broker in the preferred leader
> > > blacklist should be moved to the end (lowest priority) when
> > > determining leadership.
> > >
> > > I believe there is no change required in the ordering of the preferred
> > > replica list. Brokers in the preferred leader blacklist are skipped
> > > until other brokers int he list are unavailable.
> >
> > Yes. partition assignment remained the same, replica & ordering. The
> > blacklist logic can be optimized during implementation.
> >
> > > >>The blacklist can be at the broker level. However, there might be use 
> > > >>cases
> > > where a specific topic should blacklist particular brokers, which
> > > would be at the
> > > Topic level Config. For this use cases of this KIP, it seems that broker 
> > > level
> > > blacklist would suffice.  Topic level preferred leader blacklist might
> > > be future enhancement work.
> > >
> > > I agree that the broker level preferred leader blacklist would be
> > > sufficient. Do you have any use cases which require topic level
> > > preferred blacklist?
> >
> >
> >
> > I don't have any concrete use cases for Topic level preferred leader
> > blacklist.  One scenarios I can think of is when a broker has high CPU
> > usage, trying to identify the big topics (High MsgIn, High BytesIn,
> > etc), then try to move the leaders away from this broker,  before doing
> > an actual reassignment to change its preferred leader,  try to put this
> > preferred_leader_blacklist in the Topic Level config, and run preferred
> > leader election, and see whether CPU decreases for this broker,  if
> > yes, then do the reassignments to change the preferred leaders to be
> > "permanent" (the topic may have many partitions like 256 that has quite
> > a few of them having this broker as preferred leader).  So this 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-06 Thread George Li
 Hi Colin, Satish, Stanislav, 

Did I answer all your comments/concerns for KIP-491 ?  Please let me know if 
you have more questions regarding this feature.  I would like to start coding 
soon. I hope this feature can get into the open source trunk so every time we 
upgrade Kafka in our environment, we don't need to cherry pick this.

BTW, I have added the following to KIP-491 for the auto.leader.rebalance.enable
behavior with the new Preferred Leader "Blacklist":

"When auto.leader.rebalance.enable is enabled, the broker(s) in the preferred
leader "blacklist" should be excluded from being elected leaders."
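(For illustration, a rough sketch, not actual controller code, of the kind of
per-partition check the automatic leader balancing could apply, assuming the
controller has the set of blacklisted broker ids available:)

import java.util.List;
import java.util.Set;

final class AutoRebalanceCheck {

  // Hedged sketch, not Kafka's actual controller logic.  Returns true if
  // auto.leader.rebalance should trigger a preferred leader election for this
  // partition: the effective preferred replica (first in-sync, non-blacklisted
  // broker in the assignment) exists and is not the current leader.
  static boolean shouldRebalance(List<Integer> assignment, Set<Integer> isr,
                                 int currentLeader, Set<Integer> blacklist) {
    for (int broker : assignment) {
      if (blacklist.contains(broker) || !isr.contains(broker)) {
        continue;  // skip deprioritized or out-of-sync brokers
      }
      return broker != currentLeader;  // first eligible broker is the effective preferred leader
    }
    return false;  // only blacklisted or out-of-sync candidates: leave leadership alone
  }
}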


Thanks,
George

On Friday, August 2, 2019, 08:02:07 PM PDT, George Li 
 wrote:  
 
  Hi Colin,
Thanks for looking into this KIP.  Sorry for the late response. been busy. 

If a cluster has MANY topic partitions, moving this "blacklist" broker to the 
end of replica list is still a rather "big" operation, involving submitting 
reassignments.  The KIP-491 way of blacklist is much simpler/easier and can 
undo easily without changing the replica assignment ordering. 
Major use case for me, a failed broker got swapped with new hardware, and 
starts up as empty (with latest offset of all partitions), the SLA of retention 
is 1 day, so before this broker is up to be in-sync for 1 day, we would like to 
blacklist this broker from serving traffic. after 1 day, the blacklist is 
removed and run preferred leader election.  This way, no need to run 
reassignments before/after.  This is the "temporary" use-case.

There are use-cases that this Preferred Leader "blacklist" can be somewhat 
permanent, as I explained in the AWS data center instances Vs. on-premises data 
center bare metal machines (heterogenous hardware), that the AWS broker_ids 
will be blacklisted.  So new topics created,  or existing topic expansion would 
not make them serve traffic even they could be the preferred leader. 

Please let me know if there are more questions.


Thanks,
George

    On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe 
 wrote:  
 
 We still want to give the "blacklisted" broker the leadership if nobody else 
is available.  Therefore, isn't putting a broker on the blacklist pretty much 
the same as moving it to the last entry in the replicas list and then 
triggering a preferred leader election?

If we want this to be undone after a certain amount of time, or under certain 
conditions, that seems like something that would be more effectively done by an 
external system, rather than putting all these policies into Kafka.

best,
Colin


On Fri, Jul 19, 2019, at 18:23, George Li wrote:
>  Hi Satish,
> Thanks for the reviews and feedbacks.
> 
> > > The following is the requirements this KIP is trying to accomplish:
> > This can be moved to the"Proposed changes" section.
> 
> Updated the KIP-491. 
> 
> > >>The logic to determine the priority/order of which broker should be
> > preferred leader should be modified.  The broker in the preferred leader
> > blacklist should be moved to the end (lowest priority) when
> > determining leadership.
> >
> > I believe there is no change required in the ordering of the preferred
> > replica list. Brokers in the preferred leader blacklist are skipped
> > until other brokers int he list are unavailable.
> 
> Yes. partition assignment remained the same, replica & ordering. The 
> blacklist logic can be optimized during implementation. 
> 
> > >>The blacklist can be at the broker level. However, there might be use 
> > >>cases
> > where a specific topic should blacklist particular brokers, which
> > would be at the
> > Topic level Config. For this use cases of this KIP, it seems that broker 
> > level
> > blacklist would suffice.  Topic level preferred leader blacklist might
> > be future enhancement work.
> > 
> > I agree that the broker level preferred leader blacklist would be
> > sufficient. Do you have any use cases which require topic level
> > preferred blacklist?
> 
> 
> 
> I don't have any concrete use cases for Topic level preferred leader 
> blacklist.  One scenarios I can think of is when a broker has high CPU 
> usage, trying to identify the big topics (High MsgIn, High BytesIn, 
> etc), then try to move the leaders away from this broker,  before doing 
> an actual reassignment to change its preferred leader,  try to put this 
> preferred_leader_blacklist in the Topic Level config, and run preferred 
> leader election, and see whether CPU decreases for this broker,  if 
> yes, then do the reassignments to change the preferred leaders to be 
> "permanent" (the topic may have many partitions like 256 that has quite 
> a few of them having this broker as preferred leader).  So this Topic 
> Level config is an easy way of doing trial and check the result. 
> 
> 
> > You can add the below workaround as an item in the rejected alternatives 
> > section
> > "Reassigning all the topic/partitions which the intended broker is a
> > replica for."
> 
> Updated the KIP-491. 
> 
> 
> 
> 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-08-02 Thread George Li
 Hi Colin,
Thanks for looking into this KIP.  Sorry for the late response. been busy. 

If a cluster has MANY topic partitions, moving this "blacklist" broker to the 
end of replica list is still a rather "big" operation, involving submitting 
reassignments.  The KIP-491 way of blacklist is much simpler/easier and can 
undo easily without changing the replica assignment ordering. 
Major use case for me: a failed broker gets swapped with new hardware and
starts up empty (with the latest offset of all partitions). The SLA of retention
is 1 day, so before this broker has been in sync for 1 day, we would like to
blacklist this broker from serving traffic; after 1 day, the blacklist is
removed and preferred leader election is run.  This way, no need to run
reassignments before/after.  This is the "temporary" use-case.

There are use-cases where this Preferred Leader "blacklist" can be somewhat
permanent, as I explained in the AWS data center instances vs. on-premises data
center bare metal machines (heterogeneous hardware) case: the AWS broker_ids
will be blacklisted.  So new topics created, or existing topics expanded, would
not have them serve traffic even though they could be the preferred leader.

Please let me know if there are more questions.


Thanks,
George

On Thursday, July 25, 2019, 08:38:28 AM PDT, Colin McCabe 
 wrote:  
 
 We still want to give the "blacklisted" broker the leadership if nobody else 
is available.  Therefore, isn't putting a broker on the blacklist pretty much 
the same as moving it to the last entry in the replicas list and then 
triggering a preferred leader election?

If we want this to be undone after a certain amount of time, or under certain 
conditions, that seems like something that would be more effectively done by an 
external system, rather than putting all these policies into Kafka.

best,
Colin


On Fri, Jul 19, 2019, at 18:23, George Li wrote:
>  Hi Satish,
> Thanks for the reviews and feedbacks.
> 
> > > The following is the requirements this KIP is trying to accomplish:
> > This can be moved to the"Proposed changes" section.
> 
> Updated the KIP-491. 
> 
> > >>The logic to determine the priority/order of which broker should be
> > preferred leader should be modified.  The broker in the preferred leader
> > blacklist should be moved to the end (lowest priority) when
> > determining leadership.
> >
> > I believe there is no change required in the ordering of the preferred
> > replica list. Brokers in the preferred leader blacklist are skipped
> > until other brokers int he list are unavailable.
> 
> Yes. partition assignment remained the same, replica & ordering. The 
> blacklist logic can be optimized during implementation. 
> 
> > >>The blacklist can be at the broker level. However, there might be use 
> > >>cases
> > where a specific topic should blacklist particular brokers, which
> > would be at the
> > Topic level Config. For this use cases of this KIP, it seems that broker 
> > level
> > blacklist would suffice.  Topic level preferred leader blacklist might
> > be future enhancement work.
> > 
> > I agree that the broker level preferred leader blacklist would be
> > sufficient. Do you have any use cases which require topic level
> > preferred blacklist?
> 
> 
> 
> I don't have any concrete use cases for Topic level preferred leader 
> blacklist.  One scenarios I can think of is when a broker has high CPU 
> usage, trying to identify the big topics (High MsgIn, High BytesIn, 
> etc), then try to move the leaders away from this broker,  before doing 
> an actual reassignment to change its preferred leader,  try to put this 
> preferred_leader_blacklist in the Topic Level config, and run preferred 
> leader election, and see whether CPU decreases for this broker,  if 
> yes, then do the reassignments to change the preferred leaders to be 
> "permanent" (the topic may have many partitions like 256 that has quite 
> a few of them having this broker as preferred leader).  So this Topic 
> Level config is an easy way of doing trial and check the result. 
> 
> 
> > You can add the below workaround as an item in the rejected alternatives 
> > section
> > "Reassigning all the topic/partitions which the intended broker is a
> > replica for."
> 
> Updated the KIP-491. 
> 
> 
> 
> Thanks, 
> George
> 
>    On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana 
>  wrote:  
>  
>  Thanks for the KIP. I have put my comments below.
> 
> This is a nice improvement to avoid cumbersome maintenance.
> 
> >> The following is the requirements this KIP is trying to accomplish:
>   The ability to add and remove the preferred leader deprioritized
> list/blacklist. e.g. new ZK path/node or new dynamic config.
> 
> This can be moved to the"Proposed changes" section.
> 
> >>The logic to determine the priority/order of which broker should be
> preferred leader should be modified.  The broker in the preferred leader
> blacklist should be moved to the end (lowest priority) when
> 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-25 Thread Colin McCabe
We still want to give the "blacklisted" broker the leadership if nobody else is 
available.  Therefore, isn't putting a broker on the blacklist pretty much the 
same as moving it to the last entry in the replicas list and then triggering a 
preferred leader election?

If we want this to be undone after a certain amount of time, or under certain 
conditions, that seems like something that would be more effectively done by an 
external system, rather than putting all these policies into Kafka.

best,
Colin


On Fri, Jul 19, 2019, at 18:23, George Li wrote:
>  Hi Satish,
> Thanks for the reviews and feedbacks.
> 
> > > The following is the requirements this KIP is trying to accomplish:
> > This can be moved to the"Proposed changes" section.
> 
> Updated the KIP-491. 
> 
> > >>The logic to determine the priority/order of which broker should be
> > preferred leader should be modified.  The broker in the preferred leader
> > blacklist should be moved to the end (lowest priority) when
> > determining leadership.
> >
> > I believe there is no change required in the ordering of the preferred
> > replica list. Brokers in the preferred leader blacklist are skipped
> > until other brokers int he list are unavailable.
> 
> Yes. partition assignment remained the same, replica & ordering. The 
> blacklist logic can be optimized during implementation. 
> 
> > >>The blacklist can be at the broker level. However, there might be use 
> > >>cases
> > where a specific topic should blacklist particular brokers, which
> > would be at the
> > Topic level Config. For this use cases of this KIP, it seems that broker 
> > level
> > blacklist would suffice.  Topic level preferred leader blacklist might
> > be future enhancement work.
> > 
> > I agree that the broker level preferred leader blacklist would be
> > sufficient. Do you have any use cases which require topic level
> > preferred blacklist?
> 
> 
> 
> I don't have any concrete use cases for Topic level preferred leader 
> blacklist.  One scenarios I can think of is when a broker has high CPU 
> usage, trying to identify the big topics (High MsgIn, High BytesIn, 
> etc), then try to move the leaders away from this broker,  before doing 
> an actual reassignment to change its preferred leader,  try to put this 
> preferred_leader_blacklist in the Topic Level config, and run preferred 
> leader election, and see whether CPU decreases for this broker,  if 
> yes, then do the reassignments to change the preferred leaders to be 
> "permanent" (the topic may have many partitions like 256 that has quite 
> a few of them having this broker as preferred leader).  So this Topic 
> Level config is an easy way of doing trial and check the result. 
> 
> 
> > You can add the below workaround as an item in the rejected alternatives 
> > section
> > "Reassigning all the topic/partitions which the intended broker is a
> > replica for."
> 
> Updated the KIP-491. 
> 
> 
> 
> Thanks, 
> George
> 
> On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana 
>  wrote:  
>  
>  Thanks for the KIP. I have put my comments below.
> 
> This is a nice improvement to avoid cumbersome maintenance.
> 
> >> The following is the requirements this KIP is trying to accomplish:
>   The ability to add and remove the preferred leader deprioritized
> list/blacklist. e.g. new ZK path/node or new dynamic config.
> 
> This can be moved to the"Proposed changes" section.
> 
> >>The logic to determine the priority/order of which broker should be
> preferred leader should be modified.  The broker in the preferred leader
> blacklist should be moved to the end (lowest priority) when
> determining leadership.
> 
> I believe there is no change required in the ordering of the preferred
> replica list. Brokers in the preferred leader blacklist are skipped
> until other brokers int he list are unavailable.
> 
> >>The blacklist can be at the broker level. However, there might be use cases
> where a specific topic should blacklist particular brokers, which
> would be at the
> Topic level Config. For this use cases of this KIP, it seems that broker level
> blacklist would suffice.  Topic level preferred leader blacklist might
> be future enhancement work.
> 
> I agree that the broker level preferred leader blacklist would be
> sufficient. Do you have any use cases which require topic level
> preferred blacklist?
> 
> You can add the below workaround as an item in the rejected alternatives 
> section
> "Reassigning all the topic/partitions which the intended broker is a
> replica for."
> 
> Thanks,
> Satish.
> 
> On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski
>  wrote:
> >
> > Hey George,
> >
> > Thanks for the KIP, it's an interesting idea.
> >
> > I was wondering whether we could achieve the same thing via the
> > kafka-reassign-partitions tool. As you had also said in the JIRA,  it is
> > true that this is currently very tedious with the tool. My thoughts are
> > that we could improve the tool and give it the notion of 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-19 Thread George Li
 Hi Satish,
Thanks for the reviews and feedbacks.

> > The following is the requirements this KIP is trying to accomplish:
> This can be moved to the"Proposed changes" section.

Updated the KIP-491. 

> >>The logic to determine the priority/order of which broker should be
> preferred leader should be modified.  The broker in the preferred leader
> blacklist should be moved to the end (lowest priority) when
> determining leadership.
>
> I believe there is no change required in the ordering of the preferred
> replica list. Brokers in the preferred leader blacklist are skipped
> until other brokers int he list are unavailable.

Yes, the partition assignment remains the same (replicas & ordering). The
blacklist logic can be optimized during implementation.

> >>The blacklist can be at the broker level. However, there might be use cases
> where a specific topic should blacklist particular brokers, which
> would be at the
> Topic level Config. For this use cases of this KIP, it seems that broker level
> blacklist would suffice.  Topic level preferred leader blacklist might
> be future enhancement work.
> 
> I agree that the broker level preferred leader blacklist would be
> sufficient. Do you have any use cases which require topic level
> preferred blacklist?



I don't have any concrete use cases for a Topic-level preferred leader blacklist.
One scenario I can think of: when a broker has high CPU usage, try to identify the
big topics (high MsgIn, high BytesIn, etc.) and move their leaders away from this
broker. Before doing an actual reassignment to change the preferred leader, put this
preferred_leader_blacklist in the Topic-level config, run preferred leader election,
and see whether CPU decreases for this broker. If yes, then run the reassignments to
change the preferred leaders "permanently" (the topic may have many partitions, like
256, quite a few of which have this broker as preferred leader).  So this Topic-level
config is an easy way of doing a trial and checking the result.
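For illustration, a rough sketch of that trial with the Java AdminClient; the
topic-level "preferred.leader.blacklist" config is hypothetical, since the
Topic-level config is only a possible future enhancement of KIP-491:

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.config.ConfigResource;

public class TopicLevelBlacklistTrial {
  public static void main(String[] args) throws Exception {
    String topic = "big-topic";      // hypothetical hot topic
    String busyBroker = "1001";      // hypothetical broker with high CPU
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (Admin admin = Admin.create(props)) {
      // Blacklist the broker as preferred leader for this topic only
      // ("preferred.leader.blacklist" at the topic level is a hypothetical config).
      ConfigResource res = new ConfigResource(ConfigResource.Type.TOPIC, topic);
      Map<ConfigResource, Collection<AlterConfigOp>> ops = new HashMap<>();
      ops.put(res, Collections.singletonList(new AlterConfigOp(
          new ConfigEntry("preferred.leader.blacklist", busyBroker), AlterConfigOp.OpType.SET)));
      admin.incrementalAlterConfigs(ops).all().get();

      // Run preferred leader election for the topic's partitions, then watch the
      // broker's CPU; if it drops, make the change permanent via reassignment.
      Set<TopicPartition> partitions = admin.describeTopics(Collections.singleton(topic))
          .all().get().get(topic).partitions().stream()
          .map(p -> new TopicPartition(topic, p.partition()))
          .collect(Collectors.toSet());
      admin.electLeaders(ElectionType.PREFERRED, partitions).partitions().get();
    }
  }
}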


> You can add the below workaround as an item in the rejected alternatives 
> section
> "Reassigning all the topic/partitions which the intended broker is a
> replica for."

Updated the KIP-491. 



Thanks, 
George

On Friday, July 19, 2019, 08:20:22 AM PDT, Satish Duggana 
 wrote:  
 
 Thanks for the KIP. I have put my comments below.

This is a nice improvement to avoid cumbersome maintenance.

>> The following is the requirements this KIP is trying to accomplish:
  The ability to add and remove the preferred leader deprioritized
list/blacklist. e.g. new ZK path/node or new dynamic config.

This can be moved to the "Proposed changes" section.

>>The logic to determine the priority/order of which broker should be
preferred leader should be modified.  The broker in the preferred leader
blacklist should be moved to the end (lowest priority) when
determining leadership.

I believe there is no change required in the ordering of the preferred
replica list. Brokers in the preferred leader blacklist are skipped
until other brokers in the list are unavailable.

>>The blacklist can be at the broker level. However, there might be use cases
where a specific topic should blacklist particular brokers, which
would be at the
Topic level Config. For this use cases of this KIP, it seems that broker level
blacklist would suffice.  Topic level preferred leader blacklist might
be future enhancement work.

I agree that the broker level preferred leader blacklist would be
sufficient. Do you have any use cases which require topic level
preferred blacklist?

You can add the below workaround as an item in the rejected alternatives section
"Reassigning all the topic/partitions which the intended broker is a
replica for."

Thanks,
Satish.

On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski
 wrote:
>
> Hey George,
>
> Thanks for the KIP, it's an interesting idea.
>
> I was wondering whether we could achieve the same thing via the
> kafka-reassign-partitions tool. As you had also said in the JIRA,  it is
> true that this is currently very tedious with the tool. My thoughts are
> that we could improve the tool and give it the notion of a "blacklisted
> preferred leader".
> This would have some benefits like:
> - more fine-grained control over the blacklist. we may not want to
> blacklist all the preferred leaders, as that would make the blacklisted
> broker a follower of last resort which is not very useful. In the cases of
> an underpowered AWS machine or a controller, you might overshoot and make
> the broker very underutilized if you completely make it leaderless.
> - is not permanent. If we are to have a blacklist leaders config,
> rebalancing tools would also need to know about it and manipulate/respect
> it to achieve a fair balance.
> It seems like both problems are tied to balancing partitions, it's just
> that KIP-491's use case wants to balance them against other factors in a
> more nuanced way. It makes 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-19 Thread George Li
 Hi Stanislav,

Thanks for taking the time to review and give feedback.

The Preferred Leader "Blacklist" feature is meant to be temporary in most of the
use cases listed (I will explain a case below which might need to be "permanent").
It's a quick/easy way for the on-call engineer to take away leaderships from a
problem broker and mitigate Kafka cluster production issues.

The reassignment/rebalance is expensive, especially when it involves moving a
replica to a different broker. Even with the same replicas, changing the preferred
leader ordering requires running reassignments (batched and staggered when running
in production), and when the issue is resolved (e.g. the empty broker has caught up
with the retention time, the broker having hardware issues with poor performance
is replaced, the controller has switched, etc.), reassignments need to be run again
(either rolling back the previous reassignments or running a rebalance to generate
a new plan).  As you can see, this reassignment approach is more tedious.  With a
Preferred Leader blacklist, a broker can simply be added to and removed from the
list to take effect.

Below are some answers to your questions.

> - more fine-grained control over the blacklist. we may not want to
> blacklist all the preferred leaders, as that would make the blacklisted
> broker a follower of last resort which is not very useful. In the cases of
> an underpowered AWS machine or a controller, you might overshoot and make
> the broker very underutilized if you completely make it leaderless.

The current proposed change in KIP-491 is to have the Preferred Leader
Blacklist at the broker level, as it seems that it can satisfy most of the
use cases listed.  A fine-grained control feature can be added if there is a
need to have a preferred leader blacklist at the Topic level (e.g. a new
config at the topic level).

> - is not permanent. If we are to have a blacklist leaders config,
> rebalancing tools would also need to know about it and manipulate/respect
> it to achieve a fair balance.
> It seems like both problems are tied to balancing partitions, it's just
> that KIP-491's use case wants to balance them against other factors in a
> more nuanced way. It makes sense to have both be done from the same place


In most of the use cases, the preferred leader blacklist is temporary.  One case I
can think of that will be somewhat permanent is the cross-data-center case with
less powerful AWS instances. For some critical data which needs protection
against data loss in case of a whole-DC failure, we have 1 on-premise data
center and 2 AWS data centers.  The topic/partition replicas are spread across
these 3 DCs.

The Preferred Leader Blacklist will be somewhat permanent in this case.  Even if
we run reassignments to move all preferred leaders to the on-premises brokers
for existing topics, there are always new topics created and existing topics'
partitions getting expanded for capacity growth.  The new partitions' preferred
leaders are not guaranteed to be the on-premises brokers.  The topic management
(new/expand) code needs some info about blacklisted leaders, which is missing
now.  With the Preferred Leader Blacklist in place, we can make sure the AWS
DC instance brokers will not be serving traffic normally, unless the on-prem
brokers are down. It's a better safeguard for better performance.

> To make note of the motivation section:
> > Avoid bouncing broker in order to lose its leadership
> The recommended way to make a broker lose its leadership is to run a
> reassignment on its partitions


Understood.  This new preferred leader blacklist feature is trying to improve
on that and make it easier/cleaner/quicker to do.

> > The cross-data center cluster has AWS cloud instances which have less
> computing power
> We recommend running Kafka on homogeneous machines. It would be cool if the
> system supported more flexibility in that regard but that is more nuanced
> and a preferred leader blacklist may not be the best first approach to the
> issue
We are aware of the recommendation of not having heterogeneous hardware in the
Kafka cluster, but in this case it's more cost-efficient to use AWS than
spawning a new on-premise DC nearby with low latency.


> Adding a new config which can fundamentally change the way replication is
> done is complex, both for the system (the replication code is complex
> enough) and the user. Users would have another potential config that could
> backfire on them - e.g if left forgotten.


Actually, this proposed new dynamic config (e.g. preferred_leader_blacklist)
should not affect the replication code at all. It just provides more information
when leadership is determined (moving the brokers in the blacklist to the lowest
priority), whether during preferred leader election or when a failed broker's
leaders go to other live brokers.

Just like any other config, users need to understand exactly what the config
does, and add/remove the config according to the issues/situations

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-19 Thread Satish Duggana
Thanks for the KIP. I have put my comments below.

This is a nice improvement to avoid cumbersome maintenance.

>> The following is the requirements this KIP is trying to accomplish:
   The ability to add and remove the preferred leader deprioritized
list/blacklist. e.g. new ZK path/node or new dynamic config.

This can be moved to the "Proposed changes" section.

>>The logic to determine the priority/order of which broker should be
preferred leader should be modified.  The broker in the preferred leader
blacklist should be moved to the end (lowest priority) when
determining leadership.

I believe there is no change required in the ordering of the preferred
replica list. Brokers in the preferred leader blacklist are skipped
until other brokers in the list are unavailable.

>>The blacklist can be at the broker level. However, there might be use cases
where a specific topic should blacklist particular brokers, which
would be at the
Topic level Config. For this use cases of this KIP, it seems that broker level
blacklist would suffice.  Topic level preferred leader blacklist might
be future enhancement work.

I agree that the broker level preferred leader blacklist would be
sufficient. Do you have any use cases which require topic level
preferred blacklist?

You can add the below workaround as an item in the rejected alternatives section
"Reassigning all the topic/partitions which the intended broker is a
replica for."

Thanks,
Satish.

On Fri, Jul 19, 2019 at 7:33 AM Stanislav Kozlovski
 wrote:
>
> Hey George,
>
> Thanks for the KIP, it's an interesting idea.
>
> I was wondering whether we could achieve the same thing via the
> kafka-reassign-partitions tool. As you had also said in the JIRA,  it is
> true that this is currently very tedious with the tool. My thoughts are
> that we could improve the tool and give it the notion of a "blacklisted
> preferred leader".
> This would have some benefits like:
> - more fine-grained control over the blacklist. we may not want to
> blacklist all the preferred leaders, as that would make the blacklisted
> broker a follower of last resort which is not very useful. In the cases of
> an underpowered AWS machine or a controller, you might overshoot and make
> the broker very underutilized if you completely make it leaderless.
> - is not permanent. If we are to have a blacklist leaders config,
> rebalancing tools would also need to know about it and manipulate/respect
> it to achieve a fair balance.
> It seems like both problems are tied to balancing partitions, it's just
> that KIP-491's use case wants to balance them against other factors in a
> more nuanced way. It makes sense to have both be done from the same place
>
> To make note of the motivation section:
> > Avoid bouncing broker in order to lose its leadership
> The recommended way to make a broker lose its leadership is to run a
> reassignment on its partitions
> > The cross-data center cluster has AWS cloud instances which have less
> computing power
> We recommend running Kafka on homogeneous machines. It would be cool if the
> system supported more flexibility in that regard but that is more nuanced
> and a preferred leader blacklist may not be the best first approach to the
> issue
>
> Adding a new config which can fundamentally change the way replication is
> done is complex, both for the system (the replication code is complex
> enough) and the user. Users would have another potential config that could
> backfire on them - e.g if left forgotten.
>
> Could you think of any downsides to implementing this functionality (or a
> variation of it) in the kafka-reassign-partitions.sh tool?
> One downside I can see is that we would not have it handle new partitions
> created after the "blacklist operation". As a first iteration I think that
> may be acceptable
>
> Thanks,
> Stanislav
>
> On Fri, Jul 19, 2019 at 3:20 AM George Li 
> wrote:
>
> >  Hi,
> >
> > Pinging the list for the feedbacks of this KIP-491  (
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982
> > )
> >
> >
> > Thanks,
> > George
> >
> > On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <
> > sql_consult...@yahoo.com.INVALID> wrote:
> >
> >   Hi,
> >
> > I have created KIP-491 (
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982)
> > for putting a broker to the preferred leader blacklist or deprioritized
> > list so when determining leadership,  it's moved to the lowest priority for
> > some of the listed use-cases.
> >
> > Please provide your comments/feedbacks.
> >
> > Thanks,
> > George
> >
> >
> >
> >   - Forwarded Message - From: Jose Armando Garcia Sancio (JIRA) <
> > j...@apache.org>To: "sql_consult...@yahoo.com" 
> > Sent:
> Tuesday, July 9, 2019, 01:06:05 PM PDT  Subject: [jira] [Commented]
> > (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
> >
> > [
> > 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-19 Thread Stanislav Kozlovski
Hey George,

Thanks for the KIP, it's an interesting idea.

I was wondering whether we could achieve the same thing via the
kafka-reassign-partitions tool. As you had also said in the JIRA,  it is
true that this is currently very tedious with the tool. My thoughts are
that we could improve the tool and give it the notion of a "blacklisted
preferred leader".
This would have some benefits like:
- more fine-grained control over the blacklist. we may not want to
blacklist all the preferred leaders, as that would make the blacklisted
broker a follower of last resort which is not very useful. In the cases of
an underpowered AWS machine or a controller, you might overshoot and make
the broker very underutilized if you completely make it leaderless.
- is not permanent. If we are to have a blacklist leaders config,
rebalancing tools would also need to know about it and manipulate/respect
it to achieve a fair balance.
It seems like both problems are tied to balancing partitions, it's just
that KIP-491's use case wants to balance them against other factors in a
more nuanced way. It makes sense to have both be done from the same place

To make note of the motivation section:
> Avoid bouncing broker in order to lose its leadership
The recommended way to make a broker lose its leadership is to run a
reassignment on its partitions
> The cross-data center cluster has AWS cloud instances which have less
computing power
We recommend running Kafka on homogeneous machines. It would be cool if the
system supported more flexibility in that regard but that is more nuanced
and a preferred leader blacklist may not be the best first approach to the
issue

Adding a new config which can fundamentally change the way replication is
done is complex, both for the system (the replication code is complex
enough) and the user. Users would have another potential config that could
backfire on them - e.g if left forgotten.

Could you think of any downsides to implementing this functionality (or a
variation of it) in the kafka-reassign-partitions.sh tool?
One downside I can see is that we would not have it handle new partitions
created after the "blacklist operation". As a first iteration I think that
may be acceptable

Thanks,
Stanislav

On Fri, Jul 19, 2019 at 3:20 AM George Li 
wrote:

>  Hi,
>
> Pinging the list for the feedbacks of this KIP-491  (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982
> )
>
>
> Thanks,
> George
>
> On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li <
> sql_consult...@yahoo.com.INVALID> wrote:
>
>   Hi,
>
> I have created KIP-491 (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982)
> for putting a broker to the preferred leader blacklist or deprioritized
> list so when determining leadership,  it's moved to the lowest priority for
> some of the listed use-cases.
>
> Please provide your comments/feedbacks.
>
> Thanks,
> George
>
>
>
>   - Forwarded Message - From: Jose Armando Garcia Sancio (JIRA) <
> j...@apache.org>To: "sql_consult...@yahoo.com" Sent:
> Tuesday, July 9, 2019, 01:06:05 PM PDTSubject: [jira] [Commented]
> (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
>
> [
> https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881511#comment-16881511
> ]
>
> Jose Armando Garcia Sancio commented on KAFKA-8638:
> ---
>
> Thanks for feedback and clear use cases [~sql_consulting].
>
> > Preferred Leader Blacklist (deprioritized list)
> > ---
> >
> >Key: KAFKA-8638
> >URL: https://issues.apache.org/jira/browse/KAFKA-8638
> >Project: Kafka
> >  Issue Type: Improvement
> >  Components: config, controller, core
> >Affects Versions: 1.1.1, 2.3.0, 2.2.1
> >Reporter: GEORGE LI
> >Assignee: GEORGE LI
> >Priority: Major
> >
> > Currently, the kafka preferred leader election will pick the broker_id
> in the topic/partition replica assignments in a priority order when the
> broker is in ISR. The preferred leader is the broker id in the first
> position of replica. There are use-cases that, even the first broker in the
> replica assignment is in ISR, there is a need for it to be moved to the end
> of ordering (lowest priority) when deciding leadership during  preferred
> leader election.
> > Let’s use topic/partition replica (1,2,3) as an example. 1 is the
> preferred leader.  When preferred leadership is run, it will pick 1 as the
> leader if it's ISR, if 1 is not online and in ISR, then pick 2, if 2 is not
> in ISR, then pick 3 as the leader. There are use cases that, even 1 is in
> ISR, we would like it to be moved to the end of ordering (lowest priority)
> when deciding leadership during preferred leader election.  Below is a list
> of use cases:
> > * (If broker_id 1 is a swapped 

Re: [DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-18 Thread George Li
 Hi,

Pinging the list for the feedbacks of this KIP-491  
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) 


Thanks,
George

On Saturday, July 13, 2019, 08:43:25 PM PDT, George Li 
 wrote:  
 
  Hi,

I have created KIP-491 
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982) 
for putting a broker to the preferred leader blacklist or deprioritized list so 
when determining leadership,  it's moved to the lowest priority for some of the 
listed use-cases. 

Please provide your comments/feedbacks. 

Thanks,
George



  - Forwarded Message - From: Jose Armando Garcia Sancio (JIRA) 
To: "sql_consult...@yahoo.com" Sent: 
Tuesday, July 9, 2019, 01:06:05 PM PDT  Subject: [jira] [Commented] (KAFKA-8638) 
Preferred Leader Blacklist (deprioritized list)
 
    [ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881511#comment-16881511
 ] 

Jose Armando Garcia Sancio commented on KAFKA-8638:
---

Thanks for feedback and clear use cases [~sql_consulting].

> Preferred Leader Blacklist (deprioritized list)
> ---
>
>                Key: KAFKA-8638
>                URL: https://issues.apache.org/jira/browse/KAFKA-8638
>            Project: Kafka
>          Issue Type: Improvement
>          Components: config, controller, core
>    Affects Versions: 1.1.1, 2.3.0, 2.2.1
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> Currently, Kafka preferred leader election picks the broker_id from the
> topic/partition replica assignment in priority order when the broker is in
> ISR. The preferred leader is the broker id in the first position of the
> replica list. There are use cases where, even when the first broker in the
> replica assignment is in ISR, it needs to be moved to the end of the ordering
> (lowest priority) when deciding leadership during preferred leader election.
> Let’s use topic/partition replica (1,2,3) as an example; 1 is the preferred
> leader. When preferred leader election is run, it picks 1 as the leader if 1
> is online and in ISR; if not, it picks 2; if 2 is not in ISR, it picks 3 as
> the leader. There are use cases where, even if 1 is in ISR, we would like it
> to be moved to the end of the ordering (lowest priority) when deciding
> leadership during preferred leader election. Below is a list of use cases:
> * If broker_id 1 is a swapped failed host that was brought up with only the
> last segments or the latest offset, without historical data (there is another
> effort on this), it's better for it not to serve leadership until it has
> caught up.
> * The cross-data center cluster has AWS instances which have less computing
> power than the on-prem bare metal machines. We could put the AWS broker_ids
> in the Preferred Leader Blacklist, so on-prem brokers can be elected leaders,
> without changing the replica assignment ordering via reassignments.
> * If broker_id 1 is constantly losing leadership after some time
> ("flapping"), we would want to exclude 1 from being a leader unless all other
> brokers of this topic/partition are offline. The "flapping" effect was seen
> in the past when 2 or more brokers were bad: when they lost leadership
> constantly/quickly, the sets of partition replicas they belong to would see
> leadership changing constantly. The ultimate solution is to swap these bad
> hosts, but for quick mitigation we can also put the bad hosts in the
> Preferred Leader Blacklist to move their priority for being elected leader to
> the lowest.
> * If the controller is busy serving an extra load of metadata requests and
> other tasks, we would like to move the controller's leaderships to other
> brokers to lower its CPU load. Currently, bouncing the broker to shed
> leadership would not work for the controller, because after the bounce the
> controller fails over to another broker.
> * Avoid bouncing a broker in order to shed its leadership: it would be good
> to have a way to specify which broker should be excluded from serving
> traffic/leadership (without changing the replica assignment ordering via
> reassignments, even though that's quick) and then run preferred leader
> election. A bouncing broker causes temporary URPs and sometimes other issues.
> Also, bouncing a broker (e.g. broker_id 1) can make it temporarily lose all
> its leaderships, but if another broker (e.g. broker_id 2) later fails or gets
> bounced, some of its leaderships will likely fail over to broker_id 1 on a
> replica set with 3 brokers. If broker_id 1 is on the blacklist, then in such
> a scenario, even with broker_id 2 offline, the 3rd broker can take
> leadership.
> The current work-around for the above is to change the topic/partition's
> replica assignment to move broker_id 1 from the first position to the last
> position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1).
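
To make the selection order in the quoted description concrete, below is a
minimal, illustrative sketch in plain Java of how a deprioritized list would
change the pick. It is not the actual controller code; the method name
pickLeader and the "deprioritized" set are made up for this illustration.
Brokers on the list are only chosen when no other live in-sync replica is
available.

import java.util.*;

public class PreferredLeaderSketch {

    // Returns the broker that would become leader for one partition, given the
    // replica assignment order, the current ISR, the live brokers, and a
    // hypothetical deprioritized ("blacklisted") set.
    static Optional<Integer> pickLeader(List<Integer> assignment,
                                        Set<Integer> isr,
                                        Set<Integer> live,
                                        Set<Integer> deprioritized) {
        // First pass: honor the assignment order, skipping deprioritized brokers.
        for (Integer broker : assignment) {
            if (isr.contains(broker) && live.contains(broker)
                    && !deprioritized.contains(broker)) {
                return Optional.of(broker);
            }
        }
        // Second pass: fall back to a deprioritized broker rather than leaving
        // the partition leaderless.
        for (Integer broker : assignment) {
            if (isr.contains(broker) && live.contains(broker)) {
                return Optional.of(broker);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Integer> assignment = Arrays.asList(1, 2, 3);
        Set<Integer> deprioritized = Collections.singleton(1);

        // Broker 1 is in ISR but deprioritized, so broker 2 is picked.
        System.out.println(pickLeader(assignment,
                new HashSet<>(Arrays.asList(1, 2, 3)),
                new HashSet<>(Arrays.asList(1, 2, 3)),
                deprioritized));                          // Optional[2]

        // Brokers 2 and 3 are unavailable; broker 1 still takes leadership.
        System.out.println(pickLeader(assignment,
                new HashSet<>(Collections.singletonList(1)),
                new HashSet<>(Collections.singletonList(1)),
                deprioritized));                          // Optional[1]
    }
}

This matches the described behavior: the replica assignment ordering itself is
untouched, only the election-time priority changes.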

[DISCUSS] KIP-491: Preferred Leader Deprioritized List (Temporary Blacklist)

2019-07-13 Thread George Li
 Hi,

I have created KIP-491
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982)
for putting a broker on the preferred leader blacklist (deprioritized list) so
that, when determining leadership, it is moved to the lowest priority for some
of the listed use cases.

Please provide your comments/feedback.

Thanks,
George



- Forwarded Message -
From: Jose Armando Garcia Sancio (JIRA)
To: "sql_consult...@yahoo.com"
Sent: Tuesday, July 9, 2019, 01:06:05 PM PDT
Subject: [jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)
 
    [ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881511#comment-16881511
 ] 

Jose Armando Garcia Sancio commented on KAFKA-8638:
---

Thanks for the feedback and clear use cases [~sql_consulting].

> Preferred Leader Blacklist (deprioritized list)
> ---
>
>                Key: KAFKA-8638
>                URL: https://issues.apache.org/jira/browse/KAFKA-8638
>            Project: Kafka
>          Issue Type: Improvement
>          Components: config, controller, core
>    Affects Versions: 1.1.1, 2.3.0, 2.2.1
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> Currently, Kafka preferred leader election picks the broker_id from the
> topic/partition replica assignment in priority order when the broker is in
> ISR. The preferred leader is the broker id in the first position of the
> replica list. There are use cases where, even when the first broker in the
> replica assignment is in ISR, it needs to be moved to the end of the ordering
> (lowest priority) when deciding leadership during preferred leader election.
> Let’s use topic/partition replica (1,2,3) as an example; 1 is the preferred
> leader. When preferred leader election is run, it picks 1 as the leader if 1
> is online and in ISR; if not, it picks 2; if 2 is not in ISR, it picks 3 as
> the leader. There are use cases where, even if 1 is in ISR, we would like it
> to be moved to the end of the ordering (lowest priority) when deciding
> leadership during preferred leader election. Below is a list of use cases:
> * If broker_id 1 is a swapped failed host that was brought up with only the
> last segments or the latest offset, without historical data (there is another
> effort on this), it's better for it not to serve leadership until it has
> caught up.
> * The cross-data center cluster has AWS instances which have less computing
> power than the on-prem bare metal machines. We could put the AWS broker_ids
> in the Preferred Leader Blacklist, so on-prem brokers can be elected leaders,
> without changing the replica assignment ordering via reassignments.
> * If broker_id 1 is constantly losing leadership after some time
> ("flapping"), we would want to exclude 1 from being a leader unless all other
> brokers of this topic/partition are offline. The "flapping" effect was seen
> in the past when 2 or more brokers were bad: when they lost leadership
> constantly/quickly, the sets of partition replicas they belong to would see
> leadership changing constantly. The ultimate solution is to swap these bad
> hosts, but for quick mitigation we can also put the bad hosts in the
> Preferred Leader Blacklist to move their priority for being elected leader to
> the lowest.
> * If the controller is busy serving an extra load of metadata requests and
> other tasks, we would like to move the controller's leaderships to other
> brokers to lower its CPU load. Currently, bouncing the broker to shed
> leadership would not work for the controller, because after the bounce the
> controller fails over to another broker.
> * Avoid bouncing a broker in order to shed its leadership: it would be good
> to have a way to specify which broker should be excluded from serving
> traffic/leadership (without changing the replica assignment ordering via
> reassignments, even though that's quick) and then run preferred leader
> election. A bouncing broker causes temporary URPs and sometimes other issues.
> Also, bouncing a broker (e.g. broker_id 1) can make it temporarily lose all
> its leaderships, but if another broker (e.g. broker_id 2) later fails or gets
> bounced, some of its leaderships will likely fail over to broker_id 1 on a
> replica set with 3 brokers. If broker_id 1 is on the blacklist, then in such
> a scenario, even with broker_id 2 offline, the 3rd broker can take
> leadership.
> The current work-around for the above is to change the topic/partition's
> replica assignment to move broker_id 1 from the first position to the last
> position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1).
> This changes the replica assignment, and we need to keep track of the
> original one and restore it if things change (e.g. the controller fails over
> to another broker, the swapped empty
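
For completeness, here is roughly what the work-around quoted above looks like
with the existing tooling. This is only an example: the topic name "foo",
partition 0, the file names and the ZooKeeper connect string are placeholders.
First a reassignment that only rotates the replica ordering, then a preferred
leader election; the reverse reassignment is needed later to restore the
original ordering.

# reassign.json -- move broker 1 from the first to the last position
{"version":1,"partitions":[{"topic":"foo","partition":0,"replicas":[2,3,1]}]}

bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
  --reassignment-json-file reassign.json --execute

# election.json -- then trigger preferred leader election for that partition
{"partitions":[{"topic":"foo","partition":0}]}

bin/kafka-preferred-replica-election.sh --zookeeper zk:2181 \
  --path-to-json-file election.json

The drawback, as the quoted text notes, is that the original ordering (1, 2, 3)
has to be remembered and restored afterwards, which is exactly the bookkeeping
KIP-491 tries to avoid.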