Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-12-05 Thread Tom Bentley
Hi all,

I've been thinking about the proposed changes in KIP-179 and, on reflection,
I don't think the API presented is really ideal. Some of the limitations it
has include:

1. It sticks to the current, batch oriented (i.e. a single set of
reassignments at a time), model.
2. It still doesn't really provide a nice way of knowing that a
reassignment is complete.
3. As presented, the automatic removal of throttles only happens at the end
of
the reassignment batch. But individual brokers could be unthrottled before
then.

As an illustration of this, https://issues.apache.org/jira/browse/KAFKA-6304
provides a use case for wanting to cancel a reassignment because one of the
brokers in the new assignment has failed. With the proposed API:

1. We can't identify the subset of the reassignment batch which we want to
cancel.
2. All we could do would be to revise the proposed API to allow calling
   reassignPartitions() while a reassignment was in progress. This second
call
   could revert the subset of reassignments involving the failed broker.
3. But the API has no way to express that the original reassignment was
cancelled.

Another illustration of the problem: An advanced cluster balancer
(such as LinkedIn's cruise control) has to batch large reassignments
(partly so as to make cancellation easier). This batching itself leads in
inefficiency because some of the partitions in the batch will finish before
others, so time is wasted with the cluster only moving a small number of
partitions (when most in the batch have finished).

In hindsight, I think I was too influenced by reproducing what the
kafka-reassign-partitions.sh tool does today. I think what's actually
needed
(for things like cruise control) is an API that's more fine-grained, and
less batch oriented. I am therefore withdrawing KIP-179 and
intend to start a new KIP to propose a different API for partition
reassignment.

I'm still interested in hearing about other deficiencies of the KIP-179
proposal,
so I can avoid them in the new proposal. Similarly, if there are features
you'd like to see in the API, please let me know.

I won't go in to details of the new API here, but the basic idea I'd like
to use is to
give the reassignment of each partition an identity (though this wouldn't
be exposed directly in the API). This is necessary to allow new
reassignments
to be added while some are already running. API methods would then be
provided to discover
all the currently running reassignments, determine if a reassignment is
still running etc.

Cheers,

Tom

On 1 November 2017 at 10:20, Tom Bentley  wrote:

> This thread has been very quiet for a while now. It's unclear whether this
> is because no one has anything more to say, or whether no one has taken a
> look at it in its current form. I suspect the latter, so I'm not calling
> the vote today, but instead asking for more review.
>
> What's currently proposed – in addition to the reassignPartitions() API
> itself – is to have a pair of RPCs for managing throttles. This is quite
> different from the earlier proposal to reuse alterConfigs(). The benefits
> of this specific API include:
>
> * being more typesafe,
> * allowing for the automatic removal of throttles when reassignment has
> completed,
> * being careful about correct management of the throttles wrt controller
> failover
>
> Surely someone has something to say about this, before we reach the vote
> stage?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+
> Change+ReassignPartitionsCommand+to+use+AdminClient
>
> Thanks,
>
> Tom
>
>
> On 25 October 2017 at 10:33, Tom Bentley  wrote:
>
>> If there are no further comments, I will start a vote on this next week.
>>
>> Thanks,
>>
>> Tom
>>
>> On 20 October 2017 at 08:33, Tom Bentley  wrote:
>>
>>> Hi,
>>>
>>> I've made a fairly major update to KIP-179 to propose APIs for setting
>>> throttled rates and throttled replicas with the ability to remove these
>>> automatically at the end of reassignment.
>>>
>>> I'd be grateful for your feedback:
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+
>>> Change+ReassignPartitionsCommand+to+use+AdminClient
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>> On 2 October 2017 at 13:15, Tom Bentley  wrote:
>>>
 One question I have is about whether/how to scope throttling to a
 reassignment. Currently throttles are only loosely associated with
 reassignment: You can start a reassignment without any throttling, add
 throttling to an in-flight reassignment, and remember/forget to remove
 throttling after the reassignment is complete. There's is great flexibility
 in that, but also the risk that you forget the remove the throttle(s).

 Just adding an API for setting the throttled rate makes this situation
 worse: While it's nice to be able to auto-remove the throttles rate what
 about the config for the throttled replicas? 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-11-01 Thread Tom Bentley
This thread has been very quiet for a while now. It's unclear whether this
is because no one has anything more to say, or whether no one has taken a
look at it in its current form. I suspect the latter, so I'm not calling
the vote today, but instead asking for more review.

What's currently proposed – in addition to the reassignPartitions() API
itself – is to have a pair of RPCs for managing throttles. This is quite
different from the earlier proposal to reuse alterConfigs(). The benefits
of this specific API include:

* being more typesafe,
* allowing for the automatic removal of throttles when reassignment has
completed,
* being careful about correct management of the throttles wrt controller
failover

Surely someone has something to say about this, before we reach the vote
stage?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+
ReassignPartitionsCommand+to+use+AdminClient

Thanks,

Tom


On 25 October 2017 at 10:33, Tom Bentley  wrote:

> If there are no further comments, I will start a vote on this next week.
>
> Thanks,
>
> Tom
>
> On 20 October 2017 at 08:33, Tom Bentley  wrote:
>
>> Hi,
>>
>> I've made a fairly major update to KIP-179 to propose APIs for setting
>> throttled rates and throttled replicas with the ability to remove these
>> automatically at the end of reassignment.
>>
>> I'd be grateful for your feedback:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+
>> Change+ReassignPartitionsCommand+to+use+AdminClient
>>
>> Thanks,
>>
>> Tom
>>
>> On 2 October 2017 at 13:15, Tom Bentley  wrote:
>>
>>> One question I have is about whether/how to scope throttling to a
>>> reassignment. Currently throttles are only loosely associated with
>>> reassignment: You can start a reassignment without any throttling, add
>>> throttling to an in-flight reassignment, and remember/forget to remove
>>> throttling after the reassignment is complete. There's is great flexibility
>>> in that, but also the risk that you forget the remove the throttle(s).
>>>
>>> Just adding an API for setting the throttled rate makes this situation
>>> worse: While it's nice to be able to auto-remove the throttles rate what
>>> about the config for the throttled replicas? Also you might add a throttle
>>> thinking a reassignment is in-flight, but it has in fact just finished:
>>> Those throttles will now hang around until reset or the end of the next
>>> reassignment. For these reasons it would be good if the throttle were more
>>> directly scoped to the reassignment.
>>>
>>> On the other hand, taking LinkedIn's Cruise Control as an example, there
>>> they seem to modify the reassignment znode directly and incrementally and
>>> so there is no notion of "the reassignment". Reassignments will be running
>>> continuously, with partitions added before all of the current partitions
>>> have completed. If there is no meaningful cluster-wide "reassignment" then
>>> it would be better to remove remove the throttle by changing the list of
>>> replicas as each replica catches up.
>>>
>>> I'm interested in any use cases people can share on this, as I'd like
>>> the throttle API to be useful for a broad range of use cases, rather than
>>> being too narrowly focussed on what's needed by the existing CLI tools.
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>>
>>>
>>>
>>> On 28 September 2017 at 17:22, Tom Bentley 
>>> wrote:
>>>
 I'm starting to think about KIP-179 again. In order to have more
 manageably-scoped KIPs and PRs I think it might be worth factoring-out the
 throttling part into a separate KIP. Wdyt?

 Keeping the throttling discussion in this thread for the moment...

 The throttling behaviour is currently spread across the
 `(leader|follower).replication.throttled.replicas` topic config and
 the `(leader|follower).replication.throttled.rate` dynamic broker
 config. It's not really clear to me exactly what "removing the throttle" is
 supposed to mean. I mean we could reset the rate to Long.MAV_VALUE or we
 could change the list of replicas to an empty list. The
 ReassignPartitionsCommand does both, but there is some small utility in
 leaving the rate, but clearing the list, if you've discovered the "right"
 rate for your cluster/workload and to want it to be sticky for next time.
 Does any one do this in practice?

 With regards to throttling, it would be
>> worth thinking about a way where the throttling configs can be
>> automatically removed without the user having to re-run the tool.
>>
>
> Isn't that just a matter of updating the topic configs for
> (leader|follower).replication.throttled.replicas at the same time we
> remove the reassignment znode? That leaves open the question about whether
> to reset the rates at the same time.
>

 Thinking some more about my "update the configs at the same time 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-10-25 Thread Tom Bentley
If there are no further comments, I will start a vote on this next week.

Thanks,

Tom

On 20 October 2017 at 08:33, Tom Bentley  wrote:

> Hi,
>
> I've made a fairly major update to KIP-179 to propose APIs for setting
> throttled rates and throttled replicas with the ability to remove these
> automatically at the end of reassignment.
>
> I'd be grateful for your feedback:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+
> ReassignPartitionsCommand+to+use+AdminClient
>
> Thanks,
>
> Tom
>
> On 2 October 2017 at 13:15, Tom Bentley  wrote:
>
>> One question I have is about whether/how to scope throttling to a
>> reassignment. Currently throttles are only loosely associated with
>> reassignment: You can start a reassignment without any throttling, add
>> throttling to an in-flight reassignment, and remember/forget to remove
>> throttling after the reassignment is complete. There's is great flexibility
>> in that, but also the risk that you forget the remove the throttle(s).
>>
>> Just adding an API for setting the throttled rate makes this situation
>> worse: While it's nice to be able to auto-remove the throttles rate what
>> about the config for the throttled replicas? Also you might add a throttle
>> thinking a reassignment is in-flight, but it has in fact just finished:
>> Those throttles will now hang around until reset or the end of the next
>> reassignment. For these reasons it would be good if the throttle were more
>> directly scoped to the reassignment.
>>
>> On the other hand, taking LinkedIn's Cruise Control as an example, there
>> they seem to modify the reassignment znode directly and incrementally and
>> so there is no notion of "the reassignment". Reassignments will be running
>> continuously, with partitions added before all of the current partitions
>> have completed. If there is no meaningful cluster-wide "reassignment" then
>> it would be better to remove remove the throttle by changing the list of
>> replicas as each replica catches up.
>>
>> I'm interested in any use cases people can share on this, as I'd like the
>> throttle API to be useful for a broad range of use cases, rather than being
>> too narrowly focussed on what's needed by the existing CLI tools.
>>
>> Thanks,
>>
>> Tom
>>
>>
>>
>>
>> On 28 September 2017 at 17:22, Tom Bentley  wrote:
>>
>>> I'm starting to think about KIP-179 again. In order to have more
>>> manageably-scoped KIPs and PRs I think it might be worth factoring-out the
>>> throttling part into a separate KIP. Wdyt?
>>>
>>> Keeping the throttling discussion in this thread for the moment...
>>>
>>> The throttling behaviour is currently spread across the
>>> `(leader|follower).replication.throttled.replicas` topic config and the
>>> `(leader|follower).replication.throttled.rate` dynamic broker config.
>>> It's not really clear to me exactly what "removing the throttle" is
>>> supposed to mean. I mean we could reset the rate to Long.MAV_VALUE or we
>>> could change the list of replicas to an empty list. The
>>> ReassignPartitionsCommand does both, but there is some small utility in
>>> leaving the rate, but clearing the list, if you've discovered the "right"
>>> rate for your cluster/workload and to want it to be sticky for next time.
>>> Does any one do this in practice?
>>>
>>> With regards to throttling, it would be
> worth thinking about a way where the throttling configs can be
> automatically removed without the user having to re-run the tool.
>

 Isn't that just a matter of updating the topic configs for
 (leader|follower).replication.throttled.replicas at the same time we
 remove the reassignment znode? That leaves open the question about whether
 to reset the rates at the same time.

>>>
>>> Thinking some more about my "update the configs at the same time we
>>> remove the reassignment znode" suggestion. The reassignment znode is
>>> persistent, so the reassignment will survive a zookeeper restart. If there
>>> was a flag for the auto-removal of the throttle it would likewise need to
>>> be persistent. Otherwise a ZK restart would remember the reassignment, but
>>> forget about the preference for auto removal of throttles. So, we would use
>>> a persistent znode (a child of the reassignment path, perhaps) to store a
>>> flag for throttle removal.
>>>
>>> Thoughts?
>>>
>>> Cheers,
>>>
>>> Tom
>>>
>>
>>
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-10-20 Thread Tom Bentley
Hi,

I've made a fairly major update to KIP-179 to propose APIs for setting
throttled rates and throttled replicas with the ability to remove these
automatically at the end of reassignment.

I'd be grateful for your feedback:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient

Thanks,

Tom

On 2 October 2017 at 13:15, Tom Bentley  wrote:

> One question I have is about whether/how to scope throttling to a
> reassignment. Currently throttles are only loosely associated with
> reassignment: You can start a reassignment without any throttling, add
> throttling to an in-flight reassignment, and remember/forget to remove
> throttling after the reassignment is complete. There's is great flexibility
> in that, but also the risk that you forget the remove the throttle(s).
>
> Just adding an API for setting the throttled rate makes this situation
> worse: While it's nice to be able to auto-remove the throttles rate what
> about the config for the throttled replicas? Also you might add a throttle
> thinking a reassignment is in-flight, but it has in fact just finished:
> Those throttles will now hang around until reset or the end of the next
> reassignment. For these reasons it would be good if the throttle were more
> directly scoped to the reassignment.
>
> On the other hand, taking LinkedIn's Cruise Control as an example, there
> they seem to modify the reassignment znode directly and incrementally and
> so there is no notion of "the reassignment". Reassignments will be running
> continuously, with partitions added before all of the current partitions
> have completed. If there is no meaningful cluster-wide "reassignment" then
> it would be better to remove remove the throttle by changing the list of
> replicas as each replica catches up.
>
> I'm interested in any use cases people can share on this, as I'd like the
> throttle API to be useful for a broad range of use cases, rather than being
> too narrowly focussed on what's needed by the existing CLI tools.
>
> Thanks,
>
> Tom
>
>
>
>
> On 28 September 2017 at 17:22, Tom Bentley  wrote:
>
>> I'm starting to think about KIP-179 again. In order to have more
>> manageably-scoped KIPs and PRs I think it might be worth factoring-out the
>> throttling part into a separate KIP. Wdyt?
>>
>> Keeping the throttling discussion in this thread for the moment...
>>
>> The throttling behaviour is currently spread across the
>> `(leader|follower).replication.throttled.replicas` topic config and the
>> `(leader|follower).replication.throttled.rate` dynamic broker config.
>> It's not really clear to me exactly what "removing the throttle" is
>> supposed to mean. I mean we could reset the rate to Long.MAV_VALUE or we
>> could change the list of replicas to an empty list. The
>> ReassignPartitionsCommand does both, but there is some small utility in
>> leaving the rate, but clearing the list, if you've discovered the "right"
>> rate for your cluster/workload and to want it to be sticky for next time.
>> Does any one do this in practice?
>>
>> With regards to throttling, it would be
 worth thinking about a way where the throttling configs can be
 automatically removed without the user having to re-run the tool.

>>>
>>> Isn't that just a matter of updating the topic configs for
>>> (leader|follower).replication.throttled.replicas at the same time we
>>> remove the reassignment znode? That leaves open the question about whether
>>> to reset the rates at the same time.
>>>
>>
>> Thinking some more about my "update the configs at the same time we
>> remove the reassignment znode" suggestion. The reassignment znode is
>> persistent, so the reassignment will survive a zookeeper restart. If there
>> was a flag for the auto-removal of the throttle it would likewise need to
>> be persistent. Otherwise a ZK restart would remember the reassignment, but
>> forget about the preference for auto removal of throttles. So, we would use
>> a persistent znode (a child of the reassignment path, perhaps) to store a
>> flag for throttle removal.
>>
>> Thoughts?
>>
>> Cheers,
>>
>> Tom
>>
>
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-10-02 Thread Tom Bentley
One question I have is about whether/how to scope throttling to a
reassignment. Currently throttles are only loosely associated with
reassignment: You can start a reassignment without any throttling, add
throttling to an in-flight reassignment, and remember/forget to remove
throttling after the reassignment is complete. There's is great flexibility
in that, but also the risk that you forget the remove the throttle(s).

Just adding an API for setting the throttled rate makes this situation
worse: While it's nice to be able to auto-remove the throttles rate what
about the config for the throttled replicas? Also you might add a throttle
thinking a reassignment is in-flight, but it has in fact just finished:
Those throttles will now hang around until reset or the end of the next
reassignment. For these reasons it would be good if the throttle were more
directly scoped to the reassignment.

On the other hand, taking LinkedIn's Cruise Control as an example, there
they seem to modify the reassignment znode directly and incrementally and
so there is no notion of "the reassignment". Reassignments will be running
continuously, with partitions added before all of the current partitions
have completed. If there is no meaningful cluster-wide "reassignment" then
it would be better to remove remove the throttle by changing the list of
replicas as each replica catches up.

I'm interested in any use cases people can share on this, as I'd like the
throttle API to be useful for a broad range of use cases, rather than being
too narrowly focussed on what's needed by the existing CLI tools.

Thanks,

Tom




On 28 September 2017 at 17:22, Tom Bentley  wrote:

> I'm starting to think about KIP-179 again. In order to have more
> manageably-scoped KIPs and PRs I think it might be worth factoring-out the
> throttling part into a separate KIP. Wdyt?
>
> Keeping the throttling discussion in this thread for the moment...
>
> The throttling behaviour is currently spread across the
> `(leader|follower).replication.throttled.replicas` topic config and the
> `(leader|follower).replication.throttled.rate` dynamic broker config.
> It's not really clear to me exactly what "removing the throttle" is
> supposed to mean. I mean we could reset the rate to Long.MAV_VALUE or we
> could change the list of replicas to an empty list. The
> ReassignPartitionsCommand does both, but there is some small utility in
> leaving the rate, but clearing the list, if you've discovered the "right"
> rate for your cluster/workload and to want it to be sticky for next time.
> Does any one do this in practice?
>
> With regards to throttling, it would be
>>> worth thinking about a way where the throttling configs can be
>>> automatically removed without the user having to re-run the tool.
>>>
>>
>> Isn't that just a matter of updating the topic configs for
>> (leader|follower).replication.throttled.replicas at the same time we
>> remove the reassignment znode? That leaves open the question about whether
>> to reset the rates at the same time.
>>
>
> Thinking some more about my "update the configs at the same time we remove
> the reassignment znode" suggestion. The reassignment znode is persistent,
> so the reassignment will survive a zookeeper restart. If there was a flag
> for the auto-removal of the throttle it would likewise need to be
> persistent. Otherwise a ZK restart would remember the reassignment, but
> forget about the preference for auto removal of throttles. So, we would use
> a persistent znode (a child of the reassignment path, perhaps) to store a
> flag for throttle removal.
>
> Thoughts?
>
> Cheers,
>
> Tom
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-28 Thread Tom Bentley
I'm starting to think about KIP-179 again. In order to have more
manageably-scoped KIPs and PRs I think it might be worth factoring-out the
throttling part into a separate KIP. Wdyt?

Keeping the throttling discussion in this thread for the moment...

The throttling behaviour is currently spread across the `(leader|follower).
replication.throttled.replicas` topic config and the
`(leader|follower).replication.throttled.rate`
dynamic broker config. It's not really clear to me exactly what "removing
the throttle" is supposed to mean. I mean we could reset the rate to
Long.MAV_VALUE or we could change the list of replicas to an empty list.
The ReassignPartitionsCommand does both, but there is some small utility in
leaving the rate, but clearing the list, if you've discovered the "right"
rate for your cluster/workload and to want it to be sticky for next time.
Does any one do this in practice?

With regards to throttling, it would be
>> worth thinking about a way where the throttling configs can be
>> automatically removed without the user having to re-run the tool.
>>
>
> Isn't that just a matter of updating the topic configs for
> (leader|follower).replication.throttled.replicas at the same time we
> remove the reassignment znode? That leaves open the question about whether
> to reset the rates at the same time.
>

Thinking some more about my "update the configs at the same time we remove
the reassignment znode" suggestion. The reassignment znode is persistent,
so the reassignment will survive a zookeeper restart. If there was a flag
for the auto-removal of the throttle it would likewise need to be
persistent. Otherwise a ZK restart would remember the reassignment, but
forget about the preference for auto removal of throttles. So, we would use
a persistent znode (a child of the reassignment path, perhaps) to store a
flag for throttle removal.

Thoughts?

Cheers,

Tom


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-07 Thread Tom Bentley
Hi Ismael,

OK, KIP-195 has been factored out.

Regarding the dynamic configs, I personally still think we should have a
> specific protocol API for that


Can you explain a little more why?

With regards to throttling, it would be
> worth thinking about a way where the throttling configs can be
> automatically removed without the user having to re-run the tool.
>

Isn't that just a matter of updating the topic configs for
(leader|follower).replication.throttled.replicas
at the same time we remove the reassignment znode? That leaves open the
question about whether to reset the rates at the same time.

But now I wonder what the broker configs
"(leader|follower).replication.throttled.rate"
are DynamicConfigs but the topic configs
"(leader|follower).replication.throttled.replicas"
are normal configs. Aren't the topic ones for the replicas just as dynamic
as the broker ones for the rate?

Thanks,

Tom


On 7 September 2017 at 17:24, Ismael Juma  wrote:

> Hi Tom,
>
> It won't be used within Kafka, but it's a public API that can be used by
> other projects. And the protocol can be used by non-Java clients. So, there
> is still value in including it.
>
> Regarding the dynamic configs, I personally still think we should have a
> specific protocol API for that. With regards to throttling, it would be
> worth thinking about a way where the throttling configs can be
> automatically removed without the user having to re-run the tool. So, yes,
> maybe it should be a separate KIP as well.
>
> Not sure if we need it in the template, but you're welcome to mention any
> dependencies when there are some.
>
> Thanks,
> Ismael
>
> On Thu, Sep 7, 2017 at 5:15 PM, Tom Bentley  wrote:
>
> > Hi Ismael,
> >
> > It would be good to get at least some of this into 1.0.0.
> >
> > We could put the increasePartitions() work into another KIP, but it would
> > be an unused AdminClient API in that release. The consumer of this API
> will
> > be the TopicsCommand when that get refactored to use the AdminClient.
> > That's something Paolo Patierno is proposing to do but afaik not in time
> > for 1.0.0. I don't think that's an issue, though, so I'll split out a
> > separate KIP.
> >
> > FWIW, we could also split out the proposal to support describeConfigs()
> and
> > alterConfigs() for dynamic configs into a separate KIP too. But that's
> also
> > a decision we can defer until we're looking at the remainder of KIP-179.
> >
> > Aside: I wonder if it would be a good idea for the KIP template to have a
> > "Depends on" field so people can more easily keep track of how multiple
> > in-flight KIPs depend on one another?
> >
> > Cheers,
> >
> > Tom
> >
> > On 7 September 2017 at 16:42, Ismael Juma  wrote:
> >
> > > Hi Tom,
> > >
> > > What do you think of moving `increasePartitionsCount` (or
> > > `increaseNumPartitions`) to a separate KIP? That is simple enough that
> we
> > > could potentially include it in 1.0.0. I'd be happy to review it.
> > > ReassignPartitions is more complex and we can probably aim to include
> > that
> > > in the January release. What do you think?
> > >
> > > Ismael
> > >
> > > On Wed, Sep 6, 2017 at 11:42 PM, Colin McCabe 
> > wrote:
> > >
> > > > On Wed, Sep 6, 2017, at 00:20, Tom Bentley wrote:
> > > > > Hi Ted and Colin,
> > > > >
> > > > > Thanks for the comments.
> > > > >
> > > > > It seems you're both happier with reassign rather than assign, so
> I'm
> > > > > happy
> > > > > to stick with that.
> > > > >
> > > > >
> > > > > On 5 September 2017 at 18:46, Colin McCabe 
> > wrote:
> > > > >
> > > > > > ...
> > > > >
> > > > >
> > > > > > Do we expect that reducing the number of partitions will ever be
> > > > > > supported by this API?  It seems like decreasing would require a
> > > > > > different API-- one which supported data movement, had a "check
> > > status
> > > > > > of this operation" feature, etc. etc.  If this API is only ever
> > going
> > > > to
> > > > > > be used to increase the number of partitions, I think we should
> > just
> > > > > > call it "increasePartitionCount" to avoid confusion.
> > > > > >
> > > > >
> > > > > I thought a little about the decrease possibility (hence the static
> > > > > factory
> > > > > methods on PartitionCount), but not really in enough detail. I
> > suppose
> > > a
> > > > > decrease process could look like this:
> > > > >
> > > > > 1. Producers start partitioning based on the decreased partition
> > count.
> > > > > 2. Consumers continue to consume from all partitions.
> > > > > 3. At some point all the records in the old partitions have expired
> > and
> > > > > they can be deleted.
> > > > >
> > > > > This wouldn't work for compacted topics. Of course a more
> aggressive
> > > > > strategy is also possible where we forcibly move data from the
> > > partitions
> > > > > to be deleted.
> > > > >
> > > > > Anyway, in either case the process would be long running, 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-07 Thread Ismael Juma
Hi Tom,

It won't be used within Kafka, but it's a public API that can be used by
other projects. And the protocol can be used by non-Java clients. So, there
is still value in including it.

Regarding the dynamic configs, I personally still think we should have a
specific protocol API for that. With regards to throttling, it would be
worth thinking about a way where the throttling configs can be
automatically removed without the user having to re-run the tool. So, yes,
maybe it should be a separate KIP as well.

Not sure if we need it in the template, but you're welcome to mention any
dependencies when there are some.

Thanks,
Ismael

On Thu, Sep 7, 2017 at 5:15 PM, Tom Bentley  wrote:

> Hi Ismael,
>
> It would be good to get at least some of this into 1.0.0.
>
> We could put the increasePartitions() work into another KIP, but it would
> be an unused AdminClient API in that release. The consumer of this API will
> be the TopicsCommand when that get refactored to use the AdminClient.
> That's something Paolo Patierno is proposing to do but afaik not in time
> for 1.0.0. I don't think that's an issue, though, so I'll split out a
> separate KIP.
>
> FWIW, we could also split out the proposal to support describeConfigs() and
> alterConfigs() for dynamic configs into a separate KIP too. But that's also
> a decision we can defer until we're looking at the remainder of KIP-179.
>
> Aside: I wonder if it would be a good idea for the KIP template to have a
> "Depends on" field so people can more easily keep track of how multiple
> in-flight KIPs depend on one another?
>
> Cheers,
>
> Tom
>
> On 7 September 2017 at 16:42, Ismael Juma  wrote:
>
> > Hi Tom,
> >
> > What do you think of moving `increasePartitionsCount` (or
> > `increaseNumPartitions`) to a separate KIP? That is simple enough that we
> > could potentially include it in 1.0.0. I'd be happy to review it.
> > ReassignPartitions is more complex and we can probably aim to include
> that
> > in the January release. What do you think?
> >
> > Ismael
> >
> > On Wed, Sep 6, 2017 at 11:42 PM, Colin McCabe 
> wrote:
> >
> > > On Wed, Sep 6, 2017, at 00:20, Tom Bentley wrote:
> > > > Hi Ted and Colin,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > It seems you're both happier with reassign rather than assign, so I'm
> > > > happy
> > > > to stick with that.
> > > >
> > > >
> > > > On 5 September 2017 at 18:46, Colin McCabe 
> wrote:
> > > >
> > > > > ...
> > > >
> > > >
> > > > > Do we expect that reducing the number of partitions will ever be
> > > > > supported by this API?  It seems like decreasing would require a
> > > > > different API-- one which supported data movement, had a "check
> > status
> > > > > of this operation" feature, etc. etc.  If this API is only ever
> going
> > > to
> > > > > be used to increase the number of partitions, I think we should
> just
> > > > > call it "increasePartitionCount" to avoid confusion.
> > > > >
> > > >
> > > > I thought a little about the decrease possibility (hence the static
> > > > factory
> > > > methods on PartitionCount), but not really in enough detail. I
> suppose
> > a
> > > > decrease process could look like this:
> > > >
> > > > 1. Producers start partitioning based on the decreased partition
> count.
> > > > 2. Consumers continue to consume from all partitions.
> > > > 3. At some point all the records in the old partitions have expired
> and
> > > > they can be deleted.
> > > >
> > > > This wouldn't work for compacted topics. Of course a more aggressive
> > > > strategy is also possible where we forcibly move data from the
> > partitions
> > > > to be deleted.
> > > >
> > > > Anyway, in either case the process would be long running, whereas the
> > > > increase case is fast, so the semantics are quite different. So I
> > agree,
> > > > lets rename the method to make clear that it's only for increasing
> the
> > > > partition count.
> > > >
> > > > >
> > > > > > Outstanding questions:
> > > > > >
> > > > > > 1. Is the proposed alterInterReplicaThrottle() API really better
> > than
> > > > > > changing the throttle via alterConfigs()?
> > > > >
> > > > > That's a good point.  I would argue that we should just use
> > > alterConfigs
> > > > > to set the broker configuration, rather than having a special RPC
> > just
> > > > > for this.
> > > > >
> > > >
> > > > Yes, I'm minded to agree.
> > > >
> > > > The reason I originally thought a special API might be better was if
> > > > people
> > > > felt that the DynamicConfig mechanism (which seems to exist only to
> > > > support
> > > > these throttles) was an implementation detail of the throttles. But I
> > now
> > > > realise that they're visible via kafka-topics.sh, so they're
> > effectively
> > > > already a public API.
> > > >
> > > >
> > > > >
> > > > > ...
> > > > > > Would it be a problem that
> > > > > > triggering the reassignment required ClusterAction on the

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-07 Thread Tom Bentley
Hi Ismael,

It would be good to get at least some of this into 1.0.0.

We could put the increasePartitions() work into another KIP, but it would
be an unused AdminClient API in that release. The consumer of this API will
be the TopicsCommand when that get refactored to use the AdminClient.
That's something Paolo Patierno is proposing to do but afaik not in time
for 1.0.0. I don't think that's an issue, though, so I'll split out a
separate KIP.

FWIW, we could also split out the proposal to support describeConfigs() and
alterConfigs() for dynamic configs into a separate KIP too. But that's also
a decision we can defer until we're looking at the remainder of KIP-179.

Aside: I wonder if it would be a good idea for the KIP template to have a
"Depends on" field so people can more easily keep track of how multiple
in-flight KIPs depend on one another?

Cheers,

Tom

On 7 September 2017 at 16:42, Ismael Juma  wrote:

> Hi Tom,
>
> What do you think of moving `increasePartitionsCount` (or
> `increaseNumPartitions`) to a separate KIP? That is simple enough that we
> could potentially include it in 1.0.0. I'd be happy to review it.
> ReassignPartitions is more complex and we can probably aim to include that
> in the January release. What do you think?
>
> Ismael
>
> On Wed, Sep 6, 2017 at 11:42 PM, Colin McCabe  wrote:
>
> > On Wed, Sep 6, 2017, at 00:20, Tom Bentley wrote:
> > > Hi Ted and Colin,
> > >
> > > Thanks for the comments.
> > >
> > > It seems you're both happier with reassign rather than assign, so I'm
> > > happy
> > > to stick with that.
> > >
> > >
> > > On 5 September 2017 at 18:46, Colin McCabe  wrote:
> > >
> > > > ...
> > >
> > >
> > > > Do we expect that reducing the number of partitions will ever be
> > > > supported by this API?  It seems like decreasing would require a
> > > > different API-- one which supported data movement, had a "check
> status
> > > > of this operation" feature, etc. etc.  If this API is only ever going
> > to
> > > > be used to increase the number of partitions, I think we should just
> > > > call it "increasePartitionCount" to avoid confusion.
> > > >
> > >
> > > I thought a little about the decrease possibility (hence the static
> > > factory
> > > methods on PartitionCount), but not really in enough detail. I suppose
> a
> > > decrease process could look like this:
> > >
> > > 1. Producers start partitioning based on the decreased partition count.
> > > 2. Consumers continue to consume from all partitions.
> > > 3. At some point all the records in the old partitions have expired and
> > > they can be deleted.
> > >
> > > This wouldn't work for compacted topics. Of course a more aggressive
> > > strategy is also possible where we forcibly move data from the
> partitions
> > > to be deleted.
> > >
> > > Anyway, in either case the process would be long running, whereas the
> > > increase case is fast, so the semantics are quite different. So I
> agree,
> > > lets rename the method to make clear that it's only for increasing the
> > > partition count.
> > >
> > > >
> > > > > Outstanding questions:
> > > > >
> > > > > 1. Is the proposed alterInterReplicaThrottle() API really better
> than
> > > > > changing the throttle via alterConfigs()?
> > > >
> > > > That's a good point.  I would argue that we should just use
> > alterConfigs
> > > > to set the broker configuration, rather than having a special RPC
> just
> > > > for this.
> > > >
> > >
> > > Yes, I'm minded to agree.
> > >
> > > The reason I originally thought a special API might be better was if
> > > people
> > > felt that the DynamicConfig mechanism (which seems to exist only to
> > > support
> > > these throttles) was an implementation detail of the throttles. But I
> now
> > > realise that they're visible via kafka-topics.sh, so they're
> effectively
> > > already a public API.
> > >
> > >
> > > >
> > > > ...
> > > > > Would it be a problem that
> > > > > triggering the reassignment required ClusterAction on the Cluster,
> > but
> > > > > throttling the assignment required Alter on the Topic? What if a
> > user had
> > > > > the former permission, but not the latter?
> > > >
> > > > We've been trying to reserve ClusterAction on Cluster for
> > > > broker-initiated operations.  Alter on Cluster is the ACL for "root
> > > > stuff" and I would argue that it should be what we use here.
> > > >
> > > > For reconfiguring the broker, I think we should follow KIP-133 and
> use
> > > > AlterConfigs on the Broker resource.  (Of course, if you just use the
> > > > existing alterConfigs call, you get this without any special effort.)
> > > >
> > >
> > > Yes, sorry, what I put in the email about authorisation wasn't what I
> put
> > > in the KIP (I revised the KIP after drafting the email and then forgot
> to
> > > update the email).
> > >
> > > Although KIP-133 proposes a Broker resource, I don't see one in the
> code
> > > and KIP-133 was supposedly delivered in 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-07 Thread Ismael Juma
Hi Tom,

What do you think of moving `increasePartitionsCount` (or
`increaseNumPartitions`) to a separate KIP? That is simple enough that we
could potentially include it in 1.0.0. I'd be happy to review it.
ReassignPartitions is more complex and we can probably aim to include that
in the January release. What do you think?

Ismael

On Wed, Sep 6, 2017 at 11:42 PM, Colin McCabe  wrote:

> On Wed, Sep 6, 2017, at 00:20, Tom Bentley wrote:
> > Hi Ted and Colin,
> >
> > Thanks for the comments.
> >
> > It seems you're both happier with reassign rather than assign, so I'm
> > happy
> > to stick with that.
> >
> >
> > On 5 September 2017 at 18:46, Colin McCabe  wrote:
> >
> > > ...
> >
> >
> > > Do we expect that reducing the number of partitions will ever be
> > > supported by this API?  It seems like decreasing would require a
> > > different API-- one which supported data movement, had a "check status
> > > of this operation" feature, etc. etc.  If this API is only ever going
> to
> > > be used to increase the number of partitions, I think we should just
> > > call it "increasePartitionCount" to avoid confusion.
> > >
> >
> > I thought a little about the decrease possibility (hence the static
> > factory
> > methods on PartitionCount), but not really in enough detail. I suppose a
> > decrease process could look like this:
> >
> > 1. Producers start partitioning based on the decreased partition count.
> > 2. Consumers continue to consume from all partitions.
> > 3. At some point all the records in the old partitions have expired and
> > they can be deleted.
> >
> > This wouldn't work for compacted topics. Of course a more aggressive
> > strategy is also possible where we forcibly move data from the partitions
> > to be deleted.
> >
> > Anyway, in either case the process would be long running, whereas the
> > increase case is fast, so the semantics are quite different. So I agree,
> > lets rename the method to make clear that it's only for increasing the
> > partition count.
> >
> > >
> > > > Outstanding questions:
> > > >
> > > > 1. Is the proposed alterInterReplicaThrottle() API really better than
> > > > changing the throttle via alterConfigs()?
> > >
> > > That's a good point.  I would argue that we should just use
> alterConfigs
> > > to set the broker configuration, rather than having a special RPC just
> > > for this.
> > >
> >
> > Yes, I'm minded to agree.
> >
> > The reason I originally thought a special API might be better was if
> > people
> > felt that the DynamicConfig mechanism (which seems to exist only to
> > support
> > these throttles) was an implementation detail of the throttles. But I now
> > realise that they're visible via kafka-topics.sh, so they're effectively
> > already a public API.
> >
> >
> > >
> > > ...
> > > > Would it be a problem that
> > > > triggering the reassignment required ClusterAction on the Cluster,
> but
> > > > throttling the assignment required Alter on the Topic? What if a
> user had
> > > > the former permission, but not the latter?
> > >
> > > We've been trying to reserve ClusterAction on Cluster for
> > > broker-initiated operations.  Alter on Cluster is the ACL for "root
> > > stuff" and I would argue that it should be what we use here.
> > >
> > > For reconfiguring the broker, I think we should follow KIP-133 and use
> > > AlterConfigs on the Broker resource.  (Of course, if you just use the
> > > existing alterConfigs call, you get this without any special effort.)
> > >
> >
> > Yes, sorry, what I put in the email about authorisation wasn't what I put
> > in the KIP (I revised the KIP after drafting the email and then forgot to
> > update the email).
> >
> > Although KIP-133 proposes a Broker resource, I don't see one in the code
> > and KIP-133 was supposedly delivered in 0.11. Can anyone fill in the
> > story
> > here? Is it simply because the functionality to update broker configs
> > hasn't been implemented yet?
>
> Look in
> ./clients/src/main/java/org/apache/kafka/common/config/
> ConfigResource.java,
> for the BROKER resource.  I bet you're looking at the Resource class
> used for ACLs, which is a different class.
>
> >
> > As currently proposed, both reassignPartitions() and
> > alterInterBrokerThrottle()
> > require Alter on the Cluster. If we used alterConfigs() to set the
> > throttles then we create a situation where the triggering of the
> > reassignment required Alter(Cluster), but the throttling required
> > Alter(Broker), and the user might have the former but not the latter. I
> > don't think this is likely to be a big deal in practice, but maybe others
> > disagree?
>
> Alter:Cluster is essentially root, though.  If you have Alter:Cluster
> and you don't have AlterConfigs:Broker, you can just create a new ACL
> giving it to yourself (creating and deleting ACLs is one of the powers
> of Alter:Cluster)
>
> cheers,
> Colin
>
> >
> >
> > > >
> > > > 2. Is reassignPartitions() really the best name? I find 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-06 Thread Colin McCabe
On Wed, Sep 6, 2017, at 00:20, Tom Bentley wrote:
> Hi Ted and Colin,
> 
> Thanks for the comments.
> 
> It seems you're both happier with reassign rather than assign, so I'm
> happy
> to stick with that.
> 
> 
> On 5 September 2017 at 18:46, Colin McCabe  wrote:
> 
> > ...
> 
> 
> > Do we expect that reducing the number of partitions will ever be
> > supported by this API?  It seems like decreasing would require a
> > different API-- one which supported data movement, had a "check status
> > of this operation" feature, etc. etc.  If this API is only ever going to
> > be used to increase the number of partitions, I think we should just
> > call it "increasePartitionCount" to avoid confusion.
> >
> 
> I thought a little about the decrease possibility (hence the static
> factory
> methods on PartitionCount), but not really in enough detail. I suppose a
> decrease process could look like this:
> 
> 1. Producers start partitioning based on the decreased partition count.
> 2. Consumers continue to consume from all partitions.
> 3. At some point all the records in the old partitions have expired and
> they can be deleted.
> 
> This wouldn't work for compacted topics. Of course a more aggressive
> strategy is also possible where we forcibly move data from the partitions
> to be deleted.
> 
> Anyway, in either case the process would be long running, whereas the
> increase case is fast, so the semantics are quite different. So I agree,
> lets rename the method to make clear that it's only for increasing the
> partition count.
> 
> >
> > > Outstanding questions:
> > >
> > > 1. Is the proposed alterInterReplicaThrottle() API really better than
> > > changing the throttle via alterConfigs()?
> >
> > That's a good point.  I would argue that we should just use alterConfigs
> > to set the broker configuration, rather than having a special RPC just
> > for this.
> >
> 
> Yes, I'm minded to agree.
> 
> The reason I originally thought a special API might be better was if
> people
> felt that the DynamicConfig mechanism (which seems to exist only to
> support
> these throttles) was an implementation detail of the throttles. But I now
> realise that they're visible via kafka-topics.sh, so they're effectively
> already a public API.
> 
> 
> >
> > ...
> > > Would it be a problem that
> > > triggering the reassignment required ClusterAction on the Cluster, but
> > > throttling the assignment required Alter on the Topic? What if a user had
> > > the former permission, but not the latter?
> >
> > We've been trying to reserve ClusterAction on Cluster for
> > broker-initiated operations.  Alter on Cluster is the ACL for "root
> > stuff" and I would argue that it should be what we use here.
> >
> > For reconfiguring the broker, I think we should follow KIP-133 and use
> > AlterConfigs on the Broker resource.  (Of course, if you just use the
> > existing alterConfigs call, you get this without any special effort.)
> >
> 
> Yes, sorry, what I put in the email about authorisation wasn't what I put
> in the KIP (I revised the KIP after drafting the email and then forgot to
> update the email).
> 
> Although KIP-133 proposes a Broker resource, I don't see one in the code
> and KIP-133 was supposedly delivered in 0.11. Can anyone fill in the
> story
> here? Is it simply because the functionality to update broker configs
> hasn't been implemented yet?

Look in
./clients/src/main/java/org/apache/kafka/common/config/ConfigResource.java,
for the BROKER resource.  I bet you're looking at the Resource class
used for ACLs, which is a different class.

> 
> As currently proposed, both reassignPartitions() and
> alterInterBrokerThrottle()
> require Alter on the Cluster. If we used alterConfigs() to set the
> throttles then we create a situation where the triggering of the
> reassignment required Alter(Cluster), but the throttling required
> Alter(Broker), and the user might have the former but not the latter. I
> don't think this is likely to be a big deal in practice, but maybe others
> disagree?

Alter:Cluster is essentially root, though.  If you have Alter:Cluster
and you don't have AlterConfigs:Broker, you can just create a new ACL
giving it to yourself (creating and deleting ACLs is one of the powers
of Alter:Cluster)

cheers,
Colin

> 
> 
> > >
> > > 2. Is reassignPartitions() really the best name? I find the distinction
> > > between reassigning and just assigning to be of limited value, and
> > > "reassign" is misleading when the APIs now used for changing the
> > > replication factor too. Since the API is asynchronous/long running, it
> > > might be better to indicate that in the name some how. What about
> > > startPartitionAssignment()?
> >
> > Good idea -- I like the idea of using "start" or "initiate" to indicate
> > that this is kicking off a long-running operation.
> >
> > "reassign" seemed like a natural choice to me since this is changing an
> > existing assignment.  It was assigned (when the topic it was 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-06 Thread Tom Bentley
Hi Ted and Colin,

Thanks for the comments.

It seems you're both happier with reassign rather than assign, so I'm happy
to stick with that.


On 5 September 2017 at 18:46, Colin McCabe  wrote:

> ...


> Do we expect that reducing the number of partitions will ever be
> supported by this API?  It seems like decreasing would require a
> different API-- one which supported data movement, had a "check status
> of this operation" feature, etc. etc.  If this API is only ever going to
> be used to increase the number of partitions, I think we should just
> call it "increasePartitionCount" to avoid confusion.
>

I thought a little about the decrease possibility (hence the static factory
methods on PartitionCount), but not really in enough detail. I suppose a
decrease process could look like this:

1. Producers start partitioning based on the decreased partition count.
2. Consumers continue to consume from all partitions.
3. At some point all the records in the old partitions have expired and
they can be deleted.

This wouldn't work for compacted topics. Of course a more aggressive
strategy is also possible where we forcibly move data from the partitions
to be deleted.

Anyway, in either case the process would be long running, whereas the
increase case is fast, so the semantics are quite different. So I agree,
lets rename the method to make clear that it's only for increasing the
partition count.

>
> > Outstanding questions:
> >
> > 1. Is the proposed alterInterReplicaThrottle() API really better than
> > changing the throttle via alterConfigs()?
>
> That's a good point.  I would argue that we should just use alterConfigs
> to set the broker configuration, rather than having a special RPC just
> for this.
>

Yes, I'm minded to agree.

The reason I originally thought a special API might be better was if people
felt that the DynamicConfig mechanism (which seems to exist only to support
these throttles) was an implementation detail of the throttles. But I now
realise that they're visible via kafka-topics.sh, so they're effectively
already a public API.


>
> ...
> > Would it be a problem that
> > triggering the reassignment required ClusterAction on the Cluster, but
> > throttling the assignment required Alter on the Topic? What if a user had
> > the former permission, but not the latter?
>
> We've been trying to reserve ClusterAction on Cluster for
> broker-initiated operations.  Alter on Cluster is the ACL for "root
> stuff" and I would argue that it should be what we use here.
>
> For reconfiguring the broker, I think we should follow KIP-133 and use
> AlterConfigs on the Broker resource.  (Of course, if you just use the
> existing alterConfigs call, you get this without any special effort.)
>

Yes, sorry, what I put in the email about authorisation wasn't what I put
in the KIP (I revised the KIP after drafting the email and then forgot to
update the email).

Although KIP-133 proposes a Broker resource, I don't see one in the code
and KIP-133 was supposedly delivered in 0.11. Can anyone fill in the story
here? Is it simply because the functionality to update broker configs
hasn't been implemented yet?

As currently proposed, both reassignPartitions() and alterInterBrokerThrottle()
require Alter on the Cluster. If we used alterConfigs() to set the
throttles then we create a situation where the triggering of the
reassignment required Alter(Cluster), but the throttling required
Alter(Broker), and the user might have the former but not the latter. I
don't think this is likely to be a big deal in practice, but maybe others
disagree?


> >
> > 2. Is reassignPartitions() really the best name? I find the distinction
> > between reassigning and just assigning to be of limited value, and
> > "reassign" is misleading when the APIs now used for changing the
> > replication factor too. Since the API is asynchronous/long running, it
> > might be better to indicate that in the name some how. What about
> > startPartitionAssignment()?
>
> Good idea -- I like the idea of using "start" or "initiate" to indicate
> that this is kicking off a long-running operation.
>
> "reassign" seemed like a natural choice to me since this is changing an
> existing assignment.  It was assigned (when the topic it was created)--
> now it's being re-assigned.  If you change it to just "assign" then it
> feels confusing to me.  A new user might ask if "assigning partitions"
> is a step that they should take after creating a topic, similar to how
> subscribing to topics is a step you take after creating a consumer.
>

Yeah, I find this new user argument persuasive, so I'm happy to stick with
reassign.

Thanks again for the feedback,

Tom


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-05 Thread Colin McCabe
On Tue, Sep 5, 2017, at 09:39, Tom Bentley wrote:
> I've revised this KIP again:
> 
> * Change the alterPartitionCounts() API to support passing an optional
> assignment for the new partitions (which is already supported by
> kafka-topics.sh). At the same time I didn't want the API to suggest it
> was
> possible to change the existing assignments in the same call (which isn't
> supported today).
> 
> * Removed alterReplicationFactor(), since it's really just a special case
> of reassignPartitions(): In both cases it is necessary to give a complete
> partition assignment and both can be long running due to data movement.

Thanks, Tom.

Do we expect that reducing the number of partitions will ever be
supported by this API?  It seems like decreasing would require a
different API-- one which supported data movement, had a "check status
of this operation" feature, etc. etc.  If this API is only ever going to
be used to increase the number of partitions, I think we should just
call it "increasePartitionCount" to avoid confusion.

>
> Outstanding questions:
> 
> 1. Is the proposed alterInterReplicaThrottle() API really better than
> changing the throttle via alterConfigs()?

That's a good point.  I would argue that we should just use alterConfigs
to set the broker configuration, rather than having a special RPC just
for this.  

> It wouldn't be necessary to
> support alterConfigs() for all broker configs, just DynamicConfigs (which
> can't be set via the config file any way).

Hmm.  For this purpose, it's not even necessary to support alterConfigs
for all dynamic configurations.  You could just support alterConfigs on
this particular dynamic configuration key.  It might be better to be
conservative here, since exposing all of our internals could lead to
compatibility problems later on.

> Would it be a problem that
> triggering the reassignment required ClusterAction on the Cluster, but
> throttling the assignment required Alter on the Topic? What if a user had
> the former permission, but not the latter?

We've been trying to reserve ClusterAction on Cluster for
broker-initiated operations.  Alter on Cluster is the ACL for "root
stuff" and I would argue that it should be what we use here.

For reconfiguring the broker, I think we should follow KIP-133 and use
AlterConfigs on the Broker resource.  (Of course, if you just use the
existing alterConfigs call, you get this without any special effort.)

> 
> 2. Is reassignPartitions() really the best name? I find the distinction
> between reassigning and just assigning to be of limited value, and
> "reassign" is misleading when the APIs now used for changing the
> replication factor too. Since the API is asynchronous/long running, it
> might be better to indicate that in the name some how. What about
> startPartitionAssignment()?

Good idea -- I like the idea of using "start" or "initiate" to indicate
that this is kicking off a long-running operation.

"reassign" seemed like a natural choice to me since this is changing an
existing assignment.  It was assigned (when the topic it was created)--
now it's being re-assigned.  If you change it to just "assign" then it
feels confusing to me.  A new user might ask if "assigning partitions"
is a step that they should take after creating a topic, similar to how
subscribing to topics is a step you take after creating a consumer.

Basically, "reassign" makes it clear that it is changing an assignment
that already exists.  "assign" leaves open the possibility that no
assignment exists (which of course we know is not true, but it would be
nice if the name reflected that.)

> On 30 August 2017 at 16:17, Tom Bentley  wrote:
> 
> > Hi all,
> >
> > I've updated the KIP as follows:
> >
> > * remove the APIs supporting progress reporting in favour of the APIs
> > being implemented in KIP-113.
> > * added some APIs to cover the existing functionality around throttling
> > inter-broker transfers, which was previously a TODO.
> >
> > To respond to Colin's suggestion:
> >
> > > TopicAlterationResult alterTopics(TopicAlterationOptions options,
> > >   Map alters)
> > >
> > > PartitionCountAlteration(int numPartitions) implements TopicAlteration
> > >
> > > ReplicationFactorAlteration(int repl) implements TopicAlteration
> > >
> > > ReassignPartitionsAlteration(...) implements TopicAlteration
> >
> > That is a workable alternative to providing 3 separate methods on the
> > AdminClient, but I don't see why it is objectively better. I don't see
> > clients commonly needing to do a mixture of alterations, and it assumes
> > that the options make sense for all alterations.

Yeah-- it was just an idea I had for an alternate API that grouped
things a little bit more.  I agree that it does constrain us to using
only options that make sense for all the operations, which is not ideal.
 So your existing split into separate functions probably makes more
sense.

cheers,

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-05 Thread Ted Yu
bq. What about startPartitionAssignment() ?

Make sense.
startPartitionReassignment() seems to be better since the API deals with
reassignment.

Cheers

On Tue, Sep 5, 2017 at 9:39 AM, Tom Bentley  wrote:

> I've revised this KIP again:
>
> * Change the alterPartitionCounts() API to support passing an optional
> assignment for the new partitions (which is already supported by
> kafka-topics.sh). At the same time I didn't want the API to suggest it was
> possible to change the existing assignments in the same call (which isn't
> supported today).
>
> * Removed alterReplicationFactor(), since it's really just a special case
> of reassignPartitions(): In both cases it is necessary to give a complete
> partition assignment and both can be long running due to data movement.
>
>
> Outstanding questions:
>
> 1. Is the proposed alterInterReplicaThrottle() API really better than
> changing the throttle via alterConfigs()? It wouldn't be necessary to
> support alterConfigs() for all broker configs, just DynamicConfigs (which
> can't be set via the config file any way). Would it be a problem that
> triggering the reassignment required ClusterAction on the Cluster, but
> throttling the assignment required Alter on the Topic? What if a user had
> the former permission, but not the latter?
>
> 2. Is reassignPartitions() really the best name? I find the distinction
> between reassigning and just assigning to be of limited value, and
> "reassign" is misleading when the APIs now used for changing the
> replication factor too. Since the API is asynchronous/long running, it
> might be better to indicate that in the name some how. What about
> startPartitionAssignment()?
>
> I am, of course, interested in any other questions or comments people have.
>
>
>
>
> On 30 August 2017 at 16:17, Tom Bentley  wrote:
>
> > Hi all,
> >
> > I've updated the KIP as follows:
> >
> > * remove the APIs supporting progress reporting in favour of the APIs
> > being implemented in KIP-113.
> > * added some APIs to cover the existing functionality around throttling
> > inter-broker transfers, which was previously a TODO.
> >
> > To respond to Colin's suggestion:
> >
> > > TopicAlterationResult alterTopics(TopicAlterationOptions options,
> > >   Map alters)
> > >
> > > PartitionCountAlteration(int numPartitions) implements TopicAlteration
> > >
> > > ReplicationFactorAlteration(int repl) implements TopicAlteration
> > >
> > > ReassignPartitionsAlteration(...) implements TopicAlteration
> >
> > That is a workable alternative to providing 3 separate methods on the
> > AdminClient, but I don't see why it is objectively better. I don't see
> > clients commonly needing to do a mixture of alterations, and it assumes
> > that the options make sense for all alterations.
> >
> > Cheers,
> >
> > Tom
> >
> > On 22 August 2017 at 23:49, Colin McCabe  wrote:
> >
> >> On Wed, Aug 9, 2017, at 07:17, Tom Bentley wrote:
> >> > Hi Dong and Jun,
> >> >
> >> > Thanks again for your input in this discussion and on KIP-113. It's
> >> > difficult that discussion is split between this thread and the one for
> >> > KIP-113, but I'll try to respond on this thread to questions asked on
> >> > this
> >> > thread.
> >> >
> >> > It seems there is some consensus that the alterTopic() API is the
> wrong
> >> > thing, and it would make more sense to separate the different kinds of
> >> > alteration into separate APIs. It seems to me there is then a choice.
> >> >
> >> > 1. Have separate alterPartitionCount(), alterReplicationFactor() and
> >> > reassignPartitions() methods. This would most closely match the
> >> > facilities
> >> > currently offered by kafka-alter-topics and kafka-reassign-partitions.
> >> > 2. Just provide a reassignPartitions() method and infer from the shape
> >> of
> >> > the data passed to that that a change in replication factor and/or
> >> > partition count is required, as suggested by Dong.
> >> >
> >> > The choice we make here is also relevant to KIP-113 and KIP-178. By
> >> > choosing (1) we can change the replication factor or partition count
> >> > without providing an assignment and therefore are necessarily
> requiring
> >> > the
> >> > controller to make a decision for us about which broker (and, for 113,
> >> > which log directory and thus which disk) the replica should be reside.
> >> > There is then the matter of what criteria the controller should use to
> >> > make
> >> > that decision (a decision is also required on topic creation of
> course,
> >> > but
> >> > I'll try not to go there right now).
> >>
> >> Hmm.  One approach would be to have something like
> >>
> >> > TopicAlterationResult alterTopics(TopicAlterationOptions options,
> >> >   Map alters)
> >> >
> >> > PartitionCountAlteration(int numPartitions) implements TopicAlteration
> >> >
> >> > 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-09-05 Thread Tom Bentley
I've revised this KIP again:

* Change the alterPartitionCounts() API to support passing an optional
assignment for the new partitions (which is already supported by
kafka-topics.sh). At the same time I didn't want the API to suggest it was
possible to change the existing assignments in the same call (which isn't
supported today).

* Removed alterReplicationFactor(), since it's really just a special case
of reassignPartitions(): In both cases it is necessary to give a complete
partition assignment and both can be long running due to data movement.


Outstanding questions:

1. Is the proposed alterInterReplicaThrottle() API really better than
changing the throttle via alterConfigs()? It wouldn't be necessary to
support alterConfigs() for all broker configs, just DynamicConfigs (which
can't be set via the config file any way). Would it be a problem that
triggering the reassignment required ClusterAction on the Cluster, but
throttling the assignment required Alter on the Topic? What if a user had
the former permission, but not the latter?

2. Is reassignPartitions() really the best name? I find the distinction
between reassigning and just assigning to be of limited value, and
"reassign" is misleading when the APIs now used for changing the
replication factor too. Since the API is asynchronous/long running, it
might be better to indicate that in the name some how. What about
startPartitionAssignment()?

I am, of course, interested in any other questions or comments people have.




On 30 August 2017 at 16:17, Tom Bentley  wrote:

> Hi all,
>
> I've updated the KIP as follows:
>
> * remove the APIs supporting progress reporting in favour of the APIs
> being implemented in KIP-113.
> * added some APIs to cover the existing functionality around throttling
> inter-broker transfers, which was previously a TODO.
>
> To respond to Colin's suggestion:
>
> > TopicAlterationResult alterTopics(TopicAlterationOptions options,
> >   Map alters)
> >
> > PartitionCountAlteration(int numPartitions) implements TopicAlteration
> >
> > ReplicationFactorAlteration(int repl) implements TopicAlteration
> >
> > ReassignPartitionsAlteration(...) implements TopicAlteration
>
> That is a workable alternative to providing 3 separate methods on the
> AdminClient, but I don't see why it is objectively better. I don't see
> clients commonly needing to do a mixture of alterations, and it assumes
> that the options make sense for all alterations.
>
> Cheers,
>
> Tom
>
> On 22 August 2017 at 23:49, Colin McCabe  wrote:
>
>> On Wed, Aug 9, 2017, at 07:17, Tom Bentley wrote:
>> > Hi Dong and Jun,
>> >
>> > Thanks again for your input in this discussion and on KIP-113. It's
>> > difficult that discussion is split between this thread and the one for
>> > KIP-113, but I'll try to respond on this thread to questions asked on
>> > this
>> > thread.
>> >
>> > It seems there is some consensus that the alterTopic() API is the wrong
>> > thing, and it would make more sense to separate the different kinds of
>> > alteration into separate APIs. It seems to me there is then a choice.
>> >
>> > 1. Have separate alterPartitionCount(), alterReplicationFactor() and
>> > reassignPartitions() methods. This would most closely match the
>> > facilities
>> > currently offered by kafka-alter-topics and kafka-reassign-partitions.
>> > 2. Just provide a reassignPartitions() method and infer from the shape
>> of
>> > the data passed to that that a change in replication factor and/or
>> > partition count is required, as suggested by Dong.
>> >
>> > The choice we make here is also relevant to KIP-113 and KIP-178. By
>> > choosing (1) we can change the replication factor or partition count
>> > without providing an assignment and therefore are necessarily requiring
>> > the
>> > controller to make a decision for us about which broker (and, for 113,
>> > which log directory and thus which disk) the replica should be reside.
>> > There is then the matter of what criteria the controller should use to
>> > make
>> > that decision (a decision is also required on topic creation of course,
>> > but
>> > I'll try not to go there right now).
>>
>> Hmm.  One approach would be to have something like
>>
>> > TopicAlterationResult alterTopics(TopicAlterationOptions options,
>> >   Map alters)
>> >
>> > PartitionCountAlteration(int numPartitions) implements TopicAlteration
>> >
>> > ReplicationFactorAlteration(int repl) implements TopicAlteration
>> >
>> > ReassignPartitionsAlteration(...) implements TopicAlteration
>>
>>
>> >
>> > Choosing (2), on the other hand, forces us to make an assignment, and
>> > currently the AdminClient doesn't provide the APIs necessary to make a
>> > very
>> > informed decision. To do the job properly we'd need APIs to enumerate
>> the
>> > permissible log directories on each broker, know the current 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-30 Thread Tom Bentley
Hi all,

I've updated the KIP as follows:

* remove the APIs supporting progress reporting in favour of the APIs being
implemented in KIP-113.
* added some APIs to cover the existing functionality around throttling
inter-broker transfers, which was previously a TODO.

To respond to Colin's suggestion:

> TopicAlterationResult alterTopics(TopicAlterationOptions options,
>   Map alters)
>
> PartitionCountAlteration(int numPartitions) implements TopicAlteration
>
> ReplicationFactorAlteration(int repl) implements TopicAlteration
>
> ReassignPartitionsAlteration(...) implements TopicAlteration

That is a workable alternative to providing 3 separate methods on the
AdminClient, but I don't see why it is objectively better. I don't see
clients commonly needing to do a mixture of alterations, and it assumes
that the options make sense for all alterations.

Cheers,

Tom

On 22 August 2017 at 23:49, Colin McCabe  wrote:

> On Wed, Aug 9, 2017, at 07:17, Tom Bentley wrote:
> > Hi Dong and Jun,
> >
> > Thanks again for your input in this discussion and on KIP-113. It's
> > difficult that discussion is split between this thread and the one for
> > KIP-113, but I'll try to respond on this thread to questions asked on
> > this
> > thread.
> >
> > It seems there is some consensus that the alterTopic() API is the wrong
> > thing, and it would make more sense to separate the different kinds of
> > alteration into separate APIs. It seems to me there is then a choice.
> >
> > 1. Have separate alterPartitionCount(), alterReplicationFactor() and
> > reassignPartitions() methods. This would most closely match the
> > facilities
> > currently offered by kafka-alter-topics and kafka-reassign-partitions.
> > 2. Just provide a reassignPartitions() method and infer from the shape of
> > the data passed to that that a change in replication factor and/or
> > partition count is required, as suggested by Dong.
> >
> > The choice we make here is also relevant to KIP-113 and KIP-178. By
> > choosing (1) we can change the replication factor or partition count
> > without providing an assignment and therefore are necessarily requiring
> > the
> > controller to make a decision for us about which broker (and, for 113,
> > which log directory and thus which disk) the replica should be reside.
> > There is then the matter of what criteria the controller should use to
> > make
> > that decision (a decision is also required on topic creation of course,
> > but
> > I'll try not to go there right now).
>
> Hmm.  One approach would be to have something like
>
> > TopicAlterationResult alterTopics(TopicAlterationOptions options,
> >   Map alters)
> >
> > PartitionCountAlteration(int numPartitions) implements TopicAlteration
> >
> > ReplicationFactorAlteration(int repl) implements TopicAlteration
> >
> > ReassignPartitionsAlteration(...) implements TopicAlteration
>
>
> >
> > Choosing (2), on the other hand, forces us to make an assignment, and
> > currently the AdminClient doesn't provide the APIs necessary to make a
> > very
> > informed decision. To do the job properly we'd need APIs to enumerate the
> > permissible log directories on each broker, know the current disk usage
> > etc. These just don't exist today, and I think it would be a lot of work
> > to
> > bite off to specify them. We've already got three KIPs on the go.
> >
> > So, for the moment, I think we have to choose 1, for pragmatic reasons.
> > There is nothing to stop us deprecating alterPartitionCount() and
> > alterReplicationFactor() and moving to (2) at a later date.
>
> Yeah, the API should be pragmatic.  An API that doesn't let us do what
> we want to do (like let Kafka assign us a new replica on the least
> loaded node) is not good.
>
> >
> > To answer a specific questions of Jun's:
> >
> > 3. It's not very clear to me what status_time in ReplicaStatusResponse
> > > means.
> >
> >
> > It corresponds to the ReplicaStatus.statusTime, and the idea was it was
> > the
> > epoch offset at which the controllers notion of the lag was current
> > (which
> > I think is the epoch offset of the last FetchRequest from the follower).
> > At
> > one point I was considering some other ways of determining progress and
> > completion, and TBH I think it was more pertinent to those rejected
> > alternatives.
> >
> > I've already made some changes to KIP-179. As suggested in the thread for
> > KIP-113, I will be changing some more since I think we can
> > DescribeDirsRequest
> > instead of ReplicaStatusResponse to implement the --progress option.
>
> That makes sense.  Whatever response is used, it would be nice to have
> an error message in addition to an error code.
>
> best,
> Colin
>
> >
> > Cheers,
> >
> > Tom
> >
> >
> >
> > On 9 August 2017 at 02:28, Jun Rao  wrote:
> >
> > > Hi, Tom,
> > >
> > > Thanks for the KIP. A few minor comments 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-22 Thread Colin McCabe
On Wed, Aug 9, 2017, at 07:17, Tom Bentley wrote:
> Hi Dong and Jun,
> 
> Thanks again for your input in this discussion and on KIP-113. It's
> difficult that discussion is split between this thread and the one for
> KIP-113, but I'll try to respond on this thread to questions asked on
> this
> thread.
> 
> It seems there is some consensus that the alterTopic() API is the wrong
> thing, and it would make more sense to separate the different kinds of
> alteration into separate APIs. It seems to me there is then a choice.
> 
> 1. Have separate alterPartitionCount(), alterReplicationFactor() and
> reassignPartitions() methods. This would most closely match the
> facilities
> currently offered by kafka-alter-topics and kafka-reassign-partitions.
> 2. Just provide a reassignPartitions() method and infer from the shape of
> the data passed to that that a change in replication factor and/or
> partition count is required, as suggested by Dong.
> 
> The choice we make here is also relevant to KIP-113 and KIP-178. By
> choosing (1) we can change the replication factor or partition count
> without providing an assignment and therefore are necessarily requiring
> the
> controller to make a decision for us about which broker (and, for 113,
> which log directory and thus which disk) the replica should be reside.
> There is then the matter of what criteria the controller should use to
> make
> that decision (a decision is also required on topic creation of course,
> but
> I'll try not to go there right now).

Hmm.  One approach would be to have something like

> TopicAlterationResult alterTopics(TopicAlterationOptions options,
>   Map alters)
>
> PartitionCountAlteration(int numPartitions) implements TopicAlteration
>
> ReplicationFactorAlteration(int repl) implements TopicAlteration
>
> ReassignPartitionsAlteration(...) implements TopicAlteration


> 
> Choosing (2), on the other hand, forces us to make an assignment, and
> currently the AdminClient doesn't provide the APIs necessary to make a
> very
> informed decision. To do the job properly we'd need APIs to enumerate the
> permissible log directories on each broker, know the current disk usage
> etc. These just don't exist today, and I think it would be a lot of work
> to
> bite off to specify them. We've already got three KIPs on the go.
> 
> So, for the moment, I think we have to choose 1, for pragmatic reasons.
> There is nothing to stop us deprecating alterPartitionCount() and
> alterReplicationFactor() and moving to (2) at a later date.

Yeah, the API should be pragmatic.  An API that doesn't let us do what
we want to do (like let Kafka assign us a new replica on the least
loaded node) is not good.

> 
> To answer a specific questions of Jun's:
> 
> 3. It's not very clear to me what status_time in ReplicaStatusResponse
> > means.
> 
> 
> It corresponds to the ReplicaStatus.statusTime, and the idea was it was
> the
> epoch offset at which the controllers notion of the lag was current
> (which
> I think is the epoch offset of the last FetchRequest from the follower).
> At
> one point I was considering some other ways of determining progress and
> completion, and TBH I think it was more pertinent to those rejected
> alternatives.
> 
> I've already made some changes to KIP-179. As suggested in the thread for
> KIP-113, I will be changing some more since I think we can
> DescribeDirsRequest
> instead of ReplicaStatusResponse to implement the --progress option.

That makes sense.  Whatever response is used, it would be nice to have
an error message in addition to an error code.

best,
Colin

> 
> Cheers,
> 
> Tom
> 
> 
> 
> On 9 August 2017 at 02:28, Jun Rao  wrote:
> 
> > Hi, Tom,
> >
> > Thanks for the KIP. A few minor comments below.
> >
> > 1. Implementation wise, the broker handles adding partitions differently
> > from changing replica assignment. For the former, we directly update the
> > topic path in ZK with the new partitions. For the latter, we write the new
> > partition reassignment in the partition reassignment path. Changing the
> > replication factor is handled in the same way as changing replica
> > assignment. So, it would be useful to document how the broker handles these
> > different cases accordingly. I think it's simpler to just allow one change
> > (partition, replication factor, or replica assignment) in a request.
> >
> > 2. Currently, we only allow adding partitions. We probably want to document
> > the restriction in the api.
> >
> > 3. It's not very clear to me what status_time in ReplicaStatusResponse
> > means.
> >
> > Jun
> >
> >
> >
> > On Fri, Aug 4, 2017 at 10:04 AM, Dong Lin  wrote:
> >
> > > Hey Tom,
> > >
> > > Thanks for your reply. Here are my thoughts:
> > >
> > > 1) I think the DescribeDirsResponse can be used by AdminClient to query
> > the
> > > lag of follower replica as well. Here is how it works:
> > >
> > > - AdminClient 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-09 Thread Tom Bentley
Hi Dong and Jun,

Thanks again for your input in this discussion and on KIP-113. It's
difficult that discussion is split between this thread and the one for
KIP-113, but I'll try to respond on this thread to questions asked on this
thread.

It seems there is some consensus that the alterTopic() API is the wrong
thing, and it would make more sense to separate the different kinds of
alteration into separate APIs. It seems to me there is then a choice.

1. Have separate alterPartitionCount(), alterReplicationFactor() and
reassignPartitions() methods. This would most closely match the facilities
currently offered by kafka-alter-topics and kafka-reassign-partitions.
2. Just provide a reassignPartitions() method and infer from the shape of
the data passed to that that a change in replication factor and/or
partition count is required, as suggested by Dong.

The choice we make here is also relevant to KIP-113 and KIP-178. By
choosing (1) we can change the replication factor or partition count
without providing an assignment and therefore are necessarily requiring the
controller to make a decision for us about which broker (and, for 113,
which log directory and thus which disk) the replica should be reside.
There is then the matter of what criteria the controller should use to make
that decision (a decision is also required on topic creation of course, but
I'll try not to go there right now).

Choosing (2), on the other hand, forces us to make an assignment, and
currently the AdminClient doesn't provide the APIs necessary to make a very
informed decision. To do the job properly we'd need APIs to enumerate the
permissible log directories on each broker, know the current disk usage
etc. These just don't exist today, and I think it would be a lot of work to
bite off to specify them. We've already got three KIPs on the go.

So, for the moment, I think we have to choose 1, for pragmatic reasons.
There is nothing to stop us deprecating alterPartitionCount() and
alterReplicationFactor() and moving to (2) at a later date.

To answer a specific questions of Jun's:

3. It's not very clear to me what status_time in ReplicaStatusResponse
> means.


It corresponds to the ReplicaStatus.statusTime, and the idea was it was the
epoch offset at which the controllers notion of the lag was current (which
I think is the epoch offset of the last FetchRequest from the follower). At
one point I was considering some other ways of determining progress and
completion, and TBH I think it was more pertinent to those rejected
alternatives.

I've already made some changes to KIP-179. As suggested in the thread for
KIP-113, I will be changing some more since I think we can DescribeDirsRequest
instead of ReplicaStatusResponse to implement the --progress option.

Cheers,

Tom



On 9 August 2017 at 02:28, Jun Rao  wrote:

> Hi, Tom,
>
> Thanks for the KIP. A few minor comments below.
>
> 1. Implementation wise, the broker handles adding partitions differently
> from changing replica assignment. For the former, we directly update the
> topic path in ZK with the new partitions. For the latter, we write the new
> partition reassignment in the partition reassignment path. Changing the
> replication factor is handled in the same way as changing replica
> assignment. So, it would be useful to document how the broker handles these
> different cases accordingly. I think it's simpler to just allow one change
> (partition, replication factor, or replica assignment) in a request.
>
> 2. Currently, we only allow adding partitions. We probably want to document
> the restriction in the api.
>
> 3. It's not very clear to me what status_time in ReplicaStatusResponse
> means.
>
> Jun
>
>
>
> On Fri, Aug 4, 2017 at 10:04 AM, Dong Lin  wrote:
>
> > Hey Tom,
> >
> > Thanks for your reply. Here are my thoughts:
> >
> > 1) I think the DescribeDirsResponse can be used by AdminClient to query
> the
> > lag of follower replica as well. Here is how it works:
> >
> > - AdminClient sends DescribeDirsRequest to both the leader and the
> follower
> > of the partition.
> > - DescribeDirsResponse from both leader and follower shows the LEO of the
> > leader replica and the follower replica.
> > - Lag of the follower replica is the difference in the LEO between leader
> > and follower.
> >
> > In comparison to ReplicaStatusRequest, DescribeDirsRequest needs to be
> send
> > to each replica of the partition whereas ReplicaStatusRequest only needs
> to
> > be sent to the leader of the partition. It doesn't seem to make much
> > difference though. In practice we probably want to query the replica lag
> of
> > many partitions and AminClient needs to send exactly one request to each
> > broker with either solution. Does this make sense?
> >
> >
> > 2) KIP-179 proposes to add the following AdminClient API:
> >
> > alterTopics(Collection alteredTopics, AlterTopicsOptions
> > options)
> >
> > Where AlteredTopic includes the following fields
> >
> > 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-08 Thread Jun Rao
Hi, Tom,

Thanks for the KIP. A few minor comments below.

1. Implementation wise, the broker handles adding partitions differently
from changing replica assignment. For the former, we directly update the
topic path in ZK with the new partitions. For the latter, we write the new
partition reassignment in the partition reassignment path. Changing the
replication factor is handled in the same way as changing replica
assignment. So, it would be useful to document how the broker handles these
different cases accordingly. I think it's simpler to just allow one change
(partition, replication factor, or replica assignment) in a request.

2. Currently, we only allow adding partitions. We probably want to document
the restriction in the api.

3. It's not very clear to me what status_time in ReplicaStatusResponse
means.

Jun



On Fri, Aug 4, 2017 at 10:04 AM, Dong Lin  wrote:

> Hey Tom,
>
> Thanks for your reply. Here are my thoughts:
>
> 1) I think the DescribeDirsResponse can be used by AdminClient to query the
> lag of follower replica as well. Here is how it works:
>
> - AdminClient sends DescribeDirsRequest to both the leader and the follower
> of the partition.
> - DescribeDirsResponse from both leader and follower shows the LEO of the
> leader replica and the follower replica.
> - Lag of the follower replica is the difference in the LEO between leader
> and follower.
>
> In comparison to ReplicaStatusRequest, DescribeDirsRequest needs to be send
> to each replica of the partition whereas ReplicaStatusRequest only needs to
> be sent to the leader of the partition. It doesn't seem to make much
> difference though. In practice we probably want to query the replica lag of
> many partitions and AminClient needs to send exactly one request to each
> broker with either solution. Does this make sense?
>
>
> 2) KIP-179 proposes to add the following AdminClient API:
>
> alterTopics(Collection alteredTopics, AlterTopicsOptions
> options)
>
> Where AlteredTopic includes the following fields
>
> AlteredTopic(String name, int numPartitions, int replicationFactor,
> Map replicasAssignment)
>
> I have two comments on this:
>
> - The information in AlteredTopic seems a bit redundant. Both numPartitions
> and replicationFactor can be derived from replicasAssignment. I think we
> can probably just use the following API in AdminClient instead:
>
> AlterTopicsResult alterTopics(Map
> partitionAssignment, AlterTopicsOptions options)
>
> - Do you think "reassignPartitions" may be a better name than
> "alterTopics"? This is more consistent with the existing name used in
> kafka-reassign-partitions.sh and I also find it to be more accurate. On the
> other hand, alterTopics seems to suggest that we can also alter topic
> config.
>
>
>
> Thanks,
> Dong
>
> On Fri, Aug 4, 2017 at 1:03 AM, Tom Bentley  wrote:
>
> > Hi Dong,
> >
> > Thanks for your reply.
> >
> > You're right that your DescribeDirsResponse contains appropriate data.
> The
> > comment about the log_end_offset in the KIP says "Enable user to track
> > movement progress by comparing LEO of the *.log and *.move". That makes
> me
> > wonder whether this would only work for replica movement on the same
> > broker. In the case of reassigning partitions between brokers it's only
> > really the leader for the partition that knows the max LEO of the ISR and
> > the LEO of the target broker. Maybe the comment is misleading and it
> would
> > be the leader doing the same thing in both cases. Can you clarify this?
>
>
> > At a conceptual level there's a lot of similarity between KIP-113 and
> > KIP-179 because they're both about moving replicas around. Concretely
> > though, ChangeReplicaDirRequest assumes the movement is on the same
> broker,
> > while the equivalent APIs in KIP-179 (whichever alternative) lack the
> > notion of disks. I wonder if a unified API would be better or not.
> > Operationally, the reasons for changing replicas between brokers are
> likely
> > to be motivated by balancing load across the cluster, or adding/removing
> > brokers. But the JBOD use case has different motivations. So I suspect
> it's
> > OK that the APIs are separate, but I'd love to hear what others think.
>
>
> > I think you're right about TopicPartitionReplica. At the protocol level
> we
> > could group topics and partitions together so avoid having the same topic
> > name multiple times when querying for the status of all the partitions
> of a
> > topic.
> >
> > Thanks again for taking the time to respond.
> >
> > Tom
> >
> > On 3 August 2017 at 23:07, Dong Lin  wrote:
> >
> > > Hey Tom,
> > >
> > > Thanks for the KIP. It seems that the prior discussion in this thread
> has
> > > focused on reassigning partitions (or AlterTopics). I haven't looked
> into
> > > this yet. I have two comments on the replicaStatus() API and the
> > > ReplicaStatusRequest:
> > >
> > > -  It seems that the 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-04 Thread Dong Lin
Hey Tom,

Thanks for your reply. Here are my thoughts:

1) I think the DescribeDirsResponse can be used by AdminClient to query the
lag of follower replica as well. Here is how it works:

- AdminClient sends DescribeDirsRequest to both the leader and the follower
of the partition.
- DescribeDirsResponse from both leader and follower shows the LEO of the
leader replica and the follower replica.
- Lag of the follower replica is the difference in the LEO between leader
and follower.

In comparison to ReplicaStatusRequest, DescribeDirsRequest needs to be send
to each replica of the partition whereas ReplicaStatusRequest only needs to
be sent to the leader of the partition. It doesn't seem to make much
difference though. In practice we probably want to query the replica lag of
many partitions and AminClient needs to send exactly one request to each
broker with either solution. Does this make sense?


2) KIP-179 proposes to add the following AdminClient API:

alterTopics(Collection alteredTopics, AlterTopicsOptions
options)

Where AlteredTopic includes the following fields

AlteredTopic(String name, int numPartitions, int replicationFactor,
Map replicasAssignment)

I have two comments on this:

- The information in AlteredTopic seems a bit redundant. Both numPartitions
and replicationFactor can be derived from replicasAssignment. I think we
can probably just use the following API in AdminClient instead:

AlterTopicsResult alterTopics(Map
partitionAssignment, AlterTopicsOptions options)

- Do you think "reassignPartitions" may be a better name than
"alterTopics"? This is more consistent with the existing name used in
kafka-reassign-partitions.sh and I also find it to be more accurate. On the
other hand, alterTopics seems to suggest that we can also alter topic
config.



Thanks,
Dong

On Fri, Aug 4, 2017 at 1:03 AM, Tom Bentley  wrote:

> Hi Dong,
>
> Thanks for your reply.
>
> You're right that your DescribeDirsResponse contains appropriate data. The
> comment about the log_end_offset in the KIP says "Enable user to track
> movement progress by comparing LEO of the *.log and *.move". That makes me
> wonder whether this would only work for replica movement on the same
> broker. In the case of reassigning partitions between brokers it's only
> really the leader for the partition that knows the max LEO of the ISR and
> the LEO of the target broker. Maybe the comment is misleading and it would
> be the leader doing the same thing in both cases. Can you clarify this?


> At a conceptual level there's a lot of similarity between KIP-113 and
> KIP-179 because they're both about moving replicas around. Concretely
> though, ChangeReplicaDirRequest assumes the movement is on the same broker,
> while the equivalent APIs in KIP-179 (whichever alternative) lack the
> notion of disks. I wonder if a unified API would be better or not.
> Operationally, the reasons for changing replicas between brokers are likely
> to be motivated by balancing load across the cluster, or adding/removing
> brokers. But the JBOD use case has different motivations. So I suspect it's
> OK that the APIs are separate, but I'd love to hear what others think.


> I think you're right about TopicPartitionReplica. At the protocol level we
> could group topics and partitions together so avoid having the same topic
> name multiple times when querying for the status of all the partitions of a
> topic.
>
> Thanks again for taking the time to respond.
>
> Tom
>
> On 3 August 2017 at 23:07, Dong Lin  wrote:
>
> > Hey Tom,
> >
> > Thanks for the KIP. It seems that the prior discussion in this thread has
> > focused on reassigning partitions (or AlterTopics). I haven't looked into
> > this yet. I have two comments on the replicaStatus() API and the
> > ReplicaStatusRequest:
> >
> > -  It seems that the use-case for ReplicaStatusRequest is covered by
> > the DescribeDirsRequest introduced in KIP-113
> >  > 113%3A+Support+replicas+movement+between+log+directories>.
> > DescribeDirsResponse contains the logEndOffset for the specified
> > partitions. The the lag of the follower replica can be derived as the
> > difference between leader's LEO and follower's LEO. Do you think we can
> > simply use DescribeDirsRequest without introducing ReplicaStatusRequest?
> >
> > - According to the current KIP, replicaStatus() takes a list of
> partitions
> > as input. And its output maps the partition to a list of replica status.
> Do
> > you think it would be better to have a new class called
> > TopicPartitionReplica, which identifies the topic, partition and the
> > brokerId. replicaStatus can then take a list of TopicPartitionReplica as
> > input. And its output maps the replica to replica status. The latter API
> > seems simpler and also matches the method name better. What do you think?
> >
> > Thanks,
> > Dong
> >
> >
> >
> > On Wed, Aug 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-04 Thread Tom Bentley
Hi Dong,

Thanks for your reply.

You're right that your DescribeDirsResponse contains appropriate data. The
comment about the log_end_offset in the KIP says "Enable user to track
movement progress by comparing LEO of the *.log and *.move". That makes me
wonder whether this would only work for replica movement on the same
broker. In the case of reassigning partitions between brokers it's only
really the leader for the partition that knows the max LEO of the ISR and
the LEO of the target broker. Maybe the comment is misleading and it would
be the leader doing the same thing in both cases. Can you clarify this?

At a conceptual level there's a lot of similarity between KIP-113 and
KIP-179 because they're both about moving replicas around. Concretely
though, ChangeReplicaDirRequest assumes the movement is on the same broker,
while the equivalent APIs in KIP-179 (whichever alternative) lack the
notion of disks. I wonder if a unified API would be better or not.
Operationally, the reasons for changing replicas between brokers are likely
to be motivated by balancing load across the cluster, or adding/removing
brokers. But the JBOD use case has different motivations. So I suspect it's
OK that the APIs are separate, but I'd love to hear what others think.

I think you're right about TopicPartitionReplica. At the protocol level we
could group topics and partitions together so avoid having the same topic
name multiple times when querying for the status of all the partitions of a
topic.

Thanks again for taking the time to respond.

Tom

On 3 August 2017 at 23:07, Dong Lin  wrote:

> Hey Tom,
>
> Thanks for the KIP. It seems that the prior discussion in this thread has
> focused on reassigning partitions (or AlterTopics). I haven't looked into
> this yet. I have two comments on the replicaStatus() API and the
> ReplicaStatusRequest:
>
> -  It seems that the use-case for ReplicaStatusRequest is covered by
> the DescribeDirsRequest introduced in KIP-113
>  113%3A+Support+replicas+movement+between+log+directories>.
> DescribeDirsResponse contains the logEndOffset for the specified
> partitions. The the lag of the follower replica can be derived as the
> difference between leader's LEO and follower's LEO. Do you think we can
> simply use DescribeDirsRequest without introducing ReplicaStatusRequest?
>
> - According to the current KIP, replicaStatus() takes a list of partitions
> as input. And its output maps the partition to a list of replica status. Do
> you think it would be better to have a new class called
> TopicPartitionReplica, which identifies the topic, partition and the
> brokerId. replicaStatus can then take a list of TopicPartitionReplica as
> input. And its output maps the replica to replica status. The latter API
> seems simpler and also matches the method name better. What do you think?
>
> Thanks,
> Dong
>
>
>
> On Wed, Aug 2, 2017 at 5:12 AM, Tom Bentley  wrote:
>
> > Hi again Ismael,
> >
> >
> > 1. It's worth emphasising that reassigning partitions is a different
> > >> process than what happens when a topic is created, so not sure trying
> to
> > >> make it symmetric is beneficial. In addition to what was already
> > >> discussed,
> > >> one should also enable replication throttling before moving the data.
> > >
> > >
> > > Thanks for your responses.The only advantage I can see from having
> > > separate APIs for partition count change vs. the other changes is that
> we
> > > expect the former to return synchronously, and the latter to (usually)
> > > return asynchronously. Lumping them into the same API forces clients of
> > > that API to code for the two possible cases. Is this the particular
> > concern
> > > you had in mind, or do you have others?
> > >
> >
> > Just to help see what this looks like I've created another version of
> > KIP-179 (
> > https://cwiki.apache.org/confluence/display/KAFKA/Copy+
> > of+KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient
> > ).
> >
> > The AdminClient APIs are more type safe (harder to misuse). The
> > asynchronous nature of the alterReplicationFactors() and
> > reassignPartitions() methods aren't really apparent (you'd still have to
> > read the javadocs to know you had to call replicaStatus() to determine
> > completion), but it's still better that the lumped-together API because
> the
> > methods are _consistently_ async. The Protocol APIs are slightly nicer
> too,
> > in that INVALID_REQUEST is less commonly returned. Another benefit is it
> > would allow for these different alteration APIs to be modified
> > independently in the future, should that necessary.
> >
> > What do others think?
> >
> > You're right that the proposal doesn't cover setting and clearing a
> > > throttle, and it should. I will study the code about how this works
> > before
> > > posting back about that.
> > >
> >
> > AFAICS from the code the throttle is simply a dynamic 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-03 Thread Dong Lin
Hey Tom,

Thanks for the KIP. It seems that the prior discussion in this thread has
focused on reassigning partitions (or AlterTopics). I haven't looked into
this yet. I have two comments on the replicaStatus() API and the
ReplicaStatusRequest:

-  It seems that the use-case for ReplicaStatusRequest is covered by
the DescribeDirsRequest introduced in KIP-113
.
DescribeDirsResponse contains the logEndOffset for the specified
partitions. The the lag of the follower replica can be derived as the
difference between leader's LEO and follower's LEO. Do you think we can
simply use DescribeDirsRequest without introducing ReplicaStatusRequest?

- According to the current KIP, replicaStatus() takes a list of partitions
as input. And its output maps the partition to a list of replica status. Do
you think it would be better to have a new class called
TopicPartitionReplica, which identifies the topic, partition and the
brokerId. replicaStatus can then take a list of TopicPartitionReplica as
input. And its output maps the replica to replica status. The latter API
seems simpler and also matches the method name better. What do you think?

Thanks,
Dong



On Wed, Aug 2, 2017 at 5:12 AM, Tom Bentley  wrote:

> Hi again Ismael,
>
>
> 1. It's worth emphasising that reassigning partitions is a different
> >> process than what happens when a topic is created, so not sure trying to
> >> make it symmetric is beneficial. In addition to what was already
> >> discussed,
> >> one should also enable replication throttling before moving the data.
> >
> >
> > Thanks for your responses.The only advantage I can see from having
> > separate APIs for partition count change vs. the other changes is that we
> > expect the former to return synchronously, and the latter to (usually)
> > return asynchronously. Lumping them into the same API forces clients of
> > that API to code for the two possible cases. Is this the particular
> concern
> > you had in mind, or do you have others?
> >
>
> Just to help see what this looks like I've created another version of
> KIP-179 (
> https://cwiki.apache.org/confluence/display/KAFKA/Copy+
> of+KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient
> ).
>
> The AdminClient APIs are more type safe (harder to misuse). The
> asynchronous nature of the alterReplicationFactors() and
> reassignPartitions() methods aren't really apparent (you'd still have to
> read the javadocs to know you had to call replicaStatus() to determine
> completion), but it's still better that the lumped-together API because the
> methods are _consistently_ async. The Protocol APIs are slightly nicer too,
> in that INVALID_REQUEST is less commonly returned. Another benefit is it
> would allow for these different alteration APIs to be modified
> independently in the future, should that necessary.
>
> What do others think?
>
> You're right that the proposal doesn't cover setting and clearing a
> > throttle, and it should. I will study the code about how this works
> before
> > posting back about that.
> >
>
> AFAICS from the code the throttle is simply a dynamic config of the broker.
> We already have the AdminClient.alterConfigs API for setting this, so the
> refactoring of kafka-reassign-partitions.sh could just use that existing
> API. We could still achieve your objective of setting the throttle before
> reassignment by making the --throttle option mandatory when --execute was
> present. Or did you have something else in mind?
>
> Thanks,
>
> Tom
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-02 Thread Tom Bentley
Hi again Ismael,


1. It's worth emphasising that reassigning partitions is a different
>> process than what happens when a topic is created, so not sure trying to
>> make it symmetric is beneficial. In addition to what was already
>> discussed,
>> one should also enable replication throttling before moving the data.
>
>
> Thanks for your responses.The only advantage I can see from having
> separate APIs for partition count change vs. the other changes is that we
> expect the former to return synchronously, and the latter to (usually)
> return asynchronously. Lumping them into the same API forces clients of
> that API to code for the two possible cases. Is this the particular concern
> you had in mind, or do you have others?
>

Just to help see what this looks like I've created another version of
KIP-179 (
https://cwiki.apache.org/confluence/display/KAFKA/Copy+of+KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient
).

The AdminClient APIs are more type safe (harder to misuse). The
asynchronous nature of the alterReplicationFactors() and
reassignPartitions() methods aren't really apparent (you'd still have to
read the javadocs to know you had to call replicaStatus() to determine
completion), but it's still better that the lumped-together API because the
methods are _consistently_ async. The Protocol APIs are slightly nicer too,
in that INVALID_REQUEST is less commonly returned. Another benefit is it
would allow for these different alteration APIs to be modified
independently in the future, should that necessary.

What do others think?

You're right that the proposal doesn't cover setting and clearing a
> throttle, and it should. I will study the code about how this works before
> posting back about that.
>

AFAICS from the code the throttle is simply a dynamic config of the broker.
We already have the AdminClient.alterConfigs API for setting this, so the
refactoring of kafka-reassign-partitions.sh could just use that existing
API. We could still achieve your objective of setting the throttle before
reassignment by making the --throttle option mandatory when --execute was
present. Or did you have something else in mind?

Thanks,

Tom


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Tom Bentley
Hi Ismael,


1. It's worth emphasising that reassigning partitions is a different
> process than what happens when a topic is created, so not sure trying to
> make it symmetric is beneficial. In addition to what was already discussed,
> one should also enable replication throttling before moving the data.


Thanks for your responses.The only advantage I can see from having separate
APIs for partition count change vs. the other changes is that we expect the
former to return synchronously, and the latter to (usually) return
asynchronously. Lumping them into the same API forces clients of that API
to code for the two possible cases. Is this the particular concern you had
in mind, or do you have others?

You're right that the proposal doesn't cover setting and clearing a
throttle, and it should. I will study the code about how this works before
posting back about that.

3. In my opinion, the warning when someone updates configs via
> `kafka-topic.sh` was a mistake. Doing the same thing via the configs tool
> is quite clunky in comparison.
>

Is that a request to remove the deprecation warning (and thus to
undeprecate using kafka-topic.sh to change configs)?

Thanks again for taking the time to respond,

Tom



On 1 August 2017 at 17:52, Ismael Juma  wrote:

> Hi both,
>
> Thanks for the replies. A few points:
>
> 1. It's worth emphasising that reassigning partitions is a different
> process than what happens when a topic is created, so not sure trying to
> make it symmetric is beneficial. In addition to what was already discussed,
> one should also enable replication throttling before moving the data.
>
> 2. HTTP codes are not a great comparison as they are per response while
> Kafka responses often include multiple error codes (due to batching).
>
> 3. In my opinion, the warning when someone updates configs via
> `kafka-topic.sh` was a mistake. Doing the same thing via the configs tool
> is quite clunky in comparison.
>
> Ismael
>
> On Tue, Aug 1, 2017 at 4:52 PM, Tom Bentley  wrote:
>
> > > Regarding adding the possibility to alter the topic config through the
> > > AlterTopic API, the current TopicCommand implementation provides a
> > warning
> > > on doing this suggesting to use the ConfigCommand tool. So it would be
> a
> > > step back allowing to do the configs change with the alter topic as
> well.
> > >
> >
> > To be clear, I'm not proposing that this be supported via the
> > kafka-reassign-partitions.sh tool, nor via kafka-topics.sh. It's purely a
> > question of whether the API for the AlterTopicsRequest/Response and the
> > AdminClient API would support it. As I said, I really don't like the
> > duplication of functionality and would prefer to leave it out, even
> though
> > it makes for a less "symmetric" API wrt CreateTopicsRequest/Response, but
> > maybe the community feels otherwise.
> >
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Ismael Juma
Hi both,

Thanks for the replies. A few points:

1. It's worth emphasising that reassigning partitions is a different
process than what happens when a topic is created, so not sure trying to
make it symmetric is beneficial. In addition to what was already discussed,
one should also enable replication throttling before moving the data.

2. HTTP codes are not a great comparison as they are per response while
Kafka responses often include multiple error codes (due to batching).

3. In my opinion, the warning when someone updates configs via
`kafka-topic.sh` was a mistake. Doing the same thing via the configs tool
is quite clunky in comparison.

Ismael

On Tue, Aug 1, 2017 at 4:52 PM, Tom Bentley  wrote:

> > Regarding adding the possibility to alter the topic config through the
> > AlterTopic API, the current TopicCommand implementation provides a
> warning
> > on doing this suggesting to use the ConfigCommand tool. So it would be a
> > step back allowing to do the configs change with the alter topic as well.
> >
>
> To be clear, I'm not proposing that this be supported via the
> kafka-reassign-partitions.sh tool, nor via kafka-topics.sh. It's purely a
> question of whether the API for the AlterTopicsRequest/Response and the
> AdminClient API would support it. As I said, I really don't like the
> duplication of functionality and would prefer to leave it out, even though
> it makes for a less "symmetric" API wrt CreateTopicsRequest/Response, but
> maybe the community feels otherwise.
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Tom Bentley
> Regarding adding the possibility to alter the topic config through the
> AlterTopic API, the current TopicCommand implementation provides a warning
> on doing this suggesting to use the ConfigCommand tool. So it would be a
> step back allowing to do the configs change with the alter topic as well.
>

To be clear, I'm not proposing that this be supported via the
kafka-reassign-partitions.sh tool, nor via kafka-topics.sh. It's purely a
question of whether the API for the AlterTopicsRequest/Response and the
AdminClient API would support it. As I said, I really don't like the
duplication of functionality and would prefer to leave it out, even though
it makes for a less "symmetric" API wrt CreateTopicsRequest/Response, but
maybe the community feels otherwise.


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Paolo Patierno
Hi,


Regarding the double API I agree on having only one. Compared to an HTTP REST 
API you could have an "already executed" (fast) operation so the HTTP response 
status code could be just 200 OK while a "long running" operation could have an 
HTTP response status code as 202 ACCEPTED (but the processing has not 
completed).


Regarding adding the possibility to alter the topic config through the 
AlterTopic API, the current TopicCommand implementation provides a warning on 
doing this suggesting to use the ConfigCommand tool. So it would be a step back 
allowing to do the configs change with the alter topic as well.


Last thing on the KIP ... the "timeout" field in the AlterTopicRequest is 
missing in the table with related description.


Thanks


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>



From: Tom Bentley <t.j.bent...@gmail.com>
Sent: Tuesday, August 1, 2017 2:22 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use 
AdminClient

Hi Ismael,

Thanks for taking the time to look at this.

Currently the proposal around the ReplicaStatusRequest/Response just allows
you to see the lag for the given replicas. It's not something that's tied
to a prior request to alter the topic at all, though obviously you can use
it to monitor how a reassignment is progressing.

I think having the API independent of AlterTopicsRequest/Response is a good
thing, because people might want to know that independently of a
reassignment, for example. But the fact that you just see the instantaneous
lag makes it less than perfect for monitoring a reassignment. You would
have to make multiple calls to figure out how quickly the lag is
diminishing or growing. But I don't want each partition leader having to
maintain some kind of stateful calculation just in case it is queried for
the state of that partition.

So I think that problem might better be solved in the
kafka-reassign-partitions.sh tool. Let _it_ make multiple requests for
status and figure out how the lag is changing.

>From the PoV of the API presented by the protocol, the proposal has moved
in the direction of making AlterTopicsRequest/Response more and more like
CreateTopicsRequest/Response, and I think it has improved as a result of
this.

Personally I don't think we should split the AlterTopicsRequest/Response
API in two, one which we expect to be short running and another which we
expect to be long running. I think that would make the API less symmetric,
and thus less easily understood. What might make more sense is to use the
return code to indicate the distinction between, "OK, it's done completely"
and "OK, it's been started, but will finish asynchronously" -- In a way
that distinction already exists via the timeout in AlterTopicsOptions. In
other words a configs change might return NONE, but a reassignment (with a
short timeout) might return REQUEST_TIMED_OUT, though I guess the problem
with that is that TimeoutException is retriable, and REQUEST_TIMED_OUT
doesn't suggest that you can use another API for query for completion, so
perhaps a dedicated status code is warranted (ASYNCHRONOUS_COMPLETION?)

To be honest, if we added the ability to update the topic configs, it would
be even more symmetric, and I wouldn't be opposed to that, except that I
dislike having two code paths for achieving the same thing, and we already
have AlterConfigsRequest and Response.

What do other people think?

Cheers,

Tom


On 1 August 2017 at 14:25, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Tom,
>
> A high-level point for discussion before going into the details. The
> proposed protocol API `alterTopics` has 2 types of operations:
>
> 1. Operations that cause data movement (or deletion): increase/decrease of
> replication factor and partition reassignment. These are currently done by
> `kafka-reassign-partitions.sh` and having a way to check the progress is
> useful.
>
> 2. Operations that don't involve data movement: increase the number of
> partitions and update topic configs (this is not in the proposal, but could
> be). These are currently done by `kafka-topics.sh` and they don't usually
> take very long so progress reporting is less important.
>
> It would be worth thinking whether having 2 separate protocol APIs would be
> better. I can see pros and cons, so I'd be interested in what you and
> others think.
>
> Ismael
>
> On Tue, Aug 1, 2017 at 10:19 AM, Tom Bentley <t.j.bent...@gmail.com>
> wrote:
>
> > I have added the timeout I mentioned before, because it makes the
> &g

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Tom Bentley
Hi Ismael,

Thanks for taking the time to look at this.

Currently the proposal around the ReplicaStatusRequest/Response just allows
you to see the lag for the given replicas. It's not something that's tied
to a prior request to alter the topic at all, though obviously you can use
it to monitor how a reassignment is progressing.

I think having the API independent of AlterTopicsRequest/Response is a good
thing, because people might want to know that independently of a
reassignment, for example. But the fact that you just see the instantaneous
lag makes it less than perfect for monitoring a reassignment. You would
have to make multiple calls to figure out how quickly the lag is
diminishing or growing. But I don't want each partition leader having to
maintain some kind of stateful calculation just in case it is queried for
the state of that partition.

So I think that problem might better be solved in the
kafka-reassign-partitions.sh tool. Let _it_ make multiple requests for
status and figure out how the lag is changing.

>From the PoV of the API presented by the protocol, the proposal has moved
in the direction of making AlterTopicsRequest/Response more and more like
CreateTopicsRequest/Response, and I think it has improved as a result of
this.

Personally I don't think we should split the AlterTopicsRequest/Response
API in two, one which we expect to be short running and another which we
expect to be long running. I think that would make the API less symmetric,
and thus less easily understood. What might make more sense is to use the
return code to indicate the distinction between, "OK, it's done completely"
and "OK, it's been started, but will finish asynchronously" -- In a way
that distinction already exists via the timeout in AlterTopicsOptions. In
other words a configs change might return NONE, but a reassignment (with a
short timeout) might return REQUEST_TIMED_OUT, though I guess the problem
with that is that TimeoutException is retriable, and REQUEST_TIMED_OUT
doesn't suggest that you can use another API for query for completion, so
perhaps a dedicated status code is warranted (ASYNCHRONOUS_COMPLETION?)

To be honest, if we added the ability to update the topic configs, it would
be even more symmetric, and I wouldn't be opposed to that, except that I
dislike having two code paths for achieving the same thing, and we already
have AlterConfigsRequest and Response.

What do other people think?

Cheers,

Tom


On 1 August 2017 at 14:25, Ismael Juma  wrote:

> Hi Tom,
>
> A high-level point for discussion before going into the details. The
> proposed protocol API `alterTopics` has 2 types of operations:
>
> 1. Operations that cause data movement (or deletion): increase/decrease of
> replication factor and partition reassignment. These are currently done by
> `kafka-reassign-partitions.sh` and having a way to check the progress is
> useful.
>
> 2. Operations that don't involve data movement: increase the number of
> partitions and update topic configs (this is not in the proposal, but could
> be). These are currently done by `kafka-topics.sh` and they don't usually
> take very long so progress reporting is less important.
>
> It would be worth thinking whether having 2 separate protocol APIs would be
> better. I can see pros and cons, so I'd be interested in what you and
> others think.
>
> Ismael
>
> On Tue, Aug 1, 2017 at 10:19 AM, Tom Bentley 
> wrote:
>
> > I have added the timeout I mentioned before, because it makes the
> > implementation of topic alteration more symmetric with the topic creation
> > APIs.
> >
> > I have also added a section ("Policy") on retrofitting the
> > CreateTopicPolicy's rules to topic alteration requests, and made a few
> > other minor fixes.
> >
> > I've still not factored out the topic name from the ReplicaStatusRequest
> > (Ismael, do you have an opinion about that?)
> >
> > If any one else has some feedback on this KIP that would be great.
> > Otherwise, I would like to start a vote on this KIP before the end of the
> > week.
> >
> > Thanks,
> >
> > Tom
> >
> > On 26 July 2017 at 11:45, Tom Bentley  wrote:
> >
> > > I've updated the KIP to fix those niggles, but I've not factored out
> the
> > > topic name from the ReplicaStatusRequest, yet.
> > >
> > > Looking at the topic creation APIs in more detail, the
> > CreateTopicsOptions
> > > has
> > >
> > > * `shouldValidateOnly()`, which would make a lot of sense for the alter
> > > topic APIs
> > > * `timeoutMs()`, which I'm not sure sure about...
> > >
> > > Topic creation doesn't require shifting replicas between brokers so
> it's
> > > reasonable to support timeout, because we don't expect it to take very
> > long.
> > >
> > > Topic alteration usually takes a while because we are going to have to
> > > move replicas. Since we're adding a whole API to track the progress of
> > that
> > > replication, I'm inclined to think that having a timeout is a bit
> > 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Ismael Juma
Hi Tom,

A high-level point for discussion before going into the details. The
proposed protocol API `alterTopics` has 2 types of operations:

1. Operations that cause data movement (or deletion): increase/decrease of
replication factor and partition reassignment. These are currently done by
`kafka-reassign-partitions.sh` and having a way to check the progress is
useful.

2. Operations that don't involve data movement: increase the number of
partitions and update topic configs (this is not in the proposal, but could
be). These are currently done by `kafka-topics.sh` and they don't usually
take very long so progress reporting is less important.

It would be worth thinking whether having 2 separate protocol APIs would be
better. I can see pros and cons, so I'd be interested in what you and
others think.

Ismael

On Tue, Aug 1, 2017 at 10:19 AM, Tom Bentley  wrote:

> I have added the timeout I mentioned before, because it makes the
> implementation of topic alteration more symmetric with the topic creation
> APIs.
>
> I have also added a section ("Policy") on retrofitting the
> CreateTopicPolicy's rules to topic alteration requests, and made a few
> other minor fixes.
>
> I've still not factored out the topic name from the ReplicaStatusRequest
> (Ismael, do you have an opinion about that?)
>
> If any one else has some feedback on this KIP that would be great.
> Otherwise, I would like to start a vote on this KIP before the end of the
> week.
>
> Thanks,
>
> Tom
>
> On 26 July 2017 at 11:45, Tom Bentley  wrote:
>
> > I've updated the KIP to fix those niggles, but I've not factored out the
> > topic name from the ReplicaStatusRequest, yet.
> >
> > Looking at the topic creation APIs in more detail, the
> CreateTopicsOptions
> > has
> >
> > * `shouldValidateOnly()`, which would make a lot of sense for the alter
> > topic APIs
> > * `timeoutMs()`, which I'm not sure sure about...
> >
> > Topic creation doesn't require shifting replicas between brokers so it's
> > reasonable to support timeout, because we don't expect it to take very
> long.
> >
> > Topic alteration usually takes a while because we are going to have to
> > move replicas. Since we're adding a whole API to track the progress of
> that
> > replication, I'm inclined to think that having a timeout is a bit
> pointless.
> >
> > But should the replicaStatus() API have a timeout? I suppose it probably
> > should.
> >
> >
> > On 26 July 2017 at 10:58, Tom Bentley  wrote:
> >
> >> Thanks Paolo,
> >>
> >>   *   in the "Public Interfaces" section you wrote
> >>> alterTopics(Set) but then a collection is used (instead
> of a
> >>> set) in the Proposed Changes section. I'm ok with collection.
> >>>
> >>
> >> Agree it should be Collection.
> >>
> >>
> >>>   *   in the summary of the alterTopics method you say "The request can
> >>> change the number of partitions, replication factor and/or the
> partition
> >>> assignments." I think that the "and/or" is misleading (at least for
> me).
> >>> For the TopicCommand tool you can specify partitions AND replication
> factor
> >>> ... OR partition assignments (but not for example partitions AND
> >>> replication factor AND partition assignments). Maybe it could be "The
> >>> request can change the number of partitions and the related replication
> >>> factor or specifying a partition assignments."
> >>>
> >>
> >> Is there a reason why we can't support changing all three at once? It
> >> certainly makes conceptual sense to, say, increase the number of
> partitions
> >> and replication factor, and specify how you want the partitions
> assigned.
> >> And doing two separate calls would be less efficient as we sync new
> >> replicas on brokers only to then move them somewhere else.
> >>
> >> If there is a reason we don't want to support changing all three, then
> we
> >> can return the error INVALID_REQUEST (42). That would allow us to
> support
> >> changing all three at some time in the future, without having to change
> the
> >> API.
> >>
> >>
> >>>   *   I know that it would be a breaking change in the Admin Client API
> >>> but why having an AlteredTopic class which is quite similar to the
> already
> >>> existing NewTopic class ? I know that using NewTopic for the alter
> method
> >>> could be misleading. What about renaming NewTopic in something like
> >>> AdminTopic ? At same time it means that the TopicDetails class (which
> you
> >>> can get from the current NewTopic) should be outside the
> >>> CreateTopicsRequest because it could be used in the AlterTopicsRequest
> as
> >>> well.
> >>>
> >>
> >> One problem with this is it tends to inhibit future API changes for
> >> either newTopics() or alterTopics(), because any common class needs to
> make
> >> sense in both contexts.
> >>
> >> For createTopics() we get to specify some configs (the
> >> Map), but since the AdminClient already has
> alterConfigs()
> >> for changing topic 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-08-01 Thread Tom Bentley
I have added the timeout I mentioned before, because it makes the
implementation of topic alteration more symmetric with the topic creation
APIs.

I have also added a section ("Policy") on retrofitting the
CreateTopicPolicy's rules to topic alteration requests, and made a few
other minor fixes.

I've still not factored out the topic name from the ReplicaStatusRequest
(Ismael, do you have an opinion about that?)

If any one else has some feedback on this KIP that would be great.
Otherwise, I would like to start a vote on this KIP before the end of the
week.

Thanks,

Tom

On 26 July 2017 at 11:45, Tom Bentley  wrote:

> I've updated the KIP to fix those niggles, but I've not factored out the
> topic name from the ReplicaStatusRequest, yet.
>
> Looking at the topic creation APIs in more detail, the CreateTopicsOptions
> has
>
> * `shouldValidateOnly()`, which would make a lot of sense for the alter
> topic APIs
> * `timeoutMs()`, which I'm not sure sure about...
>
> Topic creation doesn't require shifting replicas between brokers so it's
> reasonable to support timeout, because we don't expect it to take very long.
>
> Topic alteration usually takes a while because we are going to have to
> move replicas. Since we're adding a whole API to track the progress of that
> replication, I'm inclined to think that having a timeout is a bit pointless.
>
> But should the replicaStatus() API have a timeout? I suppose it probably
> should.
>
>
> On 26 July 2017 at 10:58, Tom Bentley  wrote:
>
>> Thanks Paolo,
>>
>>   *   in the "Public Interfaces" section you wrote
>>> alterTopics(Set) but then a collection is used (instead of a
>>> set) in the Proposed Changes section. I'm ok with collection.
>>>
>>
>> Agree it should be Collection.
>>
>>
>>>   *   in the summary of the alterTopics method you say "The request can
>>> change the number of partitions, replication factor and/or the partition
>>> assignments." I think that the "and/or" is misleading (at least for me).
>>> For the TopicCommand tool you can specify partitions AND replication factor
>>> ... OR partition assignments (but not for example partitions AND
>>> replication factor AND partition assignments). Maybe it could be "The
>>> request can change the number of partitions and the related replication
>>> factor or specifying a partition assignments."
>>>
>>
>> Is there a reason why we can't support changing all three at once? It
>> certainly makes conceptual sense to, say, increase the number of partitions
>> and replication factor, and specify how you want the partitions assigned.
>> And doing two separate calls would be less efficient as we sync new
>> replicas on brokers only to then move them somewhere else.
>>
>> If there is a reason we don't want to support changing all three, then we
>> can return the error INVALID_REQUEST (42). That would allow us to support
>> changing all three at some time in the future, without having to change the
>> API.
>>
>>
>>>   *   I know that it would be a breaking change in the Admin Client API
>>> but why having an AlteredTopic class which is quite similar to the already
>>> existing NewTopic class ? I know that using NewTopic for the alter method
>>> could be misleading. What about renaming NewTopic in something like
>>> AdminTopic ? At same time it means that the TopicDetails class (which you
>>> can get from the current NewTopic) should be outside the
>>> CreateTopicsRequest because it could be used in the AlterTopicsRequest as
>>> well.
>>>
>>
>> One problem with this is it tends to inhibit future API changes for
>> either newTopics() or alterTopics(), because any common class needs to make
>> sense in both contexts.
>>
>> For createTopics() we get to specify some configs (the
>> Map), but since the AdminClient already has alterConfigs()
>> for changing topic configs after topic creation I don't think it's right to
>> also support changing those configs via alterTopics() as well. But having
>> them in a common AdminTopic class would suggest that that was supported.
>> Yes, alterTopics could return INVALID_REQUEST if it was given topic
>> configs, but this is just making the API harder to use since it is
>> weakening of the type safety of the API.
>>
>> I suppose we *could* write a common TopicDetails class and make the
>> existing nested one extend the common one, with deprecations, to eventually
>> remove the nested one.
>>
>>
>>
>>>   *   A typo in the ReplicaStatus : gpartition() instead of partition()
>>>   *   In the AlterTopicRequets
>>>  *   the replication factor should be INT16
>>>
>>
>> Ah, thanks!
>>
>>
>>>  *   I would use same fields name as CreateTopicsRequest (they are
>>> quite similar)
>>>   *   What's the broker id in the ReplicaStatusRequest ?
>>>
>>
>> It's the broker, which is expected to have a replica of the given
>> partition, that we're querying the status of. It is necessary because we're
>> asking the _leader_ for the 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-26 Thread Tom Bentley
I've updated the KIP to fix those niggles, but I've not factored out the
topic name from the ReplicaStatusRequest, yet.

Looking at the topic creation APIs in more detail, the CreateTopicsOptions
has

* `shouldValidateOnly()`, which would make a lot of sense for the alter
topic APIs
* `timeoutMs()`, which I'm not sure sure about...

Topic creation doesn't require shifting replicas between brokers so it's
reasonable to support timeout, because we don't expect it to take very long.

Topic alteration usually takes a while because we are going to have to move
replicas. Since we're adding a whole API to track the progress of that
replication, I'm inclined to think that having a timeout is a bit pointless.

But should the replicaStatus() API have a timeout? I suppose it probably
should.


On 26 July 2017 at 10:58, Tom Bentley  wrote:

> Thanks Paolo,
>
>   *   in the "Public Interfaces" section you wrote
>> alterTopics(Set) but then a collection is used (instead of a
>> set) in the Proposed Changes section. I'm ok with collection.
>>
>
> Agree it should be Collection.
>
>
>>   *   in the summary of the alterTopics method you say "The request can
>> change the number of partitions, replication factor and/or the partition
>> assignments." I think that the "and/or" is misleading (at least for me).
>> For the TopicCommand tool you can specify partitions AND replication factor
>> ... OR partition assignments (but not for example partitions AND
>> replication factor AND partition assignments). Maybe it could be "The
>> request can change the number of partitions and the related replication
>> factor or specifying a partition assignments."
>>
>
> Is there a reason why we can't support changing all three at once? It
> certainly makes conceptual sense to, say, increase the number of partitions
> and replication factor, and specify how you want the partitions assigned.
> And doing two separate calls would be less efficient as we sync new
> replicas on brokers only to then move them somewhere else.
>
> If there is a reason we don't want to support changing all three, then we
> can return the error INVALID_REQUEST (42). That would allow us to support
> changing all three at some time in the future, without having to change the
> API.
>
>
>>   *   I know that it would be a breaking change in the Admin Client API
>> but why having an AlteredTopic class which is quite similar to the already
>> existing NewTopic class ? I know that using NewTopic for the alter method
>> could be misleading. What about renaming NewTopic in something like
>> AdminTopic ? At same time it means that the TopicDetails class (which you
>> can get from the current NewTopic) should be outside the
>> CreateTopicsRequest because it could be used in the AlterTopicsRequest as
>> well.
>>
>
> One problem with this is it tends to inhibit future API changes for either
> newTopics() or alterTopics(), because any common class needs to make sense
> in both contexts.
>
> For createTopics() we get to specify some configs (the
> Map), but since the AdminClient already has alterConfigs()
> for changing topic configs after topic creation I don't think it's right to
> also support changing those configs via alterTopics() as well. But having
> them in a common AdminTopic class would suggest that that was supported.
> Yes, alterTopics could return INVALID_REQUEST if it was given topic
> configs, but this is just making the API harder to use since it is
> weakening of the type safety of the API.
>
> I suppose we *could* write a common TopicDetails class and make the
> existing nested one extend the common one, with deprecations, to eventually
> remove the nested one.
>
>
>
>>   *   A typo in the ReplicaStatus : gpartition() instead of partition()
>>   *   In the AlterTopicRequets
>>  *   the replication factor should be INT16
>>
>
> Ah, thanks!
>
>
>>  *   I would use same fields name as CreateTopicsRequest (they are
>> quite similar)
>>   *   What's the broker id in the ReplicaStatusRequest ?
>>
>
> It's the broker, which is expected to have a replica of the given
> partition, that we're querying the status of. It is necessary because we're
> asking the _leader_ for the partition about (a subset of) the status of the
> followers. Put another way, to identify the replica of a particular
> partition on a particular broker we need the tuple (topic, partition,
> broker).
>
> If we were always interested in the status of the partition across all
> brokers we could omit the broker part. But for reassignment we actually
> only care about a subset of the brokers.
>
>
>>   *   Thinking aloud, could make sense having "Partition" in the
>> ReplicaStatusRequest as an array so that I can specify in only one request
>> the status for partitions I'm interested in, in order to avoid e request
>> for each partition for the same topic. Maybe empty array could mean ..
>> "give me status for all partitions of this topic". Of course it means 

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-26 Thread Tom Bentley
Thanks Paolo,

  *   in the "Public Interfaces" section you wrote
> alterTopics(Set) but then a collection is used (instead of a
> set) in the Proposed Changes section. I'm ok with collection.
>

Agree it should be Collection.


>   *   in the summary of the alterTopics method you say "The request can
> change the number of partitions, replication factor and/or the partition
> assignments." I think that the "and/or" is misleading (at least for me).
> For the TopicCommand tool you can specify partitions AND replication factor
> ... OR partition assignments (but not for example partitions AND
> replication factor AND partition assignments). Maybe it could be "The
> request can change the number of partitions and the related replication
> factor or specifying a partition assignments."
>

Is there a reason why we can't support changing all three at once? It
certainly makes conceptual sense to, say, increase the number of partitions
and replication factor, and specify how you want the partitions assigned.
And doing two separate calls would be less efficient as we sync new
replicas on brokers only to then move them somewhere else.

If there is a reason we don't want to support changing all three, then we
can return the error INVALID_REQUEST (42). That would allow us to support
changing all three at some time in the future, without having to change the
API.


>   *   I know that it would be a breaking change in the Admin Client API
> but why having an AlteredTopic class which is quite similar to the already
> existing NewTopic class ? I know that using NewTopic for the alter method
> could be misleading. What about renaming NewTopic in something like
> AdminTopic ? At same time it means that the TopicDetails class (which you
> can get from the current NewTopic) should be outside the
> CreateTopicsRequest because it could be used in the AlterTopicsRequest as
> well.
>

One problem with this is it tends to inhibit future API changes for either
newTopics() or alterTopics(), because any common class needs to make sense
in both contexts.

For createTopics() we get to specify some configs (the Map),
but since the AdminClient already has alterConfigs() for changing topic
configs after topic creation I don't think it's right to also support
changing those configs via alterTopics() as well. But having them in a
common AdminTopic class would suggest that that was supported. Yes,
alterTopics could return INVALID_REQUEST if it was given topic configs, but
this is just making the API harder to use since it is weakening of the type
safety of the API.

I suppose we *could* write a common TopicDetails class and make the
existing nested one extend the common one, with deprecations, to eventually
remove the nested one.



>   *   A typo in the ReplicaStatus : gpartition() instead of partition()
>   *   In the AlterTopicRequets
>  *   the replication factor should be INT16
>

Ah, thanks!


>  *   I would use same fields name as CreateTopicsRequest (they are
> quite similar)
>   *   What's the broker id in the ReplicaStatusRequest ?
>

It's the broker, which is expected to have a replica of the given
partition, that we're querying the status of. It is necessary because we're
asking the _leader_ for the partition about (a subset of) the status of the
followers. Put another way, to identify the replica of a particular
partition on a particular broker we need the tuple (topic, partition,
broker).

If we were always interested in the status of the partition across all
brokers we could omit the broker part. But for reassignment we actually
only care about a subset of the brokers.


>   *   Thinking aloud, could make sense having "Partition" in the
> ReplicaStatusRequest as an array so that I can specify in only one request
> the status for partitions I'm interested in, in order to avoid e request
> for each partition for the same topic. Maybe empty array could mean ..
> "give me status for all partitions of this topic". Of course it means that
> the ReplicaStatusResponse should change because we should have an array
> with partition, broker, lag and so on
>

You already can specify in one request the status for all the partitions
you're interested in (ReplicaStatus can be repeated/is an array field).

We could factor out the topic to avoid repeating it, which would be more
efficient when we're querying the status of many partitions of a topic
and/or there are many brokers holding replicas. In other words, we could
factor it to look like this:

ReplicaStatusRequest => [TopicReplicas]
  TopicReplicas => Topic [PartitionReplica]
Topic => string
PartitionReplica => Partition Broker
  Partition => int32
  Broker => int32

Does this make sense to you?


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-25 Thread Paolo Patierno
Hi Tom,


I haven taken a look to the updated KIP, some thoughts :


  *   in the "Public Interfaces" section you wrote 
alterTopics(Set) but then a collection is used (instead of a set) 
in the Proposed Changes section. I'm ok with collection.
  *   in the summary of the alterTopics method you say "The request can change 
the number of partitions, replication factor and/or the partition assignments." 
I think that the "and/or" is misleading (at least for me). For the TopicCommand 
tool you can specify partitions AND replication factor ... OR partition 
assignments (but not for example partitions AND replication factor AND 
partition assignments). Maybe it could be "The request can change the number of 
partitions and the related replication factor or specifying a partition 
assignments."
  *   I know that it would be a breaking change in the Admin Client API but why 
having an AlteredTopic class which is quite similar to the already existing 
NewTopic class ? I know that using NewTopic for the alter method could be 
misleading. What about renaming NewTopic in something like AdminTopic ? At same 
time it means that the TopicDetails class (which you can get from the current 
NewTopic) should be outside the CreateTopicsRequest because it could be used in 
the AlterTopicsRequest as well.
  *   A typo in the ReplicaStatus : gpartition() instead of partition()
  *   In the AlterTopicRequets
 *   the replication factor should be INT16
 *   I would use same fields name as CreateTopicsRequest (they are quite 
similar)
  *   What's the broker id in the ReplicaStatusRequest ?
  *   Thinking aloud, could make sense having "Partition" in the 
ReplicaStatusRequest as an array so that I can specify in only one request the 
status for partitions I'm interested in, in order to avoid e request for each 
partition for the same topic. Maybe empty array could mean .. "give me status 
for all partitions of this topic". Of course it means that the 
ReplicaStatusResponse should change because we should have an array with 
partition, broker, lag and so on

Thanks,
Paolo


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>



From: Tom Bentley <t.j.bent...@gmail.com>
Sent: Tuesday, July 25, 2017 11:07 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use 
AdminClient

Hi Ismael and Paolo,

I've updated my KIP [1] to describe an alterTopics() API which would work
for kafka-reassign-partitions.sh. It's still a bit rough, but should be a
good basis for a KIP to cover both tools.

As a first step, if Paolo could review this and check it's compatible with
what he needs for the kafka-topics.sh tool that would be great. Then we can
add to describe his changes to the rest of the kafka-topics.sh tool,
assuming Paolo is happy with this arrangement, Paolo?

Cheers,

Tom

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient

On 25 July 2017 at 11:38, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Tom and Paolo,
>
> It's true that increasing the number of partitions is done via the
> kafka-topics tool, which is also being converted to use the AdminClient
> (but via a different JIRA). I also agree that it would be good to consider
> if alterTopics would be a sensible way to support all the use cases or if
> it's better to have separate APIs. I think it makes sense to have a single
> KIP though as they are related and it will be easier to evaluate as a
> whole.
>
> Does that make sense?
>
> Ismael
>
> On Tue, Jul 25, 2017 at 10:16 AM, Paolo Patierno <ppatie...@live.com>
> wrote:
>
> > Hi,
> >
> >
> > I was digging into it because I need something like an Admin Client alter
> > API for my work on rewriting the TopicCommand tool using them.
> >
> > The AlterConfigs API is used for changing topic level configuration (i.e.
> > retention.ms, retention.bytes and so on).
> >
> > A new AlterTopic API could be better in order to change topic "high
> level"
> > structure so number of partitions, replication factors and so on.
> >
> > My opinion is that we need separate API because from my point of view
> they
> > are different settings.
> >
> >
> > Thanks,
> >
> > Paolo.
> >
> >
> > Paolo Patierno
> > Senior Software Engineer (IoT) @ Red Hat
> > Microsoft MVP on Windows Embedded & IoT
> > Microsoft Azure Advisor
> >
> > Twitter : @ppatierno&l

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-25 Thread Ismael Juma
Hi Tom and Paolo,

It's true that increasing the number of partitions is done via the
kafka-topics tool, which is also being converted to use the AdminClient
(but via a different JIRA). I also agree that it would be good to consider
if alterTopics would be a sensible way to support all the use cases or if
it's better to have separate APIs. I think it makes sense to have a single
KIP though as they are related and it will be easier to evaluate as a whole.

Does that make sense?

Ismael

On Tue, Jul 25, 2017 at 10:16 AM, Paolo Patierno <ppatie...@live.com> wrote:

> Hi,
>
>
> I was digging into it because I need something like an Admin Client alter
> API for my work on rewriting the TopicCommand tool using them.
>
> The AlterConfigs API is used for changing topic level configuration (i.e.
> retention.ms, retention.bytes and so on).
>
> A new AlterTopic API could be better in order to change topic "high level"
> structure so number of partitions, replication factors and so on.
>
> My opinion is that we need separate API because from my point of view they
> are different settings.
>
>
> Thanks,
>
> Paolo.
>
>
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
>
> Twitter : @ppatierno<http://twitter.com/ppatierno>
> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> Blog : DevExperience<http://paolopatierno.wordpress.com/>
>
>
> ____
> From: Tom Bentley <t.j.bent...@gmail.com>
> Sent: Tuesday, July 25, 2017 9:02 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use
> AdminClient
>
> Hi Ismael,
>
> I assume that's the --partitions for kafka-topics.sh? I must admit I hadn't
> considered that tool, only kafka-reassign-partitions.sh. Thanks for
> pointing it out, because obviously the AdminClient API needs to be suitable
> for reuse in kafka-topics.sh too. Since AdminClient doesn't currently
> support an alterTopics() I guess the API should mirror the API for
> newTopics(), so something like:
>
> public AlterTopicsResult alterTopics(Set topics);
> // where:
> public class AlteredTopic {
> // constructor etc
> public String name();
> public int numPartitions();
> public int replicationFactor();
> Map<Integer,List> replicasAssignment();
> }
>
> Note that although NewTopic contains a `Map<String,String> configs`, I
> think the API for changing a topic's config already exists: alterConfigs().
>
> This API is better than having separate methods to set the number of
> partitions/replicas and assigning them to brokers, since sometimes people
> will want to set the assignment at the same time as changing the
> partitions/replicas.
>
> An API like this could then be used by both tools.
>
>
>
> On 24 July 2017 at 16:23, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Tom,
> >
> > I think it makes sense to keep progress reporting simple in the initial
> > version. As you say, time to completion is tricky to compute and it seems
> > like it should be tackled as its own KIP.
> >
> > Regarding waiting for reassignment completion, it's easy enough for
> people
> > to do that via a script, so I don't think we need to add it to the tool.
> >
> > One question that occurred to me, the tool allows one to add partitions
> to
> > existing topics:
> >
> > val partitionsOpt = parser.accepts("partitions", "The number of
> partitions
> > for the topic being created or " +
> >   "altered (WARNING: If partitions are increased for a topic that
> has a
> > key, the partition logic or ordering of the messages will be affected")
> >
> > It seems like it may make sense to have that as an AdminClient API as
> well.
> > If we agree to do that, then we need to decide if it should be
> implemented
> > client-side or by adding a protocol API. The former is simpler, but the
> > latter would allow non Java clients to use it without duplicating the
> logic
> > of assigning replicas to the new partitions. What are your thoughts?
> >
> > Ismael
> >
> > On Sat, Jul 22, 2017 at 10:14 AM, Tom Bentley <t.j.bent...@gmail.com>
> > wrote:
> >
> > > Thinking about this some more, I release that the proposed API for
> > tracking
> > > progress is a bit specific to this reassignment use case. A more
> > generally
> > > useful API would be to be able to find out, for a partition on a
> broker:
&

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-25 Thread Paolo Patierno
Hi,


I was digging into it because I need something like an Admin Client alter API 
for my work on rewriting the TopicCommand tool using them.

The AlterConfigs API is used for changing topic level configuration (i.e. 
retention.ms, retention.bytes and so on).

A new AlterTopic API could be better in order to change topic "high level" 
structure so number of partitions, replication factors and so on.

My opinion is that we need separate API because from my point of view they are 
different settings.


Thanks,

Paolo.


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>



From: Tom Bentley <t.j.bent...@gmail.com>
Sent: Tuesday, July 25, 2017 9:02 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use 
AdminClient

Hi Ismael,

I assume that's the --partitions for kafka-topics.sh? I must admit I hadn't
considered that tool, only kafka-reassign-partitions.sh. Thanks for
pointing it out, because obviously the AdminClient API needs to be suitable
for reuse in kafka-topics.sh too. Since AdminClient doesn't currently
support an alterTopics() I guess the API should mirror the API for
newTopics(), so something like:

public AlterTopicsResult alterTopics(Set topics);
// where:
public class AlteredTopic {
// constructor etc
public String name();
public int numPartitions();
public int replicationFactor();
Map<Integer,List> replicasAssignment();
}

Note that although NewTopic contains a `Map<String,String> configs`, I
think the API for changing a topic's config already exists: alterConfigs().

This API is better than having separate methods to set the number of
partitions/replicas and assigning them to brokers, since sometimes people
will want to set the assignment at the same time as changing the
partitions/replicas.

An API like this could then be used by both tools.



On 24 July 2017 at 16:23, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Tom,
>
> I think it makes sense to keep progress reporting simple in the initial
> version. As you say, time to completion is tricky to compute and it seems
> like it should be tackled as its own KIP.
>
> Regarding waiting for reassignment completion, it's easy enough for people
> to do that via a script, so I don't think we need to add it to the tool.
>
> One question that occurred to me, the tool allows one to add partitions to
> existing topics:
>
> val partitionsOpt = parser.accepts("partitions", "The number of partitions
> for the topic being created or " +
>   "altered (WARNING: If partitions are increased for a topic that has a
> key, the partition logic or ordering of the messages will be affected")
>
> It seems like it may make sense to have that as an AdminClient API as well.
> If we agree to do that, then we need to decide if it should be implemented
> client-side or by adding a protocol API. The former is simpler, but the
> latter would allow non Java clients to use it without duplicating the logic
> of assigning replicas to the new partitions. What are your thoughts?
>
> Ismael
>
> On Sat, Jul 22, 2017 at 10:14 AM, Tom Bentley <t.j.bent...@gmail.com>
> wrote:
>
> > Thinking about this some more, I release that the proposed API for
> tracking
> > progress is a bit specific to this reassignment use case. A more
> generally
> > useful API would be to be able to find out, for a partition on a broker:
> >
> > * When the broker became a follower
> > * When the broker was last in the ISR
> > * If the broker is not in the ISR, how many messages behind it is
> >
> > That's not enough to be able to calculate a percentage completion of a
> > reassignment though (we don't know how far behind it was when it became a
> > follower). But maybe we could maintain a maximum of how far behind it has
> > fallen behind the ISR, since it became a follower/dropped out of the ISR.
> >
> > btw, we need the middle bullet to cope with following sequence:
> >
> > 1. Start syncing
> > 2. Finish syncing
> > 3. Fall behind (drop out of ISR for some reason)
> > 4. User queries for if reassignment has finished (it has, but just
> looking
> > at the ISR would give the impression that it has not).
> >
> >
> >
> >
> > On 21 July 2017 at 11:09, Tom Bentley <t.j.bent...@gmail.com> wrote:
> >
> > > Aside: I've started this new DISCUSS thread for KIP-179 since the
> > origin

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-25 Thread Tom Bentley
Hi Ismael,

I assume that's the --partitions for kafka-topics.sh? I must admit I hadn't
considered that tool, only kafka-reassign-partitions.sh. Thanks for
pointing it out, because obviously the AdminClient API needs to be suitable
for reuse in kafka-topics.sh too. Since AdminClient doesn't currently
support an alterTopics() I guess the API should mirror the API for
newTopics(), so something like:

public AlterTopicsResult alterTopics(Set topics);
// where:
public class AlteredTopic {
// constructor etc
public String name();
public int numPartitions();
public int replicationFactor();
Map replicasAssignment();
}

Note that although NewTopic contains a `Map configs`, I
think the API for changing a topic's config already exists: alterConfigs().

This API is better than having separate methods to set the number of
partitions/replicas and assigning them to brokers, since sometimes people
will want to set the assignment at the same time as changing the
partitions/replicas.

An API like this could then be used by both tools.



On 24 July 2017 at 16:23, Ismael Juma  wrote:

> Hi Tom,
>
> I think it makes sense to keep progress reporting simple in the initial
> version. As you say, time to completion is tricky to compute and it seems
> like it should be tackled as its own KIP.
>
> Regarding waiting for reassignment completion, it's easy enough for people
> to do that via a script, so I don't think we need to add it to the tool.
>
> One question that occurred to me, the tool allows one to add partitions to
> existing topics:
>
> val partitionsOpt = parser.accepts("partitions", "The number of partitions
> for the topic being created or " +
>   "altered (WARNING: If partitions are increased for a topic that has a
> key, the partition logic or ordering of the messages will be affected")
>
> It seems like it may make sense to have that as an AdminClient API as well.
> If we agree to do that, then we need to decide if it should be implemented
> client-side or by adding a protocol API. The former is simpler, but the
> latter would allow non Java clients to use it without duplicating the logic
> of assigning replicas to the new partitions. What are your thoughts?
>
> Ismael
>
> On Sat, Jul 22, 2017 at 10:14 AM, Tom Bentley 
> wrote:
>
> > Thinking about this some more, I release that the proposed API for
> tracking
> > progress is a bit specific to this reassignment use case. A more
> generally
> > useful API would be to be able to find out, for a partition on a broker:
> >
> > * When the broker became a follower
> > * When the broker was last in the ISR
> > * If the broker is not in the ISR, how many messages behind it is
> >
> > That's not enough to be able to calculate a percentage completion of a
> > reassignment though (we don't know how far behind it was when it became a
> > follower). But maybe we could maintain a maximum of how far behind it has
> > fallen behind the ISR, since it became a follower/dropped out of the ISR.
> >
> > btw, we need the middle bullet to cope with following sequence:
> >
> > 1. Start syncing
> > 2. Finish syncing
> > 3. Fall behind (drop out of ISR for some reason)
> > 4. User queries for if reassignment has finished (it has, but just
> looking
> > at the ISR would give the impression that it has not).
> >
> >
> >
> >
> > On 21 July 2017 at 11:09, Tom Bentley  wrote:
> >
> > > Aside: I've started this new DISCUSS thread for KIP-179 since the
> > original
> > > one had the incorrect KIP number 178. The original thread can be found
> > > here: http://mail-archives.apache.org/mod_mbox/kafka-dev/201707.
> > > mbox/%3cCAMd5YszudP+-8z5KTbFh6JscT2p4xFi1=VZWWX+
> > > 5dccpxry...@mail.gmail.com%3e
> > >
> > > I've just updated KIP-179 to support Ismael's request for the command
> to
> > > be able to support progress reporting of an ongoing partition
> > reassignment.
> > >
> > > I'll call out two things which I'm not sure about since I don't yet
> have
> > > much experience of Kafka being used operationally:
> > >
> > > 1. As currently constructed the --progress option could report an
> overall
> > > progress percentage, per-partition percentages and errors. It cannot
> > > provide any kind of time-to-completion estimate. Part of me is loath to
> > do
> > > this, as I'm sure we all remember file transfer dialogs that provide
> > > amusing/baffling time-to-completion estimates. So it might be hard to
> do
> > > _well_. On the other hand I expect the thing people will be interested
> in
> > > will often be "when will it be finished?"
> > >
> > > 2. There is no option for the tool to wait for reassignment
> completion. I
> > > can imagine users might want to script something to happen after the
> > > reassignment is complete, and without some kind of --wait option they
> > will
> > > have to poll for completion "manually". Having a --wait optin and

Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-24 Thread Ismael Juma
Hi Tom,

I think it makes sense to keep progress reporting simple in the initial
version. As you say, time to completion is tricky to compute and it seems
like it should be tackled as its own KIP.

Regarding waiting for reassignment completion, it's easy enough for people
to do that via a script, so I don't think we need to add it to the tool.

One question that occurred to me, the tool allows one to add partitions to
existing topics:

val partitionsOpt = parser.accepts("partitions", "The number of partitions
for the topic being created or " +
  "altered (WARNING: If partitions are increased for a topic that has a
key, the partition logic or ordering of the messages will be affected")

It seems like it may make sense to have that as an AdminClient API as well.
If we agree to do that, then we need to decide if it should be implemented
client-side or by adding a protocol API. The former is simpler, but the
latter would allow non Java clients to use it without duplicating the logic
of assigning replicas to the new partitions. What are your thoughts?

Ismael

On Sat, Jul 22, 2017 at 10:14 AM, Tom Bentley  wrote:

> Thinking about this some more, I release that the proposed API for tracking
> progress is a bit specific to this reassignment use case. A more generally
> useful API would be to be able to find out, for a partition on a broker:
>
> * When the broker became a follower
> * When the broker was last in the ISR
> * If the broker is not in the ISR, how many messages behind it is
>
> That's not enough to be able to calculate a percentage completion of a
> reassignment though (we don't know how far behind it was when it became a
> follower). But maybe we could maintain a maximum of how far behind it has
> fallen behind the ISR, since it became a follower/dropped out of the ISR.
>
> btw, we need the middle bullet to cope with following sequence:
>
> 1. Start syncing
> 2. Finish syncing
> 3. Fall behind (drop out of ISR for some reason)
> 4. User queries for if reassignment has finished (it has, but just looking
> at the ISR would give the impression that it has not).
>
>
>
>
> On 21 July 2017 at 11:09, Tom Bentley  wrote:
>
> > Aside: I've started this new DISCUSS thread for KIP-179 since the
> original
> > one had the incorrect KIP number 178. The original thread can be found
> > here: http://mail-archives.apache.org/mod_mbox/kafka-dev/201707.
> > mbox/%3cCAMd5YszudP+-8z5KTbFh6JscT2p4xFi1=VZWWX+
> > 5dccpxry...@mail.gmail.com%3e
> >
> > I've just updated KIP-179 to support Ismael's request for the command to
> > be able to support progress reporting of an ongoing partition
> reassignment.
> >
> > I'll call out two things which I'm not sure about since I don't yet have
> > much experience of Kafka being used operationally:
> >
> > 1. As currently constructed the --progress option could report an overall
> > progress percentage, per-partition percentages and errors. It cannot
> > provide any kind of time-to-completion estimate. Part of me is loath to
> do
> > this, as I'm sure we all remember file transfer dialogs that provide
> > amusing/baffling time-to-completion estimates. So it might be hard to do
> > _well_. On the other hand I expect the thing people will be interested in
> > will often be "when will it be finished?"
> >
> > 2. There is no option for the tool to wait for reassignment completion. I
> > can imagine users might want to script something to happen after the
> > reassignment is complete, and without some kind of --wait option they
> will
> > have to poll for completion "manually". Having a --wait optin and putting
> > this polling in the tool means we have a lot more control over how often
> > such polling happens.
> >
> > The KIP is available here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+
> > ReassignPartitionsCommand+to+use+AdminClient
> >
> > Thanks,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-22 Thread Tom Bentley
Thinking about this some more, I release that the proposed API for tracking
progress is a bit specific to this reassignment use case. A more generally
useful API would be to be able to find out, for a partition on a broker:

* When the broker became a follower
* When the broker was last in the ISR
* If the broker is not in the ISR, how many messages behind it is

That's not enough to be able to calculate a percentage completion of a
reassignment though (we don't know how far behind it was when it became a
follower). But maybe we could maintain a maximum of how far behind it has
fallen behind the ISR, since it became a follower/dropped out of the ISR.

btw, we need the middle bullet to cope with following sequence:

1. Start syncing
2. Finish syncing
3. Fall behind (drop out of ISR for some reason)
4. User queries for if reassignment has finished (it has, but just looking
at the ISR would give the impression that it has not).




On 21 July 2017 at 11:09, Tom Bentley  wrote:

> Aside: I've started this new DISCUSS thread for KIP-179 since the original
> one had the incorrect KIP number 178. The original thread can be found
> here: http://mail-archives.apache.org/mod_mbox/kafka-dev/201707.
> mbox/%3cCAMd5YszudP+-8z5KTbFh6JscT2p4xFi1=VZWWX+
> 5dccpxry...@mail.gmail.com%3e
>
> I've just updated KIP-179 to support Ismael's request for the command to
> be able to support progress reporting of an ongoing partition reassignment.
>
> I'll call out two things which I'm not sure about since I don't yet have
> much experience of Kafka being used operationally:
>
> 1. As currently constructed the --progress option could report an overall
> progress percentage, per-partition percentages and errors. It cannot
> provide any kind of time-to-completion estimate. Part of me is loath to do
> this, as I'm sure we all remember file transfer dialogs that provide
> amusing/baffling time-to-completion estimates. So it might be hard to do
> _well_. On the other hand I expect the thing people will be interested in
> will often be "when will it be finished?"
>
> 2. There is no option for the tool to wait for reassignment completion. I
> can imagine users might want to script something to happen after the
> reassignment is complete, and without some kind of --wait option they will
> have to poll for completion "manually". Having a --wait optin and putting
> this polling in the tool means we have a lot more control over how often
> such polling happens.
>
> The KIP is available here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+
> ReassignPartitionsCommand+to+use+AdminClient
>
> Thanks,
>
> Tom
>


[DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient

2017-07-21 Thread Tom Bentley
Aside: I've started this new DISCUSS thread for KIP-179 since the original
one had the incorrect KIP number 178. The original thread can be found
here:
http://mail-archives.apache.org/mod_mbox/kafka-dev/201707.mbox/%3cCAMd5YszudP+-8z5KTbFh6JscT2p4xFi1=vzwwx+5dccpxry...@mail.gmail.com%3e

I've just updated KIP-179 to support Ismael's request for the command to be
able to support progress reporting of an ongoing partition reassignment.

I'll call out two things which I'm not sure about since I don't yet have
much experience of Kafka being used operationally:

1. As currently constructed the --progress option could report an overall
progress percentage, per-partition percentages and errors. It cannot
provide any kind of time-to-completion estimate. Part of me is loath to do
this, as I'm sure we all remember file transfer dialogs that provide
amusing/baffling time-to-completion estimates. So it might be hard to do
_well_. On the other hand I expect the thing people will be interested in
will often be "when will it be finished?"

2. There is no option for the tool to wait for reassignment completion. I
can imagine users might want to script something to happen after the
reassignment is complete, and without some kind of --wait option they will
have to poll for completion "manually". Having a --wait optin and putting
this polling in the tool means we have a lot more control over how often
such polling happens.

The KIP is available here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-179+-+Change+ReassignPartitionsCommand+to+use+AdminClient

Thanks,

Tom