Hey Aniket,

Yeah, we usually discuss on the tickets just to keep it in one place, but either is totally fine.

1. Actually that wasn't quite what I was proposing. What I am saying is that there are three cases:
   (a) metadata missing in all dirs,
   (b) metadata missing in some dirs,
   (c) metadata inconsistent between dirs.
In the case of (a) we should generate an id, in the case of (b) we should fill in the missing data (this would be the case where a drive is destroyed and replaced), and in the case of (c) someone has done something sketchy and we should just error out. Let me know if you think that makes sense. An alternative approach would be to designate a special place to keep this kind of metadata, but the question is always what happens in the case of drive failure with multiple independently mounted drives.

2. Yup. We have a metadata api that does this.

-Jay

On Wed, Oct 2, 2013 at 9:50 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> Thanks Jay. I read through the JIRA defect and had some queries.
> Apologies if I was supposed to comment on the JIRA ticket instead of
> discussing it here. If so, let me know and I will repost my comments
> on JIRA.
>
> 1. With the suggested approach, each time a new disk/data dir is added
> to the configuration, Kafka will fail to start unless the meta file is
> copied to the new disk. Copying over the meta file would result in
> copying over all other values like data format. Not sure if that would
> be intentional.
>
> 2. Is there a way to query broker id to get hostname, etc. via
> ZooKeeper or Kafka?
>
> On 2 Oct 2013 21:38, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>
> > I'm in favor of doing this if someone is willing to work on it! I
> > agree it would really help with easy provisioning.
> >
> > I filed a bug to discuss and track:
> > https://issues.apache.org/jira/browse/KAFKA-1070
> >
> > Some comments:
> > 1. I'm not in favor of having a pluggable strategy, unless we are
> > really really sure this is an area where people are going to get a
> > lot of value by writing lots of plugins. I am not at all sure why you
> > would want to retain the current behavior if you had a good strategy
> > for automatically generating ids. Basically plugins are an evil we
> > only want to accept when either we don't understand the problem or
> > the solutions have such extreme tradeoffs that there is no single
> > "good approach". Plugins cause problems for upgrades, testing,
> > documentation, user understandability, code understandability, etc.
> >
> > 2. The node id can't be the host or port or anything tied to the
> > physical machine or its location on the network because you need to
> > be able to change these things. I recommend we just keep an integer.
> >
> > -Jay
> >
> > On Tue, Oct 1, 2013 at 7:08 AM, Aniket Bhatnagar
> > <aniket.bhatna...@gmail.com> wrote:
> >
> > > Right. It is currently a Java integer. However, as per the previous
> > > thread, it seems possible to change it to a string. In that case,
> > > we can use instance IDs, IP addresses, custom ID generators, etc.
> > > How are you currently generating broker IDs from the IP address?
> > > Chef script or custom shell script?
> > >
> > > On 1 October 2013 18:34, Maxime Brugidou <maxime.brugi...@gmail.com>
> > > wrote:
> > >
> > > > I think it currently is a Java (signed) integer, or maybe this
> > > > was ZooKeeper?
> > > > We are generating the id from the IP address for now, but this is
> > > > not ideal (and can cause integer overflow with Java signed ints).
> > > >
> > > > On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar"
> > > > <aniket.bhatna...@gmail.com> wrote:
> > > >
> > > > > I would like to revive an older thread around auto generating
> > > > > broker IDs. As an AWS user, I would like Kafka to just use the
> > > > > instance's ID or instance's IP or instance's internal domain
> > > > > (whichever is easier). This would mean I can easily clone from
> > > > > an AMI to launch Kafka instances without having to worry about
> > > > > setting a unique broker ID. This also allows me to set up auto
> > > > > scaling.
> > > > >
> > > > > I realize one size may not fit all in this case. Other
> > > > > strategies that may work for other cloud providers are to
> > > > > generate a UUID and persist it on a disk, etc.
> > > > >
> > > > > What I propose is a way to define a broker ID generation
> > > > > strategy in the configuration file which points to a class file
> > > > > that is responsible for generating the ID. Is this something
> > > > > already being worked on?
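
The three-case rule from the top of this thread (generate when metadata is missing everywhere, fill in when it is missing in some dirs, error out when dirs disagree) might be sketched roughly as follows. This is only an illustration, not Kafka code: `reconcile_broker_id`, `InconsistentMetadataError`, and the `generate_id` callback are hypothetical names, and the sketch assumes the id stored in each data dir has already been read into a dict, with `None` for a dir that has no metadata yet.

```python
from typing import Callable, Dict, List, Optional, Tuple


class InconsistentMetadataError(Exception):
    """Hypothetical error for case (c): data dirs disagree on the broker id."""


def reconcile_broker_id(
    stored_ids: Dict[str, Optional[int]],
    generate_id: Callable[[], int],
) -> Tuple[int, List[str]]:
    """Return (broker_id, dirs_that_still_need_the_metadata_written).

    stored_ids maps each configured data dir to the id found in its
    metadata, or None when that dir has no metadata (assumed input shape).
    """
    if not stored_ids:
        raise ValueError("at least one data dir is required")

    found = {bid for bid in stored_ids.values() if bid is not None}

    if len(found) > 1:
        # Case (c): conflicting ids across dirs, so refuse to start.
        raise InconsistentMetadataError(
            f"conflicting broker ids: {sorted(found)}"
        )

    if not found:
        # Case (a): no metadata in any dir, so generate a fresh id.
        broker_id = generate_id()
    else:
        # Case (b), or every dir already agrees: reuse the existing id.
        broker_id = next(iter(found))

    # Dirs with no metadata (e.g. a replaced or newly added drive).
    missing = sorted(d for d, bid in stored_ids.items() if bid is None)
    return broker_id, missing


# A replaced drive: /data2 has no metadata, /data1 still holds id 7.
print(reconcile_broker_id({"/data1": 7, "/data2": None}, lambda: 1001))
# (7, ['/data2'])
```

Under this reading, adding a new empty data dir falls into case (b), so the broker would fill in the missing metadata rather than fail to start.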