Re: Change kafka broker ids dynamically

2015-11-24 Thread SpikyHawk SpikyHawk
Hi there.

I have a similar question, but it relates to the following scenario.

A Docker server runs in an EC2 instance with an EBS volume attached to it.

Kafka runs in a Docker container on this host, with server.properties
"autogenerated" by a bootstrap script.
Part of the bootstrapping process generates a broker id from some
property of the host (like its IP) or of the Docker container (like its
hostname) and writes it to server.properties.

The Docker container running Kafka mounts the EBS volume attached to the host
and uses it as the log dir.
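
For illustration, the bootstrap step described above might look like the following minimal Python sketch. It assumes the broker id is taken from the last octet of the container's IPv4 address, which is only unique within a single /24 subnet. The log dir `/data/kafka-logs`, the ZooKeeper address `zk1:2181` and the function names are hypothetical placeholders. Only the keys `broker.id`, `log.dirs` and `zookeeper.connect` are real Kafka settings.

```python
import ipaddress
import socket


def broker_id_from_ip(ip: str) -> int:
    """Hypothetical scheme: use the last octet of an IPv4 address as broker.id."""
    return ipaddress.IPv4Address(ip).packed[-1]


def render_server_properties(broker_id: int, log_dir: str, zk_connect: str) -> str:
    """Render the few server.properties lines the bootstrap script controls."""
    lines = [
        f"broker.id={broker_id}",
        f"log.dirs={log_dir}",
        f"zookeeper.connect={zk_connect}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: the hostname resolves to the container's own address
    # (in some setups it may resolve to 127.0.0.1 instead).
    ip = socket.gethostbyname(socket.gethostname())
    print(render_server_properties(broker_id_from_ip(ip), "/data/kafka-logs", "zk1:2181"))
```

With this scheme, a relaunched container keeps its broker id only if it gets the same IP back. That is exactly the weakness behind case 1 below.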

case 1:
If the container dies and is relaunched on the same host, it will generate a
new broker id but will "see" the same log dir, so the data in this dir was
technically generated by a broker with a different id. Is this the same
scenario as trying to rename a broker id? Should I try to reuse the same broker
id if this broker will use the same log dir with data in it?

case 2:
If the entire host dies, crashes, or is terminated by AWS, can I attach the
same EBS volume to another host where a new Kafka container will run with a
different broker id? I think in the end this is the same scenario as case 1 :)

How do you handle a situation where a broker or host may disappear but its
log data is still there? Should the data always be discarded along with the
broker (id) that generated it?

Regards
Luciano
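
A bootstrap script could detect the situation in case 1 by comparing its generated id with the `broker.id` recorded in the `meta.properties` file. To my understanding, 0.9.0 brokers write this file into each log dir. The sketch assumes the simple `key=value` layout Kafka writes: a comment header, then `version=0` and `broker.id=N`. The parser is deliberately minimal and does not handle full Java properties syntax. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_stored_broker_id(log_dir: str) -> Optional[int]:
    """Return broker.id from <log_dir>/meta.properties, or None if absent."""
    meta = Path(log_dir) / "meta.properties"
    if not meta.is_file():
        return None
    for raw in meta.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines and the timestamp comment header.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "broker.id":
            return int(value.strip())
    return None


def check_log_dir(log_dir: str, generated_id: int) -> str:
    """Classify a log dir relative to the id the bootstrap script just generated."""
    stored = read_stored_broker_id(log_dir)
    if stored is None:
        return "unclaimed"  # no owner recorded; safe to use the generated id
    if stored == generated_id:
        return "match"
    return f"conflict: data belongs to broker {stored}, not {generated_id}"
```

A "conflict" result means the script should either reuse the stored id or wipe the volume. Starting with the new id on top of the old data is what the replies in this thread advise against.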


On Fri, Nov 6, 2015 at 7:24 PM, Todd Palino  wrote:

> I’m not quite sure why you would need to do this - the broker IDs are not
> significant outside of the internal metadata. But this is what you would
> have to do for each move (assuming you are running with at least
> replication factor 2):
>
> 1) Shut down the broker
> 2) Clear its partition data
> 3) Reconfigure the broker to the new ID
> 4) Restart the broker
> 5) Issue a partition reassignment to reassign all of broker 1’s partitions
> to broker 0
> 6) Wait for the broker to replicate all its partitions from other members
> of the cluster
>
> That’s a lot of moving data around, just to renumber. You can’t just issue
> the reassignment while the broker is down and skip deleting the partitions,
> because broker ID 0 is unknown, so the reassignment will fail (the
> broker is not online). If you wanted to shut the entire cluster down, you
> could, in theory, walk through the ZooKeeper tree manually changing all the
> replica information. That assumes you can shut the whole cluster down for a
> while.
>
> -Todd
>
>
>
> On Fri, Nov 6, 2015 at 1:23 PM, Arathi Maddula 
> wrote:
>
> > Hi,
> > Is it possible to change the broker.id property for a node belonging to a
> > Kafka cluster? For example, I currently have brokers with ids 1, 2, 3. If
> > I want to stop broker 1, can I change broker.id to 0 (current id = 1) in
> > the server.properties and meta.properties files and then restart broker
> > 1? Can I repeat this for brokers 2 and 3 as well?
> >
> > Thanks,
> > Arathi
> >
> >
>
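
Step 5 in Todd's list is driven by a JSON plan. That plan is passed to `kafka-reassign-partitions.sh` via `--reassignment-json-file` together with `--execute`. The Python sketch below rewrites such a plan, replacing broker 1 with broker 0 in every replica list. It keeps the replica order, because the first replica is the preferred leader. The topic name `events` and the starting assignment are made-up sample data. In practice the current assignment would come from the cluster, for example the "current partition replica assignment" printed by the tool's `--generate` mode.

```python
import json


def renumber_assignment(assignment: dict, old_id: int, new_id: int) -> dict:
    """Return a new reassignment plan with old_id replaced by new_id in every replica list."""
    partitions = []
    for p in assignment["partitions"]:
        # Preserve order so the preferred leader position does not change.
        replicas = [new_id if r == old_id else r for r in p["replicas"]]
        partitions.append({"topic": p["topic"], "partition": p["partition"], "replicas": replicas})
    return {"version": assignment.get("version", 1), "partitions": partitions}


# Hypothetical current assignment for a three-broker cluster (ids 1, 2, 3).
current = {
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": 0, "replicas": [1, 2]},
        {"topic": "events", "partition": 1, "replicas": [2, 3]},
        {"topic": "events", "partition": 2, "replicas": [3, 1]},
    ],
}

if __name__ == "__main__":
    print(json.dumps(renumber_assignment(current, old_id=1, new_id=0), indent=2))
```

As Todd notes, the tool will only complete this plan once broker 0 is actually online.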


Re: Change kafka broker ids dynamically

2015-11-24 Thread SpikyHawk SpikyHawk
Excellent. We are planning to use Kafka 0.9.0, so your last point is very
useful information. Can you please point me to some documentation or code
where I can understand how this auto-generation works?
In the 0.9.0 documentation I see that the default value for broker.id is
-1. Does that mean it will be auto-generated?
Where is the generated value (the data file you mention) stored?
If the broker finds this data file, it will use the id stored in that
file, right?

I am asking all of this because, given your comments, the strategy seems
to be to store the data file holding the generated id on the same
volume as the data, and have every Kafka instance that mounts this
volume use that data file.
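
As I read the 0.9.0 broker startup logic, the id resolution works roughly like the simplified Python model below:

1. The broker collects `broker.id` from `meta.properties` in every configured log dir.
2. It refuses to start if those ids disagree with each other, or with a non-negative configured `broker.id`.
3. It reuses the stored id when `broker.id` is -1.
4. If nothing is stored, it generates a new id above `reserved.broker.max.id` (default 1000) from a ZooKeeper sequence. This requires `broker.id.generation.enable`, which defaults to true.

This is a model for reasoning, not Kafka's code. `next_sequence` stands in for the ZooKeeper counter, and the function and exception names are hypothetical.

```python
from typing import Callable, Iterable, Optional


class InconsistentBrokerIdError(Exception):
    """Hypothetical stand-in for the error Kafka raises on an id mismatch."""


def resolve_broker_id(
    configured_id: int,
    stored_ids: Iterable[Optional[int]],
    next_sequence: Callable[[], int],
    reserved_max_id: int = 1000,
) -> int:
    """Simplified model of how a 0.9.0 broker picks its id at startup.

    configured_id: broker.id from server.properties (-1 means "not set").
    stored_ids: broker.id read from meta.properties in each log dir (None if missing).
    next_sequence: stand-in for the ZooKeeper counter used for generated ids.
    """
    found = {i for i in stored_ids if i is not None}
    if len(found) > 1:
        # Different log dirs claim different owners.
        raise InconsistentBrokerIdError(f"log dirs disagree: {sorted(found)}")
    if configured_id >= 0:
        if found and configured_id not in found:
            raise InconsistentBrokerIdError(
                f"configured {configured_id} but log dirs belong to {found.pop()}"
            )
        return configured_id
    if found:
        # broker.id=-1 and the volume already records an id: reuse it.
        return found.pop()
    # Nothing stored and nothing configured: generate an id above the reserved range.
    return reserved_max_id + next_sequence()
```

In this model, keeping `log.dirs` on the EBS volume and leaving `broker.id=-1` means a replacement container that mounts the same volume picks up the stored id. After resolving the id, the real broker also writes `meta.properties` into any log dir that lacks it. The model leaves that step out.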

On Tue, Nov 24, 2015 at 8:10 PM, Gwen Shapira  wrote:

> You should definitely use the same id if you still have the data - it makes
> life so much better.
>
> There are 3 common ways to do it:
> 1. Use the last 3 digits of the IP as the broker ID (assuming Docker gives
> you the same IP when the container relaunches)
> 2. Use a deployment manager that can register the brokers in an external DB
> to make sure the same broker always gets the same ID
> 3. Use Kafka 0.9.0, where Kafka can auto-generate the broker ID and then
> store it in a data file, so if you still have the files, you have the same
> broker ID


Trying to understand 0.9.0 producer and Consumer design

2015-12-01 Thread SpikyHawk SpikyHawk
Hi Everybody

Is there a design document you can point me to for understanding the producer
and consumer in Kafka 0.9.0?

I am reading
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite

but I would like to know whether it is up to date and reflects the actual
implementation.

Regards
Luciano


How to deploy Kafka on AWS

2015-12-04 Thread SpikyHawk SpikyHawk
Hi

I am starting to analyze how to deploy Kafka (we are using AWS) and would
like to hear what you are doing. I am particularly interested in things
like these:

Do you use Docker for Kafka and/or ZooKeeper?

If using Docker, which OS: CoreOS or something else? Are you using a
scheduler like Kubernetes or Mesos?

If using Docker, do you use the Docker filesystem or host volumes?

Do you have static environments or use auto scaling groups? In the latter
case, how are you provisioning new machines? Immutable AMIs? Chef, Puppet,
other?

Any feedback is welcome.

Regards
Luciano