Re: If you run Kafka in AWS or Docker, how do you persist data?
Hello,

We use Docker for Kafka on VMs with both NAS and local disk. We mount the volumes externally. We haven't had many problems at all, and a restart has cleared any issue. We are on 0.8.1. We have also started to deploy to AWS.

-- Colin
+1 612 859 6129
Skype colin.p.clark

On Mar 4, 2015, at 10:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
> [...]
Re: If you run Kafka in AWS or Docker, how do you persist data?
Hi,

On Fri, Feb 27, 2015 at 1:36 AM, James Cheng jch...@tivo.com wrote:
> Hi, I know that Netflix might be talking about Kafka on AWS at the March meetup, but I wanted to bring up the topic anyway. I'm sure that some people are running Kafka in AWS.

I'd say most, not some :)

> Is anyone running Kafka within docker in production? How does that work?

Not us. When I was at DevOps Days in NYC last year, everyone was talking about Docker, but only about 2.5 people in the room actually used it.

> For both of these, how do you persist data? If on AWS, do you use EBS? Do you use ephemeral storage and then rely on replication? And if using docker, do you persist data outside the docker container and on the host machine?

We've used both EBS and local disks in AWS. We don't have Kafka replication, as far as I know.

> And related, how do you deal with broker failure? Do you simply replace it, and repopulate a new broker via replication? Or do you bring back up the broker with the persisted files?

We monitor all Kafka pieces - producers, consumers, and brokers - with SPM. We have alerts and anomaly detection enabled for various Kafka metrics (yeah, consumer lag being one of them). Broker failures have been very rare (we've used 0.7.2, 0.8.1.x, and are now on 0.8.2). When they happened, a restart was typically enough. I can recall one instance where segment recovery took a long time (minutes, maybe more than an hour), but that was 6 months ago.

> Trying to learn about what people are doing, beyond on premises and dedicated hardware.

In my world almost everyone I talk to is in AWS.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
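For reference, the consumer lag Otis mentions can be checked from the command line on 0.8.x with the ConsumerOffsetChecker tool that ships with Kafka. A minimal sketch, where `zk1:2181` and `my-group` are hypothetical placeholders for your ZooKeeper ensemble and consumer group:

```shell
# Print per-partition current offset, log-end offset, and lag for a
# consumer group (Kafka 0.8.x; zk1:2181 and my-group are placeholders).
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
  --zookeeper zk1:2181 \
  --group my-group
```

These are the same numbers that SPM-style monitoring alerts on: lag that grows without bound usually means the consumers have stalled or fallen behind the producers.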
Re: If you run Kafka in AWS or Docker, how do you persist data?
On Fri, Feb 27, 2015 at 8:09 PM, Jeff Schroeder jeffschroe...@computer.org wrote:
> Kafka on dedicated hosts running in docker under marathon under Mesos. It was a real bear to get working, but is really beautiful once I did manage to get it working. I simply run with a unique hostname constraint and number of instances = replication factor. If a broker dies and it isn't a hardware or network issue, marathon restarts it. The hardest part was that Kafka was registering to ZK with the internal (to docker) port. My workaround was that you have to use the same port inside and outside docker or it will register to ZK with whatever the port is inside the container.

You should be able to use advertised.host.name and advertised.port to control this, so you aren't required to use the same port inside and outside Docker.

> FYI this is an on premise dedicated Mesos cluster running on bare metal :)
>
> On Friday, February 27, 2015, James Cheng jch...@tivo.com wrote:
>> Hi, I know that Netflix might be talking about Kafka on AWS at the March meetup, but I wanted to bring up the topic anyway. I'm sure that some people are running Kafka in AWS. Is anyone running Kafka within docker in production? How does that work?
>>
>> For both of these, how do you persist data? If on AWS, do you use EBS? Do you use ephemeral storage and then rely on replication? And if using docker, do you persist data outside the docker container and on the host machine?

On AWS, your choice will depend on a tradeoff among tolerance for data loss, performance, and price sensitivity. You might be able to get better and more predictable performance out of the ephemeral instance storage, but since you are presumably running all instances in the same AZ, you leave yourself open to significant data loss if there's a coordinated outage. It's pretty rare, but it does happen. With EBS you may have to do more work, or spread across more volumes, to get the same throughput.

Relevant quote from the docs on provisioned IOPS: "Additionally, you can stripe multiple volumes together to achieve up to 48,000 IOPS or 800 MBps when attached to larger EC2 instances." (Note MBps, not Mbps.)

Other considerations: AWS has been moving most of its instance storage to SSDs, so getting enough instance storage space can be relatively pricey. You can also potentially go with a hybrid setup to get a balance of the two, but then you'll need to be very careful about partition assignment to ensure at least one copy of every partition ends up on an EBS-backed node.

For Docker, you probably want the data to be stored on a volume. If possible, it would be better if non-hardware errors could be resolved just by restarting the broker; you'll avoid a lot of needless copying of data. Storing data in a volume would let you simply start a new container and have it pick up where the last one left off. The Postgres volume-container example in https://docs.docker.com/userguide/dockervolumes/ isn't too far from Kafka if you were to assume Postgres was replicating to a slave -- you'd prefer to reuse the existing data on the existing node (which a volume container enables), but could still handle bringing up a new node if necessary.

>> And related, how do you deal with broker failure? Do you simply replace it, and repopulate a new broker via replication? Or do you bring back up the broker with the persisted files?
>>
>> Trying to learn about what people are doing, beyond on premises and dedicated hardware.
>>
>> Thanks,
>> -James
>
> -- Text by Jeff, typos by iPhone

--
Thanks,
Ewen
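To make the advertised.host.name / advertised.port suggestion concrete, here is a minimal sketch of running an 0.8.x broker in a container with its log directory on a host volume. The image name, hostname, and paths (`my-kafka-image`, `broker1.example.com`, `/data/kafka`) are all hypothetical placeholders, not anything from this thread:

```shell
# Minimal server.properties overrides (0.8.x config names). The broker
# listens on 9092 inside the container, but Docker maps host port 9093
# to it, so the broker must advertise the host-visible address and port
# or clients will be handed the container-internal endpoint from ZK.
cat > server.properties.docker <<'EOF'
broker.id=1
port=9092
advertised.host.name=broker1.example.com
advertised.port=9093
log.dirs=/var/lib/kafka
zookeeper.connect=zk1:2181
EOF

# Keep log segments on a host volume so that a replacement container
# picks up where the old one left off instead of re-replicating data.
docker run -d --name kafka \
  -p 9093:9092 \
  -v /data/kafka:/var/lib/kafka \
  my-kafka-image
```

With this split, the inside and outside ports no longer have to match, which avoids the workaround Jeff describes above.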
Re: If you run Kafka in AWS or Docker, how do you persist data?
Side question: why run Kafka on Docker for AWS? Is the Docker config being used for configuration management? Are there more systems running on the instance other than Kafka?

On Sun, Mar 1, 2015 at 1:10 PM -0800, Ewen Cheslack-Postava e...@confluent.io wrote:
> [...]
Re: If you run Kafka in AWS or Docker, how do you persist data?
Kafka on dedicated hosts running in Docker under Marathon under Mesos. It was a real bear to get working, but is really beautiful once I did manage to get it working. I simply run with a unique hostname constraint and number of instances = replication factor. If a broker dies and it isn't a hardware or network issue, Marathon restarts it.

The hardest part was that Kafka was registering to ZK with the internal (to Docker) port. My workaround was that you have to use the same port inside and outside Docker or it will register to ZK with whatever the port is inside the container.

FYI this is an on-premise dedicated Mesos cluster running on bare metal :)

On Friday, February 27, 2015, James Cheng jch...@tivo.com wrote:
> Hi, I know that Netflix might be talking about Kafka on AWS at the March meetup, but I wanted to bring up the topic anyway. I'm sure that some people are running Kafka in AWS. Is anyone running Kafka within docker in production? How does that work?
>
> For both of these, how do you persist data? If on AWS, do you use EBS? Do you use ephemeral storage and then rely on replication? And if using docker, do you persist data outside the docker container and on the host machine?
>
> And related, how do you deal with broker failure? Do you simply replace it, and repopulate a new broker via replication? Or do you bring back up the broker with the persisted files?
>
> Trying to learn about what people are doing, beyond on premises and dedicated hardware.
>
> Thanks,
> -James

-- Text by Jeff, typos by iPhone
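Jeff's "unique hostname constraint and number of instances = replication factor" corresponds to a Marathon app definition with a `["hostname", "UNIQUE"]` constraint. A sketch of submitting such an app via Marathon's REST API, where the Marathon address, app id, image name, and paths are all hypothetical placeholders:

```shell
# Run one broker task per distinct Mesos agent hostname. With instances
# equal to the topic replication factor, losing a host still leaves the
# other replicas intact while Marathon restarts the failed task.
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
        "id": "/kafka-broker",
        "instances": 3,
        "cpus": 2,
        "mem": 4096,
        "constraints": [["hostname", "UNIQUE"]],
        "container": {
          "type": "DOCKER",
          "docker": {"image": "my-kafka-image"},
          "volumes": [{"hostPath": "/data/kafka",
                       "containerPath": "/var/lib/kafka",
                       "mode": "RW"}]
        }
      }'
```

The host-path volume is what lets a restarted task on the same host reuse its existing log segments rather than re-replicating them.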
If you run Kafka in AWS or Docker, how do you persist data?
Hi,

I know that Netflix might be talking about Kafka on AWS at the March meetup, but I wanted to bring up the topic anyway.

I'm sure that some people are running Kafka in AWS. Is anyone running Kafka within Docker in production? How does that work?

For both of these, how do you persist data? If on AWS, do you use EBS? Do you use ephemeral storage and then rely on replication? And if using Docker, do you persist data outside the Docker container and on the host machine?

And related, how do you deal with broker failure? Do you simply replace it, and repopulate a new broker via replication? Or do you bring back up the broker with the persisted files?

Trying to learn about what people are doing, beyond on-premises and dedicated hardware.

Thanks,
-James