Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Colin
Hello,

We use Docker for Kafka on VMs with both NAS and local disk.  We mount the 
volumes externally.  We haven't had many problems at all, and a restart has 
cleared any issues.  We are on 0.8.1.
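
Roughly like this, as a sketch (the image name and host paths here are 
made up for illustration, not our actual setup):

    # Illustrative only: the point is that the Kafka log dir lives on a
    # host-mounted volume (NAS or local disk), so the container itself
    # is disposable.
    docker run -d --name kafka-broker \
      -p 9092:9092 \
      -v /mnt/nas/kafka-logs:/kafka/logs \
      some-kafka-image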

We have also started to deploy to AWS.

--
Colin 
+1 612 859 6129
Skype colin.p.clark



Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Otis Gospodnetic
Hi,

On Fri, Feb 27, 2015 at 1:36 AM, James Cheng jch...@tivo.com wrote:

 Hi,

 I know that Netflix might be talking about Kafka on AWS at the March
 meetup, but I wanted to bring up the topic anyway.

 I'm sure that some people are running Kafka in AWS.


I'd say most, not some :)


 Is anyone running Kafka within docker in production? How does that work?


Not us.  When I was at DevOps Days in NYC last year, everyone was talking
about Docker, but only about 2.5 people in the room had actually used it.

For both of these, how do you persist data? If on AWS, do you use EBS? Do
 you use ephemeral storage and then rely on replication? And if using
 docker, do you persist data outside the docker container and on the host
 machine?


We've used both EBS and local disks in AWS.  We don't have Kafka
replication, as far as I know.

And related, how do you deal with broker failure? Do you simply replace it,
 and repopulate a new broker via replication? Or do you bring back up the
 broker with the persisted files?


We monitor all the Kafka pieces (producers, consumers, and brokers) with SPM.
We have alerts and anomaly detection enabled for various Kafka metrics
(yeah, consumer lag being one of them).
Broker failures have been very rare (we've used 0.7.2, 0.8.1.x, and are now
on 0.8.2).  When they happened, a restart was typically enough.  I can recall
one instance where segment recovery took a long time (minutes, maybe more
than an hour), but that was 6 months ago.
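
If you want to eyeball consumer lag by hand rather than with a monitoring
product, 0.8.x ships a basic checker (the group name and ZK address below
are placeholders, and the flags are as of the 0.8.x tooling):

    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
      --zkconnect zk1.example.com:2181 \
      --group my-consumer-group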


 Trying to learn about what people are doing, beyond on premises and
 dedicated hardware.


In my world almost everyone I talk to is in AWS.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-01 Thread Ewen Cheslack-Postava
On Fri, Feb 27, 2015 at 8:09 PM, Jeff Schroeder jeffschroe...@computer.org
wrote:

 Kafka on dedicated hosts running in Docker under Marathon under Mesos. It
 was a real bear to get working, but it is really beautiful now that it is. I
 simply run with a unique hostname constraint and number of instances =
 replication factor. If a broker dies and it isn't a hardware or network
 issue, Marathon restarts it.

 The hardest part was that Kafka was registering to ZK with the internal (to
 Docker) port. My workaround was to use the same port inside and outside
 Docker; otherwise it registers to ZK with whatever port it listens on inside
 the container.


You should be able to use advertised.host.name and advertised.port to
control this, so you aren't required to use the same port inside and
outside Docker.
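
For example, if the broker listens on 9092 inside the container and that
port is published as 19092 on the host, something like this (hostnames and
ports are placeholders) makes it register an externally reachable address:

    # server.properties fragment -- values are placeholders
    port=9092                                  # port inside the container
    advertised.host.name=broker1.example.com   # address clients should use
    advertised.port=19092                      # host port mapped to 9092

    # run the container with the matching mapping
    docker run -d -p 19092:9092 some-kafka-image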



 FYI this is an on premise dedicated Mesos cluster running on bare metal :)

 On Friday, February 27, 2015, James Cheng jch...@tivo.com wrote:

  Hi,
 
  I know that Netflix might be talking about Kafka on AWS at the March
  meetup, but I wanted to bring up the topic anyway.
 
  I'm sure that some people are running Kafka in AWS. Is anyone running
  Kafka within docker in production? How does that work?
 
  For both of these, how do you persist data? If on AWS, do you use EBS? Do
  you use ephemeral storage and then rely on replication? And if using
  docker, do you persist data outside the docker container and on the host
  machine?


On AWS, your choice will depend on a tradeoff of tolerance for data loss,
performance, and price sensitivity. You might be able to get better/more
predictable performance out of the ephemeral instance storage, but since
you are presumably running all instances in the same AZ, you leave yourself
open to significant data loss if there's a coordinated outage. It's pretty
rare, but it does happen. With EBS you may have to do more work or spread
across more volumes to get the same throughput. Relevant quote from the
docs on provisioned IOPS: "Additionally, you can stripe multiple volumes
together to achieve up to 48,000 IOPS or 800 MBps when attached to larger
EC2 instances." (Note MBps, not Mbps.) Other considerations: AWS has been
moving most of its instance storage to SSDs, so getting enough instance
storage space can be relatively pricey. You can also potentially go with a
hybrid setup to get a balance of the two, but you'll then need to be very
careful about partition assignment to ensure at least one copy of every
partition ends up on an EBS-backed node.
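
Striping is just software RAID 0 over several attached volumes. A rough
sketch, assuming four EBS volumes already attached as /dev/xvdf through
/dev/xvdi (device names and counts depend on your setup):

    # Build a RAID 0 array across the four volumes and mount it where
    # the broker's log.dirs points.
    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
    mkfs.ext4 /dev/md0
    mkdir -p /var/kafka-logs
    mount /dev/md0 /var/kafka-logs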

For Docker, you probably want the data to be stored on a volume. If
possible, it would be better if non-hardware errors could be resolved just
by restarting the broker; you'll avoid a lot of needless copying of data.
Storing data in a volume lets you simply start a new container and have it
pick up where the last one left off. The Postgres example given for a
volume container in https://docs.docker.com/userguide/dockervolumes/
isn't too far from Kafka if you assume Postgres is replicating to a
slave: you'd prefer to reuse the existing data on the existing node
(which a volume container enables), but could still handle bringing up a
new node if necessary.
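
Translated to Kafka, the volume-container pattern would look roughly like
this (image name and paths are hypothetical):

    # Log data lives in a volume container, not in the broker container.
    docker create -v /kafka/logs --name kafka-data busybox
    docker run -d --name broker1 --volumes-from kafka-data some-kafka-image
    # If broker1 dies for a non-hardware reason, a fresh container reuses
    # the same log segments instead of re-replicating them:
    docker rm -f broker1
    docker run -d --name broker1 --volumes-from kafka-data some-kafka-image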



 




-- 
Thanks,
Ewen


Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-01 Thread Joseph Lawson
Side question: why run Kafka in Docker on AWS? Is the Docker config being used 
for configuration management? Are there other systems running on the instance 
besides Kafka?


Sent by Outlook (http://taps.io/outlookmobile) for Android





Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-02-27 Thread Jeff Schroeder
Kafka on dedicated hosts running in Docker under Marathon under Mesos. It
was a real bear to get working, but it is really beautiful now that it is. I
simply run with a unique hostname constraint and number of instances =
replication factor. If a broker dies and it isn't a hardware or network
issue, Marathon restarts it.
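
The app definition is ordinary Marathon JSON. Roughly like this sketch (the
image, resources, and Marathon URL are placeholders, not our actual config):

    curl -X POST http://marathon.example.com:8080/v2/apps \
      -H 'Content-Type: application/json' -d '{
        "id": "/kafka-broker",
        "instances": 3,
        "constraints": [["hostname", "UNIQUE"]],
        "cpus": 2,
        "mem": 4096,
        "container": {
          "type": "DOCKER",
          "docker": { "image": "some-kafka-image", "network": "HOST" }
        }
      }'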

The hardest part was that Kafka was registering to ZK with the internal (to
Docker) port. My workaround was to use the same port inside and outside
Docker; otherwise it registers to ZK with whatever port it listens on inside
the container.
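
Concretely, the workaround is just publishing the identical port number
(illustrative):

    # Same port inside and outside, so the port Kafka registers in ZK is
    # actually reachable from other hosts.
    docker run -d -p 9092:9092 some-kafka-image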

FYI, this is an on-premises dedicated Mesos cluster running on bare metal :)



-- 
Text by Jeff, typos by iPhone


If you run Kafka in AWS or Docker, how do you persist data?

2015-02-26 Thread James Cheng
Hi,

I know that Netflix might be talking about Kafka on AWS at the March meetup, 
but I wanted to bring up the topic anyway.

I'm sure that some people are running Kafka in AWS. Is anyone running Kafka 
within docker in production? How does that work?

For both of these, how do you persist data? If on AWS, do you use EBS? Do you 
use ephemeral storage and then rely on replication? And if using docker, do you 
persist data outside the docker container and on the host machine?

And related, how do you deal with broker failure? Do you simply replace it, and 
repopulate a new broker via replication? Or do you bring back up the broker 
with the persisted files?

Trying to learn about what people are doing, beyond on premises and dedicated 
hardware.

Thanks,
-James