Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
I wouldn't say no to some discussion of encryption. We're running on Azure
EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments
in customer datacenters when needed), so we can't just use disk-level
encryption (which would have its own overhead). We're putting all of our
messages inside encrypted envelopes before sending them to the stream,
which limits our opportunities for schema verification of the underlying
messages to the declared type of the message.
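
As a rough illustration of the envelope approach described above, here is a
minimal sketch in Python. It is not the actual format in use: the JSON
layout, the names Envelope, ALLOWED_TYPES and check_declared_type, and the
example type strings are all assumptions made for illustration. The payload
is treated as bytes that the producer has already encrypted, so the only
thing a stream-side check can see is the declared type.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical wire format: a readable declared type plus an opaque payload."""
    message_type: str   # declared type, left in the clear
    ciphertext: bytes   # payload already encrypted by the producer

    def to_bytes(self) -> bytes:
        # JSON cannot carry raw bytes, so the ciphertext is base64-encoded.
        doc = {
            "type": self.message_type,
            "payload": base64.b64encode(self.ciphertext).decode("ascii"),
        }
        return json.dumps(doc).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Envelope":
        doc = json.loads(raw.decode("utf-8"))
        return cls(doc["type"], base64.b64decode(doc["payload"]))


# Hypothetical set of message types this stream is expected to carry.
ALLOWED_TYPES = {"order.created", "order.cancelled"}


def check_declared_type(raw: bytes) -> str:
    """Validate what can be validated without the key: the declared type only."""
    envelope = Envelope.from_bytes(raw)
    if envelope.message_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected message type: {envelope.message_type}")
    return envelope.message_type


# The payload bytes stand in for real ciphertext; no encryption happens here.
wire = Envelope("order.created", b"\x8f\x02\x11opaque-bytes").to_bytes()
print(check_declared_type(wire))
```

The check never touches the payload, which is the limitation in question:
anything beyond the declared type needs the key.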

Encryption at rest mostly works out to a sales point for customers who want
assurances, and in a Kafka focused discussion might be dealt with by
covering disk encryption and how the conversations between Kafka instances
are protected.

Christian


On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps j...@confluent.io wrote:

 Hey guys,

 One thing we tried to do along with the product release was start to put
 together a practical guide for using Kafka. I wrote this up here:
 http://blog.confluent.io/2015/02/25/stream-data-platform-1/

 I'd like to keep expanding on this as good practices emerge and we learn
 more stuff. So two questions:
 1. Anything you think other people should know about working with data
 streams? What did you wish you knew when you got started?
 2. Anything you don't know about but would like to hear more about?

 -Jay



Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
Yeah, we do have scenarios where we use customer specific keys so our
envelopes end up containing key identification information for accessing
our key repository. I'll certainly follow any changes you propose in this
area with interest, but I'd expect that sort of centralized key thing to be
fairly separate from Kafka even if there's a handy optional layer that
integrates with it.
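
A minimal sketch of what "key identification information" in an envelope
could look like, assuming an in-memory stand-in for the key repository. The
names EnvelopeHeader, KeyRepository, customer_id and key_id are hypothetical
and not taken from any real system; decryption itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EnvelopeHeader:
    """Hypothetical clear-text header carried alongside the encrypted payload."""
    customer_id: str
    key_id: str


class KeyRepository:
    """In-memory stand-in for an external, centralized key repository."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, str], bytes] = {}

    def put(self, customer_id: str, key_id: str, key: bytes) -> None:
        self._keys[(customer_id, key_id)] = key

    def resolve(self, header: EnvelopeHeader) -> bytes:
        # The header only identifies the key; the key itself never travels
        # on the stream.
        try:
            return self._keys[(header.customer_id, header.key_id)]
        except KeyError:
            raise LookupError(
                f"no key {header.key_id!r} for customer {header.customer_id!r}"
            ) from None


repo = KeyRepository()
repo.put("acme", "2015-02", b"acme-secret-key")
repo.put("globex", "2015-02", b"globex-secret-key")

# A consumer reads the header first, then asks the repository for the key.
header = EnvelopeHeader(customer_id="acme", key_id="2015-02")
key = repo.resolve(header)
```

Keeping the lookup behind a small interface like this is one way the key
handling can stay separate from the stream itself.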

Christian

On Wed, Feb 25, 2015 at 5:34 PM, Julio Castillo 
jcasti...@financialengines.com wrote:

 Although full disk encryption appears to be an easy solution, in our case
 that may not be sufficient. For cases where the actual payload needs to be
 encrypted, the cost of encryption is paid by the consumer and producers.
 Further complicating the matter would be the handling of encryption keys,
 etc. I think this is the area where enhancements to Kafka may facilitate
 that key exchange between consumers and producers, still leaving it up to
 the clients, but facilitating the key handling.

 Julio

 On 2/25/15, 4:24 PM, Christian Csar christ...@csar.us wrote:

 The questions we get from customers typically end up being general so we
 break out our answer into network level and on disk scenarios.
 
 On disk/at rest scenario may just be use full disk encryption at the OS
 level and Kafka doesn't need to worry about it. But documenting any issues
 around it would be good. For example what sort of Kafka specific
 performance impacts does it have, ie budgeting for better processors.
 
 The security story right now is to run on a private network, but I believe
 some of our customers like to be told that within datacenter transmissions
 are encrypted on the wire. Based on
 
 https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
 waiting for TLS support, or using a VPN/ssh tunnel for the network
 connections.
 
 Since we're in hosted stream land we can't do either of the above and
 encrypt the messages themselves. For those enterprises that are like our
 customers but would run Kafka or use Confluent, having a story like the
 above so they don't give up the benefits of your schema management layers
 would be good.
 
 Since I didn't mention it before I did find your blog posts handy (though
 I'm already moving us towards stream centric land).
 
 Christian
 


Re: Tips for working with Kafka and data streams

2015-02-25 Thread Tong Li

+2. These kinds of articles, coming from the people who created Kafka, always
provide great value to Kafka users and developers. For my 2 cents, I would
love to see one or two articles for developers who are involved in Kafka
development, on topics such as how to develop test cases and how to run
them, what to expect when errors occur, and typical system settings. I
suspect that most of us run it on Linux-based systems, so a few pointers
could help a lot. Most importantly, I would like to see how to set up your
dev environment so that you are not struggling with things the pioneers
have already figured out: for example, a recommended IDE and debugging
methods. Of course, these will be the preference of the writer and no one
is obligated to use them, but they can certainly get people started
quicker. As Kafka draws more interest, I suspect more developers will join,
and having something like that can be extremely helpful.

Jay, articles similar to the one linked in your original email can actually
be submitted to developerWorks, and you can get some money out of it if you
like. If you do not know how to do that, I can certainly provide some
pointers.

Thanks.

Tong Li
OpenStack & Kafka Community Development
Building 501/B205
liton...@us.ibm.com



From:    Jay Kreps j...@confluent.io
To:      d...@kafka.apache.org, users@kafka.apache.org
Date:    02/25/2015 02:52 PM
Subject: Tips for working with Kafka and data streams



Hey guys,

One thing we tried to do along with the product release was start to put
together a practical guide for using Kafka. I wrote this up here:
http://blog.confluent.io/2015/02/25/stream-data-platform-1/

I'd like to keep expanding on this as good practices emerge and we learn
more stuff. So two questions:
1. Anything you think other people should know about working with data
streams? What did you wish you knew when you got started?
2. Anything you don't know about but would like to hear more about?

-Jay


Re: Tips for working with Kafka and data streams

2015-02-25 Thread Jay Kreps
Hey Christian,

That makes sense. I agree that would be a good area to dive into. Are you
primarily interested in network level security or encryption on disk?

-Jay

On Wed, Feb 25, 2015 at 1:38 PM, Christian Csar christ...@csar.us wrote:

 I wouldn't say no to some discussion of encryption. We're running on Azure
 EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments
 in customer datacenters when needed) so can't just use disk level
 encryption (which would have its own overhead). We're putting all of our
 messages inside of encrypted envelopes before sending them to the stream
 which limits our opportunities for schema verification of the underlying
 messages to the declared type of the message.

 Encryption at rest mostly works out to a sales point for customers who want
 assurances, and in a Kafka focused discussion might be dealt with by
 covering disk encryption and how the conversations between Kafka instances
 are protected.

 Christian


 On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps j...@confluent.io wrote:

  Hey guys,
 
  One thing we tried to do along with the product release was start to put
  together a practical guide for using Kafka. I wrote this up here:
  http://blog.confluent.io/2015/02/25/stream-data-platform-1/
 
  I'd like to keep expanding on this as good practices emerge and we learn
  more stuff. So two questions:
  1. Anything you think other people should know about working with data
  streams? What did you wish you knew when you got started?
  2. Anything you don't know about but would like to hear more about?
 
  -Jay
 



Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
The questions we get from customers typically end up being general so we
break out our answer into network level and on disk scenarios.

The on-disk/at-rest scenario may just be to use full disk encryption at the
OS level, so that Kafka doesn't need to worry about it. But documenting any
issues around it would be good: for example, what sort of Kafka-specific
performance impact it has, i.e. whether to budget for better processors.

The security story right now is to run on a private network, but I believe
some of our customers like to be told that within datacenter transmissions
are encrypted on the wire. Based on
https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
waiting for TLS support, or using a VPN/ssh tunnel for the network
connections.

Since we're in hosted stream land we can't do either of the above, so we
encrypt the messages themselves. For those enterprises that are like our
customers but would run Kafka or use Confluent, having a story like the
above so they don't give up the benefits of your schema management layers
would be good.

Since I didn't mention it before I did find your blog posts handy (though
I'm already moving us towards stream centric land).

Christian

On Wed, Feb 25, 2015 at 3:57 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Hey Christian,

 That makes sense. I agree that would be a good area to dive into. Are you
 primarily interested in network level security or encryption on disk?

 -Jay




Re: Tips for working with Kafka and data streams

2015-02-25 Thread Julio Castillo
Although full disk encryption appears to be an easy solution, in our case
that may not be sufficient. For cases where the actual payload needs to be
encrypted, the cost of encryption is paid by the consumers and producers.
Further complicating the matter is the handling of encryption keys. I think
this is an area where enhancements to Kafka could facilitate key exchange
between consumers and producers: encryption would still be left up to the
clients, but Kafka would make the key handling easier.

Julio

On 2/25/15, 4:24 PM, Christian Csar christ...@csar.us wrote:

The questions we get from customers typically end up being general so we
break out our answer into network level and on disk scenarios.

On disk/at rest scenario may just be use full disk encryption at the OS
level and Kafka doesn't need to worry about it. But documenting any issues
around it would be good. For example what sort of Kafka specific
performance impacts does it have, ie budgeting for better processors.

The security story right now is to run on a private network, but I believe
some of our customers like to be told that within datacenter transmissions
are encrypted on the wire. Based on
https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
waiting for TLS support, or using a VPN/ssh tunnel for the network
connections.

Since we're in hosted stream land we can't do either of the above and
encrypt the messages themselves. For those enterprises that are like our
customers but would run Kafka or use Confluent, having a story like the
above so they don't give up the benefits of your schema management layers
would be good.

Since I didn't mention it before I did find your blog posts handy (though
I'm already moving us towards stream centric land).

Christian



NOTICE: This e-mail and any attachments to it may be privileged, confidential 
or contain trade secret information and is intended only for the use of the 
individual or entity to which it is addressed. If this e-mail was sent to you 
in error, please notify me immediately by either reply e-mail or by phone at 
408.498.6000, and do not use, disseminate, retain, print or copy the e-mail or 
any attachment. All messages sent to and from this e-mail address may be 
monitored as permitted by or necessary under applicable law and regulations.