Re: Tips for working with Kafka and data streams
I wouldn't say no to some discussion of encryption. We're running on Azure EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments in customer datacenters when needed) so can't just use disk level encryption (which would have its own overhead). We're putting all of our messages inside of encrypted envelopes before sending them to the stream which limits our opportunities for schema verification of the underlying messages to the declared type of the message.

Encryption at rest mostly works out to a sales point for customers who want assurances, and in a Kafka focused discussion might be dealt with by covering disk encryption and how the conversations between Kafka instances are protected.

Christian

On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps j...@confluent.io wrote:

Hey guys,

One thing we tried to do along with the product release was start to put together a practical guide for using Kafka. I wrote this up here: http://blog.confluent.io/2015/02/25/stream-data-platform-1/

I'd like to keep expanding on this as good practices emerge and we learn more stuff. So two questions:
1. Anything you think other people should know about working with data streams? What did you wish you knew when you got started?
2. Anything you don't know about but would like to hear more about?

-Jay
Re: Tips for working with Kafka and data streams
Yeah, we do have scenarios where we use customer specific keys so our envelopes end up containing key identification information for accessing our key repository. I'll certainly follow any changes you propose in this area with interest, but I'd expect that sort of centralized key thing to be fairly separate from Kafka even if there's a handy optional layer that integrates with it.

Christian

On Wed, Feb 25, 2015 at 5:34 PM, Julio Castillo jcasti...@financialengines.com wrote:

Although full disk encryption appears to be an easy solution, in our case that may not be sufficient. For cases where the actual payload needs to be encrypted, the cost of encryption is paid by the consumer and producers. Further complicating the matter would be the handling of encryption keys, etc. I think this is the area where enhancements to Kafka may facilitate that key exchange between consumers and producers, still leaving it up to the clients, but facilitating the key handling.

Julio
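A minimal sketch of the envelope idea discussed in this thread: the declared message type and a key identifier travel in the clear, while the payload is opaque ciphertext. Everything named here is an assumption for illustration and not anyone's actual implementation: the `Envelope` fields, the JSON layout, the `seal` and `open_envelope` helpers, and the dictionary standing in for a key repository are all hypothetical. The cipher is passed in as a pair of functions so that a real authenticated cipher (for example AES-GCM from a vetted library) can be plugged in. The sketch also shows why schema verification stops at the declared type: without the key, a reader sees only `type`, `key_id`, and an opaque blob.

```python
import base64
import json
from dataclasses import dataclass
from typing import Callable, Mapping, Set

# Hypothetical cipher hook: (key, data) -> transformed data.
# Supply a real authenticated cipher here; none is bundled with this sketch.
Cipher = Callable[[bytes, bytes], bytes]


@dataclass(frozen=True)
class Envelope:
    message_type: str  # declared type, readable without any key
    key_id: str        # identifies the key in the key repository, never the key itself
    ciphertext: bytes  # encrypted payload, opaque to the stream and to schema tools

    def to_bytes(self) -> bytes:
        """Serialize to JSON bytes suitable for use as a stream record value."""
        doc = {
            "type": self.message_type,
            "key_id": self.key_id,
            "payload": base64.b64encode(self.ciphertext).decode("ascii"),
        }
        return json.dumps(doc).encode("utf-8")

    @staticmethod
    def from_bytes(raw: bytes) -> "Envelope":
        """Parse the clear header fields and keep the payload as opaque bytes."""
        doc = json.loads(raw.decode("utf-8"))
        return Envelope(doc["type"], doc["key_id"], base64.b64decode(doc["payload"]))


def seal(message_type: str, key_id: str, plaintext: bytes,
         key_repository: Mapping[str, bytes], encrypt: Cipher) -> bytes:
    """Producer side: look up the key by id, encrypt, and wrap in an envelope."""
    key = key_repository[key_id]
    return Envelope(message_type, key_id, encrypt(key, plaintext)).to_bytes()


def open_envelope(raw: bytes, key_repository: Mapping[str, bytes],
                  decrypt: Cipher, allowed_types: Set[str]) -> bytes:
    """Consumer side: check the declared type, then look up the key and decrypt."""
    envelope = Envelope.from_bytes(raw)
    if envelope.message_type not in allowed_types:
        # The declared type is the only thing that can be verified without the key.
        raise ValueError(f"unexpected message type: {envelope.message_type}")
    key = key_repository[envelope.key_id]
    return decrypt(key, envelope.ciphertext)
```

A producer would call `seal(...)` and send the returned bytes as the record value. A consumer would call `open_envelope(...)` with its own view of the key repository. Keeping only the key identifier in the envelope keeps key handling in a separate service, apart from the stream itself.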
Re: Tips for working with Kafka and data streams
+2, these kinds of articles coming from the people who created Kafka always provide great value to Kafka users and developers.

For my 2 cents, I would love to see one or two articles for developers who are involved in Kafka development, on topics such as how to develop test cases and how to run them, what to expect when errors occur, and typical system settings (I suspect that most of us run it on Linux based systems, so a few pointers would probably help a lot). Most importantly, how to set up your dev environment so that you are not struggling with the things the pioneers have already figured out, for example a recommended dev IDE and debug methods. Of course these will be the preference of the writer and no one is obligated to use them, but they can certainly get people started quicker. As Kafka draws more interest, I suspect more developers will join, and having something like that can be extremely helpful.

Jay, articles similar to the one linked in your original email can actually be submitted to developerWorks, and you can get some money out of it if you like. If you do not know how to do that, I can certainly provide some pointers if you are interested.

Thanks.

Tong Li
OpenStack Kafka Community Development
Building 501/B205
liton...@us.ibm.com

From: Jay Kreps j...@confluent.io
To: d...@kafka.apache.org d...@kafka.apache.org, users@kafka.apache.org users@kafka.apache.org
Date: 02/25/2015 02:52 PM
Subject: Tips for working with Kafka and data streams

Hey guys,

One thing we tried to do along with the product release was start to put together a practical guide for using Kafka. I wrote this up here: http://blog.confluent.io/2015/02/25/stream-data-platform-1/

I'd like to keep expanding on this as good practices emerge and we learn more stuff. So two questions:
1. Anything you think other people should know about working with data streams? What did you wish you knew when you got started?
2. Anything you don't know about but would like to hear more about?

-Jay
Re: Tips for working with Kafka and data streams
Hey Christian,

That makes sense. I agree that would be a good area to dive into. Are you primarily interested in network level security or encryption on disk?

-Jay

On Wed, Feb 25, 2015 at 1:38 PM, Christian Csar christ...@csar.us wrote:

I wouldn't say no to some discussion of encryption. We're running on Azure EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments in customer datacenters when needed) so can't just use disk level encryption (which would have its own overhead). We're putting all of our messages inside of encrypted envelopes before sending them to the stream which limits our opportunities for schema verification of the underlying messages to the declared type of the message.

Encryption at rest mostly works out to a sales point for customers who want assurances, and in a Kafka focused discussion might be dealt with by covering disk encryption and how the conversations between Kafka instances are protected.

Christian
Re: Tips for working with Kafka and data streams
The questions we get from customers typically end up being general, so we break out our answer into network level and on disk scenarios.

The on disk/at rest scenario may just be "use full disk encryption at the OS level", so Kafka doesn't need to worry about it. But documenting any issues around it would be good, for example what sort of Kafka specific performance impact it has, i.e. whether to budget for better processors.

The security story right now is to run on a private network, but I believe some of our customers like to be told that within datacenter transmissions are encrypted on the wire. Based on https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean waiting for TLS support, or using a VPN/ssh tunnel for the network connections. Since we're in hosted stream land we can't do either of the above, so we encrypt the messages themselves. For those enterprises that are like our customers but would run Kafka or use Confluent, having a story like the above so they don't give up the benefits of your schema management layers would be good.

Since I didn't mention it before, I did find your blog posts handy (though I'm already moving us towards stream centric land).

Christian

On Wed, Feb 25, 2015 at 3:57 PM, Jay Kreps jay.kr...@gmail.com wrote:

Hey Christian,

That makes sense. I agree that would be a good area to dive into. Are you primarily interested in network level security or encryption on disk?

-Jay
Re: Tips for working with Kafka and data streams
Although full disk encryption appears to be an easy solution, in our case that may not be sufficient. For cases where the actual payload needs to be encrypted, the cost of encryption is paid by the consumer and producers. Further complicating the matter would be the handling of encryption keys, etc. I think this is the area where enhancements to Kafka may facilitate that key exchange between consumers and producers, still leaving it up to the clients, but facilitating the key handling.

Julio

On 2/25/15, 4:24 PM, Christian Csar christ...@csar.us wrote:

The questions we get from customers typically end up being general so we break out our answer into network level and on disk scenarios. On disk/at rest scenario may just be use full disk encryption at the OS level and Kafka doesn't need to worry about it. But documenting any issues around it would be good. For example what sort of Kafka specific performance impacts does it have, ie budgeting for better processors.

The security story right now is to run on a private network, but I believe some of our customers like to be told that within datacenter transmissions are encrypted on the wire. Based on https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean waiting for TLS support, or using a VPN/ssh tunnel for the network connections. Since we're in hosted stream land we can't do either of the above and encrypt the messages themselves. For those enterprises that are like our customers but would run Kafka or use Confluent, having a story like the above so they don't give up the benefits of your schema management layers would be good.

Since I didn't mention it before I did find your blog posts handy (though I'm already moving us towards stream centric land).

Christian

NOTICE: This e-mail and any attachments to it may be privileged, confidential or contain trade secret information and is intended only for the use of the individual or entity to which it is addressed. If this e-mail was sent to you in error, please notify me immediately by either reply e-mail or by phone at 408.498.6000, and do not use, disseminate, retain, print or copy the e-mail or any attachment. All messages sent to and from this e-mail address may be monitored as permitted by or necessary under applicable law and regulations.