Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-09 Thread Sönke Liebau
Hi Andrew,

thanks for your feedback!
I am curious though: why are you doubtful about getting a committer to
volunteer an opinion? Shouldn't this be in their interest as well?

I'll just continue along for now and start building a very rough
proof-of-concept implementation based on what's in the KIP so far, to flesh
out more details and add them to the KIP as I go along.

Best regards,
Sönke

On Wed, 7 Aug 2019 at 18:18, Andrew Schofield 
wrote:

> Hi,
> I think this is a useful KIP and it looks good in principle. While it can
> all be done using
> interceptors, if the brokers do not know anything about it, you need to
> maintain the
> mapping from topics to key ids somewhere external. I'd prefer the way
> you've done it.
>
> I'm not sure whether you'll manage to interest any committers in
> volunteering an
> opinion, and you'll need that before you can get the KIP accepted into
> Kafka.
>
> Thanks,
> Andrew Schofield (IBM)
>
> On 06/08/2019, 15:46, "Sönke Liebau" 
> wrote:
>
> Hi,
>
> I have so far received pretty much no comments on the technical details
> outlined in the KIP. While I am happy to continue with my own ideas of how
> to implement this, I would much prefer to at least get a very broad "looks
> good in principle, but still lots to flesh out" from a few people before I
> put more work into this.
>
> Best regards,
> Sönke
>
>
>
>
> On Tue, 21 May 2019 at 14:15, Sönke Liebau
> wrote:
>
> > Hi everybody,
> >
> > I'd like to rekindle the discussion around KIP-317.
> > I have reworked the KIP a little bit in order to design everything as a
> > pluggable implementation. During the course of that work I've also decided
> > to rename the KIP, as encryption will only be transparent in some cases. It
> > is now called "Add end-to-end data encryption functionality to Apache
> > Kafka" [1].
> >
> > I'd very much appreciate it if you could give the KIP a quick read. This
> > is not at this point a fully fleshed-out design, as I would like to agree
> > on the underlying structure that I came up with first, before spending time
> > on details.
> >
> > TL;DR is:
> > Create three pluggable classes:
> > KeyManager runs on the broker and manages which keys to use, key rollover,
> > etc.
> > KeyProvider runs on the client and retrieves keys based on what the
> > KeyManager tells it
> > EncryptionEngine runs on the client and handles the actual encryption
> > A first idea of the control flow between these components can be seen at [2]
> >
> > Please let me know any thoughts or concerns that you may have!
> >
> > Best regards,
> > Sönke
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > [2]
> >
> > https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1&modificationDate=1558439227551&api=v2
> >
> >
> >
> > On Fri, 10 Aug 2018 at 14:05, Sönke Liebau <soenke.lie...@opencore.com>
> > wrote:
> >
> >> Hi Viktor,
> >>
> >> thanks for your input! We could accommodate magic headers by removing any
> >> known fixed bytes pre-encryption, sticking them in a header field and
> >> prepending them after decryption. However, I am not sure whether this is
> >> actually necessary, as most modern (AES for sure) algorithms are considered
> >> to be resistant to known-plaintext types of attack. Even if the entire
> >> plaintext is known to the attacker, they still need to brute-force the key -
> >> which may take a while.
> >>
> >> Something different to consider in this context are compression
> >> side-channel attacks like CRIME or BREACH, which may be relevant depending
> >> on what type of data is being sent through Kafka. Both these attacks depend
> >> on the encrypted record containing a combination of secret and
> >> user-controlled data.
> >> For example, if Kafka were used to forward data that the user entered on a
> >> website, along with a secret API key that the website adds, to a back-end
> >> server, and the user can obtain the Kafka messages, these attacks would
> >> become relevant. Not much we can do about that except disallow encryption
> >> when compression is enabled (TLS chose this approach in version 1.3).

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-08 Thread Sönke Liebau
Thanks for your feedback, both of you!

I've commented inline below.


On Thu, 8 Aug 2019 at 08:38, Jörn Franke  wrote:

> If you are doing batch encryption, then you are closer to a file-encryption
> scenario. The more frequent the messages are, the closer you are to the
> SSL/HTTPS scenarios. You may learn from those protocols how they handle
> keys, how long they keep them, etc., to implement your E2E solution.
>
> > On 08.08.2019 at 08:11, Maulin Vasavada <maulin.vasav...@gmail.com> wrote:
> >
> > Hi Sönke Liebau,
> >
> > Thanks for the great, detailed documentation. However, I feel that leaving
> > the KMS outside of Kafka might simplify the whole thing to a great extent.
> > If the broker is not going to touch the encrypted messages, why would we
> > put any dependency on KMS interfaces on the broker? We have experimented
> > with end-to-end message encryption, using topic-level keys and message
> > encryption with a serializer wrapper which encrypts each message before
> > serializing. The serializer wrapper has to integrate with the KMS we use
> > internally, and that was all.
>
My idea behind having the broker manage topic keys was to keep the option
of actually making the encryption transparent to the clients. This way you
could configure a topic as encrypted on the broker, and on startup the broker
would push everything the client needs to know to encrypt messages - while
still being unable to decrypt messages itself.

However, this is only one possible scenario. Another valid scenario is of
course that you want to configure clients directly with keys, which I hope
my proposal also covers, as everything is pluggable. And in this case the
broker would not need a dependency on the KMS, as it doesn't need to handle
keys.

Basically, by making this pluggable I hope to be able to cover a wide
variety of use cases; the two described in the KIP are just the ones that
I'd implement initially.
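
To make this concrete, here is a minimal sketch of how a producer might be
configured under this proposal. The property names and classes are purely
illustrative assumptions, not a settled API:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class EncryptedProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical settings: which pluggable implementations to load.
        // In the broker-managed scenario, the KeyProvider would receive key
        // material pushed by the broker-side KeyManager on startup; in the
        // client-managed scenario, it could talk to an external KMS directly,
        // with no KMS dependency on the broker at all.
        props.put("encryption.key.provider.class", "com.example.BrokerBackedKeyProvider");
        props.put("encryption.engine.class", "com.example.AesGcmEncryptionEngine");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records would be produced as usual; encryption happens transparently.
        }
    }
}
```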



> >
> > However, one key observation we had was that if we could do encryption at
> > the 'batch' level instead of 'per-message', it could perform much better
> > (depending upon batch sizing). We didn't experiment with that, though.
>

I agree, batch encryption would make this perform much better, but it has
downsides as well. To be honest, I am unsure of the security implications of
larger vs. smaller payloads, but I will investigate this.
In addition, however, we do not want to decrypt the batch on the broker, so
it will be handed to consumers as a batch as well, which has the same
implications as end-to-end compression, such as more complicated offset
committing for consumers. I have not looked into that in a long time, and it
may not even be an issue anymore. I'll do some digging here as well.
Bottom line: I agree, but I think we should offer both modes of operation.


> >
> > Thanks
> > Maulin
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-08 Thread Jörn Franke
If you are doing batch encryption, then you are closer to a file-encryption
scenario. The more frequent the messages are, the closer you are to the
SSL/HTTPS scenarios. You may learn from those protocols how they handle
keys, how long they keep them, etc., to implement your E2E solution.

> On 08.08.2019 at 08:11, Maulin Vasavada wrote:
> 
> Hi Sönke Liebau
> 
> 
> Thanks for the great, detailed documentation. However, I feel that leaving
> the KMS outside of Kafka might simplify the whole thing to a great extent.
> If the broker is not going to touch the encrypted messages, why would we
> put any dependency on KMS interfaces on the broker? We have experimented
> with end-to-end message encryption, using topic-level keys and message
> encryption with a serializer wrapper which encrypts each message before
> serializing. The serializer wrapper has to integrate with the KMS we use
> internally, and that was all.
> 
> However, one key observation we had was that if we could do encryption at
> the 'batch' level instead of 'per-message', it could perform much better
> (depending upon batch sizing). We didn't experiment with that, though.
> 
> Thanks
> Maulin


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-08 Thread Maulin Vasavada
Hi Sönke Liebau


Thanks for the great, detailed documentation. However, I feel that leaving
the KMS outside of Kafka might simplify the whole thing to a great extent.
If the broker is not going to touch the encrypted messages, why would we
put any dependency on KMS interfaces on the broker? We have experimented
with end-to-end message encryption, using topic-level keys and message
encryption with a serializer wrapper which encrypts each message before
serializing. The serializer wrapper has to integrate with the KMS we use
internally, and that was all.

However, one key observation we had was that if we could do encryption at
the 'batch' level instead of 'per-message', it could perform much better
(depending upon batch sizing). We didn't experiment with that, though.

Thanks
Maulin
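
As an editorial illustration of the serializer-wrapper approach described
above, here is a minimal sketch of a Serializer that wraps an existing one
and encrypts each serialized message with AES-GCM. The class is hypothetical,
and the key is assumed to have already been fetched from the KMS:

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import org.apache.kafka.common.serialization.Serializer;

public class EncryptingSerializer<T> implements Serializer<T> {
    private static final int IV_LENGTH = 12;        // 96-bit IV, recommended for GCM
    private static final int TAG_LENGTH_BITS = 128; // authentication tag size

    private final Serializer<T> inner; // the existing serializer being wrapped
    private final SecretKey key;       // topic-level key, fetched from the KMS
    private final SecureRandom random = new SecureRandom();

    public EncryptingSerializer(Serializer<T> inner, SecretKey topicKey) {
        this.inner = inner;
        this.key = topicKey;
    }

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            byte[] plaintext = inner.serialize(topic, data);
            byte[] iv = new byte[IV_LENGTH];
            random.nextBytes(iv); // fresh IV per message - never reuse an IV with GCM
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // Prepend the IV so a matching deserializer on the consumer can decrypt.
            return ByteBuffer.allocate(IV_LENGTH + ciphertext.length)
                    .put(iv).put(ciphertext).array();
        } catch (Exception e) {
            throw new RuntimeException("Encryption failed for topic " + topic, e);
        }
    }
}
```

The consumer side would mirror this with a wrapping Deserializer that strips
the IV, decrypts, and delegates to the inner deserializer.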


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-07 Thread Andrew Schofield
Hi,
I think this is a useful KIP and it looks good in principle. While it can all 
be done using
interceptors, if the brokers do not know anything about it, you need to 
maintain the
mapping from topics to key ids somewhere external. I'd prefer the way you've 
done it.

I'm not sure whether you'll manage to interest any committers in volunteering an
opinion, and you'll need that before you can get the KIP accepted into Kafka.

Thanks,
Andrew Schofield (IBM)

On 06/08/2019, 15:46, "Sönke Liebau"  
wrote:

Hi,

I have so far received pretty much no comments on the technical details
outlined in the KIP. While I am happy to continue with my own ideas of how
to implement this, I would much prefer to at least get a very broad "looks
good in principle, but still lots to flesh out" from a few people before I
put more work into this.

Best regards,
Sönke




On Tue, 21 May 2019 at 14:15, Sönke Liebau 
wrote:

> Hi everybody,
>
> I'd like to rekindle the discussion around KIP-317.
> I have reworked the KIP a little bit in order to design everything as a
> pluggable implementation. During the course of that work I've also decided
> to rename the KIP, as encryption will only be transparent in some cases. It
> is now called "Add end to end data encryption functionality to Apache
> Kafka" [1].
>
> I'd very much appreciate it if you could give the KIP a quick read. This
> is not at this point a fully fleshed out design, as I would like to agree
> on the underlying structure that I came up with first, before spending time
> on details.
>
> TL;DR is:
> Create three pluggable classes:
> KeyManager runs on the broker and manages which keys to use, key rollover,
> etc.
> KeyProvider runs on the client and retrieves keys based on what the
> KeyManager tells it
> EncryptionEngine runs on the client and handles the actual encryption
> A first idea of the control flow between these components can be seen at [2]
>
> Please let me know any thoughts or concerns that you may have!
>
> Best regards,
> Sönke
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> [2]
> https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1&modificationDate=1558439227551&api=v2
>
>
>
> On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
> wrote:
>
>> Hi Viktor,
>>
>> thanks for your input! We could accommodate magic headers by removing any
>> known fixed bytes pre-encryption, sticking them in a header field and
>> prepending them after decryption. However, I am not sure whether this is
>> actually necessary, as most modern (AES for sure) algorithms are considered
>> to be resistant to known-plaintext types of attack. Even if the entire
>> plaintext is known to the attacker, they still need to brute-force the key -
>> which may take a while.
>>
>> Something different to consider in this context are compression
>> side-channel attacks like CRIME or BREACH, which may be relevant depending
>> on what type of data is being sent through Kafka. Both these attacks depend
>> on the encrypted record containing a combination of secret and
>> user-controlled data.
>> For example, if Kafka were used to forward data that the user entered on a
>> website, along with a secret API key that the website adds, to a back-end
>> server, and the user can obtain the Kafka messages, these attacks would
>> become relevant. Not much we can do about that except disallow encryption
>> when compression is enabled (TLS chose this approach in version 1.3).
>>
>> I agree with you that we definitely need to clearly document any risks
>> and how much security can reasonably be expected in any given scenario. We
>> might even consider logging a warning message when sending data that is
>> compressed and encrypted.
>>
>> On a different note, I've started amending the KIP to make key management
>> and distribution pluggable; I should hopefully be able to publish sometime
>> Monday.
>>
>> Best regards,
>> Sönke
>>
>>
>> On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi wrote:
>>
>>> Hi Sönke,
>>>

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-06 Thread Sönke Liebau
Hi,

I have so far received pretty much no comments on the technical details
outlined in the KIP. While I am happy to continue with my own ideas of how
to implement this, I would much prefer to at least get a very broad "looks
good in principle, but still lots to flesh out" from a few people before I
put more work into this.

Best regards,
Sönke




On Tue, 21 May 2019 at 14:15, Sönke Liebau 
wrote:

> Hi everybody,
>
> I'd like to rekindle the discussion around KIP-317.
> I have reworked the KIP a little bit in order to design everything as a
> pluggable implementation. During the course of that work I've also decided
> to rename the KIP, as encryption will only be transparent in some cases. It
> is now called "Add end to end data encryption functionality to Apache
> Kafka" [1].
>
> I'd very much appreciate it if you could give the KIP a quick read. This
> is not at this point a fully fleshed out design, as I would like to agree
> on the underlying structure that I came up with first, before spending time
> on details.
>
> TL;DR is:
> Create three pluggable classes:
> KeyManager runs on the broker and manages which keys to use, key rollover,
> etc.
> KeyProvider runs on the client and retrieves keys based on what the
> KeyManager tells it
> EncryptionEngine runs on the client and handles the actual encryption
> A first idea of the control flow between these components can be seen at [2]
>
> Please let me know any thoughts or concerns that you may have!
>
> Best regards,
> Sönke
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> [2]
> https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1&modificationDate=1558439227551&api=v2
>
>
>
> On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
> wrote:
>
>> Hi Viktor,
>>
>> thanks for your input! We could accommodate magic headers by removing any
>> known fixed bytes pre-encryption, sticking them in a header field and
>> prepending them after decryption. However, I am not sure whether this is
>> actually necessary, as most modern (AES for sure) algorithms are considered
>> to be resistant to known-plaintext types of attack. Even if the entire
>> plaintext is known to the attacker, they still need to brute-force the key -
>> which may take a while.
>>
>> Something different to consider in this context are compression
>> side-channel attacks like CRIME or BREACH, which may be relevant depending
>> on what type of data is being sent through Kafka. Both these attacks depend
>> on the encrypted record containing a combination of secret and
>> user-controlled data.
>> For example, if Kafka were used to forward data that the user entered on a
>> website, along with a secret API key that the website adds, to a back-end
>> server, and the user can obtain the Kafka messages, these attacks would
>> become relevant. Not much we can do about that except disallow encryption
>> when compression is enabled (TLS chose this approach in version 1.3).
>>
>> I agree with you that we definitely need to clearly document any risks
>> and how much security can reasonably be expected in any given scenario. We
>> might even consider logging a warning message when sending data that is
>> compressed and encrypted.
>>
>> On a different note, I've started amending the KIP to make key management
>> and distribution pluggable; I should hopefully be able to publish sometime
>> Monday.
>>
>> Best regards,
>> Sönke
>>
>>
>> On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi wrote:
>>
>>> Hi Sönke,
>>>
>>> Compressing before encrypting has its dangers as well. Suppose you have a
>>> known compression format which adds a magic header and you're using a block
>>> cipher with a small enough block; then it becomes much easier to figure out
>>> the encryption key. For instance, you can look at Snappy's stream
>>> identifier:
>>> https://github.com/google/snappy/blob/master/framing_format.txt
>>> Based on this, you should only use block ciphers whose block sizes are much
>>> larger than 6 bytes. AES, for instance, should be good with its 128 bits =
>>> 16 bytes, but even this isn't entirely secure, as the first 6 bytes already
>>> leak some information - how much depends on the cipher.
>>> Also, if we suppose that an adversary accesses a broker and takes all the
>>> data, they'll have a much easier job decrypting it, as they'll have many
>>> more examples.
>>> So overall we should make sure to define and document the compatible
>>> encryption algorithms for the supported compression methods and the level
>>> of security they provide, to make sure users are fully aware of the
>>> security implications.
>>>
>>> Cheers,
>>> Viktor
>>>
>>> On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau wrote:
>>>
>>> > Hi Stephane,
>>> >
>>> > thanks for pointing out the broken pictures, I fixed those.
>>> >
>>> > Regarding encrypting before or after batching the messages, you are
>>> > correct, I had not thought of compression and how this changes things.

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-05-21 Thread Sönke Liebau
Hi everybody,

I'd like to rekindle the discussion around KIP-317.
I have reworked the KIP a little bit in order to design everything as a
pluggable implementation. During the course of that work I've also decided
to rename the KIP, as encryption will only be transparent in some cases. It
is now called "Add end to end data encryption functionality to Apache
Kafka" [1].

I'd very much appreciate it if you could give the KIP a quick read. This is
not at this point a fully fleshed out design, as I would like to agree on
the underlying structure that I came up with first, before spending time on
details.

TL;DR is:
Create three pluggable classes:
KeyManager runs on the broker and manages which keys to use, key rollover,
etc.
KeyProvider runs on the client and retrieves keys based on what the
KeyManager tells it
EncryptionEngine runs on the client and handles the actual encryption
A first idea of the control flow between these components can be seen at [2]
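
To make the proposed split more concrete, here is a rough sketch of what the
three pluggable interfaces could look like. All names and signatures are
illustrative assumptions, not the KIP's final API:

```java
// Illustrative sketch only - names and signatures are assumptions, not the KIP's API.

/** Runs on the broker: decides which key a topic uses and handles rollover. */
interface KeyManager {
    /** Returns a reference (key id plus KMS location) for the topic's current key. */
    KeyReference currentKeyFor(String topic);

    /** Rolls the topic over to a new key; old keys stay resolvable for consumers. */
    void rollKey(String topic);
}

/** Runs on the client: resolves key references into actual key material. */
interface KeyProvider {
    /** Fetches the key bytes for a reference handed out by the broker's KeyManager. */
    byte[] retrieveKey(KeyReference reference);
}

/** Runs on the client: performs the actual encryption and decryption. */
interface EncryptionEngine {
    byte[] encrypt(byte[] plaintext, byte[] key);
    byte[] decrypt(byte[] ciphertext, byte[] key);
}

/** Opaque pointer to a key that is safe for the broker to distribute. */
record KeyReference(String keyId, String kmsUri) {} // record syntax needs Java 16+
```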

Please let me know any thoughts or concerns that you may have!

Best regards,
Sönke

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
[2]
https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1&modificationDate=1558439227551&api=v2



On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
wrote:

> Hi Viktor,
>
> thanks for your input! We could accommodate magic headers by removing any
> known fixed bytes pre-encryption, sticking them in a header field and
> prepending them after decryption. However, I am not sure whether this is
> actually necessary, as most modern (AES for sure) algorithms are considered
> to be resistant to known-plaintext types of attack. Even if the entire
> plaintext is known to the attacker, they still need to brute-force the key -
> which may take a while.
>
> Something different to consider in this context are compression
> side-channel attacks like CRIME or BREACH, which may be relevant depending
> on what type of data is being sent through Kafka. Both these attacks depend
> on the encrypted record containing a combination of secret and
> user-controlled data.
> For example, if Kafka were used to forward data that the user entered on a
> website, along with a secret API key that the website adds, to a back-end
> server, and the user can obtain the Kafka messages, these attacks would
> become relevant. Not much we can do about that except disallow encryption
> when compression is enabled (TLS chose this approach in version 1.3).
>
> I agree with you that we definitely need to clearly document any risks
> and how much security can reasonably be expected in any given scenario. We
> might even consider logging a warning message when sending data that is
> compressed and encrypted.
>
> On a different note, I've started amending the KIP to make key management
> and distribution pluggable; I should hopefully be able to publish sometime
> Monday.
>
> Best regards,
> Sönke
>
>
> On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi 
> wrote:
>
>> Hi Sönke,
>>
>> Compressing before encrypting has its dangers as well. Suppose you have a
>> known compression format which adds a magic header and you're using a block
>> cipher with a small enough block; then it becomes much easier to figure out
>> the encryption key. For instance, you can look at Snappy's stream
>> identifier:
>> https://github.com/google/snappy/blob/master/framing_format.txt
>> Based on this, you should only use block ciphers whose block sizes are much
>> larger than 6 bytes. AES, for instance, should be good with its 128 bits =
>> 16 bytes, but even this isn't entirely secure, as the first 6 bytes already
>> leak some information - how much depends on the cipher.
>> Also, if we suppose that an adversary accesses a broker and takes all the
>> data, they'll have a much easier job decrypting it, as they'll have many
>> more examples.
>> So overall we should make sure to define and document the compatible
>> encryption algorithms for the supported compression methods and the level
>> of security they provide, to make sure users are fully aware of the
>> security implications.
>>
>> Cheers,
>> Viktor
>>
>> On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau
>>  wrote:
>>
>> > Hi Stephane,
>> >
>> > thanks for pointing out the broken pictures, I fixed those.
>> >
>> > Regarding encrypting before or after batching the messages, you are
>> > correct, I had not thought of compression and how this changes things.
>> > Encrypted data does not really compress well. My reasoning at the time
>> > of writing was that if we encrypt the entire batch we'd have to wait
>> > for the batch to be full before starting to encrypt. Whereas with per
>> > message encryption we can encrypt them as they come in and more or
>> > less have them ready for sending when the batch is complete.
>> > However I think the difference will probably not be that large (will
>> > do some testing) and offset by just encrypting once instead of many
>> > times, which has a certain overhead every time.

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-08-10 Thread Sönke Liebau
Hi Viktor,

thanks for your input! We could accommodate magic headers by removing any
known fixed bytes pre-encryption, sticking them in a header field and
prepending them after decryption. However, I am not sure whether this is
actually necessary, as most modern (AES for sure) algorithms are considered
to be resistant to known-plaintext types of attack. Even if the entire
plaintext is known to the attacker, they still need to brute-force the key -
which may take a while.

Something different to consider in this context are compression side-channel
attacks like CRIME or BREACH, which may be relevant depending on what type
of data is being sent through Kafka. Both these attacks depend on the
encrypted record containing a combination of secret and user-controlled
data.
For example, if Kafka were used to forward data that the user entered on a
website, along with a secret API key that the website adds, to a back-end
server, and the user can obtain the Kafka messages, these attacks would
become relevant. Not much we can do about that except disallow encryption
when compression is enabled (TLS chose this approach in version 1.3).
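
To see why compression plus encryption leaks, consider this toy sketch of the
CRIME/BREACH idea (an editorial illustration with made-up data, not an attack
on Kafka itself). Encryption preserves length, so the compressed size acts as
an oracle: a guess that matches the secret compresses better:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class CompressionOracleDemo {
    // DEFLATE-compressed length; encryption preserves this length (modulo
    // padding), so an observer of ciphertext sizes effectively sees it too.
    static int compressedLength(String payload) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        String secret = "api_key=SECRET42"; // added by the website, unknown to the user
        // The record mixes the secret with user-controlled input.
        int match = compressedLength(secret + "&user=" + "api_key=SECRET42");
        int miss  = compressedLength(secret + "&user=" + "api_key=WRONG999");
        // The matching guess compresses better (shorter output), leaking
        // information; repeating this per character recovers the secret.
        System.out.println("matching guess: " + match + " bytes, wrong guess: " + miss + " bytes");
    }
}
```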

I agree with you that we definitely need to clearly document any risks and
how much security can reasonably be expected in any given scenario. We
might even consider logging a warning message when sending data that is
compressed and encrypted.

On a different note, I've started amending the KIP to make key management
and distribution pluggable; I should hopefully be able to publish sometime
Monday.

Best regards,
Sönke


On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi 
wrote:

> Hi Sönke,
>
> Compressing before encrypting has its dangers as well. Suppose you have a
> known compression format which adds a magic header and you're using a block
> cipher with a small enough block; then it becomes much easier to figure out
> the encryption key. For instance, you can look at Snappy's stream
> identifier:
> https://github.com/google/snappy/blob/master/framing_format.txt
> Based on this, you should only use block ciphers whose block sizes are much
> larger than 6 bytes. AES, for instance, should be good with its 128 bits =
> 16 bytes, but even this isn't entirely secure, as the first 6 bytes already
> leak some information - how much depends on the cipher.
> Also, if we suppose that an adversary accesses a broker and takes all the
> data, they'll have a much easier job decrypting it, as they'll have many
> more examples.
> So overall we should make sure to define and document the compatible
> encryption algorithms for the supported compression methods and the level
> of security they provide, to make sure users are fully aware of the
> security implications.
>
> Cheers,
> Viktor
>
> On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau
>  wrote:
>
> > Hi Stephane,
> >
> > thanks for pointing out the broken pictures, I fixed those.
> >
> > Regarding encrypting before or after batching the messages, you are
> > correct, I had not thought of compression and how this changes things.
> > Encrypted data does not really compress well. My reasoning at the time
> > of writing was that if we encrypt the entire batch we'd have to wait
> > for the batch to be full before starting to encrypt. Whereas with per
> > message encryption we can encrypt them as they come in and more or
> > less have them ready for sending when the batch is complete.
> > However I think the difference will probably not be that large (will
> > do some testing) and offset by just encrypting once instead of many
> > times, which has a certain overhead every time. Also, from a security
> > perspective encrypting longer chunks of data is preferable - another
> > benefit.
> >
> > This does however take away the ability of the broker to see the
> > individual records inside the encrypted batch, so this would need to
> > be stored and retrieved as a single record - just like is done for
> > compressed batches. I am not 100% sure that this won't create issues,
> > especially when considering transactions, I will need to look at the
> > compression code some more. In essence though, since it works for
> > compression I see no reason why it can't be made to work here.
> >
> > On a different note, going down this route might make us reconsider
> > storing the key with the data, as this might significantly reduce
> > storage overhead - still much higher than just storing them once
> > though.
> >
> > Best regards,
> > Sönke
> >
> > On Tue, Jun 19, 2018 at 5:59 AM, Stephane Maarek
> >  wrote:
> > > Hi Sonke
> > >
> > > Very much needed feature and discussion. FYI the image links seem broken.
> > >
> > > My 2 cents (if I understood correctly): you say "This process will be
> > > implemented after Serializer and Interceptors are done with the message
> > > right before it is added to the batch to be sent, in order to ensure that
> > > existing serializers and interceptors keep working with encryption just
> > > like without it."
> > >
> > > I think encryption should happen AFTER a batch is created, right before
> > > it is sent.

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-06-21 Thread Viktor Somogyi
Hi Sönke,

Compressing before encrypting has its dangers as well. Suppose you have a
known compression format which adds a magic header and you're using a block
cipher with a small enough block; then it becomes much easier to figure out
the encryption key. For instance, you can look at Snappy's stream
identifier: https://github.com/google/snappy/blob/master/framing_format.txt
Based on this, you should only use block ciphers whose block sizes are much
larger than 6 bytes. AES, for instance, should be good with its 128 bits =
16 bytes, but even this isn't entirely secure, as the first 6 bytes already
leak some information - how much depends on the cipher.
Also, if we suppose that an adversary accesses a broker and takes all the
data, they'll have a much easier job decrypting it, as they'll have many
more examples.
So overall we should make sure to define and document the compatible
encryption algorithms for the supported compression methods and the level
of security they provide, to make sure users are fully aware of the
security implications.
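
As an editorial illustration of the fixed-header point: without a fresh IV
per record, encryption is deterministic, so a known fixed prefix such as
Snappy's stream identifier encrypts to the same bytes every time. A minimal
sketch, assuming AES in ECB mode purely to demonstrate the leak (randomized
modes such as CBC or GCM with per-record IVs avoid it):

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class KnownHeaderLeakDemo {
    public static void main(String[] args) throws Exception {
        // Snappy's framing format starts every stream with a fixed identifier:
        // chunk type 0xff, a 3-byte length (6), then the 6 bytes "sNaPpY".
        byte[] header = {(byte) 0xff, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y'};

        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher ecb = Cipher.getInstance("AES/ECB/PKCS5Padding");
        ecb.init(Cipher.ENCRYPT_MODE, key);

        // Deterministic encryption: the known prefix yields identical
        // ciphertext on every record, handing an observer a reliable
        // known-plaintext/ciphertext pair for the key.
        byte[] c1 = ecb.doFinal(header);
        byte[] c2 = ecb.doFinal(header);
        System.out.println("identical ciphertext: " + Arrays.equals(c1, c2)); // true
    }
}
```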

Cheers,
Viktor

On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau
 wrote:

> Hi Stephane,
>
> thanks for pointing out the broken pictures, I fixed those.
>
> Regarding encrypting before or after batching the messages, you are
> correct, I had not thought of compression and how this changes things.
> Encrypted data does not really compress well. My reasoning at the time
> of writing was that if we encrypt the entire batch we'd have to wait
> for the batch to be full before starting to encrypt. Whereas with per
> message encryption we can encrypt them as they come in and more or
> less have them ready for sending when the batch is complete.
> However I think the difference will probably not be that large (will
> do some testing) and offset by just encrypting once instead of many
> times, which has a certain overhead every time. Also, from a security
> perspective encrypting longer chunks of data is preferable - another
> benefit.
>
> This does however take away the ability of the broker to see the
> individual records inside the encrypted batch, so this would need to
> be stored and retrieved as a single record - just like is done for
> compressed batches. I am not 100% sure that this won't create issues,
> especially when considering transactions, I will need to look at the
> compression code some more. In essence though, since it works for
> compression I see no reason why it can't be made to work here.
>
> On a different note, going down this route might make us reconsider
> storing the key with the data, as this might significantly reduce
> storage overhead - still much higher than just storing them once
> though.
>
> Best regards,
> Sönke
>
> On Tue, Jun 19, 2018 at 5:59 AM, Stephane Maarek
>  wrote:
> > Hi Sonke
> >
> > Very much needed feature and discussion. FYI the image links seem broken.
> >
> > My 2 cents (if I understood correctly): you say "This process will be
> > implemented after Serializer and Interceptors are done with the message
> > right before it is added to the batch to be sent, in order to ensure that
> > existing serializers and interceptors keep working with encryption just
> > like without it."
> >
> > I think encryption should happen AFTER a batch is created, right before it
> > is sent. The reason is that if we want to keep the advantage of compression,
> > encryption needs to happen after it (and I believe compression happens at
> > the batch level).
> > So to me for a producer: serializer / interceptors => batching =>
> > compression => encryption => send.
> > and the inverse for a consumer.
> >
> > Regards
> > Stephane
> >
> > On 19 June 2018 at 06:46, Sönke Liebau
> > wrote:
> >
> >> Hi everybody,
> >>
> >> I've created a draft version of KIP-317 which describes the addition
> >> of transparent data encryption functionality to Kafka.
> >>
> >> Please consider this as a basis for discussion - I am aware that this
> >> is not at a level of detail sufficient for implementation, but I
> >> wanted to get some feedback from the community on the general idea
> >> before spending more time on this.
> >>
> >> Link to the KIP is:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
> >>
> >> Best regards,
> >> Sönke
> >>
>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-06-19 Thread Sönke Liebau
Hi Stephane,

thanks for pointing out the broken pictures, I fixed those.

Regarding encrypting before or after batching the messages, you are
correct, I had not thought of compression and how this changes things.
Encrypted data does not really compress well. My reasoning at the time
of writing was that if we encrypt the entire batch we'd have to wait
for the batch to be full before starting to encrypt. Whereas with per
message encryption we can encrypt them as they come in and more or
less have them ready for sending when the batch is complete.
However I think the difference will probably not be that large (will
do some testing) and offset by just encrypting once instead of many
times, which has a certain overhead every time. Also, from a security
perspective encrypting longer chunks of data is preferable - another
benefit.

This does however take away the ability of the broker to see the
individual records inside the encrypted batch, so this would need to
be stored and retrieved as a single record - just like is done for
compressed batches. I am not 100% sure that this won't create issues,
especially when considering transactions, I will need to look at the
compression code some more. In essence though, since it works for
compression I see no reason why it can't be made to work here.

On a different note, going down this route might make us reconsider
storing the key with the data, as this might significantly reduce
storage overhead - still much higher than just storing them once
though.
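
The per-call overhead mentioned above can be sanity-checked with a quick,
unscientific timing sketch (an illustration only; absolute numbers vary by
JVM and hardware, and the fixed IV is solely to keep the benchmark simple -
never reuse IVs in real use):

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class BatchVsMessageEncryption {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        IvParameterSpec iv = new IvParameterSpec(new byte[16]); // demo only!
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");

        byte[] message = new byte[1024];      // one 1 KiB record
        byte[] batch = new byte[1024 * 1000]; // 1000 records as a single blob

        // Per-message: 1000 separate init/doFinal calls.
        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            cipher.init(Cipher.ENCRYPT_MODE, key, iv);
            cipher.doFinal(message);
        }
        long perMessage = System.nanoTime() - t0;

        // Per-batch: one call over the same total volume.
        t0 = System.nanoTime();
        cipher.init(Cipher.ENCRYPT_MODE, key, iv);
        cipher.doFinal(batch);
        long perBatch = System.nanoTime() - t0;

        System.out.printf("per-message: %d us, per-batch: %d us%n",
                perMessage / 1000, perBatch / 1000);
    }
}
```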

Best regards,
Sönke

On Tue, Jun 19, 2018 at 5:59 AM, Stephane Maarek
 wrote:
> Hi Sonke
>
> Very much needed feature and discussion. FYI the image links seem broken.
>
> My 2 cents (if I understood correctly): you say "This process will be
> implemented after Serializer and Interceptors are done with the message
> right before it is added to the batch to be sent, in order to ensure that
> existing serializers and interceptors keep working with encryption just
> like without it."
>
> I think encryption should happen AFTER a batch is created, right before it
> is sent. The reason is that if we want to keep the advantage of compression,
> encryption needs to happen after it (and I believe compression happens at
> the batch level).
> So to me for a producer: serializer / interceptors => batching =>
> compression => encryption => send.
> and the inverse for a consumer.
>
> Regards
> Stephane
>
> On 19 June 2018 at 06:46, Sönke Liebau 
> wrote:
>
>> Hi everybody,
>>
>> I've created a draft version of KIP-317 which describes the addition
>> of transparent data encryption functionality to Kafka.
>>
>> Please consider this as a basis for discussion - I am aware that this
>> is not at a level of detail sufficient for implementation, but I
>> wanted to get some feedback from the community on the general idea
>> before spending more time on this.
>>
>> Link to the KIP is:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
>>
>> Best regards,
>> Sönke
>>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-06-18 Thread Stephane Maarek
Hi Sonke

Very much needed feature and discussion. FYI the image links seem broken.

My 2 cents (if I understood correctly): you say "This process will be
implemented after Serializer and Interceptors are done with the message
right before it is added to the batch to be sent, in order to ensure that
existing serializers and interceptors keep working with encryption just
like without it."

I think encryption should happen AFTER a batch is created, right before it
is sent. The reason is that if we want to keep the advantage of compression,
encryption needs to happen after it (and I believe compression happens at
the batch level).
So to me for a producer: serializer / interceptors => batching =>
compression => encryption => send.
and the inverse for a consumer.
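
A minimal sketch of this ordering and its consumer-side inverse, assuming
GZIP for compression and AES-GCM for encryption (an illustration of the
proposed flow, not Kafka's actual record-batch handling):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class BatchPipelineSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Producer side: batching => compression => encryption => send.
    static byte[] prepareBatch(byte[] serializedBatch, SecretKey key) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(serializedBatch);             // compress the full batch first...
        }
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);                      // fresh IV per batch
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = c.doFinal(bos.toByteArray()); // ...then encrypt
        return ByteBuffer.allocate(12 + ciphertext.length).put(iv).put(ciphertext).array();
    }

    // Consumer side: receive => decrypt => decompress => deserialize.
    static byte[] readBatch(byte[] wire, SecretKey key) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        byte[] iv = new byte[12];
        buf.get(iv);
        byte[] ciphertext = new byte[buf.remaining()];
        buf.get(ciphertext);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(c.doFinal(ciphertext)))) {
            return gz.readAllBytes();
        }
    }
}
```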

Regards
Stephane

On 19 June 2018 at 06:46, Sönke Liebau 
wrote:

> Hi everybody,
>
> I've created a draft version of KIP-317 which describes the addition
> of transparent data encryption functionality to Kafka.
>
> Please consider this as a basis for discussion - I am aware that this
> is not at a level of detail sufficient for implementation, but I
> wanted to get some feedback from the community on the general idea
> before spending more time on this.
>
> Link to the KIP is:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
>
> Best regards,
> Sönke
>