Re: Pure Python Data Mangling or Encrypting
On 06/30/2015 01:33 PM, Chris Angelico wrote:
> From the software's point of view, it has two distinct modes: server, in which it listens on a socket and receives data, and client, in which it connects to other people's sockets and sends data. As such, the server mode is the only one that receives untrusted data from another user and stores it on the hard disk.

That's close. There are 3 types: storage nodes, client nodes, and control nodes.

Communication:

storage node -- control node
storage node -- storage node
client node -- storage node
client node -- control node

Data is uploaded by clients and distributed among storage nodes. Everything is coordinated by the control nodes (plural for redundancy).

-Randall
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Tue, 30 Jun 2015 23:25:01 +, Jon Ribbens wrote: On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote: I don't think there has been much research into keeping at least *some* security even when keys have been compromised, apart from as it relates to two-factor authentication. That's because the key is all the secret part. If an attacker knows the algorithm, and the key, and the ciphertext, then *by definition* all is lost. If you mean keeping the algorithm secret too then that's just considered bad crypto. In the past, and still today among people who don't understand Kerckhoffs' principle, people have tried to keep the cipher secret and not have a key at all. E.g. atbash, or caesar cipher, which once upon a time were cutting edge ciphers, as laughably insecure as they are today. If the method was compromised, all was lost. Caesar cipher has a key. It's just very small, so is easy to guess. Today, if the key is compromised, all is lost. Is it possible that there are ciphers that are resistant to discovery of the key? Obviously if you know the key you can read encrypted messages, that's what the key is for, but there are scenarios where you would want security to degrade gracefully instead of in a brittle all-or-nothing manner: - even if the attacker can read my messages, he cannot tamper with them or write new ones as me. I suppose that could be achieved by having separate encryption and signing keys, but you could do the same but better by encrypting with multiple algorithms. It's not an unstudied area: https://en.wikipedia.org/wiki/Multiple_encryption The kipper flies at Midnight (from almost every WWII spy movie ever) even if this message is decoded it is meaningless unless the attacker also has the meanings of the Code phrases (which would mean your agent had been captured anyway) -- That's the funniest thing I've ever heard and I will _not_ condone it. -- DyerMaker, 17 March 2000 MegaPhone radio show -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Randall Smith wrote:
> Worst case, something that looks like this would land on the disk.
>
> crc32 checksum + translation table + malware

It would be safer to add something to both the beginning *and* end of the file. Some file formats, e.g. zip, pdf, are designed to be read starting from the end. So I would suggest something like

crc32 checksum + payload + translation table

-- Greg
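The layout Greg suggests can be sketched roughly as follows. This is only an illustration under stated assumptions: Python 3, a CRC32 computed over the translated payload plus the table, and a trailing table that is the decode table. The names `pack` and `unpack` are hypothetical and do not come from Randall's application.

```python
import random
import struct
import zlib

TABLE_SIZE = 256  # one entry per possible byte value


def pack(payload: bytes) -> bytes:
    """Return crc32 + translated payload + decode table (assumed layout)."""
    encode_table = list(range(TABLE_SIZE))
    random.shuffle(encode_table)
    encode_table = bytes(encode_table)
    # Invert the permutation so a reader can undo the translation.
    decode_table = bytes(encode_table.index(i) for i in range(TABLE_SIZE))
    body = payload.translate(encode_table) + decode_table
    # Assumption: the 4-byte checksum covers everything that follows it.
    return struct.pack(">I", zlib.crc32(body)) + body


def unpack(blob: bytes) -> bytes:
    """Check the CRC32, then undo the translation with the trailing table."""
    (stored_crc,) = struct.unpack(">I", blob[:4])
    body = blob[4:]
    if zlib.crc32(body) != stored_crc:
        raise ValueError("checksum mismatch: possible on-disk corruption")
    return body[:-TABLE_SIZE].translate(body[-TABLE_SIZE:])
```

With this layout the file both starts and ends with locally generated bytes, which is the property Greg is after; the checksum still only guards against accidental corruption.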
Re: Pure Python Data Mangling or Encrypting
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote: On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote: Not sure why you posted the link. The crc32 checksum is just to check for possible filesystem corruption. The system does periodic data corruption checks. BTRFS uses crc32 checksums also. Please explain. The file system can trust that anything writing to a file is allowed to write to it, in doesn't have to defend against malicious writes. As I understand it, your application does. Here is the attack scenario I have in mind: - you write a file to my computer, and neglect to encrypt it; Eh? The game is over right there. I don't trust you, and yet I have just given you my private data, unencrypted. Checksums don't even come into it, we have failed utterly at step 1. - since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware; No, you have yet *again* misunderstood the difference between the client and the server. I was wrong: cryptographically strong ciphers are generally NOT resistant to what I described as a preimage attack. If the key leaks, using AES won't save you: an attacker with access to the key can produce a ciphertext that decrypts to the malware of his choice, regardless of whether you use AES-256 or rot-13. There may be other encryption methods which don't suffer from that, but he doesn't know of them off the top of his head. lol. I suspected as much. You and Johannes were even more wrong than was already obvious. The other threat I mentioned is that the receiver will read the content of the file. For that, a strong cipher is much to be preferred over a weak one, and it needs to be encrypted by the sending end, not the receiving end. (If the receiving end does it, it has to keep the key so it can decrypt before sending back, which means the computer's owner can just grab the key and read the files.) Yes, that is utterly basic. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 06/29/2015 10:00 PM, Steven D'Aprano wrote: On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote: Not sure why you posted the link. The crc32 checksum is just to check for possible filesystem corruption. The system does periodic data corruption checks. BTRFS uses crc32 checksums also. Please explain. The file system can trust that anything writing to a file is allowed to write to it, in doesn't have to defend against malicious writes. As I understand it, your application does. Here is the attack scenario I have in mind: - you write a file to my computer, and neglect to encrypt it; - and record the checksum for later; - I insert malware into your file; - you retrieve the file from me; - if the checksum matches what you have on record, you accept the file; - since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware; - and I have now successfully injected malware into your computer. I'm making an assumption here -- I assume that the sender records a checksum for uploaded files so that when they get something back again they can tell whether or not it is the same content they uploaded. Yes. The client software computes sha256 checksums. * * * By the way, regarding the use of a substitution cipher, I spoke to the crypto guy at work, and preimage attack is not quite the right terminology, since that's normally used in the context of hash functions. It's almost a known ciphertext attack, but not quite, since that terminology refers to guessing the key from the ciphertext. I was wrong: cryptographically strong ciphers are generally NOT resistant to what I described as a preimage attack. If the key leaks, using AES won't save you: an attacker with access to the key can produce a ciphertext that decrypts to the malware of his choice, regardless of whether you use AES-256 or rot-13. There may be other encryption methods which don't suffer from that, but he doesn't know of them off the top of his head. 
His comment was, don't leak the key. I'm pretty sure all encryption hinges on guarding the key. The other threat I mentioned is that the receiver will read the content of the file. For that, a strong cipher is much to be preferred over a weak one, and it needs to be encrypted by the sending end, not the receiving end. (If the receiving end does it, it has to keep the key so it can decrypt before sending back, which means the computer's owner can just grab the key and read the files.) And again, that's why the client (data owner) uses AES. -- https://mail.python.org/mailman/listinfo/python-list
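The client-side check described here (record a SHA-256 digest at upload, compare on retrieval) could look something like the sketch below. The function names are hypothetical; the only thing taken from the thread is that the client computes sha256 checksums and keeps them under its own control.

```python
import hashlib
import hmac


def record_digest(data: bytes) -> str:
    """Digest the client stores locally before uploading the data."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, recorded: str) -> bool:
    """True only if the retrieved bytes match the locally recorded digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, recorded)
```

The digest has to stay with the sender: if the storage node held both the data and the digest, it could replace both.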
Re: Pure Python Data Mangling or Encrypting
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote: On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote: Eh? The game is over right there. I don't trust you, and yet I have just given you my private data, unencrypted. Yes. That is exactly the problem. If the application doesn't encrypt the data for me, *it isn't going to happen*. We are in violent agreement that the sender needs to encrypt the data. It's a good thing that he's said it will then. Randall has suggested that encryption is optional. No he hasn't. You just keep creatively misreading what he says, for some reason. It's not unreasonable to raise this issue. It is unreasonable to raise it over and over again however, especially when there's no reason at all to think it's relevant, and nothing has changed from the last time you raised it. We can mitigate against the second attack by using a cryptographically strong hash function to detect tampering. Not on the server you can't. If the attacker can edit the files he can edit the hashes too. These *are* resistant to preimage attacks. If I give you a SHA512 checksum, there is no known practical method to generate a file with that same checksum. If I give you a CRC checksum, you can. Randall didn't suggest any usage of CRCs where preimage attacks are relevant. You just made that bit up. - since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware; No, you have yet *again* misunderstood the difference between the client and the server. This was described as a peer-to-peer application. You even stated that it was a pretty obvious use-case, a peer-to-peer dropbox. So is it peer-to-peer or client-server? Both. It sounds a bit like there are clients which upload files to a cloud of servers which are peers of each other. But seriously, is this the source of all your confusion? 
Even if all the nodes are pure peers (which it doesn't sound like they are), any particular file will still have a source node which is therefore the client for that file. You're trying to draw a hard distinction where there is none. lol. I suspected as much. You and Johannes were even more wrong than was already obvious. You suspected as much? Such a pity you didn't speak up earlier and explain that cryptographic ciphers aren't generally resistant to preimage attacks. I think you're misusing that phrase. But taking what you meant, I suspected it was true (would they be reistant, after all?) but I couldn't be bothered to check because the whole crypto bit was a complete red-herring in the first place. The original discussion wasn't about crypto, all the discussion about that was only because you and Johannes wrongly insisted it was necessary. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote: On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote: On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote: Not sure why you posted the link. The crc32 checksum is just to check for possible filesystem corruption. The system does periodic data corruption checks. BTRFS uses crc32 checksums also. Please explain. The file system can trust that anything writing to a file is allowed to write to it, in doesn't have to defend against malicious writes. As I understand it, your application does. Here is the attack scenario I have in mind: - you write a file to my computer, and neglect to encrypt it; Eh? The game is over right there. I don't trust you, and yet I have just given you my private data, unencrypted. Yes. That is exactly the problem. If the application doesn't encrypt the data for me, *it isn't going to happen*. We are in violent agreement that the sender needs to encrypt the data. I think Randall has been somewhat less than clear about what the application actually does and how it works. He probably thinks he doesn't need to explain, that its none of our business, and wishes we'd just shut up about it. That's his right. It's also my right to discuss the possible security implications of some hypothetical peer-to-peer dropbox-like application which may, or may not, be similar to Randall's application. Whether Randall learns anything from that discussion, or just tunes it out, is irrelevant. I've already learned at least one thing from this discussion, so as far as I'm concerned it's a win. Randall has suggested that encryption is optional. It isn't clear whether he means there is an option to turn encryption off, or whether he means I can hack the application and disable it, or write my own application. I don't expect him to be responsible for rogue applications that have been hacked or written independently, which (out of malice or stupidity) don't encrypt the uploaded data. 
But I think that it is foolish to support an unencrypted mode of operation. It's not unreasonable to raise this issue. The default state of security among IT professionals is something worse than awful: https://medium.com/message/everything-is-broken-81e5f33a24e1 One of Australia's largest ISPs recently was hacked, and guess how they stored their customer's passwords? Yes, you got it: in plain text. There is no security mistake so foolish that IT professionals won't make it. Checksums don't even come into it, we have failed utterly at step 1. *shrug* You're right. But having failed at step 1, there are multiple attacks that can follow. The first attack is the obvious one: the ability to read the unencrypted data. If you can trick me into turning encryption off (say, you use a social engineering attack on me and convince me to delete the virus crypto.py), then I might inadvertently upload unencrypted data to you. Or maybe you find an attack on the application that can fool it into dropping down to unencrypted mode. If there's no unencrypted mode in the first place, that's much harder. Earlier, Chris suggested that the application might choose to import the crypto module, and if it's not available, just keep working without encryption. This hypothetical attack demonstrates that this would be a mistake. It's hard for an attacker to convince a naive user to open up the application source code and edit the code. It's easier to convince them to delete a file. Or, the application just has a bug in it. It accidentally flips the sense of the use encryption flag. That's a failure mode that simply cannot occur if there is no such flag in the first place. If our attacker has managed to disable encryption in the sender's application, then they can not only read my data, but tamper with it. These are *separate attacks* with the same underlying cause. I can mitigate one without mitigating the other. 
We can mitigate against the second attack by using a cryptographically strong hash function to detect tampering. These *are* resistant to preimage attacks. If I give you a SHA512 checksum, there is no known practical method to generate a file with that same checksum. If I give you a CRC checksum, you can. (Naturally the checksum has to be under the sender's control. If the receiver has the checksum and the data, they can just replace the checksum with one of their choosing.) That's a separate issue from detecting non-malicious data corruption, although of course a SHA512 checksum will detect that as well. - since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware; No, you have yet *again* misunderstood the difference between the client and the server. This was described as a peer-to-peer application. You even stated that it was a pretty obvious use-case, a peer-to-peer dropbox. So is it peer-to-peer or client-server? In any case, since Randall has refused to go into specific details of how his application
Re: Pure Python Data Mangling or Encrypting
On 06/29/2015 03:49 PM, Jon Ribbens wrote:
> On 2015-06-29, Randall Smith rand...@tnr.cc wrote:
>> Same reason newer filesystems like BTRFS use checksums (BTRFS uses CRC32). The storage machine runs periodic file integrity checks. It has no control over the underlying filesystem.
>
> True, but presumably neither does it have anything it can do to rectify the situation if it finds a problem, and the client will have to keep its own secure hash of its file anyway. (Unless I suppose the server actually can request a new copy from the client or another server if it finds a corrupt file?)

Yes. The storage servers are monitored for integrity. They can request a new copy, though frequent corruption results in the server being marked as unreliable.

-Randall
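A periodic integrity check of the kind Randall describes might be sketched like this. It is an assumption-laden illustration, not his code: the function names are made up, and how the expected CRC32 is stored alongside the data is not specified in the thread.

```python
import io
import zlib


def crc32_of_stream(stream, chunk_size=65536):
    """Compute a CRC32 incrementally so large files need not fit in memory."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # zlib.crc32 accepts the running value as its second argument.
        crc = zlib.crc32(chunk, crc)
    return crc


def is_intact(stream, expected_crc):
    """True if the stored bytes still match the checksum taken at write time."""
    return crc32_of_stream(stream) == expected_crc
```

A storage node could call `is_intact(open(path, "rb"), expected)` on a schedule and request a fresh copy when it returns False. This catches accidental corruption only; it is not a defence against deliberate tampering.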
Re: Pure Python Data Mangling or Encrypting
On Wed, Jul 1, 2015 at 4:17 AM, Steven D'Aprano st...@pearwood.info wrote: If you can trick me into turning encryption off (say, you use a social engineering attack on me and convince me to delete the virus crypto.py), then I might inadvertently upload unencrypted data to you. Or maybe you find an attack on the application that can fool it into dropping down to unencrypted mode. If there's no unencrypted mode in the first place, that's much harder. Earlier, Chris suggested that the application might choose to import the crypto module, and if it's not available, just keep working without encryption. This hypothetical attack demonstrates that this would be a mistake. It's hard for an attacker to convince a naive user to open up the application source code and edit the code. It's easier to convince them to delete a file. And I'm sure Steven knows about this, but if anyone else isn't convinced that this is a serious vulnerability, look into various forms of downgrade attack, such as the recent POODLE. Security doesn't exist if an attacker can convince your program to turn it off without your knowledge. - since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware; No, you have yet *again* misunderstood the difference between the client and the server. This was described as a peer-to-peer application. You even stated that it was a pretty obvious use-case, a peer-to-peer dropbox. So is it peer-to-peer or client-server? I've never managed to get any sort of grasp of what this application actually *is*, but peer-to-peer Dropbox is certainly something that it *might be*. It could be simultaneously peer-to-peer from the human's point of view, and client-server from the application's - imagine BitTorrent protocol, but where one end connects to a socket that the other's listening on, and the active socket always pushes data to the passive socket. (With BitTorrent, it's truly symmetrical - doesn't matter who listens and who connects. 
But imagine if it weren't that way.) From the software's point of view, it has two distinct modes: server, in which it listens on a socket and receives data, and client, in which it connects to other people's sockets and sends data. As such, the server mode is the only one that receives untrusted data from another user and stores it on the hard disk. But this is just one theory of what the program *might* be, based on what I've gathered in this thread. Or rather, it's a vague theory of something that's mostly plausible, without necessarily even being useful. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
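The downgrade risk discussed above comes from treating a missing crypto module as a reason to carry on unencrypted. A minimal sketch of the opposite, fail-closed behaviour is shown below; the module name `crypto` and the function name `load_crypto` are placeholders, not names from the actual application.

```python
import importlib


def load_crypto(module_name="crypto"):
    """Import the encryption module, refusing to continue if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail closed: there is deliberately no unencrypted fallback path.
        raise RuntimeError(
            "encryption module %r is unavailable; refusing to upload" % module_name
        ) from exc
```

With no fallback branch, deleting the module stops uploads instead of silently turning encryption off.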
Re: Pure Python Data Mangling or Encrypting
On Wed, 1 Jul 2015 03:39 am, Randall Smith wrote: On 06/29/2015 10:00 PM, Steven D'Aprano wrote: I'm making an assumption here -- I assume that the sender records a checksum for uploaded files so that when they get something back again they can tell whether or not it is the same content they uploaded. Yes. The client software computes sha256 checksums. Thanks for clarifying. [...] His comment was, don't leak the key. I'm pretty sure all encryption hinges on guarding the key. That would be Kerckhoffs' Principle, also known as Shannon's Maxim. I don't think there has been much research into keeping at least *some* security even when keys have been compromised, apart from as it relates to two-factor authentication. (Assume that other people know the password to your bank account. They can read your balance, but they can't steal your money unless they first steal your phone or RSA token.) In the past, and still today among people who don't understand Kerckhoffs' principle, people have tried to keep the cipher secret and not have a key at all. E.g. atbash, or caesar cipher, which once upon a time were cutting edge ciphers, as laughably insecure as they are today. If the method was compromised, all was lost. Today, if the key is compromised, all is lost. Is it possible that there are ciphers that are resistant to discovery of the key? Obviously if you know the key you can read encrypted messages, that's what the key is for, but there are scenarios where you would want security to degrade gracefully instead of in a brittle all-or-nothing manner: - even if the attacker can read my messages, he cannot tamper with them or write new ones as me. (I'm pretty sure that, for example, the military would consider it horrible if the enemy could listen in on their communications, but *even worse* if the enemy could send false orders that appear to be legitimate.) 
Sixty years ago, the idea of having a separate encryption key that you keep secret and a decryption key that you can give out to everyone (public key encryption) probably would have seemed ridiculous too. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Wed, Jul 1, 2015 at 4:59 AM, Steven D'Aprano st...@pearwood.info wrote:
> Today, if the key is compromised, all is lost. Is it possible that there are ciphers that are resistant to discovery of the key? Obviously if you know the key you can read encrypted messages, that's what the key is for, but there are scenarios where you would want security to degrade gracefully instead of in a brittle all-or-nothing manner:
>
> - even if the attacker can read my messages, he cannot tamper with them or write new ones as me.
>
> (I'm pretty sure that, for example, the military would consider it horrible if the enemy could listen in on their communications, but *even worse* if the enemy could send false orders that appear to be legitimate.)

That would be accomplished by a two-fold enveloping of signing and encrypting. If I sign something using my private key, then encrypt it using your public key, someone who's compromised your private key could snoop and read the message, but couldn't forge a message from me.

Of course, that just means there are lots more secrets to worry about getting compromised.

ChrisA
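The "separate secrets for reading and for writing" idea can be illustrated with the standard library, with one loud caveat: Chris describes public-key signing, which the standard library does not provide, so this sketch swaps in an HMAC with a separate symmetric authentication key. Unlike a real signature, anyone able to verify an HMAC tag can also forge one. The ciphertext is treated as opaque bytes produced elsewhere, and all names are hypothetical.

```python
import hashlib
import hmac

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(auth_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a key separate from the cipher key."""
    tag = hmac.new(auth_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(auth_key: bytes, sealed: bytes) -> bytes:
    """Return the ciphertext if the tag verifies, otherwise raise ValueError."""
    ciphertext, tag = sealed[:-TAG_SIZE], sealed[-TAG_SIZE:]
    expected = hmac.new(auth_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: message forged or altered")
    return ciphertext
```

An attacker who learns only the encryption key can read the message but cannot produce a valid tag, which is the graceful degradation being asked about. As Chris notes, the cost is one more secret to protect.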
Re: Pure Python Data Mangling or Encrypting
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
> I don't think there has been much research into keeping at least *some* security even when keys have been compromised, apart from as it relates to two-factor authentication.

That's because the key is all the secret part. If an attacker knows the algorithm, and the key, and the ciphertext, then *by definition* all is lost. If you mean keeping the algorithm secret too then that's just considered bad crypto.

> In the past, and still today among people who don't understand Kerckhoffs' principle, people have tried to keep the cipher secret and not have a key at all. E.g. atbash, or caesar cipher, which once upon a time were cutting edge ciphers, as laughably insecure as they are today. If the method was compromised, all was lost.

Caesar cipher has a key. It's just very small, so is easy to guess.

> Today, if the key is compromised, all is lost. Is it possible that there are ciphers that are resistant to discovery of the key? Obviously if you know the key you can read encrypted messages, that's what the key is for, but there are scenarios where you would want security to degrade gracefully instead of in a brittle all-or-nothing manner:
>
> - even if the attacker can read my messages, he cannot tamper with them or write new ones as me.

I suppose that could be achieved by having separate encryption and signing keys, but you could do the same but better by encrypting with multiple algorithms. It's not an unstudied area:

https://en.wikipedia.org/wiki/Multiple_encryption
Re: Pure Python Data Mangling or Encrypting
On 2015-06-29, Randall Smith rand...@tnr.cc wrote:
> Same reason newer filesystems like BTRFS use checksums (BTRFS uses CRC32). The storage machine runs periodic file integrity checks. It has no control over the underlying filesystem.

True, but presumably neither does it have anything it can do to rectify the situation if it finds a problem, and the client will have to keep its own secure hash of its file anyway. (Unless I suppose the server actually can request a new copy from the client or another server if it finds a corrupt file?)
Re: Pure Python Data Mangling or Encrypting
On 06/28/2015 09:21 AM, Jon Ribbens wrote:
> On 2015-06-27, Randall Smith rand...@tnr.cc wrote:
>> Thank you. Nice points. I do think given the risks (there are always risks) discussed, a successful attack of this nature is not very likely. Worst case, something that looks like this would land on the disk.
>>
>> crc32 checksum + translation table + malware
>>
>> with a generated base64 name and no extension.
>
> I'm not sure why you're bothering with the checksum, it doesn't seem to me that it buys you anything. Personally I'd do something like this (pseudocode):

Same reason newer filesystems like BTRFS use checksums (BTRFS uses CRC32). The storage machine runs periodic file integrity checks. It has no control over the underlying filesystem.

>     import random
>
>     def obfuscate(data):
>         # Build a random byte permutation and its inverse.
>         encode_key = list(range(256))
>         random.shuffle(encode_key)
>         encode_key = bytes(encode_key)
>         decode_key = bytes(encode_key.index(i) for i in range(256))
>         # Store the decode table at both ends of the translated data.
>         return decode_key + data.translate(encode_key) + decode_key
>
>     def deobfuscate(data):
>         # Strip both 256-byte tables and undo the translation.
>         return data[256:-256].translate(data[:256])
>
> The reason for appending the key as well as prepending it is that some anti-virus or malware scanners may well look at the last part of the file first, so putting something entirely locally-generated there may add a bit of safety. You could also simply pad with nulls or something of course, but again I can imagine some tools skipping backwards past nulls.
Re: Pure Python Data Mangling or Encrypting
On 06/27/2015 01:50 PM, Steven D'Aprano wrote:
> On Sun, 28 Jun 2015 03:08 am, Randall Smith wrote:
>> Though I didn't mention it in the description, the storage server is appending a CRC32 checksum for routine integrity checks. So by the time the data hits the disk, it will have added both a 256 byte translation table and a 4 byte checksum.
>
> http://stackoverflow.com/questions/1515914/crc32-collision

Not sure why you posted the link. The crc32 checksum is just to check for possible filesystem corruption. The system does periodic data corruption checks. BTRFS uses crc32 checksums also. Please explain.

-Randall
Re: Pure Python Data Mangling or Encrypting
On Saturday 27 June 2015 08:27:38 Laura Creighton wrote:
> In a message of Sat, 27 Jun 2015 20:16:47 +1000, Chris Angelico writes:
>> Okay, Johannes, NOW you're proving that you don't have a clue what you're talking about. D-K effect doesn't go away...
>>
>> ChrisA
>
> You need to read the paper again. That was the whole point -- when Kruger and Dunning went and taught the people at the bottom quartile some basic skill in the task being estimated, and taught people at the top quartile how poorly their peers were performing, their ability to estimate how they would score relative to their peers improved a whole lot.
>
> But, of course, since these were academics studying students, they had access to bottom-quartile performers who actually wanted to learn and improve. In the real world, it is getting the bottom-performers to even notice that they need improvement that may be the most difficult task.
>
> Laura

The rest of the readers of this list would do well to change "may" above to "is", and carve the last sentence into something fairly substantial as it is a basic truth. Zircon crystal would be ideal, we've found a few grains of it over 4 billion years old, but granite would do for this generation.

Laura obviously gets it. Sadly, it is entirely too true in the real world. Too often the bottom person who made a good sales pitch, once hired, is either incapable of learning, or loses interest after he has been hired. I've seen both. The basic education they received is to blame for much of that effect. So they wind up getting shuffled around to various sub-jobs until you find something they can do efficiently. Many times they weren't ever aware of why they were being moved. Telling them depresses them, so it's usually best to just let it work itself out.

Cheers, Gene Heskett
--
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene
Re: Pure Python Data Mangling or Encrypting
On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:
> Not sure why you posted the link. The crc32 checksum is just to check for possible filesystem corruption. The system does periodic data corruption checks. BTRFS uses crc32 checksums also. Please explain.

The file system can trust that anything writing to a file is allowed to write to it, it doesn't have to defend against malicious writes. As I understand it, your application does. Here is the attack scenario I have in mind:

- you write a file to my computer, and neglect to encrypt it;
- and record the checksum for later;
- I insert malware into your file;
- you retrieve the file from me;
- if the checksum matches what you have on record, you accept the file;
- since you are using CRC, it is quite easy for me to ensure the checksums match after inserting malware;
- and I have now successfully injected malware into your computer.

I'm making an assumption here -- I assume that the sender records a checksum for uploaded files so that when they get something back again they can tell whether or not it is the same content they uploaded.

* * *

By the way, regarding the use of a substitution cipher, I spoke to the crypto guy at work, and "preimage attack" is not quite the right terminology, since that's normally used in the context of hash functions. It's almost a "known ciphertext attack", but not quite, since that terminology refers to guessing the key from the ciphertext.

I was wrong: cryptographically strong ciphers are generally NOT resistant to what I described as a preimage attack. If the key leaks, using AES won't save you: an attacker with access to the key can produce a ciphertext that decrypts to the malware of his choice, regardless of whether you use AES-256 or rot-13. There may be other encryption methods which don't suffer from that, but he doesn't know of them off the top of his head.

His comment was, "don't leak the key".

The other threat I mentioned is that the receiver will read the content of the file. For that, a strong cipher is much to be preferred over a weak one, and it needs to be encrypted by the sending end, not the receiving end. (If the receiving end does it, it has to keep the key so it can decrypt before sending back, which means the computer's owner can just grab the key and read the files.)

-- Steven
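The claim that a CRC is easy to match after tampering can be demonstrated directly. The sketch below is an illustration written for this thread, not code from it: it relies on CRC-32 (as implemented by `zlib.crc32`) being invertible, so four appended bytes can steer the checksum of arbitrary data to any chosen value. The function name `forge_suffix` is hypothetical.

```python
import zlib


def _make_table():
    """Standard reflected CRC-32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# Each table entry has a distinct top byte, which lets us step backwards.
REVERSE = {entry >> 24: index for index, entry in enumerate(TABLE)}


def forge_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    state = zlib.crc32(data) ^ 0xFFFFFFFF  # internal register after data
    want = target_crc ^ 0xFFFFFFFF         # register value we must end on
    # Run the CRC backwards over four zero bytes from the wanted state.
    for _ in range(4):
        index = REVERSE[want >> 24]
        want = (((want ^ TABLE[index]) << 8) | index) & 0xFFFFFFFF
    return (state ^ want).to_bytes(4, "little")
```

For example, `forge_suffix(b"malware", zlib.crc32(original))` yields four bytes that make the tampered file carry the original's CRC32. No comparably cheap trick is known for SHA-256, which is why the sender-held hash needs to be a cryptographic one.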
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27, Steven D'Aprano st...@pearwood.info wrote:
> Despite his initial claim that he doesn't want to use AES because it's too slow implemented as pure Python, Randall has said that the application will offer AES encryption as an option. (He says it is enabled by default, except that the user can turn it off.) So the code is already there, all he has to do is call it.

You're still not listening to what he's saying. Everything you have said in the above paragraph is false. He said he is using AES encryption in the client, but that the server does not have the processing power to do so (nor does it need to). He has not said that the user can turn it off, he's just acknowledging the fact that since the user controls their own computer, they can rewrite the client code to do whatever they want, and there's nothing he can do to stop them.

> The choice ought to be a no-brainer. The fact that folks are seriously considering using something barely one step up from a medieval substitution cipher in 2015 for something with real security consequences if it is broken goes to show what a lousy job the IT industry does for security.

The fact that you think that is happening when it isn't shows what a lousy job you have been doing of following the thread.
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27, Randall Smith rand...@tnr.cc wrote: Thank you. Nice points. I do think given the risks (there are always risks) discussed, a successful attack of this nature is not very likely. Worst case, something that looks like this would land on the disk. crc32 checksum + translation table + malware with a generated base64 name and no extension.

I'm not sure why you're bothering with the checksum, it doesn't seem to me that it buys you anything. Personally I'd do something like this (pseudocode):

    import random

    def obfuscate(data):
        # Build a random permutation of all 256 byte values.
        encode_key = list(range(256))
        random.shuffle(encode_key)
        encode_key = bytes(encode_key)
        # Invert the permutation so the stored table undoes the encoding.
        decode_key = bytes(encode_key.index(i) for i in range(256))
        # Store the decode table at both ends of the translated data.
        return decode_key + data.translate(encode_key) + decode_key

    def deobfuscate(data):
        # The first 256 bytes are the decode table; drop both copies.
        return data[256:-256].translate(data[:256])

The reason for appending the key as well as prepending it is that some anti-virus or malware scanners may well look at the last part of the file first, so putting something entirely locally-generated there may add a bit of safety. You could also simply pad with nulls or something of course, but again I can imagine some tools skipping backwards past nulls. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote: On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote: Give me one plausible scenario where an attacker can cause malware to hit the disk after bytearray.translate with a 256 byte translation table and I'll be thankful to you. The entire 256-byte translation table is significant ONLY if you need all 256 possible bytes. Suppose I want to generate the following byte sequence: \xCD\x19 (Okay, this is a slightly oversimplified example, as this attack doesn't work on a modern Windows. But back in the days of DOS, this program would reboot your computer.) Nice! When I suggested the possibility of a two byte value malicious payload, I thought it an extreme example of the hypothetical attack. I didn't expect that somebody might actually produce one. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, 27 Jun 2015 03:47 pm, Ian Kelly wrote: [...] Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. And what if somebody else writes a competing version of the client software that doesn't bother with the encryption step at all? The point was that while encryption is expected, it cannot be assumed by the receiver, and in fact if the data is actually malicious, then it likely is not even being sent by the client software in the first place. Right. As I said later in my post, you have a situation where neither party can trust the other. I'm trying to store data on your computer, and I can't trust you not to snoop on it, and you can't trust me not to send you malware. If the app does encrypt the data with AES before sending, then you don't gain any benefit by obfuscating an encrypted file with a classical monoalphabetic substitution cipher. Only if the recipient can *trust* the sender to have performed the encryption, which it can't, no matter how mandatory the OP tries to make it. True. But in either case, a classical (i.e. insecure) cipher doesn't do the job. Suppose that you hire an intern to write the choose key function, and not knowing any better, he simply iterates through the keys in numeric order, one after the other. So the first upload will use key 0, the second key 1, the third key 2, and so on, until key 256! - 1, then start again. In that case, predicting the next key is *trivial*. If I can work out what key you send now (I just upload a file containing \x00\x01\x02...\xFF to myself and see what I get), then I know what key the app will use next. If you upload a file to yourself, the result that you get will have no bearing on what key might be chosen when you upload a file to somebody else. I admit it: I was getting a bit confused between attacks on the sender side and attacks on the receiver side. 
The attacks I describe depend on the sender's application doing encryption, but given that a malicious uploader can just write their own client, that's redundant. There are easier attacks on sender-side encryption. A back-and-forth argument on Usenet is no substitute for a careful security analysis. Can the sender attack encryption on the client side? Well, Chris has already demonstrated one actual attack, based on a two-byte malicious payload. That proves that the concept is at least possible, even if nobody uses DOS any more. As you go on to say: If the recipient system is using the system random to generate the key, then you can hack the application all you want, and it will give you precisely zero information about the state of the entropy pool on the remote system. You're right. Are there other attacks where I, the sender, can get the receiver to leak information about the key? Would you like to bet the answer is always No? I wouldn't. Can you say timing attack? http://codahale.com/a-lesson-in-timing-attacks/ Can you [generic you] believe that attackers can *reliably* attack remote systems based on a 20µs timing difference? If you say No, then you fail Security 101 and should step away from the computer until a security expert can be called in to review your code. I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink If I say such-and-such an attack is impossible, feel free to scoff and laugh, because I'm probably wrong. But if I say I don't know what it is, but there's probably an attack you haven't thought of, unless you're a security guy yourself, you probably ought to listen. (And if you are a security guy, then you know how hard it is to secure against unknown attacks.) Tens of millions of zombie computers in botnets are proof that there are exploitable attacks that programmers didn't think of.
Or rather, *some* programmers didn't think of them. Some other guys did. I've said it before, and I will say it again: a classical substitution cipher is trivially vulnerable to a preimage attack, strong crypto ciphers are not. You're betting everything on the key being secret. If the key leaks, or is predictable, the attacker can successfully write malware on the receiver's system. If the keys leak with AES, the system is still secure against a preimage attack. Nobody will be able to guess the key, we don't need strong crypto. The Titanic is unsinkable, we don't need lifeboats for everyone. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
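The timing-attack concern raised in this message can be made concrete with a small sketch. It is not from the thread and the function names are hypothetical; it assumes Python 3 and contrasts a byte-by-byte comparison that returns at the first mismatch (so its running time depends on how much of a secret was guessed correctly) with `hmac.compare_digest` from the standard library, which is documented as designed to reduce that kind of leak.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the matching prefix."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison intended not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answers; the difference is only in what an attacker measuring response times might learn.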
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote: I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink Just out of interest, is _anybody_ active in this thread an expert on security? I certainly am not, which means that the proposal I'm currently putting together probably has a whole bunch of vulnerabilities that I haven't thought of. (Though there's no emphasis on encryption anywhere, just signing. I'm *hoping* that RSA public key verification is sufficient, but if it isn't, it would be possible for a malicious user to make a serious mess of stuff.) But I'm under no delusions. I don't say this is secure - all I'm saying is this works in proof-of-concept. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 8:05 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 27.06.2015 11:27, Jon Ribbens wrote: Johannes might have all the education in the world, but he's demonstrated quite comprehensively in this thread that he doesn't have a clue what he's talking about. Oh, how hurtful. I might even shed a tear or two, but it's pretty clear to me that you're just suffering under the Dunning-Kruger effect. No worries, champ, it's just a phase that'll go away eventually. Okay, Johannes, NOW you're proving that you don't have a clue what you're talking about. D-K effect doesn't go away... ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 10:53, Chris Angelico wrote: On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote: I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink Just out of interest, is _anybody_ active in this thread an expert on security? Yes. I've done a good 10 years of work in the field doing security (mostly applied cryptography on embedded systems with a focus on side channels like DPA, but also security concepts and threat/risk analysis) and spent the last 3-4 years working on my PhD in the field of IT security. My thesis is almost(tm) finished. I would claim to be an expert, yes. I certainly am not, which means that the proposal I'm currently putting together probably has a whole bunch of vulnerabilities that I haven't thought of. (Though there's no emphasis on encryption anywhere, just signing. I'm *hoping* that RSA public key verification is sufficient, but if it isn't, it would be possible for a malicious user to make a serious mess of stuff.) But I'm under no delusions. I don't say this is secure - all I'm saying is this works in proof-of-concept. I must admit that I haven't seen your ideas in this thread? Best regards, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27, Ian Kelly ian.g.ke...@gmail.com wrote: On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote: On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote: Give me one plausible scenario where an attacker can cause malware to hit the disk after bytearray.translate with a 256 byte translation table and I'll be thankful to you. The entire 256-byte translation table is significant ONLY if you need all 256 possible bytes. Suppose I want to generate the following byte sequence: \xCD\x19 (Okay, this is a slightly oversimplified example, as this attack doesn't work on a modern Windows. But back in the days of DOS, this program would reboot your computer.) Nice! When I suggested the possibility of a two byte value malicious payload, I thought it an extreme example of the hypothetical attack. I didn't expect that somebody might actually produce one. It's a good example of the interesting things that people can come up with (for example, binary executable files that in fact are comprised entirely of printable ASCII characters), but it isn't in any sense an attack on the system described in this thread. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27, Chris Angelico ros...@gmail.com wrote: On Sat, Jun 27, 2015 at 7:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 27.06.2015 10:53, Chris Angelico wrote: On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote: I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink Just out of interest, is _anybody_ active in this thread an expert on security? Yes. I've done a good 10 years of work in the field doing security (mostly applied cryptography on embedded systems with a focus on side channels like DPA, but also security concepts and threat/risk analysis) and spent the last 3-4 years working on my PhD in the field of IT security. My thesis is almost(tm) finished. I would claim to be an expert, yes. Good, so this isn't like that episode of Yes Minister when they were trying to figure out whether to allow a chemical factory to be built. Johannes might have all the education in the world, but he's demonstrated quite comprehensively in this thread that he doesn't have a clue what he's talking about. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote: Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. No, because another application could pretend to be the file-sending application, but send unencrypted data instead of encrypted data. Did you stop reading my post when you got to that? Because I went on to say: Actually, the more I think about this, the more I come to think that the only way this can be secure is for both the sending client application and the receiving client application to encrypt the data. The sender can't trust the receiver not to read the files, so the sender has to encrypt; the receiver can't trust the sender not to send malicious files, so the receiver has to encrypt too. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 3:59 PM, Ian Kelly ian.g.ke...@gmail.com wrote: On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote: On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote: Give me one plausible scenario where an attacker can cause malware to hit the disk after bytearray.translate with a 256 byte translation table and I'll be thankful to you. The entire 256-byte translation table is significant ONLY if you need all 256 possible bytes. Suppose I want to generate the following byte sequence: \xCD\x19 (Okay, this is a slightly oversimplified example, as this attack doesn't work on a modern Windows. But back in the days of DOS, this program would reboot your computer.) Nice! When I suggested the possibility of a two byte value malicious payload, I thought it an extreme example of the hypothetical attack. I didn't expect that somebody might actually produce one. I'm fairly sure this won't actually work on a modern system (I tried it and all that happened was that debug.exe terminated), but it's entirely possible there are other attacks. Or attacks that require only a small number of bytes - maybe create a gzip bomb that will expand to petabytes of data, that probably wouldn't need many unique byte values. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
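The point that only the byte values actually used in a payload matter can be put into numbers. The sketch below is an illustration that is not from the thread; it assumes Python 3.8+ for `math.perm`, a uniformly random and secret 256-entry table on the recipient, and a sender who gets a single blind attempt. Under those assumptions the sender must guess where the table maps each distinct byte of the desired output, so the odds depend on the number of distinct bytes rather than on the full 256! key space. The function name is hypothetical.

```python
import math


def blind_guess_odds(payload: bytes) -> int:
    """Return N such that one blind guess succeeds with probability 1/N.

    Only the distinct byte values in the payload have to be mapped
    correctly, giving 256 * 255 * ... choices for k distinct values.
    """
    distinct = len(set(payload))
    return math.perm(256, distinct)
```

For the two-byte `\xCD\x19` example this gives 1 in 65280, and for a payload that repeats a single byte it gives 1 in 256, compared with 1 in 256! for a payload that uses every byte value.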
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27 04:38, Steven D'Aprano wrote: Maybe you use Python's standard library and the Mersenne Twister. The period of that is huge, possibly bigger than 256! (or not, I forget, and I'm too lazy to look it up). So you think that's safe. But it's not: Mersenne Twister is not a cryptographically secure pseudorandom number generator. If I can get some small number of values from the Twister (by memory, something of the order of 100 such values) then I can predict the rest for ever. 634. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27 08:58, Robert Kern wrote: On 2015-06-27 04:38, Steven D'Aprano wrote: Maybe you use Python's standard library and the Mersenne Twister. The period of that is huge, possibly bigger than 256! (or not, I forget, and I'm too lazy to look it up). So you think that's safe. But it's not: Mersenne Twister is not a cryptographically secure pseudorandom number generator. If I can get some small number of values from the Twister (by memory, something of the order of 100 such values) then I can predict the rest for ever. 634. Bah! 624. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- https://mail.python.org/mailman/listinfo/python-list
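The figure of 624 corresponds to the size of the Mersenne Twister state in 32-bit words. As a hedged illustration that is not from the thread, the sketch below assumes CPython's `random` module, where `getrandbits(32)` returns one tempered state word per call and `setstate` accepts a version-3 state tuple. It inverts the output tempering and builds a second generator that reproduces the first one's later outputs. The helper names are hypothetical.

```python
import random


def _undo_right(y: int, shift: int) -> int:
    """Invert y ^= y >> shift for a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_left(y: int, shift: int, mask: int) -> int:
    """Invert y ^= (y << shift) & mask for a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y: int) -> int:
    """Recover the raw state word from one 32-bit output."""
    y = _undo_right(y, 18)
    y = _undo_left(y, 15, 0xEFC60000)
    y = _undo_left(y, 7, 0x9D2C5680)
    y = _undo_right(y, 11)
    return y


def clone_from_outputs(outputs):
    """Build a generator that continues after 624 observed 32-bit outputs."""
    if len(outputs) != 624:
        raise ValueError("exactly 624 consecutive 32-bit outputs are needed")
    words = tuple(untemper(v) for v in outputs)
    clone = random.Random()
    # Index 624 tells the generator to regenerate its state on the next call.
    clone.setstate((3, words + (624,), None))
    return clone
```

For key material, `random.SystemRandom` or the `secrets` module draw from the operating system's entropy source instead and do not have this property.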
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 7:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 27.06.2015 10:53, Chris Angelico wrote: On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote: I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink Just out of interest, is _anybody_ active in this thread an expert on security? Yes. I've done a good 10 years of work in the field doing security (mostly applied cryptography on embedded systems with a focus on side channels like DPA, but also security concepts and threat/risk analysis) and spent the last 3-4 years working on my PhD in the field of IT security. My thesis is almost(tm) finished. I would claim to be an expert, yes. Good, so this isn't like that episode of Yes Minister when they were trying to figure out whether to allow a chemical factory to be built. I certainly am not, which means that the proposal I'm currently putting together probably has a whole bunch of vulnerabilities that I haven't thought of. (Though there's no emphasis on encryption anywhere, just signing. I'm *hoping* that RSA public key verification is sufficient, but if it isn't, it would be possible for a malicious user to make a serious mess of stuff.) But I'm under no delusions. I don't say this is secure - all I'm saying is this works in proof-of-concept. I must admit that I haven't seen your ideas in this thread? No, the proposal I'm putting together is unrelated. You'll see the *vast* extent of my security skills here: https://github.com/Rosuav/ThirdSquare My contribution to this thread has been fairly minor, just suggesting one attack that doesn't even work any more, not much else. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Randall Smith wrote: Chunks of data (about 2MB) are to be stored on machines using a peer-to-peer protocol. The recipient of these chunks can't assume that the payload is benign. While the data senders are supposed to encrypt data, that's not guaranteed, and I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk. My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software. Pure Python AES and even DES are just way too slow. I don't know that I really need encryption here, but some type of fast mangling algorithm where a bad actor sending a payload can't guess the output ahead of time. Any ideas are appreciated. Thanks.

Would it be sufficient to prepend the chunk with one block, say, of random data? To unmangle you'd just strip off that block.

    import os
    import shutil

    BLOCKSIZE = 16  # assumed value; the original post leaves BLOCKSIZE undefined
    BLOCK = os.urandom(BLOCKSIZE)

    def mangle(source, dest):
        # Write the random block first, then the payload unchanged.
        dest.write(BLOCK)
        shutil.copyfileobj(source, dest)

    def unmangle(source, dest):
        # Skip the random block and copy the rest.
        source.read(BLOCKSIZE)
        shutil.copyfileobj(source, dest)

Disclaimer: I did not follow the ongoing discussion. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 10:38, Steven D'Aprano wrote: Can you say timing attack? http://codahale.com/a-lesson-in-timing-attacks/ Can you [generic you] believe that attackers can *reliably* attack remote systems based on a 20µs timing difference? If you say No, then you fail Security 101 and should step away from the computer until a security expert can be called in to review your code. Yes, as people do more and more proper crypto (in contrast to crappy stuff like LFSR-based custom keystream generators and such), side channels become of great importance. I'm not a security expert. I'm not even a talented amateur. *Every time* I suggest that X is secure, the security guy at work shoots me down in flames. But nicely, because I pay his wages wink :-) Being shot down in flames is the way to become a security expert, probably the *only* way. I don't know anyone who is an expert who hasn't had that horrible experience at least a dozen times. It is amazing how many holes you can poke in designs if you look at it from enough angles. Having holes poked in my designs gives you a thorough appreciation for the true crypto experts (i.e. people doing theoretical cryptography). Best regards, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 11:27, Jon Ribbens wrote: Johannes might have all the education in the world, but he's demonstrated quite comprehensively in this thread that he doesn't have a clue what he's talking about. Oh, how hurtful. I might even shed a tear or two, but it's pretty clear to me that you're just suffering under the Dunning-Kruger effect. No worries, champ, it's just a phase that'll go away eventually. Hugs and kisses, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 11:17, Chris Angelico wrote: Good, so this isn't like that episode of Yes Minister when they were trying to figure out whether to allow a chemical factory to be built. I must admit that I have no clue about that show or that episode in particular and needed to read up on it: https://en.wikipedia.org/wiki/The_Greasy_Pole I must admit that I haven't seen your ideas in this thread? No, the proposal I'm putting together is unrelated. You'll see the *vast* extent of my security skills here: https://github.com/Rosuav/ThirdSquare My contribution to this thread has been fairly minor, just suggesting one attack that doesn't even work any more, not much else. Well, if people already have a solution ready there's a good chance that any criticism falls on deaf ears. In any case something that others have to be responsible for, their party, their choice. I've looked at your code even though I don't know pike. That's the typesafe JavaScript derivative, isn't it? The only thing that I found horrible was the ssh key format to PKCS parsing. Man that's hacky :-) You're creating a DER structure on-the-fly that you fill with the key and that you then have parsed back. I've only seen ssh-keygen used to generate keys (not to initiate actual ssh connections), why don't you use openssl to generate the keys? I think you can generate a RSA keypair in openssl (also valid for ssh should you need it) and I'm pretty sure that you can generate a ssh public key with ssh-keygen from that private keypair file. That would eliminate the need to do this kind of parsing, but it's just a PoC as I understand it. It appears to be online-only, is that correct? Is Internet coverage so good down under? I wish this were the case in Germany :-/ Not 100% sure about it, but I think that the bus concepts that are active in Germany (locally in some cities) either use asymmetric transponders (i.e.
SmartMX), which gives a beautiful, decentralized, secure and offline solution at the cost of being comparatively expensive. The others use symmetric transponders which have limited off-line functionality: i.e. monotonic counters which are reset in a cryptographically secured way by backend systems every time an online connection exists and which are counted down in the offline case. In any case, interesting. Thanks for sharing. Best regards, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Laura Creighton writes: Johannes, if you don't know Yes, Minister then you most likely do not know the Politician's Syllogism (which now has its own wikipedia page :) And I _didn't_ do it! Honest!) Something must be done. This is something. Therefore we must do it! Surely that's to be worded as follows? To have a stricter syllogistic form. We need to do something. This is something. Therefore we need to do this. Or, We must do something. This is something. Therefore we must do this. Or, but this feels weaker to me, Something must be done. This is something. Therefore this must be done. ISWIM. In particular, I think the move from the agentless passive (must be done) to the specific expression of agency (we must do) seems to me to be a *different* joke. (I was tempted to call it passive temperature but that would have been yet another, unrelated joke.) :) -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-26, Randall Smith rand...@tnr.cc wrote: The only person who can read a file is the owner. That's always the plan, but many a successful exploit has been based on breaking that assumption. If privacy actually matters, that's not a good assumption to rely on as a single point of failure. -- Grant -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 12:16, Chris Angelico wrote: Okay, Johannes, NOW you're proving that you don't have a clue what you're talking about. D-K effect doesn't go away... :-D It does in some people. I've seen it happen, with knowledge comes humility. Not saying Jon is a lost cause just yet. He's just in intellectual puberty right now. I'm giving him a few years to re-judge. Cheers, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Johannes, if you don't know Yes, Minister then you most likely do not know the Politician's Syllogism (which now has its own wikipedia page :) And I _didn't_ do it! Honest!) Something must be done. This is something. Therefore we must do it! :) Unfortunately, the Politician's Syllogism is not restricted to television comedies. It's alive and well in Brussels, and mixes very nicely with the Dunning-Kruger effect. Laura -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 8:18 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 27.06.2015 11:17, Chris Angelico wrote: Good, so this isn't like that episode of Yes Minister when they were trying to figure out whether to allow a chemical factory to be built. I must admit that I have no clue about that show or that episode in particular and needed to read up on it: https://en.wikipedia.org/wiki/The_Greasy_Pole I must admit that I haven't seen your ideas in this thread? No, the proposal I'm putting together is unrelated. You'll see the *vast* extent of my security skills here: https://github.com/Rosuav/ThirdSquare My contribution to this thread has been fairly minor, just suggesting one attack that doesn't even work any more, not much else. Well, if people already have a solution ready there's a good chance that any criticism falls on deaf ears. In any case something that others have to be responsible for, their party, their choice. I've looked at your code even though I don't know pike. That's the typesafe JavaScript derivative, isn't it? Not really; it's more like Python semantics meets C++ syntax. But that's still off-topic for this list; I'd be happy to continue discussion off-list with anyone who's interested. The most interesting part of the project is the README, to be honest. Even if you can't understand a single line of the code, you'll be able to see the specs. Grokking the code is a bonus. The only thing that I found horrible was the ssh key format to PKCS parsing. Man that's hacky :-) You're creating a DER structure on-the-fly that you fill with the key and that you then have parsed back. I've only seen ssh-keygen used to generate keys (not to initiate actual ssh connections), why don't you use openssl to generate the keys? I think you can generate a RSA keypair in openssl (also valid for ssh should you need it) and I'm pretty sure that you can generate a ssh public key with ssh-keygen from that private keypair file.
That would eliminate the need to do this kind of parsing, but it's just a PoC as I understand it. Yeah, it's pretty disgusting. I could actually use Pike to generate the keys, rather than using ssh-keygen at all, but I wanted to demonstrate that this is using a well-known key generation method, ergo I don't need to separately prove that the keys are appropriately random. (Not that I distrust Pike, but it's one less thing to try to prove.) It appears to be online-only, is that correct? Is Internet coverage so good down under? I wish this were the case in Germany :-/ Correct, that's one of the key changes. Our current system (Myki) is a stored-value card - if you recharge a hundred bucks, then clone your card, you'd have two cards with a hundred bucks each. With ThirdSquare, if you clone your card, you have two cards that draw on the same hundred bucks. (Though I still don't want people cloning cards, as it would confuse the system some. Plus, it'd be really REALLY bad if someone could clone someone *else's* card, thus effectively stealing a copy of it. So the cards themselves need some kind of security, but short of public key crypto performed actually on the cards, I'm not sure how to do that.) My plan is to stick a 3G/4G device onto each bus. So long as there's mobile phone coverage on all routes, which should be fine in suburbia, the system will work. It can cope with short dropouts (up to ten minutes), queueing requests in the client. Not 100% sure about it, but I think that the bus concepts that are active in Germany (locally in some cities) either use asymmetric transponders (i.e. SmartMX), which gives a beautiful, decentralized, secure and offline solution at the cost of being comparatively expensive. The others use symmetric transponders which have limited off-line functionality: i.e. monotonic counters which are reset in a cryptographically secured way by backend systems every time an online connection exists and which are counted down in the offline case.
Thanks for the name, I'll check that out. Ideally, I'd like to use off-the-shelf hardware for everything, and open-source software. It should be possible for anyone to pick up the specs, buy their own hardware, and create something that interoperates with the rest of the system - for instance, the Red Engine Group could allow their customers to ding their tickets to buy coffee - simply by providing the appropriate public keys to the central authorizing database. That would be a massive improvement over Melbourne's previous ticketing system (Metcard), which was entirely proprietary; expansion of the fleet required additional validators, and basically the public transport operators had to beg, cap in hand, for the company to do them a favour - for which, of course, they then also had to pay the earth. But I digress. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
In a message of Sat, 27 Jun 2015 15:23:07 +0300, Jussi Piitulainen writes: Laura Creighton writes: Johannes, if you don't know Yes, Minister then you most likely do not know the Politician's Syllogism (which now has its own Wikipedia page :) And I _didn't_ do it! Honest!) Something must be done. This is something. Therefore we must do it! Surely that's to be worded as follows? To have a stricter syllogistic form. We need to do something. This is something. Therefore we need to do this. Somehow doesn't have the same comedic ring, though. So the version I posted is what was on the TV show. (Or rather in Yes, Prime Minister.) The Minister becomes Prime Minister for the last 2 seasons. In the television show it is just called 'Politician's Logic'. Not sure who started calling it the Politician's Syllogism. Laura
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 5:33 AM, Chris Angelico ros...@gmail.com wrote: On Sat, Jun 27, 2015 at 8:18 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: I've looked at your code even though I don't know pike. That's the typesafe JavaScript derivative, isn't it? Not really; it's more like Python semantics meets C++ syntax. But that's still off-topic for this list; I'd be happy to continue discussion off-list with anyone who's interested. That description could apply equally well to Javascript, though. ;-) -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-27, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 27.06.2015 11:27, Jon Ribbens wrote: Johannes might have all the education in the world, but he's demonstrated quite comprehensively in this thread that he doesn't have a clue what he's talking about. Oh, how hurtful. I might even shed a tear or two, but it's pretty clear to me that you're just suffering under the Dunning-Kruger effect. No worries, champ, it's just a phase that'll go away eventually. I guess we need to add the Dunning-Kruger effect to that ever-growing list of things that you don't understand then... -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
In a message of Sat, 27 Jun 2015 20:16:47 +1000, Chris Angelico writes: Okay, Johannes, NOW you're proving that you don't have a clue what you're talking about. D-K effect doesn't go away... ChrisA You need to read the paper again. That was the whole point -- when Kruger and Dunning went and taught the people at the bottom quartile some basic skill in the task being estimated, and taught people at the top quartile how poorly their peers were performing, their ability to estimate how they would score relative to their peers improved a whole lot. But, of course, since these were academics studying students, they had access to bottom-quartile performers who actually wanted to learn and improve. In the real world, it is getting the bottom-performers to even notice that they need improvement that may be the most difficult task. Laura
Re: Pure Python Data Mangling or Encrypting
On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote: On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info wrote: Can you [generic you] believe that attackers can *reliably* attack remote systems based on a 20µs timing differences? If you say No, then you fail Security 101 and should step away from the computer until a security expert can be called in to review your code. Of course. I wouldn't bet the house on it, but with the proposed substitution cipher system, I don't see why there would be any measurable timing differences at all based on the choice of key. I wouldn't bet one wooden nickle on it. Not without a security audit of the application. And then what happens when the implementation changes and the audit is no longer valid? Despite his initial claim that he doesn't want to use AES because it's too slow implemented as pure Python, Randall has said that the application will offer AES encryption as an option. (He says it is enabled by default, except that the user can turn it off.) So the code is already there, all he has to do is call it. It might not be a timing attack. Maybe there's a vulnerability in the application that if you upload a sufficiently large file, a buffer will overflow and you can force the key of your choosing. Who knows? Bugs happen. The nature of how the hypothetical key leakage happens is less important than the consequences if there is one. Randall can: (1) bet the security of his application and his users on the key never leaking; Why have you situated a naked flame right next to the gas tank? It's okay, I'm confident that the tank will never leak. or (2) use something which, *even if the key leaks*, is still resistant to preimage attacks. The choice ought to be a no-brainer. The fact that folks are seriously considering using something barely one step up from a medieval substitution cipher in 2015 for something with real security consequences if it is broken goes to show what a lousy job the IT industry does for security. 
> The time to obfuscate a single byte is constant,

Are you sure about that? Bet your house? How about your computer?

# Python 3.3 on Linux, YMMV
py> text = 'NOBODY expects the Spanish Inquisition!'*5
py> import string
py> s = string.digits + string.ascii_letters
py> t = (string.ascii_uppercase + string.digits[::-1] +
...      string.ascii_lowercase)
py> trans1 = str.maketrans('abcdef', 'fedcba')
py> trans2 = str.maketrans(s, t)
py> trans3 = str.maketrans('aZ', 'Za')
py> with Stopwatch():
...     x = str.translate(text, trans1)
...
time taken: 0.427513 seconds
py> with Stopwatch():
...     x = str.translate(text, trans2)
...
time taken: 0.228869 seconds
py> with Stopwatch():
...     x = str.translate(text, trans3)
...
time taken: 0.387105 seconds

> so the total time to obfuscate the payload should just be a function
> of the length of the data.

Good thing you didn't bet your house.

-- 
Steven
Re: Pure Python Data Mangling or Encrypting
Michael Torrie torr...@gmail.com writes: Furthermore you cannot prove a negative, which is what proving security is for anything but the trivial case. Are you saying this is untrue? I've always thought that there are no two even numbers that when you add them together, give you an odd number. Are you saying that statement can't be proven? But how does one prove a system is secure except by enumerating attack vectors In the case of encryption, you do a reduction proof to a recognized primitive like AES. That is, you show that if your system is breakable, you can transform the break into a break against AES itself. That's the best you can do at the moment, because the open status of the P!=NP problem means that no one knows how to prove that any primitive (such as AES) is secure. The reduction proof means that the evidence for AES's security also applies to your system. Of course that's just for the cipher itself. For the entire surrounding software/hardware/process system which is mostly not mathematical, you're right, there's no way to (mathematically) prove security or even to define it. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sun, 28 Jun 2015 03:35 am, Steven D'Aprano wrote: On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote: The time to obfuscate a single byte is constant, Are you sure about that? Bet your house? How about your computer? Correction: the example I showed uses str, not bytes. With bytes, the timing differences are much smaller. Are they statistically distinguishable? Don't know. On my machine, they appear to be, although that could be just a fluke. Is there a guarantee that bytes.translate will always be constant time per byte? No of course not. Might the application itself some day start using str.translate? Who knows? The point is, you cannot rely on this. Preventing leakage is *hard*. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
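For anyone who wants to try the bytes case themselves, here is a minimal sketch of such a measurement. It assumes full 256-byte permutation tables and uses timeit rather than the Stopwatch helper from the earlier post; make_table, best_time and the payload size are made up for illustration. A seeded random.Random is used only so that runs are repeatable, it is not a suitable way to pick real keys. Similar timings here would not prove constant-time behaviour, they would only fail to show a difference on one machine.

```python
import random
import timeit


def make_table(seed):
    """Build a 256-byte permutation table (a hypothetical 'key').

    Seeded only for repeatability; not suitable for choosing real keys.
    """
    values = list(range(256))
    random.Random(seed).shuffle(values)
    return bytes(values)


# Payload size is arbitrary; larger payloads give steadier numbers.
payload = b"NOBODY expects the Spanish Inquisition!" * 50000


def best_time(table, repeat=5, number=5):
    """Return the smallest of several timings to reduce scheduler noise."""
    return min(timeit.repeat(lambda: payload.translate(table),
                             repeat=repeat, number=number))


timings = {seed: best_time(make_table(seed)) for seed in range(4)}
for seed, seconds in timings.items():
    print(f"table {seed}: {seconds:.6f} s")
```

Taking the minimum over several repeats follows the usual timeit advice; distinguishing a real key-dependent difference from noise would need many more samples and a proper statistical test.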
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info wrote: Can you [generic you] believe that attackers can *reliably* attack remote systems based on a 20µs timing differences? If you say No, then you fail Security 101 and should step away from the computer until a security expert can be called in to review your code. Of course. I wouldn't bet the house on it, but with the proposed substitution cipher system, I don't see why there would be any measurable timing differences at all based on the choice of key. The time to obfuscate a single byte is constant, so the total time to obfuscate the payload should just be a function of the length of the data. Secondly, the 200 (or whatever) response to the client does not depend on the outcome of the obfuscation step, so there is no reason that the server cannot simply respond first and obfuscate after, giving the client nothing to time. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 08:21 PM, Chris Angelico wrote: On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote: Give me one plausible scenario where an attacker can cause malware to hit the disk after bytearray.translate with a 256 byte translation table and I'll be thankful to you. The entire 256-byte translation table is significant ONLY if you need all 256 possible bytes. Suppose I want to generate the following byte sequence: \xCD\x19 (Okay, this is a slightly oversimplified example, as this attack doesn't work on a modern Windows. But back in the days of DOS, this program would reboot your computer.) How many truly different translation tables are there if I'm trying to produce this? Just 256*255, or 65280. If I send random two-byte files, there is one chance in that of my malware successfully landing. Once I've sent about 45,000 of those files, I have a fifty-fifty chance of having hit it. Send twice as many, I have a 75% chance of success, etc. Yes, that's true. It's even an issue with AES, which uses padding to overcome. That said, remember these are bytes going straight to disk. I'm really not concerned about 2 or 3 byte malware and the probability plunges after that. And remember, normal use case is AES encrypted data. Quite interesting. Though I didn't mention it in the description, the storage server is appending a CRC32 checksum for routine integrity checks. So by the time the data hits the disk, it will have added both a 256 byte translation table and a 4 byte checksum. I think that would interfere with any extremely short malware. Malware can be crafted to fit within certain restrictions. I saw a proof-of-concept and analysis document detailing a particular remote code execution/privilege escalation attack that involved stuffing text into an entry field and then inducing the program to read that into its stack, finally triggering it by some sort of buffer overflow, I think. 
The text had to be no more than X bytes long (because that's all the entry field was set to accept - it'd truncate after that), and had to not contain any NUL bytes, and there might have been other restrictions too. Sure, it makes it harder to write your malware... but imagine if you can write something in just a handful of different bytes, which then goes and triggers something else. You could have an extremely plausible attack that might need only a day's uploading to deliver. It makes no difference that there are 256! possible encryption keys, if most of them have the same result. ChrisA Thank you. Nice points. I do think given the risks (there are always risks) discussed, a successful attack of this nature is not very likely. Worst case, something that looks like this would land on the disk. crc32 checksum + translation table + malware with a generated base64 name and no extension. Doesn't seem like much of a threat. Much less likely than a bug in the standard Crypto libraries. -Randall
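Chris's arithmetic for the two-byte case can be checked in a few lines. This is a sketch under one assumption: every upload is mangled with an independently and uniformly chosen table, so each attempt lands the wanted two bytes with probability 1/(256*255). The name chance_of_hit is made up for illustration.

```python
# Number of distinct ways a table can map two different input bytes.
tables = 256 * 255


def chance_of_hit(uploads, outcomes=tables):
    """Probability that at least one of `uploads` attempts succeeds."""
    return 1 - (1 - 1 / outcomes) ** uploads


for uploads in (45_000, 90_000):
    print(uploads, round(chance_of_hit(uploads), 3))
```

The figures come out close to the fifty-fifty and 75% quoted above. The same function shows how quickly the odds fall for longer payloads once `outcomes` is replaced by the larger count of relevant mappings.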
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 03:11 PM, Johannes Bauer wrote: You misunderstand. This is not how it works, this is not how any of this works. Steven does not *at all* have to prove to you your system is breakable or show actual attacks. YOU have to prove that your system is secure. Ahh the holy grail of computer science. Now it's been a while since I finished my CS degree, but I recall spending a lot of time in class talking about proving code correctness, which is a similar problem, and learning that that was thought to be NP complete. Furthermore you cannot prove a negative, which is what proving security is for anything but the trivial case. Are you saying this is untrue? Obviously there are best practices, which you are an expert in. But how does one prove a system is secure except by enumerating attack vectors and addressing each one, preferably in the design phase?
Re: Pure Python Data Mangling or Encrypting
On 06/27/2015 03:29 AM, Peter Otten wrote: Would it be sufficient to prepend the chunk with one block, say, of random data? To unmangle you'd just strip off that block. BLOCK = os.urandom(BLOCKSIZE) def mangle(source, dest): dest.write(BLOCK) shutil.copyfileobj(source, dest) def unmangle(source, dest): source.read(BLOCKSIZE) shutil.copyfileobj(source, dest) Disclaimer: I did not follow the ongoing discussion. That is happening as a side effect. Though not completely random, after running the data through a translation table, the 256 byte table is prepended. Then a 4 byte checksum is calculated and prepended. -Randall -- https://mail.python.org/mailman/listinfo/python-list
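The layout Randall describes (checksum, then table, then translated data) could look roughly like the sketch below. Several details are assumptions and not taken from the actual application: the checksum is taken over the table plus the translated data, it is stored big-endian, the table is shuffled with random.SystemRandom, and the function names simply mirror Peter's mangle/unmangle.

```python
import random
import struct
import zlib

_rng = random.SystemRandom()  # backed by os.urandom


def mangle(data: bytes) -> bytes:
    """Return crc32 (4 bytes) + table (256 bytes) + translated data."""
    values = list(range(256))
    _rng.shuffle(values)
    table = bytes(values)
    body = table + data.translate(table)
    # Assumption: the checksum covers the table and the translated data.
    return struct.pack(">I", zlib.crc32(body)) + body


def unmangle(blob: bytes) -> bytes:
    """Verify the checksum, rebuild the inverse table, undo the translation."""
    (checksum,) = struct.unpack(">I", blob[:4])
    body = blob[4:]
    if zlib.crc32(body) != checksum:
        raise ValueError("checksum mismatch")
    table, mangled = body[:256], body[256:]
    inverse = bytearray(256)
    for plain, coded in enumerate(table):
        inverse[coded] = plain
    return mangled.translate(bytes(inverse))
```

Note that the table travels with the data, so this only changes what the bytes on disk look like; it does nothing for confidentiality.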
Re: Pure Python Data Mangling or Encrypting
On 06/27/2015 07:38 AM, Grant Edwards wrote: On 2015-06-26, Randall Smith rand...@tnr.cc wrote: The only person who can read a file is the owner. That's always the plan, but many a successful exploit has been based on breaking that assumption. If privacy actually matters, that's not a good assumption to rely on as a single point of failure. -- Grant The owner (client software) encrypts the data using AES. This is the default behavior of the client software. If the client chooses to disable encryption, that's their issue for sure. I'm trying to make sure it doesn't become the storage server's issue too. -Randall -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Jun 27, 2015 11:51 AM, Paul Rubin no.email@nospam.invalid wrote: Michael Torrie torr...@gmail.com writes: Furthermore you cannot prove a negative, which is what proving security is for anything but the trivial case. Are you saying this is untrue? I've always thought that there are no two even numbers that when you add them together, give you an odd number. Are you saying that statement can't be proven? But how does one prove a system is secure except by enumerating attack vectors In the case of encryption, you do a reduction proof to a recognized primitive like AES. That is, you show that if your system is breakable, you can transform the break into a break against AES itself. That's the best you can do at the moment, because the open status of the P!=NP problem means that no one knows how to prove that any primitive (such as AES) is secure. The reduction proof means that the evidence for AES's security also applies to your system. Of course that's just for the cipher itself. For the entire surrounding software/hardware/process system which is mostly not mathematical, you're right, there's no way to (mathematically) prove security or even to define it. Ahh okay. So what he's referring to must be such reductions and proofs of these provable aspects, though he spoke very broadly. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sun, 28 Jun 2015 03:08 am, Randall Smith wrote: Though I didn't mention it in the description, the storage server is appending a CRC32 checksum for routine integrity checks. So by the time the data hits the disk, it will have added both a 256 byte translation table and a 4 byte checksum. http://stackoverflow.com/questions/1515914/crc32-collision -- Steven -- https://mail.python.org/mailman/listinfo/python-list
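To illustrate why that link matters: CRC32 is an error-detecting code with only 32 bits of output, not a cryptographic hash, so two different inputs with the same checksum turn up after a modest number of random tries. The sketch below is a plain birthday search, not a targeted attack; the function name, the 16-byte message size and the fixed seed are arbitrary choices made for repeatability.

```python
import random
import zlib


def find_crc32_collision(seed=0):
    """Draw random 16-byte messages until two different ones share a CRC32."""
    rng = random.Random(seed)
    seen = {}
    attempts = 0
    while True:
        attempts += 1
        message = rng.getrandbits(128).to_bytes(16, "big")
        crc = zlib.crc32(message)
        other = seen.get(crc)
        if other is not None and other != message:
            return other, message, attempts
        seen[crc] = message


first, second, attempts = find_crc32_collision()
print(first.hex(), second.hex(), attempts)
```

By the birthday bound this typically needs on the order of tens of thousands of messages, so the checksum is fine for spotting accidental corruption but should not be counted as a barrier against a deliberate sender.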
Re: Pure Python Data Mangling or Encrypting
On Sun, 28 Jun 2015 04:22 am, Randall Smith wrote: The owner (client software) encrypts the data using AES. This is the default behavior of the client software. If the client chooses to disable encryption, that's their issue for sure. I cannot imagine what you think you gain from allowing that to be optional. Apart from privacy and security breaches. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sun, 28 Jun 2015 06:30 am, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info wrote: On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote: Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. No, because another application could pretend to be the file-sending application, but send unencrypted data instead of encrypted data. Did you stop reading my post when you got to that? Because I went on to say: At that point I quit in frustration, yeah. Actually, the more I think about this, the more I come to think that the only way this can be secure is for both the sending client application and the receiving client application to both encrypt the data. The sender can't trust the receiver not to read the files, so the sender has to encrypt; the receiver can't trust the sender not to send malicious files, so the receiver has to encrypt too. When you realize you've said something completely wrong, you should edit your email. If both the sender and receiver encrypt the data, how is it completely wrong to say that encrypting data should be mandatory? -- Steven
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info wrote: On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote: Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. No, because another application could pretend to be the file-sending application, but send unencrypted data instead of encrypted data. Did you stop reading my post when you got to that? Because I went on to say: At that point I quit in frustration, yeah. Actually, the more I think about this, the more I come to think that the only way this can be secure is for both the sending client application and the receiving client application to both encrypt the data. The sender can't trust the receiver not to read the files, so the sender has to encrypt; the receiver can't trust the sender not to send malicious files, so the receiver has to encrypt too. When you realize you've said something completely wrong, you should edit your email. -- Devin
Re: Pure Python Data Mangling or Encrypting
On Sun, Jun 28, 2015 at 4:51 AM, Steven D'Aprano st...@pearwood.info wrote: On Sun, 28 Jun 2015 04:22 am, Randall Smith wrote: The owner (client software) encrypts the data using AES. This is the default behavior of the client software. If the client chooses to disable encryption, that's their issue for sure. I cannot imagine what you think you gain from allowing that to be optional. Apart from privacy and security breaches. I've no idea whether this is the case or not, but one thing you might gain is independence from a third-party module. You could, for instance, automatically AES-encrypt your data, but only if from Crypto.Cipher import AES didn't raise ImportError. That effectively makes encryption optional (the program won't barf for lack of pycrypto installation), while still clearly being the default - and if you have a nice loud warning, then it's clear that encryption is the normal state, and the fallback is a lesser state. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
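The import-time fallback Chris describes might be sketched like this. Whether the real client does anything of the sort is unknown; ENCRYPTION_AVAILABLE and the warning text are made up for illustration, and the only library-specific line is the import quoted in the post.

```python
import warnings

try:
    # The import from the post; provided by the third-party pycrypto package.
    from Crypto.Cipher import AES
except ImportError:
    AES = None
    # Be loud: running unencrypted is the lesser state, not the norm.
    warnings.warn(
        "pycrypto is not installed; uploads will NOT be AES-encrypted",
        RuntimeWarning,
    )

# Hypothetical flag the rest of a client could consult before uploading.
ENCRYPTION_AVAILABLE = AES is not None
print("AES available:", ENCRYPTION_AVAILABLE)
```

This keeps encryption as the default whenever the module is present and makes the degraded mode hard to miss, but it does nothing about the storage node's problem: a sender can still skip encryption on purpose.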
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 6:18 PM, Steven D'Aprano st...@pearwood.info wrote: On Sun, 28 Jun 2015 06:30 am, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info wrote: On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote: On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote: Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. No, because another application could pretend to be the file-sending application, but send unencrypted data instead of encrypted data. Did you stop reading my post when you got to that? Because I went on to say: At that point I quit in frustration, yeah. Actually, the more I think about this, the more I come to think that the only way this can be secure is for both the sending client application and the receiving client application to both encrypt the data. The sender can't trust the receiver not to read the files, so the sender has to encrypt; the receiver can't trust the sender not to send malicious files, so the receiver has to encrypt too. When you realize you've said something completely wrong, you should edit your email. If both the sender and receiver encrypt the data, how is it completely wrong to say that encrypting data should be mandatory? That isn't what I was calling completely wrong. This is: Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. The user can still upload unencrypted malicious data by writing their own client that doesn't have mandatory AES encryption. You realized this later in the email, apparently, which is why you should have edited your own email to delete your original, insecure, suggestion. 
:( That said, I appreciate the work you've done here asking for a specific threat model and pushing back on the idea that it's up to python-list to prove something is insecure, not the other way around. That's important. I think, for the same reasons, it's also important to be really careful what cryptosystems we discuss, and not suggest or appear to suggest ones that won't work. P.S. FWIW, the base64 idea has a lot of promise and is probably fundamentally better than a crypto algorithm. With something along the lines of base64 -- say, encoding a file using just the letters 'a' and 'b' -- one might try to make it literally impossible to write bad things to disk, whereas with any crypto, it is always possible to obtain the key, so one has to be careful with key management to prevent/mitigate that. (One might add: why not both? Beats me. I like using extension modules.) P.P.S.: of course, I'm not an expert. -- Devin
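The 'a' and 'b' idea from the P.S. is easy to sketch: write each byte as eight bits and spell the bits with two letters, so the stored file can only ever contain those two byte values. The function names are made up, and whether a two-letter file is really harmless in every context is the post's claim, not something this code demonstrates.

```python
_TO_AB = str.maketrans("01", "ab")
_FROM_AB = str.maketrans("ab", "01")


def encode_ab(data: bytes) -> bytes:
    """Spell every bit of `data` as 'a' (0) or 'b' (1); output is 8x longer."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits.translate(_TO_AB).encode("ascii")


def decode_ab(text: bytes) -> bytes:
    """Reverse encode_ab, rejecting anything that is not whole a/b octets."""
    if len(text) % 8 or set(text) - set(b"ab"):
        raise ValueError("not a valid a/b encoding")
    bits = text.decode("ascii").translate(_FROM_AB)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

The obvious cost is the eightfold size increase; base64 gives a weaker guarantee (64 characters instead of two) for about a third more space. Unlike the translation table, there is no key to leak or guess.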
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 11:35 AM, Steven D'Aprano st...@pearwood.info wrote:
> On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote:
>> On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info wrote:
>>> Can you [generic you] believe that attackers can *reliably* attack
>>> remote systems based on a 20µs timing differences? If you say No,
>>> then you fail Security 101 and should step away from the computer
>>> until a security expert can be called in to review your code.
>>
>> Of course. I wouldn't bet the house on it, but with the proposed
>> substitution cipher system, I don't see why there would be any
>> measurable timing differences at all based on the choice of key.
>
> I wouldn't bet one wooden nickle on it. Not without a security audit of
> the application. And then what happens when the implementation changes
> and the audit is no longer valid?

I don't disagree about the security audit, although I think you'll find that such things will require a greater investment of resources than a wooden nickel.

> Despite his initial claim that he doesn't want to use AES because it's
> too slow implemented as pure Python, Randall has said that the
> application will offer AES encryption as an option.

Once again you're confusing what he said about the server with what he said about the client. Just because he considers it too slow for data mangling on the server doesn't make it too slow for any use.

>> The time to obfuscate a single byte is constant,
>
> Are you sure about that? Bet your house? How about your computer?
>
> # Python 3.3 on Linux, YMMV
> py> text = 'NOBODY expects the Spanish Inquisition!'*5
> py> import string
> py> s = string.digits + string.ascii_letters
> py> t = (string.ascii_uppercase + string.digits[::-1] +
> ...      string.ascii_lowercase)
> py> trans1 = str.maketrans('abcdef', 'fedcba')
> py> trans2 = str.maketrans(s, t)
> py> trans3 = str.maketrans('aZ', 'Za')
> py> with Stopwatch():
> ...     x = str.translate(text, trans1)
> ...
> time taken: 0.427513 seconds
> py> with Stopwatch():
> ...     x = str.translate(text, trans2)
> ...
> time taken: 0.228869 seconds
> py> with Stopwatch():
> ...     x = str.translate(text, trans3)
> ...
> time taken: 0.387105 seconds

Your examples are using partial keys of different sizes. It's hardly surprising that the timing varies when you pass dicts of varying sizes as the translation tables.

py> import random, timeit
py> a = list(range(256))
py> b = random.sample(a, 256)
py> c = random.sample(a, 256)
py> d = random.sample(a, 256)
py> min(timeit.repeat("str.translate(text, a)",
...     "from __main__ import text, a", number=10, repeat=10))
0.9780099680647254
py> min(timeit.repeat("str.translate(text, b)",
...     "from __main__ import text, b", number=10, repeat=10))
0.9837233647704124
py> min(timeit.repeat("str.translate(text, c)",
...     "from __main__ import text, c", number=10, repeat=10))
0.9627216667868197
py> min(timeit.repeat("str.translate(text, d)",
...     "from __main__ import text, d", number=10, repeat=10))
0.9793561780825257
py> min(timeit.repeat("str.translate(text, c)",
...     "from __main__ import text, c", number=10, repeat=10))
0.9840573272667825

I ran it on c a second time to see if the 0.962 timing was systemic or a fluke. The fact that c produced both the shortest and longest timings out of only two runs lends me confidence (for the purpose of this discussion) that the variation seen in these timings is random and not correlated to the keys used.
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 9:38 PM, Steven D'Aprano st...@pearwood.info wrote: With respect Randall, you contradict yourself. Is there any wonder that some of us (well, me at least) is suspicious and confused, when your story changes as often as the weather? Sometimes you say that the client software uses AES encryption. Sometimes you say that you don't want to use AES encryption because you want the client to be pure Python, and a pure-Python implementation would be too slow. Your very first post says: My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software. Pure Python AES and even DES are just way too slow. In the context of the initial post, this was referring to the data mangling done by the receiver; it has no bearing on the form of the data sent by the application. Sometimes you say the user is supposed to encrypt the data themselves: While the data senders are supposed to encrypt data, that's not guaranteed Whereas this clearly describes the behavior of the application itself. Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. And what if somebody else writes a competing version of the client software that doesn't bother with the encryption step at all? The point was that while encryption is expected, it cannot be assumed by the receiver, and in fact if the data is actually malicious, then it likely is not even being sent by the client software in the first place. If the app does encrypt the data with AES before sending, then you don't gain any benefit by obfuscating an encrypted file with a classical monoalphabetic substitution cipher. 
Only if the recipient can *trust* the sender to have performed the encryption, which it can't, no matter how mandatory the OP tries to make it. Suppose that you hire an intern to write the choose key function, and not knowing any better, he simply iterates through the keys in numeric order, one after the other. So the first upload will use key 0, the second key 1, the third key 2, and so on, until key 256! - 1, then start again. In that case, predicting the next key is *trivial*. If I can work out what key you send now (I just upload a file containing \x00\x01\x02...\xFF to myself and see what I get), then I know what key the app will use next. If you upload a file to yourself, the result that you get will have no bearing on what key might be chosen when you upload a file to somebody else. Even if I can't do that, I might be able to guess the seed: I know what time the application started up, to within a few milliseconds, How? and I know (or can guess) how many random numbers you have used, How? Except... you're getting your random numbers from a system *I* control. No you don't. If you did already control the target system, then as already suggested, you have no need to attack the data upload; you can just write whatever data you want to disk. This is like suggesting that the sudoers file is insecure because a user with root access would be able to add themselves to it. If the attacker controlled the machine the app was on, why would it fool with /dev/urandom? I think he'd just plant the files he wanted to plant and be done. This is non-nonsensical anyway. No, you don't understand the nature of the attack. In this scenario, the sender is the attacker. I want to upload malicious files to the receiver. You are trying to stop me, that's the whole point of mangling or encrypting the files. (Your words.) So I, the sender, prepare a file such that when you mangle it, the resulting mangled content is the malicious content I want. 
If you use a substitution cipher, I can do this if I can guess or force the key. If you use strong crypto, I can't. However, I can hack the application. The client sits on my computer, it's pure Python, even if it isn't I can still hack the application, I don't need access to the source code. If the recipient system is using the system random to generate the key, then you can hack the application all you want, and it will give you precisely zero information about the state of the entropy pool on the remote system. Yes. Do you think that's hard for an attacker who has access to your application, possibly including the source code, and controls all the sources of entropy on the system your application is running on? I don't have to *randomly* guess. I control what time your application starts, I control what randomness you get from /dev/urandom, I control how many keys you go through, I might even be able to read the source code of the application (not that I need to, that just makes it
Re: Pure Python Data Mangling or Encrypting
On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote: You're making the same mistake that Steven did in misunderstanding the threat model. I don't think I'm misunderstanding the threat, I think I'm pointing out a threat which the OP is hoping to just ignore. In an earlier post, I suggested that the threat model should involve at least *three* different attacks, apart from the usual man-in-the-middle attacks on data in transit. One is that the attacker is the person sending the data. E.g. I want to send a nasty payload (say, malware, or an offensive image). Another is that the attacker is the recipient of the file, who wants to read the sender's data. As far as I can tell, the OP's plan to defend the sender's privacy is to dump responsibility for encrypting the files in the sender's lap. As far as I'm concerned, perhaps as many as one user in 2 will pre-encrypt their files. (Early adopters will be unrepresentative of the eventual user base of this system. If this takes off, the user base will likely end up dominated by people who think that qwerty is the epitome of unguessable passwords.) Users just don't use crypto unless their applications do it for them. My opinion is that the application ought to do so, and not expect Aunt Tillie to learn how to correctly use encryption software before uploading her files. http://www.catb.org/jargon/html/A/Aunt-Tillie.html It is the OP's prerogative to disagree, of course, but to me, if the OP's app doesn't use strong crypto to encrypt users' data, that's tantamount to saying they don't care about their users' data privacy. Using a monoalphabetic substitution cipher to obfuscate the data is not strong crypto. The goal isn't to prevent the attacker from working out the key for a file that has already been obfuscated. Any real data that might be exposed by a vulnerability in the server is presumed to have already been strongly encrypted by the user.
I think that's a ridiculously unrealistic presumption, unless your user-base is entirely taken from a very small subset of security savvy and pedantically careful users. The goal is to prevent the attacker from guessing a key that hasn't even been generated yet, which could be exploited to engineer the obfuscated content into something malicious. They don't need to predict the key exactly. If they can predict that the key will be, lets say, one of these thousand values, then they can generate one thousand files and upload them. One of them will match the key, and there's your exploit. That's one attack. A second attack is to force the key. The attacker controls the machine the application is running on, they control /dev/urandom and can feed your app whatever not-so-random numbers they like, so potentially they can force the app to use the key of their choosing. Then they don't need 1000 files, they just need one. That's two. Does anyone think that I've thought of all the possible attacks? (Well, hypothetical attacks. I acknowledge that I don't know the application, and cannot be sure that it *actually is* vulnerable to these attacks.) The problem here is that a monoalphabetic substitution cipher is not resistant to preimage attacks. Your only defence is that the key is unknown. If the attacker can force the key, or predict the key, or guess a small range of keys, they can exploit your weak cipher. (Technically, preimage attack is usually used to refer to attacks on hash functions. I'm not sure if the same name is used for attacks on ciphers.) https://en.wikipedia.org/wiki/Preimage_attack With a strong crypto cipher, there are no known preimage attacks. Even if the attacker knows exactly what key you are using, they cannot predict what preimage they need to supply in order to generate the malicious payload they want after encryption. (As far as I know.) That is the critical issue right there. 
The sort of simple monoalphabetic substitution cipher using bytes.translate that the OP is using is vulnerable to preimage attacks. Strong crypto is not. There are no frequency-based attacks possible here, because you can't do frequency analysis on the result of a key that hasn't even been generated yet. Frequency-based attacks apply to a different threat. I'm referring to at least two different attacks here, with different attackers and different victims. Don't mix them up. Assuming that you have no attack on the key generation itself, the Not a safe assumption! best you can do is send a file deobfuscated with a random key and hope that the recipient randomly chooses the same key; the odds of that happening are 1 in 256!. It's easy to come up with attacks which are no better than brute force. It's the attacks which are better than brute force that you have to watch out for. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
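To make the preimage point concrete, here is a minimal sketch of what "the attacker knows or forces the key" would mean for a bytes.translate substitution. It is not the OP's code; the helper names (make_table, invert_table, craft_preimage) and the sample payload are made up for illustration, and it assumes the attacker really has obtained the table, which is exactly the point in dispute in this thread.

```python
import random


def make_table(rng):
    """Return a random 256-byte substitution table (a permutation of 0..255)."""
    values = list(range(256))
    rng.shuffle(values)
    return bytes(values)


def invert_table(table):
    """Return the table that undoes `table` when used with bytes.translate."""
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain
    return bytes(inverse)


def craft_preimage(payload, known_table):
    """Given the table, compute the upload that mangles to exactly `payload`."""
    return payload.translate(invert_table(known_table))


# Hypothetical scenario: the receiver's table has leaked or been forced.
table = make_table(random.SystemRandom())
payload = b"\xcd\x19 stand-in for a malicious payload"
upload = craft_preimage(payload, table)

# What the receiver would write to disk after mangling the upload.
on_disk = upload.translate(table)
print(on_disk == payload)
```

If the table is unknown and unpredictable, craft_preimage has nothing to work with, which is why the rest of the argument is about whether the key can be predicted or forced.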
Re: Pure Python Data Mangling or Encrypting
On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 26.06.2015 22:09, Randall Smith wrote: You've gone on a rampage about nothing. My original description said the client was supposed to encrypt the data, but you want to assume the opposite for some unknown reason. While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. You always play around with the 256! which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. No, it can't, because the attacker does not have access to the ciphertext. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue on how difficult it is to get cryptography even remotely right. Amateur crypto is indeed a bad idea. But what you're still not getting is that what he's doing here *isn't crypto*. He's just trying to avoid letting third parties write completely arbitrary data to the disk. You know what would be a perfectly good solution to his problem? Base 64 encoding. That would solve the issue pretty much completely, the only reason it's not an ideal solution is that it of course increases the size of the data. That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. Nobody is defending such a thing, you just haven't understood what problem is being solved here. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 12:06 PM, Steven D'Aprano wrote: On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote: You're making the same mistake that Steven did in misunderstanding the threat model. I don't think I'm misunderstanding the threat, I think I'm pointing out a threat which the OP is hoping to just ignore. I'm not hoping to ignore anything. I didn't explain the entire system, as it was not necessary to find a solution to the problem at hand. But since you want to make negative assumptions about what I didn't tell you, I'll gladly address your accusations of negligence. In an earlier post, I suggested that the threat model should involve at least *three* different attacks, apart from the usual man-in-the-model attacks of data in transit. All communication is secured using TLS and authentication handled by X.509 certificates. This prevents man in the middle attacks. Certificates are signed by CAs I control. One is that the attacker is the person sending the data. E.g. I want to send a nasty payload (say, malware, or an offensive image). Another is that the attacker is the recipient of the file, who wants to read the sender's data. The only person who can read a file is the owner. AES encryption is built into the client software. The only way data can be uploaded unencrypted is if encryption is intentionally disabled. As far as I can tell, the OP's plan to defend the sender's privacy is to dump responsibility for encrypting the files in the sender's lap. As far as I'm concerned, perhaps as many as one user in 2 will pre-encrypt their files. (Early adopters will be unrepresentative of the eventual user base of this system. If this takes off, the user base will likely end up dominated by people who think that qwerty is the epitome of unguessable passwords.) Making assumptions again. See above. The client software encrypts by default. You're also assuming there is no password strength checking. Users just don't use crypto unless their applications do it for them. And it does. 
My opinion is that the application ought to do so, and not expect Aunt Tillie to learn how to correctly use encryption software before uploading her files. http://www.catb.org/jargon/html/A/Aunt-Tillie.html It is the OP's prerogative to disagree, of course, but to me, if the OP's app doesn't use strong crypto to encrypt users' data, that's tantamount to saying they don't care about their users' data privacy. Using a monoalphabetic substitution cipher to obfuscate the data is not strong crypto. You've gone on a rampage about nothing. My original description said the client was supposed to encrypt the data, but you want to assume the opposite for some unknown reason. The goal isn't to prevent the attacker from working out the key for a file that has already been obfuscated. Any real data that might be exposed by a vulnerability in the server is presumed to have already been strongly encrypted by the user. I think that's a ridiculously unrealistic presumption, unless your user-base is entirely taken from a very small subset of security savvy and pedantically careful users. The difference is he's not assuming I'm a moron. He's giving me the benefit of the doubt. That plus I actually said, data senders are supposed to encrypt data. In a networked system, you can't make assumptions about what the other peers are doing. You have to handle what comes across the wire. You also have to consider that you may come under attack. That's what this is about. The goal is to prevent the attacker from guessing a key that hasn't even been generated yet, which could be exploited to engineer the obfuscated content into something malicious. They don't need to predict the key exactly. If they can predict that the key will be, lets say, one of these thousand values, then they can generate one thousand files and upload them. One of them will match the key, and there's your exploit. That's one attack. Thousand Values ??? Isn't it 256!, which is just freaking huge! 
import math; math.factorial(256) A second attack is to force the key. The attacker controls the machine the application is running on, they control /dev/urandom and can feed your app whatever not-so-random numbers they like, so potentially they can force the app to use the key of their choosing. Then they don't need 1000 files, they just need one. If the attacker controlled the machine the app was on, why would he fool with /dev/urandom? I think he'd just plant the files he wanted to plant and be done. This is nonsensical anyway. That's two. Does anyone think that I've thought of all the possible attacks? (Well, hypothetical attacks. I acknowledge that I don't know the application, and cannot be sure that it *actually is* vulnerable to these attacks.) The problem here is that a monoalphabetic substitution cipher is not resistant to preimage attacks. Your only defence is that the key is unknown. If
Re: Pure Python Data Mangling or Encrypting
On 26.06.2015 22:09, Randall Smith wrote: You've gone on a rampage about nothing. My original description said the client was supposed to encrypt the data, but you want to assume the opposite for some unknown reason. While you seem to think that Steven is rampaging about nothing, he does have a fair point: you were consistently vague about whether you want encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. All Steven is doing is pointing out that people do good crypto for a reason. It's 2015 and we're still discussing substitution ciphers, really? Good crypto is available, it's fast, and it has had awesome cryptanalysis. All Steven is pointing out is that when ten crypto laymen meet in a Python newsgroup and think they have invented a soooper secure scheme, it may still be complete and utter crap. It's just that not everyone can see it. You always play around with the 256!, which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. I don't need to know all 256 characters to do damage; sometimes even a handful will already give me part of what I need and the option to crack more and more. This is something that would ultimately and instantly disqualify your cryptosystem as utterly insecure. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue how difficult it is to get cryptography even remotely right. Everyone who knows the trade uses proven constructions not because it's inconvenient, but because it's one of the very few ways to achieve a secure system. That said, for your solution this type of obfuscation may be fine. And chances are that nobody will ever notice. But don't claim you weren't warned about the abyss when you designed your solution and people break this stuff.
Because then you might *look* like a moron (even if you're not), since the first question people will ask will be: Why? Why on earth? It's a blatantly obvious bad idea(tm). That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. Cheers, Johannes -- Where exactly had you predicted the earthquake, again? At least not publicly! Ah, the latest and, to date, most ingenious stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
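For anyone who wants to check the numbers being thrown around, the following sketch computes the size of the full key space and, for comparison, the much smaller number of distinct outcomes when a payload uses only a handful of different byte values. The choice of k values is arbitrary and only for illustration; math.perm needs Python 3.8 or later.

```python
import math

# Size of the full key space: every permutation of the 256 byte values.
keys = math.factorial(256)
bits = keys.bit_length()     # roughly log2(256!)
digits = len(str(keys))      # number of decimal digits in 256!
print("full table:", bits, "bits,", digits, "decimal digits")

# If a payload contains only k distinct byte values, only those k table
# entries matter, so the effective number of cases is 256 * 255 * ... (k terms).
for k in (1, 2, 4, 8, 16):
    cases = math.perm(256, k)
    print(k, "distinct bytes:", cases.bit_length(), "bits")
```

The headline figure only applies when all 256 entries of the table matter; the smaller figures are what the "I don't need to know all 256 characters" objection is about.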
Re: Pure Python Data Mangling or Encrypting
On 26.06.2015 22:09, Randall Smith wrote: And that's why we're having this discussion. Do you know of an attack in which you can control the output (say at least 100 consecutive bytes) for data which goes through a 256 byte translation table, chosen randomly from 256! permutations after the data is sent. If you do, I'm all ears! But at this point you're just setting up straw men and knocking them down. Oh, and I wanted to comment on this as well, but sent my reply too soon. You misunderstand. This is not how it works; this is not how any of this works. Steven does not *at all* have to prove to you that your system is breakable or show actual attacks. YOU have to prove that your system is secure, either analytically or by waiting until you have peer review and cryptanalysis by actual experts. It's *very* easy to set up a badly flawed obfuscation system that can't be broken by laymen in a Python newsgroup and which appears to be secure. This does not imply one bit that it is even remotely secure. Cheers, Johannes -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 05:42 PM, Johannes Bauer wrote: On 26.06.2015 23:29, Jon Ribbens wrote: While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. Bullshit. Even the topic indicates that he doesn't know what he wants: data mangling or encryption, which one is it? I knew exactly what I wanted and spelled it out. protect the recipient against exposure to nefarious data ... before it is written to disk You shouldn't need to make assumptions about other parts of the system. Just prevent potential malware from hitting the disk as such. Before this thread, I knew that encryption would definitely work and data mangling might. Now I know that data mangling is a really nice solution for the given requirements. You always play around with the 256! which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. No, it can't, because the attacker does not have access to the ciphertext. Or so you claim. No the attacker does not have access to the ciphertext. What would lead you to think they did? This statement is central to the problem: I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk This makes it clear I'm not trying to encrypt data to protect the data. I'm trying to protect the recipient (storage server) from an attack. This specific attack being malware. Yes, AES encryption would have worked here, but encryption is not the objective. I could go into detail about how the assumtion that the ciphertext is secret is not a smart one in the context of cryptography. 
And how side channels and other leakage may affect overall system security. But I'm going to save my time on that. I do get paid to review cryptographic systems and part of the job is dealing with belligerent people who have read Schneier's blog and think they can outsmart anyone else. Since I don't get paid to convice you, it's absolutely fine that you think your substitution scheme is the grand prize. All of which has nothing to do with this thread. Actual encryption is handled using AES and TLS. This is not about encryption. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue on how difficult it is to get cryptography even remotely right. Amateur crypto is indeed a bad idea. But what you're still not getting is that what he's doing here *isn't crypto*. So the topic says Encrypting. If you look really closely at the word, the part crypt might give away to you that cryptography is involved. This isn't about encrypting data to protect the data. All the encryption I do uses standard AES and TLS. Yes, I do understand that crypto is best left to experts. The topic says Encrypting because I knew that encrypting the data would properly obfuscate it. He's just trying to avoid letting third parties write completely arbitrary data to the disk. There's your requirement. Then there's obviously some kind of implication when a third party *can* write arbitrary data to disk. And your other solution to that problem... It's a network protocol. Just like when writing a web app, you have to deal with bad actors. That's what I'm doing here. The entire service is about handling arbitrary data. Just like Amazon S3 handles people's arbitrary data. Not sure what you mean by third party. It would be a registered peer. But registration doesn't prevent the scenario in discussion. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote: Give me one plausible scenario where an attacker can cause malware to hit the disk after bytearray.translate with a 256 byte translation table and I'll be thankful to you. The entire 256-byte translation table is significant ONLY if you need all 256 possible bytes. Suppose I want to generate the following byte sequence: \xCD\x19 (Okay, this is a slightly oversimplified example, as this attack doesn't work on a modern Windows. But back in the days of DOS, this program would reboot your computer.) How many truly different translation tables are there if I'm trying to produce this? Just 256*255, or 65280. If I send random two-byte files, there is one chance in that of my malware successfully landing. Once I've sent about 45,000 of those files, I have a fifty-fifty chance of having hit it. Send twice as many, I have a 75% chance of success, etc. Malware can be crafted to fit within certain restrictions. I saw a proof-of-concept and analysis document detailing a particular remote code execution/privilege escalation attack that involved stuffing text into an entry field and then inducing the program to read that into its stack, finally triggering it by some sort of buffer overflow, I think. The text had to be no more than X bytes long (because that's all the entry field was set to accept - it'd truncate after that), and had to not contain any NUL bytes, and there might have been other restrictions too. Sure, it makes it harder to write your malware... but imagine if you can write something in just a handful of different bytes, which then goes and triggers something else. You could have an extremely plausible attack that might need only a day's uploading to deliver. It makes no difference that there are 256! possible encryption keys, if most of them have the same result. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
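The arithmetic in the two-byte example can be checked with a few lines. This is only a sketch of the stated model: it assumes each upload is mangled with a fresh, independently chosen table, and the function names are invented for the illustration (math.perm needs Python 3.8+).

```python
import math


def relevant_tables(distinct_bytes):
    """Number of distinct outcomes for the table entries the payload uses."""
    return math.perm(256, distinct_bytes)


def success_probability(uploads, distinct_bytes):
    """Chance that at least one of `uploads` independent tries lands intact."""
    p = 1 / relevant_tables(distinct_bytes)
    # 1 - (1 - p) ** uploads, written to stay accurate when p is tiny.
    return -math.expm1(uploads * math.log1p(-p))


print(relevant_tables(2))
for uploads in (45_000, 90_000):
    print(uploads, round(success_probability(uploads, 2), 3))
```

The same function shows how quickly the odds collapse as the payload needs more distinct byte values, which is the other side of the argument.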
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 04:55 PM, Mark Lawrence wrote: To be perfectly blunt I gave up days ago trying to follow what was being said, just too many words from all angles and too few diagrams for me to follow. I sincerely hope it doesn't end in tears. Mark. There's not much to follow. The solution was simple and complete. The original description was limited to a small part of a large, more complex system. The reason you've had trouble following is because several people made (very bad) assumptions about what the rest of the system did. Everything required for the solution was present in the original post. -Randall -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 06/26/2015 04:07 PM, Johannes Bauer wrote: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. I knew (possibly extra) encryption wasn't necessary at this stage, but I also knew that encryption would provide good obfuscation. Problem is, I didn't want an extra C library to install. See the original post. ... I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk. My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software ... I don't know that I really need encryption here, but some type of fast mangling algorithm where a bad actor sending a payload can't guess the output ahead of time. -Randall -- https://mail.python.org/mailman/listinfo/python-list
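For readers who have not seen the scheme under discussion spelled out, a rough pure-Python sketch of receiver-side mangling might look like the following. This is an illustration, not the OP's implementation: the function names are hypothetical, and it assumes the key is drawn from the operating system's entropy source only after the upload has arrived and is stored somewhere the sender cannot read.

```python
import random

_system_rng = random.SystemRandom()  # backed by os.urandom, not the Mersenne Twister


def new_key():
    """Pick a fresh random permutation of the 256 byte values."""
    values = list(range(256))
    _system_rng.shuffle(values)
    return bytes(values)


def mangle(data, key):
    """Substitute every byte of `data` through the table `key`."""
    return data.translate(key)


def unmangle(mangled, key):
    """Reverse mangle() by building and applying the inverse table."""
    inverse = bytearray(256)
    for original, substituted in enumerate(key):
        inverse[substituted] = original
    return mangled.translate(bytes(inverse))


def receive_upload(data):
    """Choose the key only after the data is in hand, then mangle it."""
    key = new_key()
    return key, mangle(data, key)


key, on_disk = receive_upload(b"bytes from an untrusted peer")
print(unmangle(on_disk, key) == b"bytes from an untrusted peer")
```

Nothing here needs a C extension, which was the stated constraint; whether it is adequate against the attacks raised elsewhere in the thread is the open question.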
Re: Pure Python Data Mangling or Encrypting
On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 26.06.2015 23:29, Jon Ribbens wrote: While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. Bullshit. Even the topic indicates that he doesn't know what he wants: data mangling or encryption, which one is it? He wants data mangling and he was asking whether he needed encryption to achieve it. The answer is no, he doesn't. I could go into detail about how the assumtion that the ciphertext is secret is not a smart one in the context of cryptography. But, and I've already pointed this out and you don't seem to have quite got the picture yet, we're not in the context of cryptography. And how side channels and other leakage may affect overall system security. But I'm going to save my time on that. I do get paid to review cryptographic systems and part of the job is dealing with belligerent people who have read Schneier's blog and think they can outsmart anyone else. You seem to be describing your own attitude to a tee. Since I don't get paid to convice you, it's absolutely fine that you think your substitution scheme is the grand prize. My scheme? It wasn't my suggestion. So the topic says Encrypting. If you look really closely at the word, the part crypt might give away to you that cryptography is involved. If you were to actually read past the subject line and continue on to read the text of the articles, you would discover that cryptography is not involved. No wonder you're confused if you're disengaging your brain the instant you get past the subject line. He's just trying to avoid letting third parties write completely arbitrary data to the disk. There's your requirement. My requirement? 
You know what would be a perfectly good solution to his problem? Base 64 encoding. That would solve the issue pretty much completely, the only reason it's not an ideal solution is that it of course increases the size of the data. ...wow. That's a nice interpretation of not letting a third party write completely arbitrary data. It's an accurate interpretation. Something that seems not to be your forte. According to your definition, this would be: It's okay if the attacker can control 6 of 8 bits. Yes, it probably is ok. Add a bit of random gunk at the top and tail of the file and it's almost certainly ok. Why do you think it's not? Oh I understand your solutions plenty well. Evidently not. The only thing I don't understand is why you don't own a Fields medal yet for your groundbreaking work on bulletproof obfuscation. That is clearly a very long way from the only thing you don't understand. -- https://mail.python.org/mailman/listinfo/python-list
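The Base 64 suggestion and its size cost are easy to see in a short sketch using the standard library. The function name and the test input are made up for the illustration; the caveat raised in the thread still applies, since anything that can be expressed purely in the Base 64 alphabet (some text-based payloads, for example) is not blocked by this.

```python
import base64
import string

# The only byte values standard Base 64 output can contain.
ALPHABET = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))


def encode_for_disk(data):
    """Encode untrusted bytes so only Base 64 characters reach the disk."""
    return base64.b64encode(data)


hostile = bytes(range(256))          # every possible byte value once
stored = encode_for_disk(hostile)

print(set(stored) <= ALPHABET)       # is the output confined to the alphabet?
print(len(stored) / len(hostile))    # size overhead, about 4/3
print(base64.b64decode(stored) == hostile)
```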
Re: Pure Python Data Mangling or Encrypting
On 26.06.2015 23:29, Jon Ribbens wrote: While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about whether you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. Bullshit. Even the topic indicates that he doesn't know what he wants: data mangling or encryption, which one is it? You always play around with the 256! which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. No, it can't, because the attacker does not have access to the ciphertext. Or so you claim. I could go into detail about how the assumption that the ciphertext is secret is not a smart one in the context of cryptography. And how side channels and other leakage may affect overall system security. But I'm going to save my time on that. I do get paid to review cryptographic systems and part of the job is dealing with belligerent people who have read Schneier's blog and think they can outsmart anyone else. Since I don't get paid to convince you, it's absolutely fine that you think your substitution scheme is the grand prize. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue on how difficult it is to get cryptography even remotely right. Amateur crypto is indeed a bad idea. But what you're still not getting is that what he's doing here *isn't crypto*. So the topic says Encrypting. If you look really closely at the word, the part crypt might give away to you that cryptography is involved. He's just trying to avoid letting third parties write completely arbitrary data to the disk. There's your requirement. Then there's obviously some kind of implication when a third party *can* write arbitrary data to disk.
And your other solution to that problem... You know what would be a perfectly good solution to his problem? Base 64 encoding. That would solve the issue pretty much completely, the only reason it's not an ideal solution is that it of course increases the size of the data. ...wow. That's a nice interpretation of not letting a third party write completely arbitrary data. According to your definition, this would be: It's okay if the attacker can control 6 of 8 bits. That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. Nobody is defending such a thing, you just haven't understood what problem is being solved here. Oh I understand your solutions plenty well. The only thing I don't understand is why you don't own a Fields medal yet for your groundbreaking work on bulletproof obfuscation. Cheers, Johannes -- Wo hattest Du das Beben nochmal GENAU vorhergesagt? Zumindest nicht öffentlich! Ah, der neueste und bis heute genialste Streich unsere großen Kosmologen: Die Geheim-Vorhersage. - Karl Kaos über Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 26/06/2015 22:29, Jon Ribbens wrote: On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 26.06.2015 22:09, Randall Smith wrote: You've gone on a rampage about nothing. My original description said the client was supposed to encrypt the data, but you want to assume the opposite for some unknown reason. While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. You always play around with the 256! which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. No, it can't, because the attacker does not have access to the ciphertext. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue on how difficult it is to get cryptography even remotely right. Amateur crypto is indeed a bad idea. But what you're still not getting is that what he's doing here *isn't crypto*. He's just trying to avoid letting third parties write completely arbitrary data to the disk. You know what would be a perfectly good solution to his problem? Base 64 encoding. That would solve the issue pretty much completely, the only reason it's not an ideal solution is that it of course increases the size of the data. That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. Nobody is defending such a thing, you just haven't understood what problem is being solved here. To be perfectly blunt I gave up days ago trying to follow what was being said, just too many words from all angles and too few diagrams for me to follow. I sincerely hope it doesn't end in tears. 
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 3:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. I think that the people defending this have been reasonably consistent about using the word obfuscation, not crypto. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Johannes, I agree with a lot of what you say, but can you please have less of a mean attitude? -- Devin On Fri, Jun 26, 2015 at 3:42 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: On 26.06.2015 23:29, Jon Ribbens wrote: While you seem to think that Steven is rampaging about nothing, he does have a fair point: You consistently were vague about wheter you want to have encryption, authentication or obfuscation of data. This suggests that you may not be so sure yourself what it is you actually want. He hasn't been vague, you and Steven just haven't been paying attention. Bullshit. Even the topic indicates that he doesn't know what he wants: data mangling or encryption, which one is it? You always play around with the 256! which would be a ridiculously high security margin (1684 bits of security, w!). You totally ignore that the system can be broken in a linear fashion. No, it can't, because the attacker does not have access to the ciphertext. Or so you claim. I could go into detail about how the assumtion that the ciphertext is secret is not a smart one in the context of cryptography. And how side channels and other leakage may affect overall system security. But I'm going to save my time on that. I do get paid to review cryptographic systems and part of the job is dealing with belligerent people who have read Schneier's blog and think they can outsmart anyone else. Since I don't get paid to convice you, it's absolutely fine that you think your substitution scheme is the grand prize. Nobody assumes you're a moron. But it's safe to assume that you're a crypto layman, because only laymen have no clue on how difficult it is to get cryptography even remotely right. Amateur crypto is indeed a bad idea. But what you're still not getting is that what he's doing here *isn't crypto*. So the topic says Encrypting. If you look really closely at the word, the part crypt might give away to you that cryptography is involved. 
He's just trying to avoid letting third parties write completely arbitrary data to the disk. There's your requirement. Then there's obviously some kind of implication when a third party *can* write arbitrary data to disk. And your other solution to that problem... You know what would be a perfectly good solution to his problem? Base 64 encoding. That would solve the issue pretty much completely, the only reason it's not an ideal solution is that it of course increases the size of the data. ...wow. That's a nice interpretation of not letting a third party write completely arbitrary data. According to your definition, this would be: It's okay if the attacker can control 6 of 8 bits. That people in 2015 actually defend inventing a substitution-cipher cryptosystem sends literally shivers down my spine. Nobody is defending such a thing, you just haven't understood what problem is being solved here. Oh I understand your solutions plenty well. The only thing I don't understand is why you don't own a Fields medal yet for your groundbreaking work on bulletproof obfuscation. Cheers, Johannes -- Where exactly did you predict the quake again? At least not publicly! Ah, the latest and to this day most brilliant stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
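The "6 of 8 bits" objection to Base64 can be checked directly. The following is only a sketch using the standard library; the payload is made up for illustration and nothing here is taken from the OP's software. It shows that Base64 confines the stored bytes to a 65-symbol set, while the sender still decides which of those symbols appear.

```python
import base64
import string

# Hypothetical sender-chosen payload: every possible byte value once.
payload = bytes(range(256))

encoded = base64.b64encode(payload)

# Standard Base64 emits only A-Z, a-z, 0-9, '+', '/' and '=' padding.
alphabet = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))

# No byte outside the 65-symbol set ever reaches the disk ...
assert set(encoded) <= alphabet
# ... but the mapping is keyless and fully reversible.
assert base64.b64decode(encoded) == payload

print(len(payload), "bytes in,", len(encoded), "bytes out")
```

So arbitrary binaries or images cannot land on disk verbatim, at the cost of roughly a third more storage, but the sender keeps control over six bits of every stored byte.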
Re: Pure Python Data Mangling or Encrypting
On Sat, 27 Jun 2015 06:09 am, Randall Smith wrote: On 06/26/2015 12:06 PM, Steven D'Aprano wrote: On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote: You're making the same mistake that Steven did in misunderstanding the threat model. I don't think I'm misunderstanding the threat, I think I'm pointing out a threat which the OP is hoping to just ignore. I'm not hoping to ignore anything. I didn't explain the entire system, as it was not necessary to find a solution to the problem at hand. But since you want to make negative assumptions about what I didn't tell you, I'll gladly address your accusations of negligence. Negligence is *your* word, not mine. I've never said that. And I'm not *assuming* anything, everything I've stated has been based on the evidence of what you have written. I've even gone so far as to EXPLICITLY say that I cannot know for a fact that your application is vulnerable to these threats, since I'm only going from a description rather than the app itself. But your responses don't suggest that you have these threats under control, on the contrary, they indicate that you are *far* underestimating the seriousness of them and underestimating the difficulty of running a secure application on a machine you cannot trust. If your application has any saving grace, it is that there are easier ways to get malware onto somebody else's computer. There are a hundred million unsecured Windows boxen out there, if I were malicious I would just hire a bot net rather than spend the time trying to hack your system. But maybe somebody else will do it just for the lulz, or to prove it can be done. Some black hats like a challenge, and yours appears to fall nicely into that middle ground of hard enough to be interesting but not hard enough to be really difficult. In an earlier post, I suggested that the threat model should involve at least *three* different attacks, apart from the usual man-in-the-middle attacks of data in transit. 
All communication is secured using TLS and authentication handled by X.509 certificates. This prevents man in the middle attacks. Certificates are signed by CAs I control. You control the CAs? Presumably that means they're self-signed (unless you mean you get to choose the CA). I don't know if that makes a difference or not. One is that the attacker is the person sending the data. E.g. I want to send a nasty payload (say, malware, or an offensive image). Another is that the attacker is the recipient of the file, who wants to read the sender's data. The only person who can read a file is the owner. AES encryption is built into the client software. The only way data can be uploaded unencrypted is if encryption is intentionally disabled. With respect Randall, you contradict yourself. Is there any wonder that some of us (well, me at least) is suspicious and confused, when your story changes as often as the weather? Sometimes you say that the client software uses AES encryption. Sometimes you say that you don't want to use AES encryption because you want the client to be pure Python, and a pure-Python implementation would be too slow. Your very first post says: My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software. Pure Python AES and even DES are just way too slow. Sometimes you say the user is supposed to encrypt the data themselves: While the data senders are supposed to encrypt data, that's not guaranteed Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. Making AES or similarly strong encryption mandatory protects both the sender of data and the receiver of data. 
I cannot imagine why you are considering making it optional, since that only adds more work for you and reduces the security of your users. Oh, and DES is not good enough. As far as I can tell, the OP's plan to defend the sender's privacy is to dump responsibility for encrypting the files in the sender's lap. As far as I'm concerned, perhaps as many as one user in 2 will pre-encrypt their files. (Early adopters will be unrepresentative of the eventual user base of this system. If this takes off, the user base will likely end up dominated by people who think that qwerty is the epitome of unguessable passwords.) Making assumptions again. See above. The client software encrypts by default. You're also assuming there is no password strength checking. My comment about qwerty as a password was a comment on the majority of people on the internet, not an assumption about your application. Users just don't use crypto unless their applications do it for them. And it does.
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote: Now you say that the application encrypts the data, except that the user can turn that option off. Just make the AES encryption mandatory, not optional. Then the user cannot upload unencrypted malicious data, and the receiver cannot read the data. That's two problems solved. No, because another application could pretend to be the file-sending application, but send unencrypted data instead of encrypted data. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 27.06.2015 02:55, Randall Smith wrote: No the attacker does not have access to the ciphertext. What would lead you to think they did? Years of practical experience in the field of applied cryptography. Knowledge of how side channels work and how easily they can be constructed for bad schemes. Rest snipped, explanation futile. Cheers, Johannes -- Where exactly did you predict the quake again? At least not publicly! Ah, the latest and to this day most brilliant stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 2015-06-26, Chris Angelico ros...@gmail.com wrote: On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens jon+use...@unequivocal.co.uk wrote: Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Utterly impractical? Maybe, if you attempt a pure brute-force approach - there are 256! possible translation tables, which is roughly e500 attempts [1], and at roughly four a microsecond [2] that'd still take a ridiculously long time. But there are two gigantic optimizations you could do. Firstly, there are frequency-based attacks, No, there aren't. As I already said, the attacker does not have the ciphertext. He can't do anything related to frequency analysis. -- https://mail.python.org/mailman/listinfo/python-list
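For a sense of scale, the size of the 256! key space mentioned above can be computed exactly. This is just arithmetic with the standard library; the rate of four guesses per microsecond is the figure assumed in the quoted message, not a measurement.

```python
import math

tables = math.factorial(256)      # number of distinct 256-byte permutations
bits = math.log2(tables)          # equivalent key length in bits
digits = len(str(tables))         # decimal digits in 256!

print(f"256! has {digits} decimal digits, about {bits:.0f} bits of key space")

# Exhaustive search at the thread's assumed four guesses per microsecond.
guesses_per_second = 4_000_000
seconds_per_year = 60 * 60 * 24 * 365
years = tables // (guesses_per_second * seconds_per_year)
print(f"brute force: on the order of 10**{len(str(years)) - 1} years")
```

The number only bounds pure brute force. It says nothing about attacks that exploit structure, which is exactly what the two sides of this exchange disagree about.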
Re: Pure Python Data Mangling or Encrypting
On Thursday 25 June 2015 14:07, Steven D'Aprano wrote: You got it. I didn't want to explain any more than necessary. But yes, the recipient just stores the data for the end-user. Trust me. That's not all they are doing. Hmm, sorry, that's a glib answer. What I meant to say is, you can't *trust* that this is all they are doing, not unless all your users are within a single organisation where everyone trusts everyone else. Obviously some people are more trustworthy, or less inquisitive, than others. But you don't know which ones are which. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 2:57 AM, Chris Angelico ros...@gmail.com wrote: On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre jeanpierr...@gmail.com wrote: I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger. Yes, it is. It requires the attacker being able to see something about the ciphertext, unlike ROT13. But it is reasonable to suppose that maybe the attacker can trigger the file getting executed, at which point maybe you can deduce from the behavior what the starting bytes are...? If a symmetric cipher is being used and the key is known, anyone can simply perform a decryption operation on the desired bytes, get back a pile of meaningless encrypted junk, and submit that. When it's encrypted with the same key, voila! The cleartext will reappear. Asymmetric ciphers are a bit different, though. AIUI you can't perform a decryption without the private key, whereas you can encrypt with only the public key. So you ought to be safe on that one; the only way someone could deliberately craft input that, when encrypted with your public key, produces a specific set of bytes, would be to brute-force it. (But I might be wrong on that. I'm no crypto expert.) Yes, so it should be random. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Personally, I have had AVG give at least 2 false positives (FYI, one of them was on something like python2.6). As long as antivirus software can give so many false positives, I would think preventing your AV from nuking someone else's data is a reasonable thing. -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre jeanpierr...@gmail.com wrote: I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger. Yes, it is. It requires the attacker being able to see something about the ciphertext, unlike ROT13. But it is reasonable to suppose that maybe the attacker can trigger the file getting executed, at which point maybe you can deduce from the behavior what the starting bytes are...? If a symmetric cipher is being used and the key is known, anyone can simply perform a decryption operation on the desired bytes, get back a pile of meaningless encrypted junk, and submit that. When it's encrypted with the same key, voila! The cleartext will reappear. Asymmetric ciphers are a bit different, though. AIUI you can't perform a decryption without the private key, whereas you can encrypt with only the public key. So you ought to be safe on that one; the only way someone could deliberately craft input that, when encrypted with your public key, produces a specific set of bytes, would be to brute-force it. (But I might be wrong on that. I'm no crypto expert.) ChrisA -- https://mail.python.org/mailman/listinfo/python-list
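The known-key trick described above (decrypt the bytes you want stored, then submit the result) can be shown with ROT-13, borrowing the insult used as an example elsewhere in this thread. `receiver_store` is a hypothetical stand-in for a receiver whose transformation and key are known to the sender; it is not the OP's code.

```python
import codecs

def receiver_store(data: str) -> str:
    """Hypothetical receiver that ROT-13s whatever it is sent before storing it."""
    return codecs.encode(data, "rot_13")

wanted_on_disk = "your father smells of elderberries"

# The sender applies the inverse transformation first. ROT-13 is its own
# inverse, so "decrypting" the wanted text yields the payload to submit.
payload = codecs.decode(wanted_on_disk, "rot_13")

# The receiver's own mangling step then restores the sender's chosen text.
assert receiver_store(payload) == wanted_on_disk
print(payload)
```

The attack depends entirely on the sender knowing the key, which is why the rest of the thread turns on whether the key is chosen by the receiver and kept from the sender.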
Re: Pure Python Data Mangling or Encrypting
On 2015-06-25, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote: If it's encrypted malware, and you can't decrypt it, there's no threat. If the *only* threat is that the sender will send malware, you can mitigate around that by dropping the file in an unencrypted container. Anything good enough to prevent Windows from executing the code, accidentally or deliberately, say, a tar file with a custom extension. That won't stop virus scanners etc potentially making their own minds up about the file. But encrypting the file is also a good solution, and it prevents the storage machine spying on the file contents too. Provided the encryption is strong. How would the receiver encrypting the file after receiving it prevent the receiver from seeing what's in the file? The original post said that the sender will usually send files they encrypted, unless they are malicious. So if the sender wants them to be encrypted, they already are. The OP *hopes* that the sender will encrypt the files. I think that's a vanishingly faint hope, unless the application itself encrypts the file. Yes, the application itself encrypts the file. Haven't you been reading what he's saying? The sender has a copy of the application? Then they can see the type of obfuscation used. If they know the key, or can guess it, they can take their malware, *decrypt* it, and send that, so that *encrypting* that file puts the malicious code on the disk. Not if they don't know the key they can't. E.g. suppose I want to send you an insult, but I know your program automatically ROT-13s the strings I send you. Then I send you: 'lbhe sngure fzryyf bs ryqreoreevrf' and your program ROT-13s it to: 'your father smells of elderberries' I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger. 
Replace ROT-13 with ROT-n where 'n' is a secret known only to the receiver, and suddenly it's not such a bad method of obfuscation. Improve it to the random-translation-map method he's actually using and you've got really quite a reasonable system. I am usually very oppositional when it comes to rolling your own crypto, but am I alone here in thinking the OP very clearly laid out their case? I don't think any of us *really* understand his use-case or the potential threats, but to my way of thinking, you can never have too strong a cipher or underestimate the risk of users taking short-cuts. The use case is pretty obvious (a peer-to-peer dropbox type thing) but it does appear to be being misunderstood. This isn't actually a crypto problem at all and users taking short-cuts isn't an issue. -- https://mail.python.org/mailman/listinfo/python-list
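A rough sketch of the random-translation-map idea follows, assuming the receiver draws a fresh table per stored file and keeps it to itself. The function names are made up for illustration, and how the OP actually generates or stores the table is not stated in the thread.

```python
import secrets

def new_table() -> bytes:
    """Return a random permutation of the 256 byte values (the per-file key)."""
    values = list(range(256))
    secrets.SystemRandom().shuffle(values)   # OS-backed randomness
    return bytes(values)

def invert(table: bytes) -> bytes:
    """Build the table that undoes `table`."""
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain
    return bytes(inverse)

# Made-up chunk that starts like a Windows executable header.
chunk = b"MZ\x90\x00 pretend this is an uploaded chunk"

table = new_table()
on_disk = chunk.translate(table)              # what gets written to disk
restored = on_disk.translate(invert(table))   # what is sent back on request

assert restored == chunk
```

`bytes.translate` runs in C, which fits the report that this performs well on a Raspberry Pi. It is obfuscation against a sender who never learns the table, not encryption against someone who can read the disk.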
Re: Pure Python Data Mangling or Encrypting
On 06/24/2015 08:33 PM, Dennis Lee Bieber wrote: On Wed, 24 Jun 2015 13:20:07 -0500, Randall Smith rand...@tnr.cc declaimed the following: On 06/24/2015 06:36 AM, Steven D'Aprano wrote: I don't understand how mangling the data is supposed to protect the recipient. Don't they have the ability to unmangle the data, and thus expose themselves to whatever nasties are in the files? They never look at the data and wouldn't care to unmangle it. The purpose is primarily to prevent automated software (file indexers, virus scanners) from doing bad things to the data. Which leads to the question: what is doing bad things. Storage nodes are computers running the software in discussion, that store chunks of data they are sent (recipient) and send it upon request. Their job (as related to this software) is to accept, store and send chunks of data upon request. So losing data is a bad thing. The storage node software is cross platform and should run on anything from a dedicated Raspberry PI to an old Windows PC. Data integrity is ensured using encryption and hashes generated by the original data owners. Normally, a data chunk would look like random bytes, because it is encrypted. However, the storage node cannot prevent the client (uploader) from sending unencrypted data. The purpose of this obfuscation is to protect the storage node, as many potential users have expressed hesitation in storing other people's data. Example: A storage node runs a Desktop OS with an image indexer. It receives an unencrypted nasty image or movie. The indexer picks it up and shows it in the person's image or movie Library. Does that clear things up? -Randall -- https://mail.python.org/mailman/listinfo/python-list
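The owner-generated hashes mentioned above could work along these lines. This is only a sketch: the thread does not say which hash function is used, so SHA-256 is an assumption, and `chunk_digest` is a hypothetical helper.

```python
import hashlib
import hmac

def chunk_digest(chunk: bytes) -> str:
    """Hex digest the data owner records before uploading a chunk (SHA-256 assumed)."""
    return hashlib.sha256(chunk).hexdigest()

chunk = b"already-encrypted chunk bytes"
recorded = chunk_digest(chunk)            # kept by the owner, not by the node

# Later: the storage node returns the chunk and the owner re-checks it.
returned = chunk
assert hmac.compare_digest(chunk_digest(returned), recorded)

# Any change made while the chunk was stored shows up as a mismatch.
tampered = chunk[:-1] + b"X"
assert chunk_digest(tampered) != recorded
```

Because the owner verifies the bytes they get back, whatever reversible mangling the node applies on disk stays invisible to the client as long as it is undone before sending.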
Re: Pure Python Data Mangling or Encrypting
On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote: On 2015-06-25, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote: If it's encrypted malware, and you can't decrypt it, there's no threat. If the *only* threat is that the sender will send malware, you can mitigate around that by dropping the file in an unencrypted container. Anything good enough to prevent Windows from executing the code, accidentally or deliberately, say, a tar file with a custom extension. That won't stop virus scanners etc potentially making their own minds up about the file. *shrug* Sure, but I was specifically referring to the risk of the malware being executed, not being detected by a virus scanner. Encrypting the file won't even necessarily stop the virus scanner from finding false positives. It might even increase the chances. But it will prevent the virus scanner from finding actual viruses. You may or may not consider that a problem. But encrypting the file is also a good solution, and it prevents the storage machine spying on the file contents too. Provided the encryption is strong. How would the receiver encrypting the file after receiving it prevent the receiver from seeing what's in the file? I didn't say it ought to be encrypted by the receiver. Obviously the encryption needs to be done in a way that the recipient doesn't get access to the key. The obvious way to do that is for the application to encrypt the data before it sends it. Then the receiver just writes the encrypted bytes directly to a file. That would have the benefit of protecting against man-in-the-middle attacks as well, since the file is never transmitted in the clear. The original post said that the sender will usually send files they encrypted, unless they are malicious. So if the sender wants them to be encrypted, they already are. The OP *hopes* that the sender will encrypt the files. 
I think that's a vanishingly faint hope, unless the application itself encrypts the file. Yes, the application itself encrypts the file. Haven't you been reading what he's saying? I have been reading what the OP has been saying. I'm not sure if you have been. The OP doesn't want to encrypt the file, because he wants the application to be pure Python and encryption in pure Python is too slow. So he wants to obfuscate it with some sort of substitution cipher or equivalent, which may be easily crackable by anyone who really wants to. I've been arguing that the application *should* encrypt the file, and not mess about giving the illusion of security. The sender has a copy of the application? Then they can see the type of obfuscation used. If they know the key, or can guess it, they can take their malware, *decrypt* it, and send that, so that *encrypting* that file puts the malicious code on the disk. Not if they don't know the key they can't. If they know the key, or can guess it, ... Not if they don't know the key they can't. Really? Glad you're around to point that out to me. But seriously, they have the application. If the application is using a symmetric substitution cipher, it needs the key (because there is only one), so the receiver will have the cipher. With the sort of substitution cipher the OP is experimenting with, forcing a particular result is trivially easy. The sender has access to the application, knows the cipher, knows the key, and can easily generate a file which will generate whatever content the sender wants after being obfuscated. Modern ciphers like AES are quite resistant to that sort of attack. There is, so far as I know, no way to generate a file which results in a specific content after encryption. E.g. suppose I want to send you an insult, but I know your program automatically ROT-13s the strings I send you. 
Then I send you: 'lbhe sngure fzryyf bs ryqreoreevrf' and your program ROT-13s it to: 'your father smells of elderberries' I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger. Replace ROT-13 with ROT-n where 'n' is a secret known only to the receiver, and suddenly it's not such a bad method of obfuscation. There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation. Improve it to the random-translation-map method he's actually using and you've got really quite a reasonable system. No, truly you haven't. The OP is experimenting with bytearray.translate, which likely makes it a monoalphabetic substitution cipher, and the techniques for cracking those go back to the 9th century AD. That's over a thousand years of experience in cracking these things. The situation is a bit harder than the sort of traditional ciphers, instead of using an alphabet
Re: Pure Python Data Mangling or Encrypting
On 2015-06-25, Steven D'Aprano st...@pearwood.info wrote: On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote: That won't stop virus scanners etc potentially making their own minds up about the file. *shrug* Sure, but I was specifically referring to the risk of the malware being executed, not being detected by a virus scanner. Encrypting the file won't even necessarily stop the virus scanner from finding false positives. It might even increase the chances. That seems spectacularly unlikely. But it will prevent the virus scanner from finding actual viruses. You may or may not consider that a problem. The OP would consider it a benefit. I didn't say it ought to be encrypted by the receiver. Obviously the encryption needs to be done in a way that the recipient doesn't get access to the key. No, you're still misunderstanding. The encryption needs to be done in a way that the *sender* doesn't get access to the key. The recipient has access to it by definition because the recipient chooses it. The obvious way to do that is for the application to encrypt the data before it sends it. Yes, he already said the application does that. The problem is, what if the sender is not the genuine application but is instead a malicious attacker? Then the receiver just writes the encrypted bytes directly to a file. That's precisely what he's trying to avoid. That would have the benefit of protecting against man-in-the-middle attacks as well, since the file is never transmitted in the clear. With what he's talking about, the file after encryption is never transmitted *at all*. I've been arguing that the application *should* encrypt the file, and not mess about giving the illusion of security. You haven't understood the threat model. But seriously, they have the application. If the application is using a symmetric substitution cipher, it needs the key (because there is only one), so the receiver will have the cipher. There is not only one key. 
The recipient would invent a new key for each file after the file is received. With the sort of substitution cipher the OP is experimenting with, forcing a particular result is trivially easy. The sender has access to the application, knows the cipher, knows the key, and can easily generate a file which will generate whatever content the sender wants after being obfuscated. No, because the sender does not know the key. Replace ROT-13 with ROT-n where 'n' is a secret known only to the receiver, and suddenly it's not such a bad method of obfuscation. There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation. Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Improve it to the random-translation-map method he's actually using and you've got really quite a reasonable system. No, truly you haven't. The OP is experimenting with bytearray.translate, which likely makes it a monoalphabetic substitution cipher, and the techniques for cracking those go back to the 9th century AD. Only if you have the ciphertext, which the attacker in this scenario does not. The attacker gets to set the plaintext, knows the algorithm, does not know the key (unless the method of choosing the key has a flaw), and wants to set the ciphertext to some specific string. Frequency analysis doesn't even begin to apply to this scenario. You're relying on security by obscurity No, he really isn't. The use case is pretty obvious (a peer-to-peer dropbox type thing) but it does appear to be being misunderstood. This isn't actually a crypto problem at all and users taking short-cuts isn't an issue. Yes it is. 
If users don't properly pre-encrypt their files before sending it out to the cloud, AND THEY WON'T, Yes they will. He said his application encrypts the files for them, presumably he is indeed using proper crypto for that. receivers WILL be able to read those files, That's a problem for the sender not the receiver. -- https://mail.python.org/mailman/listinfo/python-list
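The "256 times as much data" remark about ROT-n can be made concrete. In this sketch (a byte-wise ROT-n with made-up names and a made-up target, not the OP's scheme), a sender who does not know n submits one candidate per possible key, and exactly one of them lands on disk as the intended bytes.

```python
def rot_n(data: bytes, n: int) -> bytes:
    """Add n to every byte, modulo 256 (a byte-wise ROT-n)."""
    return bytes((b + n) % 256 for b in data)

target = b"#!/bin/sh\n"     # bytes the sender would like to see on disk

# One candidate upload per possible key: pre-shift the target by -k.
candidates = [rot_n(target, -k) for k in range(256)]

secret_n = 201              # known only to the receiver in this scenario
hits = [c for c in candidates if rot_n(c, secret_n) == target]

# Exactly one of the 256 uploads comes out as the target after mangling.
assert len(hits) == 1
```

With a full 256-byte translation table the same covering strategy would need one upload per permutation, which is the sense in which the table is claimed to make the attack impractical for a sender who never sees the stored result.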
Re: Pure Python Data Mangling or Encrypting
On 06/24/2015 11:27 PM, Devin Jeanpierre wrote: On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano st...@pearwood.info wrote: But just sticking to the three above, the first one is partially mitigated by allowing virus scanners to scan the data, but that implies that the owner of the storage machine can spy on the files. So you have a conflict here. If it's encrypted malware, and you can't decrypt it, there's no threat. Honestly, the *only* real defence against the spying issue is to encrypt the files. Not obfuscate them with a lousy random substitution cipher. The storage machine can keep the files as long as they like, just by making a copy, and spend hours bruteforcing them. They *will* crack the substitution cipher. In pure Python, that may take a few days or weeks; in C, hours or days. If they have the resources to throw at it, minutes. Substitution ciphers have not been effective encryption since, oh, the 1950s, unless you use a one-time pad. Which you won't be. The original post said that the sender will usually send files they encrypted, unless they are malicious. So if the sender wants them to be encrypted, they already are. While the data senders are supposed to encrypt data, that's not guaranteed, and I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk. The cipher is just to keep the sender from being able to control what is on disk. I am usually very oppositional when it comes to rolling your own crypto, but am I alone here in thinking the OP very clearly laid out their case? -- Devin Thanks Devin. You understand the issue perfectly despite my limited description of the system. I've fully implemented and performance tested your suggested solution and am quite happy with it. Though the issue is solved, I would be glad to listen to any remaining criticisms, suggestions or questions. --Randall -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
Thanks Jon. I couldn't have answered those questions better myself, and I wrote the software in question. I didn't intend to describe the entire system, but rather just enough of it to present the issue at hand. You seem to understand it quite well. I'm now using a randomly generated 256 byte translation table, which performs very well on the lowly Raspberry PI ARM chip. The Raspberry PI is to be my recommended storage node platform. For those that care, the storage system is something like Amazon S3, except storage is distributed peer to peer. Clients compress, encrypt, and chunk data, then send it to storage nodes. Storage nodes propagate the data. Encryption and Authentication are handled through TLS. Files use AES encryption for storage. Storage Nodes are monitored for availability, integrity, and performance. Data transfers are coordinated by a centralized service which tracks storage and transfers. Redundancy is configurable by chunk. Storage nodes are compensated for storage x time. Uploads and downloads can utilize several storage nodes simultaneously to increase throughput. -Randall On 06/25/2015 10:26 AM, Jon Ribbens wrote: On 2015-06-25, Steven D'Aprano st...@pearwood.info wrote: On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote: That won't stop virus scanners etc potentially making their own minds up about the file. *shrug* Sure, but I was specifically referring to the risk of the malware being executed, not being detected by a virus scanner. Encrypting the file won't even necessarily stop the virus scanner from finding false positives. It might even increase the chances. That seems spectacularly unlikely. But it will prevent the virus scanner from finding actual viruses. You may or may not consider that a problem. The OP would consider it a benefit. I didn't say it ought to be encrypted by the receiver. Obviously the encryption needs to be done in a way that the recipient doesn't get access to the key. No, you're still misunderstanding. 
The encryption needs to be done in a way that the *sender* doesn't get access to the key. The recipient has access to it by definition because the recipient chooses it. The obvious way to do that is for the application to encrypt the data before it sends it. Yes, he already said the application does that. The problem is, what if the sender is not the genuine application but is instead a malicious attacker? Then the receiver just writes the encrypted bytes directly to a file. That's precisely what he's trying to avoid. That would have the benefit of protecting against man-in-the-middle attacks as well, since the file is never transmitted in the clear. With what he's talking about, the file after encryption is never transmitted *at all*. I've been arguing that the application *should* encrypt the file, and not mess about giving the illusion of security. You haven't understood the threat model. But seriously, they have the application. If the application is using a symmetric substitution cipher, it needs the key (because there is only one), so the receiver will have the cipher. There is not only one key. The recipient would invent a new key for each file after the file is received. With the sort of substitution cipher the OP is experimenting with, forcing a particular result is trivially easy. The sender has access to the application, knows the cipher, knows the key, and can easily generate a file which will generate whatever content the sender wants after being obfuscated. No, because the sender does not know the key. Replace ROT-13 with ROT-n where 'n' is a secret known only to the receiver, and suddenly it's not such a bad method of obfuscation. There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation. 
Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Improve it to the random-translation-map method he's actually using and you've got really quite a reasonable system. No, truly you haven't. The OP is experimenting with bytearray.translate, which likely makes it a monoalphabetic substitution cipher, and the techniques for cracking those go back to the 9th century AD. Only if you have the ciphertext, which the attacker in this scenario does not. The attacker gets to set the plaintext, knows the algorithm, does not know the key (unless the method of choosing the key has a flaw), and wants to set the ciphertext to some specific string. Frequency analysis doesn't even begin to apply to this scenario. You're relying on security by obscurity No, he really isn't. The use case is pretty obvious (a peer-to-peer dropbox type thing) but it does appear to be being
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily. The day's key for a given network, with the Luftwaffe easily being the worst offenders. Some networks remained unbroken at the end of WWII. I was massively oversimplifying, here. But there's a reason that modern crypto doesn't use str.translate() level ciphers. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On 26/06/2015 03:06, Chris Angelico wrote: On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily. The day's key for a given network, with the Luftwaffe easily being the worst offenders. Some networks remained unbroken at the end of WWII. I was massively oversimplifying, here. But there's a reason that modern crypto doesn't use str.translate() level ciphers. ChrisA I should know. Ever heard of DISCON? Like to hazard a guess as to who worked on it all those years ago? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 11:01 AM, Ian Kelly ian.g.ke...@gmail.com wrote: On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico ros...@gmail.com wrote: On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens jon+use...@unequivocal.co.uk wrote: Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Utterly impractical? chomp analysis You're making the same mistake that Steven did in misunderstanding the threat model. To be honest, I wasn't actually answering anything about the original threat model, but only responding to the statement that a 256-byte anything-to-anything cipher is somehow incredibly secure. It isn't, but that might not be a problem for the original purpose. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 12:24 PM, Mark Lawrence breamore...@yahoo.co.uk wrote: On 26/06/2015 03:06, Chris Angelico wrote: On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily. The day's key for a given network, with the Luftwaffe easily being the worst offenders. Some networks remained unbroken at the end of WWII. I was massively oversimplifying, here. But there's a reason that modern crypto doesn't use str.translate() level ciphers. ChrisA I should know. Ever heard of DISCON? Like to hazard a guess as to who worked on it all those years ago? No, not familiar with it. But I'm guessing you have the crypto background to know all this stuff, which means you aren't the sort of person I need to explain things to. Great! :) ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens jon+use...@unequivocal.co.uk wrote:

There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation.

Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical.

Utterly impractical? Maybe, if you attempt a pure brute-force approach - there are 256! possible translation tables, which is roughly e500 attempts [1], and at roughly four a microsecond [2] that'd still take a ridiculously long time.

But there are two gigantic optimizations you could do. Firstly, there are frequency-based attacks, and byte value duplicates will tell you a lot - classic cryptographic work. And secondly, you can simply take the first few bytes of a file - let's say 16, although a lot of files can be recognized in less than that. Even if there are no duplicate bytes, that'd be a maximum of 16! translation tables that truly matter, or just 2e13. At the same speed, that makes about a million seconds of computing time required. Divide that across a bunch of separate computers (the job is embarrassingly parallel after all), and you could get that result pretty easily. Cut the prefix to just 8 bytes and you have a mere 40K encryption keys to try - so quick that you wouldn't even see it happen.

Nope, a simple substitution cipher is still not secure. Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily.

ChrisA

[1] It's actually closer to 8.6e506, if you care.
[2] timeit result from my laptop - you could do better, but that's a reasonable average
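As a rough check of the figures quoted in the message above, here is a minimal sketch that recomputes 256!, 16! and 8! and converts them to time at the "roughly four a microsecond" rate. The rate constant and the helper name `sci` are assumptions made for illustration; they are not from the original post. The sketch only reproduces the arithmetic and takes no position on whether 16! is the right count for a 16-byte prefix.

```python
import math

# Assumed rate, taken from the "roughly four a microsecond" timeit figure above.
TRIALS_PER_SECOND = 4_000_000

def sci(n):
    """Format a positive integer as d.ddde<exp>, truncating (not rounding).

    Works on the decimal string, because 256! is far too large for a float.
    """
    digits = str(n)
    exponent = len(digits) - 1
    mantissa = digits[0] + "." + digits[1:4].ljust(3, "0")
    return f"{mantissa}e{exponent}"

all_tables = math.factorial(256)  # every possible 256-byte translation table
prefix_16 = math.factorial(16)    # the 16! figure used in the post
prefix_8 = math.factorial(8)      # the 8! figure used in the post

print("256! =", sci(all_tables))
print("16!  =", sci(prefix_16), "->", prefix_16 // TRIALS_PER_SECOND, "seconds")
print("8!   =", prefix_8, "->", prefix_8 / TRIALS_PER_SECOND, "seconds")
```

At the assumed rate, 16! trials come to roughly five million seconds, a few times larger than the "about a million" estimate but still well within reach of a parallel job.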
Re: Pure Python Data Mangling or Encrypting
On 26/06/2015 01:33, Chris Angelico wrote: On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens jon+use...@unequivocal.co.uk wrote: There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation. Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Utterly impractical? Maybe, if you attempt a pure brute-force approach - there are 256! possible translation tables, which is roughly e500 attempts [1], and at roughly four a microsecond [2] that'd still take a ridiculously long time. But there are two gigantic optimizations you could do. Firstly, there are frequency-based attacks, and byte value duplicates will tell you a lot - classic cryptographic work. And secondly, you can simply take the first few bytes of a file - let's say 16, although a lot of files can be recognized in less than that. Even if there are no duplicate bytes, that'd be a maximum of 16! translation tables that truly matter, or just 2e13. At the same speed, that makes about a million seconds of computing time required. Divide that across a bunch of separate computers (the job is embarrassingly parallel after all), and you could get that result pretty easily. Cut the prefix to just 8 bytes and you have a mere 40K encryption keys to try - so quick that you wouldn't even see it happen. Nope, a simple substitution cipher is still not secure. Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily. 
The day's key for a given network, with the Luftwaffe easily being the worst offenders. Some networks remained unbroken at the end of WWII.

-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico ros...@gmail.com wrote: On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens jon+use...@unequivocal.co.uk wrote: There are only 256 possible values for n, one of which doesn't transform the data at all (ROT-0). If you're thinking of attacking this by pencil and paper, 255 transformations sounds like a lot. For a computer, that's barely harder than a single transformation. Well, it means you need to send 256 times as much data, which is a start. If you're instead using a 256-byte translation table then an attack becomes utterly impractical. Utterly impractical? Maybe, if you attempt a pure brute-force approach - there are 256! possible translation tables, which is roughly e500 attempts [1], and at roughly four a microsecond [2] that'd still take a ridiculously long time. But there are two gigantic optimizations you could do. Firstly, there are frequency-based attacks, and byte value duplicates will tell you a lot - classic cryptographic work. And secondly, you can simply take the first few bytes of a file - let's say 16, although a lot of files can be recognized in less than that. Even if there are no duplicate bytes, that'd be a maximum of 16! translation tables that truly matter, or just 2e13. At the same speed, that makes about a million seconds of computing time required. Divide that across a bunch of separate computers (the job is embarrassingly parallel after all), and you could get that result pretty easily. Cut the prefix to just 8 bytes and you have a mere 40K encryption keys to try - so quick that you wouldn't even see it happen. Nope, a simple substitution cipher is still not secure. Even the famous Enigma machine was a lot more than just letter-for-letter substitution - a double letter in the cleartext wouldn't be represented by a double letter in the result - and once the machine's secrets were figured out, the day's key could be reassembled fairly readily. 
You're making the same mistake that Steven did in misunderstanding the threat model. The goal isn't to prevent the attacker from working out the key for a file that has already been obfuscated. Any real data that might be exposed by a vulnerability in the server is presumed to have already been strongly encrypted by the user. The goal is to prevent the attacker from guessing a key that hasn't even been generated yet, which could be exploited to engineer the obfuscated content into something malicious.

There are no frequency-based attacks possible here, because you can't do frequency analysis on the result of a key that hasn't even been generated yet. Assuming that you have no attack on the key generation itself, the best you can do is send a file deobfuscated with a random key and hope that the recipient randomly chooses the same key; the odds of that happening are 1 in 256!.

That said, I do see a potential weakness here: if the attacker can create a malicious payload using only a subset of the 256 possible byte values, then the odds of getting a correct key are increased, since multiple keys will work. For an extreme example, if the attacker can manage to craft a malicious payload that uses only the two byte values 32 and 47, then the probability of getting a key that will obfuscate to that is increased to 1 in 256! / 254!, or 1 in 65280. If they distribute 65280 copies of that payload to various recipients, then they can expect that one recipient on average will get the payload in its malicious form.
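The odds in the message above generalize to a payload that uses k distinct byte values: only where those k values land under the random table matters, which gives 256! / (256 - k)! equally likely outcomes. A minimal sketch of that arithmetic follows; the function name `outcomes_for_payload` is hypothetical, and a uniformly random 256-byte permutation is assumed as the key.

```python
import math

def outcomes_for_payload(distinct_bytes):
    """Count the equally likely images of `distinct_bytes` distinct byte values
    under a uniformly random permutation of 0..255: 256! / (256 - k)!.

    The attacker's chance of a correct guess is 1 in this number.
    """
    return math.perm(256, distinct_bytes)  # math.perm needs Python 3.8+

for k in (1, 2, 4, 16):
    print(k, "distinct byte values -> 1 in", outcomes_for_payload(k))
```

With k = 2 this reproduces the 1 in 65280 figure, and with k = 256 it falls back to the full 1 in 256!.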
Re: Pure Python Data Mangling or Encrypting
On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:

On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano st...@pearwood.info wrote:

But just sticking to the three above, the first one is partially mitigated by allowing virus scanners to scan the data, but that implies that the owner of the storage machine can spy on the files. So you have a conflict here.

If it's encrypted malware, and you can't decrypt it, there's no threat.

If the *only* threat is that the sender will send malware, you can mitigate around that by dropping the file in an unencrypted container. Anything good enough to prevent Windows from executing the code, accidentally or deliberately, say, a tar file with a custom extension. But encrypting the file is also a good solution, and it prevents the storage machine spying on the file contents too. Provided the encryption is strong.

Honestly, the *only* real defence against the spying issue is to encrypt the files. Not obfuscate them with a lousy random substitution cipher. The storage machine can keep the files as long as they like, just by making a copy, and spend hours bruteforcing them. They *will* crack the substitution cipher. In pure Python, that may take a few days or weeks; in C, hours or days. If they have the resources to throw at it, minutes. Substitution ciphers have not been effective encryption since, oh, the 1950s, unless you use a one-time pad. Which you won't be.

The original post said that the sender will usually send files they encrypted, unless they are malicious. So if the sender wants them to be encrypted, they already are.

The OP *hopes* that the sender will encrypt the files. I think that's a vanishingly faint hope, unless the application itself encrypts the file. Most people don't have any encryption software beyond password-protecting zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools available to break it. Winzip has an extension for 128-bit and 256-bit AES encryption, both of which are probably strong enough unless you're targeted by the NSA, but the weak link in the chain is the idea that people will encrypt the software before sending it. Even if they have the tools, laziness being the defining characteristic of most people, they won't use them.

While the data senders are supposed to encrypt data, that's not guaranteed, and I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk.

The cipher is just to keep the sender from being able to control what is on disk.

The sender has a copy of the application? Then they can see the type of obfuscation used. If they know the key, or can guess it, they can take their malware, *decrypt* it, and send that, so that *encrypting* that file puts the malicious code on the disk. E.g. suppose I want to send you an insult, but I know your program automatically ROT-13s the strings I send you. Then I send you:

'lbhe sngure fzryyf bs ryqreoreevrf'

and your program ROT-13s it to:

'your father smells of elderberries'

I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger.

I am usually very oppositional when it comes to rolling your own crypto, but am I alone here in thinking the OP very clearly laid out their case?

I don't think any of us *really* understand his use-case or the potential threats, but to my way of thinking, you can never have too strong a cipher or underestimate the risk of users taking short-cuts.

-- Steve
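The ROT-13 scenario described above can be reproduced in a few lines. This is only an illustrative sketch: the function names `receiver_obfuscate` and `attacker_prepare` are hypothetical, and it models a receiver whose transform is fixed and publicly known, which is the case the example warns about, not the per-file random key discussed elsewhere in the thread.

```python
import codecs

def receiver_obfuscate(text):
    """Hypothetical receiver that applies a fixed, publicly known transform."""
    return codecs.encode(text, "rot_13")

def attacker_prepare(desired_on_disk):
    """Apply the inverse transform in advance; ROT-13 is its own inverse."""
    return codecs.encode(desired_on_disk, "rot_13")

wanted = "your father smells of elderberries"
sent = attacker_prepare(wanted)    # what the attacker transmits
stored = receiver_obfuscate(sent)  # what ends up on the receiver's disk

print(sent)
print(stored)
```

Because the transform is known ahead of time, the sender fully controls what is stored. The same pre-inversion only works against a secret key if the sender can learn or guess that key.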
Re: Pure Python Data Mangling or Encrypting
On Thu, Jun 25, 2015 at 2:25 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote: The original post said that the sender will usually send files they encrypted, unless they are malicious. So if the sender wants them to be encrypted, they already are. The OP *hopes* that the sender will encrypt the files. I think that's a vanishingly faint hope, unless the application itself encrypts the file. Most people don't have any encryption software beyond password-protecting zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools available to break it. Winzip has an extension for 128-bit and 256-bit AES encryption, both of which are probably strong enough unless you're targeted by the NSA, but the weak link in the chain is the idea that people will encrypt the software before sending it. Even if they have the tools, laziness being the defining characteristic of most people, they won't use them. You're right, I was supposing that since they wrote the server, they also wrote the client, and were just protecting from the protocol itself being weak. I know that the OP doesn't propose using ROT-13, but a classical substitution cipher isn't that much stronger. Yes, it is. It requires the attacker being able to see something about the ciphertext, unlike ROT13. But it is reasonable to suppose that maybe the attacker can trigger the file getting executed, at which point maybe you can deduce from the behavior what the starting bytes are...? I don't think any of us *really* understand his use-case or the potential threats, but to my way of thinking, you can never have too strong a cipher or underestimate the risk of users taking short-cuts. This is truth. It would be nice if something like keyczar came in the stdlib. (Otherwise, users of Python take shortcuts and use randomized substitution ciphers instead of AES.) -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Pure Python Data Mangling or Encrypting
How about a random substitution cipher? This will be ultra-weak, but fast (using bytes.translate/bytes.maketrans) and seems to be the kind of thing you're asking for.

-- Devin

On Tue, Jun 23, 2015 at 12:02 PM, Randall Smith rand...@tnr.cc wrote:

Chunks of data (about 2MB) are to be stored on machines using a peer-to-peer protocol. The recipient of these chunks can't assume that the payload is benign. While the data senders are supposed to encrypt data, that's not guaranteed, and I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk. My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software. Pure Python AES and even DES are just way too slow. I don't know that I really need encryption here, but some type of fast mangling algorithm where a bad actor sending a payload can't guess the output ahead of time. Any ideas are appreciated. Thanks.

-Randall
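A minimal sketch of the random substitution idea suggested above, using only the standard library. The function names and the choice of `random.SystemRandom` for generating the table are assumptions made for illustration, not the original poster's code. As the thread stresses, this is obfuscation meant to keep a sender from predicting the bytes on disk, not confidentiality.

```python
import random

IDENTITY = bytes(range(256))

def new_key():
    """Return a random permutation of all 256 byte values (the secret table)."""
    table = list(range(256))
    # SystemRandom draws from the OS entropy source rather than the
    # seedable default generator, so the table is hard to predict.
    random.SystemRandom().shuffle(table)
    return bytes(table)

def obfuscate(data, key):
    """Map each byte b to key[b]; bytes.translate does this in C."""
    return data.translate(key)

def deobfuscate(data, key):
    """Build the inverse table (key[b] -> b) and apply it."""
    inverse = bytes.maketrans(key, IDENTITY)
    return data.translate(inverse)

key = new_key()  # generated by the recipient after the chunk arrives
chunk = b"example payload \x00\x01\xff"
mangled = obfuscate(chunk, key)
restored = deobfuscate(mangled, key)
```

The recipient would generate a fresh table per chunk and keep it locally, so the sender never sees the key that determines the stored bytes.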
Pure Python Data Mangling or Encrypting
Chunks of data (about 2MB) are to be stored on machines using a peer-to-peer protocol. The recipient of these chunks can't assume that the payload is benign. While the data senders are supposed to encrypt data, that's not guaranteed, and I'd like to protect the recipient against exposure to nefarious data by mangling or encrypting the data before it is written to disk. My original idea was for the recipient to encrypt using AES. But I want to keep this software pure Python batteries included and not require installation of other platform-dependent software. Pure Python AES and even DES are just way too slow. I don't know that I really need encryption here, but some type of fast mangling algorithm where a bad actor sending a payload can't guess the output ahead of time. Any ideas are appreciated. Thanks. -Randall -- https://mail.python.org/mailman/listinfo/python-list