Re: Pure Python Data Mangling or Encrypting

2015-07-01 Thread Randall Smith

On 06/30/2015 01:33 PM, Chris Angelico wrote:

 From the software's point of view, it has two distinct
modes: server, in which it listens on a socket and receives data, and
client, in which it connects to other people's sockets and sends data.
As such, the server mode is the only one that receives untrusted
data from another user and stores it on the hard disk.



That's close.  There are 3 types: storage nodes, client nodes, and 
control nodes.


Communication:

storage node -- control node
storage node -- storage node
client node   -- storage node
client node   -- control node

Data is uploaded by clients and distributed among storage nodes.

Everything is coordinated by the control nodes (plural for redundancy).
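Randall's list of permitted links can be written down as a small lookup. This makes it easy to see, for instance, that clients never talk to other clients directly. This is a hypothetical sketch: the node-type names and `may_connect` are invented for illustration. Control-to-control traffic isn't in the list above, so the sketch rejects it; the real system may differ.

```python
# Hypothetical encoding of the links listed above; names are illustrative only.
ALLOWED_LINKS = {
    frozenset({"storage", "control"}),
    frozenset({"storage"}),            # storage <-> storage (a one-element set)
    frozenset({"client", "storage"}),
    frozenset({"client", "control"}),
}


def may_connect(a: str, b: str) -> bool:
    """Return True if node types a and b are expected to talk directly."""
    return frozenset({a, b}) in ALLOWED_LINKS
```

Using unordered pairs means `may_connect("client", "storage")` and `may_connect("storage", "client")` give the same answer.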

-Randall

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-07-01 Thread alister
On Tue, 30 Jun 2015 23:25:01 +, Jon Ribbens wrote:

 On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
 I don't think there has been much research into keeping at least *some*
 security even when keys have been compromised, apart from as it relates
 to two-factor authentication.
 
 That's because the key is all the secret part. If an attacker knows
 the algorithm, and the key, and the ciphertext, then *by definition* all
 is lost. If you mean keeping the algorithm secret too then that's just
 considered bad crypto.
 
 In the past, and still today among people who don't understand
 Kerckhoffs' principle, people have tried to keep the cipher secret and
 not have a key at all. E.g. atbash, or caesar cipher, which once upon a
 time were cutting edge ciphers, as laughably insecure as they are
 today. If the method was compromised, all was lost.
 
 Caesar cipher has a key. It's just very small, so is easy to guess.
 
 Today, if the key is compromised, all is lost. Is it possible that
 there are ciphers that are resistant to discovery of the key? Obviously
 if you know the key you can read encrypted messages, that's what the
 key is for, but there are scenarios where you would want security to
 degrade gracefully instead of in a brittle all-or-nothing manner:

 - even if the attacker can read my messages, he cannot tamper with
   them or write new ones as me.
 
 I suppose that could be achieved by having separate encryption and
 signing keys, but you could do the same but better by encrypting with
 multiple algorithms. It's not an unstudied area:
 https://en.wikipedia.org/wiki/Multiple_encryption

The kipper flies at Midnight

(from almost every WWII spy movie ever)
even if this message is decoded it is meaningless unless the attacker 
also has the meanings of the Code phrases
(which would mean your agent had been captured anyway)



-- 
That's the funniest thing I've ever heard and I will _not_ condone it.
-- DyerMaker, 17 March 2000 MegaPhone radio show


Re: Pure Python Data Mangling or Encrypting

2015-07-01 Thread Gregory Ewing

Randall Smith wrote:


 Worst case, something that looks like this would land on the disk.

crc32 checksum + translation table + malware


It would be safer to add something to both the
beginning *and* end of the file. Some file formats,
e.g. zip, pdf, are designed to be read starting
from the end.

So I would suggest something like

  crc32 checksum + payload + translation table
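Here is a minimal sketch of Greg's ordering. It assumes a 4-byte big-endian CRC32 computed over the payload plus the table, and a file name built from 18 random bytes in URL-safe base64 (Randall only says "a generated base64 name and no extension"). None of these names or sizes come from the actual application. With this ordering, the stored file ends with a locally generated table rather than with bytes the uploader chose.

```python
import base64
import os
import zlib

TABLE_SIZE = 256  # size of the byte-translation table


def pack(payload: bytes, decode_table: bytes) -> bytes:
    """Lay out a record as: crc32 checksum + payload + translation table."""
    if len(decode_table) != TABLE_SIZE:
        raise ValueError("translation table must be 256 bytes")
    body = payload + decode_table
    # Checksum covers payload and table; stored as 4 big-endian bytes.
    return zlib.crc32(body).to_bytes(4, "big") + body


def unpack(blob: bytes) -> tuple[bytes, bytes]:
    """Verify the checksum, then split the record into payload and table."""
    stored = int.from_bytes(blob[:4], "big")
    body = blob[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: possible disk corruption")
    return body[:-TABLE_SIZE], body[-TABLE_SIZE:]


def random_name() -> str:
    """Random URL-safe base64 file name with no extension (24 characters)."""
    return base64.urlsafe_b64encode(os.urandom(18)).decode("ascii")
```

`unpack(pack(payload, table))` returns the original pair, and a single flipped bit in the stored blob raises `ValueError`. This only catches accidental corruption. As discussed elsewhere in the thread, it is no defense against deliberate tampering.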

--
Greg


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:
 Not sure why you posted the link.  The crc32 checksum is just to check
 for possible filesystem corruption.  The system does periodic data
 corruption checks.  BTRFS uses crc32 checksums also.  Please explain.

 The file system can trust that anything writing to a file is allowed to
 write to it, it doesn't have to defend against malicious writes. As I
 understand it, your application does.

 Here is the attack scenario I have in mind:

 - you write a file to my computer, and neglect to encrypt it;

Eh? The game is over right there. I don't trust you, and yet
I have just given you my private data, unencrypted. Checksums
don't even come into it, we have failed utterly at step 1.

 - since you are using CRC, it is quite easy for me to ensure the 
   checksums match after inserting malware;

No, you have yet *again* misunderstood the difference between the
client and the server.

 I was wrong: cryptographically strong ciphers are generally NOT resistant to
 what I described as a preimage attack. If the key leaks, using AES won't
 save you: an attacker with access to the key can produce a ciphertext that
 decrypts to the malware of his choice, regardless of whether you use
 AES-256 or rot-13. There may be other encryption methods which don't suffer
 from that, but he doesn't know of them off the top of his head.

lol. I suspected as much. You and Johannes were even more wrong than
was already obvious.

 The other threat I mentioned is that the receiver will read the content of
 the file. For that, a strong cipher is much to be preferred over a weak
 one, and it needs to be encrypted by the sending end, not the receiving
 end. (If the receiving end does it, it has to keep the key so it can
 decrypt before sending back, which means the computer's owner can just grab
 the key and read the files.)

Yes, that is utterly basic.


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Randall Smith

On 06/29/2015 10:00 PM, Steven D'Aprano wrote:

On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:


Not sure why you posted the link.  The crc32 checksum is just to check
for possible filesystem corruption.  The system does periodic data
corruption checks.  BTRFS uses crc32 checksums also.  Please explain.


The file system can trust that anything writing to a file is allowed to
write to it, it doesn't have to defend against malicious writes. As I
understand it, your application does.

Here is the attack scenario I have in mind:

- you write a file to my computer, and neglect to encrypt it;
- and record the checksum for later;
- I insert malware into your file;
- you retrieve the file from me;
- if the checksum matches what you have on record, you accept the file;
- since you are using CRC, it is quite easy for me to ensure the
   checksums match after inserting malware;
- and I have now successfully injected malware into your computer.

I'm making an assumption here -- I assume that the sender records a checksum
for uploaded files so that when they get something back again they can tell
whether or not it is the same content they uploaded.


Yes.  The client software computes sha256 checksums.



* * *

By the way, regarding the use of a substitution cipher, I spoke to the
crypto guy at work, and preimage attack is not quite the right
terminology, since that's normally used in the context of hash functions.
It's almost a known ciphertext attack, but not quite, since that
terminology refers to guessing the key from the ciphertext.

I was wrong: cryptographically strong ciphers are generally NOT resistant to
what I described as a preimage attack. If the key leaks, using AES won't
save you: an attacker with access to the key can produce a ciphertext that
decrypts to the malware of his choice, regardless of whether you use
AES-256 or rot-13. There may be other encryption methods which don't suffer
from that, but he doesn't know of them off the top of his head.

His comment was, don't leak the key.


I'm pretty sure all encryption hinges on guarding the key.



The other threat I mentioned is that the receiver will read the content of
the file. For that, a strong cipher is much to be preferred over a weak
one, and it needs to be encrypted by the sending end, not the receiving
end. (If the receiving end does it, it has to keep the key so it can
decrypt before sending back, which means the computer's owner can just grab
the key and read the files.)




And again, that's why the client (data owner) uses AES.




Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote:
 Eh? The game is over right there. I don't trust you, and yet
 I have just given you my private data, unencrypted.

 Yes. That is exactly the problem. If the application doesn't encrypt the
 data for me, *it isn't going to happen*. We are in violent agreement that
 the sender needs to encrypt the data.

It's a good thing that he's said it will then.

 Randall has suggested that encryption is optional.

No he hasn't. You just keep creatively misreading what he says, for
some reason.

 It's not unreasonable to raise this issue.

It is unreasonable to raise it over and over again however,
especially when there's no reason at all to think it's relevant,
and nothing has changed from the last time you raised it.

 We can mitigate against the second attack by using a cryptographically
 strong hash function to detect tampering.

Not on the server you can't. If the attacker can edit the files he can
edit the hashes too.

 These *are* resistant to preimage attacks. If I give you a SHA512
 checksum, there is no known practical method to generate a file with
 that same checksum. If I give you a CRC checksum, you can.

Randall didn't suggest any usage of CRCs where preimage attacks are
relevant. You just made that bit up.

 - since you are using CRC, it is quite easy for me to ensure the
   checksums match after inserting malware;
 
 No, you have yet *again* misunderstood the difference between the
 client and the server.

 This was described as a peer-to-peer application. You even stated that it
 was a pretty obvious use-case, a peer-to-peer dropbox. So is it
 peer-to-peer or client-server?

Both. It sounds a bit like there are clients which upload files
to a cloud of servers which are peers of each other. But seriously,
is this the source of all your confusion? Even if all the nodes
are pure peers (which it doesn't sound like they are), any
particular file will still have a source node which is therefore
the client for that file. You're trying to draw a hard distinction
where there is none.

 lol. I suspected as much. You and Johannes were even more wrong than
 was already obvious.

 You suspected as much? Such a pity you didn't speak up earlier and
 explain that cryptographic ciphers aren't generally resistant to
 preimage attacks.

I think you're misusing that phrase. But taking what you meant,
I suspected it was true (would they be resistant, after all?)
but I couldn't be bothered to check because the whole crypto bit
was a complete red-herring in the first place. The original discussion
wasn't about crypto, all the discussion about that was only because
you and Johannes wrongly insisted it was necessary.


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Steven D'Aprano
On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote:

 On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:
 Not sure why you posted the link.  The crc32 checksum is just to check
 for possible filesystem corruption.  The system does periodic data
 corruption checks.  BTRFS uses crc32 checksums also.  Please explain.

 The file system can trust that anything writing to a file is allowed to
 write to it, it doesn't have to defend against malicious writes. As I
 understand it, your application does.

 Here is the attack scenario I have in mind:

 - you write a file to my computer, and neglect to encrypt it;
 
 Eh? The game is over right there. I don't trust you, and yet
 I have just given you my private data, unencrypted.

Yes. That is exactly the problem. If the application doesn't encrypt the
data for me, *it isn't going to happen*. We are in violent agreement that
the sender needs to encrypt the data.

I think Randall has been somewhat less than clear about what the application
actually does and how it works. He probably thinks he doesn't need to
explain, that it's none of our business, and wishes we'd just shut up about
it. That's his right.

It's also my right to discuss the possible security implications of some
hypothetical peer-to-peer dropbox-like application which may, or may not,
be similar to Randall's application. Whether Randall learns anything from
that discussion, or just tunes it out, is irrelevant. I've already learned
at least one thing from this discussion, so as far as I'm concerned it's a
win.

Randall has suggested that encryption is optional. It isn't clear whether he
means there is an option to turn encryption off, or whether he means I can
hack the application and disable it, or write my own application. I don't
expect him to be responsible for rogue applications that have been hacked
or written independently, which (out of malice or stupidity) don't encrypt
the uploaded data. But I think that it is foolish to support an unencrypted
mode of operation.

It's not unreasonable to raise this issue. The default state of security
among IT professionals is something worse than awful:

https://medium.com/message/everything-is-broken-81e5f33a24e1

One of Australia's largest ISPs was recently hacked, and guess how they
stored their customers' passwords? Yes, you got it: in plain text. There is
no security mistake so foolish that IT professionals won't make it.


 Checksums don't even come into it, we have failed utterly at step 1.

*shrug*

You're right. But having failed at step 1, there are multiple attacks that
can follow. The first attack is the obvious one: the ability to read the
unencrypted data.

If you can trick me into turning encryption off (say, you use a social
engineering attack on me and convince me to delete the virus crypto.py),
then I might inadvertently upload unencrypted data to you. Or maybe you
find an attack on the application that can fool it into dropping down to
unencrypted mode. If there's no unencrypted mode in the first place, that's
much harder.

Earlier, Chris suggested that the application might choose to import the
crypto module, and if it's not available, just keep working without
encryption. This hypothetical attack demonstrates that this would be a
mistake. It's hard for an attacker to convince a naive user to open up the
application source code and edit the code. It's easier to convince them to
delete a file.

Or, the application just has a bug in it. It accidentally flips the sense of
the "use encryption" flag. That's a failure mode that simply cannot occur
if there is no such flag in the first place.
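One way to avoid the fail-open behavior described above is to make the encryption import mandatory, so a missing module stops the program instead of silently switching to plaintext. This is a sketch under assumptions: `load_cipher_module` is a hypothetical name, and the thread doesn't say how the real application loads its AES code.

```python
import importlib
from types import ModuleType


def load_cipher_module(name: str) -> ModuleType:
    """Import the encryption module or refuse to continue.

    There is deliberately no "run unencrypted" fallback: if the module is
    missing (deleted, renamed, broken), the program stops instead of
    quietly uploading plaintext.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # includes ModuleNotFoundError
        raise RuntimeError(
            f"encryption module {name!r} is unavailable; refusing to upload"
        ) from exc


# The fail-open pattern being argued against looks like this:
#
#     try:
#         import crypto
#     except ImportError:
#         crypto = None   # ...and later: "if crypto: data = encrypt(data)"
```

Because there is no flag to flip, there is also no flag for a bug to flip.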

If our attacker has managed to disable encryption in the sender's
application, then they can not only read my data, but tamper with it. These
are *separate attacks* with the same underlying cause. I can mitigate one
without mitigating the other.

We can mitigate against the second attack by using a cryptographically
strong hash function to detect tampering. These *are* resistant to preimage
attacks. If I give you a SHA512 checksum, there is no known practical
method to generate a file with that same checksum. If I give you a CRC
checksum, you can.

(Naturally the checksum has to be under the sender's control. If the
receiver has the checksum and the data, they can just replace the checksum
with one of their choosing.)

That's a separate issue from detecting non-malicious data corruption,
although of course a SHA512 checksum will detect that as well.
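The mitigation, with the digest held by the sender as the parenthetical above requires, can be sketched with the standard library. Randall says in this thread that his client computes SHA-256, so SHA-256 is used here rather than SHA-512. The function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest the *sender* computes before upload and keeps locally."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, recorded: str) -> bool:
    """Accept a retrieved file only if it matches the locally kept digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(fingerprint(data), recorded)
```

The storage node never sees `recorded`, so editing both the file and "the hash" on the server gains the attacker nothing.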


 - since you are using CRC, it is quite easy for me to ensure the
   checksums match after inserting malware;
 
 No, you have yet *again* misunderstood the difference between the
 client and the server.

This was described as a peer-to-peer application. You even stated that it
was a pretty obvious use-case, a peer-to-peer dropbox. So is it
peer-to-peer or client-server?

In any case, since Randall has refused to go into specific details of how
his application 

Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Randall Smith

On 06/29/2015 03:49 PM, Jon Ribbens wrote:

On 2015-06-29, Randall Smith rand...@tnr.cc wrote:

Same reason newer filesystems like BTRFS use checksums (BTRFS uses
CRC32).  The storage machine runs periodic file integrity checks.  It
has no control over the underlying filesystem.


True, but presumably neither does it have anything it can do to
rectify the situation if it finds a problem, and the client will
have to keep its own secure hash of its file anyway. (Unless I suppose
the server actually can request a new copy from the client or another
server if it finds a corrupt file?)



Yes.  The storage servers are monitored for integrity.  They can request 
a new copy, though frequent corruption results in the server being 
marked as unreliable.
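The monitoring Randall describes (periodic checks, re-fetching damaged copies, flagging nodes that corrupt data too often) might look roughly like this. The record layout (a 4-byte big-endian CRC32 header), the class and method names, and the threshold of three are all assumptions for illustration. CRC32 is adequate here because this check only targets accidental corruption.

```python
import zlib

MAX_CORRUPT = 3  # hypothetical threshold, not from the thread


def make_record(data: bytes) -> bytes:
    """4-byte big-endian CRC32 header followed by the data (assumed layout)."""
    return zlib.crc32(data).to_bytes(4, "big") + data


class StorageNode:
    def __init__(self, records: dict[str, bytes]) -> None:
        self.records = records
        self.corrupt_seen = 0
        self.unreliable = False

    def scan(self) -> list[str]:
        """Check every record; return the names that need a fresh copy."""
        bad = [
            name for name, blob in self.records.items()
            if zlib.crc32(blob[4:]) != int.from_bytes(blob[:4], "big")
        ]
        self.corrupt_seen += len(bad)
        if self.corrupt_seen >= MAX_CORRUPT:
            # Frequent corruption: flag the node as unreliable.
            self.unreliable = True
        return bad

    def replace(self, name: str, data: bytes) -> None:
        """Store a fresh copy obtained from the client or another storage node."""
        self.records[name] = make_record(data)
```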


-Randall



Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 4:17 AM, Steven D'Aprano st...@pearwood.info wrote:
 If you can trick me into turning encryption off (say, you use a social
 engineering attack on me and convince me to delete the virus crypto.py),
 then I might inadvertently upload unencrypted data to you. Or maybe you
 find an attack on the application that can fool it into dropping down to
 unencrypted mode. If there's no unencrypted mode in the first place, that's
 much harder.

 Earlier, Chris suggested that the application might choose to import the
 crypto module, and if it's not available, just keep working without
 encryption. This hypothetical attack demonstrates that this would be a
 mistake. It's hard for an attacker to convince a naive user to open up the
 application source code and edit the code. It's easier to convince them to
 delete a file.

And I'm sure Steven knows about this, but if anyone else isn't
convinced that this is a serious vulnerability, look into various
forms of downgrade attack, such as the recent POODLE. Security doesn't
exist if an attacker can convince your program to turn it off without
your knowledge.

 - since you are using CRC, it is quite easy for me to ensure the
   checksums match after inserting malware;

 No, you have yet *again* misunderstood the difference between the
 client and the server.

 This was described as a peer-to-peer application. You even stated that it
 was a pretty obvious use-case, a peer-to-peer dropbox. So is it
 peer-to-peer or client-server?

I've never managed to get any sort of grasp of what this application
actually *is*, but peer-to-peer Dropbox is certainly something that
it *might be*. It could be simultaneously peer-to-peer from the
human's point of view, and client-server from the application's -
imagine BitTorrent protocol, but where one end connects to a socket
that the other's listening on, and the active socket always pushes
data to the passive socket. (With BitTorrent, it's truly symmetrical -
doesn't matter who listens and who connects. But imagine if it weren't
that way.) From the software's point of view, it has two distinct
modes: server, in which it listens on a socket and receives data, and
client, in which it connects to other people's sockets and sends data.
As such, the server mode is the only one that receives untrusted
data from another user and stores it on the hard disk.

But this is just one theory of what the program *might* be, based on
what I've gathered in this thread. Or rather, it's a vague theory of
something that's mostly plausible, without necessarily even being
useful.

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Steven D'Aprano
On Wed, 1 Jul 2015 03:39 am, Randall Smith wrote:

 On 06/29/2015 10:00 PM, Steven D'Aprano wrote:

 I'm making an assumption here -- I assume that the sender records a
 checksum for uploaded files so that when they get something back again
 they can tell whether or not it is the same content they uploaded.
 
 Yes.  The client software computes sha256 checksums.

Thanks for clarifying.


[...]
 His comment was, don't leak the key.
 
 I'm pretty sure all encryption hinges on guarding the key.

That would be Kerckhoffs' Principle, also known as Shannon's Maxim.

I don't think there has been much research into keeping at least *some*
security even when keys have been compromised, apart from as it relates to
two-factor authentication. (Assume that other people know the password to
your bank account. They can read your balance, but they can't steal your
money unless they first steal your phone or RSA token.)

In the past, and still today among people who don't understand Kerckhoffs'
principle, people have tried to keep the cipher secret and not have a key
at all. E.g. atbash, or caesar cipher, which once upon a time were cutting
edge ciphers, as laughably insecure as they are today. If the method was
compromised, all was lost. 

Today, if the key is compromised, all is lost. Is it possible that there are
ciphers that are resistant to discovery of the key? Obviously if you know
the key you can read encrypted messages, that's what the key is for, but
there are scenarios where you would want security to degrade gracefully
instead of in a brittle all-or-nothing manner:

- even if the attacker can read my messages, he cannot tamper with 
  them or write new ones as me.

(I'm pretty sure that, for example, the military would consider it horrible
if the enemy could listen in on their communications, but *even worse* if
the enemy could send false orders that appear to be legitimate.)

Sixty years ago, the idea of having a separate decryption key that you keep
secret and an encryption key that you can give out to everyone (public key
encryption) probably would have seemed ridiculous too.



-- 
Steven



Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 4:59 AM, Steven D'Aprano st...@pearwood.info wrote:
 Today, if the key is compromised, all is lost. Is it possible that there are
 ciphers that are resistant to discovery of the key? Obviously if you know
 the key you can read encrypted messages, that's what the key is for, but
 there are scenarios where you would want security to degrade gracefully
 instead of in a brittle all-or-nothing manner:

 - even if the attacker can read my messages, he cannot tamper with
   them or write new ones as me.

 (I'm pretty sure that, for example, the military would consider it horrible
 if the enemy could listen in on their communications, but *even worse* if
 the enemy could send false orders that appear to be legitimate.)

That would be accomplished by a two-fold enveloping of signing and
encrypting. If I sign something using my private key, then encrypt it
using your public key, someone who's compromised your private key
could snoop and read the message, but couldn't forge a message from
me. Of course, that just means there are lots more secrets to worry
about getting compromised.

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano st...@pearwood.info wrote:
 I don't think there has been much research into keeping at least *some*
 security even when keys have been compromised, apart from as it relates to
 two-factor authentication.

That's because the key is all the secret part. If an attacker knows
the algorithm, and the key, and the ciphertext, then *by definition*
all is lost. If you mean keeping the algorithm secret too then that's
just considered bad crypto.

 In the past, and still today among people who don't understand Kerckhoffs'
 principle, people have tried to keep the cipher secret and not have a key
 at all. E.g. atbash, or caesar cipher, which once upon a time were cutting
 edge ciphers, as laughably insecure as they are today. If the method was
 compromised, all was lost. 

Caesar cipher has a key. It's just very small, so is easy to guess.
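Jon's point that the shift is the key is easy to demonstrate. There are only 26 possible keys, so trying them all is instant. This minimal sketch assumes lowercase text and a known word (a "crib") for recognizing the right decryption; the function names are made up for illustration.

```python
import string

LOWER = string.ascii_lowercase


def caesar(text: str, key: int) -> str:
    """Shift each lowercase letter by `key` places; other characters pass through."""
    shift = key % 26
    table = str.maketrans(LOWER, LOWER[shift:] + LOWER[:shift])
    return text.translate(table)


def brute_force(ciphertext: str, crib: str) -> list[int]:
    """Try every key; return those whose decryption contains the crib."""
    return [k for k in range(26) if crib in caesar(ciphertext, -k)]
```

For example, `caesar("attack at dawn", 3)` gives `"dwwdfn dw gdzq"`, and `brute_force("dwwdfn dw gdzq", "dawn")` recovers the key `[3]`.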

 Today, if the key is compromised, all is lost. Is it possible that there are
 ciphers that are resistant to discovery of the key? Obviously if you know
 the key you can read encrypted messages, that's what the key is for, but
 there are scenarios where you would want security to degrade gracefully
 instead of in a brittle all-or-nothing manner:

 - even if the attacker can read my messages, he cannot tamper with 
   them or write new ones as me.

I suppose that could be achieved by having separate encryption and
signing keys, but you could do the same but better by encrypting
with multiple algorithms. It's not an unstudied area:
https://en.wikipedia.org/wiki/Multiple_encryption


Re: Pure Python Data Mangling or Encrypting

2015-06-29 Thread Jon Ribbens
On 2015-06-29, Randall Smith rand...@tnr.cc wrote:
 Same reason newer filesystems like BTRFS use checksums (BTRFS uses 
 CRC32).  The storage machine runs periodic file integrity checks.  It 
 has no control over the underlying filesystem.

True, but presumably neither does it have anything it can do to
rectify the situation if it finds a problem, and the client will
have to keep its own secure hash of its file anyway. (Unless I suppose
the server actually can request a new copy from the client or another
server if it finds a corrupt file?)


Re: Pure Python Data Mangling or Encrypting

2015-06-29 Thread Randall Smith

On 06/28/2015 09:21 AM, Jon Ribbens wrote:

On 2015-06-27, Randall Smith rand...@tnr.cc wrote:

Thank you.  Nice points. I do think given the risks (there are always
risks) discussed, a successful attack of this nature is not very likely.
   Worst case, something that looks like this would land on the disk.

crc32 checksum + translation table + malware

with a generated base64 name and no extension.


I'm not sure why you're bothering with the checksum, it doesn't seem
to me that it buys you anything. Personally I'd do something like
this (pseudocode):



Same reason newer filesystems like BTRFS use checksums (BTRFS uses 
CRC32).  The storage machine runs periodic file integrity checks.  It 
has no control over the underlying filesystem.





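   import random

   def obfuscate(data: bytes) -> bytes:
       # Random byte-substitution table plus its inverse. random is fine
       # for obfuscation, but it is not a cryptographic RNG.
       encode_key = list(range(256))
       random.shuffle(encode_key)
       encode_key = bytes(encode_key)
       decode_key = bytes(encode_key.index(i) for i in range(256))
       # Store the decode table at both ends of the translated data.
       return decode_key + data.translate(encode_key) + decode_key

   def deobfuscate(data: bytes) -> bytes:
       # Strip the two 256-byte tables and undo the substitution.
       return data[256:-256].translate(data[:256])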

The reason for appending the key as well as prepending it is that some
anti-virus or malware scanners may well look at the last part of the
file first, so putting something entirely locally-generated there may
add a bit of safety. You could also simply pad with nulls or something
of course, but again I can imagine some tools skipping backwards past
nulls.






Re: Pure Python Data Mangling or Encrypting

2015-06-29 Thread Randall Smith

On 06/27/2015 01:50 PM, Steven D'Aprano wrote:

On Sun, 28 Jun 2015 03:08 am, Randall Smith wrote:


Though I didn't mention it in the description, the storage server is
appending a CRC32 checksum for routine integrity checks.  So by the time
the data hits the disk, it will have added both a 256 byte translation
table and a 4 byte checksum.



http://stackoverflow.com/questions/1515914/crc32-collision







Not sure why you posted the link.  The crc32 checksum is just to check 
for possible filesystem corruption.  The system does periodic data 
corruption checks.  BTRFS uses crc32 checksums also.  Please explain.


-Randall



Re: Pure Python Data Mangling or Encrypting

2015-06-29 Thread Gene Heskett


On Saturday 27 June 2015 08:27:38 Laura Creighton wrote:
 In a message of Sat, 27 Jun 2015 20:16:47 +1000, Chris Angelico writes:
 Okay, Johannes, NOW you're proving that you don't have a clue what
 you're talking about. D-K effect doesn't go away...
 
 ChrisA

 You need to read the paper again.  That was the whole point -- when
 Kruger and Dunning went and taught the people at the bottom quartile
 some basic skill in the task being estimated, and taught people at
 the top quartile how poorly their peers were performing, their ability
 to estimate how they would score relative to their peers improved
 a whole lot.

 But, of course, since these were academics studying students, they had
 access to bottom-quartile performers who actually wanted to learn and
 improve.  In the real world, it is getting the bottom-performers to
 even notice that they need improvement that may be the most difficult
 task.

 Laura

The rest of the readers of this list would do well to change "may" above 
to "is", and carve the last sentence into something fairly substantial 
as it is a basic truth.

Zircon crystal would be ideal; we've found a few grains of it over 4 
billion years old, but granite would do for this generation.  Laura 
obviously gets it.

Sadly, it is entirely too true in the real world. Too often the bottom 
person who made a good sales pitch, once hired, is either incapable of 
learning, or loses interest after he has been hired. I've seen both. The 
basic education they received is to blame for much of that effect. So 
they wind up getting shuffled around to various sub-jobs until you find 
something they can do efficiently.  Many times they weren't ever aware 
of why they were being moved.  Telling them depresses them, so it's 
usually best to just let it work itself out.

Cheers, Gene Heskett
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene


Re: Pure Python Data Mangling or Encrypting

2015-06-29 Thread Steven D'Aprano
On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:

 Not sure why you posted the link.  The crc32 checksum is just to check
 for possible filesystem corruption.  The system does periodic data
 corruption checks.  BTRFS uses crc32 checksums also.  Please explain.

The file system can trust that anything writing to a file is allowed to
write to it, it doesn't have to defend against malicious writes. As I
understand it, your application does.

Here is the attack scenario I have in mind:

- you write a file to my computer, and neglect to encrypt it;
- and record the checksum for later;
- I insert malware into your file;
- you retrieve the file from me;
- if the checksum matches what you have on record, you accept the file;
- since you are using CRC, it is quite easy for me to ensure the 
  checksums match after inserting malware;
- and I have now successfully injected malware into your computer.
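How easy is "quite easy"? The sketch below implements the well-known CRC32 forging trick. For any data an attacker chooses, it computes four extra bytes that make the CRC32 of the result equal a previously recorded value. It relies on CRC32 being linear and on the top byte of each entry in the standard lookup table being unique. The function names are illustrative, and the code does not come from the thread.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib.crc32


def _make_table() -> list[int]:
    """Standard byte-at-a-time CRC-32 lookup table."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of every table entry is distinct, so it identifies the index.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}
assert len(TOP_BYTE_TO_INDEX) == 256


def forge_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    # Walk backwards from the register value we must finish with,
    # recovering which table entry each of the 4 extra bytes must select.
    reg = target_crc ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        idx = TOP_BYTE_TO_INDEX[reg >> 24]
        indices.append(idx)
        reg = ((reg ^ TABLE[idx]) << 8) & 0xFFFFFFFF
    indices.reverse()

    # Run forwards from the real register state after `data`, choosing
    # each byte so that it selects the required table index.
    state = zlib.crc32(data) ^ 0xFFFFFFFF
    suffix = bytearray()
    for idx in indices:
        suffix.append((state ^ idx) & 0xFF)
        state = (state >> 8) ^ TABLE[idx]
    return bytes(suffix)
```

Appending `forge_suffix(evil, zlib.crc32(original))` to `evil` yields a file that passes a CRC32 comparison against `original`. No comparable shortcut is known for SHA-256 or SHA-512.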

I'm making an assumption here -- I assume that the sender records a checksum
for uploaded files so that when they get something back again they can tell
whether or not it is the same content they uploaded.

* * * 

By the way, regarding the use of a substitution cipher, I spoke to the
crypto guy at work, and preimage attack is not quite the right
terminology, since that's normally used in the context of hash functions.
It's almost a known ciphertext attack, but not quite, since that
terminology refers to guessing the key from the ciphertext.

I was wrong: cryptographically strong ciphers are generally NOT resistant to
what I described as a preimage attack. If the key leaks, using AES won't
save you: an attacker with access to the key can produce a ciphertext that
decrypts to the malware of his choice, regardless of whether you use
AES-256 or rot-13. There may be other encryption methods which don't suffer
from that, but he doesn't know of them off the top of his head.
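For the translation-table scheme discussed in this thread, the point takes only a few lines. A sketch, assuming the receiver mangles each upload with bytes.translate and a secret 256-byte table before writing it (make_table and invert_table are hypothetical helpers): anyone who learns the table just sends the preimage of the bytes they want on disk. With AES the attacker would do the equivalent by encrypting the payload with the leaked key; that case isn't shown because the standard library has no AES.

```python
import random


def make_table(rng: random.Random) -> bytes:
    """A random permutation of the 256 byte values, usable with bytes.translate."""
    perm = list(range(256))
    rng.shuffle(perm)
    return bytes(perm)


def invert_table(table: bytes) -> bytes:
    """Inverse permutation: invert_table(t)[t[i]] == i."""
    inverse = bytearray(256)
    for i, b in enumerate(table):
        inverse[b] = i
    return bytes(inverse)


# Receiver's secret table, drawn from the OS CSPRNG.
receiver_table = make_table(random.SystemRandom())

# If that table leaks, the attacker computes the preimage of the payload...
wanted_on_disk = b"\xcd\x19"
upload = wanted_on_disk.translate(invert_table(receiver_table))

# ...and the receiver's own "mangling" step turns it back into the payload.
stored = upload.translate(receiver_table)
print(stored == wanted_on_disk)  # True
```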

His comment was, don't leak the key.

The other threat I mentioned is that the receiver will read the content of
the file. For that, a strong cipher is much to be preferred over a weak
one, and it needs to be encrypted by the sending end, not the receiving
end. (If the receiving end does it, it has to keep the key so it can
decrypt before sending back, which means the computer's owner can just grab
the key and read the files.)



-- 
Steven



Re: Pure Python Data Mangling or Encrypting

2015-06-28 Thread Jon Ribbens
On 2015-06-27, Steven D'Aprano st...@pearwood.info wrote:
 Despite his initial claim that he doesn't want to use AES because it's too
 slow implemented as pure Python, Randall has said that the application will
 offer AES encryption as an option. (He says it is enabled by default,
 except that the user can turn it off.) So the code is already there, all he
 has to do is call it.

You're still not listening to what he's saying. Everything you have
said in the above paragraph is false. He said he is using AES
encryption in the client, but that the server does not have the
processing power to do so (nor does it need to). He has not said
that the user can turn it off, he's just acknowledging the fact
that since the user controls their own computer, they can rewrite
the client code to do whatever they want, and there's nothing he
can do to stop them.

 The choice ought to be a no-brainer. The fact that folks are seriously
 considering using something barely one step up from a medieval substitution
 cipher in 2015 for something with real security consequences if it is
 broken goes to show what a lousy job the IT industry does for security.

The fact that you think that is happening when it isn't shows what
a lousy job you have been doing of following the thread.


Re: Pure Python Data Mangling or Encrypting

2015-06-28 Thread Jon Ribbens
On 2015-06-27, Randall Smith rand...@tnr.cc wrote:
 Thank you.  Nice points. I do think given the risks (there are always
 risks) discussed, a successful attack of this nature is not very likely.
 Worst case, something that looks like this would land on the disk.

 crc32 checksum + translation table + malware

 with a generated base64 name and no extension.

I'm not sure why you're bothering with the checksum, it doesn't seem
to me that it buys you anything. Personally I'd do something like
this (pseudocode):

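  import random

  def obfuscate(data):
      # Random byte permutation (module-level Mersenne Twister RNG) and its inverse.
      encode_key = list(range(256))
      random.shuffle(encode_key)
      encode_key = bytes(encode_key)
      decode_key = bytes(encode_key.index(i) for i in range(256))
      # Decode table at both ends, mangled payload in the middle.
      return decode_key + data.translate(encode_key) + decode_key

  def deobfuscate(data):
      # Strip both 256-byte copies of the table and undo the permutation.
      return data[256:-256].translate(data[:256])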

The reason for appending the key as well as prepending it is that some
anti-virus or malware scanners may well look at the last part of the
file first, so putting something entirely locally-generated there may
add a bit of safety. You could also simply pad with nulls or something
of course, but again I can imagine some tools skipping backwards past
nulls.


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Ian Kelly
On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote:
 On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote:
 Give me one plausible scenario where an attacker can cause malware to hit
 the disk after bytearray.translate with a 256 byte translation table and
 I'll be thankful to you.

 The entire 256-byte translation table is significant ONLY if you need
 all 256 possible bytes. Suppose I want to generate the following byte
 sequence:

 \xCD\x19

 (Okay, this is a slightly oversimplified example, as this attack
 doesn't work on a modern Windows. But back in the days of DOS, this
 program would reboot your computer.)

Nice! When I suggested the possibility of a two byte value malicious
payload, I thought it an extreme example of the hypothetical attack. I
didn't expect that somebody might actually produce one.
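Chris's point can be put in numbers. Assuming the receiver draws a fresh table uniformly from all 256! permutations and the attacker knows nothing about it, a payload that uses k distinct byte values survives only if each of the k bytes the attacker sends happens to map to the right output, which has probability 1 / (256 * 255 * ... * (256 - k + 1)). A small sketch (hit_probability is a hypothetical name; math.perm needs Python 3.8 or later):

```python
from fractions import Fraction
from math import perm


def hit_probability(payload: bytes) -> Fraction:
    """Chance that a uniformly random byte permutation maps one fixed guess
    for each distinct byte of `payload` onto exactly the right value."""
    k = len(set(payload))
    return Fraction(1, perm(256, k))


print(hit_probability(b"\xcd\x19"))  # 1/65280
```

For the two-byte DOS payload that is one chance in 65,280 per upload, while a payload using all 256 byte values is back to one chance in 256!.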


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sat, 27 Jun 2015 03:47 pm, Ian Kelly wrote:

[...]
 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.
 
 And what if somebody else writes a competing version of the client
 software that doesn't bother with the encryption step at all? The
 point was that while encryption is expected, it cannot be assumed by
 the receiver, and in fact if the data is actually malicious, then it
 likely is not even being sent by the client software in the first
 place.

Right. As I said later in my post, you have a situation where neither party
can trust the other. I'm trying to store data on your computer, and I can't
trust you not to snoop on it, and you can't trust me not to send you
malware.


 If the app does encrypt the data with AES before sending, then you don't
 gain any benefit by obfuscating an encrypted file with a classical
 monoalphabetic substitution cipher.
 
 Only if the recipient can *trust* the sender to have performed the
 encryption, which it can't, no matter how mandatory the OP tries to
 make it.

True. But in either case, a classical (i.e. insecure) cipher doesn't do the
job.


 Suppose that you hire an intern to write the choose key function, and
 not knowing any better, he simply iterates through the keys in numeric
 order, one after the other. So the first upload will use key 0, the
 second key 1, the third key 2, and so on, until key 256! - 1, then start
 again. In that case, predicting the next key is *trivial*. If I can work
 out what key you send now (I just upload a file containing
 \x00\x01\x02...\xFF to myself and see what I get), then I know what key
 the app will use next.
 
 If you upload a file to yourself, the result that you get will have no
 bearing on what key might be chosen when you upload a file to somebody
 else.
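Steven's hypothetical numbers the keys from 0 to 256! - 1 but doesn't say how a number becomes a table. One possible ordering (an assumption, chosen for illustration) is lexicographic order, decoded with the factorial number system. The sketch below uses a hypothetical nth_table helper; under a scheme like this, learning one key index would reveal every table that follows it.

```python
from math import factorial


def nth_table(n: int) -> bytes:
    """Return the n-th permutation of the bytes 0..255 in lexicographic order."""
    if not 0 <= n < factorial(256):
        raise ValueError("key index must be in range(256!)")
    remaining = list(range(256))
    table = bytearray()
    for i in range(255, -1, -1):
        # Each choice at this position covers i! orderings of the remaining bytes.
        index, n = divmod(n, factorial(i))
        table.append(remaining.pop(index))
    return bytes(table)


print(nth_table(0) == bytes(range(256)))  # True: key 0 is the identity table
print(nth_table(1)[-2:])                  # b'\xff\xfe': key 1 swaps the last two bytes
```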

I admit it: I was getting a bit confused between attacks on the sender side
and attacks on the receiver side. The attacks I describe depend on the
sender's application doing encryption, but given that a malicious uploader
can just write their own client, that's redundant. There are easier attacks
on sender-side encryption. A back-and-forth argument on Usenet is no
substitute for a careful security analysis.

Can the sender attack encryption on the client side? Well, Chris has already
demonstrated one actual attack, based on a two-byte malicious payload. That
proves that the concept is at least possible, even if nobody uses DOS any
more.

As you go on to say:

 If the recipient system is using the system random to generate the
 key, then you can hack the application all you want, and it will give
 you precisely zero information about the state of the entropy pool on
 the remote system.

You're right. Are there other attacks where I, the sender, can get the
receiver to leak information about the key?

Would you like to bet the answer is always No? I wouldn't.

Can you say timing attack?

http://codahale.com/a-lesson-in-timing-attacks/

Can you [generic you] believe that attackers can *reliably* attack remote
systems based on 20µs timing differences? If you say No, then you fail
Security 101 and should step away from the computer until a security expert
can be called in to review your code.
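The article linked above is about comparing a secret value (there, an HMAC) with an early-exit equality check. A minimal sketch of the two approaches in Python; naive_equal, check_tag and the key below are hypothetical, and nothing here measures timings. The standard library's hmac.compare_digest is documented as avoiding content-based short-circuiting, which is what closes this particular leak (it may still reveal the lengths of its inputs).

```python
import hashlib
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: it returns at the first mismatching byte,
    so its running time depends on how long the matching prefix is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Verify an HMAC-SHA256 tag without content-dependent short-circuiting."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


key = b"hypothetical server-side key"
msg = b"chunk 42"
good_tag = hmac.new(key, msg, hashlib.sha256).digest()
print(check_tag(key, msg, good_tag))   # True
print(check_tag(key, msg, bytes(32)))  # False
```

Both functions give the same answers; the difference is only in what their timing reveals.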

I'm not a security expert. I'm not even a talented amateur. *Every time* I
suggest that X is secure, the security guy at work shoots me down in
flames. But nicely, because I pay his wages wink

If I say such-and-such an attack is impossible, feel free to scoff and
laugh, because I'm probably wrong. But if I say I don't know what it is,
but there's probably an attack you haven't thought of, unless you're a
security guy yourself, you probably ought to listen. (And if you are a
security guy, then you know how hard it is to secure against unknown
attacks.)

Tens of millions of zombie computers in botnets are proof that there are
exploitable attacks that programmers didn't think of. Or rather, *some*
programmers didn't think of them. Some other guys did.

I've said it before, and I will say it again: a classical substitution
cipher is trivially vulnerable to a preimage attack, strong crypto ciphers
are not. You're betting everything on the key being secret. If the key
leaks, or is predictable, the attacker can successfully write malware on
the receiver's system. If the key leaks with AES, the system is still
secure against a preimage attack.


Nobody will be able to guess the key, we don't need strong crypto.

The Titanic is unsinkable, we don't need lifeboats for everyone.




-- 
Steven



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote:
 I'm not a security expert. I'm not even a talented amateur. *Every time* I
 suggest that X is secure, the security guy at work shoots me down in
 flames. But nicely, because I pay his wages wink

Just out of interest, is _anybody_ active in this thread an expert on
security? I certainly am not, which means that the proposal I'm
currently putting together probably has a whole bunch of
vulnerabilities that I haven't thought of. (Though there's no emphasis
on encryption anywhere, just signing. I'm *hoping* that RSA public key
verification is sufficient, but if it isn't, it would be possible for
a malicious user to make a serious mess of stuff.) But I'm under no
delusions. I don't say this is secure - all I'm saying is this
works in proof-of-concept.

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sat, Jun 27, 2015 at 8:05 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 27.06.2015 11:27, Jon Ribbens wrote:

 Johannes might have all the education in the world, but he's
 demonstrated quite comprehensively in this thread that he doesn't
 have a clue what he's talking about.

 Oh, how hurtful. I might even shed a tear or two, but it's pretty clear
 to me that you're just suffering under the Dunning-Kruger effect. No
 worries, champ, it's just a phase that'll go away eventually.

Okay, Johannes, NOW you're proving that you don't have a clue what
you're talking about. D-K effect doesn't go away...

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Johannes Bauer
On 27.06.2015 10:53, Chris Angelico wrote:
 On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote:
 I'm not a security expert. I'm not even a talented amateur. *Every time* I
 suggest that X is secure, the security guy at work shoots me down in
 flames. But nicely, because I pay his wages wink
 
 Just out of interest, is _anybody_ active in this thread an expert on
 security?

Yes. I've done a good 10 years of work in the field doing security
(mostly applied cryptography on embedded systems with a focus on side
channels like DPA, but also security concepts and threat/risk analysis)
and spent the last 3-4 years working on my PhD in the field of IT
security. My thesis is almost(tm) finished. I would claim to be an
expert, yes.

 I certainly am not, which means that the proposal I'm
 currently putting together probably has a whole bunch of
 vulnerabilities that I haven't thought of. (Though there's no emphasis
 on encryption anywhere, just signing. I'm *hoping* that RSA public key
 verification is sufficient, but if it isn't, it would be possible for
 a malicious user to make a serious mess of stuff.) But I'm under no
 delusions. I don't say this is secure - all I'm saying is this
 works in proof-of-concept.

I must admit that I haven't seen your ideas in this thread?

Best regards,
Johannes

-- 
 Where EXACTLY did you predict the earthquake again?
 At least not publicly!
Ah, the latest and, to date, most brilliant stroke of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Jon Ribbens
On 2015-06-27, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote:
 On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote:
 Give me one plausible scenario where an attacker can cause malware to hit
 the disk after bytearray.translate with a 256 byte translation table and
 I'll be thankful to you.

 The entire 256-byte translation table is significant ONLY if you need
 all 256 possible bytes. Suppose I want to generate the following byte
 sequence:

 \xCD\x19

 (Okay, this is a slightly oversimplified example, as this attack
 doesn't work on a modern Windows. But back in the days of DOS, this
 program would reboot your computer.)

 Nice! When I suggested the possibility of a two byte value malicious
 payload, I thought it an extreme example of the hypothetical attack. I
 didn't expect that somebody might actually produce one.

It's a good example of the interesting things that people can come up
with (for example, binary executable files that consist entirely of
printable ASCII characters), but it isn't in
any sense an attack on the system described in this thread.


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Jon Ribbens
On 2015-06-27, Chris Angelico ros...@gmail.com wrote:
 On Sat, Jun 27, 2015 at 7:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 27.06.2015 10:53, Chris Angelico wrote:
 On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info 
 wrote:
 I'm not a security expert. I'm not even a talented amateur. *Every time* I
 suggest that X is secure, the security guy at work shoots me down in
 flames. But nicely, because I pay his wages wink

 Just out of interest, is _anybody_ active in this thread an expert on
 security?

 Yes. I've done a good 10 years of work in the field doing security
 (mostly applied cryptography on embedded systems with a focus on side
 channels like DPA, but also security concepts and threat/risk analysis)
 and spent the last 3-4 years working on my PhD in the field of IT
 security. My thesis is almost(tm) finished. I would claim to be an
 expert, yes.

 Good, so this isn't like that episode of Yes Minister when they were
 trying to figure out whether to allow a chemical factory to be built.

Johannes might have all the education in the world, but he's
demonstrated quite comprehensively in this thread that he doesn't
have a clue what he's talking about.


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 Now you say that the application encrypts the data, except that the user
 can turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.
 
 No, because another application could pretend to be the file-sending
 application, but send unencrypted data instead of encrypted data.

Did you stop reading my post when you got to that? Because I went on to say:

Actually, the more I think about this, the more I come to think that the
only way this can be secure is for both the sending client application and
the receiving client app to encrypt the data. The sender can't
trust the receiver not to read the files, so the sender has to encrypt; the
receiver can't trust the sender not to send malicious files, so the
receiver has to encrypt too.




-- 
Steven



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sat, Jun 27, 2015 at 3:59 PM, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Fri, Jun 26, 2015 at 7:21 PM, Chris Angelico ros...@gmail.com wrote:
 On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote:
 Give me one plausible scenario where an attacker can cause malware to hit
 the disk after bytearray.translate with a 256 byte translation table and
 I'll be thankful to you.

 The entire 256-byte translation table is significant ONLY if you need
 all 256 possible bytes. Suppose I want to generate the following byte
 sequence:

 \xCD\x19

 (Okay, this is a slightly oversimplified example, as this attack
 doesn't work on a modern Windows. But back in the days of DOS, this
 program would reboot your computer.)

 Nice! When I suggested the possibility of a two byte value malicious
 payload, I thought it an extreme example of the hypothetical attack. I
 didn't expect that somebody might actually produce one.

I'm fairly sure this won't actually work on a modern system (I tried
it and all that happened was that debug.exe terminated), but it's
entirely possible there are other attacks. Or attacks that require
only a small number of bytes - maybe create a gzip bomb that will
expand to petabytes of data, that probably wouldn't need many unique
byte values.
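Chris's hunch about compression bombs is easy to check roughly with the standard library. In the sketch below the 10 MB size is an arbitrary choice and a raw zlib stream stands in for a gzip file; it reports how many distinct byte values the compressed stream uses. The exact figures depend on the zlib build, so none are quoted here.

```python
import zlib

raw = bytes(10_000_000)        # 10 MB of zero bytes (arbitrary size)
bomb = zlib.compress(raw, 9)   # maximum compression level

ratio = len(raw) / len(bomb)
distinct = len(set(bomb))      # byte values actually used by the stream
print(f"{len(bomb)} bytes, about {ratio:.0f}:1, {distinct} distinct byte values")

# The stream really does inflate back to the full 10 MB.
assert zlib.decompress(bomb) == raw
```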

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Robert Kern

On 2015-06-27 04:38, Steven D'Aprano wrote:


Maybe you use Python's standard library and the Mersenne Twister. The period
of that is huge, possibly bigger than 256! (or not, I forget, and I'm too
lazy to look it up). So you think that's safe. But it's not: Mersenne
Twister is not a cryptographically secure pseudorandom number generator. If
I can get some small number of values from the Twister (by memory,
something of the order of 100 such values) then I can predict the rest for
ever.


634.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Robert Kern

On 2015-06-27 08:58, Robert Kern wrote:

On 2015-06-27 04:38, Steven D'Aprano wrote:


Maybe you use Python's standard library and the Mersenne Twister. The period
of that is huge, possibly bigger than 256! (or not, I forget, and I'm too
lazy to look it up). So you think that's safe. But it's not: Mersenne
Twister is not a cryptographically secure pseudorandom number generator. If
I can get some small number of values from the Twister (by memory,
something of the order of 100 such values) then I can predict the rest for
ever.


634.


Bah! 624.
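Robert's corrected figure is the size of the Mersenne Twister's internal state: 624 32-bit words. Each 32-bit output is an invertible "tempering" of one state word, so 624 consecutive outputs are enough to rebuild the generator. A sketch for CPython's random module, assuming its getstate()/setstate() layout (version 3: 624 words followed by an index), which is an implementation detail; untemper and clone_from_outputs are hypothetical helpers.

```python
import random

N = 624  # MT19937 state size in 32-bit words


def _undo_xor_rshift(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_xor_lshift_mask(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y: int) -> int:
    """Undo MT19937 output tempering, recovering one internal state word."""
    y = _undo_xor_rshift(y, 18)
    y = _undo_xor_lshift_mask(y, 15, 0xEFC60000)
    y = _undo_xor_lshift_mask(y, 7, 0x9D2C5680)
    y = _undo_xor_rshift(y, 11)
    return y


def clone_from_outputs(outputs: list) -> random.Random:
    """Return a Random that continues after the last 624 getrandbits(32) values."""
    words = [untemper(y) for y in outputs[-N:]]
    clone = random.Random()
    # Index N makes the next call regenerate the state, yielding the next output.
    clone.setstate((3, tuple(words + [N]), None))
    return clone


victim = random.Random()  # seeded from OS entropy; the attacker never sees the seed
observed = [victim.getrandbits(32) for _ in range(N)]
clone = clone_from_outputs(observed)

# From here on the clone reproduces the victim, e.g. its next "secret" table.
victim_table = list(range(256))
clone_table = list(range(256))
victim.shuffle(victim_table)
clone.shuffle(clone_table)
print(victim_table == clone_table)  # True
```

This is why a table built with random.shuffle can't be treated as secret if an attacker can observe other outputs of the same generator; random.SystemRandom and the secrets module draw from the OS instead and have no recoverable state of this kind.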

--
Robert Kern



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sat, Jun 27, 2015 at 7:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 27.06.2015 10:53, Chris Angelico wrote:
 On Sat, Jun 27, 2015 at 6:38 PM, Steven D'Aprano st...@pearwood.info wrote:
 I'm not a security expert. I'm not even a talented amateur. *Every time* I
 suggest that X is secure, the security guy at work shoots me down in
 flames. But nicely, because I pay his wages wink

 Just out of interest, is _anybody_ active in this thread an expert on
 security?

 Yes. I've done a good 10 years of work in the field doing security
 (mostly applied cryptography on embedded systems with a focus on side
 channels like DPA, but also security concepts and threat/risk analysis)
 and spent the last 3-4 years working on my PhD in the field of IT
 security. My thesis is almost(tm) finished. I would claim to be an
 expert, yes.

Good, so this isn't like that episode of Yes Minister when they were
trying to figure out whether to allow a chemical factory to be built.

 I certainly am not, which means that the proposal I'm
 currently putting together probably has a whole bunch of
 vulnerabilities that I haven't thought of. (Though there's no emphasis
 on encryption anywhere, just signing. I'm *hoping* that RSA public key
 verification is sufficient, but if it isn't, it would be possible for
 a malicious user to make a serious mess of stuff.) But I'm under no
 delusions. I don't say this is secure - all I'm saying is this
 works in proof-of-concept.

 I must admit that I haven't seen your ideas in this thread?

No, the proposal I'm putting together is unrelated. You'll see the
*vast* extent of my security skills here:

https://github.com/Rosuav/ThirdSquare

My contribution to this thread has been fairly minor, just suggesting
one attack that doesn't even work any more, not much else.

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Peter Otten
Randall Smith wrote:

 Chunks of data (about 2MB) are to be stored on machines using a
 peer-to-peer protocol.  The recipient of these chunks can't assume that
 the payload is benign.  While the data senders are supposed to encrypt
 data, that's not guaranteed, and I'd like to protect the recipient
 against exposure to nefarious data by mangling or encrypting the data
 before it is written to disk.
 
 My original idea was for the recipient to encrypt using AES.  But I want
 to keep this software pure Python batteries included and not require
 installation of other platform-dependent software.  Pure Python AES and
 even DES are just way too slow.  I don't know that I really need
 encryption here, but I want some type of fast mangling algorithm where a bad
 actor sending a payload can't guess the output ahead of time.
 
 Any ideas are appreciated.  Thanks.

Would it be sufficient to prepend the chunk with one block, say, of random 
data? To unmangle you'd just strip off that block.

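import os
import shutil

BLOCKSIZE = 512  # illustrative value; the post doesn't specify one
BLOCK = os.urandom(BLOCKSIZE)

def mangle(source, dest):
    # Prepend one block of random bytes, then copy the chunk unchanged.
    dest.write(BLOCK)
    shutil.copyfileobj(source, dest)

def unmangle(source, dest):
    # Skip the random block and copy the rest back out.
    source.read(BLOCKSIZE)
    shutil.copyfileobj(source, dest)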

Disclaimer: I did not follow the ongoing discussion.



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Johannes Bauer
On 27.06.2015 10:38, Steven D'Aprano wrote:

 Can you say timing attack?
 
 http://codahale.com/a-lesson-in-timing-attacks/
 
 Can you [generic you] believe that attackers can *reliably* attack remote
 systems based on 20µs timing differences? If you say No, then you fail
 Security 101 and should step away from the computer until a security expert
 can be called in to review your code.

Yes, as people do more and more proper crypto (in contrast to crappy
stuff like LFSR-based custom keystream generators and such), side
channels become of great importance.

 I'm not a security expert. I'm not even a talented amateur. *Every time* I
 suggest that X is secure, the security guy at work shoots me down in
 flames. But nicely, because I pay his wages wink

:-)

Being shot down in flames is the way to become a security expert,
probably the *only* way. I don't know anyone who is an expert who hasn't
had that horrible experience at least a dozen times.

It is amazing how many holes you can poke in a design if you look at it
from enough angles. Having holes poked in your designs gives you a
thorough appreciation for the true crypto experts (i.e. people doing
theoretical cryptography).

Best regards,
Johannes



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Johannes Bauer
On 27.06.2015 11:27, Jon Ribbens wrote:

 Johannes might have all the education in the world, but he's
 demonstrated quite comprehensively in this thread that he doesn't
 have a clue what he's talking about.

Oh, how hurtful. I might even shed a tear or two, but it's pretty clear
to me that you're just suffering under the Dunning-Kruger effect. No
worries, champ, it's just a phase that'll go away eventually.

Hugs and kisses,
Johannes



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Johannes Bauer
On 27.06.2015 11:17, Chris Angelico wrote:

 Good, so this isn't like that episode of Yes Minister when they were
 trying to figure out whether to allow a chemical factory to be built.

I must admit that I have no clue about that show or that episode in
particular and needed to read up on it:
https://en.wikipedia.org/wiki/The_Greasy_Pole

 I must admit that I haven't seen your ideas in this thread?
 
 No, the proposal I'm putting together is unrelated. You'll see the
 *vast* extent of my security skills here:
 
 https://github.com/Rosuav/ThirdSquare
 
 My contribution to this thread has been fairly minor, just suggesting
 one attack that doesn't even work any more, not much else.

Well, if people already have a solution ready there's a good chance that
any criticism falls on deaf ears. In any case, that's for others to be
responsible for: their party, their choice.

I've looked at your code even though I don't know Pike. That's the
typesafe JavaScript derivative, isn't it?

The only thing that I found horrible was the ssh key format to PKCS
parsing. Man that's hacky :-) You're creating a DER structure on-the-fly
that you fill with the key and that you then have parsed back. I've only
seen ssh-keygen used to generate keys (not to initiate actual ssh
connections); why don't you use openssl to generate the keys? I think
you can generate an RSA keypair in openssl (also valid for ssh should you
need it) and I'm pretty sure that you can generate an ssh public key with
ssh-keygen from that private keypair file. That would eliminate the need
to do this kind of parsing, but it's just a PoC as I understand it.

It appears to be online-only, is that correct? Is Internet coverage so
good down under? I wish this were the case in Germany :-/

I'm not 100% sure about it, but I think that the bus ticketing schemes in
use in Germany (locally in some cities) either use asymmetric transponders
(e.g. SmartMX), which give a beautiful, decentralized, secure and
offline solution at the cost of being comparatively expensive, or
symmetric transponders, which have limited offline functionality:
e.g. monotonic counters that are reset in a cryptographically secured
way by backend systems whenever an online connection is available and
that are counted down in the offline case.

In any case, interesting. Thanks for sharing.
Best regards,
Johannes



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Jussi Piitulainen
Laura Creighton writes:

 Johannes, if you don't know Yes, Minister then you most likely do
 not know the Politician's Syllogism (which now has its own wikipedia
 page :)  And I _didn't_ do it!  Honest!)

 Something must be done.
 This is something.
 Therefore we must do it!

Surely that's to be worded as follows? To have a stricter syllogistic
form.

We need to do something.
This is something.
Therefore we need to do this.

Or,

We must do something.
This is something.
Therefore we must do this.

Or, but this feels weaker to me,

Something must be done.
This is something.
Therefore this must be done.

ISWIM. In particular, the move from the agentless passive (must
be done) to the specific expression of agency (we must do) seems to me
to be a *different* joke.

(I was tempted to call it passive temperature but that would have been
yet another, unrelated joke.)

:)


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Grant Edwards
On 2015-06-26, Randall Smith rand...@tnr.cc wrote:

 The only person who can read a file is the owner.

That's always the plan, but many a successful exploit has been based
on breaking that assumption.  If privacy actually matters, that's not
a good assumption to rely on as a single point of failure.

--
Grant



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Johannes Bauer
On 27.06.2015 12:16, Chris Angelico wrote:

 Okay, Johannes, NOW you're proving that you don't have a clue what
 you're talking about. D-K effect doesn't go away...

:-D

It does in some people. I've seen it happen: with knowledge comes
humility. Not saying Jon is a lost cause just yet. He's just in
intellectual puberty right now. I'm giving him a few years to re-judge.

Cheers,
Johannes



Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Laura Creighton
Johannes, if you don't know Yes, Minister then you most likely do
not know the Politician's Syllogism (which now has its own wikipedia
page :)  And I _didn't_ do it!  Honest!)

Something must be done.
This is something.
Therefore we must do it!

:)

Unfortunately, the Politician's Syllogism is not restricted to television
comedies.  It's alive and well in Brussels, and mixes very nicely
with the Dunning-Kruger effect.

Laura


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sat, Jun 27, 2015 at 8:18 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 27.06.2015 11:17, Chris Angelico wrote:

 Good, so this isn't like that episode of Yes Minister when they were
 trying to figure out whether to allow a chemical factory to be built.

 I must admit that I have no clue about that show or that episode in
 particular and needed to read up on it:
 https://en.wikipedia.org/wiki/The_Greasy_Pole

 I must admit that I haven't seen your ideas in this thread?

 No, the proposal I'm putting together is unrelated. You'll see the
 *vast* extent of my security skills here:

 https://github.com/Rosuav/ThirdSquare

 My contribution to this thread has been fairly minor, just suggesting
 one attack that doesn't even work any more, not much else.

 Well, if people already have a solution ready there's a good chance that
 any criticism falls on deaf ears. In any case something that others have
 to be responsible for, their party, their choice.

 I've looked at your code even though I don't know pike. That's the
 typesafe JavaScript derivative, isn't it?

Not really; it's more like Python semantics meets C++ syntax. But
that's still off-topic for this list; I'd be happy to continue
discussion off-list with anyone who's interested.

The most interesting part of the project is the README, to be honest.
Even if you can't understand a single line of the code, you'll be able
to see the specs. Grokking the code is a bonus.

 The only thing that I found horrible was the ssh key format to PKCS
 parsing. Man that's hacky :-) You're creating a DER structure on-the-fly
 that you fill with the key and that you then have parsed back. I've only
 seen ssh-keygen used to generate keys (not to initiate actual ssh
 connections), why don't you use openssl to generate the keys? I think
 you can generate an RSA keypair in openssl (also valid for ssh should you
 need it) and I'm pretty sure that you can generate an ssh public key with
 ssh-keygen from that private keypair file. That would eliminate the need
 to do this kind of parsing, but it's just a PoC as I understand it.

Yeah, it's pretty disgusting. I could actually use Pike to generate
the keys, rather than using ssh-keygen at all, but I wanted to
demonstrate that this is using a well-known key generation method,
ergo I don't need to separately prove that the keys are appropriately
random. (Not that I distrust Pike, but it's one less thing to try to
prove.)

 It appears to be online-only, is that correct? Is Internet coverage so
 good down under? I wish this were the case in Germany :-/

Correct, that's one of the key changes. Our current system (Myki) is a
stored-value card - if you recharge a hundred bucks, then clone your
card, you'd have two cards with a hundred bucks each. With
ThirdSquare, if you clone your card, you have two cards that draw on
the same hundred bucks. (Though I still don't want people cloning
cards, as it would confuse the system some. Plus, it'd be really
REALLY bad if someone could clone someone *else's* card, thus
effectively stealing a copy of it. So the cards themselves need some
kind of security, but short of public key crypto performed actually on
the cards, I'm not sure how to do that.)
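Chris's distinction between a stored-value card and an account-based one can be sketched in a few lines of Python. ThirdSquare itself is written in Pike, and none of these names come from it: `CentralLedger` and its methods are hypothetical. The card carries only an identifier and the balance lives centrally, so a clone draws on the same balance instead of duplicating it.

```python
class CentralLedger:
    """Hypothetical account-based ledger: a card is only a key into this table."""

    def __init__(self):
        self._balances = {}  # card_id -> balance in cents

    def recharge(self, card_id, cents):
        self._balances[card_id] = self._balances.get(card_id, 0) + cents

    def charge(self, card_id, cents):
        # The balance is checked and debited centrally, so every copy
        # of a card sees the same remaining funds.
        balance = self._balances.get(card_id, 0)
        if balance < cents:
            return False
        self._balances[card_id] = balance - cents
        return True

    def balance(self, card_id):
        return self._balances.get(card_id, 0)


ledger = CentralLedger()
ledger.recharge("card-123", 10000)  # recharge a hundred bucks
original = clone = "card-123"       # a cloned card carries the same identifier
ledger.charge(original, 6000)
print(ledger.charge(clone, 6000))   # False: only 4000 cents remain
```

The cost of this design is that every tap needs the central database to be reachable.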

My plan is to stick a 3G/4G device onto each bus. So long as there's
mobile phone coverage on all routes, which should be fine in suburbia,
the system will work. It can cope with short dropouts (up to ten
minutes), queueing requests in the client.

 Not 100% sure about it, but I think that the bus systems active in
 Germany (locally in some cities) either use asymmetric transponders
 (e.g. SmartMX), which gives a beautiful, decentralized, secure and
 offline solution at the cost of being comparatively expensive, or they
 use symmetric transponders, which have limited offline
 functionality: monotonic counters that are reset in a
 cryptographically secured way by backend systems every time an
 online connection exists, and that are counted down in the offline case.

Thanks for the name, I'll check that out. Ideally, I'd like to use
off-the-shelf hardware for everything, and open-source software. It
should be possible for anyone to pick up the specs, buy their own
hardware, and create something that interoperates with the rest of the
system - for instance, the Red Engine Group could allow their
customers to ding their tickets to buy coffee - simply by providing
the appropriate public keys to the central authorizing database. That
would be a massive improvement over Melbourne's previous ticketing
system (Metcard), which was entirely proprietary; expansion of the
fleet required additional validators, and basically the public
transport operators had to beg, cap in hand, for the company to do
them a favour - for which, of course, they then also had to pay the
earth for. But I digress.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Laura Creighton
In a message of Sat, 27 Jun 2015 15:23:07 +0300, Jussi Piitulainen writes:
Laura Creighton writes:

 Johannes, if you don't know Yes, Minister then you most likely do
 not know the Politician's Syllogism (which now has its own wikipedia
 page :)  And I _didn't_ do it!  Honest!)

 Something must be done.
 This is something.
 Therefore we must do it!

Surely that's to be worded as follows? To have a stricter syllogistic
form.

We need to do something.
This is something.
Therefore we need to do this.

Somehow doesn't have the same comedic ring, though.  So the version I
posted is what was on the tv show.  (Or rather in Yes, Prime Minister.)
The Minister becomes Prime Minister for the last 2 seasons.

In the television show it is just called 'Politician's Logic'.  Not
sure who started calling it the Politician's Syllogism.

Laura

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Ian Kelly
On Sat, Jun 27, 2015 at 5:33 AM, Chris Angelico ros...@gmail.com wrote:
 On Sat, Jun 27, 2015 at 8:18 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 I've looked at your code even though I don't know pike. That's the
 typesafe JavaScript derivative, isn't it?

 Not really; it's more like Python semantics meets C++ syntax. But
 that's still off-topic for this list; I'd be happy to continue
 discussion off-list with anyone who's interested.

That description could apply equally well to Javascript, though. ;-)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Jon Ribbens
On 2015-06-27, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 27.06.2015 11:27, Jon Ribbens wrote:
 Johannes might have all the education in the world, but he's
 demonstrated quite comprehensively in this thread that he doesn't
 have a clue what he's talking about.

 Oh, how hurtful. I might even shed a tear or two, but it's pretty clear
 to me that you're just suffering under the Dunning-Kruger effect. No
 worries, champ, it's just a phase that'll go away eventually.

I guess we need to add the Dunning-Kruger effect to that ever-growing
list of things that you don't understand then...
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Laura Creighton
In a message of Sat, 27 Jun 2015 20:16:47 +1000, Chris Angelico writes:

Okay, Johannes, NOW you're proving that you don't have a clue what
you're talking about. D-K effect doesn't go away...

ChrisA

You need to read the paper again.  That was the whole point -- when
Kruger and Dunning went and taught the people in the bottom quartile
some basic skill in the task being estimated, and taught people in
the top quartile how poorly their peers were performing, their ability
to estimate how they would score relative to their peers improved
a whole lot.

But, of course, since these were academics studying students, they had
access to bottom-quartile performers who actually wanted to learn and
improve.  In the real world, it is getting the bottom-performers to
even notice that they need improvement that may be the most difficult task.

Laura

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote:

 On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info
 wrote:
 Can you [generic you] believe that attackers can *reliably* attack remote
 systems based on 20µs timing differences? If you say No, then you
 fail Security 101 and should step away from the computer until a security
 expert can be called in to review your code.
 
 Of course. I wouldn't bet the house on it, but with the proposed
 substitution cipher system, I don't see why there would be any
 measurable timing differences at all based on the choice of key.

I wouldn't bet one wooden nickel on it. Not without a security audit of the
application. And then what happens when the implementation changes and the
audit is no longer valid?

Despite his initial claim that he doesn't want to use AES because it's too
slow implemented as pure Python, Randall has said that the application will
offer AES encryption as an option. (He says it is enabled by default,
except that the user can turn it off.) So the code is already there, all he
has to do is call it.

It might not be a timing attack. Maybe there's a vulnerability in the
application that if you upload a sufficiently large file, a buffer will
overflow and you can force the key of your choosing. Who knows? Bugs
happen. The nature of how the hypothetical key leakage happens is less
important than the consequences if there is one.

Randall can:

(1) bet the security of his application and his users on the key never
leaking; 

Why have you situated a naked flame right next to the gas tank?

It's okay, I'm confident that the tank will never leak.


or


(2) use something which, *even if the key leaks*, is still resistant to
preimage attacks.


The choice ought to be a no-brainer. The fact that folks are seriously
considering using something barely one step up from a medieval substitution
cipher in 2015 for something with real security consequences if it is
broken goes to show what a lousy job the IT industry does for security.



 The time to obfuscate a single byte is constant, 

Are you sure about that? Bet your house? How about your computer?


# Python 3.3 on Linux, YMMV
# (Stopwatch is a timing context manager defined elsewhere; it is not
# part of the standard library.)

py> text = 'NOBODY expects the Spanish Inquisition!'*5
py> import string
py> s = string.digits + string.ascii_letters
py> t = (string.ascii_uppercase + string.digits[::-1] +
...      string.ascii_lowercase)
py> trans1 = str.maketrans('abcdef', 'fedcba')
py> trans2 = str.maketrans(s, t)
py> trans3 = str.maketrans('aZ', 'Za')
py> with Stopwatch():
...     x = str.translate(text, trans1)
...
time taken: 0.427513 seconds
py> with Stopwatch():
...     x = str.translate(text, trans2)
...
time taken: 0.228869 seconds
py> with Stopwatch():
...     x = str.translate(text, trans3)
...
time taken: 0.387105 seconds



 so the total time to 
 obfuscate the payload should just be a function of the length of the
 data.


Good thing you didn't bet your house.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Paul Rubin
Michael Torrie torr...@gmail.com writes:
 Furthermore you cannot prove a negative, which is what proving
 security is for anything but the trivial case. Are you saying this is
 untrue?

I've always thought that there are no two even numbers that when you add
them together, give you an odd number.  Are you saying that statement
can't be proven?

 But how does one prove a system is secure except by enumerating attack
 vectors

In the case of encryption, you do a reduction proof to a recognized
primitive like AES.  That is, you show that if your system is breakable,
you can transform the break into a break against AES itself.  That's the
best you can do at the moment, because the open status of the P!=NP
problem means that no one knows how to prove that any primitive (such as
AES) is secure.  The reduction proof means that the evidence for AES's
security also applies to your system.

Of course that's just for the cipher itself.  For the entire surrounding
software/hardware/process system which is mostly not mathematical,
you're right, there's no way to (mathematically) prove security or even
to define it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sun, 28 Jun 2015 03:35 am, Steven D'Aprano wrote:

 On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote:

 The time to obfuscate a single byte is constant,
 
 Are you sure about that? Bet your house? How about your computer?

Correction: the example I showed uses str, not bytes.

With bytes, the timing differences are much smaller. Are they statistically
distinguishable? Don't know. On my machine, they appear to be, although
that could be just a fluke. Is there a guarantee that bytes.translate will
always be constant time per byte? No, of course not. Might the application
itself some day start using str.translate? Who knows?

The point is, you cannot rely on this. Preventing leakage is *hard*.
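To try the bytes case yourself, here is a self-contained sketch using the standard `timeit` module, since the `Stopwatch` helper used earlier isn't defined in the thread. It times `bytes.translate` with several random, full 256-entry tables; the payload size, seed and repeat counts are arbitrary choices. Timings like these are noisy and machine-dependent, and similar numbers on one machine guarantee nothing about another Python version or implementation.

```python
import random
import timeit

# Arbitrary test payload; a larger payload reduces relative noise.
payload = b"NOBODY expects the Spanish Inquisition!" * 20000


def random_table(rng):
    """Return a full substitution table: a random permutation of 0..255."""
    values = list(range(256))
    rng.shuffle(values)
    return bytes(values)


rng = random.Random(2015)
tables = [random_table(rng) for _ in range(4)]
timings = []

for i, table in enumerate(tables):
    # Take the best of several repeats to reduce (not eliminate) scheduler noise.
    best = min(timeit.repeat(lambda t=table: payload.translate(t), number=20, repeat=5))
    timings.append(best)
    print(f"table {i}: {best:.6f} s")
```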


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Ian Kelly
On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info wrote:
 Can you [generic you] believe that attackers can *reliably* attack remote
 systems based on 20µs timing differences? If you say No, then you fail
 Security 101 and should step away from the computer until a security expert
 can be called in to review your code.

Of course. I wouldn't bet the house on it, but with the proposed
substitution cipher system, I don't see why there would be any
measurable timing differences at all based on the choice of key. The
time to obfuscate a single byte is constant, so the total time to
obfuscate the payload should just be a function of the length of the
data.

Secondly, the 200 (or whatever) response to the client does not depend
on the outcome of the obfuscation step, so there is no reason that the
server cannot simply respond first and obfuscate after, giving the
client nothing to time.
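Ian's "respond first, obfuscate after" idea can be sketched with a background worker thread: the request handler only queues the data and returns at once, so the response time does not depend on the obfuscation step. This is a framework-free sketch; `handle_upload`, the in-memory `stored` dict (standing in for the disk) and the reversed table (standing in for a real key) are hypothetical.

```python
import queue
import threading

work = queue.Queue()
stored = {}  # stand-in for writing chunks to disk


def obfuscator(table):
    # Runs in the background, after the client already has its response.
    while True:
        item = work.get()
        if item is None:  # sentinel: shut down
            work.task_done()
            break
        name, data = item
        stored[name] = data.translate(table)
        work.task_done()


def handle_upload(name, data):
    """Hypothetical request handler: enqueue the data, then reply at once."""
    work.put((name, bytes(data)))
    return 200


table = bytes(reversed(range(256)))  # illustrative table only
worker = threading.Thread(target=obfuscator, args=(table,), daemon=True)
worker.start()

status = handle_upload("chunk-1", b"\xcd\x19")
work.join()  # wait for background processing (demo only)
print(status, stored["chunk-1"])  # 200 b'2\xe6'
```

This only removes the obfuscation step from the response timing; it says nothing about other side channels.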
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Randall Smith

On 06/26/2015 08:21 PM, Chris Angelico wrote:

On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote:

Give me one plausible scenario where an attacker can cause malware to hit
the disk after bytearray.translate with a 256 byte translation table and
I'll be thankful to you.


The entire 256-byte translation table is significant ONLY if you need
all 256 possible bytes. Suppose I want to generate the following byte
sequence:

\xCD\x19

(Okay, this is a slightly oversimplified example, as this attack
doesn't work on a modern Windows. But back in the days of DOS, this
program would reboot your computer.)

How many truly different translation tables are there if I'm trying to
produce this? Just 256*255, or 65280. If I send random two-byte files,
there is one chance in that of my malware successfully landing. Once
I've sent about 45,000 of those files, I have a fifty-fifty chance of
having hit it. Send twice as many, I have a 75% chance of success,
etc.
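Chris's numbers are easy to check. If each random upload has a 1-in-65280 chance of translating to the target bytes, the chance of at least one hit after n uploads is 1 - (1 - 1/65280)**n. A quick check in Python, assuming each upload gets an independent, uniformly random table:

```python
import math

outcomes = 256 * 255  # Chris's count of effectively different tables
p = 1 / outcomes      # chance that one random upload lands


def success_probability(n):
    """Chance of at least one hit after n independent uploads."""
    return 1 - (1 - p) ** n


# Uploads needed for a 50% chance: solve (1 - p)**n = 0.5 for n.
n_half = math.ceil(math.log(0.5) / math.log(1 - p))

print(outcomes)                                   # 65280
print(n_half)                                     # roughly 45,000
print(round(success_probability(2 * n_half), 2))  # 0.75
```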



Yes, that's true.  It's even an issue with AES, which uses padding to 
overcome.  That said, remember these are bytes going straight to disk. 
I'm really not concerned about 2 or 3 byte malware and the probability 
plunges after that.  And remember, normal use case is AES encrypted data.


Quite interesting.

Though I didn't mention it in the description, the storage server is 
appending a CRC32 checksum for routine integrity checks.  So by the time 
the data hits the disk, it will have added both a 256 byte translation 
table and a 4 byte checksum.  I think that would interfere with any 
extremely short malware.




Malware can be crafted to fit within certain restrictions. I saw a
proof-of-concept and analysis document detailing a particular remote
code execution/privilege escalation attack that involved stuffing
text into an entry field and then inducing the program to read that
into its stack, finally triggering it by some sort of buffer overflow,
I think. The text had to be no more than X bytes long (because that's
all the entry field was set to accept - it'd truncate after that), and
had to not contain any NUL bytes, and there might have been other
restrictions too. Sure, it makes it harder to write your malware...
but imagine if you can write something in just a handful of different
bytes, which then goes and triggers something else. You could have an
extremely plausible attack that might need only a day's uploading to
deliver.

It makes no difference that there are 256! possible encryption keys,
if most of them have the same result.
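Chris's last point can be made concrete: a substitution table only matters at the positions of bytes that actually occur in the input, so two different tables that agree on those positions produce identical output. A small sketch with made-up example data:

```python
import random

data = b"\x00\x01\x00\x01"  # input that uses only the byte values 0 and 1

rng = random.Random(42)
values = list(range(256))
rng.shuffle(values)
table_a = bytes(values)  # a random full table

# A different table: keep entries 0 and 1, reverse the other 254 entries.
table_b = table_a[:2] + table_a[2:][::-1]

print(table_a != table_b)                                  # True
print(data.translate(table_a) == data.translate(table_b))  # True
```

For an input with two distinct byte values, each of the 256! tables shares its output with 254! others, leaving 256!/254! = 256*255 = 65280 distinguishable outcomes.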

ChrisA




Thank you.  Nice points. I do think given the risks (there are always 
risks) discussed, a successful attack of this nature is not very likely. 
Worst case, something that looks like this would land on the disk.


crc32 checksum + translation table + malware

with a generated base64 name and no extension.


Doesn't seem like much of a threat.   Much less likely than a bug in the 
standard Crypto libraries.


-Randall


--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Michael Torrie
On 06/26/2015 03:11 PM, Johannes Bauer wrote:
 You misunderstand. This is not how it works, this is not how any of this
 works. Steven does not *at all* have to prove to you your system is
 breakable or show actual attacks. YOU have to prove that your system is
 secure. 

Ahh the holy grail of computer science.  Now it's been a while since I
finished my CS degree, but I recall spending a lot of time in class
talking about proving code correctness, which is a similar problem,
and learning that that was thought to be NP-complete.  Furthermore you
cannot prove a negative, which is what proving security is for anything
but the trivial case.

Are you saying this is untrue?

Obviously there are best practices, which you are an expert in.  But how
does one prove a system is secure except by enumerating attack vectors
and addressing each one, preferably in the design phase?

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Randall Smith

On 06/27/2015 03:29 AM, Peter Otten wrote:



Would it be sufficient to prepend the chunk with one block, say, of random
data? To unmangle you'd just strip off that block.

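import os
import shutil

BLOCKSIZE = 16  # hypothetical value; the block size isn't given in the post
BLOCK = os.urandom(BLOCKSIZE)

def mangle(source, dest):
    dest.write(BLOCK)
    shutil.copyfileobj(source, dest)

def unmangle(source, dest):
    source.read(BLOCKSIZE)  # skip the random prefix
    shutil.copyfileobj(source, dest)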

Disclaimer: I did not follow the ongoing discussion.



That is happening as a side effect.  Though not completely random, after 
running the data through a translation table, the 256 byte table is 
prepended.  Then a 4 byte checksum is calculated and prepended.
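A sketch of the layout Randall describes (checksum + table + translated data), for illustration only. The thread doesn't say what the CRC32 covers or how it is packed; covering the table plus payload and using big-endian packing are assumptions, as are the function names.

```python
import random
import struct
import zlib


def make_table():
    """Random permutation of 0..255 (SystemRandom draws from os.urandom)."""
    values = list(range(256))
    random.SystemRandom().shuffle(values)
    return bytes(values)


def mangle(data, table=None):
    if table is None:
        table = make_table()
    body = table + data.translate(table)  # translate, then prepend the table
    checksum = struct.pack(">I", zlib.crc32(body))
    return checksum + body                # then prepend the 4-byte CRC32


def unmangle(blob):
    checksum, body = blob[:4], blob[4:]
    if struct.unpack(">I", checksum)[0] != zlib.crc32(body):
        raise ValueError("CRC32 mismatch: chunk is corrupted")
    table, payload = body[:256], body[256:]
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain          # invert the permutation
    return payload.translate(inverse)


blob = mangle(b"\xcd\x19")
print(len(blob))  # 262: 4-byte checksum + 256-byte table + 2 data bytes
```

CRC32 only detects accidental corruption. Anyone can recompute it, so it is not an integrity check against an attacker.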


-Randall


--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Randall Smith

On 06/27/2015 07:38 AM, Grant Edwards wrote:

On 2015-06-26, Randall Smith rand...@tnr.cc wrote:


The only person who can read a file is the owner.


That's always the plan, but many a successful exploit has been based
on breaking that assumption.  If privacy actually matters, that's not
a good assumption to rely on as a single point of failure.

--
Grant




The owner (client software) encrypts the data using AES.  This is the 
default behavior of the client software.  If the client chooses to 
disable encryption, that's their issue for sure.  I'm trying to make 
sure it doesn't become the storage server's issue too.


-Randall

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Michael Torrie
On Jun 27, 2015 11:51 AM, Paul Rubin no.email@nospam.invalid wrote:

 Michael Torrie torr...@gmail.com writes:
  Furthermore you cannot prove a negative, which is what proving
  security is for anything but the trivial case. Are you saying this is
  untrue?

 I've always thought that there are no two even numbers that when you add
 them together, give you an odd number.  Are you saying that statement
 can't be proven?

  But how does one prove a system is secure except by enumerating attack
  vectors

 In the case of encryption, you do a reduction proof to a recognized
 primitive like AES.  That is, you show that if your system is breakable,
 you can transform the break into a break against AES itself.  That's the
 best you can do at the moment, because the open status of the P!=NP
 problem means that no one knows how to prove that any primitive (such as
 AES) is secure.  The reduction proof means that the evidence for AES's
 security also applies to your system.

 Of course that's just for the cipher itself.  For the entire surrounding
 software/hardware/process system which is mostly not mathematical,
 you're right, there's no way to (mathematically) prove security or even
 to define it.

Ahh okay. So what he's referring to must be such reductions and proofs of
these provable aspects, though he spoke very broadly.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sun, 28 Jun 2015 03:08 am, Randall Smith wrote:

 Though I didn't mention it in the description, the storage server is
 appending a CRC32 checksum for routine integrity checks.  So by the time
 the data hits the disk, it will have added both a 256 byte translation
 table and a 4 byte checksum.


http://stackoverflow.com/questions/1515914/crc32-collision
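The linked question is about CRC32 collisions. As a rough illustration of why a 32-bit non-cryptographic checksum adds little against a deliberate attacker, a birthday search over random 16-byte inputs finds two different inputs with the same `zlib.crc32` value after on the order of 2**16 attempts (about 82,000 on average). The seed and input size are arbitrary.

```python
import random
import zlib


def find_crc32_collision(seed=0, limit=2_000_000):
    """Birthday search: hash random 16-byte inputs until two share a CRC32."""
    rng = random.Random(seed)
    seen = {}
    for attempts in range(1, limit + 1):
        candidate = rng.getrandbits(128).to_bytes(16, "big")
        crc = zlib.crc32(candidate)
        other = seen.get(crc)
        if other is not None and other != candidate:
            return other, candidate, attempts
        seen[crc] = candidate
    return None


first, second, attempts = find_crc32_collision()
print(first.hex(), second.hex(), f"{zlib.crc32(first):#010x}", attempts)
```

Because CRC32 is linear, forging a chosen checksum is even easier than this, but the birthday search is the simplest thing to demonstrate.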




-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sun, 28 Jun 2015 04:22 am, Randall Smith wrote:

 The owner (client software) encrypts the data using AES.  This is the
 default behavior of the client software.  If the client chooses to
 disable encryption, that's their issue for sure.

I cannot imagine what you think you gain from allowing that to be optional.
Apart from privacy and security breaches.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Steven D'Aprano
On Sun, 28 Jun 2015 06:30 am, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 Now you say that the application encrypts the data, except that the
 user can turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.

 No, because another application could pretend to be the file-sending
 application, but send unencrypted data instead of encrypted data.

 Did you stop reading my post when you got to that? Because I went on to
 say:
 
 At that point I quit in frustration, yeah.
 
 Actually, the more I think about this, the more I come to think that the
 only way this can be secure is for both the sending client application
 and the receiving client application to both encrypt the data. The sender can't
 trust the receiver not to read the files, so the sender has to encrypt;
 the receiver can't trust the sender not to send malicious files, so the
 receiver has to encrypt too.
 
 When you realize you've said something completely wrong, you should
 edit your email.

If both the sender and receiver encrypt the data, how is it completely
wrong to say that encrypting data should be mandatory?




-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Devin Jeanpierre
On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info wrote:
 On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 Now you say that the application encrypts the data, except that the user
 can turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.

 No, because another application could pretend to be the file-sending
 application, but send unencrypted data instead of encrypted data.

 Did you stop reading my post when you got to that? Because I went on to say:

At that point I quit in frustration, yeah.

 Actually, the more I think about this, the more I come to think that the
 only way this can be secure is for both the sending client application and
 the receiving client application to both encrypt the data. The sender can't
 trust the receiver not to read the files, so the sender has to encrypt; the
 receiver can't trust the sender not to send malicious files, so the
 receiver has to encrypt too.

When you realize you've said something completely wrong, you should
edit your email.

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Chris Angelico
On Sun, Jun 28, 2015 at 4:51 AM, Steven D'Aprano st...@pearwood.info wrote:
 On Sun, 28 Jun 2015 04:22 am, Randall Smith wrote:

 The owner (client software) encrypts the data using AES.  This is the
 default behavior of the client software.  If the client chooses to
 disable encryption, that's their issue for sure.

 I cannot imagine what you think you gain from allowing that to be optional.
 Apart from privacy and security breaches.

I've no idea whether this is the case or not, but one thing you might
gain is independence from a third-party module. You could, for
instance, automatically AES-encrypt your data, but only if "from
Crypto.Cipher import AES" didn't raise ImportError. That effectively
makes encryption optional (the program won't barf for lack of pycrypto
installation), while still clearly being the default - and if you have
a nice loud warning, then it's clear that encryption is the normal
state, and the fallback is a lesser state.
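A minimal sketch of the import fallback Chris describes. `HAVE_AES`, `encryption_mode` and the warning text are hypothetical, and the actual encryption calls are left out because they depend on which library (PyCrypto or PyCryptodome, both of which provide `Crypto.Cipher.AES`) is installed.

```python
import warnings

try:
    from Crypto.Cipher import AES  # provided by PyCrypto / PyCryptodome, if installed
except ImportError:
    AES = None
    # Loud warning: encryption is the normal state, this is a degraded fallback.
    warnings.warn(
        "Crypto.Cipher.AES is not available; data will be uploaded UNENCRYPTED. "
        "Install PyCrypto or PyCryptodome to enable the default encryption.",
        RuntimeWarning,
    )

HAVE_AES = AES is not None


def encryption_mode():
    """Report which mode the (hypothetical) client will run in."""
    return "aes" if HAVE_AES else "plaintext (degraded)"


print(encryption_mode())
```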

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Devin Jeanpierre
On Sat, Jun 27, 2015 at 6:18 PM, Steven D'Aprano st...@pearwood.info wrote:
 On Sun, 28 Jun 2015 06:30 am, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 11:16 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 On Sat, 27 Jun 2015 02:05 pm, Devin Jeanpierre wrote:

 On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 Now you say that the application encrypts the data, except that the
 user can turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.

 No, because another application could pretend to be the file-sending
 application, but send unencrypted data instead of encrypted data.

 Did you stop reading my post when you got to that? Because I went on to
 say:

 At that point I quit in frustration, yeah.

 Actually, the more I think about this, the more I come to think that the
 only way this can be secure is for both the sending client application
 and the receiving client application to both encrypt the data. The sender can't
 trust the receiver not to read the files, so the sender has to encrypt;
 the receiver can't trust the sender not to send malicious files, so the
 receiver has to encrypt too.

 When you realize you've said something completely wrong, you should
 edit your email.

 If both the sender and receiver encrypt the data, how is it completely
 wrong to say that encrypting data should be mandatory?

That isn't what I was calling completely wrong. This is:

 Just make the AES encryption mandatory, not optional. Then the user
 cannot upload unencrypted malicious data, and the receiver cannot read
 the data. That's two problems solved.

The user can still upload unencrypted malicious data by writing their
own client that doesn't have mandatory AES encryption.

You realized this later in the email, apparently, which is why you
should have edited your own email to delete your original, insecure,
suggestion. :(

That said, I appreciate the work you've done here asking for a
specific threat model and pushing back on the idea that it's up to
python-list to prove something is insecure, not the other way around.
That's important. I think, for the same reasons, it's also important
to be really careful what cryptosystems we discuss, and not suggest or
appear to suggest ones that won't work.


P.S. FWIW, the base64 idea has a lot of promise and is probably
fundamentally better than a crypto algorithm. With something along the
lines of base64 -- say, encoding a file using just the letters 'a' and
'b' -- one might try to make it literally impossible to write bad
things to disk, whereas with any crypto, it is always possible to
obtain the key, so one has to be careful with key management to
prevent/mitigate that.  (One might add: why not both? Beats me. I like
using extension modules.)
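Devin's 'a'/'b' idea can be sketched as a bit-per-character encoding. This is one of many possible designs, and the function names are hypothetical. Whatever the uploader sends, only the bytes for 'a' and 'b' ever reach the disk, at the cost of an eightfold size increase. base64 makes the same kind of guarantee with a 64-character alphabet and only 4/3 expansion.

```python
def encode_ab(data):
    """Encode each bit as 'a' (0) or 'b' (1), most significant bit first."""
    return "".join(
        "b" if (byte >> shift) & 1 else "a"
        for byte in data
        for shift in range(7, -1, -1)
    ).encode("ascii")


def decode_ab(text):
    """Inverse of encode_ab; rejects anything outside the two-letter alphabet."""
    if len(text) % 8 or set(text) - {ord("a"), ord("b")}:
        raise ValueError("not a valid a/b encoding")
    out = bytearray()
    for i in range(0, len(text), 8):
        byte = 0
        for ch in text[i:i + 8]:
            byte = (byte << 1) | (ch == ord("b"))
        out.append(byte)
    return bytes(out)


encoded = encode_ab(b"\xcd\x19")
print(encoded)  # b'bbaabbabaaabbaab'
```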

P.P.S.: of course, I'm not an expert.

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-27 Thread Ian Kelly
On Sat, Jun 27, 2015 at 11:35 AM, Steven D'Aprano st...@pearwood.info wrote:
 On Sun, 28 Jun 2015 01:09 am, Ian Kelly wrote:

 On Sat, Jun 27, 2015 at 2:38 AM, Steven D'Aprano st...@pearwood.info
 wrote:
 Can you [generic you] believe that attackers can *reliably* attack remote
 systems based on 20µs timing differences? If you say No, then you
 fail Security 101 and should step away from the computer until a security
 expert can be called in to review your code.

 Of course. I wouldn't bet the house on it, but with the proposed
 substitution cipher system, I don't see why there would be any
 measurable timing differences at all based on the choice of key.

 I wouldn't bet one wooden nickel on it. Not without a security audit of the
 application. And then what happens when the implementation changes and the
 audit is no longer valid?

I don't disagree about the security audit, although I think you'll
find that such things will require a greater investment of resources
than a wooden nickel.

 Despite his initial claim that he doesn't want to use AES because it's too
 slow implemented as pure Python, Randall has said that the application will
 offer AES encryption as an option.

Once again you're confusing what he said about the server with what he
said about the client. Just because he considers it too slow for data
mangling on the server doesn't make it too slow for any use.

 The time to obfuscate a single byte is constant,

 Are you sure about that? Bet your house? How about your computer?


 # Python 3.3 on Linux, YMMV

 py> text = 'NOBODY expects the Spanish Inquisition!'*5
 py> import string
 py> s = string.digits + string.ascii_letters
 py> t = (string.ascii_uppercase + string.digits[::-1] +
 ...      string.ascii_lowercase)
 py> trans1 = str.maketrans('abcdef', 'fedcba')
 py> trans2 = str.maketrans(s, t)
 py> trans3 = str.maketrans('aZ', 'Za')
 py> with Stopwatch():
 ...     x = str.translate(text, trans1)
 ...
 time taken: 0.427513 seconds
 py> with Stopwatch():
 ...     x = str.translate(text, trans2)
 ...
 time taken: 0.228869 seconds
 py> with Stopwatch():
 ...     x = str.translate(text, trans3)
 ...
 time taken: 0.387105 seconds

Your examples are using partial keys of different sizes. It's hardly
surprising that the timing varies when you pass dicts of varying sizes
as the translation tables.

py> import random, timeit
py> a = list(range(256))
py> b = random.sample(a, 256)
py> c = random.sample(a, 256)
py> d = random.sample(a, 256)
py> min(timeit.repeat("str.translate(text, a)", "from __main__ import text, a", number=10, repeat=10))
0.9780099680647254
py> min(timeit.repeat("str.translate(text, b)", "from __main__ import text, b", number=10, repeat=10))
0.9837233647704124
py> min(timeit.repeat("str.translate(text, c)", "from __main__ import text, c", number=10, repeat=10))
0.9627216667868197
py> min(timeit.repeat("str.translate(text, d)", "from __main__ import text, d", number=10, repeat=10))
0.9793561780825257
py> min(timeit.repeat("str.translate(text, c)", "from __main__ import text, c", number=10, repeat=10))
0.9840573272667825

I ran it on c a second time to see if the 0.962 timing was systemic or
a fluke. The fact that c produced both the shortest and longest
timings out of only two runs lends me confidence (for the purpose of
this discussion) that the variation seen in these timings is random
and not correlated to the keys used.
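For comparison, a self-contained version of this timing test using bytes and full 256-entry byte tables (the identity table plus three random permutations). The payload size and repeat counts are arbitrary choices, and absolute timings depend on the machine. The point it illustrates is that a full table means one lookup per byte whatever the permutation, so in CPython the timings should differ only by noise:

```python
import random
import timeit

# Arbitrary stand-in payload (about 390 KB).
data = b"NOBODY expects the Spanish Inquisition!" * 10000

def random_table():
    """Return a full 256-byte translation table: a random permutation."""
    return bytes(random.sample(range(256), 256))

# The identity table plus three random permutations.
tables = [bytes(range(256))] + [random_table() for _ in range(3)]

for i, table in enumerate(tables):
    # Best of several repeats, as in the session above.
    best = min(timeit.repeat(lambda: data.translate(table), number=10, repeat=5))
    print(f"table {i}: {best:.6f} s")
```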
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Ian Kelly
On Fri, Jun 26, 2015 at 9:38 PM, Steven D'Aprano st...@pearwood.info wrote:
 With respect Randall, you contradict yourself. Is there any wonder that some
 of us (well, me at least) are suspicious and confused, when your story
 changes as often as the weather?

 Sometimes you say that the client software uses AES encryption. Sometimes
 you say that you don't want to use AES encryption because you want the
 client to be pure Python, and a pure-Python implementation would be too
 slow. Your very first post says:

 My original idea was for the recipient to encrypt using AES.  But
 I want to keep this software pure Python batteries included and
 not require installation of other platform-dependent software.
 Pure Python AES and even DES are just way too slow.

In the context of the initial post, this was referring to the data
mangling done by the receiver; it has no bearing on the form of the
data sent by the application.

 Sometimes you say the user is supposed to encrypt the data themselves:

 While the data senders are supposed to encrypt data, that's not
 guaranteed

Whereas this clearly describes the behavior of the application itself.

 Now you say that the application encrypts the data, except that the user can
 turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user cannot
 upload unencrypted malicious data, and the receiver cannot read the data.
 That's two problems solved.

And what if somebody else writes a competing version of the client
software that doesn't bother with the encryption step at all? The
point was that while encryption is expected, it cannot be assumed by
the receiver, and in fact if the data is actually malicious, then it
likely is not even being sent by the client software in the first
place.

 If the app does encrypt the data with AES before sending, then you don't
 gain any benefit by obfuscating an encrypted file with a classical
 monoalphabetic substitution cipher.

Only if the recipient can *trust* the sender to have performed the
encryption, which it can't, no matter how mandatory the OP tries to
make it.

 Suppose that you hire an intern to write the choose key function, and not
 knowing any better, he simply iterates through the keys in numeric order,
 one after the other. So the first upload will use key 0, the second key 1,
 the third key 2, and so on, until key 256! - 1, then start again. In that
 case, predicting the next key is *trivial*. If I can work out what key you
 send now (I just upload a file containing \x00\x01\x02...\xFF to myself
 and see what I get), then I know what key the app will use next.

If you upload a file to yourself, the result that you get will have no
bearing on what key might be chosen when you upload a file to somebody
else.
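Steven's sequential-key scenario presupposes some numbering of the 256! possible tables. One possible numbering (an assumption here; the thread does not specify any) is lexicographic order, decoded via the factorial number system. `nth_permutation` is a hypothetical helper, not the OP's code. Under this numbering key 0 is the identity table and small consecutive keys differ only near the end of the table, which is why a counter-based key schedule would be predictable:

```python
import math

def nth_permutation(k, n=256):
    """Return the k-th permutation of range(n) in lexicographic order.

    Decodes k in the factorial number system (Lehmer code).
    Valid for 0 <= k < n!.
    """
    if not 0 <= k < math.factorial(n):
        raise ValueError("key index out of range")
    pool = list(range(n))
    table = []
    for i in range(n - 1, -1, -1):
        # Each factorial digit selects which remaining value comes next.
        digit, k = divmod(k, math.factorial(i))
        table.append(pool.pop(digit))
    return bytes(table)

# Hypothetical "intern" key schedule: upload number u uses key index u.
for upload in range(3):
    print(upload, nth_permutation(upload)[-4:].hex())
```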

 Even if I can't do that, I might be able to guess the seed: I know what time
 the application started up, to within a few milliseconds,

How?

 and I know (or
 can guess) how many random numbers you have used,

How?

 Except... you're getting your random numbers from a system *I* control.

No you don't. If you did already control the target system, then as
already suggested, you have no need to attack the data upload; you can
just write whatever data you want to disk. This is like suggesting
that the sudoers file is insecure because a user with root access
would be able to add themselves to it.

 If the attacker controlled the machine the app was on, why would it fool
 with /dev/urandom?  I think he'd just plant the files he wanted to plant
 and be done.  This is nonsensical anyway.

 No, you don't understand the nature of the attack. In this scenario, the
 sender is the attacker. I want to upload malicious files to the receiver.
 You are trying to stop me, that's the whole point of mangling or
 encrypting the files. (Your words.) So I, the sender, prepare a file such
 that when you mangle it, the resulting mangled content is the malicious
 content I want.

 If you use a substitution cipher, I can do this if I can guess or force the
 key. If you use strong crypto, I can't.

 However, I can hack the application. The client sits on my computer, it's
 pure Python, even if it isn't I can still hack the application, I don't
 need access to the source code.

If the recipient system is using the system random to generate the
key, then you can hack the application all you want, and it will give
you precisely zero information about the state of the entropy pool on
the remote system.
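A minimal sketch of the receiver-side scheme Ian describes, assuming the table is drawn with `random.SystemRandom` (which uses `os.urandom()` rather than the seeded Mersenne Twister) and that the receiver keeps the inverse table so it can restore the data later. `new_key`, `mangle` and `unmangle` are hypothetical names, not the OP's code:

```python
import random

_sysrand = random.SystemRandom()  # draws from os.urandom()

def new_key():
    """Shuffle range(256) with OS entropy; return (table, inverse)."""
    perm = list(range(256))
    _sysrand.shuffle(perm)
    table = bytes(perm)
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain
    return table, bytes(inverse)

def mangle(data, table):
    return data.translate(table)

def unmangle(data, inverse):
    return data.translate(inverse)

# The key is chosen on the receiver, after the upload has arrived.
table, inverse = new_key()
payload = b"whatever the sender uploaded"
stored = mangle(payload, table)
assert unmangle(stored, inverse) == payload
```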

 Yes. Do you think that's hard for an attacker who has access to your
 application, possibly including the source code, and controls all the
 sources of entropy on the system your application is running on?

 I don't have to *randomly* guess. I control what time your application
 starts, I control what randomness you get from /dev/urandom, I control how
 many keys you go through, I might even be able to read the source code of
 the application (not that I need to, that just makes it 

Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Steven D'Aprano
On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:

 You're making the same mistake that Steven did in misunderstanding the
 threat model.

I don't think I'm misunderstanding the threat, I think I'm pointing out a
threat which the OP is hoping to just ignore.

In an earlier post, I suggested that the threat model should involve at
least *three* different attacks, apart from the usual man-in-the-middle
attacks of data in transit.

One is that the attacker is the person sending the data. E.g. I want to send
a nasty payload (say, malware, or an offensive image). Another is that the
attacker is the recipient of the file, who wants to read the sender's data.

As far as I can tell, the OP's plan to defend the sender's privacy is to
dump responsibility for encrypting the files in the sender's lap. As far as
I'm concerned, perhaps as many as one user in 2 will pre-encrypt their
files. (Early adopters will be unrepresentative of the eventual user base
of this system. If this takes off, the user base will likely end up
dominated by people who think that qwerty is the epitome of unguessable
passwords.)

Users just don't use crypto unless their applications do it for them.

My opinion is that the application ought to do so, and not expect Aunt
Tillie to learn how to correctly use encryption software before uploading
her files. 

http://www.catb.org/jargon/html/A/Aunt-Tillie.html

It is the OP's prerogative to disagree, of course, but to me, if the OP's
app doesn't use strong crypto to encrypt users' data, that's tantamount to
saying they don't care about their users' data privacy. Using a
monoalphabetic substitution cipher to obfuscate the data is not strong
crypto.


 The goal isn't to prevent the attacker from working out 
 the key for a file that has already been obfuscated. Any real data
 that might be exposed by a vulnerability in the server is presumed to
 have already been strongly encrypted by the user.

I think that's a ridiculously unrealistic presumption, unless your user-base
is entirely taken from a very small subset of security savvy and
pedantically careful users.


 The goal is to prevent the attacker from guessing a key that hasn't
 even been generated yet, which could be exploited to engineer the
 obfuscated content into something malicious.

They don't need to predict the key exactly. If they can predict that the key
will be, let's say, one of these thousand values, then they can generate one
thousand files and upload them. One of them will match the key, and there's
your exploit. That's one attack.

A second attack is to force the key. The attacker controls the machine the
application is running on, they control /dev/urandom and can feed your app
whatever not-so-random numbers they like, so potentially they can force the
app to use the key of their choosing. Then they don't need 1000 files, they
just need one.

That's two. Does anyone think that I've thought of all the possible attacks?

(Well, hypothetical attacks. I acknowledge that I don't know the
application, and cannot be sure that it *actually is* vulnerable to these
attacks.)

The problem here is that a monoalphabetic substitution cipher is not
resistant to preimage attacks. Your only defence is that the key is
unknown. If the attacker can force the key, or predict the key, or guess a
small range of keys, they can exploit your weak cipher.

(Technically, preimage attack is usually used to refer to attacks on hash
functions. I'm not sure if the same name is used for attacks on ciphers.)

https://en.wikipedia.org/wiki/Preimage_attack

With a strong crypto cipher, there are no known preimage attacks. Even if
the attacker knows exactly what key you are using, they cannot predict what
preimage they need to supply in order to generate the malicious payload
they want after encryption. (As far as I know.)

That is the critical issue right there. The sort of simple monoalphabetic
substitution cipher using bytes.translate that the OP is using is
vulnerable to preimage attacks. Strong crypto is not.
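The preimage point can be made concrete. Assuming the attacker somehow knows (or can force) the table, computing the upload that mangles into a chosen payload is just a translation through the inverse table. `preimage` is a hypothetical helper for illustration; whether an attacker can learn the key at all is exactly what Ian and Randall dispute:

```python
import random

def preimage(payload, table):
    """Return the input that `table` will translate into `payload`."""
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain
    return payload.translate(bytes(inverse))

table = bytes(random.sample(range(256), 256))  # assume the attacker knows this
payload = b"\xcd\x19"                          # bytes the attacker wants on disk
upload = preimage(payload, table)
assert upload.translate(table) == payload
```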


 There are no 
 frequency-based attacks possible here, because you can't do frequency
 analysis on the result of a key that hasn't even been generated yet.

Frequency-based attacks apply to a different threat. I'm referring to at
least two different attacks here, with different attackers and different
victims. Don't mix them up.


 Assuming that you have no attack on the key generation itself, the

Not a safe assumption!


 best you can do is send a file deobfuscated with a random key and hope
 that the recipient randomly chooses the same key; the odds of that
 happening are 1 in 256!.

It's easy to come up with attacks which are no better than brute force. It's
the attacks which are better than brute force that you have to watch out
for.


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Jon Ribbens
On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 26.06.2015 22:09, Randall Smith wrote:
 You've gone on a rampage about nothing.  My original description said
 the client was supposed to encrypt the data, but you want to assume the
 opposite for some unknown reason.

 While you seem to think that Steven is rampaging about nothing, he does
 have a fair point: You consistently were vague about whether you want to
 have encryption, authentication or obfuscation of data. This suggests
 that you may not be so sure yourself what it is you actually want.

He hasn't been vague, you and Steven just haven't been paying
attention.

 You always play around with the 256! which would be a ridiculously high
 security margin (1684 bits of security, w!). You totally ignore that
 the system can be broken in a linear fashion.

No, it can't, because the attacker does not have access to the
ciphertext.

 Nobody assumes you're a moron. But it's safe to assume that you're a
 crypto layman, because only laymen have no clue on how difficult it is
 to get cryptography even remotely right.

Amateur crypto is indeed a bad idea. But what you're still not getting
is that what he's doing here *isn't crypto*. He's just trying to avoid
letting third parties write completely arbitrary data to the disk. You
know what would be a perfectly good solution to his problem? Base 64
encoding. That would solve the issue pretty much completely, the only
reason it's not an ideal solution is that it of course increases the
size of the data.
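Jon's base64 suggestion, sketched with the standard `base64` module: whatever bytes the peer sends, the stored form is drawn from the 64 base64 digits plus `=` padding, at the cost of about 4/3 the size. The 1000 random bytes are an arbitrary stand-in for an upload:

```python
import base64
import os
import string

ALPHABET = set((string.ascii_letters + string.digits + "+/=").encode())

data = os.urandom(1000)            # arbitrary bytes from an untrusted peer
encoded = base64.b64encode(data)

# Every stored byte comes from the base64 alphabet.
assert set(encoded) <= ALPHABET
# Size grows by 4/3, rounded up to a multiple of 4.
assert len(encoded) == 4 * ((len(data) + 2) // 3)
assert base64.b64decode(encoded) == data
```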

 That people in 2015 actually defend inventing a substitution-cipher
 cryptosystem sends literally shivers down my spine.

Nobody is defending such a thing, you just haven't understood what
problem is being solved here.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Randall Smith

On 06/26/2015 12:06 PM, Steven D'Aprano wrote:

On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:


You're making the same mistake that Steven did in misunderstanding the
threat model.


I don't think I'm misunderstanding the threat, I think I'm pointing out a
threat which the OP is hoping to just ignore.


I'm not hoping to ignore anything.  I didn't explain the entire system, 
as it was not necessary to find a solution to the problem at hand.  But 
since you want to make negative assumptions about what I didn't tell 
you, I'll gladly address your accusations of negligence.




In an earlier post, I suggested that the threat model should involve at
least *three* different attacks, apart from the usual man-in-the-middle
attacks of data in transit.


All communication is secured using TLS and authentication handled by 
X.509 certificates.  This prevents man in the middle attacks. 
Certificates are signed by CAs I control.




One is that the attacker is the person sending the data. E.g. I want to send
a nasty payload (say, malware, or an offensive image). Another is that the
attacker is the recipient of the file, who wants to read the sender's data.


The only person who can read a file is the owner.   AES encryption is 
built into the client software.  The only way data can be uploaded 
unencrypted is if encryption is intentionally disabled.




As far as I can tell, the OP's plan to defend the sender's privacy is to
dump responsibility for encrypting the files in the sender's lap. As far as
I'm concerned, perhaps as many as one user in 2 will pre-encrypt their
files. (Early adopters will be unrepresentative of the eventual user base
of this system. If this takes off, the user base will likely end up
dominated by people who think that qwerty is the epitome of unguessable
passwords.)


Making assumptions again.  See above.  The client software encrypts by 
default.  You're also assuming there is no password strength checking.




Users just don't use crypto unless their applications do it for them.


And it does.



My opinion is that the application ought to do so, and not expect Aunt
Tillie to learn how to correctly use encryption software before uploading
her files.

http://www.catb.org/jargon/html/A/Aunt-Tillie.html

It is the OP's prerogative to disagree, of course, but to me, if the OP's
app doesn't use strong crypto to encrypt users' data, that's tantamount to
saying they don't care about their users' data privacy. Using a
monoalphabetic substitution cipher to obfuscate the data is not strong
crypto.


You've gone on a rampage about nothing.  My original description said 
the client was supposed to encrypt the data, but you want to assume the 
opposite for some unknown reason.






The goal isn't to prevent the attacker from working out
the key for a file that has already been obfuscated. Any real data
that might be exposed by a vulnerability in the server is presumed to
have already been strongly encrypted by the user.


I think that's a ridiculously unrealistic presumption, unless your user-base
is entirely taken from a very small subset of security savvy and
pedantically careful users.


The difference is he's not assuming I'm a moron.  He's giving me the 
benefit of the doubt.  That plus I actually said, data senders are 
supposed to encrypt data.


In a networked system, you can't make assumptions about what the other 
peers are doing.  You have to handle what comes across the wire.  You 
also have to consider that you may come under attack.  That's what this 
is about.






The goal is to prevent the attacker from guessing a key that hasn't
even been generated yet, which could be exploited to engineer the
obfuscated content into something malicious.


They don't need to predict the key exactly. If they can predict that the key
will be, let's say, one of these thousand values, then they can generate one
thousand files and upload them. One of them will match the key, and there's
your exploit. That's one attack.


Thousand Values ???  Isn't it 256!, which is just freaking huge!  import 
math; math.factorial(256)
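Randall's expression, together with the corresponding bit count: `int.bit_length()` gives the number of bits needed to write 256! in binary, which matches the roughly 1684 bits quoted elsewhere in the thread. This measures only the size of the key space, not the cost of any particular attack:

```python
import math

keyspace = math.factorial(256)
print(keyspace.bit_length())            # 1684
print(math.lgamma(257) / math.log(2))   # log2(256!) as a float, just under 1684
```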




A second attack is to force the key. The attacker controls the machine the
application is running on, they control /dev/urandom and can feed your app
whatever not-so-random numbers they like, so potentially they can force the
app to use the key of their choosing. Then they don't need 1000 files, they
just need one.



If the attacker controlled the machine the app was on, why would it fool 
with /dev/urandom?  I think he'd just plant the files he wanted to plant 
and be done.  This is nonsensical anyway.



That's two. Does anyone think that I've thought of all the possible attacks?

(Well, hypothetical attacks. I acknowledge that I don't know the
application, and cannot be sure that it *actually is* vulnerable to these
attacks.)

The problem here is that a monoalphabetic substitution cipher is not
resistant to preimage attacks. Your only defence is that the key is
unknown. If 

Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Johannes Bauer
On 26.06.2015 22:09, Randall Smith wrote:

 You've gone on a rampage about nothing.  My original description said
 the client was supposed to encrypt the data, but you want to assume the
 opposite for some unknown reason.

While you seem to think that Steven is rampaging about nothing, he does
have a fair point: You consistently were vague about whether you want to
have encryption, authentication or obfuscation of data. This suggests
that you may not be so sure yourself what it is you actually want.

All Steven is doing is pointing out that people do good crypto for a
reason. It's 2015 and we're still discussing substitution ciphers,
really? Good crypto is available, it's fast, it has awesome
cryptanalysis. All Steven is pointing out is that when ten crypto-laymen
meet in a Python newsgroup and think they have invented a soooper secure
scheme, it may still be complete and utter crap. Just not everyone can
see it.

You always play around with the 256! which would be a ridiculously high
security margin (1684 bits of security, w!). You totally ignore that
the system can be broken in a linear fashion. I don't need to know all
256 characters to do damage, sometimes even a handful will already give
me part of what I need and the option to crack more and more. This is
something that would ultimately and instantly disqualify your
cryptosystem as utterly insecure.
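The point that the table need not be recovered in full can be quantified, assuming the attacker only cares about the bytes a payload actually uses: for a payload with k distinct byte values, only k table entries affect the result, giving 256!/(256-k)! relevant outcomes. `math.perm` (Python 3.8+) computes this:

```python
import math

# Only the k table entries a payload uses matter, not all 256.
for k in (1, 2, 4, 8, 16, 256):
    outcomes = math.perm(256, k)
    print(f"k={k:3d}: {outcomes.bit_length():4d} bits")
```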

Nobody assumes you're a moron. But it's safe to assume that you're a
crypto layman, because only laymen have no clue on how difficult it is
to get cryptography even remotely right. Everyone who knows the trade
uses proven constructions not because it's inconvenient, but because
it's one of the very few ways to achieve a secure system.

That said, for your solution this type of obfuscation may be fine. And
chances are that nobody will ever notice. But don't claim you weren't
warned about the abyss when you designed your solution and people break
this stuff. Because then you might *look* like a moron (even if you're
not), since the first question people will ask will be: Why? Why on
earth? It's a blatantly obvious bad idea(tm).

That people in 2015 actually defend inventing a substitution-cipher
cryptosystem sends literally shivers down my spine.

Cheers,
Johannes

-- 
 Where EXACTLY did you predict the earthquake again?
 Not publicly, at any rate!
Ah, the latest and to date most brilliant stunt of our great
cosmologist: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Johannes Bauer
On 26.06.2015 22:09, Randall Smith wrote:

 And that's why we're having this discussion.  Do you know of an attack
 in which you can control the output (say at least 100 consecutive bytes)
 for data which goes through a 256 byte translation table, chosen
 randomly from 256! permutations after the data is sent.  If you do, I'm
 all ears!  But at this point you're just setting up straw men and
 knocking them down.

Oh and I wanted to comment on this as well, but sent my reply too soon.

You misunderstand. This is not how it works, this is not how any of this
works. Steven does not *at all* have to prove to you your system is
breakable or show actual attacks. YOU have to prove that your system is
secure. Either analytically or you wait until you have peer review and
cryptanalysis by actual experts.

It's *very* easy to set up a badly flawed obfuscation system that can't
be broken by laymen in a Python newsgroup and which appears to be secure.
This does not imply one bit that it is even remotely secure.

Cheers,
Johannes

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Randall Smith

On 06/26/2015 05:42 PM, Johannes Bauer wrote:

On 26.06.2015 23:29, Jon Ribbens wrote:


While you seem to think that Steven is rampaging about nothing, he does
have a fair point: You consistently were vague about whether you want to
have encryption, authentication or obfuscation of data. This suggests
that you may not be so sure yourself what it is you actually want.


He hasn't been vague, you and Steven just haven't been paying
attention.


Bullshit. Even the topic indicates that he doesn't know what he wants:
data mangling or encryption, which one is it?


I knew exactly what I wanted and spelled it out.

protect the recipient against exposure to nefarious data ... before it 
is written to disk


You shouldn't need to make assumptions about other parts of the system. 
 Just prevent potential malware from hitting the disk as such.


Before this thread, I knew that encryption would definitely work and 
data mangling might.  Now I know that data mangling is a really nice 
solution for the given requirements.






You always play around with the 256! which would be a ridiculously high
security margin (1684 bits of security, w!). You totally ignore that
the system can be broken in a linear fashion.


No, it can't, because the attacker does not have access to the
ciphertext.


Or so you claim.


No the attacker does not have access to the ciphertext.  What would lead 
you to think they did?


This statement is central to the problem:

I'd like to protect the recipient against exposure to nefarious data by 
mangling or encrypting the data before it is written to disk


This makes it clear I'm not trying to encrypt data to protect the data. 
 I'm trying to protect the recipient (storage server) from an attack. 
This specific attack being malware.  Yes, AES encryption would have 
worked here, but encryption is not the objective.




I could go into detail about how the assumption that the ciphertext is
secret is not a smart one in the context of cryptography. And how side
channels and other leakage may affect overall system security. But I'm
going to save my time on that. I do get paid to review cryptographic
systems and part of the job is dealing with belligerent people who have
read Schneier's blog and think they can outsmart anyone else. Since I
don't get paid to convince you, it's absolutely fine that you think your
substitution scheme is the grand prize.



All of which has nothing to do with this thread.  Actual encryption is 
handled using AES and TLS.  This is not about encryption.





Nobody assumes you're a moron. But it's safe to assume that you're a
crypto layman, because only laymen have no clue on how difficult it is
to get cryptography even remotely right.


Amateur crypto is indeed a bad idea. But what you're still not getting
is that what he's doing here *isn't crypto*.


So the topic says Encrypting. If you look really closely at the word,
the part crypt might give away to you that cryptography is involved.




This isn't about encrypting data to protect the data.  All the 
encryption I do uses standard AES and TLS.  Yes, I do understand that 
crypto is best left to experts.  The topic says Encrypting because I 
knew that encrypting the data would properly obfuscate it.




He's just trying to avoid
letting third parties write completely arbitrary data to the disk.


There's your requirement. Then there's obviously some kind of
implication when a third party *can* write arbitrary data to disk. And
your other solution to that problem...


It's a network protocol.  Just like when writing a web app, you have to 
deal with bad actors.  That's what I'm doing here.  The entire service 
is about handling arbitrary data.  Just like Amazon S3 handles people's 
arbitrary data.  Not sure what you mean by third party.  It would be 
a registered peer.  But registration doesn't prevent the scenario in 
discussion.





--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Chris Angelico
On Sat, Jun 27, 2015 at 6:09 AM, Randall Smith rand...@tnr.cc wrote:
 Give me one plausible scenario where an attacker can cause malware to hit
 the disk after bytearray.translate with a 256 byte translation table and
 I'll be thankful to you.

The entire 256-byte translation table is significant ONLY if you need
all 256 possible bytes. Suppose I want to generate the following byte
sequence:

\xCD\x19

(Okay, this is a slightly oversimplified example, as this attack
doesn't work on a modern Windows. But back in the days of DOS, this
program would reboot your computer.)

How many truly different translation tables are there if I'm trying to
produce this? Just 256*255, or 65280. If I send random two-byte files,
there is one chance in that of my malware successfully landing. Once
I've sent about 45,000 of those files, I have a fifty-fifty chance of
having hit it. Send twice as many, I have a 75% chance of success,
etc.
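Chris's arithmetic, checked under his stated assumptions: a fresh, uniformly random table per upload, a two-byte target with distinct bytes, and independent attempts. Any upload of two distinct bytes then has a 1 in 256*255 chance of mangling into the target, so even odds need about ln 2 times 65280 uploads:

```python
import math

tables = 256 * 255   # outcomes that matter for a 2-byte target
p = 1 / tables       # chance that a single upload lands

def success(n):
    """Probability that at least one of n independent uploads lands."""
    return 1 - (1 - p) ** n

n_half = math.ceil(math.log(0.5) / math.log(1 - p))
print(tables, n_half, round(success(2 * n_half), 3))
```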

Malware can be crafted to fit within certain restrictions. I saw a
proof-of-concept and analysis document detailing a particular remote
code execution/privilege escalation attack that involved stuffing
text into an entry field and then inducing the program to read that
into its stack, finally triggering it by some sort of buffer overflow,
I think. The text had to be no more than X bytes long (because that's
all the entry field was set to accept - it'd truncate after that), and
had to not contain any NUL bytes, and there might have been other
restrictions too. Sure, it makes it harder to write your malware...
but imagine if you can write something in just a handful of different
bytes, which then goes and triggers something else. You could have an
extremely plausible attack that might need only a day's uploading to
deliver.

It makes no difference that there are 256! possible encryption keys,
if most of them have the same result.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Randall Smith

On 06/26/2015 04:55 PM, Mark Lawrence wrote:



To be perfectly blunt I gave up days ago trying to follow what was being
said, just too many words from all angles and too few diagrams for me to
follow.  I sincerely hope it doesn't end in tears.



Mark.

There's not much to follow.  The solution was simple and complete.

The original description was limited to a small part of a large, more 
complex system.  The reason you've had trouble following is because 
several people made (very bad) assumptions about what the rest of the 
system did.  Everything required for the solution was present in the 
original post.


-Randall

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Randall Smith

On 06/26/2015 04:07 PM, Johannes Bauer wrote:

You consistently were vague about whether you want to
have encryption, authentication or obfuscation of data.


I knew (possibly extra) encryption wasn't necessary at this stage, but I 
also knew that encryption would provide good obfuscation.  Problem is, I 
didn't want an extra C library to install. See the original post.


... I'd like to protect the recipient against exposure to nefarious 
data by mangling or encrypting the data before it is written to disk. My 
original idea was for the recipient to encrypt using AES.  But I want to 
keep this software pure Python batteries included and not require 
installation of other platform-dependent software ... I don't know that 
I really need encryption here, but some type of fast mangling algorithm 
where a bad actor sending a payload can't guess the output ahead of time.


-Randall



--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Jon Ribbens
On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 26.06.2015 23:29, Jon Ribbens wrote:
 While you seem to think that Steven is rampaging about nothing, he does
 have a fair point: You consistently were vague about whether you want to
 have encryption, authentication or obfuscation of data. This suggests
 that you may not be so sure yourself what it is you actually want.
 
 He hasn't been vague, you and Steven just haven't been paying
 attention.

 Bullshit. Even the topic indicates that he doesn't know what he wants:
 data mangling or encryption, which one is it?

He wants data mangling and he was asking whether he needed encryption
to achieve it. The answer is no, he doesn't.

 I could go into detail about how the assumption that the ciphertext is
 secret is not a smart one in the context of cryptography.

But, and I've already pointed this out and you don't seem to have
quite got the picture yet, we're not in the context of cryptography.

 And how side channels and other leakage may affect overall system
 security. But I'm going to save my time on that. I do get paid to
 review cryptographic systems and part of the job is dealing with
 belligerent people who have read Schneier's blog and think they can
 outsmart anyone else.

You seem to be describing your own attitude to a tee.

 Since I don't get paid to convince you, it's absolutely fine that you
 think your substitution scheme is the grand prize.

My scheme? It wasn't my suggestion.

 So the topic says Encrypting. If you look really closely at the word,
 the part crypt might give away to you that cryptography is involved.

If you were to actually read past the subject line and continue on to
read the text of the articles, you would discover that cryptography is
not involved. No wonder you're confused if you're disengaging your
brain the instant you get past the subject line.

 He's just trying to avoid letting third parties write completely
 arbitrary data to the disk.

 There's your requirement.

My requirement?

 You know what would be a perfectly good solution to his problem?
 Base 64 encoding. That would solve the issue pretty much
 completely, the only reason it's not an ideal solution is that it
 of course increases the size of the data.

 ...wow.

 That's a nice interpretation of not letting a third party write
 completely arbitrary data.

It's an accurate interpretation. Something that seems not to be your
forte.

 According to your definition, this would be: It's okay if the
 attacker can control 6 of 8 bits.

Yes, it probably is ok. Add a bit of random gunk at the top and tail
of the file and it's almost certainly ok. Why do you think it's not?
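A sketch of that last suggestion, base64 plus unpredictable padding at both ends. `store_form` and `load_form` are hypothetical names, and the 15-byte pad is an arbitrary choice (a multiple of 3, so the head and tail encode without `=` padding). It illustrates the idea under discussion and is not a vetted defence:

```python
import base64
import os

PAD = 15                # random bytes at each end; a multiple of 3
ENC_PAD = 4 * PAD // 3  # their base64 length: 20 characters

def store_form(data):
    """Hypothetical storage encoding: base64 of the untrusted data,
    wrapped in base64 of random bytes the sender cannot predict."""
    head = base64.b64encode(os.urandom(PAD))
    tail = base64.b64encode(os.urandom(PAD))
    return head + base64.b64encode(data) + tail

def load_form(blob):
    """Strip the random head and tail, then decode the original data."""
    return base64.b64decode(blob[ENC_PAD:len(blob) - ENC_PAD])

assert load_form(store_form(b"untrusted upload")) == b"untrusted upload"
```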

 Oh I understand your solutions plenty well.

Evidently not.

 The only thing I don't understand is why you don't own a Fields
 medal yet for your groundbreaking work on bulletproof obfuscation.

That is clearly a very long way from the only thing you don't
understand.


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Johannes Bauer
On 26.06.2015 23:29, Jon Ribbens wrote:

 While you seem to think that Steven is rampaging about nothing, he does
 have a fair point: You consistently were vague about whether you want to
 have encryption, authentication or obfuscation of data. This suggests
 that you may not be so sure yourself what it is you actually want.
 
 He hasn't been vague, you and Steven just haven't been paying
 attention.

Bullshit. Even the topic indicates that he doesn't know what he wants:
data mangling or encryption, which one is it?

 You always play around with the 256! which would be a ridiculously high
 security margin (1684 bits of security, w!). You totally ignore that
 the system can be broken in a linear fashion.
 
 No, it can't, because the attacker does not have access to the
 ciphertext.

Or so you claim.

I could go into detail about how the assumption that the ciphertext is
secret is not a smart one in the context of cryptography. And how side
channels and other leakage may affect overall system security. But I'm
going to save my time on that. I do get paid to review cryptographic
systems and part of the job is dealing with belligerent people who have
read Schneier's blog and think they can outsmart anyone else. Since I
don't get paid to convince you, it's absolutely fine that you think your
substitution scheme is the grand prize.

 Nobody assumes you're a moron. But it's safe to assume that you're a
 crypto layman, because only laymen have no clue on how difficult it is
 to get cryptography even remotely right.
 
 Amateur crypto is indeed a bad idea. But what you're still not getting
 is that what he's doing here *isn't crypto*. 

So the topic says Encrypting. If you look really closely at the word,
the part crypt might give away to you that cryptography is involved.

 He's just trying to avoid
 letting third parties write completely arbitrary data to the disk.

There's your requirement. Then there's obviously some kind of
implication when a third party *can* write arbitrary data to disk. And
your other solution to that problem...

 You
 know what would be a perfectly good solution to his problem? Base 64
 encoding. That would solve the issue pretty much completely, the only
 reason it's not an ideal solution is that it of course increases the
 size of the data.

...wow.

That's a nice interpretation of not letting a third party write
completely arbitrary data. According to your definition, this would be:
It's okay if the attacker can control 6 of 8 bits.
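The "6 of 8 bits" figure follows from the Base64 alphabet having 64 = 2^6 symbols: whatever the sender uploads, every stored byte is one of 64 printable ASCII values. A quick check with the standard `base64` module (the alphabet constant is the standard RFC 4648 one, written out here for the comparison):

```python
import base64
import math

BASE64_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Encode every possible byte value and see which output bytes can occur.
encoded = base64.b64encode(bytes(range(256)))
body = encoded.rstrip(b"=")  # '=' only appears as trailing padding

print(set(body) <= set(BASE64_ALPHABET))  # only alphabet bytes appear
print(math.log2(len(BASE64_ALPHABET)))    # information carried per stored byte
```

So a sender can still choose about 6 bits of each stored byte, but cannot make the file start with arbitrary binary values such as an executable or image header.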

 That people in 2015 actually defend inventing a substitution-cipher
 cryptosystem sends literally shivers down my spine.
 
 Nobody is defending such a thing, you just haven't understood what
 problem is being solved here.

Oh I understand your solutions plenty well. The only thing I don't
understand is why you don't own a Fields medal yet for your
groundbreaking work on bulletproof obfuscation.

Cheers,
Johannes

-- 
 Where exactly did you predict the earthquake again?
 Not publicly, at least!
Ah, the newest and to date most brilliant coup of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Mark Lawrence

On 26/06/2015 22:29, Jon Ribbens wrote:

On 2015-06-26, Johannes Bauer dfnsonfsdu...@gmx.de wrote:

On 26.06.2015 22:09, Randall Smith wrote:

You've gone on a rampage about nothing.  My original description said
the client was supposed to encrypt the data, but you want to assume the
opposite for some unknown reason.


While you seem to think that Steven is rampaging about nothing, he does
have a fair point: You consistently were vague about whether you want to
have encryption, authentication or obfuscation of data. This suggests
that you may not be so sure yourself what it is you actually want.


He hasn't been vague, you and Steven just haven't been paying
attention.


You always play around with the 256! which would be a ridiculously high
security margin (1684 bits of security, w!). You totally ignore that
the system can be broken in a linear fashion.


No, it can't, because the attacker does not have access to the
ciphertext.


Nobody assumes you're a moron. But it's safe to assume that you're a
crypto layman, because only laymen have no clue on how difficult it is
to get cryptography even remotely right.


Amateur crypto is indeed a bad idea. But what you're still not getting
is that what he's doing here *isn't crypto*. He's just trying to avoid
letting third parties write completely arbitrary data to the disk. You
know what would be a perfectly good solution to his problem? Base 64
encoding. That would solve the issue pretty much completely, the only
reason it's not an ideal solution is that it of course increases the
size of the data.


That people in 2015 actually defend inventing a substitution-cipher
cryptosystem sends literally shivers down my spine.


Nobody is defending such a thing, you just haven't understood what
problem is being solved here.



To be perfectly blunt I gave up days ago trying to follow what was being 
said, just too many words from all angles and too few diagrams for me to 
follow.  I sincerely hope it doesn't end in tears.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Ian Kelly
On Fri, Jun 26, 2015 at 3:07 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 That people in 2015 actually defend inventing a substitution-cipher
 cryptosystem sends literally shivers down my spine.

I think that the people defending this have been reasonably consistent
about using the word obfuscation, not crypto.


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Devin Jeanpierre
Johannes, I agree with a lot of what you say, but can you please have
less of a mean attitude?

-- Devin

On Fri, Jun 26, 2015 at 3:42 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 On 26.06.2015 23:29, Jon Ribbens wrote:

 While you seem to think that Steven is rampaging about nothing, he does
 have a fair point: You consistently were vague about whether you want to
 have encryption, authentication or obfuscation of data. This suggests
 that you may not be so sure yourself what it is you actually want.

 He hasn't been vague, you and Steven just haven't been paying
 attention.

 Bullshit. Even the topic indicates that he doesn't know what he wants:
 data mangling or encryption, which one is it?

 You always play around with the 256! which would be a ridiculously high
 security margin (1684 bits of security, w!). You totally ignore that
 the system can be broken in a linear fashion.

 No, it can't, because the attacker does not have access to the
 ciphertext.

 Or so you claim.

 I could go into detail about how the assumption that the ciphertext is
 secret is not a smart one in the context of cryptography. And how side
 channels and other leakage may affect overall system security. But I'm
 going to save my time on that. I do get paid to review cryptographic
 systems and part of the job is dealing with belligerent people who have
 read Schneier's blog and think they can outsmart anyone else. Since I
 don't get paid to convince you, it's absolutely fine that you think your
 substitution scheme is the grand prize.

 Nobody assumes you're a moron. But it's safe to assume that you're a
 crypto layman, because only laymen have no clue on how difficult it is
 to get cryptography even remotely right.

 Amateur crypto is indeed a bad idea. But what you're still not getting
 is that what he's doing here *isn't crypto*.

 So the topic says Encrypting. If you look really closely at the word,
 the part crypt might give away to you that cryptography is involved.

 He's just trying to avoid
 letting third parties write completely arbitrary data to the disk.

 There's your requirement. Then there's obviously some kind of
 implication when a third party *can* write arbitrary data to disk. And
 your other solution to that problem...

 You
 know what would be a perfectly good solution to his problem? Base 64
 encoding. That would solve the issue pretty much completely, the only
 reason it's not an ideal solution is that it of course increases the
 size of the data.

 ...wow.

 That's a nice interpretation of not letting a third party write
 completely arbitrary data. According to your definition, this would be:
 It's okay if the attacker can control 6 of 8 bits.

 That people in 2015 actually defend inventing a substitution-cipher
 cryptosystem sends literally shivers down my spine.

 Nobody is defending such a thing, you just haven't understood what
 problem is being solved here.

 Oh I understand your solutions plenty well. The only thing I don't
 understand is why you don't own a Fields medal yet for your
 groundbreaking work on bulletproof obfuscation.

 Cheers,
 Johannes



Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Steven D'Aprano
On Sat, 27 Jun 2015 06:09 am, Randall Smith wrote:

 On 06/26/2015 12:06 PM, Steven D'Aprano wrote:
 On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:

 You're making the same mistake that Steven did in misunderstanding the
 threat model.

 I don't think I'm misunderstanding the threat, I think I'm pointing out a
 threat which the OP is hoping to just ignore.
 
 I'm not hoping to ignore anything.  I didn't explain the entire system,
 as it was not necessary to find a solution to the problem at hand.  But
 since you want to make negative assumptions about what I didn't tell
 you, I'll gladly address your accusations of negligence.

Negligence is *your* word, not mine. I've never said that. And I'm not
*assuming* anything, everything I've stated has been based on the evidence
of what you have written. I've even gone so far as to EXPLICITLY say that I
cannot know for a fact that your application is vulnerable to these
threats, since I'm only going from a description rather than the app
itself. But your responses don't suggest that you have these threats under
control, on the contrary, they indicate that you are *far* underestimating
the seriousness of them and overestimating the difficulty of running a
secure application on a machine you cannot trust.

If your application has any saving grace, it is that there are easier ways
to get malware onto somebody else's computer. There are a hundred million
unsecured Windows boxen out there, if I were malicious I would just hire a
bot net rather than spend the time trying to hack your system. But maybe
somebody else will do it just for the lulz, or to prove it can be done.
Some black hats like a challenge, and yours appears to fall nicely into
that middle ground of hard enough to be interesting but not hard enough to
be really difficult.


 In an earlier post, I suggested that the threat model should involve at
 least *three* different attacks, apart from the usual man-in-the-middle
 attacks of data in transit.
 
 All communication is secured using TLS and authentication handled by
 X.509 certificates.  This prevents man in the middle attacks.
 Certificates are signed by CAs I control.

You control the CAs? Presumably that means they're self-signed (unless you
mean you get to choose the CA). I don't know if that makes a difference or
not.
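Randall's setup (TLS, with X.509 certificates signed by CAs he controls) maps onto Python's standard `ssl` module roughly as below. This is a hedged sketch: the thread does not show how the real software builds its contexts, and the function names and the file names in the usage note are hypothetical. Trusting only the private CA file, rather than the system certificate store, is what makes a self-run CA meaningful; requiring client certificates on the server side gives mutual authentication.

```python
import ssl


def make_client_context(cafile=None, certfile=None, keyfile=None):
    """Client side: verify peers against a private CA, optionally present our own cert."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cafile:
        ctx.load_verify_locations(cafile=cafile)  # trust only the private CA
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)    # our node certificate, for mutual TLS
    return ctx


def make_server_context(cafile=None, certfile=None, keyfile=None):
    """Server side: require clients to present a cert signed by the private CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cafile:
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

Typical use would be something like `make_client_context("my_ca.pem", "node.crt", "node.key")`, with those paths standing in for wherever the software keeps its CA and node credentials.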


 One is that the attacker is the person sending the data. E.g. I want to
 send a nasty payload (say, malware, or an offensive image). Another is
 that the attacker is the recipient of the file, who wants to read the
 sender's data.
 
 The only person who can read a file is the owner.   AES encryption is
 built into the client software.  The only way data can be uploaded
 unencrypted is if encryption is intentionally disabled.

With respect Randall, you contradict yourself. Is there any wonder that some
of us (well, me at least) are suspicious and confused, when your story
changes as often as the weather?

Sometimes you say that the client software uses AES encryption. Sometimes
you say that you don't want to use AES encryption because you want the
client to be pure Python, and a pure-Python implementation would be too
slow. Your very first post says:

My original idea was for the recipient to encrypt using AES.  But
I want to keep this software pure Python batteries included and
not require installation of other platform-dependent software.  
Pure Python AES and even DES are just way too slow.


Sometimes you say the user is supposed to encrypt the data themselves:

While the data senders are supposed to encrypt data, that's not
guaranteed


Now you say that the application encrypts the data, except that the user can
turn that option off.

Just make the AES encryption mandatory, not optional. Then the user cannot
upload unencrypted malicious data, and the receiver cannot read the data.
That's two problems solved.

Making AES or similarly strong encryption mandatory protects both the sender
of data and the receiver of data. I cannot imagine why you are considering
making it optional, since that only adds more work for you and reduces the
security of your users.

Oh, and DES is not good enough.


 As far as I can tell, the OP's plan to defend the sender's privacy is to
 dump responsibility for encrypting the files in the sender's lap. As far
 as I'm concerned, perhaps as many as one user in 2 will pre-encrypt
 their files. (Early adopters will be unrepresentative of the eventual
 user base of this system. If this takes off, the user base will likely
 end up dominated by people who think that qwerty is the epitome of
 unguessable passwords.)
 
 Making assumptions again.  See above.  The client software encrypts by
 default.  You're also assuming there is no password strength checking.

My comment about qwerty as a password was a comment on the majority of
people on the internet, not an assumption about your application.


 Users just don't use crypto unless their applications do it for them.
 
 And it does.


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Devin Jeanpierre
On Fri, Jun 26, 2015 at 8:38 PM, Steven D'Aprano st...@pearwood.info wrote:
 Now you say that the application encrypts the data, except that the user can
 turn that option off.

 Just make the AES encryption mandatory, not optional. Then the user cannot
 upload unencrypted malicious data, and the receiver cannot read the data.
 That's two problems solved.

No, because another application could pretend to be the file-sending
application, but send unencrypted data instead of encrypted data.

-- Devin


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Johannes Bauer
On 27.06.2015 02:55, Randall Smith wrote:

 No the attacker does not have access to the ciphertext.  What would lead
 you to think they did?

Years of practical experience in the field of applied cryptography.
Knowledge of how side channels work and how easily they can be
constructed for bad schemes.

Rest snipped, explanation futile.
Cheers,
Johannes

-- 
 Wo hattest Du das Beben nochmal GENAU vorhergesagt?
 Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-26 Thread Jon Ribbens
On 2015-06-26, Chris Angelico ros...@gmail.com wrote:
 On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
jon+use...@unequivocal.co.uk wrote:
 Well, it means you need to send 256 times as much data, which is a
 start. If you're instead using a 256-byte translation table then
 an attack becomes utterly impractical.

 Utterly impractical? Maybe, if you attempt a pure brute-force approach
 - there are 256! possible translation tables, which is roughly e500
 attempts [1], and at roughly four a microsecond [2] that'd still take
 a ridiculously long time. But there are two gigantic optimizations you
 could do. Firstly, there are frequency-based attacks,

No, there aren't. As I already said, the attacker does not have the
ciphertext. He can't do anything related to frequency analysis.
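For reference, the size of the 256-byte translation-table keyspace quoted in this thread can be checked directly. The sketch computes 256!, its size in bits (Johannes' "1684 bits"), its number of decimal digits, and how long exhaustive search would take at Chris's rough estimate of four tables per microsecond (that rate is his figure, reused only to show the scale).

```python
import math

tables = math.factorial(256)      # number of permutations of the 256 byte values
bits = math.log2(tables)          # keyspace size in bits
digits = len(str(tables))         # how many decimal digits 256! has

RATE_PER_SECOND = 4_000_000       # Chris's rough "four a microsecond"
SECONDS_PER_YEAR = 31_557_600     # 365.25 days
years = tables // RATE_PER_SECOND // SECONDS_PER_YEAR

print(f"{bits:.1f} bits; 256! has {digits} decimal digits")
print(f"exhaustive search: a {len(str(years))}-digit number of years")
```

As both sides of the argument agree, the keyspace itself is not the weak point; the disagreement is over whether an attacker ever sees the stored bytes.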


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano
On Thursday 25 June 2015 14:07, Steven D'Aprano wrote:

 You got it.  I didn't want to explain any more than necessary.  But yes,
 the recipient just stores the data for the end-user.
 
 Trust me. That's not all they are doing.

Hmm, sorry, that's a glib answer.

What I meant to say is, you can't *trust* that this is all they are doing, 
not unless all your users are within a single organisation where everyone 
trusts everyone else.

Obviously some people are more trustworthy, or less inquisitive, than 
others. But you don't know which ones are which.


-- 
Steven



Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Devin Jeanpierre
On Thu, Jun 25, 2015 at 2:57 AM, Chris Angelico ros...@gmail.com wrote:
 On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre
 jeanpierr...@gmail.com wrote:
 I know that the OP doesn't propose using ROT-13, but a classical
 substitution cipher isn't that much stronger.

 Yes, it is. It requires the attacker being able to see something about
 the ciphertext, unlike ROT13. But it is reasonable to suppose that
 maybe the attacker can trigger the file getting executed, at which
 point maybe you can deduce from the behavior what the starting bytes
 are...?


 If a symmetric cipher is being used and the key is known, anyone can
 simply perform a decryption operation on the desired bytes, get back a
 pile of meaningless encrypted junk, and submit that. When it's
 encrypted with the same key, voila! The cleartext will reappear.

 Asymmetric ciphers are a bit different, though. AIUI you can't perform
 a decryption without the private key, whereas you can encrypt with
 only the public key. So you ought to be safe on that one; the only way
 someone could deliberately craft input that, when encrypted with your
 public key, produces a specific set of bytes, would be to brute-force
 it. (But I might be wrong on that. I'm no crypto expert.)

Yes, so it should be random.

-- Devin


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Joonas Liik
Personally, I have had AVG give at least two false positives (FYI, one of
them was python2.6).

As long as antivirus software can give so many false positives, I would
think preventing your AV from nuking someone else's data is a
reasonable thing.


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico
On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre
jeanpierr...@gmail.com wrote:
 I know that the OP doesn't propose using ROT-13, but a classical
 substitution cipher isn't that much stronger.

 Yes, it is. It requires the attacker being able to see something about
 the ciphertext, unlike ROT13. But it is reasonable to suppose that
 maybe the attacker can trigger the file getting executed, at which
 point maybe you can deduce from the behavior what the starting bytes
 are...?


If a symmetric cipher is being used and the key is known, anyone can
simply perform a decryption operation on the desired bytes, get back a
pile of meaningless encrypted junk, and submit that. When it's
encrypted with the same key, voila! The cleartext will reappear.
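Chris's point about symmetric schemes can be demonstrated with a byte translation table like the ones discussed in this thread. If a sender knows the table, they apply its inverse to the payload they want on disk, and the receiver's own obfuscation step then reproduces that payload exactly. The seeded permutation below is a hypothetical key chosen only for the demo; Jon's counter-argument is that the sender never learns the receiver's table.

```python
import random

# Hypothetical key: a fixed permutation of all 256 byte values (seeded for the demo).
rng = random.Random(1234)
perm = list(range(256))
rng.shuffle(perm)
ENCODE = bytes.maketrans(bytes(range(256)), bytes(perm))
DECODE = bytes.maketrans(bytes(perm), bytes(range(256)))


def receiver_obfuscate(data: bytes) -> bytes:
    """What the storage node does to incoming data before writing it to disk."""
    return data.translate(ENCODE)


# An attacker who knows the table pre-applies the inverse ("decrypts" the payload)...
payload = b"MZ\x90\x00 malicious header"
crafted = payload.translate(DECODE)

# ...so the receiver's obfuscation step writes the chosen payload verbatim.
stored = receiver_obfuscate(crafted)
print(stored == payload)
```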

Asymmetric ciphers are a bit different, though. AIUI you can't perform
a decryption without the private key, whereas you can encrypt with
only the public key. So you ought to be safe on that one; the only way
someone could deliberately craft input that, when encrypted with your
public key, produces a specific set of bytes, would be to brute-force
it. (But I might be wrong on that. I'm no crypto expert.)

ChrisA


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Jon Ribbens
On 2015-06-25, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
 On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
 If it's encrypted malware, and you can't decrypt it, there's no threat.

 If the *only* threat is that the sender will send malware, you can mitigate 
 around that by dropping the file in an unencrypted container. Anything good 
 enough to prevent Windows from executing the code, accidentally or 
 deliberately, say, a tar file with a custom extension.

That won't stop virus scanners etc potentially making their own minds
up about the file.

 But encrypting the file is also a good solution, and it prevents the storage 
 machine spying on the file contents too. Provided the encryption is strong.

How would the receiver encrypting the file after receiving it prevent
the receiver from seeing what's in the file?

 The original post said that the sender will usually send files they
 encrypted, unless they are malicious. So if the sender wants them to
 be encrypted, they already are.

 The OP *hopes* that the sender will encrypt the files. I think that's a 
 vanishingly faint hope, unless the application itself encrypts the file.

Yes, the application itself encrypts the file. Haven't you been
reading what he's saying?

 The sender has a copy of the application? Then they can see the type of 
 obfuscation used. If they know the key, or can guess it, they can take their 
 malware, *decrypt* it, and send that, so that *encrypting* that file puts 
 the malicious code on the disk.

Not if they don't know the key they can't.

 E.g. suppose I want to send you an insult, but I know your program 
 automatically ROT-13s the strings I send you. Then I send you:

 'lbhe sngure fzryyf bs ryqreoreevrf'

 and your program ROT-13s it to:

 'your father smells of elderberries'

 I know that the OP doesn't propose using ROT-13, but a classical 
 substitution cipher isn't that much stronger.

Replace ROT-13 with ROT-n where 'n' is a secret known only to the
receiver, and suddenly it's not such a bad method of obfuscation.
Improve it to the random-translation-map method he's actually using
and you've got really quite a reasonable system.
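A byte-level ROT-n of the kind Jon describes shifts every byte value by a secret n modulo 256 (unlike ROT-13, which only rotates letters). A minimal sketch using `bytes.maketrans`; the helper name `rot_n_table` is hypothetical:

```python
import secrets


def rot_n_table(n: int) -> bytes:
    """Translation table that adds n to every byte value, modulo 256."""
    return bytes.maketrans(bytes(range(256)), bytes((i + n) % 256 for i in range(256)))


n = secrets.randbelow(255) + 1   # receiver's secret shift, 1..255 (0 would be a no-op)
data = b"your father smells of elderberries"
stored = data.translate(rot_n_table(n))
restored = stored.translate(rot_n_table(-n))  # shifting by -n undoes the rotation
```

Because n is never 0, every stored byte differs from the corresponding input byte.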

 I am usually very oppositional when it comes to rolling your own
 crypto, but am I alone here in thinking the OP very clearly laid out
 their case?

 I don't think any of us *really* understand his use-case or the potential 
 threats, but to my way of thinking, you can never have too strong a cipher 
 or underestimate the risk of users taking short-cuts.

The use case is pretty obvious (a peer-to-peer dropbox type thing) but
it does appear to be being misunderstood. This isn't actually a crypto
problem at all and users taking short-cuts isn't an issue.


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith

On 06/24/2015 08:33 PM, Dennis Lee Bieber wrote:

On Wed, 24 Jun 2015 13:20:07 -0500, Randall Smith rand...@tnr.cc
declaimed the following:


On 06/24/2015 06:36 AM, Steven D'Aprano wrote:

I don't understand how mangling the data is supposed to protect the
recipient. Don't they have the ability unmangle the data, and thus expose
themselves to whatever nasties are in the files?


They never look at the data and wouldn't care to unmangle it.  The
purpose is primarily to prevent automated software (file indexers, virus
scanners) from doing bad things to the data.



Which leads to the question: what is doing bad things.


Storage nodes are computers running the software under discussion that
store chunks of data they are sent (as the recipient) and send them back
upon request. Their job (as related to this software) is to accept, store
and send chunks of data upon request. So losing data is a bad thing.


The storage node software is cross-platform and should run on anything
from a dedicated Raspberry Pi to an old Windows PC. Data integrity is
ensured using encryption and hashes generated by the original data
owners. Normally, a data chunk would look like random bytes, because it
is encrypted. However, the storage node cannot prevent the client
(uploader) from sending unencrypted data. The purpose of this
obfuscation is to protect the storage node, as many potential users have
expressed hesitation in storing other people's data.


Example: A storage node runs a Desktop OS with an image indexer. It 
receives an unencrypted nasty image or movie. The indexer picks it up 
and shows it in the person's image or movie Library.
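Randall's integrity scheme (hashes generated by the data owner) could look something like this with the standard `hashlib` and `hmac` modules. It is a sketch under assumptions: SHA-256 and the function names are choices made here, not details given in the thread. The owner keeps the digest, and the storage node never needs it.

```python
import hashlib
import hmac


def chunk_digest(chunk: bytes) -> bytes:
    """Digest the owner computes before upload and keeps locally."""
    return hashlib.sha256(chunk).digest()


def verify_chunk(chunk: bytes, expected_digest: bytes) -> bool:
    """Check a chunk returned by a storage node against the owner's digest."""
    # compare_digest does a constant-time comparison; harmless here and a good habit.
    return hmac.compare_digest(hashlib.sha256(chunk).digest(), expected_digest)
```

A node that corrupts or substitutes a chunk would have to find different bytes with the same SHA-256 digest to pass this check.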


Does that clear things up?


-Randall


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano
On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:

 On 2015-06-25, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info
 wrote:
 On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
 If it's encrypted malware, and you can't decrypt it, there's no threat.

 If the *only* threat is that the sender will send malware, you can
 mitigate around that by dropping the file in an unencrypted container.
 Anything good enough to prevent Windows from executing the code,
 accidentally or deliberately, say, a tar file with a custom extension.
 
 That won't stop virus scanners etc potentially making their own minds
 up about the file.

*shrug* Sure, but I was specifically referring to the risk of the malware
being executed, not being detected by a virus scanner.
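Steven's container idea (wrap the received bytes in an uncompressed tar archive with a custom extension, so nothing opens or executes them directly) can be sketched with the standard `tarfile` module. The helper names and the member name `data` are hypothetical; the caller would write the returned bytes to a file with whatever custom extension the software picks. As Jon points out, this does not stop a virus scanner from inspecting the file.

```python
import io
import tarfile


def wrap_in_tar(chunk: bytes, member_name: str = "data") -> bytes:
    """Return an uncompressed tar archive containing chunk as its only member."""
    buf = io.BytesIO()
    info = tarfile.TarInfo(name=member_name)
    info.size = len(chunk)
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(info, io.BytesIO(chunk))
    return buf.getvalue()


def unwrap_tar(blob: bytes, member_name: str = "data") -> bytes:
    """Read the chunk back out of the archive bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return tar.extractfile(member_name).read()
```

The file on disk then begins with a tar header (the member name) rather than with whatever header the sender chose.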

Encrypting the file won't even necessarily stop the virus scanner from
finding false positives. It might even increase the chances. But it will
prevent the virus scanner from finding actual viruses. You may or may not
consider that a problem.


 But encrypting the file is also a good solution, and it prevents the
 storage machine spying on the file contents too. Provided the encryption
 is strong.
 
 How would the receiver encrypting the file after receiving it prevent
 the receiver from seeing what's in the file?

I didn't say it ought to be encrypted by the receiver. Obviously the
encryption needs to be done in a way that the recipient doesn't get access
to the key. The obvious way to do that is for the application to encrypt
the data before it sends it. Then the receiver just writes the encrypted
bytes directly to a file. That would have the benefit of protecting against
man-in-the-middle attacks as well, since the file is never transmitted in
the clear.

 
 The original post said that the sender will usually send files they
 encrypted, unless they are malicious. So if the sender wants them to
 be encrypted, they already are.

 The OP *hopes* that the sender will encrypt the files. I think that's a
 vanishingly faint hope, unless the application itself encrypts the file.
 
 Yes, the application itself encrypts the file. Haven't you been
 reading what he's saying?

I have been reading what the OP has been saying. I'm not sure if you have
been. The OP doesn't want to encrypt the file, because he wants the
application to be pure Python and encryption in pure Python is too slow. So
he wants to obfuscate it with some sort of substitution cipher or
equivalent, which may be easily crackable by anyone who really wants to.

I've been arguing that the application *should* encrypt the file, and not
mess about giving the illusion of security.


 The sender has a copy of the application? Then they can see the type of
 obfuscation used. If they know the key, or can guess it, they can take
 their malware, *decrypt* it, and send that, so that *encrypting* that
 file puts the malicious code on the disk.
 
 Not if they don't know the key they can't.

If they know the key, or can guess it, ...
Not if they don't know the key they can't.

Really? Glad you're around to point that out to me.

But seriously, they have the application. If the application is using a
symmetric substitution cipher, it needs the key (because there is only
one), so the receiver will have the cipher.

With the sort of substitution cipher the OP is experimenting with, forcing a
particular result is trivially easy. The sender has access to the
application, knows the cipher, knows the key, and can easily generate a
file which will generate whatever content the sender wants after being
obfuscated.

Modern block ciphers like AES are quite resistant to that sort of
attack. There is, so far as I know, no way to generate a file which results
in a specific content after encryption.


 E.g. suppose I want to send you an insult, but I know your program
 automatically ROT-13s the strings I send you. Then I send you:

 'lbhe sngure fzryyf bs ryqreoreevrf'

 and your program ROT-13s it to:

 'your father smells of elderberries'

 I know that the OP doesn't propose using ROT-13, but a classical
 substitution cipher isn't that much stronger.
 
 Replace ROT-13 with ROT-n where 'n' is a secret known only to the
 receiver, and suddenly it's not such a bad method of obfuscation.

There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.
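Steven's point about the tiny ROT-n keyspace is easy to see in code: given stored bytes, an attacker tries all 256 shifts and keeps those that produce a recognisable header. The PNG signature used as the known prefix, and the secret value 173, are arbitrary examples; any predictable file header would do.

```python
def rotate(data: bytes, n: int) -> bytes:
    """Add n to every byte value, modulo 256."""
    return bytes((b + n) % 256 for b in data)


def brute_force_shift(stored: bytes, known_prefix: bytes) -> list:
    """Return every shift n whose inverse turns stored into something starting with known_prefix."""
    head = stored[:len(known_prefix)]
    return [n for n in range(256) if rotate(head, -n) == known_prefix]


PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
secret_n = 173                                   # the receiver's "secret"
stored = rotate(PNG_MAGIC + b"...image data...", secret_n)
print(brute_force_shift(stored, PNG_MAGIC))      # prints [173] after at most 256 tries
```

This assumes the attacker can read the stored bytes, which is exactly the point Jon disputes.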


 Improve it to the random-translation-map method he's actually using
 and you've got really quite a reasonable system.

No, truly you haven't. The OP is experimenting with bytearray.translate,
which likely makes it a monoalphabetic substitution cipher, and the
techniques for cracking those go back to the 9th century AD. That's over a
thousand years of experience in cracking these things.
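The reason monoalphabetic schemes fall to frequency analysis is that a fixed byte-for-byte table, such as one applied with `bytearray.translate`, only relabels byte values: the output's histogram is the input's histogram with the labels permuted. A minimal demonstration, using a randomly generated table as a stand-in (the OP's actual table generation is not shown in the thread):

```python
import random
from collections import Counter

# Stand-in key: a random permutation of the 256 byte values.
perm = list(range(256))
random.shuffle(perm)
table = bytes.maketrans(bytes(range(256)), bytes(perm))

plaintext = bytearray(b"the quick brown fox jumps over the lazy dog " * 50)
ciphertext = plaintext.translate(table)

# Same multiset of counts: only the labels changed.
plain_counts = sorted(Counter(plaintext).values(), reverse=True)
cipher_counts = sorted(Counter(ciphertext).values(), reverse=True)

# The most frequent ciphertext byte is the image of the most frequent plaintext byte (space).
top_cipher = Counter(ciphertext).most_common(1)[0][0]
print(plain_counts == cipher_counts, top_cipher == perm[ord(" ")])
```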

The situation is a bit harder than the sort of traditional ciphers, instead
of using an alphabet 

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Jon Ribbens
On 2015-06-25, Steven D'Aprano st...@pearwood.info wrote:
 On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:
 That won't stop virus scanners etc potentially making their own minds
 up about the file.

 *shrug* Sure, but I was specifically referring to the risk of the malware
 being executed, not being detected by a virus scanner.

 Encrypting the file won't even necessarily stop the virus scanner from
 finding false positives. It might even increase the chances.

That seems spectacularly unlikely.

 But it will prevent the virus scanner from finding actual viruses.
 You may or may not consider that a problem.

The OP would consider it a benefit.

 I didn't say it ought to be encrypted by the receiver. Obviously the
 encryption needs to be done in a way that the recipient doesn't get access
 to the key.

No, you're still misunderstanding. The encryption needs to be done in
a way that the *sender* doesn't get access to the key. The recipient
has access to it by definition because the recipient chooses it.

 The obvious way to do that is for the application to encrypt the
 data before it sends it.

Yes, he already said the application does that. The problem is,
what if the sender is not the genuine application but is instead
a malicious attacker?

 Then the receiver just writes the encrypted bytes directly to a file.

That's precisely what he's trying to avoid.

 That would have the benefit of protecting against man-in-the-middle
 attacks as well, since the file is never transmitted in the clear.

With what he's talking about, the file after encryption is never
transmitted *at all*.

 I've been arguing that the application *should* encrypt the file, and not
 mess about giving the illusion of security.

You haven't understood the threat model.

 But seriously, they have the application. If the application is using a
 symmetric substitution cipher, it needs the key (because there is only
 one), so the receiver will have the cipher.

There is not only one key. The recipient would invent a new key for
each file after the file is received.
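A minimal sketch of that ordering, assuming the key is a random permutation of the 256 byte values, drawn from the OS random source only after the chunk has arrived. The dict-based `store` and the names `store_chunk`/`load_chunk` are hypothetical stand-ins for however a real node writes chunks and keys to disk.

```python
import secrets


def new_table():
    """Return a fresh key: a random permutation of the 256 byte values."""
    values = list(range(256))
    # SystemRandom draws from the OS CSPRNG, so a sender cannot predict it.
    secrets.SystemRandom().shuffle(values)
    return bytes(values)


def store_chunk(received, store, chunk_id):
    """Obfuscate a chunk with a key that is only chosen after it has arrived."""
    table = new_table()
    # Hypothetical storage: keep the key next to the obfuscated bytes.
    store[chunk_id] = (table, received.translate(table))


def load_chunk(store, chunk_id):
    """Undo the substitution by translating through the inverse table."""
    table, obfuscated = store[chunk_id]
    inverse = bytearray(256)
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain
    return obfuscated.translate(bytes(inverse))


if __name__ == "__main__":
    chunks = {}
    store_chunk(b"MZ\x90\x00 pretend executable", chunks, "chunk-1")
    assert load_chunk(chunks, "chunk-1") == b"MZ\x90\x00 pretend executable"
```

Because the table does not exist until the upload has finished, the sender has nothing to precompute against.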

 With the sort of substitution cipher the OP is experimenting with, forcing a
 particular result is trivially easy. The sender has access to the
 application, knows the cipher, knows the key, and can easily generate a
 file which will generate whatever content the sender wants after being
 obfuscated.

No, because the sender does not know the key.

 Replace ROT-13 with ROT-n where 'n' is a secret known only to the
 receiver, and suddenly it's not such a bad method of obfuscation.

 There are only 256 possible values for n, one of which doesn't transform the
 data at all (ROT-0). If you're thinking of attacking this by pencil and
 paper, 255 transformations sounds like a lot. For a computer, that's barely
 harder than a single transformation.

Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.
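To make the ROT-n numbers concrete, here is a sketch of a byte-wise ROT-n built on `bytes.translate` (an assumption; nobody in the thread posted code for it). It shows both sides of the argument: a reader who knows a few plaintext bytes can try all 256 shifts almost instantly, while a sender who cannot see n has to prepare one candidate per possible shift. The function names are hypothetical.

```python
def rot_table(n):
    """Translation table that adds n (mod 256) to every byte value."""
    return bytes((i + n) % 256 for i in range(256))


def rot(data, n):
    """Byte-wise ROT-n; rot(rot(x, n), -n) recovers x."""
    return data.translate(rot_table(n % 256))


def brute_force(ciphertext, known_prefix):
    """Every shift n whose inverse turns the ciphertext into known_prefix..."""
    return [n for n in range(256)
            if rot(ciphertext, -n).startswith(known_prefix)]


def all_preimages(payload):
    """What a sender who cannot see n must send: one candidate per shift."""
    return [rot(payload, -n) for n in range(256)]
```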

 Improve it to the random-translation-map method he's actually using
 and you've got really quite a reasonable system.

 No, truly you haven't. The OP is experimenting with bytearray.translate,
 which likely makes it a monoalphabetic substitution cipher, and the
 techniques for cracking those go back to the 9th century AD.

Only if you have the ciphertext, which the attacker in this scenario
does not. The attacker gets to set the plaintext, knows the algorithm,
does not know the key (unless the method of choosing the key has a
flaw), and wants to set the ciphertext to some specific string.
Frequency analysis doesn't even begin to apply to this scenario.

 You're relying on security by obscurity

No, he really isn't.

 The use case is pretty obvious (a peer-to-peer dropbox type thing) but
 it does appear to be being misunderstood. This isn't actually a crypto
 problem at all and users taking short-cuts isn't an issue.

 Yes it is. If users don't properly pre-encrypt their files before sending it
 out to the cloud, AND THEY WON'T,

Yes they will. He said his application encrypts the files for them,
presumably he is indeed using proper crypto for that.

 receivers WILL be able to read those files,

That's a problem for the sender not the receiver.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith

On 06/24/2015 11:27 PM, Devin Jeanpierre wrote:

On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano st...@pearwood.info wrote:

But just sticking to the three above, the first one is partially mitigated
by allowing virus scanners to scan the data, but that implies that the
owner of the storage machine can spy on the files. So you have a conflict
here.


If it's encrypted malware, and you can't decrypt it, there's no threat.


Honestly, the *only* real defence against the spying issue is to encrypt the
files. Not obfuscate them with a lousy random substitution cipher. The
storage machine can keep the files as long as they like, just by making a
copy, and spend hours bruteforcing them. They *will* crack the substitution
cipher. In pure Python, that may take a few days or weeks; in C, hours or
days. If they have the resources to throw at it, minutes. Substitution
ciphers have not been effective encryption since, oh, the 1950s, unless you
use a one-time pad. Which you won't be.


The original post said that the sender will usually send files they
encrypted, unless they are malicious. So if the sender wants them to
be encrypted, they already are.

While the data senders are supposed to encrypt data, that's not
guaranteed, and I'd like to protect the recipient against exposure to
nefarious data by mangling or encrypting the data before it is written
to disk.

The cipher is just to keep the sender from being able to control what
is on disk.

I am usually very oppositional when it comes to rolling your own
crypto, but am I alone here in thinking the OP very clearly laid out
their case?

-- Devin



Thanks Devin.  You understand the issue perfectly despite my limited 
description of the system.  I've fully implemented and performance-tested 
your suggested solution and am quite happy with it.


Though the issue is solved, I would be glad to listen to any remaining 
criticisms, suggestions or questions.


--Randall
--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith
Thanks Jon.  I couldn't have answered those questions better myself, and 
I wrote the software in question.


I didn't intend to describe the entire system, but rather just enough of 
it to present the issue at hand.  You seem to understand it quite well.


I'm now using a randomly generated 256-byte translation table, which 
performs very well on the lowly Raspberry Pi ARM chip.  The Raspberry Pi 
is to be my recommended storage node platform.
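A rough way to check translate-table throughput on a given machine (a Raspberry Pi or otherwise) is sketched below. It assumes 2 MiB chunks, matching the "about 2MB" in the original post, and uses random bytes as a stand-in for real chunk data; it implies no timing figures from the thread.

```python
import os
import secrets
import time

CHUNK_SIZE = 2 * 1024 * 1024  # assumed: the "about 2MB" chunks from the original post


def random_table():
    """A random permutation of the 256 byte values, usable with bytes.translate."""
    values = list(range(256))
    secrets.SystemRandom().shuffle(values)
    return bytes(values)


def measure_seconds(chunk, table, repeats=5):
    """Average wall-clock seconds for one chunk.translate(table) call."""
    start = time.perf_counter()
    for _ in range(repeats):
        chunk.translate(table)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    chunk = os.urandom(CHUNK_SIZE)  # random stand-in for a real chunk
    seconds = measure_seconds(chunk, random_table())
    print(f"{seconds * 1000:.2f} ms per chunk, "
          f"{CHUNK_SIZE / seconds / 1e6:.1f} MB/s")
```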


For those that care, the storage system is something like Amazon S3, 
except storage is distributed peer-to-peer.  Clients compress, encrypt, 
and chunk data, then send it to storage nodes. Storage nodes propagate 
the data.  Encryption and authentication are handled through TLS.  Files 
use AES encryption for storage.  Storage nodes are monitored for 
availability, integrity, and performance.  Data transfers are 
coordinated by a centralized service which tracks storage and transfers. 
Redundancy is configurable by chunk. Storage nodes are compensated in 
proportion to storage × time.  Uploads and downloads can utilize several 
storage nodes simultaneously to increase throughput.
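The client-side order described above (compress, encrypt, then chunk) can be sketched with the standard library as follows. The encryption step is a clearly marked placeholder, since AES is not in the stdlib; `encrypt_placeholder` and the 2 MiB chunk size are assumptions for illustration, not the real client.

```python
import zlib

CHUNK_SIZE = 2 * 1024 * 1024  # assumed: ~2MB chunks, as in the original post


def encrypt_placeholder(data):
    """Hypothetical stand-in for the real AES step (needs a third-party library)."""
    return data


def prepare_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress, 'encrypt', then split into fixed-size chunks for upload."""
    compressed = zlib.compress(data, 6)
    sealed = encrypt_placeholder(compressed)
    return [sealed[i:i + chunk_size] for i in range(0, len(sealed), chunk_size)]


def reassemble(chunks):
    """Join the chunks, then decompress (a real client would decrypt in between)."""
    return zlib.decompress(b"".join(chunks))
```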


-Randall

On 06/25/2015 10:26 AM, Jon Ribbens wrote:

On 2015-06-25, Steven D'Aprano st...@pearwood.info wrote:

On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:

That won't stop virus scanners etc potentially making their own minds
up about the file.


*shrug* Sure, but I was specifically referring to the risk of the malware
being executed, not being detected by a virus scanner.

Encrypting the file won't even necessarily stop the virus scanner from
finding false positives. It might even increase the chances.


That seems spectacularly unlikely.


But it will prevent the virus scanner from finding actual viruses.
You may or may not consider that a problem.


The OP would consider it a benefit.


I didn't say it ought to be encrypted by the receiver. Obviously the
encryption needs to be done in a way that the recipient doesn't get access
to the key.


No, you're still misunderstanding. The encryption needs to be done in
a way that the *sender* doesn't get access to the key. The recipient
has access to it by definition because the recipient chooses it.


The obvious way to do that is for the application to encrypt the
data before it sends it.


Yes, he already said the application does that. The problem is,
what if the sender is not the genuine application but is instead
a malicious attacker?


Then the receiver just writes the encrypted bytes directly to a file.


That's precisely what he's trying to avoid.


That would have the benefit of protecting against man-in-the-middle
attacks as well, since the file is never transmitted in the clear.


With what he's talking about, the file after encryption is never
transmitted *at all*.


I've been arguing that the application *should* encrypt the file, and not
mess about giving the illusion of security.


You haven't understood the threat model.


But seriously, they have the application. If the application is using a
symmetric substitution cipher, it needs the key (because there is only
one), so the receiver will have the cipher.


There is not only one key. The recipient would invent a new key for
each file after the file is received.


With the sort of substitution cipher the OP is experimenting with, forcing a
particular result is trivially easy. The sender has access to the
application, knows the cipher, knows the key, and can easily generate a
file which will generate whatever content the sender wants after being
obfuscated.


No, because the sender does not know the key.


Replace ROT-13 with ROT-n where 'n' is a secret known only to the
receiver, and suddenly it's not such a bad method of obfuscation.


There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.


Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.


Improve it to the random-translation-map method he's actually using
and you've got really quite a reasonable system.


No, truly you haven't. The OP is experimenting with bytearray.translate,
which likely makes it a monoalphabetic substitution cipher, and the
techniques for cracking those go back to the 9th century AD.


Only if you have the ciphertext, which the attacker in this scenario
does not. The attacker gets to set the plaintext, knows the algorithm,
does not know the key (unless the method of choosing the key has a
flaw), and wants to set the ciphertext to some specific string.
Frequency analysis doesn't even begin to apply to this scenario.


You're relying on security by obscurity


No, he really isn't.


The use case is pretty obvious (a peer-to-peer dropbox type thing) but
it does appear to be being 

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico
On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
 Even the famous Enigma
 machine was a lot more than just letter-for-letter substitution - a
 double letter in the cleartext wouldn't be represented by a double
 letter in the result - and once the machine's secrets were figured
 out, the day's key could be reassembled fairly readily.


 The day's key for a given network, with the Luftwaffe easily being the worst
 offenders.  Some networks remained unbroken at the end of WWII.

I was massively oversimplifying, here. But there's a reason that
modern crypto doesn't use str.translate() level ciphers.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Mark Lawrence

On 26/06/2015 03:06, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:

Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.



The day's key for a given network, with the Luftwaffe easily being the worst
offenders.  Some networks remained unbroken at the end of WWII.


I was massively oversimplifying, here. But there's a reason that
modern crypto doesn't use str.translate() level ciphers.

ChrisA



I should know.  Ever heard of DISCON?  Like to hazard a guess as to who 
worked on it all those years ago?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico
On Fri, Jun 26, 2015 at 11:01 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico ros...@gmail.com wrote:
 On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
 jon+use...@unequivocal.co.uk wrote:
 Well, it means you need to send 256 times as much data, which is a
 start. If you're instead using a 256-byte translation table then
 an attack becomes utterly impractical.


 Utterly impractical? [chomp analysis]

 You're making the same mistake that Steven did in misunderstanding the
 threat model.

To be honest, I wasn't actually answering anything about the original
threat model, but only responding to the statement that a 256-byte
anything-to-anything cipher is somehow incredibly secure. It isn't,
but that might not be a problem for the original purpose.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico
On Fri, Jun 26, 2015 at 12:24 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:
 On 26/06/2015 03:06, Chris Angelico wrote:

 On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence breamore...@yahoo.co.uk
 wrote:

 Even the famous Enigma
 machine was a lot more than just letter-for-letter substitution - a
 double letter in the cleartext wouldn't be represented by a double
 letter in the result - and once the machine's secrets were figured
 out, the day's key could be reassembled fairly readily.


 The day's key for a given network, with the Luftwaffe easily being the
 worst
 offenders.  Some networks remained unbroken at the end of WWII.


 I was massively oversimplifying, here. But there's a reason that
 modern crypto doesn't use str.translate() level ciphers.

 ChrisA


 I should know.  Ever heard of DISCON?  Like to hazard a guess as to who
 worked on it all those years ago?

No, not familiar with it. But I'm guessing you have the crypto
background to know all this stuff, which means you aren't the sort of
person I need to explain things to. Great! :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico
On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
jon+use...@unequivocal.co.uk wrote:
 There are only 256 possible values for n, one of which doesn't transform the
 data at all (ROT-0). If you're thinking of attacking this by pencil and
 paper, 255 transformations sounds like a lot. For a computer, that's barely
 harder than a single transformation.

 Well, it means you need to send 256 times as much data, which is a
 start. If you're instead using a 256-byte translation table then
 an attack becomes utterly impractical.


Utterly impractical? Maybe, if you attempt a pure brute-force approach
- there are 256! possible translation tables, which is roughly 1e500
attempts [1], and at roughly four a microsecond [2] that'd still take
a ridiculously long time. But there are two gigantic optimizations you
could do. Firstly, there are frequency-based attacks, and byte value
duplicates will tell you a lot - classic cryptographic work. And
secondly, you can simply take the first few bytes of a file - let's
say 16, although a lot of files can be recognized in less than that.
Even if there are no duplicate bytes, that'd be a maximum of 16!
translation tables that truly matter, or just 2e13. At the same speed,
that makes about five million seconds of computing time required. Divide
that across a bunch of separate computers (the job is embarrassingly
parallel after all), and you could get that result pretty easily. Cut
the prefix to just 8 bytes and you have a mere 40K encryption keys to
try - so quick that you wouldn't even see it happen. Nope, a simple
substitution cipher is still not secure. Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.

ChrisA

[1] It's actually closer to 8.6e506, if you care.
[2] timeit result from my laptop - you could do better, but that's a
reasonable average
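The counts in this message can be checked with `math.factorial`. The sketch below assumes the footnoted rate of about four keys per microsecond (4,000,000 per second) and reproduces only the post's own counting (16! and 8! orderings); it does not model the frequency attacks also mentioned. At that rate, 16! keys come to roughly 5.2 million seconds, about 60 days of single-core time.

```python
import math

KEYS_PER_SECOND = 4_000_000  # "roughly four a microsecond", footnote [2]


def orderings(prefix_len):
    """Number of tables counted in the post for a prefix of distinct bytes."""
    return math.factorial(prefix_len)


def brute_force_seconds(prefix_len):
    """Single-core time to try every such table at KEYS_PER_SECOND."""
    return orderings(prefix_len) / KEYS_PER_SECOND


if __name__ == "__main__":
    digits = str(math.factorial(256))
    print(f"256! is about {digits[0]}.{digits[1:3]}e{len(digits) - 1}")
    for n in (16, 8):
        secs = brute_force_seconds(n)
        print(f"{n}! = {orderings(n):,} tables, "
              f"about {secs:,.2f} s ({secs / 86400:.1f} days)")
```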
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Mark Lawrence

On 26/06/2015 01:33, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
jon+use...@unequivocal.co.uk wrote:

There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.


Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.



Utterly impractical? Maybe, if you attempt a pure brute-force approach
- there are 256! possible translation tables, which is roughly 1e500
attempts [1], and at roughly four a microsecond [2] that'd still take
a ridiculously long time. But there are two gigantic optimizations you
could do. Firstly, there are frequency-based attacks, and byte value
duplicates will tell you a lot - classic cryptographic work. And
secondly, you can simply take the first few bytes of a file - let's
say 16, although a lot of files can be recognized in less than that.
Even if there are no duplicate bytes, that'd be a maximum of 16!
translation tables that truly matter, or just 2e13. At the same speed,
that makes about five million seconds of computing time required. Divide
that across a bunch of separate computers (the job is embarrassingly
parallel after all), and you could get that result pretty easily. Cut
the prefix to just 8 bytes and you have a mere 40K encryption keys to
try - so quick that you wouldn't even see it happen. Nope, a simple
substitution cipher is still not secure. Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.



The day's key for a given network, with the Luftwaffe easily being the 
worst offenders.  Some networks remained unbroken at the end of WWII.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Ian Kelly
On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico ros...@gmail.com wrote:
 On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
 jon+use...@unequivocal.co.uk wrote:
 There are only 256 possible values for n, one of which doesn't transform the
 data at all (ROT-0). If you're thinking of attacking this by pencil and
 paper, 255 transformations sounds like a lot. For a computer, that's barely
 harder than a single transformation.

 Well, it means you need to send 256 times as much data, which is a
 start. If you're instead using a 256-byte translation table then
 an attack becomes utterly impractical.


 Utterly impractical? Maybe, if you attempt a pure brute-force approach
 - there are 256! possible translation tables, which is roughly 1e500
 attempts [1], and at roughly four a microsecond [2] that'd still take
 a ridiculously long time. But there are two gigantic optimizations you
 could do. Firstly, there are frequency-based attacks, and byte value
 duplicates will tell you a lot - classic cryptographic work. And
 secondly, you can simply take the first few bytes of a file - let's
 say 16, although a lot of files can be recognized in less than that.
 Even if there are no duplicate bytes, that'd be a maximum of 16!
 translation tables that truly matter, or just 2e13. At the same speed,
 that makes about five million seconds of computing time required. Divide
 that across a bunch of separate computers (the job is embarrassingly
 parallel after all), and you could get that result pretty easily. Cut
 the prefix to just 8 bytes and you have a mere 40K encryption keys to
 try - so quick that you wouldn't even see it happen. Nope, a simple
 substitution cipher is still not secure. Even the famous Enigma
 machine was a lot more than just letter-for-letter substitution - a
 double letter in the cleartext wouldn't be represented by a double
 letter in the result - and once the machine's secrets were figured
 out, the day's key could be reassembled fairly readily.

You're making the same mistake that Steven did in misunderstanding the
threat model. The goal isn't to prevent the attacker from working out
the key for a file that has already been obfuscated. Any real data
that might be exposed by a vulnerability in the server is presumed to
have already been strongly encrypted by the user.

The goal is to prevent the attacker from guessing a key that hasn't
even been generated yet, which could be exploited to engineer the
obfuscated content into something malicious. There are no
frequency-based attacks possible here, because you can't do frequency
analysis on the result of a key that hasn't even been generated yet.
Assuming that you have no attack on the key generation itself, the
best you can do is send a file deobfuscated with a random key and hope
that the recipient randomly chooses the same key; the odds of that
happening are 1 in 256!.

That said, I do see a potential weakness here: if the attacker can
create a malicious payload using only a subset of the 256 possible
byte values, then the odds of getting a correct key are increased,
since multiple keys will work. For an extreme example, if the attacker
can manage to craft a malicious payload that uses only the two byte
values 32 and 47, then the probability of getting a key that will
obfuscate to that is increased to 1 in 256! / 254!, or 1 in 65280. If
they distribute 65280 copies of that payload to various recipients,
then they can expect that one recipient on average will get the
payload in its malicious form.
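Ian's odds follow from counting partial permutations: for a payload built from k distinct byte values, a random table must send each of the sender's k chosen input values to one specific output, so a single guess succeeds with probability 1 / (256 × 255 × ... × (256 - k + 1)). A small sketch using `math.perm` (Python 3.8+) and exact fractions; the function names are hypothetical.

```python
import math
from fractions import Fraction


def single_guess_odds(distinct_values):
    """Chance that a uniformly random 256-byte permutation maps the sender's
    `distinct_values` chosen input bytes onto the exact outputs they need."""
    return Fraction(1, math.perm(256, distinct_values))


def expected_copies(distinct_values):
    """Copies an attacker must spread around to expect one success."""
    return 1 / single_guess_odds(distinct_values)


if __name__ == "__main__":
    print(single_guess_odds(2))  # 1/65280, the bytes-32-and-47 example
    print(expected_copies(2))    # 65280
```

Note that `single_guess_odds(255)` equals `single_guess_odds(256)`: once 255 outputs are fixed, the last one is forced.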
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano
On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:

 On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano st...@pearwood.info
 wrote:
 But just sticking to the three above, the first one is partially
 mitigated by allowing virus scanners to scan the data, but that implies
 that the owner of the storage machine can spy on the files. So you have a
 conflict here.
 
 If it's encrypted malware, and you can't decrypt it, there's no threat.

If the *only* threat is that the sender will send malware, you can mitigate 
that by dropping the file in an unencrypted container. Anything good 
enough to prevent Windows from executing the code, accidentally or 
deliberately, will do: say, a tar file with a custom extension.
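A minimal stdlib sketch of that container idea, assuming an in-memory tar archive with a single member; the member name `chunk.bin` is hypothetical, and the custom file extension would be applied when the result is written to disk. The chunk's bytes sit unmodified inside the archive, so this only stops the OS from treating the file as a program; it does nothing against scanning or spying.

```python
import io
import tarfile


def wrap_chunk(chunk, member_name="chunk.bin"):
    """Wrap raw bytes in an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(chunk)  # addfile reads exactly this many bytes
        tar.addfile(info, io.BytesIO(chunk))
    return buf.getvalue()


def unwrap_chunk(blob, member_name="chunk.bin"):
    """Read the original bytes back out of a blob made by wrap_chunk."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return tar.extractfile(member_name).read()
```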

But encrypting the file is also a good solution, and it prevents the storage 
machine spying on the file contents too. Provided the encryption is strong.


 Honestly, the *only* real defence against the spying issue is to encrypt
 the files. Not obfuscate them with a lousy random substitution cipher.
 The storage machine can keep the files as long as they like, just by
 making a copy, and spend hours bruteforcing them. They *will* crack the
 substitution cipher. In pure Python, that may take a few days or weeks;
 in C, hours or days. If they have the resources to throw at it, minutes.
 Substitution ciphers have not been effective encryption since, oh, the
 1950s, unless you use a one-time pad. Which you won't be.
 
 The original post said that the sender will usually send files they
 encrypted, unless they are malicious. So if the sender wants them to
 be encrypted, they already are.

The OP *hopes* that the sender will encrypt the files. I think that's a 
vanishingly faint hope, unless the application itself encrypts the file.

Most people don't have any encryption software beyond password-protecting 
zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools 
available to break it. Winzip has an extension for 128-bit and 256-bit AES 
encryption, both of which are probably strong enough unless you're targeted 
by the NSA, but the weak link in the chain is the idea that people will 
encrypt the files before sending them. Even if they have the tools, 
laziness being the defining characteristic of most people, they won't use 
them.

 While the data senders are supposed to encrypt data, that's not
 guaranteed, and I'd like to protect the recipient against exposure to
 nefarious data by mangling or encrypting the data before it is written
 to disk.
 
 The cipher is just to keep the sender from being able to control what
 is on disk.

The sender has a copy of the application? Then they can see the type of 
obfuscation used. If they know the key, or can guess it, they can take their 
malware, *decrypt* it, and send that, so that *encrypting* that file puts 
the malicious code on the disk.

E.g. suppose I want to send you an insult, but I know your program 
automatically ROT-13s the strings I send you. Then I send you:

'lbhe sngure fzryyf bs ryqreoreevrf'

and your program ROT-13s it to:

'your father smells of elderberries'
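The same point in code: when the transformation is public and keyless (ROT-13, via the stdlib `rot13` codec) or its key is known to the sender, computing the input that produces a chosen output takes one call. `preimage_for_table` is a hypothetical helper for a known 256-byte substitution table.

```python
import codecs


def preimage_for_rot13(wanted):
    """ROT-13 is its own inverse, so encoding the wanted text gives what to send."""
    return codecs.encode(wanted, "rot13")


def preimage_for_table(wanted, table):
    """Invert a known substitution table so that sent.translate(table) == wanted."""
    inverse = bytearray(256)
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain
    return wanted.translate(bytes(inverse))
```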

I know that the OP doesn't propose using ROT-13, but a classical 
substitution cipher isn't that much stronger.


 I am usually very oppositional when it comes to rolling your own
 crypto, but am I alone here in thinking the OP very clearly laid out
 their case?


I don't think any of us *really* understand his use-case or the potential 
threats, but to my way of thinking, you can never have too strong a cipher 
or underestimate the risk of users taking short-cuts.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Devin Jeanpierre
On Thu, Jun 25, 2015 at 2:25 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
 The original post said that the sender will usually send files they
 encrypted, unless they are malicious. So if the sender wants them to
 be encrypted, they already are.

 The OP *hopes* that the sender will encrypt the files. I think that's a
 vanishingly faint hope, unless the application itself encrypts the file.

 Most people don't have any encryption software beyond password-protecting
 zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools
 available to break it. Winzip has an extension for 128-bit and 256-bit AES
 encryption, both of which are probably strong enough unless you're targeted
 by the NSA, but the weak link in the chain is the idea that people will
 encrypt the files before sending them. Even if they have the tools,
 laziness being the defining characteristic of most people, they won't use
 them.

You're right, I was supposing that since they wrote the server, they
also wrote the client, and were just protecting from the protocol
itself being weak.

 I know that the OP doesn't propose using ROT-13, but a classical
 substitution cipher isn't that much stronger.

Yes, it is. It requires the attacker being able to see something about
the ciphertext, unlike ROT13. But it is reasonable to suppose that
maybe the attacker can trigger the file getting executed, at which
point maybe you can deduce from the behavior what the starting bytes
are...?

 I don't think any of us *really* understand his use-case or the potential
 threats, but to my way of thinking, you can never have too strong a cipher
 or underestimate the risk of users taking short-cuts.

This is truth. It would be nice if something like keyczar came in the stdlib.

(Otherwise, users of Python take shortcuts and use randomized
substitution ciphers instead of AES.)

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-24 Thread Devin Jeanpierre
How about a random substitution cipher? This will be ultra-weak, but
fast (using bytes.translate/bytes.maketrans) and seems to be the kind
of thing you're asking for.
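A minimal sketch of that suggestion, assuming the key is a random permutation of all 256 byte values drawn from `secrets.SystemRandom`, with both directions built by `bytes.maketrans`; the function names are hypothetical. As noted, this is ultra-weak as encryption; it only keeps the sender from predicting what lands on disk.

```python
import secrets

IDENTITY = bytes(range(256))


def make_key():
    """A random permutation of the 256 byte values, from the OS CSPRNG."""
    shuffled = bytearray(IDENTITY)
    secrets.SystemRandom().shuffle(shuffled)
    return bytes(shuffled)


def mangle(data, key):
    """Substitute byte i with key[i]."""
    return data.translate(bytes.maketrans(IDENTITY, key))


def unmangle(data, key):
    """Map key[i] back to i, which inverts mangle for a permutation key."""
    return data.translate(bytes.maketrans(key, IDENTITY))
```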

-- Devin

On Tue, Jun 23, 2015 at 12:02 PM, Randall Smith rand...@tnr.cc wrote:
 Chunks of data (about 2MB) are to be stored on machines using a peer-to-peer
 protocol.  The recipient of these chunks can't assume that the payload is
 benign.  While the data senders are supposed to encrypt data, that's not
 guaranteed, and I'd like to protect the recipient against exposure to
 nefarious data by mangling or encrypting the data before it is written to
 disk.

 My original idea was for the recipient to encrypt using AES.  But I want to
 keep this software pure-Python and batteries-included, without requiring
 installation of other platform-dependent software.  Pure Python AES and even
 DES are just way too slow.  I don't know that I really need encryption here,
 but I do need some type of fast mangling algorithm where a bad actor sending
 a payload can't guess the output ahead of time.

 Any ideas are appreciated.  Thanks.

 -Randall

 --
 https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Pure Python Data Mangling or Encrypting

2015-06-24 Thread Randall Smith
Chunks of data (about 2MB) are to be stored on machines using a 
peer-to-peer protocol.  The recipient of these chunks can't assume that 
the payload is benign.  While the data senders are supposed to encrypt 
data, that's not guaranteed, and I'd like to protect the recipient 
against exposure to nefarious data by mangling or encrypting the data 
before it is written to disk.


My original idea was for the recipient to encrypt using AES.  But I want 
to keep this software pure-Python and batteries-included, without requiring 
installation of other platform-dependent software.  Pure Python AES and 
even DES are just way too slow.  I don't know that I really need 
encryption here, but I do need some type of fast mangling algorithm where 
a bad actor sending a payload can't guess the output ahead of time.


Any ideas are appreciated.  Thanks.

-Randall

--
https://mail.python.org/mailman/listinfo/python-list

