rsyncrypto (was: offtopic: a sad story about VPS)

Eran Tromer Wed, 29 Jun 2005 07:29:11 -0700

Ahoy,

On 29/06/05 13:50, Shachar Shemesh wrote:


>> Not just over the wire, according to your webpage, but even the backup
>> server can't decrypt it. Excellent.
>>
> Otherwise we gain nothing.

You'd gain protection from a 3rd-party adversary listening in on the
traffic (if you didn't use a secure channel like SSH in the first place).


>> But how can you do that with a secure (hence chained) encryption mode
>> while maintaining the communication efficiency of rsync? The only way I
>> see is to keep the backup data in numerous separately encrypted
>> fragments (inconvenient)
>>
> not only inconvenient, but won't solve your fundamental problem.

It will, and here's how. Use an encryption mode that allows anyone to
extract a block-aligned sub-sequence of an encrypted plaintext and
store it efficiently. For example, with CBC you could copy the relevant
ciphertext blocks plus IV, and for counter-mode you'd also need the offset.
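
To make the CBC case concrete, here is a minimal Python sketch. It
assumes a toy 16-byte Feistel cipher built from SHA-256 as a stand-in
for a real block cipher (not secure, only there to keep the example
self-contained); all function names are hypothetical. The point is that
extract() needs no key: the ciphertext block just before the fragment
serves as the fragment's IV.

```python
import hashlib

BLOCK = 16  # bytes per cipher block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy Feistel round function (stand-in only, not a real cipher).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, _xor(l, _round(key, i, r))
    return l + r

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = _xor(r, _round(key, i, l)), l
    return l + r

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0
    prev, out = iv, []
    for off in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[off:off + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for off in range(0, len(ciphertext), BLOCK):
        blk = ciphertext[off:off + BLOCK]
        out.append(_xor(toy_decrypt_block(key, blk), prev))
        prev = blk
    return b"".join(out)

def extract(iv: bytes, ciphertext: bytes, first: int, last: int):
    """Return (fragment_iv, fragment) for blocks first..last-1, keylessly."""
    frag_iv = iv if first == 0 else ciphertext[(first - 1) * BLOCK:first * BLOCK]
    return frag_iv, ciphertext[first * BLOCK:last * BLOCK]

key, iv = b"k" * 16, bytes(16)
plaintext = bytes(range(256))              # 16 blocks
ciphertext = cbc_encrypt(key, iv, plaintext)
frag_iv, frag = extract(iv, ciphertext, 3, 7)
# The key holder can decrypt the fragment on its own.
assert cbc_decrypt(key, frag_iv, frag) == plaintext[3 * BLOCK:7 * BLOCK]
```

The fragment costs one extra block of storage (its IV), which is the
"little overhead" of this approach.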

Now, when the sender tells the receiver to use some chunk of old
plaintext, the receiver can extract this encrypted subsequence from the
old ciphertext and use it for the new ciphertext (with a bit of care and
a little overhead due to block alignment). The receiver will end up with
chunks of fresh ciphertext plus chunks of extracted old ciphertext, but
he knows exactly what plaintext offset each ciphertext chunk corresponds
to, so if he sends all he knows to someone who has the keys then the
plaintext can be recovered. For best effect, I suppose you could wrap up
all those fragments in a single file.

How would the sender compute the delta when the receiver can no longer
compute the hashes? Easy -- the sender encrypts the hashes and sends
them during each sync, as an opaque cookie (albeit a largish one). The
receiver just sends that back on the next sync.


>> and have the sender transmit the encrypted
>> block hashes (communication cost)
>>
> Receiver, if anything. No, that will never do.

See above. Why?


> It's a good question whether traffic analysis is a concern. As far as
> algorithms go, our analysis assumes that we have only two parties. The
> user of the backup server (Alice) and the backup server operator
> (Mellisa), who is also the adversary. This means that Mellisa can
> perform actual comparing of before and after images, possibly even
> affecting the delta encrypted.
> 
> Consider, for example, a case where your company rents Lingnu's backup
> services. One of the things most crucial to a company is its customer
> list, which is therefore, obviously, also backed up.
> 
> I want to gain said customer list. I therefore enroll, under an assumed
> name, as your customer, giving crafted information as my personal
> details. I then analyze the changes that happened to the encrypted file
> as a result.
> 
> As far as I can tell, the encryption method we use is reasonably
> resilient to even this kind of attack.

"Reasonably" depends on the application. I don't think an efficient
stream-based system can be perfectly resilient (for example, once I
found where in the stream the customer list is stored, I can watch how
often it is changed -- which leaks a non-zero amount of business
information).


> As far as I can tell, there are two ways I can answer this email. I can
> tell you that you can't realistically expect me to explain, on a public
> list, without making you sign an NDA first, the main competitive
> advantage we have, or to disclose our trade secret patent protected
> algorithm (and I wouldn't be the first one to think, mistakenly, that
> it is possible to patent trade secrets).
> 
> Or, I can tell you to hold your breath just a little bit longer, and
> hear all about it in the upcoming August Penguin, where I'm giving a
> lecture on the way this precise algorithm is built, as well as publish a
> paper performing cryptanalysis of it.
> 
> If you feel truly impatient, feel free to download rsyncrypto
[...]
> The project is at http://sourceforge.net/projects/rsyncrypto, and the 
> utility itself is GPL.

Bravo! Got me there for a second...


So you're taking the self-synchronizing approach of rsyncable gzip:
changing the encryption so that local changes in plaintext cause
(pretty) local changes in ciphertext, so the thing behaves reasonably
efficiently when piped to vanilla lib/rsync. Specifically, you're
encrypting the compressed plaintext in CBC mode, and resetting the
chained value to the IV whenever the sum of the last sum_span=256 bytes
is divisible by sum_mod=8192 but no sooner than sum_min_dist=8192 bytes
after the last reset (a comment to this effect in the source code would
have saved some people some time...).
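
The reset rule described above can be sketched in Python as follows.
This follows only the description in this message, not the rsyncrypto
source; the function name and the exact boundary conventions (window
ends at the current byte, reset takes effect right after it) are
assumptions.

```python
from collections import deque

def reset_points(data: bytes, span: int = 256, mod: int = 8192,
                 min_dist: int = 8192) -> list:
    """Offsets after which the CBC chain would be reset to the IV."""
    window = deque()   # the last `span` bytes seen
    rolling = 0        # sum of the bytes currently in `window`
    last = 0           # offset of the previous reset (0 = start of stream)
    points = []
    for i, b in enumerate(data):
        window.append(b)
        rolling += b
        if len(window) > span:
            rolling -= window.popleft()
        end = i + 1
        # Trigger: rolling sum divisible by `mod`, but no sooner than
        # `min_dist` bytes after the previous reset.
        if end - last >= min_dist and rolling % mod == 0:
            points.append(end)
            last = end
    return points

# All-zero input triggers at the earliest allowed offsets.
print(reset_points(bytes(20000)))   # [8192, 16384]
```

Because the trigger depends only on the last span bytes, two streams
that differ in one place pick the same reset points again once the
window has moved past the difference and a common reset has been hit.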

So the expected chunk size (i.e., runs between IV resets) is 16K,
meaning a typical change of N bytes in the compressed plaintext will
change roughly N+8K bytes in the ciphertext. That's on top of a
comparable overhead from rsyncable gzip. Reasonable for many workloads,
I presume, though it would be nasty for a database undergoing many tiny
updates.
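
The arithmetic behind those figures, as a small Python sketch. It
assumes the trigger fires with probability 1/sum_mod at each byte once
the minimum distance has passed (which is what the 16K figure seems to
rely on), and it reads the 8K as half an expected chunk; both are
interpretations, not measurements.

```python
SUM_MOD = 8192       # trigger modulus, from the description above
SUM_MIN_DIST = 8192  # minimum distance between resets

def expected_chunk_size(min_dist: int = SUM_MIN_DIST, mod: int = SUM_MOD) -> int:
    # Forced gap of min_dist bytes, then a geometric wait with mean `mod`
    # (assumption: per-byte trigger probability of 1/mod).
    return min_dist + mod

def expected_ciphertext_change(n: int) -> int:
    # Assumption: a change lands mid-chunk on average, so the rest of
    # the chunk (half of it) is re-encrypted along with the n bytes.
    return n + expected_chunk_size() // 2

print(expected_chunk_size())            # 16384
print(expected_ciphertext_change(100))  # 8292
```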

As for security... Well, it would certainly give the attacker some
trouble, but it's not leak-proof. For example:

Suppose the adversary can inject some data into the stream (via a web
form or whatever). Then he'll craft a plaintext sequence consisting of
4097 bytes that sum to 0 modulo RSYNC_WIN=4096 (this forces 'gzip
-rsyncable' to a known state) followed by some data whose gzip
compression has sum 0 modulo 8192 over a rolling 256-byte window exactly
once and no sooner than 8192 bytes from the beginning (this forces the tweaked
CBC mode to revert to its IV). Now, whenever this magic sequence is
injected, the whole compression+encryption process is reset to a state
that is fully determined by the IV, meaning (for example) the next block
can be thought of as encrypted under ECB, with all the obvious attacks
on that. Now, whether the adversary can inject the magic sequence into
the stream just before interesting business data -- that really depends
on the circumstances, but is far from unthinkable.

To give one marginally realistic scenario for the above, suppose your
customer mailing list is stored in a sorted plain text file. Now, I want
to find out if you're dealing with [EMAIL PROTECTED] I thus subscribe
"devik${MAGICSTRING}pad" to your list, so it would appear just before
"[EMAIL PROTECTED]" in the plaintext stream. I also inject
"[EMAIL PROTECTED]" into an arbitrary location in the
plaintext stream. Then, I look at the next backup and check if the
resulting ciphertext blocks are identical. I don't even need to look at
more than one backup to do that.
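
The equality test in this scenario can be illustrated with a short
Python sketch. It assumes a toy SHA-256 Feistel cipher in place of the
real block cipher, skips compression entirely, and passes the reset
offsets in explicitly instead of deriving them from a crafted magic
string; every name in it is hypothetical.

```python
import hashlib

BLOCK = 16

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel stand-in for a real block cipher (not secure).
    l, r = block[:8], block[8:]
    for i in range(4):
        f = hashlib.sha256(key + bytes([i]) + r).digest()[:8]
        l, r = r, _xor(l, f)
    return l + r

def cbc_with_resets(key: bytes, iv: bytes, plaintext: bytes, resets: set) -> bytes:
    """CBC where the chain value reverts to the IV at each offset in `resets`."""
    prev, out = iv, []
    for off in range(0, len(plaintext), BLOCK):
        if off in resets:
            prev = iv
        prev = toy_encrypt_block(key, _xor(plaintext[off:off + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def guess_confirmed(real_entry: bytes, guess: bytes) -> bool:
    """What the adversary sees: do the blocks after the two resets match?"""
    key, iv = b"victim-key-bytes", bytes(16)   # unknown to the adversary
    # Injected guess at offset 32, victim's real entry at offset 80,
    # each preceded by a forced reset.
    stream = b"A" * 32 + guess + b"B" * 32 + real_entry + b"C" * 16
    ct = cbc_with_resets(key, iv, stream, resets={32, 80})
    return ct[32:48] == ct[80:96]

guess = b"SECRET-ENTRY-001"
assert guess_confirmed(b"SECRET-ENTRY-001", guess)       # guess was right
assert not guess_confirmed(b"SECRET-ENTRY-002", guess)   # guess was wrong
```

The adversary never touches the key; comparing ciphertext blocks within
a single backup is enough.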



BTW, encrypting the raw data files separately via 'rsyncrypto -r'
reveals potentially meaningful filenames and makes traffic analysis oh
so much easier. In the unlikely case you're doing that, and in the more
likely case that users of rsyncrypto will misuse '-r' once they see it,
I would have replaced
  rsyncrypto -r plain_dir cipher_dir keyfile key
with
  tar cf - plain_dir | rsyncrypto - cipher_file keyfile key
(which also gets you the various nifty features of tar).

Better yet, using rexecsync [1] to completely avoid temporary files, I'd
like to do

[EMAIL PROTECTED] rexecsync 'ssh [EMAIL PROTECTED]' \
           'tar clf - / | rsyncrypto - - keyfile key' \
           /backup/fiasco/today

(or its secure sudo-based equivalent). Is that a pipe dream?

  Eran


[1] http://tromer.org/misc/rexecsync
    "rexecsync: Run a command remotely and save its output to a
     local file, using a difference-based algorithm to reduce
     communication on subsequent updates."
