On 08/11/2019 19:00, David Diepenbrock wrote:
> I may have discovered another bug, and I apologize in advance for not digging into it more, and for polluting this thread.  When I ran on top of my rather old encrypted copy, after compiling with the delete bug fixed, I noticed that the size was *significantly* larger than it should have been, on the order of double or more what I was expecting.  Now, I had cleaned out the source dir quite a bit, so I suspect that not everything was properly deleted.  A quick check of the file counts showed the encrypted directory had nearly 4x the file count of the source dir.

That may be a bug, indeed. I'll try to have a look at it.


With that said, please note that there is a very easy workaround. Just delete all of the encrypted files, leaving only the key files (which are very small), and then re-encrypt your data. You will end up with just the relevant files, and they will still be rsyncable.
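
For example, assuming the usual recursive invocation (rsyncrypto -r srcdir cipherdir keysdir cert) and purely hypothetical paths, the workaround boils down to:

    # Drop all ciphertext, but keep the (small) per-file key files:
    rm -rf /backup/cipher/*
    # Re-encrypt from scratch; the preserved key files are reused, so the
    # fresh ciphertext stays rsyncable against what the remote already has:
    rsyncrypto -r /data /backup/cipher /backup/keys backup.crt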


> Unfortunately, this made it a no-go for me to resume using rsyncrypto.  For now I've migrated to using a FUSE encrypted filesystem instead (gocryptfs in my case, since I needed something with reverse mode for feeding into rsync).  With the FUSE option I don't need to preserve an encrypted copy of the data on disk, which rather significantly outweighs any of the drawbacks.  As such, I can't see myself moving back to rsyncrypto anytime soon, unless someone can point out something I might be missing?

Here's my understanding of how FUSE-based solutions work: the file system keeps track of what changed, encrypts the delta, and sends it over. This keeps the changes small and gives you, in many cases, bandwidth efficiency similar to rsyncrypto's[1].


Here's this system's downside. Personally, I find it so bad as to make the system unworkable, but most people don't seem to care: you can never free space on the backup storage device. Every single intermediate backup is potentially crucial for a correct restore. There are two ways, and only two ways, to deal with this: either resign yourself to your remote backup folder getting bigger and bigger as time moves on, or periodically re-sync the whole data set. For a system designed to keep bandwidth usage low, I find that unacceptable.


When I say this to people, the common answer is that neither bandwidth nor storage is that expensive these days[2]. I find this answer short-sighted, as it only considers the "backup" side of the equation. There is a very serious "restore" side to consider.


An encrypted backup should, ideally, have just one single point of failure: the encryption key (for rsyncrypto, even that isn't quite a single point: the symmetric keys are stored locally in the key files, and you also have your RSA master key). FUSE-based solutions add another class of failure point: every delta produced and encrypted must find its way to the backup storage and remain there. If even one such delta fails to be stored, reliable restore is impossible from that point onwards, and the whole backup becomes unreliable.
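
To make that concrete, here is a toy sketch (in Python; not any particular tool's format, and decryption is omitted for brevity) of restoring from a base snapshot plus a chain of deltas. Every link in the chain is load-bearing:

    def restore(base: bytes, deltas) -> bytes:
        """Replay a chain of (seq, offset, patch) deltas over a base snapshot."""
        data = bytearray(base)
        expected_seq = 0
        for seq, offset, patch in deltas:
            # A single missing delta aborts the restore; everything
            # recorded after the gap is effectively lost.
            if seq != expected_seq:
                raise RuntimeError(f"delta {expected_seq} is missing; cannot restore")
            data[offset:offset + len(patch)] = patch
            expected_seq += 1
        return bytes(data)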


Compare this to rsyncrypto's failure mode (the worst of which you experienced): if rsyncrypto loses track of a file, it re-encrypts that file. This wastes storage, but is otherwise harmless. And since rsyncrypto requires very little state, recovering from this problem is also very cheap: because encrypting with an existing key file is deterministic, you need to re-encrypt everything locally, but you do not need to re-upload everything.
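
In practice (hypothetical paths again), recovery is just a re-run followed by a sync; unchanged files encrypt to byte-identical ciphertext, so rsync transfers almost nothing:

    # Rebuild the local ciphertext from scratch, reusing the key files:
    rsyncrypto -r /data /backup/cipher /backup/keys backup.crt
    # Sync; identical ciphertext is skipped, stale files are removed:
    rsync -av --delete /backup/cipher/ remote:/backup/cipher/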


In summary: I have moved on. People are not interested in what rsyncrypto has to offer, and I accept that. I wish I understood why, however, as rsyncrypto seems to me to be a genuinely superior (though, admittedly, clunkier) solution to what people are actually choosing.


Rsyncrypto could be made better. The file name could be stored, encrypted, inside the file itself, to allow recovering the file map. I can also imagine a system that integrates rsyncrypto and rsync, so that files are encrypted on the fly, saving local storage. I don't think either of those would change rsyncrypto's adoption in any significant way, so I don't spend my time on them.
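
For the first idea, a purely illustrative header layout might look like the following (this is NOT rsyncrypto's actual file format, and the XOR "cipher" is just a stand-in for encryption with the file's symmetric key):

    import struct

    def toy_cipher(data: bytes) -> bytes:
        # Stand-in for real encryption; XOR is its own inverse.
        return bytes(b ^ 0x5A for b in data)

    def write_name_header(out, relative_path: str) -> None:
        """Prepend the encrypted original path, so the plaintext-to-ciphertext
        file map can be rebuilt from the encrypted files alone."""
        blob = toy_cipher(relative_path.encode("utf-8"))
        out.write(struct.pack("<I", len(blob)))  # 4-byte length prefix
        out.write(blob)

    def read_name_header(inp) -> str:
        (length,) = struct.unpack("<I", inp.read(4))
        return toy_cipher(inp.read(length)).decode("utf-8")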


Shachar



1 - There are certain types of changes for which this method doesn't save bandwidth. If you take a large file and add one byte at its beginning, with rsyncrypto+rsync you will have to resync about 8KB of data, whereas with a FUSE-based solution you'd have to retransmit the entire file. As I said above, most people don't seem to find that painful in this day and age.
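
The reason rsyncrypto gets away with roughly 8KB here is that its block boundaries are decided by the plaintext content itself, so they realign right after an insertion. A toy demonstration of the principle, in the spirit of (but not identical to) rsyncrypto's actual rolling block-reset rule:

    import os

    WINDOW = 16  # rolling window size, in bytes (toy value)

    def chunk(data: bytes):
        """Cut data wherever a rolling sum over the last WINDOW bytes
        hits a fixed condition; boundaries depend only on local content."""
        chunks, start, rolling = [], 0, 0
        for i, byte in enumerate(data):
            rolling += byte
            if i >= WINDOW:
                rolling -= data[i - WINDOW]
            if i - start >= WINDOW and rolling % 256 == 0:
                chunks.append(data[start:i + 1])
                start = i + 1
        chunks.append(data[start:])
        return chunks

    original = os.urandom(100_000)
    shifted = b"X" + original           # one byte inserted at the front
    unchanged = set(chunk(shifted))
    survived = sum(len(c) for c in chunk(original) if c in unchanged)
    # Prints something close to 100000: typically only the first chunk or
    # two differ, so an rsync-style transfer moves a few hundred bytes,
    # not the whole file.
    print(f"{survived} of {len(original)} bytes fall in unchanged chunks")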


2 - There is no bound on how much excess storage is used, and there is no simple workaround to regain that lost storage other than re-uploading the whole data set. Remember, you complained about 4x data usage.
