Hello everyone!

I am having an issue with the backup of a few files here: they take more
space than needed on my ZFS dataset.
After some digging, I found that the issue is primarily caused by both gzip
and rsyncrypto.

Here, I will only discuss the rsyncrypto part, which prevents rsync from
backing up efficiently:

Suppose you have two files of 450MB that differ only in a 50MB region in
the middle of the file (no data deleted, added, or even moved).
To create a test case, here is what I did:

dd if=/dev/urandom of=begin.iso bs=1M count=100
dd if=/dev/urandom of=end.iso bs=1M count=300

dd if=/dev/urandom of=middle1.iso bs=1M count=50
dd if=/dev/urandom of=middle2.iso bs=1M count=50


Let's build our two files:
cat begin.iso middle1.iso end.iso >file1.iso
cat begin.iso middle2.iso end.iso >file2.iso

So we end up with two files of identical size, with a 50MB difference
somewhere inside:
-rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file1.iso
-rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file2.iso
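
To double-check that the two plaintext files really differ only in the
middle region, something like the following could be used. This is only a
sketch: `diff_range` is a hypothetical helper name, and it relies on
`cmp -l`, which lists every differing byte (so it can take a while on
450MB files).

```shell
# diff_range A B: print the first and last differing byte offsets (1-based),
# or "identical" when cmp reports no differences.
# Hypothetical helper for this test case, not part of rsync or rsyncrypto.
diff_range() {
    cmp -l "$1" "$2" | awk '
        NR == 1 { first = $1 }   # offset of the first differing byte
                { last = $1 }    # keeps being updated until the last line
        END {
            if (NR) print first, last
            else    print "identical"
        }'
}

# Example call for the files built above:
# diff_range file1.iso file2.iso
```

For file1.iso and file2.iso the two offsets should land close to 104857601
and 157286400 (the 100MiB and 150MiB boundaries). They may be off by a few
bytes, because random bytes at the edges can coincide by chance.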

I now encrypt them with rsyncrypto:
rsyncrypto --gzip=nullgzip file1.iso{,.enc} backup.{keys,crt}
rsyncrypto --gzip=nullgzip file2.iso{,.enc} backup.{keys,crt}

The first thing I notice is that they do not have the same size once
encrypted:
-rw-r--r-- 1 root root  472063316  2 févr. 14:55 file1.iso.enc
-rw-r--r-- 1 root root  472062484  2 févr. 14:55 file2.iso.enc

Now if I copy the original files using rsync, I get interesting I/O
behavior:
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso test/file.iso
sending incremental file list
>f+++++++++ file1.iso
     471,859,200 100%  208.71MB/s    0:00:02 (xfr#1, to-chk=0/1)

sent 471,974,500 bytes  received 35 bytes  188,789,814.00 bytes/sec
total size is 471,859,200  speedup is 1.00
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso test/file.iso
sending incremental file list
>f..t...... file2.iso
     471,859,200 100%  135.90MB/s    0:00:03 (xfr#1, to-chk=0/1)

sent 52,543,948 bytes  received 152,118 bytes  8,107,087.08 bytes/sec
total size is 471,859,200  speedup is 8.95
[kuri:~/tmp/random] $


Now I copy the encrypted files:
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso.enc test/file.iso.enc
sending incremental file list
>f+++++++++ file1.iso.enc
     472,063,316 100%  180.86MB/s    0:00:02 (xfr#1, to-chk=0/1)

sent 472,178,659 bytes  received 35 bytes  134,908,198.29 bytes/sec
total size is 472,063,316  speedup is 1.00
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso.enc test/file.iso.enc
sending incremental file list
>f.st...... file2.iso.enc
     472,062,484 100%  111.87MB/s    0:00:04 (xfr#1, to-chk=0/1)

sent 52,608,319 bytes  received 152,188 bytes  9,592,819.45 bytes/sec
total size is 472,062,484  speedup is 8.95

So it worked perfectly in this test, but sometimes it fails to produce a
proper diff, so let's make another test file:
dd if=/dev/urandom of=middle3.iso bs=1M count=50
cat begin.iso middle3.iso end.iso >file3.iso
rsyncrypto --gzip=nullgzip file3.iso{,.enc} backup.{keys,crt}
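
Since the problem only shows up with some files, it may help to generate
several variants in one go. A minimal sketch, assuming begin.iso and
end.iso from above are in the current directory and that dd accepts bs=1M
as in the commands above (`make_variant` is a hypothetical helper name):

```shell
# make_variant OUT [MIDDLE_MIB]: build OUT as
#   begin.iso + fresh random middle (default 50 MiB) + end.iso
# Hypothetical helper; it only automates the dd/cat steps shown above.
make_variant() {
    out=$1
    mib=${2:-50}
    # dd writes the random middle part to stdout; cat splices it between
    # the shared beginning and end.
    dd if=/dev/urandom bs=1M count="$mib" 2>/dev/null |
        cat begin.iso - end.iso > "$out"
}

# Example: build a few more variants, then encrypt each one with the same
# rsyncrypto command as above.
# for i in 4 5 6; do make_variant "file$i.iso"; done
```

Each variant can then be rsynced over the previous one to see which
transitions transfer far more than the 50MB that changed.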

Let's look at the files:
-rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file1.iso
-rw-r--r-- 1 kuri users 471859200  2 févr. 14:55 file2.iso
-rw-r--r-- 1 kuri users 471859200  3 févr. 09:07 file3.iso

-rw-r--r-- 1 root root 472063316  2 févr. 14:55 file1.iso.enc
-rw-r--r-- 1 root root 472062484  2 févr. 14:55 file2.iso.enc
-rw-r--r-- 1 root root 472062932  3 févr. 09:07 file3.iso.enc

Let's rsync the third file:
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso.enc test/file.iso.enc
sending incremental file list
>f.st...... file3.iso.enc
     472,062,932 100%   53.22MB/s    0:00:08 (xfr#1, to-chk=0/1)

sent 367,307,827 bytes  received 152,188 bytes  34,996,191.90 bytes/sec
total size is 472,062,932  speedup is 1.28

So it copied 350MB of a 450MB file in which only 50MB had changed.
Let's see what happens with the unencrypted files:
[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso test/file.iso
sending incremental file list
>f..t...... file3.iso
     471,859,200 100%  135.29MB/s    0:00:03 (xfr#1, to-chk=0/1)

sent 52,543,947 bytes  received 152,118 bytes  9,581,102.73 bytes/sec
total size is 471,859,200  speedup is 8.95

So it works properly if the files are not encrypted.
Is it possible that the rsync algorithm fails because the file sizes
differ?
Do you have any hints?
The only thing I can see is that from file1.iso.enc to file2.iso.enc the
file size dropped a little, whereas from file2.iso.enc to file3.iso.enc
it grew, but I have no idea whether this is related...
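
To put numbers on the size changes, a small helper like this could be
used (a sketch only; `size_delta` is a hypothetical name, not an rsync or
rsyncrypto feature):

```shell
# size_delta OLD NEW: print the signed size change in bytes when NEW
# replaces OLD (negative = NEW is smaller, positive = NEW is larger).
# Hypothetical helper for this test case.
size_delta() {
    old=$(wc -c < "$1")
    new=$(wc -c < "$2")
    echo $(( new - old ))
}

# With the sizes listed above:
# size_delta file1.iso.enc file2.iso.enc   # 472062484 - 472063316 = -832
# size_delta file2.iso.enc file3.iso.enc   # 472062932 - 472062484 = 448
```

So the transfer that went well shrank the destination by 832 bytes, and
the one that went badly grew it by 448 bytes. Whether that direction is
what matters is exactly what I cannot tell.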

Looking at the data of each encrypted file, I can see that the last
300MB are exactly the same:
[kuri:~/tmp/random] $ tail -c 314572800 file1.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc  -
[kuri:~/tmp/random] $ tail -c 314572800 file2.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc  -
[kuri:~/tmp/random] $ tail -c 314572800 file3.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc  -

But the first 100MB are not:
[kuri:~/tmp/random] $ head -c 104857600 file1.iso.enc | sha1sum
d86fa953b25e1a01a53409f567cc845535525dc1  -
[kuri:~/tmp/random] $ head -c 104857600 file2.iso.enc | sha1sum
0c10309cf8fe0bb349b05081c782469e4c2fb0e2  -
[kuri:~/tmp/random] $ head -c 104857600 file3.iso.enc | sha1sum
338ba6c1a58dde8c334092986e5ce20e3b8114df  -
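
The head/tail checks above only test two fixed sizes. To find out exactly
how many trailing bytes two encrypted files share, a binary search over
the suffix length could be used. This is a rough sketch:
`common_suffix_len` is a hypothetical helper, and it compares suffixes
with cksum (a CRC, so sha1sum could be substituted for a stricter check).

```shell
# common_suffix_len A B: print the number of identical trailing bytes
# shared by A and B. Hypothetical helper for this test case.
# Works because "the last N bytes match" implies "the last N-1 bytes
# match", so the longest matching suffix can be found by bisection.
common_suffix_len() {
    size_a=$(wc -c < "$1")
    size_b=$(wc -c < "$2")
    lo=0                                         # a suffix of length lo is known to match
    hi=$(( size_a < size_b ? size_a : size_b ))  # no suffix can exceed the smaller file
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if [ "$(tail -c "$mid" "$1" | cksum)" = "$(tail -c "$mid" "$2" | cksum)" ]; then
            lo=$mid                              # suffix of length mid matches, try longer
        else
            hi=$(( mid - 1 ))                    # mismatch, the answer is shorter
        fi
    done
    echo "$lo"
}

# Example call:
# common_suffix_len file2.iso.enc file3.iso.enc
```

Each step re-reads up to the whole file, so on these files it needs about
thirty passes, but it shows precisely where the encrypted outputs start to
agree again.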


Any help would be greatly appreciated. I would like to back up even
bigger files (several GB), where over 90% of the file gets transferred if
it is encrypted with rsyncrypto, while only 2-4MB would be transferred
otherwise.


_______________________________________________
Rsyncrypto-devel mailing list
Rsyncrypto-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsyncrypto-devel
