Hello everyone!

I am having an issue with the backup of a few files, which takes more space than needed on my ZFS dataset. After some digging, I found the issue is primarily caused by both gzip and rsyncrypto.
Here, I will only discuss the rsyncrypto part, which keeps rsync from backing up efficiently.

Suppose you have two files of 450MB where only 50MB changed, in the middle of the file (no data deleted, added, or even moved). To create a test case, here is what I did:

dd if=/dev/urandom of=begin.iso bs=1M count=100
dd if=/dev/urandom of=end.iso bs=1M count=300
dd if=/dev/urandom of=middle1.iso bs=1M count=50
dd if=/dev/urandom of=middle2.iso bs=1M count=50

Let's build our two files:

cat begin.iso middle1.iso end.iso >file1.iso
cat begin.iso middle2.iso end.iso >file2.iso

So we end up with two files of identical size, but with a 50MB difference somewhere inside:

-rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file1.iso
-rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file2.iso

I now encrypt them with rsyncrypto:

rsyncrypto --gzip=nullgzip file1.iso{,.enc} backup.{keys,crt}
rsyncrypto --gzip=nullgzip file2.iso{,.enc} backup.{keys,crt}

The first noticeable thing is that they are not the same size once encrypted:

-rw-r--r-- 1 root root 472063316 2 févr. 14:55 file1.iso.enc
-rw-r--r-- 1 root root 472062484 2 févr. 14:55 file2.iso.enc

Now if I copy the original files using rsync, I get interesting I/O work:

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso test/file.iso
sending incremental file list
>f+++++++++ file1.iso
    471,859,200 100% 208.71MB/s 0:00:02 (xfr#1, to-chk=0/1)

sent 471,974,500 bytes received 35 bytes 188,789,814.00 bytes/sec
total size is 471,859,200 speedup is 1.00

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso test/file.iso
sending incremental file list
>f..t...... file2.iso
    471,859,200 100% 135.90MB/s 0:00:03 (xfr#1, to-chk=0/1)

sent 52,543,948 bytes received 152,118 bytes 8,107,087.08 bytes/sec
total size is 471,859,200 speedup is 8.95

Now I copy the encrypted files:

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file1.iso.enc test/file.iso.enc
sending incremental file list
>f+++++++++ file1.iso.enc
    472,063,316 100% 180.86MB/s 0:00:02 (xfr#1, to-chk=0/1)

sent 472,178,659 bytes received 35 bytes 134,908,198.29 bytes/sec
total size is 472,063,316 speedup is 1.00

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file2.iso.enc test/file.iso.enc
sending incremental file list
>f.st...... file2.iso.enc
    472,062,484 100% 111.87MB/s 0:00:04 (xfr#1, to-chk=0/1)

sent 52,608,319 bytes received 152,188 bytes 9,592,819.45 bytes/sec
total size is 472,062,484 speedup is 8.95

So it worked perfectly on this test, but sometimes it fails to do a proper diff, so let's make another test file:

dd if=/dev/urandom of=middle3.iso bs=1M count=50
cat begin.iso middle3.iso end.iso >file3.iso
rsyncrypto --gzip=nullgzip file3.iso{,.enc} backup.{keys,crt}

Let's look at the files:

-rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file1.iso
-rw-r--r-- 1 kuri users 471859200 2 févr. 14:55 file2.iso
-rw-r--r-- 1 kuri users 471859200 3 févr. 09:07 file3.iso
-rw-r--r-- 1 root root 472063316 2 févr. 14:55 file1.iso.enc
-rw-r--r-- 1 root root 472062484 2 févr. 14:55 file2.iso.enc
-rw-r--r-- 1 root root 472062932 3 févr. 09:07 file3.iso.enc

Let's rsync the third file:

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso.enc test/file.iso.enc
sending incremental file list
>f.st...... file3.iso.enc
    472,062,932 100% 53.22MB/s 0:00:08 (xfr#1, to-chk=0/1)

sent 367,307,827 bytes received 152,188 bytes 34,996,191.90 bytes/sec
total size is 472,062,932 speedup is 1.28

So it copied 350MB of a 450MB file that only had 50MB changed.
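To put that last transfer in numbers, here is a small arithmetic sketch. The byte counts are copied from the rsync output and the dd sizes above; the variable names are only illustrative. The amount sent comes out very close to the changed middle (50MB) plus the unchanged end (300MB), which suggests, but does not prove, that the unchanged end was sent again.

```shell
#!/bin/sh
# Numbers copied from the rsync run on file3.iso.enc above.
sent=367307827     # "sent" bytes reported by rsync
size=472062932     # total size of file3.iso.enc
mib=1048576        # bytes per MiB (dd bs=1M)

sent_mib=$((sent / mib))                 # whole MiB actually sent
sent_pct=$((sent * 100 / size))          # share of the file that was sent
middle_plus_end=$(( (50 + 300) * mib ))  # changed middle + unchanged end
extra=$((sent - middle_plus_end))        # what is left over beyond that

echo "sent: ${sent_mib} MiB (${sent_pct}% of the file)"
echo "middle + end: ${middle_plus_end} bytes, extra: ${extra} bytes"
```

The leftover of roughly 300KB is small next to the 350MB sent, so the transfer looks like "middle plus end" rather than "middle only".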
Let's see with the unencrypted files:

[kuri:~/tmp/random] $ rsync --progress -av --inplace --no-whole-file -i file3.iso test/file.iso
sending incremental file list
>f..t...... file3.iso
    471,859,200 100% 135.29MB/s 0:00:03 (xfr#1, to-chk=0/1)

sent 52,543,947 bytes received 152,118 bytes 9,581,102.73 bytes/sec
total size is 471,859,200 speedup is 8.95

So it works properly if the files are not encrypted.

Is it possible that the rsync algorithm fails because the file sizes differ? Do you have any hints? The only thing I can see is that between file1.iso.enc and file2.iso.enc the file size dropped a little, and between file2.iso.enc and file3.iso.enc it grew, but I have no idea if this is related...

Looking at the data of each encrypted file, I can see that the last 300MB are exactly the same:

[kuri:~/tmp/random] $ tail -c 314572800 file1.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc -
[kuri:~/tmp/random] $ tail -c 314572800 file2.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc -
[kuri:~/tmp/random] $ tail -c 314572800 file3.iso.enc | sha1sum
ee0c8bb19a620f7cdd44705b1293df461af389bc -

But the first 100MB are not:

[kuri:~/tmp/random] $ head -c 104857600 file1.iso.enc | sha1sum
d86fa953b25e1a01a53409f567cc845535525dc1 -
[kuri:~/tmp/random] $ head -c 104857600 file2.iso.enc | sha1sum
0c10309cf8fe0bb349b05081c782469e4c2fb0e2 -
[kuri:~/tmp/random] $ head -c 104857600 file3.iso.enc | sha1sum
338ba6c1a58dde8c334092986e5ce20e3b8114df -

Any help would be greatly appreciated. I would like to back up even bigger files (some GBs), where over 90% of the file gets transferred if encrypted with rsyncrypto, while only 2-4MB would be transferred otherwise.
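To make the file-size observation easier to check, the tail comparison above can be wrapped in a small helper that also reports how far the shared tail moved. This is only a sketch: tail_shift and the .bin file names are made up for illustration, and the demo runs on small synthetic files rather than the real .enc files.

```shell
#!/bin/sh
# Sketch: check whether two files end with the same n bytes and, if so,
# report how far that shared tail moved (the difference in file size).
tail_shift() {
    a=$1; b=$2; n=$3
    size_a=$(wc -c < "$a")
    size_b=$(wc -c < "$b")
    sum_a=$(tail -c "$n" "$a" | sha1sum)
    sum_b=$(tail -c "$n" "$b" | sha1sum)
    if [ "$sum_a" = "$sum_b" ]; then
        echo "tail identical, shifted by $((size_b - size_a)) bytes"
    else
        echo "tail differs"
    fi
}

# Synthetic demo (hypothetical files): the same 4096-byte tail placed
# behind random heads of 1000 and 1448 bytes.
dir=$(mktemp -d)
head -c 4096 /dev/urandom > "$dir/end.bin"
{ head -c 1000 /dev/urandom; cat "$dir/end.bin"; } > "$dir/old.bin"
{ head -c 1448 /dev/urandom; cat "$dir/end.bin"; } > "$dir/new.bin"

result=$(tail_shift "$dir/old.bin" "$dir/new.bin" 4096)
echo "$result"
rm -rf "$dir"
```

On the real files this would be called as: tail_shift file2.iso.enc file3.iso.enc 314572800. Going by the sizes listed above, the shared 300MB moves 832 bytes earlier from file1.iso.enc to file2.iso.enc, and 448 bytes later from file2.iso.enc to file3.iso.enc. The rsync man page warns that with --inplace the delta-transfer algorithm can become less efficient when data in the destination is overwritten before it can be copied to a later position in the file, so repeating the file3.iso.enc transfer without --inplace might be a useful test. This is a hypothesis, not a confirmed diagnosis.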
_______________________________________________
Rsyncrypto-devel mailing list
Rsyncrypto-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsyncrypto-devel