[Bug 5482] look into a hierarchical checksum algorithm that would help to efficiently transmit really large files
https://bugzilla.samba.org/show_bug.cgi?id=5482
Wayne Davison changed: Status: ASSIGNED -> RESOLVED; Resolution: FIXED

--- Comment #7 from Wayne Davison ---
Rsync now supports xxhash, which greatly speeds up checksumming. It also has x86 acceleration for the rolling checksum (checksum1).
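For context (an illustration, not part of the bug): in rsync 3.2.0 or newer, and assuming both ends were built with xxhash support, the faster hash can also be requested explicitly instead of relying on negotiation, e.g.:

rsync -a --checksum-choice=xxh64 src/ dst/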
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433
Wayne Davison changed: Status: NEW -> RESOLVED; Resolution: FIXED

--- Comment #6 from Wayne Davison ---
You can specify a larger malloc sanity check in the latest rsync (which will also let you know when the limit is exceeded instead of claiming that it is out of memory).
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #5 from MulticoreNOP ---
Might be related to bug #12769.
Re: rsync rewrites all blocks of large files although it uses delta transfer
On Thu 14 Feb 2019, Delian Krustev via rsync wrote:
> On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin wrote:
> > If the --inplace delta is as large as the filesize, then the
> > structure/location of the data has changed enough that the whole file
> > would have to be written out in any case.
>
> This is not the case. If you read my original post you would have noticed
> that the delta transfer finds only about 20 MB of differences within the
> almost 2 GB datafile.

I think you're missing the point of Remi's message. Say the original file is:

  ABCDEFGHIJ

The new file is:

  XABCDEFGHI

Then the delta is just 10%, but the entire file needs to be rewritten because the structure has changed: every block now sits at a different offset.

Paul
Re: rsync rewrites all blocks of large files although it uses delta transfer
On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin wrote:
> If the --inplace delta is as large as the filesize, then the
> structure/location of the data has changed enough that the whole file
> would have to be written out in any case.

This is not the case. If you read my original post you would have noticed that the delta transfer finds only about 20 MB of differences within the almost 2 GB datafile.

The problem with --inplace without --backup-dir is that delta transfer can no longer work efficiently.

Cheers
-- Delian
Re: rsync rewrites all blocks of large files although it uses delta transfer
On Wednesday, February 13, 2019 6:20:13 PM EET Remi Gauvin via rsync wrote:
> Have you run nilfs-clean before checking this free space comparison?
> Maybe there is just large write amplification created by rsync's many
> small writes when using --inplace.

nilfs-clean is suspended for the duration of the backup. It would have idled anyway if the fullness threshold of the FS (90% by default) had not been reached.

The problem is probably that these mysqldump files have changed data near the beginning of the files, so all later blocks have to be overwritten. To avoid this, "rsync" would have to allocate and deallocate space in the middle of the file:

http://man7.org/linux/man-pages/man2/fallocate.2.html

and unfortunately the respective syscalls are not portable, quite new, and filesystem specific. It would have been nice to have these for all OSes and filesystems, though, and better yet not aligned on the FS block size. E.g.:

- give me 5 new blocks in the middle of file F starting at POS
- do not use the entire last block of these 5, but rather only X bytes of it

or:

- replace block 5 with "this" partial block data
- truncate blocks 6 to 20

I can find a use for them in many application workflows - from text editors through databases to backup software.

Cheers
-- Delian
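The mid-file allocate/deallocate primitives wished for above do exist on Linux as fallocate(2) flags, with exactly the portability and block-size alignment limits the post laments: FALLOC_FL_INSERT_RANGE (ext4/XFS, kernel 4.1+) shifts data outward to open a gap, and FALLOC_FL_COLLAPSE_RANGE removes a range and closes it. A minimal stand-alone sketch, assuming Linux with ext4 or XFS (this is not rsync code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s FILE OFFSET LEN\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    off_t off = atoll(argv[2]);
    off_t len = atoll(argv[3]);

    /* Shift everything from 'off' onward forward by 'len' bytes,
     * i.e. "give me new blocks in the middle of the file". Both
     * values must be multiples of the filesystem block size. */
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, off, len) < 0)
        perror("fallocate(INSERT_RANGE)"); /* EOPNOTSUPP off ext4/XFS */

    /* The inverse - removing a byte range and closing the gap - is
     * FALLOC_FL_COLLAPSE_RANGE, with the same alignment restriction. */

    close(fd);
    return 0;
}

The sub-block operations the post asks for ("only X bytes of the last block") are precisely what these flags cannot do, which is why they remain block-aligned and filesystem-specific.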
Re: rsync rewrites all blocks of large files although it uses delta transfer
On 2019-02-13 10:47 a.m., Delian Krustev via rsync wrote:
>
> Free space at the beginning and end of the backup:
> Filesystem      1M-blocks  Used Available Use% Mounted on
> /dev/mapper/bkp    102392 76872     20400  80% /mnt/bkp
> /dev/mapper/bkp    102392 78768     18504  81% /mnt/bkp
>
> As can be seen "rsync" has sent about 20M and received 300K of data. However
> the filesystem has allocated almost 2G, which is the total size of the files
> being backed up.
>
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log
> structured filesystem. I'm using its snapshotting feature to keep backups
> for past dates.

Have you run nilfs-clean before checking this free space comparison? Maybe there is just large write amplification created by rsync's many small writes when using --inplace.
Re: rsync rewrites all blocks of large files although it uses delta transfer
On 2019-02-13 5:26 p.m., Delian Krustev via rsync wrote:
>
> The copy is needed for the comparison of the blocks as "--inplace" overwrites
> the destination file. I've tried without "--backup" but then the delta
> transfers too much data - close to the size of the backed-up files.

It's cool that --backup can be used as source data that way - a feature I was unaware of. But I think you found the cause of your problem right here as well. If the --inplace delta is as large as the filesize, then the structure/location of the data has changed enough that the whole file would have to be written out in any case.
Re: rsync rewrites all blocks of large files although it uses delta transfer
It can't do what you want. The closest thing would be --compare-dest.

On 2/13/19 5:26 PM, Delian Krustev wrote:
> On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync wrote:
>> With --backup, in order to end up with 2 files it has to write out a
>> whole new file. Sure, it only sent the differences (normally that means
>> over the network, but there is no network here), but the writing end was
>> told to duplicate the file being updated before updating it.
>
> The copy is needed for the comparison of the blocks as "--inplace" overwrites
> the destination file. I've tried without "--backup" but then the delta
> transfers too much data - close to the size of the backed-up files.
>
> The copy is in a temp file system which is discarded after the backup (by
> "rm -rf"). This temp filesystem is not log structured or copy-on-write, so
> having a copy there is not a big problem. Though I don't want a backup of
> all modified files - rather, a TMPDIR.
>
> The ideal workflow would be to compare SRC and DST and write changed blocks
> to the TMPDIR, then read them from TMPDIR and apply them to DST.
>
> Cheers
> -- Delian

-- Kevin Korb, Systems Administrator, FutureQuest, Inc., Orlando, Florida
Re: rsync rewrites all blocks of large files although it uses delta transfer
On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync wrote:
> With --backup, in order to end up with 2 files it has to write out a
> whole new file. Sure, it only sent the differences (normally that means
> over the network, but there is no network here), but the writing end was
> told to duplicate the file being updated before updating it.

The copy is needed for the comparison of the blocks as "--inplace" overwrites the destination file. I've tried without "--backup" but then the delta transfers too much data - close to the size of the backed-up files.

The copy is in a temp file system which is discarded after the backup (by "rm -rf"). This temp filesystem is not log structured or copy-on-write, so having a copy there is not a big problem. Though I don't want a backup of all modified files - rather, a TMPDIR.

The ideal workflow would be to compare SRC and DST and write changed blocks to the TMPDIR, then read them from TMPDIR and apply them to DST.

Cheers
-- Delian
Re: rsync rewrites all blocks of large files although it uses delta transfer
With --backup, in order to end up with 2 files it has to write out a whole new file. Sure, it only sent the differences (normally that means over the network, but there is no network here), but the writing end was told to duplicate the file being updated before updating it.

On 2/13/19 10:47 AM, Delian Krustev via rsync wrote:
> Hi All,
>
> For backup purposes I'm trying to transfer only the changed blocks of
> large files. Thus I've run "rsync" with the appropriate options:
>
> [command and full stats snipped - quoted in full in the original post below]
>
> As can be seen "rsync" has sent about 20M and received 300K of data. However
> the filesystem has allocated almost 2G, which is the total size of the files
> being backed up.
>
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log
> structured filesystem. I'm using its snapshotting feature to keep backups
> for past dates.
>
> Is there anything that can be done in order for "rsync" to overwrite only
> the changed blocks?
>
> Cheers
> -- Delian

-- Kevin Korb, Systems Administrator, FutureQuest, Inc., Orlando, Florida
rsync rewrites all blocks of large files although it uses delta transfer
Hi All,

For backup purposes I'm trying to transfer only the changed blocks of large files. Thus I've run "rsync" with the appropriate options:

RSYNC_BKPDIR=`mktemp -d`
rsync \
  --archive \
  --no-whole-file \
  --inplace \
  --backup \
  --backup-dir="$RSYNC_BKPDIR" \
  --verbose \
  --stats \
  /var/backups/mysql-dbs/. \
  /mnt/bkp/var/backups/mysql-dbs/.

The problem is that although "rsync" shows that delta transfer is used (when run with -vv) and only a small amount of data is transferred, the target files appear to be overwritten in full.

Here is the output of "rsync" and some more debugging info:

sending incremental file list
./
horde.data.sql
horde.schema.sql
LARGEDB.data.sql
LARGEDB.schema.sql
mysql.data.sql
mysql.schema.sql
phpmyadmin.data.sql
phpmyadmin.schema.sql

Number of files: 9 (reg: 8, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 8
Total file size: 1,944,522,704 bytes
Total transferred file size: 1,944,522,704 bytes
Literal data: 21,421,681 bytes
Matched data: 1,923,101,023 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 21,612,218
Total bytes received: 323,302

sent 21,612,218 bytes  received 323,302 bytes  259,591.95 bytes/sec
total size is 1,944,522,704  speedup is 88.65

# du -m /tmp/tmp.8gBzjNQOQZ
1.9G    /tmp/tmp.8gBzjNQOQZ

# tree -a /tmp/tmp.8gBzjNQOQZ
/tmp/tmp.8gBzjNQOQZ
├── horde.data.sql
├── horde.schema.sql
├── LARGEDB.data.sql
├── LARGEDB.schema.sql
├── mysql.data.sql
├── mysql.schema.sql
├── phpmyadmin.data.sql
└── phpmyadmin.schema.sql

0 directories, 8 files

Free space at the beginning and end of the backup:
Filesystem      1M-blocks  Used Available Use% Mounted on
/dev/mapper/bkp    102392 76872     20400  80% /mnt/bkp
/dev/mapper/bkp    102392 78768     18504  81% /mnt/bkp

As can be seen "rsync" has sent about 20M and received 300K of data. However the filesystem has allocated almost 2G, which is the total size of the files being backed up.

The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log structured filesystem. I'm using its snapshotting feature to keep backups for past dates.

Is there anything that can be done in order for "rsync" to overwrite only the changed blocks?

P.S. I guess that it will be the same for copy-on-write filesystems, e.g. BTRFS or ZFS.

Cheers
-- Delian
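To put numbers on that: in the df output above, Used grows from 76,872 to 78,768 1M-blocks, i.e. about 1,896 MB of fresh allocations - almost exactly the 1,944,522,704-byte total size of the files - even though only ~21 MB of literal data was sent. As the later replies in this thread work out, the changes sit near the start of each dump, so every later block shifts and gets rewritten, and on a log-structured filesystem each rewritten block is allocated anew (with snapshots in use, the superseded blocks are deliberately retained).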
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #4 from Rob Janssen ---
OK, you apparently did not understand what I proposed. However, it is not that important, as in our use case we can use --append.
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
Wayne Davison changed: Status: NEW -> RESOLVED; Resolution: WONTFIX

--- Comment #3 from Wayne Davison ---
Rsync is never going to assume that a file can be continued, as it doesn't know how the old data compares to the source. You can tell rsync to assume that the early data is all fine by using --append, but that can cause you problems if any non-new files need an update that is not an append.
Re: [Bug 13645] New: Improve efficiency when resuming transfer of large files
If you are doing a local <-> local transfer, you are wasting time with checksums. You'll get faster performance with "--whole-file". Why do you stop it at night when you could 'unlimit' the transfer speed? Seems like when you aren't there would be the best time to copy everything. Doing checksums will cause a noticeable impact on local-file transfers.

On 10/5/2018 10:34 AM, just subscribed for rsync-qa from bugzilla via rsync wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=13645
>
> When transferring large files over a slow network, ...
> The command used is:
> rsync -av --inplace --bwlimit=400 hostname::module /dest
> When restarting the transfer, a lot of time is "wasted" while first the
> local system is reading the partially transferred file and sends the
> checksums to the remote, ...
> Of course these optimizations (at least #2) may actually decrease
> performance when the transfer is local (not over slow network) and the
> disk read rate is negatively affected by reading at two different places
> in parallel. So #2 should only be attempted when the transfer is over a
> network.

Or it might decrease performance on a fast network. Not sure what you mean by 'slow' - 10Mb? At 100Mb I'm not sure, without measuring, whether it is faster or slower to do checksums, but I know that at 1000Mb and 10Gb, checksums are prohibitively expensive.

NOTE: you also might look at the protocol you use to do network transfers. I.e. use rsync over a locally mounted disk to a locally mounted network share, and make the network share a Samba one. That way you will get parallelism automatically - the file-transfer CPU time will happen inside of Samba, while the local file gathering will happen in rsync. I regularly got ~119MB R/W over 1000Mb ethernet.

BTW, any place I use a power-of-2 unit like 'B' (Byte), I use the power-of-two base (1024) prefix, but if I use a singular unit like 'b' (bit), then I use decimal prefixes. Doing otherwise makes things hard to calculate and can introduce calculation inaccuracies.
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #2 from Rob Janssen ---
Thanks, that helps a lot for this particular use case. (The files are backups.)
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #1 from Kevin Korb ---
If you are sure the file has not been changed since it was partially copied, see --append.
[Bug 13645] New: Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

Bug ID: 13645
Summary: Improve efficiency when resuming transfer of large files
Product: rsync
Version: 3.0.9
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P5
Component: core
Assignee: way...@samba.org
Reporter: pe1...@amsat.org
QA Contact: rsync...@samba.org

When transferring large files over a slow network, we interrupt rsync at the beginning of business hours, leaving the transfer unfinished. The command used is:

rsync -av --inplace --bwlimit=400 hostname::module /dest

When restarting the transfer, a lot of time is "wasted": first the local system reads the partially transferred file and sends the checksums to the remote, which only then starts to read the source file until it finds something to transfer. So nothing happens for up to two times the time required to read the partial transfer from the disks! When the partial file is many, many GB, this can take hours.

Suggestions:
1. When the source is larger than the destination, immediately begin to transfer from the offset in the source equal to the size of the destination; it is already known that this part will have to be transferred.
2. Try to do the reading of the partial file at the destination and the same part of the source in parallel (so the time is halved), and preferably also in parallel with 1.

Of course these optimizations (at least #2) may actually decrease performance when the transfer is local (not over a slow network) and the disk read rate is negatively affected by reading at two different places in parallel. So #2 should only be attempted when the transfer is over a network.
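An illustrative workaround building on comment #1 above (a sketch, assuming the interrupted destination file is known to be an unmodified prefix of the source - true for append-only data such as these backups): --append skips the checksum pass over the existing data and resumes from the destination's current length, e.g.

rsync -av --append --bwlimit=400 hostname::module /dest

Per comment #3, though, --append (which implies in-place updating) is unsafe whenever an existing file needs an update that is not a pure append.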
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #4 from Ben RUBSON ---
util2.c:
#define MALLOC_MAX 0x40000000

Which is 1 GB. 1 GB / 40 bytes x 131072 bytes = 3276 GB, which is then the maximum file size with protocol_version >= 30.

Did you try to increase MALLOC_MAX on the sending side?

Btw, it would be interesting to know why MAX_BLOCK_SIZE has been limited to 128 KB:

rsync.h:
#define MAX_BLOCK_SIZE ((int32)1 << 17)
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #3 from Kevin Day ---
Just adding --protocol=29 falls back to the older chunk-generator code and automatically selects 2 MB chunks, which is enough to at least make this work without a malloc error.
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #2 from Kevin Day ---
(In reply to Dave Gordon from comment #1)

It looks like that's no longer allowed:

rsync: --block-size=10485760 is too large (max: 131072)
rsync error: syntax or usage error (code 1) at main.c(1591) [client=3.1.3]

#define MAX_BLOCK_SIZE ((int32)1 << 17)

	if (block_size > MAX_BLOCK_SIZE) {
		snprintf(err_buf, sizeof err_buf,
			 "--block-size=%lu is too large (max: %u)\n",
			 block_size, MAX_BLOCK_SIZE);
		return 0;
	}

OLD_MAX_BLOCK_SIZE is defined, but options.c would need to be patched to allow larger block sizes when protocol_version < 30.
[Bug 13433] out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #1 from Dave Gordon ---
Maybe try --block-size=10485760 --protocol=29, as mentioned here:
https://bugzilla.samba.org/show_bug.cgi?id=10518#c8
[Bug 13433] New: out_of_memory in receive_sums on large files
https://bugzilla.samba.org/show_bug.cgi?id=13433

Bug ID: 13433
Summary: out_of_memory in receive_sums on large files
Product: rsync
Version: 3.1.3
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: way...@samba.org
Reporter: toa...@dragondata.com
QA Contact: rsync...@samba.org

I'm attempting to rsync a 4TB file. It fails with:

generating and sending sums for 0
count=33554432 rem=0 blength=131072 s2length=6 flength=4398046511104
chunk[0] offset=0 len=131072 sum1=8d15ed6f
chunk[1] offset=131072 len=131072 sum1=3d66e7f7
[omitted]
chunk[6550] offset=858521600 len=131072 sum1=d70deab6
chunk[6551] offset=858652672 len=131072 sum1=657e34df
send_files(0, /bay3/b.tc)
count=33554432 n=131072 rem=0
ERROR: out of memory in receive_sums [sender]
[sender] _exit_cleanup(code=22, file=util2.c, line=105): entered
rsync error: error allocating core memory buffers (code 22) at util2.c(105) [sender=3.1.3]

This is getting called:

92	if (!(s->sums = new_array(struct sum_buf, s->count)))
93		out_of_memory("receive_sums");

And the size of a sum_buf (40 bytes) times the number of sums (33554432) exceeds MALLOC_MAX. How is this supposed to work, and why is it breaking here, when I'm pretty sure I've transferred files bigger than this before?
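To make the arithmetic concrete, a small stand-alone sketch (constants copied from the quoted rsync 3.1.3 source; this is not rsync code): with the protocol >= 30 block-size cap of 128 KiB, the per-block checksum table for a 4 TiB file exceeds the 1 GiB single-allocation cap.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t flength    = 4398046511104ULL; /* 4 TiB, from the log    */
    const uint64_t blength    = 131072;           /* MAX_BLOCK_SIZE         */
    const uint64_t sum_buf_sz = 40;               /* sizeof(struct sum_buf) */
    const uint64_t malloc_max = 0x40000000;       /* MALLOC_MAX = 1 GiB     */

    uint64_t count = flength / blength;  /* 33,554,432 blocks */
    uint64_t need  = count * sum_buf_sz; /* bytes for the one new_array()  */

    printf("count=%llu need=%llu bytes (cap %llu) -> %s\n",
           (unsigned long long)count,
           (unsigned long long)need,
           (unsigned long long)malloc_max,
           need > malloc_max ? "out of memory in receive_sums" : "ok");
    return 0;
}

This prints need=1,342,177,280 bytes, about 25% over the cap. It also shows why comment #3's --protocol=29 workaround succeeds: with 2 MB chunks, count drops to about 2.1 million and the table needs only ~80 MiB.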
100% CPU freeze on read of large files with --sparse
Hello,

While restoring a large data backup which contained some big sparse-ish files (these were VMDK files, to be precise), using rsync 3.1.1, I found that adding the --sparse option can permanently wedge the rsync processes. I performed a few basic checks while it happened (at one point I left it for a few days, so I suspect it can last more or less forever):

* strace didn't show any syscall activity, making me suspect it was blocked in userland
* kill and kill -9 could not stop the processes, which would imply it was blocked in kernel IO
* strace of the 100% processes did not display any syscall activity
* the processes refused to stop consuming 100% CPU until the system was rebooted
* rebooting the system took forever on the all-process-kill timers

I wanted to see if anybody had seen similar behavior before, or if there is more I could do to diagnose the cause. It's the first time in many years of use that I ever got behavior like this from rsync, and I wasn't sure what to check since it defied most typical debug tools. The behavior appeared to stop when --sparse was removed.

Sincerely, Matthew.
[Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #5 from Peter van Hooft 2014-09-09 07:49:13 UTC ---
We use rsync to copy data from one file server to another using NFS3 mounts over a 10Gb link. We found that upping the buffer sizes (as a quick test) increases performance. When using --sparse this increases performance by a factor of fifty, from 2 MB/s to 100 MB/s.

% diff -u rsync.h-org rsync.h
--- rsync.h-org 2014-04-13 19:36:59.0 +0200
+++ rsync.h     2014-09-08 16:20:41.427973852 +0200
@@ -131,11 +131,11 @@

 #define RSYNC_PORT 873

-#define SPARSE_WRITE_SIZE (1024)
-#define WRITE_SIZE (32*1024)
-#define CHUNK_SIZE (32*1024)
+#define SPARSE_WRITE_SIZE (128*1024)
+#define WRITE_SIZE (128*1024)
+#define CHUNK_SIZE (128*1024)
 #define MAX_MAP_SIZE (256*1024)
-#define IO_BUFFER_SIZE (32*1024)
+#define IO_BUFFER_SIZE (128*1024)
 #define MAX_BLOCK_SIZE ((int32)1 << 17)

 /* For compatibility with older rsyncs */
%

It sure would be nice if these sizes were 'officially' increased.
Re: [Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files
Why not enable Jumbo Frames?
http://stromberg.dnsalias.org/~strombrg/jumbo.html

For NFS, you can use http://stromberg.dnsalias.org/~strombrg/nfs-test.html to get some fast settings. The script could be modified to do CIFS, I suppose.

HTH
rsync performance on large files strongly depends on file's (dis)similarity
Hi list,

I've found this post on rsync's expected performance for large files:
https://lists.samba.org/archive/rsync/2007-January/017033.html

I have a related but different observation to share: with files in the multi-gigabyte range, I've noticed that rsync's runtime also depends on how much the source and destination diverge, i.e., synchronization is faster if the files are similar - and not just because less data must be transferred. For example, on an 8 GiB file with 10% updates, rsync takes 390 seconds. With 50% updates, it takes about 1400 seconds, and at 90% updates about 2400 seconds.

My current explanation, and it would be awesome if someone more knowledgeable than me could confirm, is this: with very large files, we'd expect a certain level of false alarms, i.e., the weak checksum matches but the strong checksum does not. With large files that are very similar, a weak match is much more likely to be confirmed by a matching strong checksum. Conversely, with large files that are very dissimilar, a weak match is much less likely to be confirmed by a strong checksum, exactly because the files are very different from each other. rsync ends up computing lots of strong checksums which do not result in a match.

Is this a valid/reasonable explanation? Can someone else confirm this relationship between rsync's computational overhead and the files' (dis)similarity?

Thanks, Thomas.
Re: rsync performance on large files strongly depends on file's (dis)similarity
Maybe an alternative explanation is that a high degree of similarity allows the sender to skip more bytes. For each matched block, the sender does not need to compute any checksums, weak or strong, for the next S bytes, where S is the block size. As the number of matched blocks decreases, i.e., dissimilarity increases, the number of computed checksums grows. This relationship is especially apparent for large files, where many strong (and expensive) checksums must be computed due to many false alarms.

On Fri, Apr 11, 2014 at 1:35 PM, Thomas Knauth thomas.kna...@gmx.de wrote:
> [original explanation quoted above - snipped]
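A toy model of the sender's inner loop illustrates the skip effect (a simplified sketch, not rsync's code: real rsync rolls the weak sum in O(1) per byte, hashes the block-sum table instead of scanning it, and uses MD5 as the strong sum). With identical files the sender computes one weak sum per block and skips ahead on every match; with fully dissimilar files it must slide byte by byte, and every weak-sum false alarm - rare at this toy scale, common at multi-gigabyte block counts - additionally costs a strong checksum that then fails to match:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEN   (1 << 18)   /* 256 KiB test "files" */
#define BLOCK 1024
#define NBLK  (LEN / BLOCK)

/* Toy weak checksum standing in for rsync's rolling sum. */
static uint32_t weak_sum(const unsigned char *p, size_t n) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) { s1 += p[i]; s2 += s1; }
    return (s1 & 0xffff) | (s2 << 16);
}

/* Deliberately slow stand-in for the strong checksum. */
static uint64_t strong_sum(const unsigned char *p, size_t n) {
    uint64_t h = 14695981039346656037ULL;
    for (int r = 0; r < 4; r++)              /* extra rounds = extra cost */
        for (size_t i = 0; i < n; i++)
            h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

static void sender_work(const unsigned char *src, const uint32_t *weak,
                        const uint64_t *strong) {
    size_t weak_tests = 0, strong_tests = 0;
    for (size_t off = 0; off + BLOCK <= LEN; off++) {
        uint32_t w = weak_sum(src + off, BLOCK);   /* cheap test          */
        weak_tests++;
        for (size_t b = 0; b < NBLK; b++) {        /* first hit only -    */
            if (weak[b] != w) continue;            /* simplified lookup   */
            strong_tests++;                        /* expensive step      */
            if (strong_sum(src + off, BLOCK) == strong[b])
                off += BLOCK - 1;                  /* match: skip a block */
            break;                                 /* else: false alarm   */
        }
    }
    printf("weak sums: %zu, strong sums: %zu\n", weak_tests, strong_tests);
}

int main(void) {
    static unsigned char dst[LEN], src[LEN];
    static uint32_t weak[NBLK];
    static uint64_t strong[NBLK];

    srand(1);
    for (size_t i = 0; i < LEN; i++) dst[i] = (unsigned char)rand();

    /* Destination side: per-block weak + strong sums, as in rsync. */
    for (size_t b = 0; b < NBLK; b++) {
        weak[b]   = weak_sum(dst + b * BLOCK, BLOCK);
        strong[b] = strong_sum(dst + b * BLOCK, BLOCK);
    }

    memcpy(src, dst, LEN);              /* identical: every block matches */
    sender_work(src, weak, strong);

    for (size_t i = 0; i < LEN; i += 64)  /* touch every block: no matches */
        src[i] ^= 0xff;
    sender_work(src, weak, strong);
    return 0;
}

The first call reports one weak sum per block (256 here), while the second reports roughly one weak sum per byte offset - consistent with the runtimes above tracking dissimilarity even when little extra data is transferred.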
[Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #4 from John Wiegley 2013-11-17 09:02:51 UTC ---
Let me add my voice to the mix here. I'm copying a 1GB VOB file from an Ubuntu ZFS server running Samba 4.1.1 to my Mac OS X 10.9 box. iperf reports 112 MB/s (which should be my theoretical maximum).

Copying with Path Finder over Samba: 99 MB/s.
Copying with rsync directly (using arcfour256): 92 MB/s.
Copying with dd over Samba: 67 MB/s.
Copying with cat over Samba (measured with pv): 69 MB/s.
Copying with rsync over Samba: 55 MB/s.

I'm using gigabit ethernet, obviously, with mtu set to 1500 and no TCP options other than the following in smb.conf:

socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072

These numbers are very stable over several runs, so I'm pretty curious now about what's going on, especially with rsync.
Re: [Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files
On 2013-11-17 4:02 AM, samba-b...@samba.org wrote:
> I'm using gigabit ethernet, obviously, with mtu set to 1500 and no TCP
> options other than the following in smb.conf:
>
> socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072

First, remove these... These options have been deprecated - and can CAUSE problems - for many, MANY years.

Next, since you had those still lingering around (I'm assuming from an ancient initial install, or from following some stupid long-irrelevant $random_howto on the internet), go ask for help on the Samba support list evaluating your smb.conf settings and see if you have any other miscreants in there...

Then, if you are still having trouble, maybe it's an rsync issue - come back here for more help...

-- Best regards, Charles
[Bug 7195] timeout reached while sending checksums for very large files
https://bugzilla.samba.org/show_bug.cgi?id=7195

--- Comment #2 from Loïc Gomez 2012-10-23 11:27:38 UTC ---
I ran into a similar issue recently while transferring large files (40GB). After a few tests, it seems - in my case at least - to be related to the delta-xfer algorithm. The bug does not happen anymore with the -W option.

I don't know if this will resolve your issue, but you can also try looking into these options: --no-checksum, --no-compress, --blocking-io. These were not the source of my problems, but the functions they're related to might trigger a network timeout.

I hope it helps. Anyways, good luck solving your issue.
Re: Question about --partial-dir and aborted transfers of large files
On Fri, Aug 10, 2012 at 9:03 AM, T.J. Crowder t...@crowdersoftware.com wrote:
> 1. Am I correct in inferring that when rsync sees data for a file in the
> --partial-dir directory, it applies its delta transfer algorithm to the
> partial file?
> 2. And that this is _instead of_ applying it to the real target file?
> (Not a nifty three-way combination.)

Yes. The current code behaves the same as if you had specified --partial (as far as the next transfer goes), just without actually being destructive of the destination file.

I have imagined making the code pretend that the partial file and any destination file are concatenated together for the purpose of generating checksums. That would allow content references to both files, but rsync would need to be enhanced to open both files in both the generator and the receiver and be able to figure out which read goes where (which shouldn't be too hard). I'd suggest that the code read the partial file first, padding out the end of its data to an even checksum-sized unit so that the destination file starts on an even checksum boundary (so that the code never needs to combine data from two files in a single checksum or copy reference).

> If so, it would appear that this means a large amount of unnecessary data
> may end up being transferred in the second sync of a large file if you
> interrupt the first sync.

It all depends on where you interrupt it and how much data matches in the remaining portion of the destination file. It does give you the option of discarding the partial data if it is too short to be useful, or possibly doing your own concatenation of the whole (or trailing portion) of the destination file onto the partial file, should you want to tweak things before resuming the transfer.

..wayne..
Re: Question about --partial-dir and aborted transfers of large files
On Sun, Aug 12, 2012 at 10:41 AM, Wayne Davison way...@samba.org wrote:
> I have imagined making the code pretend that the partial file and any
> destination file are concatenated together for the purpose of generating
> checksums.

Actually, that could be bad if the destination and partial file are both huge. What would be better would be to send just the size of the destination file in checksums, but overlay the start of the destination's data with the partial file's data (and just ignore any partial block from the end of the partial file).

..wayne..
Re: Question about --partial-dir and aborted transfers of large files
Hi,

Thanks for that!

On 12 August 2012 18:41, Wayne Davison way...@samba.org wrote:
> I have imagined making the code pretend that the partial file and any
> destination file are concatenated together for the purpose of generating
> checksums. That would allow content references to both files, but rsync
> would need to be enhanced to open both files in both the generator and the
> receiver and be able to figure out which read goes where (which shouldn't
> be too hard). [...]

So if I'm inspired and somehow magically find the time, it's at least feasible. I'm not seeing why the generator would need to be different, though; the receiver would be doing the see-through magic (treating the partial as though it were overlaid on the beginning of the target).

> It all depends on where you interrupt it and how much data matches in the
> remaining portion of the destination file. It does give you the option of
> discarding the partial data if it is too short to be useful, or possibly
> doing your own concatenation of the whole (or trailing portion) of the
> destination file onto the partial file, should you want to tweak things
> before resuming the transfer.

Ah, yes, I _nearly_ got there, didn't I, with my boxing-clever workaround. If one knows one's in this situation, just append data from the target file to the partial file to fill in the missing bits (e.g., if the target is 100K and the partial is 20K, append the _last_ 80K of the target to the partial), and when rsync runs it'll only send what it has to. A C program to recursively walk a tree and do that on the selected partials where it makes sense (e.g., my VM HDD files) and not on others (which might have deletions or insertions) is probably 20-30 lines of code.

On 12 August 2012 19:08, Wayne Davison way...@samba.org wrote:
> Actually, that could be bad if the destination and partial file are both
> huge. What would be better would be to send just the size of the
> destination file in checksums, but overlay the start of the destination's
> data with the partial-file's data (and just ignore any partial-block from
> the end of the partial file).

Yes, I wasn't thinking concatenation, but more like what LVM and similar do with snapshots: the partial file is a bunch of snapshot blocks with the curious property of only being at the beginning of the file. So given a file with 50K blocks and a partial with 20K blocks, the code would view the combined result as the first 20K blocks of the partial followed by the subsequent 30K blocks from the target. (Hence my "see through" terminology above.) E.g., resorting to ASCII art, the receiver sees a virtual file:

  partial file           virtual file           target file
 +--------------+       +--------------+       +--------------+
 | Blks 0-9K    |------>| Blks 0-9K    |       | Blks 0-9K    |
 | Blks 10K-19K |------>| Blks 10K-19K |       | Blks 10K-19K |
 +--------------+       | Blks 20K-29K |<------| Blks 20K-29K |
                        | Blks 30K-39K |<------| Blks 30K-39K |
                        | Blks 40K-49K |<------| Blks 40K-49K |
                        +--------------+       +--------------+

The receiver would perform checksums against that virtual file, and when it's time to copy a block: if the block needs to be transferred, do that; if not, grab it from the target file.

Again, this all really only applies in the simple case of files that are nice, discrete blocks of data. Not knowing the delta algorithm, I have no idea what would happen if the above were applied to a file that got (say) 5K of blocks deleted at the beginning followed by 1K blocks of inserted data. The virtual file would appear to have duplicated data in that case, which the delta algorithm would then have to get rid of / cope with. I wouldn't be too surprised to find that it led to inefficiency for other types of files.

Thanks again,
-- T.J.
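That see-through read reduces to a few lines. A hypothetical sketch (the struct and helper names are invented for illustration; this is not rsync's code):

#include <sys/types.h>
#include <unistd.h>

struct overlay {
    int   partial_fd;   /* file saved in the --partial-dir        */
    int   dest_fd;      /* pre-existing destination file          */
    off_t partial_len;  /* bytes completed by the aborted transfer */
};

/* Read from the virtual file: offsets below partial_len come from the
 * partial file; everything past it "shows through" to the destination. */
ssize_t overlay_read(const struct overlay *o, void *buf,
                     size_t len, off_t off)
{
    if (off < o->partial_len) {
        /* Clamp so a single read never straddles the seam. */
        if (off + (off_t)len > o->partial_len)
            len = (size_t)(o->partial_len - off);
        return pread(o->partial_fd, buf, len, off);
    }
    return pread(o->dest_fd, buf, len, off);
}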
rsync 3.0.7 intermittent failures with many large files
I have recently changed the version of rsync I use from the ancient 2.6.6 to 3.0.7. Ever since then it seems to me that I am getting more rsync failures than before. Hopefully other people can share their experiences and point to the cause - which, I acknowledge, might be me doing something wrong. The rsync error log indicates things like:

rsync: writefd_unbuffered failed to write 4092 bytes to socket [generator]: Connection reset by peer (104)
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(1530) [generator=3.0.7]
rsync error: error in rsync protocol data stream (code 12) at io.c(760) [receiver=3.0.7]

I am using rsync to perform a distributed software release to around 200 machines. Several hundred files are transferred; some are quite large executables. Each time it is a different set of machines that fail - out of 200 it is 3 to 6. I have not had a run where all 200 work for quite some time. With the older version of rsync total success was the norm; we got a few failures such as the above every now and then.

I am not sure it is to do with moving to a more recent version of rsync. It might be to do with a flaky network; it's hard to say. But I did see an rsync discussion thread at http://serverfault.com/questions/27137/rsync-or-dfs-r-for-remote-backup-over-slow-links-of-windows-2003-r2-sp2 which seems to be talking about the same kind of problems I have been having.

The distribution is over a WAN, transferring files from London to Geneva. I am using Windows XP (SP2), with rsync built using the latest version of Cygwin. I am also defining the environment variable CYGWIN to be NONTSEC to head off any ACL-related permission problems.

The thread I refer to makes me think that it might actually be a problem with rsync. Maybe rsync should watch out for temporary loss of network connectivity or the slowness you sometimes get in WANs, and tolerate these sorts of errors subject to a maximum number of retries.

Regards, Andrew Marlow
DO NOT REPLY [Bug 7195] New: timeout reached while sending checksums for very large files
https://bugzilla.samba.org/show_bug.cgi?id=7195

Summary: timeout reached while sending checksums for very large files
Product: rsync
Version: 3.0.7
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P3
Component: core
AssignedTo: way...@samba.org
ReportedBy: jan...@webgods.de
QAContact: rsync...@samba.org

When I try to continue the upload of a very large file (400GB, 200GB already transmitted) with --partial, rsync stops with an error after 10 minutes. Verbosity shows that during this time it has transmitted checksums for about 30G worth of data.

Increasing the timeout with --timeout=10 helps. With this, rsync reaches the point where it transmits new data.
Re: retransfer fail of large files with inplace and broken pipe
On Sun, 2009-12-13 at 07:21 +0000, tom raschel wrote:
> i have to tranfer large files each 5-100 GB (mo-fri) over dsl line.
> unfortunately dsl lines are often not very stable and i got a broken pipe
> error. (dsl lines are getting a new ip if they are broken or at least
> after a reconnect every 24 hours)
>
> i had a script which detect the rsync error and restart the transmission.
> this means that if a file has transfered e.g. 80 % i start again from
> beginning. using partial and partial-dir was no solution to resync because
> rsync cut the original file (e.g. from 20 GB to 15 GB) which means that i
> have to transfer the whole rest of 5 GB.

Indeed. I entered an enhancement request to handle this situation better:
https://bugzilla.samba.org/show_bug.cgi?id=7123

-- Matt
rsync taking a while on really large files
Can anyone suggest a good way to speed up rsync on really large files? In particular, when I rsync the mail spool directory, I have a few users with inboxes of 1 GB and up, and it seems to take a very long time just to compare the files. Maybe it would be faster to copy from scratch for files over a certain size if the time stamps don't match.

David
Re: rsync taking a while on really large files
On 01/15/2010 07:22 PM, David Trammell wrote:
> Can anyone suggest a good way to speed up rsync on really large files? In
> particular, when I rsync the mail spool directory, I have a few users with
> inboxes of 1 GB and up, and it seems to take a very long time just to
> compare the files.

rsync is meant to save bandwidth; that's the main use of the tool. If you have enough bandwidth, rsync without options might not be how you want to use it. There is -W for whole files.
Re: rsync taking a while on really large files
I could use an ordinary copying script for the mail files, but I figured if rsync can do it in some more optimal way, I'll stick with it for simplicity (since it's working great for the several hundred gigs of user files). I saw the -W option, but I wasn't sure how it behaves, as the man page doesn't seem to have many details, and I thought there might be other options I missed. For -W the man page just says:

copy files whole (w/o delta-xfer algorithm)

Does that mean it will copy all files with no comparison, or does it at least verify that there is some change to the file first? I suppose either way I can test to see which is faster, but if someone can clarify the behavior I'd appreciate it.

Thanks, David

- Original Message -
From: l...@consolejunky.net
> rsync is meant to save bandwidth; that's the main use of the tool. If you
> have enough bandwidth, rsync without options might not be how you want to
> use it. There is -W for whole files.
Re: rsync taking a while on really large files
On 01/15/2010 07:46 PM, David Trammell wrote:
> Does that mean it will copy all files with no comparison, or does it at
> least verify that there is some change to the file first?

Without -W set it will check for file date/time and, I think, size.
Re: rsync taking a while on really large files
On Fri 15 Jan 2010, David Trammell wrote:

> I saw the -W option, but I wasn't sure about how it behaves as the man pages don't have many details, and I thought there might be other options I missed. For -W the man page just says: copy files whole (w/o delta-xfer algorithm)

Take a moment to properly read the manpage; the above indicates you've never gone beyond the option summary. Further on, the options are discussed in detail:

  -W, --whole-file
      With this option rsync's delta-transfer algorithm is not used
      and the whole file is sent as-is instead. The transfer may be
      faster if this option is used when the bandwidth between the
      source and destination machines is higher than the bandwidth
      to disk (especially when the disk is actually a networked
      filesystem). This is the default when both the source and
      destination are specified as local paths, but only if no
      batch-writing option is in effect.

Paul

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
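[For anyone wanting to settle the speed question empirically, a comparison along these lines should do it. This is a sketch only; the mail-spool path and the host name are hypothetical stand-ins for your own:

  # default behavior over a network: delta-transfer
  time rsync -a /var/mail/ backuphost:/backup/mail/

  # whole-file transfer, skipping the delta-transfer algorithm
  time rsync -aW /var/mail/ backuphost:/backup/mail/

Note that in both cases rsync still uses its quick check (size and modification time) to decide which files need transferring at all; -W only changes how a file that is deemed changed gets sent.]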
Rsync performance with very large files
We're having a performance issue when attempting to rsync a very large file. The transfer rate is only 1.5MB/sec. My issue looks very similar to this one: http://www.mail-archive.com/rsync@lists.samba.org/msg17812.html

In that thread, a 'dynamic_hash.diff' patch was developed to work around this issue. I applied the 'dynamic_hash' patch included in the 2.6.7 src, but it didn't help.

We are trying to evaluate the possibility of using rsync as an alternative to IBM's FlashCopy, which only works within the storage pool controlled by our San Volume Controller. Some details about our test environment:

- Sender and Receiver are both POWER6 servers running AIX 5.3
- Fiber attached disk, DS8300 storage
- Gigabit network (Hypervisor Virtual I/O)
- Test file is 232GB
- I've tried rsync version 3.0.7 (vanilla) and 2.6.7 with the dynamic_hash.diff patch, both compiled with IBM's xlc compiler. Same behavior with both versions.
- It takes approx 1.5 hours to 'consider' the file before transfers begin, no big deal...
- Once the changes are being sent, the rate is only 1.5MB/sec
- Nothing is using either the source or destination files, only rsync (these are test servers.)
- Both servers appear healthy, no CPU or memory problems.

Just hoping somebody might have some insight. The thread I linked above didn't have any info indicating success or failure of the patch - the original poster didn't provide any feedback.

Eric Cron ericc...@yahoo.com

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Rsync performance with very large files
Eric Cron (ericc...@yahoo.com) wrote on 8 January 2010 12:20:

> We're having a performance issue when attempting to rsync a very large file. The transfer rate is only 1.5MB/sec. My issue looks very similar to this one: http://www.mail-archive.com/rsync@lists.samba.org/msg17812.html In that thread, a 'dynamic_hash.diff' patch was developed to work around this issue. I applied the 'dynamic_hash' patch included in the 2.6.7 src, but it didn't help.

That's what I'd expect.

> We are trying to evaluate the possibility of using rsync as an alternative to IBM's FlashCopy, which only works within the storage pool controlled by our San Volume Controller. Some details about our test environment:
> - Sender and Receiver are both POWER6 servers running AIX 5.3
> - Fiber attached disk, DS8300 storage
> - Gigabit network (Hypervisor Virtual I/O)
> - Test file is 232GB
> - I've tried rsync version 3.0.7 (vanilla) and 2.6.7 with the dynamic_hash.diff patch, both compiled with IBM's xlc compiler. Same behavior with both versions.

Yes. v3 has better hashing but it's rarely the bottleneck.

> - It takes approx 1.5 hours to 'consider' the file before transfers begin, no big deal...

Reasonable. It's likely not 'considering'; it's reading the file on the destination. At a rate of 40MB/s it takes about 1.5h to read 232GB.

> - Once the changes are being sent, the rate is only 1.5MB/sec

Likely limited by the origin reading the file, if there are few changes. rsync is designed to reduce net traffic, and this usually costs more local I/O. The destination machine first reads the entire file and sends checksums to the origin, which (only) then reads the entire file and (meanwhile) sends the differences to the destination. So the total time is at least destination-reading + source-reading. In your case you have a net that is about as fast as local I/O. If the destination can write roughly as fast as the origin can read, you're better off just copying the entire file. This will save you about 40%-50% in total time, since you then do the destination and source operations in parallel. You can speed up rsync with --whole-file, which will do exactly the above.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
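[Putting that advice into a command, as a minimal sketch for the setup described above — the host and file names here are hypothetical:

  rsync -a --whole-file /data/testfile.dat aixhost2:/data/testfile.dat

With --whole-file the destination never reads its existing copy to build checksums, so the transfer starts immediately, and the total time is bounded by the slowest of source read, network, and destination write, rather than by destination-read plus source-read.]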
Re: retransfer fail of large files with inplace and broken pipe
OK, I have now tried --inplace with the --backup option, but syncing the files consumes much more time than a normal rsync run, so this is not a workable solution. Thx Tom

Tom rasc...@edvantice.de wrote in message news:hg7dg6$em...@ger.gmane.org...

> Hi, retransfer of large files with --inplace after a broken pipe is working now. (thx again to wayne) But it is much slower than a normal rsync job. I have read that setting the --backup option could help. (have not tried it yet) But the --backup option would halve the space, which is not desirable. Is there a way to tell rsync to delete the --backup file after a successful sync? thx Tom
>
> tom raschel rasc...@edvantice.de wrote in message news:loom.20091213t075221-...@post.gmane.org...
>
> > Hi, I have to transfer large files of 5-100 GB each (Mon-Fri) over a DSL line. Unfortunately DSL lines are often not very stable, and I get broken pipe errors. (DSL lines get a new IP if they drop, or at least after a reconnect every 24 hours.) I had a script which detects the rsync error and restarts the transmission. This means that if a file has transferred e.g. 80%, I start again from the beginning. Using --partial and --partial-dir was no solution for the resync because rsync cut the original file (e.g. from 20 GB to 15 GB), which means that I have to transfer the whole rest of 5 GB. So I had a look at --inplace, which I thought could do the trick, but --inplace updates the timestamp, and if the script starts a retransfer after a broken pipe it fails because the --inplace file is newer than the original file on the sender. Using --ignore-times could be a solution but slows down the whole process too much. Is there an option to tell rsync not to change the time of an --inplace transferred file, or maybe preserve the mtime and do a comparison of mtime instead of ctime? Thx Tom

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: retransfer fail of large files with inplace and broken pipe
Hi, retransfer of large files with --inplace after a broken pipe is working now. (thx again to wayne) But it is much slower than a normal rsync job. I have read that setting the --backup option could help. (have not tried it yet) But the --backup option would halve the space, which is not desirable. Is there a way to tell rsync to delete the --backup file after a successful sync? thx Tom

tom raschel rasc...@edvantice.de wrote in message news:loom.20091213t075221-...@post.gmane.org...

> Hi, I have to transfer large files of 5-100 GB each (Mon-Fri) over a DSL line. Unfortunately DSL lines are often not very stable, and I get broken pipe errors. (DSL lines get a new IP if they drop, or at least after a reconnect every 24 hours.) I had a script which detects the rsync error and restarts the transmission. This means that if a file has transferred e.g. 80%, I start again from the beginning. Using --partial and --partial-dir was no solution for the resync because rsync cut the original file (e.g. from 20 GB to 15 GB), which means that I have to transfer the whole rest of 5 GB. So I had a look at --inplace, which I thought could do the trick, but --inplace updates the timestamp, and if the script starts a retransfer after a broken pipe it fails because the --inplace file is newer than the original file on the sender. Using --ignore-times could be a solution but slows down the whole process too much. Is there an option to tell rsync not to change the time of an --inplace transferred file, or maybe preserve the mtime and do a comparison of mtime instead of ctime? Thx Tom

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
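[One way to get the --backup safety net without permanently paying the space cost is to point the backups at a scratch directory and remove it only after rsync reports success. A sketch, run on the receiving machine; every path and host here is hypothetical:

  #!/bin/sh
  # pull the big files, keeping any inplace-overwritten data in a scratch dir
  if rsync -a --inplace --backup --backup-dir=/scratch/rsync-backup \
      remotehost:/data/bigfiles/ /data/bigfiles/
  then
      # the sync completed cleanly, so the safety copies can go
      rm -rf /scratch/rsync-backup
  fi

Note that --backup-dir is interpreted on the receiving side, which is why the cleanup has to run there too.]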
RE: retransfer fail of large files with inplace and broken pipe
Tom wrote:

> to make things more clear:
> 1.) The first transfer is done either as an initial setup or with a USB HDD to get sender and receiver in sync.
> 2.) The transfer does not stop because rsync had a timeout; it stops because the DSL line is broken (which I could see at dyndns).
> 3.) If the DSL line is stable the transfer is successful (which fortunately works most of the time).
> 4.) I am searching for a way to reduce the time to retransfer the file, or in other words to resume the file transfer after a broken pipe (e.g. if you download a 4.4 GB CentOS image it is comfortable to resume the transfer of a 99% transferred file instead of downloading it all from scratch).
> Tom

But you already have 100% of the image, only it is an older version of the image. The only thing I've found that works (and this is ONLY on something UNIXy) is to monitor the temporary file on the target and, if it is big enough, rename it to the intended target file before the target rsync destroys it. For real disasters, you can attempt to automate this process.

First: Transfer or re-transfer. I think, particularly with bad connections, you need to treat those VERY differently. For the initial transfer, --partial should help. For retransfers, where stuff in the middle has changed, I would expect the necessary state information to exist ONLY in the two running processes, and that information is lost if the connection goes down. This includes the connection dying because both sides are going through the file and have nothing worthwhile to say to each other. As usual, flames invited if I've got any of this wrong.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: retransfer fail of large files with inplace and broken pipe
On Sat, Dec 12, 2009 at 11:21 PM, tom raschel rasc...@edvantice.de wrote:

> so I had a look at --inplace, which I thought could do the trick, but --inplace updates the timestamp, and if the script starts a retransfer after a broken pipe it fails because the --inplace file is newer than the original file on the sender.

Are you using --update (-u)? If so, turn that off. If not, rsync won't skip a file that is newer, so something else is afoot.

..wayne..

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
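[In other words, the resumable combination for huge files over a flaky link is --inplace without --update, restarted until it succeeds. A hypothetical invocation:

  rsync -a --inplace --timeout=300 /data/bigfile.img remotehost:/data/bigfile.img

Without -u, rsync's quick check asks whether size and mtime match exactly, not whether the destination is newer, so a half-updated destination file (whose mtime was bumped by the aborted --inplace run) still gets re-examined, and the delta transfer picks up where the data diverges.]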
Re: retransfer fail of large files with inplace and broken pipe
Thx to all, it was the -u option which prevented rsync from resuming the file. Tom

Tony Abernethy t...@servasoftware.com wrote in message news:af5ef1769d564645a9acc947375f0d021567087...@winxbeus13.exchange.xchg...

> Tom wrote:
> > to make things more clear:
> > 1.) The first transfer is done either as an initial setup or with a USB HDD to get sender and receiver in sync.
> > 2.) The transfer does not stop because rsync had a timeout; it stops because the DSL line is broken (which I could see at dyndns).
> > 3.) If the DSL line is stable the transfer is successful (which fortunately works most of the time).
> > 4.) I am searching for a way to reduce the time to retransfer the file, or in other words to resume the file transfer after a broken pipe (e.g. if you download a 4.4 GB CentOS image it is comfortable to resume the transfer of a 99% transferred file instead of downloading it all from scratch).
> > Tom
>
> But you already have 100% of the image, only it is an older version of the image. The only thing I've found that works (and this is ONLY on something UNIXy) is to monitor the temporary file on the target and, if it is big enough, rename it to the intended target file before the target rsync destroys it. For real disasters, you can attempt to automate this process.
>
> First: Transfer or re-transfer. I think, particularly with bad connections, you need to treat those VERY differently. For the initial transfer, --partial should help. For retransfers, where stuff in the middle has changed, I would expect the necessary state information to exist ONLY in the two running processes, and that information is lost if the connection goes down. This includes the connection dying because both sides are going through the file and have nothing worthwhile to say to each other. As usual, flames invited if I've got any of this wrong.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
retransfer fail of large files with inplace and broken pipe
Hi, I have to transfer large files of 5-100 GB each (Mon-Fri) over a DSL line. Unfortunately DSL lines are often not very stable, and I get broken pipe errors. (DSL lines get a new IP if they drop, or at least after a reconnect every 24 hours.) I had a script which detects the rsync error and restarts the transmission. This means that if a file has transferred e.g. 80%, I start again from the beginning. Using --partial and --partial-dir was no solution for the resync because rsync cut the original file (e.g. from 20 GB to 15 GB), which means that I have to transfer the whole rest of 5 GB. So I had a look at --inplace, which I thought could do the trick, but --inplace updates the timestamp, and if the script starts a retransfer after a broken pipe it fails because the --inplace file is newer than the original file on the sender. Using --ignore-times could be a solution but slows down the whole process too much. Is there an option to tell rsync not to change the time of an --inplace transferred file, or maybe preserve the mtime and do a comparison of mtime instead of ctime? Thx Tom

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
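[The restart script described above can be as simple as a retry loop that re-invokes rsync until it exits cleanly (exit code 0). A sketch with hypothetical paths:

  #!/bin/sh
  # keep retrying until rsync reports success
  until rsync -a --inplace /data/bigfiles/ remotehost:/data/bigfiles/
  do
      echo "rsync failed (exit $?), retrying in 60 seconds" >&2
      sleep 60
  done

Combined with --inplace (and without -u, as resolved later in this thread), each retry resumes the delta transfer against whatever data made it across previously.]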
RE: retransfer fail of large files with inplace and broken pipe
tom raschel wrote:

> Hi, I have to transfer large files of 5-100 GB each (Mon-Fri) over a DSL line. Unfortunately DSL lines are often not very stable, and I get broken pipe errors. (DSL lines get a new IP if they drop, or at least after a reconnect every 24 hours.) I had a script which detects the rsync error and restarts the transmission. This means that if a file has transferred e.g. 80%, I start again from the beginning. Using --partial and --partial-dir was no solution for the resync because rsync cut the original file (e.g. from 20 GB to 15 GB), which means that I have to transfer the whole rest of 5 GB. So I had a look at --inplace, which I thought could do the trick, but --inplace updates the timestamp, and if the script starts a retransfer after a broken pipe it fails because the --inplace file is newer than the original file on the sender. Using --ignore-times could be a solution but slows down the whole process too much. Is there an option to tell rsync not to change the time of an --inplace transferred file, or maybe preserve the mtime and do a comparison of mtime instead of ctime? Thx Tom

First: Transfer or re-transfer. I think, particularly with bad connections, you need to treat those VERY differently. For the initial transfer, --partial should help. For retransfers, where stuff in the middle has changed, I would expect the necessary state information to exist ONLY in the two running processes, and that information is lost if the connection goes down. This includes the connection dying because both sides are going through the file and have nothing worthwhile to say to each other. As usual, flames invited if I've got any of this wrong.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: retransfer fail of large files with inplace and broken pipe
to make things more clear:

1.) The first transfer is done either as an initial setup or with a USB HDD to get sender and receiver in sync.
2.) The transfer does not stop because rsync had a timeout; it stops because the DSL line is broken (which I could see at dyndns).
3.) If the DSL line is stable the transfer is successful (which fortunately works most of the time).
4.) I am searching for a way to reduce the time to retransfer the file, or in other words to resume the file transfer after a broken pipe (e.g. if you download a 4.4 GB CentOS image it is comfortable to resume the transfer of a 99% transferred file instead of downloading it all from scratch).

Tom

> First: Transfer or re-transfer. I think, particularly with bad connections, you need to treat those VERY differently. For the initial transfer, --partial should help. For retransfers, where stuff in the middle has changed, I would expect the necessary state information to exist ONLY in the two running processes, and that information is lost if the connection goes down. This includes the connection dying because both sides are going through the file and have nothing worthwhile to say to each other. As usual, flames invited if I've got any of this wrong.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: rsync algorithm for large files
ehar...@lyricsemiconductors.com wrote:

> I thought rsync would calculate checksums of large files that have changed timestamps or filesizes, and send only the chunks which changed. Is this not correct? My goal is to come up with a reasonable (fast and efficient) way for me to do a daily incremental backup of my Parallels virtual machine (a directory structure containing mostly small files, and one 20G file). I'm on OSX 10.5, using rsync 2.6.9, and the destination machine has the same versions. I configured ssh keys, and this is my result:

Upgrade to rsync 3 at least.

Rsync keeps a hash table of the blocks' hashes. For older versions of rsync, the hash table was of a constant size. This meant that files over 3GB in size had a high chance of hash collisions. For a 20G file, the collisions alone might be the cause of your trouble. Newer rsyncs detect when the hash gets too big, and increase the hash table size accordingly, thus avoiding the collisions.

In other words - upgrade both sides (but specifically the sender).

Shachar

-- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
RE: rsync algorithm for large files
Yup, by doing --inplace, I got down from 30 mins to 24 mins... So that's slightly better than resending the whole file again. However, this doesn't really do what I was hoping to do. Perhaps it can't be done, or somebody would like to recommend some other product that is better suited for my purposes? If I could describe ideally exactly what I'm trying to do, it would be...

- During the initial send, calculate checksums on the fly, down to some blocksize (perhaps 1MB), and store the checksums for later use.
- On subsequent sends, just read the source and compare checksums against the previously saved values, and only send the blocks needed. In the worst case, all blocks have changed, and the time to send is very nearly equal to the initial send.
- The runtime for subsequent runs should never significantly exceed the runtime of the initial send, because the goal is to gain something over brainless delete-and-overwrite.
- The runtime for subsequent runs should be on the same order of magnitude as whichever is greater: calculating the checksums of the source, or sending the changed blocks.

In my specific situation, with 33 mins for the initial send of 20G across a 100Mbit LAN, my subsequent run should be approx 11 mins, because that's how long it takes for me to md5 the whole tree. Thanks again for any assistance...

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
rsync algorithm for large files
I thought rsync would calculate checksums of large files that have changed timestamps or filesizes, and send only the chunks which changed. Is this not correct? My goal is to come up with a reasonable (fast and efficient) way for me to do a daily incremental backup of my Parallels virtual machine (a directory structure containing mostly small files, and one 20G file). I'm on OSX 10.5, using rsync 2.6.9, and the destination machine has the same versions. I configured ssh keys, and this is my result:

(Initial sync)
time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/
20G, ~30 minutes

(Second time I ran it, with no changes to the VM)
time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/
2 seconds

(Then I made some minor changes inside the VM, and I want to send just the changed blocks)
time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/
After waiting 50 minutes, I cancelled the job.

Why does it take longer the 3rd time I run it? Shouldn't the performance always be **at least** as good as the initial sync? Thanks for any help...

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: rsync algorithm for large files
On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote:

> Why does it take longer the 3rd time I run it? Shouldn't the performance always be **at least** as good as the initial sync?

Not per se. First you have to determine THAT the file has changed; then the file is synced if there was a change. At least that's what you have to do when the file size is unchanged and only the timestamp differs. (Which is unfortunately often the case for virtual machine images.) Worst case: it takes double the time if the change is at the end of the file. When the filesize differs, rsync immediately knows that the file has actual changes and starts the sync right away.

If I understand '--ignore-times' correctly, it forces rsync to always regard the files as changed and so start a sync right away, without first checking for changes.

There are also some other options that may or may not have a speed impact for you:

--inplace, so that rsync doesn't create a tmp-copy that is later moved over the previous file on the target site.
--whole-file, so that rsync doesn't use delta-transfer but rather copies the whole file.

Also you may want to separate the small from the large files with:

--min-size
--max-size

so you can use different options for the small/large file(s).

Bis denn

-- Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
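[That split might look like the following pair of runs. This is a sketch only: the 500m cutoff and the paths are made up, and the size options should be checked against your rsync's manpage:

  # small files: the default handling is cheap for these
  rsync -a --max-size=500m MyVirtualMachine/ myserver:MyVirtualMachine/

  # the big disk image: update in place to skip the temp-copy rewrite
  rsync -a --min-size=500m --inplace MyVirtualMachine/ myserver:MyVirtualMachine/]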
Re: rsync algorithm for large files
Matthias Schniedermeyer (m...@citd.de) wrote on 5 September 2009 00:34:

> On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote:
> > Why does it take longer the 3rd time I run it? Shouldn't the performance always be **at least** as good as the initial sync?
> Not per se. First you have to determine THAT the file has changed; then the file is synced if there was a change. At least that's what you have to do when the file size is unchanged and only the timestamp differs. (Which is unfortunately often the case for virtual machine images.) Worst case: it takes double the time if the change is at the end of the file.

No, rsync assumes that the file has changed if either the size or the timestamp differs, and syncs it immediately. For a new file transfer it's read once in the source and written once in the destination. For an update it's still read once in the source but read twice and written once in the destination, no matter how many or how extensive the changes are.

The source also has to do the sliding checksumming. This is usually faster than reading the file, so it'll only slow down the process if the source is very slow or the cpu is busy with other tasks. OTOH, the IO on the destination is significantly higher for big files; this is often the cause of a slower transfer rate than a full copy.

> There are also some other options that may or may not have a speed impact for you: --inplace, so that rsync doesn't create a tmp-copy that is later moved over the previous file on the target site.

Yes, this is useful because it avoids both a second reading and the full write on the destination (in principle; I didn't bother to check the actual implementation). For large files with small changes this option is probably the best. The problem is that if the update aborts for any reason you lose your backup. One might want to keep at least two days of backups in this case.

> --whole-file, so that rsync doesn't use delta-transfer but rather copies the whole file.

Yes, but it causes a lot of net traffic. He mentions an average transfer rate of about 11MB/s, so for a 100Mb/s net whole-file is probably not suitable. If however he has a free gigabit link it'll be the best if --inplace is not acceptable.

> Also you may want to separate the small from the large files with: --min-size --max-size so you can use different options for the small/large file(s).

Agreed. I'd also suggest using rsync v3 because it limits the blocksize. Previous versions will use quite a large block for big files, and if changes are scattered it'll transfer much more than v3.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 5482] look into a hierarchical checksum algorithm that would help to efficiently transmit really large files
https://bugzilla.samba.org/show_bug.cgi?id=5482

[EMAIL PROTECTED] changed:

           What      |Removed                     |Added
   -------------------------------------------------------------------------
   Status            |NEW                         |ASSIGNED
   Summary           |apply the rsync comparison  |look into a hierarchical
                     |algorithm specially to .mov |checksum algorithm that
                     |and .mp4 files              |would help to efficiently
                     |                            |transmit really large files

--- Comment #6 from [EMAIL PROTECTED] 2008-05-25 09:31 CST ---

I also think coding a special transfer algorithm for media files is not something that would be appropriate for rsync (though someone may want to code their own custom version if this is important to them).

A general hierarchical checksum would be quite interesting, but we'd need to balance the costs of disk I/O and memory use, and avoid round-trip latency. Rsync deals with latency by pipe-lining the checksum generation and file reconstruction in separate processes, but each one requires a separate read of the basis file. So, we'd want to make rsync still able to process more than one active transfer and not do a read-read for each sub-level of checksum generation. If rsync were made to send a first level set of checksums for each file, computing a certain number of sub-level checksums into memory, that might work for the generator at the expense of holding the hierarchical sums in memory for N active files at once (perhaps topping out after a certain selectable memory limit was reached).

The biggest unknown to me would be how to make the sender work efficiently for such an algorithm. The sender is the part that must compute a new checksum on each character boundary (using a rolling checksum) to find relocation matches for each block. Computing a rolling checksum using the current method would require holding a huge amount of the file in memory at once (but that could be changed to use mem-mapped files, potentially triggering re-reads as the window slides forward). We'd also need to move away from the current bifurcated checksum algorithm into a single, stronger rolling checksum to avoid having to recompute the strong checksum over a huge amount of data to double-check a match. Also, each successive level of transfer would likely trigger a re-read of the part of the file that didn't match anything as we divide it into smaller checksums.

We'd also want to limit how hierarchical the checksums are based on file size, as smaller files just need the current method, while really large files might benefit from several levels. We may want to even consider a reversal of the jobs of the sender and generator to lighten the load on the server. This would slow things down latency-wise, but could make it possible for a server to deal with more clients, especially if this makes the per-byte processing more intensive.

Sounds like an intriguing idea to investigate further and see how feasible it would be. Comments? Jamie, do you have code to share?

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the QA contact for the bug, or are watching the QA contact.

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Thought on large files
Matt McCutchen wrote:

> On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote:
> > I had a look at rdiff-backup, but I was trying to get something that spoke native rsync (IE, not to force any change on the client side).
> To achieve this, you can have the client push to an rsync daemon and then have the daemon call rdiff-backup so that the rdiff-backup part happens entirely on the server. The idea is the same as the daemon-and-rsnapshot setup I described in the following message, but with rdiff-backup in place of rsnapshot as the backend: http://lists.samba.org/archive/rsync/2007-December/019470.html
> > After some thought I think the best place to put such a change would be at the filesystem level. For example, if one had a FUSE filesystem that simply ran on top of an existing one, that wrote its files as I described (or uses diff-like methods), but presents a clean filesystem for rsync (or indeed any tool) to make use of. I believe I may look in that direction instead of hacking rsync.
> You could do that, but note that the rsync receiver won't explicitly tell the filesystem what files are similar, so you'll have to either keep a big hashtable to help you coalesce identical blocks globally or use some kludge like looking at what other files the receiver has open while it is writing the destination file.
> Matt

I spent some time playing around with rdiff-backup as mentioned, but it turns out to be not as efficient as I had hoped. It also makes it quite tough to restore a single file or to have a method of viewing a snapshot of a system at a particular time.

I spent a bit of time hacking at a FUSE system, and actually I think this method would work very well. Essentially one would mount a very thin 'fuse' layer on top of a standard set of files. This layer would basically just pass through all commands (read/write/dirread/stat etc..). However, it would divide up files larger than, say, 25MB into 25MB chunks, and intelligently store them. Whatever is accessing the top layer (rsync in this case, but it could be samba, or a web share, or anything) would see the files as they should be, but underneath, the files are stored in these chunks. When rsync (or anything) needs to read or write a file, it would work out from the seek offset which raw file to pull the data from.

Then, say once a day, we create a hard-link copy of the files. Normally one would copy the 'top layer' files to a snapshot directory, but in this case you would copy the 'bottom layer' (IE the raw data) to a snapshot directory. This snapshot directory could itself also be covered by a 'blockfs' style FUSE layer so a person can browse it and see the files as they should be. Hope this makes sense.

I've hacked together a test 'blockfs' fuse layer, and will run some more tests on some multi-gigabyte files, clean it up and put it up if anyone is interested. I'm amazed at how easy it is to program a FUSE layer.

Brendan Grieve

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
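[As an illustration of the storage layout this buys — this is not Brendan's actual blockfs code, just a hand-run sketch of the idea with made-up paths, using GNU split and cp — the chunking plus hard-link snapshot could be mimicked like so:

  # bottom layer: store a big file as fixed-size 25MB chunks
  split -b 25m priv1.edb /store/.priv1.edb._somemagicstring_.

  # daily snapshot: hard-link the raw chunks; unchanged chunks cost no space
  cp -al /store /snapshots/2008-01-24

  # a later write that touches one 25MB chunk breaks one hard link,
  # not the link for the whole multi-gigabyte file

The point is that a one-byte change re-materializes a single 25MB chunk in the next snapshot instead of the entire file.]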
Re: Thought on large files
Matt McCutchen wrote: On Wed, 2008-01-23 at 13:38 +0900, Brendan Grieve wrote: Lets say the file, whatever it is, is a 10Gb file, and that some small amount of data changes in it. This is efficiently sent accross by rsync, BUT the rsync server side will correctly break the hard-link and create a new file with the changed bits. This means, if even 1 byte of that 10Gb file changes, you now have to store that whole file again. What my thoughts were is that if the server could transparently break a large file into chunks and store them that way, then one can still make use of hard-links efficiently. This is a fine idea, but I don't think support for this should be added to rsync. Instead, I suggest that you use rdiff-backup ( http://www.nongnu.org/rdiff-backup/ ), a backup tool that stores an ordinary latest snapshot of the source along with reverse deltas for previous snapshots and redundant attribute information both in its own format. Matt I had a look at rdiff-backup, but I was trying to get something that spoke native rsync (IE, not to force any change on the client side). I do however agree that support should NOT be added in rsync. Rsync is a mirroring tool and not some elaborate tool that needs to know really how files are stored. In fact I'd go as far as to say many of the options rsync does support veer away from being a simple mirror tool (IE backup etc...). After some thought I think the best place to put such a change would be at the filesystem level. For example, if one had a FUSE filesystem that simply ran on top of an existing one, that wrote its files as I described (or uses diff-like methods), but presents a clean filesystem for rsync (or indeed any tool) to make use of. I believe I may look in that direction instead of hacking rsync. Brendan Grieve -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Thought on large files
On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote: I had a look at rdiff-backup, but I was trying to get something that spoke native rsync (IE, not to force any change on the client side). To achieve this, you can have the client push to an rsync daemon and then have the daemon call rdiff-backup so that the rdiff-backup part happens entirely on the server. The idea is the same as the daemon-and-rsnapshot setup I described in the following message, but with rdiff-backup in place of rsnapshot as the backend: http://lists.samba.org/archive/rsync/2007-December/019470.html After some thought I think the best place to put such a change would be at the filesystem level. For example, if one had a FUSE filesystem that simply ran on top of an existing one, that wrote its files as I described (or uses diff-like methods), but presents a clean filesystem for rsync (or indeed any tool) to make use of. I believe I may look in that direction instead of hacking rsync. You could do that, but note that the rsync receiver won't explicitly tell the filesystem what files are similar, so you'll have to either keep a big hashtable to help you coalesce identical blocks globally or use some kludge like looking at what other files the receiver has open while it is writing the destination file. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Thought on large files
Matt McCutchen wrote: On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote: I had a look at rdiff-backup, but I was trying to get something that spoke native rsync (IE, not to force any change on the client side). To achieve this, you can have the client push to an rsync daemon and then have the daemon call rdiff-backup so that the rdiff-backup part happens entirely on the server. The idea is the same as the daemon-and-rsnapshot setup I described in the following message, but with rdiff-backup in place of rsnapshot as the backend: http://lists.samba.org/archive/rsync/2007-December/019470.html Thanks, I do like that idea. -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Thought on large files
Hi There, I've been toying around with the code of rsync on and off for a while, and I had a thought that I would like some comments on. It's to do with very large files and disk space.

One of the common uses of rsync is as a backup program. A client connects to the rsync server and sends over any changed files. If the client has very large files that have changed marginally, then rsync efficiently sends only the changed bits. On the server side, one may have it set up to create 'snapshots' of the existing data there by hardlinking that data to another directory periodically. There's plenty of documentation on the web about how to do this so I won't go into it further. This is very effective and uses quite little disk space, since a file that does not change effectively doesn't take up any more disk space (not much more anyway), even if it now exists in many snapshots.

One place where this falls down is if the file is very large. Let's say the file, whatever it is, is a 10Gb file, and that some small amount of data changes in it. This is efficiently sent across by rsync, BUT the rsync server side will correctly break the hard-link and create a new file with the changed bits. This means, if even 1 byte of that 10Gb file changes, you now have to store that whole file again. I won't get into the whole issue of why one would have big files etc... I see it all the time, especially in the Microsoft world, with Outlook PST files and Microsoft Exchange database files.

What my thoughts were is that if the server could transparently break a large file into chunks and store them that way, then one can still make use of hard-links efficiently. For example, going back to a 10Gb Exchange database file, it's likely not going to change too much during use. So if the server stored the huge clumsy 'priv1.edb' as:

.priv1.edb._somemagicstring_.1
.priv1.edb._somemagicstring_.2

etc... and intelligently only broke the 'hard-links' of the bits that actually change, then it all works well. One could have an option to enable this for files bigger than a certain size, and break them into specific sized chunks.

One could quite rightly argue that this changes rsync from a tool that synchronizes data between places to a dedicated backup tool (as the two sides will now have physically different data); however, I could see it being useful, especially since it wouldn't need changes on the client side as the server still presents it as just one file.

What are your comments? Good idea? Stupid idea? Been done before? Does anyone have some hints about where in the code I should look to make these changes so I can test it out?

Brendan Grieve

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Thought on large files
On Wed, 2008-01-23 at 13:38 +0900, Brendan Grieve wrote: Lets say the file, whatever it is, is a 10Gb file, and that some small amount of data changes in it. This is efficiently sent accross by rsync, BUT the rsync server side will correctly break the hard-link and create a new file with the changed bits. This means, if even 1 byte of that 10Gb file changes, you now have to store that whole file again. What my thoughts were is that if the server could transparently break a large file into chunks and store them that way, then one can still make use of hard-links efficiently. This is a fine idea, but I don't think support for this should be added to rsync. Instead, I suggest that you use rdiff-backup ( http://www.nongnu.org/rdiff-backup/ ), a backup tool that stores an ordinary latest snapshot of the source along with reverse deltas for previous snapshots and redundant attribute information both in its own format. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
problems with rsync 2.6.9 and large files (up to 20GB)
Hi, I'm Jordi and I work at the University of Barcelona. I'm trying to make a backup of our several clusters. In the past I worked with rsync with very good results. When I try to back up some large directories (for example 1.5TB with a lot of large ~20GB files) with this command:

rsync -aulHI --delete --partial --modify-window=2 --log-format="%t %o %l %f" --stats --exclude-from=/home/jingles/soft/sinc/sinc_conf/exclude_files.txt --temp-dir=/data/tmp --bwlimit=1 -e "ssh -o ForwardX11=no -i /root/.ssh/id_rsa -l root" /home server_name:/data/cerqt2/

it seems that rsync is working and finishing, but when the message that summarizes the transferred files appears, rsync starts the backup again (taking a lot of time):

sent 640278263924 bytes received 217256421 bytes 8610488.88 bytes/sec
total size is 783944943854 speedup is 1.22
rsync warning: some files vanished before they could be transferred (code 24) at main.c(977) [sender=2.6.9]
2007/12/11 17:02:04 del. 0 home/jingles/soft/sinc/bin/home/g1benjamin/timel-freq.po15740

I must say that these clusters are working all day, so there are constant changes in the data stored on our hard disks. I googled for this problem and I found that there is a patch file to solve it (you can see the thread in http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html ) but I can't find it. Can you help me? Does the newest 3.0 version solve this problem? Thanks for any suggestion, regards, jordi

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: problems with rsync 2.6.9 and large files (up to 20GB)
On Wed, 2007-12-12 at 14:28 +0100, gorka barracuda wrote: it seems that rsync is working and finishing but when appears the message that summarizes the transferred files rsync starts the backup again (taking a lot of time) sent 640278263924 bytes received 217256421 bytes 8610488.88 bytes/sec total size is 783944943854 speedup is 1.22 rsync warning: some files vanished before they could be transferred (code 24) at main.c(977) [sender=2.6.9] 2007/12/11 17:02:04 del. 0 home/jingles/soft/sinc/bin/home/g1benjamin/timel-fre q.po15740 I've never known rsync to start over like that. Are you using a script that runs rsync repeatedly until it exits with code 0? I googled for this problem and I found that there are a patch file to solve this problem (you can see the thread in http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html ) but I can't found it. That thread is about an optimization that makes the delta-transfer algorithm much faster on large files. The optimization is included in current development versions of rsync 3.0.0. This probably is not related to the problem with rsync starting over. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: problems with rsync 2.6.9 and large files (up to 20GB)
Hi Matt, thanks for your fast reply.

2007/12/12, Matt McCutchen [EMAIL PROTECTED]:

> On Wed, 2007-12-12 at 14:28 +0100, gorka barracuda wrote:
> > it seems that rsync is working and finishing, but when the message that summarizes the transferred files appears, rsync starts the backup again (taking a lot of time): sent 640278263924 bytes received 217256421 bytes 8610488.88 bytes/sec total size is 783944943854 speedup is 1.22 rsync warning: some files vanished before they could be transferred (code 24) at main.c(977) [sender=2.6.9] 2007/12/11 17:02:04 del. 0 home/jingles/soft/sinc/bin/home/g1benjamin/timel-freq.po15740
> I've never known rsync to start over like that. Are you using a script that runs rsync repeatedly until it exits with code 0?

Oops! My apologies, you are correct: my first version of the script checked the exit code, and it seems that I forgot to comment it out :( Sorry for the inconvenience.

But the second time it makes this re-rsync, it takes the same time as the first time... it seems that it doesn't make an incremental backup with our large files... Do you think the cause could be the problem addressed by the new optimized algorithm that you put in rsync 3.0.0?

> > I googled for this problem and I found that there is a patch file to solve this problem (you can see the thread in http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html ) but I can't find it.
> That thread is about an optimization that makes the delta-transfer algorithm much faster on large files. The optimization is included in current development versions of rsync 3.0.0. This probably is not related to the problem with rsync starting over.
> Matt

Thanks again, and sorry for everything,
jordi

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: problems with rsync 2.6.9 and large files (up to 20GB)
On Wed, 2007-12-12 at 17:01 +0100, gorka barracuda wrote: but, the second time that makes this re-rsync it takes the same time that the first time...it seems that it doen't make an incremental backup with our large files... Do you think that the cause could be the problem of the new optimized algorithm that you put in rsync 3.0.0? You are passing -I, which makes rsync transfer all regular files every time even if they appear to be identical on source and destination. Rsync does reduce network traffic using the delta-transfer algorithm, but the process still rewrites each destination file in full and uses a bunch of CPU on the sender (especially without the optimized algorithm), so it may take a long time. Consider whether you really need the extra certainty of catching changes afforded by -I. At the least, you could disable -I for rsync runs after the first. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
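[Concretely, for the nightly runs after the first full pass, that means dropping the I from the option cluster in the original command. A sketch based on the command Jordi posted, trimmed of the site-specific exclude and bwlimit settings for brevity:

  rsync -aulH --delete --partial --modify-window=2 --stats \
      --temp-dir=/data/tmp -e "ssh -l root" /home server_name:/data/cerqt2/

With the quick check back in effect, an unchanged 20GB file is skipped on a size+mtime match instead of being read, checksummed, and rewritten on every run.]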
Re: problems with rsync 2.6.9 and large files (up to 20GB)
It works! Thanks again Matt! 2007/12/12, Matt McCutchen [EMAIL PROTECTED]: On Wed, 2007-12-12 at 17:01 +0100, gorka barracuda wrote: but, the second time that makes this re-rsync it takes the same time that the first time...it seems that it doen't make an incremental backup with our large files... Do you think that the cause could be the problem of the new optimized algorithm that you put in rsync 3.0.0? You are passing -I, which makes rsync transfer all regular files every time even if they appear to be identical on source and destination. Rsync does reduce network traffic using the delta-transfer algorithm, but the process still rewrites each destination file in full and uses a bunch of CPU on the sender (especially without the optimized algorithm), so it may take a long time. Consider whether you really need the extra certainty of catching changes afforded by -I. At the least, you could disable -I for rsync runs after the first. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Building hash table times for large files
I'm running pre4 on a 77GB file. It seems like the hash table is taking a long time to be built. I'm not sure what is involved in this step, but as an example the following is logged during a run:

send_files(11, priv1.edb)
send_files mapped priv1.edb of size 79187419136
calling match_sums priv1.edb
f.st.. priv1.edb
hash search b=131072 len=79187419136
built hash table for entries 0 - 52427
built hash table for entries 52428 - 104855
built hash table for entries 104856 - 157283
built hash table for entries 157284 - 209711

The first two hash table entries (0 - 52427 and 52428 - 104855) took only a few minutes. The next two entries written to the log file have taken several hours. Does this make sense? Should the entries take so long to be built?

Rob

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
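[One workaround discussed elsewhere in these threads is to raise the block size so the file produces far fewer blocks: at the logged b=131072, a 79187419136-byte file yields about 604,000 blocks, roughly ten times the 65536 slots in the fixed hash table of that era's rsync. A hypothetical invocation with 2MB blocks:

  rsync -a -B 2097152 /data/priv1.edb remotehost:/data/

That cuts the block count to roughly 38,000, comfortably under the table size, at the price of resending up to 2MB around every changed byte.]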
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
On Mon, Jan 08, 2007 at 10:16:01AM -0800, Wayne Davison wrote: And one final thought that occurred to me: it would also be possible for the sender to segment a really large file into several chunks, handling each one without overlap, all without the generator or the receiver knowing that it was happening. I have a patch that implements this: http://rsync.samba.org/ftp/unpacked/rsync/patches/segment_large_hash.diff I'm wondering if anyone has any feedback on such a method being included in rsync? ..wayne.. -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
On 10/7/07, Wayne Davison [EMAIL PROTECTED] wrote: On Mon, Jan 08, 2007 at 10:16:01AM -0800, Wayne Davison wrote: And one final thought that occurred to me: it would also be possible for the sender to segment a really large file into several chunks, handling each one without overlap, all without the generator or the receiver knowing that it was happening. I have a patch that implements this: http://rsync.samba.org/ftp/unpacked/rsync/patches/segment_large_hash.diff I like better performance, but I'm not entirely happy with a fixed upper limit on the distance that data can migrate and still be matched by the delta-transfer algorithm: if someone is copying an image of an entire hard disk and rearranges the partitions within the disk, rsync will needlessly retransmit all the partition data. An alternative would be to use several different block sizes spaced by a factor of 16 or so and have a separate hash table for each. Each hash table would hold checksums for a sliding window of 8/10*TABLESIZE blocks around the current position. This way, small blocks could be matched across small distances without overloading the hash table, and large blocks could still be matched across large distances. Matt -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
Evan Harris wrote: Would it make more sense just to make rsync pick a more sane blocksize for very large files? I say that without knowing how rsync selects the blocksize, but I'm assuming that if a 65k entry hash table is getting overloaded, it must be using something way too small. rsync picks a block size that is the square root of the file size. As I didn't write this code, I can safely say that it seems like a very good compromise between too small block sizes (too many hash lookups) and too large blocksizes (decreased chance of finding matches). Should it be scaling the blocksize with a power-of-2 algorithm rather than the hash table (based on filesize)? If Wayne intends to make the hash size a power of 2, maybe selecting block sizes that are smaller will make sense. We'll see how 3.0 comes along. I haven't tested to see if that would work. Will -B accept a value of something large like 16meg? It should. That's about 10 times the block size you need in order to not overflow the hash table, though, so a block size of 2MB would seem more appropriate to me for a file size of 100GB. At my data rates, that's about a half a second of network bandwidth, and seems entirely reasonable. Evan I would just like to note that since I submitted the large hash table patch, I have seen no feedback on anyone actually testing it. If you can compile a patched rsync and report how it goes, that would be very valuable to me. Shachar -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:

> I've been playing with rsync and very large files approaching and surpassing 100GB, and have found that rsync has excessively poor performance on these very large files, and the performance appears to degrade the larger the file gets.

Yes, this is caused by the current hashing algorithm that the sender uses to find matches for moved data. The current hash table has a fixed size of 65536 slots, and can get overloaded for really large files. There is a diff in the patches dir that makes rsync work better with large files: dynamic_hash.diff. This makes the size of the hash table depend on how many blocks there are in the transfer. It does speed up the transfer of large files significantly, but since it introduces a mod (%) operation on a per-byte basis, it slows down the transfer of normal sized files significantly.

I'm going to be checking into using a hash algorithm with a table that is always a power of 2 in size as an alternative implementation of this dynamic hash algorithm. That will hopefully not bloat the CPU time for normal-sized files. Alternately, the hashing algorithm could be made to vary depending on the file's size. I'm hoping to have this improved in the upcoming 3.0.0 release.

And one final thought that occurred to me: it would also be possible for the sender to segment a really large file into several chunks, handling each one without overlap, all without the generator or the receiver knowing that it was happening. The upside is that huge files could be handled this way, but the downside is that the incremental-sync algorithm would not find matches spanning the chunks. It would be interesting to test this and see if the rsync algorithm would be better served by using a larger number of smaller chunks while segmenting the file, rather than a smaller number of much larger chunks while considering the file as a whole.

..wayne..

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
On Mon, 8 Jan 2007, Wayne Davison wrote:
>> I've been playing with rsync and very large files approaching and
>> surpassing 100GB, and have found that rsync has excessively poor
>> performance on these very large files, and the performance appears to
>> degrade the larger the file gets.
>
> Yes, this is caused by the current hashing algorithm that the sender
> uses to find matches for moved data. The current hash table has a fixed
> size of 65536 slots and can get overloaded for really large files. ...

Would it make more sense just to make rsync pick a more sane blocksize for very large files? I say that without knowing how rsync selects the blocksize, but I'm assuming that if a 65k-entry hash table is getting overloaded, it must be using something way too small.

Should it be scaling the blocksize with a power-of-2 algorithm rather than the hash table (based on filesize)? I know that may result in more network traffic, as a bigger block containing a difference will be considered changed and need to be sent instead of smaller blocks, but in some circumstances wasting a little more network bandwidth may be wholly warranted. Then maybe the hash table size doesn't matter, since there are fewer blocks to check.

I haven't tested to see if that would work. Will -B accept a value of something large like 16meg? At my data rates, that's about half a second of network bandwidth, and seems entirely reasonable.

Evan
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Error while transferring large files
Hi,

I'm using rsync to back up my data between my Linux machine (SUSE 10.1) and Windows (XP). I've mounted a Windows share on my Linux box and am now trying to copy files to Windows with rsync. The Windows share is an NTFS filesystem. It all works, except that I get an error on large files.

This is the command that I'm running:

rsync -a --no-o --delete /public/Games/Call_of_Duty2/ /backup/public/Games/Call_of_Duty2/

and this is the error I'm receiving:

rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)
rsync: write failed on /backup/public/Games/Call_of_Duty2/dev-cod2.iso: File too large (27)
rsync error: error in file IO (code 11) at receiver.c(258) [receiver=2.6.9]
rsync: connection unexpectedly closed (76 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(453) [generator=2.6.9]
rsync: connection unexpectedly closed (36 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(453) [sender=2.6.9]

Then I tried it (as described on the website) with strace in front. The command was:

strace -f rsync -a --no-o --delete /public/Games/Call_of_Duty2/ /backup/public/Games/Call_of_Duty2/ 2>./tracedump.log

This is the part of tracedump.log where it went wrong:

[pid 826] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid 824] <... write resumed> ) = 32768
[pid 826] <... select resumed> ) = 1 (in [0], left {60, 0})
[pid 824] select(5, NULL, [4], [4], {60, 0} <unfinished ...>
[pid 826] read(0, "\326\3357\233`\v\t\26\3352G\33\210\314 \221\371\373\20"..., 8184) = 8184
[pid 826] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
[pid 826] read(0, "\232\t\316S\177\300\373\362\350\26\371\311\255v\352\212"..., 8184) = 8184
[pid 826] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
[pid 826] read(0, <unfinished ...>
[pid 824] <... select resumed> ) = 1 (out [4], left {60, 0})
[pid 826] <... read resumed> "\344\31\f\7\360e\17\301\227\25\202/\233\324i\256\246\0"..., 8184) = 8184
[pid 824] write(4, "\0\200\0\0", 4 <unfinished ...>
[pid 826] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid 824] <... write resumed> ) = 4
[pid 826] <... select resumed> ) = 1 (in [0], left {60, 0})
[pid 824] read(3, <unfinished ...>
[pid 826] read(0, <unfinished ...>
[pid 824] <... read resumed> "\31\310A\232o\356\30r\223p\215B\27\313\337\357\371\364"..., 262144) = 262144
[pid 826] <... read resumed> "\3068\340\203\337\254\360\374%\217\321\320\352\343q\17"..., 8184) = 8184
[pid 824] select(5, NULL, [4], [4], {60, 0} <unfinished ...>
[pid 826] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid 824] <... select resumed> ) = 1 (out [4], left {60, 0})
[pid 826] <... select resumed> ) = 1 (in [0], left {60, 0})
[pid 824] write(4, "\31\310A\232o\356\30r\223p\215B\27\313\337\357\371\364"..., 32768 <unfinished ...>
[pid 826] read(0, <unfinished ...>
[pid 824] <... write resumed> ) = 32768
[pid 826] <... read resumed> "\347\235\303/\210G\365\n\304\325\327\335A\374\272\31\320"..., 8184) = 8184
[pid 824] select(5, NULL, [4], [4], {60, 0} <unfinished ...>
[pid 826] write(1, "x\17\240\2267\376m\257G\271\373\217\264\375\325\312\302"..., 262144) = 262143
[pid 826] write(1, "5", 1) = -1 EFBIG (File too large)
[pid 826] --- SIGXFSZ (File size limit exceeded) @ 0 (0) ---
[pid 826] write(4, "^\0\0\10rsync: write failed on \"/bac"..., 98) = 98
[pid 826] rt_sigaction(SIGUSR1, {SIG_IGN}, <unfinished ...>
[pid 825] <... select resumed> ) = 1 (in [3], left {2, 488000})
[pid 826] <... rt_sigaction resumed> NULL, 8) = 0
[pid 825] read(3, <unfinished ...>
[pid 826] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
[pid 826] unlink(".dev-cod2.iso.GrkNfL" <unfinished ...>
[pid 825] <... read resumed> "^\0\0\10", 4) = 4
[pid 825] select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})
[pid 825] read(3, "rsync: write failed on \"/backup/"..., 94) = 94
[pid 825] select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0})
[pid 825] write(1, "^\0\0\10rsync: write failed on \"/bac"..., 98) = 98
[pid 825] time(NULL) = 1164306617
[pid 825] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid 826] <... unlink resumed> ) = 0
[pid 826] write(4, "L\0\0\10rsync error: error in file I"..., 80) = 80
[pid 826] exit_group(11) = ?
Process 826 detached
[pid 824] <... select resumed> ) = 1 (out [4], left {59, 88})
[pid 824] write(4, "\0\200\0\0", 4) = -1 EPIPE (Broken pipe)
[pid 824] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid 824] write(2, "rsync: writefd_unbuffered failed"..., 76) = 76
[pid 824] write(2, "\n", 1) = 1
[pid 824] select(6, [5], [], NULL, {30, 0}) = 1 (in [5], left {30, 0})
[pid 825
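A note on the failure mode above: EFBIG ("File too large") together with SIGXFSZ means the write ran into a file-size ceiling -- either the process's RLIMIT_FSIZE, or a mount that caps file size (smbfs mounts without large-file support commonly stop at 2 or 4 GiB, even though NTFS itself has no such limit). A hedged C probe for both causes follows; the /backup path echoes this thread's mount and the 4 GiB threshold is an assumption:

    /* build with: cc -D_FILE_OFFSET_BITS=64 probe.c */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/resource.h>

    int main(void)
    {
        signal(SIGXFSZ, SIG_IGN);   /* so an over-limit write returns EFBIG
                                       instead of killing us with SIGXFSZ */
        struct rlimit rl;
        if (getrlimit(RLIMIT_FSIZE, &rl) == 0)
            printf("RLIMIT_FSIZE: soft=%lld hard=%lld\n",
                   (long long)rl.rlim_cur, (long long)rl.rlim_max);

        const char *probe = "/backup/.lfs_probe";   /* assumed mount path */
        int fd = open(probe, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* write one byte just past 4 GiB into a sparse file */
        if (pwrite(fd, "x", 1, (off_t)((1ULL << 32) + 1)) < 0)
            printf("large write failed: %s\n", strerror(errno));
        else
            printf("mount accepts files larger than 4 GiB\n");
        close(fd);
        unlink(probe);
        return 0;
    }

If the probe fails with EFBIG while RLIMIT_FSIZE is unlimited, the mount itself is the likely culprit.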
Rsync hangs on large files over stunnel
Greetings. Here's my setup:

On the server:
- rsync 2.5.6, protocol version 26
- stunnel 4.04 on i686-suse-linux-gnu PTHREAD with OpenSSL 0.9.7b

On the client:
- rsync 2.6.6, protocol version 29
- stunnel 4.14 on i686-suse-linux-gnu UCONTEXT+POLL+IPv4+LIBWRAP with OpenSSL 0.9.8a

Both ends run rsync as root. The rsync daemon listens on a non-default port that is bound only to 127.0.0.1. stunnel securely proxies between an exposed high port and the rsync port. The client is configured to pull from the server.

The connection works, rsync starts, and a series of small files is synchronized. The first moderately large file then causes rsync to hang, typically at 66% completion. The file is approximately 25 MB. netstat on both sides appears to show an established connection, but without any activity after the initial series of files is transferred.

I have tried using the --protocol option to force protocol 26. I have also tried --no-blocking-io.

Any help you can offer is very much appreciated.

Regards,
Gene
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: large files not being synced properly while being uploaded to samba share
I've experienced cases like that. I've been able to repair the file with an rsync -I, although this doesn't address the cause of the problem.

On Tue, 2006-10-03 at 14:42 -0500, Mark Osborne wrote:
> Hello,
>
> I have run into an issue with rsync that I'm hoping someone can help
> with. We are using rsync to mirror data between a samba share on an
> internal staging server and our production ftp servers. The rsync runs
> from cron every 15 minutes. Occasionally, the rsync will run while
> somebody is uploading a large file to the samba share (for instance an
> iso image). The file appears to make it out to the production ftp
> servers, and an ls shows it to have the correct file size and
> timestamp. However, an md5sum of the file shows that it is different
> from the file on the staging server. Subsequent runs of the rsync do
> not update the file.
>
> I have tried to run the rsync manually with the -c flag, even though we
> wouldn't really want to implement that because of how long it makes
> rsync take. Even with checksum turned on, the file still did not get
> correctly updated. If the file is completely uploaded to the share
> before the rsync runs, there does not appear to be an issue.
>
> Originally I thought that there might be a problem with different
> versions of rsync on the servers. The staging server was running rsync
> 2.5.5 while the production servers were running 2.5.7. I have gotten
> rsync 2.6.8 on both servers and am still experiencing the problem.
>
> More information about the servers:
> Staging server - Solaris 8, rsync 2.6.8
> Ftp1 - Redhat AS 2.1, rsync 2.6.8
> Ftp2 - Redhat AS 2.1, rsync 2.6.8
>
> Has anybody else seen this problem, or have any ideas?
>
> Thanks,
> Mark

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
large files not being synced properly while being uploaded to samba share
Hello,

I have run into an issue with rsync that I'm hoping someone can help with. We are using rsync to mirror data between a samba share on an internal staging server and our production ftp servers. The rsync runs from cron every 15 minutes. Occasionally, the rsync will run while somebody is uploading a large file to the samba share (for instance an iso image). The file appears to make it out to the production ftp servers, and an ls shows it to have the correct file size and timestamp. However, an md5sum of the file shows that it is different from the file on the staging server. Subsequent runs of the rsync do not update the file.

I have tried to run the rsync manually with the -c flag, even though we wouldn't really want to implement that because of how long it makes rsync take. Even with checksum turned on, the file still did not get correctly updated. If the file is completely uploaded to the share before the rsync runs, there does not appear to be an issue.

Originally I thought that there might be a problem with different versions of rsync on the servers. The staging server was running rsync 2.5.5 while the production servers were running 2.5.7. I have gotten rsync 2.6.8 on both servers and am still experiencing the problem.

More information about the servers:
Staging server - Solaris 8, rsync 2.6.8
Ftp1 - Redhat AS 2.1, rsync 2.6.8
Ftp2 - Redhat AS 2.1, rsync 2.6.8

Has anybody else seen this problem, or have any ideas?

Thanks,
Mark

~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Mark Osborne
Web Systems Engineer
[EMAIL PROTECTED]
(512) 683-5019
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Problems with rsync, large files and partial.
I can't seem to get rsync to restart where it left off when I am syncing a large file (> 5GB). Below is some info on what I have been doing; if someone has the energy to barrel through my comments, that would be great. Out of curiosity, is there an alternative to rsync for large files? I looked all over the web for answers and found only partial ones, and it just wasn't clear whether this works. If anyone has pointers or explanations, it would be most helpful.

Here is my setup and some explanation:

- I use rsync over ssh to sync over our WAN.
- I sync over an 8-hour window every night.
- After 8 hours, if the sync is not complete, it gets killed and restarts the next evening.
- Everything works as expected except for one very large file, which always has to restart from the beginning even though I use the partial option.
- I have tried it with and without compression.
- I have tried it with various versions of rsync, including the latest.
- There were some posts that seemed to imply that if the file already existed on the backup system, using --append might help, but that didn't make sense.
- I have tried with --whole-file and --no-whole-file, and that does not make a difference.
- I do notice a bunch of .file_name.gibberish files in the target directory, which are the partial backups, but rsync does not seem to use them in subsequent tries. It just seems the right thing would be for rsync to continue where it left off the night before, but it doesn't.
- Here is my command:

rsync --archive --verbose --progress --partial --stats -e ssh xxx.xxx.xxx:/dir

panos
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Problems with rsync, large files and partial.
On Mon 24 Apr 2006, Panos Koutsoyannis wrote:
> - I use rsync over ssh to sync over our WAN.
> - I sync over an 8-hour window every night.
> - After 8 hours, if the sync is not complete, it gets killed and
>   restarts the next evening.

How do you kill it? Via kill -9?

> - I do notice a bunch of .file_name.gibberish files in the target
>   directory, which are the partial backups, but rsync does not seem to
>   use them in subsequent tries. It just seems the right thing would be
>   for rsync to continue where it left off the night before, but it
>   doesn't.
> - Here is my command:
>
> rsync --archive --verbose --progress --partial --stats -e ssh xxx.xxx.xxx:/dir

With --partial it should rename the .file_name.gibberish file to file_name when interrupted, so that it can resume using the partial file as a start (which it won't do with .file_name.gibberish). That's why I suspect you're not stopping rsync politely...

Paul Slootman
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
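Paul's point about stopping rsync politely can be made concrete. Sending SIGTERM (or SIGINT) gives rsync's signal handler a chance to rename its temp file into place for --partial; SIGKILL gives it no such chance. A minimal C sketch of a graceful-stop wrapper follows -- the 30-second grace period and the PID-as-argument interface are our assumptions, not anything rsync ships; a shell script would do the same with `kill -TERM $pid; sleep 30; kill -KILL $pid`:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s pid\n", argv[0]); return 2; }
        pid_t pid = (pid_t)atoi(argv[1]);

        /* polite request first, so rsync can save its partial file */
        if (kill(pid, SIGTERM) != 0) { perror("kill(SIGTERM)"); return 1; }

        for (int i = 0; i < 30; i++) {   /* up to ~30s grace (assumed) */
            sleep(1);
            if (kill(pid, 0) != 0)       /* process gone: clean exit */
                return 0;
        }
        fprintf(stderr, "still running, escalating to SIGKILL\n");
        kill(pid, SIGKILL);
        return 0;
    }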
Re: Problems with rsync, large files and partial.
Ah... That makes sense. I do not stop it politely -- you are right. I will fix up the signal handling and give it a whirl.

Thanks,
Panos

--- Paul Slootman [EMAIL PROTECTED] wrote:
> On Mon 24 Apr 2006, Panos Koutsoyannis wrote:
>> - I use rsync over ssh to sync over our WAN.
>> - I sync over an 8-hour window every night.
>> - After 8 hours, if the sync is not complete, it gets killed and
>>   restarts the next evening.
>
> How do you kill it? Via kill -9?
>
> With --partial it should rename the .file_name.gibberish file to
> file_name when interrupted, so that it can resume using the partial
> file as a start (which it won't do with .file_name.gibberish). That's
> why I suspect you're not stopping rsync politely...
>
> Paul Slootman

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Problems with rsync, large files and partial.
Just changed my scripts, and that was definitely my problem; it's fixed. Thank you.

panos

--- Panos Koutsoyannis [EMAIL PROTECTED] wrote:
> Ah... That makes sense. I do not stop it politely -- you are right. I
> will fix up the signal handling and give it a whirl.
>
> --- Paul Slootman [EMAIL PROTECTED] wrote:
>> With --partial it should rename the .file_name.gibberish file to
>> file_name when interrupted, so that it can resume using the partial
>> file as a start (which it won't do with .file_name.gibberish). That's
>> why I suspect you're not stopping rsync politely...
>>
>> Paul Slootman

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

[EMAIL PROTECTED] changed:
           What      |Removed |Added
           Status    |NEW     |RESOLVED
           Resolution|        |INVALID

--- Comment #5 from [EMAIL PROTECTED] 2006-01-02 09:49 MST ---
(In reply to comment #4)
> If rsync is _not_ checksumming files, why does rsync remain in this
> state: [...] for maybe 30 minutes when it transfers my big file?

Because it is transferring the file. Yes, this involves file-transfer checksumming, but I was talking about pre-transfer checksum generation (and its use in determining which files get transferred), which is what --checksum enables.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #6 from [EMAIL PROTECTED] 2006-01-02 10:21 MST ---
This is weird: there is no network activity during this "building file list" phase. However, as soon as it is finished, rsync saturates my network.

I thought rsync worked, when a file's size and modification date don't match, by creating a binary tree, checksumming the parts between every node recursively up to the root of the tree, and then transferring only the parts where the checksums didn't match.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #7 from [EMAIL PROTECTED] 2006-01-02 11:02 MST ---
(In reply to comment #6)
> This is weird: there is no network activity during this "building file
> list" phase. However, as soon as it is finished, rsync saturates my
> network.

What is weird about that? As soon as rsync outputs the "1 file to consider" message, the file-list-building stage is over, and rsync then starts to transfer the file if it is in need of an update. (If --checksum was specified, the receiving rsync would first be busily checksumming the file to decide whether it had actually changed before (possibly) starting the transfer.)

> I thought rsync worked, when a file's size and modification date don't
> match, by creating a binary tree, checksumming the parts between every
> node recursively up to the root of the tree, and then transferring only
> the parts where the checksums didn't match.

There are no b-trees involved -- rsync immediately starts to send checksum info from the receiving side to the sender, who then diffs the remote checksums against the sending-side file and sends instructions to the receiver on how to recreate the file using as much of the local data as possible (this new file is built in a separate temp-file unless the --inplace option was specified).

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
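The checksum exchange described in that comment can be sketched in a few lines of C. This illustrates only the rolling weak checksum and the sliding match loop, simplified from the style of rsync's checksum1; the strong-checksum confirmation and the hash-table lookup are elided, and the data is made up:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* weak rolling checksum in the style of rsync's checksum1 (simplified) */
    static uint32_t weak_sum(const unsigned char *p, size_t n)
    {
        uint32_t s1 = 0, s2 = 0;
        for (size_t i = 0; i < n; i++) {
            s1 += p[i];
            s2 += s1;
        }
        return (s1 & 0xffff) | (s2 << 16);
    }

    /* slide the window one byte: drop `out`, pull in `in` -- O(1) per byte */
    static uint32_t roll(uint32_t sum, unsigned char out, unsigned char in,
                         size_t blen)
    {
        uint32_t s1 = (sum & 0xffff) - out + in;
        uint32_t s2 = (sum >> 16) - (uint32_t)blen * out + s1;
        return (s1 & 0xffff) | (s2 << 16);
    }

    int main(void)
    {
        const size_t blen = 8;
        unsigned char base[32] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345";
        /* the "receiver" advertises the weak sum of its second block */
        uint32_t target = weak_sum(base + 8, blen);

        /* the "sender" has the same data shifted right by one byte */
        unsigned char data[33];
        data[0] = 'x';
        memcpy(data + 1, base, 32);

        uint32_t sum = weak_sum(data, blen);
        for (size_t off = 0; off + blen < sizeof data; off++) {
            if (sum == target)   /* real rsync now confirms with a strong sum */
                printf("weak match for block 1 at sender offset %zu\n", off);
            sum = roll(sum, data[off], data[off + blen], blen);
        }
        return 0;
    }

The rolling update is what lets the sender test every byte offset cheaply, which is exactly why an insertion near the start of a file (as in the threads below) does not force the whole file across the wire in the normal, non---inplace case.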
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #8 from [EMAIL PROTECTED] 2006-01-02 11:42 MST ---
> What is weird about that?

You wrote in a previous comment, when I asked why rsync considers a file for 30 minutes if it is not checksumming it: "Because it is transferring the file." To which I replied that there is no noticeable network activity when rsync is in this state. However, when it is finished with the 'consideration phase', the network is saturated.

I think it is weird that transferring a 25 GB file doesn't generate any network activity while rsync is in the 'consideration phase', but transferring the same file when rsync is in another phase saturates the network.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #2 from [EMAIL PROTECTED] 2005-12-29 13:47 MST ---
Interesting, I didn't know that rsync worked that way -- I thought the default behaviour was to replace only the parts of the file that had changed.

Anyway, this motivates a follow-up question. If I understand it correctly: you have file 1 on computer A and file 2 on computer B, some minor changes have been made to 1, and you want to sync these changes to B; rsync basically makes a copy of 2 and works with that. If 1 and 2 are big, as in my example where they were 25-50 GB, the copy operation from 2.0 to 2.1 generates a lot of disk activity. In my case, where I rsync between two laptops, all this disk activity is a little unfortunate since laptop drives are so slow.

Now to my question: is there a way to reduce disk activity? Does the --inplace switch work around this? Thanks.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #3 from [EMAIL PROTECTED] 2005-12-29 13:48 MST ---
Btw, I am just trying your suggestions. First I will try the --inplace switch, and secondly I will test syncing with twice the file's size available as free space.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
DO NOT REPLY [Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #4 from [EMAIL PROTECTED] 2005-12-29 13:54 MST ---
Sorry for spamming, but I just realised what you meant when you wrote:

> You can use the --checksum option to avoid this unneeded update at the
> expense of a lot of extra disk I/O to compute each file's checksum
> before figuring out if a transfer is needed.

If rsync is _not_ checksumming files, why does rsync remain in this state:

building file list ...
1 file to consider

for maybe 30 minutes when it transfers my big file?

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
[Bug 3358] New: rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

Summary: rsync chokes on large files
Product: rsync
Version: 2.6.6
Platform: PPC
OS/Version: Mac OS X
Status: NEW
Severity: major
Priority: P3
Component: core
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
QAContact: [EMAIL PROTECTED]

I am trying to rsync a 25-50 GB AES128-encrypted disk image called 'test' between two Mac OS X machines. This is with rsync 2.6.6 (is there a 2.6.7? The front page just says 2.6.6).

% rsync -av --progress --stats --rsh=ssh /test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11 forwarding.
tcsh: TERM: Undefined variable.
building file list ...
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase unknown [sender]: Broken pipe (32)
rsync: write failed on /test: No space left on device (28)
rsync error: error in file IO (code 11) at /SourceCache/rsync/rsync-20/rsync/receiver.c(312)
rsync: connection unexpectedly closed (92 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-20/rsync/io.c(359)
rsync: connection unexpectedly closed (1240188 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(434)

The receiving machine has space left (2+ GB). Before I upgraded to 2.6.6, I had 2.6.2 on the sending machine and 2.6.3 on the receiving machine. With that combination I got another error message:

% rsync -av --progress --stats --rsh=ssh test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11 forwarding.
tcsh: TERM: Undefined variable.
building file list ...
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase unknown: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-14/rsync/io.c(836)

The files _should_ be identical; I first transferred them with sftp without problems, but they will change in the future, and then I want to use rsync to keep them identical. This was just a test to verify my plan -- a test that didn't seem to work out that well.

I don't know if this matters, but here is some more information about my setup:
Powerbook G3 with 10.3.9
Powerbook G4 with 10.4.3
Wireless 802.11g network between router and G4; wired network between G3 and router. The router is a Linksys WRT54GS.

Both the older versions and the most recent versions work very well with smaller files (for example, I synchronized 40 GB of mp3s without problems).

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
[Bug 3358] rsync chokes on large files
https://bugzilla.samba.org/show_bug.cgi?id=3358

--- Comment #1 from [EMAIL PROTECTED] 2005-12-28 11:21 MST ---
The pertinent error is this:

> rsync: write failed on /test: No space left on device (28)

That is an error from your OS indicating that there was no room to write out the destination file. Keep in mind that when rsync updates a file, it creates a new version of the file (unless --inplace was specified), so your destination directory needs enough free space available to hold the largest updated file.

As for why the file is updating: if the modified time and size don't match, rsync will update the file (efficiently). You can use the --checksum option to avoid this unneeded update, at the expense of a lot of extra disk I/O to compute each file's checksum before figuring out whether a transfer is needed.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
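The "quick check" described in that last paragraph is easy to express. A hedged C sketch with our own names, not rsync source:

    #include <stdbool.h>
    #include <sys/stat.h>

    /* without --checksum, rsync decides whether to transfer a file by
     * comparing size and modification time only */
    static bool quick_check_unchanged(const struct stat *src,
                                      const struct stat *dst)
    {
        return dst != NULL
            && src->st_size == dst->st_size
            && src->st_mtime == dst->st_mtime;
    }
    /* if this returns true, the file is skipped entirely -- which is also
     * why a file copied mid-write, ending up with matching size and mtime,
     * is never re-sent unless -I or -c forces a re-examination */

This is the same mechanism at work in the samba-share thread above: a file captured mid-upload can later pass the size-and-mtime check, so subsequent runs skip it.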
Re: rsync and large files
On Mon, Aug 15, 2005 at 12:11:35PM -0400, Sameer Kamat wrote:
> My question is: I am observing that the data being sent over is almost
> equal to the size of the file. Would an insertion of a few blocks in a
> binary file move the alignment of the entire file and cause this to
> happen?

That depends on the file and your options. Is the file compressed? If so, all data after the change is radically different and will not match (unless you use an rsync-friendly compression algorithm, such as gzip --rsyncable).

If that's not the case, are you using --inplace? (That option specifically mentions that it doesn't handle early insertions well.) Or is --whole-file being specified or implied? (It is implied by a local transfer, so specify --no-whole-file if you need to test using a local transfer.)

..wayne..
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
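For the curious, the trick behind gzip --rsyncable can be sketched: maintain a rolling sum over a window of recent input and flush/reset the compressor whenever the sum hits a chosen pattern, so identical plaintext after an edit soon re-synchronizes to identical compressed bytes. The window size and mask below are illustrative assumptions, not gzip's actual constants:

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    #define WIN  4096                /* rolling window (assumed) */
    #define MASK ((1u << 12) - 1)    /* fires on ~1/4096 of positions */

    typedef struct {
        uint32_t sum;
        unsigned char win[WIN];
        size_t pos, fill;
    } rollctx;

    /* returns 1 when the caller should flush + reset its deflate stream */
    static int feed_byte(rollctx *c, unsigned char b)
    {
        if (c->fill == WIN)
            c->sum -= c->win[c->pos];   /* drop the byte leaving the window */
        else
            c->fill++;
        c->win[c->pos] = b;
        c->pos = (c->pos + 1) % WIN;
        c->sum += b;
        return c->fill == WIN && (c->sum & MASK) == 0;
    }

    int main(void)
    {
        rollctx c = {0};
        int resets = 0;
        for (int i = 0; i < 1 << 20; i++)   /* 1 MiB of pseudo-data */
            resets += feed_byte(&c, (unsigned char)((unsigned)i * 2654435761u >> 24));
        printf("%d reset points in 1 MiB\n", resets);
        return 0;
    }

Because the reset points depend only on the content of the window, an insertion upstream shifts the data but the same reset points reappear downstream, restoring block alignment for rsync.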
rsync and large files
Hello,

I have a few files on the order of 50G that get synchronized to a remote server over ssh. These files contain binary data, and they change before the next time they are synchronized. My question is: I am observing that the data being sent over is almost equal to the size of the file. Would an insertion of a few blocks in a binary file move the alignment of the entire file and cause this to happen? Does rsync internally understand that the file is almost the same, but the data is just skewed a little bit? Please advise.

Thanks,
Sameer.
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Transferring very large files / and restarting failures
On Wed, Jul 27, 2005 at 04:29:46PM -0700, Todd Papaioannou wrote:
> Not sure I have the mojo to mess with the patches though!

I applied the --append patch to the CVS source, so if you want to snag version 2.6.7cvs, you can grab it via the latest nightly tar file:

http://rsync.samba.org/ftp/rsync/nightly/

I did some very simple timing tests to see how fast it would be to do a local transfer of a 250MB file that had a little over the first half of the file already present. The results were:

Normal --whole-file:    ~31 seconds (straight copy, data speedup 1.00)
Forced --no-whole-file: ~73 seconds (data speedup 2.39)
Using --inplace:        ~30 seconds (data speedup 2.39)
Using --append:         ~24 seconds (data speedup 2.40)

(The data speedup values are rsync's standard speedup values, which indicate how much the transferred data was reduced over the wire.) Also keep in mind that even though the --append option writes out less than half the file, it still reads all the existing data in the partial file in order to compute the full-file checksum.

I then tried the same transfer test between two systems on my local wireless network (11g). Here are the results:

Forced --whole-file:    ~134 seconds (straight copy, data speedup 1.00)
Normal --no-whole-file: ~100 seconds (data speedup 2.39)
Using --inplace:        ~95 seconds (data speedup 2.39)
Using --append:         ~60 seconds (data speedup 2.40)

> is there another protocol you might know of, other than ftp, that
> supports byte-level restart/append?

I recall seeing a "continue" feature in wget and bittorrent.

..wayne..
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
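The arithmetic behind those numbers is simple enough to sketch. Under the assumption (per the description above) that --append trusts the existing prefix, sends only the tail as literal data, and still reads the prefix locally for the full-file checksum, the wire savings look like this -- illustrative C, not rsync source:

    #include <stdio.h>
    #include <stdint.h>

    /* bytes that must cross the wire under an append-style transfer */
    static int64_t append_transfer(int64_t src_size, int64_t dst_size)
    {
        if (dst_size >= src_size)
            return src_size;   /* nothing to append: normal transfer */
        /* prefix [0, dst_size) is trusted and only *read* locally to
         * feed the full-file checksum; the tail is sent as literal data */
        return src_size - dst_size;
    }

    int main(void)
    {
        /* roughly the test above: 250MB file, a bit over half present */
        int64_t src = 250LL << 20, dst = 130LL << 20;
        printf("send %lld of %lld bytes; %lld-byte prefix reused locally\n",
               (long long)append_transfer(src, dst),
               (long long)src, (long long)dst);
        return 0;
    }

The prefix read is why --append still pays disk I/O proportional to the whole file even when it sends less than half of it.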
Transferring very large files / and restarting failures
Hi,

My situation is that I would like to use rsync to copy very large files within my network/systems. Specifically, these files are on the order of 10-100GB. Needless to say, I would like to be able to restart a transfer if it only partially succeeded, but NOT repeat the work already done.

Currently, I am initiating the transfer with this command:

rsync --partial --progress theFile /path/to/dest

where both theFile and /path/to/dest are on local drives. In the future, /path/to/dest will be an NFS mount. This succeeds in writing theFile to the destination as the bytes flow, i.e. I get a partial file there until the full transfer is successful.

Now, say something failed. I want to restart that transfer, and am trying something like:

rsync -u --no-whole-file --progress theFile /path/to/dest

However, the stats shown during the progress seem to imply that the whole transfer is starting again. Can someone help me out with the correct options to ensure that when I restart a copy I can take advantage of the bytes that have already been transferred?

Many thanks,
Todd
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Transferring very large files / and restarting failures (again)
Woops! In my last email, I meant to say the second command was:

rsync --no-whole-file --progress theFile /path/to/dest

Todd

> Hi,
>
> My situation is that I would like to use rsync to copy very large files
> within my network/systems. Specifically, these files are on the order
> of 10-100GB. Needless to say, I would like to be able to restart a
> transfer if it only partially succeeded, but NOT repeat the work
> already done.
> [...]
> Now, say something failed. I want to restart that transfer, and am
> trying something like:
>
> rsync -u --no-whole-file --progress theFile /path/to/dest
>
> However, the stats shown during the progress seem to imply that the
> whole transfer is starting again. Can someone help me out with the
> correct options to ensure that when I restart a copy I can take
> advantage of the bytes that have already been transferred?
>
> Many thanks,
> Todd

-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Transferring very large files / and restarting failures
On Wed, Jul 27, 2005 at 01:50:39PM -0700, Todd Papaioannou wrote:
> where both theFile and /path/to/dest are local drives. [...]
> rsync -u --no-whole-file --progress theFile /path/to/dest

When using local drives, the rsync protocol (--no-whole-file) slows things down, so you don't want to use it (the rsync protocol's purpose is to trade disk I/O and CPU cycles for reduced network bandwidth, so it doesn't help when the transfer bandwidth is very high, as it is in a local copy). Note also that you're not preserving the file times, which makes rsync less efficient (and forces you to use the -u option to avoid a retransfer) -- you're usually better off using -t (--times) unless you have some overriding reason to omit it.

> However, the stats shown during the progress seem to imply that the
> whole transfer is starting again.

Yes, that's what rsync does. It retransfers the whole file, but it uses the local data to make the amount of data flowing over the socket (or pipe) smaller. The already-sent data thus comes from the original, partially-transferred file rather than from the sender (which would lower the network bandwidth if this were a remote connection).

> In the future /path/to/dest will be an NFS mount.

You don't want to do that unless your network speed is higher than your disk speed -- with slower net speeds you are better off rsyncing directly to the remote machine that is the source of the NFS mount, so that rsync can reduce the amount of data it sends. With higher net speeds you're better off just transferring the data via --whole-file and not using --partial.

One other possibility is the --append option from the patch named patches/append.diff -- this implements a more efficient append mode for incremental transfers (I'm considering adding this to the next version of rsync).

..wayne..
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
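The disk-versus-network tradeoff described here can be put into rough numbers. The sketch below uses assumed throughputs and the ~2.4 data speedup from the timing tests later in this thread; it is a back-of-the-envelope model, not a measurement:

    #include <stdio.h>

    int main(void)
    {
        double file_mb  = 250.0;
        double match    = 0.58;   /* fraction already present (speedup ~2.4) */
        double disk_mbs = 60.0;   /* local read throughput, assumed */
        double net_mbs[] = { 1.0, 10.0, 100.0 };

        for (int i = 0; i < 3; i++) {
            double whole = file_mb / net_mbs[i];
            /* delta pays an extra local read of the destination file,
             * then sends only the unmatched data over the wire */
            double delta = file_mb / disk_mbs
                         + file_mb * (1 - match) / net_mbs[i];
            printf("net %6.1f MB/s: whole-file %6.1fs  delta %6.1fs -> %s\n",
                   net_mbs[i], whole, delta,
                   delta < whole ? "delta wins" : "whole-file wins");
        }
        return 0;
    }

With these assumed numbers, delta transfer wins at 1 and 10 MB/s but loses at 100 MB/s, which matches the qualitative advice above: local or very fast links favor --whole-file.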
RE: Transferring very large files / and restarting failures
Wayne,

Thanks for the swift answers and insight.

>> However, the stats shown during the progress seem to imply that the
>> whole transfer is starting again.
>
> Yes, that's what rsync does. It retransfers the whole file, but it uses
> the local data to make the amount of data flowing over the socket (or
> pipe) smaller. The already-sent data thus comes from the original,
> partially-transferred file rather than from the sender (which would
> lower the network bandwidth if this were a remote connection).

Hmm, OK. I guess my mental model of what rsync does was wrong. If I read this correctly, when I'm doing a local-to-local copy I get no benefit from re-using the partial copy. If, however, I were doing a remote copy, I would definitely get a benefit.

>> In the future /path/to/dest will be an NFS mount.
>
> You don't want to do that unless your network speed is higher than your
> disk speed -- with slower net speeds you are better off rsyncing
> directly to the remote machine that is the source of the NFS mount, so
> that rsync can reduce the amount of data it sends. With higher net
> speeds you're better off just transferring the data via --whole-file
> and not using --partial.
>
> One other possibility is the --append option from the patch named
> patches/append.diff -- this implements a more efficient append mode for
> incremental transfers (I'm considering adding this to the next version
> of rsync).

Ahh, that sounds like what I'm looking for. I was hoping rsync supported something like ftp restart, which restarts the file transfer down to the byte level. I'll give it a look. Not sure I have the mojo to mess with the patches, though!

By the way, is there another protocol you might know of, other than ftp, that supports byte-level restart/append?

Thanks,
Todd
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Problem with rsync --inplace very slow/hung on large files
I'm trying to rsync a very large (62 GB) file from one machine to another as part of a nightly backup. If the file does not exist at the destination, it takes about 2.5 hours to copy in my environment. But if the file does exist, --inplace is specified, and the file contents differ, rsync is either slowed so significantly as to take more than 30 hours (the longest I've let an instance run), or it is simply hung.

Running with -vvv gives this as the last few lines of the output:

match at 205401064 last_match=205401064 j=821 len=250184 n=0
match at 205651248 last_match=205651248 j=822 len=250184 n=0
match at 205901432 last_match=205901432 j=823 len=250184 n=0
match at 206151616 last_match=206151616 j=824 len=250184 n=0

at which point it has not printed anything else since I last looked at the current run attempt, about 8 hours ago. Doing an strace on the rsync processes on the sending and receiving machines, it appears that there is still reading and writing going on, but there isn't any output from the -vvv, and I can't tell if it's really doing anything.

Is this excessive slowness just an artifact of doing an rsync --inplace on such a large file, and will it eventually complete if left to run long enough? I would try testing without --inplace, but the system in question doesn't have enough disk space for two copies of a file that size, which is why I am using --inplace.

Using 2.6.3, on Debian. Any help appreciated.

Evan
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html