[Bug 5482] look into a hierarchical checksum algorithm that would help to efficiently transmit really large files

2020-07-26 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=5482

Wayne Davison  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Wayne Davison  ---
Rsync now supports xxhash, which greatly speeds up checksumming. It also
has support for x86 acceleration of the rolling checksum (checksum1).
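
As a concrete illustration, the faster checksum can also be requested
explicitly; a hedged example, assuming rsync >= 3.2.0 on both ends (paths are
hypothetical):

  rsync -a --checksum-choice=xxh64 src/ host:/dest/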



[Bug 13433] out_of_memory in receive_sums on large files

2020-07-26 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

Wayne Davison  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Wayne Davison  ---
You can specify a larger malloc sanity check in the latest rsync (which will
also let you know when the limit is exceeded instead of claiming that it is out
of memory).
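
For illustration, a hedged example of raising that limit via the --max-alloc
option (assuming a recent enough rsync; the source path is the one from the
report, the destination is hypothetical):

  rsync -a --max-alloc=4G /bay3/b.tc remotehost:/bay3/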



[Bug 13433] out_of_memory in receive_sums on large files

2020-06-24 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #5 from MulticoreNOP  ---
might be related to bug #12769



Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-14 Thread Paul Slootman via rsync
On Thu 14 Feb 2019, Delian Krustev via rsync wrote:
> On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin 
>  wrote:
> > If the --inplace delta is as large as the filesize, then the
> > structure/location of the data has changed enough that the whole file
> > would have to be written out in any case.
> 
> This is not the case.
> If you see my original post you would have noticed that the delta transfer 
> finds only about 20 MB of differences within the almost 2G datafile.

I think you're missing the point of Remi's message.

Say the original file is:

ABCDEFGHIJ

The new file is:

XABCDEFGHI

Then the delta is just 10%, but the entire file needs to be rewritten as
the structure is changed.


Paul


Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin 
 wrote:
> If the --inplace delta is as large as the filesize, then the
> structure/location of the data has changed enough that the whole file
> would have to be written out in any case.

This is not the case.
If you see my original post you would have noticed that the delta transfer 
finds only about 20 MB of differences within the almost 2G datafile.

The problem with --inplace without --backupdir is that delta transfers can no 
longer work efficiently.


Cheers
--
Delian


Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 6:20:13 PM EET Remi Gauvin via rsync 
 wrote:
> Have you run nilfs-clean before checking this free space comparison?
> Maybe there is just large amplification created by rsync's many small
> writes when using --inplace.

nilfs-clean is suspended for the duration of the backup. It would have been idle
anyway if the fullness threshold of the FS (90% by default) had not been reached.

The problem is probably that these mysqldump files have changed data near the
beginning of the files, so all later blocks have to be overwritten. To avoid
this, "rsync" would have to allocate and deallocate space in the middle of the
file:

  http://man7.org/linux/man-pages/man2/fallocate.2.html

and unfortunately the respective syscalls are not portable, quite new, and
filesystem-specific.

It would have been nice to have these for all OSes and filesystems, though, and
better yet not aligned to the FS block size. E.g.:

  - give me 5 new blocks in the middle of file F starting at POS
  - do not use the entire last block of these 5 but rather only X bytes of it.

or
  - replace block 5 with "this" partial block data
  - truncate blocks 6 to 20

I can find a use for them in many application workflows - from text editors
through databases to backup software.
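
For what it's worth, util-linux exposes some of these primitives today through
fallocate(1); a hedged illustration only (the file name F and the sizes are
made up, support depends on kernel and filesystem, and offsets/lengths must
still be aligned to the FS block size):

  fallocate --insert-range   --offset 4096 --length 20480 F   # open a 20K gap mid-file
  fallocate --collapse-range --offset 4096 --length 20480 F   # remove that range again
  fallocate --punch-hole     --offset 4096 --length 20480 F   # deallocate, keeping the file size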


Cheers
--
Delian



Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Remi Gauvin via rsync
On 2019-02-13 10:47 a.m., Delian Krustev via rsync wrote:
>
> 
> Free space at the beginning and end of the backup:
> Filesystem 1M-blocks   Used Available Use% Mounted on
> /dev/mapper/bkp   102392  76872 20400  80% /mnt/bkp
> /dev/mapper/bkp   102392  78768 18504  81% /mnt/bkp
> 
> 
> 
> As can be seen "rsync" has sent about 20M and received 300K of data. However 
> the filesystem has allocated almost 2G, which is the total size of the files 
> being backed up.
> 
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log 
> structured filesystem. I'm using its snapshotting feature to keep backups for 
> past dates.


Have you run nilfs-clean before checking this free space comparison?
Maybe there is just large amplification created by rsync's many small
writes when using --inplace.


Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Remi Gauvin via rsync
On 2019-02-13 5:26 p.m., Delian Krustev via rsync wrote:

> 
> The copy is needed for the comparison of the blocks as "--inplace" overwrites 
> the destination file. I've tried without "--backup" but then the delta 
> transfers too much data - close to the size of the backed-up files.
> 

It's cool that --backup can be used as source data that way, a feature I
was unaware of... but I think you found the cause of your problem right
here as well.

If the --inplace delta is as large as the filesize, then the
structure/location of the data has changed enough that the whole file
would have to be written out in any case.




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Kevin Korb via rsync
It can't do what you want.  The closest thing would be --compare-dest.

On 2/13/19 5:26 PM, Delian Krustev wrote:
> On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync 
>  wrote:
>> With --backup in order to end up with 2 files it has to write out a
>> whole new file.
>> Sure, it only sent the differences (normally that means
>> over the network but there is no network here) but the writing end was
>> told to duplicate the file being updated before updating it.
> 
> The copy is needed for the comparison of the blocks as "--inplace" overwrites 
> the destination file. I've tried without "--backup" but then the delta 
> transfers too much data - close to the size of the backed-up files.
> 
> The copy is in a temp file system which is discarded after the backup (by
> "rm -rf"). This temp filesystem is not log structured or copy-on-write, so
> having a copy there is not a big problem, although I don't want a backup of
> all modified files but rather a TMPDIR.
> 
> The ideal workflow would be to compare SRC and DST and write changed blocks
> to the TMPDIR, then read them from TMPDIR and apply them to DST.
> 
> 
> 
>  
> Cheers
> --
> Delian
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Floridak...@sanitarium.net (personal)
Web page:   https://sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync 
 wrote:
> With --backup in order to end up with 2 files it has to write out a
> whole new file.
> Sure, it only sent the differences (normally that means
> over the network but there is no network here) but the writing end was
> told to duplicate the file being updated before updating it.

The copy is needed for the comparison of the blocks as "--inplace" overwrites 
the destination file. I've tried without "--backup" but then the delta 
transfers too much data - close to the size of the backed-up files.

The copy is in a temp file system which is discarded after the backup (by
"rm -rf"). This temp filesystem is not log structured or copy-on-write, so
having a copy there is not a big problem, although I don't want a backup of
all modified files but rather a TMPDIR.

The ideal workflow would be to compare SRC and DST and write changed blocks to
the TMPDIR, then read them from TMPDIR and apply them to DST.
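
Something close to that two-phase workflow can be sketched with rsync's batch
mode; a hedged example (the batch file path is made up, and the options used
when reading the batch should match those used when writing it):

  rsync -a --inplace --only-write-batch=/tmp/dbs.batch \
      /var/backups/mysql-dbs/. /mnt/bkp/var/backups/mysql-dbs/.   # record the deltas, leave DST untouched
  rsync -a --inplace --read-batch=/tmp/dbs.batch \
      /mnt/bkp/var/backups/mysql-dbs/.                            # later: apply the recorded deltas to DST

Whether that helps on nilfs2 is a separate question, since the deltas are still
applied in place in the end.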



 
Cheers
--
Delian




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Kevin Korb via rsync
With --backup, in order to end up with 2 files, it has to write out a
whole new file.  Sure, it only sent the differences (normally that means
over the network, but there is no network here), but the writing end was
told to duplicate the file being updated before updating it.

On 2/13/19 10:47 AM, Delian Krustev via rsync wrote:
>   Hi All,
> 
> For a backup purpose I'm trying to transfer only the changed blocks of
> large files. Thus I've run "rsync" with the appropriate options:
> 
>   RSYNC_BKPDIR=`mktemp -d`
>   rsync \
>   --archive \
>   --no-whole-file \
>   --inplace \
>   --backup \
>   --backup-dir="$RSYNC_BKPDIR" \
>   --verbose \
>   --stats \
>   /var/backups/mysql-dbs/. \
>   /mnt/bkp/var/backups/mysql-dbs/.
> 
> The problem is that although "rsync" shows that delta transfer is used (when
> run with -vv) and only a small amount of data is transferred, the target files
> appear to be overwritten in full.
> 
> Here is the output of "rsync" and some more debugging info:
> 
> 
> 
> sending incremental file list
> ./
> horde.data.sql
> horde.schema.sql
> LARGEDB.data.sql
> LARGEDB.schema.sql
> mysql.data.sql
> mysql.schema.sql
> phpmyadmin.data.sql
> phpmyadmin.schema.sql
> 
> Number of files: 9 (reg: 8, dir: 1)
> Number of created files: 0
> Number of deleted files: 0
> Number of regular files transferred: 8
> Total file size: 1,944,522,704 bytes
> Total transferred file size: 1,944,522,704 bytes
> Literal data: 21,421,681 bytes
> Matched data: 1,923,101,023 bytes
> File list size: 0
> File list generation time: 0.001 seconds
> File list transfer time: 0.000 seconds
> Total bytes sent: 21,612,218
> Total bytes received: 323,302
> 
> sent 21,612,218 bytes  received 323,302 bytes  259,591.95 bytes/sec
> total size is 1,944,522,704  speedup is 88.65
> 
> # du -m 1.9G /tmp/tmp.8gBzjNQOQZ
> 1.9G /tmp/tmp.8gBzjNQOQZ
> 
> # tree -a /tmp/tmp.8gBzjNQOQZ
> /tmp/tmp.8gBzjNQOQZ
> ├── horde.data.sql
> ├── horde.schema.sql
> ├── LARGEDB.data.sql
> ├── LARGEDB.schema.sql
> ├── mysql.data.sql
> ├── mysql.schema.sql
> ├── phpmyadmin.data.sql
> └── phpmyadmin.schema.sql
> 
> 0 directories, 8 files
> 
> Free space at the beginning and end of the backup:
> Filesystem 1M-blocks   Used Available Use% Mounted on
> /dev/mapper/bkp   102392  76872 20400  80% /mnt/bkp
> /dev/mapper/bkp   102392  78768 18504  81% /mnt/bkp
> 
> 
> 
> As can be seen "rsync" has sent about 20M and received 300K of data. However 
> the filesystem has allocated almost 2G, which is the total size of the files 
> being backed up.
> 
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log 
> structured filesystem. I'm using its snapshotting feature to keep backups for 
> past dates.
> 
> 
> Is there anything that can be done so that "rsync" overwrites only the
> changed blocks?
> 
> 
> 
> 
> P.S. I guess that it will be the same for copy-on-write filesystems, e.g. 
> BTRFS or ZFS.
> 
> 
> 
> Cheers
> --
> Delian
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Floridak...@sanitarium.net (personal)
Web page:   https://sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,




rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
  Hi All,

For a backup purpose I'm trying to transfer only the changed blocks of
large files. Thus I've run "rsync" with the appropriate options:

RSYNC_BKPDIR=`mktemp -d`
rsync \
--archive \
--no-whole-file \
--inplace \
--backup \
--backup-dir="$RSYNC_BKPDIR" \
--verbose \
--stats \
/var/backups/mysql-dbs/. \
/mnt/bkp/var/backups/mysql-dbs/.

The problem is that although "rsync" shows that delta transfer is used (when
run with -vv) and only a small amount of data is transferred, the target files
appear to be overwritten in full.

Here is the output of "rsync" and some more debugging info:



sending incremental file list
./
horde.data.sql
horde.schema.sql
LARGEDB.data.sql
LARGEDB.schema.sql
mysql.data.sql
mysql.schema.sql
phpmyadmin.data.sql
phpmyadmin.schema.sql

Number of files: 9 (reg: 8, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 8
Total file size: 1,944,522,704 bytes
Total transferred file size: 1,944,522,704 bytes
Literal data: 21,421,681 bytes
Matched data: 1,923,101,023 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 21,612,218
Total bytes received: 323,302

sent 21,612,218 bytes  received 323,302 bytes  259,591.95 bytes/sec
total size is 1,944,522,704  speedup is 88.65

# du -m 1.9G /tmp/tmp.8gBzjNQOQZ
1.9G /tmp/tmp.8gBzjNQOQZ

# tree -a /tmp/tmp.8gBzjNQOQZ
/tmp/tmp.8gBzjNQOQZ
├── horde.data.sql
├── horde.schema.sql
├── LARGEDB.data.sql
├── LARGEDB.schema.sql
├── mysql.data.sql
├── mysql.schema.sql
├── phpmyadmin.data.sql
└── phpmyadmin.schema.sql

0 directories, 8 files

Free space at the beginning and end of the backup:
Filesystem 1M-blocks   Used Available Use% Mounted on
/dev/mapper/bkp   102392  76872 20400  80% /mnt/bkp
/dev/mapper/bkp   102392  78768 18504  81% /mnt/bkp



As can be seen "rsync" has sent about 20M and received 300K of data. However 
the filesystem has allocated almost 2G, which is the total size of the files 
being backed up.

The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log 
structured filesystem. I'm using its snapshotting feature to keep backups for 
past dates.


Is there anything that can be done so that "rsync" overwrites only the
changed blocks?




P.S. I guess that it will be the same for copy-on-write filesystems, e.g. 
BTRFS or ZFS.



Cheers
--
Delian


[Bug 13645] Improve efficiency when resuming transfer of large files

2018-11-21 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #4 from Rob Janssen  ---
OK, you apparently did not understand what I proposed.
However, it is not that important, as in our use case we can use --append.


[Bug 13645] Improve efficiency when resuming transfer of large files

2018-11-20 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13645

Wayne Davison  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #3 from Wayne Davison  ---
Rsync is never going to assume that a file can be continued, as it doesn't know
how the old data compares to the source. You can tell rsync to assume that
the early data is all fine by using --append, but that can cause you problems
if any non-new files need an update that is not an append.
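
For the resume-only case, a hedged example built on the command from the
original report (this assumes the partial destination file is a truncated but
otherwise unmodified copy; the --append-verify variant additionally
re-checksums the existing data before appending):

  rsync -av --append-verify --bwlimit=400 hostname::module /dest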


Re: [Bug 13645] New: Improve efficiency when resuming transfer of large files

2018-10-12 Thread L A Walsh via rsync

If you are doing a local <-> local transfer, you are wasting time
with checksums.  You'll get faster performance with "--whole-file".

Why do you stop it at night when you could 'unlimit' the transfer speed?
Seems like when you aren't there would be the best time to copy everything.

Doing checksums will cause a noticeable impact to local-file transfers.


On 10/5/2018 10:34 AM, just subscribed for rsync-qa from bugzilla via 
rsync wrote:

https://bugzilla.samba.org/show_bug.cgi?id=13645
When transferring large files over a slow network, ...
The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest

When restarting the transfer, a lot of time is "wasted" while first the local
system is reading the partially transferred file and sends the checksums to the
remote, ...

Of course these optimizations (at least #2) may actually decrease performance
when the transfer is local (not over slow network) and the disk read rate is
negatively affected by reading at two different places in parallel.  So #2
should only be attempted when the transfer is over a network.

---
   Or might decrease performance on a fast network.  Not sure what you mean
by 'slow' -- 10Mb?  At 100Mb I'm not sure w/o measuring if it is faster or
slower to do checksums, but I know at 1000Mb and 10Gb, checksums are
prohibitively expensive.

NOTE: you also might look at the protocol you use to do network transfers.
I.e. use rsync over a locally mounted disk to a locally mounted network share,
and make the network share a samba one.  That way you will get parallelism
automatically -- the file transfer cpu-time will happen inside of samba,
while the local file gathering will happen in rsync.

I regularly got ~119MB/s R/W over 1000Mb ethernet.  BTW, any place I use a
power-of-2 unit like 'B' (Byte), I use the power-of-two base (1024) prefix,
but if I use a singular unit like 'b' (bit), then I use decimal prefixes.
Doing otherwise makes things hard to calculate and can introduce calculation
inaccuracies.




[Bug 13645] Improve efficiency when resuming transfer of large files

2018-10-05 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #2 from Rob Janssen  ---
Thanks, that helps a lot for this particular use case.
(the files are backups)



[Bug 13645] Improve efficiency when resuming transfer of large files

2018-10-05 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #1 from Kevin Korb  ---
If you are sure the file has not been changed since it was partially copied,
see --append.



[Bug 13645] New: Improve efficiency when resuming transfer of large files

2018-10-05 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13645

Bug ID: 13645
   Summary: Improve efficiency when resuming transfer of large
files
   Product: rsync
   Version: 3.0.9
  Hardware: All
OS: All
Status: NEW
  Severity: enhancement
  Priority: P5
 Component: core
  Assignee: way...@samba.org
  Reporter: pe1...@amsat.org
QA Contact: rsync...@samba.org

When transferring large files over a slow network, we interrupt rsync at the
beginning of business hours leaving the transfer unfinished.

The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest

When restarting the transfer, a lot of time is "wasted": first the local
system reads the partially transferred file and sends the checksums to the
remote, which only then starts to read the source file until it finds something
to transfer.  So nothing happens for twice the time required to read the
partial transfer from the disks!  When the partial file is many, many GB, this
can take hours.
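
To put a rough number on that delay, some back-of-the-envelope arithmetic with
assumed figures (not taken from this report): with 200 GB already transferred
and ~100 MB/s of sequential read throughput on each side,

  echo $(( 2 * 200 * 1024 / 100 / 60 ))   # prints 68: ~68 minutes before any new data flows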

Suggestions:
1. When the source is larger than the destination, immediately begin to
transfer from the offset in the source equal to the size of the destination;
it is already known that this part will have to be transferred.
2. try to do the reading of the partial file at the destination and the same
part of the source in parallel (so the time is halved), and preferably also in
parallel to 1.

Of course these optimizations (at least #2) may actually decrease performance
when the transfer is local (not over slow network) and the disk read rate is
negatively affected by reading at two different places in parallel.  So #2
should only be attempted when the transfer is over a network.



[Bug 13433] out_of_memory in receive_sums on large files

2018-05-19 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #4 from Ben RUBSON  ---
util2.c:#define MALLOC_MAX 0x40000000

Which is 1 GB.

1 GB / 40 bytes x 131072 bytes = 3276 GB,
which is then the maximum file size in protocol_version >= 30.

Did you try to increase MALLOC_MAX on the sending side?

Btw, would be interesting to know why MAX_BLOCK_SIZE has been limited to 128
KB.
rsync.h:#define MAX_BLOCK_SIZE ((int32)1 << 17)
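
Spelling that arithmetic out (bash, using the constants quoted above):

  echo $(( 0x40000000 / 40 ))                     # 26843545 sum_buf entries fit under MALLOC_MAX
  echo $(( 0x40000000 / 40 * 131072 / 2**30 ))    # ~3276 GiB of file covered at 128 KiB per block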



[Bug 13433] out_of_memory in receive_sums on large files

2018-05-16 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #3 from Kevin Day  ---
Just adding --protocol=29 falls back to the older chunk generator code and
automatically selects 2MB chunks which is enough to at least make this work
without a malloc error.



[Bug 13433] out_of_memory in receive_sums on large files

2018-05-16 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #2 from Kevin Day  ---
(In reply to Dave Gordon from comment #1)

It looks like that's no longer allowed?

rsync: --block-size=10485760 is too large (max: 131072)
rsync error: syntax or usage error (code 1) at main.c(1591) [client=3.1.3]


#define MAX_BLOCK_SIZE ((int32)1 << 17)

    if (block_size > MAX_BLOCK_SIZE) {
        snprintf(err_buf, sizeof err_buf,
                 "--block-size=%lu is too large (max: %u)\n",
                 block_size, MAX_BLOCK_SIZE);
        return 0;
    }

OLD_MAX_BLOCK_SIZE is defined, but options.c would need to be patched to allow
looser block sizes if protocol_version < 30.



[Bug 13433] out_of_memory in receive_sums on large files

2018-05-16 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

--- Comment #1 from Dave Gordon  ---
Maybe try --block-size=10485760 --protocol=29 as mentioned here:
https://bugzilla.samba.org/show_bug.cgi?id=10518#c8



[Bug 13433] New: out_of_memory in receive_sums on large files

2018-05-11 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13433

Bug ID: 13433
   Summary: out_of_memory in receive_sums on large files
   Product: rsync
   Version: 3.1.3
  Hardware: All
OS: All
Status: NEW
  Severity: normal
  Priority: P5
 Component: core
  Assignee: way...@samba.org
  Reporter: toa...@dragondata.com
QA Contact: rsync...@samba.org

I'm attempting to rsync a 4TB file. It fails with:

generating and sending sums for 0
count=33554432 rem=0 blength=131072 s2length=6 flength=4398046511104
chunk[0] offset=0 len=131072 sum1=8d15ed6f
chunk[1] offset=131072 len=131072 sum1=3d66e7f7
[omitted]
chunk[6550] offset=858521600 len=131072 sum1=d70deab6
chunk[6551] offset=858652672 len=131072 sum1=657e34df
send_files(0, /bay3/b.tc)
count=33554432 n=131072 rem=0
ERROR: out of memory in receive_sums [sender]
[sender] _exit_cleanup(code=22, file=util2.c, line=105): entered
rsync error: error allocating core memory buffers (code 22) at util2.c(105)
[sender=3.1.3]

This is getting called:

92  if (!(s->sums = new_array(struct sum_buf, s->count)))
93  out_of_memory("receive_sums");

And the size of a sum_buf (40 bytes) * the number of sums (33554432) exceeds
MALLOC_MAX.
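
Spelling out the failing allocation (a bash illustration; MALLOC_MAX is the
1 GiB cap in util2.c):

  echo $(( 33554432 * 40 ))   # 1342177280 bytes requested for the sum_buf array
  echo $(( 0x40000000 ))      # 1073741824 bytes = MALLOC_MAX, so the request is refused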

How is this supposed to work/why is it breaking here, when I'm pretty sure I've
transferred files bigger than this before?



100% CPU freeze on read of large files with --sparse

2017-04-03 Thread Matthew Hall via rsync
Hello,

While restoring a large data backup which contained some big sparse-ish files
(these were VMDK files, to be precise), using rsync 3.1.1, I found that adding
the --sparse option can permanently wedge the rsync processes.

I performed a few basic checks during the time it happened (at one point I 
left it a few days so I suspect it can last more or less forever).

* strace didn't show any syscall activity, making me suspect it 
was blocked in userland

* kill and kill -9 could not stop the processes, which would imply it was 
blocked in kernel IO

* strace of the 100% processes did not display any syscall activity

* the processes refused to stop consuming 100% CPU, until the system was 
rebooted

* rebooting the system took forever on the all-process-kill timers

I wanted to see if anybody had seen similar behavior before, or if there is
more I could do to diagnose the cause. It's the first time in many years of
use I ever got any unexplained behavior like this from rsync, so I wasn't sure
what I should check, since it defied most typical debug tools. The behavior
appeared to stop when --sparse was removed.

Sincerely,
Matthew.



[Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files

2014-09-09 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #5 from Peter van Hooft ho...@natlab.research.philips.com 
2014-09-09 07:49:13 UTC ---
We use rsync to copy data from one file server to another using NFS3 mounts
over a 10Gb link. We found that upping the buffer sizes (as a quick test)
increases performance. When using --sparse this increases performance by a
factor of fifty, from 2MBps to 100MBps.
% diff -u rsync.h-org rsync.h
--- rsync.h-org 2014-04-13 19:36:59.0 +0200
+++ rsync.h 2014-09-08 16:20:41.427973852 +0200
@@ -131,11 +131,11 @@

 #define RSYNC_PORT 873

-#define SPARSE_WRITE_SIZE (1024)
-#define WRITE_SIZE (32*1024)
-#define CHUNK_SIZE (32*1024)
+#define SPARSE_WRITE_SIZE (128*1024)
+#define WRITE_SIZE (128*1024)
+#define CHUNK_SIZE (128*1024)
 #define MAX_MAP_SIZE (256*1024)
-#define IO_BUFFER_SIZE (32*1024)
+#define IO_BUFFER_SIZE (128*1024)
 #define MAX_BLOCK_SIZE ((int32)1 << 17)

 /* For compatibility with older rsyncs */
% 

It sure would be nice if these sizes were `officially' increased.



Re: [Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files

2014-09-09 Thread Dan Stromberg
Why not enable Jumbo Frames?  http://stromberg.dnsalias.org/~strombrg/jumbo.html

For NFS, you can use
http://stromberg.dnsalias.org/~strombrg/nfs-test.html to get some fast
settings.  The script could be modified to do CIFS I suppose.

HTH


rsync performance on large files strongly depends on file's (dis)similarity

2014-04-11 Thread Thomas Knauth
Hi list,

I've found this post on rsync's expected performance for large files:

https://lists.samba.org/archive/rsync/2007-January/017033.html

I have a related but different observation to share: with files in the
multi-gigabyte-range, I've noticed that rsync's runtime also depends
on how much the source/destination diverge, i.e., synchronization is
faster if the files are similar. However, this is not just because
less data must be transferred.

For example, on an 8 GiB file with 10% updates, rsync takes 390
seconds. With 50% updates, it takes about 1400 seconds, and at 90%
updates about 2400 seconds.

My current explanation, and it would be awesome if someone more
knowledgeable than me could confirm, is this: with very large files,
we'd expect a certain level of false alarms, i.e., the weak checksum
matches but the strong checksum does not. However, with large files that
are very similar, a weak match is much more likely to be confirmed
with a matching strong checksum. Conversely, with large files that are
very dissimilar, a weak match is much less likely to be confirmed with
a strong checksum, exactly because the files are very different from
each other. rsync ends up computing lots of strong checksums which do
not result in a match.

Is this a valid/reasonable explanation? Can someone else confirm this
relationship between rsync's computational overhead and the file's
(dis)similarity?

Thanks,
Thomas.


Re: rsync performance on large files strongly depends on file's (dis)similarity

2014-04-11 Thread Thomas Knauth
Maybe an alternative explanation is that a high degree of similarity
allows the sender to skip more bytes. For each matched block, the
sender does not need to compute any checksums, weak or strong, for
the next S bytes, where S is the block size.

As the number of matched blocks decreases, i.e., dissimilarity
increases, the number of computed checksums grows. This relationship
is especially apparent for large files, where many strong (and
expensive) checksums must be computed, due to many false alarms.

On Fri, Apr 11, 2014 at 1:35 PM, Thomas Knauth thomas.kna...@gmx.de wrote:
 Hi list,

 I've found this post on rsync's expected performance for large files:

 https://lists.samba.org/archive/rsync/2007-January/017033.html

 I have a related but different observation to share: with files in the
 multi-gigabyte-range, I've noticed that rsync's runtime also depends
 on how much the source/destination diverge, i.e., synchronization is
 faster if the files are similar. However, this is not just because
 less data must be transferred.

 For example, on an 8 GiB file with 10% updates, rsync takes 390
 seconds. With 50% updates, it takes about 1400 seconds, and at 90%
 updates about 2400 seconds.

 My current explanation, and it would be awesome if someone more
 knowledgeable than me could confirm, is this: with very large files,
 we'd expect a certain level of false alarms, i.e., weak checksum
 matches, but strong checksum does not. However, with large files that
 are very similar, a weak match is much more likely to be confirmed
 with a matching strong checksum. Contrary, with large files that are
 very dissimilar a weak match is much less likely to be confirmed with
 a strong checksum, exactly because the files are very different from
 each other. rsync ends up computing lots of strong checksums, which do
 not result in a match.

 Is this a valid/reasonable explanation? Can someone else confirm this
 relationship between rsync's computational overhead and the file's
 (dis)similarity?

 Thanks,
 Thomas.


[Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files

2013-11-17 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8512

--- Comment #4 from John Wiegley jo...@newartisans.com 2013-11-17 09:02:51 
UTC ---
Let me add my voice to the mix here.  I'm copying a 1GB VOB file from an Ubuntu
ZFS server running Samba 4.1.1, to my Mac OS X 10.9 box.

iperf reports 112 MB/s (should be my theoretical maximum).
Copying with Path Finder over Samba: 99 MB/s.
Copying with rsync directly (using arcfour256): 92 MB/s.
Copying with dd over Samba: 67 MB/s.
Copying with cat over Samba (measured with pv): 69 MB/s.
Copying with rsync over Samba: 55 MB/s.

I'm using gigabit ethernet, obviously, with mtu set to 1500 and no TCP options
other than the following in smb.conf:

socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072

These numbers are very stable over several runs, so I'm pretty curious now
about what's going on, especially with rsync.



Re: [Bug 8512] rsync is slower than cp -- Reduce overhead to cp for transfer of large files

2013-11-17 Thread Charles Marcus

On 2013-11-17 4:02 AM, samba-b...@samba.org wrote:

I'm using gigabit ethernet, obviously, with mtu set to 1500 and no TCP options
other than the following in smb.conf:

 socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072


First, remove these...

These options have been deprecated - and can CAUSE problems - for many, 
MANY years.


Next, since you had those still lingering around (I'm assuming from an 
ancient initial install, or from following some stupid long irrelevant 
$random_howto on the internet), go ask for help on the Samba support 
list evaluating your smb.conf settings and see if you have any other 
miscreants in there...


Then, if you are still having trouble, maybe it's an rsync issue and come
back here for more help...


--

Best regards,

Charles

[Bug 7195] timeout reached while sending checksums for very large files

2012-10-23 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=7195

--- Comment #2 from Loïc Gomez samba-b...@kyoshiro.org 2012-10-23 11:27:38 
UTC ---
I ran into a similar issue recently while transferring large files (40GB).
After a few tests, it seems - in my case at least - to be related to the
delta-xfer algorithm. The bug does not happen anymore with the -W option.

I don't know if this will resolve your issue, but you can also try looking into
these options: --no-checksum, --no-compress, --blocking-io. These were not the
source of my problems, but the functions they're related to might trigger a
network timeout.

I hope it helps, anyways, good luck solving your issue.


Re: Question about --partial-dir and aborted transfers of large files

2012-08-12 Thread Wayne Davison
On Fri, Aug 10, 2012 at 9:03 AM, T.J. Crowder t...@crowdersoftware.com wrote:

 1. Am I correct in inferring that when rsync sees data for a file in the
 --partial-dir directory, it applies its delta transfer algorithm to the
 partial file?

2. And that this is _instead of_ applying it to the real target file? (Not
 a nifty three-way combination.)


Yes.  The current code behaves the same as if you had specified --partial
(as far as the next transfer goes), just without actually being destructive
of the destination file.

I have imagined making the code pretend that the partial file and any
destination file are concatenated together for the purpose of generating
checksums.  That would allow content references to both files, but rsync
would need to be enhanced to open both files in both the generator and the
receiver and be able to figure out what read goes where (which shouldn't be
too hard).  I'd suggest that the code read the partial file first, padding
out the end of its data to an even checksum-sized unit so that the
destination file starts on an even checksum boundary (so that the code never
needs to combine data from two files in a single checksum or copy
reference).

If so, it would appear that this means a large amount of unnecessary data
 may end up being transferred in the second sync of a large file if you
 interrupt the first sync.


It all depends on where you interrupt it and how much data matches in the
remaining portion of the destination file.  It does give you the option of
discarding the partial data if it is too short to be useful, or possibly
doing your own concatenation of the whole (or trailing portion) of the
destination file onto the partial file, should you want to tweak things
before resuming the transfer.

..wayne..

Re: Question about --partial-dir and aborted transfers of large files

2012-08-12 Thread Wayne Davison
On Sun, Aug 12, 2012 at 10:41 AM, Wayne Davison way...@samba.org wrote:

 I have imagined making the code pretend that the partial file and any
 destination file are concatenated together for the purpose of generating
 checksums.


Actually, that could be bad if the destination and partial file are both
huge.  What would be better would be to send just the size of the
destination file in checksums, but overlay the start of the destination's
data with the partial-file's data (and just ignore any partial-block from
the end of the partial file).

..wayne..

Re: Question about --partial-dir and aborted transfers of large files

2012-08-12 Thread T.J. Crowder
Hi,

Thanks for that!

On 12 August 2012 18:41, Wayne Davison way...@samba.org wrote:

 I have imagined making the code pretend that the partial file and any
 destination file are concatenated together for the purpose of generating
 checksums.  That would allow content references to both files, but rsync
 would need to be enhanced to open both files in both the generator and the
 receiver and be able to figure out what read goes where (which shouldn't be
 too hard).  I'd suggest that the code read the partial file first, padding
 out the end of its data to an even checksum-sized unit so that the
 destination file starts on a even checksum boundary (so that the code never
 needs to combine data from two files in a single checksum or copy
 reference).


So if I'm inspired and somehow magically find the time, it's at least
feasible.

I'm not seeing why the generator would need to be different, though; the
receiver would be doing the see-through magic (treating the partial as
though it were overlaid on the beginning of the target).


 If so, it would appear that this means a large amount of unnecessary data
 may end up being transferred in the second sync of a large file if you
 interrupt the first sync.


 It all depends on where you interrupt it and how much data matches in the
 remaining portion of the destination file.  It does give you the option of
 discarding the partial data if it is too short to be useful, or possibly
 doing your own concatenation of the whole (or trailing portion) of the
 destination file onto the partial file, should you want to tweak things
 before resuming the transfer.


Ah, yes, I _nearly_ got there, didn't I, with my boxing clever
workaround. If one knows one's in this situation, just append data from the
target file to the partial file to fill in the missing bits (e.g., if the
target is 100K and the partial is 20K, append the _last_ 80K of target to
partial), and when rsync runs it'll only send what it has to. A C program
to recursively walk a tree and do that on the selected partials where it
makes sense (e.g., my VM HDD files) and not to others (which might have
deletions or insertions) is probably 20-30 lines of code.
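
A minimal shell sketch of that workaround (paths are hypothetical, GNU stat and
tail are assumed, and it only makes sense for files without insertions or
deletions):

  p=partial-dir/vm.img; t=target/vm.img
  psize=$(stat -c %s "$p"); tsize=$(stat -c %s "$t")
  [ "$tsize" -gt "$psize" ] && tail -c $(( tsize - psize )) "$t" >> "$p"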

On 12 August 2012 19:08, Wayne Davison way...@samba.org wrote:

 On Sun, Aug 12, 2012 at 10:41 AM, Wayne Davison way...@samba.org wrote:

 I have imagined making the code pretend that the partial file and any
 destination file are concatenated together for the purpose of generating
 checksums.


 Actually, that could be bad if the destination and partial file are both
 huge.  What would be better would be to send just the size of the
 destination file in checksums, but overlay the start of the destination's
 data with the partial-file's data (and just ignore any partial-block from
 the end of the partial file).


Yes, I wasn't thinking concatenation, but more like what LVM and similar do
with snapshots: The partial file is a bunch of snapshot blocks with the
curious property of only being at the beginning of the file. So given a
file with 50K blocks, and a partial with 20K blocks, the code would view
the combined result as the first 20K blocks of the partial followed by the
subsequent 30K blocks from the target. (Hence my see through terminology
above.)

E.g., resorting to ASCII-art, the receiver code sees a virtual file:

   +--+
   | partial file |
+--+   +--+ +--+
| virtual file |  +---| Blks 0-9K| |  target file |
+--+  | +-| Blks 10K-19K | +--+
| Blks 0-9K|--+ |  +--+ | Blks 0-9K|
| Blks 10K-19K |+   | Blks 10K-19K |
| Blks 20K-29K |---| Blks 20K-29K |
| Blks 30K-39K |---| Blks 30K-39K |
| Blks 40K-49K |---| Blks 40K-49K |
+--++--+

The receiver would perform checksums against that virtual file, and when it's
time to copy a block, if the block needs to be transferred, do that; if
not, grab it from the target file.

Again, this all really only applies in the simple case of files that are
nice, discrete blocks of data. Not knowing the delta algorithm, I have no
idea what would happen if the above were applied to a file that got (say)
5K of blocks deleted at the beginning followed by 1K blocks of inserted
data. The virtual file would appear to have duplicated data in that case,
which the delta algorithm would then have to get rid of / cope with. I
wouldn't be too surprised to find that it led to inefficiency in other
types of files.

Thanks again,

-- T.J.

rsync 3.0.7 intermittent failures with many large files

2010-05-24 Thread andrew . marlow
I have recently changed the version of rsync I use from the ancient 2.6.6
to 3.0.7. Ever since then it seems to me that I am getting more rsync
failures than before. Hopefully, other people can share their experiences
and point to the cause which, I acknowledge, might be me doing something
wrong.

The rsync error log indicates things like:

rsync: writefd_unbuffered failed to write 4092 bytes to socket [generator]:
Connection reset by peer (104)
rsync: read error: Connection reset by peer (104)

rsync error: error in rsync protocol data stream (code 12) at io.c(1530) 
[generator=3.0.7]
rsync error: error in rsync protocol data stream (code 12) at io.c(760) 
[receiver=3.0.7]

I am using rsync to perform a distributed software release to around 200 
machines. Several hundred files are transferred. Some are quite large 
executables. Each time it is a different set of machines that fail. Out of 
200 it is 3 to 6 that fail. I have not had a run where all 200 work for 
quite some time. With the older version of rsync total success was the 
norm. We got a few failures such as the above every now and then.

I am not sure it is to do with moving to a more recent version of rsync. 
It might be to do with a flakey network. It's hard to say. But I did see 
an rsync discussion thread at 
http://serverfault.com/questions/27137/rsync-or-dfs-r-for-remote-backup-over-slow-links-of-windows-2003-r2-sp2.
 
This seems to be talking about the same kind of problems I have been 
having. The distribution is over a WAN, transferring files from London to 
Geneva. I am using Windows-XP (SP2), with rsync built using the latest 
version of cygwin. I am also defining the environment variable CYGWIN to 
be NONTSEC to head off any ACL-related permission problems.

The thread I refer to makes me think that it might actually be a problem 
with rsync. Maybe rsync should watch out for temporary loss of network 
connectivity or slowness like you get in WANs sometimes. Maybe it should 
tolerate these sorts of errors subject to a maximum number of retries.

Regards,

Andrew Marlow

NOTE TO MODERATORS: I apologise for the length of the disclaimer that the 
emailer attaches. There is nothing I can do about it. Please feel free to 
delete it.


DO NOT REPLY [Bug 7195] New: timeout reached while sending checksums for very large files

2010-03-02 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=7195

   Summary: timeout reached while sending checksums for very large
files
   Product: rsync
   Version: 3.0.7
  Platform: All
OS/Version: All
Status: NEW
  Severity: minor
  Priority: P3
 Component: core
AssignedTo: way...@samba.org
ReportedBy: jan...@webgods.de
 QAContact: rsync...@samba.org


When I try to continue the upload of a very large file (400GB, 200GB already
transmitted) with --partial, rsync stops with an error after 10 minutes.
Verbosity shows that during this time it has transmitted checksums for about
30G worth of data. Increasing the timeout with --timeout=10 helps. With
this, rsync reaches the point where it transmits new data.




Re: retransfer fail of large files with inplace and broken pipe

2010-02-09 Thread Matt McCutchen
On Sun, 2009-12-13 at 07:21 +, tom raschel wrote:
 i have to transfer large files, each 5-100 GB (Mo-Fri), over a dsl line.
 unfortunately dsl lines are often not very stable and i got a broken pipe
 error.
 (dsl lines are getting a new ip if they are broken, or at least after a
 reconnect every 24 hours)
 
 i had a script which detected the rsync error and restarted the transmission.
 this means that if a file has transferred e.g. 80 % i start again from the
 beginning.
 
 using partial and partial-dir was no solution to resync because rsync cut the
 original file (e.g. from 20 GB to 15 GB), which means that i have to transfer
 the whole rest of 5 GB.

Indeed.  I entered an enhancement request to handle this situation
better:

https://bugzilla.samba.org/show_bug.cgi?id=7123

-- 
Matt



rsync taking a while on really large files

2010-01-15 Thread David Trammell
Can anyone suggest a good way to speed up rsync on really large files?  In
particular, when I rsync the mail spool directory, I have a few users with
inboxes of 1GB and up, and it seems to take a very long time just to
compare the files.  Maybe it would be faster to copy from scratch for files
over a certain size or something if the time stamps don't match.
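
One way to approximate that split with stock options is to make two passes; a
hedged sketch (paths and the 1G cutoff are made up):

  rsync -a --max-size=1G /var/spool/mail/ backuphost:/backup/mail/     # delta-transfer the small inboxes
  rsync -a -W --min-size=1G /var/spool/mail/ backuphost:/backup/mail/  # whole-file copy the big ones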


David




Re: rsync taking a while on really large files

2010-01-15 Thread leen

On 01/15/2010 07:22 PM, David Trammell wrote:
Can anyone suggest a good way to speed up rsync on really large 
files?  In particular, when I rsync the mail spool directory, I have a 
few users with inboxes over 1GB and up and it seems to take a very 
long time to just compare the files.  Maybe it would be faster to copy 
from scratch for files over a certain size or something if the time 
stamps don't match.


David


rsync is meant to save bandwidth; that's the main use of the tool. If
you have enough bandwidth, rsync without options might not be how you
want to use it.


There is -W for whole files.



Re: rsync taking a while on really large files

2010-01-15 Thread David Trammell
I could use an ordinary copying script for the mail files, but I figured if 
rsync can do it in some more optimal way, I'll stick with it for simplicity 
(since it's working great for the several hundred gigs of user files).


I saw the -W option, but I wasn't sure about how it behaves, as the man pages
don't have many details, and I thought there might be other options I
missed.  For -W the man page just says "copy files whole (w/o delta-xfer
algorithm)".


Does that mean it will copy all files with no comparison, or does it at 
least verify that there is some change to the file first?  I suppose either 
way I can test it to see, which is faster, but if someone can clarify the 
behavior I'd appreciate it.


Thanks,
David


- Original Message - 
From: l...@consolejunky.net

To: rsync@lists.samba.org
Sent: Friday, January 15, 2010 12:40 PM
Subject: Re: rsync taking a while on really large files



On 01/15/2010 07:22 PM, David Trammell wrote:
Can anyone suggest a good way to speed up rsync on really large files? 
In particular, when I rsync the mail spool directory, I have a few users 
with inboxes over 1GB and up and it seems to take a very long time to 
just compare the files.  Maybe it would be faster to copy from scratch 
for files over a certain size or something if the time stamps don't 
match.


David


rsync is meant to save bandwidth; that's the main use of the tool. If you 
have enough bandwidth, rsync's default behaviour might not be how you want to 
use it.


There is -W for whole files.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: 
https://lists.samba.org/mailman/listinfo/rsync

Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html



--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync taking a while on really large files

2010-01-15 Thread leen

On 01/15/2010 07:46 PM, David Trammell wrote:
I could use an ordinary copying script for the mail files, but I 
figured if rsync can do it in some more optimal way, I'll stick with 
it for simplicity (since it's working great for the several hundred 
gigs of user files).


I saw the -W option, but I wasn't sure about how it behaves as the man 
pages don't have many details, and I thought there might be other 
options I missed.  For -W the man page just says copy files whole 
(w/o delta-xfer algorithm)


Does that mean it will copy all files with no comparison, or does it 
at least verify that there is some change to the file first?  I 
suppose either way I can test it to see, which is faster, but if 
someone can clarify the behavior I'd appreciate it.


Thanks,
David



Without -W set it will check for file date/time and I think size.



- Original Message - From: l...@consolejunky.net
To: rsync@lists.samba.org
Sent: Friday, January 15, 2010 12:40 PM
Subject: Re: rsync taking a while on really large files



On 01/15/2010 07:22 PM, David Trammell wrote:
Can anyone suggest a good way to speed up rsync on really large 
files? In particular, when I rsync the mail spool directory, I have 
a few users with inboxes over 1GB and up and it seems to take a very 
long time to just compare the files.  Maybe it would be faster to 
copy from scratch for files over a certain size or something if the 
time stamps don't match.


David


rsync is meant to save bandwidth; that's the main use of the tool. If 
you have enough bandwidth, rsync's default behaviour might not be how you 
want to use it.


There is -W for whole files.

--
Please use reply-all for most replies to avoid omitting the mailing 
list.
To unsubscribe or change options: 
https://lists.samba.org/mailman/listinfo/rsync

Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html





--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync taking a while on really large files

2010-01-15 Thread Paul Slootman
On Fri 15 Jan 2010, David Trammell wrote:
 
 I saw the -W option, but I wasn't sure about how it behaves as the
 man pages don't have many details, and I thought there might be
 other options I missed.  For -W the man page just says copy files
 whole (w/o delta-xfer algorithm)

Take a moment to properly read the manpage, the above indicates you've
never gone beyond the option summary.
Further on, the options are discussed in detail:

-W, --whole-file
With this option rsync’s delta-transfer algorithm  is  not  used
and  the  whole file is sent as-is instead.  The transfer may be
faster if this option is used when  the  bandwidth  between  the
source  and destination machines is higher than the bandwidth to
disk  (especially  when  the  disk  is  actually  a  networked
filesystem).   This is the default when both the source and des‐
tination are specified as local paths, but  only  if  no  batch-
writing option is in effect.
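
For the mail-spool case that started this thread, that would mean something
like the following (paths and host are illustrative, not from the original
post). Note that -W does not change the quick check: files whose size and
modification time already match are still skipped; it only disables the
delta-transfer for files that do get sent.

    rsync -a -W /var/spool/mail/ backuphost:/backup/mail/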


Paul
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Rsync performance with very large files

2010-01-08 Thread Eric Cron
We're having a performance issue when attempting to rsync a very large file. 
Transfer rate is only 1.5MB/sec.  My issue looks very similar to this one: 
 
http://www.mail-archive.com/rsync@lists.samba.org/msg17812.html  
 
In that thread, a  'dynamic_hash.diff' patch was developed to work around this 
issue. I applied the 'dynamic_hash' patch included in the 2.6.7 src, but it 
didn't help.   
 
We are trying to evaluate the possibility of using rsync as an
alternative to IBM's FlashCopy, which only works within the storage
pool controlled by our San Volume Controller.

Some details about our test environment:
 
- Sender and Receiver are both POWER6 servers running AIX 5.3 
- Fiber attached disk, DS8300  storage 
- Gigabit network (Hypervisor Virtual I/O) 
- Test file is 232GB 
- I've tried rsync version 3.0.7 (vanilla) and 2.6.7 with the dynamic_hash.diff 
patch, both compiled with IBM's xlc compiler.  
Same behavior with both versions.
- It takes approx 1.5 hours to 'consider' the file before transfers begin, no 
big deal... 
- Once the changes are being sent, the rate is only 1.5MB/sec 
- Nothing is using either the source or destination files, only rsync (these 
are test servers.) 
- Both servers appear healthy, no CPU or memory problems.  
 
Just hoping somebody might have some insight.  The thread I linked above didn't 
have any info indicating success or failure of the patch - the original poster 
didn't provide any feedback.   
 
Eric Cron 
ericc...@yahoo.com


  
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync performance with very large files

2010-01-08 Thread Carlos Carvalho
Eric Cron (ericc...@yahoo.com) wrote on 8 January 2010 12:20:
 We're having a performance issue when attempting to rsync a very large file. 
 Transfer rate is only 1.5MB/sec.  My issue looks very similar to this one: 
  
 http://www.mail-archive.com/rsync@lists.samba.org/msg17812.html  
  
 In that thread, a  'dynamic_hash.diff' patch was developed to work around 
 this issue. I applied the 'dynamic_hash' patch included in the 2.6.7 src, but 
 it didn't help.   

That's what I'd expect.

 We are trying to evaluate the possibility of using rsync as an
 alternative to IBM's FlashCopy, which only works within the storage
 pool controlled by our San Volume Controller.
 
 Some details about our test environment:
  
 - Sender and Receiver are both POWER6 servers running AIX 5.3 
 - Fiber attached disk, DS8300  storage 
 - Gigabit network (Hypervisor Virtual I/O) 
 - Test file is 232GB 
 - I've tried rsync version 3.0.7 (vanilla) and 2.6.7 with the 
 dynamic_hash.diff patch, both compiled with IBM's xlc compiler.  
 Same behavior with both versions.

Yes. v3 has better hashing but it's rarely the bottleneck.

 - It takes approx 1.5 hours to 'consider' the file before transfers begin, no 
 big deal... 

Reasonable. It's likely not considering, it's reading the file on
the destination. At a rate of 40MB/s it takes about 1.5h to read
232GB.

 - Once the changes are being sent, the rate is only 1.5MB/sec 

Likely limited by the origin reading the file, if there are few
changes.

rsync is designed to reduce net traffic, and this usually costs more
local I/O. The destination machine first reads the entire file and
sends checksums to the origin, which (only) then reads the entire file
and (meanwhile) sends the differences to the destination. So the total
time is at least destination-reading + source-reading. In your case
you have a net that is about as fast as local I/O. If the destination
can write roughly as fast as the origin can read, you're better off
just copying the entire file. This will save you about 40%-50% in
total time, since you then do the destination and source operations in
parallel.

You can speed up rsync with --whole-file, which will do exactly the
above.
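
For what it's worth, a quick way to time the two approaches on such a file is
something like this (paths and host are illustrative; -I forces the transfer
so the two runs are comparable even when nothing has changed):

    time rsync -a -I    /data/bigfile server:/data/    # delta-transfer
    time rsync -a -I -W /data/bigfile server:/data/    # whole-file copy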
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: retransfer fail of large files with inplace and broken pipe

2009-12-15 Thread Tom
OK,

I have now tried --inplace with the --backup option, but syncing the files 
consumes much more time than a normal rsync run,
so this is not a workable solution for me.

Thx
Tom


Tom rasc...@edvantice.de schrieb im Newsbeitrag 
news:hg7dg6$em...@ger.gmane.org...
 Hi,

 retransfer of large fail with inplace after a broken pipe is working now.
 (thx again to wayne)

 but it is much more slow as if a normal rsync job.

 I have read that setting the --backup option could help. (have not tried 
 it
 yet)

 But --backup option would halve the space, which is not desirable.

 Is there a way to tell rsync to delete the --backup file after an 
 successful
 sync.

 thx
 Tom


 tom raschel rasc...@edvantice.de schrieb im Newsbeitrag 
 news:loom.20091213t075221-...@post.gmane.org...
 Hi,

 i have to tranfer large files each 5-100 GB (mo-fri) over dsl line.
 unfortunately dsl lines are often not very stable and i got a broken pipe 
 error.
 (dsl lines are getting a new ip if they are broken or at least after a
 reconnect every 24 hours)

 i had a script which detect the rsync error and restart the transmission.
 this means that if a file has transfered e.g. 80 % i start again from 
 beginning.

 using partial and partial-dir was no solution to resync because rsync cut 
 the
 original file (e.g. from 20 GB to 15 GB) which means that i have to 
 transfer
 the whole rest of 5 GB.

 so i had a look at --inplace which I thougt could do the trick, but 
 inplace is
 updating the timestamp and if the script start a retransfer after a 
 broken pipe
 it fails because the --inplace file is newer than the original file of 
 the
 sender.

 using ignore-times could be a solution but slow down the whole process to 
 much.

 is there a option to tell rsync not to change the time of a --inplace
 transfered file, or maybe preserve the mtime and do a comparison of mtime
 instead of ctime.

 Thx
 Tom



 -- 
 Please use reply-all for most replies to avoid omitting the mailing list.
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html




 -- 
 Please use reply-all for most replies to avoid omitting the mailing list.
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
 



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: retransfer fail of large files with inplace and broken pipe

2009-12-14 Thread Tom
Hi,

Retransfer of a large file with --inplace after a broken pipe is working now.
(thx again to wayne)

But it is much slower than a normal rsync job.

I have read that setting the --backup option could help. (I have not tried it
yet.)

But the --backup option would halve the available space, which is not desirable.

Is there a way to tell rsync to delete the --backup file after a successful
sync?

thx
Tom


tom raschel rasc...@edvantice.de schrieb im Newsbeitrag 
news:loom.20091213t075221-...@post.gmane.org...
 Hi,

 i have to tranfer large files each 5-100 GB (mo-fri) over dsl line.
 unfortunately dsl lines are often not very stable and i got a broken pipe 
 error.
 (dsl lines are getting a new ip if they are broken or at least after a
 reconnect every 24 hours)

 i had a script which detect the rsync error and restart the transmission.
 this means that if a file has transfered e.g. 80 % i start again from 
 beginning.

 using partial and partial-dir was no solution to resync because rsync cut 
 the
 original file (e.g. from 20 GB to 15 GB) which means that i have to 
 transfer
 the whole rest of 5 GB.

 so i had a look at --inplace which I thougt could do the trick, but 
 inplace is
 updating the timestamp and if the script start a retransfer after a broken 
 pipe
 it fails because the --inplace file is newer than the original file of the
 sender.

 using ignore-times could be a solution but slow down the whole process to 
 much.

 is there a option to tell rsync not to change the time of a --inplace
 transfered file, or maybe preserve the mtime and do a comparison of mtime
 instead of ctime.

 Thx
 Tom



 -- 
 Please use reply-all for most replies to avoid omitting the mailing list.
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
 



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: retransfer fail of large files with inplace and broken pipe

2009-12-13 Thread Tony Abernethy
Tom wrote:
 to make things more clear
 
 1.)
 first transfer is done either a initial setup or with a usb hdd to get
 sender and receiver in sync.
 
 2.)
 transfer does not stop because rsync had a timeout, it stops because
 the dsl
 line is broken (which i could see at dyndns)
 
 3)
 if dsl line is stable the transfer is successfull (which works
 furtunately
 most of the time)
 
 4.)
 i am searching for a way to reduce the time to retransfer the file or
 in
 other words to resume the filetransfer after a broken pipe
 (e.g. if you download a 4.4 GB Centos Image it is comfortable to resume
 the
 transfer of a 99 % transfered file instead to download all from
 scratch)
 
 Tom


But you already have 100% of the image, only it is an older version of the 
image.
The only thing I've found that works (and this is ONLY on something UNIXy) is
to monitor the temporary file on the target and if it is big enough, rename it
to the intended target file before the target rsync destroys it. 
For real disasters, you can attempt to automate this process.


  First: Transfer or re-transfer. I think, particularly with bad
  connections,
  you need to treat those VERY differently.
 
  For the initial transfer, --partial should help.
 
  For retransfers, where stuff in the middle has changed, I would
 expect the
  necessary state information to exist ONLY in the two running
 processes,
  and
  that information is lost if the connection goes down.
  This includes the connection dying because both sides are going
 through
  the
  file and have nothing worthwhile to say to each other.
 
  As usual, flames invited if I've got any of this wrong.
  --
  Please use reply-all for most replies to avoid omitting the mailing
 list.
  To unsubscribe or change options:
  https://lists.samba.org/mailman/listinfo/rsync
  Before posting, read: http://www.catb.org/~esr/faqs/smart-
 questions.html
 
 
 
 
 --
 Please use reply-all for most replies to avoid omitting the mailing
 list.
 To unsubscribe or change options:
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-
 questions.html
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: retransfer fail of large files with inplace and broken pipe

2009-12-13 Thread Wayne Davison
On Sat, Dec 12, 2009 at 11:21 PM, tom raschel rasc...@edvantice.de wrote:

 so i had a look at --inplace which I thougt could do the trick, but inplace
 is updating the timestamp and if the script start a retransfer after a
 broken pipe it fails because the --inplace file is newer than the original
 file of the sender.


Are you using --update (-u)?  If so, turn that off.  If not, rsync won't
skip a file that is newer, so something else is afoot.
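
For reference, a resume-friendly invocation along the lines worked out in this
thread might look roughly like the following (file name, host and timeout value
are assumptions, not taken from the thread):

    # Update the destination file in place so an interrupted run can be resumed;
    # note there is no -u, since the half-updated destination can look newer
    # than the source.
    rsync -a --inplace --partial --timeout=300 /data/bigfile.img backuphost:/backup/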

..wayne..
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: retransfer fail of large files with inplace and broken pipe

2009-12-13 Thread Tom
Thx to all,

it was the -u option which prevented rsync from resuming the file.

Tom

Tony Abernethy t...@servasoftware.com schrieb im Newsbeitrag 
news:af5ef1769d564645a9acc947375f0d021567087...@winxbeus13.exchange.xchg...
 Tom wrote:
 to make things more clear

 1.)
 first transfer is done either a initial setup or with a usb hdd to get
 sender and receiver in sync.

 2.)
 transfer does not stop because rsync had a timeout, it stops because
 the dsl
 line is broken (which i could see at dyndns)

 3)
 if dsl line is stable the transfer is successfull (which works
 furtunately
 most of the time)

 4.)
 i am searching for a way to reduce the time to retransfer the file or
 in
 other words to resume the filetransfer after a broken pipe
 (e.g. if you download a 4.4 GB Centos Image it is comfortable to resume
 the
 transfer of a 99 % transfered file instead to download all from
 scratch)

 Tom


 But you already have 100% of the image, only it is an older version of the 
 image.
 The only thing I've found that works (and this is ONLY on something UNIXy) 
 is
 to monitor the temporary file on the target and if it is big enough, 
 rename it
 to the intended target file before the target rsync destroys it.
 For real disasters, you can attempt to automate this process.


  First: Transfer or re-transfer. I think, particularly with bad
  connections,
  you need to treat those VERY differently.
 
  For the initial transfer, --partial should help.
 
  For retransfers, where stuff in the middle has changed, I would
 expect the
  necessary state information to exist ONLY in the two running
 processes,
  and
  that information is lost if the connection goes down.
  This includes the connection dying because both sides are going
 through
  the
  file and have nothing worthwhile to say to each other.
 
  As usual, flames invited if I've got any of this wrong.
  --
  Please use reply-all for most replies to avoid omitting the mailing
 list.
  To unsubscribe or change options:
  https://lists.samba.org/mailman/listinfo/rsync
  Before posting, read: http://www.catb.org/~esr/faqs/smart-
 questions.html
 



 --
 Please use reply-all for most replies to avoid omitting the mailing
 list.
 To unsubscribe or change options:
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-
 questions.html
 -- 
 Please use reply-all for most replies to avoid omitting the mailing list.
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
 



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


retransfer fail of large files with inplace and broken pipe

2009-12-12 Thread tom raschel
Hi,

I have to transfer large files of 5-100 GB each (Mon-Fri) over a DSL line. 
Unfortunately DSL lines are often not very stable and I get a broken pipe error.
(DSL lines get a new IP if they drop, or at least after a 
reconnect every 24 hours.)

I had a script which detects the rsync error and restarts the transmission.
This means that if a file has transferred e.g. 80 %, I start again from the beginning.

Using --partial and --partial-dir was no solution to resync, because rsync cuts the 
original file (e.g. from 20 GB to 15 GB), which means that I have to transfer 
the whole remaining 5 GB.

So I had a look at --inplace, which I thought could do the trick, but --inplace 
updates the timestamp, and if the script starts a retransfer after a broken pipe
it fails because the --inplace file is newer than the original file on the 
sender.

Using --ignore-times could be a solution but slows down the whole process too much.

Is there an option to tell rsync not to change the time of an --inplace 
transferred file, or maybe to preserve the mtime and do a comparison of mtime 
instead of ctime?

Thx
Tom



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: retransfer fail of large files with inplace and broken pipe

2009-12-12 Thread Tony Abernethy
tom raschel wrote:
 Hi,
 
 i have to tranfer large files each 5-100 GB (mo-fri) over dsl line.
 unfortunately dsl lines are often not very stable and i got a broken
 pipe error.
 (dsl lines are getting a new ip if they are broken or at least after a
 reconnect every 24 hours)
 
 i had a script which detect the rsync error and restart the
 transmission.
 this means that if a file has transfered e.g. 80 % i start again from
 beginning.
 
 using partial and partial-dir was no solution to resync because rsync
 cut the
 original file (e.g. from 20 GB to 15 GB) which means that i have to
 transfer
 the whole rest of 5 GB.
 
 so i had a look at --inplace which I thougt could do the trick, but
 inplace is
 updating the timestamp and if the script start a retransfer after a
 broken pipe
 it fails because the --inplace file is newer than the original file of
 the
 sender.
 
 using ignore-times could be a solution but slow down the whole process
 to much.
 
 is there a option to tell rsync not to change the time of a --inplace
 transfered file, or maybe preserve the mtime and do a comparison of
 mtime
 instead of ctime.
 
 Thx
 Tom
 
 
 
 --
 Please use reply-all for most replies to avoid omitting the mailing
 list.
 To unsubscribe or change options:
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-
 questions.html


First: Transfer or re-transfer. I think, particularly with bad connections,
you need to treat those VERY differently.

For the initial transfer, --partial should help.

For retransfers, where stuff in the middle has changed, I would expect the 
necessary state information to exist ONLY in the two running processes, and
that information is lost if the connection goes down.
This includes the connection dying because both sides are going through the 
file and have nothing worthwhile to say to each other.

As usual, flames invited if I've got any of this wrong.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: retransfer fail of large files with inplace and broken pipe

2009-12-12 Thread Tom

To make things more clear:

1.)
The first transfer is done either at initial setup or with a USB HDD to get 
sender and receiver in sync.

2.)
The transfer does not stop because rsync had a timeout; it stops because the DSL 
line has dropped (which I can see at DynDNS).

3.)
If the DSL line is stable the transfer is successful (which is fortunately the case 
most of the time).

4.)
I am searching for a way to reduce the time to retransfer the file, or in 
other words to resume the file transfer after a broken pipe
(e.g. if you download a 4.4 GB CentOS image it is convenient to resume the 
transfer of a 99 % transferred file instead of downloading it all from scratch).

Tom
 First: Transfer or re-transfer. I think, particularly with bad 
 connections,
 you need to treat those VERY differently.

 For the initial transfer, --partial should help.

 For retransfers, where stuff in the middle has changed, I would expect the
 necessary state information to exist ONLY in the two running processes, 
 and
 that information is lost if the connection goes down.
 This includes the connection dying because both sides are going through 
 the
 file and have nothing worthwhile to say to each other.

 As usual, flames invited if I've got any of this wrong.
 -- 
 Please use reply-all for most replies to avoid omitting the mailing list.
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
 



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync algorithm for large files

2009-09-05 Thread Shachar Shemesh

ehar...@lyricsemiconductors.com wrote:


I thought rsync, would calculate checksums of large files that have 
changed timestamps or filesizes, and send only the chunks which 
changed.  Is this not correct?  My goal is to come up with a 
reasonable (fast and efficient) way for me to daily incrementally 
backup my Parallels virtual machine (a directory structure containing 
mostly small files, and one 20G file)


 

I’m on OSX 10.5, using rsync 2.6.9, and the destination machine has 
the same versions.  I configured ssh keys, and this is my result:



Upgrade to rsync 3 at least.

Rsync keeps a hash table over the blocks' sliding (rolling) checksums. For older 
versions of rsync, the hash table was of a constant size. This meant that files over 3GB 
in size had a high chance of hash collisions. For a 20G file, the 
collisions alone might be the cause of your trouble.


Newer rsyncs detect when the number of hash entries gets too large, and increase the hash 
size accordingly, thus avoiding the collisions.


In other words - upgrade both sides (but specifically the sender).

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

RE: rsync algorithm for large files

2009-09-05 Thread eharvey
Yup, by doing --inplace, I got down from 30 mins to 24 mins...  So that's
slightly better than resending the whole file again.



However, this doesn't really do what I was hoping to do.  Perhaps it can't
be done, or perhaps somebody would like to recommend some other product that is
better suited to my purposes?



If I could describe exactly what I'm ideally trying to do, it would be:

- During the initial send, calculate checksums on the fly, down to some
  blocksize (perhaps 1Mb), and store the checksums for later use.

- On subsequent sends, just read the source and compare checksums
  against the previously saved values, and only send the blocks needed.  In the
  worst case, all blocks have changed, and the time to send is very nearly equal
  to the initial send.

- The runtime for subsequent runs should never significantly exceed
  the runtime of the initial run, because the goal is to gain something over
  brainless delete-and-overwrite.

- The runtime for subsequent runs should be on the same order of
  magnitude as whichever is greater:

  - calculating the checksums of the source, or
  - sending the changed blocks.



In my specific situation, with 33 minutes for the initial send of 20G across a
100Mbit LAN, my subsequent runs should take approximately 11 minutes, because that's
how long it takes me to md5 the whole tree.
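
A very rough sketch of that idea with standard tools, outside of rsync, might
look like the following (the block size, file name and use of md5sum are all
illustrative assumptions, not rsync's own format):

    #!/bin/sh
    # Compute one checksum per 1 MiB block of a file and save the listing;
    # comparing it against a later listing shows which blocks changed.
    f=${1:-disk.hdd}                      # assumed file name
    bs=$((1024 * 1024))                   # 1 MiB blocks
    size=$(wc -c < "$f")
    blocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        dd if="$f" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum |
            awk -v n="$i" '{ print n, $1 }'
        i=$((i + 1))
    done > "$f.sums"

Diffing the saved listing against a fresh one shows which block numbers changed;
actually shipping and applying only those blocks is the part that rsync's
delta-transfer already handles.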



Thanks again for any assistance…
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

rsync algorithm for large files

2009-09-04 Thread eharvey
I thought rsync, would calculate checksums of large files that have changed
timestamps or filesizes, and send only the chunks which changed.  Is this
not correct?  My goal is to come up with a reasonable (fast and efficient)
way for me to daily incrementally backup my Parallels virtual machine (a
directory structure containing mostly small files, and one 20G file)



I’m on OSX 10.5, using rsync 2.6.9, and the destination machine has the same
versions.  I configured ssh keys, and this is my result:



(Initial sync)

time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/

20G

~30minutes



(Second time I ran it, with no changes to the VM)

time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/

2 seconds



(Then I made some minor changes inside the VM, and I want to send just the
changed blocks)

time rsync -a --delete MyVirtualMachine/ myserver:MyVirtualMachine/

After waiting 50 minutes, I cancelled the job.



Why does it take longer the 3rd time I run it?  Shouldn’t the performance
always be **at least** as good as the initial sync?



Thanks for any help…
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync algorithm for large files

2009-09-04 Thread Matthias Schniedermeyer
On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote:
 
 Why does it take longer the 3rd time I run it?  Shouldn't the performance
 always be **at least** as good as the initial sync?

Not per se.

First you have to determine THAT the file has changed; then the file is 
synced if there was a change. At least that's what you have to do when 
the file size is unchanged and only the timestamp differs.
(Which is unfortunately often the case for virtual machine images.)

Worst case: it takes double the time if the change is at the end of the file.

When the file size differs, rsync immediately knows that the file has 
actual changes and starts the sync right away.

If I understand '--ignore-times' correctly, it forces rsync to always 
regard the files as changed and so start a sync right away, without 
first checking for changes.


There are also some other options that may or may not have a speed 
impact for you:
--inplace, so that rsync doesn't create a tmp-copy that is later moved over 
the previous file on the target-site.
--whole-file, so that rsync doesn't use delta-transfer but rather copies 
the whole file.

Also you may want to separate the small from the large files with:
--min-size
--max-size
So you can use different options for the small/large file(s).
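
As a rough illustration of that split (the sizes, paths and host are made up;
check your rsync's man page for the exact size syntax):

    # Small files: default delta-transfer handling.
    rsync -a --max-size=100m /src/vms/ backuphost:/dst/vms/
    # Large files in a separate run, e.g. updated in place.
    # (A file of exactly 100m would match both runs; adjust the limits to taste.)
    rsync -a --min-size=100m --inplace /src/vms/ backuphost:/dst/vms/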




Bis denn

-- 
Real Programmers consider what you see is what you get to be just as 
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a you asked for it, you got it text editor -- complicated, 
cryptic, powerful, unforgiving, dangerous.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync algorithm for large files

2009-09-04 Thread Carlos Carvalho
Matthias Schniedermeyer (m...@citd.de) wrote on 5 September 2009 00:34:
 On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote:
  
  Why does it take longer the 3rd time I run it?  Shouldn't the performance
  always be **at least** as good as the initial sync?
 
 Not per se.
 
 First you have to determine THAT the file has changed, then the file is 
 synced if there was a change. At least that's what you have to do when 
 the file-size is unchanged and only the timestamp is differs.
 (Which is unfortunatly often the case for Virtual Machine Images)
 
 Worst case: Takes double the time if the change is at end of the file.

No, rsync assumes that the file has changed if either the size or the
timestamp differs, and syncs it immediately.

For a new file transfer it's read once in the source and written once
in the destination. For an update it's still read once in the source
but read twice and written once in the destination, no matter how many
or extensive the changes are. The source also has to do the sliding 
checksumming. This is usually faster than reading the file, so it'll
only slow down the process if the source is very slow or the cpu is
busy with other tasks. OTOH, the IO on the destination is
significantly higher for big files; this is often the cause of a
slower transfer rate than a full copy.

 There are also some other options that may or may not have a speed 
 impact for you:
 --inplace, so that rsync doesn't create a tmp-copy that is later moved over 
 the previous file on the target-site.

Yes, this is useful because it avoids both a second reading and the
full write on the destination (in principle; I didn't bother to check
the actual implementation). For large files with small changes this
option is probably the best. The problem is that if the update aborts
for any reason you lose your backup. One might want to keep at least
two days of backups in this case.
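
One hedged way to get that safety while still using --inplace (paths are
assumptions) is to let rsync keep the previous version in a backup dir and
prune it only after a clean run; as noted earlier in this archive, the extra
copy does cost time:

    # Keep the pre-update version of each changed file, then discard it only
    # if the in-place update completed successfully.
    rsync -a --inplace --backup --backup-dir=/backup/previous \
          /src/data/ /backup/current/ \
      && rm -rf /backup/previous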

 --whole-file, so that rsync doesn't use delta-transfer but rather copies 
 the whole file.

Yes but causes a lot of net traffic. He mentions an average transfer
rate of about 11MB/s, so for a 100Mb/s net whole-file is probably not
suitable. If however he has a free gigabit link it'll be the best if
--inplace is not acceptable.

 Also you may to separate the small from the large files with:
 --min-size
 --max-size
 So you can use different options for the small/large file(s).

Agreed.

I'd also suggest using rsync v3 because it limits the blocksize.
Previous versions will use quite a large block for big files and if
changes are scattered it'll transfer much more than v3.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 5482] look into a hierarchical checksum algorithm that would help to efficiently transmit really large files

2008-05-25 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5482


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
Summary|apply the rsync comparison  |look into a hierarchical
   |algorithm specially to .mov |checksum algorithm that
   |and .mp4 files  |would help to efficiently
   ||transmit really large files




--- Comment #6 from [EMAIL PROTECTED]  2008-05-25 09:31 CST ---
I also think coding a special transfer algorithm for media files is not
something that would be appropriate for rsync (though someone may want to code
their own custom version if this is important to them).

A general hierarchical checksum would be quite interesting, but we'd need to
balance the costs of disk I/O and memory use, and avoid round-trip latency. 
Rsync deals with latency by pipe-lining the checksum generation and file
reconstruction in separate processes, but each one requires a separate read of
the basis file.  So, we'd want to make rsync still able to process more than
one active transfer and not do a read-read for each sub-level of checksum
generation.

If rsync were made to send a first level set of checksums for each file,
computing a certain number of sub-level checksums into memory, that might work
for the generator at the expense of holding the hierarchical sums in memory for
N active files at once (perhaps topping out after a certain selectable memory
limit was reached).

The biggest unknown to me would be how to make the sender work efficiently for
such an algorithm.  The sender is the part that must compute a new checksum on
each character boundary (using a rolling checksum) to find relocation matches
for each block.  Computing a rolling checksum using the current method would
require holding a huge amount of the file in memory at once (but that could be
changed to use mem-mapped files, potentially triggering re-reads as the window
slides forward).  We'd also need to move away from the current bifurcated
checksum algorithm into a single, stronger rolling checksum to avoid having to
recompute the strong checksum over a huge amount of data to double-check a
match.  Also, each successive level of transfer would likely trigger a re-read
of the part of the file that didn't match anything as we divide it into smaller
checksums.

We'd also want to limit how hierarchical the checksums are based on file size,
as smaller files just need the current method, while really large files might
benefit from several levels.

We may want to even consider a reversal of the jobs of the sender and generator
to lighten the load on the server.  This would slow things down latency-wise,
but could make it possible for a server to deal with more clients, especially
if this makes the per-byte processing more intensive.

Sounds like an intriguing idea to investigate further and see how feasible it
would be.  Comments?  Jamie, do you have code to share?


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Thought on large files

2008-01-30 Thread Brendan Grieve




Matt McCutchen wrote:

  On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote:
  
  
I had a look at rdiff-backup, but I was trying to get something that
spoke native rsync (IE, not to force any change on the client side).

  
  
To achieve this, you can have the client push to an rsync daemon and
then have the daemon call rdiff-backup so that the rdiff-backup part
happens entirely on the server.  The idea is the same as the
daemon-and-rsnapshot setup I described in the following message, but
with rdiff-backup in place of rsnapshot as the backend:

http://lists.samba.org/archive/rsync/2007-December/019470.html

  
  
After some thought I think the best place to put such a change would
be at the filesystem level. For example, if one had a FUSE filesystem
that simply ran on top of an existing one, that wrote its files as I
described (or uses diff-like methods), but presents a clean filesystem
for rsync (or indeed any tool) to make use of. I believe I may look in
that direction instead of hacking rsync.

  
  
You could do that, but note that the rsync receiver won't explicitly
tell the filesystem what files are similar, so you'll have to either
keep a big hashtable to help you coalesce identical blocks globally or
use some kludge like looking at what other files the receiver has open
while it is writing the destination file.

Matt

  


I spent some time playing around with rdiff-backup as mentioned, but it
turns out not to be as efficient as I had hoped. It also makes it quite
tough to restore a single file or to view a snapshot
of the system at a particular point in time.

I spent a bit of time hacking at a FUSE system, and actually I think
this method would work very well. Essentially one would mount a very
thin 'fuse' layer on top of a standard set of files. This layer would
basically just pass through all commands (read/write/dirread/stat
etc.). However, it would divide up files larger than, say, 25Mb into
25Mb chunks, and intelligently store them. Whatever is accessing the
top-layer system (rsync in this case, but it could be samba, or a web
share, or anything) would see the files as they should be, but underneath
the files are stored in these chunks.
When rsync (or anything) needs to read or write a file, it would work
out, from the seek offset, which raw chunk file to pull the data from.
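
The offset-to-chunk mapping itself is just arithmetic; a tiny sketch with the
25Mb chunk size mentioned above (the variable and file names are illustrative):

    # Map a byte offset in the logical file onto a chunk file and an offset
    # within that chunk, for 25 MB chunks.
    chunk_size=$((25 * 1024 * 1024))
    offset=123456789                      # example offset
    chunk_index=$(( offset / chunk_size ))
    chunk_offset=$(( offset % chunk_size ))
    echo "read chunk file .bigfile.chunk.$chunk_index at offset $chunk_offset"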

Then, say once a day, we create a hard-link copy of the files. Normally
one would copy the 'top layer' files to a snapshot directory, but in
this case you would copy the 'bottom layer' (i.e. the raw data) to a
snapshot directory. This snapshot directory could itself also be
covered by a 'blockfs'-style FUSE layer so a person can browse it and
see the files as they should appear.

Hope this makes sense. I've hacked together a test 'blockfs' fuse
layer, and will run some more tests on some multi-gigabyte files, clean
it up and put it up if anyone is interested. I'm amazed at how easy it
is to program a FUSE layer.


Brendan Grieve




-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Thought on large files

2008-01-23 Thread Brendan Grieve




Matt McCutchen wrote:

  On Wed, 2008-01-23 at 13:38 +0900, Brendan Grieve wrote:
  
  
Lets say 
the file, whatever it is, is a 10Gb file, and that some small amount of 
data changes in it. This is efficiently sent accross by rsync, BUT the 
rsync server side will correctly break the hard-link and create a new 
file with the changed bits. This means, if even 1 byte of that 10Gb file 
changes, you now have to store that whole file again.

  
  
  
  
What my thoughts were is that if the server could transparently break a 
large file into chunks and store them that way, then one can still make 
use of hard-links efficiently.

  
  
This is a fine idea, but I don't think support for this should be added
to rsync.  Instead, I suggest that you use rdiff-backup
( http://www.nongnu.org/rdiff-backup/ ), a backup tool that stores an
ordinary latest snapshot of the source along with reverse deltas for
previous snapshots and redundant attribute information both in its own
format.

Matt

  


I had a look at rdiff-backup, but I was trying to get something that
spoke native rsync (i.e. not forcing any change on the client side).

I do however agree that support should NOT be added to rsync. Rsync is
a mirroring tool and not an elaborate tool that really needs to know
how files are stored. In fact I'd go as far as to say many of the
options rsync does support veer away from being a simple mirror tool
(e.g. backup etc.).

After some thought I think the best place to put such a change would be
at the filesystem level. For example, one could have a FUSE filesystem that
simply runs on top of an existing one, writes its files as I
described (or uses diff-like methods), but presents a clean filesystem
for rsync (or indeed any tool) to make use of. I believe I may look in
that direction instead of hacking rsync.


Brendan Grieve


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Thought on large files

2008-01-23 Thread Matt McCutchen
On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote:
 I had a look at rdiff-backup, but I was trying to get something that
 spoke native rsync (IE, not to force any change on the client side).

To achieve this, you can have the client push to an rsync daemon and
then have the daemon call rdiff-backup so that the rdiff-backup part
happens entirely on the server.  The idea is the same as the
daemon-and-rsnapshot setup I described in the following message, but
with rdiff-backup in place of rsnapshot as the backend:

http://lists.samba.org/archive/rsync/2007-December/019470.html

 After some thought I think the best place to put such a change would
 be at the filesystem level. For example, if one had a FUSE filesystem
 that simply ran on top of an existing one, that wrote its files as I
 described (or uses diff-like methods), but presents a clean filesystem
 for rsync (or indeed any tool) to make use of. I believe I may look in
 that direction instead of hacking rsync.

You could do that, but note that the rsync receiver won't explicitly
tell the filesystem what files are similar, so you'll have to either
keep a big hashtable to help you coalesce identical blocks globally or
use some kludge like looking at what other files the receiver has open
while it is writing the destination file.

Matt

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Thought on large files

2008-01-23 Thread Brendan Grieve




Matt McCutchen wrote:

  On Thu, 2008-01-24 at 13:54 +0900, Brendan Grieve wrote:
  
  
I had a look at rdiff-backup, but I was trying to get something that
spoke native rsync (IE, not to force any change on the client side).

  
  
To achieve this, you can have the client push to an rsync daemon and
then have the daemon call rdiff-backup so that the rdiff-backup part
happens entirely on the server.  The idea is the same as the
daemon-and-rsnapshot setup I described in the following message, but
with rdiff-backup in place of rsnapshot as the backend:

http://lists.samba.org/archive/rsync/2007-December/019470.html

  

Thanks, I do like that idea. 



-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Thought on large files

2008-01-22 Thread Brendan Grieve

Hi There,

I've been toying around with the code of rsync on and off for a while, 
and I had a thought that I would like some comments on. It's to do with 
very large files and disk space.


One of the common uses of rsync is to use it as a backup program. A 
client connects to the rsync server, and sends over any changed files. 
If the client has very large files that have changed marginally, then 
rsync efficiently only sends the changed bits.


On the server side, one may have it set up to create 'snapshots' of the 
existing data there by periodically hard-linking that data to another directory. 
There's plenty of documentation on the web about how to do this, 
so I won't go into it further.


This is very effective and uses quite little disk space since a file 
that does not change effectively doesn't take up any more disk space 
(not much more anyway), even if it exists now in many snapshots.
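
For reference, the usual shape of such a hard-link snapshot setup is roughly
the following (paths are illustrative; rsync's --link-dest achieves the same
effect in one step):

    # Hard-link the current tree into a dated snapshot, then refresh "current".
    cp -al /backup/current "/backup/$(date +%Y-%m-%d)"
    rsync -a --delete client:/data/ /backup/current/

    # Equivalent idea using rsync itself:
    rsync -a --delete --link-dest=/backup/current client:/data/ "/backup/$(date +%Y-%m-%d)/"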


One place where this falls down is if the file is very large. Let's say 
the file, whatever it is, is a 10Gb file, and that some small amount of 
data changes in it. This is efficiently sent across by rsync, BUT the 
rsync server side will correctly break the hard-link and create a new 
file with the changed bits. This means that if even 1 byte of that 10Gb file 
changes, you now have to store that whole file again.


I won't get into the whole issue of why one would have big files etc... 
I see it all the time, especially in the Microsoft world, with Outlook 
PST files, and Microsoft Exchange Database files.


What my thoughts were is that if the server could transparently break a 
large file into chunks and store them that way, then one can still make 
use of hard-links efficiently.


For example, going back to a 10Gb Exchange Database file, it's likely not 
going to change too much during use. So if the server stored the huge 
clumsy 'priv1.edb' as:

 .priv1.edb._somemagicstring_.1
 .priv1.edb._somemagicstring_.2
etc...

and intelligently only broke the 'hard-links' of the bits that actually 
change, then it all works well. One could have an option to enable this 
for files bigger than a certain size, and break them into specific sized 
chunks.


One could quite rightly argue that this changes rsync from a tool that 
synchronizes data between places to a dedicated backup tool (as the two 
sides will now have physically different data), however I could see it 
being useful, especially since it wouldn't need changes on the client 
side as the server still presents it as just one file.


What are your comments? Good idea? Stupid idea? Been done before? Does 
anyone have some hints about where in the code I should look to make 
these changes so I can test it out?




Brendan Grieve
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Thought on large files

2008-01-22 Thread Matt McCutchen
On Wed, 2008-01-23 at 13:38 +0900, Brendan Grieve wrote:
 Lets say 
 the file, whatever it is, is a 10Gb file, and that some small amount of 
 data changes in it. This is efficiently sent accross by rsync, BUT the 
 rsync server side will correctly break the hard-link and create a new 
 file with the changed bits. This means, if even 1 byte of that 10Gb file 
 changes, you now have to store that whole file again.

 What my thoughts were is that if the server could transparently break a 
 large file into chunks and store them that way, then one can still make 
 use of hard-links efficiently.

This is a fine idea, but I don't think support for this should be added
to rsync.  Instead, I suggest that you use rdiff-backup
( http://www.nongnu.org/rdiff-backup/ ), a backup tool that stores an
ordinary latest snapshot of the source along with reverse deltas for
previous snapshots and redundant attribute information both in its own
format.

Matt

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


problems with rsync 2.6.9 and large files (up to 20GB)

2007-12-12 Thread gorka barracuda
Hi,

I'm Jordi and I work at the University of Barcelona. I'm trying to make a
backup of our several clusters. In the past I have worked with rsync with very
good results.

When I try to back up some large directories (for example 1.5TB with a lot
of large files of up to 20GB) with this command:

rsync -aulHI --delete --partial --modify-window=2 --log-format="%t %o %l %f" \
  --stats --exclude-from=/home/jingles/soft/sinc/sinc_conf/exclude_files.txt \
  --temp-dir=/data/tmp --bwlimit=1 \
  -e "ssh -o ForwardX11=no -i /root/.ssh/id_rsa -l root" \
  /home server_name:/data/cerqt2/


it seems that rsync is working and finishing, but when the message that
summarizes the transferred files appears, rsync starts the backup again (taking
a lot of time):


sent 640278263924 bytes  received 217256421 bytes  8610488.88 bytes/sec
total size is 783944943854  speedup is 1.22
rsync warning: some files vanished before they could be transferred (code
24) at
 main.c(977) [sender=2.6.9]
 2007/12/11 17:02:04 del. 0
home/jingles/soft/sinc/bin/home/g1benjamin/timel-fre
q.po15740


I must say that these clusters are working all day, so there are constant
changes in the data stored on our hard disks.

I googled for this problem and I found that there is a patch file to solve
this problem (you can see the thread at
http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html
) but I can't find it.

Can you help me? Does the newest 3.0 version solve this problem?


Thanks for any suggestion,

regards,

jordi
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: problems with rsync 2.6.9 and large files (up to 20GB)

2007-12-12 Thread Matt McCutchen
On Wed, 2007-12-12 at 14:28 +0100, gorka barracuda wrote:
 it seems that rsync is working and finishing but when appears the
 message that summarizes the transferred files rsync starts the backup
 again (taking a lot of time)
 
 
 sent 640278263924 bytes  received 217256421 bytes  8610488.88
 bytes/sec
 total size is 783944943854  speedup is 1.22
 rsync warning: some files vanished before they could be transferred
 (code 24) at 
  main.c(977) [sender=2.6.9]
  2007/12/11 17:02:04 del. 0
 home/jingles/soft/sinc/bin/home/g1benjamin/timel-fre
 q.po15740 
 

I've never known rsync to start over like that.  Are you using a script
that runs rsync repeatedly until it exits with code 0?

 I googled for this problem and I found that there are a patch file to
 solve this problem (you can see the thread in
 http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html ) but
 I can't found it.

That thread is about an optimization that makes the delta-transfer
algorithm much faster on large files.  The optimization is included in
current development versions of rsync 3.0.0.  This probably is not
related to the problem with rsync starting over.

Matt

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: problems with rsync 2.6.9 and large files (up to 20GB)

2007-12-12 Thread gorka barracuda
Hi Matt, thanks for your fastet reply

2007/12/12, Matt McCutchen [EMAIL PROTECTED]:

 On Wed, 2007-12-12 at 14:28 +0100, gorka barracuda wrote:
  it seems that rsync is working and finishing but when appears the
  message that summarizes the transferred files rsync starts the backup
  again (taking a lot of time)
 
  
  sent 640278263924 bytes  received 217256421 bytes  8610488.88
  bytes/sec
  total size is 783944943854  speedup is 1.22
  rsync warning: some files vanished before they could be transferred
  (code 24) at
   main.c(977) [sender=2.6.9]
   2007/12/11 17:02:04 del. 0
  home/jingles/soft/sinc/bin/home/g1benjamin/timel-fre
  q.po15740
  

 I've never known rsync to start over like that.  Are you using a script
 that runs rsync repeatedly until it exits with code 0?


Oops! My apologies, you are correct: my first version of the script checked
the exit code, and it seems that I forgot to comment that out :(

Sorry for the inconvenience.

But the second time it makes this re-rsync it takes the same time as
the first time... it seems that it doesn't make an incremental backup of our
large files... Do you think that the cause could be the lack of the new
optimized algorithm that you put in rsync 3.0.0?


 I googled for this problem and I found that there are a patch file to
  solve this problem (you can see the thread in
  http://www.mail-archive.com/rsync@lists.samba.org/msg17815.html ) but
  I can't found it.

 That thread is about an optimization that makes the delta-transfer
 algorithm much faster on large files.  The optimization is included in
 current development versions of rsync 3.0.0.  This probably is not
 related to the problem with rsync starting over.

 Matt



Thanks again, and sorry for all

jordi
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: problems with rsync 2.6.9 and large files (up to 20GB)

2007-12-12 Thread Matt McCutchen
On Wed, 2007-12-12 at 17:01 +0100, gorka barracuda wrote:
 but, the second time that makes this re-rsync it takes the same time
 that the first time...it seems that it doen't make an incremental
 backup with our large files... Do you think that the cause could be
 the problem of the new optimized algorithm that you put in rsync
 3.0.0?

You are passing -I, which makes rsync transfer all regular files every
time even if they appear to be identical on source and destination.
Rsync does reduce network traffic using the delta-transfer algorithm,
but the process still rewrites each destination file in full and uses a
bunch of CPU on the sender (especially without the optimized algorithm),
so it may take a long time.  Consider whether you really need the extra
certainty of catching changes afforded by -I.  At the least, you could
disable -I for rsync runs after the first.
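
A hypothetical sketch of that suggestion, using an invented marker file and a
simplified version of the command from this thread:

    # Pass -I only until the first run has completed successfully; later runs
    # rely on rsync's normal size/mtime quick check.
    marker=/var/lib/backup/.initial-sync-done     # invented path
    extra_opts="-I"
    [ -e "$marker" ] && extra_opts=""
    rsync -aulH $extra_opts --delete /home server_name:/data/cerqt2/ \
      && touch "$marker"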

Matt

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: problems with rsync 2.6.9 and large files (up to 20GB)

2007-12-12 Thread gorka barracuda
It works!

Thanks again Matt!


2007/12/12, Matt McCutchen [EMAIL PROTECTED]:

 On Wed, 2007-12-12 at 17:01 +0100, gorka barracuda wrote:
  But the second time it runs this re-rsync it takes the same time
  as the first time... it seems that it doesn't make an incremental
  backup with our large files. Do you think that the cause could be
  related to the new optimized algorithm that you put in rsync
  3.0.0?

 You are passing -I, which makes rsync transfer all regular files every
 time even if they appear to be identical on source and destination.
 Rsync does reduce network traffic using the delta-transfer algorithm,
 but the process still rewrites each destination file in full and uses a
 bunch of CPU on the sender (especially without the optimized algorithm),
 so it may take a long time.  Consider whether you really need the extra
 certainty of catching changes afforded by -I.  At the least, you could
 disable -I for rsync runs after the first.

 Matt


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Building hash table times for large files

2007-11-02 Thread Rob Bosch
I'm running pre4 on a 77GB file.  It seems like the hash table is taking a
long time to be built.  I'm not sure what is involved in this step but as an
example the following is logged during a run:

send_files(11, priv1.edb) 
send_files mapped priv1.edb of size 79187419136 
calling match_sums priv1.edb 
f.st.. priv1.edb 
hash search b=131072 len=79187419136 
built hash table for entries 0 - 52427 
built hash table for entries 52428 - 104855 
built hash table for entries 104856 - 157283 
built hash table for entries 157284 - 209711

The first two hash table entries (0 - 52427 and 52428 - 104855) took only a
few minutes.  The next two entries written to the log file have taken
several hours.  

Does this make sense?  Should the entries take so long to be built?

Rob



-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-10-07 Thread Wayne Davison
On Mon, Jan 08, 2007 at 10:16:01AM -0800, Wayne Davison wrote:
 And one final thought that occurred to me:  it would also be possible
 for the sender to segment a really large file into several chunks,
 handling each one without overlap, all without the generator or the
 receiver knowing that it was happening.

I have a patch that implements this:

http://rsync.samba.org/ftp/unpacked/rsync/patches/segment_large_hash.diff

I'm wondering if anyone has any feedback on such a method being included
in rsync?

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-10-07 Thread Matt McCutchen
On 10/7/07, Wayne Davison [EMAIL PROTECTED] wrote:
 On Mon, Jan 08, 2007 at 10:16:01AM -0800, Wayne Davison wrote:
  And one final thought that occurred to me:  it would also be possible
  for the sender to segment a really large file into several chunks,
  handling each one without overlap, all without the generator or the
  receiver knowing that it was happening.

 I have a patch that implements this:

 http://rsync.samba.org/ftp/unpacked/rsync/patches/segment_large_hash.diff

I like better performance, but I'm not entirely happy with a fixed
upper limit on the distance that data can migrate and still be matched
by the delta-transfer algorithm: if someone is copying an image of an
entire hard disk and rearranges the partitions within the disk, rsync
will needlessly retransmit all the partition data.  An alternative
would be to use several different block sizes spaced by a factor of 16
or so and have a separate hash table for each.  Each hash table would
hold checksums for a sliding window of 8/10*TABLESIZE blocks around
the current position.  This way, small blocks could be matched across
small distances without overloading the hash table, and large blocks
could still be matched across large distances.
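
To make the idea concrete, here is a rough sketch in Python of what I mean
(purely illustrative, not rsync code; the checksum is a stand-in and the
data is assumed to fit in memory only for the sake of the example):

from collections import OrderedDict

TABLE_SIZE = 65536                    # slots per table
WINDOW = (8 * TABLE_SIZE) // 10       # blocks kept per table (sliding window)

def weak_sum(block):
    # stand-in for rsync's rolling checksum
    return sum(block) & 0xffffffff

def build_tables(data, base_block=4096, levels=3):
    # one table per block size, sizes spaced by a factor of 16;
    # each table only remembers a trailing window of blocks (for simplicity)
    tables = {}
    for level in range(levels):
        bsize = base_block * (16 ** level)
        table = OrderedDict()
        for off in range(0, len(data), bsize):
            table[weak_sum(data[off:off + bsize])] = off
            if len(table) > WINDOW:
                table.popitem(last=False)   # slide the window forward
        tables[bsize] = table
    return tables

Small blocks then only need to match within their window, while the largest
blocks can still be matched across most of the file.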

Matt
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-01-12 Thread Shachar Shemesh
Evan Harris wrote:
 Would it make more sense just to make rsync pick a more sane blocksize
 for very large files?  I say that without knowing how rsync selects
 the blocksize, but I'm assuming that if a 65k entry hash table is
 getting overloaded, it must be using something way too small.
rsync picks a block size that is the square root of the file size. As I
didn't write this code, I can safely say that it seems like a very good
compromise between too small block sizes (too many hash lookups) and too
large blocksizes (decreased chance of finding matches).
 Should it be scaling the blocksize with a power-of-2 algorithm rather
 than the hash table (based on filesize)?
If Wayne intends to make the hash size a power of 2, maybe selecting
block sizes that are smaller will make sense. We'll see how 3.0 comes along.
 I haven't tested to see if that would work.  Will -B accept a value of
 something large like 16meg?
It should. That's about 10 times the block size you need in order to not
overflow the hash table, though, so a block size of 2MB would seem more
appropriate to me for a file size of 100GB.
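
Just to put numbers on that (a quick back-of-the-envelope script; the
65536-slot figure is the fixed table size under discussion, the rest is
plain arithmetic):

import math

HASH_SLOTS = 65536

def block_stats(file_size, block_size=None):
    if block_size is None:
        block_size = int(math.sqrt(file_size))       # the default heuristic
    blocks = file_size // block_size
    return block_size, blocks, blocks / HASH_SLOTS   # size, count, load factor

size = 100 * 2**30                                    # 100 GiB
print(block_stats(size))             # default block: table ~5x overloaded
print(block_stats(size, 2 * 2**20))  # -B 2M: ~51200 blocks, fits in the table
print(block_stats(size, 16 * 2**20)) # -B 16M: ~6400 blocks, lots of headroom
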
   At my data rates, that's about a half a second of network bandwidth,
 and seems entirely reasonable.
 Evan
I would just like to note that since I submitted the large hash table
patch, I have seen no feedback on anyone actually testing it. If you can
compile a patched rsync and report how it goes, that would be very
valuable to me.

Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-01-08 Thread Wayne Davison
On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:
 I've been playing with rsync and very large files approaching and 
 surpassing 100GB, and have found that rsync has excessively very poor 
 performance on these very large files, and the performance appears to 
 degrade the larger the file gets.

Yes, this is caused by the current hashing algorithm that the sender
uses to find matches for moved data.  The current hash table has a fixed
size of 65536 slots, and can get overloaded for really large files.

There is a diff in the patches dir that makes rsync work better with
large files: dynamic_hash.diff.  This makes the size of the hash table
depend on how many blocks there are in the transfer.  It does speed up
the transfer of large files significantly, but since it introduces a mod
(%) operation on a per-byte basis, it slows down the transfer of normal
sized files significantly.

I'm going to be checking into using a hash algorithm with a table that
is always a power of 2 in size as an alternative implementation of this
dynamic hash algorithm.  That will hopefully not bloat the CPU time for
normal-sized files.  Alternately, the hashing algorithm could be made to
vary depending on the file's size.  I'm hoping to have this improved in
the upcoming 3.0.0 release.
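
For anyone curious, the difference boils down to something like this (toy
Python, not the actual patch):

def index_mod(weak_sum, table_size):
    return weak_sum % table_size              # any size, but costs a divide

def next_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

def index_mask(weak_sum, table_size):
    return weak_sum & (table_size - 1)        # table_size must be a power of 2

blocks = 327680                    # e.g. a 100GB file with ~320KB blocks
table_size = next_pow2(blocks)     # 524288 slots instead of a fixed 65536
print(index_mod(0xdeadbeef, blocks), index_mask(0xdeadbeef, table_size))

The mask version keeps the per-byte cost down while still letting the table
grow with the number of blocks.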

And one final thought that occurred to me:  it would also be possible
for the sender to segment a really large file into several chunks,
handling each one without overlap, all without the generator or the
receiver knowing that it was happening.  The upside is that huge files
could be handled this way, but the downside is that the incremental-sync
algorithm would not find matches spanning the chunks.  It would be
interesting to test this and see if the rsync algorithm would be better
served by using a larger number of smaller chunks while segmenting the
file, rather than a smaller number of much larger chunks while
considering the file as a whole.
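
In outline, the segmenting would look something like this (my own toy
sketch of the idea, not an actual patch):

SEGMENT = 1 << 30                      # e.g. 1 GiB handled at a time

def segments(file_size, segment=SEGMENT):
    # yield (offset, length) ranges; each range would get its own hash
    # table and its own matching pass, so no table covers the whole file
    off = 0
    while off < file_size:
        yield off, min(segment, file_size - off)
        off += segment

for off, length in segments(100 * (1 << 30)):
    pass   # run the normal block-checksum matching on just this range

The cost, as noted, is that data that moved from one segment to another
would no longer be found.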

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-01-08 Thread Evan Harris


On Mon, 8 Jan 2007, Wayne Davison wrote:


On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:


I've been playing with rsync and very large files approaching and
surpassing 100GB, and have found that rsync has excessively very poor
performance on these very large files, and the performance appears to
degrade the larger the file gets.


Yes, this is caused by the current hashing algorithm that the sender
uses to find matches for moved data.  The current hash table has a fixed
size of 65536 slots, and can get overloaded for really large files.
...


Would it make more sense just to make rsync pick a more sane blocksize for 
very large files?  I say that without knowing how rsync selects the 
blocksize, but I'm assuming that if a 65k entry hash table is getting 
overloaded, it must be using something way too small.  Should it be scaling 
the blocksize with a power-of-2 algorithm rather than the hash table (based 
on filesize)?


I know that may result in more network traffic as a bigger block containing 
a difference will be considered changed and need to be sent instead of 
smaller blocks, but in some circumstances wasting a little more network 
bandwidth may be wholly warranted.  Then maybe the hash table size doesn't 
matter, since there are fewer blocks to check.


I haven't tested to see if that would work.  Will -B accept a value of 
something large like 16meg?  At my data rates, that's about a half a second 
of network bandwidth, and seems entirely reasonable.


Evan
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Error while transferring large files

2006-11-23 Thread Peter-Jan Deweirdt
Hi,

 

I'm using rsync to back up my data between my Linux machine (SUSE 10.1) and
Windows (XP).

I've mounted a Windows share on my Linux machine and I'm now trying to copy
files to the Windows machine with rsync.

The Windows share is an NTFS filesystem.

It's all working except that I get an error on large files.

 

This is the command that I'm trying:

rsync -a --no-o --delete /public/Games/Call_of_Duty2/
/backup/public/Games/Call_of_Duty2/

 

and this is the error I'm receiving on this command:

rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)

rsync: write failed on /backup/public/Games/Call_of_Duty2/dev-cod2.iso:
File too large (27)

rsync error: error in file IO (code 11) at receiver.c(258) [receiver=2.6.9]

rsync: connection unexpectedly closed (76 bytes received so far) [generator]

rsync error: error in rsync protocol data stream (code 12) at io.c(453)
[generator=2.6.9]

rsync: connection unexpectedly closed (36 bytes received so far) [sender]

rsync error: error in rsync protocol data stream (code 12) at io.c(453)
[sender=2.6.9]

 

 

Then I tried (as described on the website) with strace in front. The
command format was as follows:

strace -f rsync -a --no-o --delete /public/Games/Call_of_Duty2/
/backup/public/Games/Call_of_Duty2/ 2>./tracedump.log

 

 

This is the part of the tracedump.log file where it went wrong.

 

[pid   826] select(1, [0], [], NULL, {60, 0} unfinished ...

[pid   824] ... write resumed )   = 32768

[pid   826] ... select resumed )  = 1 (in [0], left {60, 0})

[pid   824] select(5, NULL, [4], [4], {60, 0} unfinished ...

[pid   826] read(0, \326\3357\233`\v\t\26\3352G\33\210\314
\221\371\373\20..., 8184) = 8184

[pid   826] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})

[pid   826] read(0,
\232\t\316S\177\300\373\362\350\26\371\311\255v\352\212..., 8184) = 8184

[pid   826] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})

[pid   826] read(0,  unfinished ...

[pid   824] ... select resumed )  = 1 (out [4], left {60, 0})

[pid   826] ... read resumed
\344\31\f\7\360e\17\301\227\25\202/\233\324i\256\246\0..., 8184) = 8184

[pid   824] write(4, \0\200\0\0, 4 unfinished ...

[pid   826] select(1, [0], [], NULL, {60, 0} unfinished ...

[pid   824] ... write resumed )   = 4

[pid   826] ... select resumed )  = 1 (in [0], left {60, 0})

[pid   824] read(3,  unfinished ...

[pid   826] read(0,  unfinished ...

[pid   824] ... read resumed
\31\310A\232o\356\30r\223p\215B\27\313\337\357\371\364..., 262144) =
262144

[pid   826] ... read resumed
\3068\340\203\337\254\360\374%\217\321\320\352\343q\17..., 8184) = 8184

[pid   824] select(5, NULL, [4], [4], {60, 0} unfinished ...

[pid   826] select(1, [0], [], NULL, {60, 0} unfinished ...

[pid   824] ... select resumed )  = 1 (out [4], left {60, 0})

[pid   826] ... select resumed )  = 1 (in [0], left {60, 0})

[pid   824] write(4,
\31\310A\232o\356\30r\223p\215B\27\313\337\357\371\364..., 32768
unfinished ...

[pid   826] read(0,  unfinished ...

[pid   824] ... write resumed )   = 32768

[pid   826] ... read resumed
\347\235\303/\210G\365\n\304\325\327\335A\374\272\31\320..., 8184) = 8184

[pid   824] select(5, NULL, [4], [4], {60, 0} unfinished ...

[pid   826] write(1,
x\17\240\2267\376m\257G\271\373\217\264\375\325\312\302..., 262144) =
262143

[pid   826] write(1, 5, 1)= -1 EFBIG (File too large)

[pid   826] --- SIGXFSZ (File size limit exceeded) @ 0 (0) ---

[pid   826] write(4, ^\0\0\10rsync: write failed on \/bac..., 98) = 98

[pid   826] rt_sigaction(SIGUSR1, {SIG_IGN},  unfinished ...

[pid   825] ... select resumed )  = 1 (in [3], left {2, 488000})

[pid   826] ... rt_sigaction resumed NULL, 8) = 0

[pid   825] read(3,  unfinished ...

[pid   826] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0

[pid   826] unlink(.dev-cod2.iso.GrkNfL unfinished ...

[pid   825] ... read resumed ^\0\0\10, 4) = 4

[pid   825] select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})

[pid   825] read(3, rsync: write failed on \/backup/..., 94) = 94

[pid   825] select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0})

[pid   825] write(1, ^\0\0\10rsync: write failed on \/bac..., 98) = 98

[pid   825] time(NULL)  = 1164306617

[pid   825] select(4, [3], [], NULL, {60, 0} unfinished ...

[pid   826] ... unlink resumed )  = 0

[pid   826] write(4, L\0\0\10rsync error: error in file I..., 80) = 80

[pid   826] exit_group(11)  = ?

Process 826 detached

[pid   824] ... select resumed )  = 1 (out [4], left {59, 88})

[pid   824] write(4, \0\200\0\0, 4)   = -1 EPIPE (Broken pipe)

[pid   824] --- SIGPIPE (Broken pipe) @ 0 (0) ---

[pid   824] write(2, rsync: writefd_unbuffered failed..., 76rsync:
writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)) = 76

[pid   824] write(2, \n, 1

)   = 1

[pid   824] select(6, [5], [], NULL, {30, 0}) = 1 (in [5], left {30, 0})

[pid   825

Rsync hangs on large files over stunnel

2006-10-31 Thread Gene Quinn
Greetings.

Here's my setup:

On the server -
rsync 2.5.6  protocol version 26
stunnel 4.04 on i686-suse-linux-gnu PTHREAD with OpenSSL 0.9.7b

On the client - 
rsync  version 2.6.6  protocol version 29
stunnel 4.14 on i686-suse-linux-gnu UCONTEXT+POLL+IPv4+LIBWRAP with OpenSSL 
0.9.8a

Both ends run rsync as root
The rsync daemon listens on a non-default port that is only bound to 127.0.0.1.
Stunnel securely proxies between an exposed high port and the rsync port.

The client is configured to pull from the server
The connection works.
Rsync starts
A series of small files are synchronized
The first moderately large file causes rsync to hang, typically at 66% 
completion.  The file is approximately 25 Meg.

netstat on both sides appears to show an established connection, but without 
any activity after the initial series of files are transferred.

I have tried using the --protocol option to force use of protocol 26.
I have tried using the --no-blocking-io

Any help you can offer is very much appreciated.

Regards

Gene





--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: large files not being synced properly while being uploaded to samba share

2006-10-06 Thread Mike Daws
I've experienced cases like that.
I've been able to repair the file with an rsync -I, although this
doesn't address the cause of the problem.

On Tue, 2006-10-03 at 14:42 -0500, Mark Osborne wrote:
 
 Hello, 
 
 I have run into an issue with rsync that I’m hoping someone can help
 with.  We are using rsync to mirror data between a samba share on an
 internal staging server and our production ftp servers.  The rsync
 runs from cron every 15 minutes.  Occasionally, the rsync will run
 while somebody is uploading a large file to the samba share (for
 instance an iso image).  The file appears to make it out to the
 production ftp servers, and an ls shows it to have the correct file
 size and timestamp.  However, an md5sum of the file shows that it is
 different from the file on the staging server.  Subsequent runs of the
 rsync do not update the file.  I have tried to run the rsync manually
 with the -c flag even though we wouldn’t really want to implement that
 because of how long it makes rsync take.  Even with checksum turned
 on, the file still did not get correctly updated.  If the file is
 completely uploaded to the share before the rsync runs there does not
 appear to be an issue. 
 
 Originally I thought that there may be a problem with different
 versions of rsync on the servers.   The staging server was running
 rsync 2.5.5 while the production servers were running 2.5.7.  I have
 gotten rsync 2.6.8 on both servers and am still experiencing the
 problem.   
 
 More information about the servers 
 
 Staging server – Solaris 8, rsync 2.6.8 
 Ftp1 – Redhat AS 2.1, rsync 2.6.8 
 Ftp2 – Redhat AS 2.1, rsync 2.6.8 
 
 Has anybody else seen this problem or have any ideas?   
 
 Thanks, 
 Mark 
 
 ~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
 Mark Osborne
 Web Systems Engineer
 [EMAIL PROTECTED]
 (512) 683-5019
 ~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~ 
 -- 
 To unsubscribe or change options: 
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


large files not being synced properly while being uploaded to samba share

2006-10-03 Thread Mark Osborne

Hello,

I have run into an issue with rsync
that I’m hoping someone can help with. We are using rsync to mirror
data between a samba share on an internal staging server and our production
ftp servers. The rsync runs from cron every 15 minutes. Occasionally,
the rsync will run while somebody is uploading a large file to the samba
share (for instance an iso image). The file appears to make it out
to the production ftp servers, and an ls shows it to have the correct file
size and timestamp. However, an md5sum of the file shows that it
is different from the file on the staging server. Subsequent runs
of the rsync do not update the file. I have tried to run the rsync
manually with the -c flag even though we wouldn’t really want to implement
that because of how long it makes rsync take.  Even with checksum
turned on, the file still did not get correctly updated. If the file
is completely uploaded to the share before the rsync runs there does not
appear to be an issue.

Originally I thought that there
may be a problem with different versions of rsync on the servers. 
The staging server was running rsync 2.5.5 while the production servers
were running 2.5.7. I have gotten rsync 2.6.8 on both servers and
am still experiencing the problem. 

More information about the servers

Staging server – Solaris 8, rsync
2.6.8
Ftp1 – Redhat AS 2.1, rsync 2.6.8

Ftp2 – Redhat AS 2.1, rsync 2.6.8

Has anybody else seen this problem
or have any ideas? 

Thanks,
Mark

~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
Mark Osborne
Web Systems Engineer
[EMAIL PROTECTED]
(512) 683-5019
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Problems with rsync, large files and partial.

2006-04-24 Thread Panos Koutsoyannis
I can't seem to get rsync to restart where it left off
when I am syncing a large file ( 5GB).  Below is some
info on what I have been doing.  If someone has the
energy to barrel through my comments that would be
great.  Out of curiosity is there an alternative to
rsync for large files?

I believe I looked all over the web for answers and I
found partial answers, but it just didn't seem clear if
this works.  If anyone has pointers or explanations it
would be most helpful.  Here is my setup and some
explanation:

- I use rsync over ssh to sync over our wan.
- I sync over an 8 hour window every night.
- After 8 hours if the sync is not complete, it gets
killed and restarts the next evening.
- Everything works as expected except for one very
large file which always has to restart from the
beginning even though I use the --partial option.
- I have tried it with and without compression.
- I have tried it with various versions of rsync
including the latest.
- There were some posts that seemed to imply that if the
file already existed on the backup system then using
--append might help, but that didn't make sense.
- I have tried with --whole-file and --no-whole-file and
that does not make a difference.
- I do notice a bunch of .file_name.gibberish files in the
target directory, which are the partial backups, but it
does not seem to use them in subsequent tries.


It just seems the right thing to do would be for the
rsync to continue where it left off the night before,
but it doesn't.

- Here is my command
rsync --archive --verbose --progress -partial --stats
-e ssh xxx.xxx.xxx:/dir


panos

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Problems with rsync, large files and partial.

2006-04-24 Thread Paul Slootman
On Mon 24 Apr 2006, Panos Koutsoyannis wrote:
 
 - I use rsync over ssh to sync over our wan.
 - I sync over an 8 hour window every night.
 - After 8 hours if the sync is not complete, it gets
 killed and restarts the next evening.

How do you kill it? Via kill -9?

 - I do notice a bunch .file_name.gibberish in the
 target  directory which is the partial backup, but it
 does not seem to use them in subsequent tries.
 
 
 It just seems the right thing to do would be for the
 rsync to continue where it left off the night before,
 but it doesn't.
 
 - Here is my command
 rsync --archive --verbose --progress -partial --stats
 -e ssh xxx.xxx.xxx:/dir

With --partial it should rename the .file_name.gibberish file to
file_name when interrupted, so that it can resume using the partial file
as a start (which it won't do with .file_name.gibberish).
That's why I suspect you're not stopping rsync politely...
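
If it helps, the nightly window can be enforced with a small wrapper along
these lines (a sketch only; the paths and timings are placeholders), so that
rsync gets a plain SIGTERM rather than kill -9:

import subprocess

CMD = ["rsync", "--archive", "--partial", "--stats",
       "-e", "ssh", "remotehost:/dir", "/local/dir"]   # placeholder paths

def run_window(cmd, window_secs=8 * 3600, grace_secs=60):
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=window_secs)
    except subprocess.TimeoutExpired:
        proc.terminate()              # SIGTERM: rsync can keep the partial file
        try:
            return proc.wait(timeout=grace_secs)
        except subprocess.TimeoutExpired:
            proc.kill()               # last resort only
            return proc.wait()

run_window(CMD)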


Paul Slootman
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Problems with rsync, large files and partial.

2006-04-24 Thread Panos Koutsoyannis
Ah... that makes sense.  I do not stop it politely,
you are right.  I will fix up the signal handling and
give it a whirl.

Thanks

Panos


--- Paul Slootman [EMAIL PROTECTED] wrote:

 On Mon 24 Apr 2006, Panos Koutsoyannis wrote:
  
  - I use rsync over ssh to sync over our wan.
  - I sync over an 8 hour window every night.
  - After 8 hours if the sync is not complete, it
 gets
  killed and restarts the next evening.
 
 How do you kill it? Via kill -9?
 
  - I do notice a bunch .file_name.gibberish in the
  target  directory which is the partial backup, but
 it
  does not seem to use them in subsequent tries.
  
  
  It just seems the right thing to do would be for
 the
  rsync to continue where it left off the night
 before,
  but it doesn't.
  
  - Here is my command
  rsync --archive --verbose --progress -partial
 --stats
  -e ssh xxx.xxx.xxx:/dir
 
 With --partial it should rename the
 .file_name.gibberish file to
 file_name when interrupted, so that it can resume
 using the partial file
 as a start (which it won't do with
 .file_name.gibberish).
 That's why I suspect you're not stopping rsync
 politely...
 
 
 Paul Slootman
 -- 
 To unsubscribe or change options:
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read:
 http://www.catb.org/~esr/faqs/smart-questions.html
 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Problems with rsync, large files and partial.

2006-04-24 Thread Panos Koutsoyannis
Just changed my scripts and that was definitely my
problem and it's fixed.

Thank you

panos


--- Panos Koutsoyannis [EMAIL PROTECTED] wrote:

 Ah... that makes sense.  I do not stop it politely,
 you are right.  I will fix up the signal handling and
 give it a whirl.
 
 Thanks
 
 Panos
 
 
 --- Paul Slootman [EMAIL PROTECTED] wrote:
 
  On Mon 24 Apr 2006, Panos Koutsoyannis wrote:
   
   - I use rsync over ssh to sync over our wan.
   - I sync over an 8 hour window every night.
   - After 8 hours if the sync is not complete, it
  gets
   killed and restarts the next evening.
  
  How do you kill it? Via kill -9?
  
   - I do notice a bunch .file_name.gibberish in
 the
   target  directory which is the partial backup,
 but
  it
   does not seem to use them in subsequent tries.
   
   
   It just seems the right thing to do would be for
  the
   rsync to continue where it left off the night
  before,
   but it doesn't.
   
   - Here is my command
   rsync --archive --verbose --progress -partial
  --stats
   -e ssh xxx.xxx.xxx:/dir
  
  With --partial it should rename the
  .file_name.gibberish file to
  file_name when interrupted, so that it can resume
  using the partial file
  as a start (which it won't do with
  .file_name.gibberish).
  That's why I suspect you're not stopping rsync
  politely...
  
  
  Paul Slootman
  -- 
  To unsubscribe or change options:
  https://lists.samba.org/mailman/listinfo/rsync
  Before posting, read:
  http://www.catb.org/~esr/faqs/smart-questions.html
  
 
 
 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam
 protection around 
 http://mail.yahoo.com 
 -- 
 To unsubscribe or change options:
 https://lists.samba.org/mailman/listinfo/rsync
 Before posting, read:
 http://www.catb.org/~esr/faqs/smart-questions.html
 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2006-01-02 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||INVALID




--- Comment #5 from [EMAIL PROTECTED]  2006-01-02 09:49 MST ---
(In reply to comment #4)
 If rsync is _not_ checksumming files, why does rsync remain in this state:
 [...]
 for maybe 30 minutes when it transfers my big file?

Because it is transferring the file.  Yes, this involves file-transfer
checksumming, but I was talking about pre-transfer checksum generation (and its
use in determining which files get transferred) which is what --checksum
enables.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2006-01-02 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #6 from [EMAIL PROTECTED]  2006-01-02 10:21 MST ---
This is weird, there is no network activity during this building file list
phase. However, as soon as it is finished, rsync saturates my network.

I thought that, if the file's size and modification date don't match, rsync
worked by creating a binary tree and then checksumming the parts between every
node, recursively to the root of the tree, and then only transferring the parts
where the checksum didn't match.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2006-01-02 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #7 from [EMAIL PROTECTED]  2006-01-02 11:02 MST ---
(In reply to comment #6)
 This is weird, there is no network activity during this building file list
 phase. However, as soon as it is finished, rsync saturates my network.

What is weird about that?  As soon as rsync outputs the 1 file to consider
message, the file-list-building stage is over, and rsync then starts to
transfer the file if it is in need of an update.  (If --checksum was specified,
the receiving rsync would first be busily checksumming the file to decide if
the file was actually changed before (possibly) starting the transfer.)

 I thought rsync worked, if the file's size and modification date doesn't
 match, by creating a binary tree and then checksumming the parts between
 every node, recursively to the root of the tree, and then only transferring
 the parts where the checksum didn't match.

There are no b-trees involved -- rsync immediately starts to send checksum info
from the receiving side to the sender, who then diffs the remote checksums with
the sending-side file and sends instructions to the receiver on how to recreate
the file using as much of the local data as possible (this new file is built in
a separate temp-file unless the --inplace option was specified).
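
In very rough terms the flow is something like this (a toy Python
illustration only -- the real rolling checksum, block sizes and wire format
are all different):

import hashlib

BLOCK = 4

def signatures(old):
    # receiver: weak + strong checksum per block, sent to the sender
    sigs = {}
    for i in range(0, len(old), BLOCK):
        blk = old[i:i + BLOCK]
        sigs.setdefault(sum(blk), {})[hashlib.md5(blk).digest()] = i // BLOCK
    return sigs

def delta(new, sigs):
    # sender: scan its copy, emit block references or literal bytes
    out, pos = [], 0
    while pos < len(new):
        blk = new[pos:pos + BLOCK]
        idx = sigs.get(sum(blk), {}).get(hashlib.md5(blk).digest())
        if idx is not None and len(blk) == BLOCK:
            out.append(("match", idx))
            pos += BLOCK
        else:
            out.append(("lit", new[pos:pos + 1]))
            pos += 1
    return out

def rebuild(old, instructions):
    # receiver: recreate the file from local blocks plus literal data
    parts = []
    for kind, val in instructions:
        parts.append(old[val * BLOCK:(val + 1) * BLOCK] if kind == "match" else val)
    return b"".join(parts)

old = b"the quick brown fox jumps"
new = b"the quick brown cat jumps"
assert rebuild(old, delta(new, signatures(old))) == new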


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2006-01-02 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #8 from [EMAIL PROTECTED]  2006-01-02 11:42 MST ---
 What is weird about that?

You wrote in a previous comment when I asked why rsync is considering a file
for 30 minutes if it is not checksumming it: 

 Because it is transferring the file. 

To which I replied that there is no noticeable network activity when rsync is in
this state. However, when it is finished with the 'consideration phase' the
network is saturated.

I think it is weird that transferring a 25 GB file doesn't generate any network
activity when rsync is in the 'consideration phase' but transferring the same
file when rsync is in another phase saturates the network.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2005-12-29 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #2 from [EMAIL PROTECTED]  2005-12-29 13:47 MST ---
Interesting, didn't know that rsync worked that way - I thought the default
behaviour was to only replace the parts of the file that had changed. Anyway,
this motivates a follow-up question:

If I understand it correctly: if you have file 1 on computer A and file 2 on
computer B, some minor changes have been made to 1, and you want to sync these
changes to B, rsync basically makes a copy of 2 and works with that. If 1/2 are
big, like in my example where they were 25-50 GB, the copy operation from 2.0 to
2.1 generates a lot of disk activity.

In my case when I rsync between two laptops all this disk activity is a little
unfortunate since laptop drives are so slow. Now to my question, is there a way
to reduce disk activity? Does the --inplace switch work around this?

Thanks.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2005-12-29 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #3 from [EMAIL PROTECTED]  2005-12-29 13:48 MST ---
Btw, I am just trying your suggestions. First I will try the --inplace switch
and secondly I will test syncing with twice the amount of space required for the
file available.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


DO NOT REPLY [Bug 3358] rsync chokes on large files

2005-12-29 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #4 from [EMAIL PROTECTED]  2005-12-29 13:54 MST ---
Sorry for spamming, but I just realised what you meant when you wrote:

You can use the --checksum option to avoid this unneeded update at the expense
of a lot of extra disk I/O to compute each file's checksum before figuring out
if a transfer is needed.


If rsync is _not_ checksumming files, why does rsync remain in this state:

building file list ... 
1 file to consider


for maybe 30 minutes when it transfers my big file?


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 3358] New: rsync chokes on large files

2005-12-28 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358

   Summary: rsync chokes on large files
   Product: rsync
   Version: 2.6.6
  Platform: PPC
OS/Version: Mac OS X
Status: NEW
  Severity: major
  Priority: P3
 Component: core
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
 QAContact: [EMAIL PROTECTED]


I am trying to rsync a 25-50 GB AES128-encrypted disk image called 'test' between
two Mac OS X machines. This is with rsync 2.6.6 (is there a 2.6.7? The front page
just says 2.6.6)


% rsync -av --progress --stats --rsh=ssh /test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11 forwarding.
tcsh: TERM: Undefined variable.
building file list ... 
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase unknown [sender]:
Broken pipe (32)
rsync: write failed on /test: No space left on device (28)
rsync error: error in file IO (code 11) at
/SourceCache/rsync/rsync-20/rsync/receiver.c(312)
rsync: connection unexpectedly closed (92 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at
/SourceCache/rsync/rsync-20/rsync/io.c(359)
rsync: connection unexpectedly closed (1240188 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(434)


The receiving machine has space left (2+ GB). Before I upgraded to 2.6.6 I had
2.6.2 on the sending machine and 2.6.3 on the receiving machine. With that
combination I got another error message:

% rsync -av --progress --stats --rsh=ssh test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11
forwarding.
tcsh: TERM: Undefined variable.
building file list ... 
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase unknown:
Broken pipe
rsync error: error in rsync protocol data stream (code 12) at
/SourceCache/rsync/rsync-14/rsync/io.c(836)



The files _should_ be identical: I first transferred them with sftp without
problems, but they will change in the future and then I want to use rsync to
keep them identical. This was just a test to verify my plan - a test that
didn't seem to work out that well.

I don't know if this matters but here is some more information about my setup:

Powerbook G3 with 10.3.9
Powerbook G4 with 10.4.3

Wireless 802.11G-network between router and G4, wired network between G3 and
router. The router is a Linksys WRT54GS.

Both the older versions and the most recent versions work very well when I
work with smaller files (for example, I synchronized 40 GB of mp3s without
problems).


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[Bug 3358] rsync chokes on large files

2005-12-28 Thread bugzilla-daemon
https://bugzilla.samba.org/show_bug.cgi?id=3358





--- Comment #1 from [EMAIL PROTECTED]  2005-12-28 11:21 MST ---
The pertinent error is this:

rsync: write failed on /test: No space left on device (28)

That is an error from your OS that indicates that there was no room to write
out the destination file.  Keep in mind that when rsync updates a file, it
creates a new version of the file (unless --inplace was specified), so your
destination directory needs to have enough free space available to hold the
largest updated file.

As for why the file is updating, if the modified time and size don't match,
rsync will update the file (efficiently).  You can use the --checksum option to
avoid this unneeded update at the expense of a lot of extra disk I/O to compute
each file's checksum before figuring out if a transfer is needed.
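
The decision being described is roughly this (an illustrative sketch, not
rsync's source; MD5 here is just a stand-in for the checksum):

import hashlib, os

def file_digest(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

def needs_transfer(src, dst, use_checksum=False):
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if use_checksum:
        return file_digest(src) != file_digest(dst)   # lots of extra disk I/O
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)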


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync and large files

2005-08-17 Thread Wayne Davison
On Mon, Aug 15, 2005 at 12:11:35PM -0400, Sameer Kamat wrote:
 My question is, I am observing that the data being sent over is almost
 equal to the size of the file. Would an insertion of a few blocks in a
 binary file, move the alignment of the entire file and cause this to
 happen?

That depends on the file and your options.  Is the file compressed?  If
so, all data after the change is radically different, and will not match
(unless you use an rsync-friendly compression algorithm, such as
gzip --rsyncable).   If that's not the case, are you using --inplace?
(That option specifically mentions that it doesn't handle early
insertions well.)  Or is --whole-file being specified or implied?  (It is
implied by a local transfer, so specify --no-whole-file if you need to
test using a local transfer.)

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


rsync and large files

2005-08-15 Thread Sameer Kamat



Hello,

I have a few files of the order of 50G that get synchronized to a remote
server over ssh. These files have binary data and they change before the
next time they are synchronized over. My question is, I am observing that
the data being sent over is almost equal to the size of the file. Would an
insertion of a few blocks in a binary file move the alignment of the entire
file and cause this to happen? Does rsync internally understand that the
file is almost the same, but the data is just skewed a little bit?

Please advise.

Thanks,
Sameer.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Transferring very large files / and restarting failures

2005-07-28 Thread Wayne Davison
On Wed, Jul 27, 2005 at 04:29:46PM -0700, Todd Papaioannou wrote:
 Not sure I have the mojo to mess with the patches though! 

I applied the --append patch to the CVS source, so if you want to snag
version 2.6.7cvs, you can grab it via the latest nightly tar file:

http://rsync.samba.org/ftp/rsync/nightly/

I did some very simple timing tests to see how fast it would be to do a
local transfer of a 250MB file that had a little over the first half of
the file already present.  The results were:

Normal --whole-file:  ~31 seconds  (straight copy, data speedup 1.00)
Forced --no-whole-file:   ~73 seconds  (data speedup 2.39)
Using --inplace:  ~30 seconds  (data speedup 2.39)
Using --update:   ~24 seconds  (data speedup 2.40)

(The data speedup values are rsync's standard speedup values, which
indicate how much the transferred data was reduced over the wire.)

Also keep in mind that even though the --append option is writing out
less than 1/2 the file, it is still reading in all the existing data in
the partial file in order to compute the full-file checksum.
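
Conceptually the append mode does something like the following (a local,
single-process sketch of the idea only -- the real option works over the
rsync protocol):

import hashlib, os

def append_sync(src, dst):
    have = os.path.getsize(dst) if os.path.exists(dst) else 0
    whole = hashlib.md5()
    if have:
        with open(dst, "rb") as existing:    # existing data is still read
            for chunk in iter(lambda: existing.read(1 << 20), b""):
                whole.update(chunk)
    with open(src, "rb") as s, open(dst, "ab") as d:
        s.seek(have)                          # transfer only the tail
        for chunk in iter(lambda: s.read(1 << 20), b""):
            d.write(chunk)
            whole.update(chunk)
    return whole.hexdigest()                  # compared against the sender's digest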

I then tried the same transfer test between two systems on my local
wireless network (11g).  Here are the results:

Forced --whole-file: ~134 seconds  (straight copy, data speedup 1.00)
Normal --no-whole-file:  ~100 seconds  (data speedup 2.39)
Using --inplace:  ~95 seconds  (data speedup 2.39)
Using --update:   ~60 seconds  (data speedup 2.40)

 is there another protocol you might know of, other than ftp that
 supports byte level restart/append? 

I can recall seeing the continue feature in wget and bittorrent.

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Transferring very large files / and restarting failures

2005-07-27 Thread Todd Papaioannou
Hi,

My situation is that I would like to use rsync to copy very large files 
within my network/systems. Specifically, these files are
in the order of 10-100GB. Needless to say, I would like to be able
to restart a transfer if it only partially succeeded, but NOT repeat
the work already done. 

Currently, I am initiating the transfer with this command:

rsync --partial --progress theFile /path/to/dest

where both theFile and /path/to/dest are local drives. In the future
/path/to/dest will be an NFS mount.

This succeeds in writing theFile to the destination as bytes flow.
I.e. I get a partial file there, until the full transfer is successful.

Now, say something failed. I want to restart that transfer, and
am trying something like:

rsync -u --no-whole-file --progress theFile /path/to/dest

However, the stats shown during the progress seem to imply that 
the whole transfer is starting again. 

Can someone help me out with the correct options to ensure that
if I want to restart a copy I can take advantage of the bytes that have
already been transferred?

Many Thanks

Todd


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Transferring very large files / and restarting failures (again)

2005-07-27 Thread Todd Papaioannou
Woops! In my last email, I meant to say the second command
was:

rsync --no-whole-file --progress theFile /path/to/dest

Todd



Hi,

My situation is that I would like to use rsync to copy very large files 
within my network/systems. Specifically, these files are
in the order of 10-100GB. Needless to say, I would like to be able
to restart a transfer if it only partially succeeded, but NOT repeat
the work already done. 

Currently, I am initiating the transfer with this command:

rsync --partial --progress theFile /path/to/dest

where both theFile and /path/to/dest are local drives. In the future
/path/to/dest will be an NFS mount.

This succeeds in writing theFile to the destination as bytes flow.
I.e. I get a partial file there, until the full transfer is successful.

Now, say something failed. I want to restart that transfer, and
am trying something like:

rsync -u --no-whole-file --progress theFile /path/to/dest

However, the stats shown during the progress seem to imply that 
the whole transfer is starting again. 

Can someone help me out with the correct options to ensure that
if I want to restart a copy I can take advantage of the bytes that have
already been transferred?

Many Thanks

Todd


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Transferring very large files / and restarting failures

2005-07-27 Thread Wayne Davison
On Wed, Jul 27, 2005 at 01:50:39PM -0700, Todd Papaioannou wrote:
 where both theFile and /path/to/dest are local drives. [...]
 rsync -u --no-whole-file --progress theFile /path/to/dest

When using local drives, the rsync protocol (--no-whole-file) slows
things down, so you don't want to use it (the rsync protocol's purpose
is to trade disk I/O and CPU cycles to reduce network bandwidth, so it
doesn't help when the transfer bandwidth is very high, as it is in a
local copy).

Note also that you're not preserving the file times, which makes rsync
less efficient (which forces you to use the -u option to avoid a
retransfer) -- you're usually better off using -t (--times) unless you
have some overriding reason to omit it.

 However, the stats shown during the progress seem to imply that the
 whole transfer is starting again. 

Yes, that's what rsync does.   It retransfers the whole file, but it
uses the local data to make the amount of data flowing over the socket
(or pipe) smaller.  The already-sent data is thus coming from the
original, partially-transferred file rather than coming from the
sender (which would lower the network bandwidth if this were a
remote connection).

 In the future /path/to/dest will be an NFS mount.

You don't want to do that unless your network speed is higher than
your disk speed -- with slower net speeds you are better off rsyncing
directly to the remote machine that is the source of the NFS mount so
that rsync can reduce the amount of data it is sending.  With higher net
speeds you're better off just transferring the data via --whole-file and
not using --partial.  One other possibility is the --append option from
the patch named patches/append.diff -- this implements a more efficient
append mode for incremental transfers (I'm considering adding this to
the next version of rsync).

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: Transferring very large files / and restarting failures

2005-07-27 Thread Todd Papaioannou
Wayne,

Thanks for the swift answers and insight.  

  However, the stats shown during the progress seem to imply that the 
  whole transfer is starting again.
 
 Yes, that's what rsync does.   It retransfers the whole file, but it
 uses the local data to make the amount of data flowing over 
 the socket (or pipe) smaller.  The already-sent data is thus 
 coming from the original, partially-transferred file rather 
 than coming from the sender (which would lower the network 
 bandwidth if this were a remote connection).

Hmm, OK. I guess my mental model of what rsync does is wrong.
If I read this correctly, when I'm doing a local-to-local copy I get no
benefit from re-using the partial copy. If, however, I were doing
a remote copy, I would definitely get a benefit.

  In the future /path/to/dest will be an NFS mount.
 
 You don't want to do that unless your network speed is 
 higher than your disk speed -- with slower net speeds you are 
 better off rsyncing directly to the remote machine that is 
 the source of the NFS mount so that rsync can reduce the 
 amount of data it is sending.  With higher net speeds you're 
 better off just transferring the data via --whole-file and 
 not using --partial.  One other possibility is the --append 
 option from the patch named patches/append.diff -- this 
 implements a more efficient append mode for incremental 
 transfers (I'm considering adding this to the next version of rsync).

Ahh, that sounds like what I'm looking for. I was hoping rsync
supported something like ftp restart, which would restart the file
transfer down to the byte level. I'll give it a look. Not sure I have the 
mojo to mess with the patches though! 

By the way, is there another protocol you might know of, other than ftp
that supports byte level restart/append? 

Thanks

Todd


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Problem with rsync --inplace very slow/hung on large files

2005-03-15 Thread Evan Harris

I'm trying to rsync a very large (62gig) file from one machine to another as
part of a nightly backup.  If the file does not exist at the destination, it
takes about 2.5 hours to copy in my environment.

But, if the file does exist and --inplace is specified, and the file
contents differ, rsync either is so significantly slowed as to take more
than 30 hours (the longest I've let an instance run), or it is just hung.

Running with -vvv gives this as the last few lines of the output:

match at 205401064 last_match=205401064 j=821 len=250184 n=0
match at 205651248 last_match=205651248 j=822 len=250184 n=0
match at 205901432 last_match=205901432 j=823 len=250184 n=0
match at 206151616 last_match=206151616 j=824 len=250184 n=0

at which point it has not printed anything else since I last looked at the
current run attempt about 8 hours ago.

Doing an strace on the rsync processes on the sending and receiving machines
it appears that there is still reading and writing going on, but there isn't
any output from the -vvv and I can't tell if it's really doing anything.

Is this excessive slowness just an artifact of doing an rsync --inplace on
such a large file, and it will eventually complete if let run long enough?

I would try testing without the --inplace, but the system in question
doesn't have enough disk space for two copies of that size file, which is
why I am using --inplace.

Using 2.6.3, on Debian.  Any help appreciated.

Evan

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

