RE: How to make rsync faster?

2007-11-16 Thread Tang, Clayton (Yiqi)

Thanks for the reply. Unfortunately, people here are very conservative
and shy away from kernel modules. I will look into the pull process.


Regards,
Clayton
--
Clayton (Yiqi) Tang, LMX / Autotrader Production Management
212-526-7493, 745-7th Ave, New York, NY 10019

-----Original Message-----
From: Chris (Ducky) Chapin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 15, 2007 4:50 PM
To: Tang, Clayton (Yiqi)
Cc: rsync@lists.samba.org
Subject: Re: How to make rsync faster?


1) Yes! 2.6.x especially helps with memory.
2) Not that I've seen, but I'd be really interested!
3) We've had great luck with (Open)AFS, though it's not for everyone,
not even in our environment. =) (Having to load a kernel module is the #1
complaint.) rsync allows us to accommodate those who don't wish to use
AFS.

Here's what we're doing:

Roughly 10% (~1k hosts) of our install-base use rsync as an alternative
to AFS (our system configuration and application store). About 250M is
checked hourly, though as often as every 15 minutes for more
time-sensitive systems. We've tossed around the idea of using batch-mode,
but it unfortunately doesn't fit our model: it's basically a huge buffet
of data from which each host picks and chooses the trees to keep in sync.

What we've found is that client-initiated pulls scale much better than
pushes from a central server. We have each host sleep for a random amount
of time using the hostname as a seed (so it's the same from run to run)
before initiating the rsync. This causes multiple rsyncs to be run on
the server, but it can handle dozens of connections at a time without
issue, especially after the switch to 2.6 versions of rsync.

We also have multiple servers from which the client can rsync, but
that is handled similarly to the timing: a host randomly picks a server
from a list using the hostname as the seed. The servers are monitored for
load and new ones added appropriately. Our client-to-server ratio is
close to 50:1.

-Ducky

Tang, Clayton (Yiqi) wrote:
 I manage 250+ Red Hat Linux boxes. The boxes are all set up the same
 way. On a daily basis, we sync the app directory, which is about 30 GB,
 out to all hosts. The daily delta is actually less than 1 GB, but since
 I can't be sure whether any individual box was tampered with during the
 day, I always do a full sync. On a monthly basis, we run with --delete
 to clean out the stale files on the hosts.

 The command I use daily is: /usr/bin/rsync -a -e ssh, with a ksh for
 loop on the 250+ host names. The version is: rsync  version 2.5.7
 protocol version 26

 Since rsync must do a checksum on the local and remote box on all
 files, the whole sync process takes over 2 hrs even if nothing was
 changed.

 My questions are:

 1) I know I have an old version; are there performance improvements in
 the later versions? I am not the SA, and the process to request a new
 install is lengthy.

 2) Is there a parallel rsync program? Looping 250 times to invoke
 causes rsync to checksum the local files 250 times, which is a waste
 of resources. Can parallel rsync be considered for a future version?

 3) Are there better ways to achieve what I need to do with rsync or
 another tool?

 Thank you,
 Clayton

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
 - - - - - - - - -

 This message is intended only for the personal and confidential use of
the designated recipient(s) named above.  If you are not the intended
recipient of this message you are hereby notified that any review,
dissemination, distribution or copying of this message is strictly
prohibited.  This communication is for information purposes only and
should not be regarded as an offer to sell or as a solicitation of an
offer to buy any financial product, an official confirmation of any
transaction, or as an official statement of Lehman Brothers.  Email
transmission cannot be guaranteed to be secure or error-free.
Therefore, we do not represent that this information is complete or
accurate and it should not be relied upon as such.  All information is
subject to change without notice.

 
 IRS Circular 230 Disclosure:
 Please be advised that any discussion of U.S. tax matters contained
within this communication (including any attachments) is not intended or
written to be used and cannot be used for the purpose of (i) avoiding
U.S. tax related penalties or (ii) promoting, marketing or recommending
to another party any transaction or matter addressed herein.


   

RE: How to make rsync faster?

2007-11-16 Thread Tang, Clayton (Yiqi)

Thanks for the reply. How safe is it NOT to checksum? Does rsync use size
instead, or date instead, or both together?

Actually, splitting is what I just did. I split the 250 hosts into 4 lists
and am running 4 rsync jobs from the master in parallel. This causes
80%-90% total CPU usage and still runs for about 50 min...
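
The split-and-run-in-parallel approach described above might be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names, the source and destination paths, and the round-robin way of dealing hosts into lists are hypothetical, and only the `/usr/bin/rsync -a -e ssh` invocation comes from this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_hosts(hosts, n_lists):
    """Deal hosts round-robin into n_lists lists of near-equal length."""
    return [hosts[i::n_lists] for i in range(n_lists)]


def push_to_hosts(hosts, src, dest, run=subprocess.call):
    """Sync src to each host in turn; return {host: rsync exit status}."""
    results = {}
    for host in hosts:
        # Same command as in the thread; src and dest are placeholders.
        argv = ["/usr/bin/rsync", "-a", "-e", "ssh", src, f"{host}:{dest}"]
        results[host] = run(argv)
    return results


def push_in_parallel(hosts, src, dest, n_lists=4, run=subprocess.call):
    """Run one sequential push per host list, with the lists in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_lists) as pool:
        futures = [
            pool.submit(push_to_hosts, chunk, src, dest, run)
            for chunk in split_hosts(hosts, n_lists)
        ]
        for future in futures:
            results.update(future.result())
    return results
```

Calling `push_in_parallel(hosts, "/app/", "/app/", n_lists=4)` would mirror the four-list setup; hosts with a non-zero exit status in the returned dictionary are the ones to retry. Since the master was already at 80%-90% CPU with four jobs, raising `n_lists` further may not help much.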


Regards,
Clayton
--
Clayton (Yiqi) Tang, LMX / Autotrader Production Management
212-526-7493, 745-7th Ave, New York, NY 10019

-----Original Message-----
From: Craig Hammond [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 15, 2007 5:57 PM
To: Tang, Clayton (Yiqi); rsync@lists.samba.org
Subject: RE: How to make rsync faster?

I'm no rsync guru by any means, but two things spring to mind.

Use the -t option to stop all the spurious checksumming.

Split your script into multiple scripts, each with a share of the host
names. Run each in parallel. Multiple rsyncs can run on the one box
concurrently.

Craig


--
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: How to make rsync faster?

2007-11-16 Thread Craig Hammond

-t is good enough for me. It checks size and time and would be heaps
faster than a checksum.
It has never caused me a problem as yet.

Whether size & time over checksum is good enough for you is up to you.

Craig...
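
For reference, the size-and-time comparison mentioned above can be sketched in Python as below. This is a simplified illustration, not rsync's own code: the function name `quick_check_differs` and the whole-second rounding of modification times are assumptions. By default rsync skips a file whose size and modification time match on both sides, and only -c (--checksum) forces a full checksum of every file. The -t option preserves modification times so that this comparison works on later runs, and -a already includes -t.

```python
import os


def quick_check_differs(src_path, dst_path):
    """Return True if dst is missing or differs from src in size or mtime.

    Contents are never read, so a file changed without altering its
    size or modification time would go unnoticed.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True
    src = os.stat(src_path)
    # Compare sizes exactly and modification times to the whole second.
    return (src.st_size != dst.st_size
            or int(src.st_mtime) != int(dst.st_mtime))
```

The trade-off is the one raised in the thread: the check is cheap because it only looks at metadata, but it cannot detect tampering that deliberately keeps size and timestamp intact.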


How to make rsync faster?

2007-11-15 Thread Tang, Clayton (Yiqi)

I manage 250+ Red Hat Linux boxes. The boxes are all set up the same way.
On a daily basis, we sync the app directory, which is about 30 GB, out to
all hosts. The daily delta is actually less than 1 GB, but since I can't
be sure whether any individual box was tampered with during the day, I
always do a full sync. On a monthly basis, we run with --delete to clean
out the stale files on the hosts.

The command I use daily is: /usr/bin/rsync -a -e ssh, with a ksh for
loop on the 250+ host names.
The version is: rsync  version 2.5.7  protocol version 26

Since rsync must do a checksum on the local and remote box on all files,
the whole sync process takes over 2 hrs even if nothing was changed.

My questions are:

1) I know I have an old version; are there performance improvements in
the later versions? I am not the SA, and the process to request a new
install is lengthy.

2) Is there a parallel rsync program? Looping 250 times to invoke
causes rsync to checksum the local files 250 times, which is a waste of
resources. Can parallel rsync be considered for a future version?

3) Are there better ways to achieve what I need to do with rsync or
another tool?

Thank you,
Clayton



Re: How to make rsync faster?

2007-11-15 Thread Maxim Veksler



Hello Tang,

First, for such an operation you should RTFM about rsync batch mode [1].

Second, if I were you I would look for other solutions. Perhaps a
shared NFS storage or a replicated FS based on DRBD. Using rsync sounds
like a quick hack to me, from when you had 2 servers and 0 time to market.

I would love to hear other suggestions people have on this list for your issue.

[1] http://samba.anu.edu.au/ftp/rsync/rsync.html

-- 
Cheers,
Maxim Veksler

Free as in Freedom - Do u GNU ?


Re: How to make rsync faster?

2007-11-15 Thread Chris (Ducky) Chapin


1) Yes! 2.6.x especially helps with memory.
2) Not that I've seen, but I'd be really interested!
3) We've had great luck with (Open)AFS, though it's not for everyone, 
not even in our environment. =) (Having to load a kernel module is the #1 
complaint.) rsync allows us to accommodate those who don't wish to use AFS.


Here's what we're doing:

Roughly 10% (~1k hosts) of our install-base use rsync as an alternative 
to AFS (our system configuration and application store). About 250M is 
checked hourly, though as often as every 15 minutes for more 
time-sensitive systems. We've tossed around the idea of using batch-mode, 
but it unfortunately doesn't fit our model: it's basically a huge buffet 
of data from which each host picks and chooses the trees to keep in sync.


What we've found is that client-initiated pulls scale much better than 
pushes from a central server. We have each host sleep for a random amount of 
time using the hostname as a seed (so it's the same from run to run) 
before initiating the rsync. This causes multiple rsyncs to be run on 
the server, but it can handle dozens of connections at a time without 
issue, especially after the switch to 2.6 versions of rsync.
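
A minimal sketch of the hostname-seeded delay described above, in Python. The function name `stable_delay`, the one-hour window, and the use of a SHA-256 hash in place of a seeded random number generator are assumptions made for illustration; the script actually used in this setup is not shown in the thread.

```python
import hashlib


def stable_delay(hostname, window_seconds=3600):
    """Map a hostname to a fixed delay in [0, window_seconds).

    Hashing the hostname gives a value that looks random across hosts
    but is identical from run to run for any one host.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Possible use on each client before it starts its pull:
#   import socket, time
#   time.sleep(stable_delay(socket.gethostname(), window_seconds=900))
```

Because the delay depends only on the hostname, a given client always lands in the same slot of the window, which spreads the load on the server without any coordination between clients.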


We also have multiple servers from which the client can rsync, but 
that is handled similarly to the timing: a host randomly picks a server 
from a list using the hostname as the seed. The servers are monitored for 
load and new ones added appropriately. Our client-to-server ratio is 
close to 50:1.
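
The server selection can be sketched along the same lines. Again this is an assumption-laden illustration rather than the actual script: `pick_server` and any server names passed to it are hypothetical, and a hash of the hostname stands in for a random generator seeded with it.

```python
import hashlib


def pick_server(hostname, servers):
    """Choose one server from the list, always the same one for a host."""
    if not servers:
        raise ValueError("server list is empty")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

One property of this simple modulo scheme is worth keeping in mind: when a server is added to or removed from the list, many hosts will be mapped to a different server than before. That is usually harmless when every server carries the same data.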


-Ducky



RE: How to make rsync faster?

2007-11-15 Thread Craig Hammond

I'm no rsync guru by any means, but two things spring to mind.

Use the -t option to stop all the spurious checksumming.

Split your script into multiple scripts, each with a share of the host
names. Run each in parallel. Multiple rsyncs can run on the one box
concurrently.

Craig

