Re: rsync mechanics question

2007-05-09 Thread Izidor Jerebic


On 10.5.2007, at 2:16, Tom Riley wrote:


> I have 2 mounts on a single computer the production mount is 100gigs
> (/msgstore) and contains rough 17 million small files (email message
> store), and a newly created 500g ufs file system (/mnt)
>
> However, the curiosity comes in with my source data taking up 86gigs of
> data on a 100g partition, and as the copy progresses the destination
> drive is reporting 240 gigs of usage.
>
> So as far as I can tell, rsync is working and the data integrity seems
> good, it's simply taking up 2.5 times the space.


The simplest possibility is that the two partitions use different
block sizes, so a small file that occupies one block takes more disk
space on the partition with the larger block size.
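
As a rough illustration of the rounding involved (the 5000-byte file and
the 1024- and 8192-byte allocation units are made-up numbers for the
sketch, not values read from either of the two file systems):

```shell
# allocated_bytes SIZE UNIT: round SIZE up to the next multiple of UNIT,
# where UNIT stands for the smallest piece the file system hands out.
allocated_bytes() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

allocated_bytes 5000 1024   # prints 5120
allocated_bytes 5000 8192   # prints 8192
```

With millions of small files, that per-file difference adds up.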


izidor

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync mechanics question

2007-05-09 Thread Jamie Lokier
Tom Riley wrote:
> However, the curiosity comes in with my source data taking up 86gigs of
> data on a 100g partition, and as the copy progresses the destination
> drive is reporting 240 gigs of usage.
> 
> So as far as I can tell, rsync is working and the data integrity seems
> good, it's simply taking up 2.5 times the space.

Do you need the -S (--sparse) option?

Omitting this, when some of the source files are sparse, is one reason
files take more space when they are copied on unix in general.  If
there are sparse files, this will reduce their size at the destination
to something more reasonable, but I don't know if they'll be exactly
the same size.
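
One way to look for sparse files on the source, as a sketch only: it
assumes GNU stat (the -c format option), and the function names are made
up for this example.  A file is reported when fewer bytes are allocated
than its length; small files on file systems with compression or inline
data can also show up, so treat the result as a hint.

```shell
# looks_sparse SIZE BLOCKS BLOCK_BYTES: succeed when the allocated bytes
# (BLOCKS * BLOCK_BYTES) are fewer than the file's length SIZE.
looks_sparse() {
    [ $(( $2 * $3 )) -lt "$1" ]
}

# report_sparse FILE...: print each FILE whose allocation is smaller
# than its length, using the numbers GNU stat reports.
report_sparse() {
    for f in "$@"; do
        stat -c '%s %b %B' "$f" | {
            read -r size blocks unit
            if looks_sparse "$size" "$blocks" "$unit"; then
                echo "$f"
            fi
        }
    done
}
```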

Secondly, do you need the -H (--hard-links) option?

Omitting this, when some of the source files are hard linked, would
cause multiple copies of the same file to be created on the destination.
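
A quick way to see whether hard links are in play at all (a sketch; the
function name is made up, and file names containing newlines would be
miscounted):

```shell
# count_hardlinked DIR: number of regular-file names under DIR whose
# inode has more than one link, i.e. names that -H would keep linked.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l
}
```

If this prints 0 for the source tree, -H will not change the space used.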

To be sure of a clean copy with -S and -H, I think you need to start
with an empty destination, the first time.  This will show you if
those options have helped.

You can check if these options are relevant without actually copying:
use "df -i" and "du" to get the number of inodes and number of bytes
used on the source disk, "find . | wc -l" to get the number of inodes
(approximately) that will be created without -H, and "find . -printf
'.+((%s+4095)/4096*4096)\n' | bc | tail -n1" (works on Linux anyway;
plain bc rather than bc -l, so that the division truncates) to get the
number of bytes (approximately) that will be created without -S and -H
both.
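
The byte estimate can also be sketched with awk doing the rounding
explicitly.  This assumes GNU find (for -printf); the function name and
the 4096-byte default are assumptions for the example, and only regular
files are counted.

```shell
# estimate_bytes DIR [BLOCK]: sum the lengths of all regular files under
# DIR, each rounded up to a multiple of BLOCK (default 4096).  Hard-linked
# names are counted once per name and sparse files at their full length,
# which is what a copy without -H and -S would write.
estimate_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk -v b="${2:-4096}" '
            { total += int(($1 + b - 1) / b) * b }
            END { printf "%.0f\n", total }'
}
```

Running it with different BLOCK values gives a feel for how much of the
growth a larger allocation unit alone could explain.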

> This crosses realms of expertise that I'm a bit light on, and am fast
> coming up to speed on. I'm trying to determine if there is some mechanic
> within the rsync process that could account for the used space. James
> mentioned that rsync creates temp files which could account for double
> disk usage, and I'm following up on that. 

It only creates one temp file at a time, though, and moves it into
place before starting the next one.  So if the largest individual file
is 1G, you'd only expect 1G at most extra during the transfer, and
nothing by the end.  It cannot possibly explain taking 2.5 times the
space.

-- Jamie


delete old logs

2007-05-09 Thread Florin Andrei

I've several web servers that use log4j to generate the logs.

The log file currently appended to is called filename. The older, 
non-current logs are called filename.-mm-dd-hh.


Obviously, rotation takes place once every hour and it's done 
automatically by log4j. Old logs are never deleted by log4j, something 
else must delete them.


I need to rsync all the filename* files from all web servers 
periodically to a "log server", and a safe copy is made to a backup log 
server. Each web server is running an rsync daemon, and a cron job on 
the main log server cycles through all web servers periodically.


Logs older than X hours must be deleted from the web servers, but only 
provided that a copy already exists on the log server AND another copy 
exists on the backup log server (any log file must exist in at least two 
places after any given rsync transfer).
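
The condition itself can be sketched in shell.  This is only an
illustration of the rule, not something rsync does: it assumes all three
directories are reachable as local paths (for instance over mounts), that
the log directory is flat, and that find supports -mmin; the function
names are made up.

```shell
# safe_to_delete LOCAL PRIMARY BACKUP: succeed only when both copies
# exist and are byte-for-byte identical to LOCAL.
safe_to_delete() {
    [ -f "$2" ] && [ -f "$3" ] &&
        cmp -s "$1" "$2" && cmp -s "$1" "$3"
}

# prune_old_logs DIR PRIMARY_DIR BACKUP_DIR MINUTES: remove rotated logs
# (filename.*) in DIR older than MINUTES, but only those that pass
# safe_to_delete.  The current log "filename" is never matched.
prune_old_logs() {
    find "$1" -type f -name 'filename.*' -mmin +"$4" |
        while IFS= read -r f; do
            name=${f##*/}
            if safe_to_delete "$f" "$2/$name" "$3/$name"; then
                rm -- "$f"
            fi
        done
}
```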


Because I want to minimize the number of protocols and applications 
involved in the process, I'd like to perform the deletion of old log 
files from the web servers using rsync.
Another reason to do that is that I want to handle everything (log 
transfer, archival, deletion) via one single cron job on the log server, 
instead of various cron jobs on the log server and on each web server, 
in order to minimize collisions.


The problem is, the files that must be deleted are on the sender and the 
rsync documentation that I'm looking at doesn't seem to provide any clue 
as to how to delete files on the sender.


Once some of these conditions are relaxed, the problem appears quite 
solvable (e.g., mount the log directories via NFS and then do rsync and 
deletion over the NFS mount), but first I'd like to make sure that 
there's no way to accomplish everything solely via rsync.


Please enlighten me.

Thanks,

--
Florin Andrei

http://florin.myip.org/


RE: rsync mechanics question

2007-05-09 Thread Tom Riley
Matt,

Thanks for the reply. To clarify, I'm doing the following:

I have 2 mounts on a single computer: the production mount is 100gigs
(/msgstore) and contains roughly 17 million small files (email message
store), and the other is a newly created 500g ufs file system (/mnt).

I'd like to minimize the downtime required to make the cut over of data,
so I'm doing low priority repeated rsyncs of the source data over to the
new 500g partition. Just prior to the downtime, I'll stop mail services
and do a final cold rsync and change the mount points.
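
In script form the plan looks roughly like this (a sketch: the function
names and the RSYNC override are made up for illustration, and the
options are the ones from my original command):

```shell
# RSYNC can be overridden, e.g. RSYNC="echo rsync", to print the command
# line instead of running it.

# warm_pass SRC DST: low-priority pass, repeated while mail is running.
warm_pass() {
    nice -n 19 ${RSYNC:-rsync} -a --stats --delete "$1/" "$2"
}

# final_pass SRC DST: the cold pass, run once mail services are stopped.
final_pass() {
    ${RSYNC:-rsync} -a --stats --delete "$1/" "$2"
}

# Intended order:
#   warm_pass /msgstore /mnt     (repeat as often as needed)
#   stop mail services
#   final_pass /msgstore /mnt
#   change the mount points
```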

Rsync is working fine, and doing its job.

However, the curiosity comes in with my source data taking up 86gigs of
data on a 100g partition, and as the copy progresses the destination
drive is reporting 240 gigs of usage.

So as far as I can tell, rsync is working and the data integrity seems
good, it's simply taking up 2.5 times the space.

This crosses realms of expertise that I'm a bit light on, and am fast
coming up to speed on. I'm trying to determine if there is some mechanic
within the rsync process that could account for the used space. James
mentioned that rsync creates temp files which could account for double
disk usage, and I'm following up on that. 

A second possibility is that, because the disk is so large, Solaris
is doing something funky with the minimum block size per inode
assignment that's causing small files to consume more space on a larger
sized file system.

Ideally, I'd prefer the data copied from /msgstore to consume roughly
the same amount of the new disk as the old so I can have 400gigs of
growth rather than 250gigs.

Make sense? Any thoughts or suggestions would be welcomed.

-Tom

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Matt McCutchen
> Sent: Wednesday, May 09, 2007 2:56 PM
> To: Tom Riley
> Cc: rsync@lists.samba.org
> Subject: Re: rsync mechanics question
> 
> On 5/9/07, Tom Riley <[EMAIL PROTECTED]> wrote:
> > I've been using rsync (2.6.9) to migrate a 90g message store volume
> > and I'm running into some interesting results.
> 
> Please be more specific about what is going wrong.  If you get an
> error message, please send the exact text.  If rsync is successful,
> what does it do that you didn't expect/want?
> 
> > Does rsync copy files at a file copy level or is it attempting to do
> > some block level copying?
> 
> Rsync works at file level, not block level.  (Of course, if a source
> file is itself a filesystem image, then one could say that rsync works
> at block level for that image.)
> 
> Matt


Re: Using --remove-source-files with backup?

2007-05-09 Thread Matt McCutchen

On 5/7/07, ScottZ <[EMAIL PROTECTED]> wrote:

> With --source-backup the tree structure of the original source file is also
> being included in the --backup-dir directory.

That is an idiosyncrasy of the quick-and-dirty way I implemented the
source backup, and I documented it in the patched man page.  To avoid
it, cd into the source directory first so you can give the source
argument as "." or just a filename.

When Wayne made a modified version of the source-backup patch to
include with HEAD of rsync, he found it more convenient to follow the
backup-dir path starting from the source dir rather than the source
argument path starting from the backup dir.  It's not clear to me
whether either of these behaviors is the best possible.

Matt


Re: rsync mechanics question

2007-05-09 Thread Matt McCutchen

On 5/9/07, Tom Riley <[EMAIL PROTECTED]> wrote:

> I've been using rsync (2.6.9) to migrate a 90g message store volume and I'm
> running into some interesting results.

Please be more specific about what is going wrong.  If you get an
error message, please send the exact text.  If rsync is successful,
what does it do that you didn't expect/want?

> Does rsync copy files at a file copy level or is it attempting to do some
> block level copying?


Rsync works at file level, not block level.  (Of course, if a source
file is itself a filesystem image, then one could say that rsync works
at block level for that image.)

Matt


Re: rsync feature needed: preserve atime

2007-05-09 Thread Matt McCutchen

On 5/9/07, Dave Dykstra <[EMAIL PROTECTED]> wrote:

> What does it really mean to preserve access times?  When rsync reads
> a file to copy it, it will change the access time just because it is
> reading it, so the backup should then have the correct access time,
> the time the backup file was created.

I tested HEAD of rsync + atimes.diff and it indeed behaves this way,
which is silly: rsync destroys the data it is told to preserve!  The
atimes.test doesn't catch this because it uses a zero-length file, so
there is no read to actually hit the atime.

> Some tools have an option to
> reset the access time of the file they copy, but in so doing the tools
> update the inode change time of the input file which is generally more
> important to users than access time.  It is not possible to preserve
> both as a filesystem user like rsync.


It is possible on some filesystems to read a file without hitting its
atime by opening it with O_NOATIME.  The same issues apply to GNU tar,
so there was an extensive discussion on the bug-tar mailing list of
how not to destroy atimes of source files (including use of
O_NOATIME):

http://lists.gnu.org/archive/html/bug-tar/2005-09/msg00035.html

Personally, I don't like access times because they're impure in the
sense that reading shouldn't write and as far as I know they don't
have any important uses.  All of my computer's filesystems are mounted
noatime.

Matt


Re: preserve EA?

2007-05-09 Thread Ming Zhang
On Wed, 2007-05-09 at 16:32 -0400, Matt McCutchen wrote:
> On 5/9/07, Ming Zhang <[EMAIL PROTECTED]> wrote:
> > side question, if i know one file only have EA changed, thus mtime is
> > updated, can i force the rsync to do EA update only? regular rsync run
> > will do checksum stuff if mtime changed, and find out all content are
> > same which generate too many computation overheads.
> 
> Yes, you can use --size-only, which makes rsync assume that
> corresponding files with the same size don't need the contents
> transferred.

cool. thx!

> 
> Matt



Re: preserve EA?

2007-05-09 Thread Matt McCutchen

On 5/9/07, Ming Zhang <[EMAIL PROTECTED]> wrote:

> side question, if i know one file only have EA changed, thus mtime is
> updated, can i force the rsync to do EA update only? regular rsync run
> will do checksum stuff if mtime changed, and find out all content are
> same which generate too many computation overheads.


Yes, you can use --size-only, which makes rsync assume that
corresponding files with the same size don't need the contents
transferred.
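
To make the difference concrete, here is a simplified model of the
decision.  It is a sketch of the documented quick check, not rsync's
actual code, and the function name and argument order are made up for
the example.

```shell
# needs_transfer SRC_SIZE SRC_MTIME DST_SIZE DST_MTIME [size-only]
# Succeeds when the file's contents would be selected for transfer.
needs_transfer() {
    [ "$1" -ne "$3" ] && return 0          # sizes differ: always transfer
    [ "${5:-}" = size-only ] && return 1   # --size-only: same size is enough to skip
    [ "$2" -ne "$4" ]                      # default quick check: also compare mtime
}

# Same size, different mtime: selected by default, skipped with size-only.
needs_transfer 100 10 100 20 && echo "default: transfer"
needs_transfer 100 10 100 20 size-only || echo "size-only: skip"
```

The caveat is the obvious one: a real content change that leaves the
size unchanged is also skipped.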

Matt


Re: rsync mechanics question

2007-05-09 Thread Matt McCutchen

On 5/9/07, JamesDR <[EMAIL PROTECTED]> wrote:

> By using --inplace (if I read that right)
> you'll be xfering the entire 90GB store over the network.


Not exactly.  --inplace only prevents the receiver from matching
source data using data at an earlier offset in the old destination
file (because the receiver will overwrite the old data too soon to use
it).  (Note to Wayne: I think the man page should state this
explicitly, especially because that will make the remark about sorting
data matches meaningful to new users.)  Thus, if I reason correctly,
inserting data in the source file spoils all the matches from there
on, while deleting or modifying data does not do any more harm than it
would without --inplace.

Matt


Re: preserve EA?

2007-05-09 Thread Ming Zhang
On Wed, 2007-05-09 at 16:07 -0400, Matt McCutchen wrote:
> On 5/9/07, Ming Zhang <[EMAIL PROTECTED]> wrote:
> > sorry that i should do more check. the rsync man page on site does not
> > have this while fc6 rsync man page has -X support. i guess there are
> > some extra patches floating around.
> 
> That extra patch is floating at "patches/xattrs.diff" in rsync source
> distributions through version 2.6.9.  The support for preserving
> extended attributes has been merged into the main rsync in CVS, which
> will eventually be released as rsync 3.0.0.

ic. thx for the info.

side question, if i know one file only have EA changed, thus mtime is
updated, can i force the rsync to do EA update only? regular rsync run
will do checksum stuff if mtime changed, and find out all content are
same which generate too many computation overheads.

> 
> Matt



Re: preserve EA?

2007-05-09 Thread Matt McCutchen

On 5/9/07, Ming Zhang <[EMAIL PROTECTED]> wrote:

> sorry that i should do more check. the rsync man page on site does not
> have this while fc6 rsync man page has -X support. i guess there are
> some extra patches floating around.


That extra patch is floating at "patches/xattrs.diff" in rsync source
distributions through version 2.6.9.  The support for preserving
extended attributes has been merged into the main rsync in CVS, which
will eventually be released as rsync 3.0.0.

Matt


Re: rsync mechanics question

2007-05-09 Thread JamesDR
JamesDR wrote:
> Tom Riley wrote:
>> Hey All,
>>
>>  
>>
>> I’ve been using rsync (2.6.9) to migrate a 90g message store volume and
>> I’m running into some interesting results. I have two FC storage arrays
>> attached to a Sunfire V280R, running Solaris 8. My 100gig volume is on a
>> Sun StoreEdge 3510, and my new 500gig partition is on an HP EVA.
>>
>>  
>>
>> I used the syntax: rsync -a --stats --delete $SRC/$dir/ $DST/$dir
>>
>>  
>>
>> At present time, the destination volume is 2.5 times the size of the
>> original volume. My first thought was this may be a drive geometry
>> issue, and I’ve been working with Sun to get a solution. They believe
>> rsync is doing a block level copy, instead of a file level copy. That
>> doesn’t seem to jive with what I’ve been reading, but wanted to get some
>> more experienced eyes looking at the problem.
>>
>>  
>>
>> Does rsync copy files at a file copy level or is it attempting to do
>> some block level copying?  Has anyone experienced this sort of bloated
>> expansion of space? Any advice would be greatly appreciated.
>>
>>  
>>
>> -Tom
>>
> 
> It creates temporary files during the xfer then moves them over once
> completed. I think you'll want to use --inplace. So if the store is 90GB
> in size, while the xfer is going on it could be using 180GB +/- (depending.)
> 
> http://rsync.samba.org/ftp/rsync/rsync.html has more info.
> 

Re-reading that, you may want to weigh the storage needed against
network bandwidth. By using --inplace (if I read that right)
you'll be xfering the entire 90GB store over the network. It's a toss-up
between network bandwidth (and time to xfer) and storage space.

-- 
Thanks,
James



Re: rsync fails to sync files

2007-05-09 Thread Matt McCutchen

On 5/9/07, Paul Slootman <[EMAIL PROTECTED]> wrote:

> On Tue 08 May 2007, Wayne Davison wrote:
> >
> > You can read the very latest manpage with my improvements here:
> >
> > http://rsync.samba.org/ftp/rsync/nightly/rsync.html
> >
> > E.g., there's extra quick-check discussion in the DESCRIPTION section.
>
> Yes, that's a great improvement. This is typically one of those things
> that you don't miss when you're familiar with rsync, but which is quite
> essential info otherwise...


I agree.  Making important default behaviors like this one more
prominent is a major goal of the man page rewrite that C Sights and I
were discussing.  I hope to continue working on it at some point.

Matt


Re: rsync mechanics question

2007-05-09 Thread JamesDR
Tom Riley wrote:
> Hey All,
> 
>  
> 
> I’ve been using rsync (2.6.9) to migrate a 90g message store volume and
> I’m running into some interesting results. I have two FC storage arrays
> attached to a Sunfire V280R, running Solaris 8. My 100gig volume is on a
> Sun StoreEdge 3510, and my new 500gig partition is on an HP EVA.
> 
>  
> 
> I used the syntax: rsync -a --stats --delete $SRC/$dir/ $DST/$dir
> 
>  
> 
> At present time, the destination volume is 2.5 times the size of the
> original volume. My first thought was this may be a drive geometry
> issue, and I’ve been working with Sun to get a solution. They believe
> rsync is doing a block level copy, instead of a file level copy. That
> doesn’t seem to jive with what I’ve been reading, but wanted to get some
> more experienced eyes looking at the problem.
> 
>  
> 
> Does rsync copy files at a file copy level or is it attempting to do
> some block level copying?  Has anyone experienced this sort of bloated
> expansion of space? Any advice would be greatly appreciated.
> 
>  
> 
> -Tom
> 

It creates temporary files during the xfer then moves them over once
completed. I think you'll want to use --inplace. So if the store is 90GB
in size, while the xfer is going on it could be using 180GB +/- (depending.)

http://rsync.samba.org/ftp/rsync/rsync.html has more info.

-- 
Thanks,
James



Re: rsync feature needed: preserve atime

2007-05-09 Thread Dave Dykstra
On Tue, May 08, 2007 at 07:09:06PM -0400, Matt McCutchen wrote:
> On 5/8/07, Brent Thompson <[EMAIL PROTECTED]> wrote:
> >Often we need to preserve the information atime conveys, but I have found
> >no way to get rsync to preserve this, nor any hint it is being worked on.
> >It would be great if 'rsync -t' also set atime not just mtime -- or maybe a
> >new option is desired for atime.
> 
> The standard version of rsync does not support preserving atimes, but
> the rsync source distribution includes a patch "patches/atimes.diff"
> that adds an option --atimes to preserve atimes.  I recommend you
> compile your own copy of rsync including this patch (reply if you need
> help/instructions for this) and then use it with its --atimes option.


What does it really mean to preserve access times?  When rsync reads
a file to copy it, it will change the access time just because it is
reading it, so the backup should then have the correct access time,
the time the backup file was created.  Some tools have an option to
reset the access time of the file they copy, but in so doing the tools
update the inode change time of the input file which is generally more
important to users than access time.  It is not possible to preserve
both as a filesystem user like rsync.
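
The "reset the access time" approach can be sketched with touch (the
function name is made up for the example).  Note that the last touch is
exactly the step that updates the inode change time:

```shell
# read_keep_atime FILE: read FILE, then put its previous access time back.
read_keep_atime() {
    ref=$(mktemp) || return 1
    touch -r "$1" "$ref"        # copy FILE's atime and mtime onto a scratch file
    cat -- "$1" > /dev/null     # the read that may bump the atime
    touch -a -r "$ref" "$1"     # restore the atime only; this updates FILE's ctime
    rm -f -- "$ref"
}
```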

- Dave


rsync mechanics question

2007-05-09 Thread Tom Riley
Hey All,

I've been using rsync (2.6.9) to migrate a 90g message store volume and
I'm running into some interesting results. I have two FC storage arrays
attached to a Sunfire V280R, running Solaris 8. My 100gig volume is on a
Sun StoreEdge 3510, and my new 500gig partition is on an HP EVA.

I used the syntax: rsync -a --stats --delete $SRC/$dir/ $DST/$dir

At present, the destination volume is 2.5 times the size of the
original volume. My first thought was this may be a drive geometry
issue, and I've been working with Sun to get a solution. They believe
rsync is doing a block level copy, instead of a file level copy. That
doesn't seem to jibe with what I've been reading, but I wanted to get some
more experienced eyes looking at the problem.

Does rsync copy files at a file copy level or is it attempting to do
some block level copying?  Has anyone experienced this sort of bloated
expansion of space? Any advice would be greatly appreciated.

-Tom


Re: preserve EA?

2007-05-09 Thread Ming Zhang

On Wed, 2007-05-09 at 14:51 -0400, Ming Zhang wrote:
> Hi all
> 
> When rsync replicate one file to remote side, will the extended
> attributes be copied as well? Thanks,

sorry that i should do more check. the rsync man page on site does not
have this while fc6 rsync man page has -X support. i guess there are
some extra patches floating around.

sorry for the noise.

> 
> Ming




preserve EA?

2007-05-09 Thread Ming Zhang
Hi all

When rsync replicate one file to remote side, will the extended
attributes be copied as well? Thanks,

Ming




Re: rsync fails to sync files

2007-05-09 Thread Paul Slootman
On Tue 08 May 2007, Wayne Davison wrote:
> 
> You can read the very latest manpage with my improvements here:
> 
> http://rsync.samba.org/ftp/rsync/nightly/rsync.html
> 
> E.g., there's extra quick-check discussion in the DESCRIPTION section.

Yes, that's a great improvement. This is typically one of those things
that you don't miss when you're familiar with rsync, but which is quite
essential info otherwise...


Paul Slootman