Re: faster rsync of huge directories
guy keren wrote: as well as sys admins/kernel developers - the initrd file on (some?) linux distributions is a gzipped cpio file (at least on RHEL 5.X) Initrd can come in one of two formats. The first is a filesystem image (usually a read-only filesystem, most commonly cramfs). In that case the image is called initrd, and has been available since the 2.4 kernels. The other option is to put the files inside a cpio archive. In that case the image is called initramfs, and is the newer method (i.e. - 2.6). Initramfs is the preferred method of creating initrd images, so you could say that cpio is making a comeback... :-) Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
Re: faster rsync of huge directories
On Mon, Apr 12, 2010 at 5:02 PM, Nadav Har'El n...@math.technion.ac.il wrote: On Mon, Apr 12, 2010, Tom Rosenfeld wrote about Re: faster rsync of huge directories: I realized that in my case I did not really need rsync since it is a local disk to disk copy. I could have used a tar and pipe, but I like cpio: Is this quicker? I can't tell, because it is still running, and will be for a few days, but at least it has started copying instead of just building an index. If it is, then the reason for rsync's extreme slowness which you described was *not* the filesystem speed. It has to be something else. Maybe rsync simply uses tons of memory, and starts thrashing? (but this is just a guess, I didn't look at its code). If this is the case then the copy-while-building-the-list that Shachar described might indeed be a big win. find $FROMDIR -depth -print | cpio -pdma $TODIR By default cpio also will not overwrite files if the source is not newer. I recommend you use the -print0 option to find instead of -print, and add the -0 option to cpio. These are GNU extensions to find and cpio (and a bunch of other commands as well) that use NULs, instead of newlines, to separate the file names. This allows newline characters in filenames (these aren't common, but nevertheless are legal...). By the way, while cpio -p is indeed a good historic tool, nowadays there is little reason to use it, because GNU's cp makes it easier to do almost everything that cpio -p did: The -a option to cp is recursive and copies links, modes, timestamps and so on, and the -u option will only copy if the source is newer than the destination (or the destination is missing). So, cp -au $FROMDIR $TODIR is shorter and easier to remember than find | cpio -p. But please note I didn't test this command, so don't use it on your important data without thinking first! Thanks for the tip Nadav (and everyone else.)
While we are on the topic, I use cpio because I am also historic :-) In the past I had to do similar copies on different versions of *NIX (even before rsync was invented!) and after much testing of issues of hard links, sym links, timestamps, etc. I found cpio to be the most portable tool. I guess when I get a chance I will test 'cp -au' Thanks, -tom
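A sandbox test of `cp -au` along those lines is quick to run (scratch directories only; every file name here is invented for the demo). The -u behaviour is easy to observe: the second copy happens only because the source became newer than the timestamp-preserving first copy:

```shell
# Sandbox check of cp -au: -a preserves timestamps, -u copies only
# when the source is newer than the destination (or missing).
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo v1 > "$src/f"
touch -t 202001010000 "$src/f"   # give the source a known old mtime
cp -au "$src/." "$dst/"          # first copy: f arrives with the 2020 mtime
echo v2 > "$src/f"
touch -t 202101010000 "$src/f"   # source is now newer than the copy
cp -au "$src/." "$dst/"          # -u sees a newer source and overwrites
cat "$dst/f"                     # prints "v2"
```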
Re: faster rsync of huge directories
On Tue, Apr 13, 2010, Tom Rosenfeld wrote about Re: faster rsync of huge directories: By the way, while cpio -p is indeed a good historic tool, nowadays there is little reason to use it, because GNU's cp makes it easier to do almost everything that cpio -p did: The -a option to cp is recursive and copies ... While we are on the topic, I use cpio because I am also historic :-) In the past I had to do similar copies on different versions of *NIX (even before rsync was invented!) That's ok, because I am also historic :-) which explains why I have even heard of cpio (nowadays the only people who are likely to have even heard this name are developers of RPM tools...). In the late 80's, I used cpio extensively for transferring files across the Atlantic on... diskettes. I even remember one day when I arrived with a corrupt diskette, and had to modify the cpio source code to skip over errors in the file. That day I learned three lessons: 1. that open source rules, 2. that cpio sucks as a backup format (because it has no error recovery capabilities), and 3. that there must be a better file transfer protocol than diskettes ;-) Soon afterwards, I learned about tar and GNU cp. I haven't used cpio since... and after much testing of issues of hard links, sym links, timestamps, etc I found cpio to be the most portable tool. I guess when I get a chance I will test 'cp -au' I just checked, and cp -a does seem to copy hard-linked files in a directory correctly (i.e., the destination files are also hard-linked). I never checked, though, if it can do more complicated hard-link copying. Frankly, I don't really care - I stopped using hard links around the same time I stopped using cpio. They were really important before the advent of symbolic links (in System V release 4, if I recall correctly), but nowadays they are more often confusing than useful - at least in my opinion.
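Nadav's hard-link check can be reproduced in a few lines (scratch directories; GNU coreutils `stat` assumed). If `cp -a` preserves hard links, both copied names report a link count of 2 and share an inode:

```shell
# Verify that cp -a preserves hard links within the copied tree.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo hello > "$src/a"
ln "$src/a" "$src/b"             # a and b now share one inode
cp -a "$src/." "$dst/"
# Link count of the copy: 2 if the hard link survived, 1 if not.
stat -c '%h' "$dst/a"
# Matching inode numbers for both names confirm the link was preserved.
[ "$(stat -c '%i' "$dst/a")" = "$(stat -c '%i' "$dst/b")" ] && echo preserved
```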
-- Nadav Har'El| Tuesday, Apr 13 2010, 30 Nisan 5770 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |You do not need a parachute to skydive. http://nadav.harel.org.il |You only need one to skydive twice.
Re: faster rsync of huge directories
Nadav Har'El wrote: On Tue, Apr 13, 2010, Tom Rosenfeld wrote about Re: faster rsync of huge directories: By the way, while cpio -p is indeed a good historic tool, nowadays there is little reason to use it, because GNU's cp makes it easier to do almost everything that cpio -p did: The -a option to cp is recursive and copies ... While we are on the topic, I use cpio because I am also historic :-) In the past I had to do similar copies on different versions of *NIX (even before rsync was invented!) That's ok, because I am also historic :-) which explains why I have even heard of cpio (nowadays the only people who are likely to have even heard this name are developers of RPM tools...). as well as sys admins/kernel developers - the initrd file on (some?) linux distributions is a gzipped cpio file (at least on RHEL 5.X) --guy
Re: faster rsync of huge directories
2010/4/12 Tom Rosenfeld tro...@bezeqint.net Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Are both servers on the same LAN? IMHO, your problem is network bandwidth between source and destination. I have ~4M files, ~800GB - rsync is very fast on the same LAN (1Gb), and slow to a remote destination. Regards, Vitaly
Re: faster rsync of huge directories
Tom Rosenfeld wrote: Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Yes, there is a better tool. Upgrade both ends to rsync version 3 or later. That version starts the transfer even before the file list is completely built. Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com
Re: faster rsync of huge directories
On Mon, Apr 12, 2010, Shachar Shemesh wrote about Re: faster rsync of huge directories: Upgrade both ends to rsync version 3 or later. That version starts the transfer even before the file list is completely built. Maybe I'm missing something, but how does this help? It may find the first file to copy a little quicker, but finishing the rsync will take exactly the same time, won't it? Also, if nothing has changed, it will take it exactly the same time to figure this out, won't it? I'm not sure what his problem is, though. Is it the fact that the remote rsync takes a very long time to walk the huge directory tree, or the fact that sending the whole list over the network is slow? If it's the first problem, then maybe switching to a different filesystem, or reorganizing your directory structure (e.g., not to have more than a few hundred files per directory) will help. If it's the second problem, then maybe rsync improvements are due - i.e., to use rsync's delta protocol not only on the individual files, but also on the file list. -- Nadav Har'El| Monday, Apr 12 2010, 28 Nisan 5770 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |I have a watch cat! If someone breaks in, http://nadav.harel.org.il |she'll watch.
Re: faster rsync of huge directories
Nadav Har'El wrote: On Mon, Apr 12, 2010, Shachar Shemesh wrote about Re: faster rsync of huge directories: Upgrade both ends to rsync version 3 or later. That version starts the transfer even before the file list is completely built. Maybe I'm missing something, but how does this help? It may find the first file to copy a little quicker, but finishing the rsync will take exactly the same time, won't it? Not at all. If the two are done linearly, then only after the entire directory tree is scanned will the first transfer *begin*. The total transfer time will be tree scan time + transfer time for older rsyncs, but the two overlap for newer versions. How much time that would save depends on how long the transfer component is (i.e. - how much data you need to actually transfer). Also, if nothing has changed, it will take it exactly the same time to figure this out, won't it? Yes. You might still save some time, but this, definitely, is the minimal advantage that newer rsyncs have over older ones. I'm not sure what his problem is, though. Is it the fact that the remote rsync takes a very long time to walk the huge directory tree, or the fact that sending the whole list over the network is slow? From my experience, it's mostly the former. If it's the first problem, then maybe switching to a different filesystem, At the time, we tested ext3, jfs and xfs, and found no significant differences between them. It was not, however, a scientific test. or reorganizing your directory structure (e.g., not to have more than a few hundred files per directory) will help. That is likely to actually help (plug: this is why rsyncrypto has the --ne-nesting option when encrypting file names), but it is not always a viable option. If it's the second problem, then maybe rsync improvements are due - i.e., to use rsync's delta protocol not only on the individual files, but also on the file list. It's not the second, typically.
Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com
Re: faster rsync of huge directories
On Mon, Apr 12, 2010 at 9:41 AM, Tom Rosenfeld tro...@bezeqint.net wrote: Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Thanks, -tom Thanks for all the suggestions! I realized that in my case I did not really need rsync since it is a local disk to disk copy. I could have used a tar and pipe, but I like cpio: find $FROMDIR -depth -print | cpio -pdma $TODIR By default cpio also will not overwrite files if the source is not newer. It was also pointed out that ver 3 of rsync now does start to copy before it indexes all the files. Unfortunately, it is not available on CentOS 5. -tom
Re: faster rsync of huge directories
Tom Rosenfeld wrote: On Mon, Apr 12, 2010 at 9:41 AM, Tom Rosenfeld tro...@bezeqint.net wrote: Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Thanks, -tom Thanks for all the suggestions! I realized that in my case I did not really need rsync since it is a local disk to disk copy. Please note that rsync from local to local is just a glorified cp. It does not do file comparisons at all. It was also pointed out that ver 3 of rsync now does start to copy before it indexes all the files. Unfortunately, it is not available on CentOS 5. Building it from source is straightforward:

wget http://samba.anu.edu.au/ftp/rsync/src/rsync-3.0.7.tar.gz
tar xvzf rsync-3.0.7.tar.gz
cd rsync-3.0.7
./configure
make
su
make install

Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. http://www.lingnu.com
Re: faster rsync of huge directories
On Mon, Apr 12, 2010, Tom Rosenfeld wrote about Re: faster rsync of huge directories: I realized that in my case I did not really need rsync since it is a local disk to disk copy. I could have used a tar and pipe, but I like cpio: Is this quicker? If it is, then the reason for rsync's extreme slowness which you described was *not* the filesystem speed. It has to be something else. Maybe rsync simply uses tons of memory, and starts thrashing? (but this is just a guess, I didn't look at its code). If this is the case then the copy-while-building-the-list that Shachar described might indeed be a big win. find $FROMDIR -depth -print | cpio -pdma $TODIR By default cpio also will not overwrite files if the source is not newer. I recommend you use the -print0 option to find instead of -print, and add the -0 option to cpio. These are GNU extensions to find and cpio (and a bunch of other commands as well) that use NULs, instead of newlines, to separate the file names. This allows newline characters in filenames (these aren't common, but nevertheless are legal...). By the way, while cpio -p is indeed a good historic tool, nowadays there is little reason to use it, because GNU's cp makes it easier to do almost everything that cpio -p did: The -a option to cp is recursive and copies links, modes, timestamps and so on, and the -u option will only copy if the source is newer than the destination (or the destination is missing). So, cp -au $FROMDIR $TODIR is shorter and easier to remember than find | cpio -p. But please note I didn't test this command, so don't use it on your important data without thinking first! -- Nadav Har'El| Monday, Apr 12 2010, 28 Nisan 5770 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |I have a watch cat! If someone breaks in, http://nadav.harel.org.il |she'll watch.
Re: faster rsync of huge directories
By default cpio also will not overwrite files if the source is not newer. Consider cp -ur. rsync can also --delete extraneous files from destination dirs. -- Constantine Shulyupin Embedded Linux Expert TI DaVinci Expert Tel-Aviv Israel http://www.LinuxDriver.co.il/
RE: faster rsync of huge directories
Check out Repliweb From: linux-il-boun...@cs.huji.ac.il [mailto:linux-il-boun...@cs.huji.ac.il] On Behalf Of Tom Rosenfeld Sent: Monday, April 12, 2010 9:41 AM To: linux-il@cs.huji.ac.il Subject: faster rsync of huge directories Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Thanks, -tom
Re: faster rsync of huge directories
On Mon, Apr 12, 2010 at 10:04 AM, Vitaly li...@karasik.org wrote: 2010/4/12 Tom Rosenfeld tro...@bezeqint.net Hi, I am a great fan of rsync for copying filesystems. However I now have a filesystem which is several hundred gigabytes and apparently has a lot of small files. I have been running rsync all night and it still did not start copying as it is still building the file list. Is there any way to get it to start copying as it goes? Or do any of you have a better tool? Are both servers on the same LAN? IMHO, your problem is network bandwidth between source and destination. I have ~4M files, ~800GB - rsync is very fast on the same LAN (1Gb), and slow to a remote destination. Regards, Vitaly I am not even using a LAN. It is disk to disk. I have ~16M files ~900GB. rsync has been running about 18 hours and has indexed over 8 million files, but still did not copy even one. Thanks, -tom