Re: compression, built-in or ssh ?
On Fri, Oct 17, 2003 at 01:51:53AM -0400, Brian K. White wrote:
> What is the general recommendation for compression when using ssh?

Use rsync's compression.

> Is it a wasteful performance hit to have both ssh and rsync do compression (when using rsync over ssh)?

Yes.

> If so, is there a clear preference for which is more efficient, rsync or ssh?

Yes.

-- 
J.W. Schultz, Pegasystems Technologies
email address: [EMAIL PROTECTED]
Remember Cernan and Schmitt

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: compression, built-in or ssh ?
jw schultz wrote:
> > What is the general recommendation for compression when using ssh?
> Use rsync's compression.
> > If so, is there a clear preference for which is more efficient, rsync or ssh?
> Yes.

Why, if they both use zlib? Moreover, compressing at a higher level has always seemed a good idea to me: if you compress the whole SSH session, you are sure *everything* is compressed, whereas if you let each application compress, you have to know each application, and maybe it compresses only data and not commands, to give one example. In the case of rsync I always thought it was the same, and I never use -z on rsync as I have compression on by default on SSH connections. Any actual reason not to do that?

Lapo

-- 
Lapo 'Raist' Luchini [EMAIL PROTECTED] (PGP X.509 keys available)
http://www.lapo.it (ICQ UIN: 529796)
Re: Versioned files (take 2)
On Tue, Oct 14, 2003 at 01:09:03PM -0400, Jason M. Felice wrote:
> I've pondered the feedback and revised my proposal to the client. Here are the revised project objectives. Notable changes are the addition of 4), the deletion of the whole slew of items actually related to handling versioned files, and the mention of preexisting work on 1). I took a little gander at some of the backup wrappers, and it looks like I will probably use one of them. I'll have to look a bit closer to see which ones best fit my needs.
>
> Thanks,
> -Jay 'Eraserhead' Felice
>
> Project objectives
>
> The backup system project will meet the following objectives:
>
> 1. Implement SSL connections. The modified client will use SSL for encryption of the protocol stream. Existing clients can use an external shell program such as SSH to provide encryption, but this is not portable and it is difficult to manage. An --ssl option will be added to the rsync program to enable this feature. This option will be accepted in both client and daemon mode. Patches to rsync exist which do this. They will be evaluated and applied or modified as appropriate.

The patches that exist have issues with the three-way connection. That is the primary reason they have not been accepted, although they have also tended to have problems with key management. A better use of your developer time might be to work on fixing the cygwin hang problem when running rsync over ssh. I suspect that the delays in fixing this problem have been due to a lack of resources committed to fixing it. Using ssh is very manageable and only a portability problem for legacy systems.

> 2. Write a Windows backup service. [snip]
>
> 3. Write a configuration GUI for the Windows backup service. [snip]
>
> 4. Add a --link-dest-type option. Currently, rsync's --link-dest option will hard link files against an older copy in an identical directory structure when they have not changed, in order to save space. With this option, the user would be able to specify the link destination type as either mirror or hash. Mirror is the default, and will behave like existing versions of rsync. The hash type will calculate a directory name based on a strong hash of the file and the file's size, for example /f7/d6/22/d9e9a6d8b9e9e4f00/1ff. rsync will search this directory for a file with identical contents to the one being transferred. If it finds one, it will hard link the transferred file to it. If it does not, it will create a new file with the next available integer containing the new file and hard link it to the destination.

Cute idea. It will be dog slow compared to a normal rsync. You may want to sixel the hash, which is 20 bytes. --link-by-hash=dir would be a better name.

> This will allow us to store only one copy of a file which might exist in multiple places in a filesystem or even on multiple clients.

I wouldn't recommend accepting this option as part of a proposal. It is vaporware that, even if created, has no assurance of becoming part of the mainline codebase. Unless accepted into mainline, the customer would then have a patched rsync that needs to be repatched every time there is an important (read: security) update to mainline, and no support from the community.

> 5. Write a restore GUI for Windows. [snip]
>
> 6. Create Windows installer. [snip]

-- 
J.W. Schultz, Pegasystems Technologies
email address: [EMAIL PROTECTED]
Remember Cernan and Schmitt
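[Editor's note: for a feel of how item 4's hash layout could work, here is a rough sketch in Python. The helper name, the choice of SHA-1, and the fan-out depth are all illustrative assumptions of this note; the proposal specifies nothing beyond its example path.]

```python
import hashlib

def hash_dest_dir(data: bytes, fanout: int = 3) -> str:
    """Hypothetical helper: derive a fan-out directory name from a
    strong hash of the file contents plus the file size (in hex)."""
    digest = hashlib.sha1(data).hexdigest()
    # Leading bytes become nested directories so no single directory
    # grows huge; the remainder plus the size names the leaf directory.
    parts = [digest[i * 2:i * 2 + 2] for i in range(fanout)]
    return "/".join(parts + [digest[fanout * 2:], format(len(data), "x")])

print(hash_dest_dir(b"hello"))
# aa/f4/c6/1ddcc5e8a2dabede0f3b482cd9aea9434d/5
```

Hard-linking every identical file into such a directory is what makes the lookup cheap on disk but, as noted above, slow compared to a plain rsync: each transferred file costs a full-content hash.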
Re: compression, built-in or ssh ?
On Fri, Oct 17, 2003 at 09:34:17AM +0200, Lapo Luchini wrote:
> jw schultz wrote:
> > > What is the general recommendation for compression when using ssh?
> > Use rsync's compression.
> > > If so, is there a clear preference for which is more efficient, rsync or ssh?
> > Yes.
>
> Why, if they both use zlib? Moreover, compressing at a higher level has always seemed a good idea to me (e.g. if you compress the whole SSH session, you're sure *anything* is compressed; if you let each application compress, you have to know each application, and maybe it compresses only data and not commands, just to give an example). In the case of rsync I always thought it was the same, and I never use -z on rsync as I have compression on by default on SSH connections. Any actual reason not to do that?

For the file data sent, rsync seeds the compressor so that it achieves a higher level of compression than can be achieved by only compressing the blocks transmitted. Whether you use the -z option or not, rsync is micro-optimised in transmitting the file list, so that meta-data transmission is effectively compressed. About the only compressible thing that is not compressed is error messages.

Were it not for the micro-optimisation, I would say that which level to compress at for maximum effect would depend on the amount of non-matched data sent. One way in which doing the compression in ssh does have an advantage is that with ssh protocol version 1 you can control the compression level.

-- 
J.W. Schultz, Pegasystems Technologies
email address: [EMAIL PROTECTED]
Remember Cernan and Schmitt
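[Editor's note: the seeding jw describes can be imitated with zlib's preset-dictionary feature. This is only an analogy (rsync primes its compressor internally, not through this API), and the data is invented:]

```python
import zlib

# Data the receiver already holds (the "matched" blocks)...
context = b"Number of files transferred: 1\nTotal file size: 99745211 bytes\n" * 50
# ...and a new literal block that closely resembles it.
block = b"Number of files transferred: 2\nTotal file size: 12345678 bytes\n"

# Compressing the block on its own: the compressor starts empty, so the
# block's similarity to already-transferred data is wasted.
plain = zlib.compress(block)

# Seeding the compressor with the known data first lets the new block
# be encoded almost entirely as back-references into that context.
c = zlib.compressobj(zdict=context[-32768:])
primed = c.compress(block) + c.flush()

print(len(plain), len(primed))
```

The primed output is markedly smaller, which is the effect Donovan measures as a 20%-or-more gain on real-world data later in this thread.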
--bwlimit not working right
Hello!

I can't get the bwlimit option working right. If I set this option above 400 kbyte per sec, I still only get 400 kbyte per sec, no matter which value I set. I tried this option with a 100 MB file. I use a Debian stable system with rsync version 2.5.6cvs, protocol version 26. Can someone tell me how I can get this working?

thx Rene

dpkg -l rsync*
ii rsync 2.5.5-0.1 fast remote file copy program (like rcp)

without bwlimit:
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 154
Total bytes read: 99757574
wrote 154 bytes read 99757574 bytes 9500736.00 bytes/sec
total size is 99745211 speedup is 1.00

with bwlimit=200:
Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 168
Total bytes read: 99757574
wrote 168 bytes read 99757574 bytes 136188.04 bytes/sec
total size is 99745211 speedup is 1.00

with bwlimit=1000:
Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 169
Total bytes read: 99757574
wrote 169 bytes read 99757574 bytes 408007.13 bytes/sec
total size is 99745211 speedup is 1.00

with bwlimit=5000:
Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 169
Total bytes read: 99757574
wrote 169 bytes read 99757574 bytes 408007.13 bytes/sec
total size is 99745211 speedup is 1.00
RE: --bwlimit not working right
> I cant get the bwlimit option working right. If i set this option over 400 kbyte per sec i still only get 400kbyte per sec, whether wich value i set. I try this option with a 100MB big file. I use a debian stable System with rsync version 2.5.6cvs protocol version 26. Can someone tell me how i can this get working?

This really doesn't answer your question, but I wanted to mention that I use CBQ on Red Hat for QoS, and it does a great job of limiting bandwidth to exactly what it's set to. I believe it's included with Debian as well. I use it on several servers to limit saturation. I think I tried the --bwlimit option a while back, and it doesn't seem to work too well if you have many small files.

Max
Re: compression, built-in or ssh ?
Donovan Baarda wrote:
> > Any actual reason not to do that?
> rsync can use what I refer to in pysync as "context compression". This is where the matching data is compressed even though the matching compressed data is not transmitted (because the other end already has it). This primes the compressor with context information. My tests with pysync show that this can improve compression on real-world data by 20% or more.

Great! I didn't know that =) In that case, rsync -z really is better than ssh -C.

-- 
Lapo 'Raist' Luchini [EMAIL PROTECTED] (PGP X.509 keys available)
http://www.lapo.it (ICQ UIN: 529796)
Re: --bwlimit not working right
On 17 Oct 2003, Rene Schumann [EMAIL PROTECTED] wrote:
> Hello! I can't get the bwlimit option working right. If I set this option above 400 kbyte per sec, I still only get 400 kbyte per sec, no matter which value I set. I tried this option with a 100 MB file. I use a Debian stable system with rsync version 2.5.6cvs, protocol version 26. Can someone tell me how I can get this working?
[ snip ]

We use --bwlimit extensively and have experienced the same 400 kB/s limit, so you are doing nothing wrong. It's just the nature of the beast in the way that it is implemented.

Linux systems have a timer granularity of 10 ms. Wait times cannot be shorter than that, and are rounded up if necessary. If you are pulling data (vs. pushing data), then rsync uses a buffer size of 4096 bytes. The formula used to calculate the sleep time in microseconds is:

    bytes_written * 1000 / bwlimit

    4096 * 1000 / 400 = approx. 10,000

So attempts to use a bwlimit greater than 400 end up with a wait time that is rounded up to 10,000 microseconds, which is effectively 409.6 kB/s given the 4096-byte buffer size. Thus the apparent ceiling.

There is a proposed patch to accumulate wait times to make the throttling more accurate, which would probably solve your problem. See this thread in the archives: http://www.mail-archive.com/[EMAIL PROTECTED]/msg07270.html
A corrected patch is in the next message (7271).

-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]
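[Editor's note: John's arithmetic can be checked with a small model. This is a sketch: the 10 ms tick, 4096-byte buffer, and sleep formula come from his explanation, while the round-up-to-a-whole-tick rule is an assumption of the model.]

```python
def effective_rate(bwlimit_kb, write_size=4096, tick_us=10_000):
    """Achieved bytes/sec when each sleep is rounded up to a whole
    10 ms scheduler tick (modelled, not measured)."""
    requested_us = write_size * 1000 // bwlimit_kb  # formula from the post
    ticks = -(-requested_us // tick_us)             # ceiling division
    actual_us = max(tick_us, ticks * tick_us)
    return write_size * 1_000_000 // actual_us

# Compare Rene's measured 136188 and 408007 bytes/sec above.
for limit in (200, 1000, 5000):
    print(limit, effective_rate(limit))
```

The model reproduces both observations: bwlimit=200 lands near 136 kB/s (the 20.48 ms request rounds up to 30 ms), while every limit above ~400 collapses onto the same ~409.6 kB/s ceiling.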
Added functionality --compare-file and --compare-auto
Recently various needs for multiple version handling were discussed and I put forward a plan of mine. Subsequently the proposal for a --compare-file=FILE switch had support, so I have implemented this. I have also implemented an experimental --compare-auto which decides which file to match against using a rule.

Instructions for the patch:

1. Install the rsync-2.5.6 source.
2. patch -p1 < rsync-2.5.6-arh1.patch (the code below)
3. Edit configure to add arh1 to the RSYNC_VERSION string and run ./configure, or if you've already run this, edit config.h to add arh1 to the RSYNC_VERSION string.
4. make proto (to update the proto.h file)
5. make

Here's rsync-2.5.6-arh1.patch:

-cut here-
diff -aur rsync-2.5.6/generator.c rsync-arh/generator.c
--- rsync-2.5.6/generator.c	Thu Aug 29 14:44:55 2002
+++ rsync-arh/generator.c	Fri Oct 17 15:48:56 2003
@@ -5,6 +5,7 @@
   Copyright (C) 1996-2000 by Andrew Tridgell
   Copyright (C) Paul Mackerras 1996
   Copyright (C) 2002 by Martin Pool [EMAIL PROTECTED]
+  Copyright (C) 2003, Andy Henson, Zexia Access Ltd
 
   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -41,6 +42,8 @@
 extern int always_checksum;
 extern int modify_window;
 extern char *compare_dest;
+extern char *compare_file;
+extern int compare_auto;
 extern int link_dest;
@@ -357,29 +360,36 @@
 	fnamecmp = fname;
 
-	if ((statret == -1) && (compare_dest != NULL)) {
-		/* try the file at compare_dest instead */
+	if ((statret == -1) && compare_auto) {
+		compare_file = findcomparename(fname,fnamecmpbuf);
+	} else if ((statret == -1) && (compare_dest != NULL)) {
+		snprintf(fnamecmpbuf,MAXPATHLEN,"%s/%s",
+			compare_dest,fname);
+		compare_file = fnamecmpbuf;
+	}
+
+	if ((statret == -1) && (compare_file != NULL)) {
+		/* try this file instead (--compare-dest, --compare-file, --compare-auto) */
 		int saveerrno = errno;
-		snprintf(fnamecmpbuf,MAXPATHLEN,"%s/%s",compare_dest,fname);
-		statret = link_stat(fnamecmpbuf,&st);
+		statret = link_stat(compare_file,&st);
 		if (!S_ISREG(st.st_mode))
 			statret = -1;
 		if (statret == -1)
 			errno = saveerrno;
 #if HAVE_LINK
-		else if (link_dest && !dry_run) {
-			if (do_link(fnamecmpbuf, fname) != 0) {
+		else if (link_dest && !dry_run) {
+			if (do_link(compare_file, fname) != 0) {
 				if (verbose > 0)
 					rprintf(FINFO,"link %s => %s : %s\n",
-						fnamecmpbuf,
+						compare_file,
 						fname,
 						strerror(errno));
 			}
-			fnamecmp = fnamecmpbuf;
+			fnamecmp = compare_file;
 		}
 #endif
 		else
-			fnamecmp = fnamecmpbuf;
+			fnamecmp = compare_file;
 	}
 
 	if (statret == -1) {
@@ -534,3 +544,86 @@
 		write_int(f,-1);
 	}
 }
+
+
+char * findcomparename(const char* fname, char* buf)
+	/* Returns compare name, a valid file with name similar to @param fname.
+	 * Implements the --compare-auto name function.
+	 * May use @param buf as buffer for the name (size is MAXPATHLEN). */
+
+/* The algorithm: scans the directory for filenames where the names
+match once version information is stripped out. Version information
+is assumed to be digits after one of - . ; and it continues until
+either . and non-digit or - and non-digit, t, p, r. This rather
+odd rule permits 2.4-test2, 2.4-rc4, 2.4-pre3 to be ignored as versions.
+Finally it selects the most recent of these which has a size no smaller
+than 90% of the biggest of any of them.
+I acknowledge these are pretty arbitrary rules - arh 17 October 2003 */
+{
+	char newname[MAXPATHLEN];
+	char tmpname[MAXPATHLEN];
+	time_t newtime=0;
+	size_t newsize=0;
+	struct dirent *di;
+	DIR *d;
+	char *dirname;
+	char *name;
+
+	strncpy(buf,fname,MAXPATHLEN);
+	dirname = buf;
+	name = strrchr(buf,'/');
+	if (name)
+		*name++ = 0;	/* terminate dirname at end of directory part */
+	else {
+		name = (char*)fname;
+		dirname = ".";
+	}
+	if (compare_dest)
+		dirname = compare_dest;
+	if (verbose > 1)
+		rprintf(FINFO,"findcomparename: dir %s name %s\n",dirname,name);
+	d = opendir(dirname);
+	if (d) {
+		for (di =
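[Editor's note: the version-stripping rule the patch comment describes can be paraphrased in Python. This regex is one loose reading of the rule, not a translation of the (truncated) C code, and the 90%-of-largest size selection is omitted:]

```python
import re

# Two names refer to the same file if they are identical once version
# text is stripped: digits introduced by '-', '.' or ';', optionally
# followed by a -test/-rc/-pre suffix (as in 2.4-test2, 2.4-rc4,
# 2.4-pre3, which the comment says must be ignored as versions).
VERSION_RE = re.compile(r'[-.;]\d+(?:\.\d+)*(?:-(?:test|rc|pre)\d+)?')

def strip_version(name: str) -> str:
    """Return the name with version information removed."""
    return VERSION_RE.sub('', name)

print(strip_version("rsync-2.5.6.tar.gz"))      # rsync.tar.gz
print(strip_version("linux-2.4-test2.tar.gz"))  # linux.tar.gz
```

Under this reading, rsync-2.5.5.tar.gz and rsync-2.5.6.tar.gz both reduce to rsync.tar.gz, so --compare-auto would treat the older tarball as the basis for the new transfer.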
Pysync 2.24 release, was Re: rsync on OpenVMS
On Tue, 2003-10-14 at 11:01, Donovan Baarda wrote:
> On Mon, 2003-10-13 at 13:00, John E. Malmberg wrote:
> > jw schultz wrote:
> > > On Sun, Oct 12, 2003 at 12:38:40AM -0400, John E. Malmberg wrote:
> > > [...]
> > I have not heard of unison. I have heard that pysync was successful in a limited test on OpenVMS. As near as I can tell, though, the librsync it is based on is a bit out of date.
> [...]
> Something possibly worth trying on it is psyco... it compiles Python to native code on the fly using a simple "import psyco". Pure Python is a bit slow compared to native C implementations, but psyco could help close the gap a bit.

Following up on this... I tried using psyco with python2.2 and it cut the pysync tests on my machine from 21 secs down to 14 secs; that's a 33% speedup. In the past I'd tried using pyrex to speed things up with no success. psyco not only gives a better boost, but is much easier to use.

I haven't touched pysync for a while, but it should still work with the latest librsync, as the API hasn't changed. If there are any problems, please let me know. I believe rdiff-backup also has a Python wrapper for librsync that might be more advanced than the one in pysync.

I have plans for both pysync and librsync, but I haven't worked on them much lately. I find I am mostly motivated by feedback from others when funding is not available :-) This little bit of interest motivated me to have a look at it again, and I've just released version 2.24. From its NEWS:

Updates between release 2.16 and 2.24

* Added TODO and NEWS files.
* Changed to use psyco if available, giving a 33% speedup.
* Updated to use librsync 0.9.6.
* Changed to using a faster md4sum implementation based on the librsync implementation, modified to use the RSA API.
* Added rollin/rollout support to historical adler32.py.
* Minor cleanups to rollsum code.
* Minor tweaks to handling of block fragment matching.
-- 
Donovan Baarda [EMAIL PROTECTED]
http://minkirri.apana.org.au/~abo/