Re: compression, built-in or ssh ?

2003-10-17 Thread jw schultz
On Fri, Oct 17, 2003 at 01:51:53AM -0400, Brian K. White wrote:
 What is the general recommendation for compression when using ssh?

Use rsync's compression.

 Is it a wasteful performance hit to have both ssh and rsync do compression
 (when using rsync over ssh)?

Yes.

 If so, is there a clear preference which is more efficient, rsync or ssh?

Yes.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: compression, built-in or ssh ?

2003-10-17 Thread Lapo Luchini
jw schultz wrote:

 What is the general recommendation for compression when using ssh?

 Use rsync's compression.

 If so, is there a clear preference which is more efficient, rsync or ssh?

 Yes.

Why, if they both use zlib?

Moreover, compressing at a higher level always seems a good idea to me 
(e.g. if you compress the whole SSH session, you're sure *everything* is 
compressed; if you let each application compress, you have to know each 
application, and maybe it compresses only data and not commands, for 
example).

In the case of rsync I always thought it was the same, so I never use 
-z on rsync, as I have compression on by default for SSH connections.

Any actual reason not to do that?

Lapo

--
Lapo 'Raist' Luchini
[EMAIL PROTECTED] (PGP & X.509 keys available)
http://www.lapo.it (ICQ UIN: 529796)


Re: Versioned files (take 2)

2003-10-17 Thread jw schultz
On Tue, Oct 14, 2003 at 01:09:03PM -0400, Jason M. Felice wrote:
 I've pondered the feedback and revised my proposal to the client.  Here
 are the revised project objectives.  Notable changes are the addition of
 4), the deletion of the whole slew of items actually related to handling
 versioned files, and mention of preexisting work on 1).
 
 I took a little gander at some of the backup wrappers, and it looks
 like I will probably use one of these.  I'll have to look a bit closer
 to see which ones best fit my needs.
 
 Thanks,
 -Jay 'Eraserhead' Felice
 
 
Project objectives
 
The backup system project will meet the following objectives:
 
 1. Implement SSL connections.
 
The modified client will use SSL for encryption of the protocol
stream. Existing clients can use an external shell program such as SSH
to provide encryption, but this is not portable and it is difficult to
manage.
 
An --ssl option will be added to the rsync program to enable this
feature. This option will be accepted in both client and daemon mode.
 
Patches to rsync exist which do this. They will be evaluated and
applied or modified as appropriate.

The patches that exist have issues with the three-way
connection.  That is the primary reason they have not been
accepted, although they have also tended to have problems
with key management.

A better use of your developer time might be to work on
fixing the cygwin hang problem when running rsync over ssh.
I suspect the delays in fixing this problem have been due to
a lack of resources committed to it.  Using ssh is very
manageable and is only a portability problem for legacy
systems.

 2. Write a Windows backup service.
[snip]
 3. Write a configuration GUI for the Windows backup service.
[snip]
 4. Add a --link-dest-type option.
 
Currently, rsync's --link-dest option will hard link files against
an older copy in an identical directory structure when they have not
changed in order to save space. With this option, the user would be
able to specify the link destination type as either mirror or
hash. Mirror is the default, and will behave like existing versions
of rsync.
 
The hash type will calculate a directory name based on a strong hash
of the file and the file's size, for example
/f7/d6/22/d9e9a6d8b9e9e4f00/1ff. rsync will search this directory
for a file with identical contents to the one being transferred. If it
finds one, it will hard link the transferred file to it. If it does
not, it will create a new file, named with the next available integer,
containing the new data, and hard link it to the destination.

Cute idea.  It will be dog slow compared to a normal rsync.
You may want to six-bit encode the hash, which is 20 bytes.
--link-by-hash=dir would be a better name.
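The proposed hash-directory layout can be sketched as follows. This is a hypothetical illustration only: the use of SHA-1, the 2-hex-digit split widths, and appending the size to the final component are assumptions, not details from the proposal.

```python
import hashlib

def hash_dir(contents: bytes, size: int) -> str:
    """Hypothetical sketch: derive a nested directory path from a
    strong hash of the file contents plus the file size, so that
    identical files land in the same directory.  Inside that
    directory, each stored copy would be named by the next available
    integer (e.g. "1ff" in the proposal's example)."""
    h = hashlib.sha1(contents).hexdigest()  # 20-byte hash, 40 hex digits
    # e.g. /f7/d6/22/<rest-of-hash + size>
    return "/{}/{}/{}/{}{:x}".format(h[0:2], h[2:4], h[4:6], h[6:], size)

print(hash_dir(b"hello world", 11))
```

Files with identical contents map to the same directory, so a content match can be found (and hard linked) with a single directory scan.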

 
This will allow us to store only one copy of a file which might exist
in multiple places in a filesystem or even on multiple clients.

I wouldn't recommend accepting this option as part of a
proposal.  It is vaporware that, even if created, has no
assurance of becoming part of the mainline codebase.
Unless it is accepted into mainline, the customer would have
a patched rsync that needs to be repatched every time there
is an important (read: security) update to mainline, with no
support from the community.

 5. Write a restore GUI for Windows.
[snip]
 6. Create Windows installer.
[snip]

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: compression, built-in or ssh ?

2003-10-17 Thread jw schultz
On Fri, Oct 17, 2003 at 09:34:17AM +0200, Lapo Luchini wrote:
 jw schultz wrote:
 
 What is the general recommendation for compression when using ssh?
 
 Use rsync's compression.
 
 If so, is there a clear preference which is more efficient, rsync or ssh?
 
 Yes.
 
 Why, if they both use zlib?
 
 Moreover, compressing at a higher level always seems a good idea to me 
 (e.g. if you compress the whole SSH session, you're sure *everything* is 
 compressed; if you let each application compress, you have to know each 
 application, and maybe it compresses only data and not commands, for 
 example).
 
 In the case of rsync I always thought it was the same, so I never use 
 -z on rsync, as I have compression on by default for SSH connections.
 
 Any actual reason not to do that?

For the file data sent, rsync seeds the compressor so that it
achieves a higher level of compression than can be achieved
by compressing only the blocks transmitted.

Whether or not you use the -z option, rsync's transmission
of the file list is micro-optimised, so the meta-data
transmission is effectively compressed.  About the only
compressible thing that is not compressed is error messages.

Were it not for the micro-optimisation, I would say that the
level at which to compress for maximum effect would depend
on the amount of non-matched data sent.

One way in which doing the compression in ssh does have an
advantage is that with ssh protocol version 1 you can
control the compression level.
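For reference, a minimal sketch of controlling that from the client side. In OpenSSH of this era, CompressionLevel applied to protocol 1 only; the host name here is hypothetical:

```
# ~/.ssh/config -- hypothetical host entry
Host backuphost
    Protocol 1
    Compression yes
    CompressionLevel 9
```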

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


--bwlimit not working right

2003-10-17 Thread Rene Schumann
Hello!

I can't get the bwlimit option working right.
If I set this option above 400 kbyte per sec I still only get 400 kbyte
per sec, no matter which value I set.
I tried this option with a 100MB file.
I use a Debian stable system with rsync version 2.5.6cvs, protocol
version 26.
Can someone tell me how I can get this working?

thx
Rene



dpkg -l rsync*

ii  rsync   2.5.5-0.1   fast remote file copy program (like rcp)

without bwlimit:

Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 154
Total bytes read: 99757574

wrote 154 bytes  read 99757574 bytes  9500736.00 bytes/sec
total size is 99745211  speedup is 1.00

with bwlimit=200

Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 168
Total bytes read: 99757574

wrote 168 bytes  read 99757574 bytes  136188.04 bytes/sec
total size is 99745211  speedup is 1.00

with bwlimit=1000

Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 169
Total bytes read: 99757574

wrote 169 bytes  read 99757574 bytes  408007.13 bytes/sec
total size is 99745211  speedup is 1.00


with bwlimit=5000

Number of files: 2
Number of files transferred: 1
Total file size: 99745211 bytes
Total transferred file size: 99745211 bytes
Literal data: 99745211 bytes
Matched data: 0 bytes
File list size: 74
Total bytes written: 169
Total bytes read: 99757574

wrote 169 bytes  read 99757574 bytes  408007.13 bytes/sec
total size is 99745211  speedup is 1.00



RE: --bwlimit not working right

2003-10-17 Thread Max Kipness
 I can't get the bwlimit option working right.
 If I set this option above 400 kbyte per sec I still only get 
 400 kbyte per sec, no matter which value I set. I tried this option 
 with a 100MB file. I use a Debian stable system with 
 rsync version 2.5.6cvs, protocol version 26. Can someone tell 
 me how I can get this working?

This doesn't really answer your question, but I wanted to mention that I
use CBQ on Red Hat for QoS, and it does a great job of limiting bandwidth
to exactly what it's set to. I believe it's included with Debian as
well. I use it on several servers to limit saturation.

I think I tried the --bwlimit option a while back, and it didn't seem to
work too well with many small files.

Max


Re: compression, built-in or ssh ?

2003-10-17 Thread Lapo Luchini
Donovan Baarda wrote:

 Any actual reason not to do that?

 rsync can use what I refer to in pysync as context compression. This
 is where the matching data is compressed even though the matching
 compressed data is not transmitted (because the other end already has
 it). This primes the compressor with context information. My tests
 with pysync show that this can improve compression on real-world data by
 20% or more.

Great!
I didn't know that =)
In that case, rsync -z is *really* better than ssh -C
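The context-compression effect Donovan describes can be illustrated with zlib's preset-dictionary API. This is only an illustration of the principle of priming a compressor with data the receiver already has; it is not rsync's or pysync's actual mechanism:

```python
import zlib

# Data the receiver already has (the "context"), and a new block that
# resembles it -- the typical situation in a delta transfer.
context = b"The quick brown fox jumps over the lazy dog. " * 50
new_block = b"The quick brown fox jumps over the lazy cat. "

# Compress the new block alone.
plain = zlib.compress(new_block, 6)

# Compress the new block with the compressor primed by the context
# (zlib preset dictionaries use at most the last 32 KB).
co = zlib.compressobj(6, zlib.DEFLATED, zlib.MAX_WBITS, zdict=context[-32768:])
primed = co.compress(new_block) + co.flush()

print(len(plain), len(primed))  # the primed stream is noticeably smaller
```

The decompressor must be primed with the same dictionary (`zlib.decompressobj(zdict=...)`), which is fine here because, by construction, the other end already has that data.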

--
Lapo 'Raist' Luchini
[EMAIL PROTECTED] (PGP  X.509 keys available)
http://www.lapo.it (ICQ UIN: 529796)


Re: --bwlimit not working right

2003-10-17 Thread John Van Essen
On 17 Oct 2003, Rene Schumann [EMAIL PROTECTED] wrote:

 Hello!
 
 I can't get the bwlimit option working right.
 If I set this option above 400 kbyte per sec I still only get 400 kbyte
 per sec, no matter which value I set.
 I tried this option with a 100MB file.
 I use a Debian stable system with rsync version 2.5.6cvs, protocol
 version 26.
 Can someone tell me how I can get this working?
[ snip ]

We use --bwlimit extensively and have experienced the same 400 kB limit,
so you are doing nothing wrong.  It's just the nature of the beast in
the way it is implemented.

Linux systems have a timer granularity of 10 ms.  Wait times cannot be
shorter than that, and are rounded up if necessary.

If you are pulling data (vs. pushing data), rsync uses a buffer
size of 4096 bytes.

The formula used to calculate the sleep time in microseconds is:

bytes_written * 1000 / bwlimit

4096 * 1000 / 400 = approx. 10,000

So any attempt to use a bwlimit greater than 400 ends up with a wait
time that is rounded up to 10,000, which is effectively 409.6 kB/s
given the 4096-byte buffer size.  Thus the apparent ceiling.
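The arithmetic behind that ceiling can be checked with a short sketch. The 10 ms tick and the 4096-byte buffer are the values cited above; rounding the sleep up to a whole number of ticks is an assumption about the kernel's behavior:

```python
import math

TICK_US = 10_000   # assumed timer granularity: 10 ms
BUF = 4096         # buffer size used when pulling data

def effective_rate_kb(bwlimit_kb):
    """Effective rate (kB/s) after the requested sleep is rounded
    up to a whole number of 10 ms ticks."""
    want_us = BUF * 1000 / bwlimit_kb                          # requested sleep, usec
    real_us = max(TICK_US, math.ceil(want_us / TICK_US) * TICK_US)
    return BUF / real_us * 1000                                # bytes/usec -> kB/s

# Any bwlimit above ~409 requests a sleep shorter than one tick, so the
# sleep rounds up to 10 ms and the rate caps at 4096 bytes / 10 ms:
for limit in (1000, 5000):
    print(limit, effective_rate_kb(limit))   # both ~409.6 kB/s
```

This matches the ~408 kB/s observed in the report above for both --bwlimit=1000 and --bwlimit=5000, and the model also lands close to the ~136 kB/s observed for --bwlimit=200 (three ticks per 4096-byte buffer).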

There is a proposed patch to accumulate wait times to make it
more accurate which would probably solve your problem.  See this
thread in the archives:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg07270.html

A corrected patch is in the next message (7271).
-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]



Added functionality --compare-file and --compare-auto

2003-10-17 Thread Andy Henson
Recently various needs for multiple version handling were discussed
and I put forward a plan of mine.  Subsequently the proposal for a
--compare-file=FILE switch had support, so I have implemented
this. I have also implemented an experimental --compare-auto which
decides which file to match against using a rule.

Instructions for patch:

1. Install rsync-2.5.6 source
2. patch -p1 < rsync-2.5.6-arh1.patch (the code below)
3. edit configure to add arh1 to the RSYNC_VERSION string and run
./configure, or if you've already run this, edit config.h to add
arh1 to the RSYNC_VERSION string.
4. make proto  - to update proto.h file
5. make

Here's rsync-2.5.6-arh1.patch:
-cut here-
diff -aur rsync-2.5.6/generator.c rsync-arh/generator.c
--- rsync-2.5.6/generator.c Thu Aug 29 14:44:55 2002
+++ rsync-arh/generator.c   Fri Oct 17 15:48:56 2003
@@ -5,6 +5,7 @@
Copyright (C) 1996-2000 by Andrew Tridgell 
Copyright (C) Paul Mackerras 1996
Copyright (C) 2002 by Martin Pool [EMAIL PROTECTED]
+   Copyright (C) 2003, Andy Henson, Zexia Access Ltd

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -41,6 +42,8 @@
 extern int always_checksum;
 extern int modify_window;
 extern char *compare_dest;
+extern char *compare_file;
+extern int compare_auto;
 extern int link_dest;
 
 
@@ -357,29 +360,36 @@
 
fnamecmp = fname;
 
-   if ((statret == -1) && (compare_dest != NULL)) {
-   /* try the file at compare_dest instead */
+   if ((statret == -1) && compare_auto) {
+   compare_file = findcomparename(fname,fnamecmpbuf);
+   } else if ((statret == -1) && (compare_dest != NULL)) {
+   snprintf(fnamecmpbuf,MAXPATHLEN,"%s/%s",
+   compare_dest,fname);
+   compare_file = fnamecmpbuf;
+   }
+
+   if ((statret == -1) && (compare_file != NULL)) {
+   /* try this file instead (--compare-dest, --compare-file,
+   --compare-auto) */
int saveerrno = errno;
-   snprintf(fnamecmpbuf,MAXPATHLEN,"%s/%s",compare_dest,fname);
-   statret = link_stat(fnamecmpbuf,&st);
+   statret = link_stat(compare_file,&st);
if (!S_ISREG(st.st_mode))
statret = -1;
if (statret == -1)
errno = saveerrno;
 #if HAVE_LINK
else if (link_dest && !dry_run) {
-   if (do_link(fnamecmpbuf, fname) != 0) {
+   if (do_link(compare_file, fname) != 0) {
if (verbose > 0)
rprintf(FINFO,"link %s => %s : %s\n",
-   fnamecmpbuf,
+   compare_file,
fname,
strerror(errno));
}
-   fnamecmp = fnamecmpbuf;
+   fnamecmp = compare_file;
}
 #endif
else
-   fnamecmp = fnamecmpbuf;
+   fnamecmp = compare_file;
}
 
if (statret == -1) {
@@ -534,3 +544,86 @@
write_int(f,-1);
}
 }
+
+
+
+char * findcomparename(const char* fname, char* buf)
+   /* returns compare name, a valid file with name similar to @param fname.
+* Implements the --compare-auto name function.
+* May use @param buf as buffer for the name (size is MAXPATHLEN). */
+
+/* The algorithm: scans the directory for filenames where the names
+match once version information is stripped out.  Version information
+is assumed to be digits after one of - . ; and it continues until
+either . and non-digit or - and non-digit, t, p, r.  This rather
+odd rule permits 2.4-test2, 2.4-rc4, 2.4-pre3 to be ignored as versions.
+Finally it selects the most recent of these which has a size no smaller
+than 90% of the biggest of any of them.
+I acknowledge these are pretty arbitrary rules - arh 17 October 2003 */
+{
+   char newname[MAXPATHLEN];
+   char tmpname[MAXPATHLEN];
+   time_t newtime=0;
+   size_t newsize=0;
+   struct dirent *di;
+   DIR *d;
+   char* dirname;
+   char *name;
+
+   strncpy(buf,fname,MAXPATHLEN);
+   dirname = buf;
+   name = strrchr(buf,'/');
+   if (name)
+   *name++ = 0;    /* terminate name at end of directory part */
+   else {
+   name = (char*)fname;
+   dirname = ".";
+   }
+   if (compare_dest)
+   dirname = compare_dest;
+   if (verbose > 1)
+   rprintf(FINFO,"findcomparename: dir %s name %s\n",dirname,name);
+   d = opendir(dirname);
+   if (d) {
+   for (di = 

Pysync 2.24 release, was Re: rsync on OpenVMS

2003-10-17 Thread Donovan Baarda
On Tue, 2003-10-14 at 11:01, Donovan Baarda wrote:
 On Mon, 2003-10-13 at 13:00, John E. Malmberg wrote:
  jw schultz wrote:
   On Sun, Oct 12, 2003 at 12:38:40AM -0400, John E. Malmberg wrote:
[...]
  I have not heard of unison.  I have heard that pysync was successful in 
  a limited test on OpenVMS.  As near as I can tell, though, the librsync it 
  is based on is a bit out of date.
[...]
 Something possibly worth trying on it is psyco... it compiles python to
 native code on the fly using a simple import psyco. Pure python is a
 bit slow compared to native C implementations, but psyco could help
 close the gap a bit.

Following up on this... I tried using psyco with python2.2 and it cut
the pysync tests on my machine from 21secs down to 14secs... that's a
33% speedup. In the past I'd tried using pyrex to speed things up with
no success. psyco not only gives a better boost, but is much easier to
use.

 I haven't touched pysync for a while, but it should still work with the
 latest librsync as the API hasn't changed. If there are any problems,
 please let me know. I believe rdiff-backup also has a python wrapper for
 librsync that might be more advanced than the one in pysync.
 
 I have plans for both pysync and librsync, but I haven't worked on them
 much lately. I find I am mostly motivated by feedback from others when
 funding is not available :-)

This little bit of interest motivated me to have a look at it again, and
I've just released version 2.24. From its NEWS:

Updates between release 2.16 and 2.24
-------------------------------------

  * Added TODO and NEWS files.

  * Changed to use psyco if available, giving a 33% speedup.

  * Updated to use librsync 0.9.6.

  * Changed to using a faster md4sum implementation based on the
  librsync implementation, modified to use the RSA API.

  * Added rollin/rollout support to historical adler32.py.

  * Minor cleanups to rollsum code.

  * Minor tweaks to handling of block fragment matching.

-- 
Donovan Baarda [EMAIL PROTECTED]
http://minkirri.apana.org.au/~abo/
