Re: LGPL relicense port of rsync

2016-01-24 Thread Martin Pool
> >
> > > I guess I could write an initial protocol specification - but it would
> > > not be complete and I wouldn't be able to relicense my library to
> > > LGPL anyway.
> > >
> > > So I guess I have convinced myself that it is not worth the effort
> > > trying. Time is probably better spent coding ;) And that's OK too, it
> > > is not that big of a deal anyway.
> >
> > Or think about the following. You insist that your Java library is a
> > derivative work of the C program. OK. However, I believe a
> > "translation into other languages" doesn't mean you change the
> > workflow by restructuring code, introducing other data structures,
> > classes and so on. The more such changes you make, the less it is
> > just a "translation" and the more it is an inspiration. Often I read
> > in code not "based on" but "inspired by".
> >
> > Anyway, you have written every line of the Java code. This means
> > you're a copyright holder of it. Thus you're allowed to license your
> > work as you wish. In case you still insist it is a derivative work,
> > you're required to allow the usage of your code under the GPL. But!
> > As a copyright holder you're additionally allowed to grant an
> > arbitrary license, even on a per-case basis.
> >
> > This was my opinion. Additional references to support or refute it
> > are welcome :)
> You might be right but I am a bit hesitant.

These are talking about different situations:

 - 'porting' in the sense of making code run on a different platform while
still having some code in common
 - line-by-line rewrite or translation
 - writing a new program using the rsync source as documentation of the
protocol, as you are doing

In my (not a lawyer) opinion, the last of them does not create a copyright
derivative, and (separately) I don't object to you doing that on GPL'd work
that I wrote. I would consider the first two to be a violation.

I think you have a couple of cheap options to get some clarity:

 - mail the other key authors listed above explaining what you're doing and
ask if they object
 - mail the FSF or SFLC as custodians of the L/GPL

> I think that the best thing would be if rsync would be split into a
> library part (LGPL) and application part (GPL). This could make the
> rsync protocol even more used.
> But again, it could be quite substantial work: both coding (?)
> and getting permissions from previous contributors to relicense
> the library part.
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options:
Before posting, read:

Fwd: Delete some excluded files in rsync

2006-03-07 Thread Martin Pool

Begin forwarded message:

From: Karel Kulhavy [EMAIL PROTECTED]
Date: 7 March 2006 18:01:43
Subject: Delete some excluded files in rsync


I suggest that a feature be added to rsync: one could
specify excluded files that should be deleted on the receiver and
excluded files that shouldn't be deleted on the receiver.

I am using rsync for remote updating of my website,
and this feature would be handy because some files are generated on
the server (because they cannot be generated on the laptop where the
files are edited) and shouldn't be deleted. The remaining excluded files
should be deleted on the receiver if they got accidentally copied in the
past, for example because the rsync script wasn't tuned properly.
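The split Karel asks for amounts to two pattern sets: excluded files that are fair game for deletion and excluded files that are protected. (Later rsync releases grew `--delete-excluded` plus `P`/protect filter rules, which cover similar ground.) A minimal sketch of the requested semantics, as a hypothetical helper rather than anything in rsync itself:

```python
import fnmatch

def files_to_delete(receiver_files, excluded, protected):
    """Return the excluded receiver-side files that may be deleted:
    excluded by some pattern, but not matched by any protect pattern.
    (Hypothetical illustration, not rsync's real code.)"""
    doomed = []
    for path in receiver_files:
        is_excluded = any(fnmatch.fnmatch(path, p) for p in excluded)
        is_protected = any(fnmatch.fnmatch(path, p) for p in protected)
        if is_excluded and not is_protected:
            doomed.append(path)
    return doomed

# Server-generated files stay; accidentally copied junk goes.
print(files_to_delete(
    ["gen/index.cache", "tmp/a.bak", "photo.jpg"],
    excluded=["gen/*", "*.bak"],
    protected=["gen/*"],
))
```

Here `gen/index.cache` is excluded but protected, so only `tmp/a.bak` would be deleted.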



Martin Pool


Re: [librsync-users] MD4 second-preimage attack

2006-02-21 Thread Martin Pool
On Tue, 2006-02-21 at 14:58 -0800, [EMAIL PROTECTED] wrote:

 A year ago we discussed the strength of the MD4 hash used by rsync and
 librsync, and one of the points mentioned was that only collision
 attacks are known on MD4.

Could you please forward this into the bug tracker so it's not lost?



Re: Spam to this list

2005-03-25 Thread Martin Pool
John Van Essen wrote:
Off list to rsync list owner (feel free to reply on-list if you like):
On Fri, 25 Mar 2005, Dag Wieers [EMAIL PROTECTED] wrote:
I'm not sure what the policy of this list is and I bet everyone has a spam
filter, so nobody might have noticed, but we got spammed.
The policy is to block as much spam as possible without blocking
legitimate posts.  A 100% solution is impossible, even if we had human
moderation (humans make mistakes).
It seems that these posts got through during a surge of spam when the
filter hit its maximum-process limit.  During the day of the 24th more
than 60 spam messages to the list were blocked.
I got several.  Delivered to the mailing list from: []
  unknown []
  unknown []
  unknown []

The first one has been in the blacklist since Oct.
I use these 4 DNS-based blacklists in the mail server that I manage:
And they have helped a LOT.

The other 3 have no reverse DNS entries.  A machine with no reverse DNS
that is sending email is not very likely to be a legitimate email server.
It's much more likely a compromised machine on a clueless ISP's network.
Rejecting email from those unidentified machines also has helped a lot.
Using any of those measures alone tends to block legitimate posters,
particularly those running their own mail server, which to my mind is a
greater harm than letting occasional spam through.  Our purpose here
is to run a mailing list, not to punish ISPs.  So we use all the things you
named as part of a weighted score.
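The weighted-score policy can be sketched like this (the weights and threshold are invented for illustration; they are not the list's actual filter configuration):

```python
# Illustrative weights only; the real filter's values are not public here.
WEIGHTS = {
    "dnsbl_hit": 2.5,       # sending host listed in a DNS blacklist
    "no_reverse_dns": 1.5,  # sending host lacks a reverse DNS entry
    "bayes_spammy": 3.0,    # content filter scores the body as spam
}
THRESHOLD = 5.0

def is_blocked(signals):
    """Block only when several weak signals agree, so a lone signal
    (e.g. a home-run mail server with no reverse DNS) still gets through."""
    return sum(WEIGHTS[s] for s in signals) >= THRESHOLD

print(is_blocked({"no_reverse_dns"}))                              # a legitimate poster passes
print(is_blocked({"dnsbl_hit", "no_reverse_dns", "bayes_spammy"}))
```

The point of the combination is exactly what the reply says: no single measure is trusted enough on its own to reject mail.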


Re: rsync filename heuristics

2005-01-04 Thread Martin Pool
On  5 Jan 2005, Rusty Russell [EMAIL PROTECTED] wrote:
 On Tue, 2005-01-04 at 18:24 +0100, Robert Lemmen wrote:
  hi rusty,
  i read on some webpage about rsync and debian that you wrote a patch to
  rsync that lets it use heuristics when deciding which local file to
  use. could you tell me whether this is planned to be included in an rsync
  release? could i have that patch?
 Hmm, good question.  This is from 2.5.4, and I can't remember how well it
 worked.  Good luck!

I'm not the rsync maintainer anymore, but I think it would be cool if
this were merged, if the current team feels OK about it.

 diff -urN rsync-2.5.4/ rsync-2.5.4-fuzzy/
 --- rsync-2.5.4/	2002-02-26 05:48:25.0 +1100
 +++ rsync-2.5.4-fuzzy/	2002-04-03 16:35:55.0 +1000
 @@ -28,7 +28,7 @@
  ZLIBOBJ=zlib/deflate.o zlib/infblock.o zlib/infcodes.o zlib/inffast.o \
  	zlib/inflate.o zlib/inftrees.o zlib/infutil.o zlib/trees.o \
  	zlib/zutil.o zlib/adler32.o
 -OBJS1=rsync.o generator.o receiver.o cleanup.o sender.o exclude.o util.o \
 -	main.o checksum.o match.o syscall.o log.o backup.o
 +OBJS1=rsync.o generator.o receiver.o cleanup.o sender.o exclude.o util.o \
 +	main.o checksum.o match.o syscall.o log.o backup.o alternate.o
  OBJS2=options.o flist.o io.o compat.o hlink.o token.o uidlist.o socket.o \
 	fileio.o batch.o \
  DAEMON_OBJ = params.o loadparm.o clientserver.o access.o connection.o 
 diff -urN rsync-2.5.4/alternate.c rsync-2.5.4-fuzzy/alternate.c
 --- rsync-2.5.4/alternate.c	1970-01-01 10:00:00.0 +1000
 +++ rsync-2.5.4-fuzzy/alternate.c	2002-04-03 17:04:15.0 +1000
 @@ -0,0 +1,117 @@
 +#include "rsync.h"
 +
 +extern char *compare_dest;
 +extern int verbose;
 +
 +/* Alternate methods for opening files, if local doesn't exist */
 +
 +/* Sanity check that we are about to open a regular file */
 +int do_open_regular(char *fname)
 +{
 +	STRUCT_STAT st;
 +
 +	if (do_stat(fname, &st) == 0 && S_ISREG(st.st_mode))
 +		return do_open(fname, O_RDONLY, 0);
 +	return -1;
 +}
 +
 +static void split_names(char *fname, char **dirname, char **basename)
 +{
 +	char *slash;
 +
 +	slash = strrchr(fname, '/');
 +	if (slash) {
 +		*dirname = fname;
 +		*slash = '\0';
 +		*basename = slash+1;
 +	} else {
 +		*basename = fname;
 +		*dirname = ".";
 +	}
 +}
 +
 +static unsigned int measure_name(const char *name,
 +				 const char *basename,
 +				 const char *ext)
 +{
 +	int namelen = strlen(name);
 +	int extlen = strlen(ext);
 +	unsigned int score = 0;
 +
 +	/* Extensions must match */
 +	if (namelen <= extlen || strcmp(name+namelen-extlen, ext) != 0)
 +		return 0;
 +
 +	/* Now score depends on similarity of prefix */
 +	for (; *name == *basename && *name; name++, basename++)
 +		score++;
 +	return score;
 +}
 +
 +int open_alternate_base_fuzzy(const char *fname)
 +{
 +	DIR *d;
 +	struct dirent *di;
 +	char *basename, *dirname;
 +	char mangled_name[MAXPATHLEN];
 +	char bestname[MAXPATHLEN];
 +	unsigned int bestscore = 0;
 +	const char *ext;
 +
 +	/* FIXME: can we assume fname fits here? */
 +	strcpy(mangled_name, fname);
 +	split_names(mangled_name, &dirname, &basename);
 +
 +	d = opendir(dirname);
 +	if (!d) {
 +		rprintf(FERROR, "recv_generator opendir(%s): %s\n",
 +			dirname, strerror(errno));
 +		return -1;
 +	}
 +
 +	/* Get final extension, eg. ".gz"; never full basename though. */
 +	ext = strrchr(basename + 1, '.');
 +	if (!ext)
 +		ext = basename + strlen(basename); /* ext = "" */
 +
 +	while ((di = readdir(d)) != NULL) {
 +		const char *dname = d_name(di);
 +		unsigned int score;
 +
 +		if (strcmp(dname, ".") == 0 ||
 +		    strcmp(dname, "..") == 0)
 +			continue;
 +		score = measure_name(dname, basename, ext);
 +		if (verbose > 4)
 +			rprintf(FINFO, "fuzzy score for %s = %u\n",
 +				dname, score);
 +		if (score > bestscore) {
 +			strcpy(bestname, dname);
 +			bestscore = score;
 +		}
 +	}
 +	closedir(d);
 +
 +	/* Found a candidate. */
 +	if (bestscore != 0) {
 +		char fuzzyname[MAXPATHLEN];
 +
 +		snprintf(fuzzyname, MAXPATHLEN, "%s/%s", dirname, bestname);
 +		if (verbose > 2)
 +			rprintf(FINFO, "fuzzy match %s -> %s\n",
 +				fname, fuzzyname);
 +		return do_open_regular(fuzzyname);
 +	}
 +	return -1;
 +}
 +
 +int open_alternate_base_comparedir(const char *fname)
 +{
 +	char fnamebuf[MAXPATHLEN];
 +
 +	/* try the file at 
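The quoted patch is cut off above, but its scoring heuristic is small enough to restate. A rough Python equivalent of `measure_name` (a sketch for readers, not part of the patch): the extension must match exactly, then the score is the length of the common prefix with the wanted name.

```python
def measure_name(name: str, basename: str, ext: str) -> int:
    """Score a directory entry as a fuzzy match for the wanted basename.
    Zero if the extension differs; otherwise the common-prefix length."""
    if len(name) <= len(ext) or not name.endswith(ext):
        return 0
    score = 0
    for a, b in zip(name, basename):
        if a != b:
            break
        score += 1
    return score

# Nearby version of a tarball scores highly; unrelated file scores zero.
print(measure_name("rsync-2.5.4.tar.gz", "rsync-2.5.5.tar.gz", ".gz"))
print(measure_name("notes.txt", "rsync-2.5.5.tar.gz", ".gz"))
```

The highest-scoring directory entry is then used as the basis file for the delta transfer, which is what makes the patch useful for renamed or re-versioned files.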

Re: A question about rsync

2004-06-06 Thread Martin Pool
On  7 Jun 2004, Guo jing [EMAIL PROTECTED] wrote:
 Thanks for your answer!
 Yes, my question is whether we can get a good result when the file is
 changing while it is being copied by rsync.
 In my test, if the file is being enlarged while it is copied using
 rsync, I get a normal copy on the other end, and the result file is the
 same as the source file was when rsync scanned it. The same result
 is obtained if the source file is shrunk and its blocks were not reused.
 As you said, if the source file shrinks and its blocks are reused by
 other files, there will be a file with another file's content and an
 abnormal end on the other end.
 So, is it true that we can't deal with this problem except by making
 some changes to the OS?

Yes, or don't change the file while it is being copied.


Re: Bug reporting

2004-06-01 Thread Martin Pool
On  1 Jun 2004, John Summerfield [EMAIL PROTECTED] wrote:
 The jitterbug link on no longer works. I 
 suggest it either be fixed or removed.

Thanks, fixed.

 You make bug-reporting needlessly difficult, I think. I dislike the need to 
 subscribe to a mailing-list and potentially receive lots of email that 
 doesn't interest me. I have plenty of other email to keep me amused.

I don't think you need to subscribe to post.  I put the address
directly on the nobugs page to make it easier for people to write to
it.  Did you have any other suggestions about how to make it better?

The reason we took Jitterbug and faq-o-matic down is that people
seemed to get help more promptly when they wrote to the list.

 What I wanted most to do is ensure you know about rsyncx and consider working 
 with the authors to create a unified product that supports resource forks 
 when built on OSX.
 Their CVS repository is at
 It seems a shame to have two projects where one will do.

Well, sometimes there are reasons not to glom everything into one big


Re: I20 Drivers Crash system when used with Rsync

2004-06-01 Thread Martin Pool
On 30 May 2004, Dennis R. Gesker [EMAIL PROTECTED] wrote:
 Note: I don't know if this is a problem with the I20 drivers or rsync, so
 I'm submitting to both the kernel Bugzilla and the rsync mailing list. I
 couldn't find a Bugzilla for rsync. I hope this was the correct way to
 submit this issue.
 Distribution: Debian
 Hardware Environment: Intel 850MV motherboard, Pentium 4 processor,
 1 GB of RAM, Adaptec 2400A RAID controller. Both the motherboard and
 the controller card have the most recent BIOS/firmware installed. The
 Adaptec card is capable of RAID configuration but currently it is
 configured to view each of the attached IDE drives as individual drives.
 None of the card's RAID features are presently being used. Network is
 100 Mb/s switched Ethernet. Network cables and connections have been
 tested and verified.
 Software Environment: Very basic/vanilla Debian system install (Sid 
 branch). Software package is rsync.
 Problem Description: When transferring many, sometimes large, files (3 GB
 in some cases) for backup purposes using rsync, either via an ssh shell
 or an rsync server, the I20 drivers cause a kernel panic. The system
 seems to report increases in queue depth; shortly afterward the system
 completely hangs, indicating a kernel panic.

A kernel panic is by definition a kernel bug, not an application bug.

Good luck! :-)

p.s. kernel bug reports ought to say what kernel you're using.

 Steps to reproduce: Transfer files using rsync. Last specific command 
 issued at prompt that reproduced this error was:
 rsync --bwlimit=2048 -vv -r -e ssh --delete --exclude lost+found 
 rsync://[EMAIL PROTECTED]:873:/bu/area1/blue/* /bu/area1/blue
 This error does not seem to occur when transferring the same file set
 using cp over nfs, or scp. However, this does happen using rsync over nfs.


Re: A question about rsync

2004-06-01 Thread Martin Pool
On 31 May 2004, Guo jing [EMAIL PROTECTED] wrote:
  I am a student in China.I like the linux and usually use the rsync to 
 backup my documents. Last week when I use it,I find a question I want to 
 discuss with you.
   The condition is like this: The source file that I want to rsync to 
 another computer is 129M before I start the rsync. During the running of 
 the rsync,the file was changed and became to about 50M, then the rsync 
 ended. When I view the destination, I found that the file was 129M. And 
 there were some contents of the files added when the rsync was running. 
  After that, I do some tests about the rsync:
   1. After I start the rsync to backup a file, I delete the file 
 during the rsync is running, I found the file can been backuped normally.
   2. While the rsync is backuping a file name sourfile (50M), I add 
 some content by the command cat addfile  sourfile to enlarge the file 
 to 100M. After the rsync finished.I found the file is still 50M.
The question is that , how the rsync copy a file to another computer at 
 the first time ? My attitude is that it remenbers the physical blocks the 
 file used when the rsync scaned. Then ,rsync will send the blocks to the 
 destination no matter if the file or the block has changed. So, is that 
 right?? Who can tell me how the rsync decide which contents should to send 
 to the destiation?

I'm not sure I understand the question, sorry.  

If you change a file while it is being copied by rsync you may end up
with undefined results on the other end.  There is not much that can
be done about this without os-level version control.

  Sorry, my English is very poor. Thanks for your read and answer!!


CVS update: rsyncweb

2004-06-01 Thread Martin Pool

Date:   Tue Jun  1 09:08:29 2004
Author: mbp

Update of /data/cvs/rsyncweb
In directory

Modified Files:
Log Message:
Clean up mention of mailing list.

nobugs.html 1.8 = 1.9
rsync-cvs mailing list

CVS update: rsyncweb

2004-05-31 Thread Martin Pool

Date:   Tue Jun  1 02:07:38 2004
Author: mbp

Update of /data/cvs/rsyncweb
In directory

Modified Files:
Log Message:

features.html   1.2 = 1.3

faq-o-matic gone

2004-05-27 Thread Martin Pool
The rsync faq-o-matic was broken during the recent machine migration.
Since there was relatively little useful content and a lot of
unanswered or pointless questions, I am going to remove the links to

If anyone feels like maintaining an FAQ please do so.



CVS update: rsyncweb

2004-05-27 Thread Martin Pool

Date:   Fri May 28 02:25:19 2004
Author: mbp

Update of /data/cvs/rsyncweb
In directory

Modified Files:
Log Message:
remove dead faq-o-matic

header.html 1.14 = 1.15

(fwd from rsync: Request for a feature

2004-05-02 Thread Martin Pool
- Forwarded message from Paulo da Silva [EMAIL PROTECTED] -

From: Paulo da Silva [EMAIL PROTECTED]
Subject: rsync: Request for a feature
Date: Sun, 02 May 2004 17:09:11 +0100
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040317


1st of all, thank you for maintaining this very useful program.
It helps me a lot with many tasks that would otherwise be very
tedious and time consuming.

However, I think that a small feature could make it yet more
powerful when used as a backup tool.
The idea is to have a switch so that files could be kept
compressed at the destination. These compressed files could
then be restored the same way, by specifying a switch telling that
the source files are to be uncompressed.
Files with known extensions (.gz, .zip, ...) should not be compressed.
All files must keep their original names unchanged.

   export RSYNC_COMPRESSED_EXTS=".gz .zip ..." ;# Extensions of files
not to be compressed
   rsync -av --delete --zip MyDir/ BackupDir
   rsync -av --delete --unzip BackupDir/ MyDir

This is only a suggestion. You may find a better solution.

Thank you.
Paulo da Silva

- End forwarded message -
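rsync has no `--zip`/`--unzip` options (those flags are the poster's invention), but the requested behaviour is easy to sketch outside rsync. A minimal Python sketch, with the extension skip-list mirroring the suggested RSYNC_COMPRESSED_EXTS:

```python
import gzip
import os
import shutil

SKIP_EXTS = {".gz", ".zip"}  # stands in for RSYNC_COMPRESSED_EXTS above

def compress_copy(src: str, dst_dir: str) -> str:
    """Copy src into dst_dir, gzip-compressing it unless it is already
    compressed; the file name is kept unchanged, as the request asks."""
    dst = os.path.join(dst_dir, os.path.basename(src))
    if os.path.splitext(src)[1].lower() in SKIP_EXTS:
        shutil.copyfile(src, dst)
    else:
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return dst

def restore_copy(src: str, dst_dir: str) -> str:
    """Inverse operation for files that compress_copy gzipped."""
    dst = os.path.join(dst_dir, os.path.basename(src))
    if os.path.splitext(src)[1].lower() in SKIP_EXTS:
        shutil.copyfile(src, dst)
    else:
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return dst
```

One catch the suggestion glosses over: because the name is unchanged, the restore side has to already know which backup files are compressed, which is why rsync's own `-z` compresses only on the wire instead.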


Re: test message only

2004-04-23 Thread Martin Pool
On 23 Apr 2004, Jim Salter [EMAIL PROTECTED] wrote:
 This is a test message - my apologies for it, but everything I send is 
 getting bounced.

Our spamfilter was a little too hasty.  It should be OK now.


OT: fyi, spam

2004-01-14 Thread Martin Pool
Just as background information: our spam filter caught 14000 attempted
spams in the last two weeks.  Suggestions on blocking more are welcome
but the vast majority is already blocked.  I think we removed the whitelist.



CVS update: rsyncweb

2004-01-11 Thread Martin Pool

Date:   Mon Jan 12 00:49:28 2004
Author: mbp

Update of /data/cvs/rsyncweb
In directory

Modified Files:
Log Message:
Fix link to Smart Questions document.

lists.html  1.5 = 1.6

Re: rsync / ssh -i

2003-12-04 Thread Martin Pool
On  4 Dec 2003, Michael [EMAIL PROTECTED] wrote:
 I know that with ssh I can use the -i option to select a different identity.
 Is there any way to use the -i option with rsync and ssh?  Thank

Use the IdentityFile and Host keywords in your ssh_config:

  Host suzy-alt-key
  IdentityFile ~/.ssh/id_some_other_dsa

Martin -- Adelaide, January 2004


CVS update: rsyncweb

2003-12-04 Thread Martin Pool

Date:   Thu Dec  4 10:59:33 2003
Author: mbp

Update of /data/cvs/rsyncweb
In directory

Modified Files:
Log Message:
Clarify that the problem is with 2.5.6 *and earlier*.

Add CVE index.

index.html  1.17 = 1.18

Re: rsync-bugs and unclear semantics when copying multiple source-dirs to one target

2003-11-24 Thread Martin Pool
On 24 Nov 2003, Dirk Pape [EMAIL PROTECTED] wrote:
 Dear Martin Pool,
 I tried to ask via the rsync-mailing list but never got an answer. So I 
 contact you directly.
 I refer to the rsync syntax
 rsync [OPTION]... SRC [SRC]... DEST
 with more than one SRC, which is mentioned in the man-pages.
 We use this form to overlay a target directory tree from more than one
 source (class, group1, group2, ..., machine) to yield a customized
 cloned directory.
 There are some glitches and bugs when using this form of rsync commands, 
 one of which I have described in the here attached mail to the rsync 
 mailing list. This is a platform specific bug.

The heart of the problem is that you are trying to write the same file
from several different source directories.  I think this just will not
work predictably in the current design of rsync, because it builds a
single list of all files at the start of the transfer.  Furthermore
the order in which files are transferred is rather strange, for
reasons of historical compatibility.

I think we do not make any guarantees about what happens if the same
relative path occurs in several source directories; the behaviour is
undefined.

I agree that it would be nice if it processed the source directories
in the order they are given, but that is not how it works.

At the moment your options are:

  Fix rsync to support this behaviour.

  Transfer the directories one at a time to build up the destination.
  This has several problems, one being that there may be many
  redundant transfers and another that the state will be inconsistent
  for longer.

  Make a single source directory that has the state you want.

  Ditto, but use union bind mounts to synthesize it from several
  directories, assuming that your OS supports that.

  Use some other tool.

  Do several rsync transfers using exclude/include options to pick the
  right directories from each overlay.

The last is possibly the most promising.  You could even write a
little Perl script to build the exactly correct include lists.
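That script idea can be sketched, in Python here rather than Perl (a rough sketch only: a real rsync filter list would also need include rules for parent directories, which this omits). Compute which overlay should provide each relative path, then emit per-overlay include lists:

```python
import os

def winning_sources(overlays):
    """Map each relative file path to the overlay that should provide it.
    Later overlays override earlier ones, giving the left-to-right
    precedence the poster wants."""
    winner = {}
    for src in overlays:
        for root, _, files in os.walk(src):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), src)
                winner[rel] = src
    return winner

def include_lists(overlays):
    """Per-overlay '--include' arguments; each rsync run would append
    '--exclude=*' so that only the listed files transfer."""
    per_src = {src: [] for src in overlays}
    for rel, src in winning_sources(overlays).items():
        per_src[src].append("--include=" + rel)
    return per_src
```

One rsync invocation per overlay, each with its include list plus `--exclude=*`, then gives a deterministic merge regardless of rsync's internal file-list ordering.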

 There is another glitch, which I will describe here:
 if you have the following directory structure (- is softlink)
 ./dir2/dir - ../dir3/dir
 and do
 rsync -av --delete dir1/ dir2/ target
 you get
 ./target/dir - ../dir3/dir
 I would expect either
 Variant 1:
 ./target/dir - ../dir3/dir
 (contents of /dir1/dir is ignored because dir ist overlayed with a 
 symlink in dir2)
 Variant 2:
 ./target/dir - ../dir3/dir
 (./dir1/dir/a is copied following the overlayed symlink *but* the --delete 
 then also has to follow the symlink)
 I would prefer strongly to see variant 1 or a new option to protect target 
 directories from changing contents by linking in o them.
 For your motivation:
 Our more complex scenario is like that: We have
 machine/usr/share/bugzilla - /local/usr/share/bugzilla
 and we do something like
 rsync -av --delete --exclude local class/ machine/ targethost:/
 the --exclude local protects files in targethost:/local from being 
 deleted but not from being overwritten with files which are present in 
 class/usr/share/bugzilla/ on the scr-host.
 I would like to see an option (or standard semantics) to simply killing a 
 directories sub-filelist when the directory is overlayed by a symlink in 
 a source directory given later in the command line. May be it would suffice 
 to do that only if the symlink points to a directory, which is outside 
 all source dirs or element of an exclude list.
 I hope you understand and can help me.
 Dirk Pape.

 From: Dirk Pape [EMAIL PROTECTED]
 Subject: bug (filelist) for platforms solaris and darwin (macosx) and *not*
 Date: Sun, 28 Sep 2003 13:19:45 +0200
 X-Mailer: Mulberry/3.1.0b7 (Mac OS X)
 I have found a nasty bug when a file, which is in several of many sources,
 shall be copied to a target.
 The Linux version works well but rsync 2.5.{2|5|6} under Solaris 9 (gcc
 2.95.3) and Darwin (gcc 3.1) do not. The decision which file (out of
 which src) shall be copied depends on the number of src dirs given on
 the command line.
 This bug bites us very hard, because we decided to rely on rsync to build
 local directories by overlaying different directories from a server, and
 we need to be sure to have consistent semantics in which version of a
 file appears in the local directory.
 I stripped our situation down to a (yet fairly complex) test archive, so
 you can reproduce the situation.
 Here is the script, which is also in the archive:
 $rsyncpath -av --delete  dir1/ dir2/ merged12
 $rsyncpath -av --delete  dir1/ dir2/ dir3/ merged123
 # as dir3 only consists of an empty dir subdir we expect
 # that merged12 and merged123 have identical files in them
 # but merged*/subdir/s0/LOOKATTHIS differ

Re: rsync rcp

2003-10-30 Thread Martin Pool
On 30 Oct 2003, [EMAIL PROTECTED] wrote:
 I was hoping that since you guys are the authors to rsync that
 you could answer a simple question for me.
 I'm trying to transfer files via the rsh/rexec protocol by
 remotely executing a cat command, i.e. "cat > foo.txt",
 and then sending data through the socket to the stdin of the remote
 process. This all works fine, except for the fact that I have to
 close the socket to force an end of file.
 My question is: does rcp/rsync close a socket when it sends files
 to signify an end of file? If not, how does it send multiple files
 without closing the socket?

It uses a binary protocol to delimit files and describe metadata such
as their name and ownership.  As you say, you cannot use the
end-of-file mark more than once.  It is conceptually similar to a tar
stream.

So if you wanted to send multiple files with just rsh, you could do

  tar c mydir | ssh somehost tar x

[EMAIL PROTECTED] is a better forum for questions.

Re: The rsync daemon as a gateway service?

2003-10-22 Thread Martin Pool
That's an interesting idea.

As a temporary measure you might use different tcp ports rather than
module names to distinguish different services, and then use tcp


Re: Filesystem panic

2003-10-22 Thread Martin Pool
On 22 Oct 2003, Morten [EMAIL PROTECTED] wrote:
 I'm running RH9, 2.4.20-18.9. Each night, the server mounts
 an external FAT32 disk using firewire, and performs backups
 to it using rsync.
 Twice within the past 3 months, the backup process has resulted
 in machine crash (complete hang, hardware reboot needed).
 From /var/log/messages:
 Oct 22 04:02:20 yoda kernel: Filesystem panic (dev 08:21).
 Oct 22 04:02:20 yoda kernel:   fat_free: deleting beyond EOF
 Oct 22 04:02:20 yoda kernel:   File system has been set read-only

You probably need to report this to the vfat fs maintainer

P:  Gordon Chaffee
S:  Maintained

 From the rsync error log, I can see that the filesystem becomes
 read-only, and that it begins to fail the synchronization task with
 .Delecelle.psd.Rz49bM failed: Read-only file system

 Which is understandable. After doing this for a while, the error
 message changes to Too many open files. And I suspect that this is
 what causes the machine to hang. 

Can you please try to reproduce the problem, and then do

  lsof -p PID_OF_RSYNC

for each rsync process sometime before it starts complaining about too
many open files.  Then kill rsync to avoid the problem.

 Is there any way to configure rsync to abort execution once the
 first error occurs?

Not at the moment.


Re: doing an md5sum rsync?

2003-09-09 Thread Martin Pool
On  9 Sep 2003 Greger Cronquist [EMAIL PROTECTED] wrote:

 See also unison, which does
 exactly this (and synchronizes using the rsync algorithm).

Yes, Unison is very cool.  I hadn't realized that it detected renames


Re: performance suggestion: sparse files

2003-09-09 Thread Martin Pool
On  9 Sep 2003 Jon Howell [EMAIL PROTECTED] wrote:

  Actually you can guess by looking at the allocated-blocks measure,
  and use this to guess whether it's preallocated zeros or sparse,
  which might be useful for backups.  But there is no way around
  reading the blocks.
 Sure. Bummer; that's a lot of memory bus bandwidth (having the kernel
 zero-fill the blocks, then having rsync zero-compare them) wasted.

If the program mmaps the file the kernel will fill the vm with COW
references to the zero page and it will be quite cheap.

 Seems like a fcntl() is in order.

Repeat after me: premature optimization is the root of all evil.
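The allocated-blocks guess mentioned above can be sketched as follows (st_blocks counts 512-byte units per POSIX; whether a hole actually stays unallocated depends on the filesystem, so this is a heuristic, not a guarantee):

```python
import os
import tempfile

def is_probably_sparse(path: str) -> bool:
    """Guess sparseness by comparing allocated blocks to the file length,
    as suggested in the thread; cannot distinguish a hole from a run of
    preallocated zeros without reading the blocks."""
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(8 * 1024 * 1024)  # 8 MiB hole, no data blocks written
    path = f.name
print(is_probably_sparse(path))
os.unlink(path)
```

This is exactly the limitation the reply describes: the stat data can flag a file as worth treating sparsely, but only reading the blocks tells you where the zeros are.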


Re: rsyncing *to* live system

2003-09-09 Thread Martin Pool
On 26 Aug 2003 jw schultz [EMAIL PROTECTED] wrote:

 On Wed, Aug 27, 2003 at 09:25:41AM +1200, Steve Wray wrote:
  Hi there,
  I have been asked to develop a system for keeping
  a bunch of machines remotely configured and updated.
  The client has asked for this to be implemented using rsync.
  The machines involved are located at remote sites and
  physical access by competent personnel is non-trivial.
  And the systems are running Debian.
  I am a little concerned at the prospect of using rsync to
  replace running binaries, open files and system libraries.
  I've searched for an example where rsync has been used in this way.
  So far I have found nothing; people use it to backup a live
  filesystem; we are tasked with doing the reverse (sort of).
  And there are people who use rsync to replicate systems (rolling out
  a bunch of identical boxes; typically these receive the rsync
  *before* they go live not after).
  So, can anyone please give me arguments or reasons for
  or against using rsync in this way? References to sites
  which currently use rsync in this way would be much appreciated.
 There are some difficulties that can occur depending on how
 you structure your filesystems.
 It is possible to produce temporary dangling symlinks.
 Rsync may remove the destination of the link before
 the symlink is updated or deleted (see --delete-after); or
 if rsync creates or updates a symlink before the destination
 is created.
 You can get inter-file inconsistencies.  The file sets are not
 updated atomically so different config files and binaries
 may be updated at slightly different times.  Because rsync
 processes the file list in lexical order the window size will
 depend on the relative remoteness of files in the directory
 hierarchy so files in the same directory have small windows
 but files in different subtrees will have a somewhat larger

Here is an example of a bad case: a program depends on a shared
library, and needs to be recompiled when a new version of the library
is released.  Your transfer upgrades the program before it updates the
library (or vice versa) and the program crashes.

I agree with JW and will just add that the inter-file inconsistencies
could be far worse if the transfer is ever interrupted due to e.g. a
network outage.  If you interrupt it at the right (or wrong) time it
is possible that rsync will no longer be able to run.

dpkg knows how to upgrade software in a safe and sane way, avoiding
all these problems.  Let it do its job.  By all means use rsync to
transfer the packages, but then run apt or dpkg.

In addition, once you upgrade software, you will want to restart
daemons to make sure the upgraded stuff is used.  dpkg handles that


Re: Looking for atime reset...

2003-09-09 Thread Martin Pool
On  9 Sep 2003 Saylor, Ted [EMAIL PROTECTED] wrote:

 I find rsync an excellent tool when I need to move multi-gigabyte
 filesystems, because I can do most of the copying during the week,
 then a quick cleanup sweep in our 4-hour outage window.
 I do need to somehow get the atimes to copy over, because as it
 stands now I lose the age information (which we will soon be using
 for auto-archiving) on things I copy with rsync.
 Would it be that hard to enhance rsync to copy the atime along with
 the current mtime info?
 Does anyone have a speedy script, perl, or C program to clean up the
 atimes after the final rsync is done?

You don't say what operating system or filesystem you're using, but on
Linux there is no standard way to change the atimes of a file, so
there is nothing rsync can do about it.

If you persuade your friendly neighbourhood kernel hacker / vendor to
add an operation to do this then I suppose rsync could support it.


Re: Operation not permitted?

2003-09-09 Thread Martin Pool
On  9 Sep 2003 Max Kipness [EMAIL PROTECTED] wrote:

 Can someone tell me what the problem is here. I am doing an rsync on a
 sendmail spool directory to a folder that is a samba mount. 

What do you mean by a samba mount?  A filesystem mounted over smbfs?

 Why is rsync trying to change owner?

Because you told it to, using the -a, -o or -g options.

 Does it have to?

You asked for it, you got it :-)

 I tried manually changing owner (as root) on a file that is sitting on
 the samba mount and I got the same operation not permitted error.

Assuming you're using smbfs, it's because smbmount logs in to the
server as a single NT user, and all files appear to be owned by that
user.  Ownership is not preserved.

cifsfs may fix this, but you need to ask about that elsewhere.


Re: doing an md5sum rsync?

2003-09-08 Thread Martin Pool
On  7 Sep 2003 Marc MERLIN [EMAIL PROTECTED] wrote:

 I don't know if this has been requested before, but I would really
 like for rsync to compute an md5sum for each file at the source and
 destination (with a flag turned off by default of course), and it
 would realize that I renamed files at the source by noticing a
 matching md5sum between different filenames
 It would then rename the destination instead of deleting it and
 resending the entire source, just because the filename changed.
 This would also take care of me moving files between directory trees,
 and again do a mv instead of a delete/resend (if I rsync the root of
 all that of course)
 Or is this possible already?

This is not possible yet.  It is on my wishlist for a future program.

Of course remotely detecting files that have moved between directories
might mean having the server hash every file on the filesystem.  So it
might be quite expensive...


Re: Add a feature : disk and partition cloning

2003-09-08 Thread Martin Pool
On  2 Sep 2003 [EMAIL PROTECTED] wrote:

 Today, I use rsync for updating some 40 Debian/Linux boxes; rsync is

 So, now, I'll need to update a whole disk or partition (NTFS) with an
 image or another disk or partition (multiboot system).
 Can I hope rsync will do this task some day?

I agree that it would be a cool feature.

It's unlikely that the existing codebase would be extended for it, but
something like rdiff might support it eventually.

In the meantime, just dd across ssh.

 The rsync algorithm would be great for this task, wouldn't it?

Not directly; the basic rsync algorithm cannot update in place.  You
might adapt it to do so though.
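
A minimal sketch of the copy-then-rename step that makes in-place
updates unnecessary (a hypothetical delta representation, lists of
('copy', offset, length) and ('literal', bytes) operations, not
librsync's actual wire format):

```python
import os
import tempfile

def apply_delta(basis_path, delta, dest_path):
    """Build the new file beside the old one, then atomically swap it in.

    The delta's 'copy' ops read from the *old* file, which is why rsync
    cannot safely overwrite it while the transfer is in progress.
    """
    dest_dir = os.path.dirname(dest_path) or '.'
    with open(basis_path, 'rb') as basis, \
         tempfile.NamedTemporaryFile(dir=dest_dir, delete=False) as out:
        for op in delta:
            if op[0] == 'copy':
                _, offset, length = op
                basis.seek(offset)
                out.write(basis.read(length))
            else:                      # ('literal', bytes)
                out.write(op[1])
        tmp_name = out.name
    os.replace(tmp_name, dest_path)    # basis stays intact until this point
```

For a raw partition there is no second copy to write to, which is
exactly why the basic algorithm does not apply directly.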

 I don't mind how much work this feature needs,
 but are some folks interested in it?

You don't mind how much work other people do for you?  How gracious.

Or were you volunteering to write it?  If so, adding in-place updates
to rdiff would be a good place to start.


Re: rsync daemon and secrets file

2003-08-26 Thread Martin Pool
On Mon, 25 Aug 2003 12:49:36 -0400
Hardy Merrill [EMAIL PROTECTED] wrote:

   rsync -avv [EMAIL PROTECTED]::test-secret/one_secret

Yes, that's better.

 Although 'man rsync' does technically describe this
 PROGRAM section with this command:
   rsync -av --rsh=ssh -l ssh-user [EMAIL PROTECTED]::module[/path]

But that's not what you're doing here.  You're just connecting to an
rsync server over TCP.

 IMHO, it would enhance user understanding to provide a
 concrete EXAMPLE of this.  Also, it would help in
 'man rsyncd.conf' not only to see an example of an
 rsyncd.conf file, but also to see examples of the
 different transfers that could be done with that
 rsyncd.conf file.  I'm not criticizing - merely
 noticing an area that, given some attention, could
 increase user understanding and decrease support load.

Could you please draft a couple of paragraphs to add to the manual
that you think would improve it?  If you post them here I will check
them and commit them.


GNU does not eliminate all the world's problems, only some of them.
-- The GNU Manifesto

Re: mknod / rsync error

2003-08-26 Thread Martin Pool
On 22 Aug 2003 16:11:21 +0200
Lars Bungum [EMAIL PROTECTED] wrote:

 I'm experiencing these problems as described in this mail:
 From: Thomas Quinot ([EMAIL PROTECTED])
 Subject: Rsync 2.5.5: FreeBSD mknod can't create FIFO's 
 This is the only article in this thread 
View: Original
  Newsgroups: mailing.unix.rsync
 Date: 2002-06-24 06:05:25 PST 
 The following patch (adapted to rsync 2.5.5 from the one posted in
 Dec. 2000,

n.b. the patch quoted in this mail was truncated.

That looks reasonable to me.



Re: [librsync-devel] Re: state of the rsync nation? (revisited6/2003 from 11/2000)

2003-08-02 Thread Martin Pool
On  8 Jun 2003, Donovan Baarda [EMAIL PROTECTED] wrote:

 regarding librsync... It is still in sort-of-active development on
 SourceForge by a variety of developers... a new release is waiting in
 CVS for me to finally get around to releasing it, but I'm busy on a big
 contract at the moment so it's currently on hold pending some more
 cygwin/win32 testing. It is in active use by projects like rdiff-backup.

 AFAIK, rproxy is pretty much dead, and the only version that exists
 depends on a very old version of libhsync. The closest thing to this
 available now is the http proxy proof of concept with xdelta, but it's
 radically different in many ways to the old rproxy (due to xdelta not
 using signatures).

The main reason why rproxy is dead is that dynamically-generated HTML
files, where in principle rproxy does best, are just not a very
important problem for many people at the moment.

For users with ADSL or better, HTML is not a problem; binaries may be.

I realize there are people in the world who are still using modem
connections where rproxy might be a win but I don't see any of them
asking for commit access.  

A positive thing for rproxy is that both Mozilla and Apache2 are
pretty stable now, and they have good interfaces for adding streaming
compression support.  Neither Apache 1.2 nor Netscape, which were
dominant when I started on it, could handle this very well.

 This is largely still true, except libhsync changed back to librsync and
 now has its own project on SourceForge separate from the mostly defunct
 rproxy. librsync itself has no wire format, being just a general
 purpose signature/delta/patch library implementing the rsync algorithm.
 The comments about rsync never using libhsync/librsync are still true
 for the foreseeable future. There are many things rsync includes that
 are still missing from librsync, and the rsync implementation is very
 tightly coupled, with many backwards compatibility issues. Even when
 librsync reaches the point of being as good or better than rsync at
 signature/delta/patch calculation, it would be a major task to fit it
 into rsync.

I think it's best at the moment to let rsync continue as a nice stable
program, good at what it does.  Wayne and myself have been toying with
replacements in the background and perhaps in the future something
better will come out, and perhaps it will use librsync.

 rsync also has more active development, mostly in the form of
 incremental feature additions and the resulting bugfix fire-fighting,
 all of which lead to an even more tangled implementation. Occasionally
 there are efforts to re-write and clean up sections of the code, but
 they are (rightly) regarded cautiously because of the breakage risk
 involved for little immediate gain.

 The librsync code in CVS is still largely not very good. It is pretty 
 messy and needs a good cleanup.

True. :-/

I think I got a bit mentally twisted up by trying to support
nonblocking operation, which I still think is very important to the
library being generally useful.  Doing this in C is a bit hard.  But
it could certainly be done much better.

The other thing I would really like to see is a more thorough test
suite.

 The API is mostly OK though, and it _does_ work quite well, with no
 known bugs. I have some plans for a major cleanup and optimisation
 of the code based on my experiences with pysync. I have a patch
 submitted that I plan to commit after the next release that
 optimises and cleans up the delta calculation code quite a bit.
 The next big thing in delta calculation is probably going to be the
 vcdiff encoding format, which should allow a common delta format for
 various applications and supports self-referencing deltas, which
 makes it capable of compression. According to the xdelta project this
 has already been implemented, and I'm keen to see Josh's code, as it
 could be used as the basis for a cleanup/replacement of at least the
 patch component of librsync.

Yes, that sounds good.

 I think someone has a Perl wrapper for librsync that was being used
 as a test bed for rsync 3 type development (superlifter?).

superlifter was my prototype.  It uses Python, and in fact just
calls out to rdiff at the moment.

At the moment I see it as another layer above librsync/rdiff that
provides pipelined delta-compressed remote network IO, optionally over
SSL or SSH.  On top of this you could build a batch transfer like
rsync 2.6, or an interactive client, or a backup system like
Duplicity, or a real-time mirror based on dnotify.

 For the future I can see continued support of the existing rsync code. I
 would also like to see librsync adopt vcdiff as its delta format, and
 get a major cleanup, possibly by re-using some xdelta code. There are
 many common elements to the xdelta and rsync algorithms, and I see no
 reason why a single library couldn't support both (as pysync does). It
 would be nice if librsync and/or xdelta could become _the_ delta
 library.

I heartily agree.

Re: patch draft for extended attributes on linux

2003-06-25 Thread Martin Pool
On 25 Jun 2003, Wayne Davison [EMAIL PROTECTED] wrote:
 On Wed, Jun 25, 2003 at 10:34:38AM +1000, Martin Pool wrote:
  There is no mtime for xattrs, so they are transferred every time as
  part of the file list.
 One possibly better solution would be to create some kind of CRC of the
 xattr data (MD4/MD5/whatever) and send just that in the file list for
 each file.  This would allow you to figure out when to update the xattr
 data, but the protocol would need to be modified to send the xattr data
 during the file-update phase (and possibly to allow the reciever to
 request just an xattr update without doing a file update).

That's a pretty good idea.  For the moment I just wanted a minimal
patch, as traffic size is not an overwhelming consideration for the
particular user I was helping.

However, for many realistic cases the xattrs are quite small.  It is
entirely possible for a file's attribute names and values together to
be smaller than a 20-byte SHA1.  (Well, perhaps not with my
inefficient packing, but in principle they might be.)

In cases where xattrs are used for security information, it might not
be sufficient to apply them just at the end of the transfer.  That
might make the permissions on the temporary file too weak.  Or perhaps
not -- I just didn't want to think about it. :-)
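
Wayne's digest idea above could be sketched like this (hypothetical
helpers, not the rsync protocol; MD5 over a name/value mapping):

```python
import hashlib

def xattr_digest(xattrs):
    """Stable digest over a {name: bytes} mapping of extended attributes."""
    h = hashlib.md5()
    for name in sorted(xattrs):        # sort so ordering doesn't matter
        h.update(name.encode('utf-8') + b'\0' + xattrs[name] + b'\0')
    return h.digest()

def xattrs_need_update(local_xattrs, digest_from_file_list):
    """Transfer the xattr data itself only when the digests disagree."""
    return xattr_digest(local_xattrs) != digest_from_file_list
```

The file list then carries only the fixed-size digest per file, and
the (possibly large) attribute data moves in the update phase.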


patch draft for extended attributes on linux

2003-06-24 Thread Martin Pool
This draft patch adds support for transferring extended attributes
with a new --xattr option.  It ought to work on Linux with XFS or
ext2/ext3 filesystems with the SGI/bestbits attribute system.

It is partially working, but there seems to be some kind of hang bug
while transferring the file list.  I suspect it might be provoking a
problem in io.c.

You need to rerun autoconf, autoheader and configure after applying
the patch.

There is no mtime for xattrs, so they are transferred every time as
part of the file list.  This means that they will be updated correctly
if you change attributes but do not change the file.

I wrote this because it was required by a colleague.  I have mixed
feelings about whether this ought to be merged, even once it's working
correctly.  rsync hardly needs more options or protocol
variations. :-( (Amusingly enough I once said -xattr instead of
--xattr and it silently did something else.)

diff -urpdN -x .ignore -x packaging -x cvs.log -x configure -x -x 
autom4te.cache -x config.log -x .cvsignore -x dummy -x .svn -x ID -x TAGS 
rsync-2.5.6/ xa/
--- rsync-2.5.6/ 2003-01-21 05:26:14.0 +1100
+++ xa/  2003-06-24 15:08:09.0 +1000
@@ -34,7 +34,7 @@ OBJS1=rsync.o generator.o receiver.o cle
main.o checksum.o match.o syscall.o log.o backup.o
 OBJS2=options.o flist.o io.o compat.o hlink.o token.o uidlist.o socket.o \
fileio.o batch.o clientname.o
-OBJS3=progress.o pipe.o
+OBJS3=progress.o pipe.o xattr.o
 DAEMON_OBJ = params.o loadparm.o clientserver.o access.o connection.o authenticate.o
 popt_OBJS=popt/findme.o  popt/popt.o  popt/poptconfig.o \
popt/popthelp.o popt/poptparse.o
diff -urpdN -x .ignore -x packaging -x cvs.log -x configure -x -x 
autom4te.cache -x config.log -x .cvsignore -x dummy -x .svn -x ID -x TAGS 
rsync-2.5.6/cleanup.c xa/cleanup.c
--- rsync-2.5.6/cleanup.c   2003-01-27 14:35:08.0 +1100
+++ xa/cleanup.c2003-06-24 16:16:58.0 +1000
@@ -26,7 +26,7 @@
  * shutdown() of socket connections.  This eliminates the abortive
  * TCP RST sent by a Winsock-based system when the close() occurs.
-void close_all()
+void close_all(void)
int max_fd;
diff -urpdN -x .ignore -x packaging -x cvs.log -x configure -x -x 
autom4te.cache -x config.log -x .cvsignore -x dummy -x .svn -x ID -x TAGS 
rsync-2.5.6/ xa/
--- rsync-2.5.6/configure.in2003-01-28 16:27:40.0 +1100
+++ xa/ 2003-06-24 20:27:45.0 +1000
@@ -5,7 +5,7 @@ AC_CONFIG_SRCDIR([byteorder.h])
 AC_MSG_NOTICE([Configuring rsync $RSYNC_VERSION])
@@ -267,6 +267,7 @@ AC_CHECK_HEADERS(glob.h mcheck.h sys/sys
@@ -414,6 +415,7 @@ AC_CHECK_FUNCS(waitpid wait4 getcwd strd
 AC_CHECK_FUNCS(fchmod fstat strchr readlink link utime utimes strftime)
 AC_CHECK_FUNCS(memmove lchown vsnprintf snprintf asprintf setsid glob strpbrk)
 AC_CHECK_FUNCS(strlcat strlcpy strtol mtrace mallinfo setgroups)
 AC_CACHE_CHECK([for working socketpair],rsync_cv_HAVE_SOCKETPAIR,[
diff -urpdN -x .ignore -x packaging -x cvs.log -x configure -x -x 
autom4te.cache -x config.log -x .cvsignore -x dummy -x .svn -x ID -x TAGS 
rsync-2.5.6/flist.c xa/flist.c
--- rsync-2.5.6/flist.c 2003-01-19 05:00:23.0 +1100
+++ xa/flist.c  2003-06-25 08:29:52.0 +1000
@@ -1,7 +1,7 @@
Copyright (C) Andrew Tridgell 1996
Copyright (C) Paul Mackerras 1996
-   Copyright (C) 2001, 2002 by Martin Pool [EMAIL PROTECTED]
+   Copyright (C) 2001-2003 by Martin Pool [EMAIL PROTECTED]

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -422,6 +422,12 @@ static void send_file_entry(struct file_
+if (opt_xattr) {
+xalist_send(f, file-xattrs);
if (preserve_hard_links  S_ISREG(file-mode)) {
if (remote_version  26) {
@@ -457,7 +463,9 @@ static void send_file_entry(struct file_
+ * This matches up with send_file_entry()
+ **/
 static void receive_file_entry(struct file_struct **fptr,
   unsigned flags, int f)
@@ -555,6 +563,13 @@ static void receive_file_entry(struct fi
sanitize_path(file-link, file-dirname);
+if (opt_xattr) {
+xalist_receive(f, file);
if (preserve_hard_links

Re: patch draft for extended attributes on linux

2003-06-24 Thread Martin Pool
On 24 Jun 2003, jw schultz [EMAIL PROTECTED] wrote:

 I don't much care for sending the xattrs as part of the file list.
 Even the 4KB ext[23] _currently_ limit it to is huge.

I would have preferred to do it during the regular transfer, rather
than in the file list, but that seemed to make it a bit harder to
ensure that the attributes were always applied even if the file was
not otherwise modified, or if it were a symlink, etc.

 Nevertheless I do think it worth having something in

Yes, if I can get this working I think the best place for it to end up
is as an unofficial patch.


Re: patch draft for extended attributes on linux

2003-06-24 Thread Martin Pool
On 24 Jun 2003, jw schultz [EMAIL PROTECTED] wrote:
 That lack of an mtime for xattr could well cause
 difficulties for backup systems as well.  Perhaps a note to
 the filesystems people is in order.  The problem is that you
 can't use mtime for these.  It really needs its own
 timestamp, perhaps as a mandatory system attribute.

Yes, I think so.  If that still makes sense when I'm finished on this
I will send mail.

Perhaps it would fit well into the reiserfs 'tree of small things'
model of the world.

 I don't much care for sending the xattrs as part of the file list.
 Even the 4KB ext[23] _currently_ limit it to is huge.

I'm not sure what is typical here.  The situation I'm working on is
replicating a Samba share which is storing ACLs and EAs in XFS EAs.
Most of them will be pretty small, and most files won't have them.  

For a small tree with a short XA on each file and no other changes,
it's like this:

[data]$ ~/work/rsync/xa/rsync -aPzv distcc-2.7.1/ dest --xattr
building file list ...
161 files to consider
wrote 12365 bytes  read 20 bytes  24770.00 bytes/sec
total size is 1115540  speedup is 90.07
[data]$ ~/work/rsync/xa/rsync -aPzv distcc-2.7.1/ dest
building file list ...
161 files to consider
wrote 3027 bytes  read 20 bytes  6094.00 bytes/sec
total size is 1115540  speedup is 366.11

I think it's tolerable.


Re: Oops more testing was required....

2003-06-18 Thread Martin Pool
On 17 Jun 2003, Rogier Wolff [EMAIL PROTECTED] wrote:
 Oops. Missed one line in the last patch

Thank you.  That looks good.

If we're going to make this more accurate it might be worthwhile to
actually look at how long we really did sleep for, and use that to
adjust time_to_sleep rather than resetting to zero.

Also I'd prefer the variable be called micros_to_sleep or
us_to_sleep.  Small point, I know.

 diff -ur rsync-2.5.6.orig/io.c rsync-2.5.6/io.c
 +++ rsync-2.5.6/io.c  Tue Jun 17 23:43:49 2003
 @@ -416,10 +416,19 @@
   * use a bit less bandwidth than specified, because it doesn't make up
   * for slow periods.  But arguably this is a feature.  In addition, we
   * ought to take the time used to write the data into account.
 + *
 + * During some phases of big transfers (file XXX is uptodate) this is
 + * called with a small bytes_written every time. As the kernel has to
 + * round small waits up to guarantee that we actually wait at least
 + * the requested number of microseconds, this can become grossly
 + * inaccurate. We therefore keep a cumulating number of microseconds
 + * to wait, and only actually perform the sleep when the rouding
 + * becomes insignificant. (less than 10%) -- REW.
  static void sleep_for_bwlimit(int bytes_written)
   struct timeval tv;
 + static int time_to_sleep = 0; 
   if (!bwlimit)
 @@ -427,9 +436,13 @@
   assert(bytes_written  0);
   assert(bwlimit  0);
 - tv.tv_usec = bytes_written * 1000 / bwlimit;
 - tv.tv_sec  = tv.tv_usec / 100;
 - tv.tv_usec = tv.tv_usec % 100;
 + time_to_sleep += bytes_written * 1000 / bwlimit; 
 + if (time_to_sleep  10) return;
 + tv.tv_sec  = time_to_sleep / 100;
 + tv.tv_usec = time_to_sleep % 100;
 + time_to_sleep = 0; 
   select(0, NULL, NULL, NULL, tv);


Re: Oops more testing was required....

2003-06-18 Thread Martin Pool
On 18 Jun 2003, jw schultz [EMAIL PROTECTED] wrote:
 On Wed, Jun 18, 2003 at 09:09:59PM +1000, Martin Pool wrote:
  On 17 Jun 2003, Rogier Wolff [EMAIL PROTECTED] wrote:
   Oops. Missed one line in the last patch
  Thank you.  That looks good.
  If we're going to make this more accurate it might be worthwhile to
  actually look at how long we really did sleep for, and use that to
  adjust time_to_sleep rather than resetting to zero.
 That would have to be a platform specific thing since not
 all systems modify the timeout value to reflect the amount
 of time not slept.  Nevertheless that is a nice idea.

Right, I know that is not portable, but I forgot to say so.  As Rogier
says, you need to call gettimeofday() or some such.


Re: Smoother bandwidth limiting

2003-06-18 Thread Martin Pool
On  4 Feb 2003, jw schultz [EMAIL PROTECTED] wrote:

 Yes but i'd like to hear from some people who know network
 performance programming.

I know only enough to be mildly dangerous.  :-)

I don't think you can do this optimally in userspace, because there is
lots of buffering between what we write to the kernel and what gets
onto the wire, which is generally what the user cares about.  

It will interact with the MTU, which is generally small enough not to
matter, but also with the TCP window size.  I think by throttling our
connection we will also change the TCP window dynamic behaviour.  

In particular with no bwlimit rsync will often be blocked on network
IO, but it may not be with bwlimit.  This might make a difference to
whether the Nagle algorithm comes into effect to get packets pushed
out.

There is also some kind of interaction with routers with their own
queues (as for ADSL, etc), and performance on fast networks may be
very different.  So I would be a bit cautious of applying patches
based on one person's experience.

Doing larger writes is likely to make the bandwidth more jerky, as
the kernel buffer is filled up, drains, and then pauses.  That might
make rsync's interaction with interactive traffic more harmful than it
ought to be.  But bringing it right down to 1024B doesn't sound good
-- it's likely to generate MTU packets, which nobody really wants.

So by all means tweak it, but I think trying to make it run at the
exact specified limit is unlikely to pay off.


Re: Smoother bandwidth limiting

2003-06-18 Thread Martin Pool
On 15 May 2003, Paul Slootman [EMAIL PROTECTED] wrote:

 I can't really see that doing smaller writes will lead to packets being
 padded, unless you're doing really small writes (ref. the ATM 48-byte
 packets); the TCP and IP headers will always be added, which means that
 the extra overhead of those will have a larger impact than any
 So, I'd suggest that 1024 isn't that bad a number for all cases; it'll
 fit comfortably into most MTU sizes, and for dialup PPP it'll be split
 into two packets without that much overhead. If not concerned with the
 dialup PPP case, I'd go for something like 1400.

Of course a write() does not necessarily correspond to a TCP frame,
which does not necessarily correspond to an IP packet.

But nevertheless I would suggest avoiding writes that are this short.
In addition to the headers that Paul mentioned, there are other
per-packet costs such as Ethernet leadin and trailer times, and the
hardware, interrupt and OS overhead for processing packets.

Consider also that some people use rsync on fast networks, and they
won't appreciate small packets *or* getting more system calls to
process a given amount of data.

Needlessly causing each packet to hold 30% less data than it normally
would is very wasteful.  The point of bwlimit is, after all, to help
users have more bandwidth for other applications.

Checking for bwlimit after every say 4k I can imagine but below that
is dubious.  I'm happy to be proved wrong though.


Re: You have emailed an address at

2003-06-16 Thread Martin Pool
On 16 Jun 2003, Lapo Luchini [EMAIL PROTECTED] wrote:
 Each time I send a message to the ML I receive this message... (this
 misled me into double-posting some days ago).
 Could someone please unsubscribe the blocked address?
 But I guess that's not possible, as anyone else should have noticed
 this, too... =(

Done.  (I saw it too.)


Re: Interactive Rsync Authentication Problem

2003-06-16 Thread Martin Pool
On 29 May 2003, Andrew Klein [EMAIL PROTECTED] wrote:
 The getpassphrase() call is identical to getpass() except it returns 256 
 chars maximum.  Of course you would have to mess with autoconf but I 
 don't think that should be too hard.  Based on the autoconf stuff in the 
 latest rsync release, the compile check would be something along these 
 AC_CACHE_CHECK([for getpassphrase],rsync_cv_HAVE_GETPASSPHRASE,[
 AC_TRY_COMPILE([#include <unistd.h>],
 [char *pass;  pass = getpassphrase("Password: ");],
 if test x$rsync_cv_HAVE_GETPASSPHRASE = xyes; then

Can you try that and tell us if it actually works?

It's OK if you can't get the autoconf stuff straight, but it would be
good to know that getpassphrase() actually solves the problem before
merging.

Better yet, send a patch that adds an appropriately-licenced
readpassphrase()/getpassphrase() to the lib/ directory?

Someone wrote:
 I love the fact that the man page for getpass() under Linux says "don't use
 this", but does not provide any alternative. Mmmm... Linux - it's so
 secure! ;-)

Solaris fnmatch("ass", "hat", 0) used to return true!


Re: support@microsoft e-mails is a VIRUS

2003-06-16 Thread Martin Pool
On 20 May 2003, jw schultz [EMAIL PROTECTED] wrote:

   Is there anyway you can stop sending these e-mails to everybody on the list?
   I've received maybe 3 or 4 of them since yesterday.
  One possible solution to reduce the spam/virus traffic on the list would
  be to close the list so that only people on the list can send to it.
 The rsync team has, so far, rejected that approach.  We want
 to keep the list as open as possible.

Many people post to the list without subscribing, because it is the
main support forum for a product.  It is not really closed in the
way that a list for a development team is.

So there would be a lot of mails blocked.  If they're automatically
bounced then it is annoying for rsync users.  If they're deferred then
the delay is annoying, and somebody needs to spend time reading
through the queue.  At the moment I don't think that would be a good
use of time.

The only real solution is to send spammers and virus writers to jail.

In the mean time we have set up spam and virus filters.  As jw says,
you are only seeing a small fraction of the literally hundreds of
attacks we suffer every day.


Re: Feature request: true multiple sources

2003-06-16 Thread Martin Pool
On 14 Jun 2003, Gregory Brauer [EMAIL PROTECTED] wrote:
 I am a big fan of rsync, but the more I use it, the more I
 become frustrated at rsync's asymmetrical functionality.
 For instance, I can do this:
 rsync /A/ /B/ desthost:/AB
 but not this:
 rsync srchost:/A/ srchost:/B/ /AB

rsync allows remote shell wildcards:

  rsync 'srchost:/{A,B}/' /AB

The limitations are in your own mind.  (Well, at least this one
is. :-)


Re: Multistreaming rsync

2003-06-16 Thread Martin Pool
On 10 Feb 2003, Cockram, Michael  L (ISI) [EMAIL PROTECTED] wrote:
 Newbie here!
 I am not sure if this is possible or not, but is it possible to multistream
 the connections that rsync is making?  Say I had a directory with a bunch of
 huge sized files.  Is there a way of telling rsync to make multiple
 connections for different groups of files?  Am I making sense?

Just run different rsync processes for different subdirectories.
There is no support in the program itself. 

 Are there tcp window limitations on rync like ftp has?

What do you mean?

TCP windows are pretty much invisible to applications.


CVS update: rsync

2003-06-16 Thread Martin Pool

Date:   Tue Jun 17 04:46:32 2003
Author: mbp

Update of /data/cvs/rsync
In directory

Modified Files:
Log Message:
Add a comment about using getpassphrase() or readpassphrase() rather
than getpass().  No code change.

authenticate.c  1.22 = 1.23
rsync-cvs mailing list

Re: [librsync-devel] Re: state of the rsync nation? (revisited6/2003 from 11/2000)

2003-06-13 Thread Martin Pool
On 12 Jun 2003, jw schultz [EMAIL PROTECTED] wrote:

 Mind you, that means making the server lightweight with the
 client doing all the logic and a nearly stateless connection.
 Much like my earlier post on this thread posited.

I was wondering today if that would make it easier to gain confidence
in the design's security.  Making the semantics of an operation less
dependent on a lot of accumulated state probably helps, all other
things being equal.


Re: [librsync-devel] Re: state of the rsync nation? (revisited6/2003 from 11/2000)

2003-06-12 Thread Martin Pool
On 12 Jun 2003, Brad Hards [EMAIL PROTECTED] wrote:
 Hash: SHA1
 On Wed, 11 Jun 2003 11:25 am, Martin Pool wrote:
  That could be a pretty nice thing.  We use little rsync shares on
  workstations here for sharing files, and I know some people do the
  same with FTP.
  What aside from SLP would make this more useful?

 A standardised way of describing the share would be good. By this, I don't 
 mean a software implementation, but a user / admin configuration. Think 
 Standard Operating Procedures.
 The other thing that would be nice would be a search capability - find me the 
 shares with a copy of rsync-2.5.6.tar.bz2

OK, interesting.

 1.  I'm thinking about something that, as a minimum, doesn't do plain text 
 passwords. I admire clever attacks as much as the next guy, but the next guy 
 doesn't want some kewl hax0r with a copy of tcpdump uploading warez either.
 Probably SASL is worth a look.

Yes, SASL looks like the way to go, at least for authentication.

Some things I read indicate that SASL is not a good choice for
encryption/integrity.  So perhaps we should use SASL just for
authentication, and SSL for confidentiality/integrity.  Does that make
any sense?

 Why run this _only_ over TCP? Obviously you don't want to re-invent TCP/IP 
 error handling, but the protocol shouldn't rely on such a system. File 
 transfer can potentially run connectionless.

It sounds like you're talking about something like NFS (XDR-RPC) that
can run over UDP or TCP?

I wouldn't rely on TCP specifically, but I think it's OK to rely on a
byte stream channel, such as TCP or SSH.

I suppose if you're going to do UDP then you might want to try to do
multicast too, but that makes things like error handling a lot harder.

But I do think there should be a layer at which there are distinct
messages, and that what goes under that might be something other than
a byte stream in future.
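
A distinct-message layer over a byte stream can be as simple as a
length prefix (an illustrative framing, not a proposal for the actual
protocol):

```python
import struct

def pack_message(payload):
    """Prefix each message with its length as a 4-byte big-endian int."""
    return struct.pack('>I', len(payload)) + payload

def unpack_messages(stream):
    """Split a byte stream back into the distinct messages it carries."""
    messages, i = [], 0
    while i + 4 <= len(stream):
        (n,) = struct.unpack_from('>I', stream, i)
        messages.append(stream[i + 4:i + 4 + n])
        i += 4 + n
    return messages
```

Everything below pack_message() could then be swapped from TCP or SSH
to some other transport without the message layer noticing.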


Re: [librsync-devel] Re: state of the rsync nation? (revisited6/2003 from 11/2000)

2003-06-12 Thread Martin Pool
On 12 Jun 2003, jw schultz [EMAIL PROTECTED] wrote:

 Leave the communications protocol to the communications
 layer.  You don't save anything by coding reordering and
 retransmission at the packet level; that is infrastructure.
 Connectionless is fine.  Lightweight sessions is better.  If
 you lose a connection a restart is possible.  It is
 preferable to not have to authenticate and negotiate
 protocol versions and encryption with every message.
 Think in terms of transactions.  Each transaction is atomic.
 If a transaction doesn't complete you have the means to
 roll-back and retry.  If a connection breaks between
 transactions, or leaving a transaction incomplete, you start
 a new connection and pick up where you left off.

I agree with all this.

To extend on what jw says:

I think it's fine to (if desired) negotiate SSL, authentication, and
compression at the start of a connection.  They generally require
multiple round trips and it would be wasteful to do them more
frequently when per-connection is natural.

On the other hand it would be nice if the client could pick up an
interrupted transfer halfway through the tree, rather than needing to
start from the beginning as rsync 2.x does. 
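Resuming halfway through the tree only requires walking files in a deterministic (e.g. sorted) order and remembering the last completed path; a toy sketch of that idea, with invented names, not rsync code:

```python
def transfer_tree(paths, send, resume_after=None):
    # Walk files in sorted order so a restarted connection can skip
    # everything up to and including the last completed path.
    done = []
    for path in sorted(paths):
        if resume_after is not None and path <= resume_after:
            continue  # already transferred by an earlier connection
        send(path)
        done.append(path)  # a real tool would checkpoint this to disk
    return done

paths = ["a/1", "a/2", "b/1", "b/2"]
sent = []
# Suppose the first connection died after completing "a/2"; a second
# connection picks up from there instead of starting over:
resumed = transfer_tree(paths, sent.append, resume_after="a/2")
assert resumed == ["b/1", "b/2"]
```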


Re: [librsync-devel] Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

2003-06-11 Thread Martin Pool
On 11 Jun 2003, Donovan Baarda [EMAIL PROTECTED] wrote:
 On Wed, 2003-06-11 at 13:59, Martin Pool wrote:
  On 11 Jun 2003, Donovan Baarda [EMAIL PROTECTED] wrote:
   The vcdiff standard is available as RFC3284, and Josh is listed as one
   of the authors. 
  Yes, I've just been reading that.
  I seem to remember that it was around as an Internet-Draft when I
  started, but it didn't seem clear that it would become standard so I
  didn't use it.

 I'm not sure if this is the same one... I vaguely recall something like
 this too, but I think it was an attempt to add delta support to http and
 had the significant flaw of not supporting rsync's
 delta-from-signature. It may have come out of the early xdelta http
 proxy project. IMHO rproxy's http extensions for delta support were
 better because they were more general.

Yes, the most recent version of the Mogul delta-http proposal I read
assumed that the server had a complete history of the document to
generate diffs.  This is fine if you're serving e.g. software
distributions or content from a version control system and have the
history, but not very general.

 I forget if I saw this in Tridge's thesis, but I definitely noticed that
 librsync uses a modified zlib to make feeding data to the compressor and
 throwing away the compressed output more efficient. I have implemented
 this in pysync too, though I don't use a modified zlib... I just throw
 the compressed output away.

Yes, I remember that, but that's not rzip.

By the way the gzip hack is an example of a place where I think a bit
of extra compression doesn't justify cluttering up the code.  I think
I'd rather just compress the whole stream with plain gzip and be done.

See pg 86ff

rzip is about using block search algorithms to find widely-separated
identical blocks in a file.  (I won't go into detail because tridge's
explanation is quite clear.)

I am pretty sure you could encode rzip into VCDIFF.  I am not sure if
VCDIFF will permit an encoding as efficient as you might get from a
format natively designed for rzip, but perhaps it will be good enough
that using a standard format is a win anyhow.  Perhaps building a
VCDIFF and then using bzip/gzip/lzop across the top would be

In fact rzip has more in common with xdelta than rsync, since it works
entirely locally and can find blocks of any length. 

rzip's advantage compared to gzip/bzip2 is that it can use compression
windows of unlimited size, as compared to a maximum of 900kB for
bzip2.  Holding an entire multi-100MB file in memory and compressing
it in a single window is feasible on commodity hardware.
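The core of that long-range matching can be sketched as hashing fixed-size chunks into a table keyed by content and reporting repeats wherever they occur (a toy illustration of the idea, not tridge's actual algorithm):

```python
def find_long_range_matches(data: bytes, chunk: int = 4):
    # Map each chunk's content to the offset where it was first seen;
    # later occurrences become (offset, earlier_offset) match pairs.
    seen = {}
    matches = []
    for off in range(0, len(data) - chunk + 1, chunk):
        piece = bytes(data[off:off + chunk])
        if piece in seen:
            matches.append((off, seen[piece]))
        else:
            seen[piece] = off
    return matches

# Two identical 4-byte blocks separated by unrelated data:
data = b"ABCDxxxxyyyyABCD"
assert find_long_range_matches(data) == [(12, 0)]
```

Because the table is keyed by content rather than limited to a sliding window, a match hundreds of megabytes away costs no more to find than an adjacent one, which is the advantage over gzip/bzip2 windows described above.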

 The self referencing compression idea is neat but would be a...
 challenge to implement. For it to be effective, the self-referenced
 matches would need to be non-block aligned like xdelta, which tends to
 suggest using xdelta to do the self-reference matches on top of rsync
 for the block aligned remote matches. Fortunately xdelta and rsync have
 heaps on common, so implementing both in one library would be easy (see
 pysync for an example).
 If I didn't have paid work I would be prototyping it in pysync right
 now. If anyone wanted to fund something like this I could make myself
 available :-)

I may get a chance to work full time on replication again soon, so I'm
trying to work out  where we're up to.

 Yeah, my big complaint about librsync at the moment is it is messy. Just
 cleaning up the code alone will be a big improvement. I would guess that
 at least 30% of the code could be trimmed away, leaving a cleaner and
 more extensible core, and because "messy" leads to "inefficient", it
 would be faster too.

If I'd had more time this letter would have been shorter.


Re: [librsync-devel] Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

2003-06-10 Thread Martin Pool
On 10 Jun 2003, Brad Hards [EMAIL PROTECTED] wrote:

 Yep. Also, I was playing with the idea of rsync with Service Location Protocol 
 to use as a replacement for the crappy practice of sharing data over floppy 
 disks. The rough concept was that each machine had a shared directory, which 
 you could conveniently label and advertise over SLP. 

That could be a pretty nice thing.  We use little rsync shares on
workstations here for sharing files, and I know some people do the
same with FTP. 

What aside from SLP would make this more useful?

 Go superlifter! For what it is worth, the things I identified during the 
 abortive kioslave / SLPv2 share development:
 1. More secure than FTP.
 2. Easy to label shares/directories and provide fine grained access control, 
 if desired.
 3. Client side library that doesn't require hellish text parsing, or at least 
 hides it from you.
 4. Well delimited packets, so you can tell when one has been dropped.

Can you give more detail on those?

What do you mean by packets being dropped?  How can that happen on a
TCP channel?


Re: [librsync-devel] Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

2003-06-10 Thread Martin Pool
On 11 Jun 2003, Donovan Baarda [EMAIL PROTECTED] wrote:

 The vcdiff standard is available as RFC3284, and Josh is listed as one
 of the authors. 

Yes, I've just been reading that.

I seem to remember that it was around as an Internet-Draft when I
started, but it didn't seem clear that it would become standard so I
didn't use it.

 I also had some correspondence with Josh ages ago where he talked about
 how self-referencing delta's can directly do compression of the miss
 data without using things like zlib and by default gives you the
 benefits of rsync's context compression without the overheads (rsync
 runs a decompressor _and_ a compressor on the receiving end just to
 regenerate the compressed hit context data).

Something possibly similar is mentioned in tridge's thesis.  I was
talking to him a while ago and (iirc) he thought it would be good to
try it again, since it does well with the large amounts of memory and
CPU time that are available on modern machines.

I strongly agree with what you said a while ago about code simplicity
being more valuable than squeezing out every last bit.


Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

2003-06-09 Thread Martin Pool
On  9 Jun 2003, Brad Hards [EMAIL PROTECTED] wrote:
 On Sun, 8 Jun 2003 15:43 pm, Donovan Baarda wrote:
  The comments about rsync never using libhsync/librsync are still true
  for the foreseeable future. There are many things rsync includes that
  are still missing from librsync, and the rsync implementation is very
  tightly coupled, with many backwards compatibility issues. Even when
  librsync reaches the point of being as good or better than rsync at
  signature/delta/patch calculation, it would be a major task to fit it
  into rsync.

 The downside to not having a library that is wire-compatible with rsync 
 --daemon is that it is damn difficult to write something that works as a VFS 
 / kioslave type device. I had a hack at this, by wrapping the rsync 
 executable, and it worked a bit, but it was way too fragile for any real use:

I guess the reason why you're interested in doing it is so that you
can browse public rsync mirrors from Konqueror/whatever?

Speaking only for myself, I don't think this is worth spending time
on.  It would be hard to write a wire-compatible library, and hard to
refactor rsync into such a library.

Not only might a new tool be written more easily without baggage, it
might also (in a couple of years) persuade people running mirror sites
to switch.  I know many of them are unhappy with rsync at the moment:

 - large memory usage
 - no really good ways to restrict client usage
 - ...


Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

2003-06-09 Thread Martin Pool
On  8 Jun 2003, Donovan Baarda [EMAIL PROTECTED] wrote:

 The next big thing in delta calculation is probably going to be the
 vcdiff encoding format, which should allow a common delta format for
 various applications and supports self-referencing delta's, which
 makes it capable of compression. According to the xdelta project this
 has already been implemented, and I'm keen to see Josh's code, as it
 could be used as the basis for a cleanup/replacement of at least the
 patch component of librsync.

Do you have a link for this?  Josh plays his cards pretty close to his
chest.  The XDelta page seems to be even more inactive than librsync


(fwd) PATCH: managing permissions with rsyncd.conf options

2003-03-12 Thread Martin Pool

This is a patch to control unix permissions when uploading to a rsyncd-server
by setting rsyncd.conf options.

cu, Stefan
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260
rsyncd.conf options to handle file permissions
(stolen from samba)

This patch is made to provide more control on the
permissions of files and directories that are
uploaded to a rsyncd-server.

Normally when files and directories are uploaded to
a rsyncd they are created with the permissions of the
source. Especially in the case that user and group
are set to special values using the uid and gid
directives it does not make much sense to use the source
permission pattern.

There is a patch introducing a new chmod command line
option but normally you may want to control the permissions
on server side. The patch below will allow you to modify
file and directory permissions by using 4 new rsyncd.conf
directives. I'm sure that those 2 patches will not break
each other and it really makes sense to use them both.

You may know these options from samba :-)

create mask

When a file is created (or touched) by rsyncd the
permissions will be taken from the source file
bit-wise 'AND'ed with this parameter. This
parameter may be thought of as a bit-wise MASK for
the UNIX modes of a file. Any bit not set here will
be removed from the modes set on a file when it is created.

The default value of this parameter is set to 0
to provide the default behaviour of older versions.

Following this rsync will bit-wise 'OR' the UNIX
mode created from this parameter with the value of
the force create mode parameter which is set to 000
by default.

This parameter does not affect directory modes. See
the parameter directory mask for details.

See also the force create mode parameter for
forcing particular mode bits to be set on created
files. See also the directory mask parameter for
masking mode bits on created directories.

Default: create mask = 0

Example: create mask = 0644

force create mode

This parameter specifies a set of UNIX mode bit
permissions that will always be set on a file created
by rsyncd. This is done by bitwise 'OR'ing these bits
onto the mode bits of a file that is being created or
having its permissions changed.

The default for this parameter is (in octal) 000.
The modes in this parameter are bitwise 'OR'ed onto
the file mode after the mask set in the create mask
parameter is applied.

See also the parameter create mask for details on
masking mode bits on files.

Default: force create mode = 000

Example: force create mode = 0644

directory mask

When a directory is created (or touched) by rsyncd the
permissions will be taken from the source directory
bit-wise 'AND'ed with this parameter. This
parameter may be thought of as a bit-wise MASK for
the UNIX modes of a directory. Any bit not set here will
be removed from the modes set on a directory when it is created.

The default value of this parameter is set to 0
to provide the default behaviour of older versions.
Following this rsync will bit-wise 'OR' the UNIX
mode created from this parameter with the value of
the force directory mode parameter which is set to 000
by default.

This parameter does not affect file modes. See
the parameter create mask for details.
See also the force directory mode parameter for
forcing particular mode bits to be set on created
directories. See also the create mask parameter for
masking mode bits on created files.
Default: directory mask = 0

Example: directory mask = 0755

force directory mode

This parameter specifies a set of UNIX mode bit
permissions that will always be set on a directory
created by rsyncd. This is done by bitwise 'OR'ing
these bits onto the mode bits of a directory that
is being created. The default for this parameter is
(in octal) 000, which will not add any extra permission
bits to a created directory. This operation is done
after the mode mask in the parameter directory mask
is applied.

See also the parameter  directory mask for details
on masking mode bits on created directories.

Default: force directory mode = 000

Example: force directory mode = 0755
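The four directives combine as the same two-step AND/OR computation for files and directories, mirroring samba's semantics; a sketch of the resulting mode arithmetic (directive names from the patch, the code itself is mine):

```python
def apply_masks(src_mode: int, mask: int, force: int) -> int:
    # Step 1: keep only the bits allowed by the mask (bit-wise AND).
    # Step 2: then force the required bits on (bit-wise OR).
    return (src_mode & mask) | force

# A file uploaded as 0777 with "create mask = 0644" and
# "force create mode = 0600":
assert apply_masks(0o777, 0o644, 0o600) == 0o644

# A directory uploaded as 0700 with "directory mask = 0755" and
# "force directory mode = 0755":
assert apply_masks(0o700, 0o755, 0o755) == 0o755
```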

diff -ur rsync-2.5.5/loadparm.c rsync-2.5.5-umask/loadparm.c
--- rsync-2.5.5/loadparm.c  Mon Mar 25 05:04:23 2002
+++ rsync-2.5.5-umask/loadparm.cSun Mar  2 22:53:16 2003
@@ -140,6 +140,10 @@
int timeout;
int max_connections;

(fwd) files of length zero

2003-03-11 Thread Martin Pool
- Forwarded message from Klaus Dittrich [EMAIL PROTECTED] -

From: [EMAIL PROTECTED] (Klaus Dittrich)
Subject: files of length zero
Date: Tue, 11 Mar 2003 17:08:47 +0100
User-Agent: Mutt/1.4i
X-Bogosity: No, tests=bogofilter, spamicity=0.00, version=0.10.2

Hi Martin,

MS-Windows users here sometimes find that files end up with length
zero when something on Windows crashes.

They often have many files open and after a crash they don't realize
that parts of their work got lost.

Each night a backup server using rsync copies those zero-length files
and thereby destroys the files backed up the day before by
making them zero length too.

Can you build in an option to rsync that handles files of length zero
the same way as deleted ones, so preserving the old file?

Regards Klaus

- End forwarded message -

Re: rsync in-place (was Re: rsync 1tb+ each day)

2003-02-04 Thread Martin Pool
On  4 Feb 2003, jw schultz [EMAIL PROTECTED] wrote:

 The reason why in-place updating is difficult is that
 rsync expects the unchanged blocks in the old file may be
 relocated.  Data inserted into or removed from the file does
 not require the rest of the file to be retransmitted.
 Unchanged blocks will be copied from the old locations in
 the old file to new locations in the new file.
 In-place updates requires that blocks not relocate.
 It may be possible by disallowing matches having differing
 offsets.  That would require deeper investigation.

Of course the other place where people want this is for transfers of
block devices, where the rename is just not possible.

I looked a little at doing this in librsync.  The naive solution is to
merely prohibit the delta from referring to blocks that have been
already overwritten.  I will probably eventually add at least this

You might try this in rsync.  A lot of other code to do with
e.g. setting permissions makes the assumption of the rename model,
though.  It would take a fair amount of testing.

Of course this model really falls down in some cases.  Consider the
case of one block inserted at the beginning.  Then with the naive no
backreferences approach every block will be overwritten just before
it's needed. :( 

You can imagine a smarter algorithm that does non-sequential writes to
the output so as to avoid writing over blocks that will be needed
later.  Alternatively, if you assume some amount of temporary storage,
then it might be possible to still produce output as a stream.
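The temporary-storage variant is the easier one to sketch: save every still-needed source region before any destination is written, so ordering no longer matters (a toy illustration with an invented delta representation of copy ops as (src, dst, len) within one buffer):

```python
def inplace_copy(buf: bytearray, ops):
    # Naive bounded-temporary-storage approach: snapshot every source
    # region up front, then write destinations in any order without
    # fear of overwriting data a later copy still needs.
    saved = {}
    for i, (src, dst, n) in enumerate(ops):
        saved[i] = bytes(buf[src:src + n])
    for i, (src, dst, n) in enumerate(ops):
        buf[dst:dst + n] = saved[i]
    return buf

# The pathological case above: one block inserted at the front shifts
# every following block, so each copy's source overlaps some destination.
buf = bytearray(b"AAAABBBBCCCC")
inplace_copy(buf, [(0, 4, 4), (4, 8, 4)])
assert bytes(buf) == b"AAAAAAAABBBB"
```

A smarter scheme would topologically order non-conflicting copies first and fall back to temporary storage only for cyclic dependencies, trading implementation complexity for memory.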

Really for your problem the practical solution is just to dump the
whole file, perhaps allowing for sparse blocks.  As other people have
observed, by design rsync does a lot more disk IO than network.


Re: proposal to fork the list (users/developers)

2003-01-30 Thread Martin Pool
On 30 Jan 2003, Green, Paul [EMAIL PROTECTED] wrote:

 I tend to be someone who automatically looks for trends, and the nice thing
 about having just one list is that it lets me know where people are having
 problems.  Judging by the number of questions we get, one of the biggest
 challenges for inexperienced rsync users is knowing why a particular file is
 included or excluded. 

Yes, that's definitely a large advantage of having a single list.

 Way in the back of my mind I see a need for an option that, for
 every file included or excluded, says which rule was used to make
 the decision.  Nice and simple.

I came to the same conclusion in a similar way a while ago.  If you
use -vv for rsync, you should see messages about exactly this. :-)


Debian: giving you the power to shoot yourself in each toe individually.
-- ajt

Re: reconnect ssh connection?

2003-01-30 Thread Martin Pool
On 30 Jan 2003, David Garamond [EMAIL PROTECTED] wrote:

 has someone come up with a trick to let disconnected ssh connections be
 recovered without terminating and having to restart rsync (perhaps by
 wrapping ssh or something)?

Ooh, interesting idea...

You might do it with some kind of wrapper at both ends...

Alternatively, by changing ssh options perhaps you can get the process
to stay open even if the link goes away, by increasing timeouts and so on.


Re: Proposal that we now create two branches - 2_5 and head

2003-01-29 Thread Martin Pool
On 30 Jan 2003, Donovan Baarda [EMAIL PROTECTED] wrote:
 On Thu, 2003-01-30 at 07:40, Green, Paul wrote:
  jw schultz [mailto:[EMAIL PROTECTED]] wrote:
  [general discussion of forthcoming patches removed]
   All well and good.  But the question before this thread is
   are the changes big and disruptive enough to make a second
   branch for the event of a security or other critical bug.
 After reading the arguments, I support the "delay the branch till it
 absolutely must happen" approach... i.e. don't branch until a bugfix needs
 to go into a stable version and HEAD is way too unstable to be
 released with the fix as the new stable.

Yes, that is generally a better approach.  Remember, you can always go
back and create a branch from the release later on if such a situation arises.

Personally, I'm more interested in eventually starting from scratch
with something like duplicity, rzync, or superlifter.  I think the way
Subversion builds on the experience but not the code from CVS is
pretty good.  

Obviously there are downsides to this approach: it may be a long time
before the code is ready, and people may not want to switch for a
while after that.  But it may be more fun, and eventually yield a
cleaner solution.  I hope other people are interested in continuing
work on librsync and projects based on it.

I think the parallels between rsync and CVS are actually reasonably close:

 - good tools, and de facto standards both for the free software

 - showing signs of age in underlying assumptions (file-by-file
   versioning in CVS, shared filelist in rsync)

 - knotty code and interface that are a bit hard to refactor

 - most existing users have it working properly and don't *want*
   disruptive changes, just bug fixes or perhaps small additional features

 - new approach offers substantial benefits

 - doing something new is not urgent

All the above is just for me personally.  Continuing to move rsync
itself forward as and when appropriate is still a good thing.

 Actually, a bigger attitude issue for me is having a separate
 rsync-devel and rsync-user lists. I have almost unsubscribed many times
 because of the numerous newbie user questions;

Me too.

Samba does this with samba-technical and samba.  I think at this point
the user list for samba only has slightly more traffic than rsync.  I
think apache may now be the same too.

Plenty of people post user questions to samba-technical despite
prominent notices that it is only for developers.  They tend to both
piss off developers and go unanswered at least some of the time.  It's
probably due both to "my question *is* technical" and "if the
developers read it they might answer".  I'm not sure what a good
solution would be: probably a clearer name would help.  Perhaps

What do people feel about this?

 I'm only interested in the devel stuff. I'm sure there are many
 users who have unsubscribed because of the numerous long technical


Re: [trivial patch] link overloaded

2003-01-29 Thread Martin Pool
On 29 Jan 2003, jw schultz [EMAIL PROTECTED] wrote:
 This is just a trivial documentation change.  The word
 link is overloaded.  It refers to symlinks, hardlinks and
 network links.  When looking for references to file links in
 the manpages the network references get in the way.



Re: Proposal that we now create two branches - 2_5 and head

2003-01-28 Thread Martin Pool
On 28 Jan 2003, Green, Paul [EMAIL PROTECTED] wrote:

 I think splitting the branches will also let us be a little more
 experimental in the development branch, at least until we get near
 the next release phase, because we'll always have the field release
 in which to make crucial bug fixes available quickly.

I agree that this would be a good approach if and only if there is
energy to do lots of development in the head branch.  What do you have
in mind?


list filtering

2003-01-27 Thread Martin Pool
Because of the enormous amount of traffic being generated by Windows
viruses[0] I have turned on Mailman attachment filtering on the
high-traffic lists.

Lists will now pass only text/plain MIME parts through to the list.
multipart/alternative messages with both text and html forms will have
the HTML form removed, and messages in only HTML will be squashed to
text.  Messages which cannot be handled in any of these ways will be

To send patches or log files to the list, you need to either insert
them inline into your message, or make sure they're marked as
text/plain.  On most systems, just making the name be *.txt should be enough.

I hope everybody's enjoying their SQL Server experience :-)

Martin postmaster

[0] ... automated notifications about viruses, users complaining about
viruses, users complaining about automated notification, users
complaining about users complaining, scanners complaining about
perfectly ordinary attachments, etc

Re: signing tarballs

2003-01-15 Thread Martin Pool
[replied to list]

There was a discussion about this on the Samba list a while ago


  We should create a team signing key, with a lifetime of about a
  year.  It has to be relatively short to allow for turnover in the
  people who have access to the key.

  The signing key must only be stored on secure machines, certainly
  *not* on the machine the tarballs are served from.  (If it was,
  somebody who compromised that machine could also generate new
  signatures and it would be pointless.)

  The key should be signed by team members and other relevant people;
  we should also sign each others' keys.

  The key should be on the keyservers and on the web site.

Unless you've already done so I'll create the key and send the private
half to you and the public half to the website, keyservers, and list.


Re: SPAM on List...

2002-12-11 Thread Martin Pool
On  9 Dec 2002, John E. Malmberg [EMAIL PROTECTED] wrote:

 I will agree that the SAMBA lists are being kept more spam free than 
 some of the other mail servers that I get e-mail on.

Just as an interesting data point: our bogofilter setup caught 60 spam
messages in the last 24 hours aimed at lists.  

 And while you are saying that you are not in favor of using blocking 
 lists, you are blocking Korea by some method, but that could be just 
 something that bogofilter has figured out.

We're using  Unfortunately the spam:ham ratio for
Korea is so bad that this seems to be the only appropriate solution.
We check the headers on samples of rejected messages and there are
dozens of spams per day and I haven't seen a nonspam message yet.

 It is your servers and your decisions on how to allocate your resources.
 No spam blocking method is 100%.
 And I am not complaining about your efforts.  I was just posting some 
 methods of spam blocking in use, and of course my biased opinions on

Thanks, understood.  

If I'm defensive it's only because maintaining these things is
generally a thankless task.  People (not you, John) complain and whine
when spam gets through, but nobody sees the work that goes into
keeping the other 99% out and keeping things running smoothly.


Re: SPAM on List...

2002-12-11 Thread Martin Pool
On 10 Dec 2002, jw schultz [EMAIL PROTECTED] wrote:
 First let me say that Martin (and any other list managers)
 is doing pretty well.  Although there was a brief rise in
 the volume of spam leaking through during the transition
 it has settled down quite nicely.  This is an arms race and
 I don't expect perfection.  Kudos!


 I can almost second that.  That seems to hold true for the
 last couple of months.  Perhaps html is already blocked.
 I do know that some valid mail may come in with
 Content-Type: Multipart/Alternative where one is text/plain.
 Although i don't like the waste of bandwidth i could see
 accepting that.  It is the stuff that is only html that
 should definitely be bounced.

I've wondered about installing something like mimedefang to handle
these things.  It would be nice to get rid of TNEF attachments too.

I won't start this until we have some experience with the new
stopspam-bogofilter setup.

There are some complications:

 As Tim points out, some people don't control whether their mailer
 sends HTML or not.  So we would need to fall back to html-text
 conversion, rather than bouncing such messages.  This makes it not a
 good way to detect spam.

 Some people need to send patches/log files/whatever to the lists as
 attachments.

 What's not there can't break.  Unless it's clearly useful, it
 shouldn't be installed.

Given that some people can't change their HTML setup (not under their
control or too clueless) I'm not sure if notification messages are

 The other clear indicator that comes up more often here
 seems to be non-english messages.  Care has to be taken not
 to block just because of a few words but if the message is
 mostly non-english or is in a charset incompatible with
 english it should be bounced.

The previous bouncer did explicitly block non-latin character sets.
However, there was a nasty failure mode which caused some non-junk
messages to be blocked.  People writing from (say) China may be using
a mail client that sends messages in a Chinese character set.  Some of
those character sets contain latin characters, so they may have in
fact been writing a purely English message, or perhaps an English
message with a part-Chinese sig block.  

Discarding these messages was incorrect; what was worse was that the
old system gave no indication of how to fix the problem and the
messages were dropped without review. :-(

As an amusing example of going too far in the other direction, a
certain government body has XXX as a blackword in their mail filter,
and a single occurrence is enough to cause the messages to bounce.  Of
course people pretty regularly write XXX for don't care values...
And let's not even think about byte sex. :-)


Re: SPAM on List...

2002-12-09 Thread Martin Pool
On  9 Dec 2002, John E. Malmberg [EMAIL PROTECTED] wrote:

 If it was on any of the reputable blocking lists, I would not be able to
 receive any of the SAMBA lists, and you would be getting the

It has since been removed from some of them.

 I.P. based blocking has shown to be the only thing that motivates some
 domains to act on abuse reports.

I really don't care about abuse reports anymore.  There is an
inexhaustible supply of other spam sources.  Desirable as it may be to
have ISPs behave properly, it will not reduce the amount of spam.

 And the bounce message can contain an alternate contact means such
 as a web form if someone needs a white-listing.

A major goal of this exercise is to reduce or eliminate the number of
messages that require manual handling because they waste admin time,
and they are often dropped.  Our previous experience was that IP
blacklists have significant false-positive and false-negative rates.

In addition, IP blacklists seem to often go mad when the admins
start pursuing a campaign against some ISP in a way that does not
agree with our goals.  For example, the previously-reputable ORBS
server blacklisted most of Australia a few years ago.

Basically I want the decisions to be made by samba team admins, not by
other people.

 Some time last fall apparently Korea passed an OPT-OUT with the 
 equivalent of ADV in the headers law.  Right after that, list that I 
 subscribe to at a major university went from 2 spams a week to over 8 
 spams a day.  99% from Korea.

We no longer accept any mail from Korea. :-(

 Now the other thing to consider is that when the filter makes a mistake 
 and deletes a legitimate message, it is quite a while before the sender 
 figures out, if at all that the message did not get through.

Our filter sends intelligible, actionable bounce messages.  This is an
enormous improvement over the previous system, which said something like
"error 10".

 If the message is bounced, the sender knows immediately, and can use the 
 alternate contact information, such as a web form to request a 

As RFC 2822 requires, mail to postmaster is not filtered, and is read
by a human.  People can report problems there.

 They also know that there is probably a problem with their ISP or
 with the particular block list, and they have the information needed
 to fix it.

That's bogus.  If my ISP is blocked it is very difficult for me to
change -- at home I am on a 12 month contract with my DSL provider,
for example.  Even if I did move, it's very unlikely that my leaving
would persuade them to change/enforce their AUP.  People with business
hosting are in a even more difficult situation.

 Filtering makes spam your problem.  Using a blocking list makes spam the 
 problem of the ISP sending the spam.  Eventually almost no one will 
 accept e-mail from them, either from local blocking lists, or public

You describe a long-term solution in which spam-friendly ISPs are
gradually ostracised.  I'm not quite sure I believe you that there is
a clear distinction, that bonafide ISPs are really able to stop spam,
and that being ostracised will ever really cut them off.  But
regardless, these are long-term, global measures.  What I care about
is reducing admin load and spam transmission on our lists right now.

Our bogofilter setup seems to be doing *extremely well* at just that;
I can see it catching many more messages and getting far fewer false
positives than before, and it is no longer necessary to clear queues by
hand.  I looked through the queue when I installed it and there were
many posters who just happened e.g. to be from China and whose
messages were basically dropped.  

Unless people have specific complaints about the new setup I intend to
keep going along this path.

To unsubscribe or change options:
Before posting, read:

Re: Head Rotor VE 12/08A

2002-12-08 Thread Martin Pool
On  8 Dec 2002, [EMAIL PROTECTED] wrote:
 Can we get RID of this member?  This is the 2nd time I have seen this
 posted.  Now after the first time, I figured it would have been put into a
 SPAM filter, and thereby the member would not be able to post SPAM to the
 list again, but that does not seen to be the case.

We're working on improving spam filtering for the list using
bogofilter.  At the moment we catch about 100 spams per day going to
the samba lists, so the percentage is not so bad.

The only real solution is to jail spammers.

 I still suggest we go to a closed list, whereby e-mail addresses are
 verified by a person before being allowed to post.  It would help with SPAM,
 and when a member posts SPAM, they are put into moderated mode, and if they
 do it a 2nd time, they are banned...permanently.

The list already has lots of trouble with people who are not able to
follow simple instructions about how to subscribe, unsubscribe, post,
etc.  Going to a closed list would cause more administrative work, and
would also inconvenience posters who want to e.g. read via a local
list.  So at the moment I don't want to do that.


Open a medium-sized can of Spam (retain the can (retain the spam too))

Re: [rsync] Re: bug reporting.. bugzilla

2002-12-08 Thread Martin Pool
On  9 Dec 2002, R P Herrold [EMAIL PROTECTED] wrote:

 Really a better FAQ editor process seems more useful.  Isn't
 this the purpose of a CVS and commit privileges -- set up one
 or more trusted editors with rights, and delegate that aspect.  

Anybody who wants to maintain the FAQ-O-Matic already has the
necessary access.  If somebody starts working on it and feels that CVS
would be more appropriate then of course we can switch to that.

 For the last year, I have acted as editor on the RPM website;
 there is also an open editorial mailing list, and provided
 content (dreadfully little) gets slotted in.
 I monitor all the mailing lists in the area (five primary
 ones) and watch for common questions or misunderstandings
 which are well answered end up summarized and on the site.  I
 particularly look for the postings by the lead maintainer and
 a few others for the 'nuggets' -- the answers float by and may
 be picked out of the stream and tossed up on the riverbank of
 a FAQ

Yes, that's the process I had in mind.  It's just a matter of some set
of people finding the time and motivation to do it.


(fwd from Bugs in rsync

2002-11-19 Thread Martin Pool
- Forwarded message from David Jonsson [EMAIL PROTECTED] -

From: David Jonsson [EMAIL PROTECTED]
Subject: Bugs in rsync 
Date: Fri, 19 Jul 2002 18:38:59 +0200 (CEST)
To: Martin Pool [EMAIL PROTECTED], Andrew Tridgell [EMAIL PROTECTED]

First, thanks for a great tool!

I run rsync supplied with RedHat 7.3
 rsync --version
rsync  version 2.5.4  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, 
  64-bit system inums, 64-bit internal inums

and I experience the following errors when I issue this command
rsync -a --delete * /mnt/navhdb2

Files beginning with . will not get deleted at the destination if they 
don't exist in the source. (I detected that it leaves erased .forward 

I had replaced a directory with a symbolic link at the source but the 
destination kept the directory.

Please respond to me just so I know the reasons for my problems.


- End forwarded message -

spam filter on rsync list

2002-11-06 Thread Martin Pool
#  0.40  0.40
#  0.40  0.307692  delivered-to
#  0.40  0.228571  for
#  0.40  0.164948  from
#  0.40  0.116364
#  0.40  0.080706  mbp
#  0.40  0.055292  nov
#  0.40  0.037553
#  0.40  0.025353  postfix
#  0.40  0.017046  received
#  0.40  0.011429  return-path
#  0.40  0.007648  rsync
#  0.40  0.005112
#  0.40  0.003414  wed
#  0.40  0.002278  with

In response to rampant abuse, I have installed a new spam filter,
Bogofilter, on the rsync mailing list.  Experiments have indicated
that it should get a smaller rate of false negatives or positives than
the existing system.

If there are any problems, please mail me or the postmaster.


Re: 2.5.6 release

2002-11-05 Thread Martin Pool
On  5 Nov 2002, jw schultz [EMAIL PROTECTED] wrote:
 This might be a good time for tagging 2.5.6 perhaps.  A fair
 number of bugfixes have gone in, popt updates, and a few new
 features.  It has been stable for about 2 months.  Unless
 there is something in the pipeline it sounds like time to
 release and start on 2.5.7cvs.

Sounds good to me.   I'll do a 2.5.6pre to check next week, unless
somebody else really wants to do it.


Re: superlifter design notes and a new proposal

2002-08-04 Thread Martin Pool

On  4 Aug 2002, Wayne Davison [EMAIL PROTECTED] wrote:

 Your previous proposal sounded quite a bit more fine-grained than what
 rZync is doing.  For instance, it sounded like you would have much more
 primitive building-block messages and move much of the controlling
 smarts into something like a python-language scripting layer.  While
 rZync allows ftp-level control (such as send this file, send this
 directory tree, delete this file, create this directory) it does
 this with a small number of higher-level command messages.

OK, good.

 I think that's a good idea.  My rZync app currently operates on each arg
 independently, but I recently discovered that this makes it incompatible
 with rsync when merging directories and such.  For instance, the command
 rsync -r dir1/ dir2/ dir3 merges the file list and removes duplicates
 before starting the transfer to dir3.

This is a substantial source of cruft in the current code, and one of
the reasons claimed to make an up-front traversal necessary.

I think a more efficient, and possibly simpler, solution would be to
first examine all of the source directories and determine their
relationships.  Basically, you might discover that dir2 is in fact a
subdirectory of dir1, or the same (or vice versa), in which case you
can eliminate it.  Or you might discover that they're disjoint.  Given
that directories are trees, I don't think there are any other cases.

Doing this in a way that properly respects various symlink options
will be a little complex, but I think it is in principle possible.  It
is also something quite amenable to being thoroughly exercised in
isolation as a unit test.

I am pretty sure that you can do this by just examining dir1 and dir2.
You do need to look at the filesystem to find out about symlinks and
so on, but I think you do not need to traverse their contents.

It is pretty complex, so there might be some case I've missed.
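As a rough illustration, the purely path-based part of that check
might look like this (a sketch only: prune_overlapping is a made-up
name, and real code would also have to resolve symlinks, as noted
above):

```python
import os

def prune_overlapping(dirs):
    """Drop any source directory that duplicates, or is contained
    within, another one.  Purely lexical: symlinks and mount points
    would need extra handling in a real implementation."""
    real = [os.path.normpath(d) for d in dirs]
    keep = []
    for i, d in enumerate(real):
        contained = False
        for j, other in enumerate(real):
            if i == j:
                continue
            if d == other and i > j:
                contained = True      # exact duplicate: keep first only
            elif d.startswith(other + os.sep):
                contained = True      # d lives inside other
            if contained:
                break
        if not contained:
            keep.append(d)
    return keep
```

So `rsync -r dir1/ dir1/sub dir3` would collapse to just dir1 and dir3
before any traversal starts.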

 I got rid of the multi-IO idiom of rsync in favor of sending all
 data via messages and limiting each chunk to 32K to allow other
 messages to be mixed into the middle of a large file's data-stream
 (such as verbose output).

OK, that makes sense.  I guess 32k is as good a number as any.

 I think the basic idea of how rZync envisions a new protocol working is
 a good one -- not so much the specifics of the bytes sent in the
 message-header format, but how the messages flow, how each side handles
 the messages in a single process, how all I/O is handled by a single
 function, etc.  There's certainly lots of room for improvement,

I've started looking at the code, and it looks very nice.  It's
certainly easier to read than rsync.  Would you mind putting in some
more comments to help me along though?

I had a couple of internal thoughts about how the code for a next
release ought to go.  Please don't take them as criticisms of your
right to write experimental code however you want, or as an attempt to
dictate how we run things.  I just want to raise the issues.

Global names should be distinguished with some kind of prefix, as in
librsync: rz_ or whatever.  If this ever turns into a library that
gets linked into something else it will help; in the meantime it helps
keep clear what is part of the project and what's pulled in from
elsewhere.

I really liked mkproto.awk when I first saw it, but now I'm not so
keen.  I think maintaining header files by hand is in some ways a
good thing, because it forces you to think about whether a particular
function really needs to be exported to rest of the program, or to the
world at large.

From rzync.h:

 #define MSG_HELLO 1

 #define MSG_QUIT  3
 #define MSG_NO_QUIT_YET   4 // XXX needed??
 #define MSG_ABORT 5

 #define MSG_NOTE_DIRNAME  6
 #define MSG_DEC_REFCNT8

These might work better as an enum, so that gdb can show symbolic
names.

 typedef struct {
 char *names[MAX_ID_LIST_LEN];
 long nums[MAX_ID_LIST_LEN];
 int count;
 } ID;

Linus has a rule about not using typedefs for structures, because it's
good to be clear about whether something is a structure or whatever.
I'm inclined to agree.  So I would refer to that thing as struct rz_id
or something.

Being 64-bit clean probably implies declaring rz_time_t, rz_uid_t and
so on, and using that rather than the native types, which will be
pretty random.

 This also reminds me that I hadn't responded to jw's question about why
 I thought his pipelined approach was more conducive to a batch protocol
 than an interactive protocol.  To make the pipelined protocol as
 efficient as rsync will require the complexity of his backchannel
 implementation, which I think will be harder to get right than a
 single-process message-oriented protocol.  If every stage is a separate
 process, it seems less clear how to implement something like an
 interactive mkdir or a delete 

Re: superlifter design notes and a new proposal

2002-08-04 Thread Martin Pool

I think there was some confusion earlier in the thread about the
redo thing in rsync 2.  It's not for handling files that have
changed during the transfer.  My understanding of this is that it is
used when the whole-file md4 hash shows that the block checksum
actually made a mistake in transferring the file.
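A sketch of that redo logic, with md5 standing in for rsync's md4 and
a made-up send_once callback standing in for the delta transfer
itself:

```python
import hashlib

def transfer_with_redo(send_once, data, max_tries=2):
    """Sketch of the rsync-2 'redo' idea: the block-checksum transfer
    can, very rarely, reconstruct the wrong bytes; a whole-file strong
    hash catches that, and the file is transferred again."""
    want = hashlib.md5(data).digest()
    for attempt in range(max_tries):
        got = send_once(data, attempt)
        if hashlib.md5(got).digest() == want:
            return got                 # strong hash agrees: done
    raise IOError("file still corrupt after redo")
```

The key point is that the redo is triggered by the whole-file hash
check, not by the file changing underneath the transfer.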



Re: new rsync release needed soon?

2002-08-03 Thread Martin Pool

On 31 Jul 2002, Dave Dykstra [EMAIL PROTECTED] wrote:

 Yes I think a new release is needed soon, but there's more patches than
 that that should get in.  

We need to weigh up getting functions in vs making steps small enough
that the chance of breakage is acceptable.  

I am afraid that at the moment our only means of getting really good
cross-platform test coverage for rsync is to throw a release out, and
so that inclines me towards being conservative in what we put in.
Hopefully we can try to get people on the list testing -rc releases
more aggressively.

 A bunch of them have been posted and I was hoping you were keeping
 track of them and would be putting more of them in.

I will try to read back through the list and see about merging them
this week, with a view to a release candidate on about the 11th, and a
release about a week after that.

 The patch that I'd most like to see get in JD Paul's patch for using SSH
 and daemon mode together.  We still don't have an agreement on what the
 syntax should be.  I think the combination of -e ssh and :: which he
 implemented is the most understandable syntax and we should just go with

I agree that it would be really good to support it.  

However, -e and :: seem to be a persistent source of confusion for new
users.  I'm not sure if this change will help those people, or what if
anything would be better.  (More later on this.)



Re: superlifter design notes and a new proposal

2002-08-03 Thread Martin Pool

I've been thinking a bit more about Wayne and jw's ideas.

My first draft was proposing what you might call a fine-grained rpc
system, with operations like list this directory, delete this
file, calculate the checksum of this file.  I think Wayne's rzync
system was kind of like that too.

One unusual feature of rsync compared to most protocols is that a
single request causes an enormous amount of stuff to happen: there is
only one request/response per connection at the moment, really.  It is
a very CISC-like protocol.

I wonder what we could achieve if we stay broadly within that model,
of both parties knowing about the whole job, and working in tandem,
rather than one of them controlling the other per file?  So the client
will send something more or less equivalent to its whole command line.

This would be a more conservative design in some ways, because it is
more similar to the existing system.  It also perhaps avoids some of
the issues about pipelining that have been giving me trouble at least.

While staying with that overall approach, we may still be able to make
some improvements in:

 - documenting the protocol

 - doing one directory at a time

 - possibly, doing librsync deltas of directories

 - just one process on either end

 - getting rid of interleaved streams on top of TCP

 - sending errors as distinct packets, including a reference to the 
   file that caused them (if any)

 - handling ACLs, EAs, and other incidental things

 - holding the connection open and doing more operations afterwards

What made me start thinking this way is the realization that the basic
idea of cooperating processes (rather than client-server) is not
really causing us any trouble at the moment.  Other things in that
list are, like the interleaved error stream, or the 3-process model.
But perhaps sending the arguments across the network and having the
remote process know what to do is not such a problem.

I will try to write up a more detailed description of this idea later.



Re: new rsync release needed soon?

2002-08-01 Thread Martin Pool

On  1 Aug 2002, Dave Dykstra [EMAIL PROTECTED] wrote:
 Another change that I think really ought to go in is something like
 the one at
 to get the correct error codes out of rsync.  But first I think we
 really need to hear from Tridge why he put that code there in the first
 place.  Martin, did you ever ask him?  If not, can you please get him
 to look at it?

I will follow that up with him.



Re: Useless option combos (was Re: --password-file switch)

2002-07-30 Thread Martin Pool

On 30 Jul 2002, Wayne Davison [EMAIL PROTECTED] wrote:
 On Tue, 30 Jul 2002, Martin Pool wrote:
  The --password-file option only applies to rsync daemon connections,
  not ssh.
 Perhaps we should make rsync complain about such options that don't make
 sense (another example being trying to use -e with a :: hostspec)?

There's a patch in cvs to make it complain about -e with ::.

The manual actually already says that --password-file does not affect
remote shells, but I have made it a bit more obvious.

I agree that a warning would be good.

Shall we do a new release soon?

There's just one more change I would like to put in, which is partially
rolling back the IPv6 patch so that it uses the old code, unmodified,
if --disable-ipv6 is specified.  I'm not sure this needs to go in 
before the next release though.  I think it would reduce the overall
level of pain, particularly on older platforms.



Re: timestamp on symlink

2002-07-29 Thread Martin Pool

On 29 Jul 2002, Donovan Baarda [EMAIL PROTECTED] wrote:
 This is because most of python's methods de-reference symlinks. You
 get this error because 'nothere' doesn't exist. The correct way to get time
 info on symlinks is to use os.lstat(), which doesn't de-reference links.

I realize you can get the time that way (although not on all platforms),
but how do you set it?  As jw says, there is no lutime().



Re: --password-file switch

2002-07-29 Thread Martin Pool

On 30 Jul 2002, Jochen Küchelin [EMAIL PROTECTED] wrote:
 How can I use the --password-file switch with rsync in order not to
 be promted for the users password so I can run rsync in a cronjob?
 rsync -uavrpog -e ssh /www [EMAIL PROTECTED]:/DESTINATION/`date +%A`
 does not work!
 I always get a prompt to enter users root password!

The --password-file option only applies to rsync daemon connections, not 
ssh.  You need to set up an ssh key to make ssh connections with no
password; see the recent thread or the ssh manual for instructions.



Re: superlifter design notes (OpenVMS perspective)

2002-07-28 Thread Martin Pool

On 27 Jul 2002, jw schultz [EMAIL PROTECTED] wrote:
 The server has no need to deal with cleint limitations.  I
 am saying that the protocol would make the bare minimum of
 limitatons (null termination, no nulls in names).

It probably also makes sense to follow NFS4 in representing
paths as a vector of components, rather than as a single string
with '/'s in it or whatever.  ['home', 'mbp', 'work', 'rsync'] avoids
any worries about / vs \ vs :, and just lets the client do
whatever makes sense.
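For illustration, the component-vector idea in a few lines (the
function names are made up):

```python
def to_components(path):
    """Split a slash-separated relative path into the component vector
    that would go on the wire: 'home/mbp/work' -> ['home','mbp','work'].
    Empty and '.' components are dropped."""
    return [c for c in path.split("/") if c not in ("", ".")]

def from_components(parts, sep="/"):
    """Reassemble using whatever separator makes sense locally,
    e.g. sep='\\' on Windows or ':' on old MacOS."""
    return sep.join(parts)
```

The wire format never has to commit to a separator character at all;
each end converts to and from its native convention.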

I don't know a lot about i18n support, but it does seem that
programs will need to know what encoding to use for the filesystem
on platforms that are not natively Unicode.  On Unix it probably
makes sense to default to UTF-8, but latin-1 or others are
equally likely.  This is independent of the choice of message
locale.  I think the W32 APIs are defined in Unicode so we don't 
need to worry.

Quoting, translating, or rejecting illegal characters could all
make sense depending on context.

I guess I see John's backup vs distribution question as 
hopefully being different profiles or wrappers around a single
codebase, rather than different programs.  Perhaps the distinction
he's getting at is whether the audience for the client who
uploaded the data is the same client, or somebody else?



Re: superlifter design notes (was Re: ...

2002-07-27 Thread Martin Pool

On 27 Jul 2002, John E. Malmberg [EMAIL PROTECTED] wrote:

 A program serving source files for distribution does not need to be that 
 concerned with preserving exact file attributes, but may need to track 
 suggested file attributes for for the various client platforms.
 A program that is replicating for backup  purposes must not have any 
 loss of data, including any operating specific file attributes.
 That is why I posted previously that they should be designed as two 
 separate but related programs.

I'm not sure that the application space for rsync really divides
neatly into two parts like that.  Can you expand a bit more on how
you think they would be used?



Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-27 Thread Martin Pool

I'm inclined to agree with jw that truthfully representing time and
leap seconds is a problem for the operating system, not for us.  We
just need to be able to accurately represent whatever it tells us,
without thinking very much about the meaning.

Somebody previously pointed out that timestamp precision is not a
property of the kernel, but rather of the filesystem on which the
files are stored.  In general there may be no easy way to determine it
ahead of time: you can (if you squint) imagine a network filesystem
with nanosecond resolution that's served by something with rather
less.  I suspect the only way to know may be to set the time and then
read it back.

You can also imagine that in the next few years some platform may
change to a format that accurately represents leap seconds, whether by
TAI or something else.  (I'm not sure if I'd put money on it.)
Presumably that machine's POSIX interface will do a lossy conversion
back to regular Unix time to support old apps.  If we merely used that
information, then when replicating between two such machines, files
whose mtime happened to fall near a leap second would be inaccurate.
That would contradict our goal of preserving precision as much as
possible, even if we can't tell if it is accurate.

Ideally, we would use the native interface so as to be able to get the
machine's full precision, and that would imply something like TAI.

Whether this is worth doing depends on whether you reckon any platform
will actually move to a filesystem that can represent leap seconds.
As jw says, practically all machines have clocks with more than one
second of inaccuracy, so handling leap seconds is not practically
important.  Certainly they might use it within their ntp code, but I
don't know if they'll expose it to applications.

 What is the actual format of TAI?

64-bit signed seconds-since-1970, plus optionally nanoseconds, plus
optionally attoseconds.  (There's something rather fascinating about
using attoseconds.)
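A sketch of that timestamp layout (this follows the description above
-- signed 64-bit seconds plus 32-bit nanoseconds plus 32-bit
attoseconds, big-endian -- and is not necessarily libtai's exact wire
format):

```python
import struct

def pack_stamp(sec, nano=0, atto=0):
    """Pack seconds/nanoseconds/attoseconds as >q I I (16 bytes,
    network byte order)."""
    return struct.pack(">qII", sec, nano, atto)

def unpack_stamp(buf):
    """Inverse of pack_stamp."""
    return struct.unpack(">qII", buf)
```

Sixteen bytes per timestamp is not much, and the signed 64-bit seconds
field comfortably outlives any 2038-style rollover.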

To be fair, it seems that TAI is an international standard, and djb
just made up libtai, not the whole thing.  (Mind you, from some
standards I've seen, that would be a good reason to walk briskly in
the other direction.)

One drawback, which is not really djb's fault, is that if you
inadvertently use a TAI value as a Unix value it will be about 10
seconds off -- almost, but not quite, correct.  I'd hate to have bugs
like that, but presumably they can be avoided by using the interface
consistently.

On the other hand, sint32 unix time is clearly running out, and if we
have to use something perhaps it might as well be TAI. 

I would kind of prefer just a single 64-bit quantity measured in (say)
nanoseconds, and compromise on being able to time the end of the
universe, but I don't think I care enough to invent a new standard.



Re: rsync (dammit) on RTOS

2002-07-22 Thread Martin Pool

On 22 Jul 2002, Biju Perumal [EMAIL PROTECTED] wrote:
 Thanks Martin.
   I need to port it to QNX
   Any idea of available implementations of rsync on QNX?

I don't know if anybody has done it, but as far as I know QNX is
pretty similar to Unix so it should not be too hard.  Why not try
compiling it?  If you have trouble consult a QNX guru and/or post a
clear and detailed description of the problem to this list.



Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-21 Thread Martin Pool

On 21 Jul 2002, jw schultz [EMAIL PROTECTED] wrote:
 .From what i can see rsync is very clever.  The biggest
 problems i see with its inability to scale for large trees,
 a little bit  of accumulated cruft and featuritis, and
 excessively tight integration.

Yes, I think that's basically the problem.

One question that may (or may not) be worth considering is to what
degree you want to be able to implement new features by changing only
the client.  So with NFS (I'm not proposing we use it, only an
example), you can implement any kind of VM or database or whatever on
the client, and the server doesn't have to care.  The current protocol
is just about the opposite: the two halves have to be quite intimately
involved, so adding rename detection would require not just small
additions but major surgery on the server.

 What i am seeing is a Multi-stage pipeline.  Instead of one
 side driving the other with comand and response codes each
 side (client/server) would set up a pipeline containing
 those components that are needed with the appropriate
 plumbing.  Each stage would largly look like a simple
 utility reading from input; doing one thing; writing to
 output, error and log.  The output of each stage is sent to
 the next uni-directionally with no handshake required.

So it's like a Unix pipeline?  (I realize you're proposing pipelines
as a design idea, rather than as an implementation.)

So, we could in fact prototype it using plain Unix pipelines?

That could be interesting.

  Choose some files:
find ~ | lifter-makedirectory  /tmp/local.dir
  Do an rdiff transfer of the remote directory to here:
rdiff sig /tmp/local.dir /tmp/local.dir.sig
scp /tmp/local.dir.sig othermachine:/tmp
ssh othermachine 'find ~ | lifter-makedirectory | rdiff delta /tmp/local.dir.sig - 
' /tmp/
rdiff patch /tmp/local.dir /tmp/ /tmp/remote.dir

  For each of those files, do whatever
for file in lifter-dirdiff /tmp/local.dir /tmp/remote.dir

Of course the commands I've sketched there don't fix one of the key
problems, which is that of traversing the whole directory up front,
but you could equally well write them as a pipeline that is gradually
consumed as it finds different files.  Imagine

  lifter-find-different-files /home/mbp/ othermachine:/home/mbp/ | \
xargs -n1 lifter-move-file 

(I'm just making up the commands as I go along; don't take them too
literally.)

That could be very nice indeed.

I am just a little concerned that a complicated use of pipelines in
both directions will make us prone to deadlock.  It's possible to
cause local deadlocks if e.g. you have a child process with both stdin
and stdout connected to its parent by pipes.  It gets potentially more
hairy when all the pipes are run through a single TCP connection.

I don't think that concern rules this design out by any means, but we
need to think about it.

One of the design criteria I'd like to add is that it should
preferably be obvious by inspection that deadlocks are not possible.

   timestamps should be represented as seconds from
   Epoch (SuS) as unsigned 32 int.  It will be 90 years
   before we exceed this by which time the protocol
   will be extended to use uint64 for milliseconds.

I think we should go to milliseconds straight away: if I remember
correctly, NTFS already stores files with sub-second precision, and
some Linux filesystems are going the same way.  A second is a long
time in modern computing!  (For example, it's possible for a command
started by Make to complete in less than a second, and therefore
apparently not change a timestamp.)  

I think there will be increasing pressure for sub-second precision in
much less than 90 years, and it would be sensible for us to support it
from the beginning.  The Java file APIs, for example, already work in
milliseconds.

Transmitting the precision of the file sounds good.
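For example, a millisecond mtime falls straight out of the nanosecond
stat field on platforms that expose one (a sketch; ns_to_ms and
mtime_ms are made-up helper names):

```python
import os

def ns_to_ms(ns):
    """Truncate a nanosecond timestamp to a 64-bit millisecond count."""
    return ns // 1_000_000

def mtime_ms(path):
    """File mtime in milliseconds since the epoch, built from
    st_mtime_ns so sub-second precision (where the filesystem has it)
    survives the conversion."""
    return ns_to_ms(os.stat(path).st_mtime_ns)
```

Two builds a fraction of a second apart then compare as different,
which is exactly the Make case described above.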

   I think by default user and groups only be handled

I think by default we should use names, because that will be least
surprising to most people.  I agree we need to support both.

Names are not universally unique, and need to be qualified, by a NIS
domain or NT domain, or some other means.  I want to be able to say:


when transferring across machines.

We probably cannot assume UIDs are any particular length; on NT they
correspond to SIDs (?) which are 128-bit(?) things, typically
represented by strings like


So on the whole I think I would suggest following NFSv4 and just using
strings, with the interpretation of them up to the implementation,
possibly with guidance from the admin.

   When textual names are used a special chunk in the
   datastream would specify a node+ID - name
   equivalency immediately before the first use of that

It seems like in general there is a need to have 

Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-21 Thread Martin Pool

People have proposed network-endianness, ascii fields, etc.  

Here's a straw-man proposal on handling this for people to criticize,
ignite, feed to horses, etc.  I don't have any specific numbers to
back it up, so take it with a grain of salt.  Experiments would be
pretty straightforward.

Swabbing to/from network endianness is very cheap.  On 486s and higher
it is a single inlined instruction and I think takes about one cycle.
On non-x86 it is free.  The cost is barely worth considering: if you
are flipping words as fast as you can you will almost certainly be
limited by memory bandwidth, not by the work of swapping them.

BER-style variable length fields, on the other hand, are very
CPU-intensive, because you need to look at the top bit, mask it,
shift, and loop for every byte.

If you're going to use a protocol that difficult, I think you might as
well use ASCII hex or decimal numbers.  
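To make the comparison concrete, here is a base-128 varint codec of
the kind under discussion (protobuf-style continuation bits, which
differ slightly from classic BER byte order); note the mask/shift/test
work done for every single byte:

```python
def varint_encode(n):
    """Encode a non-negative int as 7 payload bits per byte, with the
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)   # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def varint_decode(buf):
    """Return (value, bytes_consumed); each byte costs a mask, a
    shift, and a continuation-bit test."""
    n = shift = 0
    for i, b in enumerate(buf):
        n |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            return n, i + 1
    raise ValueError("truncated varint")
```

Compare that loop with a fixed-width network-endian field, which is a
single unaligned load plus (at worst) one byte swap.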

All other things being equal having a readable protocol is good. A
little redundancy in the protocol can help make it readable and also
help detect errors.  For example, distcc's 4-char commands make it
easy for humans to visually parse a packet, and they make errors in
transmission almost always immediately cause an error.  At the same
time they're cheap to process -- it's just a uint32 compare.

Arguably we should use x86-endianness because it's the most common
architecture at the moment, but I don't think the performance
justifies using something non-standard.  Anyhow, I would hope that if
it gets off the ground, this protocol might still be in use in ten
years, in which time x86 may no longer be dominant.  Bigendian also
has the minor advantage that it's easier to read in packet dumps.

Negotiated protocols are a bad idea because they needlessly multiply
the test domain.  Samba has to deal with Microsoft protocols which are
in theory negotiated-endian, but in practice of course Microsoft never
test anything but Intel, so BE support is broken and people writing
non-x86 servers need to negotiate Intel endianness.  Even assuming
we're smarter than they are, I don't think we need to make our lives
difficult in this way.

Lempel-Ziv is ideal for the exact case of compressing
0x00000001 into a couple of bits.  Even a very cheap
compressor such as lzo (about half the speed of memcpy) will do well
on that kind of case; presumably numbers like uint64 0, 1, 2, etc will
occur often in packet headers and get tightly compressed.  I think it
will probably deal with filenames for us too.

So, as a straw man:

 - use XDR-like network-endian 32 and 64 bit fields 

 - keep all fields 4-byte aligned

 - make strings int32 length-preceded, and padded to a 4-byte boundary 

 - don't worry about interning or compressing filenames, just send
   them as plain UTF-8 relative to a working directory

 - send things like usernames as strings too

 - make operation names (or whatever) be human-readable, either
   variable-length strings or 4-byte tokens that happen to be readable
   as ascii
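The string rule from that straw man, sketched out (pack_string and
unpack_string are made-up names; the framing follows the list above):

```python
import struct

def pack_string(s):
    """int32 length prefix (network-endian), UTF-8 bytes, zero-padded
    to a 4-byte boundary."""
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">i", len(data)) + data + b"\x00" * pad

def unpack_string(buf, off=0):
    """Return (string, new_offset), skipping the padding."""
    (n,) = struct.unpack_from(">i", buf, off)
    off += 4
    s = buf[off:off + n].decode("utf-8")
    off += n + (4 - n % 4) % 4
    return s, off
```

Everything stays 4-byte aligned, and a packet dump of the stream is
still readable by eye.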



Re: superlifter design notes (OpenVMS perspective)

2002-07-21 Thread Martin Pool

On 22 Jul 2002, John E. Malmberg [EMAIL PROTECTED] wrote:
  1. Be reasonably portable: at least in principle, it should be
  possible to port to Windows, OS X, and various Unixes without major
 In general, I would like to see OpenVMS in that list.

Yes, OpenVMS, perhaps also QNX and some other TCP/IP-capable RTOSs.

Having a portable protocol is a bit more important than a portable
implementation.  I would hope that with a new system, even if the
implementation was unix-bound, you would at least be able to write a
new client, reusing some of the code, that worked well on ITS.

 A clean design allows optimization to be done by the compiler, and tight 
 optimization should be driven by profiling tools.

Right.  So, for example, glib has a very smart assembly ntohl() and
LZO is tight code.  I would much rather use them than try to reduce
the byte count by a complicated protocol.

  4. Keep the socket open until the client gets bored. (Avoids startup
  time; good for on-line mirroring; good for interactive clients.)
 I am afraid I do not quite understand this one.  Are you refering to a 
 server waiting for a reconnect for a while instead of reconnecting?

What I meant is that I would like to be able to open a connection to a
server, download a file, leave the connection open, decide I need
another file, and then get that one too.  You can do this with FTP,
and (kind of) HTTP, but not rsync, which needs to know the command up front.

Of course the server can drop you too by a timeout or whatever.

 If so, that seems to be a standard behavior for network daemons.
  5. Similarly, no silly tricks with forking, threads, or nonblocking
  IO: one process, one IO.
 Forking or multiple processes can be high cost on some platforms.  I am 
 not experienced with Posix threads to judge their portability.
 But as long as it is done right, non-blocking I/O is not a problem for me.
 If you structure the protocol processing where no subroutine ever posts 
 a write and then waits for a read, you can set up a library that can be 
 used either blocking or non-blocking.

Yes, that's how librsync is structured.

Is it reasonable to assume that some kind of poll/select arrangement
is available everywhere?  In other words, can I check to see if input
is available from a socket without needing to block trying to read
from it?
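Where POSIX select() is available, that check is a zero-timeout poll; a
sketch (the socketpair is just for demonstration):

```python
import select
import socket

def input_available(sock: socket.socket) -> bool:
    """Return True if a read on sock would not block."""
    # A zero timeout makes select() a pure poll rather than a wait.
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)

# Demonstration with a connected pair of local sockets.
a, b = socket.socketpair()
try:
    before = input_available(a)   # nothing has been sent yet
    b.sendall(b"hello")
    after = input_available(a)    # data is now pending
finally:
    a.close()
    b.close()
```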

I would hope that only a relatively small layer needs to know about
how and when IO is scheduled.  It will make callbacks (or whatever) to
processes that produce and consume data.  That layer can be adapted,
or if necessary, rewritten, to use whatever async IO features are
available on the relevant platform.

 Test programs that internally fork() are very troublesome for me. 
 Starting a few hundred individually by a script are not.

If we always use fork/exec (aka spawn()) is that OK?  Is it only
processes that fork and that then continue executing the same program
that cause trouble?

 I can only read UNIX shell scripts of minor complexity.

Apparently Python runs on VMS.  I'm in favour of using it for the test
suite; it's much more effective than sh.

  12. Try to keep the TCP pipe full in both directions at all times.
  Pursuing this intently has worked well in rsync, but has also led to
  a complicated design prone to deadlocks.
 Deadlocks can be avoided.

Do you mean that in the technical sense of deadlock avoidance?
i.e. checking for a cycle of dependencies and failing?  That sounds
undesirably complex.

 Make sure if an I/O is initiated, that the 
 next step is to return to the protocol dispatching routine.

  9  Model files as composed of a stream of bytes, plus an optional
  table of key-value attributes. Some of these can be distinguished to
  model ownership, ACLs, resource forks, etc.
 Not portable.  This will effectively either exclude all non-UNIX or make 
 it very difficult to port to them.

Non-UNIX is not completely fair; as far as I know MacOS, Amiga,
OS/2, Windows, BeOS, and QNX are {byte stream + attributes + forks}.

I realize there are platforms which are record-oriented, but I don't
have much experience on them.  How would the rsync algorithm even
operate on such things?

Is it sufficient to model them as ascii+linefeeds internally, and then
do any necessary translation away from that model on IO?

 BINARY files are no real problem.  The binary is either meaningful on 
 the client or server or it is not.  However file attributes may need to 
 be maintained.  If the file attributes are maintained, it would be 
 possible for me to have an OpenVMS indexed file moved up to a UNIX 
 server, and then back to another OpenVMS system and be usable.

Possibly it would be nice to have a way to stash attributes that
cannot be represented on the destination filesystem, but perhaps that
is out of scope.

 I recall seeing a comment somewhere in this thread about timestamps 
 being left to 16 bits.

No, 32 bits.  16 bits is obviously silly.

 File timestamps 

Re: superlifter design notes (OpenVMS perspective)

2002-07-21 Thread Martin Pool

 User-Agent: Mozilla/5.0 (X11; U; OpenVMS COMPAQ_AlphaServer_DS10_466_MHz; en-US; 
rv:1.1a) Gecko/20020614

If something as complex as Mozilla can run on OpenVMS then I guess we
really have no excuse :-)



Re: superlifter design notes and rZync feedback

2002-07-19 Thread Martin Pool

One more link, about variable-length vs fixed-length encodings:

(The HTML is a bit broken, view the source.)

Basically they make the somewhat obvious point that variable-length
encodings are much slower to handle than fixed-length.  I don't know
if the difference is so great that lzo encoding could produce a
smaller result with less work.  I wouldn't be surprised either way,

One way to look at it is this: in the case where you're CPU-bound, not
network-bound, then you'll definitely want to use something like XDR.
In the case where you're completely network-bound, then you probably
want to use gzip -9 or even bzip2, and whether the underlying protocol
is fixed or variable-length probably doesn't matter.  

So perhaps XDR plus compression is a good tradeoff across a wider
domain.  (Or perhaps not.)
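The tradeoff can be eyeballed with a toy experiment: a hypothetical
fixed-width file-list record (the field layout here is made up, not any
proposed format) wastes bytes on padding and high-order zeros, and generic
compression claws most of them back:

```python
import struct
import zlib

# A hypothetical file-list entry: 64-bit size, 32-bit mtime, 32-bit mode.
entries = [(n * 4096, 1026950400 + n, 0o100644) for n in range(1000)]

# Fixed-width encoding: 16 bytes per entry, network byte order.
fixed = b"".join(struct.pack("!qii", *e) for e in entries)
compressed = zlib.compress(fixed, 9)

# Compression removes most of the redundancy (leading zero bytes in the
# sizes, near-identical mtimes, a constant mode word).
print(len(fixed), len(compressed))
```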



Re: rsync anti-FUD

2002-07-18 Thread Martin Pool

On 18 Jul 2002, Paul Nendick [EMAIL PROTECTED] wrote:
 I'm working on a commercial project that would benefit immensely from
 the use of rsync.  However, I cannot convince management that rsync is a
 worthy tool due to the rote "it's shareware, it's not supported" FUD.  
 Are there any documented, corporate users of rsync? Testimonials?  In
 short, how do I drag this risk-averse group out of the FTP age into the
 rsync present?

I work for HP.  We use it extensively, indeed so much so that it would
probably be impossible to count the number of users.  If you want
support, I'm sure HP's consulting group would be interested in helping
you out and very capable.  If you do not already have an account
manager there, I can find somebody good for you to speak to.  They can
probably produce a nice pointyhead-friendly Powerpoint slideshow about
the strengths of open source :-)

rsync is a mature product, with many established users.  As other
people have said, it is the de-facto standard for filesystem

rsync's stability means that new features do not go in very fast,
however there is active work on extending it to new areas and
capabilities, including xdelta, Unison, librsync, rzync, lift, pysync,
and others.  As far as I know, there is no new work going into FTP as
a protocol, although people are doing some nice work on
implementation, such as ProFTPd.  So you need not fear rsync leading
you into a dead end.

One of the nice things about open source is that you are not locked in
to a single provider.  If, at some time during the project, you decide
you want to pay for commercial support, you can do so.  If you do pay
for commercial support and it turns out that you're not happy with the
company you can change.

rsync is not shareware anyhow; it is Free Software, or, if you prefer,
Open Source Software.  Shareware is sometimes the worst of both worlds
-- half-hearted support, but no opportunity to fix things yourself or
seek alternative help.

I don't know what FTP implementation you're using, but I suspect most
of them will be either open source, shareware (on Windows), or a thin
veneer of Unix-vendor gloss on an old BSD implementation.

A good way to proceed might be to post a brief description of what it
is you want to do to the list.  I'm sure several people will be able
to tell you "that's easy", "that's possible", or "rsync's not the
right tool".  Drawing on the freely-available resource of experienced
users is probably the best thing you can do to reduce risk.



Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-18 Thread Martin Pool

On 16 Jul 2002, Dan Stromberg [EMAIL PROTECTED] wrote:

 If by sillyrename, you mean busy text files are renamed to .nfs*, 

sillyrename is in fact the technical term for this.  I am not making
it up.  I'm pretty sure Callaghan's book calls it that, Sun people call
it that, and it is the term used in the Linux NFSv3 implementation, etc.

 then I think you're missing how it works yourself, I'm sorry to say.
 You just unlink something on the server, and it happens, like magic.
 Maybe that happens on the client side - but that's really beside the
 point.  Rename will probably do just as well.

"it happens, like magic."  Uh-huh.

My understanding of sillyrename, from memory and a brief perusal of
the kernel source (I don't have Callaghan here), is as follows:

It is a purely client-side behaviour, to handle the fact that Unix
files may be still open when unlinked.  This relies on Unix having an
in-memory use count, in addition to the on-disk link count.  The
problem is that the NFS server may reboot while the client has the
file open, therefore losing its in-memory use count, and causing the
file data to be garbage-collected by fsck.  As a workaround, deletes
or link-replacement on the client for a file still in use are handled
by moving that file to a temporary name, so that from the
point-of-view of other clients the file has gone.  When the use count
drops to zero, the client removes the .nfs file.  If the client
crashes or the net is partitioned before the use count goes to zero,
then the .nfs file may remain indefinitely, which is why you need a
reaper run from cron.

If this is wrong please explain how.
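The local-filesystem behaviour that sillyrename emulates is easy to
demonstrate (assuming a Unix filesystem):

```python
import os
import tempfile

# On local Unix filesystems an unlinked file stays readable through any
# open descriptor; NFS clients fake this with .nfsXXXX renames because
# the server has no view of the client's in-memory use count.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                  # the name is gone...
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)          # ...but the data is not
os.close(fd)                     # use count drops to zero; space reclaimed
print(data)
```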

This is, incidentally, a much better solution for replacing in-use
files than rsync's backups, because it only affects in-use files, and
they are gc'd when no longer in use.  Replacing local files and
letting the kernel handle it is even better, because it can never
leave orphans behind.

 .nfs* may well suffer from the same "the backup file is still setuid"
 problem though.

Yes; if you replace in-use setuid binaries in such a way that
sillyrename orphans may be generated, then they may still be setuid,
and that may be a security problem.  I agree.

These files should only be generated by edge cases where the program
is in use when replaced, and where the client loses contact with the
server or abruptly reboots.  Presumably if you're installing a
security update to a program then you need to restart that program
fairly promptly, so the window should be small.  Of course small
window != zero, but there is no need to unnecessarily panic.

It looks like the root problem is that replacing a setuid file from an
NFS client may cause a setuid sillyrename file to remain under some
circumstances.  I haven't tested it, but I can believe that might
happen.  Is that what you're trying to say?

If this is true, then it is a problem with NFS, not with rsync.  The
failure would presumably occur in the same way if you used dpkg, rpm,
pkgadmin or cp to replace the files.

 I'm finding it hard to see why this makes the issue moot.

It is moot because you can just run rsync direct to the NFS server.
This is faster and avoids the security hole.  If you disagree, please
explain why.

 I'm also finding it hard to understand why security might be so
 unimportant to you.  I seriously wish you'd read bugtraq for a few
 months before making such a short sighted decision.

I have a pretty good understanding of Unix security, and I do consider
it important.  If you want changes to rsync you have to make a clear
case, not just wave your hands and say "like magic".



Re: superlifter design notes and rZync feedback

2002-07-18 Thread Martin Pool

On 18 Jul 2002, Wayne Davison [EMAIL PROTECTED] wrote:

 (definitely NOT rzync).

Great.  (Excuse my overreaction :-)

 Re: rzync's variable-length fields:  Note that my code allows more
 variation than just 2 or 4 bytes -- e.g., I size the 8-byte file-size
 value to only as many bytes as needed to actually store the length.  I
 agree that we should question whether this complexity is needed, but I
 don't agree that it is wrong in principle.  There are two areas where
 field-sizing is used:  in the directory-info compression (which is very
 similar to what rsync does, but with some extra field-sizing thrown in
 for good measure), and in the transmission protocol itself:

OK.  If the protocol said that all integers are encoded in a UTF-8-ish
or BER-ish variable length scheme that would sound perfectly
reasonable to me.  I had misunderstood the document as suggesting that
some fields should be defined to be different lengths to others; that
would worry me.

There is still a question on the relative merits of having
known-length headers (easier to manage buffers, know how much to read,
etc), vs making them as small as possible.
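For reference, a UTF-8-ish/BER-ish variable-length integer takes only a few
lines; the sketch below uses 7 data bits per byte with the high bit as a
continuation flag (one common convention, not a proposed wire format):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer base-128, low group first;
    the high bit of each byte means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, offset: int = 0):
    """Return (value, next_offset)."""
    n = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return n, offset
```

The cost, as noted above, is that a header's length is unknown until it has
been parsed, which complicates buffer management.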

I think I mentioned this -- I'd like to have a reasonable means to
choose a compression scheme at connection time.  bzip2 would be good
for modems; lzo for 100Mbps.  (I think of bzip2 as simmering on the
stove all day, and lzo as lightly blanching :-)

 I still have questions about how best to handle the transfer of
 directory info.  I'm thinking that it might be better to remove the
 rsync-like downsizing of the data and to use a library like zlib to
 remove the huge redundancies in the dir data during its transmission.

Ben Escoto suggested a stack like this:

  1.  The specification for an abstract protocol designed to allow a
  single threaded application get good performance using a single,
  possibly low bandwidth/high latency pipe.  No specific file commands
  would enter in at this stage, but error reporting and recovery, some
  kind of security policy, and some other stuff I'm omitting would be specified at this level.

  2.  A library to make it easy for applications to work with protocols
  that have the form in 1.  A well-written interface to a scripting
  language (probably python) would be considered a core part of this.

  3.  Specification for a more specific, rsync-like protocol, and maybe
  another library (again with at least a scripting wrapper) to make it
  easy for applications to implement the protocol.

  4.  The model application rsync3 which shows off what the protocol can
  do.  Ideally this part should be really short and sweet.

I think that's a good way to play it, because there is enough work in
each section that they're non-trivial layers, but they're also
sufficiently separate to allow a lot of good experimentation or parallel development.

I'd hope that by getting a good foundation in #1 and #2, we would be
able to experiment with doing binary deltas on directories, or not, or
something else again.  I would hope that working only at layer 4,
you'd be able to implement a client that could detect remote renames
(by scanning for files with the same size, looking at their checksums, and so on).

I wonder if this layering is excessive, but I think that all the
layers are necessary, and a first implementation could be simple in
many cases.  For example, 2 could initially be trivially implemented
in a way that only supports non-pipelined operation.

 In the protocol itself, there are only two variable-size elements that
 go into each message header.  While this increases complexity quite a
 bit over a fixed-length message header, it shouldn't be too hard to
 automate a test that ensures that the various header combinations
 (particularly boundary conditions) encode and decode properly.  I don't
 know if this level of message header complexity is actually needed (this
 is one of the things that we can use the test app to check out), but if
 we decide we want it, I believe we can adequately test it to ensure that
 it will not be a sinkhole of latent bugs.

OK, good.

 Re: rzync's name cache.  I've revamped it to be a very dependable design
 that no longer depends on lock-step synchronization in the expiration of
 old items (just in the creation of new items, which is easy to achieve).
 Some comments on your registers:
 You mention having something like 16 registers to hold names.  I think
 you'll find this to be inadequate, but it does depend on exactly how
 much you plan to cache names outside of the registers, how much
 retransmission of names you consider to be acceptable, and whether you
 plan to have a move mode where the source file is deleted.

Yes, I agree that 16 is probably too small; the next round number
would be 256.  If we use something like BER it could be unboundedly
big.  However, since using a name causes server-side resources to be
allocated, that's probably no good.  We don't want somebody abusing a
public server by allocating a zillion names; on the other hand I 

Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-12 Thread Martin Pool

On 11 Jul 2002, Dan Stromberg [EMAIL PROTECTED] wrote:

  I don't get what you are doing.  Where did these insecure
  suid root files come from in the first place?
 Have you ever read bugtraq on a regular basis?  They're coming out of
 the woodwork.

Another question would be, why do you want to keep them around at all?
Presumably so that people can undo the changes if something goes
wrong.

For your situation, it might work better to dump them all into a mode
700 backup directory.

It seems like the overarching problem is different focusses: Dan wants
rsync to be a software-distribution mechanism (which is certainly a
good use for it), in which case stripping setuid bits is obviously
quite desirable.  But for a bit-perfect backup tool, it's probably
undesirable.

I have been thinking about what general strategies software tools use
to address this problem of focus.  They seem to be

1- Add a pile of built-in options (--strip-setuid) -- rsync's strategy
   to date.

2- Build a common layer, and then variations on the program to suit
   different purposes.  I think rdiff-backup is kind of like this.
   It has the advantage that end users who just want to do backups
   or software distribution or mirroring don't need to deal with options irrelevant to them.

3- Make the program call out to various scripts that can control its
   behavior -- the CVS server is like this, for example, with loginfo
   scripts and so on.

4- Make the program's interfaces and performance characteristics be
   such that it can easily be controlled by a scripting language.
   Subversion is trying to be like this.  The --log-format proposal
   for rsync goes in this direction, though needing a new socket for
   each invocation rather cripples it.

5- Make the whole program intimately intermingled with a scripting
   language, like emacs or (perhaps) Mozilla.



Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-12 Thread Martin Pool

On 12 Jul 2002, Dan Stromberg [EMAIL PROTECTED] wrote:

 Because when we update, for example, bash, everybody's bash is going to
 die on them if we don't keep around backups (segfault as you demand page
 from a binary that has Mostly the Same Stuff in Different Places).

rsync creates a new file, and then atomically moves it into place on
successful completion.  You should never end up with the file being
part-changed, assuming you don't use --partial or -P.  
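The write-then-rename pattern itself is simple; a Python sketch of the shape
of it (not rsync's actual code):

```python
import os
import tempfile

def atomic_replace(path: str, data: bytes) -> None:
    """Write data to a hidden temp file in the same directory, then
    rename it over path.  Readers see either the whole old file or the
    whole new one, never a part-written mixture."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname,
                               prefix="." + os.path.basename(path) + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, path)   # atomic on POSIX, within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

The temp file must live in the destination directory: rename() is only
atomic within a single filesystem.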

It does not normally unlink as such, although I think it might try to
use that as a big hammer if the rename fails.

So up until the file is completely transferred and the replacement
takes place, everyone will keep seeing the old file.  Afterwards,
people who had the old file open will keep seeing it, and people who
open the new one will get the new one.  

This is, as far as I know, the same approach and the same semantics
that most unix software-distribution systems, such as dpkg or rpm,
will give you.

It may break on systems (like HP-UX?) that don't let you rename or
remove a file while it's being executed.  I don't know what you're
meant to do there, except shut down everything on the machine.
Presumably you don't have one of those or -b would be failing.

 Or does rsync unlink and recreate rather than overwriting?  In that
 case, we might just end up with a bunch of .nfs* files if we don't keep backups.
 Rumor has it, however, that depending on the .nfs* mechanism doesn't
 always work.  I haven't seen it fail myself, but one of the other guys
 here, who's pretty experienced, sounds pretty convinced that it fails sometimes.

It's possible that you can get .nfs* orphans, but only you can know
whether they're common in your environment.  If I understand
correctly, the only problem with that would be that the old, setuid
text still hangs around in the .nfs file.

I would be inclined to say that it's not rsync's problem if unlink()
is unreliable, so just run a sillyrename reaper and be done.

Is it possible to just rsync onto the NFS server, rather than onto the
clients?  That would probably be faster, and avoid sillyrename.

 I considered this, but I wasn't sure NFS/TheKernel would allow demand
 paging from an inaccessible binary on all of our supported *ix platforms
 now and into the future.  Are you?  We currently support Linux, Solaris,
 Irix and Tru64 presently, and may add and drop some in the future.

I suspect that any machines that let you rename or unlink in-use text
files will not care whether they have an accessible name or not.
Unfortunately experiment is probably the only way to tell.



superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-11 Thread Martin Pool

I've put a cleaned-up version of my design notes up here

It's very early days, but (gentle :-) feedback would be welcome.  It
has some comments on Wayne's rzync design, which on the whole looks
pretty clever.

I don't have any worthwhile code specifically towards this yet, but I
have been experimenting with the protocol ideas in distcc

I like the way it has worked out there: the protocol is simple and
easy to understand, the bugs more or less found themselves, and it
feels like I'm using TCP in a natural way -- all of these much more so
than rsync at the moment.  (Of course, the rsync problem is much more complex.)



Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-11 Thread Martin Pool

On  8 Jul 2002, Dave Dykstra [EMAIL PROTECTED] wrote:
 The idea of the rsync client executing programs has been descussed before
 and rejected because it could easily be done by an external program if
 rsync simply passes it filenames.  The only case I can see for having rsync
 execute programs is in the daemon; that was once approved in principle but
 nobody ever implemented it.
 What we need, have long wanted, and even once had someone volunteer for
 (but it was never completed), is a major upgrade to the --log-format option
 to allow a lot more flexibility in what gets printed, and to have it work
 consistently with and without --dry-run.  This would work too with lots of
 files because the names get streamed out as they're processed.  See for
 example the thread around

I'm pretty sure I'm with Dave on this.  

I think it would be reasonable when over ssh to have a way to run a
script on the remote machine, and have that script also get a copy of
the log.



strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-08 Thread Martin Pool

Any thoughts on whether this should go in?  I can see arguments either
way.  It seems like we ought to think about whether it would be better
to do it as part of a generalized --chmod or --chmod-backup facility.
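Whatever interface it ends up behind, the core operation is just a mode
mask, equivalent to the patch's `new_mode = buf.st_mode & 01777`; a Python
sketch (the function name is mine):

```python
import os
import stat

def strip_priv_bits(path: str) -> None:
    """Clear setuid/setgid on a regular file, leaving other permission
    bits (including the sticky bit) untouched."""
    st = os.lstat(path)
    if stat.S_ISREG(st.st_mode) and st.st_mode & (stat.S_ISUID | stat.S_ISGID):
        new_mode = stat.S_IMODE(st.st_mode) & ~(stat.S_ISUID | stat.S_ISGID)
        os.chmod(path, new_mode)
```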


On 21 Jun 2002, Dan Stromberg [EMAIL PROTECTED] wrote:
 Included below is a shar archive containing two patches that together:
 1) make backup files get their setuid and setgid bits stripped by default,
 2) add a -s option that allows backup files to continue to have
 these privileges
 This means that if you update a collection of binaries with rsync, and
 one or more of them has a local-root security problem, the backup
 file(s) created when you fix the problem in your source archive won't
 remain exploitable.
 The patch is relative to 2.5.4.
 The backup-dir support is attempted but untested.  We're using the
 default backup behavior (with ~) in production.  I'd be pleased if
 someone who uses backup-dir were to try it out and let me know how it goes.
 I'd also be pleased if this were to find its way into the main
 distribution in some form.
 Thank you.
 # This is a shell archive (shar 3.32)
 # made 06/21/2002 20:17 UTC by [EMAIL PROTECTED]
 # Source directory /dcslibsrc/network/rsync/exportable-patches
 # existing files WILL be overwritten
 # This shar contains:
 # length  mode   name
 # -- -- --
 #   1798 -rw-r--r-- backup-priv-backups
 #   1339 -rw-r--r-- options-priv-backups
 if touch 21 | fgrep 'amc'  /dev/null
  then TOUCH=touch
  else TOUCH=true
 # = backup-priv-backups ==
 echo x - extracting backup-priv-backups (Text)
 sed 's/^X//' << 'SHAR_EOF' > backup-priv-backups 
 X*** backup.c.t   Sun May  6 23:59:37 2001
 X--- backup.c Fri Jun 21 13:15:51 2002
 X*** 29,34 
 X--- 29,56 
 X  extern int preserve_devices;
 X  extern int preserve_links;
 X  extern int preserve_hard_links;
 X+ extern int priv_backups;
 X+ #ifdef HAVE_CHMOD
 X+ static int strip_perm(char *fname)
 X+ {
 X+struct stat buf;
 X+if (link_stat(fname,&buf) != 0) {
 X+rprintf(FERROR,"stat failed\n");
 X+return 0;
 X+if (S_ISREG(buf.st_mode) && (buf.st_mode & (S_ISUID | S_ISGID))) {
 X+mode_t new_mode;
 X+new_mode = buf.st_mode & 01777;
 X+if (do_chmod(fname,new_mode) != 0) {
 X+rprintf(FERROR,"chmod failed\n");
 X+return 0;
 X+return 1;
 X+ }
 X+ #endif
 X  /* simple backup creates a backup with a suffix in the same directory */
 X  static int make_simple_backup(char *fname)
 X*** 46,54 
 X rsyserr(FERROR, errno, "rename %s to backup %s", fname, 
 X return 0;
 X }
 X!} else if (verbose > 1) {
 X!rprintf(FINFO,"backed up %s to %s\n",fname,fnamebak);
 X }
 X return 1;
 X  }
 X--- 68,86 
 X rsyserr(FERROR, errno, "rename %s to backup %s", fname, 
 X return 0;
 X }
 X!} else {
 X!if (verbose > 1) {
 X!rprintf(FINFO,"backed up %s to %s\n",fname,fnamebak);
 X! #ifdef HAVE_CHMOD
 X!if (!priv_backups && strip_perm(fnamebak) == 0) {
 X!return 0;
 X!} else if (verbose > 1) {
 X!rprintf(FINFO,"Stripped setuid and/or setgid from 
 X! #endif
 X }
 X return 1;
 X  }
 X*** 271,276 
 X--- 303,314 
 X fname, keep_name, strerror(errno));
 X };
 X set_perms (keep_name, file, NULL, 0);
 X+/* may mean an extra stat */
 X+ #ifdef HAVE_CHMOD
 X+if (!priv_backups && strip_perm(keep_name) == 0) {
 X+return 0;
 X+ #endif
 X free_file (file);
 X free (file);
 $TOUCH -am 06211315102 backup-priv-backups 
 chmod 0644 backup-priv-backups ||
 echo restore of backup-priv-backups failed
 set `wc -c backup-priv-backups`;Wc_c=$1
 if test $Wc_c != 1798; then
   echo original size 1798, current size $Wc_c
 # = options-priv-backups ==
 echo x - extracting options-priv-backups (Text)
 sed 's/^X//' << 'SHAR_EOF' > options-priv-backups 
 X--- options.c.t  Fri Jun 21 08:56:31 2002
 X+++ options.cFri Jun 21 09:41:41 2002
 X@@ -21,6 +21,9 @@
 X #include "rsync.h"
 X #include "popt.h"
 X+#ifdef HAVE_CHMOD
 X+int priv_backups = 0;
 X int make_backups = 0;
 X int whole_file = -1;
 X int copy_links = 0;
 X@@ -188,6 +191,7 @@
 X   rprintf(F, " -b, --backup                make backups (default %s 
 X   rprintf(F, "     --backup-dir            make backups into this directory\n");
 X   rprintf(F, "     --suffix=SUFFIX         override backup suffix\n");  

avoiding temporary files (Re: about rsync)

2002-06-24 Thread Martin Pool

On 22 Jun 2002, macgiver [EMAIL PROTECTED] wrote:
 i love rsync, but i want to know how it is possible to let rsync 
 download a file with the same filename, and not a temp filename like: 
 package.tar.gz.hzmkjz5 or so...
 i don 't want to use temp filenames when downloading with rsync. Why? 
 Because i'm writing a program with a progress bar and it sucks with the 
 temp filename.

That feature is quite tightly tied in to the design of rsync.  We need
to use a temporary filename because rsync needs access to the old file
to do delta encoding, and in any case many people want the file
atomically replaced when it's complete.

 what do you suggest me? :(

Run with --progress and read stdout, or just look for
.package.tar.gz.* and see how big it is.  The name is unpredictable,
but there will be only one.  You could even set the temporary
directory to make it more predictable.
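A progress watcher along those lines might look like this (a sketch: it
assumes the hidden `.NAME.random` temporary-name pattern mentioned above,
and takes the expected total size from elsewhere, e.g. a prior listing):

```python
import glob
import os

def transfer_progress(dest_dir: str, name: str, total: int) -> float:
    """Best-effort progress for an in-flight rsync transfer of `name`:
    find the hidden temp file and compare its size to the expected total."""
    candidates = glob.glob(os.path.join(dest_dir, "." + name + ".*"))
    if not candidates:
        return 0.0          # transfer not started (or already finished)
    done = os.path.getsize(candidates[0])
    return min(done / total, 1.0)
```

A progress bar would poll this periodically while rsync runs.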

Oh, and please use the list so that other people can make suggestions.



Re: Possible UID/GID bug in chrooted shells?

2002-06-13 Thread Martin Pool

On 12 Jun 2002, Tom Worley [EMAIL PROTECTED] wrote:
 Dear Martin,
 Sorry to mail you directly, but I've had no joy trying to get round this 
 problem (read the faqs, posted on the mailing list RTFM a lot etc)
 This is (slightly updated) what I posted to the mailing list:
 I'm stuck on a problem with rsync...
 We've got a chrooted shell with rsync and all the needed libs inside  (and not 
 much else). 
 We're using rsync over ssh to send the files into this chrooted session. The 
 rsync binary in the chrooted session is SUID root so that it can create the 
 files with the correct UID/GID. When the following is run, it creates all the 
 files as root.staff, not as the test user/group, or the correct UID/GID of 
 the original files, so the SUID root is working. We've also tried extracting 
 files from tar that belong to another user (that is the files inside the tar) 
 and when tar is suid root in the chroot it extracts them with the correct ownership.
 This is the command we used:
 rsync --delete-excluded --delete -essh -avz --numeric-ids  /home/admin/ 
 (from outside the chroot, the test user being inside it)
 The test user's shell is the chrooted session,

What do you mean by that?  Their /etc/passwd shell is some chrooted
session program?  If you wrote it, please post the source; otherwise,
what is its name?

Do you know you cannot just run /usr/sbin/chroot as a regular user?
It's a privileged operation; it must be done before changing uid.

 and the session works fine through ssh, rsync runs without errors,
 but all the files created are owned by root.

 If we try the same but to a non-chrooted user (and suid root to the rsync 
 binary outside the chroot, yeah yeah, it's just a test), it correctly creates 
 the files with the right UID/GID. I've even tried copying the complete 
 /etc/passwd and shadow files into the chroot jail, but that didn't help. We'd 
 rather not have to set up users/passwords for several hundred users for rsync 
 and run it as a daemon (and send the password securely somehow to each 
 person).  Could it be a bug in the way rsync sets the UID/GID of the files?
 Running Debian Linux Sid, up to date as of this morning, and rsync:
 rsync  version 2.5.6cvs  protocol version 26 from debian packages, linux 
 2.4.18 kernel, chroot 2.0.11 on an i686.
 Kind regards, and TIA,
 Tom Worley


