Re: news of the rproxy world

2000-12-14 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Martin Pool writes:

 I hope this can eventually replace the coding functions in rsync,
 although at the moment Rusty is going ahead on rsync 3.0 with a much
 simpler and less flexible library.

This is the first time I hear of rsync 3.0 -- could you (or Rusty)
comment on the planned features and timeline?  Presumably this will
include the incremental directory tree creation and transfer that
Tridge was talking about way back when?  Or maybe file list caching?

Thanks,

-- Alberto



Alberto Accomazzi                          mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System              http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics   http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA





Re: news of the rproxy world

2000-12-15 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Dan Phoenix writes:

 where does the file list caching go?

Err... nowhere, at the moment.  The list that rsync builds in memory
containing the file names to be transferred and their signatures is
right now built from scratch for every request.  For a site which
mirrors the same set of data to a lot of clients, it makes sense to allow
caching of the file list so that it is built just once every so often.

If you search the mailing list archives you'll find a few messages
about the issue.


-- Alberto










Re: are redhat updates rsyncable

2001-01-25 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Harry Putnam writes:

 Sorry to reprint this request for information but I guess I want more
 handholding here.
 
 [...]
 
  
  "Michael H. Warfield" [EMAIL PROTECTED] writes:
  
 rsync ftp.wtfo.com::
  
 
 Getting this far ... works as advertised.
 But beyond that point, how to actually get to the files and collect them?


If you use the latest sources from the CVS tree, an rsync on an 
"rsync URL" lists its contents if the URL ends with a trailing slash:

adsone-465: ./rsync-current/rsync rsync://ftp.wtfo.com/
  WTFO Mirror FTP Site

Please report any problems immediately to [EMAIL PROTECTED].



ftp             Complete wtfo FTP Site
rh70            RedHat 7.0 complete
rh70-iso        RedHat 7.0 ISO Images
[...]

adsone-466: ./rsync-current/rsync rsync://ftp.wtfo.com/rh70/
  WTFO Mirror FTP Site

Please report any problems immediately to [EMAIL PROTECTED].



drwxr-xr-x        4096 2000/11/23 18:58:17 .
drwxr-xr-x        4096 2000/10/12 09:10:11 .nfs_dontpush
[...]


And so on.  Previous versions of rsync did not handle things as
gracefully and would display the "client: nothing to do" message instead.
In general, though, I imagine that you'd want to rsync on
a whole "module" (i.e. top-level element of the rsync URL), as in:

adsone-467: ./rsync-current/rsync -avz rsync://ftp.wtfo.com/rh70 /mirror/

You can get the current version of rsync from:
rsync://rsync.samba.org/ftp/unpacked/rsync/

On a related note, it looks to me like running rsync in list mode as
shown above causes the daemon to create a recursive listing of all files
under the top-level directory, which is completely unnecessary.
Martin, you may want to have a look at that.


Hope this helps,


-- Alberto










Re: exclude list and patterns

2001-03-20 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Dave Dykstra writes:

 It is possible to build your own complete list of files to copy and give
 them all to rsync, by building a --include list and doing '--exclude *'
 at the end.  Currently you need to also either have --include '*/' or
 explicitly list all parent directories above the files you want included
 or else the exclude '*' will exclude the whole directories.  There's been
 talk of adding a --files-from option which would remove this last restriction,
 and I even offered to implement it, but I'm still waiting for anybody to
 give performance measurements (using rsync 2.3.2 which had an include
 optimization that did something similar if there were no wildcards) to show
 what the performance impact would be.

Dave,

I see you've now asked a few times about the performance impact of
this proposed patch, and I can't quite understand what you're
getting at.  My suggestion of --files-from came from the obvious (at
least to me) realization that the current include/exclude mechanism is
confusing to many users, and had nothing to do with performance (at
least in my mind).  I thought (and still think) that it would provide
a cleaner interface for performing fine-grained synchronization of
part of a filesystem, and as such is a desirable feature.

So while I understand the argument of not wanting to clobber rsync
with a lot of unnecessary features, I thought this one makes sense
regardless of performance or compatibility issues.  In fact, I think
it makes sense to have it as a separate option as opposed to kludging
the equivalent functionality in the include/exclude syntax to avoid
the proliferation of confusing options and special cases.

Anyway, just wanted to make this point.  As I have mentioned, I don't
personally *need* this option at the moment, but I think that if
enough people wanted to see it in rsync it should be implemented
regardless of what the change in performance may be.
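As an aside, the include/exclude recipe Dave describes above (an include rule for every wanted file plus each of its parent directories, followed by a trailing --exclude '*') can be scripted.  The helper below is my own sketch, not something from rsync itself, and the file names in the usage note are hypothetical:

```shell
#!/bin/sh
# Build an rsync include list from a plain list of wanted files, adding
# the parent directories that the final exclude '*' would otherwise prune.
emit_includes() {
    while IFS= read -r path; do
        # For a/b/c.txt emit a/, a/b/ and then a/b/c.txt itself
        dir=$(dirname "$path")
        while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
            printf '%s/\n' "$dir"
            dir=$(dirname "$dir")
        done
        printf '%s\n' "$path"
    done | sort -u
}

# Hypothetical usage:
#   emit_includes < wanted-files.txt > include.lst
#   rsync -av --include-from=include.lst --exclude='*' src/ dest/
```

All the include rules land before the exclude '*', so rule order within the list doesn't matter here; what matters is that every ancestor directory is included.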


-- Alberto









Re: exclude list and patterns

2001-03-20 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Dave Dykstra writes:

 Well the easier syntax only motivates me 90% of the way to personally take
 the time to implement the option.  If somebody can show a performance
 improvement, that will be enough to clinch it for me.  My initial motivation
 for implementing the optimization that was taken out in 2.4.0 was
 performance (which I hadn't measured), and when Tridge took it out he asked
 me to show him a performance gain to justify leaving it in; I did some
 measurements then and couldn't persuade myself.  All I'm asking is for
 somebody to put a little effort into showing a modest performance
 difference.

Well, not to be pedantic here, but how do we measure performance of a
feature that isn't available yet?  I guess my point is that Tridge's
objection to the optimization does not apply here, since this is
simply a new option rather than a rewrite of code that works already.
And the new option is there to make the program more user-friendly
rather than increasing performance.

-- Alberto








Re: unlimited backup revisions?

2001-03-21 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], "Sean J. Schluntz" writes:

 
  That's what I figured.  Well, I need it for a project so I guess you all
  won't mind if I code it and submit a patch ;)
  
  How does --revisions=XXX sound?  --revisions=0 would be unlimited; any
  other number would be the limiter for the number of revisions.
 
 And when it reaches that number, do you want it to delete old
 revisions, or stop making new revisions?
 
 You would delete the old one as you continue rolling down.
 
 
 Perhaps something like --backup=numeric would be a better name.  In
 the long term it might be better to handle this with scripting.

I would suggest not reinventing the wheel, and doing this the way GNU cp
does it:

  -V, --version-control=WORD   override the usual version control

The backup suffix is ~, unless set with SIMPLE_BACKUP_SUFFIX.  The
version control may be set with VERSION_CONTROL, values are:

  t, numbered     make numbered backups
  nil, existing   numbered if numbered backups exist, simple otherwise
  never, simple   always make simple backups


Unless there is some overwhelming reason not to follow this scheme.
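For what it's worth, here is what that scheme looks like in practice with GNU cp (a sketch assuming GNU coreutils is installed; the file names are just examples):

```shell
# Demonstrate GNU-style numbered backups: every overwrite of "config"
# leaves behind a config.~N~ copy of the previous version.
cd "$(mktemp -d)"
echo v1 > config
echo v2 > config.new
cp --backup=numbered config.new config   # config.~1~ now holds v1
echo v3 > config.new
cp --backup=numbered config.new config   # config.~2~ now holds v2
ls
```

With a bare `cp --backup`, GNU cp consults the VERSION_CONTROL environment variable for the same t/nil/never choices, which is presumably what an rsync option following this scheme would honor as well.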


-- Alberto







Re: Need for an --operating-window feature ???

2001-04-04 Thread Alberto Accomazzi

In message A103903D0DE0D1119EE800805FD69AC402D08B04@XCGNY005, "Allen, John L." writes:

 I'm new to rsync, so be gentle... I have need to use rsync, but want 
 to have it operate only off-hours when the network is lightly loaded. 
 I did not see any option for making rsync obey an "operating time 
 window" so that it would basically cease copying data if the time-of-day 
 falls outside a specified window.  I thus thought it might be a good idea
 to have a --operating-window option where you could specify an 
 allowed time of operation by indicating two endpoints, perhaps like this
 
   --operating-window 22:00-05:00
 
 where the times are given in HH:MM 24-hour military time.  
 You could obviously extend this to allow for multiple disjoint windows,
 but I don't think there's much point.


I've done something like this using a shell script.  
Essentially the code goes like this:

   #!/bin/sh
   if in_operating_window ; then
       # schedule a HUP for ourselves at the end of the window; after the
       # exec below, $$ is the PID of the running rsync
       echo "kill -HUP $$" | at "$end_operating_window_time"
       exec rsync "$@"
   else
       # outside the window: resubmit this script at the window start
       echo "$0 $*" | at "$start_operating_window_time"
   fi

As you can see, the script uses at(1) to resubmit itself if it's invoked
outside the operating window; otherwise it sets up an at job that will
send the script a SIGHUP (causing the running rsync to exit) at the end
of the operating window.


-- Alberto







Re: Need for an --operating-window feature ???

2001-04-04 Thread Alberto Accomazzi


FYI, rsync has a good cleanup mechanism that kicks in when you send it
a SIGHUP.  It removes/renames temporary files as appropriate, sends a
signal to its child process, and exits.  I use this all the time to
gracefully stop transfers (see the pseudo-code in my previous message).


-- Alberto


In message [EMAIL PROTECTED], Sean Berry writes:

 
 It can do that in one of two ways: finish the current file, or back out
 the current file.  Finishing the current file may leave it running til the
 next time rsync runs (assuming it'll run out of cron).  Backing out the
 current file is probably what you want.  Would it make more sense (and I
 don't know whether rsync currently supports this in the way I think of)
 for rsync to back out the current file and exit gracefully if it received
 a signal 15?  This might be a functionality useful outside of the
 environment you have in mind.
 
 On Tue, 3 Apr 2001, Allen, John L. wrote:
 
  Date: Tue, 3 Apr 2001 08:39:17 -0400
  From: "Allen, John L." [EMAIL PROTECTED]
  To: 'Dirk Markwardt' [EMAIL PROTECTED]
  Cc: "'[EMAIL PROTECTED]'" [EMAIL PROTECTED]
  Subject: RE: Need for an --operating-window feature ???
  
  A cron job is fine for starting it, but I want it to stop
  on its own if it finds itself running outside its allowed window.
  (Obviously this is only really needed when there is a huge
  amount of data to sync, or when the network is really slow.)
  
  John. 
  
  -----Original Message-----
  From: Dirk Markwardt [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, April 03, 2001 03:38
  To: Allen, John L.
  Cc: '[EMAIL PROTECTED]'
  Subject: Re: Need for an --operating-window feature ???
  
  
  Hello John,
  
  AJL to have a --operating-window option where you could specify an
  AJL allowed time of operation by indicating two endpoints, perhaps like
  this
  
  AJL --operating-window 22:00-05:00
  
  AJL where the times are given in HH:MM 24-hour military time.  
  
  What about a cron-job ?
  
  at 22:00:  chmod 755 /usr/bin/rsync
  at 05:00:  chmod 644 /usr/bin/rsync
  
  Greetings
  Dirk
  -- 
  ---
  Dirk Markwardt
  Besselstr. 7
  38114 Braunschweig
  [EMAIL PROTECTED] 
  
  
  
 
 --
 Sean Berry works with many flavors of UNIX, but especially Solaris/SPARC and
 NetBSD.  His hobbies include graphics and raytracing.  He drinks coke mostly.
 His opinions are not necessarily those of his employers.  
 
 








Re: temp files during copy

2001-05-17 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Jim Ogilvie writes:

 Hi,
 
 I know rsync creates temp files in the destination directory  and
 then at some point renames them to the original file name.  Therefore
 the destination directories need to be larger than the source directories.
 
 I'm trying to find a way to calculate how much larger the destination
 directories need to be.  How does rsync decide when to rename them?  Is it
 by directory?

rsync will transfer files one at a time, so you need at least as much
free disk space as the largest file being synchronized.  If your files
are large enough that this becomes a problem, I suggest you use the -T
(--temp-dir) option, which makes rsync write its temporary files to a
separate directory, ideally on a different filesystem.
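As a rough sizing aid (my own sketch, not part of rsync): since the scratch space needed is bounded by the largest single file, you can find that number with find(1).  This assumes GNU find's -printf; the directory path in the usage comment is hypothetical:

```shell
# Print the size, in bytes, of the largest regular file under a directory.
largest_file_bytes() {
    find "$1" -type f -printf '%s\n' | sort -n | tail -1
}

# Hypothetical usage: compare the result against `df` on the destination
# (or on the --temp-dir filesystem):
#   largest_file_bytes /data/mirror/src
```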


-- Alberto








Re: unexpected EOF in read_timeout

2001-05-30 Thread Alberto Accomazzi


The unexpected EOF in read_timeout can happen for a variety of reasons,
but it typically shows up when you have a deadlock (no bytes sent across
the wire, which can happen as a result of ssh blocking, for instance)
or a very large filesystem being synchronized (which can happen if the
receiving side is spending a long time creating the file list and a
timeout has been specified).

Under both circumstances, one of the two rsync processes on the
receiving side notices that no IO has happened in the last N seconds,
so it sends a signal to the other process and quits.  (Now that I think
of it, it may be that the sender quits first and the receiver then gives
up -- you'll have to go through the code if you want to find out for sure).

If the problem is due to the timeout, the simple solution is to increase
it (--timeout option).  If it's due to the client or server running out
of memory while generating the file list, the solution is to break up
the transfer into smaller chunks.  If it's due to deadlock, you'll have
to find another transport for the connection.
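One way to break a transfer into smaller chunks is to sync one top-level subdirectory at a time, so that each run builds a much smaller file list.  This is my own sketch, not an rsync feature: the destination syntax is just an example, and the RSYNC variable is a hypothetical hook that makes dry runs easy:

```shell
#!/bin/sh
# Sync each top-level subdirectory of $1 to $2 as a separate rsync run,
# keeping every individual file list (and memory footprint) small.
sync_in_chunks() {
    src=$1; dest=$2
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        ${RSYNC:-rsync} -a --timeout=600 "$dir" "$dest/$name/" \
            || echo "failed: $name" >&2
    done
}

# Hypothetical usage:
#   sync_in_chunks /data/big-tree remote::module
```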

-- Alberto



In message [EMAIL PROTECTED], Phil Howard writes:

 Randy Kramer wrote:
 
  I'm a novice at rsync, so I can't tell you all the details, but, in
  general, the unexpected EOF in read_timeout (usually) means that
  something not so good happened on the server side.  On one server I was
  connecting to, I believe that the preprocessing by the server made some
  watchdog on the server side decide that the process was dead -- it then
  killed it, and then I got the error message.  I never proved this
  completely.
 
 I posted a while back with this problem and someone answered that the
 message existed in ssh and not in rsync.  I never verified it.
 
 But I do know that it started happening when I upgraded to 2.4.6.  But
 I also upgraded ssh around that time, so this was believable.
 
 
  At the time, because I was new to using rsync, I was using the -c option
  to force a full checksum comparison of the two files (because I thought
  that the files were not updating because the dates and times matched). 
  I stopped using the -c option and just made sure the dates and times did
  not match and that cured my problem with the unexpected EOF ... -- I
  believe because the server (and client) spent less time calculating
  checksums before starting to exchange data.
 
 Unfortunately, I'm not using -c and I do get these problems.  The thing
 is, they occur randomly.  I run some mirroring scripts and have coded the
 scripts to just repeat until a good status comes back, like:
 
 while ! rsync ; do echo "oops, let's try that again"; done
 
 -- 
 -
 | Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
 | [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
 -
 









Re: use rsync as a diff tool only

2001-06-01 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Lucy Hou writes:

 Hi all,
 
 I am wondering if I can use rsync as a diff tool only, not really
 copying files over to the destination.  I tried the -n option; it doesn't
 seem to have done any content comparison on the file(s), it merely lists
 the file names.

Lucy,

check out the rdiff utility, distributed with librsync, which is available
at http://rproxy.sourceforge.net/download.html.  It does exactly what you
describe for local files (i.e. no network transport is built in).
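The whole cycle takes three commands.  A quick self-contained demo (my own sketch; it assumes the rdiff binary from librsync is on your PATH, and the file names are just examples):

```shell
# signature/delta/patch round trip: rebuild new.txt from old.txt + a delta.
cd "$(mktemp -d)"
printf 'the quick brown fox\n' > old.txt
printf 'the quick brown fox jumps\n' > new.txt

rdiff signature old.txt old.sig            # checksum summary of the old file
rdiff delta old.sig new.txt changes.delta  # diff new file against the summary
rdiff patch old.txt changes.delta rebuilt.txt

cmp new.txt rebuilt.txt && echo "round trip OK"
```

The delta file is all that would need to travel over a network, which is the same trick rsync itself plays.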

-- Alberto








Re: rsync+ patch

2001-07-13 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Martin Pool writes:

 I'm inclined to apply this: at the very least, it doesn't look like it
 could damage anything else.  Any other opinions?

Yes please!  I would personally love to see that functionality supported
in the stock rsync distributions.  Since the patch implements a feature 
that is periodically requested by users, it seems to me there is a good 
reason for inclusion.


-- Alberto








Re: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-27 Thread Alberto Accomazzi
 [...]
 I still want to write a --files-from option sometime, and I'm still waiting
 for somebody who has an application that could use it to do some
 performance measurements with rsync 2.3.2.  I agree that --files-from has
 value on its own without performance implications, but somebody has to want
 it badly enough to put it in a little effort if they'd like me to implement
 it.


-- Alberto








Re: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Alberto Accomazzi


It seems to me the new options --read-batch and --write-batch should go 
a long way towards reducing any time spent in creation of checksums and
file lists, so you should definitely give 2.4.7pre4 a try.  This is just
a guess since I haven't actually used those options myself, but seems
worth looking into.

BTW, could we please have some real documentation for these options?  What's
in the man page doesn't come close to explaining what is cached and how to
make use of it.  Some examples of how people are using these options would
be illuminating for those of us who don't have the time or inclination
to figure it out from the code.


-- Alberto


In message [EMAIL PROTECTED], Keating, Tim writes:

 I was at first, but then removed it. The results were still insufficiently
 fast.
 
  Were you using the -c option of rsync?  It sounds like you were and it's
  extremely slow.  I knew somebody who once went to extraordinary lengths to
  avoid the overhead of -c, making a big patch to rsync to cache checksums,
  when all he had to do was not use -c.
 









Re: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Dave Dykstra writes:

 On Thu, Nov 29, 2001 at 11:02:07AM -0500, Alberto Accomazzi wrote:
 ...
  These numbers show that reading the filenames this way rather than using
  the code in place to deal with the include/exclude list cuts the startup
  time down to 0 (from 1hr).  The actual sending of the filenames is down
  from 2h 15m to 1h 40m.  The reason this isn't better is due to the fact
  that turning buffering on only helps the client, while the server still
  has to do unbuffered reads because of the way the list is sent across. 
 
 Are you sure about that?  I don't see any unbuffered reads.

Actually, I'm not sure the code intends to do unbuffered reads, but
that's certainly what is happening, judging from the trussing I've done
on the server side.  I'm also not sure how the buffering should take
place, since the include/exclude file names are sent over the wire one
at a time rather than as a chunk of data, but maybe buffering is done
at a higher level.

 2.3.2 did have the read_check() hack which was there to prevent SSH pipes
 from getting stuck, maybe that's what you're seeing.  That was taken out
 in 2.4.0 so maybe that would greatly speed it up.

Possible.  Another reason why I don't think it's worth spending any more
time patching 2.3.2 anyway...

  As far as I can tell there is no way to get around the buffering without
  a protocol change or a different approach to sending this list.
  
  Given the data above, I think implementing --files-from this way would
  be the wrong way to go, for a number of reasons:
 
 I've been starting to think along those lines too.  It should be a protocol
 change to just send the files and not treat it like excludes.  In fact,
 the file list is normally sent from the sender to the receiver, but if
 the client is the receiver maybe we could figure out a way to have
 --files-from only send the list in the other direction.

Right.  The point is that when Tridge wrote the code he was obviously
envisioning a client sending a short exclude list to the server and the
server then sending a massive file list back to the client.  Therefore no
optimization or compression was ever added to ensure fast transfer of the
exclude list, and patching things this way goes against the original
design of the protocol.  So probably the best thing to do is stick the
file list right after the exclude list, turn on compression if -z has
been selected, and bump the protocol version so that we stay backwards
compatible.  At least that's my take.

-- Alberto








Re: [path] module options with SSH

2002-02-06 Thread Alberto Accomazzi


The discussion about syntax for remote file specification and the exchange
between Martin and Wayne about configure options for rsh make me wonder if
we should push some alternative syntax for specifying the transport protocol
to be used by rsync.  

I, for one, always stick to the rsync://host/module syntax when pulling from
an rsync server, and have often wished that the same syntax were available 
when doing a push.  I find the URL-style syntax easy to remember and understand,
while the :: syntax seems much less intuitive (it actually looks Perlish
to me, because of the way modules are specified in Perl).
Among other things, I notice that Sun uses the URL syntax in its man pages
describing NFS (they use nfs://host[:port]/pathname).

So what came to mind is to have rsync recognize and use both for push and
pull remote specifications of the form:

rsync://host/module/file
ssh://[username@]host/dir/file
rsh://[username@]host/dir/file

I'm not crazy about the last two, but thought of them while reading messages
about ssh/rsh issues.  Hmm... one problem that this wouldn't solve is the
use of ssh-over-rsyncd that somebody has proposed, though.  Also I'm not 
sure how I would handle the passing of additional options to the external
transport program (what we do now with -e 'shell [OPTIONS]').  Ok, so 
maybe this is not so hot, but rsync:// is cool, IMHO.

-- Alberto


In message [EMAIL PROTECTED], Dave Dykstra writes:

  Am I understanding you correctly when you say ssh and --daemon are not
  working together when you use the :: syntax, or are you saying that they
  just don't, period, regardless of : or ::?
 
 : syntax uses rsh (or ssh if you use -e ssh) to run another copy of the
 rsync program on the remote side.  :: syntax skips that completely,
 ignores -e, and instead connects to a daemon separately started to listen
 on port 873 on the remote host.  In the future, when JD Paul's patch is
 accepted, the expectation will be that if you use :: and -e ssh
 together it will still use ssh to connect, but it will run rsync --daemon
 interactively so it can honor your rsyncd.conf.
 
 Does that make it clear?
 
 
  Because, I do not have RSH, only SSH on my server, and it does work for
  me.  I do have to use SSH version 2, as I wasn't able to do it with
  version 1, and I use DSA not RSA.
 
 That doesn't matter; :: syntax bypasses both RSH and SSH.









Re: List of rsync output / error messages

2002-02-06 Thread Alberto Accomazzi


Joseph,

check out the header file errcode.h in the rsync distribution.
That file and the structure found in log.c map the system exit codes to 
the error messages you refer to, so the best way to programmatically
catch errors is simply to check the exit status returned by rsync.
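For example, a wrapper script can branch on the status code directly.  The sketch below maps a handful of the documented codes to messages (a partial list taken from rsync's errcode.h / man page; the wrapper itself is my own example):

```shell
# Translate an rsync exit status into a human-readable message.
describe_rsync_exit() {
    case $1 in
        0)  echo "success" ;;
        1)  echo "syntax or usage error" ;;
        12) echo "error in rsync protocol data stream" ;;
        23) echo "partial transfer due to error" ;;
        30) echo "timeout in data send/receive" ;;
        *)  echo "other error (code $1)" ;;
    esac
}

# Hypothetical usage:
#   rsync -a src/ dest/
#   describe_rsync_exit $?
```

This is more robust than parsing the human-readable output, which can change between versions.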

-- Alberto


In message [EMAIL PROTECTED], Joseph Annino writes:

 Is there a good place to get information about the list of all possible
 output and error messages rsync generates?  Or should I just muck around the
 source code (which I haven't looked at yet) and find them?
 
 I am doing something where I would like to parse rsync's output using Perl
 into a set of data structures.  I already have something that works under
 normal conditions.  Eventually I'd like to use that data as part of building
 a Perl/Tk interface.  Of course to parse the output successfully, I need to
 know all the possibilities as surprises can throw things out of whack.
 
 And this is just an idea, but ways to make rsync's output more easily
 parseable, and more verbose in terms of reporting information that would be
 useful for, say, making a progress bar, would be nice to discuss.
 
 Thanks.
 
 
 -- 
 Joseph Annino Consulting - Perl, PHP, MySQL, Oracle, etc.
 [EMAIL PROTECTED] - http://www.jannino.com
 









Re: Incremental Diffs?

2002-03-07 Thread Alberto Accomazzi

In message [EMAIL PROTECTED], Kim Scarborough writes:

   I'm using it to backup files from one computer to another, and it
   works exactly as I thought it would, except that it seems to be
   copying entire files over when they've changed rather than just the
   differences.
 
  What specifically leads you to that conclusion?
 
 I have it set to extra verbose, and I've been watching the files transfer
 over. When I append 2K to a 100MB text file and re-rsync, it's pretty
 obvious it's transferring 100MB, not 2K + whatever overhead the diff takes
 up.


I wouldn't be so sure.  Add the option --stats to the rsync command line
and see what it says.  AFAIK those numbers are correct.
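A quick way to check this for yourself (my own sketch, assuming rsync is installed locally; note that modern rsync defaults to --whole-file for local paths, so --no-whole-file is needed to exercise the delta algorithm):

```shell
# Append a little data to a file and check how much rsync actually sends:
# "Literal data" is bytes transmitted, "Matched data" is bytes reused
# from the existing destination copy.
cd "$(mktemp -d)"
mkdir src dst
seq 1 20000 > src/big.txt                 # a ~100 KB test file
rsync -a src/ dst/                        # initial full copy
echo "one appended line" >> src/big.txt
rsync -a --no-whole-file --stats src/ dst/ | grep -E 'Literal data|Matched data'
```

If the delta algorithm is doing its job, "Literal data" stays small while "Matched data" is close to the whole file size.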


-- Alberto









timeout error in rsync-2.5.5

2002-04-16 Thread Alberto Accomazzi


Dear all,

I've been trying to track down a problem with timeouts when pulling data from
an rsync daemon and I have now run out of any useful ideas.
The problem manifests itself when I try to transfer a large directory tree
on a slow client machine.  What happens then is that the client rsync process
successfully receives the list of files from the server, then begins checking
the local directory tree, taking its sweet time.  Since I know that the process
is quite slow, I invoke rsync with a timeout of 5 hours to avoid dropping the
connection.  However, after a little over 1 hour (usually 66 minutes or so),
the server process simply gives up.

I have verified the problem under rsync versions 2.3.2, and 2.4.6 and up 
(including 2.5.5), testing a few different combinations of client/server
versions (although the client is always a Linux box and the server always
a Solaris box).  It looks to me as if something kicks the server out of
the select() call at line 202 of io.c (read_timeout) despite the timeout
being correctly set to 18000 seconds.  Can anybody think of what the 
problem may be?  See all the details below.

Thanks,

-- Alberto



CLIENT:

[ads@ads-pc ~]$ rsync --version
rsync  version 2.5.5  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, 
  IPv6, 64-bit system inums, 64-bit internal inums

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

[ads@ads-pc ~]$ rsync -ptv --compress --suffix .old --timeout 18000 -r --delete 
rsync://adsfore.harvard.edu:1873/text-4097/. /mnt/fwhd0/abstracts/phy/text/
receiving file list ... done
rsync: read error: Connection reset by peer
rsync error: error in rsync protocol data stream (code 12) at io.c(162)
rsync: connection unexpectedly closed (17798963 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(150)


SERVER:

adsfore-15: /proj/ads/soft/utils/src/rsync-2.5.5/rsync --version
rsync  version 2.5.5  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, 
  no IPv6, 64-bit system inums, 64-bit internal inums

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

from the log file:

2002/04/16 08:52:48 [18996] rsyncd version 2.5.5 starting, listening on port 1873
2002/04/16 09:39:01 [988] rsync on text-4097/. from ads-pc (131.142.43.117)
2002/04/16 10:51:36 [988] rsync: read error: Connection timed out
2002/04/16 10:51:36 [988] rsync error: error in rsync protocol data stream (code 12) 
at io.c(162)

from a truss:

adsfore-14: truss -d -p 988
Base time stamp:  1018964639.2848  [ Tue Apr 16 09:43:59 EDT 2002 ]
poll(0xFFBE4E90, 1, 1800)   (sleeping...)
4057.4093   poll(0xFFBE4E90, 1, 1800)   = 1
4057.4098   read(3, 0xFFBE5500, 4)  Err#145 ETIMEDOUT
4057.4103   time()  = 1018968696
4057.4106   getpid()= 988 [18996]
4057.4229   write(4,  2 0 0 2 / 0 4 / 1 6   1.., 66)  = 66
4057.4345   sigaction(SIGUSR1, 0xFFBE4D20, 0xFFBE4DA0)  = 0
4057.4347   sigaction(SIGUSR2, 0xFFBE4D20, 0xFFBE4DA0)  = 0
4057.4349   time()  = 1018968696
4057.4350   getpid()= 988 [18996]
4057.4352   write(4,  2 0 0 2 / 0 4 / 1 6   1.., 98)  = 98
4057.4357   llseek(0, 0, SEEK_CUR)  = 0
4057.4359   _exit(12)



Alberto Accomazzi  mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System  http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics    http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA   





Re: timeout error in rsync-2.5.5

2002-05-06 Thread Alberto Accomazzi


Dave,

I understand how the timeout works.  The problem here is that the traversing
of the directory tree on the client side does indeed take more than 1 hour,
during which no bytes are exchanged on the wire between client and server.
So I do know for a fact that the read call on the server side needs a very
long timeout because the client has to traverse a directory tree of 1.5
million files stored on a slow external drive.  The question is why does
the server process give up after 66 minutes or so even though the timeout
has been set to 5 hours (see the system call).  The client machine is
now behind a firewall which I guess complicates things a bit, but from what
I can tell there is nothing on the LAN that forces the connection to be 
dropped; I have checked the firewall settings to no avail.  

Any other ideas?

-- Alberto


P.S. Sorry, but neither one of the machines can be accessed from the
 outside.


In message [EMAIL PROTECTED], Dave Dykstra writes:

 You shouldn't need to have such a long timeout.  The timeout is not over
 the whole length of the run, only the time since the last data was
 transferred.  It's a mystery to me why it quits after 66 minutes rather
 than 5 hours, but the real question is why it stops transferring data for
 so long.  Perhaps something went wrong with the network.  I can't connect
 to that server to try it, perhaps it is behind a firewall.
 
 - Dave Dykstra
 
 On Tue, Apr 16, 2002 at 12:36:03PM -0400, Alberto Accomazzi wrote:
  
  [...]

rsyncd listing of directories

2002-05-22 Thread Alberto Accomazzi


I just took a look at the 2.5.5 codebase to see how easy it would be to
write a little driver script that downloads a big directory tree from an
rsync daemon the chunky way (get a list of a module's subdirectories and
do the transfer by subdirectory).  The reason for doing this is obvious
when you have large directory trees, as is the case for many of us.
Unfortunately, the way list_only is currently implemented makes the whole
idea useless, since it forces the daemon to recurse the target directory
tree anyway.  Here's the code:

in options.c:

/* this is a complete hack - blame Rusty 

   this is a hack to make the list_only (remote file list)
   more useful */
if (list_only && !recurse)
        argstr[x++] = 'r';


in exclude.c:

/* This is a complete hack - blame Rusty.
 *
 * FIXME: This pattern shows up in the output of
 * report_exclude_result(), which is not ideal. */
if (list_only && !recurse) {
        add_exclude("/*/*", 0);
}


So I'm going to bite and blame Rusty at this point and ask the question: 
why was this implemented this way?  I can't think of a good reason why.  
I'm happy to try and work on a patch if there is a consensus that this 
is desirable.
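For what it's worth, the kind of driver script I have in mind would look roughly like this.  This is only a sketch: the server name, module and destination are placeholders, the rsync command is echoed rather than executed, and it assumes a daemon listing that is *not* forced to recurse (i.e. the behaviour I'm arguing for above):

```shell
#!/bin/sh
# Sketch of a "chunky" mirror driver: list a module's top-level
# directories, then rsync each one separately instead of recursing the
# whole tree in one shot.  SERVER/MODULE/DEST are hypothetical
# placeholders; RSYNC defaults to a dry-run echo.
SERVER=${SERVER:-adsfore.harvard.edu}
MODULE=${MODULE:-text-4097}
DEST=${DEST:-/local/mirror}
RSYNC=${RSYNC:-echo rsync}     # drop the 'echo' to transfer for real

# A daemon listing looks like:
#   drwxr-xr-x        4096 2004/01/26 15:17:47 subdir
# Keep only directory entries, skipping the "." entry.
list_subdirs() {
    awk '/^d/ && $NF != "." { print $NF }'
}

$RSYNC "rsync://$SERVER/$MODULE/" | list_subdirs |
while read -r d; do
    $RSYNC -a --delete "rsync://$SERVER/$MODULE/$d/" "$DEST/$d/"
done
```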

-- Alberto




Alberto Accomazzi  mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System  http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics    http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA   





Re: Rsync'ing lists of files

2002-06-11 Thread Alberto Accomazzi


Just so that we don't forget the lessons from the past, let me point
out that we had discussion and testing done on this subject back in
November, with mixed results (i.e. YMMV):
   http://lists.samba.org/pipermail/rsync/2001-November/005398.html
I think the consensus from that experiment was that implementing the
option using the include/exclude mechanism was not the way to go
(correct me if I'm wrong Dave).

Andrew Schorr's patch does this differently but from what I can tell
it would only work when uploading files to a server (which is the 
opposite of what my experiments with --files-from were):
   http://lists.samba.org/pipermail/rsync/2001-November/005272.html

Since it seems that different people want this option for different
purposes, we need to make sure that it gets implemented in a sensible
way, with some testing being done to ensure that we still have decent
performance and that it works in all cases (sending/receiving files).

So I think our strategy should be to bug Dave Dykstra until he gives
up and writes the patch :-)

-- Alberto


Stephane Paltani wrote:

 Dave Dykstra wrote:
  
  Sigh, another request for the --files-from I promised to write over 6
  months ago, but I've been so overloaded at work lately that I don't know if
  I'm ever going to get to it.  Perhaps someone else will have to do it.
 
 He he, happy to see a general consensus for this feature!
 
  It turns out that back in rsync 2.3.2 and earlier there was an optimization
  (which I wrote and actually was the primary reason that I volunteered to be
  maintainer of rsync for a while) that kicked in when there was list of
  includes with no wildcards followed by an --exclude '*', and there was no
  --delete.  Instead of recursing through the files and doing comparisons, it
  would just directly open the files in the include list.  It only had to be
 on the sending side, so you might want to try 2.3.2 on your sending side to
  see if you get a significant performance boost.  Andrew Tridgell took it
  out in 2.4.0 because he didn't like how it changed the usual semantics of
  requiring all parent directories to be explicitly listed in the include
  list.
 
 Whoops! That did the trick for me! It took me 6 minutes to transfer 250 GB!
 Too bad it has been turned down. I have the impression it would satisfy
 most, if not all, --files-from lobbyists.



Alberto Accomazzi  mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System  http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics    http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA   





rsync via floppy

2002-08-12 Thread Alberto Accomazzi


Our project is considering supporting a mirror site which is going to be
off the network (essentially a stand-alone mirror for a local LAN in a
place without internet connectivity).  So I am in the (unfortunate) position
of having to decide how to do this (or if this can be done at all).

The current plan is to set up a PC running linux with a 120GB drive and a
DVD reader on the remote site and ship periodic updates to our dataset
that can be used to patch the local distribution, then run some updating
procedures to make the new database live.  I can think of a well-defined
plan to carry out the updates, but I'm wary about the lack of feedback about
the actual updating procedures (what if a filesystem fills up or a command
fails for whatever reason?).  I also don't have a lot of time to build a
customized system for doing this rsync on a floppy myself, so I'm hoping that
somebody on the list has some suggestions or tool that can be useful.

BTW, I think that given the nature of our dataset file patching a la rsync
is not strictly necessary, since we can probably fit a fresh copy of all
files that have changed on a DVD.  The problem I'm mostly worried about is
keeping enough metadata on both ends to reliably figure out the updating
strategy.
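On the metadata question, one low-tech safeguard would be to ship a checksum manifest with every update, so the remote side can refuse to apply anything against a tree that has drifted from the expected state.  A rough sketch (the directory layout here is made up for illustration; on the real mirror the manifest would travel on the DVD alongside the update):

```shell
#!/bin/sh
# Sketch: verify a mirror against a shipped checksum manifest before
# applying an update.  Uses a throwaway directory for illustration;
# on the real mirror the manifest would arrive on the update DVD.
set -e
tree=$(mktemp -d)
manifest=$(mktemp)
echo "hello" > "$tree/foo"

# Connected site: build the manifest that ships with the update media.
( cd "$tree" && find . -type f | sort | xargs md5sum ) > "$manifest"

# Disconnected site: refuse to update unless the tree still matches.
if ( cd "$tree" && md5sum -c --status "$manifest" ); then
    echo "mirror state verified; safe to apply update"
else
    echo "mirror has drifted; refusing to update" >&2
fi
```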

Thanks,

-- Alberto




Alberto Accomazzi    http://cfa-www.harvard.edu/~alberto
NASA Astrophysics Data System  http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics   [EMAIL PROTECTED]
60 Garden Street, MS 83, Cambridge, MA 02138 USA   




Re: rsync 2.6.0 - suspected memory leak bug

2004-01-22 Thread Alberto Accomazzi
 On Wed, Jan 21, 2004 at 03:35:37PM +, Kelly Garrett wrote:

 Does anyone know how to build a version of the kernel that either 
does no disk
 cacheing (we have very fast RAID processors and SCSI disks on the 
machine) or
 limit the amount of cache that the system will allocate for disk?

Kelly,

we have a similar setup here (RH8 with latest bigmem kernel on a machine 
with 4GB ram and 1.4TB fs) and see a similar behaviour.  However, long 
ago I accepted this as a feature of the linux kernel and I've yet 
to find that this causes any performance issues.  It's true that if your 
file access is truly random caching the filesystem in RAM doesn't help, 
but I can't imagine that this is a significant performance hit.  The 
system simply releases the cache as needed so that even if at any time 
you're seeing 100% memory usage, when a new process needs memory the RAM 
cache will give way.

I'm sure that there are ways to override this behaviour (see for 
instance the linux kernel hacking howto for hints) but I doubt that this 
is worth the effort unless you need to squeeze every last bit of 
performance out of your box.  And if you do, I would suggest looking at 
installing a 2.6.x kernel instead.

-- Alberto

P.S. I found that in our case one thing that actually helped quite a bit 
with overall performance was tuning some kernel parameters for the 
particular raid controller we have (3ware IDE raid).  I mention this 
because if you start going down the performance tuning path there are a 
number of things that you should look at.



**
Alberto Accomazzi, NASA Astrophysics Data System    http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA   [EMAIL PROTECTED]
**


--link-dest not working with rsync daemon?

2004-01-26 Thread Alberto Accomazzi
I am puzzled as to why I can't get the option --link-dest to work 
properly.  When I use this option when both source and destinations are 
on a local filesystem the hard-linking of the target against the 
link-dest directory does work, but when the source is a remote directory 
(via ssh or rsync server) hard links are not created.  I suspect it has 
something to do with setting the correct timestamp on the files, since 
the server and client machines have clocks with a large offset, but why 
would that be the case?

The version of rsync I'm using is:
rsync  version 2.6.0  protocol version 27
Copyright (C) 1996-2004 by Andrew Tridgell and others
http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
  IPv6, 64-bit system inums, 64-bit internal inums
The distribution is fedora core 1 with kernel 2.4.22-1.2140.nptl.  Below 
you can see the results of my test.  Any help is appreciated.

Thanks,

-- Alberto

[EMAIL PROTECTED] ~/rsynctest]$ rsync 
rsync://adswon.cfa.harvard.edu/ast/load/test/
drwxr-xr-x4096 2004/01/26 15:17:47 .
-rw-r--r--   4 2004/01/26 15:17:47 foo

[EMAIL PROTECTED] ~/rsynctest]$ date
Mon Jan 26 14:59:16 EST 2004
[EMAIL PROTECTED] ~/rsynctest]$ cat ./runtest.sh
#!/bin/sh
/bin/rm -rf orig new

# echo getting original copy
$HOME/mirror/bin/i486-linux/rsync-2.6.0 --timeout 1800 --delete -az \
rsync://adswon.cfa.harvard.edu/ast/load/test/ \
`pwd`/orig/
# echo getting second copy
$HOME/mirror/bin/i486-linux/rsync-2.6.0 --timeout 1800 --delete -az $@ \
--link-dest=`pwd`/orig/ \
rsync://adswon.cfa.harvard.edu/ast/load/test/ \
`pwd`/new/
/bin/ls -l `pwd`/orig `pwd`/new

[EMAIL PROTECTED] ~/rsynctest]$ ./runtest.sh -vvv
opening tcp connection to adswon.cfa.harvard.edu port 873
receiving file list ...
recv_file_name(.)
recv_file_name(foo)
received 2 names
done
recv_file_list done
get_local_name count=2 /home/ads/rsynctest/new/
created directory /home/ads/rsynctest/new
make_file(.,*,2)
expand file_list to 4000 bytes, did move
send_file_list done
deleting in .
recv_files(2) starting
generator starting pid=1001 count=2
delta transmission enabled
recv_generator(.,0)
set modtime of . to (1075148267) Mon Jan 26 15:17:47 2004
./
recv_generator(foo,1)
generating and sending sums for 1
count=1 rem=4 blength=700 s2length=2 flength=4
generate_files phase=1
recv_files(foo)
recv mapped foo of size 4
foo
got file_sum
renaming .foo.cJC1Ra to foo
set modtime of foo to (1075148267) Mon Jan 26 15:17:47 2004
recv_files phase=1
generate_files phase=2
recv_generator(.,0)
set modtime of . to (1075148267) Mon Jan 26 15:17:47 2004
recv_files finished
wrote 127 bytes  read 139 bytes  532.00 bytes/sec
total size is 4  speedup is 0.02
_exit_cleanup(code=0, file=main.c, line=1064): about to call exit(0)
/home/ads/rsynctest/new:
total 4
-rw-r--r--1 ads  ads 4 Jan 26  2004 foo
/home/ads/rsynctest/orig:
total 4
-rw-r--r--1 ads  ads 4 Jan 26  2004 foo
**
Alberto Accomazzi, NASA Astrophysics Data System    http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA   [EMAIL PROTECTED]
**


Re: --link-dest not working with rsync daemon?

2004-01-26 Thread Alberto Accomazzi
Wayne Davison wrote:
On Mon, Jan 26, 2004 at 04:14:14PM -0500, Alberto Accomazzi wrote:

I am puzzled as to why I can't get the option --link-dest to work 
properly.  When I use this option when both source and destinations are 
on a local filesystem the hard-linking of the target against the 
link-dest directory does work, but when the source is a remote directory 
(via ssh or rsync server) hard links are not created.


This is something that is fixed in the CVS version.  You can work around
the problem in 2.6.0 by not specifying (or implying) the -o (--owner)
option (when running as non-root).  So, change your -a option into the
options -rlptgD, and it should work fine.
Wayne,

indeed you're correct on this one.  Guess I should have tried CVS before 
starting to whine ;-)

Let me take this opportunity to thank you personally for taking on the 
task of pushing out the latest rsync release and for your and jw's 
continuing work on this.  I know a lot of people have contributed 
patches and ideas to rsync but it does take a few good men to pull it 
all together.  I think you guys are doing a top-notch job.  So I think I 
speak for all the readers on the list when I say we are all *very* 
grateful for all you've done and are doing.

And while I'm still on the line: do you have any ideas about upcoming 
releases?  There was some discussion prior to releasing 2.6.0 about what 
should be tackled next, but I'm not sure there was consensus beyond what 
became 2.6.0.

Thanks,

-- Alberto



Re: --link-dest not working with rsync daemon?

2004-01-27 Thread Alberto Accomazzi
Wayne Davison wrote:

I was just mentioning to J.W. that I thought we had so much good stuff
already in CVS that we should try to button things up in the next month
or so, really hammer on it a while, and then crank out a new release.
There have been a lot of optimizations that make it use less CPU and
less memory, which I like a lot.  I do think the changes have been
pretty significant, so we obviously need to do some testing to make
sure that we haven't busted something important.  However, it's looking
really stable to me so far -- I run it for all my rsyncing.
Wayne,

ok, I'm on the hammering CVS bandwagon, so I just downloaded and 
compiled the latest nightly snapshot on a redhat 7.3 box and I have a 
small problem to report.  I configured rsync using ./configure 
--with-included-popt and got a bunch of warnings (see below).

Is this worth worrying about?  I'm not sure how many people use the 
built-in popt these days...

-- Alberto

[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ uname -a
Linux adsfife 2.4.20-24.7smp #1 SMP Mon Dec 1 13:03:45 EST 2003 i686 unknown
[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ gcc --version
2.96
[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ make
[...]
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt  -c popt/popt.c -o 
popt/popt.o
popt/popt.c: In function `poptAddAlias':
popt/popt.c:1058: warning: unused parameter `flags'
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt  -c 
popt/poptconfig.c -o popt/poptconfig.o
popt/poptconfig.c: In function `poptReadDefaultConfig':
popt/poptconfig.c:162: warning: unused parameter `useEnv'
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt  -c popt/popthelp.c 
-o popt/popthelp.o
popt/popthelp.c: In function `displayArgs':
popt/popthelp.c:20: warning: unused parameter `foo'
popt/popthelp.c:22: warning: unused parameter `arg'
popt/popthelp.c:22: warning: unused parameter `data'
popt/popthelp.c: In function `getArgDescrip':
popt/popthelp.c:87: warning: unused parameter `translation_domain'
popt/popthelp.c: In function `singleOptionDefaultValue':
popt/popthelp.c:118: warning: unused parameter `translation_domain'
popt/popthelp.c: In function `poptPrintHelp':
popt/popthelp.c:478: warning: unused parameter `flags'
popt/popthelp.c: In function `poptPrintUsage':
popt/popthelp.c:637: warning: unused parameter `flags'

**
Alberto Accomazzi, NASA Astrophysics Data System    http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA   [EMAIL PROTECTED]
**


file has vanished bug [rsync-HEAD-20040127-1010GMT]

2004-01-27 Thread Alberto Accomazzi
Just ran into this bug when running the latest snapshot from CVS: when 
rsyncing from two source directories into a third one, rsync gets 
confused about which source file is from which directory, resulting in a 
file vanished error.  See test script below.

Also, is there any consensus on whether using multiple source modules 
when pulling from an rsync daemon is going to be ok?  I recall some 
discussion on escaping spaces or quoting them in the past but I'm not 
sure if anything was decided.  What I'm referring to is this case:

rsync -av rsync://server/'module1 module2 module3' dest/

Right now the latest CVS still supports this.

Thanks,

-- Alberto

-
#!/bin/sh
[ -d target ] && /bin/rm -rf target
if [ ! -d one ] ; then
mkdir one
touch one/foo
touch one/zoo
fi
if [ ! -d two ] ; then
mkdir two
touch two/bar
fi
./rsync-2.6.1 -avv one/ two/ target/
/bin/ls -l one two target

[EMAIL PROTECTED] ~/tmp]$ ./runtest.sh
building file list ... done
created directory target
./
bar
file has vanished: /home/ads/tmp/two/foo
file has vanished: /home/ads/tmp/two/zoo
wrote 150 bytes  read 80 bytes  460.00 bytes/sec
total size is 0  speedup is 0.00
rsync warning: some files vanished before they could be transfered (code 
24) at
main.c(628)
one:
total 0
-rw-rw-r--1 ads  ads 0 Jan 27 16:22 foo
-rw-rw-r--1 ads  ads 0 Jan 27 16:22 zoo

target:
total 0
-rw-rw-r--1 ads  ads 0 Jan 27 16:22 bar
two:
total 0
-rw-rw-r--1 ads  ads 0 Jan 27 16:22 bar


Change in reporting for --dry-run in 2.6.x

2004-01-28 Thread Alberto Accomazzi
I just noticed that there is an extra blank line in the output generated 
by rsync when the --dry-run (-n) flag is used.  This seems to have 
started with 2.6.0.  Is this desired?  The reason why I'm asking is 
because I use scripts that parse the output from rsync and little 
modifications in verbosity can make or break things easily.

Thanks,

-- Alberto

[EMAIL PROTECTED] bin]$ rsync-2.5.7 -an 
rsync://adswon.cfa.harvard.edu/pre/load/current/ 
/home/ads/abstracts/pre/latest/
receiving file list ... done
wrote 78 bytes  read 1204 bytes  2564.00 bytes/sec
total size is 457622413  speedup is 356959.76

[EMAIL PROTECTED] bin]$ rsync-2.6.1 -an 
rsync://adswon.cfa.harvard.edu/pre/load/current/ 
/home/ads/abstracts/pre/latest/
receiving file list ... done

wrote 78 bytes  read 1204 bytes  2564.00 bytes/sec
total size is 457622413  speedup is 356959.76
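For what it's worth, scripts can be made immune to this sort of layout change by keying on the content of the summary line rather than its position.  A small sketch (it assumes the 2.x "wrote ... read ..." summary format shown above):

```shell
#!/bin/sh
# Sketch: pull the byte counts out of rsync's summary line without
# depending on blank-line layout, which changed between 2.5.x and 2.6.x.
parse_summary() {
    awk '/^wrote [0-9]+ bytes +read [0-9]+ bytes/ { print $2, $5 }'
}

# Example with the 2.6.1 output above (note the extra blank line):
printf 'receiving file list ... done\n\nwrote 78 bytes  read 1204 bytes  2564.00 bytes/sec\n' |
    parse_summary
# prints: 78 1204
```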


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-17 Thread Alberto Accomazzi
Chris,
to put things in the right perspective, you should read (if you haven't 
done so already) the original paper describing the design behind batch 
mode.  The design and implementation of this functionality goes back to 
a project called the Internet2 Distributed Storage Infrastructure 
(I2-DSI).  As part of that project, the authors created a modified 
version of rsync (called rsync+) which had the capability of creating 
these batch sets for mirroring.  Here are a couple of URLs describing 
the ideas and motivation behind it:
http://www.ils.unc.edu/i2dsi/unc_rsync+.html
http://www.ils.unc.edu/ils/research/reports/TR-1999-01.pdf

Chris Shoemaker wrote:
	Yes, I think you're right about the original design.  And I guess we'd
want to preserve that capability.  Or would we?
	I'm having a little trouble seeing why this was the intended 
use.  I figure, there are three cases:

   A) If you have access to both source and dest, it doesn't really matter too
much who writes the batch -- this is like the local copy case.
   B) If you have access to the dest but not the source, then you need the
client to write the batch -- and it's not far-fetched that you might have
other copies of dest to update.
   C) However, having access to source but not dest is the only case that
_requires_ the sender to write the batch -- now what's the chance that you'll
have another identical dest to apply the batch to?  And if you did, why
wouldn't you generate the batch on that dest as in case A, above?
   So, it seems to me that it's much more useful to have the receiver/client 
write the batch than sender/client, or receiver/server, or sender/server.  
But, maybe I'm just not appreciating what the potential uses of batch-mode 
are.

  Survey: so who uses batch-mode and what for?
I haven't used the feature but back when I read the docs on rsync+ I 
thought it was a clever way to do multicasting on the cheap.  I think 
the only scenario where batch mode makes sense is when you need to 
distribute updates from a particular archive to a (large) number of 
mirror sites and you have tight control on the state of both client and 
server (so that you know exactly what needs to be updated on the mirror 
sites).  This ensures that you can create a set of batch files that 
contain *all* the changes necessary for updating each mirror site.

So basically I would use batch mode if I had a situation in which:
1) all mirror sites have the same set of files
2) rsync is invoked from each mirror site in exactly the same way (i.e. 
same command-line options) to pull data from a master server

then instead of having N sites invoke rsync against the same archive, I 
would invoke it once, make it write out a set of batch files, then 
transfer the batch files to each client and run rsync locally using the 
batch set.  The advantage of this is that the server only performs its 
computations once.  An example of this usage would be using rsync to 
upgrade a linux distribution, say going from FC 1 to FC 2.  All files 
from each distribution are frozen, so you should be able to create a 
single batch which incorporates all the changes and then apply that on 
each site carrying the distro.
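Spelled out, that workflow would look something like the sketch below.  All paths and names are invented, the rsync commands are echoed rather than executed, and exactly how the batch options interact with the other flags depends on the rsync version in use:

```shell
#!/bin/sh
# Sketch: compute the batch once on a reference mirror, then apply it
# on each of N identical mirror sites.  Paths are hypothetical; RSYNC
# defaults to a dry-run echo so the sketch is safe to run as-is.
RSYNC=${RSYNC:-echo rsync}     # drop the 'echo' to run for real
BATCH=/staging/fc1-to-fc2

# Master site: update the reference copy, recording every change.
$RSYNC -a --delete --write-batch="$BATCH" /master/fc2/ /reference/fc1/

# Each mirror site, after receiving the batch files:
$RSYNC -a --delete --read-batch="$BATCH" /local/fc1/
```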

The question of whether the batch files should be on the client or 
server side is not easy to answer and in the end depends on exactly what 
you're trying to do.  In general, I would say that since the contents of 
the batch set depend on the state of both client and server, there is 
no natural location for it.

-- Alberto

Alberto Accomazzi  aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System    ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics  www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA



Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Alberto Accomazzi
Chris Shoemaker wrote:
	Indeed, what you describe seems to have been the design motivation.  I
can share what my desired application is: I want to create a mirror of a
public server onto my local machine which physically disconnected from the
Internet, and keep it current.  So, I intend to first rsync update my own copy
which _is_ networked while creating the batch set.  Then I can sneakernet the
batch set to the unnetworked machine and use rsync --read-batch to update it. 
This keeps the batch sets smallish even though the mirror is largish. 
This was something I looked into a couple of years ago.  Back then I 
even posted an email to the list 
(http://lists.samba.org/archive/rsync/2002-August/003433.html) and got 
no feedback, which led me to conclude that people were not doing any of 
this at the time.  To restate the obvious, the batch mode thing is 
really just a glorified diff/patch operation.  The problem I have with 
it is that AFAICT it's a very fragile one, since a simple change of one 
file on either sender or receiver after the batch has been created will 
invalidate the use of the batch mode.  Contrast this with diff/patch, 
which has builtin measures to account for fuzzy matches and therefore 
makes it a much more robust tool.

In the end my motivation for using the rsync-via-sneakernet approach 
disappeared when I convinced myself that the whole operation would have 
been far too unreliable, at least for our application where files are 
updated all the time and there is never really a freeze of a release 
against which a batch file can be created.  I won't go as far as saying 
that the feature is useless, but just caution people that they need to 
understand the assumptions that this use of rsync is based upon.  Also, 
I would suggest checking out other diff/patch tools such as rdiff-backup 
or xdelta.

	BTW, there is a work-around.  If you don't mind keeping two copies of the
mirror, one solution is to do a regular (no --write-batch) rsync update of one
copy of the mirror, and then do the --write-batch during a local to local
rsync update of another copy of the mirror.  Actually, this has some real
advantages if your network connection is unreliable. 
This is really the only circumstance under which I would even consider 
using batch mode.  There should also be safeguards built into the batch 
mode operation to guarantee that the source files to which the batch is 
applied are in the state we expect them to be.  I wouldn't otherwise 
want rsync to touch my files.

	Thanks for your input.
Likewise.  Good luck...
-- Alberto

Alberto Accomazzi                            aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics  www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA

--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Alberto Accomazzi
Wayne Davison wrote:

The knowledge or memory of that exact state is more likely to
reside with the receiver (who just left that state) than with the
sender (who may never have been in that state).  Therefore it is more
likely to be useful to the receiver than to sender.

This is only true if you imagine a receiver doing one pull and then
forwarding the update on to multiple hosts.  For instance, if you
use a pull to create the batch files and then make them available
for people to download, which would help to alleviate load from the
original server.  That said, I think most of the time a receiver is
going to be a leaf node, so the server tends to be the place where
a batch is more likely to be useful, IMO.
In thinking about batch mode, it seems like its restrictions make
it useful in only a very small set of circumstances.  Since the
receiving systems must all have identical starting hierarchies, it
really does limit how often it can be used.
I completely agree with Wayne's assessment here.  But just to make things 
clear, let's restate what batch mode provides:

1. a (partial) set of metadata about the state of the sender
2. a (partial) set of metadata about the state of the receiver
3. an rsync-style patch for files that differ in 1. and 2.
So while 1+2+3 may be too restrictive to be useful in mirroring 
datasets, having the capability to create and cache just 1 or 2 may be a 
big win for busy servers.

I'm wondering if batch mode should be removed from the main rsync
release and relegated to a parallel project?  It seems to me that a
better feature for the mainstream utility would be something that
optimized away some of the load on the sending system when it is
serving lots of users.  So, having the ability to cache a directory
tree's information, and the ability to cache checksums for files
would be useful (especially if the data was auto-updated as it
became stale).  That would make all transfers more optimal,
regardless of what files the receiving system started from.
First of all, I have a feeling that the number of people who have 
*considered* using batch mode is quite small, and the number of those who 
have actually used it in the recent past is certainly even smaller (I'm 
thinking zero, actually).  So removing the functionality from the 
mainstream rsync would not be a problem, in fact I think it would be a 
good thing.  It doesn't make sense to keep something in the code that is 
not used and cannot be reliably supported.  Although I applaud Jos's 
efforts in providing this functionality to rsync, I was surprised to see 
it included in the main distribution, especially since it underwent 
virtually no testing as far as I can tell.

There's no doubt that caching the file list on the server side would 
indeed be a very useful feature for all those who use rsyncd as a 
distribution method.  We all know how difficult it can be to reliably 
rsync a large directory tree because of the memory and I/O costs in 
keeping a huge filelist in memory.  This may best be done by creating a 
separate helper application (say rsyncd-cache or such) that can be run 
on a regular basis to create a cached version of a directory tree 
corresponding to an rsyncd module on the server side.  The trick in 
getting this right will be to separate out the client-supplied options 
concerning file selection, checksumming, etc, so that the cache is as 
general as possible and can be used for a large set of connections so as 
to minimize the number of times that the actual filesystem is scanned.
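A rough sketch of what such a helper could look like (the rsyncd-cache name is only my suggestion above, not an existing tool, and the JSON cache format here is purely illustrative): walk the module's tree once and save the per-file metadata that rsync currently regathers from scratch on every connection.

```python
import json
import os

def build_cache(module_root, cache_path):
    """Hypothetical rsyncd-cache-style helper: scan an rsyncd module's
    tree once and record per-file metadata, so the daemon could answer
    file-list requests without rescanning the filesystem each time."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(module_root):
        dirnames.sort()  # deterministic order, like rsync's sorted file list
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: don't follow symlinks
            entries.append({
                "name": os.path.relpath(path, module_root),
                "size": st.st_size,
                "mtime": int(st.st_mtime),
                "mode": st.st_mode,
            })
    with open(cache_path, "w") as f:
        json.dump(entries, f)
    return entries
```

Run periodically (e.g. from cron), this would amortize the filesystem scan across many client connections, which is exactly the win being argued for here.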

Such a new feature would probably best be added to an rsync
replacement project, though.
Hmmm... replacement?  Why not make this a utility that can be run 
alongside an rsync daemon?  Or are you thinking of a design for a new 
rsync?

-- Alberto



Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-19 Thread Alberto Accomazzi
Chris Shoemaker wrote:
There's no doubt that caching the file list on the server side would 
indeed be a very useful feature for all those who use rsyncd as a 
distribution method.  We all know how difficult it can be to reliably 
rsync a large directory tree because of the memory and I/O costs in 
keeping a huge filelist in memory.  This may best be done by creating a 
separate helper application (say rsyncd-cache or such) that can be run 
on a regular basis to create a cached version of a directory tree 
corresponding to an rsyncd module on the server side.  The trick in 
getting this right will be to separate out the client-supplied options 
concerning file selection, checksumming, etc, so that the cache is as 
general as possible and can be used for a large set of connections so as 
to minimize the number of times that the actual filesystem is scanned.

	What client options are you thinking will be tricky?  Wouldn't the 
helper app just cache _all_ the metadata for the module, and then rsync would 
query only the subset it needed?  It's not like the client can change the 
checksum stride.  [That would hurt.]
What I'm referring to are those options that a client passes to the 
server which influence file selection, checksum and block generation.  I 
haven't looked at the rsync source code in quite a while, but off the 
top of my head here are the issues to look at when considering caching a 
filesystem scan:

1. Exclude/include patterns:
 -C, --cvs-exclude   auto ignore files in the same way CVS does
 --exclude=PATTERN   exclude files matching PATTERN
 --exclude-from=FILE exclude patterns listed in FILE
 --include=PATTERN   don't exclude files matching PATTERN
 --include-from=FILE don't exclude patterns listed in FILE
 --files-from=FILE   read FILE for list of source-file names
These should be easy to deal with: I would simply have the cache creator 
ignore any --exclude options passed by the client (but probably honor 
the ones defined in a daemon config file).

2. Other file selection options:
 -x, --one-file-system   don't cross filesystem boundaries
 -S, --sparsehandle sparse files efficiently
 -l, --links copy symlinks as symlinks
 -L, --copy-linkscopy the referent of all symlinks
 --copy-unsafe-links copy the referent of unsafe symlinks
 --safe-linksignore unsafe symlinks
It's possible that these can also be dealt with easily, but I'm not so 
sure.  Clearly -x influences what gets scanned, so how do you decide 
what to cache?  The other options are probably easier to deal with.

3. File checksums:
 -c, --checksum  always checksum
Should the caching operation always checksum so that the checksums are 
readily available when a client sets -c?  This can lead to a lot of 
computation and disk I/O which may be unnecessary if the clients do not 
use this option.
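To illustrate the trade-off, here is a minimal sketch of a whole-file checksum cache keyed on size and mtime, so checksums are only recomputed when a file appears to have changed (MD5 is used purely as a stand-in here; rsync's actual whole-file checksum in the 2.x era was MD4):

```python
import hashlib
import os

_cache = {}  # path -> (size, mtime, digest)

def cached_checksum(path):
    """Sketch of serving -c requests from a cache: reread and rehash a
    file only when its size or mtime differs from the cached entry."""
    st = os.stat(path)
    hit = _cache.get(path)
    if hit and hit[0] == st.st_size and hit[1] == st.st_mtime:
        return hit[2]  # cache hit: no file data is read
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    _cache[path] = (st.st_size, st.st_mtime, digest)
    return digest
```

The obvious caveat, as with rsync's own quick-check heuristic, is that a file modified without changing size or mtime would serve a stale checksum.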

4. Block checksums:
 -B, --block-size=SIZE   checksum blocking size (default 700)
It would be great if we could cache the rolling block checksums as they 
are computed but this may be even harder (or impossible) to deal with. 
And it looks like soon we'll have a new checksum-seed option which will 
further complicate the issue (in fact I admit I have no idea about how 
all of this works beyond versions 2.5.x; maybe somebody with more 
knowledge on the subject will chime in).
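For reference, the weak block checksum being discussed can be sketched as follows (a simplified version of the two-part rolling sum from the rsync algorithm, ignoring the newer seed). Both sums run over exactly `block` bytes, so cached values are only valid for one block size, which is why -B (and a checksum seed) would fragment any such cache:

```python
def weak_checksum(data):
    """Direct computation of an rsync-style weak checksum (s1, s2)."""
    s1 = s2 = 0
    length = len(data)
    for i, b in enumerate(data):
        s1 += b               # plain sum of bytes
        s2 += (length - i) * b  # position-weighted sum
    return s1 & 0xFFFF, s2 & 0xFFFF

def rolling_checksums(data, block):
    """Yield the weak checksum of every `block`-byte window, updating
    in O(1) per step instead of recomputing each window from scratch."""
    s1, s2 = weak_checksum(data[:block])
    yield s1, s2
    for i in range(len(data) - block):
        out, inc = data[i], data[i + block]
        s1 = (s1 - out + inc) & 0xFFFF
        s2 = (s2 - block * out + s1) & 0xFFFF
        yield s1, s2
```

The O(1) roll is what makes the sender's block search cheap, but it is also why the sums are tied to the exact block length chosen for the transfer.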

So I'm just pointing out that in order to create a cache with a high hit 
probability you have to make assumptions and choices that may be 
non-trivial.  Probably the best solution is reducing the scope of the 
cache so that it contains only the initial file list generation under 
default settings, or maybe you want to have a set of different caches 
created using different options.  I, for one, have consistently been 
using the --checksum option when distributing some sensitive data to our 
mirror sites, so I would want that to be included in a cache.

-- Alberto



Re: Potential new option: --delete-during

2005-01-21 Thread Alberto Accomazzi
Wayne,
I haven't (yet) given this a try but it sounds like a very reasonable 
thing to do.  --delete-before is still useful in those cases where you 
may be tight on disk space.  But the default behaviour should be the most 
efficient one, so I agree with making this patch be the default --delete.

On a related note, don't you think it's time to start making candidate 
releases for 2.6.4?  It's been a while...

-- Alberto

There is a new patch named delete-during.diff in the CVS patches
dir.  This patch adds the ability for rsync to incrementally delete
files in each directory during the normal course of the transfer rather
than doing a separate delete scan either before or after the transfer.
The patch renames the current --delete option into --delete-before and
makes --delete behave in the delete-during style.  I'm debating whether
we actually need a --delete-during option -- I'm currently leaning
towards leaving it out, so it's not documented as existing at the
moment.  I've done some simple testing (including both with and without
the --relative option) and it seems to work fine so far.
Comments?  How do people feel about making the --delete-during behavior
the default --delete algorithm?  I think it will be much more efficient
(and less prone to timeouts), so having it as the default is the best
choice.
The patch applies to (and comes with) the CVS version, and is present in
the latest nightly tar file (available from the web site).
..wayne..
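Conceptually, the delete-during pass Wayne describes amounts to something like this per-directory step (a Python sketch of the idea only, not rsync's actual C implementation): once a directory's entries have been handled, anything on the receiver that the sender didn't list is extraneous and can be removed immediately, instead of in a separate whole-tree scan.

```python
import os
import shutil

def delete_during(dest_dir, sender_names):
    """Incremental per-directory deletion: remove entries in dest_dir
    that are not in sender_names, the set of basenames the sender
    listed for this directory on this transfer."""
    for name in os.listdir(dest_dir):
        if name in sender_names:
            continue
        path = os.path.join(dest_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # extraneous subtree
        else:
            os.remove(path)      # extraneous file or symlink
```

Because the deletions piggyback on directories the transfer is already visiting, there is no long silent scan before data starts flowing, which is what makes this less prone to timeouts.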




Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-31 Thread Alberto Accomazzi
Hi Chris,
Chris Shoemaker wrote:
On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote:
Chris Shoemaker wrote:

If I understand Wayne's design, it would be possible to invent a
(per-directory) hook rule, whose value is executed, and whose stdout
is parsed as a [in|ex]clude file list.  E.g.:
-R cat .rsync-my-includes
or
-R find . -ctime 1 -a ! -fstype nfs -a ! -empty -o iname 'foo*'
This is certainly a very powerful mechanism, but it definitely should 
not be the only way we implement file filtering.  Two problems:

1. Sprinkling rule files like these across directories would mean 
executing external programs all the time for each file to be considered. 

No, only one execution per specified rule.  Most users of this feature
would put specify one rule at the root directory.  But, if a user
wanted to change the rules for every directory, they would have to
specify a rule in each directory.  Then, yes, one execution per
directory.  Presumably they would do this because they actually need
to.  Never one execution per file.
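A sketch of how such a hook could work (the -R rule syntax quoted above is a proposal on this list, not an implemented rsync feature): run the rule's command once per directory and treat each line of its stdout as a name to include.

```python
import subprocess

def run_dir_rule(command, directory):
    """Hypothetical per-directory hook rule: execute `command` once
    with `directory` as its working directory, and parse each line of
    stdout as a file name for the include list."""
    out = subprocess.run(command, shell=True, cwd=directory,
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]
```

This matches the cost model above: one process spawn per rule-bearing directory, never one per file.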
Ok, I guess I had misunderstood your original suggestion.  One execution 
per directory is presumably not so bad, although it's hard to make 
assumptions about how one's data hierarchy is structured.

This would presumably slow down rsync's execution by an order of 
magnitude or so and suck the life out of a system doing a big backup job.

If you're referring to process spawning overhead, it's no big deal.
If you're referring to the actual work required to return the file
list, what makes you think that rsync can do it more efficiently than
'cat' or 'find', or whatever tool the user chose?
I was referring to the overhead of spawning a process per file being 
considered.  But I think we all agree that this is not desirable nor 
necessary.

2. Who actually needs such a powerful yet hard-to-handle 
mechanism?  Most of rsync's users are not programmers, and even those of 
us who are apparently still get confused with rsync's include/exclude 
logic, forget about even more complicated approaches.

Do you mean include/exclude mechanism or filtering mechanism?  Well,
IMO, parsing a file list is *less* complicated than rsync's custom
pattern specification and include/exclude chaining.  Actually, I think
rsync patterns are /crazy/ complicated and fully deserve the pages
upon pages of documentation, explanation and examples that they get in
the man page.
But, complexity is somewhat subjective, so I won't argue (much) about
it.  In practice, /familiarity/ is far more important than complexity
in a case like this.  Someone who looks at rsync for the first time
has a _zero_ chance of having seen something like rsync's patterns
before, because there is nothing else like them.  
I agree that exclude/include patterns can be tricky, and you have a good 
point about familiarity versus complexity.  I think what makes them hard 
to handle is the fact that we are dealing with filename (and directory 
name) matching and recursion.  So matching only a subset of a file tree, 
while simple as a concept, is non-trivial once you sit down and realize 
that you need a well-defined syntax for it.  Can you write a find 
expression that is simpler or more familiar to the average user than 
rsync's include/exclude?

(The allusion to GNU
tar's --exclude option which takes only a filename, not a pattern,
isn't really helpful in understanding rsync's --exclude option.)
Uh?  Tar does take patterns for exclusion, and has its own quirky way of 
dealing with wildcards, directory matching and filename anchoring:
http://www.gnu.org/software/tar/manual/html_node/tar_100.html

It's not that pattern matching for file selection isn't complex --
it's just that it's such a well-defined, conceptually simple, common
task that other tools (like 'find' and 'bash') handle better than
rsync ever will.  And that's the way it should be: it's the unix way.
I agree that this is something we should be striving for as much as 
possible: pipeline and offload tasks rather than bloating applications.

If you really need 
complete freedom maybe the way to go is to do your file selection first 
and use --files-from.  

Yes, --files-from is nice, and honestly, almost completely sufficient.
But in some dynamic cases, you can't keep the list updated.
Well, maybe we should go back and see if the solution to all problems 
isn't making --files-from sufficient.  What exactly is missing from it 
right now?  The capability to delete files which are not in the 
files-from list?  Or the remote execution of a command that can generate 
the files-from list for an rsync server?  Maybe we ought to really 
figure out what things cannot be achieved with the current functionality 
before coming up with something new.

challenge is making this powerful without making it too complicated, 
because in that case nobody will use it.

You see --filter as less complicated than --include/exclude, then?
It's certainly more powerful.
Since --filter can support a superset