FWD: Re: specifying a list of files to transfer

2003-01-17 Thread jw schultz

On Fri, Jan 17, 2003 at 05:42:41PM -0800, Wayne Davison wrote:
> On Fri, Jan 17, 2003 at 04:21:59PM -0800, jw schultz wrote:
> > It should not do /root2/i386/etc/init.d/rsyncd and so on as
> > -R would have it.
> 
> -R would only do that if you actually prefixed the paths with the source
> dir, which is not what happens with --files-from.  The source dir is
> just used as the default dir.  So, your example works exactly as you are
> expecting.  I.e., this set of commands:
> 
> cd /some/path
> rsync -R `cat /tmp/files` remote:/dest
> 
> works much like this new command:
> 
> rsync --files-from=/tmp/files /some/path remote:/dest
> 
> Except that it also transfers any named dirs in the input file (without
> -r and without recursing).  Note also that this reflects the new default
> of -R being enabled by default when --files-from is specified.
> 
> If the user wants the extra dirs prefixed from the source spec, they
> just need to specify them as part of the dest:
> 
> rsync --files-from=/tmp/files /some/path remote:/dest/some/path

Great!  It seems _I_ missed something.  I think it is the
difference between the behavior of list items and command
line items that threw me.  Sometimes it helps to actually
use an example.  We'll have to make sure the manpage is very
clear.

> > I hope this points out clearly the difference in our perspectives on
> > this.  I am not talking about a way to extend the command line.  I am
> > talking about an explicit list that eliminates the tree walk and
> > awkwardness of artificial include/exclude lists [...]
> 
> Sorry, but I don't see any conflict in our perspectives at all.  Let me
> know if I'm missing something.

It sounds like the only remaining issues (mostly
implementation detail) are:
  implied directories (resolved, I think)
  when to recurse directories in the list
  duplicate dirs


-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: specifying a list of files to transfer

2003-01-17 Thread Wayne Davison
On Fri, Jan 17, 2003 at 04:21:59PM -0800, jw schultz wrote:
> It should not do /root2/i386/etc/init.d/rsyncd and so on as
> -R would have it.

-R would only do that if you actually prefixed the paths with the source
dir, which is not what happens with --files-from.  The source dir is
just used as the default dir.  So, your example works exactly as you are
expecting.  I.e., this set of commands:

cd /some/path
rsync -R `cat /tmp/files` remote:/dest

works much like this new command:

rsync --files-from=/tmp/files /some/path remote:/dest

Except that it also transfers any named dirs in the input file (without
-r and without recursing).  Note also that this reflects the new default
of -R being enabled by default when --files-from is specified.

If the user wants the extra dirs prefixed from the source spec, they
just need to specify them as part of the dest:

rsync --files-from=/tmp/files /some/path remote:/dest/some/path
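
To make the equivalence concrete, here is a small bash sketch. It is a hypothetical helper, not part of rsync, that prints where each --files-from entry is read from and where it lands, assuming -R is implied as described above: the source dir only supplies the default dir, and each entry keeps its own relative path on both sides.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not an rsync feature: show how each --files-from
# entry maps from the source dir to the destination when -R is implied.
# Usage: map_entries SRC_DIR DEST < list
map_entries() {
    local src=${1%/} dest=${2%/} entry
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue      # skip blank lines
        entry=${entry#./}                # a leading "./" adds no path component
        printf '%s -> %s\n' "$src/$entry" "$dest/$entry"
    done
}
```

For instance, feeding it etc/rsyncd.conf with /some/path and remote:/dest prints "/some/path/etc/rsyncd.conf -> remote:/dest/etc/rsyncd.conf"; giving remote:/dest/some/path as the destination instead reproduces the prefixed layout of the last command.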

> I hope this points out clearly the difference in our perspectives on
> this.  I am not talking about a way to extend the command line.  I am
> talking about an explicit list that eliminates the tree walk and
> awkwardness of artificial include/exclude lists [...]

Sorry, but I don't see any conflict in our perspectives at all.  Let me
know if I'm missing something.

..wayne..



Re: specifying a list of files to transfer

2003-01-17 Thread Lee Eakin
---begin quoted text---
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Fri, 17 Jan 2003 16:21:59 -0800
> 
> On Fri, Jan 17, 2003 at 04:42:51PM -0600, Dave Dykstra wrote:
> > On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> > 
> > Oh, right, I hadn't thought of that implication of the way this is
> > implemented.  Definitely we want the -R functionality implied.  That's
> > the only way I can imagine people wanting to use this.
> > 

  I can think of a couple of uses for a --no-relative option.  It would not
  be the common case, I agree with the examples below.  They illustrate
  both the common case and the exception quite well.

  I can see a case where you want to back up several critical files from
  one system to a single (flat) directory on another.  The flattened
  example below would work well for this.  Of course the example also shows
  a filename stepping on another, but since --no-relative would be
  the exception instead of default, the user can deal with it (they
  explicitly asked for it after all).
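
  For the flat-copy case, the name clashes can be spotted before the
  transfer.  A minimal bash (4+) sketch, hypothetical and not an rsync
  option, that prints every flattened destination name which two or more
  list entries would map onto:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not an rsync feature (needs bash 4+ for
# associative arrays): print each flattened destination name that two or
# more list entries would map onto, in order of first appearance.
# Usage: flatten_collisions DEST < list
flatten_collisions() {
    local dest=${1%/} entry name
    local -A count=()
    local -a order=()
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        entry=${entry%/}                 # ignore a trailing slash on dir entries
        name="$dest/${entry##*/}"        # keep only the basename, as without -R
        if [ -z "${count[$name]+set}" ]; then
            order+=("$name")
            count[$name]=0
        fi
        count[$name]=$(( ${count[$name]} + 1 ))
    done
    for name in "${order[@]}"; do
        if [ "${count[$name]}" -gt 1 ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

  Fed jw's distlist quoted below with /root2, it reports /root2/rsyncd
  (etc/init.d/rsyncd and usr/sbin/rsyncd) and also /root2/rsync, because
  usr/bin/rsync and the usr/share/doc/packages/rsync directory share a
  basename.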

  I can also see a case where you have several files in a single directory
  that you want to update from a master repository, but the repository has
  them spread out in different dirs (maybe due to different files for
  different architectures).  This option could allow you to update, say,
  /usr/local/bin, pulling from several known locations saved in the distlist
  file.

  Sorry, just had to throw this in.  I understand the desire to avoid
  feeping creaturism.  Making software more useful to more people without
  hideous bloat is a very difficult balance.

-Lee
> 
>   rsync -lptgoDu --delete --files-from=distlist distserver::8.0/i386 /root2
> where distlist is
>   etc/init.d/rsyncd
>   etc/rsyncd.conf
>   usr/bin/rsync
>   usr/bin/rsyncstats
>   usr/sbin/rcrsyncd
>   usr/sbin/rsyncd
>   usr/share/doc/packages/rsync
>   usr/share/doc/packages/rsync/COPYING
>   usr/share/doc/packages/rsync/README
>   usr/share/doc/packages/rsync/tech_report.ps
>   usr/share/doc/packages/rsync/tech_report.tex
>   usr/share/man/man1/rsync.1.gz
>   usr/share/man/man5/rsyncd.conf.5.gz
> 
> It should not do /root2/i386/etc/init.d/rsyncd and so on as
> -R would have it.
> 
> It should not create (flattened)
>   /root2/rsyncd   # from /etc/init.d
>   /root2/rsyncd.conf
>   /root2/rsync
>   /root2/rsyncstats
>   /root2/rcrsyncd
>   /root2/rsyncd   # from usr/sbin?
>   /root2/COPYING
>   /root2/README
>   /root2/tech_report.ps
>   /root2/tech_report.tex
>   /root2/rsync.1.gz
>   /root2/rsyncd.conf.5.gz
> 
> What it should create or update is /root2/etc/init.d/rsyncd and so on,
> and it should be equivalent to
>   rsync -lptgoDu --delete --files-from=distlist \
>   distserver:/data/distribution/8.0/i386 /root2
> or
>   rsync -lptgoDu --delete --files-from=distlist \
>   /data/distribution/8.0/i386 client:/root2
> 
> 
> If /root2/usr/share/doc/packages doesn't exist it should be
> created with perms from source, but it should not be recursed.
> 
> This example is drawn from one of the most recent emails
> requesting this feature.
> 
---end quoted text---

-- 
Lee Eakin - [EMAIL PROTECTED]
 
Life's not fair, but the root password helps.



Re: specifying a list of files to transfer

2003-01-17 Thread jw schultz
On Fri, Jan 17, 2003 at 04:42:51PM -0600, Dave Dykstra wrote:
> On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> > On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> > > [...] and that entries therein are not flattened like they would be on
> > > the command-line (sans -R).
> > 
> > But they *are* flattened exactly like on the command-line, at least in
> > my current patch they are.  That's what -R is for -- telling rsync not
> > to do that.  So, without -R there are no implied directories to create
> > except for the destination dir (which is created if it doesn't exist).
> 
> Oh, right, I hadn't thought of that implication of the way this is
> implemented.  Definitely we want the -R functionality implied.  That's
> the only way I can imagine people wanting to use this.
> 
> 
> 
> > > The permissions and ownership should be derived from the source.
> > > so effectively it should be as though
> > >   ./deltapics
> > > were in the file list.
> > 
> > Right.  In fact, that's exactly what happens with -R -- all intermediate
> > directories get added to the file list (if they aren't already in it)
> > without causing any extra recursion (even if -r was specified/implied).
> 
> 
> In my former hack implementation of the "exclude optimization" (when 
> there were only includes with no wildcards and a final exclude '*') it
> was able to skip sending the parent directories completely.  Come to
> think of it, I'm not sure what kind of permissions were used for the
> directories that were not explicitly included, maybe it just used the
> default.
> 
> 
> > If people want the "--files-from" to imply "-R" then I'd want to see a
> > "--no-relative" option to let people turn it off.
> 
> That would be easy to implement so I guess it wouldn't hurt but I really
> can't see people wanting to do that.

rsync -lptgoDu --delete --files-from=distlist distserver::8.0/i386 /root2
where distlist is
etc/init.d/rsyncd
etc/rsyncd.conf
usr/bin/rsync
usr/bin/rsyncstats
usr/sbin/rcrsyncd
usr/sbin/rsyncd
usr/share/doc/packages/rsync
usr/share/doc/packages/rsync/COPYING
usr/share/doc/packages/rsync/README
usr/share/doc/packages/rsync/tech_report.ps
usr/share/doc/packages/rsync/tech_report.tex
usr/share/man/man1/rsync.1.gz
usr/share/man/man5/rsyncd.conf.5.gz

It should not do /root2/i386/etc/init.d/rsyncd and so on as
-R would have it.

It should not create (flattened)
/root2/rsyncd   # from /etc/init.d
/root2/rsyncd.conf
/root2/rsync
/root2/rsyncstats
/root2/rcrsyncd
/root2/rsyncd   # from usr/sbin?
/root2/COPYING
/root2/README
/root2/tech_report.ps
/root2/tech_report.tex
/root2/rsync.1.gz
/root2/rsyncd.conf.5.gz

What it should create or update is /root2/etc/init.d/rsyncd and so on,
and it should be equivalent to
rsync -lptgoDu --delete --files-from=distlist \
distserver:/data/distribution/8.0/i386 /root2
or
rsync -lptgoDu --delete --files-from=distlist \
/data/distribution/8.0/i386 client:/root2


If /root2/usr/share/doc/packages doesn't exist it should be
created with perms from source, but it should not be recursed.

This example is drawn from one of the most recent emails
requesting this feature.

I want to thank Wayne for his work on this and his patience
with me.  I seem to be butting heads with him while he has
been good enough to actually write code.  I hope this points
out clearly the difference in our perspectives on this.  I
am not talking about a way to extend the command line.  I am
talking about an explicit list that eliminates the tree walk
and awkwardness of artificial include/exclude lists and has
a similar effect to

while IFS= read -r subpath
do
    rsync -lptgoD "distserver::8.0/i386/$subpath" "/root2/$subpath"
done < distlist



Re: specifying a list of files to transfer

2003-01-17 Thread Wayne Davison
On Fri, Jan 17, 2003 at 11:46:33AM -0500, Andrew J. Schorr wrote:
> Is there a possible case where
> the --files-from file list lives on the remote (sender) side?

Yes, I could see that being possible -- your update scenario is even an
interesting example.

It's actually easy to have the remote sender open the file list on the
sending side with the addition of a little code that allows the
--files-from name to be prefixed by a hostname (that must match the
sender's hostname).

> Come to think of it, if the data lives on the remote server, where
> would a local files-from list come from?  How would it be generated?

Since the list is manually generated I imagine that either the user has
advance knowledge of what is to be grabbed or the user first runs a
remote command (via ssh, perhaps) that generates the list.

..wayne..



Re: specifying a list of files to transfer

2003-01-17 Thread Wayne Davison
On Fri, Jan 17, 2003 at 12:14:13PM -0500, Andrew J. Schorr wrote:
> If I run rsync in such a way that parent directories are sent automatically,
> it will send the following files (based on -vvv output):
> 
>    make_file(4,test)
>    make_file(4,test/foo.jpg)
>    make_file(4,test/bar.jpg)
>    make_file(4,test)
>    make_file(4,test/sub)
>    make_file(4,test/sub/foo.jpg)
>    make_file(4,test)
>    make_file(4,test/zeke.jpg)

Yeesh, that's bad.  It sends all these duplicates in the file list, but
then goes through the list and tries to remove duplicates, so it doesn't
transfer all of these duplicate names.  However, it has a (known)
bug that fails to remove multiple duplicates in a row, so it does do
some significant redundant processing at the moment.

I've written a better implied-directory-adding routine that greatly
reduces the added dirs to the file list when the input list is in a
normal hierarchical order -- i.e. it doesn't use a hash, but does keep
track of the previous path in a better way.  I've also fixed the
duplicate-removing code to be able to handle multiple dups in a row.
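
The patch itself isn't shown here, so the following is only a hypothetical bash sketch of the idea (no hash, just a better memory of the previous path): a parent directory is emitted unless it is the previous entry's directory or one of that directory's ancestors. It assumes relative paths, as in a --files-from list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual rsync patch: add implied parent
# directories to a list of relative paths, remembering only the directory
# of the previous entry instead of a hash of everything already sent.
add_implied_dirs() {
    local prev_dir="" path dir prefix comp
    local -a comps
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        path=${path#./}
        path=${path%/}
        dir=""
        [[ $path == */* ]] && dir=${path%/*}
        prefix=""
        IFS=/ read -ra comps <<< "$dir"
        for comp in "${comps[@]}"; do
            [ -n "$comp" ] || continue           # tolerate "a//b"
            prefix=${prefix:+$prefix/}$comp
            # Skip prefix if the previous entry already implied it, i.e. it
            # is the previous directory itself or one of its ancestors.
            if [[ $prev_dir != "$prefix" && $prev_dir != "$prefix"/* ]]; then
                printf '%s\n' "$prefix"
            fi
        done
        printf '%s\n' "$path"
        prev_dir=$dir
    done
}
```

On Andrew's find output (test/foo.jpg, test/bar.jpg, test/sub/foo.jpg, test/zeke.jpg) it emits test and test/sub once each instead of sending test three times.  Input that jumps between unrelated trees (a/x, b/y, a/z) still repeats a, which is why a duplicate-removal pass is still needed afterwards.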

I haven't committed either of these patches (since we're trying to get a
release out), but I'll put them into my --files-from patch when next I
update it.  The dups-removing fix is actually pretty simple, so if we
think that this would be something that we'd like to see in this next
release, I could commit that.

..wayne..



Re: specifying a list of files to transfer

2003-01-17 Thread Dave Dykstra
On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> > [...] and that entries therein are not flattened like they would be on
> > the command-line (sans -R).
> 
> But they *are* flattened exactly like on the command-line, at least in
> my current patch they are.  That's what -R is for -- telling rsync not
> to do that.  So, without -R there are no implied directories to create
> except for the destination dir (which is created if it doesn't exist).

Oh, right, I hadn't thought of that implication of the way this is
implemented.  Definitely we want the -R functionality implied.  That's
the only way I can imagine people wanting to use this.



> > The permissions and ownership should be derived from the source.
> > so effectively it should be as though
> > ./deltapics
> > were in the file list.
> 
> Right.  In fact, that's exactly what happens with -R -- all intermediate
> directories get added to the file list (if they aren't already in it)
> without causing any extra recursion (even if -r was specified/implied).


In my former hack implementation of the "exclude optimization" (when 
there were only includes with no wildcards and a final exclude '*') it
was able to skip sending the parent directories completely.  Come to
think of it, I'm not sure what kind of permissions were used for the
directories that were not explicitly included, maybe it just used the
default.


> If people want the "--files-from" to imply "-R" then I'd want to see a
> "--no-relative" option to let people turn it off.

That would be easy to implement so I guess it wouldn't hurt but I really
can't see people wanting to do that.

- Dave



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> I know I'm not talking about when -R is used.  I am talking
> about creating implied intermediate directories without -R.
> I'm talking about being able to take the output of
> find -name '*.jpg' and have it create (if necessary) any
> intermediate directories while maintaining the equivalency
> of src and dest.  If that means also behaving as though
> those directories were already in the list that would be OK
> as long as -r weren't specified.
> 
>   find . -name '*.jpg' | rsync -a --files-from=- .  remote:
> should when it hits
>   ./deltapics/031CGMUa.jpg
>   ./deltapics/031CGNga.jpg
>   ./deltapics/031CGOHa.jpg
>   ./deltapics/031CGPOa.jpg
>   ./deltapics/031CGPba.jpg
> create the deltapics directory if it doesn't exist.  The
> permissions and ownership should be derived from the source.
> so effectively it should be as though
>   ./deltapics
> were in the file list.  It needn't be updated if it
> does exist, but if it is easier to implement it that way I wouldn't
> object.  In such a case, even if -r is
> allowed and specified, the implied directory should not defeat
> the file list by transferring any files not in the list.
> 
> No errors, no need to do a run to find the missing
> directories and add them and no need to add a filter to the
> stream adding entries for directories that are missing.

There are performance issues associated with sending all the
parent directories automatically.  Consider the situation where
running find test -name '*.jpg' -print gives the following results
(and yes, this does happen, at least for me on Solaris 8, where the
output of find seems to depend on the order in which the directory
entries were created):

   test/foo.jpg
   test/bar.jpg
   test/sub/foo.jpg
   test/zeke.jpg

If I run rsync in such a way that parent directories are sent automatically,
it will send the following files (based on -vvv output):

   make_file(4,test)
   make_file(4,test/foo.jpg)
   make_file(4,test/bar.jpg)
   make_file(4,test)
   make_file(4,test/sub)
   make_file(4,test/sub/foo.jpg)
   make_file(4,test)
   make_file(4,test/zeke.jpg)

Note that the "test" directory is sent 3 times in this case.  This is
because the code that checks whether to send the directory just compares
to the last one sent in an attempt to eliminate duplicates.  But this
is not a reliable way of preventing duplicates, as the above example
demonstrates.  So there is a danger of sending lots of duplicate
directory entries when the automatic directory transmission feature
is enabled.

This could probably be fixed by keeping a hash table of all the directory
entries that have already been transmitted instead of just comparing
against the last one sent.
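
A minimal sketch of that hash-table suggestion, written in bash (4+, for associative arrays) rather than rsync's C; the function name is made up for illustration.  Every implied directory already emitted is remembered, so the order in which find reports files no longer matters:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the suggested fix: remember every implied
# directory already emitted (an associative array acting as the hash
# table) instead of comparing against only the last one sent.
add_implied_dirs_hashed() {
    local path prefix comp
    local -a comps
    local -A sent=()
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        path=${path#./}
        path=${path%/}
        comps=()
        if [[ $path == */* ]]; then
            IFS=/ read -ra comps <<< "${path%/*}"
        fi
        prefix=""
        for comp in "${comps[@]}"; do
            [ -n "$comp" ] || continue
            prefix=${prefix:+$prefix/}$comp
            if [ -z "${sent[$prefix]+set}" ]; then
                sent[$prefix]=1              # first time seen: record and emit
                printf '%s\n' "$prefix"
            fi
        done
        printf '%s\n' "$path"
    done
}
```

With the output above it emits test once, and for a/x, b/y, a/z it emits a only once.  The cost is memory proportional to the number of distinct directories, which is the trade-off against comparing with the previous path only.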

In any case, I think it's important to be able to turn off the
automatic directory sending feature so that situations that don't
require this can avoid the performance hit.

-Andy



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote:
> (I assume you're talking about when using -R, which is not currently on
> by default.)  I believe that we do need an auto-creation mode and also a
> way to optimize the transfer to avoid this (since it results in a lot of
> extra directory checks that can slow things down when we know that they
> aren't needed).  Which one is the default is the current question.  I'm
> currently leaning toward going back to sending the implied dirs by
> default, and having an option for people to optimize the transfer (which
> would allow it to be used with the normal -R mode even when --files-from
> was not used).

This is why my patch included the --no-implicit-dirs and --send-dirs
options.  These allow you to specify precisely what you want without
having to remember which behavior is implied by various other options...

Although I suppose that those two options could be combined.  The basic
idea is that if the user is taking the responsibility for specifying
the directories himself (--send-dirs), then rsync doesn't need to
worry about automatically sending all the parent directories
(--no-implicit-dirs).

So perhaps an --explicit-dirs option (combining the two meanings) could
be used to indicate that the user is taking all responsibility for
sending directories and rsync doesn't need to worry about it.  And
to be safe, a --no-explicit-dirs to turn off this behavior.

-Andy



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 01:58:49PM -0800, jw schultz wrote:
> On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> > Also, if the transfer is being sent from the remote side, the file names
> > are all getting sent over to the remote side first for --files-from and
> > then sent back as part of the normal protocol, right?  I had hoped we'd
> > be able to avoid that round trip because the list could get long.
> 
> I don't think we can avoid the round trip without changing
> rsync drastically.  Just consider that this saves on sending
> voluminous --include lists or invoking rsync hundreds of
> times.

Perhaps I'm misunderstanding this completely, but is there a possible
scenario where the remote (sender) might have a list of files on
his side?  I might be mistaken, but it seems that the current patch
supports the case where the --files-from file list is located on
the local side, and it must be transmitted to the sender in the
case where the sender is remote.  Is there a possible case where
the --files-from file list lives on the remote (sender) side?

This is not relevant to me, but I'm just wondering.  One might
imagine a scenario in which a bunch of sites are mirroring from
a server, and the server runs a job to create a list of modified
files that the remote mirrors should pull for updating...

Come to think of it, if the data lives on the remote server, where
would a local files-from list come from?  How would it be generated?

-Andy



Re: specifying a list of files to transfer

2003-01-16 Thread Wayne Davison
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> [...] and that entries therein are not flattened like they would be on
> the command-line (sans -R).

But they *are* flattened exactly like on the command-line, at least in
my current patch they are.  That's what -R is for -- telling rsync not
to do that.  So, without -R there are no implied directories to create
except for the destination dir (which is created if it doesn't exist).

> The permissions and ownership should be derived from the source.
> so effectively it should be as though
>   ./deltapics
> were in the file list.

Right.  In fact, that's exactly what happens with -R -- all intermediate
directories get added to the file list (if they aren't already in it)
without causing any extra recursion (even if -r was specified/implied).

If people want the "--files-from" to imply "-R" then I'd want to see a
"--no-relative" option to let people turn it off.

..wayne..



Re: specifying a list of files to transfer

2003-01-16 Thread jw schultz
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote:
> > On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> > > You seem to see the files-from as a way of replacing command-line
> > > args where i see it as a way of replacing the tree scan.
> 
> I actually think of it as both, since I also consider the command-line
> as a way of replacing the tree scan.  I think that it is fairly easy to
> explain how --files-from works if you explain it in terms of how it is
> much like specifying the names on the command-line (and explain what is
> different about it).  One thing that I'm hoping to avoid is arbitrary
> limits on what the mode can do -- I'd like to see it be an easy way to
> specify exactly what files to send, and also as a way to extend the size
> of the command-line.

The difference seems pretty big to me but if you can
describe it cleanly that's fine.  The big thing is that
paths specified in --files-from are relative to the tree,
not CWD and that entries therein are not flattened like they
would be on the command-line (sans -R).


> On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> > I would rather have the '-r' option ignored when --files-from is in
> > effect.
> 
> I wouldn't want that as a hard limit.  It would be better to say that
> -a doesn't imply -r when --files-from is used, but the user can still
> manually specify -r if they want to.  It would be easy to implement this
> in a way that would _not_ require any particular order to the options on
> the command-line (which I agree with JW would be a very bad idea).
> 
> > If people leave out the directories, missing parent directories should
> > be automatically created.
> 
> (I assume you're talking about when using -R, which is not currently on
> by default.)  I believe that we do need an auto-creation mode and also a
> way to optimize the transfer to avoid this (since it results in a lot of
> extra directory checks that can slow things down when we know that they
> aren't needed).  Which one is the default is the current question.  I'm
> currently leaning toward going back to sending the implied dirs by
> default, and having an option for people to optimize the transfer (which
> would allow it to be used with the normal -R mode even when --files-from
> was not used).

I know I'm not talking about when -R is used.  I am talking
about creating implied intermediate directories without -R.
I'm talking about being able to take the output of
find -name '*.jpg' and have it create (if necessary) any
intermediate directories while maintaining the equivalency
of src and dest.  If that means also behaving as though
those directories were already in the list that would be OK
as long as -r weren't specified.

find . -name '*.jpg' | rsync -a --files-from=- .  remote:
should when it hits
./deltapics/031CGMUa.jpg
./deltapics/031CGNga.jpg
./deltapics/031CGOHa.jpg
./deltapics/031CGPOa.jpg
./deltapics/031CGPba.jpg
create the deltapics directory if it doesn't exist.  The
permissions and ownership should be derived from the source.
so effectively it should be as though
./deltapics
were in the file list.  It needn't be updated if it
does exist, but if it is easier to implement it that way I wouldn't
object.  In such a case, even if -r is
allowed and specified, the implied directory should not defeat
the file list by transferring any files not in the list.

No errors, no need to do a run to find the missing
directories and add them and no need to add a filter to the
stream adding entries for directories that are missing.


> > As it is now, if somebody just does "find . -print | rsync -a
> > --files-from- ..." are they going to get repeated files because the
> > directories are listed?
> 
> Rsync would weed out the duplicates, but if -a implied -r in this
> context then the presence of the directories would cause rsync to
> recurse through all the directory content and thus make this a
> horrible thing to do.  A couple alternatives:
> 
>   find . -print | rsync -lptgoDR --files-from=- . remote:/dest
>   find . -type f -print | rsync -aR --files-from=- . remote:/dest

Assume the user is working from a file list stored in a file
and isn't using -R.  Possibly doing a pull.  Let's not force
them to go through extra hoops.  I know I've never used -R
and I suspect many other people haven't.  Try

find "$srcdir" -assorted-options | sed -e "s|^$srcdir/||" \
    | rsync -a --files-from=- "$srcdir" remote:"$destdir"

While unlikely from the command line, not unlikely from a
script.  While fairly easy to do simple subs, adding lines
can be a pain.
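
The prefix stripping can also be done without sed, which avoids quoting trouble when $srcdir contains slashes or regex metacharacters.  A hypothetical bash helper (the name strip_srcdir is assumed, not an existing tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn paths printed by `find "$srcdir" ...` into
# paths relative to $srcdir, suitable for --files-from.  Parameter
# expansion is used instead of sed, so $srcdir is matched literally.
strip_srcdir() {
    local srcdir=${1%/} path
    while IFS= read -r path; do
        [ "$path" = "$srcdir" ] && continue     # drop the top directory itself
        printf '%s\n' "${path#"$srcdir"/}"
    done
}
```

It would slot into the pipeline as: find "$srcdir" -assorted-options | strip_srcdir "$srcdir" | rsync -a --files-from=- "$srcdir" remote:"$destdir", with -assorted-options still standing in for whatever find tests the script needs.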


-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt

Re: specifying a list of files to transfer

2003-01-16 Thread Wayne Davison
> On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> > You seem to see the files-from as a way of replacing command-line
> > args where i see it as a way of replacing the tree scan.

I actually think of it as both, since I also consider the command-line
as a way of replacing the tree scan.  I think that it is fairly easy to
explain how --files-from works if you explain it in terms of how it is
much like specifying the names on the command-line (and explain what is
different about it).  One thing that I'm hoping to avoid is arbitrary
limits on what the mode can do -- I'd like to see it be an easy way to
specify exactly what files to send, and also as a way to extend the size
of the command-line.

On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> I would rather have the '-r' option ignored when --files-from is in
> effect.

I wouldn't want that as a hard limit.  It would be better to say that
-a doesn't imply -r when --files-from is used, but the user can still
manually specify -r if they want to.  It would be easy to implement this
in a way that would _not_ require any particular order to the options on
the command-line (which I agree with JW would be a very bad idea).
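
One way to get that order independence is to record what was seen while scanning the arguments and only decide at the end.  A hypothetical bash sketch of the rule proposed here (not rsync's actual option parser; it only looks at -a, -r and --files-from and treats any single-dash argument as a bundle of short flags):

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not rsync's option parser: decide whether to recurse
# only after every argument has been seen, so option order cannot matter.
# Rule from the discussion: -r always recurses; -a implies -r unless
# --files-from is also given.
resolve_recursion() {
    local saw_a=0 saw_r=0 files_from=0 arg
    for arg in "$@"; do
        case $arg in
            --files-from|--files-from=*) files_from=1 ;;
            --*) ;;                          # other long options: ignored here
            -*)                              # bundled short flags, e.g. -aR
                case $arg in *a*) saw_a=1 ;; esac
                case $arg in *r*) saw_r=1 ;; esac
                ;;
        esac
    done
    if (( saw_r || (saw_a && !files_from) )); then
        echo on
    else
        echo off
    fi
}
```

Because the decision is made after the loop, "-a --files-from=-" and "--files-from=- -a" both give off, while adding -r anywhere gives on.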

> If people leave out the directories, missing parent directories should
> be automatically created.

(I assume you're talking about when using -R, which is not currently on
by default.)  I believe that we do need an auto-creation mode and also a
way to optimize the transfer to avoid this (since it results in a lot of
extra directory checks that can slow things down when we know that they
aren't needed).  Which one is the default is the current question.  I'm
currently leaning toward going back to sending the implied dirs by
default, and having an option for people to optimize the transfer (which
would allow it to be used with the normal -R mode even when --files-from
was not used).

> As it is now, if somebody just does "find . -print | rsync -a
> --files-from=- ..." are they going to get repeated files because the
> directories are listed?

Rsync would weed out the duplicates, but if -a implied -r in this
context then the presence of the directories would cause rsync to
recurse through all the directory content and thus make this a
horrible thing to do.  A couple alternatives:

  find . -print | rsync -lptgoDR --files-from=- . remote:/dest
  find . -type f -print | rsync -aR --files-from=- . remote:/dest

> Also, if the transfer is being sent from the remote side, the file names
> are all getting sent over to the remote side first for --files-from and
> then sent back as part of the normal protocol, right?  I had hoped we'd
> be able to avoid that round trip because the list could get long.

I don't see a way to avoid this and still allow things like the creation
of the implied directories you mentioned (which must be sent as separate
entries in the current protocol).  I think it's probably best to leave
the current file+info send process alone.

..wayne..



Re: specifying a list of files to transfer

2003-01-16 Thread jw schultz
On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> Also, if the transfer is being sent from the remote side, the file names
> are all getting sent over to the remote side first for --files-from and
> then sent back as part of the normal protocol, right?  I had hoped we'd
> be able to avoid that round trip because the list could get long.

I don't think we can avoid the round trip without changing
rsync drastically.  Just consider that this saves on sending
voluminous --include lists or invoking rsync hundreds of
times.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-16 Thread Dave Dykstra
On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> On Wed, Jan 15, 2003 at 10:03:33AM -0800, Wayne Davison wrote:
> > On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> > > with the -r or -a options does this [recurse] on
> > > directories in the --files-from list?
> > 
> > Yes, it treats them like command-line args with the following two
> > exceptions:  if -r is not specified, we WILL create an explicitly named
> > directory (but not send its contents); if -R is specified, we do NOT
> > create implied directories (which was your next question).  This latter
> > exception means that we currently require the user to ensure that the
> > destination directory tree is valid (which could be done once with a
> > separate rsync --files-from run that didn't use -r and specified all
> > the dirs that we needed to ensure exist).  If this turns out to be too
> > much of a hassle, perhaps a new option named --implied-dirs could be
> > added to have rsync do its normal -R dir handling.
> 
> Plus the third difference, that relative paths in the
> files-from list are relative to the tree, not to the current
> directory.
> 
> We may well want the --implied-dirs option or some logic to
> create it.  If you don't have -r (or -a) you need to have
> all the intermediate dirs listed.  If you do have -r, listing
> intermediate dirs effectively disables the file list.
>
> I'm not sure I like that.  I'm inclined to think the file list
> should disable recursion.  Perhaps [fighting resistance of
> yuck] recursion would have to be specified explicitly after
> the --files-from option.  I hate sequence-sensitive options,
> but requiring users to remember -lptgoD instead of -a may be
> worse.
>
> You seem to see --files-from as a way of replacing
> command-line args, whereas I see it as a way of replacing the
> tree scan.  However, if we can pin down the semantics, I
> think we can achieve both ends.


I agree more with JW.  I envisioned --files-from as replacing the tree
scan.  I would rather have the '-r' option ignored when --files-from
is in effect.  I think it should be a complete list of the files and/or
directories that are to be sent.  If people leave out the directories,
missing parent directories should be automatically created.  As it is
now, if somebody just does "find . -print | rsync -a --files-from=- ..."
are they going to get repeated files because the directories are listed?
Yuck.

Also, if the transfer is being sent from the remote side, the file names
are all getting sent over to the remote side first for --files-from and
then sent back as part of the normal protocol, right?  I had hoped we'd
be able to avoid that round trip because the list could get long.

- Dave




Re: specifying a list of files to transfer

2003-01-15 Thread Lee Eakin
I like it (except for the work).  I don't think I'm ready to code up a
filter yet, but it is good to know the general theory in case I need to use
it.  Of course, if I keep thinking about it I'll probably run across a
situation where its use would greatly speed/simplify my life, so maybe I
should go ahead and throw something together ;).

Yes, the filter would need to sit in the middle for the duration of the
xfer, but I am thinking of those situations where the tree scan is very
costly so the little overhead and complexity of the filter would be worth
it.

  -Lee

---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Date: Wed, 15 Jan 2003 13:34:08 -0800
> 
> On Wed, Jan 15, 2003 at 02:48:05PM -0600, Lee Eakin wrote:
> > Now if I can only figure out a way to intercept the list when I need to
> > be real picky about which individual files are accessed ...
> 
> This should be possible with a filter process.  Here's how the new,
> slightly tweaked protocol works:
> 
> 1. The normal startup exchange occurs up to the point just before where
>    the (normal) file info (names + attributes) starts to flow from the
>    sender to the receiver.
>
> 2a At this point IFF the sender is the remote process (i.e. we're
>    pulling files), the receiver begins to send file names (separated by
>    either newlines or nulls, as indicated by the --null option) over the
>    socket (normally there is no data being sent to the sender during
>    this stage).  The end of the list is marked by an empty entry.  (Note
>    that the receiver begins receiving file info from the sender during
>    this stage, so it must do both things at once without blocking.)  If
>    the recursive flag is set, the receiver may get more names back than
>    it sends out.
>
> 2b Alternately, if the sender is the local process, the normal file info
>    transfer happens (without anything new occurring over the socket).
> 
> 3. The rest of the transfer proceeds as normal.
> 
> So, if a filter understood the protocol enough to be able to pass
> through all the initial rsync data, it could actually look at all the
> names that go over the wire and allow/disallow/tweak them however it
> desired.  (It's sad that this filter would then have to continue to
> relay all the data over the socket after its work was done, but that's
> the price you pay.)  You'd just have to look for the --null option on
> the command-line to know if you're looking for a newline or a null EOL
> character, and stop scanning at the first empty name.
> 
> Alternately, you could just disallow the --files-from option and not
> worry about authorizing the data.
---end quoted text---

-- 
Lee Eakin - [EMAIL PROTECTED]
 
Lynch's Law:
  When the going gets tough, everybody leaves.



Re: specifying a list of files to transfer

2003-01-15 Thread jw schultz
On Wed, Jan 15, 2003 at 10:03:33AM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> > with the -r or -a options does this [recurse] on
> > directories in the --files-from list?
> 
> Yes, it treats them like command-line args with the following two
> exceptions:  if -r is not specified, we WILL create an explicitly named
> directory (but not send its contents); if -R is specified, we do NOT
> create implied directories (which was your next question).  This latter
> exception means that we currently require the user to ensure that the
> destination directory tree is valid (which could be done once with a
> separate rsync --files-from run that didn't use -r and specified all
> the dirs that we needed to ensure exist).  If this turns out to be too
> much of a hassle, perhaps a new option named --implied-dirs could be
> added to have rsync do its normal -R dir handling.

Plus the third difference, that relative paths in the
files-from list are relative to the tree, not to the current
directory.

We may well want the --implied-dirs option or some logic to
create it.  If you don't have -r (or -a) you need to have
all the intermediate dirs listed.  If you do have -r, listing
intermediate dirs effectively disables the file list.

I'm not sure I like that.  I'm inclined to think the file list
should disable recursion.  Perhaps [fighting resistance of
yuck] recursion would have to be specified explicitly after
the --files-from option.  I hate sequence-sensitive options,
but requiring users to remember -lptgoD instead of -a may be
worse.

You seem to see --files-from as a way of replacing
command-line args, whereas I see it as a way of replacing the
tree scan.  However, if we can pin down the semantics, I
think we can achieve both ends.


-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-15 Thread Wayne Davison
On Wed, Jan 15, 2003 at 02:48:05PM -0600, Lee Eakin wrote:
> Now if I can only figure out a way to intercept the list when I need to
> be real picky about which individual files are accessed ...

This should be possible with a filter process.  Here's how the new,
slightly tweaked protocol works:

1. The normal startup exchange occurs up to the point just before where
   the (normal) file info (names + attributes) starts to flow from the
   sender to the receiver.

2a At this point IFF the sender is the remote process (i.e. we're
   pulling files), the receiver begins to send file names (separated by
   either newlines or nulls, as indicated by the --null option) over the
   socket (normally there is no data being sent to the sender during
   this stage).  The end of the list is marked by an empty entry.  (Note
   that the receiver begins receiving file info from the sender during
   this stage, so it must do both things at once without blocking.)  If
   the recursive flag is set, the receiver may get more names back than
   it sends out.

2b Alternately, if the sender is the local process, the normal file info
   transfer happens (without anything new occurring over the socket).

3. The rest of the transfer proceeds as normal.

So, if a filter understood the protocol enough to be able to pass
through all the initial rsync data, it could actually look at all the
names that go over the wire and allow/disallow/tweak them however it
desired.  (It's sad that this filter would then have to continue to
relay all the data over the socket after its work was done, but that's
the price you pay.)  You'd just have to look for the --null option on
the command-line to know if you're looking for a newline or a null EOL
character, and stop scanning at the first empty name.
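Here is a minimal C sketch of the scanning rule just described. It assumes the filter has already buffered the whole name list. Names end with '\n' (or '\0' when --null is on the command line), and the first empty name ends the list. scan_name_list(), name_fn and count_name() are hypothetical names, not part of rsync. A real relay would also have to carry a partly read name over between socket reads, and it would keep forwarding everything that follows the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-name hook: a filter would allow, reject, or rewrite here. */
typedef void (*name_fn)(const char *name, size_t len, void *ctx);

/* Example hook that just counts the names it is shown. */
void count_name(const char *name, size_t len, void *ctx)
{
    (void)name;
    (void)len;
    ++*(size_t *)ctx;
}

/* Scan a --files-from name list: each name ends with '\n', or with '\0'
 * when --null was given, and the first empty name ends the list.
 * Calls ON_NAME (if non-NULL) for every name and returns the number of
 * bytes consumed through the terminating empty entry, or -1 if BUF does
 * not yet contain the end of the list. */
long scan_name_list(const char *buf, size_t len, int use_null,
                    name_fn on_name, void *ctx)
{
    const char eol = use_null ? '\0' : '\n';
    size_t start = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != eol)
            continue;
        if (i == start)                  /* empty entry: end of list */
            return (long)(i + 1);
        if (on_name != NULL)
            on_name(buf + start, i - start, ctx);
        start = i + 1;
    }
    return -1;                           /* end marker not seen yet */
}
```

The returned offset tells the filter where the name list stops. Everything from that point on can be relayed untouched.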

Alternately, you could just disallow the --files-from option and not
worry about authorizing the data.

..wayne..



Re: specifying a list of files to transfer

2003-01-15 Thread Lee Eakin
---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Subject: Re: specifying a list of files to transfer
> Date: Wed, 15 Jan 2003 10:10:29 -0800
> 
> On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> > Yes, people do restrict args via ssh key restrictions.
> 
> OK, I thank you both for enlightening me on the subject.  My current
> patch applies the sanitize_path() function to all names read via the
> --files-from option, regardless of whether we're pushing or pulling.
> This means that all leading slashes are dropped from file names as
> well as all leading "../" prefixes, and that any infix "dir/../"
> combos are removed.  This ensures that we can't get above the root
> dir that was specified on the command-line.
> 

  That's awesome. Now as long as I want to allow access to the given portion
  of the file tree I can allow files-from.

  Now if I can only figure out a way to intercept the list when I need to
  be real picky about which individual files are accessed ...

> > so any sanitize code could first make sure all pathnames begin with a valid
> > module and then make sure the file or dir is really inside that module.
> 
> This isn't needed since the module name is specified on the command-line
> and then all paths are relative to the directory that was specified in
> that module.  For instance:
> 
> rsync --files-from=foo remote::module/bar
> 
> forces all pathnames read to be relative to the bar dir of the module.
> If no "/bar" path was specified, the paths would all be relative to the
> root-dir of the module.

  That's cool too, so no additional/special code to handle server-mode ;)

  I like this a lot, now to test ...
> 
---end quoted text---

  -Lee

-- 
Lee Eakin - [EMAIL PROTECTED]
 
Benchley's Law of Distinction:
  There are two kinds of people in the world, those who believe
there are two kinds of people in the world and those who don't.



Re: specifying a list of files to transfer

2003-01-15 Thread Wayne Davison
On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> Yes, people do restrict args via ssh key restrictions.

OK, I thank you both for enlightening me on the subject.  My current
patch applies the sanitize_path() function to all names read via the
--files-from option, regardless of whether we're pushing or pulling.
This means that all leading slashes are dropped from file names as
well as all leading "../" prefixes, and that any infix "dir/../"
combos are removed.  This ensures that we can't get above the root
dir that was specified on the command-line.
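To make the effect concrete, here is a small C sketch that produces the results described above. Leading slashes and leading "../" disappear, and "dir/.." pairs cancel, so a name can never climb above the root dir. The function is a hypothetical sanitize_relative(), written for illustration only. It is not rsync's sanitize_path(), whose handling of corner cases may differ. Skipping "." and empty components is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrite the file name IN into OUT so the result can never climb above
 * the transfer root: leading slashes vanish, empty and "." components are
 * skipped, "dir/.." pairs cancel, and a ".." with nothing left to cancel
 * is dropped.  Returns 0 on success, or -1 if OUT (OUTSIZE bytes) is too
 * small, in which case OUT must not be used. */
int sanitize_relative(const char *in, char *out, size_t outsize)
{
    size_t olen = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0') {
        const char *end = strchr(in, '/');
        size_t clen = end ? (size_t)(end - in) : strlen(in);

        if (clen == 2 && in[0] == '.' && in[1] == '.') {
            /* Back up over the previous component, if any. */
            while (olen > 0 && out[olen - 1] != '/')
                olen--;
            if (olen > 0)
                olen--;                  /* drop the '/' before it too */
        } else if (clen > 0 && !(clen == 1 && in[0] == '.')) {
            size_t need = olen + (olen > 0 ? 1 : 0) + clen + 1;

            if (need > outsize)
                return -1;
            if (olen > 0)
                out[olen++] = '/';
            memcpy(out + olen, in, clen);
            olen += clen;
        }
        in += clen;
        if (*in == '/')
            in++;
    }
    out[olen] = '\0';
    return 0;
}
```

With this approach "/etc/passwd" comes out as "etc/passwd", and "good_dir/../../bad_dir" comes out as "bad_dir".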

> so any sanitize code could first make sure all pathnames begin with a valid
> module and then make sure the file or dir is really inside that module.

This isn't needed since the module name is specified on the command-line
and then all paths are relative to the directory that was specified in
that module.  For instance:

rsync --files-from=foo remote::module/bar

forces all pathnames read to be relative to the bar dir of the module.
If no "/bar" path was specified, the paths would all be relative to the
root-dir of the module.

..wayne..



Re: specifying a list of files to transfer

2003-01-15 Thread Wayne Davison
On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> with the -r or -a options does this [recurse] on
> directories in the --files-from list?

Yes, it treats them like command-line args with the following two
exceptions:  if -r is not specified, we WILL create an explicitly named
directory (but not send its contents); if -R is specified, we do NOT
create implied directories (which was your next question).  This latter
exception means that we currently require the user to ensure that the
destination directory tree is valid (which could be done once with a
separate rsync --files-from run that didn't use -r and specified all
the dirs that we needed to ensure exist).  If this turns out to be too
much of a hassle, perhaps a new option named --implied-dirs could be
added to have rsync do its normal -R dir handling.
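Until something like --implied-dirs exists, the directory list for that separate non-recursive run could be derived from the file list itself. Here is a minimal C sketch. It assumes the names are already sanitized relative paths, with no leading '/', no "." or "..", and no doubled slashes. For "foo/bar/one" it appends "foo" and "foo/bar", one per line, in the newline-separated form that --files-from reads. append_implied_dirs() is a hypothetical helper, not part of the patch. A directory shared by several files is appended once per file, so duplicates may need removing (for example with sort -u) before use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append each parent directory implied by the relative name PATH to OUT,
 * one per line, shortest first: "foo/bar/one" adds "foo\n" and "foo/bar\n".
 * PATH is assumed to be sanitized already (no leading '/', no "." or "..",
 * no doubled slashes).  OUT must hold a NUL-terminated string, possibly
 * empty.  Returns 0 on success, or -1 if OUTSIZE would be exceeded (in
 * which case OUT keeps whatever whole lines fit). */
int append_implied_dirs(const char *path, char *out, size_t outsize)
{
    size_t olen;

    assert(path != NULL && out != NULL);
    olen = strlen(out);
    for (const char *slash = strchr(path, '/'); slash != NULL;
         slash = strchr(slash + 1, '/')) {
        size_t dlen = (size_t)(slash - path);

        if (olen + dlen + 2 > outsize)   /* room for name, '\n' and NUL */
            return -1;
        memcpy(out + olen, path, dlen);
        olen += dlen;
        out[olen++] = '\n';
        out[olen] = '\0';
    }
    return 0;
}
```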

..wayne..



Re: specifying a list of files to transfer

2003-01-14 Thread Lee Eakin
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 20:33:46 -0800
> 
> On Tue, Jan 14, 2003 at 10:15:42PM -0600, Lee Eakin wrote:
> > ---begin quoted text---
> > > From: jw schultz <[EMAIL PROTECTED]>
> > > Date: Tue, 14 Jan 2003 20:07:58 -0800
> > > 
> > > Nope.  The files-from contents need to be passed on stdin, otherwise
> > > we would hit command-line length limits.  That is why I'm
> > > stressing the fact that allowing paths not within the source
> > > or destination trees specified on the command-line would
> > > bypass your ssh command= wrapper restrictions.
> > > 
> > 
> > Oh, I see now.  Yes that could be a serious hole.  If the remote command
> > included an option (maybe a dummy --files-from) then the ssh wrapper could
> > at least abort and notify when it sees it.
> 
> If you look at Wayne's description of the patch, the remote
> command does have a --files-from=- on its command-line.
> However, it would be a shame to disable that performance
> enhancing facility if we just need to sanitize the contents of
> the files-from list and require that it only specify paths
> relative to the source and dest trees.
> 
> I suppose we could allow an option that would permit
> unsanitized paths.
> 

I would agree.  If the paths are known to be relative (forced to be by the
rsync running where the ssh restriction is) then (assuming the wrapper's
intent is to allow access to the whole sub-tree) it could allow the
files-from option.  If you only want to allow access to specific files,
then it would still have to disallow the option.

I can think of one possible way for the wrapper to find out what files are
being requested, but don't know enough about the interconnect between the
2 rsyncs to know if it would break it (probably).  If the wrapper could
run a modified version of the original command without a destination it
would print out a list of the files (remember that if you do not give a
destination it prints something similar to 'ls -l' with includes and
excludes applied) then it could walk the output and verify all the paths.
If it passed inspection, it could call the real command passing the file
list to stdin itself.  It would have to attach stdout and stderr properly,
and may even have to act as a pass-thru for further data coming on stdin.
It would be complicated, but might be possible if the dummy (no destination
run) did not close off the connection after reading the file list, and/or
the handshaking was clearly documented so non-developers could understand
it.

I only throw this idea out because I would really like files-from to work
even in a restricted-access mode.  It is a BIG win over parsing a large dir
for includes/excludes.

-- 
Lee Eakin - [EMAIL PROTECTED]
 
Murphy's Military Laws:
6. The buddy system is essential to your survival;
   it gives the enemy somebody else to shoot at.



Re: specifying a list of files to transfer

2003-01-14 Thread jw schultz
On Tue, Jan 14, 2003 at 10:15:42PM -0600, Lee Eakin wrote:
> ---begin quoted text---
> > From: jw schultz <[EMAIL PROTECTED]>
> > Date: Tue, 14 Jan 2003 20:07:58 -0800
> > 
> > Nope.  The files-from contents need to be passed on stdin, otherwise
> > we would hit command-line length limits.  That is why I'm
> > stressing the fact that allowing paths not within the source
> > or destination trees specified on the command-line would
> > bypass your ssh command= wrapper restrictions.
> > 
> 
> Oh, I see now.  Yes that could be a serious hole.  If the remote command
> included an option (maybe a dummy --files-from) then the ssh wrapper could
> at least abort and notify when it sees it.

If you look at Wayne's description of the patch, the remote
command does have a --files-from=- on its command-line.
However, it would be a shame to disable that performance
enhancing facility if we just need to sanitize the contents of
the files-from list and require that it only specify paths
relative to the source and dest trees.

I suppose we could allow an option that would permit
unsanitized paths.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-14 Thread Lee Eakin
---begin quoted text---
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 20:07:58 -0800
> 
> Nope.  The files-from contents need to be passed on stdin, otherwise
> we would hit command-line length limits.  That is why I'm
> stressing the fact that allowing paths not within the source
> or destination trees specified on the command-line would
> bypass your ssh command= wrapper restrictions.
> 

Oh, I see now.  Yes that could be a serious hole.  If the remote command
included an option (maybe a dummy --files-from) then the ssh wrapper could
at least abort and notify when it sees it.

  -Lee



Re: specifying a list of files to transfer

2003-01-14 Thread jw schultz
On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> Please pardon my butting in again.  I am not a developer, but I am very
> interested in this option because I've needed it in the past and had to work
> around the lack of it with include/exclude options (I wanted to sync only
> a few files from a large directory, and needed it to work via the daemon
> for anonymous access).  I also maintain the perl wrapper File::Rsync so I
> do my best to understand all of the options so I can handle them properly
> in the perl module.
> 
> ---begin quoted text---
> > From: Wayne Davison <[EMAIL PROTECTED]>
> > Date: Tue, 14 Jan 2003 19:39:49 -0800
> > 
> > On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > > Up till now rsync hasn't touched anything outside of the paths
> > > specified on the command-line.  Changing that would mean access to
> > > rsync via ssh would no longer be restricted, just disabled.
> > 
> > Are you saying that some people have special ssh scripts that check
> > and/or tweak the file names on the command-line to ensure they fall within
> > certain bounds when running rsync commands?  I.e., if someone ran this
> > command:
> > 
> > rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
> > 
> > the remote ssh setup would handle the presence of the extra "/foo/two",
> > "/bar/three" args?  If so, I hadn't realized that people were limiting
> > ssh access by more than the traditional user/group/permissions access.
> > 
> 
> Yes, people do restrict args via ssh key restrictions.  I have done this
> myself on many occasions.  The environment variable SSH_ORIGINAL_COMMAND
> is passed to the actual command called from the command= key option so I
> write a small script to parse thru the variable checking each arg making
> sure they are what I expect (and possibly modifying them).  I also check
> pathnames to make sure they all fit my restrictions.  I then either exec
> rsync, or email the offending command to root if I find an exception
> (the mail also makes debugging easier).
> 
> I assume the remote end will get the expanded contents of files-from so
> ssh command parsing would still work properly.

Nope.  The files-from contents need to be passed on stdin, otherwise
we would hit command-line length limits.  That is why I'm
stressing the fact that allowing paths not within the source
or destination trees specified on the command-line would
bypass your ssh command= wrapper restrictions.

> 
> > > Sanitizing the paths to force them to be relative on pulls
> > > but not pushes would be too asymmetrical for my liking.
> > 
> > I agree that if we find that we want to sanitize the paths in some cases
> > that we should just make it the default for files-from -- i.e. make it
> > where nothing can get beyond the root dir specified on the command-line.
> > 
> > > I'd rather just disallow or sanitize absolute paths.
> 
> If you try to pull a full pathname from a daemon 'rsync remote::/foo' it
> errors out with:
> 
>   ERROR: The remote path must start with a module name not a /
> 
> so any sanitize code could first make sure all pathnames begin with a valid
> module and then make sure the file or dir is really inside that module.
> 
> ---end quoted text---
> 
> -- 
> Lee Eakin - [EMAIL PROTECTED] - Internet/Naming Services, Texas Instruments
>  
> LAWS OF COMPUTER PROGRAMMING:
> II. Any given program costs more and takes longer.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-14 Thread Lee Eakin
Please pardon my butting in again.  I am not a developer, but I am very
interested in this option because I've needed it in the past and had to work
around the lack of it with include/exclude options (I wanted to sync only
a few files from a large directory, and needed it to work via the daemon
for anonymous access).  I also maintain the perl wrapper File::Rsync so I
do my best to understand all of the options so I can handle them properly
in the perl module.

---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 19:39:49 -0800
> 
> On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > Up till now rsync hasn't touched anything outside of the paths
> > specified on the command-line.  Changing that would mean access to
> > rsync via ssh would no longer be restricted, just disabled.
> 
> Are you saying that some people have special ssh scripts that check
> and/or tweak the file names on the command-line to ensure they fall within
> certain bounds when running rsync commands?  I.e., if someone ran this
> command:
> 
> rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
> 
> the remote ssh setup would handle the presence of the extra "/foo/two",
> "/bar/three" args?  If so, I hadn't realized that people were limiting
> ssh access by more than the traditional user/group/permissions access.
> 

Yes, people do restrict args via ssh key restrictions.  I have done this
myself on many occasions.  The environment variable SSH_ORIGINAL_COMMAND
is passed to the actual command called from the command= key option so I
write a small script to parse thru the variable checking each arg making
sure they are what I expect (and possibly modifying them).  I also check
pathnames to make sure they all fit my restrictions.  I then either exec
rsync, or email the offending command to root if I find an exception
(the mail also makes debugging easier).
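Here is a rough C sketch of the kind of check such a wrapper performs. It rests on several loudly stated assumptions:

- The SSH_ORIGINAL_COMMAND string is split on blanks, with no shell quoting.
- It must begin with "rsync --server".
- Option values are attached with '='.
- Every non-option argument other than the "." placeholder must lie under one allowed root, given without a trailing slash.

Following the suggestion elsewhere in this thread, any --files-from option is refused outright, because those names arrive on stdin where the wrapper cannot see them. rsync_command_allowed() is a hypothetical name. The exec-or-mail step is left to the surrounding script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decide whether an SSH_ORIGINAL_COMMAND string may be handed to rsync.
 * Illustrative assumptions: blank-separated words (no quoting), the
 * command starts with "rsync --server", option values use '=', and every
 * non-option argument except "." must be ROOT or lie inside it.
 * --files-from is always refused.  Returns 1 to allow, 0 to deny. */
int rsync_command_allowed(const char *cmd, const char *root)
{
    char buf[4096];
    size_t rlen;
    int ntok = 0;

    assert(cmd != NULL && root != NULL);
    rlen = strlen(root);
    if (strlen(cmd) >= sizeof buf)
        return 0;                        /* too long to check: deny */
    strcpy(buf, cmd);

    for (char *tok = strtok(buf, " \t"); tok; tok = strtok(NULL, " \t")) {
        ntok++;
        if (ntok == 1) {
            if (strcmp(tok, "rsync") != 0)
                return 0;
        } else if (ntok == 2) {
            if (strcmp(tok, "--server") != 0)
                return 0;
        } else if (tok[0] == '-') {
            if (strncmp(tok, "--files-from", 12) == 0)
                return 0;                /* names would bypass this check */
        } else if (strcmp(tok, ".") != 0) {
            /* A path argument: must be ROOT or below it, and (to stay
             * conservative) must not contain ".." anywhere. */
            if (strncmp(tok, root, rlen) != 0 ||
                (tok[rlen] != '\0' && tok[rlen] != '/') ||
                strstr(tok, "..") != NULL)
                return 0;
        }
    }
    return ntok >= 2;
}
```

Each extra path that is smuggled in (as in the "source:dir /foo/two /bar/three" example) is checked on its own. A single out-of-tree argument is therefore enough to deny the whole command.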

I assume the remote end will get the expanded contents of files-from so
ssh command parsing would still work properly.

> > Sanitizing the paths to force them to be relative on pulls
> > but not pushes would be too asymmetrical for my liking.
> 
> I agree that if we find that we want to sanitize the paths in some cases
> that we should just make it the default for files-from -- i.e. make it
> where nothing can get beyond the root dir specified on the command-line.
> 
> > I'd rather just disallow or sanitize absolute paths.

If you try to pull a full pathname from a daemon 'rsync remote::/foo' it
errors out with:

  ERROR: The remote path must start with a module name not a /

so any sanitize code could first make sure all pathnames begin with a valid
module and then make sure the file or dir is really inside that module.

---end quoted text---

-- 
Lee Eakin - [EMAIL PROTECTED] - Internet/Naming Services, Texas Instruments
 
LAWS OF COMPUTER PROGRAMMING:
II. Any given program costs more and takes longer.



Re: specifying a list of files to transfer

2003-01-14 Thread jw schultz
On Tue, Jan 14, 2003 at 07:39:49PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > Up till now rsync hasn't touched anything outside of the paths
> > specified on the command-line.  Changing that would mean access to
> > rsync via ssh would no longer be restricted, just disabled.
> 
> Are you saying that some people have special ssh scripts that check
> and/or tweak the file names on the command-line to ensure they fall within
> certain bounds when running rsync commands?  I.e., if someone ran this
> command:
> 
> rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
> 
> the remote ssh setup would handle the presence of the extra "/foo/two",
> "/bar/three" args?  If so, I hadn't realized that people were limiting
> ssh access by more than the traditional user/group/permissions access.

I don't know whether they correctly handle multiple source
paths on the command line, but there are certainly people
using the command= option in authorized_keys to invoke
special scripts that check and/or tweak the rsync command line
to restrict rsync to pre-approved paths.

> > Sanitizing the paths to force them to be relative on pulls
> > but not pushes would be too asymmetrical for my liking.
> 
> I agree that if we find that we want to sanitize the paths in some cases
> that we should just make it the default for files-from -- i.e. make it
> where nothing can get beyond the root dir specified on the command-line.
> 
> > I'd rather just disallow or sanitize absolute paths.
> 
> Note that it's more pervasive than just absolute paths, since someone
> can use args like "../../../etc/password" or "good_dir/../../bad_dir"
> (all of which the sanitize_path() call handles).

Yes, the relative ../../... paths slipped my mind but that
is a concern as well.

I'm aware that restricting --files-from to having relative
paths is somewhat limiting, but I think it may be the better
approach.  You can always do

rsync --files-from=list / remote:/

If you need to.

I haven't had time yet to closely examine or try it, but I
have two questions:

With the -r or -a options, does this recurse on
directories in the --files-from list?

What happens when there are implied directories that are
missing on the destination?  For example
rsync -a --files-from=list src dest
with the list having
foo/bar/one
will dest/foo and dest/foo/bar be created with the source
directory attributes if they don't exist; will it fail; or
will the missing implied directories be created with umask
perms?

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-14 Thread Wayne Davison
On Tue, Jan 14, 2003 at 09:15:02PM -0600, Lee Eakin wrote:
> FYI, pulling multiple files from a daemon is currently supported (well, it
> works).  Given a module named foo, you can specify:
> 
>   rsync -av 'remote::foo/file1 foo/file5' /tmp

Oh!  I had left off the repeat of the module name when I tried to cajole
the daemon using this "kludge".  Thanks for the info.

..wayne..



Re: specifying a list of files to transfer

2003-01-14 Thread Wayne Davison
On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> Up till now rsync hasn't touched anything outside of the paths
> specified on the command-line.  Changing that would mean access to
> rsync via ssh would no longer be restricted, just disabled.

Are you saying that some people have special ssh scripts that check
and/or tweak the file names on the command-line to ensure they fall within
certain bounds when running rsync commands?  I.e., if someone ran this
command:

rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp

the remote ssh setup would handle the presence of the extra "/foo/two",
"/bar/three" args?  If so, I hadn't realized that people were limiting
ssh access by more than the traditional user/group/permissions access.

> Sanitizing the paths to force them to be relative on pulls
> but not pushes would be too asymmetrical for my liking.

I agree that if we find we want to sanitize the paths in some cases, we
should just make it the default for --files-from, i.e. make it so that
nothing can get beyond the root dir specified on the command-line.

> I'd rather just disallow or sanitize absolute paths.

Note that the problem is broader than just absolute paths, since someone
can use args like "../../../etc/passwd" or "good_dir/../../bad_dir"
(all of which the sanitize_path() call handles).
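To make the "../" point concrete, here is a minimal C sketch of that kind of sanitizing. It is a hypothetical stand-in, not rsync's actual sanitize_path(); the name sanitize_path_sketch and the 4096-byte limit are assumptions. It drops leading slashes and "." components and lets ".." remove only components that were already kept, so the result can never climb above the directory it is resolved against.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAXPATH 4096  /* assumed limit, standing in for MAXPATHLEN */

/* Hypothetical sketch, NOT rsync's sanitize_path(): rewrite PATH in place
 * so it cannot refer to anything above the directory it is resolved in. */
static void sanitize_path_sketch(char *path)
{
    char out[SKETCH_MAXPATH];
    size_t outlen = 0;
    const char *p = path;

    assert(path != NULL);
    if (strlen(path) >= sizeof out) {   /* reject over-long input outright */
        path[0] = '\0';
        return;
    }
    while (*p) {
        while (*p == '/')               /* drop leading and repeated slashes */
            p++;
        if (!*p)
            break;
        const char *start = p;          /* find the end of this component */
        while (*p && *p != '/')
            p++;
        size_t len = (size_t)(p - start);

        if (len == 1 && start[0] == '.')
            continue;                   /* "." contributes nothing */
        if (len == 2 && start[0] == '.' && start[1] == '.') {
            /* ".." removes the last kept component; at the top it is
             * simply discarded, so the path cannot escape upward. */
            while (outlen > 0 && out[outlen - 1] != '/')
                outlen--;
            if (outlen > 0)
                outlen--;
            continue;
        }
        if (outlen > 0)
            out[outlen++] = '/';
        memcpy(out + outlen, start, len);
        outlen += len;
    }
    out[outlen] = '\0';
    memcpy(path, out, outlen + 1);      /* result is never longer than input */
}
```

Under these assumptions "../../../etc/passwd" becomes "etc/passwd" and "good_dir/../../bad_dir" becomes "bad_dir", both relative to the module or source root.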

..wayne..



Re: specifying a list of files to transfer

2003-01-14 Thread Lee Eakin
FYI, pulling multiple files from a daemon is currently supported (well, it
works).  Given a package of foo you can specify:

  rsync -av 'remote::foo/file1 foo/file5' /tmp

It appears the daemon does proper splitting based on either white-space,
or possibly the current value of $IFS in the daemon's environment?

One other note: I did not determine whether it was a Solaris issue, a
string length limit, or a file argument limit, but in my tests I could only
specify about 20 files using this method.  When I went over the limit no
files were xfered.

I was testing this a while back (just to see if I could), so I don't
remember the exact limit, but I am fairly sure I experimented with shorter
pathnames and it did not affect the maximum number of filenames I could specify.

Oh, yes.  I did not have the same limitation over ssh.  The remote shell
seems to pass any number of filenames to the remote end (of course there
may be limits depending on what login shell is used on the remote server).
  -Lee


-- 
Lee Eakin - [EMAIL PROTECTED]
 
With sufficient thrust, pigs fly just fine.  -- RFC 1925



Re: specifying a list of files to transfer

2003-01-14 Thread jw schultz
On Tue, Jan 14, 2003 at 06:41:22PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 04:35:40PM -0800, jw schultz wrote:
> > Absolute paths are bad news here.  Especially when dealing with an
> > rsync daemon.
> 
> Yes, this is something that needs to be dealt with for daemon mode since
> it does not appear to have been possible to specify multiple filenames
> to pull before (unlike remote-shell mode).
> 
> For non-daemon mode, the code is the same as it always was in this
> regard.  For example, this command:
> 
> rsync -av /tmp/one /foo/two /bar/three dest:
> 
> is no different than this command:
> 
> rsync -av --files-from=list /tmp dest:
> 
> where list contains:
> 
> one
> /foo/two
> /bar/three

So in dest: you get
one
two
three

and if /foo/two and /bar/three are directories, they are
recursed into due to -a?

If so that would be OK, security wise for a push.

But we don't want
rsync -av --files-from=list source:dir /tmp
to allow pulling from source:/foo/two or source:/bar/three

Up till now rsync hasn't touched anything outside of the
paths specified on the command-line.  Changing that would
mean access to rsync via ssh would no longer be
restricted, just disabled.

Sanitizing the paths to force them to be relative on pulls
but not pushes would be too asymmetrical for my liking.  I'd
rather just disallow or sanitize absolute paths.

> 
> In the patch I posted earlier, daemon mode did not work with the new
> --files-from option.  My latest patch has this fixed:
> 
> http://www.clari.net/~wayne/rsync-files-from.patch
> 
> And it also runs the filenames through sanitize_path() in daemon mode
> (when chroot is not specified, at least -- I haven't tested a chroot
> version yet).

chroot changes the whole meaning of absolute paths anyway.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt



Re: specifying a list of files to transfer

2003-01-14 Thread Wayne Davison
On Tue, Jan 14, 2003 at 04:35:40PM -0800, jw schultz wrote:
> Absolute paths are bad news here.  Especially when dealing with an
> rsync daemon.

Yes, this is something that needs to be dealt with for daemon mode since
it does not appear to have been possible to specify multiple filenames
to pull before (unlike remote-shell mode).

For non-daemon mode, the code is the same as it always was in this
regard.  For example, this command:

rsync -av /tmp/one /foo/two /bar/three dest:

is no different than this command:

rsync -av --files-from=list /tmp dest:

where list contains:

one
/foo/two
/bar/three
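Put another way, each line of the list is resolved against the SRC argument unless it is already absolute. A minimal C sketch of that rule follows; resolve_list_entry is a hypothetical name, and the posted patch gets the same effect differently, by calling push_dir() on the SRC directory and then using the names as read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of rsync or the posted patch): map one
 * --files-from entry to the source path it names.  Absolute entries are
 * used unchanged; relative ones are joined to the SRC directory.
 * Returns 0 on success, -1 if the result did not fit in OUT. */
static int resolve_list_entry(const char *srcdir, const char *entry,
                              char *out, size_t outsize)
{
    int n;

    assert(srcdir != NULL && entry != NULL && out != NULL);
    if (entry[0] == '/') {
        n = snprintf(out, outsize, "%s", entry);
    } else {
        size_t dlen = strlen(srcdir);
        /* avoid a doubled slash when SRC is "/" or ends in '/' */
        const char *sep = (dlen > 0 && srcdir[dlen - 1] == '/') ? "" : "/";
        n = snprintf(out, outsize, "%s%s%s", srcdir, sep, entry);
    }
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

With a SRC of /tmp, the list one, /foo/two, /bar/three resolves to /tmp/one, /foo/two and /bar/three, the same three sources as the first command.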

In the patch I posted earlier, daemon mode did not work with the new
--files-from option.  My latest patch has this fixed:

http://www.clari.net/~wayne/rsync-files-from.patch

And it also runs the filenames through sanitize_path() in daemon mode
(when chroot is not specified, at least -- I haven't tested a chroot
version yet).

..wayne..



Re: specifying a list of files to transfer

2003-01-14 Thread jw schultz
On Tue, Jan 14, 2003 at 03:57:51PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> > 1. Yes it should take a filename or - as a parameter.
> > 2. I don't like the idea of skipping the SRC spec.  Paths should be
> > relative to the SRC.  If somebody wants to use full paths they
> > can always have a SRC of "/".
> > 3. It should be called --files-from.
> > 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
> > they should be automatically turned on with the --files-from option.
> 
> OK, I'm also fine with these points.  Note RE comment #2: even though
> the relative path names now default to the SRC dir, the user can still
> include absolute path names in the list and rsync will transfer them

Absolute paths are bad news here, especially when
dealing with an rsync daemon.  They allow the user to
defeat any location restrictions: not only the module
boundaries of an rsync daemon but also the kinds of
restrictions that someone might set up using command=
wrappers in ssh.

> without problem.  Also, I think the older implementation of --files-from
> implied the -R (--relative) option, and this implementation does not.
> 
> So, here's a *VERY EARLY* implementation that can transfer files in
> either direction.  It adds the option --files-from=FILE and the option
> --null (for null-terminated names).  "FILE" can be "-" for stdin.  This
> patch is relative to the CVS version, and is only for those that want to
> assist in implementation, design, and/or testing.  **I have not tested
> daemon mode at all yet, just simple ssh transfers in both directions.**
> 
> Compatibility note:  when pushing files, the --files-from mode will work
> with any older version of rsync that we can transfer files with.  When
> pulling files, the remote rsync must understand the "--files-from=-"
> option (which tells it to read the file list over the stdin-socket since
> it's combined with the --server option).

That's good at least temporarily, probably permanently.




Re: specifying a list of files to transfer

2003-01-14 Thread Wayne Davison
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> 1. Yes it should take a filename or - as a parameter.
> 2. I don't like the idea of skipping the SRC spec.  Paths should be
>   relative to the SRC.  If somebody wants to use full paths they
>   can always have a SRC of "/".
> 3. It should be called --files-from.
> 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
>   they should be automatically turned on with the --files-from option.

OK, I'm also fine with these points.  Note RE comment #2: even though
the relative path names now default to the SRC dir, the user can still
include absolute path names in the list and rsync will transfer them
without problem.  Also, I think the older implementation of --files-from
implied the -R (--relative) option, and this implementation does not.
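Since -R is not implied here, only the last component of each listed name is kept on the receiving side, while with -R the full path is recreated under the destination. A hypothetical C sketch of that mapping (dest_name is an illustrative name, not rsync code, and entries are assumed to have no trailing slash):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: compute where a listed file lands under DEST.
 * relative == 0 (no -R): keep only the last path component.
 * relative != 0 (-R):    keep the whole path, minus any leading '/'. */
static void dest_name(const char *dest, const char *entry, int relative,
                      char *out, size_t outsize)
{
    const char *name = entry;

    assert(dest != NULL && entry != NULL && out != NULL && outsize > 0);
    if (relative) {
        while (*name == '/')
            name++;
    } else {
        const char *slash = strrchr(entry, '/');
        if (slash != NULL)
            name = slash + 1;
    }
    snprintf(out, outsize, "%s/%s", dest, name);
}
```

Under this sketch "/foo/two" lands at dest/two without -R and at dest/foo/two with it.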

So, here's a *VERY EARLY* implementation that can transfer files in
either direction.  It adds the option --files-from=FILE and the option
--null (for null-terminated names).  "FILE" can be "-" for stdin.  This
patch is relative to the CVS version, and is only for those that want to
assist in implementation, design, and/or testing.  **I have not tested
daemon mode at all yet, just simple ssh transfers in both directions.**
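The patch that follows calls a read_filesfrom_line() helper whose body is not shown in this excerpt. As a rough idea of what such a reader has to do, here is a hypothetical C sketch (read_list_entry and open_list are illustrative names, not rsync's) that takes one name at a time from a file, or from stdin for "-", splitting on newlines or, in the --null case, on NUL bytes as produced by find -print0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: "-" selects standard input, as proposed for --files-from=-. */
static FILE *open_list(const char *arg)
{
    return strcmp(arg, "-") == 0 ? stdin : fopen(arg, "r");
}

/* Hypothetical reader: copy the next name from FP into BUF.  Names end at
 * '\n', or at a NUL byte when null_terminated is set (the --null case).
 * Empty entries are skipped and over-long names are truncated to
 * bufsize - 1 bytes.  Returns the name's length, or 0 at end of input. */
static size_t read_list_entry(FILE *fp, char *buf, size_t bufsize,
                              int null_terminated)
{
    const int eol = null_terminated ? '\0' : '\n';
    int c;

    assert(fp != NULL && buf != NULL && bufsize > 0);
    for (;;) {
        size_t len = 0;
        while ((c = getc(fp)) != EOF && c != eol) {
            if (len + 1 < bufsize)
                buf[len++] = (char)c;
        }
        buf[len] = '\0';
        if (len > 0)
            return len;      /* a non-empty name was read */
        if (c == EOF)
            return 0;        /* input exhausted */
        /* otherwise it was an empty entry: keep looking */
    }
}
```

Returning 0 at end of input lets a caller drive it with the same "read until 0" loop shape that the patch uses around read_filesfrom_line().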

Compatibility note:  when pushing files, the --files-from mode will work
with any older version of rsync that we can transfer files with.  When
pulling files, the remote rsync must understand the "--files-from=-"
option (which tells it to read the file list over the stdin-socket since
it's combined with the --server option).

Aside:  there was a huge chunk of code in main.c that was not indented
correctly (due to the addition of some read_batch stuff).  I didn't want
to march the code off the edge of the screen any further, so I made the
read_batch code use a goto.  Those that have a weak stomach may wish to
avert their gaze from that portion of the patch.

..wayne..

Index: flist.c
--- flist.c 24 Dec 2002 07:42:04 -  1.127
+++ flist.c 14 Jan 2003 23:44:21 -
@@ -41,6 +41,8 @@
 extern int cvs_exclude;
 
 extern int recurse;
+extern char *files_from;
+extern int files_from_fd;
 
 extern int one_file_system;
 extern int make_backups;
@@ -680,7 +682,7 @@
if (noexcludes)
goto skip_excludes;
 
-   if (S_ISDIR(st.st_mode) && !recurse) {
+   if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
rprintf(FINFO, "skipping directory %s\n", fname);
return NULL;
}
@@ -876,12 +878,13 @@
  **/
 struct file_list *send_file_list(int f, int argc, char *argv[])
 {
-   int i, l;
+   int l;
STRUCT_STAT st;
char *p, *dir, *olddir;
char lastpath[MAXPATHLEN] = "";
struct file_list *flist;
int64 start_write;
+   int use_ff_fd = 0;
 
if (show_filelist_p() && f != -1)
start_filelist_progress("building file list");
@@ -890,16 +893,33 @@
 
flist = flist_new();
 
-   if (f != -1) {
+   if (f != -1)
io_start_buffering(f);
+
+   if (files_from && f != -1) {
+   if (!push_dir(argv[0], 0)) {
+   rprintf(FERROR, "push_dir %s : %s\n",
+   argv[0], strerror(errno));
+   exit_cleanup(RERR_FILESELECT);
+   }
+   use_ff_fd = 1;
}
 
-   for (i = 0; i < argc; i++) {
+   while (1) {
char *fname = topsrcname;
 
-   strlcpy(fname, argv[i], MAXPATHLEN);
+   if (use_ff_fd) {
+   l = read_filesfrom_line(files_from_fd, fname);
+   if (!l)
+   break;
+   }
+   else {
+   if (argc-- == 0)
+   break;
+   strlcpy(fname, *argv++, MAXPATHLEN);
+   l = strlen(fname);
+   }
 
-   l = strlen(fname);
if (l != 1 && fname[l - 1] == '/') {
if ((l == 2) && (fname[0] == '.')) {
/*  Turn ./ into just . rather than ./.
@@ -922,7 +942,7 @@
continue;
}
 
-   if (S_ISDIR(st.st_mode) && !recurse) {
+   if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
rprintf(FINFO, "skipping directory %s\n", fname);
continue;
}
@@ -940,7 +960,7 @@
dir = fname;
fname = p + 1;
}
-   } else if (f != -1 && (p = strrchr(fname, '/'))) {
+   } else if (f != -1 && !files_from && (p=strrchr(fname,'/'))) {
/* this ensures we send the intermediate directories,
   thus getting their permissions right */

Re: specifying a list of files to transfer

2003-01-14 Thread Andrew J. Schorr
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> I haven't looked at the implementation, but comments on the user
> interface:
> 1. Yes it should take a filename or - as a parameter.
> 2. I don't like the idea of skipping the SRC spec.  Paths should be
>   relative to the SRC.  If somebody wants to use full paths they
>   can always have a SRC of "/".
> 3. It should be called --files-from.
> 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
>   they should be automatically turned on with the --files-from option.

Those comments all sound reasonable to me.  The only reason I broke
out the --send-dirs and --no-implicit-dirs options was because they
were orthogonal to what I was doing and could potentially also apply
to situations where the user was specifying various SRC filenames
on the command line.  But it's certainly fine to have --files-from turn
those on automatically.

-Andy



Re: specifying a list of files to transfer

2003-01-14 Thread Dave Dykstra
On Tue, Jan 14, 2003 at 01:21:29PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 11:02:44AM -0500, Andrew J. Schorr wrote:
> > I am attaching an updated version of my patch to allow you to specify
> > a list of files to transfer.
> 
> Cool.  I'm looking into making this work when fetching files.  Towards
> that end, I'd like to suggest an alternate command-line syntax to make
> the --source-list option take a filename.  This will allow it to accept
> "-" as stdin, and will make it easy to parse for the pull syntax.  This
> means that we need to omit the SRC spec on a push or specify it as
> empty.  E.g. these will all work:
> 
>   rsync --source-list=file remote:/path
>   rsync --source-list file : remote:/path
>   rsync --source-list=- "" remote:/path
> 
> A pull looks like this:
> 
>   rsync --source-list=file remote: /path
>   rsync --source-list - remote::module /path
> 
> What do people think?  Of course this is not for the rsync release we're
> currently working on, but could be included as a patch, if desired.

I haven't looked at the implementation, but comments on the user
interface:
1. Yes it should take a filename or - as a parameter.
2. I don't like the idea of skipping the SRC spec.  Paths should be
relative to the SRC.  If somebody wants to use full paths they
can always have a SRC of "/".
3. It should be called --files-from.
4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
they should be automatically turned on with the --files-from option.

- Dave



Re: specifying a list of files to transfer

2003-01-14 Thread Wayne Davison
On Tue, Jan 14, 2003 at 11:02:44AM -0500, Andrew J. Schorr wrote:
> I am attaching an updated version of my patch to allow you to specify
> a list of files to transfer.

Cool.  I'm looking into making this work when fetching files.  Towards
that end, I'd like to suggest an alternate command-line syntax to make
the --source-list option take a filename.  This will allow it to accept
"-" as stdin, and will make it easy to parse for the pull syntax.  This
means that we need to omit the SRC spec on a push or specify it as
empty.  E.g. these will all work:

  rsync --source-list=file remote:/path
  rsync --source-list file : remote:/path
  rsync --source-list=- "" remote:/path



specifying a list of files to transfer

2003-01-14 Thread Andrew J. Schorr
Hi,

I don't want to start another --files-from war, but I am attaching
an updated version of my patch to allow you to specify a list
of files to transfer.  The normal rsync syntax allows you to specify
a list of SRC files to transfer on the command line.  This patch
adds some new options to allow you to instead supply a file that
contains a list of files to transfer.

The previous version of the patch was against rsync-2.4.6; this version
works for rsync-2.5.5.  The only real changes relate to the use of
the popt option parsing library in 2.5 (not used in 2.4).  This had
the minor effect of removing the possibility of using "-" to indicate
stdin since the new library seems to interpret this as an option and
barfs.  So instead I allow the use of "/dev/stdin".

By the way, this patch should also work against rsync-2.5.6pre1 except
for a couple of changes relating to white space and comments.  So a couple
of patch hunks are rejected but are easy to fix by hand.  If there is a
need, I can post an updated patch.

Last time we discussed this, Dave Dykstra objected to this patch
for two reasons:

   1. This patch only works in a single direction: when sending from a local
  system to a remote system.  It does not handle the case where you
  are receiving from a remote system to a local system.
   
   2. This capability is possible to achieve by specifying a list
  of files with --include-from and then adding --exclude '*' to
  ignore other files.  While this is true, it turns out to be
  much slower.  I have finally run a performance test to demonstrate
  this.  Results are below.

The basic idea of the patch is to handle the case where you already know
a list of files that might need to be updated and don't want to use
rsync's recursive directory tree scanning logic to enumerate all files.
The patch adds the following options:

 --source-list       SRC arg will be a (local) file name containing a list of
                     files, or /dev/stdin
 --null              used with --source-list to indicate that the file names
                     will be separated by null (zero) bytes instead of linefeed
                     characters; useful with gfind -print0
 --send-dirs         send directory entries even though not in recursive mode
 --no-implicit-dirs  do not send implicit directories (parents of the file
                     being sent)

The --source-list option allows you to supply an explicit list of filenames
to transport without using the --recursive feature and without playing
around with include and exclude files.  As discussed below, the same
thing can be done by combining --recursive with --include-from and --exclude,
but it's significantly slower and more arcane to do it that way.

The --null flag allows you to handle files with embedded linefeeds.  This
is in the style of gnu find's -print0 operator.

The --send-dirs option overcomes a problem where rsync refuses to send directories
unless it's in recursive mode.  One needs this to make sure that even
empty directories get mirrored.

And the --no-implicit-dirs option turns off the default behavior in which
all the parent directories of a file are transmitted before sending the
file.  That default behavior is very inefficient in my scenario where I
am taking the responsibility for sending those directories myself.
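The implicit directories for a file are simply its leading path prefixes: for foo/bar/one they are foo and foo/bar. A hypothetical C sketch of enumerating them (list_implied_dirs and SKETCH_NAMELEN are illustrative names, not rsync code), in the shortest-first order in which parents have to exist:

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NAMELEN 256   /* assumed per-name limit for this sketch */

/* Hypothetical sketch: collect the parent directories implied by PATH,
 * shortest first, e.g. "foo/bar/one" -> "foo", "foo/bar".  These are the
 * entries sent ahead of a file by default, which --no-implicit-dirs
 * would leave out.  Returns how many were stored in OUT. */
static int list_implied_dirs(const char *path, char out[][SKETCH_NAMELEN],
                             int max)
{
    int n = 0;
    const char *p = path;

    assert(path != NULL && out != NULL);
    while (n < max && (p = strchr(p, '/')) != NULL) {
        size_t len = (size_t)(p - path);
        /* skip the empty prefix of an absolute path and over-long names */
        if (len > 0 && len < SKETCH_NAMELEN) {
            memcpy(out[n], path, len);
            out[n][len] = '\0';
            n++;
        }
        p++;   /* continue searching after this slash */
    }
    return n;
}
```

For a list of 5059 changed files spread over a deep tree, many files share the same parents, which is why sending these prefixes for every file adds up when the list already names the directories.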

And now for a performance test:

I have a directory tree containing 128219 files of which 16064 are
directories.

To start the test, I made a list of files that had changed in the
past day:

   find . -mtime -1 -print > /tmp/changed

(normally my list of candidate files is generated by some other means;
this is just a test example).  There were 5059 entries in /tmp/changed.

I used my new options to sync up these files to another host
as follows:

  time rsync -RlHptgoD --numeric-ids --source-list \
--send-dirs --no-implicit-dirs -xz --stats /dev/stdin \
remotehost:/extra_disk/tmp/tree1 < /tmp/changed

Here were the reported statistics:

 Number of files: 5059
 Number of files transferred: 5056
 Total file size: 355514100 bytes
 Total transferred file size: 355514100 bytes
 Literal data: 355514100 bytes
 Matched data: 0 bytes
 File list size: 139687
 Total bytes written: 154858363
 Total bytes read: 80916

 wrote 154858363 bytes  read 80916 bytes  364992.41 bytes/sec
 total size is 355514100  speedup is 2.29

And the time statistics:

  112.53u 8.82s 7:03.92 28.6%
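As a check on these figures: rsync's reported "speedup" is the total file size divided by the bytes actually written and read, and the csh-style time line gives user CPU, system CPU, elapsed time and CPU share:

$$\text{speedup} = \frac{355514100}{154858363 + 80916} = \frac{355514100}{154939279} \approx 2.29$$

$$\text{CPU share} = \frac{112.53 + 8.82}{7 \cdot 60 + 3.92} = \frac{121.35}{423.92} \approx 28.6\%$$

Since every byte went as literal data, the gap between the 355514100 bytes of literal data and the 154858363 bytes written is presumably due to the -z compression in the command.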

I then ran the same command again (in which case there was nothing to
transfer).  Here's how long it took:

0.54u 0.62s 0:08.61 13.4%

Now to compare with the recursive method using --include-from.  First, we
must create the list of files.  In the case of include-from, we need to
include all the parent directories as include patterns.  The following
gawk seems to do the job:

  gawk '$0 != "./" {sub(/^\.\//,"")} {while ((length > 0) && !($0 in already)) 
{print "/"$0; already[$0] = 1;