FWD: Re: specifying a list of files to transfer
On Fri, Jan 17, 2003 at 05:42:41PM -0800, Wayne Davison wrote:
> On Fri, Jan 17, 2003 at 04:21:59PM -0800, jw schultz wrote:
> > It should not do /root2/i386/etc/init.d/rsyncd and so on as
> > -R would have it.
>
> -R would only do that if you actually prefixed the paths with the source
> dir, which is not what happens with --files-from. The source dir is
> just used as the default dir. So, your example works exactly as you are
> expecting. I.e., this set of commands:
>
>     cd /some/path
>     rsync -R `cat /tmp/files` remote:/dest
>
> works much like this new command:
>
>     rsync --files-from=/tmp/files /some/path remote:/dest
>
> Except that it also transfers any named dirs in the input file (without
> -r and without recursing). Note also that this reflects the new default
> of -R being enabled by default when --files-from is specified.
>
> If the user wants the extra dirs prefixed from the source spec, they
> just need to specify them as part of the dest:
>
>     rsync --files-from=/tmp/files /some/path remote:/dest/some/path

Great! It seems _I_ missed something. I think it is the difference between the behavior of list items and command-line items that threw me. Sometimes it helps to actually use an example. We'll have to make sure the manpage is very clear.

> > I hope this points out clearly the difference in our perspectives on
> > this. I am not talking about a way to extend the command line. I am
> > talking about an explicit list that eliminates the tree walk and
> > awkwardness of artificial include/exclude lists [...]
>
> Sorry, but I don't see any conflict in our perspectives at all. Let me
> know if I'm missing something.

It sounds like the only remaining issues (mostly implementation details) are:

    implied directories (resolved, I think)
    when to recurse directories in the list
    duplicate dirs

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt
--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: specifying a list of files to transfer
On Fri, Jan 17, 2003 at 04:21:59PM -0800, jw schultz wrote:
> It should not do /root2/i386/etc/init.d/rsyncd and so on as
> -R would have it.

-R would only do that if you actually prefixed the paths with the source
dir, which is not what happens with --files-from. The source dir is
just used as the default dir. So, your example works exactly as you are
expecting. I.e., this set of commands:

    cd /some/path
    rsync -R `cat /tmp/files` remote:/dest

works much like this new command:

    rsync --files-from=/tmp/files /some/path remote:/dest

Except that it also transfers any named dirs in the input file (without
-r and without recursing). Note also that this reflects the new default
of -R being enabled by default when --files-from is specified.

If the user wants the extra dirs prefixed from the source spec, they
just need to specify them as part of the dest:

    rsync --files-from=/tmp/files /some/path remote:/dest/some/path

> I hope this points out clearly the difference in our perspectives on
> this. I am not talking about a way to extend the command line. I am
> talking about an explicit list that eliminates the tree walk and
> awkwardness of artificial include/exclude lists [...]

Sorry, but I don't see any conflict in our perspectives at all. Let me
know if I'm missing something.

..wayne..
Re: specifying a list of files to transfer
---begin quoted text---
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Fri, 17 Jan 2003 16:21:59 -0800
>
> On Fri, Jan 17, 2003 at 04:42:51PM -0600, Dave Dykstra wrote:
> > On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> >
> > Oh, right, I hadn't thought of that implication of the way this is
> > implemented. Definitely we want the -R functionality implied. That's
> > the only way I can imagine people wanting to use this.
>

I can think of a couple of uses for a --no-relative option. It would not be the common case; I agree with the examples below. They illustrate both the common case and the exception quite well.

I can see a case where you want to back up several critical files from one system to a single (flat) directory on another. The flattened example below would work well for this. Of course the example also shows a filename stepping on another, but since --no-relative would be the exception instead of the default, the user can deal with it (they explicitly asked for it, after all).

I can also see a case where you have several files in a single directory that you want to update from a master repository, but the repository has them spread out in different dirs (maybe due to different files for different architectures). This option could allow you to update, say, /usr/local/bin, pulling from several known locations saved in the distlist file.

Sorry, just had to throw this in. I understand the desire to avoid feeping creaturism. Making software more useful to more people without hideous bloat is a very difficult balance.

-Lee

>
>     rsync -lptgoDu --delete --files-from=distlist distserver::8.0/i386 /root2
> where distlist is
>     etc/init.d/rsyncd
>     etc/rsyncd.conf
>     usr/bin/rsync
>     usr/bin/rsyncstats
>     usr/sbin/rcrsyncd
>     usr/sbin/rsyncd
>     usr/share/doc/packages/rsync
>     usr/share/doc/packages/rsync/COPYING
>     usr/share/doc/packages/rsync/README
>     usr/share/doc/packages/rsync/tech_report.ps
>     usr/share/doc/packages/rsync/tech_report.tex
>     usr/share/man/man1/rsync.1.gz
>     usr/share/man/man5/rsyncd.conf.5.gz
>
> It should not do /root2/i386/etc/init.d/rsyncd and so on as
> -R would have it.
>
> It should not create (flattened)
>     /root2/rsyncd             # from /etc/init.d
>     /root2/rsyncd.conf
>     /root2/rsync
>     /root2/rsyncstats
>     /root2/rcrsyncd
>     /root2/rsyncd             # from usr/sbin?
>     /root2/COPYING
>     /root2/README
>     /root2/tech_report.ps
>     /root2/tech_report.tex
>     /root2/rsync.1.gz
>     /root2/rsyncd.conf.5.gz
>
> What it should create or update is /root2/etc/init.d/rsyncd and so on,
> and it should be equivalent to
>     rsync -lptgoDu --delete --files-from=distlist \
>         distserver:/data/distribution/8.0/i386 /root2
> or
>     rsync -lptgoDu --delete --files-from=distlist \
>         /data/distribution/8.0/i386 client:/root2
>
> If /root2/usr/share/doc/packages doesn't exist it should be
> created with perms from source but it should not be recursed.
>
> This example is drawn from one of the most recent emails
> requesting this feature.
>
---end quoted text---
--
Lee Eakin - [EMAIL PROTECTED]
Life's not fair, but the root password helps.
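Lee's --no-relative scenario and the flattened listing both depend on one risk: two list entries with the same basename land on the same destination name once their directories are stripped. Below is a minimal sketch, assuming a POSIX shell with awk, that reports such collisions before a flattened transfer is attempted. The function name flatten_collisions is hypothetical and is not part of rsync.

```shell
# flatten_collisions: read relative paths (one per line) on stdin and print
# each basename that two or more entries would share if the list were
# flattened into a single directory. Hypothetical helper, not an rsync feature.
flatten_collisions() {
  awk '
  {
    path = $0
    sub(/\/+$/, "", path)               # ignore a trailing slash on directory entries
    n = split(path, part, "/")
    base = part[n]
    if (++count[base] == 2) print base  # report each colliding name only once
  }'
}
```

Fed the distlist from this thread, it reports rsyncd (etc/init.d/rsyncd vs usr/sbin/rsyncd) and also rsync, because the directory entry usr/share/doc/packages/rsync has the same basename as usr/bin/rsync.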
Re: specifying a list of files to transfer
On Fri, Jan 17, 2003 at 04:42:51PM -0600, Dave Dykstra wrote:
> On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> > On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> > > [...] and that entries therein are not flattened like they would be on
> > > the command-line (sans -R).
> >
> > But they *are* flattened exactly like on the command-line, at least in
> > my current patch they are. That's what -R is for -- telling rsync not
> > to do that. So, without -R there are no implied directories to create
> > except for the destination dir (which is created if it doesn't exist).
>
> Oh, right, I hadn't thought of that implication of the way this is
> implemented. Definitely we want the -R functionality implied. That's
> the only way I can imagine people wanting to use this.
>
> > > The permissions and ownership should be derived from the source.
> > > so effectively it should be as though
> > > ./deltapics
> > > were in the file list.
> >
> > Right. In fact, that's exactly what happens with -R -- all intermediate
> > directories get added to the file list (if they aren't already in it)
> > without causing any extra recursion (even if -r was specified/implied).
>
> In my former hack implementation of the "exclude optimization" (when
> there were only includes with no wildcards and a final exclude '*') it
> was able to skip sending the parent directories completely. Come to
> think of it, I'm not sure what kind of permissions were used for the
> directories that were not explicitly included; maybe it just used the
> default.
>
> > If people want the "--files-from" to imply "-R" then I'd want to see a
> > "--no-relative" option to let people turn it off.
>
> That would be easy to implement so I guess it wouldn't hurt but I really
> can't see people wanting to do that.

    rsync -lptgoDu --delete --files-from=distlist distserver::8.0/i386 /root2

where distlist is

    etc/init.d/rsyncd
    etc/rsyncd.conf
    usr/bin/rsync
    usr/bin/rsyncstats
    usr/sbin/rcrsyncd
    usr/sbin/rsyncd
    usr/share/doc/packages/rsync
    usr/share/doc/packages/rsync/COPYING
    usr/share/doc/packages/rsync/README
    usr/share/doc/packages/rsync/tech_report.ps
    usr/share/doc/packages/rsync/tech_report.tex
    usr/share/man/man1/rsync.1.gz
    usr/share/man/man5/rsyncd.conf.5.gz

It should not do /root2/i386/etc/init.d/rsyncd and so on as -R would have it.

It should not create (flattened)

    /root2/rsyncd             # from /etc/init.d
    /root2/rsyncd.conf
    /root2/rsync
    /root2/rsyncstats
    /root2/rcrsyncd
    /root2/rsyncd             # from usr/sbin?
    /root2/COPYING
    /root2/README
    /root2/tech_report.ps
    /root2/tech_report.tex
    /root2/rsync.1.gz
    /root2/rsyncd.conf.5.gz

What it should create or update is /root2/etc/init.d/rsyncd and so on, and it should be equivalent to

    rsync -lptgoDu --delete --files-from=distlist \
        distserver:/data/distribution/8.0/i386 /root2

or

    rsync -lptgoDu --delete --files-from=distlist \
        /data/distribution/8.0/i386 client:/root2

If /root2/usr/share/doc/packages doesn't exist it should be created with perms from source, but it should not be recursed.

This example is drawn from one of the most recent emails requesting this feature.

I want to thank Wayne for his work on this and his patience with me. I seem to be butting heads with him while he has been good enough to actually write code.

I hope this points out clearly the difference in our perspectives on this. I am not talking about a way to extend the command line. I am talking about an explicit list that eliminates the tree walk and awkwardness of artificial include/exclude lists and has a similar effect to

    while IFS= read -r subpath
    do
        rsync -lptgoD "distserver::8.0/i386/$subpath" "/root2/$subpath"
    done < distlist
Re: specifying a list of files to transfer
On Fri, Jan 17, 2003 at 11:46:33AM -0500, Andrew J. Schorr wrote:
> Is there a possible case where
> the --files-from file list lives on the remote (sender) side?

Yes, I could see that being possible -- your update scenario is even an interesting example. It's actually easy to have the remote sender open the file list on the sending side with the addition of a little code that allows the --files-from name to be prefixed by a hostname (that must match the sender's hostname).

> Come to think of it, if the data lives on the remote server, where
> would a local files-from list come from? How would it be generated?

Since the list is manually generated, I imagine that either the user has advance knowledge of what is to be grabbed or the user first runs a remote command (via ssh, perhaps) that generates the list.

..wayne..
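Wayne's idea of letting the --files-from name carry a hostname prefix can be sketched as a small decision function. This is only an illustration under assumptions: the "host:path" spelling is hypothetical (the thread does not settle the syntax), files_from_side is an invented helper name, and nothing here reflects how rsync itself parses the option.

```shell
# files_from_side: decide whether a --files-from argument names a local list
# or a list on the sending host. Assumes (hypothetically) a "host:path"
# spelling in which the host part must match the sender's hostname.
#   $1 = the --files-from argument, $2 = the sender's hostname
# Prints "local PATH" or "remote PATH"; returns 1 on a host mismatch.
files_from_side() {
  arg=$1
  sender=$2
  case $arg in
    *:*) host=${arg%%:*} ;;                  # text before the first ":"
    *)   echo "local $arg"; return 0 ;;      # no colon: a plain local path
  esac
  case $host in
    '' | */*) echo "local $arg"; return 0 ;; # e.g. "/x:y" or "./a:b" are paths
  esac
  path=${arg#*:}                             # text after the first ":"
  if [ "$host" = "$sender" ]; then
    echo "remote $path"
  else
    echo "files-from host '$host' does not match sender '$sender'" >&2
    return 1
  fi
}
```

Treating any name with a "/" before its first colon as a local path is one way to keep ordinary filenames that contain colons usable.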
Re: specifying a list of files to transfer
On Fri, Jan 17, 2003 at 12:14:13PM -0500, Andrew J. Schorr wrote:
> If I run rsync in such a way that parent directories are sent automatically,
> it will send the following files (based on -vvv output):
>
>     make_file(4,test)
>     make_file(4,test/foo.jpg)
>     make_file(4,test/bar.jpg)
>     make_file(4,test)
>     make_file(4,test/sub)
>     make_file(4,test/sub/foo.jpg)
>     make_file(4,test)
>     make_file(4,test/zeke.jpg)

Yeesh, that's bad. It sends all these duplicates in the file list, but then goes through the list and tries to remove duplicates, so it doesn't transfer all of these duplicate names. However, it has a (known) bug that fails to remove multiple duplicates in a row, so it does do some significant redundant processing at the moment.

I've written a better implied-directory-adding routine that greatly reduces the number of dirs added to the file list when the input list is in a normal hierarchical order -- i.e. it doesn't use a hash, but does keep track of the previous path in a better way. I've also fixed the duplicate-removing code to be able to handle multiple dups in a row.

I haven't committed either of these patches (since we're trying to get a release out), but I'll put them into my --files-from patch when I next update it. The dup-removing fix is actually pretty simple, so if we think that this would be something that we'd like to see in this next release, I could commit that.

..wayne..
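The two fixes Wayne describes can be illustrated outside rsync. This is an approximation in POSIX shell and awk, not Wayne's patch, and both function names are hypothetical. add_implied_dirs prints each missing parent directory before a path. It skips any directory that the previous path already lies under, so no hash is needed when the input is in hierarchical order. remove_dups drops repeats from a sorted list by comparing each entry with the last one kept, which lets a run of three or more identical names collapse to one.

```shell
# add_implied_dirs: read relative paths on stdin; before each path, print any
# parent directory not already covered by the previous path. Only the previous
# path is remembered, so out-of-order input can still repeat a directory.
add_implied_dirs() {
  awk '
  {
    path = $0
    sub(/^\.\//, "", path)              # treat "./x/y" like "x/y"
    n = split(path, part, "/")
    dir = ""
    for (i = 1; i < n; i++) {
      dir = (dir == "" ? part[i] : dir "/" part[i])
      # skip dir if the previous path is dir itself or lies beneath it
      if (index(prev "/", dir "/") != 1) print dir
    }
    print path
    prev = path
  }'
}

# remove_dups: drop repeated entries from a sorted list by comparing each
# entry with the last one KEPT, so any run of duplicates collapses to one.
remove_dups() {
  awk 'NR == 1 || $0 != last { print; last = $0 }'
}
```

With Andy's ordering (test/foo.jpg, test/bar.jpg, test/sub/foo.jpg, test/zeke.jpg), add_implied_dirs emits test once and test/sub once, instead of the three copies of test shown in the quoted -vvv output.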
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 11:14:50PM -0800, Wayne Davison wrote:
> On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> > [...] and that entries therein are not flattened like they would be on
> > the command-line (sans -R).
>
> But they *are* flattened exactly like on the command-line, at least in
> my current patch they are. That's what -R is for -- telling rsync not
> to do that. So, without -R there are no implied directories to create
> except for the destination dir (which is created if it doesn't exist).

Oh, right, I hadn't thought of that implication of the way this is
implemented. Definitely we want the -R functionality implied. That's
the only way I can imagine people wanting to use this.

> > The permissions and ownership should be derived from the source.
> > so effectively it should be as though
> > ./deltapics
> > were in the file list.
>
> Right. In fact, that's exactly what happens with -R -- all intermediate
> directories get added to the file list (if they aren't already in it)
> without causing any extra recursion (even if -r was specified/implied).

In my former hack implementation of the "exclude optimization" (when
there were only includes with no wildcards and a final exclude '*') it
was able to skip sending the parent directories completely. Come to
think of it, I'm not sure what kind of permissions were used for the
directories that were not explicitly included; maybe it just used the
default.

> If people want the "--files-from" to imply "-R" then I'd want to see a
> "--no-relative" option to let people turn it off.

That would be easy to implement so I guess it wouldn't hurt but I really
can't see people wanting to do that.

- Dave
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> I know I'm not talking about when -R is used. I am talking
> about creating implied intermediate directories without -R.
> I'm talking about being able to take the output of
> find -name '*.jpg' and have it create (if necessary) any
> intermediate directories while maintaining the equivalency
> of src and dest. If that means also behaving as though
> those directories were already in the list that would be OK
> as long as -r weren't specified.
>
>     find . -name '*.jpg' | rsync -a --files-from=- . remote:
> should when it hits
>     ./deltapics/031CGMUa.jpg
>     ./deltapics/031CGNga.jpg
>     ./deltapics/031CGOHa.jpg
>     ./deltapics/031CGPOa.jpg
>     ./deltapics/031CGPba.jpg
> create the deltapics directory if it doesn't exist. The
> permissions and ownership should be derived from the source.
> so effectively it should be as though
>     ./deltapics
> were in the file list. It needn't be updated if it
> does exist but if easier to implement it that way I wouldn't
> object. In such a case even if -r is
> allowed and specified the implied directory should not defeat
> the file list by transferring any files not in the list.
>
> No errors, no need to do a run to find the missing
> directories and add them and no need to add a filter to the
> stream adding entries for directories that are missing.

There are performance issues associated with sending all the parent directories automatically. Consider the situation where running

    find test -name '*.jpg' -print

gives the following results (and yes, this does happen, at least for me on Solaris 8, where the output of find seems to depend on the order in which the directory entries were created):

    test/foo.jpg
    test/bar.jpg
    test/sub/foo.jpg
    test/zeke.jpg

If I run rsync in such a way that parent directories are sent automatically, it will send the following files (based on -vvv output):

    make_file(4,test)
    make_file(4,test/foo.jpg)
    make_file(4,test/bar.jpg)
    make_file(4,test)
    make_file(4,test/sub)
    make_file(4,test/sub/foo.jpg)
    make_file(4,test)
    make_file(4,test/zeke.jpg)

Note that the "test" directory is sent 3 times in this case. This is because the code that checks whether to send the directory just compares to the last one sent in an attempt to eliminate duplicates. But this is not a reliable way of preventing duplicates, as the above example demonstrates. So there is a danger of sending lots of duplicate directory entries when the automatic directory transmission feature is enabled. This could probably be fixed by keeping a hash table of all the directory entries that have already been transmitted instead of just comparing against the last one sent.

In any case, I think it's important to be able to turn off the automatic directory sending feature so that situations that don't require this can avoid the performance hit.

-Andy
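Andy's suggested fix replaces the comparison with the last directory sent by a hash table of every directory already sent. It can be sketched in a few lines of awk, because awk's associative arrays are hash tables. This illustrates the idea in POSIX shell and is not rsync's code. The function name is hypothetical. The trade-off is memory, since every emitted name stays in the table for the whole run.

```shell
# add_implied_dirs_hashed: read relative paths on stdin and print each missing
# parent directory before the path, remembering every name already emitted in
# an awk associative array (the hash table). Each directory and path is
# therefore printed at most once, regardless of input order.
add_implied_dirs_hashed() {
  awk '
  {
    path = $0
    sub(/^\.\//, "", path)              # treat "./x/y" like "x/y"
    n = split(path, part, "/")
    dir = ""
    for (i = 1; i < n; i++) {
      dir = (dir == "" ? part[i] : dir "/" part[i])
      if (!(dir in sent)) { sent[dir] = 1; print dir }
    }
    if (!(path in sent)) { sent[path] = 1; print path }
  }'
}
```

On the find output above it sends test once. Out-of-order input such as a/x, b/y, a/z still produces a single a, which a last-entry comparison cannot guarantee.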
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote:
> (I assume you're talking about when using -R, which is not currently on
> by default.) I believe that we do need an auto-creation mode and also a
> way to optimize the transfer to avoid this (since it results in a lot of
> extra directory checks that can slow things down when we know that they
> aren't needed). Which one is the default is the current question. I'm
> currently leaning toward going back to sending the implied dirs by
> default, and having an option for people to optimize the transfer (which
> would allow it to be used with the normal -R mode even when --files-from
> was not used).

This is why my patch included the --no-implicit-dirs and --send-dirs options. These allow you to specify precisely what you want without having to remember which behavior is implied by various other options...

Although I suppose that those two options could be combined. The basic idea is that if the user is taking the responsibility for specifying the directories himself (--send-dirs), then rsync doesn't need to worry about automatically sending all the parent directories (--no-implicit-dirs). So perhaps an --explicit-dirs option (combining the two meanings) could be used to indicate that the user is taking all responsibility for sending directories and rsync doesn't need to worry about it. And to be safe, a --no-explicit-dirs to turn off this behavior.

-Andy
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 01:58:49PM -0800, jw schultz wrote:
> On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> > Also, if the transfer is being sent from the remote side, the file names
> > are all getting sent over to the remote side first for --files-from and
> > then sent back as part of the normal protocol, right? I had hoped we'd
> > be able to avoid that round trip because the list could get long.
>
> I don't think we can avoid the round trip without changing
> rsync drastically. Just consider that this saves on sending
> voluminous --include lists or invoking rsync hundreds of
> times.

Perhaps I'm misunderstanding this completely, but is there a possible scenario where the remote (sender) might have a list of files on his side? I might be mistaken, but it seems that the current patch supports the case where the --files-from file list is located on the local side, and it must be transmitted to the sender in the case where the sender is remote. Is there a possible case where the --files-from file list lives on the remote (sender) side? This is not relevant to me, but I'm just wondering. One might imagine a scenario in which a bunch of sites are mirroring from a server, and the server runs a job to create a list of modified files that the remote mirrors should pull for updating...

Come to think of it, if the data lives on the remote server, where would a local files-from list come from? How would it be generated?

-Andy
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> [...] and that entries therein are not flattened like they would be on
> the command-line (sans -R).

But they *are* flattened exactly like on the command-line, at least in
my current patch they are. That's what -R is for -- telling rsync not
to do that. So, without -R there are no implied directories to create
except for the destination dir (which is created if it doesn't exist).

> The permissions and ownership should be derived from the source.
> so effectively it should be as though
> ./deltapics
> were in the file list.

Right. In fact, that's exactly what happens with -R -- all intermediate
directories get added to the file list (if they aren't already in it)
without causing any extra recursion (even if -r was specified/implied).

If people want the "--files-from" to imply "-R" then I'd want to see a
"--no-relative" option to let people turn it off.

..wayne..
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote:
> > On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> > > You seem to see the files-from as a way of replacing command-line
> > > args where I see it as a way of replacing the tree scan.
>
> I actually think of it as both, since I also consider the command-line
> as a way of replacing the tree scan. I think that it is fairly easy to
> explain how --files-from works if you explain it in terms of how it is
> much like specifying the names on the command-line (and explain what is
> different about it). One thing that I'm hoping to avoid is arbitrary
> limits on what the mode can do -- I'd like to see it be an easy way to
> specify exactly what files to send, and also as a way to extend the size
> of the command-line.

The difference seems pretty big to me, but if you can describe it cleanly that's fine. The big thing is that paths specified in --files-from are relative to the tree, not the CWD, and that entries therein are not flattened like they would be on the command-line (sans -R).

> On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> > I would rather have the '-r' option ignored when --files-from is in
> > effect.
>
> I wouldn't want that as a hard limit. It would be better to say that
> -a doesn't imply -r when --files-from is used, but the user can still
> manually specify -r if they want to. It would be easy to implement this
> in a way that would _not_ require any particular order to the options on
> the command-line (which I agree with JW would be a very bad idea).
>
> > If people leave out the directories, missing parent directories should
> > be automatically created.
>
> (I assume you're talking about when using -R, which is not currently on
> by default.) I believe that we do need an auto-creation mode and also a
> way to optimize the transfer to avoid this (since it results in a lot of
> extra directory checks that can slow things down when we know that they
> aren't needed). Which one is the default is the current question. I'm
> currently leaning toward going back to sending the implied dirs by
> default, and having an option for people to optimize the transfer (which
> would allow it to be used with the normal -R mode even when --files-from
> was not used).

I know I'm not talking about when -R is used. I am talking
about creating implied intermediate directories without -R.
I'm talking about being able to take the output of
find -name '*.jpg' and have it create (if necessary) any
intermediate directories while maintaining the equivalency
of src and dest. If that means also behaving as though
those directories were already in the list that would be OK
as long as -r weren't specified.

    find . -name '*.jpg' | rsync -a --files-from=- . remote:
should when it hits
    ./deltapics/031CGMUa.jpg
    ./deltapics/031CGNga.jpg
    ./deltapics/031CGOHa.jpg
    ./deltapics/031CGPOa.jpg
    ./deltapics/031CGPba.jpg
create the deltapics directory if it doesn't exist. The
permissions and ownership should be derived from the source.
so effectively it should be as though
    ./deltapics
were in the file list. It needn't be updated if it
does exist but if easier to implement it that way I wouldn't
object. In such a case even if -r is
allowed and specified the implied directory should not defeat
the file list by transferring any files not in the list.

No errors, no need to do a run to find the missing
directories and add them and no need to add a filter to the
stream adding entries for directories that are missing.

> > As it is now, if somebody just does "find . -print | rsync -a
> > --files-from=- ..." are they going to get repeated files because the
> > directories are listed?
>
> Rsync would weed out the duplicates, but if -a implied -r in this
> context then the presence of the directories would cause rsync to
> recurse through all the directory content and thus make this a
> horrible thing to do. A couple of alternatives:
>
>     find . -print | rsync -lptgoDR --files-from=- . remote:/dest
>     find . -type f -print | rsync -aR --files-from=- . remote:/dest

Assume the user is working from a file list stored in a file and isn't using -R. Possibly doing a pull. Let's not force them to go through extra hoops. I know I've never used -R and I suspect many other people haven't. Try

    find "$srcdir" -assorted-options | sed -e "s|^$srcdir/||" \
        | rsync -a --files-from=- "$srcdir" remote:"$destdir"

While this is unlikely from the command line, it is not unlikely from a script. While it is fairly easy to do simple subs, adding lines can be a pain.

--
J.W. Schultz
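JW's pipeline strips the source directory from find's output with sed, and that only works if $srcdir contains no characters that sed treats as special. Below is a minimal sketch of the same step that compares a literal prefix instead, assuming POSIX sh and awk. strip_srcdir is a hypothetical helper name. Running find from inside the source directory, as in the find . examples in this thread, avoids the step altogether.

```shell
# strip_srcdir: turn "SRCDIR/sub/path" lines on stdin into "sub/path", the
# form --files-from expects relative to the source dir. The prefix is passed
# through the environment and compared literally (no regex, no escapes).
# The line for SRCDIR itself is dropped; other lines pass through unchanged.
strip_srcdir() {
  root=${1%/}                                   # tolerate a trailing slash
  root="$root" awk '
    $0 == ENVIRON["root"] { next }              # the source dir itself
    index($0, ENVIRON["root"] "/") == 1 {
      print substr($0, length(ENVIRON["root"]) + 2)   # drop "root/"
      next
    }
    { print }
  '
}
```

It would slot into JW's pipeline in place of the sed step, e.g. find "$srcdir" -type f | strip_srcdir "$srcdir" | rsync -a --files-from=- "$srcdir" remote:"$destdir".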
Re: specifying a list of files to transfer
> On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> > You seem to see the files-from as a way of replacing command-line
> > args where I see it as a way of replacing the tree scan.

I actually think of it as both, since I also consider the command-line
as a way of replacing the tree scan. I think that it is fairly easy to
explain how --files-from works if you explain it in terms of how it is
much like specifying the names on the command-line (and explain what is
different about it). One thing that I'm hoping to avoid is arbitrary
limits on what the mode can do -- I'd like to see it be an easy way to
specify exactly what files to send, and also as a way to extend the size
of the command-line.

On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> I would rather have the '-r' option ignored when --files-from is in
> effect.

I wouldn't want that as a hard limit. It would be better to say that
-a doesn't imply -r when --files-from is used, but the user can still
manually specify -r if they want to. It would be easy to implement this
in a way that would _not_ require any particular order to the options on
the command-line (which I agree with JW would be a very bad idea).

> If people leave out the directories, missing parent directories should
> be automatically created.

(I assume you're talking about when using -R, which is not currently on
by default.) I believe that we do need an auto-creation mode and also a
way to optimize the transfer to avoid this (since it results in a lot of
extra directory checks that can slow things down when we know that they
aren't needed). Which one is the default is the current question. I'm
currently leaning toward going back to sending the implied dirs by
default, and having an option for people to optimize the transfer (which
would allow it to be used with the normal -R mode even when --files-from
was not used).

> As it is now, if somebody just does "find . -print | rsync -a
> --files-from=- ..." are they going to get repeated files because the
> directories are listed?

Rsync would weed out the duplicates, but if -a implied -r in this
context then the presence of the directories would cause rsync to
recurse through all the directory content and thus make this a
horrible thing to do. A couple of alternatives:

    find . -print | rsync -lptgoDR --files-from=- . remote:/dest
    find . -type f -print | rsync -aR --files-from=- . remote:/dest

> Also, if the transfer is being sent from the remote side, the file names
> are all getting sent over to the remote side first for --files-from and
> then sent back as part of the normal protocol, right? I had hoped we'd
> be able to avoid that round trip because the list could get long.

I don't see a way to avoid this and still allow things like the creation of the implied directories you mentioned (which must be sent as separate entries in the current protocol). I think it's probably best to leave the current file+info send process alone.

..wayne..
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
> Also, if the transfer is being sent from the remote side, the file names
> are all getting sent over to the remote side first for --files-from and
> then sent back as part of the normal protocol, right? I had hoped we'd
> be able to avoid that round trip because the list could get long.

I don't think we can avoid the round trip without changing
rsync drastically. Just consider that this saves on sending
voluminous --include lists or invoking rsync hundreds of
times.

--
J.W. Schultz
Re: specifying a list of files to transfer
On Wed, Jan 15, 2003 at 02:49:08PM -0800, jw schultz wrote:
> On Wed, Jan 15, 2003 at 10:03:33AM -0800, Wayne Davison wrote:
> > On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> > > with the -r or -a options does this [recurse] on
> > > directories in the --files-from list?
> >
> > Yes, it treats them like command-line args with the following two
> > exceptions: if -r is not specified, we WILL create an explicitly named
> > directory (but not send its contents), if -R is specified, we do NOT
> > create implied directories (which was your next question). This latter
> > exception means that we currently require the user to ensure that the
> > destination directory tree is valid (which could be done once with a
> > separate rsync --files-from run that didn't use -r and specified all
> > the dirs that we needed to ensure exist). If this turns out to be too
> > much of a hassle, perhaps a new option named --implied-dirs could be
> > added to have rsync do its normal -R dir handling.
>
> Plus the third difference, that relative paths in the
> files-from list are relative to the tree, not to the current
> directory.
>
> We may well want the --implied-dirs option or some logic to
> create it. If you don't have -r (or -a) you need to have
> all the intermediate dirs listed. If you do have -r, listing
> intermediate dirs effectively disables the file list.
>
> I'm not sure i like that. I'm inclined to think the file-list
> should disable recursion. Perhaps [fighting resistance of
> yuck] recursion would have to be specified explicitly after
> the --files-from. I hate sequence sensitive options but
> requiring users to remember -lptgoD instead of -a may be
> worse.
>
> You seem to see the files-from as a way of replacing
> command-line args where i see it as a way of replacing the
> tree scan. However, if we can pin down the semantics i
> think we can achieve both ends.

I agree more with JW. I envisioned --files-from as replacing the tree
scan. I would rather have the '-r' option ignored when --files-from is
in effect. I think it should be a complete list of the files and/or
directories that are to be sent. If people leave out the directories,
missing parent directories should be automatically created. As it is
now, if somebody just does "find . -print | rsync -a --files-from=- ..."
are they going to get repeated files because the directories are listed?
Yuck.

Also, if the transfer is being sent from the remote side, the file names
are all getting sent over to the remote side first for --files-from and
then sent back as part of the normal protocol, right? I had hoped we'd
be able to avoid that round trip because the list could get long.

- Dave
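Dave's worry can be made concrete. With -r in effect, a list produced by `find . -print` names every directory and also everything beneath it, so the same file can be named more than once. The Python sketch below shows one way such a list could be pruned so that nothing already covered by a listed ancestor directory is named again. It is only an illustration under stated assumptions: `prune_covered` and `ancestors` are hypothetical helpers, not code from rsync or from Wayne's patch, and the caller is assumed to know which entries are directories.

```python
import posixpath


def ancestors(path):
    """Yield the proper ancestors of a normalized relative path, ending with '.'."""
    parent = posixpath.dirname(path)
    while parent:                      # "a/b/c" -> "a/b" -> "a" -> "" (stop)
        yield parent
        parent = posixpath.dirname(parent)
    if path != ".":
        yield "."                      # every relative path lives under "."


def prune_covered(entries, dirs):
    """Drop list entries that recursing into a listed directory would repeat.

    entries: relative paths in list order (e.g. the output of `find . -print`)
    dirs:    the subset of entries that are directories (hypothetical input)
    Returns the surviving entries, normalized and in their original order.
    """
    listed_dirs = {posixpath.normpath(d) for d in dirs}
    seen = set()
    kept = []
    for entry in entries:
        path = posixpath.normpath(entry)
        if path in seen:
            continue                   # exact duplicate line in the list
        seen.add(path)
        if any(a in listed_dirs for a in ancestors(path)):
            continue                   # a listed ancestor will already send it
        kept.append(path)
    return kept
```

For example, `prune_covered([".", "./a", "./a/f1", "./b.txt"], [".", "./a"])` returns `["."]`, because recursing into `.` already covers everything else. This is the kind of result Dave's "find . -print" case would otherwise produce as duplicates.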
Re: specifying a list of files to transfer
I like it (except for the work). I don't think I'm ready to code up a
filter yet, but it is good to know the general theory in case I need to
use it. Of course, if I keep thinking about it I'll probably run across
a situation where its use would greatly speed up or simplify my life, so
maybe I should go ahead and throw something together ;).

Yes, the filter would need to sit in the middle for the duration of the
transfer, but I am thinking of those situations where the tree scan is
very costly, so the little overhead and complexity of the filter would
be worth it.

-Lee

---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Date: Wed, 15 Jan 2003 13:34:08 -0800
>
> On Wed, Jan 15, 2003 at 02:48:05PM -0600, Lee Eakin wrote:
> > Now if I can only figure out a way to intercept the list when I need to
> > be real picky about which individual files are accessed ...
>
> This should be possible with a filter process. Here's how the new,
> slightly tweaked protocol works:
>
> 1. The normal startup exchange occurs up to the point just before where
>    the (normal) file info (names + attributes) starts to flow from the
>    sender to the receiver.
>
> 2a At this point IFF the sender is the remote process (i.e. we're
>    pulling files), the receiver begins to send file names (separated by
>    either newlines or nulls, as indicated by the --null option) over the
>    socket (normally there is no data being sent to the sender during
>    this stage). The end of the list is marked by an empty entry. (Note
>    that the receiver begins receiving file info from the sender during
>    this stage, so it must do both things at once without blocking.) If
>    the recursive flag is set, the receiver may get more names back than
>    it sends out.
>
> 2b Alternately, if the sender is the local process, the normal file info
>    transfer happens (without anything new occurring over the socket).
>
> 3. The rest of the transfer proceeds as normal.
>
> So, if a filter understood the protocol enough to be able to pass
> through all the initial rsync data, it could actually look at all the
> names that go over the wire and allow/disallow/tweak them however it
> desired. (It's sad that this filter would then have to continue to
> relay all the data over the socket after its work was done, but that's
> the price you pay.) You'd just have to look for the --null option on
> the command-line to know if you're looking for a newline or a null EOL
> character, and stop scanning at the first empty name.
>
> Alternately, you could just disallow the --files-from option and not
> worry about authorizing the data.
---end quoted text---

--
Lee Eakin - [EMAIL PROTECTED]

Lynch's Law: When the going gets tough, everybody leaves.
Re: specifying a list of files to transfer
On Wed, Jan 15, 2003 at 10:03:33AM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> > with the -r or -a options does this [recurse] on
> > directories in the --files-from list?
>
> Yes, it treats them like command-line args with the following two
> exceptions: if -r is not specified, we WILL create an explicitly named
> directory (but not send its contents), if -R is specified, we do NOT
> create implied directories (which was your next question). This latter
> exception means that we currently require the user to ensure that the
> destination directory tree is valid (which could be done once with a
> separate rsync --files-from run that didn't use -r and specified all
> the dirs that we needed to ensure exist). If this turns out to be too
> much of a hassle, perhaps a new option named --implied-dirs could be
> added to have rsync do its normal -R dir handling.

Plus the third difference, that relative paths in the
files-from list are relative to the tree, not to the current
directory.

We may well want the --implied-dirs option or some logic to
create it. If you don't have -r (or -a) you need to have
all the intermediate dirs listed. If you do have -r, listing
intermediate dirs effectively disables the file list.

I'm not sure i like that. I'm inclined to think the file-list
should disable recursion. Perhaps [fighting resistance of
yuck] recursion would have to be specified explicitly after
the --files-from. I hate sequence sensitive options but
requiring users to remember -lptgoD instead of -a may be
worse.

You seem to see the files-from as a way of replacing
command-line args where i see it as a way of replacing the
tree scan. However, if we can pin down the semantics i
think we can achieve both ends.

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt
Re: specifying a list of files to transfer
On Wed, Jan 15, 2003 at 02:48:05PM -0600, Lee Eakin wrote:
> Now if I can only figure out a way to intercept the list when I need to
> be real picky about which individual files are accessed ...

This should be possible with a filter process. Here's how the new,
slightly tweaked protocol works:

1. The normal startup exchange occurs up to the point just before where
   the (normal) file info (names + attributes) starts to flow from the
   sender to the receiver.

2a At this point IFF the sender is the remote process (i.e. we're
   pulling files), the receiver begins to send file names (separated by
   either newlines or nulls, as indicated by the --null option) over the
   socket (normally there is no data being sent to the sender during
   this stage). The end of the list is marked by an empty entry. (Note
   that the receiver begins receiving file info from the sender during
   this stage, so it must do both things at once without blocking.) If
   the recursive flag is set, the receiver may get more names back than
   it sends out.

2b Alternately, if the sender is the local process, the normal file info
   transfer happens (without anything new occurring over the socket).

3. The rest of the transfer proceeds as normal.

So, if a filter understood the protocol enough to be able to pass
through all the initial rsync data, it could actually look at all the
names that go over the wire and allow/disallow/tweak them however it
desired. (It's sad that this filter would then have to continue to
relay all the data over the socket after its work was done, but that's
the price you pay.) You'd just have to look for the --null option on
the command-line to know if you're looking for a newline or a null EOL
character, and stop scanning at the first empty name.

Alternately, you could just disallow the --files-from option and not
worry about authorizing the data.

..wayne..
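Wayne's step 2a can be made concrete with a minimal Python sketch of the scanning half of such a filter. The sketch rests on one assumption that the thread does not confirm: that the names reach the filter as a plain byte stream with no other framing. Under that assumption, the names are separated by newlines, or by NUL bytes when --null is given, and the list ends with an empty entry. `read_name_list` and `name_allowed` are hypothetical helpers. The sketch leaves out the relaying of all the other rsync traffic that Wayne mentions.

```python
def read_name_list(data, null_sep):
    """Split the --files-from name list out of a buffered byte stream.

    Assumes the raw layout described in step 2a: names separated by
    newlines (or NUL bytes when --null is on), ended by an empty entry.
    Returns (names, consumed) so the caller can relay data[consumed:]
    untouched; raises ValueError if the empty entry hasn't arrived yet.
    """
    sep = b"\0" if null_sep else b"\n"
    names = []
    pos = 0
    while True:
        end = data.find(sep, pos)
        if end < 0:
            raise ValueError("name list not terminated yet; read more data")
        entry = data[pos:end]
        pos = end + 1
        if not entry:                  # empty entry marks the end of the list
            return names, pos
        names.append(entry.decode("utf-8", "surrogateescape"))


def name_allowed(name, prefixes):
    """True if name equals, or lies under, one of the relative prefixes."""
    for prefix in prefixes:
        prefix = prefix.rstrip("/")
        if name == prefix or name.startswith(prefix + "/"):
            return True
    return False
```

A filter built this way would call `read_name_list` on whatever it has buffered, read more data and retry on ValueError, and then decide for each name whether to pass it, drop it, or rewrite it. If it drops or rewrites names, it has to re-emit the list with the same separator and the same empty-entry terminator.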
Re: specifying a list of files to transfer
---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Subject: Re: specifying a list of files to transfer
> Date: Wed, 15 Jan 2003 10:10:29 -0800
>
> On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> > Yes, people do restrict args via ssh key restrictions.
>
> OK, I thank you both for enlightening me on the subject. My current
> patch applies the sanitize_path() function to all names read via the
> --files-from option, regardless of whether we're pushing or pulling.
> This means that all leading slashes are dropped from file names as
> well as all leading "../" prefixes, and that any infix "dir/../"
> combos are removed. This ensures that we can't get above the root
> dir that was specified on the command-line.
>

That's awesome. Now as long as I want to allow access to the given
portion of the file tree I can allow files-from. Now if I can only
figure out a way to intercept the list when I need to be real picky
about which individual files are accessed ...

>
> > so any sanitize code could first make sure all pathnames begin with a valid
> > module and then make sure the file or dir is really inside that module.
>
> This isn't needed since the module name is specified on the command-line
> and then all paths are relative to the directory that was specified in
> that module. For instance:
>
>     rsync --files-from=foo remote::module/bar
>
> forces all pathnames read to be relative to the bar dir of the module.
> If no "/bar" path was specified, the paths would all be relative to the
> root-dir of the module.

That's cool too, so no additional/special code to handle server-mode ;)
I like this a lot, now to test ...

>
---end quoted text---

-Lee
--
Lee Eakin - [EMAIL PROTECTED]

Benchley's Law of Distinction: There are two kinds of people in the
world, those who believe there are two kinds of people in the world and
those who don't.
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> Yes, people do restrict args via ssh key restrictions.

OK, I thank you both for enlightening me on the subject. My current
patch applies the sanitize_path() function to all names read via the
--files-from option, regardless of whether we're pushing or pulling.
This means that all leading slashes are dropped from file names as
well as all leading "../" prefixes, and that any infix "dir/../"
combos are removed. This ensures that we can't get above the root
dir that was specified on the command-line.

> so any sanitize code could first make sure all pathnames begin with a valid
> module and then make sure the file or dir is really inside that module.

This isn't needed since the module name is specified on the command-line
and then all paths are relative to the directory that was specified in
that module. For instance:

    rsync --files-from=foo remote::module/bar

forces all pathnames read to be relative to the bar dir of the module.
If no "/bar" path was specified, the paths would all be relative to the
root-dir of the module.

..wayne..
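The three rules Wayne lists for sanitize_path() can be illustrated with a short Python sketch. This is not rsync's C sanitize_path(). `sanitize_path_sketch` is a hypothetical stand-in that applies the rules as described in this message. It also goes a little further, as assumptions of the sketch: it drops "." components and returns "." when nothing is left.

```python
def sanitize_path_sketch(path):
    """Apply the described rules: drop leading slashes, drop leading ".."
    components, and collapse "dir/.." pairs, so the result stays inside
    the transfer root. Also drops "." components (an assumption)."""
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue                  # leading, doubled, or trailing "/", or "./"
        if comp == "..":
            if parts:
                parts.pop()           # "dir/.." cancels out
            # with nothing left to cancel, ".." would climb above the root: drop it
            continue
        parts.append(comp)
    return "/".join(parts) or "."
```

Given the examples from elsewhere in the thread, "../../../etc/password" becomes "etc/password" and "good_dir/../../bad_dir" becomes "bad_dir". Because every surviving component is a plain name, joining the result onto the module or source directory cannot name a path above it. Symlinks inside the tree are a separate matter that this sketch does not address.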
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 07:57:48PM -0800, jw schultz wrote:
> with the -r or -a options does this [recurse] on
> directories in the --files-from list?

Yes, it treats them like command-line args with the following two
exceptions: if -r is not specified, we WILL create an explicitly named
directory (but not send its contents), if -R is specified, we do NOT
create implied directories (which was your next question). This latter
exception means that we currently require the user to ensure that the
destination directory tree is valid (which could be done once with a
separate rsync --files-from run that didn't use -r and specified all
the dirs that we needed to ensure exist). If this turns out to be too
much of a hassle, perhaps a new option named --implied-dirs could be
added to have rsync do its normal -R dir handling.

..wayne..
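Wayne's suggested workaround is a separate --files-from run without -r that lists every directory the real run needs. Preparing that run means computing those directories from the list. The Python sketch below does this. `implied_dirs` is a hypothetical helper, and the entries are assumed to be relative paths already.

```python
import posixpath


def implied_dirs(entries):
    """Return the parent directories implied by a --files-from style list.

    Each directory appears once, parents before children, which is the
    order a directory-only pass would need to create them in.
    Entries are assumed to be relative paths (e.g. already sanitized).
    """
    dirs = []
    seen = set()                       # stays closed under "parent of"
    for entry in entries:
        chain = []
        parent = posixpath.dirname(posixpath.normpath(entry))
        while parent and parent not in seen:
            chain.append(parent)       # collected child-first ...
            parent = posixpath.dirname(parent)
        for d in reversed(chain):      # ... emitted parent-first
            seen.add(d)
            dirs.append(d)
    return dirs
```

For a list containing foo/bar/one, foo/bar/two, foo/baz/x and top.txt, it returns foo, foo/bar, foo/baz. Written one per line to a file, those names would form the list for the preliminary pass.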
Re: specifying a list of files to transfer
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 20:33:46 -0800
>
> On Tue, Jan 14, 2003 at 10:15:42PM -0600, Lee Eakin wrote:
> > ---begin quoted text---
> > > From: jw schultz <[EMAIL PROTECTED]>
> > > Date: Tue, 14 Jan 2003 20:07:58 -0800
> > >
> > > Nope. The files-from contents need to be passed on stdin, otherwise
> > > we would hit command-line length limits. That is why i'm
> > > stressing the fact that allowing paths not within the source
> > > or destination trees specified on the command-line would
> > > bypass your ssh command= wrapper restrictions.
> > >
> >
> > Oh, I see now. Yes that could be a serious hole. If the remote command
> > included an option (maybe a dummy --files-from) then the ssh wrapper could
> > at least abort and notify when it sees it.
>
> If you look at Wayne's description of the patch, the remote
> command does have a --files-from=- on its command-line.
> However it would be a shame to disable that performance
> enhancing facility if we just need to sanitize the contents of
> the files-from list and require that it only specify paths
> relative to the source and dest trees.
>
> I suppose we could allow an option that would permit
> unsanitized paths.
>

I would agree. If the paths are known to be relative (forced to be by
the rsync running where the ssh restriction is) then (assuming the
wrapper's intent is to allow access to the whole sub-tree) it could
allow the files-from option. If you only want to allow access to
specific files, then it would still have to disallow the option.

I can think of one possible way for the wrapper to find out what files
are being requested, but don't know enough about the interconnect
between the 2 rsyncs to know if it would break it (probably). If the
wrapper could run a modified version of the original command without a
destination, it would print out a list of the files (remember that if
you do not give a destination it prints something similar to 'ls -l'
with includes and excludes applied), and then it could walk the output
and verify all the paths. If it passed inspection, it could call the
real command, passing the file list to stdin itself. It would have to
attach stdout and stderr properly, and may even have to act as a
pass-thru for further data coming on stdin. It would be complicated,
but might be possible if the dummy (no destination) run did not close
off the connection after reading the file list, and/or the handshaking
was clearly documented so non-developers could understand it.

I only throw this idea out because I would really like files-from to
work even in a restricted-access mode. It is a BIG win over parsing a
large dir for includes/excludes.

--
Lee Eakin - [EMAIL PROTECTED]

Murphy's Military Laws:
6. The buddy system is essential to your survival; it gives the enemy
   somebody else to shoot at.
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 10:15:42PM -0600, Lee Eakin wrote:
> ---begin quoted text---
> > From: jw schultz <[EMAIL PROTECTED]>
> > Date: Tue, 14 Jan 2003 20:07:58 -0800
> >
> > Nope. The files-from contents need to be passed on stdin, otherwise
> > we would hit command-line length limits. That is why i'm
> > stressing the fact that allowing paths not within the source
> > or destination trees specified on the command-line would
> > bypass your ssh command= wrapper restrictions.
> >
>
> Oh, I see now. Yes that could be a serious hole. If the remote command
> included an option (maybe a dummy --files-from) then the ssh wrapper could
> at least abort and notify when it sees it.

If you look at Wayne's description of the patch, the remote
command does have a --files-from=- on its command-line.
However it would be a shame to disable that performance
enhancing facility if we just need to sanitize the contents of
the files-from list and require that it only specify paths
relative to the source and dest trees.

I suppose we could allow an option that would permit
unsanitized paths.

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt
Re: specifying a list of files to transfer
---begin quoted text---
> From: jw schultz <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 20:07:58 -0800
>
> Nope. The files-from contents need to be passed on stdin, otherwise
> we would hit command-line length limits. That is why i'm
> stressing the fact that allowing paths not within the source
> or destination trees specified on the command-line would
> bypass your ssh command= wrapper restrictions.
>

Oh, I see now. Yes that could be a serious hole. If the remote command
included an option (maybe a dummy --files-from) then the ssh wrapper could
at least abort and notify when it sees it.

-Lee
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 10:01:47PM -0600, Lee Eakin wrote:
> Please pardon my butting in again. I am not a developer, but I am very
> interested in this option because I've needed it in the past and had to work
> around the lack of it with include/exclude options (I wanted to sync only
> a few files from a large directory, and needed it to work via the daemon
> for anonymous access). I also maintain the perl wrapper File::Rsync so I
> do my best to understand all of the options so I can handle them properly
> in the perl module.
>
> ---begin quoted text---
> > From: Wayne Davison <[EMAIL PROTECTED]>
> > Date: Tue, 14 Jan 2003 19:39:49 -0800
> >
> > On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > > Up till now rsync hasn't touched anything outside of the paths
> > > specified on the command-line. Changing that would mean access to
> > > rsync via ssh would no longer be restricted, just disabled.
> >
> > Are you saying that some people have special ssh scripts that check
> > and/or tweak the file names on the command-line to ensure they fall within
> > certain bounds when running rsync commands? I.e., if someone ran this
> > command:
> >
> >     rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
> >
> > the remote ssh setup would handle the presence of the extra "/foo/two",
> > "/bar/three" args? If so, I hadn't realized that people were limiting
> > ssh access by more than the traditional user/group/permissions access.
> >
>
> Yes, people do restrict args via ssh key restrictions. I have done this
> myself on many occasions. The environment variable SSH_ORIGINAL_COMMAND
> is passed to the actual command called from the command= key option so I
> write a small script to parse thru the variable checking each arg making
> sure they are what I expect (and possibly modifying them). I also check
> pathnames to make sure they all fit my restrictions. I then either exec
> rsync, or email the offending command to root if I find an exception
> (the mail also makes debugging easier).
>
> I assume the remote end will get the expanded contents of files-from so
> ssh command parsing would still work properly.

Nope. The files-from contents need to be passed on stdin, otherwise
we would hit command-line length limits. That is why i'm
stressing the fact that allowing paths not within the source
or destination trees specified on the command-line would
bypass your ssh command= wrapper restrictions.

>
> > > Sanitizing the paths to force them to be relative on pulls
> > > but not pushes would be too asymmetrical for my liking.
> >
> > I agree that if we find that we want to sanitize the paths in some cases
> > that we should just make it the default for files-from -- i.e. make it
> > where nothing can get beyond the root dir specified on the command-line.
> >
> > > I'd rather just disallow or sanitize absolute paths.
>
> If you try to pull a full pathname from a daemon 'rsync remote::/foo' it
> errors out with:
>
> ERROR: The remote path must start with a module name not a /
>
> so any sanitize code could first make sure all pathnames begin with a valid
> module and then make sure the file or dir is really inside that module.
>
> ---end quoted text---
>
> --
> Lee Eakin - [EMAIL PROTECTED] - Internet/Naming Services, Texas Instruments
>
> LAWS OF COMPUTER PROGRAMMING:
> II. Any given program costs more and takes longer.

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt
Re: specifying a list of files to transfer
Please pardon my butting in again. I am not a developer, but I am very
interested in this option because I've needed it in the past and had to work
around the lack of it with include/exclude options (I wanted to sync only
a few files from a large directory, and needed it to work via the daemon
for anonymous access). I also maintain the perl wrapper File::Rsync so I
do my best to understand all of the options so I can handle them properly
in the perl module.

---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> Date: Tue, 14 Jan 2003 19:39:49 -0800
>
> On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > Up till now rsync hasn't touched anything outside of the paths
> > specified on the command-line. Changing that would mean access to
> > rsync via ssh would no longer be restricted, just disabled.
>
> Are you saying that some people have special ssh scripts that check
> and/or tweak the file names on the command-line to ensure they fall within
> certain bounds when running rsync commands? I.e., if someone ran this
> command:
>
>     rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
>
> the remote ssh setup would handle the presence of the extra "/foo/two",
> "/bar/three" args? If so, I hadn't realized that people were limiting
> ssh access by more than the traditional user/group/permissions access.
>

Yes, people do restrict args via ssh key restrictions. I have done this
myself on many occasions. The environment variable SSH_ORIGINAL_COMMAND
is passed to the actual command called from the command= key option so I
write a small script to parse thru the variable checking each arg making
sure they are what I expect (and possibly modifying them). I also check
pathnames to make sure they all fit my restrictions. I then either exec
rsync, or email the offending command to root if I find an exception
(the mail also makes debugging easier).

I assume the remote end will get the expanded contents of files-from so
ssh command parsing would still work properly.

> > Sanitizing the paths to force them to be relative on pulls
> > but not pushes would be too asymmetrical for my liking.
>
> I agree that if we find that we want to sanitize the paths in some cases
> that we should just make it the default for files-from -- i.e. make it
> where nothing can get beyond the root dir specified on the command-line.
>
> > I'd rather just disallow or sanitize absolute paths.

If you try to pull a full pathname from a daemon 'rsync remote::/foo' it
errors out with:

ERROR: The remote path must start with a module name not a /

so any sanitize code could first make sure all pathnames begin with a valid
module and then make sure the file or dir is really inside that module.

---end quoted text---

--
Lee Eakin - [EMAIL PROTECTED] - Internet/Naming Services, Texas Instruments

LAWS OF COMPUTER PROGRAMMING:
II. Any given program costs more and takes longer.
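The kind of command= wrapper Lee describes can be sketched in Python. Everything here is an assumption made for illustration:

- `check_rsync_command` and `run_wrapper` are hypothetical names, and the allowed roots are made up.
- The command must start with a bare "rsync" word.
- Every bare (non-option) word other than the "." that rsync server command lines typically carry is treated as a path, which assumes option values are attached with "=".

The thread points out that --files-from contents arrive on stdin, where a command-line check cannot see them, so the sketch simply refuses --files-from.

```python
import os
import posixpath
import shlex
import sys


def check_rsync_command(cmd, allowed_roots):
    """Return the argv to exec if cmd looks like an acceptable rsync server
    command, else None (a sketch; the rules are assumptions, see lead-in)."""
    try:
        argv = shlex.split(cmd)
    except ValueError:
        return None                       # unbalanced quotes and the like
    if not argv or argv[0] != "rsync" or "--server" not in argv:
        return None
    roots = [posixpath.normpath(r).rstrip("/") + "/" for r in allowed_roots]
    for arg in argv[1:]:
        if arg.startswith("--files-from"):
            return None                   # its list would arrive unchecked on stdin
        if arg.startswith("-") or arg == ".":
            continue                      # an option, or the "." placeholder
        path = posixpath.normpath(arg)    # folds "dir/.." so it can't sneak out
        if not any((path + "/").startswith(r) for r in roots):
            return None
    return argv


def run_wrapper(allowed_roots):
    """Entry point for an authorized_keys command= script (not called here)."""
    cmd = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    argv = check_rsync_command(cmd, allowed_roots)
    if argv is None:
        sys.stderr.write("refused: %s\n" % cmd)   # Lee mails root at this point
        sys.exit(1)
    os.execvp(argv[0], argv)
```

The prefix test appends "/" to both sides, so an allowed root of /srv/pub accepts /srv/pub and /srv/pub/dir but rejects /srv/public. `normpath` works on names only, so a wrapper that has to worry about symlinks inside the allowed tree would need more than this.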
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 07:39:49PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> > Up till now rsync hasn't touched anything outside of the paths
> > specified on the command-line. Changing that would mean access to
> > rsync via ssh would no longer be restricted, just disabled.
>
> Are you saying that some people have special ssh scripts that check
> and/or tweak the file names on the command-line to ensure they fall within
> certain bounds when running rsync commands? I.e., if someone ran this
> command:
>
>     rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp
>
> the remote ssh setup would handle the presence of the extra "/foo/two",
> "/bar/three" args? If so, I hadn't realized that people were limiting
> ssh access by more than the traditional user/group/permissions access.

I don't know if they can correctly handle multiple source paths on the
command line, but there are certainly people using the command= option
in authorized_keys to invoke special scripts that check and/or tweak the
rsync command line to restrict rsync to pre-approved paths.

> > Sanitizing the paths to force them to be relative on pulls
> > but not pushes would be too asymmetrical for my liking.
>
> I agree that if we find that we want to sanitize the paths in some cases
> that we should just make it the default for files-from -- i.e. make it
> where nothing can get beyond the root dir specified on the command-line.
>
> > I'd rather just disallow or sanitize absolute paths.
>
> Note that it's more pervasive than just absolute paths, since someone
> can use args like "../../../etc/password" or "good_dir/../../bad_dir"
> (all of which the sanitize_path() call handles).

Yes, the relative ../../... paths slipped my mind, but that is a
concern as well. I'm aware that restricting --files-from to relative
paths is somewhat limiting, but i think it may be the better approach.
You can always do

    rsync --files-from=list / remote:/

if you need to.

I haven't had time yet to closely examine or try it, but i have two
questions:

With the -r or -a options, does this recurse on directories in the
--files-from list?

What happens when there are implied directories that are missing on the
destination? For example

    rsync -a --files-from=list src dest

with the list having foo/bar/one: will dest/foo and dest/foo/bar be
created with the source directory attributes if they don't exist; will
it fail; or will the missing implied directories be created with umask
perms?

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 09:15:02PM -0600, Lee Eakin wrote:
> FYI, pulling multiple files from a daemon is currently supported (well, it
> works). Given a package of foo you can specify:
>
>     rsync -av 'remote::foo/file1 foo/file5' /tmp

Oh! I had left off the repeat of the module name when I tried to cajole
the daemon using this "kludge". Thanks for the info.

..wayne..
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 07:02:58PM -0800, jw schultz wrote:
> Up till now rsync hasn't touched anything outside of the paths
> specified on the command-line. Changing that would mean access to
> rsync via ssh would no longer be restricted, just disabled.

Are you saying that some people have special ssh scripts that check
and/or tweak the file names on the command-line to ensure they fall within
certain bounds when running rsync commands? I.e., if someone ran this
command:

    rsync -av -e ssh "source:dir /foo/two /bar/three" /tmp

the remote ssh setup would handle the presence of the extra "/foo/two",
"/bar/three" args? If so, I hadn't realized that people were limiting
ssh access by more than the traditional user/group/permissions access.

> Sanitizing the paths to force them to be relative on pulls
> but not pushes would be too asymmetrical for my liking.

I agree that if we find that we want to sanitize the paths in some cases
that we should just make it the default for files-from -- i.e. make it
where nothing can get beyond the root dir specified on the command-line.

> I'd rather just disallow or sanitize absolute paths.

Note that it's more pervasive than just absolute paths, since someone
can use args like "../../../etc/password" or "good_dir/../../bad_dir"
(all of which the sanitize_path() call handles).

..wayne..
Re: specifying a list of files to transfer
FYI, pulling multiple files from a daemon is currently supported (well, it
works). Given a package of foo you can specify:

    rsync -av 'remote::foo/file1 foo/file5' /tmp

It appears the daemon does proper splitting based on either white-space,
or possibly the current value of $IFS in the daemon's environment?

One other note: I did not determine whether it was a Solaris issue, a
string length limit, or a file argument limit, but in my tests I could
only specify about 20 files using this method. When I went over the
limit no files were transferred. I was testing this a while back (just
to see if I could), so I don't remember the exact limit, but I am fairly
sure I experimented with shorter pathnames and it did not affect the
maximum number of filenames I could specify.

Oh, yes. I did not have the same limitation over ssh. The remote shell
seems to pass any number of filenames to the remote end (of course there
may be limits depending on what login shell is used on the remote
server).

-Lee

---begin quoted text---
> From: Wayne Davison <[EMAIL PROTECTED]>
> To: jw schultz <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: specifying a list of files to transfer
> Date: Tue, 14 Jan 2003 18:41:22 -0800
>
> On Tue, Jan 14, 2003 at 04:35:40PM -0800, jw schultz wrote:
> > Absolute paths are bad news here. Especially when dealing with an
> > rsync daemon.
>
> Yes, this is something that needs to be dealt with for daemon mode since
> it does not appear to have been possible to specify multiple filenames
> to pull before (unlike remote-shell mode).
>
> For non-daemon mode, the code is the same as it always was in this
> regard. For example, this command:
>
>     rsync -av /tmp/one /foo/two /bar/three dest:
>
> is no different than this command:
>
>     rsync -av --files-from=list /tmp dest:
>
> where list contains:
>
>     one
>     /foo/two
>     /bar/three
>
> In the patch I posted earlier, daemon mode did not work with the new
> --files-from option. My latest patch has this fixed:
>
>     http://www.clari.net/~wayne/rsync-files-from.patch
>
> And it also runs the filenames through sanitize_path() in daemon mode
> (when chroot is not specified, at least -- I haven't tested a chroot
> version yet).
>
> ..wayne..
---end quoted text---

--
Lee Eakin - [EMAIL PROTECTED]

With sufficient thrust, pigs fly just fine.
        -- RFC 1925
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 06:41:22PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 04:35:40PM -0800, jw schultz wrote:
> > Absolute paths are bad news here.  Especially when dealing with an
> > rsync daemon.
>
> Yes, this is something that needs to be dealt with for daemon mode since
> it does not appear to have been possible to specify multiple filenames
> to pull before (unlike remote-shell mode).
>
> For non-daemon mode, the code is the same as it always was in this
> regard.  For example, this command:
>
>   rsync -av /tmp/one /foo/two /bar/three dest:
>
> is no different than this command:
>
>   rsync -av --files-from=list /tmp dest:
>
> where list contains:
>
>   one
>   /foo/two
>   /bar/three

So in dest: you get one, two, and three, and if /foo/two and /bar/three are directories they are recursed into because of -a?

If so, that would be OK, security-wise, for a push. But we don't want

  rsync -av --files-from=list source:dir /tmp

to allow pulling from source:/foo/two or source:/bar/three.

Up till now rsync hasn't touched anything outside of the paths specified on the command line. Changing that would mean that access to rsync via ssh could no longer be restricted, only disabled altogether.

Sanitizing the paths to force them to be relative on pulls but not on pushes would be too asymmetrical for my liking. I'd rather just disallow or sanitize absolute paths.

>
> In the patch I posted earlier, daemon mode did not work with the new
> --files-from option.  My latest patch has this fixed:
>
>   http://www.clari.net/~wayne/rsync-files-from.patch
>
> And it also runs the filenames through sanitize_path() in daemon mode
> (when chroot is not specified, at least -- I haven't tested a chroot
> version yet).

chroot changes the whole meaning of absolute paths anyway.

--
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

                Remember Cernan and Schmitt
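jw's preferred fix is to disallow or sanitize absolute paths. Below is a minimal C sketch of the sanitizing option, assuming a purely lexical rule: leading slashes are dropped, "." is ignored, and ".." removes the previous component but never climbs above the transfer root. `sanitize_name` is a hypothetical helper; it is not rsync's sanitize_path(), whose exact rules may differ, and it does nothing about symlinks that point outside the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: rewrite a list entry as a relative path that cannot
 * climb above the transfer root.  Writes the result to out (size outlen)
 * and returns 0, or -1 if it does not fit.  An empty result means "the
 * root itself".  Purely lexical: symlinks are not considered. */
static int sanitize_name(const char *in, char *out, size_t outlen)
{
    const char *p = in;
    size_t len = 0;                          /* length of out so far */

    assert(in != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    while (*p != '\0') {
        size_t n;

        while (*p == '/')
            p++;                             /* drop leading/repeated slashes */
        n = strcspn(p, "/");                 /* length of next component */
        if (n == 0)
            break;                           /* only slashes were left */

        if (n == 1 && p[0] == '.') {
            /* "." adds nothing */
        } else if (n == 2 && p[0] == '.' && p[1] == '.') {
            /* ".." drops the last kept component; with nothing kept it is
             * simply ignored, so the result never goes above the root. */
            char *slash = strrchr(out, '/');
            len = slash != NULL ? (size_t)(slash - out) : 0;
            out[len] = '\0';
        } else {
            size_t need = len + (len > 0 ? 1 : 0) + n;   /* without NUL */
            if (need + 1 > outlen)
                return -1;                   /* result would not fit */
            if (len > 0)
                out[len++] = '/';
            memcpy(out + len, p, n);
            len += n;
            out[len] = '\0';
        }
        p += n;
    }
    return 0;
}
```

Under these rules `/foo/two` becomes `foo/two` and `../../etc/passwd` becomes `etc/passwd`, so a pull of `source:dir` would, symlinks aside, stay below `dir`.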
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 04:35:40PM -0800, jw schultz wrote:
> Absolute paths are bad news here.  Especially when dealing with an
> rsync daemon.

Yes, this is something that needs to be dealt with for daemon mode since it does not appear to have been possible to specify multiple filenames to pull before (unlike remote-shell mode).

For non-daemon mode, the code is the same as it always was in this regard. For example, this command:

  rsync -av /tmp/one /foo/two /bar/three dest:

is no different than this command:

  rsync -av --files-from=list /tmp dest:

where list contains:

  one
  /foo/two
  /bar/three

In the patch I posted earlier, daemon mode did not work with the new --files-from option. My latest patch has this fixed:

  http://www.clari.net/~wayne/rsync-files-from.patch

And it also runs the filenames through sanitize_path() in daemon mode (when chroot is not specified, at least -- I haven't tested a chroot version yet).

..wayne..
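The equivalence Wayne describes comes down to how each list entry is combined with the SRC directory: relative entries are taken relative to SRC, and absolute entries ignore it. The patch gets this effect by changing into SRC with push_dir() before reading the list; the string-joining C sketch below only illustrates the mapping, and `resolve_entry` is a hypothetical name. It also avoids a doubled slash when SRC is "/", the case Dave Dykstra suggests in this thread for full paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: map a --files-from entry onto a path, given the SRC
 * directory from the command line.  Relative entries are joined to SRC;
 * absolute entries are used unchanged.  Returns 0 on success, -1 if the
 * result does not fit in out. */
static int resolve_entry(const char *srcdir, const char *entry,
                         char *out, size_t outlen)
{
    int n;

    assert(srcdir != NULL && entry != NULL && out != NULL);
    if (entry[0] == '/') {
        n = snprintf(out, outlen, "%s", entry);             /* absolute */
    } else {
        size_t dlen = strlen(srcdir);
        /* Don't add a separator if SRC already ends in one (e.g. "/"). */
        const char *sep = (dlen > 0 && srcdir[dlen - 1] == '/') ? "" : "/";
        n = snprintf(out, outlen, "%s%s%s", srcdir, sep, entry); /* relative */
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With SRC `/tmp`, the entries `one`, `/foo/two`, and `/bar/three` map to `/tmp/one`, `/foo/two`, and `/bar/three`, the same three paths as in the command-line form.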
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 03:57:51PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> > 1. Yes it should take a filename or - as a parameter.
> > 2. I don't like the idea of skipping the SRC spec.  Paths should be
> >    relative to the SRC.  If somebody wants to use full paths they
> >    can always have a SRC of "/".
> > 3. It should be called --files-from.
> > 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
> >    they should be automatically turned on with the --files-from option.
>
> OK, I'm also fine with these points.  Note RE comment #2: even though
> the relative path names now default to the SRC dir, the user can still
> include absolute path names in the list and rsync will transfer them

Absolute paths are bad news here.  Especially when dealing with an rsync daemon. This allows the user to defeat any location restrictions: not only by working outside the module of an rsync daemon, but also the kinds of restrictions that someone might set up using command= wrappers in ssh.

> without problem.  Also, I think the older implementation of --files-from
> implied the -R (--relative) option, and this implementation does not.
>
> So, here's a *VERY EARLY* implementation that can transfer files in
> either direction.  It adds the option --files-from=FILE and the option
> --null (for null-terminated names).  "FILE" can be "-" for stdin.  This
> patch is relative to the CVS version, and is only for those that want to
> assist in implementation, design, and/or testing.  **I have not tested
> daemon mode at all yet, just simple ssh transfers in both directions.**
>
> Compatibility note: when pushing files, the --files-from mode will work
> with any older version of rsync that we can transfer files with.  When
> pulling files, the remote rsync must understand the "--files-from=-"
> option (which tells it to read the file list over the stdin-socket since
> it's combined with the --server option).

That's good at least temporarily, probably permanently.
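One way to close the hole jw describes is to refuse such names outright before they are used. The C sketch below assumes a lexical test is enough for illustration: it rejects absolute names and any name with a ".." component. `name_is_confined` is a hypothetical helper, not part of the patch, and a real check would also have to consider symlinks inside the tree, which this one does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: accept a list entry only if it is relative and has
 * no ".." component, so it cannot name anything outside the source
 * directory (or daemon module) by path manipulation alone.
 * Returns 1 if the name is acceptable, 0 if it must be refused. */
static int name_is_confined(const char *name)
{
    const char *p = name;

    assert(name != NULL);
    if (name[0] == '/')
        return 0;                        /* absolute path: refuse */

    while (*p != '\0') {
        size_t n = strcspn(p, "/");      /* length of this component */
        if (n == 2 && p[0] == '.' && p[1] == '.')
            return 0;                    /* ".." could climb out: refuse */
        p += n;
        if (*p == '/')
            p++;                         /* step over the separator */
    }
    return 1;
}
```

With this check, `one` is accepted while `/foo/two` and `a/../../etc` are refused; a component that merely starts with dots, such as `..b`, is still allowed.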
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> 1. Yes it should take a filename or - as a parameter.
> 2. I don't like the idea of skipping the SRC spec.  Paths should be
>    relative to the SRC.  If somebody wants to use full paths they
>    can always have a SRC of "/".
> 3. It should be called --files-from.
> 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
>    they should be automatically turned on with the --files-from option.

OK, I'm also fine with these points. Note RE comment #2: even though the relative path names now default to the SRC dir, the user can still include absolute path names in the list and rsync will transfer them without problem. Also, I think the older implementation of --files-from implied the -R (--relative) option, and this implementation does not.

So, here's a *VERY EARLY* implementation that can transfer files in either direction. It adds the option --files-from=FILE and the option --null (for null-terminated names). "FILE" can be "-" for stdin. This patch is relative to the CVS version, and is only for those that want to assist in implementation, design, and/or testing. **I have not tested daemon mode at all yet, just simple ssh transfers in both directions.**

Compatibility note: when pushing files, the --files-from mode will work with any older version of rsync that we can transfer files with. When pulling files, the remote rsync must understand the "--files-from=-" option (which tells it to read the file list over the stdin-socket since it's combined with the --server option).

Aside: there was a huge chunk of code in main.c that was not indented correctly (due to the addition of some read_batch stuff). I didn't want to march the code off the edge of the screen any further, so I made the read_batch code use a goto. Those that have a weak stomach may wish to avert their gaze from that portion of the patch.

..wayne..
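With --files-from, the sender reads names from a file (or from stdin, for "-"), one per line, or NUL-terminated with --null. Below is a minimal C sketch of such a reader, assuming stdio and one name per call; `read_list_name` and `open_list` are hypothetical names, not the patch's read_filesfrom_line(). Because the patch's loop stops as soon as the reader returns 0, this sketch skips empty entries so that a blank line cannot end the list early.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: read one name per call from a list file.  Names end
 * at '\n', or at '\0' when null_sep is set (the --null mode, matching
 * find -print0 output).  Empty entries are skipped.  Returns the length of
 * the name stored in buf, or 0 at end of input.  Names longer than
 * bufsize-1 bytes are truncated. */
static size_t read_list_name(FILE *fp, int null_sep, char *buf, size_t bufsize)
{
    int term = null_sep ? '\0' : '\n';

    assert(fp != NULL && buf != NULL && bufsize > 0);
    for (;;) {
        size_t len = 0;
        int c;

        while ((c = getc(fp)) != EOF && c != term) {
            if (len + 1 < bufsize)
                buf[len++] = (char)c;    /* keep what fits */
        }
        buf[len] = '\0';
        if (len > 0)
            return len;                  /* got a non-empty name */
        if (c == EOF)
            return 0;                    /* end of input */
        /* empty entry: keep reading */
    }
}

/* Open the list as the option is described: "-" means stdin. */
static FILE *open_list(const char *path)
{
    return strcmp(path, "-") == 0 ? stdin : fopen(path, "r");
}
```

NUL separation is what makes names containing newlines transferable, e.g. when the list is produced by `find ... -print0`.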
Index: flist.c
--- flist.c	24 Dec 2002 07:42:04 -	1.127
+++ flist.c	14 Jan 2003 23:44:21 -
@@ -41,6 +41,8 @@
 extern int cvs_exclude;
 extern int recurse;
+extern char *files_from;
+extern int files_from_fd;
 extern int one_file_system;
 extern int make_backups;
@@ -680,7 +682,7 @@
 	if (noexcludes)
 		goto skip_excludes;
-	if (S_ISDIR(st.st_mode) && !recurse) {
+	if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
 		rprintf(FINFO, "skipping directory %s\n", fname);
 		return NULL;
 	}
@@ -876,12 +878,13 @@
 **/
 struct file_list *send_file_list(int f, int argc, char *argv[])
 {
-	int i, l;
+	int l;
 	STRUCT_STAT st;
 	char *p, *dir, *olddir;
 	char lastpath[MAXPATHLEN] = "";
 	struct file_list *flist;
 	int64 start_write;
+	int use_ff_fd = 0;
 	if (show_filelist_p() && f != -1)
 		start_filelist_progress("building file list");
@@ -890,16 +893,33 @@
 	flist = flist_new();
-	if (f != -1) {
+	if (f != -1)
 		io_start_buffering(f);
+
+	if (files_from && f != -1) {
+		if (!push_dir(argv[0], 0)) {
+			rprintf(FERROR, "push_dir %s : %s\n",
+				argv[0], strerror(errno));
+			exit_cleanup(RERR_FILESELECT);
+		}
+		use_ff_fd = 1;
 	}
-	for (i = 0; i < argc; i++) {
+	while (1) {
 		char *fname = topsrcname;
-		strlcpy(fname, argv[i], MAXPATHLEN);
+		if (use_ff_fd) {
+			l = read_filesfrom_line(files_from_fd, fname);
+			if (!l)
+				break;
+		}
+		else {
+			if (argc-- == 0)
+				break;
+			strlcpy(fname, *argv++, MAXPATHLEN);
+			l = strlen(fname);
+		}
-		l = strlen(fname);
 		if (l != 1 && fname[l - 1] == '/') {
 			if ((l == 2) && (fname[0] == '.')) {
 				/* Turn ./ into just . rather than ./.
@@ -922,7 +942,7 @@
 			continue;
 		}
-		if (S_ISDIR(st.st_mode) && !recurse) {
+		if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
 			rprintf(FINFO, "skipping directory %s\n", fname);
 			continue;
 		}
@@ -940,7 +960,7 @@
 				dir = fname;
 				fname = p + 1;
 			}
-		} else if (f != -1 && (p = strrchr(fname, '/'))) {
+		} else if (f != -1 && !files_from && (p=strrchr(fname,'/'))) {
 			/* this ensures we send the intermediate directories,
 			   thus getting their permissions right */
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
> I haven't looked at the implementation, but comments on the user
> interface:
> 1. Yes it should take a filename or - as a parameter.
> 2. I don't like the idea of skipping the SRC spec.  Paths should be
>    relative to the SRC.  If somebody wants to use full paths they
>    can always have a SRC of "/".
> 3. It should be called --files-from.
> 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
>    they should be automatically turned on with the --files-from option.

Those comments all sound reasonable to me. The only reason I broke out the --send-dirs and --no-implicit-dirs options was that they were orthogonal to what I was doing and could potentially also apply to situations where the user was specifying various SRC filenames on the command line. But it's certainly fine to have --files-from turn those on automatically.

-Andy
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 01:21:29PM -0800, Wayne Davison wrote:
> On Tue, Jan 14, 2003 at 11:02:44AM -0500, Andrew J. Schorr wrote:
> > I am attaching an updated version of my patch to allow you to specify
> > a list of files to transfer.
>
> Cool.  I'm looking into making this work when fetching files.  Towards
> that end, I'd like to suggest an alternate command-line syntax to make
> the --source-list option take a filename.  This will allow it to accept
> "-" as stdin, and will make it easy to parse for the pull syntax.  This
> means that we need to omit the SRC spec on a push or specify it as
> empty.  E.g. these will all work:
>
>   rsync --source-list=file remote:/path
>   rsync --source-list file : remote:/path
>   rsync --source-list=- "" remote:/path
>
> A pull looks like this:
>
>   rsync --source-list=file remote: /path
>   rsync --source-list - remote::module /path
>
> What do people think?  Of course this is not for the rsync release we're
> currently working on, but could be included as a patch, if desired.

I haven't looked at the implementation, but comments on the user interface:

1. Yes it should take a filename or - as a parameter.
2. I don't like the idea of skipping the SRC spec.  Paths should be
   relative to the SRC.  If somebody wants to use full paths they
   can always have a SRC of "/".
3. It should be called --files-from.
4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
   they should be automatically turned on with the --files-from option.

- Dave
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 11:02:44AM -0500, Andrew J. Schorr wrote:
> I am attaching an updated version of my patch to allow you to specify
> a list of files to transfer.

Cool. I'm looking into making this work when fetching files. Towards that end, I'd like to suggest an alternate command-line syntax to make the --source-list option take a filename. This will allow it to accept "-" as stdin, and will make it easy to parse for the pull syntax. This means that we need to omit the SRC spec on a push or specify it as empty. E.g. these will all work:

  rsync --source-list=file remote:/path
  rsync --source-list file : remote:/path
  rsync --source-list=- "" remote:/path

A pull looks like this:

  rsync --source-list=file remote: /path
  rsync --source-list - remote::module /path

What do people think? Of course this is not for the rsync release we're currently working on, but could be included as a patch, if desired.
specifying a list of files to transfer
Hi,

I don't want to start another --files-from war, but I am attaching an updated version of my patch to allow you to specify a list of files to transfer.

The normal rsync syntax allows you to specify a list of SRC files to transfer on the command line. This patch adds some new options that let you instead supply a file containing a list of files to transfer.

The previous version of the patch was against rsync-2.4.6; this version works with rsync-2.5.5. The only real changes relate to the use of the popt option-parsing library in 2.5 (not used in 2.4). This had the minor effect of removing the possibility of using "-" to indicate stdin, since the new library seems to interpret it as an option and barfs. So instead I allow the use of "/dev/stdin".

By the way, this patch should also work against rsync-2.5.6pre1, except for a couple of changes relating to white space and comments: a couple of patch hunks are rejected, but they are easy to fix by hand. If there is a need, I can post an updated patch.

Last time we discussed this, Dave Dykstra objected to this patch for two reasons:

1. This patch only works in a single direction: when sending from a local system to a remote system. It does not handle the case where you are receiving from a remote system to a local system.

2. This capability can be achieved by specifying a list of files with --include-from and then adding --exclude '*' to ignore other files.

While this is true, it turns out to be much slower. I have finally run a performance test to demonstrate this; results are below.

The basic idea of the patch is to handle the case where you already know a list of files that might need to be updated and don't want to use rsync's recursive directory-tree scanning logic to enumerate all files.
The patch adds the following options:

  --source-list        SRC arg will be a (local) file name containing a list
                       of files, or /dev/stdin
  --null               used with --source-list to indicate that the file names
                       will be separated by null (zero) bytes instead of
                       linefeed characters; useful with gfind -print0
  --send-dirs          send directory entries even though not in recursive mode
  --no-implicit-dirs   do not send implicit directories (parents of the file
                       being sent)

The --source-list option allows you to supply an explicit list of filenames to transfer without using the --recursive feature and without playing around with include and exclude files. As discussed below, the same thing can be done by combining --recursive with --include-from and --exclude, but it's significantly slower and more arcane to do it that way.

The --null flag allows you to handle files with embedded linefeeds. This is in the style of GNU find's -print0 operator.

The --send-dirs option overcomes a problem where rsync refuses to send directories unless it's in recursive mode. One needs this to make sure that even empty directories get mirrored.

And the --no-implicit-dirs option turns off the default behavior in which all the parent directories of a file are transmitted before sending the file. That default behavior is very inefficient in my scenario, where I take responsibility for sending those directories myself.

And now for a performance test: I have a directory tree containing 128219 files, of which 16064 are directories. To start the test, I made a list of files that had changed in the past day:

  find . -mtime -1 -print > /tmp/changed

(Normally, my list of candidate files is generated by some other means; this is just a test example.) There were 5059 entries in /tmp/changed.
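The "implicit directories" that --no-implicit-dirs turns off are the leading directory prefixes of each name, and they are also what the gawk script below has to add as include patterns for the --include-from comparison. Below is a minimal C sketch that computes them as prefix lengths; `parent_prefix_lengths` is a hypothetical helper, and it assumes clean names without repeated or trailing slashes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the implicit parent directories of a name as
 * prefix lengths.  For "a/b/c.txt" the parents are "a" (length 1) and
 * "a/b" (length 3).  Stores up to max lengths in lens[] and returns the
 * total number of parents.  Assumes no repeated or trailing slashes. */
static int parent_prefix_lengths(const char *path, size_t lens[], int max)
{
    int count = 0;
    const char *slash = path;

    assert(path != NULL && (lens != NULL || max == 0));
    while ((slash = strchr(slash, '/')) != NULL) {
        size_t len = (size_t)(slash - path);
        if (len > 0) {                 /* a leading "/" is not a parent */
            if (count < max)
                lens[count] = len;
            count++;
        }
        slash++;                       /* continue after this separator */
    }
    return count;
}
```

For `a/b/c.txt` it reports lengths 1 and 3, i.e. `a` and `a/b`, which can be printed with `printf("%.*s\n", (int)lens[i], path)`.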
I used my new options to sync up these files to another host as follows:

  time rsync -RlHptgoD --numeric-ids --source-list \
      --send-dirs --no-implicit-dirs -xz --stats /dev/stdin \
      remotehost:/extra_disk/tmp/tree1 < /tmp/changed

Here were the reported statistics:

  Number of files: 5059
  Number of files transferred: 5056
  Total file size: 355514100 bytes
  Total transferred file size: 355514100 bytes
  Literal data: 355514100 bytes
  Matched data: 0 bytes
  File list size: 139687
  Total bytes written: 154858363
  Total bytes read: 80916

  wrote 154858363 bytes  read 80916 bytes  364992.41 bytes/sec
  total size is 355514100  speedup is 2.29

And the time statistics:

  112.53u 8.82s 7:03.92 28.6%

I then ran the same command again (in which case there was nothing to transfer). Here's how long it took:

  0.54u 0.62s 0:08.61 13.4%

Now to compare with the recursive method using --include-from. First, we must create the list of files. In the case of --include-from, we need to include all the parent directories as include patterns. The following gawk seems to do the job:

  gawk '$0 != "./" {sub(/^\.\//,"")} {while ((length > 0) && !($0 in already)) {print "/"$0; already[$0] = 1;