Re: feedback on rsync-HEAD-20050125-1221GMT

Alberto Accomazzi Mon, 31 Jan 2005 08:05:09 -0800

Hi Chris,

Chris Shoemaker wrote:

On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote:
Chris Shoemaker wrote:
If I understand Wayne's design, it would be possible to invent a
(per-directory) "hook" rule, whose value is executed, and whose stdout
is parsed as a [in|ex]clude file list.  E.g.:
-R "cat .rsync-my-includes"
or
-R "find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'"
This is certainly a very powerful mechanism, but it definitely should not be the only way we implement file filtering. Two problems:

1. Sprinkling rule files like these across directories would mean executing external programs all the time for each file to be considered.
No, only one execution per specified rule.  Most users of this feature
would specify one rule at the root directory.  But, if a user
wanted to change the rules for every directory, they would have to
specify a rule in each directory.  Then, yes, one execution per
directory.  Presumably they would do this because they actually need
to.  Never one execution per file.

Ok, I guess I had misunderstood your original suggestion. One execution per directory is presumably not so bad, although it's hard to make assumptions about how one's data hierarchy is structured.

This would presumably slow down rsync's execution by an order of magnitude or so and suck the life out of a system doing a big backup job.
If you're referring to process spawning overhead, it's no big deal.
If you're referring to the actual work required to return the file
list, what makes you think that rsync can do it more efficiently than
'cat' or 'find', or whatever tool the user chose?

I was referring to the overhead of spawning a process per file being considered. But I think we all agree that this is neither desirable nor necessary.

2. Who actually needs such a powerful yet hard-to-handle mechanism? Most of rsync's users are not programmers, and even the few of us who are apparently still get confused by rsync's include/exclude logic, never mind even more complicated approaches.
Do you mean include/exclude mechanism or filtering mechanism?  Well,
IMO, parsing a file list is *less* complicated than rsync's custom
pattern specification and include/exclude chaining.  Actually, I think
rsync patterns are /crazy/ complicated and fully deserve the pages
upon pages of documentation, explanation and examples that they get in
the man page.
But, complexity is somewhat subjective, so I won't argue (much) about it. In practice, /familiarity/ is far more important than complexity in a case like this. Someone who looks at rsync for the first time has a _zero_ chance of having seen something like rsync's patterns before, because there is nothing else like them.

I agree that exclude/include patterns can be tricky, and you have a good point about familiarity versus complexity. I think what makes them hard to handle is the fact that we are dealing with filename (and directory name) matching and recursion. So matching only a subset of a file tree, while simple as a concept, is non-trivial once you sit down and realize that you need a well-defined syntax for it. Can you write a find expression that is simpler or more familiar to the average user than an rsync include/exclude rule?
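
For a concrete side-by-side, here is a minimal sketch that selects every *.txt file in a tree, once with a find expression and once with rsync include/exclude rules. The directory layout and file names are invented for illustration, and the rsync step only runs if rsync is installed.

```shell
#!/bin/sh
# Sketch only: the tree below is a made-up example.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/a/b" "$work/dest"
touch "$work/src/top.txt" "$work/src/a/mid.txt" \
      "$work/src/a/b/deep.txt" "$work/src/a/b/other.log"

# find: a single expression selects the files.
found=$(cd "$work/src" && find . -type f -name '*.txt' | sort)
printf '%s\n' "$found"

# rsync: three rules, and their order matters (first match wins).
#   --include='*/'    lets rsync descend into every directory
#   --include='*.txt' keeps the files we want
#   --exclude='*'     drops everything else
if command -v rsync >/dev/null 2>&1; then
    rsync -a --include='*/' --include='*.txt' --exclude='*' \
        "$work/src/" "$work/dest/"
    copied=$(cd "$work/dest" && find . -type f | sort)
    printf '%s\n' "$copied"
fi
```

Both forms pick the same files here. The rsync version needs the extra directory rule so that recursion is not cut off by the final exclude, which is the kind of interaction between matching and recursion described above.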

(The allusion to GNU
tar's --exclude option which takes only a filename, not a pattern,
isn't really helpful in understanding rsync's --exclude option.)

Uh? Tar does take patterns for exclusion, and has its own quirky way of dealing with wildcards, directory matching and filename anchoring: http://www.gnu.org/software/tar/manual/html_node/tar_100.html
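
A quick way to check the pattern behaviour is a sketch like the following. The file names are invented for illustration, and the finer points of wildcard and anchoring behaviour can differ between tar implementations and versions.

```shell
#!/bin/sh
# Sketch only: the tree below is a made-up example.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/sub"
touch "$work/src/a.txt" "$work/src/debug.log" \
      "$work/src/sub/b.txt" "$work/src/sub/trace.log"

# --exclude takes a shell-style pattern, not just a literal file name.
tar -cf "$work/out.tar" --exclude='*.log' -C "$work/src" .

# List the archive members to see what was actually stored.
listing=$(tar -tf "$work/out.tar")
printf '%s\n' "$listing"
```

The listing should contain the .txt files from both directory levels and none of the .log files.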

It's not that pattern matching for file selection isn't complex --
it's just that it's such a well-defined, conceptually simple, common
task that other tools (like 'find' and 'bash') handle better than
rsync ever will.  And that's the way it should be: it's the unix way.

I agree that this is something we should be striving for as much as possible: pipeline and offload tasks rather than bloating applications.

If you really need complete freedom maybe the way to go is to do your file selection first and use --files-from.
Yes, --files-from is nice, and honestly, almost completely sufficient.
But in some dynamic cases, you can't keep the list updated.

Well, maybe we should go back and see if the solution to all problems isn't making --files-from sufficient. What exactly is missing from it right now? The capability to delete files which are not in the files-from list? Or the remote execution of a command that can generate the files-from list for an rsync server? Maybe we ought to really figure out what things cannot be achieved with the current functionality before coming up with something new.
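
As a reference point for what already works, here is a minimal sketch of doing the file selection first and handing the result to --files-from. The paths and file names are invented for illustration, and the rsync step only runs if rsync is installed.

```shell
#!/bin/sh
# Sketch only: the paths and file names below are made up.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/src/cache" "$work/dest"
echo "keep" > "$work/src/docs/notes.txt"
echo "keep" > "$work/src/readme.txt"
echo "skip" > "$work/src/cache/blob.tmp"

# Do the selection with find, relative to the source directory,
# so the paths are in the form --files-from expects.
list="$work/files.list"
( cd "$work/src" && find . -type f -name '*.txt' | sort ) > "$list"
cat "$list"

# Hand the precomputed list to rsync.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --files-from="$list" "$work/src/" "$work/dest/"
fi
```

The list is a snapshot taken when find runs, so in the dynamic cases mentioned above it has to be regenerated before every transfer. Files that drop out of the list are simply not transferred, which is a different thing from being deleted on the receiving side.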

The challenge is making this powerful without making it too complicated, because in that case nobody will use it.
You see --filter as less complicated than --include/exclude, then?
It's certainly more powerful.

Since --filter can support a superset of the file selection rules that --include/exclude supports, it's certainly more complicated than include/exclude, but not by much: I still think the trickiest part of the file selection rules for the average user will be pattern matching. The other big issue looming is the logic used for nesting/inheriting/overriding file selection rules. I'm really worried that those can easily get out of hand.


-- Alberto

********************************************************************
Alberto Accomazzi                      aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                        ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics      www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
********************************************************************