Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-31 Thread Alberto Accomazzi
Hi Chris,
Chris Shoemaker wrote:
On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote:
Chris Shoemaker wrote:

If I understand Wayne's design, it would be possible to invent a
(per-directory) hook rule, whose value is executed, and whose stdout
is parsed as a [in|ex]clude file list.  E.g.:
-R cat .rsync-my-includes
or
-R find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'
This is certainly a very powerful mechanism, but it definitely should 
not be the only way we implement file filtering.  Two problems:

1. Sprinkling rule files like these across directories would mean 
executing external programs all the time for each file to be considered. 

No, only one execution per specified rule.  Most users of this feature
would specify one rule at the root directory.  But, if a user
wanted to change the rules for every directory, they would have to
specify a rule in each directory.  Then, yes, one execution per
directory.  Presumably they would do this because they actually need
to.  Never one execution per file.
Ok, I guess I had misunderstood your original suggestion.  One execution 
per directory is presumably not so bad, although it's hard to make 
assumptions about how one's data hierarchy is structured.

This would presumably slow down rsync's execution by an order of 
magnitude or so and suck the life out of a system doing a big backup job.

If you're referring to process spawning overhead, it's no big deal.
If you're referring to the actual work required to return the file
list, what makes you think that rsync can do it more efficiently than
'cat' or 'find', or whatever tool the user chose?
I was referring to the overhead of spawning a process per file being 
considered.  But I think we all agree that this is neither desirable nor 
necessary.

2. Who actually needs such a powerful yet hard-to-handle 
mechanism?  Most of rsync's users are not programmers, and even those few 
of us who are apparently still get confused by rsync's include/exclude 
logic, never mind even more complicated approaches.

Do you mean include/exclude mechanism or filtering mechanism?  Well,
IMO, parsing a file list is *less* complicated than rsync's custom
pattern specification and include/exclude chaining.  Actually, I think
rsync patterns are /crazy/ complicated and fully deserve the pages
upon pages of documentation, explanation and examples that they get in
the man page.
But, complexity is somewhat subjective, so I won't argue (much) about
it.  In practice, /familiarity/ is far more important than complexity
in a case like this.  Someone who looks at rsync for the first time
has a _zero_ chance of having seen something like rsync's patterns
before, because there is nothing else like them.  
I agree that exclude/include patterns can be tricky, and you have a good 
point about familiarity versus complexity.  I think what makes them hard 
to handle is the fact that we are dealing with filename (and directory 
name) matching and recursion.  So matching only a subset of a file tree, 
while simple as a concept, is non-trivial once you sit down and realize 
that you need a well-defined syntax for it.  Can you write a find 
expression that is simpler or more familiar to the average user than an 
rsync include/exclude?

(The allusion to GNU
tar's --exclude option which takes only a filename, not a pattern,
isn't really helpful in understanding rsync's --exclude option.)
Uh?  Tar does take patterns for exclusion, and has its own quirky way of 
dealing with wildcards, directory matching and filename anchoring:
http://www.gnu.org/software/tar/manual/html_node/tar_100.html

It's not that pattern matching for file selection isn't complex --
it's just that it's such a well-defined, conceptually simple, common
task that other tools (like 'find' and 'bash') handle better than
rsync ever will.  And that's the way it should be: it's the unix way.
I agree that this is something we should be striving for as much as 
possible: pipeline and offload tasks rather than bloating applications.

If you really need 
complete freedom maybe the way to go is to do your file selection first 
and use --files-from.  

Yes, --files-from is nice, and honestly, almost completely sufficient.
But in some dynamic cases, you can't keep the list updated.
Well, maybe we should go back and see if the solution to all problems 
isn't making --files-from sufficient.  What exactly is missing from it 
right now?  The capability to delete files which are not in the 
files-from list?  Or the remote execution of a command that can generate 
the files-from list for an rsync server?  Maybe we ought to really 
figure out what things cannot be achieved with the current functionality 
before coming up with something new.

challenge is making this powerful without making it too complicated, 
because in that case nobody will use it.

You see --filter as less complicated than --include/exclude, then?
It's certainly more powerful.
Since --filter can support a superset 

Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-28 Thread Chris Shoemaker
On Fri, Jan 28, 2005 at 11:25:06AM -0500, Alberto Accomazzi wrote:

 
 Oooh, I see we are getting a little ambitious, aren't we? ;-)
 [suggestion to use 'find' syntax]

If I understand Wayne's design, it would be possible to invent a
(per-directory) hook rule, whose value is executed, and whose stdout
is parsed as a [in|ex]clude file list.  E.g.:

 -R cat .rsync-my-includes

or

 -R find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'
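
A minimal sketch of the hook idea above: run a user-supplied command once and treat each non-empty line of its stdout as a file name. The helper name read_names_from_command and the callback type are hypothetical and not part of rsync; the sketch assumes a POSIX system with popen(), and it does not handle lines longer than the buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives one file name per call. */
typedef void (*name_cb)(const char *name, void *ctx);

/* Hypothetical helper (not rsync code): run `cmd` through the shell once and
 * call `cb` for each non-empty line it prints.  Returns the number of names
 * seen, or -1 if the command could not be started or exited non-zero. */
static int read_names_from_command(const char *cmd, name_cb cb, void *ctx)
{
    char line[4096];
    int count = 0;
    FILE *fp;

    assert(cmd != NULL);
    fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */
        if (cb != NULL)
            cb(line, ctx);
        count++;
    }

    if (pclose(fp) != 0)                      /* non-zero wait status */
        return -1;
    return count;
}
```

The command runs once per call, so one rule per directory means one process per directory, not one per file.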


IMHO, rsync already has too much of its own filtering functionality,
and needs less, not more.  But maybe a hook like this that lets users
interface with their own filtering program is a step toward
deprecating rsync's [in|ex]clude[-from] options.

Notice that generic include and exclude hooks immediately obsolete
the --*-from options and the --*=PATTERN options.  (rsync needs fewer
options, ya see? :)

 Wayne Davison wrote:
 
 It already supports per-directory name rules, both inherited and not.
 The idea of having per-directory size and time limits would not be hard
 to add, and may be quite worthwhile.  For instance, assume 's' is for
 size and 't' is for the modified time:
 
 # Don't transfer files 1 GB or larger
 s 1g
 # Don't transfer files 100 KB or smaller
 s 100k
 # Only transfer new files (modified in the last day)
 t yesterday
 
 Something like that, perhaps.

We don't really want to reinvent 'find', do we?

 One more thing to point out: I got a core dump when starting a daemon 
 which tried to write to a log file that it had no permission to write 
 to.  The problem seems to be that the function log_open in log.c does 
 not check the return value of fopen.  I don't know whether the right 
 thing to do would be to exit with an error or continue but without 
 logging, but something ought to be changed.

Or... simply log to stderr.  After all, the user may prevent the daemon's
stderr from being redirected to /dev/null.
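
A minimal sketch of the stderr fallback suggested above, assuming the crash comes from using an unchecked NULL result of fopen(). The function name open_log_or_stderr is a hypothetical stand-in; rsync's actual log_open() in log.c is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a log-opening routine (not rsync's log_open()).
 * Returns the opened log file, or stderr when `path` cannot be opened for
 * appending, so the caller never receives NULL. */
static FILE *open_log_or_stderr(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "a");
    if (fp == NULL) {
        perror(path);     /* report the reason, then fall back */
        return stderr;
    }
    return fp;
}
```

Exiting with an error instead would be a one-line change in the failure branch; which behavior is right for a daemon is the open question raised in the thread.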


-chris
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-28 Thread Chris Shoemaker
On Fri, Jan 28, 2005 at 08:50:10PM -0500, Chris Shoemaker wrote:
 of the right path, but I won't be convinced until Wayne starts
 *deleting* man page text, because rsync's pattern matching can be
 fully explained in, say, one or two paragraphs.

that should've read pattern matching _interface_

-chris


Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-27 Thread Wayne Davison
On Thu, Jan 27, 2005 at 11:56:11AM -0500, Alberto Accomazzi wrote:
 I have been using the rsync snapshot from 1/25 for the last few days and 
 everything seems quite solid so far.  I include below a few nit-picks in 
 case you're looking for things to tidy up.

Much appreciated!

 I'm also looking forward to hearing news about the --filter option.  I'm 
 thinking it is going to be very useful for backup purposes.  What I 
 would want to be able to do is, on a per-directory basis and with 
 recursion as an option, enable or disable backing up of files based on 
 their file name, size, and timestamp (maybe mtime only would suffice). 
 Is this where you're going?

It already supports per-directory name rules, both inherited and not.
The idea of having per-directory size and time limits would not be hard
to add, and may be quite worthwhile.  For instance, assume 's' is for
size and 't' is for the modified time:

# Don't transfer files 1 GB or larger
s 1g
# Don't transfer files 100 KB or smaller
s 100k
# Only transfer new files (modified in the last day)
t yesterday

Something like that, perhaps.
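
One way to read the three example rules as a single check, sketched under stated assumptions: the struct, its field names, and passes_limits are hypothetical, and the proposed 's' and 't' rule syntax is not parsed here. A value of 0 means "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-directory limits; the names are illustrative only. */
struct file_limits {
    off_t  max_size;  /* skip files this size or larger;  0 = no limit */
    off_t  min_size;  /* skip files this size or smaller; 0 = no limit */
    time_t max_age;   /* skip files older than this many seconds; 0 = no limit */
};

/* Returns true when a file with the given size and mtime should be sent. */
static bool passes_limits(const struct file_limits *lim,
                          off_t size, time_t mtime, time_t now)
{
    assert(lim != NULL);
    if (lim->max_size > 0 && size >= lim->max_size)
        return false;
    if (lim->min_size > 0 && size <= lim->min_size)
        return false;
    if (lim->max_age > 0 && now - mtime > lim->max_age)
        return false;
    return true;
}
```

In practice the size and mtime would come from the stat() data rsync already gathers for each file, so such a check would add no extra system calls.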

 - You should update the copyright statement to include 2005:

Fixed!

 - Compilation warnings when configured --with-included-popt with gcc 

Yeah, popt code does have a few warnings that I'm not too worried about
at the moment.

 - when running in daemon mode on fedora core 2, the daemon does not 
 start up properly unless you use the option --ipv4.

The code tries to deal with this case by forcing the IPv6 version of the
bind() to only bind the IPv6 port, not both IPv4 and IPv6:

#ifdef IPV6_V6ONLY
        if (resp->ai_family == AF_INET6) {
                if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
                               (char *)&one, sizeof one) < 0
                    && default_af_hint != AF_INET6) {
                        close(s);
                        continue;
                }
        }
#endif

Do you know if FC2 has IPV6_V6ONLY defined?
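
To check a given system outside of rsync, the same option can be exercised in isolation. This is a minimal sketch, not rsync code: open_v6only_socket is a hypothetical name, and the function only creates the socket and sets the option, without binding or listening.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns an IPv6 TCP socket restricted to IPv6
 * traffic, or -1 if the socket cannot be created or the option cannot
 * be set.  When IPV6_V6ONLY is not defined, the socket is returned as-is. */
static int open_v6only_socket(void)
{
    int one = 1;
    int s = socket(AF_INET6, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
#ifdef IPV6_V6ONLY
    if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
                   (char *)&one, sizeof one) < 0) {
        close(s);
        return -1;
    }
#else
    (void)one;   /* option unavailable at compile time */
#endif
    assert(s >= 0);
    return s;
}
```

A return value of -1 on a machine with IPv6 enabled would suggest that the option is defined in the headers but rejected by the kernel.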

The code does output the warning you mentioned when it thinks that this
bug is in effect.  I could have the code ignore the failure of the
second listen() call when this happens, but that would make rsync only
listen on the IPv4 address, and I think that it is better to require the
user to specifically ask us to do that (via -4) rather than to output a
warning and assume that is OK to start up with reduced binding.

..wayne..