Ian McNish [[EMAIL PROTECTED]] writes:
>On Sun, 8 Oct 2000, Martin Pool wrote:
>
>...
>>
>> As the manual says
>>
>> If you end an exclude list with --exclude '*', note that
>> since the algorithm is applied recursively that unless you
>> explicitly include parent directories of files you want to
>> include then the algorithm will stop at the parent
>> directories and never see the files below them. To
>> include all directories, use --include '*/' before the
>> --exclude '*'.
(...)
>yes, but part of what confuses me is the line in the man page that
>states:
>
> o --include "foo/" --include "foo/bar.c" --exclude "*"
> would include only foo/bar.c (the foo/ directory must
> be explicitly included or it would be excluded by the
> "*")
>
>this contradicts the behaviour and the statement you made. also, the
>man page states:

I know that Martin has followed up in more detail, but I thought I'd
also try to point out how there really isn't any contradiction between
these two entries from the man page.

To paraphrase the first section - if you are going to exclude with
wildcards, you must be sure to explicitly include (via some pattern)
all directories in the hierarchy above any files you wish to include.
If you don't do this, rsync never recurses into those directories
(since the exclude pattern matches the directory), and never gets a
chance to match your files.

The section of the man page you quoted is exactly this case, but just
a single example of the general rule. It uses an explicit ("foo/")
directory selection rather than a wildcard ("*/"). That is, the
wildcard ("*") exclude match would normally match the top level "foo"
directory and prevent rsync from recursing into that directory, so it
never even tries to match "foo/bar.c" against the include pattern. By
explicitly including the directory foo ("foo/") you let rsync recurse
into it, at which point the other include pattern will take effect.
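
To make the rule concrete, here is a small Python sketch. It is only a
simplified model of the behaviour described above, not rsync's actual
matching code: the helper names (rule_matches, first_match,
is_transferred) are made up for the example, paths are taken relative
to the top of the transfer, and only "*", a trailing "/" and a leading
"/" are handled. The first rule that matches decides, and a file is
only reached if none of its parent directories was excluded first.

```python
import fnmatch

def rule_matches(pattern, path, is_dir):
    """Simplified stand-in for rsync's pattern test (an assumption, not rsync code)."""
    if pattern.endswith("/"):
        # A trailing "/" restricts the pattern to directories.
        if not is_dir:
            return False
        pattern = pattern[:-1]
    anchored = pattern.startswith("/")
    pat_parts = pattern.strip("/").split("/")
    path_parts = path.split("/")
    if anchored and len(pat_parts) != len(path_parts):
        return False
    if len(pat_parts) > len(path_parts):
        return False
    # Unanchored patterns are compared with the end of the path;
    # matching per component keeps "*" from crossing a "/".
    tail = path_parts[-len(pat_parts):]
    return all(fnmatch.fnmatchcase(n, p) for n, p in zip(tail, pat_parts))

def first_match(rules, path, is_dir):
    """Return the action of the first rule that matches; default is include."""
    for action, pattern in rules:
        if rule_matches(pattern, path, is_dir):
            return action
    return "include"

def is_transferred(rules, file_path):
    """A file is seen only if every parent directory survives the rules."""
    parts = file_path.split("/")
    for depth in range(1, len(parts)):
        directory = "/".join(parts[:depth])
        if first_match(rules, directory, True) == "exclude":
            return False  # recursion stops here; the file is never tested
    return first_match(rules, file_path, False) == "include"

without_dir = [("include", "foo/bar.c"), ("exclude", "*")]
with_dir = [("include", "foo/"), ("include", "foo/bar.c"), ("exclude", "*")]

print(is_transferred(without_dir, "foo/bar.c"))  # False: "foo" is hit by "*"
print(is_transferred(with_dir, "foo/bar.c"))     # True
print(is_transferred(with_dir, "foo/baz.c"))     # False
```

In this model the only difference between the two rule lists is the
"foo/" include, which is what lets the walk reach "foo/bar.c" at all.
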
> o --exclude "/foo/*/bar" would exclude any file called
> bar two levels below a base directory called foo
>
>which would support my assumption that the command:
>
>% rsync -avz --include='*/*.gz' --exclude='*' /tmp/ jumper::files
>
>should copy what i want, but instead this does not copy anything.

I'm not sure of the correlation here, since the man page you cite is
talking about excluding a specific file and not a more generic
wildcard pattern. Rsync is going to apply each of these patterns at
each point in the tree as it processes files. Your generic "*"
wildcard will match all files and directories on all parts of the
tree, but the man page exclude of "/foo/*/bar" will only match a file
(or directory) named "bar" at the third level of the tree, beneath a
top-level directory named "foo" and any second-level directory.

The rsync command you give says to rsync (in order):

  * Include anything matching the pattern "*/*.gz"
  * Exclude anything (files or directories) matching the pattern "*".

Now, let's think like rsync does when it processes your directory
tree, which from an earlier message of yours was something like
(quoting partially):

>d /tmp
>d /tmp/abc
>f /tmp/abc/one
>f /tmp/abc/two
>f /tmp/abc/three
>f /tmp/abc/four
>f /tmp/abc/five.gz
>d /tmp/abd
>f /tmp/abd/one
>f /tmp/abd/two
>f /tmp/abd/three
>f /tmp/abd/four
>f /tmp/abd/five.gz
> (...)

So first, rsync starts in the directory tmp and looks at each of the
entries it holds - in the above case, subdirectories "abc" and "abd".
It executes your include/exclude list in order. Neither of those
directories matches the include pattern "*/*.gz" (it's important to
realize that the entire pattern is applied at each level of the tree,
and a bare name like "abc" has no "/" for "*/*.gz" to match against).
However, both of those directories match the exclude pattern "*".
They are therefore excluded, and rsync never descends into them, so it
never gets to a point where the include pattern matches anything.

One way to fix this is what the man page is trying to describe - use
additional include patterns that cover the extra level of directories
that you need to match. If the above case was the extent of the
filesystem, you could fix this by adding either:

    "--include=/abc/" and "--include=/abd/"

or

    "--include=abc/" and "--include=abd/"

to your command. I mention both to highlight that the match is done
at each level of the tree. A leading "/" anchors the pattern at the
top of the transfer (here, the contents of /tmp/), so the first form
is written relative to that directory rather than as a full filesystem
path. Leaving off the leading "/" also works, since the pattern is
then matched against the end of the path. Of course, the second form
might also match those directory names at deeper levels of the tree,
which might or might not be what you wanted.

But in a general case, you don't want to bother listing each and every
subdirectory in your tree, including multiple levels if you are
several levels deep. So by including the wildcard directory match
"*/" you automatically include all directories, which exempts them
from the global exclusion pattern.
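
The walk described above can be sketched in Python as follows. Again
this is a simplified model and not rsync itself: the tree is the
partial listing quoted earlier (relative to /tmp/), the function names
are invented for the example, and the matcher only understands "*", a
trailing "/" and a leading "/".

```python
import fnmatch

def rule_matches(pattern, path, is_dir):
    """Simplified stand-in for rsync's pattern test (an assumption, not rsync code)."""
    if pattern.endswith("/"):
        if not is_dir:
            return False  # trailing "/" means directories only
        pattern = pattern[:-1]
    anchored = pattern.startswith("/")
    pat_parts = pattern.strip("/").split("/")
    path_parts = path.split("/")
    if anchored and len(pat_parts) != len(path_parts):
        return False
    if len(pat_parts) > len(path_parts):
        return False
    tail = path_parts[-len(pat_parts):]
    return all(fnmatch.fnmatchcase(n, p) for n, p in zip(tail, pat_parts))

def first_match(rules, path, is_dir):
    """First matching rule wins; anything unmatched is included."""
    for action, pattern in rules:
        if rule_matches(pattern, path, is_dir):
            return action
    return "include"

def walk(rules, tree, prefix=""):
    """Return the entries selected; excluded directories are never entered."""
    selected = []
    for name, children in tree.items():
        path = prefix + name
        is_dir = children is not None
        if first_match(rules, path, is_dir) == "exclude":
            continue
        if is_dir:
            selected.append(path + "/")
            selected.extend(walk(rules, children, path + "/"))
        else:
            selected.append(path)
    return selected

# None marks a file, a dict marks a directory.
FILES = {"one": None, "two": None, "three": None, "four": None, "five.gz": None}
TREE = {"abc": dict(FILES), "abd": dict(FILES)}

original = [("include", "*/*.gz"), ("exclude", "*")]
with_dirs = [("include", "*/"), ("include", "*/*.gz"), ("exclude", "*")]

print(walk(original, TREE))   # []
print(walk(with_dirs, TREE))  # ['abc/', 'abc/five.gz', 'abd/', 'abd/five.gz']
```

With the original two rules the walk stops at "abc" and "abd"; adding
the "*/" include ahead of the "*" exclude is what lets it reach the
.gz files.
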
An alternate approach (I believe - haven't tested it) would be that if
you really know your directory tree is just two levels deep, you could
make your exclude more selective - at the same granularity as your
include. So switching your exclude to "*/*" should (I think) keep it
from matching that first level of directory, so rsync can recurse.

Oh, and the global directory include does have a potential side-effect
of creating empty directories on the target for directories with no
matching files. If you don't want that, the only real fix is to
supply an include list that individually names every directory along
the paths to the files you do want. There was a previous e-mail
thread on this list with respect to taking an explicit list of files
and generating the directory prefix list with a small shell script
that you might want to look back for if that's something of interest.
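
The script from that earlier thread is not reproduced here, so the
following is only a guess at the idea, written in Python rather than
shell. The function names are hypothetical, the file paths are assumed
to be relative to the top of the transfer, and each pattern is given a
leading "/" so that it is anchored there.

```python
def parent_dir_includes(files):
    """List every directory above the given files, parents first, no repeats."""
    patterns = []
    for path in files:
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts)):
            pattern = "/" + "/".join(parts[:depth]) + "/"
            if pattern not in patterns:
                patterns.append(pattern)
    return patterns

def build_filter_args(files):
    """Build include/exclude arguments: directories, then files, then exclude all."""
    args = ["--include=" + d for d in parent_dir_includes(files)]
    args += ["--include=/" + f.strip("/") for f in files]
    args.append("--exclude=*")
    return args

wanted = ["abc/five.gz", "abd/five.gz"]
for arg in build_filter_args(wanted):
    print(arg)
```

The resulting arguments would go between the options and the source
in a command like the one quoted above. Because only the directories
leading to the wanted files are included, no other directories are
created on the target.
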
-- David
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: [EMAIL PROTECTED] /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/