My nightly backup script gets the following error every time it
backs up one particular set of files across NFS:

cd /
....
cpio: Read error at byte 0 in file home/coo/stella/sam/Same4.1, padding with zeros
cpio: Read error at byte 0 in file home/coo/stella/sam/Same5.1, padding with zeros

The backup runs as root.  Root can open the files and examine them
manually (e.g. with vi) across the network just fine.

Only these files have the problem - other files back up okay.
Permissions look good - owner and group are correct, and uid and gid
now match on both machines (they didn't, before!).

E.g.:
$ ls -l /home/coo/stella/sam/Same4.1
-rw-rw----    1 stella   kendall     43662 Jan 26 12:05 /home/coo/stella/sam/Same4.1
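
Since ls -l shows names rather than numbers, a numeric listing is one
way to confirm that the uid and gid really do match on both machines.
A minimal sketch, run on each machine in turn; numeric_ids is a
hypothetical helper name, not something from the backup script:

```shell
# Print the numeric owner and group of a file as "uid:gid".
# ls -n shows numeric ids instead of names; -d keeps a directory
# argument from being expanded into its contents.
numeric_ids() {
    ls -ldn "$1" | awk '{ print $3 ":" $4 }'
}
```

Running numeric_ids /home/coo/stella/sam/Same4.1 on the client and the
equivalent local path on the server should print the same pair if the
ids are consistent.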

I copied all the files in the directory to a new place - now I have 2
directories with the problem.  I opened all the text files in the
copied directory in vi, and even wrote them all out again.  No
difference.  Needless to say, none of the files start with a NUL byte.

The only clue I have is these error messages on the console of the
machine where the files live, at last reboot:

fh_verify: sam-copy/Same4.1 permission failure, acc=4, error=13
fh_verify: sam-copy/Same5.1 permission failure, acc=4, error=13

etc.

I recall errno 13 is permission denied (though I can't find that
definition in errno.h nor bits/errno.h), but I don't know what acc=4
means, nor why permission would be denied to root on just those files
and only when run from the script.

Any suggestions?  I would like to back up those files along with all
the others ...  Strange NFS delays?  The files are small - typically
30k.

Even changing the NFS options to timeo=14 and retry=1 (minute)
hasn't made any difference.

coo:/home on /home/coo type nfs (rw,noexec,nosuid,nodev,soft,bg,timeo=14,retry=1,addr=192.168.1.101)

The files with this mysterious problem were all copied from an old cpio
archive across the network, and unpacked, when the UID and GID of the
users mismatched.  Then I updated the systems to share the same ids for
the same users and groups, and did a massive find and chown and chgrp
to fix them.  Now everything *looks* okay, according to ls, but I'm
getting this weird problem.

The network runs fast, no errors I can see from ifconfig.

Just found this, too, from the command line:

# cd /
# echo home/coo/stella/sam/Same4.1 | cpio -o -H newc > /dev/null
cpio: Read error at byte 16384 in file home/coo/stella/sam/Same4.1, padding with zeros

This only differs from the scripted version in that in the script, all
100 or so files get a read error at byte 0.
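
To tell whether the read failure is specific to cpio, the same files
can be read end to end with dd as root.  A minimal sketch; check_reads
is a hypothetical helper name, and the 16384-byte block size is only
chosen to match the offset in the error above:

```shell
# Read each named file end to end with dd and report the result.
# Prints "ok: FILE" or "READ ERROR: FILE" for every argument and
# returns non-zero if any read failed.
check_reads() {
    rc=0
    for f in "$@"; do
        if dd if="$f" of=/dev/null bs=16384 2>/dev/null; then
            echo "ok: $f"
        else
            echo "READ ERROR: $f"
            rc=1
        fi
    done
    return $rc
}
```

For example, check_reads /home/coo/stella/sam/Same*.1 run as root on
the backup machine would show whether a plain read() fails in the same
way as it does under cpio.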

luke
