On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <[email protected]> wrote:
> On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <[email protected]> wrote:
> >
> > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <[email protected]> wrote:
> > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > > & describe test case for reproducing corruption when doing a power cut
> > > > after first boot.
> > > >
> > > > (...)
> > > >
> > > > Can I ask you to check if there is something possibly wrong with the
> > > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > > maybe the whole UBI?
> > > >
> > > > FWIW testing above commit (and one before it) always results in single
> > > > error in the kernel log:
> > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned
> > > > twice
> > > >
> > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> > >
> > > Let me provide a summary of all relevant commits & tests:
> > >
> > > By "Corruption" I mean file system corruption after power cut
> >
> > Well, is the filesystem not consistent anymore?
> > From what Russel explained to me, I thought the main problem is that no
> > write back happens.
> > IOW the inode is present, has correct length, but no content is there (all
> > zeros).
> >
> > Just like the typical case where userspace does not fsync.
> > But in your case sooner or later write back should have happened because
> > the writeback timer
> > fires at some point.
> >
>
> For the records overlayfs does:
> - open(O_TMPFILE)
> - setxattr() [with 3a1e819b4e80]
> - write to tmpfile
> - fsync tmpfile
> - link tmpfile
>
> I suggest that you try the same from user space on ubifs.
Are you 100% sure about it? I tried writing C app behaving as you
described above and I could not reproduce the problem.
Then I took a close look at ovl_copy_up_locked() and it seems above
info isn't accurate. It seems to me that setxattr() happens between
fsync and link. I modified my C app to follow that order (open, write,
fsync, setxattr, link) and I can reproduce the problem now!
Steps to reproduce the problem:
1) compile tmptest.c
2) tmptest /overlay/upper/foo.txt user.bar baz
3) wait 5 seconds (so ubifs writes to flash)
4) power cut
5) boot again and check content of /overlay/upper/foo.txt
6) in my case content appears to be 00 00 00 00
Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()?
--
Rafał
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <linux/limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>
static const char foo[] = "foo\n";
int main(int argc, char **argv) {
char path[PATH_MAX];
char *dir;
int err;
int fd;
if (argc < 2) {
fprintf(stderr, "You have to pass file as argument\n");
return -ENOENT;
}
dir = strdup(argv[1]);
if (!dir) {
return -ENOMEM;
}
dir = dirname(dir);
fd = open(dir, O_TMPFILE | O_RDWR);
if (fd < 0) {
err = -errno;
fprintf(stderr, "Failed to open O_TMPFILE: %d\n", err);
return err;
}
#if 0
if (argc >= 4) {
if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
err = -errno;
fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
return err;
}
printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
}
#endif
if (write(fd, foo, sizeof(foo)) != sizeof(foo)) {
err = -errno;
fprintf(stderr, "Failed to write(): %d\n", err);
return err;
}
if (fsync(fd) < 0) {
err = -errno;
fprintf(stderr, "Failed to fsync(): %d\n", err);
return err;
}
#if 1
if (argc >= 4) {
if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
err = -errno;
fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
return err;
}
printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
}
#endif
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
if (linkat(-1, path, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0) {
err = -errno;
fprintf(stderr, "Failed to linkat(): %d\n", err);
return err;
}
printf("Success!\n");
return 0;
}
_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel