Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-26 Thread Andreas Gruenbacher
On 21 February 2018 at 17:11, Christoph Hellwig wrote:
> On Wed, Feb 21, 2018 at 08:51:15AM +1100, Dave Chinner wrote:
>> IOWs, if the filesystem is designed with strictly ordered metadata,
>> then fsync()ing a new file also implies that all references to the
>> new file are also on stable storage because they happened before the
>> fsync on the file was issued. i.e. the directory is fsync'd
>> implicitly because it was modified by the same operation that
>> created the file. Hence if the file creation is made stable, so must
>> be the directory modification done during file creation.
>>
>> This has nothing to do with POSIX or what the "linux standard" is -
>> this is testing whether the implementation of strictly ordered
>> metadata journalling is correct or not.  If gfs2 does not have
>> strictly ordered metadata journalling, then it probably shouldn't
>> run these tests.
>
> Exactly.  Also this is not just for new entries but also things like
> rename.  So trying to come up with some ad hoc hacks here seems
> wrong.
>
> That being said as far as I know gfs2 does transactional metadata
> updates and has one single global log.  Why doesn't it get these
> things right by default?

GFS2 does do metadata journaling. I was under the assumption that
gfs2's ordering model differed, but it turns out that all that was
missing was a log flush in ->fsync for the case where the inode is
clean but no log flush has been done for it yet.
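
The case described above can be illustrated with a toy model. This is only a sketch and not gfs2 code: ToyJournal, ToyInode, fsync_without_flush and fsync_with_flush are hypothetical names, and the model simply assumes that a rename leaves the inode clean while still logging a transaction that involves it.

```python
class ToyJournal:
    """In-memory stand-in for a single global metadata log (hypothetical)."""

    def __init__(self):
        self.next_seq = 1      # sequence number of the next transaction
        self.flushed_seq = 0   # highest sequence number already on "disk"

    def commit(self):
        """Record a transaction in memory and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def flush(self):
        """Write out every committed transaction, in order."""
        self.flushed_seq = self.next_seq - 1


class ToyInode:
    def __init__(self):
        self.dirty = False   # the inode itself has unwritten changes
        self.last_seq = 0    # last logged transaction involving this inode


def rename(journal, inode):
    # Assumption of this model: the rename is logged, but the inode
    # itself is not marked dirty.
    inode.last_seq = journal.commit()


def fsync_without_flush(journal, inode):
    # Skips the log flush whenever the inode is clean.
    if inode.dirty:
        journal.flush()
        inode.dirty = False


def fsync_with_flush(journal, inode):
    # Also flushes when the inode is clean but its last transaction
    # has not reached the log on disk yet.
    if inode.dirty or inode.last_seq > journal.flushed_seq:
        journal.flush()
    inode.dirty = False


def is_durable(journal, inode):
    return inode.last_seq <= journal.flushed_seq
```

In this model, a rename followed by fsync_without_flush leaves the rename transaction unflushed, while fsync_with_flush makes it durable.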

Thanks,
Andreas



Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-21 Thread Christoph Hellwig
On Wed, Feb 21, 2018 at 08:51:15AM +1100, Dave Chinner wrote:
> IOWs, if the filesystem is designed with strictly ordered metadata,
> then fsync()ing a new file also implies that all references to the
> new file are also on stable storage because they happened before the
> fsync on the file was issued. i.e. the directory is fsync'd
> implicitly because it was modified by the same operation that
> created the file. Hence if the file creation is made stable, so must
> be the directory modification done during file creation.
> 
> This has nothing to do with POSIX or what the "linux standard" is -
> this is testing whether the implementation of strictly ordered
> metadata journalling is correct or not.  If gfs2 does not have
> strictly ordered metadata journalling, then it probably shouldn't
> run these tests.

Exactly.  Also this is not just for new entries but also things like
rename.  So trying to come up with some ad hoc hacks here seems
wrong.

That being said as far as I know gfs2 does transactional metadata
updates and has one single global log.  Why doesn't it get these
things right by default?



Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Dave Chinner
On Tue, Feb 20, 2018 at 09:53:59PM +0100, Andreas Gruenbacher wrote:
> On 20 February 2018 at 20:46, Christoph Hellwig wrote:
> > On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
> >> When fsyncing a new file, also fsync the directory the file is in,
> >> recursively.  This is how Linux filesystems should behave nowadays,
> >> even if not mandated by POSIX.
> >
> > I think that is bullshit.  Maybe it is what google wants for ext4
> > non-journal mode which no one else uses anyway, but it certainly
> > is anything but normal Linux semantics.
> 
> Here's some code from xfstest generic/322:
> 
>   _mount_flakey
>   $XFS_IO_PROG -f -c "pwrite 0 1M" -c "fsync" $SCRATCH_MNT/foo \
>       > $seqres.full 2>&1 || _fail "xfs_io failed"
>   mv $SCRATCH_MNT/foo $SCRATCH_MNT/bar
>   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
>   md5sum $SCRATCH_MNT/bar | _filter_scratch
> 
>   _flakey_drop_and_remount
> 
>   md5sum $SCRATCH_MNT/bar | _filter_scratch
>   _unmount_flakey
> 
> Note that there is no fsync for the parent directory ($SCRATCH_MNT),
> yet the test obviously expects the directory to be synced as well.
> This isn't implemented as in this patch on all filesystems, but the
> major ones all show this behavior. So where's the bullshit?

This test is for filesystems that have strictly ordered metadata
journalling. All the filesystems that fstests supports
via _require_metadata_journalling() have strictly ordered metadata
journalling/crash recovery semantics. (i.e. xfs, ext4, btrfs, and
f2fs (IIRC)).

IOWs, if the filesystem is designed with strictly ordered metadata,
then fsync()ing a new file also implies that all references to the
new file are also on stable storage because they happened before the
fsync on the file was issued. i.e. the directory is fsync'd
implicitly because it was modified by the same operation that
created the file. Hence if the file creation is made stable, so must
be the directory modification done during file creation.

This has nothing to do with POSIX or what the "linux standard" is -
this is testing whether the implementation of strictly ordered
metadata journalling is correct or not.  If gfs2 does not have
strictly ordered metadata journalling, then it probably shouldn't
run these tests.
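
From the application side, the distinction can be sketched as follows. This is a minimal illustration, not part of the patch: write_file_durably and its sync_parent flag are hypothetical names. On a filesystem with strictly ordered metadata, the fsync on the file is expected to make the directory entry stable as well; code that cannot rely on that fsyncs the parent directory explicitly, which is what sync_parent=True does.

```python
import os


def write_file_durably(path, data, sync_parent=True):
    """Create or overwrite path with data and fsync it.

    With sync_parent=True the containing directory is fsynced too, for
    filesystems where fsync on the file does not imply that the
    directory entry is stable.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    if sync_parent:
        parent = os.path.dirname(os.path.abspath(path))
        dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The sketch assumes Linux, where os.O_DIRECTORY is available and a directory file descriptor can be passed to fsync.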

Cheers,

Dave.
-- 
Dave Chinner
dchin...@redhat.com



Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Andreas Gruenbacher
On 20 February 2018 at 20:46, Christoph Hellwig wrote:
> On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
>> When fsyncing a new file, also fsync the directory the file is in,
>> recursively.  This is how Linux filesystems should behave nowadays,
>> even if not mandated by POSIX.
>
> I think that is bullshit.  Maybe it is what google wants for ext4
> non-journal mode which no one else uses anyway, but it certainly
> is anything but normal Linux semantics.

Here's some code from xfstest generic/322:

  _mount_flakey
  $XFS_IO_PROG -f -c "pwrite 0 1M" -c "fsync" $SCRATCH_MNT/foo \
      > $seqres.full 2>&1 || _fail "xfs_io failed"
  mv $SCRATCH_MNT/foo $SCRATCH_MNT/bar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
  md5sum $SCRATCH_MNT/bar | _filter_scratch

  _flakey_drop_and_remount

  md5sum $SCRATCH_MNT/bar | _filter_scratch
  _unmount_flakey

Note that there is no fsync for the parent directory ($SCRATCH_MNT),
yet the test obviously expects the directory to be synced as well.
This isn't implemented as in this patch on all filesystems, but the
major ones all show this behavior. So where's the bullshit?
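
For readers without an xfstests setup, the system call sequence in the excerpt can be approximated as below. This is only a sketch: rename_then_fsync, fsync_path and md5_of are hypothetical helper names, the fill byte is arbitrary, and the power-cut step (_flakey_drop_and_remount) is not reproduced, so this shows what the test does before the simulated crash rather than the crash check itself.

```python
import hashlib
import os


def fsync_path(path):
    """Open an existing file and fsync it, like xfs_io -c "fsync"."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def md5_of(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def rename_then_fsync(directory):
    """Write and fsync foo, rename it to bar, fsync bar, return bar's md5."""
    foo = os.path.join(directory, "foo")
    bar = os.path.join(directory, "bar")

    with open(foo, "wb") as f:
        f.write(b"\xcd" * (1 << 20))  # 1 MiB; the fill byte is arbitrary
        f.flush()
        os.fsync(f.fileno())

    os.rename(foo, bar)
    # Note: no fsync of the parent directory here, as in the test.
    fsync_path(bar)
    return md5_of(bar)
```

The test then expects the checksum of bar after the simulated crash and remount to match the value computed here.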

Thanks,
Andreas



Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Christoph Hellwig
On Tue, Feb 20, 2018 at 12:22:01AM +0100, Andreas Gruenbacher wrote:
> When fsyncing a new file, also fsync the directory the file is in,
> recursively.  This is how Linux filesystems should behave nowadays,
> even if not mandated by POSIX.

I think that is bullshit.  Maybe it is what google wants for ext4
non-journal mode which no one else uses anyway, but it certainly
is anything but normal Linux semantics.



Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Bob Peterson
Hi Andreas,

- Original Message -
| When fsyncing a new file, also fsync the directory the file is in,
| recursively.  This is how Linux filesystems should behave nowadays,
| even if not mandated by POSIX.
| 
| Based on ext4 commits 14ece1028, d59729f4e, and 9f713878f.
| 
| Fixes xfstests generic/322, generic/376.
| 
| Signed-off-by: Andreas Gruenbacher 
| ---

It seems like the patch should be calling gfs2_inode_lookup on the
parent directory or something, rather than a simple igrab, and
possibly even holding (nw) the parent directory's i_gl glock.
Otherwise, the call to gfs2_ail_flush may reference an i_gl that
might not exist. I'm concerned about other nodes in the cluster
referencing and/or changing the parent directory inode while this
is happening. I'm not sure if it's possible. Maybe Nate has a test
to check cluster coherency for directories as well as files?

Regards,

Bob Peterson
Red Hat File Systems