Re: Flush disk cache on mount -ur

2017-05-28 Thread Ted Unangst
Mark Kettenis wrote:
> > * add a DIOCCACHESYNC ioctl that can be used to flush a disk's cache
> > 
> > * add code to the file systems that executes this ioctl when a mount
> >   is updated from r/w to r/o
> 
> Abusing the ioctl codepath in the kernel isn't a good idea.  This
> continues to be a major source of grief in the network stack.  We
> shouldn't let that disease spread.

Another previous effort to send magic commands to disks (hello TRIM) used
magic bufs and VOP_STRATEGY. I think that's probably worse. :)

But it does raise the question: how should a filesystem talk to the underlying
disk? Add another method to bdevsw? But do we know what its arguments should
be? Maybe we proceed with ioctl and convert to bdevsw.diskctl when we have a
better idea what it should look like?
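To make the bdevsw idea concrete, here is a toy model of what such a method
could look like. Everything here is hypothetical: the d_diskctl name, the
command codes, and the driver function are illustrations of the shape of the
interface, not existing OpenBSD code.

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command codes for a future bdevsw "diskctl" method. */
enum diskctl_cmd {
	DKC_CACHESYNC,		/* flush the drive's write cache */
	DKC_TRIM,		/* discard unused blocks */
};

/* Minimal model of a block-device switch entry grown by one method.
 * A void *arg keeps the signature extensible while the set of
 * commands is still being figured out. */
struct bdevsw_model {
	int (*d_diskctl)(dev_t dev, enum diskctl_cmd cmd, void *arg);
};

/* Example driver implementation: only cache sync is supported. */
static int
wd_diskctl(dev_t dev, enum diskctl_cmd cmd, void *arg)
{
	(void)dev;
	(void)arg;
	switch (cmd) {
	case DKC_CACHESYNC:
		return 0;	/* pretend the cache flush succeeded */
	default:
		return ENOTTY;	/* command not supported by this driver */
	}
}
```

The advantage over ioctl is that the command set is typed and enumerable, so
each driver can reject unknown commands explicitly instead of sharing one
untyped ioctl entry point.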



Re: Flush disk cache on mount -ur

2017-05-28 Thread Mark Kettenis
> Date: Sun, 28 May 2017 17:43:31 +0200 (CEST)
> From: Stefan Fritsch 
> 
> Hi,
> 
> I want to come back to the discussion from Oct. 2016 on tech@ about a
> disk cache flush ioctl.
> 
> 
> The problem we want to solve is that in case of power loss, there should 
> be no data loss on partitions that are either mounted read-only or are 
> unmounted. This should also be true if such a partition was previously 
> mounted r/w (this is what makes this difficult).
> 
> The most common situation where this is an issue are devices like home 
> routers, which have their root fs on a ramdisk and only store 
> configuration data on the disk. When the configuration is changed, the r/o 
> mount is changed to r/w, the data is written, and then the mount is 
> changed to r/o again.
> 
> So, the proposal was to
> 
> * add a DIOCCACHESYNC ioctl that can be used to flush a disk's cache
> 
> * add code to the file systems that executes this ioctl when a mount
>   is updated from r/w to r/o

Abusing the ioctl codepath in the kernel isn't a good idea.  This
continues to be a major source of grief in the network stack.  We
shouldn't let that disease spread.



Flush disk cache on mount -ur

2017-05-28 Thread Stefan Fritsch
Hi,

I want to come back to the discussion from Oct. 2016 on tech@ about a
disk cache flush ioctl.


The problem we want to solve is that in case of power loss, there should 
be no data loss on partitions that are either mounted read-only or are 
unmounted. This should also be true if such a partition was previously 
mounted r/w (this is what makes this difficult).

The most common situation where this is an issue are devices like home 
routers, which have their root fs on a ramdisk and only store 
configuration data on the disk. When the configuration is changed, the r/o 
mount is changed to r/w, the data is written, and then the mount is 
changed to r/o again.

So, the proposal was to

* add a DIOCCACHESYNC ioctl that can be used to flush a disk's cache

* add code to the file systems that executes this ioctl when a mount
  is updated from r/w to r/o

* change the various disk devices to do a cache flush whenever a
  writable physio file descriptor is closed on a partition. Right now
  this is only done if the last such file descriptor for a complete
  disk is closed.
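
For illustration, a userland helper around the first bullet might look like
the sketch below. DIOCCACHESYNC is the name from the proposal, but the ioctl
encoding here is a made-up placeholder (the real definition would belong in
<sys/dkio.h>), and fsync() serves as a fallback so the sketch also behaves
sensibly on descriptors that do not support the disk ioctl.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

#ifndef DIOCCACHESYNC
/* Placeholder encoding, for illustration only. */
#define DIOCCACHESYNC	_IOW('d', 20, int)
#endif

/* Flush the write cache behind fd.  Falls back to fsync() when the
 * descriptor does not support the disk ioctl (e.g. a regular file). */
int
flush_disk_cache(int fd)
{
	int force = 1;

	if (ioctl(fd, DIOCCACHESYNC, &force) == 0)
		return 0;
	return fsync(fd);
}
```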


There was some argument that the cache flush should not be done by the
filesystems but in a central place. The problem with that is that
currently the file systems do not notify anyone if a mount is changed
between r/o and r/w. So it's quite possible that a file system does
VOP_OPEN() and VOP_CLOSE() with different settings of F_WRITE. Or it
could be that despite the VOP_OPEN() call only having F_READ, the file
system is later changed to r/w.  So, if we wanted to go this way, we
would need a new call (VOP_UPDATE?) to change F_WRITE in the flags and
make all file systems use it.
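
A toy model of such a central hook, to make the idea concrete: a single
update function (the VOP_UPDATE of the paragraph above; all names here are
hypothetical, not real OpenBSD interfaces) sees every change of the write
flag, so the r/w to r/o edge becomes visible in one place instead of inside
every filesystem.

```c
#include <stdbool.h>

static int cache_flushes;	/* counts simulated flushes for the demo */

static void
disk_cache_flush(void)
{
	cache_flushes++;	/* stand-in for the real cache-flush command */
}

struct mount_model {
	bool writable;		/* models F_WRITE in the open flags */
};

/* Hypothetical VOP_UPDATE-style hook: every change of the mount's
 * write flag goes through here, so the flush is triggered exactly
 * on the r/w -> r/o transition and nowhere else. */
void
mount_update(struct mount_model *mp, bool writable)
{
	if (mp->writable && !writable)
		disk_cache_flush();	/* flush only on the rw -> ro edge */
	mp->writable = writable;
}
```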

Another issue voiced was the performance impact. I don't think that 
umounting or remounting file systems happens often enough for this to be a 
problem. For scsi it would actually be possible to do a cache flush only 
for a single partition, but for ata/nvme there is only an API for a cache 
flush for the whole disk.

I will re-send the patches that I have in separate mails.

Are there any other ideas on how to go forward with this?

Cheers,
Stefan