This is a continuation of the thread "Is factible to implement full writes of stripes to raid using NVRAM memory in LFS": http://mail-index.netbsd.org/tech-kern/2016/08/18/msg020982.html
I want to discuss in which layer a write-back cache should be located. It would usually be used for RAID configurations, as a general-purpose device: any type of filesystem, or raw. Before discussing the different options, I want to present the benefits that I think a write-back cache must provide, so that we can check that each option can support them.

1- There is no need to use a parity map for RAID 1/10/5/6. Usually the impact is small, but it can be noticeable on busy servers.
   a) There is no parity to rebuild: the parity is always up to date. Less downtime in case of OS crash / power failure / hardware failure.
   b) Better performance for RAID 1/5/6, because there is no parity map to update.
2- For scattered writes contained in the same stripe, it reduces the number of writes. With RAID 5/6 there is an additional advantage: the parity is written only once for several writes to the same stripe, instead of once for every write to that stripe.
3- It allows consolidating several writes that cover the full length of a stripe into one write, without reading the parity. This is the case for log-structured file systems such as LFS, and it lets a RAID 5/6 perform similarly to a RAID 0 (see the first sketch at the end of this mail).
4- Faster synchronous writes.

The proposed layer must support:

A- It must be able to obtain the RAID configuration of the RAID device backing the write-back cache. If it is RAID 0/1 it will cache units of the size of the interleave. If it is RAID 5/6 it will cache full stripes.
B- It can use the buffer cache to avoid read/modify/write cycles, issuing only writes when the data that would otherwise have to be read is already in memory.
C- Several devices can share the same write-back cache device -> optimal and easy to configure. There is no need to hard-partition an NVRAM device into smaller devices, with one partition over-used and another under-used.
D- For filesystems such as LFS, the following optimization would be useful: when a stripe is complete in the buffer, write it out promptly, because it will not be written to again.
E- It can be useful to use elevator algorithms for the writes from the buffer cache to the RAID (see the second sketch at the end of this mail).

These are the three options proposed by Thor. I would like to know which option you consider best:

1- Implement the write-back cache as a generic pseudo-disk attached on top of a raid/disk/etc. This is also the option suggested by Greg, and it seems to be the option most recommended in the previous thread.
2- Add this to RAIDframe. Is it easier to implement and to integrate with RAIDframe? The RAID configuration is contained in the same driver. It can be easier for a sysadmin to configure: fewer devices/commands, and not prone to corruption errors, because there is never one device with the write-back cache and the same device without it. For non-RAID devices it could be used as a RAID 0 of one disk.
3- LVM. I don't see any special advantage in this option.

I want to leave for another thread which devices must be supported: nvram/nvme/disk/etc.

Notes:
- mdadm has the possibility of using a disk (flash, for example) as a journal for RAID devices, used instead of parity maps. It has been integrated into mdadm: https://lwn.net/Articles/665299/
- I don't think it is possible to use the write-back cache for booting the OS in an easy way.
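To make benefits 2 and 3 (and requirement A) more concrete, here is a minimal sketch, with hypothetical names and sizes and not RAIDframe code, of a per-stripe write-back buffer for a 4+1 RAID 5. It only counts the device I/Os needed to flush a stripe. A fully accumulated stripe needs the data writes plus one parity write and no reads at all; the same four chunks written one at a time without the cache would cost four read-modify-write cycles instead. In a real layer the chunk size and data-column count would be taken from the geometry of the backing RAID device rather than hard-coded.

/*
 * Minimal sketch (hypothetical, not NetBSD/RAIDframe code): a per-stripe
 * write-back buffer for a 4+1 RAID 5, counting the I/Os needed to flush it.
 */
#include <stdio.h>
#include <string.h>

#define NDATA      4        /* data columns per stripe (hypothetical)      */
#define CHUNK_SIZE 65536    /* bytes per stripe unit / interleave (idem)   */

struct stripe_buf {
	unsigned long long stripeno;            /* stripe index on the array */
	unsigned char      data[NDATA][CHUNK_SIZE];
	unsigned int       dirty;               /* bitmask of dirty chunks   */
};

/* Count the device I/Os needed to flush this stripe back to the RAID. */
static int
flush_cost(const struct stripe_buf *sb)
{
	int ndirty = __builtin_popcount(sb->dirty);

	if (sb->dirty == (1u << NDATA) - 1) {
		/*
		 * Full stripe in the cache: write NDATA data chunks plus one
		 * freshly computed parity chunk, no reads (benefit 3, the
		 * LFS / sequential case).
		 */
		return NDATA + 1;
	}
	/*
	 * Partial stripe: read-modify-write, but the parity is read and
	 * written only once for all the dirty chunks gathered in the
	 * buffer (benefit 2), instead of once per individual write.
	 */
	return ndirty + 1 + ndirty + 1;
}

int
main(void)
{
	struct stripe_buf sb = { .stripeno = 42, .dirty = 0 };

	sb.dirty = 0x3;                       /* two of four chunks dirty   */
	printf("partial stripe: %d I/Os\n", flush_cost(&sb));

	sb.dirty = (1u << NDATA) - 1;         /* whole stripe accumulated   */
	printf("full stripe:    %d I/Os\n", flush_cost(&sb));
	return 0;
}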

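And a second minimal sketch for requirement E, again with hypothetical names: before each flush pass, the dirty stripes collected in the cache are sorted by their starting block, so the writes sweep the array in one direction like a simple elevator instead of following arrival order.

/*
 * Minimal sketch (hypothetical): flush dirty stripes in ascending order of
 * their start block, i.e. one elevator-style sweep over the array.
 */
#include <stdio.h>
#include <stdlib.h>

struct dirty_stripe {
	unsigned long long start_blk;   /* first block of the stripe */
};

static int
cmp_start(const void *a, const void *b)
{
	const struct dirty_stripe *x = a, *y = b;

	return (x->start_blk > y->start_blk) - (x->start_blk < y->start_blk);
}

int
main(void)
{
	/* Stripes dirtied in arbitrary (arrival) order. */
	struct dirty_stripe queue[] = {
		{ 9000 }, { 128 }, { 40960 }, { 2048 },
	};
	size_t n = sizeof(queue) / sizeof(queue[0]);

	/* Sort once per flush pass so the writes sweep the disks in order. */
	qsort(queue, n, sizeof(queue[0]), cmp_start);

	for (size_t i = 0; i < n; i++)
		printf("flush stripe at block %llu\n", queue[i].start_blk);
	return 0;
}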