Hello Walter, Thanks for this crazy diff. It is big so my first comment would be to split it to ease reviews. Here's a suggestion: - At least one diff for the kernel parts that are not WAPBL-only. In other words the refactoring needed for integrating WAPBL into FFS. This is the critical part in your work since people generally do not like regression with their filesystem :) From my point of view the addition of B_CONTIG/B_METAONLY could be easily split. You could also produce a diff for the wrapper functions that you're adding...
- A diff adding the WAPBL backend to the kernel. This diff would depend on the previous. It would be nice if you could keep all its chunks under "#ifdef WAPBL". Even if such define is not strictly needed for compiling the kernel it helps when reading the code. - At least one diff for userland tools support. Splitting your work would also allow you to give a more precise descriptions about each subpart of this monstrous work :) Apart from adding WAPBL to OpenBSD, do you have any idea about how to start using it? Is there any drawback if I'm sharing a media with WAPBL enabled on an old OpenBSD installation (not supporting WAPBL)? Concerning WAPBL itself did you consider porting the wapbl(9) manual from NetBSD? Is its BUGS section still relevant? What about the Open issues mentioned by Jörg Sonnenberge in its 2009 BSDCan [0] talk, are they still open? Keep the good work! Martin Some more comments inline. [0] https://www.bsdcan.org/2009/schedule/attachments/106_wapbl.pdf > Index: sys/kern/vfs_bio.c > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/kern/vfs_bio.c,v > retrieving revision 1.170 > diff -u -r1.170 vfs_bio.c > --- sys/kern/vfs_bio.c 19 Jul 2015 16:21:11 -0000 1.170 > +++ sys/kern/vfs_bio.c 23 Oct 2015 20:48:45 -0000 > @@ -647,6 +672,20 @@ > { > int s; > > + /* If this is a tape block, write the block now. */ > + if (major(bp->b_dev) < nblkdev && > + bdevsw[major(bp->b_dev)].d_type == D_TAPE) { > + bawrite(bp); > + return; > + } If I'm reading NetBSD's log correctly this has been done separately to avoid calling reassignbuf(). That would be a good candidate for a single diff :) > + if (injournal(bp)) { > + struct mount *mp = wapbl_vptomp(bp->b_vp); > + > + if (bp->b_iodone != mp->mnt_wapbl_op->wo_wapbl_biodone) > + WAPBL_ADD_BUF(mp, bp); > + } What I was suggesting before is to keep such blocks under #ifdef WAPBL. > @@ -743,12 +775,28 @@ > * Determine which queue the buffer should be on, then put it there. > */ > > + /* If it's locked, don't report an error; try again later */ > + if (ISSET(bp->b_flags, (B_LOCKED|B_ERROR)) == (B_LOCKED|B_ERROR)) > + CLR(bp->b_flags, B_ERROR); > + > /* If it's not cacheable, or an error, mark it invalid. */ > if (ISSET(bp->b_flags, (B_NOCACHE|B_ERROR))) > SET(bp->b_flags, B_INVAL); > > if (ISSET(bp->b_flags, B_INVAL)) { > /* > + * If using WAPBL > + */ I don't know what's your sync-with-upstream strategy. But by comparing your diff with NetBSD's sources I feel that the comment below should be there instead of this one ;) > + if (ISSET(bp->b_flags, B_LOCKED)) { > + if (wapbl_vphaswapbl(bp->b_vp)) { > + struct mount *mp = wapbl_vptomp(bp->b_vp); > + KASSERT(bp->b_iodone > + != mp->mnt_wapbl_op->wo_wapbl_biodone); > + WAPBL_REMOVE_BUF(mp, bp); > + } > + } > + > + /* > * If the buffer is invalid, free it now rather than leaving > * it in a queue and wasting memory. > */ ^^^^^^^ > @@ -1206,6 +1267,17 @@ > } > #endif > > +void > +buf_adjcnt(struct buf *bp, long ncount) > +{ > + KASSERT(ncount <= bp->b_bufsize); > + long ocount = bp->b_bcount; > + bp->b_bcount = ncount; > + if (injournal(bp)) > + WAPBL_RESIZE_BUF(wapbl_vptomp(bp->b_vp), bp, bp->b_bufsize, > + ocount); > +} This is an example of wrapper function that could be easily extended :) > Index: sys/kern/vfs_biomem.c > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/kern/vfs_biomem.c,v > retrieving revision 1.34 > diff -u -r1.34 vfs_biomem.c > --- sys/kern/vfs_biomem.c 19 Jul 2015 21:21:14 -0000 1.34 > +++ sys/kern/vfs_biomem.c 23 Oct 2015 20:48:45 -0000 > @@ -89,7 +89,7 @@ > { > splassert(IPL_BIO); > SET(bp->b_flags, B_BUSY); > - if (bp->b_data != NULL) { > + if (bp->b_data != NULL && !(bp->b_flags & B_LOCKED)) { Style nit. The ISSET() family of macro is almost always used when manipulating ``bp->b_flags''. You might want to suggest such changes to Bitrig :) > Index: sys/kern/vfs_wapbl.c > =================================================================== > RCS file: sys/kern/vfs_wapbl.c > diff -N sys/kern/vfs_wapbl.c > --- /dev/null 1 Jan 1970 00:00:00 -0000 > +++ sys/kern/vfs_wapbl.c 23 Oct 2015 20:26:12 -0000 > @@ -0,0 +1,2912 @@ > +/* $NetBSD: vfs_wapbl.c,v 1.51.2.2 2013/01/02 23:23:15 riz Exp $ */ > + > +/*- > + * Copyright (c) 2003, 2008, 2009 The NetBSD Foundation, Inc. > + * All rights reserved. > + * > + * This code is derived from software contributed to The NetBSD Foundation > + * by Wasabi Systems, Inc. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS > + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > LIMITED > + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS > + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > + * POSSIBILITY OF SUCH DAMAGE. > + */ > + > +/* > + * This implements file system independent write ahead filesystem logging. > + */ > + > +#define WAPBL_INTERNAL > + > +#include <sys/cdefs.h> > + > +#include <sys/param.h> > +#include <sys/bitops.h> > + > +#ifdef _KERNEL > +#include <sys/param.h> > +#include <sys/systm.h> > +#include <sys/namei.h> > +#include <sys/proc.h> > +#include <sys/sysctl.h> > +#include <sys/uio.h> > +#include <sys/vnode.h> > +#include <sys/file.h> > +#include <sys/resourcevar.h> > +#include <sys/conf.h> > +#include <sys/mount.h> > +#include <sys/kernel.h> > +#include <sys/pool.h> > +#include <sys/mutex.h> > +#include <sys/atomic.h> > +#include <sys/malloc.h> > +#include <sys/pool.h> > +#include <sys/dkio.h> > +#include <sys/wapbl.h> > +#include <sys/wapbl_replay.h> Are you sure all these headers are needed? <sys/pool.h> twice? > +#define wapbl_alloc(s) malloc((s), M_TEMP, M_WAITOK | M_ZERO) > +#define wapbl_free(a, s) free((a), M_TEMP, (s)) > +#define wapbl_calloc(n, s) malloc((n)*(s), M_TEMP, M_WAITOK | M_ZERO) M_TEMP might be good enough for debugging but I would suggest using a different type :) > +/* > + * This structure holds per-mount log information. > + * > + * Legend: a = atomic access only > + * r = read-only after init > + * l = rwlock held > + * m = mutex held > + * lm = rwlock held writing or mutex held > + * u = unlocked access ok > + * b = bufcache_lock held > + */ Do you understand the existing locking strategy? Do you think it fits or is it worth keeping in OpenBSD's kernel? I'm asking because I find a bit confusing the existing mix of locks, mutex, cv and spl in the code you're porting. > Index: sys/sys/bitops.h > =================================================================== > RCS file: sys/sys/bitops.h > diff -N sys/sys/bitops.h > --- /dev/null 1 Jan 1970 00:00:00 -0000 > +++ sys/sys/bitops.h 23 Oct 2015 20:26:12 -0000 > @@ -0,0 +1,294 @@ > +/* $NetBSD: bitops.h,v 1.9 2011/07/30 16:35:58 christos Exp $ */ > + > +/*- > + * Copyright (c) 2007, 2010 The NetBSD Foundation, Inc. > + * All rights reserved. > + * > + * This code is derived from software contributed to The NetBSD Foundation > + * by Christos Zoulas and Joerg Sonnenberger. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS > + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > LIMITED > + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS > + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN > + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > + * POSSIBILITY OF SUCH DAMAGE. > + */ > +#ifndef _SYS_BITOPS_H_ > +#define _SYS_BITOPS_H_ > + > +#include <sys/stdint.h> > + > +/* > + * Find First Set functions > + */ > +#ifndef ffs32 > +static __inline int __unused > +ffs32(uint32_t _n) libkern already has ffs(9), fls(9) and flsl(9) is there anything to merge here? > +/* > + * Integer logarithm, returns -1 on error. Inspired by the linux > + * version written by David Howells. > + */ > +#define _ilog2_helper(_n, _x) ((_n) & (1ULL << (_x))) ? _x : > +#define ilog2(_n) \ Funny oce(4) defines its own version! > +static __inline void > +fast_divide32_prepare(uint32_t _div, uint32_t * __restrict _m, > + uint8_t *__restrict _s1, uint8_t *__restrict _s2) You don't seem to be using these anywhere, or did I miss something? > Index: sys/sys/mount.h > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/sys/mount.h,v > retrieving revision 1.121 > diff -u -r1.121 mount.h > --- sys/sys/mount.h 8 Sep 2014 01:47:06 -0000 1.121 > +++ sys/sys/mount.h 23 Oct 2015 20:48:45 -0000 > @@ -358,6 +358,11 @@ > int mnt_maxsymlinklen; /* max size of short symlink */ > struct statfs mnt_stat; /* cache of filesystem stats */ > void *mnt_data; /* private data */ > + struct wapbl_ops > + *mnt_wapbl_op; /* logging ops */ > + struct wapbl *mnt_wapbl; /* log info */ > + struct wapbl_replay > + *mnt_wapbl_replay; /* replay support XXX: what? */ > }; > > /* > @@ -396,7 +401,7 @@ > /* > * Mask of flags that are visible to statfs() > */ > -#define MNT_VISFLAGMASK 0x0400ffff > +#define MNT_VISFLAGMASK 0x1400ffff > > #define MNT_BITS \ > "\010\001RDONLY\002SYNCHRONOUS\003NOEXEC\004NOSUID\005NODEV" \ > @@ -413,6 +418,7 @@ > #define MNT_WANTRDWR 0x02000000 /* want upgrade to read/write */ > #define MNT_SOFTDEP 0x04000000 /* soft dependencies being done */ > #define MNT_DOOMED 0x08000000 /* device behind filesystem is gone */ > +#define MNT_LOG 0x10000000 /* use logging */ > > /* > * Flags for various system call interfaces. > @@ -501,6 +507,51 @@ > extern int bufbackoff(struct uvm_constraint_range*, long); > > /* > + * This operations vector is so wapbl can be wrapped into a filesystem lkm. > + * XXX Eventually, we want to move this functionality > + * down into the filesystems themselves so that this isn't needed. > + */ This is interesting. OpenBSD does not support lkm, do you think it is worth keeping this layer of indirection? For diffability maybe? > Index: sys/sys/specdev.h > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/sys/specdev.h,v > retrieving revision 1.34 > diff -u -r1.34 specdev.h > --- sys/sys/specdev.h 2 Nov 2013 00:16:31 -0000 1.34 > +++ sys/sys/specdev.h 23 Oct 2015 20:48:45 -0000 > @@ -37,6 +37,10 @@ > * special devices. It is allocated in checkalias and freed > * in vgone. > */ > + > +#ifndef _SYS_SPECDEV_H_ > +#define _SYS_SPECDEV_H_ > + > struct specinfo { > struct vnode **si_hashchain; > struct vnode *si_specnext; > @@ -108,3 +112,4 @@ > int spec_advlock(void *); > > #endif /* _KERNEL */ > +#endif /* _SYS_SPECDEV_H_ */ This guard could already be committed separately. > Index: sys/sys/stat.h > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/sys/stat.h,v > retrieving revision 1.28 > diff -u -r1.28 stat.h > --- sys/sys/stat.h 4 Apr 2015 18:06:08 -0000 1.28 > +++ sys/sys/stat.h 23 Oct 2015 20:48:45 -0000 > @@ -173,6 +173,7 @@ > #define SF_ARCHIVED 0x00010000 /* file is archived */ > #define SF_IMMUTABLE 0x00020000 /* file may not be changed */ > #define SF_APPEND 0x00040000 /* writes to file may only > append */ > +#define SF_LOG 0x00400000 /* WAPBL log file inode */ Please do not mix tab and space :) > Index: sys/sys/vnode.h > =================================================================== > RCS file: /Volumes/CSP/cvs/src/sys/sys/vnode.h,v > retrieving revision 1.132 > diff -u -r1.132 vnode.h > --- sys/sys/vnode.h 7 May 2015 08:53:33 -0000 1.132 > +++ sys/sys/vnode.h 23 Oct 2015 20:48:45 -0000 > @@ -185,13 +185,14 @@ > /* > * Flags for ioflag. > */ > -#define IO_UNIT 0x01 /* do I/O as atomic unit */ > -#define IO_APPEND 0x02 /* append write to end */ > -#define IO_SYNC 0x04 /* do I/O synchronously */ > -#define IO_NODELOCKED 0x08 /* underlying node already > locked */ > -#define IO_NDELAY 0x10 /* FNDELAY flag set in file > table */ > -#define IO_NOLIMIT 0x20 /* don't enforce limits on i/o > */ > -#define IO_NOCACHE 0x40 /* don't cache result of this > i/o */ > +#define IO_UNIT 0x01 /* I/O as atomic unit */ > +#define IO_APPEND 0x02 /* append write to end */ > +#define IO_SYNC 0x04 /* do I/O synchronously */ > +#define IO_NODELOCKED 0x08 /* underlying node already > locked */ > +#define IO_NDELAY 0x10 /* FNDELAY flag set in file > table */ > +#define IO_NOLIMIT 0x20 /* don't enforce limits on i/o > */ > +#define IO_NOCACHE 0x40 /* don't cache result of this > i/o */ > +#define IO_JOURNALLOCKED 0x80 /* journal is already locked */ Same here :)