Re: Plan: journalling fixes for WAPBL

2016-09-22 Thread David Holland
On Thu, Sep 22, 2016 at 07:57:00AM +0800, Paul Goyette wrote:
 > While not particularly part of wapbl itself, I would like to see its
 > callers (ie, lfs) be more modular!

lfs is not related to wapbl, or even (now) ufs.

 > Currently, ffs (whether built-in or modular) has to be built with OPTIONS
 > WAPBL enabled in order to use wapbl.  And the ffs module has to "require"
 > the wapbl module.

This is because there is allegedly-filesystem-independent wapbl code
that was thought to maybe be reusable for additional block-journaling
implementations, e.g. ext3. I have always had doubts about this and it
hasn't panned out so far.

-- 
David A. Holland
dholl...@netbsd.org


Re: Plan: journalling fixes for WAPBL

2016-09-22 Thread Thor Lancelot Simon
On Thu, Sep 22, 2016 at 04:06:55PM +0200, Manuel Bouyer wrote:
> On Thu, Sep 22, 2016 at 07:50:27AM -0400, Thor Lancelot Simon wrote:
> > On Thu, Sep 22, 2016 at 01:27:48AM +0200, Jaromír Doleček wrote:
> > > 
> > > 3.2 use FUA (Force Unit Access) for commit record write
> > > This avoids need to issue even the second DIOCCACHESYNC, as flushing
> > > the disk cache is not really all that useful, I like the thread over
> > > at:
> > > http://yarchive.net/comp/linux/drive_caches.html
> > > Slightly less controversially, this would allow the rest of the
> > > journal records to be written asynchronously, leaving them to execute
> > > even after commit if so desired. It may be useful to have this
> > > behaviour optional. I lean towards skipping the disk cache flush as
> > > default behaviour however, if we implement write barrier for the
> > > commit record (see below).
> > > WAPBL would need to deal with drives without FUA, i.e. fall back to cache
> > > flush.
> > 
> > I have never understood this business about needing FUA to implement
> > barriers.  AFAICT, for any SCSI or SCSI-like disk device, all that is
> > actually needed is to do standard writes with simple tags, and barrier
> > writes with ordered tags.  What am I missing?
> 
> AFAIK ordered tags only guarantee that the write will happen in order,
> but not that the writes are actually done to stable storage.

The target's not allowed to report the command complete unless the data
are on stable storage, except if you have write cache enable set in the
relevant mode page.

If you run SCSI drives like that, you're playing with fire.  Expect to get
burned.  The whole point of tagged queueing is to let you *not* set that
bit in the mode pages and still get good performance.

Thor
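
The distinction argued over in this exchange (ordering versus durability, and
the effect of the write cache enable bit) can be sketched with a toy model.
This is only an illustration under stated assumptions: ToyDisk, commit_txn and
the block names are hypothetical, nothing here is NetBSD or WAPBL code, and a
real drive is far more complicated. The model assumes that a cached write is
lost on power failure, that FUA forces only the write it is attached to onto
the media, and that with the write cache disabled every completed write is on
the media.

```python
COMMIT = "commit"  # hypothetical block name for the journal commit record


class ToyDisk:
    """Toy disk with an optional volatile write cache (not a real driver API)."""

    def __init__(self, write_cache=True):
        self.write_cache = write_cache
        self.cache = {}   # volatile: lost on power failure
        self.media = {}   # stable storage

    def write(self, block, data, fua=False):
        if fua or not self.write_cache:
            # FUA (or write cache disabled): this write reaches the media
            # before it is reported complete.
            self.media[block] = data
            self.cache.pop(block, None)  # drop any stale cached copy
        else:
            # Cached write: reported complete, but still volatile.
            self.cache[block] = data

    def flush(self):
        # Cache flush: push everything cached so far to the media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Whatever was only in the cache is gone; return what survived.
        self.cache.clear()
        return dict(self.media)


def commit_txn(disk, journal, flush_first, fua_commit):
    """Write journal blocks, then the commit record, then lose power."""
    for block, data in journal.items():
        disk.write(block, data)
    if flush_first:
        disk.flush()
    disk.write(COMMIT, "txn-1", fua=fua_commit)
    return disk.power_loss()


JOURNAL = {"j0": "a", "j1": "b"}
```

In this model, in-order completion alone leaves nothing on the media when the
cache is enabled, and a FUA commit record without a preceding flush survives
without the journal blocks it is supposed to cover. Either a flush before the
FUA commit, or running with the write cache disabled, leaves all three blocks
on the media.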


Re: Plan: journalling fixes for WAPBL

2016-09-22 Thread Manuel Bouyer
On Thu, Sep 22, 2016 at 07:50:27AM -0400, Thor Lancelot Simon wrote:
> On Thu, Sep 22, 2016 at 01:27:48AM +0200, Jaromír Doleček wrote:
> > 
> > 3.2 use FUA (Force Unit Access) for commit record write
> > This avoids need to issue even the second DIOCCACHESYNC, as flushing
> > the disk cache is not really all that useful, I like the thread over
> > at:
> > http://yarchive.net/comp/linux/drive_caches.html
> > Slightly less controversially, this would allow the rest of the
> > journal records to be written asynchronously, leaving them to execute
> > even after commit if so desired. It may be useful to have this
> > behaviour optional. I lean towards skipping the disk cache flush as
> > default behaviour however, if we implement write barrier for the
> > commit record (see below).
> > WAPBL would need to deal with drives without FUA, i.e. fall back to cache
> > flush.
> 
> I have never understood this business about needing FUA to implement
> barriers.  AFAICT, for any SCSI or SCSI-like disk device, all that is
> actually needed is to do standard writes with simple tags, and barrier
> writes with ordered tags.  What am I missing?

AFAIK ordered tags only guarantee that the write will happen in order,
but not that the writes are actually done to stable storage.
If you get a fsync() from userland, you have to do a cache flush (or FUA).

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Plan: journalling fixes for WAPBL

2016-09-22 Thread Thor Lancelot Simon
On Thu, Sep 22, 2016 at 01:27:48AM +0200, Jaromír Doleček wrote:
> 
> 3.2 use FUA (Force Unit Access) for commit record write
> This avoids need to issue even the second DIOCCACHESYNC, as flushing
> the disk cache is not really all that useful, I like the thread over
> at:
> http://yarchive.net/comp/linux/drive_caches.html
> Slightly less controversially, this would allow the rest of the
> journal records to be written asynchronously, leaving them to execute
> even after commit if so desired. It may be useful to have this
> behaviour optional. I lean towards skipping the disk cache flush as
> default behaviour however, if we implement write barrier for the
> commit record (see below).
> WAPBL would need to deal with drives without FUA, i.e. fall back to cache
> flush.

I have never understood this business about needing FUA to implement
barriers.  AFAICT, for any SCSI or SCSI-like disk device, all that is
actually needed is to do standard writes with simple tags, and barrier
writes with ordered tags.  What am I missing?

I must have proposed adding a B_ARRIER or B_ORDERED at least five times
over the years.  There are always objections...

Thor
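
The barrier semantics of simple and ordered tags described above can be
sketched as a small checker. This is a toy model, not SCSI driver code:
is_legal_order and the command names are hypothetical, and command names are
assumed to be unique. It assumes the usual tagged-queueing rule that the
device may reorder simple-tagged commands among themselves, while an
ordered-tagged command runs after everything submitted before it and before
everything submitted after it.

```python
def is_legal_order(submitted, executed):
    """Check an execution order against simple/ordered tag rules.

    submitted: list of (name, tag) in submission order, where tag is
               "simple" or "ordered"; names are assumed unique.
    executed:  list of names in the order the device executed them.
    """
    # Every submitted command must be executed exactly once.
    if sorted(executed) != sorted(name for name, _ in submitted):
        return False

    position = {name: i for i, name in enumerate(executed)}
    for i, (name, tag) in enumerate(submitted):
        if tag != "ordered":
            continue
        # Nothing submitted earlier may execute after an ordered command.
        if any(position[n] > position[name] for n, _ in submitted[:i]):
            return False
        # Nothing submitted later may execute before an ordered command.
        if any(position[n] < position[name] for n, _ in submitted[i + 1:]):
            return False
    return True


# Journal writes with simple tags, commit record with an ordered tag.
SUBMITTED = [
    ("j0", "simple"),
    ("j1", "simple"),
    ("commit", "ordered"),
    ("d0", "simple"),
]
```

With this submission list, the device may swap j0 and j1, but it may not move
either of them past the commit record, and it may not start d0 before the
commit record. The checker says nothing about whether the data is on stable
storage; that depends on the write cache setting discussed elsewhere in the
thread.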


Re: Plan: journalling fixes for WAPBL

2016-09-22 Thread Taylor R Campbell
   Date: Wed, 21 Sep 2016 17:06:18 -0700
   From: Brian Buhrow 

   hello.  Does this discussion imply that the WAPBL log/journaling
   function is broken in NetBSD-current?  Are we back to straight FFS as it
   was before the days of WAPBL or softdep?  Please tell me I'm mistaken about
   this.  If so, that's quite a regression, even from NetBSD-5 where both
   WAPBL log and softdep work quite well.

It is no more broken than it was in netbsd-5.