Re: [PATCH 1/7] xfs: always use DAX if mount option is used

2017-09-26 Thread Dave Chinner
On Tue, Sep 26, 2017 at 11:35:48AM +0200, Jan Kara wrote: > On Tue 26-09-17 09:38:12, Dave Chinner wrote: > > On Mon, Sep 25, 2017 at 05:13:58PM -0600, Ross Zwisler wrote: > > > Before support for the per-inode DAX flag was disabled the XFS the code > > > had

Re: false positive lockdep splat with loop device

2017-09-25 Thread Dave Chinner
On Thu, Sep 21, 2017 at 09:43:41AM +0300, Amir Goldstein wrote: > On Thu, Sep 21, 2017 at 1:22 AM, Dave Chinner wrote: > > [cc lkml, PeterZ and Byungchul] > ... > > The thing is, this IO completion has nothing to do with the lower > > filesystem - it's the IO complet

Re: shared/298 lockdep splat?

2017-09-25 Thread Dave Chinner
On Thu, Sep 21, 2017 at 05:47:14PM +0900, Byungchul Park wrote: > On Thu, Sep 21, 2017 at 08:22:56AM +1000, Dave Chinner wrote: > > Peter, this is the sort of false positive I mentioned were likely to > > occur without some serious work to annotate the IO stack to prevent >

Re: [PATCH 7/7] xfs: re-enable XFS per-inode DAX

2017-09-25 Thread Dave Chinner
sem, then we have a deadlock vector. Historically we've avoided any mm/ level interactions under the ILOCK_EXCL because of its location in the page fault path locking order (e.g. lockdep will go nuts if we take a page fault with the ILOCK held). Hence I'm extremely wary of putting any other mm/ level locks under the ILOCK like this without a clear explanation of the locking orders and why it won't deadlock. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/7] xfs: always use DAX if mount option is used

2017-09-25 Thread Dave Chinner
off/on dax for the things that didn't/did work with DAX correctly so they didn't need multiple filesystems on pmem to segregate the apps that did/didn't work with DAX... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 4/7] xfs: protect S_DAX transitions in XFS write path

2017-09-25 Thread Dave Chinner
re lots of applications out there that rely on these semantics for performance. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/7] xfs: protect S_DAX transitions in XFS read path

2017-09-25 Thread Dave Chinner
> file_accessed(iocb->ki_filp);
> -
> - xfs_ilock(ip, XFS_IOLOCK_SHARED);
> - ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> - xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> -
> - return ret;
> + return iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);

This puts file_accessed under the XFS_IOLOCK_SHARED now. Is that a safe/sane thing to do for DIO?

Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: shared/298 lockdep splat?

2017-09-20 Thread Dave Chinner
erious work to annotate the IO stack to prevent them. We can nest multiple layers of IO completions and locking in the IO stack via things like loop and RAID devices. They can be nested to arbitrary depths, too (e.g. loop on fs on loop on fs on dm-raid on n * (loop on fs) on bdev) so this new completion lockdep checking is going to be a source of false positives until there is an effective (and simple!) way of providing context based completion annotations to avoid them... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote: > On 9/18/17 4:31 PM, Dave Chinner wrote: > > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote: > >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote: > >>> On Mon, Sep 18, 2017 at 08:26:

Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
lem triage. Yes, the first invalidation should also have a comment like the post IO invalidation - the comment probably got dropped and not noticed when the changeover from internal XFS code to generic iomap code was made... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
being triggered. It needs to be on by default, but I'm sure we can wrap it with something like an xfs_alert_tag() type of construct so the tag can be set in /proc/fs/xfs/panic_mask to suppress it if testers so desire. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [GIT PULL] overlayfs update for 4.14

2017-09-17 Thread Dave Chinner
sharing of multiply referenced data blocks. I don't see overlay being involved in this functionality at all. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: iov_iter_pipe warning.

2017-09-11 Thread Dave Chinner
On Mon, Sep 11, 2017 at 09:07:13PM +0100, Al Viro wrote: > On Mon, Sep 11, 2017 at 04:44:40PM +1000, Dave Chinner wrote: > > > > iov_iter_get_pages() for pipe-backed destination does page allocation > > > and inserts freshly allocated pages into pipe. > > >

Re: iov_iter_pipe warning.

2017-09-10 Thread Dave Chinner
On Mon, Sep 11, 2017 at 04:32:22AM +0100, Al Viro wrote: > On Mon, Sep 11, 2017 at 10:31:13AM +1000, Dave Chinner wrote: > > > splice does not go down the direct IO path, so iomap_dio_actor() > > should never be handled a pipe as the destination for the IO data. > > In

Re: iov_iter_pipe warning.

2017-09-10 Thread Dave Chinner
On Mon, Sep 11, 2017 at 12:07:23AM +0100, Al Viro wrote: > On Mon, Sep 11, 2017 at 08:08:14AM +1000, Dave Chinner wrote: > > On Sun, Sep 10, 2017 at 10:19:07PM +0100, Al Viro wrote: > > > On Mon, Sep 11, 2017 at 07:11:10AM +1000, Dave Chinner wrote: > > > > On Sun, Se

Re: iov_iter_pipe warning.

2017-09-10 Thread Dave Chinner
On Sun, Sep 10, 2017 at 10:19:07PM +0100, Al Viro wrote: > On Mon, Sep 11, 2017 at 07:11:10AM +1000, Dave Chinner wrote: > > On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote: > > > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote: > > > > >

Re: iov_iter_pipe warning.

2017-09-10 Thread Dave Chinner
't end up chasing ghosts when we see that warning in the logs. The usual vector is an app that mixes concurrent DIO with mmap access to the same file, which we explicitly say "don't do this because data corruption" in the open(2) man page. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: XFS mounted with 'discard' option - deleting fio test files slow

2017-09-07 Thread Dave Chinner
sys 0m25.524s 4k random write with direct IO. 5GB file. Probably got a million 4k extents in it. Which means XFS has sent a million tiny 4k discards to the device. Run 'xfs_bmap -vvp fio_test_file.*' to confirm. Don't use "-o discard" if you care about performance. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
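The "million 4k extents" estimate above is plain arithmetic: if every 4k random direct IO write ends up as its own extent, the extent count (and so the number of discards issued on delete) is roughly the file size divided by the IO size. A minimal sketch of that calculation, assuming binary units (5 GiB, 4 KiB); the helper name is made up for illustration and is not an XFS or fio interface:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case extent count for a file of file_bytes written with
 * io_bytes-sized random direct IO, assuming no two writes ever merge
 * into a single extent. Illustrative helper only.
 */
uint64_t worst_case_extents(uint64_t file_bytes, uint64_t io_bytes)
{
	assert(io_bytes != 0);
	/* round up so a partial trailing block still counts as an extent */
	return (file_bytes + io_bytes - 1) / io_bytes;
}
```

For 5 GiB and 4 KiB this gives 1310720, which matches the order of magnitude quoted in the reply. The real count on a given file is what xfs_bmap reports; this is only the upper bound under the stated assumption.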

Re: [PATCH 0/9] add ext4 per-inode DAX flag

2017-09-07 Thread Dave Chinner
On Thu, Sep 07, 2017 at 04:19:00PM -0600, Ross Zwisler wrote: > On Fri, Sep 08, 2017 at 08:12:01AM +1000, Dave Chinner wrote: > > On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote: > > > On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote: > > >

Re: [PATCH 0/9] add ext4 per-inode DAX flag

2017-09-07 Thread Dave Chinner
then the only hammer we have is Brutus^Wdrop_caches. That's not an option for production machines. Neat idea, but one I'd already thought of and discarded as "not practical from an admin perspective". Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: iov_iter_pipe warning.

2017-09-06 Thread Dave Chinner
's warning that the pipe buffer is already full before we try to read from the filesystem? That doesn't seem like an XFS problem - it indicates the pipe we are filling in generic_file_splice_read() is not being emptied by whatever we are splicing the file data to. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[PATCH] swapon: fix vfree() badness

2017-09-04 Thread Dave Chinner
From: Dave Chinner The cluster_info structure is allocated with kvzalloc(), which can return kmalloc'd or vmalloc'd memory. It must be paired with kvfree(), but sys_swapon uses vfree(), resulting in this warning from xfstests generic/357: [ 1985.294915] swapon: swapfile has holes [ 1
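The pairing rule in the patch description can be illustrated with a small userspace model. This is only a sketch under loud assumptions: the `model_*` names are hypothetical, the size threshold stands in for the kernel's "try kmalloc, fall back to vmalloc" behaviour, and the backend is recorded in a header rather than derived from the address as the kernel does. It shows why a free routine that handles only one backend is wrong for memory that may have come from either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Which (modelled) allocator satisfied the request. */
enum model_backend { MODEL_KMALLOC, MODEL_VMALLOC };

/* Header stored in front of each allocation; the union keeps alignment. */
union model_hdr {
	enum model_backend backend;
	max_align_t align;
};

/* Assumed cut-over size for the model, not a kernel constant. */
#define MODEL_KMALLOC_LIMIT 4096

/* Zeroed allocation that may come from either backend, like kvzalloc(). */
void *model_kvzalloc(size_t size)
{
	union model_hdr *h;

	if (size > SIZE_MAX - sizeof(*h))
		return NULL;
	h = calloc(1, sizeof(*h) + size);
	if (!h)
		return NULL;
	h->backend = size <= MODEL_KMALLOC_LIMIT ? MODEL_KMALLOC : MODEL_VMALLOC;
	return h + 1;
}

static union model_hdr *model_hdr_of(void *p)
{
	return (union model_hdr *)p - 1;
}

/*
 * Models vfree(): only valid for vmalloc-backed memory. Returns -1 and
 * frees nothing when handed memory from the other backend, which is the
 * case the kernel warns about.
 */
int model_vfree(void *p)
{
	union model_hdr *h;

	assert(p != NULL);
	h = model_hdr_of(p);
	if (h->backend != MODEL_VMALLOC)
		return -1;
	free(h);
	return 0;
}

/* Models kvfree(): correct for memory from either backend. */
void model_kvfree(void *p)
{
	if (!p)
		return;
	free(model_hdr_of(p));
}
```

In this model a small allocation lands on the kmalloc side, so `model_vfree()` rejects it while `model_kvfree()` releases it; a large allocation is accepted by both. That mirrors the situation described for `cluster_info`, where the caller cannot know which backend was used.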

Re: linux-next: build warning after merge of the xfs tree

2017-08-31 Thread Dave Chinner
> + bool ordered; > > > + > > > + aborted = !!(lip->li_flags & XFS_LI_ABORTED); > > > + hold = !!(bip->bli_flags & XFS_BLI_HOLD); > > > + dirty = !!(bip->bli_flags & XFS_BLI_DIRTY); > > > + ordered = !!(bip->bli_flags

Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache

2017-08-30 Thread Dave Chinner
On Wed, Aug 30, 2017 at 12:14:03AM -0700, Christoph Hellwig wrote: > On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote: > > Right, I've looked at btrees, too, but it's more complex than just > > using an rbtree. I originally looked at using Peter Z's old

Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache

2017-08-29 Thread Dave Chinner
> seemed like too large a CC list. :) I can explicitly add the xfs list > to the first three for any future versions. If you are touching multiple filesystems, you really should cc the entire patchset to linux-fsdevel, similar to how you sent the entire patchset to lkml. That way the entire series will end up on a list that almost all fs developers read. LKML is not a list you can rely on all filesystem developers reading (or developers in any other subsystem, for that matter)... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache

2017-08-29 Thread Dave Chinner
On Tue, Aug 29, 2017 at 05:45:36AM -0700, Christoph Hellwig wrote: > On Tue, Aug 29, 2017 at 10:31:26PM +1000, Dave Chinner wrote: > > Probably should. I've already been looking at killing the inline > > extents array to simplify the management of the extent list (much >

Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache

2017-08-29 Thread Dave Chinner
ling the inline data would get rid of the other part of the union the inline data sits in. OTOH, if we're going to have to dynamically allocate the memory for the extent/inline data for the data fork, it may just be easier to make the entire data fork a dynamic allocation (like the attr fork).

Re: [PATCH] xfs: Drop setting redundant PF_KSWAPD in kswapd context

2017-08-24 Thread Dave Chinner
different context. So this patch > loses the kswapd context. Yup. That's what the code does, and removing the PF_KSWAPD from it will break it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING

2017-08-22 Thread Dave Chinner
On Tue, Aug 22, 2017 at 11:06:03AM +0200, Peter Zijlstra wrote: > On Tue, Aug 22, 2017 at 03:46:03PM +1000, Dave Chinner wrote: > > Even if I ignore the fact that buffer completions are run on > > different workqueues, there seems to be a bigger problem with this > > sort o

Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING

2017-08-21 Thread Dave Chinner
ch problems. i.e. the inode locks we hold at this point in the truncate process (i.e. the XFS_IOLOCK a.k.a. i_rwsem) prevent new IO from being run, and we don't start the truncate until we've waited for all in progress IO to complete. Hence while the truncate runs and blocks on metadata IO completions, no data IO can be in progress on that inode, so there are no completions being run on that inode in workqueues. And therefore the IO completion deadlock path reported by lockdep can not actually be executed during a truncate, and so it's a false positive. Back to the drawing board, I guess. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v4 0/3] MAP_DIRECT and block-map sealed files

2017-08-15 Thread Dave Chinner
d_create() to manage shared access to anonymous tmpfs files and will EINVAL on any fd that points to a real file. Oh, even more problematic: Seals are a property of an inode. [...] Furthermore, seals can never be removed, only added. That seems somewhat difficult to reconcile with how I need F_SEAL_IOMAP to operate. /me calls it a day and goes looking for the hard liquor. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap

2017-08-13 Thread Dave Chinner
e seal is going to be broken by the filesystem via the break_layouts() interface, and the break then blocks until the app releases the lease? So the seal lifetime is bounded by the lease? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-11 Thread Dave Chinner
On Fri, Aug 11, 2017 at 07:31:54PM -0700, Darrick J. Wong wrote: > On Sat, Aug 12, 2017 at 10:30:34AM +1000, Dave Chinner wrote: > > On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote: > > > On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner wrote: > > > > On T

Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-11 Thread Dave Chinner
On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote: > On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner wrote: > > On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote: > >> >From falloc.h: > >> > >> FALLOC_FL_SEAL_BLOCK_MAP is u

Re: [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents

2017-08-11 Thread Dave Chinner
e user downgrades their kernel the swapfile suddenly can not be used by the older kernel. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-11 Thread Dave Chinner
just one thing - having the seal operation also modify the extent map means it's not useful for the use cases where we need the extent map to remain unmodified. Thoughts? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 1/5] fs, xfs: introduce S_IOMAP_IMMUTABLE

2017-08-06 Thread Dave Chinner
d rather than discussion and review being shut down because "Christoph shouted nasty words at me but I still don't understand why?". Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dave Chinner
On Fri, Aug 04, 2017 at 04:43:50PM -0700, Dan Williams wrote: > On Fri, Aug 4, 2017 at 4:31 PM, Dave Chinner wrote: > > On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote: > >> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > >> index fe0f8f

Re: [PATCH v2 4/5] xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE

2017-08-04 Thread Dave Chinner
allocate(FALLOC_FL_[UN]SEAL_BLOCK_MAP). Support for toggling this > > on-disk state is saved for a later patch. > > > > Cc: Jan Kara > > Cc: Jeff Moyer > > Cc: Christoph Hellwig > > Cc: Ross Zwisler > > Suggested-by: Dave Chinner > > Sugges

Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dave Chinner
ling, so we've already guaranteed that it won't have holes in it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/3] fs, xfs: introduce S_IOMAP_IMMUTABLE

2017-07-31 Thread Dave Chinner
can't imagine why anyone would want to turn a swap file back into a regular > file. > I haven't fully followed DAX, but I'd take your word for it if people want to > be able to remove the flag after. DAX isn't the driver of that functionality, it's the other use cases that need it, and why the proposed "only remove flag if len == 0" API is a non-starter. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/3] xfs: persist S_IOMAP_IMMUTABLE in di_flags2

2017-07-31 Thread Dave Chinner
code will now fail to allocate/zero anything... IOWs, this flag should be the last thing that is set on the inode once it's been fully allocated and zeroed. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/3] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-07-31 Thread Dave Chinner
the file is mapped or not. Perhaps it would be better to start with a man page documenting the desired API?

FWIW, the if/else if/else structure could be cleaned up with a simple "goto out_unlock" construct such as:

	/* don't make immutable if inode is currently mapped */
	error = -EBUSY;
	if (mapping_mapped(mapping))
		goto out_unlock;

	/* can't do anything if inode is already immutable */
	error = -ETXTBSY;
	if (IS_IMMUTABLE(inode) || IS_IOMAP_IMMUTABLE(inode))
		goto out_unlock;

	/* XFS only supports whole file extent immutability */
	error = -EINVAL;
	if (len != i_size_read(inode))
		goto out_unlock;

	/* all good to go */
	error = 0;

out_unlock:
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	i_mmap_unlock_read(mapping);
	if (error)
		return error;

	/* now unshare, allocate and add immutable flag */

Cheers, Dave. -- Dave Chinner da...@fromorbit.com
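For readers who want to try the suggested control flow outside the kernel, here is a self-contained userspace sketch of the same "set the error, test, goto out_unlock" shape. Everything in it is an assumption for illustration: `struct demo_inode`, its fields and `demo_seal_check()` are hypothetical stand-ins for the inode state and helpers named in the reply, and a counter replaces the real locks so the single unlock path can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode state the checks look at. */
struct demo_inode {
	bool mapped;     /* models mapping_mapped(mapping) */
	bool immutable;  /* models IS_IMMUTABLE() || IS_IOMAP_IMMUTABLE() */
	long size;       /* models i_size_read(inode) */
	int lock_depth;  /* models the locks; must be 0 again on return */
};

/*
 * Each check sets the error it would report before testing its
 * condition, so every failure leaves through the same unlock path.
 */
int demo_seal_check(struct demo_inode *ip, long len)
{
	int error;

	assert(ip != NULL);
	ip->lock_depth++;		/* "take" the lock */

	/* don't proceed if the inode is currently mapped */
	error = -EBUSY;
	if (ip->mapped)
		goto out_unlock;

	/* nothing to do if the inode is already immutable */
	error = -ETXTBSY;
	if (ip->immutable)
		goto out_unlock;

	/* only whole-file ranges are accepted */
	error = -EINVAL;
	if (len != ip->size)
		goto out_unlock;

	/* all good to go */
	error = 0;

out_unlock:
	ip->lock_depth--;		/* single unlock site for every outcome */
	return error;
}
```

The point of the shape is that adding another precondition means adding three lines in sequence rather than another nesting level, and the unlock code is written exactly once.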

Re: [rfc] superblock shrinker accumulating excessive deferred counts

2017-07-18 Thread Dave Chinner
On Tue, Jul 18, 2017 at 05:28:14PM -0700, David Rientjes wrote: > On Tue, 18 Jul 2017, Dave Chinner wrote: > > > > Thanks for looking into this, Dave! > > > > > > The number of GFP_NOFS allocations that build up the deferred counts can > > > be unboun

Re: [rfc] superblock shrinker accumulating excessive deferred counts

2017-07-17 Thread Dave Chinner
On Mon, Jul 17, 2017 at 01:37:35PM -0700, David Rientjes wrote: > On Mon, 17 Jul 2017, Dave Chinner wrote: > > > > This is a side effect of super_cache_count() returning the appropriate > > > count but super_cache_scan() refusing to do anything about it and > >

Re: [rfc] superblock shrinker accumulating excessive deferred counts

2017-07-16 Thread Dave Chinner
n.. OTOH, if we don't damp down the deferred count scanning on small deltas, then we end up with filesystem caches being trashed in light memory pressure conditions. This is, generally speaking, bad for workloads that rely on filesystem caches for performance (e.g. git, NFS servers, etc). What we have now is effectively a brute force solution that finds a decent middle ground most of the time. It's not perfect, but I'm yet to find a better solution. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-22 Thread Dave Chinner
On Wed, Jun 21, 2017 at 09:07:57PM -0700, Andy Lutomirski wrote: > On Wed, Jun 21, 2017 at 5:02 PM, Dave Chinner wrote: > > > > You seem to be calling the "fdatasync on every page fault" the > > It's the opposite of fdatasync(). It needs to sync whatever m

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-21 Thread Dave Chinner
On Tue, Jun 20, 2017 at 10:18:24PM -0700, Andy Lutomirski wrote: > On Tue, Jun 20, 2017 at 6:40 PM, Dave Chinner wrote: > >> A per-inode > >> count of the number of live DAX mappings or of the number of struct > >> file instances that have requested DAX would work

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-21 Thread Dave Chinner
> > +
> > +SYSCALL_DEFINE3(daxctl, const char __user *, path, int, flags, int, align)
>
> I was /about/ to grouse about this syscall, then realized that maybe it
> /is/ useful to be able to check a specific alignment. Maybe not, since
> I had something more permanent in mind anyway. In any case, just pass
> in an opened fd if this sticks around.

We can do all that via fallocate(), too...

Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-20 Thread Dave Chinner
On Tue, Jun 20, 2017 at 06:24:03PM -0700, Darrick J. Wong wrote: > On Wed, Jun 21, 2017 at 09:53:46AM +1000, Dave Chinner wrote: > > On Tue, Jun 20, 2017 at 09:17:36AM -0700, Dan Williams wrote: > > > An immutable-extent DAX-file and a reflink-capable DAX-file are not >

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-20 Thread Dave Chinner
On Tue, Jun 20, 2017 at 09:14:24AM -0700, Andy Lutomirski wrote: > On Tue, Jun 20, 2017 at 3:11 AM, Dave Chinner wrote: > > On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote: > >> On Mon, Jun 19, 2017 at 5:46 PM, Dave Chinner wrote: > >> > On Mon, Ju

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-20 Thread Dave Chinner
n. However, we cannot guarantee that no writes occur to the inode with immutable extent maps (especially as the whole point is to allow userspace writes and commits without the kernel being involved), so extent sharing on immutable extent maps cannot be allowed... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-20 Thread Dave Chinner
On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote: > On Mon, Jun 19, 2017 at 5:46 PM, Dave Chinner wrote: > > On Mon, Jun 19, 2017 at 08:22:10AM -0700, Andy Lutomirski wrote: > >> Second: syncing extents. Here's a straw man. Forget the mmap() flag. >

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-19 Thread Dave Chinner
On Mon, Jun 19, 2017 at 08:22:10AM -0700, Andy Lutomirski wrote: > On Mon, Jun 19, 2017 at 6:21 AM, Dave Chinner wrote: > > On Sat, Jun 17, 2017 at 10:05:45PM -0700, Andy Lutomirski wrote: > >> On Sat, Jun 17, 2017 at 8:15 PM, Dan Williams > >> wrote: > >>

Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

2017-06-19 Thread Dave Chinner
ly like this:
>
> if (metadata is dirty) {
>     up_write(&mmap_sem);
>     sync the metadata;
>     down_write(&mmap_sem);
>     return 0; /* retry the fault */
> } else {
>     return whatever success code;
> }

How do you know that there is dependent filesystem metadata that needs syncing at a level that you can safely manipulate the mmap_sem? And how, exactly, do you do this without races? It'd be trivial to DOS such retryable DAX faults simply by touching the file in a tight loop in a separate process...

Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-04 Thread Dave Chinner
On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote: > On Sun 02-04-17 09:05:26, Dave Chinner wrote: > > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote: > > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote: > > > > On Thu, 2017-

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-01 Thread Dave Chinner
ven know there was a crash at mount time because their architecture always leaves a consistent filesystem on disk (e.g. COW filesystems). > I wonder if repeated crashes can lead to any odd corner cases. Without defined, locked down behaviour of the superblock counter, corner cases will almost certainly exist... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-03-29 Thread Dave Chinner
ync() into ->getattr() (and dealt with all the locking issues that entails), by the time the statx syscall returns to userspace the i_version value may not match the data/metadata in the inode(*). IOWs, by the time i_version gets to userspace, it is out of date and any use of it for data versioning from userspace is going to be prone to race conditions. Cheers, Dave. (*) fiemap has exactly the same "stale the moment internal fs locks are released" race conditions, which is why it cannot safely be used for mapping holes when copying file data -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-03-21 Thread Dave Chinner
as the NFS clients are accessing and requiring synchronisation. > Not sure how big a problem that really is. This coherency problem has always existed on the server side... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: remove kmem_zalloc_greedy

2017-03-06 Thread Dave Chinner
e asking if we should try > kmem_zalloc(4 pages), then kmem_zalloc(1 page), and only then switch to > the __vmalloc calls? Just call kmem_zalloc_large() for 4 pages without a fallback on failure - that's exactly how we handle allocations for things like the 64k xattr buffers. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail

2017-03-03 Thread Dave Chinner
On Fri, Mar 03, 2017 at 03:19:12PM -0800, Darrick J. Wong wrote: > On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote: > > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > Even though kmem_zalloc_gr

Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail

2017-03-03 Thread Dave Chinner
<= minsize) > kmsize = minsize; > } Seems wrong to me - this function used to have lots of callers and over time we've slowly removed them or replaced them with something else. I'd suggest removing it completely, replacing the call sites with kmem_zalloc_large(). Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/7] fs, xfs: convert xfs_bui_log_item.bui_refcount from atomic_t to refcount_t

2017-02-22 Thread Dave Chinner
ject code. Any change to code in this area needs to be gone over with a fine tooth comb, because bugs can result in filesystem and/or journal corruption issues that may not be noticed until a system crashes and log recovery fails and the user loses their entire filesystem.... Hence the repeated comments about needing to actually test the code you are changing. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/7] fs, xfs: convert xfs_buf.b_hold and xfs_buf.b_lru_ref from atomic_t to refcount_t

2017-02-21 Thread Dave Chinner
that the object is not referenced by anyone (that's b_hold). i.e. b_lru_ref is an "active reference weighting" used to provide a hierarchical reclaim bias toward less important metadata objects, and has no bearing on the actual active users of the object. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/7] fs, xfs: convert xfs_bui_log_item.bui_refcount from atomic_t to refcount_t

2017-02-21 Thread Dave Chinner
> situations. I'm missing something: how do you overflow a log item object reference count? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3.16 043/306] xfs: change mailing list address

2017-02-16 Thread Dave Chinner
On Wed, Feb 15, 2017 at 10:41:40PM +0000, Ben Hutchings wrote: > 3.16.40-rc1 review patch. If anyone has any objections, please let me know. > > -- > > From: Dave Chinner > > commit 541d48f05fa1c19a4a968d38df685529e728a20a upstream. > > oss.sgi.com

Re: [PATCH 4/6] xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*

2017-02-06 Thread Dave Chinner
dit of the caller paths is done and we're 100% certain that there are no lurking deadlocks. For example, I'm pretty sure we can call into _xfs_buf_map_pages() outside of a transaction context but with an inode ILOCK held exclusively. If we then recurse into memory reclaim and try to run a transaction during reclaim, we have an inverted ILOCK vs transaction locking order. i.e. we are not allowed to call xfs_trans_reserve() with an ILOCK held as that can deadlock the log: log full, locked inode pins tail of log, inode cannot be flushed because ILOCK is held by caller waiting for log space to become available. i.e. there are certain situations where holding an ILOCK is a deadlock vector. See xfs_lock_inodes() for an example of the lengths we go to avoid ILOCK based log deadlocks like this... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Fri, Dec 23, 2016 at 09:33:36AM +1100, Dave Chinner wrote: > On Fri, Dec 23, 2016 at 09:15:00AM +1100, Dave Chinner wrote: > > On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote: > > > Ok, so the numa issue was a red herring. With that fixed: > > > >

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Fri, Dec 23, 2016 at 09:15:00AM +1100, Dave Chinner wrote: > On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote: > > Ok, so the numa issue was a red herring. With that fixed: > > > > On Thu, Dec 22, 2016 at 1:06 PM, Dave Chinner wrote: > > > >

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote: > Ok, so the numa issue was a red herring. With that fixed: > > On Thu, Dec 22, 2016 at 1:06 PM, Dave Chinner wrote: > > > > Better, but still bad. average files/s is not up to 200k files/s, > > so still

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Fri, Dec 23, 2016 at 07:42:40AM +1100, Dave Chinner wrote: > On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote: > > On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner wrote: > > > > > > This sort of thing is normally indicative of a memory reclaim or

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner wrote: > > > > This sort of thing is normally indicative of a memory reclaim or > > lock contention problem. Profile showed unusual spinlock contention, > >

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner wrote: > > > > There may be deeper issues. I just started running scalability tests > > (e.g. 16-way fsmark create tests) and about a minute in I got a > > di

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
> report, so I'm not really sure what's going on here anyway. http://www.gossamer-threads.com/lists/linux/kernel/2587485 Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Thu, Dec 22, 2016 at 04:13:22PM +1100, Dave Chinner wrote: > On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote: > > On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote: > > > Hi, > > > > > > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote: > On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote: > > Hi, > > > > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner wrote: > > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris L

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
iscsi guys seem to have bounced it and no-one is looking at it. I'm disappearing for several months at the end of tomorrow, so I thought I better make sure you know about it. I've also added linux-scsi, linux-block to the cc list. Cheers, Dave. > On Thu, Dec 15, 2016 at 09:29

Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)

2016-12-21 Thread Dave Chinner
ROT_WRITE, fd, 0); > > > > *(p + 42) = 0xDEADBEEF; > > asm { clflush; } /* or whatever */ > > > > ...so perhaps it would be a good idea to design the fallocate primitive > > around "prepare this fd for mmap-only pmem semantics" and let it the > > backend do zeroing and inode flag changes as necessary to make it > > happen. We'd need to do some bikeshedding about what the other falloc > > flags mean when we're dealing with pmem files and devices, but I think > > we should try to keep the userland presentation the same unless there's > > a really good reason not to. > > It would be interesting to use fallocate to size device-dax files... No. device-dax needs to die, not poison a bunch of existing file and block device APIs and behaviours with special snowflakes. Get DAX-enabled filesystems to do what you need, and get rid of this ugly, nasty hack. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

2016-12-20 Thread Dave Chinner
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote: > On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote: > > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > Now that the page al

Re: [PATCH 2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

2016-12-19 Thread Dave Chinner
the unnecessary KM_NOFS allocations in one go. I've never liked whack-a-mole style changes like this - do it once, do it properly Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[GIT PULL] xfs: updates for 4.10-rc1

2016-12-14 Thread Dave Chinner
| 1 + include/linux/iomap.h | 28 +- include/linux/lockdep.h| 25 +- kernel/locking/lockdep.c | 20 +- 73 files changed, 1994 insertions(+), 2063 deletions(-) -- Dave Chinner da...@fromorbit.com

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-14 Thread Dave Chinner
On Thu, Dec 15, 2016 at 09:24:11AM +1100, Dave Chinner wrote: > Hi folks, > > Just updated my test boxes from 4.9 to a current Linus 4.10 merge > window kernel to test the XFS merge I am preparing for Linus. > Unfortunately, all my test VMs using iscsi failed pretty much > inst

[4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-14 Thread Dave Chinner
00 00 00 00 e9 ad fe ff ff 48 8b 7b 30 e8 da e7 ca ff 8b 53 10 44 89 ee 48 89 df 2b 53 14 48 89 43 30 c7 43 40 00 00 00 00 <8b [ 160.300674] RIP: iscsi_tcp_segment_done+0x20d/0x2e0 RSP: c9083c38 [ 160.301584] CR2: 000c Known problem, or something new? Cheers, Dave. -- D

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Dave Chinner
is XFS's version of kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this functionality will have to play these memalloc_noio_save/ memalloc_noio_restore games to ensure they are GFP_NOFS safe Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] dax: try to avoid unused function warnings

2016-11-28 Thread Dave Chinner
option), but > > DAX > > and XFS at least require FS_IOMAP to behave correctly. > > > > If you made DAX a FS selectable option instead of a user selectable one, > > when > > would a FS know it needs to include DAX support? > > With a user-selectable DAX knob per-filesystem, XFS_DAX, EXT4_DAX, etc... That's just silly. Requiring users to configure every filesystem that can support DAX to support DAX at config time is unneeded config space bloat. DAX has an iomap config dependency, so just select it when DAX is selected - everything else should just be automatic and nobody else needs to care what build dependencies DAX has. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-28 Thread Dave Chinner
On Mon, Nov 28, 2016 at 03:46:51PM -0700, Ross Zwisler wrote: > On Fri, Nov 25, 2016 at 02:00:59PM +1100, Dave Chinner wrote: > > On Wed, Nov 23, 2016 at 11:44:19AM -0700, Ross Zwisler wrote: > > > Tracepoints are the standard way to capture debugging and tracing > > > i

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-28 Thread Dave Chinner
On Sun, Nov 27, 2016 at 04:58:43PM -0800, Linus Torvalds wrote: > On Sun, Nov 27, 2016 at 2:42 PM, Dave Chinner wrote: > > > > And that's exactly why we need a method of marking tracepoints as > > stable. How else are we going to know whether a specific tracepoint > &

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-27 Thread Dave Chinner
re decides to use it in userspace" policy. > But tracing actual high-level things like IO and faults? I think that > makes perfect sense, as long as the data that is collected is also the > actual event data, and not so much a random implementation issue of > the day. IME, a tracepoint that doesn't expose detailed context specific information isn't really useful for complex problem diagnosis... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-24 Thread Dave Chinner
On Fri, Nov 25, 2016 at 04:14:19AM +, Al Viro wrote: > [Linus Cc'd] > > On Fri, Nov 25, 2016 at 01:49:18PM +1100, Dave Chinner wrote: > > > they have become parts of stable userland ABI and are to be maintained > > > indefinitely. Don't expect "

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-24 Thread Dave Chinner
umber like so: xfs_ilock:dev 8:96 ino 0x493 flags ILOCK_EXCL This way we can filter the output easily across both dax and filesystem tracepoints with 'grep "ino 0x493"'... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
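A minimal sketch of that filtering, for anyone who wants to try it on saved trace output. Only the xfs_ilock line comes from the message above; the temporary file, the dax_pmd_fault line and the second inode number are made-up sample data, on the assumption that the dax events would print the same "dev ... ino ..." prefix:

```shell
# Hypothetical sample trace output. Only the first line is from the thread;
# the dax event name/format and the 0x4a0 inode are assumptions for illustration.
trace=$(mktemp)
cat > "$trace" <<'EOF'
xfs_ilock:dev 8:96 ino 0x493 flags ILOCK_EXCL
dax_pmd_fault:dev 8:96 ino 0x493 (hypothetical dax event)
xfs_ilock:dev 8:96 ino 0x4a0 flags ILOCK_SHARED
EOF

# Count, then print, every event for the inode of interest,
# regardless of which subsystem emitted it.
matches=$(grep -c "ino 0x493" "$trace")
grep "ino 0x493" "$trace"

rm -f "$trace"
```

The point is that a common field format lets one pattern select events from both the filesystem and the dax tracepoints without knowing the event names in advance.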

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-24 Thread Dave Chinner
tential stable ABI > you might have to keep around forever. It's *not* a glorified debugging > printk. trace_printk() is the glorified debugging printk for tracing, not trace events. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] x86: fix kaslr and memmap collision

2016-11-23 Thread Dave Chinner
purposes. My "pmem" test VM always has at least 2 ranges set to give me two discrete pmem devices, and I have used 4 from time to time to do things like test multi-volume scratch XFS filesystems in xfstests (i.e. data, log and realtime volumes) so I didn't need to play games with partitioning or DM... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-22 Thread Dave Chinner
On Tue, Nov 22, 2016 at 10:39:29AM +, David Howells wrote: > Dave Chinner wrote: > > > No. Just provide a 64 bit high resolution field, and define it to > > contain nanoseconds. When we need higher resolution to be exported > > to userspace, we use a /feature f

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-21 Thread Dave Chinner
It doesn't take much vision to extend the current hardware capabilities with coherent hardware accelerators (e.g. as has been added to the Power platform) writing directly into pmem storage and providing higher resolution timestamps than the CPU can generate. Call me silly if you want - I don't care - but let's not ignore the emerging storage technology trends that are there for everyone to see... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-19 Thread Dave Chinner
On Fri, Nov 18, 2016 at 10:54:02PM +, David Howells wrote: > Dave Chinner wrote: > > > And when we start thinking in those timeframes, an > > increase in timestamp resolution of at least another 10e-3 is > > likely > > Is it, though? To be useful, sur

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-18 Thread Dave Chinner
On Fri, Nov 18, 2016 at 09:48:21PM +, David Howells wrote: > Dave Chinner wrote: > > > > Btw, can you point me at the manpage that defines the fsxattr struct and > > > its > > > flags? > > > > man xfsctl is the original source. However, &g

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-18 Thread Dave Chinner
On Thu, Nov 17, 2016 at 08:28:57PM -0700, Andreas Dilger wrote: > On Nov 17, 2016, at 4:40 PM, Dave Chinner wrote: > >> > >> Time fields are split into separate seconds and nanoseconds fields to make > >> packing easier and the granularities can be queried with t

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-18 Thread Dave Chinner
On Fri, Nov 18, 2016 at 09:43:38AM +, David Howells wrote: > Dave Chinner wrote: > > > Fields in struct statx come in a number of classes: > > > > > > (0) stx_dev_*, stx_blksize. > > > > > > These are local system information and are

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-18 Thread Dave Chinner
On Fri, Nov 18, 2016 at 10:29:04AM +, David Howells wrote: > Dave Chinner wrote: > > > > (13) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. > > > Note that the Linux IOC flags are a mess and filesystems such as Ext4 > > >

Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

2016-11-17 Thread Dave Chinner
> The file is built automatically if CONFIG_SAMPLES is enabled. Can we get xfstests written to exercise and validate all this functionality, please? I'd suggest that adding xfs_io support for the statx syscall would be far more useful for xfstests than a standalone test program, too. We already have equivalent stat() functionality in xfs_io and that's used quite a bit in xfstests Cheers, Dave. -- Dave Chinner da...@fromorbit.com
