Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Dave Chinner
…made stable, so must be the directory modification done during file creation. This has nothing to do with POSIX or what the "linux standard" is - this is testing whether the implementation of strictly ordered metadata journalling is correct or not. If gfs2 does not have strictly ordered metadata journalling, then it probably shouldn't run these tests. Cheers, Dave. -- Dave Chinner dchin...@redhat.com
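For context, the application-side pattern these tests model looks like this - a minimal userspace sketch (the helper name create_durable is hypothetical), showing the explicit parent-directory fsync that strictly ordered metadata journalling makes unnecessary:

```c
#include <fcntl.h>
#include <unistd.h>

/* Create a file and make both its data and its directory entry stable. */
int create_durable(const char *dirpath, const char *filepath)
{
	int fd, dfd, ret = -1;

	fd = open(filepath, O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return -1;
	if (fsync(fd) == 0) {			/* file data + inode stable */
		dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
		if (dfd >= 0) {
			ret = fsync(dfd);	/* directory entry stable */
			close(dfd);
		}
	}
	close(fd);
	return ret;
}
```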

Re: [Cluster-devel] [PATCH 08/10] iomap: New iomap_written operation

2018-01-11 Thread Dave Chinner
…struct iomap_ops and have existing implementations set them up as iomap_write_begin()/iomap_write_end(). Then gfs2 can do its special little extra bit and then call iomap_write_end() in the one call... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
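The call shape being suggested, as a rough sketch - the method name and signatures here follow the discussion, not whatever interface was eventually merged:

```c
/* gfs2 supplies its own write_end, does its extra work, then calls
 * the generic helper - signatures are illustrative only. */
static int gfs2_iomap_write_end(struct inode *inode, loff_t pos,
				unsigned len, unsigned copied,
				struct page *page, struct iomap *iomap)
{
	/* gfs2's "special little extra bit" would go here... */

	return iomap_write_end(inode, pos, len, copied, page, iomap);
}
```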

Re: [Cluster-devel] [PATCH 6/7] xfs: Switch to iomap for lseek SEEK_HOLE / SEEK_DATA

2017-06-17 Thread Dave Chinner
Hence this will now always report unwritten extents as data. This strikes me as a regression as we currently report them as a hole: $ xfs_io -f -c "truncate 1m" -c "falloc 0 1m" -c "seek -a -r 0" foo Whence Result HOLE 0 $ I'm pretty sure that ext4 has the same behaviour when it comes to dirty page cache pages over unwritten extents... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
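The same probe can be done from C with lseek(2); a minimal sketch of how the reported behaviour would be observed (error handling trimmed):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd = open(argv[1], O_RDONLY);

	if (fd < 0)
		return 1;
	/* On a file that is one unwritten (fallocated) extent, the
	 * question above is whether offset 0 reports as HOLE or DATA. */
	off_t data = lseek(fd, 0, SEEK_DATA);
	off_t hole = lseek(fd, 0, SEEK_HOLE);

	printf("data %lld hole %lld\n", (long long)data, (long long)hole);
	close(fd);
	return 0;
}
```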

Re: [Cluster-devel] [PATCH 4/6] xfs: use memalloc_nofs_{save, restore} instead of memalloc_noio*

2017-02-06 Thread Dave Chinner
…audit of the caller paths is done and we're 100% certain that there are no lurking deadlocks. For example, I'm pretty sure we can call into _xfs_buf_map_pages() outside of a transaction context but with an inode ILOCK held exclusively. If we then recurse into memory reclaim and try to run a transaction during reclaim, we have an inverted ILOCK vs transaction locking order. i.e. we are not allowed to call xfs_trans_reserve() with an ILOCK held as that can deadlock the log: log full, locked inode pins tail of log, inode cannot be flushed because ILOCK is held by the caller waiting for log space to become available. i.e. there are certain situations where holding an ILOCK is a deadlock vector. See xfs_lock_inodes() for an example of the lengths we go to avoid ILOCK based log deadlocks like this... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
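The usage pattern under discussion, sketched: the filesystem marks a reclaim-unsafe section once, and every allocation inside it inherits GFP_NOFS semantics (function name hypothetical):

```c
#include <linux/sched/mm.h>

static void example_fs_critical_section(void)
{
	unsigned int flags;

	flags = memalloc_nofs_save();
	/*
	 * Allocations here are implicitly GFP_NOFS, so direct reclaim
	 * cannot recurse into the filesystem - e.g. around a call to
	 * _xfs_buf_map_pages() made with the inode ILOCK held.
	 */
	memalloc_nofs_restore(flags);
}
```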

Re: [Cluster-devel] [PATCH 2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

2016-12-20 Thread Dave Chinner
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote: > On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote: > > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > Now that the page al

Re: [Cluster-devel] [PATCH 2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

2016-12-19 Thread Dave Chinner
…the unnecessary KM_NOFS allocations in one go. I've never liked whack-a-mole style changes like this - do it once, do it properly. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 2/3] GFS2: Implement iomap for block_map

2016-11-02 Thread Dave Chinner
On Wed, Nov 02, 2016 at 09:37:00AM +, Steven Whitehouse wrote: > Hi, > > On 31/10/16 20:07, Dave Chinner wrote: > >On Sat, Oct 29, 2016 at 10:24:45AM +0100, Steven Whitehouse wrote: > >>On 28/10/16 20:29, Bob Peterson wrote: > >>>+ if (create)

Re: [Cluster-devel] [PATCH v2 2/3] GFS2: Implement iomap for block_map

2016-10-31 Thread Dave Chinner
…necessary to do if it is already known what ranges of the file contain zeros... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb

2016-06-28 Thread Dave Chinner
On Tue, Jun 28, 2016 at 10:13:32AM +0100, Steven Whitehouse wrote: > Hi, > > On 28/06/16 03:08, Dave Chinner wrote: > >On Fri, Jun 24, 2016 at 02:50:11PM -0500, Bob Peterson wrote: > >>This patch adds a new prune_icache_sb function for the VFS slab > >>shrinker

Re: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb

2016-06-27 Thread Dave Chinner
…superblock shrinker for the above reasons - it's far too easy for people to get badly wrong. If there are specific limitations on how inodes can be freed, then move the parts of inode *freeing* that cause problems to a different context via the ->evict/destroy callouts and trigger that external context processing on demand. That external context can just do bulk "if it is on the list then free it" processing, because the reclaim policy has already been executed to place that inode on the reclaim list. This is essentially what XFS does, but it also uses the ->nr_cached_objects/->free_cached_objects() callouts in the superblock shrinker to provide the reclaim rate feedback mechanism required to throttle incoming memory allocations. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
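A sketch of the two superblock callouts being referred to (the example_* helpers are hypothetical; signatures abbreviated from the kernel of that era):

```c
static long example_nr_cached_objects(struct super_block *sb,
				      struct shrink_control *sc)
{
	/* report how many inodes sit on the deferred-free list */
	return example_reclaim_list_count(sb);
}

static long example_free_cached_objects(struct super_block *sb,
					struct shrink_control *sc)
{
	/* free up to sc->nr_to_scan queued inodes; the return value
	 * feeds back into the shrinker to throttle allocations */
	return example_reclaim_list_free(sb, sc->nr_to_scan);
}
```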

Re: [Cluster-devel] [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb

2016-06-27 Thread Dave Chinner
…but do not destroy/free it - you simply queue it to an internal list and then do the cleanup/freeing in your own time? i.e. why do you need a special callout just to defer freeing to another thread when we already have hooks that enable you to do this? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api

2016-05-03 Thread Dave Chinner
On Sun, May 01, 2016 at 08:19:44AM +1000, NeilBrown wrote: > On Sat, Apr 30 2016, Dave Chinner wrote: > > Indeed, blocking the superblock shrinker in reclaim is a key part of > > balancing inode cache pressure in XFS. If the shrinker starts > > hitting dirty inodes, it bl

Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner
…in place, I'd then make the changes to the generic superblock shrinker code to enable finer grained reclaim and optimise the XFS shrinkers to make use of it... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner
…thread can't keep up with all of the allocation pressure that occurs. e.g. a 20-core intel CPU with local memory will be seen as a single node and so will have a single kswapd thread to do reclaim. There's a massive imbalance between maximum reclaim rate and maximum allocation rate in situations like this. If we want memory reclaim to run faster, we need to be able to do more work *now*, not defer it to a context with limited execution resources. i.e. IMO deferring more work to a single reclaim thread per node is going to limit memory reclaim scalability and performance, not improve it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 2/2] mm, debug: report when GFP_NO{FS, IO} is used explicitly from memalloc_no{fs, io}_{save, restore} context

2016-04-27 Thread Dave Chinner
On Wed, Apr 27, 2016 at 10:03:11AM +0200, Michal Hocko wrote: > On Wed 27-04-16 08:58:45, Dave Chinner wrote: > > On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT

Re: [Cluster-devel] [PATCH 1/2] mm: add PF_MEMALLOC_NOFS

2016-04-26 Thread Dave Chinner
…don't actually care about in XFS at all. That way I can carry all the XFS changes in the XFS tree and not have to worry about when this stuff gets merged or conflicts with the rest of the work that is being done to the mm/ code and whatever tree that eventually lands in... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 2/2] mm, debug: report when GFP_NO{FS, IO} is used explicitly from memalloc_no{fs, io}_{save, restore} context

2016-04-26 Thread Dave Chinner
…going to restart the flood of false positive lockdep warnings we've silenced over the years, so perhaps lockdep needs to be made smarter as well... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH] fs: return EPERM on immutable inode

2016-04-05 Thread Dave Chinner
…Signed-off-by: Eryu Guan > --- > I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. This should be in the commit message itself, rather than "EPERM looks more reasonable". Other than that, the change seems fine to me.
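The semantic point of the patch, in miniature (a sketch, not the patch itself; function name hypothetical):

```c
static int example_change_attr(struct inode *inode)
{
	/* Immutability is not a permission check, so the failure is
	 * EPERM ("operation not permitted"), not EACCES. */
	if (IS_IMMUTABLE(inode))
		return -EPERM;
	/* ...proceed with the change... */
	return 0;
}
```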

Re: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

2015-12-07 Thread Dave Chinner
…Slab shrinker calls into vfs inode shrinker to free inodes from memory. > 7. dlm blocks on a pending fence operation. Goto 1. Therefore, the fence operation should be doing GFP_NOFS allocations to prevent re-entry into the DLM via the filesystem via the shrinker. Cheers, Dave. -- Dave Chinner dchin...@redhat.com
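The fix being suggested, sketched (all names hypothetical): any allocation on the fencing path is made with GFP_NOFS so direct reclaim cannot re-enter the filesystem shrinker and deadlock against the pending fence:

```c
static int example_send_fence_request(size_t size)
{
	void *msg;

	/* GFP_NOFS: direct reclaim may not call back into filesystem
	 * shrinkers, breaking the cycle in steps 6-7 above. */
	msg = kmalloc(size, GFP_NOFS);
	if (!msg)
		return -ENOMEM;
	/* ...send and free... */
	kfree(msg);
	return 0;
}
```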

Re: [Cluster-devel] [PATCH 2/2][v2] blk-plug: don't flush nested plug lists

2015-04-08 Thread Dave Chinner
…specific plugging problem you've identified (i.e. do_direct_IO() is flushing far too frequently) rather than making a sweeping generalisation that the IO stack plugging infrastructure needs fundamental change? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
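For reference, the existing plugging pattern being defended - batch submission inside a plug, one flush at the end (function name hypothetical):

```c
#include <linux/blkdev.h>

static void example_submit_batch(void)
{
	struct blk_plug plug;

	blk_start_plug(&plug);
	/* submit a batch of bios here; they are held and merged on the
	 * task's plug list instead of hitting the queue one at a time */
	blk_finish_plug(&plug);
}
```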

Re: [Cluster-devel] [PATCH 2/2][v2] blk-plug: don't flush nested plug lists

2015-04-08 Thread Dave Chinner
…e3142..c3ac5ec 100644 > --- a/fs/xfs/xfs_itable.c > +++ b/fs/xfs/xfs_itable.c > @@ -196,7 +196,7 @@ xfs_bulkstat_ichunk_ra( > &xfs_inode_buf_ops); > } > } > - blk_finish_plug(&plug); > + blk_finish…

Re: [Cluster-devel] [PATCH v3] fs: record task name which froze superblock

2015-03-02 Thread Dave Chinner
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote: > On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote: > > On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote: > > > Freezing and thawing are separate system calls, task which is supposed

Re: [Cluster-devel] [PATCH v3] fs: record task name which froze superblock

2015-02-28 Thread Dave Chinner
That should be a separate patch, sent to the scheduler maintainers for review. AFAICT, it isn't part of the user API - it's not defined in the man page which just says "can be up to 16 bytes". Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.

2015-02-05 Thread Dave Chinner
On Wed, Feb 04, 2015 at 09:49:50AM +, Steven Whitehouse wrote: > Hi, > > On 04/02/15 07:13, Oleg Drokin wrote: > >Hello! > > > >On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote: > >>>I also wonder if vmalloc is still very slow? That was the case some

Re: [Cluster-devel] [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.

2015-02-05 Thread Dave Chinner
On Wed, Feb 04, 2015 at 02:13:29AM -0500, Oleg Drokin wrote: > Hello! > > On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote: > >> I also wonder if vmalloc is still very slow? That was the case some > >> time ago when I noticed a problem in directory access times in gfs2,

Re: [Cluster-devel] [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.

2015-02-03 Thread Dave Chinner
On Mon, Feb 02, 2015 at 10:30:29AM +, Steven Whitehouse wrote: > Hi, > > On 02/02/15 08:11, Dave Chinner wrote: > >On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote: > >>Hello! > >> > >>On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:

Re: [Cluster-devel] [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.

2015-02-02 Thread Dave Chinner
On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote: > Hello! > > On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote: > > > On Sun, Feb 01, 2015 at 10:59:54PM -0500, gr...@linuxhacker.ru wrote: > >> From: Oleg Drokin > >> > >> leaf_dealloc u

Re: [Cluster-devel] [PATCH] gfs2: use __vmalloc GFP_NOFS for fs-related allocations.

2015-02-01 Thread Dave Chinner
…ugly and grotesque, but we've got no other way to limit reclaim context because the MM devs won't pass the vmalloc gfp context down the stack to the PTE allocations. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
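The workaround being called ugly, sketched with the three-argument __vmalloc() of that era (wrapper name hypothetical); as the quote says, it is incomplete because the allocations inside vmalloc itself do not inherit the gfp context:

```c
static void *example_alloc_leaf_buffer(unsigned long size)
{
	/* The page-table allocations inside vmalloc still use their
	 * own gfp flags - the incompleteness complained about above. */
	return __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
}
```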

Re: [Cluster-devel] [PATCH 0/17 v3] quota: Unify VFS and XFS quota interfaces

2015-01-21 Thread Dave Chinner
On Wed, Jan 21, 2015 at 11:23:20PM +0100, Jan Kara wrote: > On Thu 22-01-15 08:38:26, Dave Chinner wrote: > > On Fri, Jan 16, 2015 at 01:47:34PM +0100, Jan Kara wrote: > > > Hello, > > > > > > this is another iteration of patches to unify VFS and XFS quota

Re: [Cluster-devel] [PATCH 0/17 v3] quota: Unify VFS and XFS quota interfaces

2015-01-21 Thread Dave Chinner
> …of copies from 3 to 2 brings just 2% improvement in speed in my test setup and getting quota information isn't IMHO so performance critical that it would be worth the complications of the code. I think the numbers address my concern adequately ;) Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH][try6] VFS: new want_holesize and got_holesize buffer_head flags for fiemap

2014-10-26 Thread Dave Chinner
Hi Christoph. Can you send a link to the thread regarding Dave's iomap proposal? I don't recall it offhand, so I don't know what it was or why it was never implemented. I assume you mean Dave Chinner. Maybe it's time to revisit the concept as a long-term solution.

Re: [Cluster-devel] [PATCH 03/12] xfs: Set allowed quota types

2014-10-06 Thread Dave Chinner
On Wed, Oct 01, 2014 at 09:31:25PM +0200, Jan Kara wrote: > We support user, group, and project quotas. Tell VFS about it. > > CC: x...@oss.sgi.com > CC: Dave Chinner > Signed-off-by: Jan Kara > --- > fs/xfs/xfs_super.c | 2 ++ > 1 file changed, 2 insertions(+)

Re: [Cluster-devel] [RFC PATCH 0/2] dirreadahead system call

2014-08-05 Thread Dave Chinner
On Fri, Aug 01, 2014 at 07:54:56AM +0200, Andreas Dilger wrote: > On Aug 1, 2014, at 1:53, Dave Chinner wrote: > > On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote: > >> None of these issues are relevant in the API that I'm thinking about. > >> The

Re: [Cluster-devel] [RFC PATCH 0/2] dirreadahead system call

2014-07-31 Thread Dave Chinner
On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote: > On Jul 31, 2014, at 6:49, Dave Chinner wrote: > > > >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote: > >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das wrote: > >>> O

Re: [Cluster-devel] [RFC PATCH 0/2] dirreadahead system call

2014-07-30 Thread Dave Chinner
…contains enough information to construct a valid file handle in userspace and so access to inodes found via bulkstat can be gained via the XFS open-by-handle interfaces. Again, this bypasses permissions checking and hence is a root-only operation. It does, however, avoid TOCTOU races because the open-by-handle will fail if the inode is unlinked and reallocated between the bulkstat call and the open-by-handle, as the generation number in the handle will no longer match that of the inode. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
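The generic equivalent of the XFS open-by-handle interfaces mentioned, sketched (the wrapper name is hypothetical; XFS also has its own ioctl variants). It requires CAP_DAC_READ_SEARCH, matching the root-only behaviour described:

```c
#define _GNU_SOURCE
#include <fcntl.h>

int open_inode_by_handle(int mount_fd, struct file_handle *fh)
{
	/* Fails with ESTALE if the inode was unlinked and reallocated,
	 * because the generation number in the handle no longer
	 * matches - the TOCTOU protection described above. */
	return open_by_handle_at(mount_fd, fh, O_RDONLY);
}
```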

Re: [Cluster-devel] [RFC PATCH 1/2] fs: Add dirreadahead syscall and VFS hooks

2014-07-30 Thread Dave Chinner
…type to which readahead() can be applied. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-30 Thread Dave Chinner
On Mon, Jul 28, 2014 at 08:22:22AM -0400, Abhijith Das wrote: > > > - Original Message - > > From: "Dave Chinner" > > To: "Zach Brown" > > Cc: "Abhijith Das" , linux-ker...@vger.kernel.org, > > "linux-fsdeve

Re: [Cluster-devel] [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-30 Thread Dave Chinner
On Mon, Jul 28, 2014 at 03:21:20PM -0600, Andreas Dilger wrote: > On Jul 25, 2014, at 6:38 PM, Dave Chinner wrote: > > On Fri, Jul 25, 2014 at 10:52:57AM -0700, Zach Brown wrote: > >> On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote: > >>> Hi al

Re: [Cluster-devel] [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-25 Thread Dave Chinner
…that is being optimised here (i.e. queued, ordered, issued, cached), not the directory blocks themselves. As such, why does this need to be done in the kernel? This can all be done in userspace, and even hidden within the readdir() or ftw/nftw() implementations themselves so it's OS, kernel and filesystem independent... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
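A minimal sketch of that userspace approach (helper name hypothetical): worker threads stat() directory entries ahead of the consumer, so the inode reads are queued, issued and cached with no new kernel interface:

```c
#include <pthread.h>
#include <sys/stat.h>

/* Run one of these per entry from a small thread pool; the stat()
 * pulls the inode into the cache before the consumer needs it. */
static void *prefetch_inode(void *path)
{
	struct stat st;

	stat((const char *)path, &st);
	return NULL;
}
```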

Re: [Cluster-devel] [RFC 00/32] making inode time stamps y2038 ready

2014-06-03 Thread Dave Chinner
…time representation, and the kernel to be independent of the physical filesystem time encoding... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
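The in-kernel representation this pointed towards, shown for illustration (timespec64 as later adopted by the VFS):

```c
/* 64-bit seconds survive 2038 on 32-bit architectures; each
 * filesystem translates this to/from its own on-disk encoding. */
struct timespec64 {
	time64_t	tv_sec;		/* seconds */
	long		tv_nsec;	/* nanoseconds */
};
```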

Re: [Cluster-devel] [PATCH 14/18] xfs: use generic posix ACL infrastructure

2013-12-02 Thread Dave Chinner
On Sun, Dec 01, 2013 at 03:59:17AM -0800, Christoph Hellwig wrote: > Also create inodes with the proper mode instead of fixing it up later. > > Signed-off-by: Christoph Hellwig Nice cleanup work, Christoph. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] GFS2: Use list_lru for quota lru list

2013-09-17 Thread Dave Chinner
…is taken. Indeed, why do you even need to remove the item from the LRU list when you get a reference to it? You skip referenced dquots in the isolation callback, so the only time it needs to be removed from the LRU is on reclaim. And that means you only need an atomic_dec_and_test() to determine if you need to add the dquot to the LRU. So what it appears to me that you need to do is:
a) separate the dq_lru_lock > dq_lock changes into a separate patch
b) separate the object reference counting from the LRU operations
c) make the LRU operations the innermost operations for locking purposes
d) convert to list_lru operations...
Cheers, Dave. -- Dave Chinner dchin...@redhat.com
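The refcounting split suggested in (b)-(d) might look like this (a sketch only; the field and function names are hypothetical):

```c
static void qd_put(struct gfs2_quota_data *qd)
{
	/* The LRU is touched only when the last reference drops, so
	 * taking a reference never needs the LRU lock (point c). */
	if (atomic_dec_and_test(&qd->qd_count))
		list_lru_add(&gfs2_qd_lru, &qd->qd_lru);
}
```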

Re: [Cluster-devel] [PATCH 4/4] fs: remove obsolete simple_strto

2012-12-17 Thread Dave Chinner
On Fri, Dec 07, 2012 at 05:25:19PM +0530, Abhijit Pawar wrote: > This patch replaces the obsolete simple_strto with kstrto… The XFS changes look fine. Consider those: Acked-by: Dave Chinner -- Dave Chinner da...@fromorbit.com
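The shape of such a conversion, for illustration (function and variable names hypothetical):

```c
static int example_store(const char *buf)
{
	unsigned long val;
	int ret;

	/* obsolete style: parse errors were silently ignored */
	val = simple_strtoul(buf, NULL, 10);

	/* replacement: kstrtoul() returns -EINVAL/-ERANGE on bad input */
	ret = kstrtoul(buf, 10, &val);
	if (ret)
		return ret;
	return 0;
}
```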

Re: [Cluster-devel] GFS2: introduce AIL lock

2011-03-13 Thread Dave Chinner
…tree for the next merge. No objections. I just did a quick check of the patch again and I can't see anything obviously wrong with it, so queue it up ;) Cheers, Dave. -- Dave Chinner dchin...@redhat.com

Re: [Cluster-devel] [PATCH 4/4] gfs2: introduce AIL lock

2010-02-05 Thread Dave Chinner
On Fri, Feb 05, 2010 at 11:11:48AM +, Steven Whitehouse wrote: > Hi, > > On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote: > > THe log lock is currently used to protect the AIL lists and > > the movements of buffers into and out of them. The lists > > ar

Re: [Cluster-devel] [PATCH 2/4] gfs2: ordered writes are backwards

2010-02-05 Thread Dave Chinner
…eed. I'm taking small steps first, though. ;) Cheers, Dave. -- Dave Chinner dchin...@redhat.com

[Cluster-devel] (no subject)

2010-02-04 Thread Dave Chinner
These patches improve sequential write IO patterns and reduce ordered write log contention. The first patch is simply for diagnosis purposes - it enabled me to see where IO was being dispatched from, and led directly to the fix in the second patch. The third patch removes the use of WRITE_SYNC_PLUG

[Cluster-devel] [PATCH 1/4] gfs2: add IO submission trace points

2010-02-04 Thread Dave Chinner
Useful for tracking down where specific IOs are being issued from. Signed-off-by: Dave Chinner --- fs/gfs2/log.c | 6 ++ fs/gfs2/lops.c | 6 ++ fs/gfs2/trace_gfs2.h | 41 + 3 files changed, 53 insertions(+), 0 deletions(-)
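For illustration, the general shape of such a tracepoint (the event and field names here are hypothetical, not the ones in the patch):

```c
TRACE_EVENT(gfs2_log_write,
	TP_PROTO(const struct gfs2_sbd *sdp, sector_t blkno),
	TP_ARGS(sdp, blkno),
	TP_STRUCT__entry(
		__field(dev_t, dev)
		__field(sector_t, blkno)
	),
	TP_fast_assign(
		__entry->dev = sdp->sd_vfs->s_dev;
		__entry->blkno = blkno;
	),
	TP_printk("dev %u,%u blkno %llu",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long long)__entry->blkno)
);
```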

[Cluster-devel] [PATCH 4/4] gfs2: introduce AIL lock

2010-02-04 Thread Dave Chinner
…throughput. On the no-op scheduler on a disk that can do 85MB/s, this increases the write rate from 65MB/s with the ordering fixes to 75MB/s. Signed-off-by: Dave Chinner --- fs/gfs2/glops.c | 10 -- fs/gfs2/incore.h | 1 + fs/gfs2/log.c | 32 +--- fs…

[Cluster-devel] [PATCH 2/4] gfs2: ordered writes are backwards

2010-02-04 Thread Dave Chinner
…ordered buffers to the tail of the ordered buffer list to ensure that IO is dispatched in the order it was submitted. This should significantly improve large sequential write speeds. On a disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for noop and from 38MB/s to 50MB/s for cfq. Signed-off-by:
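The essence of the fix, for illustration (field names approximate, not the literal patch):

```c
	/* Queue new ordered buffers at the tail so IO is dispatched
	 * in the order it was submitted, not reversed. */
	list_add_tail(&bd->bd_le.le_list, &sdp->sd_log_le_ordered);
```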

[Cluster-devel] [PATCH 3/4] gfs2: ordered buffer writes are not sync

2010-02-04 Thread Dave Chinner
…make sure that all the IO is issued by unplugging the device. The use of normal WRITEs for these buffers should significantly reduce the overhead of processing in the cfq elevator and enable the disk subsystem to get much closer to disk bandwidth for large sequential writes. Signed-off-by: Dave Ch
