de stable, so must
be the directory modification done during file creation.
This has nothing to do with POSIX or what the "linux standard" is -
this is testing whether the implementation of strictly ordered
metadata journalling is correct or not. If gfs2 does not have
strictly ordered metadata journalling, then it probably shouldn't
run these tests.
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
ct iomap_ops and have
existing implementations set them up as iomap_write_begin()/
iomap_write_end(). Then gfs2 can do its special little extra bit
and then call iomap_write_end() in the one call...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Hence this will now always report unwritten extents as
data . This strikes me as a regression as we currently report them
as a hole:
$ xfs_io -f -c "truncate 1m" -c "falloc 0 1m" -c "seek -a -r 0" foo
Whence Result
HOLE    0
$
I'm pretty sure that ext4 has the same behaviour when it comes to
dirty page cache pages over unwritten extents...
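For anyone unfamiliar with the seek semantics under discussion, here is an illustrative userspace sketch (Python, Linux-only; not part of the patch being reviewed). The exact offsets reported depend on the filesystem: extent-aware filesystems skip the leading hole, while the generic fallback reports the whole file as data.

```python
import os
import tempfile

# Create a sparse file: a 1 MiB hole followed by 4 KiB of real data.
fd, path = tempfile.mkstemp()
try:
    os.lseek(fd, 1024 * 1024, os.SEEK_SET)
    os.write(fd, b"x" * 4096)

    # SEEK_DATA from offset 0 skips the leading hole on filesystems
    # that track extents; others may report the whole file as data.
    data_off = os.lseek(fd, 0, os.SEEK_DATA)
    print("first data at", data_off)

    # SEEK_HOLE from the start of the data lands at or after its end
    # (there is always an implicit hole at EOF).
    hole_off = os.lseek(fd, data_off, os.SEEK_HOLE)
    print("next hole at", hole_off)
finally:
    os.close(fd)
    os.unlink(path)
```

The assertions in the thread are about where unwritten (fallocated) extents fall on this data/hole spectrum; this sketch only probes a plain sparse file.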
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
dit of the
caller paths is done and we're 100% certain that there are no
lurking deadlocks.
For example, I'm pretty sure we can call into _xfs_buf_map_pages()
outside of a transaction context but with an inode ILOCK held
exclusively. If we then recurse into memory reclaim and try to run a
transaction during reclaim, we have an inverted ILOCK vs transaction
locking order. i.e. we are not allowed to call xfs_trans_reserve()
with an ILOCK held as that can deadlock the log: log full, locked
inode pins tail of log, inode cannot be flushed because ILOCK is
held by caller waiting for log space to become available.
i.e. there are certain situations where holding a ILOCK is a
deadlock vector. See xfs_lock_inodes() for an example of the lengths
we go to avoid ILOCK based log deadlocks like this...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > Now that the page al
the unnecessary KM_NOFS allocations
in one go. I've never liked whack-a-mole style changes like this -
do it once, do it properly
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Nov 02, 2016 at 09:37:00AM +, Steven Whitehouse wrote:
> Hi,
>
> On 31/10/16 20:07, Dave Chinner wrote:
> >On Sat, Oct 29, 2016 at 10:24:45AM +0100, Steven Whitehouse wrote:
> >>On 28/10/16 20:29, Bob Peterson wrote:
> >>>+ if (create)
> >>
essary to do if it is already known
what ranges of the file contain zeros...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jun 28, 2016 at 10:13:32AM +0100, Steven Whitehouse wrote:
> Hi,
>
> On 28/06/16 03:08, Dave Chinner wrote:
> >On Fri, Jun 24, 2016 at 02:50:11PM -0500, Bob Peterson wrote:
> >>This patch adds a new prune_icache_sb function for the VFS slab
> >>shrinker
erblock shrinker for
the above reasons - it's far too easy for people to get badly wrong.
If there are specific limitations on how inodes can be freed, then
move the parts of inode *freeing* that cause problems to a different
context via the ->evict/destroy callouts and trigger that external
context processing on demand. That external context can just do bulk
"if it is on the list then free it" processing, because the reclaim
policy has already been executed to place that inode on the reclaim
list.
This is essentially what XFS does, but it also uses the
->nr_cached_objects/->free_cached_objects() callouts in the
superblock shrinker to provide the reclaim rate feedback mechanism
required to throttle incoming memory allocations.
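The defer-to-an-external-context pattern described above can be sketched in userspace. This is a toy model (Python, invented names), not the XFS implementation: eviction only queues, and all the problematic freeing work happens in a separate context:

```python
from collections import deque

class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.freed = False

reclaim_list = deque()

def evict(inode):
    # Reclaim policy has already run by the time we get here: just
    # queue the inode. No filesystem locks or allocations are needed,
    # so this is safe to call from shrinker context.
    reclaim_list.append(inode)

def reclaim_worker():
    # External context: bulk "if it is on the list then free it"
    # processing, run at a time of the filesystem's choosing.
    freed = 0
    while reclaim_list:
        inode = reclaim_list.popleft()
        inode.freed = True   # stands in for the real teardown work
        freed += 1
    return freed

inodes = [Inode(i) for i in range(4)]
for ino in inodes:
    evict(ino)
freed = reclaim_worker()
print("freed", freed, "inodes")
```

What this sketch cannot show is the feedback half of the design: the ->nr_cached_objects/->free_cached_objects callouts are what let the shrinker see the queued work and throttle allocations against it.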
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
but do
not destroy/free it - you simply queue it to an internal list and
then do the cleanup/freeing in your own time?
i.e. why do you need a special callout just to defer freeing to
another thread when we already have hooks that enable you to do
this?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, May 01, 2016 at 08:19:44AM +1000, NeilBrown wrote:
> On Sat, Apr 30 2016, Dave Chinner wrote:
> > Indeed, blocking the superblock shrinker in reclaim is a key part of
> > balancing inode cache pressure in XFS. If the shrinker starts
> > hitting dirty inodes, it bl
in place, I'd then make the changes to the generic
superblock shrinker code to enable finer grained reclaim and
optimise the XFS shrinkers to make use of it...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
hread can't keep up
with all of the allocation pressure that occurs. e.g. a 20-core
intel CPU with local memory will be seen as a single node and so
will have a single kswapd thread to do reclaim. There's a massive
imbalance between maximum reclaim rate and maximum allocation rate
in situations like this. If we want memory reclaim to run faster,
we need to be able to do more work *now*, not defer it to a context with
limited execution resources.
i.e. IMO deferring more work to a single reclaim thread per node is
going to limit memory reclaim scalability and performance, not
improve it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Apr 27, 2016 at 10:03:11AM +0200, Michal Hocko wrote:
> On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT
't actually care about in XFS at all. That way I can carry all
the XFS changes in the XFS tree and not have to worry about when
this stuff gets merged or conflicts with the rest of the work that
is being done to the mm/ code and whatever tree that eventually
lands in...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ing
to restart the flood of false positive lockdep warnings we've
silenced over the years, so perhaps lockdep needs to be made smarter
as well...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
igned-off-by: Eryu Guan
> ---
>
> I noticed this when running LTP on overlayfs, setxattr03 failed due to
> unexpected EACCES on immutable inode.
This should be in the commit message itself, rather than "EPERM
looks more reasonable".
Other than that, change seems fine to me.
Slab shrinker calls into vfs inode shrinker to free inodes from memory.
> 7. dlm blocks on a pending fence operation. Goto 1.
Therefore, the fence operation should be doing GFP_NOFS allocations
to prevent re-entry into the DLM via the filesystem via the shrinker
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
cific plugging problem you've identified (i.e. do_direct_IO() is
flushing far too frequently) rather than making a sweeping
generalisation that the IO stack plugging infrastructure
needs fundamental change?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
e3142..c3ac5ec 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -196,7 +196,7 @@ xfs_bulkstat_ichunk_ra(
>&xfs_inode_buf_ops);
> }
> }
> - blk_finish_plug(&plug);
> + blk_finish
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
> On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
> > On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
> > > Freezing and thawing are separate system calls, task which is supposed
That should be a separate patch, sent to the scheduler maintainers
for review. AFAICT, it isn't part of the user API - it's not defined
in the man page which just says "can be up to 16 bytes".
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Feb 04, 2015 at 09:49:50AM +, Steven Whitehouse wrote:
> Hi,
>
> On 04/02/15 07:13, Oleg Drokin wrote:
> >Hello!
> >
> >On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
> >>>I also wonder if vmalloc is still very slow? That was the case some
On Wed, Feb 04, 2015 at 02:13:29AM -0500, Oleg Drokin wrote:
> Hello!
>
> On Feb 3, 2015, at 5:33 PM, Dave Chinner wrote:
> >> I also wonder if vmalloc is still very slow? That was the case some
> >> time ago when I noticed a problem in directory access times in gfs2,
On Mon, Feb 02, 2015 at 10:30:29AM +, Steven Whitehouse wrote:
> Hi,
>
> On 02/02/15 08:11, Dave Chinner wrote:
> >On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote:
> >>Hello!
> >>
> >>On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:
On Mon, Feb 02, 2015 at 01:57:23AM -0500, Oleg Drokin wrote:
> Hello!
>
> On Feb 2, 2015, at 12:37 AM, Dave Chinner wrote:
>
> > On Sun, Feb 01, 2015 at 10:59:54PM -0500, gr...@linuxhacker.ru wrote:
> >> From: Oleg Drokin
> >>
> >> leaf_dealloc u
gly and grotesque, but we've got no
other way to limit reclaim context because the MM devs won't pass
the vmalloc gfp context down the stack to the PTE allocations
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Jan 21, 2015 at 11:23:20PM +0100, Jan Kara wrote:
> On Thu 22-01-15 08:38:26, Dave Chinner wrote:
> > On Fri, Jan 16, 2015 at 01:47:34PM +0100, Jan Kara wrote:
> > > Hello,
> > >
> > > this is another iteration of patches to unify VFS and XFS quota
f copies from 3 to 2 brings just 2%
> improvement in speed in my test setup and getting quota information isn't IMHO
> so performance critical that it would be worth the complications of the code.
I think the numbers address my concern adequately ;)
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
> Hi Christoph. Can you send a link to the thread regarding Dave's iomap
> proposal? I don't recall it offhand, so I don't know what it was or
> why it was never implemented. I assume you mean Dave Chinner. Maybe it's
> time to revisit the concept as a long-term solution.
On Wed, Oct 01, 2014 at 09:31:25PM +0200, Jan Kara wrote:
> We support user, group, and project quotas. Tell VFS about it.
>
> CC: x...@oss.sgi.com
> CC: Dave Chinner
> Signed-off-by: Jan Kara
> ---
> fs/xfs/xfs_super.c | 2 ++
> 1 file changed, 2 insertions(+)
On Fri, Aug 01, 2014 at 07:54:56AM +0200, Andreas Dilger wrote:
> On Aug 1, 2014, at 1:53, Dave Chinner wrote:
> > On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
> >> None of these issues are relevant in the API that I'm thinking about.
> >> The
On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
> On Jul 31, 2014, at 6:49, Dave Chinner wrote:
> >
> >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote:
> >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das wrote:
> >>> O
contains enough information to construct a
valid file handle in userspace and so access to inodes found via
bulkstat can be gained via the XFS open-by-handle interfaces. Again,
this bypasses permissions checking and hence is a root-only
operation. It does, however, avoid TOCTOU races because the open-by-handle
will fail if the inode is unlinked and reallocated between the
bulkstat call and the open-by-handle as the generation number in the
handle will no longer match that of the inode.
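The generation-number check can be modelled with a toy sketch (Python; invented structures, not the real XFS bulkstat/open-by-handle ioctls). Reallocating an inode number bumps its generation, so a handle taken before the unlink stops matching:

```python
# Toy inode table: ino -> (generation, data). The generation counter
# survives unlink, so reallocating an inode number bumps it and any
# stale handle stops matching (the kernel returns ESTALE here).
inode_table = {}
generations = {}

def alloc_inode(ino, data):
    gen = generations.get(ino, 0) + 1
    generations[ino] = gen
    inode_table[ino] = (gen, data)
    return (ino, gen)            # this (ino, gen) pair is the "handle"

def open_by_handle(handle):
    ino, gen = handle
    if ino not in inode_table or inode_table[ino][0] != gen:
        raise FileNotFoundError("stale handle")
    return inode_table[ino][1]

handle = alloc_inode(42, "original file")
assert open_by_handle(handle) == "original file"

# Unlink and reallocate inode 42 between the bulkstat call and the
# open-by-handle - the TOCTOU window described above:
del inode_table[42]
alloc_inode(42, "unrelated new file")

try:
    open_by_handle(handle)
    stale_detected = False
except FileNotFoundError:
    stale_detected = True
print("stale handle rejected:", stale_detected)
```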
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
pe to which readahead() can be
applied.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Jul 28, 2014 at 08:22:22AM -0400, Abhijith Das wrote:
>
>
> - Original Message -
> > From: "Dave Chinner"
> > To: "Zach Brown"
> > Cc: "Abhijith Das" , linux-ker...@vger.kernel.org,
> > "linux-fsdeve
On Mon, Jul 28, 2014 at 03:21:20PM -0600, Andreas Dilger wrote:
> On Jul 25, 2014, at 6:38 PM, Dave Chinner wrote:
> > On Fri, Jul 25, 2014 at 10:52:57AM -0700, Zach Brown wrote:
> >> On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote:
> >>> Hi al
that is being optimised
here (i.e. queued, ordered, issued, cached), not the directory
blocks themselves. As such, why does this need to be done in the
kernel? This can all be done in userspace, and even hidden within
the readdir() or ftw()/nftw() implementations themselves so it's OS,
kernel and filesystem independent..
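The userspace approach suggested above (collect the directory entries, order them, then issue the stat() calls) can be sketched like this in Python. Sorting by inode number is only a heuristic: on many filesystems it approximates on-disk order, turning random metadata IO into something closer to sequential:

```python
import os
import tempfile

def stat_dir_in_inode_order(path):
    # Collect all entries first, then sort by inode number before
    # issuing the stat() calls - no kernel or filesystem support
    # needed, and the policy lives entirely in userspace.
    entries = sorted(os.scandir(path), key=lambda e: e.inode())
    return [(e.name, e.stat(follow_symlinks=False).st_size)
            for e in entries]

with tempfile.TemporaryDirectory() as d:
    for name in ("c", "a", "b"):
        with open(os.path.join(d, name), "w") as f:
            f.write(name)
    results = stat_dir_in_inode_order(d)
    print(results)
```

A library could hide exactly this inside its readdir()/ftw() wrappers, which is the point being made: the ordering optimisation does not require a new syscall.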
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
me representation, and the kernel
to be independent of the physical filesystem time encoding
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, Dec 01, 2013 at 03:59:17AM -0800, Christoph Hellwig wrote:
> Also create inodes with the proper mode instead of fixing it up later.
>
> Signed-off-by: Christoph Hellwig
Nice cleanup work, Christoph.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
t is taken. Indeed, why do you even need to remove the item from
the LRU list when you get a reference to it? you skip referenced
dquots in the isolation callback, so the only time it needs to be
removed from the LRU is on reclaim. And that means you only need an
atomic_dec_and_test() to determine if you need to add the dquot to
the LRU
So what it appears to me is that you need to do is:
a) separate the dq_lru_lock > dq_lock changes into a
separate patch
b) separate the object reference counting from the LRU
operations
c) make the LRU operations the innermost operations for
locking purposes
d) convert to list_lru operations...
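Point (b) and the atomic_dec_and_test() observation can be shown with a toy model (Python, single-threaded stand-in for the kernel's atomics; names invented): taking a reference never touches the LRU, and only the drop of the last reference makes the dquot LRU-eligible:

```python
class Dquot:
    def __init__(self, qid):
        self.qid = qid
        self.refcount = 0

lru = []

def dquot_get(dq):
    # Taking a reference does not remove the dquot from the LRU:
    # the isolation callback skips referenced objects anyway.
    dq.refcount += 1
    return dq

def dquot_put(dq):
    # Models atomic_dec_and_test(): only dropping the *last*
    # reference decides whether the dquot goes onto the LRU.
    dq.refcount -= 1
    if dq.refcount == 0 and dq not in lru:
        lru.append(dq)

dq = Dquot(1)
dquot_get(dq)
dquot_get(dq)
dquot_put(dq)            # still referenced: not on the LRU yet
assert dq not in lru
dquot_put(dq)            # last reference dropped: now LRU-eligible
print("on lru:", dq in lru)
```

With the reference counting untangled from the LRU like this, the remaining steps (innermost LRU locking, then list_lru conversion) become mechanical.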
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Fri, Dec 07, 2012 at 05:25:19PM +0530, Abhijit Pawar wrote:
> This patch replaces the obsolete simple_strto with kstrto
The XFS changes look fine. Consider those:
Acked-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
tree for the next merge.
No objections. I just did a quick check of the patch again and I
can't see anything obviously wrong with it, so queue it up ;)
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Fri, Feb 05, 2010 at 11:11:48AM +, Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2010-02-05 at 16:45 +1100, Dave Chinner wrote:
> > THe log lock is currently used to protect the AIL lists and
> > the movements of buffers into and out of them. The lists
> > ar
eed. I'm taking small steps first, though. ;)
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
These patches improve sequential write IO patterns and reduce ordered
write log contention.
The first patch is simply for diagnosis purposes - it enabled me to
see where IO was being dispatched from, and led directly to the fix
in the second patch. The third patch removes the use of WRITE_SYNC_PLUG
Useful for tracking down where specific IOs are being issued
from.
Signed-off-by: Dave Chinner
---
fs/gfs2/log.c        |  6 ++
fs/gfs2/lops.c       |  6 ++
fs/gfs2/trace_gfs2.h | 41 +
3 files changed, 53 insertions(+), 0 deletions
throughput. On no-op scheduler on a disk that can do 85MB/s,
this increases the write rate from 65MB/s with the ordering
fixes to 75MB/s.
Signed-off-by: Dave Chinner
---
fs/gfs2/glops.c  | 10 --
fs/gfs2/incore.h |  1 +
fs/gfs2/log.c    | 32 +---
fs
dered buffers to the tail of the ordered buffer list to
ensure that IO is dispatched in the order it was submitted. This
should significantly improve large sequential write speeds. On a
disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for
noop and from 38MB/s to 50MB/s for cfq.
Signed-off-by:
make sure that all the IO is issued by unplugging the
device. The use of normal WRITEs for these buffers should
significantly reduce the overhead of processing in the cfq elevator
and enable the disk subsystem to get much closer to disk bandwidth
for large sequential writes.
Signed-off-by: Dave Ch