On Tue, Sep 26, 2017 at 11:35:48AM +0200, Jan Kara wrote:
> On Tue 26-09-17 09:38:12, Dave Chinner wrote:
> > On Mon, Sep 25, 2017 at 05:13:58PM -0600, Ross Zwisler wrote:
> > > Before support for the per-inode DAX flag was disabled the XFS code
> > > had
>
On Thu, Sep 21, 2017 at 09:43:41AM +0300, Amir Goldstein wrote:
> On Thu, Sep 21, 2017 at 1:22 AM, Dave Chinner wrote:
> > [cc lkml, PeterZ and Byungchul]
> ...
> > The thing is, this IO completion has nothing to do with the lower
> > filesystem - it's the IO complet
On Thu, Sep 21, 2017 at 05:47:14PM +0900, Byungchul Park wrote:
> On Thu, Sep 21, 2017 at 08:22:56AM +1000, Dave Chinner wrote:
> > Peter, this is the sort of false positive I mentioned were likely to
> > occur without some serious work to annotate the IO stack to prevent
> >
sem, then we have a deadlock vector.
Historically we've avoided any mm/ level interactions under the
ILOCK_EXCL because of its location in the page fault path locking
order (e.g. lockdep will go nuts if we take a page fault with the
ILOCK held). Hence I'm extremely wary of putting any other mm/ level
locks under the ILOCK like this without a clear explanation of the
locking orders and why it won't deadlock.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
off/on
dax for the things that didn't/did work with DAX correctly so they
didn't need multiple filesystems on pmem to segregate the apps that
did/didn't work with DAX...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
re lots of applications out there that rely on
these semantics for performance.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
> file_accessed(iocb->ki_filp);
> -
> - xfs_ilock(ip, XFS_IOLOCK_SHARED);
> - ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> - xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> -
> - return ret;
> + return iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
This puts file_accessed under the XFS_IOLOCK_SHARED now. Is that a
safe/sane thing to do for DIO?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
erious work to annotate the IO stack to prevent
them. We can nest multiple layers of IO completions and locking in
the IO stack via things like loop and RAID devices. They can be
nested to arbitrary depths, too (e.g. loop on fs on loop on fs on
dm-raid on n * (loop on fs) on bdev) so this new completion lockdep
checking is going to be a source of false positives until there is
an effective (and simple!) way of providing context based completion
annotations to avoid them...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote:
> On 9/18/17 4:31 PM, Dave Chinner wrote:
> > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> >>> On Mon, Sep 18, 2017 at 08:26:
lem triage.
Yes, the first invalidation should also have a comment like the post
IO invalidation - the comment probably got dropped and not noticed
when the changeover from internal XFS code to generic iomap code was
made...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
being triggered.
It needs to be on by default, but I'm sure we can wrap it with
something like an xfs_alert_tag() type of construct so the tag can
be set in /proc/fs/xfs/panic_mask to suppress it if testers so
desire.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
sharing of multiply
referenced data blocks. I don't see overlay being involved in this
functionality at all.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Sep 11, 2017 at 09:07:13PM +0100, Al Viro wrote:
> On Mon, Sep 11, 2017 at 04:44:40PM +1000, Dave Chinner wrote:
>
> > > iov_iter_get_pages() for pipe-backed destination does page allocation
> > > and inserts freshly allocated pages into pipe.
> >
> >
On Mon, Sep 11, 2017 at 04:32:22AM +0100, Al Viro wrote:
> On Mon, Sep 11, 2017 at 10:31:13AM +1000, Dave Chinner wrote:
>
> > splice does not go down the direct IO path, so iomap_dio_actor()
> > should never be handed a pipe as the destination for the IO data.
> > In
On Mon, Sep 11, 2017 at 12:07:23AM +0100, Al Viro wrote:
> On Mon, Sep 11, 2017 at 08:08:14AM +1000, Dave Chinner wrote:
> > On Sun, Sep 10, 2017 at 10:19:07PM +0100, Al Viro wrote:
> > > On Mon, Sep 11, 2017 at 07:11:10AM +1000, Dave Chinner wrote:
> > > > On Sun, Se
On Sun, Sep 10, 2017 at 10:19:07PM +0100, Al Viro wrote:
> On Mon, Sep 11, 2017 at 07:11:10AM +1000, Dave Chinner wrote:
> > On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote:
> > > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote:
> > >
> >
't end up chasing ghosts when we see that
warning in the logs. The usual vector is an app that mixes
concurrent DIO with mmap access to the same file, which we
explicitly say "don't do this because data corruption" in the
open(2) man page.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
sys 0m25.524s
4k random write with direct IO. 5GB file. Probably got a million 4k
extents in it. Which means XFS has sent a million tiny 4k discards
to the device. Run 'xfs_bmap -vvp fio_test_file.*' to confirm.
Don't use "-o discard" if you care about performance.
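As a rough sanity check of the figures above, the sketch below computes the worst-case extent count, assuming every 4k write ends up as its own extent (the actual count is whatever xfs_bmap reports). The helper name is made up for illustration and is not part of any XFS tool.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case number of extents in a file if every
 * io_size write lands in its own extent. With "-o discard" each of those
 * extents can turn into a separate small discard.
 */
uint64_t worst_case_extents(uint64_t file_bytes, uint64_t io_size)
{
	assert(io_size > 0);
	/* Round up so a partial trailing block still counts as an extent. */
	return (file_bytes + io_size - 1) / io_size;
}
```

With a 5GiB file and 4KiB writes, worst_case_extents(5ULL << 30, 4096) comes to 1310720, which is in line with the "million 4k extents" estimate.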
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Sep 07, 2017 at 04:19:00PM -0600, Ross Zwisler wrote:
> On Fri, Sep 08, 2017 at 08:12:01AM +1000, Dave Chinner wrote:
> > On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote:
> > > On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:
> > >
then the only
hammer we have is Brutus^Wdrop_caches. That's not an option for
production machines.
Neat idea, but one I'd already thought of and discarded as "not
practical from an admin perspective".
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
's warning that the pipe buffer is already full before we
try to read from the filesystem?
That doesn't seem like an XFS problem - it indicates the pipe we are
filling in generic_file_splice_read() is not being emptied by
whatever we are splicing the file data to.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
From: Dave Chinner
The cluster_info structure is allocated with kvzalloc(), which can
return kmalloc'd or vmalloc'd memory. It must be paired with
kvfree(), but sys_swapon uses vfree(), resulting in this warning
from xfstests generic/357:
[ 1985.294915] swapon: swapfile has holes
[ 1
> + bool ordered;
> > > +
> > > + aborted = !!(lip->li_flags & XFS_LI_ABORTED);
> > > + hold = !!(bip->bli_flags & XFS_BLI_HOLD);
> > > + dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
> > > + ordered = !!(bip->bli_flags
On Wed, Aug 30, 2017 at 12:14:03AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote:
> > Right, I've looked at btrees, too, but it's more complex than just
> > using an rbtree. I originally looked at using Peter Z's old
>
> seemed like too large a CC list. :) I can explicitly add the xfs list
> to the first three for any future versions.
If you are touching multiple filesystems, you really should cc the
entire patchset to linux-fsdevel, similar to how you sent the entire
patchset to lkml. That way the entire series will end up on a list
that almost all fs developers read. LKML is not a list you can rely
on all filesystem developers reading (or developers in any other
subsystem, for that matter)...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Aug 29, 2017 at 05:45:36AM -0700, Christoph Hellwig wrote:
> On Tue, Aug 29, 2017 at 10:31:26PM +1000, Dave Chinner wrote:
> > Probably should. I've already been looking at killing the inline
> > extents array to simplify the management of the extent list (much
>
ling the inline data would get rid of the other
part of the union the inline data sits in.
OTOH, if we're going to have to dynamically allocate the memory for
the extent/inline data for the data fork, it may just be easier to
make the entire data fork a dynamic allocation (like the attr fork).
different context. So this patch
> loses the kswapd context.
Yup. That's what the code does, and removing the PF_KSWAPD from it
will break it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Aug 22, 2017 at 11:06:03AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 22, 2017 at 03:46:03PM +1000, Dave Chinner wrote:
> > Even if I ignore the fact that buffer completions are run on
> > different workqueues, there seems to be a bigger problem with this
> > sort o
ch problems. i.e. the inode locks we hold at this point
in the truncate process (i.e. the XFS_IOLOCK a.k.a i_rwsem) prevent
new IO from being run, and we don't start the truncate until we've
waited for all in progress IO to complete. Hence while the truncate
runs and blocks on metadata IO completions, no data IO can be in
progress on that inode, so there are no completions being run on that
inode in workqueues.
And therefore the IO completion deadlock path reported by lockdep
can not actually be executed during a truncate, and so it's a false
positive.
Back to the drawing board, I guess.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
d_create() to manage shared
access to anonymous tmpfs files and will EINVAL on any fd that
points to a real file.
Oh, even more problematic:
Seals are a property of an inode. [...] Furthermore, seals
can never be removed, only added.
That seems somewhat difficult to reconcile with how I need
F_SEAL_IOMAP to operate.
/me calls it a day and goes looking for the hard liquor.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
e seal is going to be broken
by the filesystem via the break_layouts() interface, and the break
then blocks until the app releases the lease? So the seal lifetime
is bounded by the lease?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Aug 11, 2017 at 07:31:54PM -0700, Darrick J. Wong wrote:
> On Sat, Aug 12, 2017 at 10:30:34AM +1000, Dave Chinner wrote:
> > On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote:
> > > On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner wrote:
> > > > On T
On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote:
> On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner wrote:
> > On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
> >> From falloc.h:
> >>
> >> FALLOC_FL_SEAL_BLOCK_MAP is u
e user
downgrades their kernel the swapfile suddenly can not be used by the
older kernel.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
just one
thing - having the seal operation also modify the extent map
means it's not useful for the use cases where we need the extent map
to remain unmodified.
Thoughts?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
d rather than
discussion and review being shut down because "Christoph shouted
nasty words at me but I still don't understand why?".
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Aug 04, 2017 at 04:43:50PM -0700, Dan Williams wrote:
> On Fri, Aug 4, 2017 at 4:31 PM, Dave Chinner wrote:
> > On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
> >> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> >> index fe0f8f
allocate(FALLOC_FL_[UN]SEAL_BLOCK_MAP). Support for toggling this
> > on-disk state is saved for a later patch.
> >
> > Cc: Jan Kara
> > Cc: Jeff Moyer
> > Cc: Christoph Hellwig
> > Cc: Ross Zwisler
> > Suggested-by: Dave Chinner
> > Sugges
ling, so we've already guaranteed that it won't have holes
in it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
can't imagine why anyone would want to turn a swap file back into a regular
> file.
> I haven't fully followed DAX, but I'd take your word for it if people want to
> be able to remove the flag after.
DAX isn't the driver of that functionality, it's the other use cases
that need it, and why the proposed "only remove flag if len == 0"
API is a non-starter.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
code will now fail
to allocate/zero anything...
IOWs, this flag should be the last thing that is set on the inode
once it's been fully allocated and zeroed.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
the file is mapped or
not. Perhaps it would be better to start with a man page
documenting the desired API?
FWIW, the if/else if/else structure could be cleaned up with a
simple "goto out_unlock" construct such as:
	/* don't make immutable if inode is currently mapped */
	error = -EBUSY;
	if (mapping_mapped(mapping))
		goto out_unlock;

	/* can't do anything if inode is already immutable */
	error = -ETXTBSY;
	if (IS_IMMUTABLE(inode) || IS_IOMAP_IMMUTABLE(inode))
		goto out_unlock;

	/* XFS only supports whole file extent immutability */
	error = -EINVAL;
	if (len != i_size_read(inode))
		goto out_unlock;

	/* all good to go */
	error = 0;

out_unlock:
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	i_mmap_unlock_read(mapping);
	if (error)
		return error;

	/* now unshare, allocate and add immutable flag */
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jul 18, 2017 at 05:28:14PM -0700, David Rientjes wrote:
> On Tue, 18 Jul 2017, Dave Chinner wrote:
>
> > > Thanks for looking into this, Dave!
> > >
> > > The number of GFP_NOFS allocations that build up the deferred counts can
> > > be unboun
On Mon, Jul 17, 2017 at 01:37:35PM -0700, David Rientjes wrote:
> On Mon, 17 Jul 2017, Dave Chinner wrote:
>
> > > This is a side effect of super_cache_count() returning the appropriate
> > > count but super_cache_scan() refusing to do anything about it and
> >
n..
OTOH, if we don't damp down the deferred count scanning on small
deltas, then we end up with filesystem caches being trashed in light
memory pressure conditions. This is, generally speaking, bad for
workloads that rely on filesystem caches for performance (e.g. git,
NFS servers, etc).
What we have now is effectively a brute force solution that finds
a decent middle ground most of the time. It's not perfect, but I'm
yet to find a better solution.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Jun 21, 2017 at 09:07:57PM -0700, Andy Lutomirski wrote:
> On Wed, Jun 21, 2017 at 5:02 PM, Dave Chinner wrote:
> >
> > You seem to be calling the "fdatasync on every page fault" the
>
> It's the opposite of fdatasync(). It needs to sync whatever m
On Tue, Jun 20, 2017 at 10:18:24PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 20, 2017 at 6:40 PM, Dave Chinner wrote:
> >> A per-inode
> >> count of the number of live DAX mappings or of the number of struct
> >> file instances that have requested DAX would work
> > +
> > +SYSCALL_DEFINE3(daxctl, const char __user *, path, int, flags, int, align)
>
> I was /about/ to grouse about this syscall, then realized that maybe it
> /is/ useful to be able to check a specific alignment. Maybe not, since
> I had something more permanent in mind anyway. In any case, just pass
> in an opened fd if this sticks around.
We can do all that via fallocate(), too...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jun 20, 2017 at 06:24:03PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 21, 2017 at 09:53:46AM +1000, Dave Chinner wrote:
> > On Tue, Jun 20, 2017 at 09:17:36AM -0700, Dan Williams wrote:
> > > An immutable-extent DAX-file and a reflink-capable DAX-file are not
> >
On Tue, Jun 20, 2017 at 09:14:24AM -0700, Andy Lutomirski wrote:
> On Tue, Jun 20, 2017 at 3:11 AM, Dave Chinner wrote:
> > On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote:
> >> On Mon, Jun 19, 2017 at 5:46 PM, Dave Chinner wrote:
> >> > On Mon, Ju
n.
However, we cannot guarantee that no writes occur to the inode with
immutable extent maps (especially as the whole point is to allow
userspace writes and commits without the kernel being involved), so
extent sharing on immutable extent maps cannot be allowed...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 19, 2017 at 5:46 PM, Dave Chinner wrote:
> > On Mon, Jun 19, 2017 at 08:22:10AM -0700, Andy Lutomirski wrote:
> >> Second: syncing extents. Here's a straw man. Forget the mmap() flag.
>
On Mon, Jun 19, 2017 at 08:22:10AM -0700, Andy Lutomirski wrote:
> On Mon, Jun 19, 2017 at 6:21 AM, Dave Chinner wrote:
> > On Sat, Jun 17, 2017 at 10:05:45PM -0700, Andy Lutomirski wrote:
> >> On Sat, Jun 17, 2017 at 8:15 PM, Dan Williams
> >> wrote:
> >>
ly like this:
>
> if (metadata is dirty) {
> up_write(&mmap_sem);
> sync the metadata;
> down_write(&mmap_sem);
> return 0; /* retry the fault */
> } else {
> return whatever success code;
> }
How do you know that there is dependent filesystem metadata that
needs syncing at a level that you can safely manipulate the
mmap_sem? And how, exactly, do you do this without races? It'd be
trivial to DOS such retryable DAX faults simply by touching the file
in a tight loop in a separate process...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote:
> On Sun 02-04-17 09:05:26, Dave Chinner wrote:
> > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
> > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > > > On Thu, 2017-
ven know there was a crash at mount time because their
architecture always leaves a consistent filesystem on disk (e.g. COW
filesystems).
> I wonder if repeated crashes can lead to any odd corner cases.
Without defined, locked down behaviour of the superblock counter,
corner cases will almost certainly exist...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ync() into ->getattr() (and dealt with all
the locking issues that entails), by the time the statx syscall
returns to userspace the i_version value may not match the
data/metadata in the inode(*). IOWs, by the time i_version gets
to userspace, it is out of date and any use of it for data
versioning from userspace is going to be prone to race conditions.
Cheers,
Dave.
(*) fiemap has exactly the same "stale the moment internal fs
locks are released" race conditions, which is why it cannot safely
be used for mapping holes when copying file data.
--
Dave Chinner
da...@fromorbit.com
as the NFS
clients are accessing and requiring synchronisation.
> Not sure how big a problem that really is.
This coherency problem has always existed on the server side...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
e asking if we should try
> kmem_zalloc(4 pages), then kmem_zalloc(1 page), and only then switch to
> the __vmalloc calls?
Just call kmem_zalloc_large() for 4 pages without a fallback on
failure - that's exactly how we handle allocations for things like
the 64k xattr buffers.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Mar 03, 2017 at 03:19:12PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote:
> > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > Even though kmem_zalloc_gr
<= minsize)
> kmsize = minsize;
> }
Seems wrong to me - this function used to have lots of callers and
over time we've slowly removed them or replaced them with something
else. I'd suggest removing it completely, replacing the call sites
with kmem_zalloc_large().
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ject code. Any change to code in this
area needs to be gone over with a fine tooth comb, because bugs can
result in filesystem and/or journal corruption issues that may not
be noticed until a system crashes and log recovery fails and the
user loses their entire filesystem....
Hence the repeated comments about needing to actually test the code
you are changing.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
that the object
is not referenced by anyone (that's b_hold). i.e. b_lru_ref is an
"active reference weighting" used to provide a hierarchical reclaim
bias toward less important metadata objects, and has no bearing on
the actual active users of the object.
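A minimal userspace sketch of the distinction drawn above, using hypothetical struct and function names (this is not the XFS buffer cache code): hold counts active users, while lru_ref only says how many reclaim passes an unused object survives before it can be freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached object; fields only loosely mirror b_hold/b_lru_ref. */
struct cached_obj {
	int hold;	/* active references: object is in use while > 0 */
	int lru_ref;	/* reclaim weighting: passes survived once unused */
};

/*
 * One reclaim pass over an object. Returns true if the object may be
 * freed now, false if it has to stay cached for at least another pass.
 */
bool reclaim_pass(struct cached_obj *obj)
{
	assert(obj->hold >= 0 && obj->lru_ref >= 0);

	/* Active users always win, whatever the weighting says. */
	if (obj->hold > 0)
		return false;

	/* Unused but still weighted: age it and give it another pass. */
	if (obj->lru_ref > 0) {
		obj->lru_ref--;
		return false;
	}

	/* Unused and weighting exhausted: reclaimable. */
	return true;
}
```

An object created with a higher lru_ref simply takes more passes to become reclaimable, which is the bias toward keeping more important metadata; the weighting never says anything about who is currently using the object.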
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
> situations.
I'm missing something: how do you overflow a log item object
reference count?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Feb 15, 2017 at 10:41:40PM +, Ben Hutchings wrote:
> 3.16.40-rc1 review patch. If anyone has any objections, please let me know.
>
> --
>
> From: Dave Chinner
>
> commit 541d48f05fa1c19a4a968d38df685529e728a20a upstream.
>
> oss.sgi.com
dit of the
caller paths is done and we're 100% certain that there are no
lurking deadlocks.
For example, I'm pretty sure we can call into _xfs_buf_map_pages()
outside of a transaction context but with an inode ILOCK held
exclusively. If we then recurse into memory reclaim and try to run a
transaction during reclaim, we have an inverted ILOCK vs transaction
locking order. i.e. we are not allowed to call xfs_trans_reserve()
with an ILOCK held as that can deadlock the log: log full, locked
inode pins tail of log, inode cannot be flushed because ILOCK is
held by caller waiting for log space to become available.
i.e. there are certain situations where holding an ILOCK is a
deadlock vector. See xfs_lock_inodes() for an example of the lengths
we go to avoid ILOCK based log deadlocks like this...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Dec 23, 2016 at 09:33:36AM +1100, Dave Chinner wrote:
> On Fri, Dec 23, 2016 at 09:15:00AM +1100, Dave Chinner wrote:
> > On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote:
> > > Ok, so the numa issue was a red herring. With that fixed:
> > >
>
On Fri, Dec 23, 2016 at 09:15:00AM +1100, Dave Chinner wrote:
> On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote:
> > Ok, so the numa issue was a red herring. With that fixed:
> >
> > On Thu, Dec 22, 2016 at 1:06 PM, Dave Chinner wrote:
> > >
> >
On Thu, Dec 22, 2016 at 01:10:19PM -0800, Linus Torvalds wrote:
> Ok, so the numa issue was a red herring. With that fixed:
>
> On Thu, Dec 22, 2016 at 1:06 PM, Dave Chinner wrote:
> >
> > Better, but still bad. average files/s is not up to 200k files/s,
> > so still
On Fri, Dec 23, 2016 at 07:42:40AM +1100, Dave Chinner wrote:
> On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner wrote:
> > >
> > > This sort of thing is normally indicative of a memory reclaim or
>
On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner wrote:
> >
> > This sort of thing is normally indicative of a memory reclaim or
> > lock contention problem. Profile showed unusual spinlock contention,
> >
On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner wrote:
> >
> > There may be deeper issues. I just started running scalability tests
> > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > di
> report, so I'm not really sure what's going on here anyway.
http://www.gossamer-threads.com/lists/linux/kernel/2587485
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Dec 22, 2016 at 04:13:22PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> > On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > > Hi,
> > >
> > > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner
On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > Hi,
> >
> > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner wrote:
> > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris L
iscsi guys
seem to have bounced it and no-one is looking at it.
I'm disappearing for several months at the end of tomorrow, so I
thought I better make sure you know about it. I've also added
linux-scsi, linux-block to the cc list.
Cheers,
Dave.
> On Thu, Dec 15, 2016 at 09:29
ROT_WRITE, fd, 0);
> >
> > *(p + 42) = 0xDEADBEEF;
> > asm { clflush; } /* or whatever */
> >
> > ...so perhaps it would be a good idea to design the fallocate primitive
> > around "prepare this fd for mmap-only pmem semantics" and let the
> > backend do zeroing and inode flag changes as necessary to make it
> > happen. We'd need to do some bikeshedding about what the other falloc
> > flags mean when we're dealing with pmem files and devices, but I think
> > we should try to keep the userland presentation the same unless there's
> > a really good reason not to.
>
> It would be interesting to use fallocate to size device-dax files...
No. device-dax needs to die, not poison a bunch of existing file and
block device APIs and behaviours with special snowflakes. Get
DAX-enabled filesystems to do what you need, and get rid of this
ugly, nasty hack.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > Now that the page al
the unnecessary KM_NOFS allocations
in one go. I've never liked whack-a-mole style changes like this -
do it once, do it properly.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
| 1 +
include/linux/iomap.h | 28 +-
include/linux/lockdep.h| 25 +-
kernel/locking/lockdep.c | 20 +-
73 files changed, 1994 insertions(+), 2063 deletions(-)
--
Dave Chinner
da...@fromorbit.com
On Thu, Dec 15, 2016 at 09:24:11AM +1100, Dave Chinner wrote:
> Hi folks,
>
> Just updated my test boxes from 4.9 to a current Linus 4.10 merge
> window kernel to test the XFS merge I am preparing for Linus.
> Unfortunately, all my test VMs using iscsi failed pretty much
> inst
00 00 00 00 e9 ad fe ff ff 48 8b 7b 30 e8 da e7 ca
ff 8b 53 10 44 89 ee 48 89 df 2b 53 14 48 89 43 30 c7 43 40 00 00 00 00 <8b
[ 160.300674] RIP: iscsi_tcp_segment_done+0x20d/0x2e0 RSP: c9083c38
[ 160.301584] CR2: 000c
Known problem, or something new?
Cheers,
Dave.
--
D
is XFS's version of
kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
functionality will have to play these memalloc_noio_save/
memalloc_noio_restore games to ensure they are GFP_NOFS safe.
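The save/restore pattern referred to above can be sketched in userspace roughly as follows. This is only an analogy under stated assumptions: the flag, noio_save(), noio_restore(), alloc_may_do_io() and the wrappers are hypothetical stand-ins, not the kernel's memalloc_noio_save()/memalloc_noio_restore() implementation, and malloc() stands in for the underlying allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a per-task "allocations must not start IO" flag (hypothetical). */
static bool task_noio;

/* Set the flag and hand back the previous state so nested scopes unwind correctly. */
bool noio_save(void)
{
	bool old = task_noio;

	task_noio = true;
	return old;
}

/* Put the flag back to whatever the matching noio_save() returned. */
void noio_restore(bool old)
{
	task_noio = old;
}

/* What an allocator would check before starting IO to reclaim memory. */
bool alloc_may_do_io(void)
{
	return !task_noio;
}

/* Run an allocation with IO forbidden, then restore the caller's context. */
void *alloc_noio_scope(void *(*alloc_fn)(size_t), size_t size)
{
	bool old = noio_save();
	void *p;

	assert(!alloc_may_do_io());
	p = alloc_fn(size);
	noio_restore(old);
	return p;
}

/* Convenience wrapper using malloc() as the underlying allocator. */
void *malloc_noio(size_t size)
{
	return alloc_noio_scope(malloc, size);
}
```

Returning the old state from the save call, rather than blindly clearing the flag on exit, is what lets such scopes nest without the inner one re-enabling IO for the outer one.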
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
option), but
> > DAX
> > and XFS at least require FS_IOMAP to behave correctly.
> >
> > If you made DAX a FS selectable option instead of a user selectable one,
> > when
> > would a FS know it needs to include DAX support?
>
> With a user-selectable DAX knob per-filesystem, XFS_DAX, EXT4_DAX, etc...
That's just silly. Requiring users to configure every filesystem
that can support DAX to support DAX at config time is unneeded
config space bloat. DAX has an iomap config dependency, so just
select it when DAX is selected - everything else should just be
automatic and nobody else needs to care what build dependencies
DAX has.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Nov 28, 2016 at 03:46:51PM -0700, Ross Zwisler wrote:
> On Fri, Nov 25, 2016 at 02:00:59PM +1100, Dave Chinner wrote:
> > On Wed, Nov 23, 2016 at 11:44:19AM -0700, Ross Zwisler wrote:
> > > Tracepoints are the standard way to capture debugging and tracing
> > > i
On Sun, Nov 27, 2016 at 04:58:43PM -0800, Linus Torvalds wrote:
> On Sun, Nov 27, 2016 at 2:42 PM, Dave Chinner wrote:
> >
> > And that's exactly why we need a method of marking tracepoints as
> > stable. How else are we going to know whether a specific tracepoint
> >
re decides to use it in
userspace" policy.
> But tracing actual high-level things like IO and faults? I think that
> makes perfect sense, as long as the data that is collected is also the
> actual event data, and not so much a random implementation issue of
> the day.
IME, a tracepoint that doesn't expose detailed context specific
information isn't really useful for complex problem diagnosis...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Nov 25, 2016 at 04:14:19AM +, Al Viro wrote:
> [Linus Cc'd]
>
> On Fri, Nov 25, 2016 at 01:49:18PM +1100, Dave Chinner wrote:
> > > they have become parts of stable userland ABI and are to be maintained
> > > indefinitely. Don't expect "
umber like
so:
xfs_ilock:dev 8:96 ino 0x493 flags ILOCK_EXCL
This way we can filter the output easily across both dax and
filesystem tracepoints with 'grep "ino 0x493"'...
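To illustrate why a shared "ino 0x..." field makes filtering easy, here is a small sketch that formats an event in the style shown above and matches it on the inode token. The helper names are invented for the example, and unlike a plain grep the matcher also rejects longer inode numbers that merely start with the same digits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format an event as "<name>:dev M:m ino 0x<ino> <detail>". Hypothetical helper. */
int format_event(char *buf, size_t len, const char *name,
		 unsigned int major, unsigned int minor,
		 unsigned long long ino, const char *detail)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s:dev %u:%u ino 0x%llx %s",
			name, major, minor, ino, detail);
}

/* True if the line carries exactly this inode in an "ino 0x..." token. */
bool line_matches_ino(const char *line, unsigned long long ino)
{
	char token[32];
	const char *hit;
	size_t n;

	n = (size_t)snprintf(token, sizeof(token), "ino 0x%llx", ino);

	for (hit = strstr(line, token); hit != NULL;
	     hit = strstr(hit + 1, token)) {
		/* Reject prefixes such as "ino 0x4930" when looking for 0x493. */
		if (hit[n] == ' ' || hit[n] == '\0')
			return true;
	}
	return false;
}
```

Because every subsystem would emit the same token, one matcher (or one grep pattern) picks out all events for an inode regardless of which tracepoint produced them.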
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
tential stable ABI
> you might have to keep around forever. It's *not* a glorified debugging
> printk.
trace_printk() is the glorified debugging printk for tracing, not
trace events.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
purposes. My "pmem" test VM always has at least 2
ranges set to give me two discrete pmem devices, and I have used 4
from time to time to do things like test multi-volume scratch XFS
filesystems in xfstests (i.e. data, log and realtime volumes) so I
didn't need to play games with partitioning or DM...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Nov 22, 2016 at 10:39:29AM +, David Howells wrote:
> Dave Chinner wrote:
>
> > No. Just provide a 64 bit high resolution field, and define it to
> > contain nanoseconds. When we need higher resolution to be exported
> > to userspace, we use a /feature f
It doesn't take much vision to extend the current hardware
capabilities with coherent hardware accelerators (e.g. as has been
added to the Power platform) writing directly into pmem storage and
providing higher resolution timestamps than the CPU can generate.
Call me silly if you want - I don't care - but let's not ignore the
emerging storage technology trends that are there for everyone to
see...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Nov 18, 2016 at 10:54:02PM +, David Howells wrote:
> Dave Chinner wrote:
>
> > And when we start thinking in those timeframes, an
> > increase in timestamp resolution of at least another 10e-3 is
> > likely
>
> Is it, though? To be useful, sur
On Fri, Nov 18, 2016 at 09:48:21PM +, David Howells wrote:
> Dave Chinner wrote:
>
> > > Btw, can you point me at the manpage that defines the fsxattr struct and
> > > its
> > > flags?
> >
> > man xfsctl is the original source. However,
>
On Thu, Nov 17, 2016 at 08:28:57PM -0700, Andreas Dilger wrote:
> On Nov 17, 2016, at 4:40 PM, Dave Chinner wrote:
> >>
> >> Time fields are split into separate seconds and nanoseconds fields to make
> >> packing easier and the granularities can be queried with t
On Fri, Nov 18, 2016 at 09:43:38AM +, David Howells wrote:
> Dave Chinner wrote:
> > > Fields in struct statx come in a number of classes:
> > >
> > > (0) stx_dev_*, stx_blksize.
> > >
> > > These are local system information and are
On Fri, Nov 18, 2016 at 10:29:04AM +, David Howells wrote:
> Dave Chinner wrote:
>
> > > (13) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
> > > Note that the Linux IOC flags are a mess and filesystems such as Ext4
> > >
> The file is built automatically if CONFIG_SAMPLES is enabled.
Can we get xfstests written to exercise and validate all this
functionality, please? I'd suggest that adding xfs_io support for
the statx syscall would be far more useful for xfstests than a
standalone test program, too. We already have equivalent stat()
functionality in xfs_io and that's used quite a bit in xfstests.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com