Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-11 Thread Dave Chinner
On Mon, Dec 11, 2017 at 02:12:28PM -0800, Joe Perches wrote:
> On Tue, 2017-12-12 at 08:43 +1100, Dave Chinner wrote:
> > On Sat, Dec 09, 2017 at 09:00:18AM -0800, Joe Perches wrote:
> > > On Sat, 2017-12-09 at 09:36 +1100, Dave Chinner wrote:
> > > > 1. Using lockdep_set_novalidate_class() for anything other
> > > > than device->mutex will throw checkpatch warnings. Nice. (*)
> > > []
> > > > (*) checkpatch.pl is considered mostly harmful round here, too,
> > > > but that's another rant
> > > 
> > > How so?
> > 
> > Short story is that it barfs all over the slightly non-standard
> > coding style used in XFS.
> []
> > This sort of stuff is just lowest-common-denominator noise - great
> > for new code and/or inexperienced developers, but not for working
> > with large bodies of existing code with slightly non-standard
> > conventions.
> 
> Completely reasonable.  Thanks.
> 
> Do you get many checkpatch submitters for fs/xfs?

We used to. Not recently, though.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-11 Thread Dave Chinner
On Sun, Dec 10, 2017 at 08:23:15PM -0800, Matthew Wilcox wrote:
> On Mon, Dec 11, 2017 at 10:57:45AM +1100, Dave Chinner wrote:
> > i.e.  the fact the cmpxchg failed may not have anything to do with a
> > race condtion - it failed because the slot wasn't empty like we
> > expected it to be. There can be any number of reasons the slot isn't
> > empty - the API should not "document" that the reason the insert
> > failed was a race condition. It should document the case that we
> > "couldn't insert because there was an existing entry in the slot".
> > Let the surrounding code document the reason why that might have
> > happened - it's not for the API to assume reasons for failure.
> > 
> > i.e. this API and potential internal implementation makes much
> > more sense:
> > 
> > int
> > xa_store_iff_empty(...)
> > {
> > curr = xa_cmpxchg(>pag_ici_xa, agino, NULL, ip, GFP_NOFS);
> > if (!curr)
> > return 0;   /* success! */
> > if (!IS_ERR(curr))
> > return -EEXIST; /* failed - slot not empty */
> > return PTR_ERR(curr);   /* failed - XA internal issue */
> > }
> > 
> > as it replaces the existing preload and insert code in the XFS code
> > paths whilst letting us handle and document the "insert failed
> > because slot not empty" case however we want. It implies nothing
> > about *why* the slot wasn't empty, just that we couldn't do the
> > insert because it wasn't empty.
> 
> Yeah, OK.  So, over the top of the recent changes I'm looking at this:
> 
> I'm not in love with xa_store_empty() as a name.  I almost want
> xa_store_weak(), but after my MAP_FIXED_WEAK proposed name got shot
> down, I'm leery of it.  "empty" is at least a concept we already have
> in the API with the comment for xa_init() talking about an empty array
> and xa_empty().  I also considered xa_store_null and xa_overwrite_null
> and xa_replace_null().  Naming remains hard.
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 941f38bb94a4..586b43836905 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -451,7 +451,7 @@ xfs_iget_cache_miss(
>   int flags,
>   int lock_flags)
>  {
> - struct xfs_inode*ip, *curr;
> + struct xfs_inode*ip;
>   int error;
>   xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ino);
>   int iflags;
> @@ -498,8 +498,7 @@ xfs_iget_cache_miss(
>   xfs_iflags_set(ip, iflags);
>  
>   /* insert the new inode */
> - curr = xa_cmpxchg(>pag_ici_xa, agino, NULL, ip, GFP_NOFS);
> - error = __xa_race(curr, -EAGAIN);
> + error = xa_store_empty(>pag_ici_xa, agino, ip, GFP_NOFS, -EAGAIN);
>   if (error)
>   goto out_unlock;

Can we avoid encoding the error to be returned into the function
call? No other generic/library API does this, so this seems like a
highly unusual special snowflake approach. I just don't see how
creating a whole new error specification convention is justified
for the case where it *might* save a line or two of error checking
code in a caller?

I'd much prefer that the API defines the error on failure, and let
the caller handle that specific error however they need to like
every other library interface we have...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-11 Thread Dave Chinner
On Sat, Dec 09, 2017 at 09:00:18AM -0800, Joe Perches wrote:
> On Sat, 2017-12-09 at 09:36 +1100, Dave Chinner wrote:
> > 1. Using lockdep_set_novalidate_class() for anything other
> > than device->mutex will throw checkpatch warnings. Nice. (*)
> []
> > (*) checkpatch.pl is considered mostly harmful round here, too,
> > but that's another rant
> 
> How so?

Short story is that it barfs all over the slightly non-standard
coding style used in XFS.  It generates enough noise on incidental
things we aren't important that it complicates simple things. e.g. I
just moved a block of defines from one header to another, and
checkpatch throws about 10 warnings on that because of whitespace.
I'm just moving code - I don't want to change it and it doesn't need
to be modified because it is neat and easy to read and is obviously
correct. A bunch of prototypes I added another parameter to gets
warnings because it uses "unsigned" for an existing parameter that
I'm not changing. And so on.

This sort of stuff is just lowest-common-denominator noise - great
for new code and/or inexperienced developers, but not for working
with large bodies of existing code with slightly non-standard
conventions. Churning *lots* of code we otherwise wouldn't touch
just to shut up checkpatch warnings takes time, risks regressions
and makes it harder to trace the history of the code when we are
trying to track down bugs.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-10 Thread Dave Chinner
On Fri, Dec 08, 2017 at 03:01:31PM -0800, Matthew Wilcox wrote:
> On Thu, Dec 07, 2017 at 11:38:43AM +1100, Dave Chinner wrote:
> > > > cmpxchg is for replacing a known object in a store - it's not really
> > > > intended for doing initial inserts after a lookup tells us there is
> > > > nothing in the store.  The radix tree "insert only if empty" makes
> > > > sense here, because it naturally takes care of lookup/insert races
> > > > via the -EEXIST mechanism.
> > > > 
> > > > I think that providing xa_store_excl() (which would return -EEXIST
> > > > if the entry is not empty) would be a better interface here, because
> > > > it matches the semantics of lookup cache population used all over
> > > > the kernel
> > > 
> > > I'm not thrilled with xa_store_excl(), but I need to think about that
> > > a bit more.
> > 
> > Not fussed about the name - I just think we need a function that
> > matches the insert semantics of the code
> 
> I think I have something that works better for you than returning -EEXIST
> (because you don't actually want -EEXIST, you want -EAGAIN):
> 
> /* insert the new inode */
> -   spin_lock(>pag_ici_lock);
> -   error = radix_tree_insert(>pag_ici_root, agino, ip);
> -   if (unlikely(error)) {
> -   WARN_ON(error != -EEXIST);
> -   XFS_STATS_INC(mp, xs_ig_dup);
> -   error = -EAGAIN;
> -   goto out_preload_end;
> -   }
> -   spin_unlock(>pag_ici_lock);
> -   radix_tree_preload_end();
> +   curr = xa_cmpxchg(>pag_ici_xa, agino, NULL, ip, GFP_NOFS);
> +   error = __xa_race(curr, -EAGAIN);
> +   if (error)
> +   goto out_unlock;
> 
> ...
> 
> -out_preload_end:
> -   spin_unlock(>pag_ici_lock);
> -   radix_tree_preload_end();
> +out_unlock:
> +   if (error == -EAGAIN)
> +   XFS_STATS_INC(mp, xs_ig_dup);
> 
> I've changed the behaviour slightly in that returning an -ENOMEM used to
> hit a WARN_ON, and I don't think that's the right response -- GFP_NOFS
> returning -ENOMEM probably gets you a nice warning already from the
> mm code.

It's been a couple of days since I first looked at this, and my
initial reaction of "that's horrible!" hasn't changed.

What you are proposing here might be a perfectly reasonable
*internal implemention* of xa_store_excl(), but it makes for a
terrible external API because the sematics and behaviour are so
vague. e.g. what does "race" mean here with respect to an insert
failure?

i.e.  the fact the cmpxchg failed may not have anything to do with a
race condtion - it failed because the slot wasn't empty like we
expected it to be. There can be any number of reasons the slot isn't
empty - the API should not "document" that the reason the insert
failed was a race condition. It should document the case that we
"couldn't insert because there was an existing entry in the slot".
Let the surrounding code document the reason why that might have
happened - it's not for the API to assume reasons for failure.

i.e. this API and potential internal implementation makes much
more sense:

int
xa_store_iff_empty(...)
{
curr = xa_cmpxchg(>pag_ici_xa, agino, NULL, ip, GFP_NOFS);
if (!curr)
return 0;   /* success! */
if (!IS_ERR(curr))
return -EEXIST; /* failed - slot not empty */
return PTR_ERR(curr);   /* failed - XA internal issue */
}

as it replaces the existing preload and insert code in the XFS code
paths whilst letting us handle and document the "insert failed
because slot not empty" case however we want. It implies nothing
about *why* the slot wasn't empty, just that we couldn't do the
insert because it wasn't empty.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lockdep is less useful than it was

2017-12-08 Thread Dave Chinner
On Fri, Dec 08, 2017 at 10:14:38AM -0800, Matthew Wilcox wrote:
> At the moment, the radix tree actively disables the RCU checking that
> enabling lockdep would give us.  It has to, because it has no idea what
> lock protects any individual access to the radix tree.  The XArray can
> use the RCU checking because it knows that every reference is protected
> by either the spinlock or the RCU lock.
> 
> Dave was saying that he has a tree which has to be protected by a mutex
> because of where it is in the locking hierarchy, and I was vigorously
> declining his proposal of allowing him to skip taking the spinlock.

Oh, I wasn't suggesting that you remove the internal tree locking
because we need external locking.

I was trying to point out that the internal locking doesn't remove
the need for external locking,  and that there are cases where
smearing the internal lock outside the XA tree doesn't work, either.
i.e. internal locking doesn't replace all the cases where external
locking is required, and so it's less efficient than the existing
radix tree code.

What I was questioning was the value of replacing the radix tree
code with a less efficient structure just to add lockdep validation
to a tree that doesn't actually need any extra locking validation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Dave Chinner
On Fri, Dec 08, 2017 at 12:35:07PM -0500, Alan Stern wrote:
> On Fri, 8 Dec 2017, Byungchul Park wrote:
> 
> > I'm sorry to hear that.. If I were you, I would also get
> > annoyed. And.. thanks for explanation.
> > 
> > But, I think assigning lock classes properly and checking
> > relationship of the classes to detect deadlocks is reasonable.
> > 
> > In my opinion about the common lockdep stuff, there are 2
> > problems on it.
> > 
> > 1) Firstly, it's hard to assign lock classes *properly*. By
> > default, it relies on the caller site of lockdep_init_map(),
> > but we need to assign another class manually, where ordering
> > rules are complicated so cannot rely on the caller site. That
> > *only* can be done by experts of the subsystem.

Sure, but that's not the issue here. The issue here is the lack of
communication with subsystem experts and that the annotation
complexity warnings given immediately by the subsystem experts were
completely ignored...

> > I think if they want to get benifit from lockdep, they have no
> > choice but to assign classes manually with the domain knowledge,
> > or use *lockdep_set_novalidate_class()* to invalidate locks
> > making the developers annoyed and not want to use the checking
> > for them.
> 
> Lockdep's no_validate class is used when the locking patterns are too
> complicated for lockdep to understand.  Basically, it tells lockdep to
> ignore those locks.

Let me just point out two things here:

1. Using lockdep_set_novalidate_class() for anything other
than device->mutex will throw checkpatch warnings. Nice. (*)

2. lockdep_set_novalidate_class() is completely undocumented
- it's the first I've ever heard of this functionality. i.e.
nobody has ever told us there is a mechanism to turn off
validation of an object; we've *always* been told to "change
your code and/or fix your annotations" when discussing
lockdep deficiencies. (**)

> The device core uses that class.  The tree of struct devices, each with
> its own lock, gets used in many different and complicated ways.  
> Lockdep can't understand this -- it doesn't have the ability to
> represent an arbitrarily deep hierarchical tree of locks -- so we tell
^^

That largely describes the in-memory structure of XFS, except we
have a forest of lock trees, not just one

> it to ignore the device locks.
> 
> It sounds like XFS may need to do the same thing with its semaphores.

Who-ever adds semaphore checking to lockdep can add those
annotations. The externalisation of the development cost of new
lockdep functionality is one of the problems here.

-Dave.

(*) checkpatch.pl is considered mostly harmful round here, too,
but that's another rant

(**) the frequent occurrence of "core code/devs aren't held to the
same rules/standard as everyone else" is another rant I have stored
up for a rainy day.

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-07 Thread Dave Chinner
On Fri, Dec 08, 2017 at 01:45:52PM +0900, Byungchul Park wrote:
> On Fri, Dec 08, 2017 at 09:22:16AM +1100, Dave Chinner wrote:
> > On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote:
> > > On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote:
> > > > > Unfortunately for you, I don't find arguments along the lines of
> > > > > "lockdep will save us" at all convincing.  lockdep already throws
> > > > > too many false positives to be useful as a tool that reliably and
> > > > > accurately points out rare, exciting, complex, intricate locking
> > > > > problems.
> > > > 
> > > > But it does reliably and accurately point out "dude, you forgot to take
> > > > the lock".  It's caught a number of real problems in my own testing that
> > > > you never got to see.
> > > 
> > > The problem is that if it has too many false positives --- and it's
> > > gotten *way* worse with the completion callback "feature", people will
> > > just stop using Lockdep as being too annyoing and a waste of developer
> > > time when trying to figure what is a legitimate locking bug versus
> > > lockdep getting confused.
> > > 
> > > I can't even disable the new Lockdep feature which is throwing
> > > lots of new false positives --- it's just all or nothing.
> > > 
> > > Dave has just said he's already stopped using Lockdep, as a result.
> > 
> > This is compeltely OT, but FYI I stopped using lockdep a long time
> > ago.  We've spend orders of magnitude more time and effort to shut
> > up lockdep false positives in the XFS code than we ever have on
> > locking problems that lockdep has uncovered. And still lockdep
> > throws too many false positives on XFS workloads to be useful to me.
> > 
> > But it's more than that: I understand just how much lockdep *doesn't
> > check* and that means *I know I can't rely on lockdep* for potential
> > deadlock detection. e.g.  it doesn't cover semaphores, which means
> 
> Hello,
> 
> I'm careful in saying the following since you seem to feel not good at
> crossrelease and even lockdep. Now that cross-release has been
> introduced, semaphores can be covered as you might know. Actually, all
> general waiters can.

And all it will do is create a whole bunch more work for us XFS guys
to shut up all the the false positive crap that falls out from it
because the locking model we have is far more complex than any of
the lockdep developers thought was necessary to support, just like
happened with the XFS inode annotations all those years ago.

e.g. nobody has ever bothered to ask us what is needed to describe
XFS's semaphore locking model.  If you did that, you'd know that we
nest *thousands* of locked semaphores in compeltely random lock
order during metadata buffer writeback. And that this lock order
does not reflect the actual locking order rules we have for locking
buffers during transactions.

Oh, and you'd also know that a semaphore's lock order and context
can change multiple times during the life time of the buffer.  Say
we free a block and the reallocate it as something else before it is
reclaimed - that buffer now might have a different lock order. Or
maybe we promote a buffer to be a root btree block as a result of a
join - it's now the first buffer in a lock run, rather than a child.
Or we split a tree, and the root is now a node and so no longer is
the first buffer in a lock run. Or that we walk sideways along the
leaf nodes siblings during searches.  IOWs, there is no well defined
static lock ordering at all for buffers - and therefore semaphores -
in XFS at all.

And knowing that, you wouldn't simply mention that lockdep can
support semaphores now as though that is necessary to "make it work"
for XFS.  It's going to be much simpler for us to just turn off
lockdep and ignore whatever crap it sends our way than it is to
spend unplanned weeks of our time to try to make lockdep sorta work
again. Sure, we might get there in the end, but it's likely to take
months, if not years like it did with the XFS inode annotations.

> > it has zero coverage of the entire XFS metadata buffer subsystem and
> > the complex locking orders we have for metadata updates.
> > 
> > Put simply: lockdep doesn't provide me with any benefit, so I don't
> > use it...
> 
> Sad..

I don't think you understand. I'll try to explain.

The lockdep infrastructure by itself doesn't make lockdep a useful
tool - it mostly generates false positives because it has no
concept of locking models that don't match it's internal tracking
assumptions and/or limitations.

That means if we can't suppress the false positives, then l

Re: Lockdep is less useful than it was

2017-12-07 Thread Dave Chinner
On Thu, Dec 07, 2017 at 02:38:03PM -0800, Matthew Wilcox wrote:
> On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote:
> > The problem is that if it has too many false positives --- and it's
> > gotten *way* worse with the completion callback "feature", people will
> > just stop using Lockdep as being too annyoing and a waste of developer
> > time when trying to figure what is a legitimate locking bug versus
> > lockdep getting confused.
> > 
> > I can't even disable the new Lockdep feature which is throwing
> > lots of new false positives --- it's just all or nothing.
> 
> You *can* ... but it's way more hacking Kconfig than you ought to have
> to do (which is a separate rant ...)
> 
> You need to get LOCKDEP_CROSSRELEASE off.  I'd revert patches
> e26f34a407aec9c65bce2bc0c838fabe4f051fc6 and
> b483cf3bc249d7af706390efa63d6671e80d1c09
> 
> I think it was a mistake to force these on for everybody; they have a
> much higher false-positive rate than the rest of lockdep, so as you say
> forcing them on leads to fewer people using *any* of lockdep.
> 
> The bug you're hitting isn't Byungchul's fault; it's an annotation
> problem.  The same kind of annotation problem that we used to have with
> dozens of other places in the kernel which are now fixed.

That's one of the fundamental problem with lockdep - it throws the
difficulty of solving all these new false positives onto the
developers who know nothing about lockdep and don't follow it's
development. And until they do solve them - especially in critical
subsystems that everyone uses like the storage stack - lockdep is
essentially worthless.

> If you didn't
> have to hack Kconfig to get rid of this problem, you'd be happier, right?

I'd be much happier if it wasn't turned on by default in the first
place.  We gave plenty of warnings that there were still unsolved
false positive problems with the new checks in the storage stack.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-07 Thread Dave Chinner
On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote:
> On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote:
> > > Unfortunately for you, I don't find arguments along the lines of
> > > "lockdep will save us" at all convincing.  lockdep already throws
> > > too many false positives to be useful as a tool that reliably and
> > > accurately points out rare, exciting, complex, intricate locking
> > > problems.
> > 
> > But it does reliably and accurately point out "dude, you forgot to take
> > the lock".  It's caught a number of real problems in my own testing that
> > you never got to see.
> 
> The problem is that if it has too many false positives --- and it's
> gotten *way* worse with the completion callback "feature", people will
> just stop using Lockdep as being too annyoing and a waste of developer
> time when trying to figure what is a legitimate locking bug versus
> lockdep getting confused.
> 
> I can't even disable the new Lockdep feature which is throwing
> lots of new false positives --- it's just all or nothing.
> 
> Dave has just said he's already stopped using Lockdep, as a result.

This is compeltely OT, but FYI I stopped using lockdep a long time
ago.  We've spend orders of magnitude more time and effort to shut
up lockdep false positives in the XFS code than we ever have on
locking problems that lockdep has uncovered. And still lockdep
throws too many false positives on XFS workloads to be useful to me.

But it's more than that: I understand just how much lockdep *doesn't
check* and that means *I know I can't rely on lockdep* for potential
deadlock detection. e.g.  it doesn't cover semaphores, which means
it has zero coverage of the entire XFS metadata buffer subsystem and
the complex locking orders we have for metadata updates.

Put simply: lockdep doesn't provide me with any benefit, so I don't
use it...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-06 Thread Dave Chinner
On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 07:44:04PM +1100, Dave Chinner wrote:
> > On Tue, Dec 05, 2017 at 08:45:49PM -0800, Matthew Wilcox wrote:
> > > That said, using xa_cmpxchg() in the dquot code looked like the right
> > > thing to do?  Since we'd dropped the qi mutex and the ILOCK, it looks
> > > entirely reasonable for another thread to come in and set up the dquot.
> > > But I'm obviously quite ignorant of the XFS internals, so maybe there's
> > > something else going on that makes this essentially a "can't happen".
> > 
> > It's no different to the inode cache code, which drops the RCU
> > lock on lookup miss, instantiates the new inode (maybe reading it
> > off disk), then locks the tree and attempts to insert it. Both cases
> > use "insert if empty, otherwise retry lookup from start" semantics.
> 
> Ah.  I had my focus set a little narrow on the inode cache code and didn't
> recognise the pattern.
> 
> Why do you sleep for one jiffy after encountering a miss, then seeing
> someone else insert the inode for you?

The sleep is a backoff that allows whatever we raced with to
complete, be it a hit that raced with an inode being reclaimed and
removed, or a miss that raced with another insert. Ideally we'd
sleep on the XFS_INEW bit, similar to the vfs I_NEW flag, but it's
not quite that simple with the reclaim side of things...

> > cmpxchg is for replacing a known object in a store - it's not really
> > intended for doing initial inserts after a lookup tells us there is
> > nothing in the store.  The radix tree "insert only if empty" makes
> > sense here, because it naturally takes care of lookup/insert races
> > via the -EEXIST mechanism.
> > 
> > I think that providing xa_store_excl() (which would return -EEXIST
> > if the entry is not empty) would be a better interface here, because
> > it matches the semantics of lookup cache population used all over
> > the kernel
> 
> I'm not thrilled with xa_store_excl(), but I need to think about that
> a bit more.

Not fussed about the name - I just think we need a function that
matches the insert semantics of the code

> > > I'm quite happy to have normal API variants that don't save/restore
> > > interrupts.  Just need to come up with good names ... I don't think
> > > xa_store_noirq() is a good name, but maybe you do?
> > 
> > I'd prefer not to have to deal with such things at all. :P
> > 
> > How many subsystems actually require irq safety in the XA locking
> > code? Make them use irqsafe versions, not make everyone else use
> > "noirq" versions, as is the convention for the rest of the kernel
> > code
> 
> Hard to say how many existing radix tree users require the irq safety.

The mapping tree requires it because it gets called from IO
completion contexts to clear page writeback state, but I don't know
about any of the others.

> Also hard to say how many potential users (people currently using
> linked lists, people using resizable arrays, etc) need irq safety.
> My thinking was "make it safe by default and let people who know better
> have a way to opt out", but there's definitely something to be said for
> "make it fast by default and let people who need the unusual behaviour
> type those extra few letters".
> 
> So, you're arguing for providing xa_store(), xa_store_irq(), xa_store_bh()
> and xa_store_irqsafe()?  (at least on demand, as users come to light?)
> At least the read side doesn't require any variants; everybody can use
> RCU for read side protection.

That would follow the pattern of the rest of the kernel APIs, though
I think it might be cleaner to simply state the locking requirement
to xa_init() and keep all those details completely internal rather
than encoding them into API calls. After all, the "irqsafe-ness" of
the locking needs to be consistent across the entire XA instance

> ("safe", not "save" because I wouldn't make the caller provide the
> "flags" argument).
> 
> > > At least, not today.  One of the future plans is to allow xa_nodes to
> > > be allocated from ZONE_MOVABLE.  In order to do that, we have to be
> > > able to tell which lock protects any given node.  With the XArray,
> > > we can find that out (xa_node->root->xa_lock); with the radix tree,
> > > we don't even know what kind of lock protects the tree.
> > 
> > Yup, this is a prime example of why we shouldn't be creating
> > external dependencies by smearing the locking context outside the XA
> > structure itself. It's not a stretch to se

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-06 Thread Dave Chinner
On Tue, Dec 05, 2017 at 08:45:49PM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 02:14:56PM +1100, Dave Chinner wrote:
> > > The other conversions use the normal API instead of the advanced API, so
> > > all of this gets hidden away.  For example, the inode cache does this:
> > 
> > Ah, OK, that's not obvious from the code changes. :/
> 
> Yeah, it's a lot easier to understand (I think!) if you build the
> docs in that tree and look at
> file:///home/willy/kernel/xarray-3/Documentation/output/core-api/xarray.html
> (mutatis mutandi).  I've tried to tell a nice story about how to put
> all the pieces together from the normal to the advanced API.
> 
> > However, it's probably overkill for XFS. In all the cases, when we
> > insert there should be no entry in the tree - the
> > radix tree insert error handling code there was simply catching
> > "should never happen" cases and handling it without crashing.
> 
> I thought it was probably overkill to be using xa_cmpxchg() in the
> pag_ici patch.  I didn't want to take away your error handling as part
> of the conversion, but I think a rational person implementing it today
> would just call xa_store() and not even worry about the return value
> except to check it for IS_ERR().

*nod*

> That said, using xa_cmpxchg() in the dquot code looked like the right
> thing to do?  Since we'd dropped the qi mutex and the ILOCK, it looks
> entirely reasonable for another thread to come in and set up the dquot.
> But I'm obviously quite ignorant of the XFS internals, so maybe there's
> something else going on that makes this essentially a "can't happen".

It's no different to the inode cache code, which drops the RCU
lock on lookup miss, instantiates the new inode (maybe reading it
off disk), then locks the tree and attempts to insert it. Both cases
use "insert if empty, otherwise retry lookup from start" semantics.

cmpxchg is for replacing a known object in a store - it's not really
intended for doing initial inserts after a lookup tells us there is
nothing in the store.  The radix tree "insert only if empty" makes
sense here, because it naturally takes care of lookup/insert races
via the -EEXIST mechanism.

I think that providing xa_store_excl() (which would return -EEXIST
if the entry is not empty) would be a better interface here, because
it matches the semantics of lookup cache population used all over
the kernel

> > Now that I've looked at this, I have to say that having a return
> > value of NULL meaning "success" is quite counter-intuitive. That's
> > going to fire my "that looks so wrong" detector every time I look at
> > the code and notice it's erroring out on a non-null return value
> > that isn't a PTR_ERR case
> 
> It's the same convention as cmpxchg().  I think it's triggering your
> "looks so wrong" detector because it's fundamentally not the natural
> thing to write.

Most definitely the case, and this is why it's a really bad
interface for the semantics we have. This how we end up with code
that makes it easy for programmers to screw up pointer checks in
error handling... :/

> I'm quite happy to have normal API variants that don't save/restore
> interrupts.  Just need to come up with good names ... I don't think
> xa_store_noirq() is a good name, but maybe you do?

I'd prefer not to have to deal with such things at all. :P

How many subsystems actually require irq safety in the XA locking
code? Make them use irqsafe versions, not make everyone else use
"noirq" versions, as is the convention for the rest of the kernel
code

> > > It's the design pattern I've always intended to use.  Naturally, the
> > > xfs radix trees weren't my initial target; it was the page cache, and
> > > the page cache does the same thing; uses the tree_lock to protect both
> > > the radix tree and several other fields in that same data structure.
> > > 
> > > I'm open to argument on this though ... particularly if you have a better
> > > design pattern in mind!
> > 
> > I don't mind structures having internal locking - I have a problem
> > with leaking them into contexts outside the structure they protect.
> > That way lies madness - you can't change the internal locking in
> > future because of external dependencies, and the moment you need
> > something different externally we've got to go back to an external
> > lock anyway.
> > 
> > This is demonstrated by the way you converted the XFS dquot tree -
> > you didn't replace the dquot tree lock with the internal xa_lock
> > because it's a mutex and we have to sleep holding it. IOWs, we've
> > added another layer of locking here, not simplif

Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 06:02:08PM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 12:36:48PM +1100, Dave Chinner wrote:
> > > - if (radix_tree_preload(GFP_NOFS))
> > > - return -ENOMEM;
> > > -
> > >   INIT_LIST_HEAD(>list_node);
> > >   elem->key = key;
> > >  
> > > - spin_lock(>lock);
> > > - error = radix_tree_insert(>store, key, elem);
> > > - radix_tree_preload_end();
> > > - if (!error)
> > > - _xfs_mru_cache_list_insert(mru, elem);
> > > - spin_unlock(>lock);
> > > + do {
> > > + xas_lock();
> > > + xas_store(, elem);
> > > + error = xas_error();
> > > + if (!error)
> > > + _xfs_mru_cache_list_insert(mru, elem);
> > > + xas_unlock();
> > > + } while (xas_nomem(, GFP_NOFS));
> > 
> > Ok, so why does this have a retry loop on ENOMEM despite the
> > existing code handling that error? And why put such a loop in this
> > code and not any of the other XFS code that used
> > radix_tree_preload() and is arguably much more important to avoid
> > ENOMEM on insert (e.g. the inode cache)?
> 
> If we need more nodes in the tree, xas_store() will try to allocate them
> with GFP_NOWAIT | __GFP_NOWARN.  If that fails, it signals it in xas_error().
> xas_nomem() will notice that we're in an ENOMEM situation, and allocate
> a node using your preferred GFP flags (NOIO in your case).  Then we retry,
> guaranteeing forward progress. [1]
> 
> The other conversions use the normal API instead of the advanced API, so
> all of this gets hidden away.  For example, the inode cache does this:
> 
> +   curr = xa_cmpxchg(>pag_ici_xa, agino, NULL, ip, GFP_NOFS);
> 
> and xa_cmpxchg internally does:
> 
> do {
> xa_lock_irqsave(xa, flags);
> curr = xas_create();
> if (curr == old)
> xas_store(, entry);
> xa_unlock_irqrestore(xa, flags);
> } while (xas_nomem(, gfp));

Ah, OK, that's not obvious from the code changes. :/

However, it's probably overkill for XFS. In all the cases, when we
insert there should be no entry in the tree - the
radix tree insert error handling code there was simply catching
"should never happen" cases and handling it without crashing.

Now that I've looked at this, I have to say that having a return
value of NULL meaning "success" is quite counter-intuitive. That's
going to fire my "that looks so wrong" detector every time I look at
the code and notice it's erroring out on a non-null return value
that isn't a PTR_ERR case

Also, there's no need for irqsave/restore() locking contexts here as
we never access these caches from interrupt contexts. That's just
going to be extra overhead, especially on workloads that run 10^6
inodes inodes through the cache every second. That's a problem
caused by driving the locks into the XA structure and then needing
to support callers that require irq safety

> > Also, I really don't like the pattern of using xa_lock()/xa_unlock()
> > to protect access to an external structure. i.e. the mru->lock
> > context is protecting multiple fields and operations in the MRU
> > structure, not just the radix tree operations. Turning that around
> > so that a larger XFS structure and algorithm is now protected by an
> > opaque internal lock from generic storage structure the forms part
> > of the larger structure seems like a bad design pattern to me...
> 
> It's the design pattern I've always intended to use.  Naturally, the
> xfs radix trees weren't my initial target; it was the page cache, and
> the page cache does the same thing; uses the tree_lock to protect both
> the radix tree and several other fields in that same data structure.
> 
> I'm open to argument on this though ... particularly if you have a better
> design pattern in mind!

I don't mind structures having internal locking - I have a problem
with leaking them into contexts outside the structure they protect.
That way lies madness - you can't change the internal locking in
future because of external dependencies, and the moment you need
something different externally we've got to go back to an external
lock anyway.

This is demonstrated by the way you converted the XFS dquot tree -
you didn't replace the dquot tree lock with the internal xa_lock
because it's a mutex and we have to sleep holding it. IOWs, we've
added another layer of locking here, not simplified the code.

What I really see here is that  we have inconsistent locking
patterns w.r.t. XA stores inside XFS - some have an external mutex
to cover a wider scope, some use xa_

Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 06:05:15PM -0800, Matthew Wilcox wrote:
> On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> > On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawil...@microsoft.com>
> > > 
> > > I looked through some notes and decided this was version 4 of the XArray.
> > > Last posted two weeks ago, this version includes a *lot* of changes.
> > > I'd like to thank Dave Chinner for his feedback, encouragement and
> > > distracting ideas for improvement, which I'll get to once this is merged.
> > 
> > BTW, you need to fix the "To:" line on your patchbombs:
> > 
> > > To: unlisted-recipients: ;, no To-header on input 
> > > <@gmail-pop.l.google.com> 
> > 
> > This bad email address getting quoted to the cc line makes some MTAs
> > very unhappy.
> 
> I know :-(  I was unhappy when I realised what I'd done.
> 
> https://marc.info/?l=git=151252237912266=2
> 
> > I'll give this a quick burn this afternoon and see what catches fire...
> 
> All of the things ... 0day gave me a 90% chance of hanging in one
> configuration.  Need to drill down on it more and find out what stupid
> thing I've done wrong this time.

Yup, Bad Stuff happened on boot:

[   24.548039] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   24.548978]  1-...!: (0 ticks this GP) idle=688/0/0 softirq=143/143 fqs=0
[   24.549926]  5-...!: (0 ticks this GP) idle=db8/0/0 softirq=120/120 fqs=0
[   24.550864]  6-...!: (0 ticks this GP) idle=d58/0/0 softirq=111/111 fqs=0
[   24.551802]  8-...!: (5 GPs behind) idle=514/0/0 softirq=189/189 fqs=0
[   24.552722]  10-...!: (84 GPs behind) idle=ac0/0/0 softirq=80/80 fqs=0
[   24.553617]  11-...!: (8 GPs behind) idle=cfc/0/0 softirq=95/95 fqs=0
[   24.554496]  13-...!: (8 GPs behind) idle=b0c/0/0 softirq=82/82 fqs=0
[   24.555382]  14-...!: (38 GPs behind) idle=a7c/0/0 softirq=93/93 fqs=0
[   24.556305]  15-...!: (4 GPs behind) idle=b18/0/0 softirq=88/88 fqs=0
[   24.557190]  (detected by 9, t=5252 jiffies, g=-178, c=-179, q=994)
[   24.558051] Sending NMI from CPU 9 to CPUs 1:
[   24.558703] NMI backtrace for cpu 1 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.559654] Sending NMI from CPU 9 to CPUs 5:
[   24.559675] NMI backtrace for cpu 5 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.560654] Sending NMI from CPU 9 to CPUs 6:
[   24.560689] NMI backtrace for cpu 6 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.561655] Sending NMI from CPU 9 to CPUs 8:
[   24.561701] NMI backtrace for cpu 8 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.562654] Sending NMI from CPU 9 to CPUs 10:
[   24.562675] NMI backtrace for cpu 10 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.563653] Sending NMI from CPU 9 to CPUs 11:
[   24.563669] NMI backtrace for cpu 11 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.564653] Sending NMI from CPU 9 to CPUs 13:
[   24.564670] NMI backtrace for cpu 13 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.565652] Sending NMI from CPU 9 to CPUs 14:
[   24.565674] NMI backtrace for cpu 14 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.566652] Sending NMI from CPU 9 to CPUs 15:
[   24.59] NMI backtrace for cpu 15 skipped: idling at 
native_safe_halt+0x2/0x10
[   24.567653] rcu_preempt kthread starved for 5256 jiffies! 
g18446744073709551438 c18446744073709551437 f0x0 RCU_GP_WAIT_FQS(3) 
->state=0x402 ->7
[   24.567654] rcu_preempt I15128 9  2 0x8000
[   24.567660] Call Trace:
[   24.567679]  ? __schedule+0x289/0x880
[   24.567681]  schedule+0x2f/0x90
[   24.567682]  schedule_timeout+0x152/0x370
[   24.567686]  ? __next_timer_interrupt+0xc0/0xc0
[   24.567689]  rcu_gp_kthread+0x561/0x880
[   24.567691]  ? force_qs_rnp+0x1a0/0x1a0
[   24.567693]  kthread+0x111/0x130
[   24.567695]  ? __kthread_create_worker+0x120/0x120
[   24.567697]  ret_from_fork+0x24/0x30
[   44.064092] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:854]
[   44.065920] CPU: 0 PID: 854 Comm: kswapd0 Not tainted 4.15.0-rc2-dgc #228
[   44.067769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.070030] RIP: 0010:smp_call_function_single+0xce/0x100
[   44.071521] RSP: :c90001d2fb20 EFLAGS: 0202 ORIG_RAX: 
ff11
[   44.073592] RAX:  RBX: 88013ab515c8 RCX: c9000350bb20
[   44.075560] RDX: 0001 RSI: c90001d2fb20 RDI: c90001d2fb20
[   44.077531] RBP: c90001d2fb50 R08: 0007 R09: 0080
[   44.079483] R10: c90001d2fb78 R11: c90001d2fb30 R12: c90001d2fc10
[   44.081465] R13: ea000449fc78 R14: ea000449fc58 R15: 88013ba36c40
[   44.083434] FS:  () GS:88013fc0() 
knlGS:
[   44.085683] CS:  0010 DS:  ES:  CR0

Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Wed, Dec 06, 2017 at 01:53:41AM +, Matthew Wilcox wrote:
> Huh, you've caught a couple of problems that 0day hasn't sent me yet.  Try 
> turning on DAX or TRANSPARENT_HUGEPAGE.  Thanks!

Dax is turned on, CONFIG_TRANSPARENT_HUGEPAGE is not.

Looks like nothing is setting CONFIG_RADIX_TREE_MULTIORDER, which is
what xas_set_order() is hidden under.

Ah, CONFIG_ZONE_DEVICE turns it on, not CONFIG_DAX/CONFIG_FS_DAX.

H.  That seems wrong if it's used in fs/dax.c...

$ grep DAX .config
CONFIG_DAX=y
CONFIG_FS_DAX=y
$ grep ZONE_DEVICE .config
CONFIG_ARCH_HAS_ZONE_DEVICE=y
$

So I have DAX enabled, but not ZONE_DEVICE? Shouldn't DAX be
selecting ZONE_DEVICE, not relying on a user to select both of them
so that stuff works properly? Hmmm - there's no menu option to turn
on zone device, so it's selected by something else?  Oh, HMM turns
on ZONE device. But that is "default y", so should be turned on. But
it's not?  And there's no obvious HMM menu config option, either

What a godawful mess Kconfig has turned into.

I'm just going to enable TRANSPARENT_HUGEPAGE - madness awaits me if
I follow the other path down the rat hole

Ok, it build this time.

-Dave.

> 
> > -Original Message-----
> > From: Dave Chinner [mailto:da...@fromorbit.com]
> > Sent: Tuesday, December 5, 2017 8:51 PM
> > To: Matthew Wilcox <wi...@infradead.org>
> > Cc: Matthew Wilcox <mawil...@microsoft.com>; Ross Zwisler
> > <ross.zwis...@linux.intel.com>; Jens Axboe <ax...@kernel.dk>; Rehas
> > Sachdeva <aquan...@gmail.com>; linux...@kvack.org; linux-
> > fsde...@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net; linux-
> > ni...@vger.kernel.org; linux-bt...@vger.kernel.org; 
> > linux-...@vger.kernel.org;
> > linux-usb@vger.kernel.org; linux-ker...@vger.kernel.org
> > Subject: Re: [PATCH v4 00/73] XArray version 4
> > 
> > On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> > > On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > > > From: Matthew Wilcox <mawil...@microsoft.com>
> > > >
> > > > I looked through some notes and decided this was version 4 of the 
> > > > XArray.
> > > > Last posted two weeks ago, this version includes a *lot* of changes.
> > > > I'd like to thank Dave Chinner for his feedback, encouragement and
> > > > distracting ideas for improvement, which I'll get to once this is 
> > > > merged.
> > >
> > > BTW, you need to fix the "To:" line on your patchbombs:
> > >
> > > > To: unlisted-recipients: ;, no To-header on input <@gmail-
> > pop.l.google.com>
> > >
> > > This bad email address getting quoted to the cc line makes some MTAs
> > > very unhappy.
> > >
> > > >
> > > > Highlights:
> > > >  - Over 2000 words of documentation in patch 8!  And lots more 
> > > > kernel-doc.
> > > >  - The page cache is now fully converted to the XArray.
> > > >  - Many more tests in the test-suite.
> > > >
> > > > This patch set is not for applying.  0day is still reporting problems,
> > > > and I'd feel bad for eating someone's data.  These patches apply on top
> > > > of a set of prepatory patches which just aren't interesting.  If you
> > > > want to see the patches applied to a tree, I suggest pulling my git 
> > > > tree:
> > > >
> > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgit.infrade
> > ad.org%2Fusers%2Fwilly%2Flinux-
> > dax.git%2Fshortlog%2Frefs%2Fheads%2Fxarray-2017-12-
> > 04=02%7C01%7Cmawilcox%40microsoft.com%7Ca3e721545f8b4b9dff1
> > 608d53c4bd42f%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6364
> > 81218740341312=IXNZXXLTf964OQ0eLDpJt2LCv%2BGGWFW%2FQd4Kc
> > KYu6zo%3D=0
> > > > I also left out the idr_preload removals.  They're still in the git 
> > > > tree,
> > > > but I'm not looking for feedback on them.
> > >
> > > I'll give this a quick burn this afternoon and see what catches fire...
> > 
> > Build warnings/errors:
> > 
> > .
> > lib/radix-tree.c:700:13: warning: ¿radix_tree_free_nodes¿ defined but not 
> > used
> > [-Wunused-function]
> >  static void radix_tree_free_nodes(struct radix_tree_node *node)
> > .
> > lib/xarray.c: In function ¿xas_max¿:
> > lib/xarray.c:291:16: warning: unused variable ¿mask¿
> > [-Wunused-variable]
> >   unsigned long mask, max = xas->xa_index;
> >   ^~~~
> > ..
> > fs/dax.c: In function ¿grab_mapping_entry¿:
> > fs/dax.c:305:2: error: implicit declaration of function ¿xas_set_order¿; 
> > did you
> > mean ¿xas_set_err¿?  [-Werror=implicit-function-declaration]
> >   xas_set_order(, index, size_flag ? PMD_ORDER : 0);
> > ^
> > scripts/Makefile.build:310: recipe for target 'fs/dax.o' failed
> > make[1]: *** [fs/dax.o] Error 1
> > 
> > -Dave.
> > --
> > Dave Chinner
> > da...@fromorbit.com

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Wed, Dec 06, 2017 at 12:45:49PM +1100, Dave Chinner wrote:
> On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawil...@microsoft.com>
> > 
> > I looked through some notes and decided this was version 4 of the XArray.
> > Last posted two weeks ago, this version includes a *lot* of changes.
> > I'd like to thank Dave Chinner for his feedback, encouragement and
> > distracting ideas for improvement, which I'll get to once this is merged.
> 
> BTW, you need to fix the "To:" line on your patchbombs:
> 
> > To: unlisted-recipients: ;, no To-header on input <@gmail-pop.l.google.com> 
> 
> This bad email address getting quoted to the cc line makes some MTAs
> very unhappy.
> 
> > 
> > Highlights:
> >  - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
> >  - The page cache is now fully converted to the XArray.
> >  - Many more tests in the test-suite.
> > 
> > This patch set is not for applying.  0day is still reporting problems,
> > and I'd feel bad for eating someone's data.  These patches apply on top
> > of a set of prepatory patches which just aren't interesting.  If you
> > want to see the patches applied to a tree, I suggest pulling my git tree:
> > http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-2017-12-04
> > I also left out the idr_preload removals.  They're still in the git tree,
> > but I'm not looking for feedback on them.
> 
> I'll give this a quick burn this afternoon and see what catches fire...

Build warnings/errors:

.
lib/radix-tree.c:700:13: warning: ¿radix_tree_free_nodes¿ defined but not used 
[-Wunused-function]
 static void radix_tree_free_nodes(struct radix_tree_node *node)
.
lib/xarray.c: In function ¿xas_max¿:
lib/xarray.c:291:16: warning: unused variable ¿mask¿
[-Wunused-variable]
  unsigned long mask, max = xas->xa_index;
  ^~~~
..
fs/dax.c: In function ¿grab_mapping_entry¿:
fs/dax.c:305:2: error: implicit declaration of function ¿xas_set_order¿; did 
you mean ¿xas_set_err¿?  [-Werror=implicit-function-declaration]
  xas_set_order(, index, size_flag ? PMD_ORDER : 0);
^
scripts/Makefile.build:310: recipe for target 'fs/dax.o' failed
make[1]: *** [fs/dax.o] Error 1

-Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 00/73] XArray version 4

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 04:40:46PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawil...@microsoft.com>
> 
> I looked through some notes and decided this was version 4 of the XArray.
> Last posted two weeks ago, this version includes a *lot* of changes.
> I'd like to thank Dave Chinner for his feedback, encouragement and
> distracting ideas for improvement, which I'll get to once this is merged.

BTW, you need to fix the "To:" line on your patchbombs:

> To: unlisted-recipients: ;, no To-header on input <@gmail-pop.l.google.com> 

This bad email address getting quoted to the cc line makes some MTAs
very unhappy.

> 
> Highlights:
>  - Over 2000 words of documentation in patch 8!  And lots more kernel-doc.
>  - The page cache is now fully converted to the XArray.
>  - Many more tests in the test-suite.
> 
> This patch set is not for applying.  0day is still reporting problems,
> and I'd feel bad for eating someone's data.  These patches apply on top
> of a set of prepatory patches which just aren't interesting.  If you
> want to see the patches applied to a tree, I suggest pulling my git tree:
> http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-2017-12-04
> I also left out the idr_preload removals.  They're still in the git tree,
> but I'm not looking for feedback on them.

I'll give this a quick burn this afternoon and see what catches fire...

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-05 Thread Dave Chinner
On Tue, Dec 05, 2017 at 04:41:58PM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawil...@microsoft.com>
> 
> This eliminates a call to radix_tree_preload().

.

>  void
> @@ -431,24 +424,24 @@ xfs_mru_cache_insert(
>   unsigned long   key,
>   struct xfs_mru_cache_elem *elem)
>  {
> + XA_STATE(xas, >store, key);
>   int error;
>  
>   ASSERT(mru && mru->lists);
>   if (!mru || !mru->lists)
>   return -EINVAL;
>  
> - if (radix_tree_preload(GFP_NOFS))
> - return -ENOMEM;
> -
>   INIT_LIST_HEAD(>list_node);
>   elem->key = key;
>  
> - spin_lock(>lock);
> - error = radix_tree_insert(>store, key, elem);
> - radix_tree_preload_end();
> - if (!error)
> - _xfs_mru_cache_list_insert(mru, elem);
> - spin_unlock(>lock);
> + do {
> + xas_lock();
> + xas_store(, elem);
> + error = xas_error();
> + if (!error)
> + _xfs_mru_cache_list_insert(mru, elem);
> + xas_unlock();
> + } while (xas_nomem(, GFP_NOFS));

Ok, so why does this have a retry loop on ENOMEM despite the
existing code handling that error? And why put such a loop in this
code and not any of the other XFS code that used
radix_tree_preload() and is arguably much more important to avoid
ENOMEM on insert (e.g. the inode cache)?

Also, I really don't like the pattern of using xa_lock()/xa_unlock()
to protect access to an external structure. i.e. the mru->lock
context is protecting multiple fields and operations in the MRU
structure, not just the radix tree operations. Turning that around
so that a larger XFS structure and algorithm is now protected by an
opaque internal lock from generic storage structure the forms part
of the larger structure seems like a bad design pattern to me...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html