Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-13 Thread Rasmus Villemoes
On Sat, Dec 12 2015, Al Viro  wrote:

> On Fri, Dec 11, 2015 at 11:16:24PM +, Al Viro wrote:
>
>> #define set_delayed_type(call, f, arg) \
>>  sizeof(f(arg),0), \
>>  __set_delayed_type(call, __closure_##f, (void *)arg)
>> 
>> That could be reused for timers with typechecking - we have a lot of timer
>> callbacks that start with casting the argument (unsigned long, not void *,
>> but that's not a big deal) to whatever it is that callback really wants,
>> with setup_timer() callers explicitly casting that whatever the callback
>> really wants to unsigned long.  Which, of course, defeats the typechecking
>> by both cc(1) and sparse(1)...
>> 
>> I still hope for better solution, though...  Comments?
>
> Hmm...
>
> #define set_delayed_type(call, f, arg)
> \
>   (sizeof(f(arg),0),  \
>   __set_delayed_type(call, ({ \
>   void _(void *__) {f((typeof(arg))(unsigned long)__);}   \
>   _;}), (void *)arg))
>
> woult do nicely, but it's a gccism, and the one clang doesn't support
> ;-/

Careful. While this may appear to work, I'm pretty sure it would break
horribly when f is not the actual name of a function but a function
pointer. Not that it won't compile, but it would presumably make gcc
emit what they call a 'trampoline', a small piece of executable code
located on the stack, allowing the stub to access the containing
function's local variables (namely, f). I don't know if kernel stacks
are even executable, but even then it's a bad idea, once the stack frame
containing the trampoline is gone. All of this could easily happen down
the line when someone sees a bunch of similar code and decides to factor
that into a helper(), and then it works whenever gcc decides to inline
the helper to both call sites and otherwise, as the gcc docs put it,
"all hell breaks loose". IOWs, an accident waiting to happen.

One could probably ensure that the macro is always used with a
compile/link-time constant by putting in some trick like

static void *link_time_constant_please = f;

but even then I wouldn't be completely sure that gcc would always emit a
stub as simple as the one you showed.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-13 Thread Rasmus Villemoes
On Sat, Dec 12 2015, Al Viro  wrote:

> On Fri, Dec 11, 2015 at 11:16:24PM +, Al Viro wrote:
>
>> #define set_delayed_type(call, f, arg) \
>>  sizeof(f(arg),0), \
>>  __set_delayed_type(call, __closure_##f, (void *)arg)
>> 
>> That could be reused for timers with typechecking - we have a lot of timer
>> callbacks that start with casting the argument (unsigned long, not void *,
>> but that's not a big deal) to whatever it is that callback really wants,
>> with setup_timer() callers explicitly casting that whatever the callback
>> really wants to unsigned long.  Which, of course, defeats the typechecking
>> by both cc(1) and sparse(1)...
>> 
>> I still hope for better solution, though...  Comments?
>
> Hmm...
>
> #define set_delayed_type(call, f, arg)
> \
>   (sizeof(f(arg),0),  \
>   __set_delayed_type(call, ({ \
>   void _(void *__) {f((typeof(arg))(unsigned long)__);}   \
>   _;}), (void *)arg))
>
> woult do nicely, but it's a gccism, and the one clang doesn't support
> ;-/

Careful. While this may appear to work, I'm pretty sure it would break
horribly when f is not the actual name of a function but a function
pointer. Not that it won't compile, but it would presumably make gcc
emit what they call a 'trampoline', a small piece of executable code
located on the stack, allowing the stub to access the containing
function's local variables (namely, f). I don't know if kernel stacks
are even executable, but even then it's a bad idea, once the stack frame
containing the trampoline is gone. All of this could easily happen down
the line when someone sees a bunch of similar code and decides to factor
that into a helper(), and then it works whenever gcc decides to inline
the helper to both call sites and otherwise, as the gcc docs put it,
"all hell breaks loose". IOWs, an accident waiting to happen.

One could probably ensure that the macro is always used with a
compile/link-time constant by putting in some trick like

static void *link_time_constant_please = f;

but even then I wouldn't be completely sure that gcc would always emit a
stub as simple as the one you showed.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-12 Thread Al Viro
On Fri, Dec 11, 2015 at 01:54:25AM +, Al Viro wrote:

>   BTW, why are we passing unsigned long to free_page()?  We have
> a bit under 700 callers; excluding the ones that have an explicit cast
> to unsigned long right in the argument of call leaves ~150, and the rest
> tend to contain a lot of pointer casts of unsigned long thing they are feeding
> to free_page() (IOW, they would be just as happy if they kept it as a pointer
> all along).  Sure, that would mean __get_free_page() et.al. returning void *,
> but I don't see any problems with that either...  Is that just for historical
> reasons, or is there anything more subtle I'm missing here?

The situation with free_pages() is even funnier - we have 313 call sites,
and after converting it to void(void *, unsigned) 31 of them need casts
to void *.  Right now the mainline has 249 of those call sites with
cast to unsigned long (or ulong, in several places).  Then there's a bunch
of places where we do __get_free_pages(), then use it a lot (all with
casts to pointers), then pass to free_pages() - those would be just fine
with storing it as void *, but even leaving those aside...

The current signature is contrary to actual use - nearly 80% of call
sites are forced to cast a pointer to unsigned long, only to have it
cast back to void * in free_pages() itself, we would obviously be better
off if we'd switched to just passing the damn thing as a pointer.  Especially
since that 80% turn into 90% once you add the callers that could easily
switch to storing the value eventually passed to free_pages() as a pointer.

And free_page() is basically the same story, only with twice as many
call sites...

While we are at it: __get_free_pages() has 238 call sites.  193 of them
immediately cast to pointer.  And there's a bunch of places like this:
static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);

if (addr) {
tlb->pages = (void *)addr;
tlb->max = PAGE_SIZE / sizeof(struct page *);
}
}
where the cast is not immediate, but might as well had been.  And conversion
of free_pages() whittles it down even more.  For __get_free_page() the
situation is about the same, ditto for get_zeroed_page().

I realize that get_free_page() has been returning unsigned long since 0.01,
but looking at 0.01... it might as well had been returning void * - wouldn't
be more inconvenient.  The kludge you had in get_pipe_inode() would've been
a bit more obviously wrong, but that's about it ;-)

Seriously, though - what do you think about a flagday commit right after
4.5-rc1 switching all those guys to void *?  For __get_user_pages(),
__get_user_page(), get_zeroed_page() - return values, for free_pages(),
free_page() - argument.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-12 Thread Al Viro
On Fri, Dec 11, 2015 at 01:54:25AM +, Al Viro wrote:

>   BTW, why are we passing unsigned long to free_page()?  We have
> a bit under 700 callers; excluding the ones that have an explicit cast
> to unsigned long right in the argument of call leaves ~150, and the rest
> tend to contain a lot of pointer casts of unsigned long thing they are feeding
> to free_page() (IOW, they would be just as happy if they kept it as a pointer
> all along).  Sure, that would mean __get_free_page() et.al. returning void *,
> but I don't see any problems with that either...  Is that just for historical
> reasons, or is there anything more subtle I'm missing here?

The situation with free_pages() is even funnier - we have 313 call sites,
and after converting it to void(void *, unsigned) 31 of them need casts
to void *.  Right now the mainline has 249 of those call sites with
cast to unsigned long (or ulong, in several places).  Then there's a bunch
of places where we do __get_free_pages(), then use it a lot (all with
casts to pointers), then pass to free_pages() - those would be just fine
with storing it as void *, but even leaving those aside...

The current signature is contrary to actual use - nearly 80% of call
sites are forced to cast a pointer to unsigned long, only to have it
cast back to void * in free_pages() itself, we would obviously be better
off if we'd switched to just passing the damn thing as a pointer.  Especially
since that 80% turn into 90% once you add the callers that could easily
switch to storing the value eventually passed to free_pages() as a pointer.

And free_page() is basically the same story, only with twice as many
call sites...

While we are at it: __get_free_pages() has 238 call sites.  193 of them
immediately cast to pointer.  And there's a bunch of places like this:
static inline void __tlb_alloc_page(struct mmu_gather *tlb)
{
unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);

if (addr) {
tlb->pages = (void *)addr;
tlb->max = PAGE_SIZE / sizeof(struct page *);
}
}
where the cast is not immediate, but might as well had been.  And conversion
of free_pages() whittles it down even more.  For __get_free_page() the
situation is about the same, ditto for get_zeroed_page().

I realize that get_free_page() has been returning unsigned long since 0.01,
but looking at 0.01... it might as well had been returning void * - wouldn't
be more inconvenient.  The kludge you had in get_pipe_inode() would've been
a bit more obviously wrong, but that's about it ;-)

Seriously, though - what do you think about a flagday commit right after
4.5-rc1 switching all those guys to void *?  For __get_user_pages(),
__get_user_page(), get_zeroed_page() - return values, for free_pages(),
free_page() - argument.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-11 Thread Al Viro
On Fri, Dec 11, 2015 at 11:16:24PM +, Al Viro wrote:

> #define set_delayed_type(call, f, arg) \
>   sizeof(f(arg),0), \
>   __set_delayed_type(call, __closure_##f, (void *)arg)
> 
> That could be reused for timers with typechecking - we have a lot of timer
> callbacks that start with casting the argument (unsigned long, not void *,
> but that's not a big deal) to whatever it is that callback really wants,
> with setup_timer() callers explicitly casting that whatever the callback
> really wants to unsigned long.  Which, of course, defeats the typechecking
> by both cc(1) and sparse(1)...
> 
> I still hope for better solution, though...  Comments?

Hmm...

#define set_delayed_type(call, f, arg)  \
(sizeof(f(arg),0),  \
__set_delayed_type(call, ({ \
void _(void *__) {f((typeof(arg))(unsigned long)__);}   \
_;}), (void *)arg))

woult do nicely, but it's a gccism, and the one clang doesn't support ;-/
gcc support goes back at least to 4.1, not sure about the earlier branches
(and not at all sure how flaky it was/is, obviously).

But this thing works for e.g. f being void(char) and arg being char,
and AFAICS generates correct code - we end up with a stub like
_.1765:
movsbl  %dil, %edi
jmp f
and its address gets stored in call->fn, which should DTRT on call...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-11 Thread Al Viro
On Fri, Dec 11, 2015 at 08:49:24AM +0100, Rasmus Villemoes wrote:

> union delayed_call_fn {
>   void (*fn)(void *);
>   void (*kfree_like)(const void *);
> } __attribute__((__transparent_union__));
>
> void
> set_delayed_call(struct delayed_call *call, union delayed_call_fn u, void 
> *arg)
> {
>   call->fn = u.fn;
>   call->arg = arg;
> }

Yecc...  If we are into that kind of gccisms, I'd rather use
__builtin_choose_expr/__builtin_types_compatible_p.  At least that is
used elsewhere in the kernel; __transparent_union__ kludge isn't.  Sure, it
means having set_delayed_call() a macro, but IMO it's less nasty that way...

FWIW, another possibility is to have
#define CLOSURE_CALLBACK(f,type) \
static inline void __closure_##f(void *p) {return f((type)p);}

#define set_delayed_type(call, f, arg) \
sizeof(f(arg),0), \
__set_delayed_type(call, __closure_##f, (void *)arg)

That could be reused for timers with typechecking - we have a lot of timer
callbacks that start with casting the argument (unsigned long, not void *,
but that's not a big deal) to whatever it is that callback really wants,
with setup_timer() callers explicitly casting that whatever the callback
really wants to unsigned long.  Which, of course, defeats the typechecking
by both cc(1) and sparse(1)...

I still hope for better solution, though...  Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-11 Thread Al Viro
On Fri, Dec 11, 2015 at 08:49:24AM +0100, Rasmus Villemoes wrote:

> union delayed_call_fn {
>   void (*fn)(void *);
>   void (*kfree_like)(const void *);
> } __attribute__((__transparent_union__));
>
> void
> set_delayed_call(struct delayed_call *call, union delayed_call_fn u, void 
> *arg)
> {
>   call->fn = u.fn;
>   call->arg = arg;
> }

Yecc...  If we are into that kind of gccisms, I'd rather use
__builtin_choose_expr/__builtin_types_compatible_p.  At least that is
used elsewhere in the kernel; __transparent_union__ kludge isn't.  Sure, it
means having set_delayed_call() a macro, but IMO it's less nasty that way...

FWIW, another possibility is to have
#define CLOSURE_CALLBACK(f,type) \
static inline void __closure_##f(void *p) {return f((type)p);}

#define set_delayed_type(call, f, arg) \
sizeof(f(arg),0), \
__set_delayed_type(call, __closure_##f, (void *)arg)

That could be reused for timers with typechecking - we have a lot of timer
callbacks that start with casting the argument (unsigned long, not void *,
but that's not a big deal) to whatever it is that callback really wants,
with setup_timer() callers explicitly casting that whatever the callback
really wants to unsigned long.  Which, of course, defeats the typechecking
by both cc(1) and sparse(1)...

I still hope for better solution, though...  Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-11 Thread Al Viro
On Fri, Dec 11, 2015 at 11:16:24PM +, Al Viro wrote:

> #define set_delayed_type(call, f, arg) \
>   sizeof(f(arg),0), \
>   __set_delayed_type(call, __closure_##f, (void *)arg)
> 
> That could be reused for timers with typechecking - we have a lot of timer
> callbacks that start with casting the argument (unsigned long, not void *,
> but that's not a big deal) to whatever it is that callback really wants,
> with setup_timer() callers explicitly casting that whatever the callback
> really wants to unsigned long.  Which, of course, defeats the typechecking
> by both cc(1) and sparse(1)...
> 
> I still hope for better solution, though...  Comments?

Hmm...

#define set_delayed_type(call, f, arg)  \
(sizeof(f(arg),0),  \
__set_delayed_type(call, ({ \
void _(void *__) {f((typeof(arg))(unsigned long)__);}   \
_;}), (void *)arg))

woult do nicely, but it's a gccism, and the one clang doesn't support ;-/
gcc support goes back at least to 4.1, not sure about the earlier branches
(and not at all sure how flaky it was/is, obviously).

But this thing works for e.g. f being void(char) and arg being char,
and AFAICS generates correct code - we end up with a stub like
_.1765:
movsbl  %dil, %edi
jmp f
and its address gets stored in call->fn, which should DTRT on call...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-10 Thread Rasmus Villemoes
On Fri, Dec 11 2015, Al Viro  wrote:

> I would really love to be able to say
>   set_delayed_call(done, kfree, p);
> but as it is I had to keep a wrapper - void kfree_link(void *).  The problem
> is, you can't assign void f(const void *) to void (*p)(void *) - mismatch of
> qualifiers in the arguments makes the latter not assignment-compatible with
> the former.  If there's a clever trick allowing to sidestep that, I'd be
> very happy; I don't know one.  Any ideas not starting with "use C11" (or,
> worse yet, "use such and such C++ misfeature with arseloads of RTL required
> in order to implement it") would be welcome...

I _think_ this satisfies these very reasonable criteria. What you're
looking for is presumably __attribute__((__transparent_union__)). At
least this compiles without warnings at -Wall -Wextra and gives the
expected disassembly, and the gcc docs mention transparent_union at
least back to 4.0.4.

#include 

struct delayed_call {
void (*fn)(void *);
void *arg;
};
union delayed_call_fn {
void (*fn)(void *);
void (*kfree_like)(const void *);
} __attribute__((__transparent_union__));

void
set_delayed_call(struct delayed_call *call, union delayed_call_fn u, void *arg)
{
call->fn = u.fn;
call->arg = arg;
}

void some_cb(void *);
void kfree(const void *);
extern struct delayed_call done;

void test(void)
{
set_delayed_call(, some_cb, NULL);
set_delayed_call(, kfree, NULL);
}

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-10 Thread Al Viro
On Thu, Dec 10, 2015 at 02:40:50AM +, Al Viro wrote:
> ... except that it would have to be something like work_struct, since
> callback_head gets embedded into the argument and here we don't want to
> do that.  _And_ we've no use for ->entry in work_struct, which makes it
> a bad fit as well.  IOW, please ignore that idiocy...

OK, the following (incremental to that series) seems to be better; in
particular, overlayfs ->get_link() doesn't allocate anything whatsoever,
->put_link() as a method is gone and instead of *cookie =  we do
set_delayed_call(done, callback, argument)
which is a whole lot more straightforward.  Memory footprint cost is
trivial - one extra pointer in struct nameidata.

Basically, we add a new structure (struct delayed_call; poor man's closure)
and pass a reference to that to ->get_link(); if anything needs to be done
once we are done with the string returned by ->get_link(), just fill that
structure (as above) and that's it.  It will be evaluated by caller once
it's through with the symlink body.

I'd put it in linux/fs.h for now, but it certainly belongs in something like
linux/types.h; the current location is temporary.  And I'm quite sure that
we have other places where something of that sort is open-coded - timers,
for one thing and probably quite a few other things.

I would really love to be able to say
set_delayed_call(done, kfree, p);
but as it is I had to keep a wrapper - void kfree_link(void *).  The problem
is, you can't assign void f(const void *) to void (*p)(void *) - mismatch of
qualifiers in the arguments makes the latter not assignment-compatible with
the former.  If there's a clever trick allowing to sidestep that, I'd be
very happy; I don't know one.  Any ideas not starting with "use C11" (or,
worse yet, "use such and such C++ misfeature with arseloads of RTL required
in order to implement it") would be welcome...

BTW, why are we passing unsigned long to free_page()?  We have
a bit under 700 callers; excluding the ones that have an explicit cast
to unsigned long right in the argument of call leaves ~150, and the rest
tend to contain a lot of pointer casts of unsigned long thing they are feeding
to free_page() (IOW, they would be just as happy if they kept it as a pointer
all along).  Sure, that would mean __get_free_page() et.al. returning void *,
but I don't see any problems with that either...  Is that just for historical
reasons, or is there anything more subtle I'm missing here?

Anyway, for now the patch below is strictly for review and (if one
is brave enough) testing.  Comments would be very welcome.

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c 
b/drivers/staging/lustre/lustre/llite/symlink.c
index e5c4137..2610348 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,8 +118,14 @@ failed:
return rc;
 }
 
+static void ll_put_link(void *p)
+{
+   ptlrpc_req_finished(p);
+}
+
 static const char *ll_get_link(struct dentry *dentry,
-  struct inode *inode, void **cookie)
+  struct inode *inode,
+  struct delayed_call *done)
 {
struct ptlrpc_request *request = NULL;
int rc;
@@ -137,22 +143,16 @@ static const char *ll_get_link(struct dentry *dentry,
}
 
/* symname may contain a pointer to the request message buffer,
-* we delay request releasing until ll_put_link then.
+* we delay request releasing then.
 */
-   *cookie = request;
+   set_delayed_call(done, ll_put_link, request);
return symname;
 }
 
-static void ll_put_link(struct inode *unused, void *cookie)
-{
-   ptlrpc_req_finished(cookie);
-}
-
 const struct inode_operations ll_fast_symlink_inode_operations = {
.readlink   = generic_readlink,
.setattr= ll_setattr,
.get_link   = ll_get_link,
-   .put_link   = ll_put_link,
.getattr= ll_getattr,
.permission = ll_inode_permission,
.setxattr   = ll_setxattr,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index a237f47..c7cc7c3 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1226,11 +1226,12 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
  * v9fs_vfs_get_link - follow a symlink path
  * @dentry: dentry for symlink
  * @inode: inode for symlink
- * @cookie: place to pass the data to put_link()
+ * @done: delayed call for when we are done with the return value
  */
 
 static const char *v9fs_vfs_get_link(struct dentry *dentry,
-struct inode *inode, void **cookie)
+struct inode *inode,
+struct delayed_call *done)
 {
struct v9fs_session_info *v9ses;
struct p9_fid *fid;
@@ -1266,7 +1267,8 @@ static const char *v9fs_vfs_get_link(struct dentry 
*dentry,
 

Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-10 Thread Al Viro
On Thu, Dec 10, 2015 at 02:40:50AM +, Al Viro wrote:
> ... except that it would have to be something like work_struct, since
> callback_head gets embedded into the argument and here we don't want to
> do that.  _And_ we've no use for ->entry in work_struct, which makes it
> a bad fit as well.  IOW, please ignore that idiocy...

OK, the following (incremental to that series) seems to be better; in
particular, overlayfs ->get_link() doesn't allocate anything whatsoever,
->put_link() as a method is gone and instead of *cookie =  we do
set_delayed_call(done, callback, argument)
which is a whole lot more straightforward.  Memory footprint cost is
trivial - one extra pointer in struct nameidata.

Basically, we add a new structure (struct delayed_call; poor man's closure)
and pass a reference to that to ->get_link(); if anything needs to be done
once we are done with the string returned by ->get_link(), just fill that
structure (as above) and that's it.  It will be evaluated by caller once
it's through with the symlink body.

I'd put it in linux/fs.h for now, but it certainly belongs in something like
linux/types.h; the current location is temporary.  And I'm quite sure that
we have other places where something of that sort is open-coded - timers,
for one thing and probably quite a few other things.

I would really love to be able to say
set_delayed_call(done, kfree, p);
but as it is I had to keep a wrapper - void kfree_link(void *).  The problem
is, you can't assign void f(const void *) to void (*p)(void *) - mismatch of
qualifiers in the arguments makes the latter not assignment-compatible with
the former.  If there's a clever trick allowing to sidestep that, I'd be
very happy; I don't know one.  Any ideas not starting with "use C11" (or,
worse yet, "use such and such C++ misfeature with arseloads of RTL required
in order to implement it") would be welcome...

BTW, why are we passing unsigned long to free_page()?  We have
a bit under 700 callers; excluding the ones that have an explicit cast
to unsigned long right in the argument of call leaves ~150, and the rest
tend to contain a lot of pointer casts of unsigned long thing they are feeding
to free_page() (IOW, they would be just as happy if they kept it as a pointer
all along).  Sure, that would mean __get_free_page() et.al. returning void *,
but I don't see any problems with that either...  Is that just for historical
reasons, or is there anything more subtle I'm missing here?

Anyway, for now the patch below is strictly for review and (if one
is brave enough) testing.  Comments would be very welcome.

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c 
b/drivers/staging/lustre/lustre/llite/symlink.c
index e5c4137..2610348 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,8 +118,14 @@ failed:
return rc;
 }
 
+static void ll_put_link(void *p)
+{
+   ptlrpc_req_finished(p);
+}
+
 static const char *ll_get_link(struct dentry *dentry,
-  struct inode *inode, void **cookie)
+  struct inode *inode,
+  struct delayed_call *done)
 {
struct ptlrpc_request *request = NULL;
int rc;
@@ -137,22 +143,16 @@ static const char *ll_get_link(struct dentry *dentry,
}
 
/* symname may contain a pointer to the request message buffer,
-* we delay request releasing until ll_put_link then.
+* we delay request releasing then.
 */
-   *cookie = request;
+   set_delayed_call(done, ll_put_link, request);
return symname;
 }
 
-static void ll_put_link(struct inode *unused, void *cookie)
-{
-   ptlrpc_req_finished(cookie);
-}
-
 const struct inode_operations ll_fast_symlink_inode_operations = {
.readlink   = generic_readlink,
.setattr= ll_setattr,
.get_link   = ll_get_link,
-   .put_link   = ll_put_link,
.getattr= ll_getattr,
.permission = ll_inode_permission,
.setxattr   = ll_setxattr,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index a237f47..c7cc7c3 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1226,11 +1226,12 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
  * v9fs_vfs_get_link - follow a symlink path
  * @dentry: dentry for symlink
  * @inode: inode for symlink
- * @cookie: place to pass the data to put_link()
+ * @done: delayed call for when we are done with the return value
  */
 
 static const char *v9fs_vfs_get_link(struct dentry *dentry,
-struct inode *inode, void **cookie)
+struct inode *inode,
+struct delayed_call *done)
 {
struct v9fs_session_info *v9ses;
struct p9_fid *fid;
@@ -1266,7 +1267,8 @@ static const char *v9fs_vfs_get_link(struct dentry 
*dentry,
 

Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-10 Thread Rasmus Villemoes
On Fri, Dec 11 2015, Al Viro  wrote:

> I would really love to be able to say
>   set_delayed_call(done, kfree, p);
> but as it is I had to keep a wrapper - void kfree_link(void *).  The problem
> is, you can't assign void f(const void *) to void (*p)(void *) - mismatch of
> qualifiers in the arguments makes the latter not assignment-compatible with
> the former.  If there's a clever trick allowing to sidestep that, I'd be
> very happy; I don't know one.  Any ideas not starting with "use C11" (or,
> worse yet, "use such and such C++ misfeature with arseloads of RTL required
> in order to implement it") would be welcome...

I _think_ this satisfies these very reasonable criteria. What you're
looking for is presumably __attribute__((__transparent_union__)). At
least this compiles without warnings at -Wall -Wextra and gives the
expected disassembly, and the gcc docs mention transparent_union at
least back to 4.0.4.

#include 

struct delayed_call {
void (*fn)(void *);
void *arg;
};
union delayed_call_fn {
void (*fn)(void *);
void (*kfree_like)(const void *);
} __attribute__((__transparent_union__));

void
set_delayed_call(struct delayed_call *call, union delayed_call_fn u, void *arg)
{
call->fn = u.fn;
call->arg = arg;
}

void some_cb(void *);
void kfree(const void *);
extern struct delayed_call done;

void test(void)
{
set_delayed_call(, some_cb, NULL);
set_delayed_call(, kfree, NULL);
}

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Thu, Dec 10, 2015 at 12:10:12AM +, Al Viro wrote:

> PS: I toyed with the idea of replacing cookie a struct callback_head, to make

... except that it would have to be something like work_struct, since
callback_head gets embedded into the argument and here we don't want to
do that.  _And_ we've no use for ->entry in work_struct, which makes it
a bad fit as well.  IOW, please ignore that idiocy...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Wed, Dec 09, 2015 at 06:23:09PM +, Al Viro wrote:

> What's more, that dentry might very well have gone negative by that
> point.  Think what happens if, during the symlink traversal, we run
> into the hard "restart from scratch in non-RCU mode".  We'll need to
> do ->put_link() on everything we have in stack.  Regardless of what
> might've happened to dentries/inodes of symlinks involved - all we
> are really promised is that RCU-delayed parts of their destruction hadn't
> been entered yet.
> 
> Note, BTW, that *all* ->put_link() instances ignore the inode argument
> and I seriously considered dropping it completely.

PS: I toyed with the idea of replacing cookie a struct callback_head, to make
it really obvious what's going on - we are leaving a closure to be evaluated
once the caller is done with our return value.  That way ->put_link() as a
method would've been gone and we would've had something like
init_callback(destructor, page_put_link, page);
instead of
*cookie = page;
plus
.put_link = page_put_link,
we have right now, etc.

The reason not to go that way (and not all that strong, at that) is that
we get an extra pointer in struct saved, i.e. slightly heavier nameidata
(there are two struct saved embedded into it).  That, or play silly buggers
with
union {
struct callback_head callback;
struct inode *inode;
};
in struct saved - we need to stash a pointer to inode between pick_link()
and get_link().

Hell knows, still might make sense to go there...  What we want is to set
a closure to be evaluated once the caller of ->get_link() is done with
the string ->get_link() returns.  We end up picking the function to be
called at saved->inode->i_op->put_link and store its argument in
saved->cookie.  Might as well just use the representation we use for such
stuff (rcu_head, etc.) and be done with that...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread NeilBrown

Hi Al,
 thanks a lot for this.  I particularly like that you renamed
 "follow_link" which didn't really describe what the function did any
 more.

 Apart from the changelog comment in 4/11 which starts
   "It was to needed for ..."

 I can't fault anything.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Wed, Dec 09, 2015 at 09:24:32AM -0800, Linus Torvalds wrote:

>That's because of the HIGHMEM crap that you removed. Without
> highmem, the page data address specifies the page fine without any
> cookie.
> 
>  - overlayfs, which uses it to create a temporary data storage.
> 
>Why? Mainly because it wants to hide the cookie in there. D'oh!

Not really - it also wants to stash a way to underlying inode in there.
See below...

>  - proc, which uses it to save the proc_dir_entry
> 
>Why? The reason seems to be that we don't call put_link() with the
> dentry that get_link got, so it can't look it up the sane way.

What's more, that dentry might very well have gone negative by that
point.  Think what happens if, during the symlink traversal, we run
into the hard "restart from scratch in non-RCU mode".  We'll need to
do ->put_link() on everything we have in stack.  Regardless of what
might've happened to dentries/inodes of symlinks involved - all we
are really promised is that RCU-delayed parts of their destruction hadn't
been entered yet.

Note, BTW, that *all* ->put_link() instances ignore the inode argument
and I seriously considered dropping it completely.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Linus Torvalds
On Tue, Dec 8, 2015 at 9:32 PM, Al Viro  wrote:
>
> Any help with review and testing would be welcome.  FWIW, it seems to survive
> the beating here.

It looks ok to me, but I *hate* that "cookie" thing we have. And I
don't actually see any reason for it.

Every single user except for a couple seem to just put the resulting
symlink address into the cookie (and then put_link() will kfree it or
put_page() it). And the calling convention would be *so* much clearer
if we just made things be

   char *get_link(dentry, inode);
   put_link(dentry, inode, char *);

and get rid of that cookie entirely.

The only users of the cookie seem to be

 - we currently use it for the page pointer vs the page data address.

   That's because of the HIGHMEM crap that you removed. Without
highmem, the page data address specifies the page fine without any
cookie.

 - overlayfs, which uses it to create a temporary data storage.

   Why? Mainly because it wants to hide the cookie in there. D'oh!

 - proc, which uses it to save the proc_dir_entry

   Why? The reason seems to be that we don't call put_link() with the
dentry that get_link got, so it can't look it up the sane way.

In other words, as far as I can tell, the cookie is just garbage now.

But maybe I missed something.

Other than that, I like the series. And I realize that technically the
"cookie removal" would be a separate thing, but since you're changing
the calling convention from "follow_link()" to "get_link()" in this
series, I _really_ think that as part of that calling convention
change we should just get rid of the cookie, rather than have
*another* calling convention change later.

Hmm?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Linus Torvalds
On Tue, Dec 8, 2015 at 9:32 PM, Al Viro  wrote:
>
> Any help with review and testing would be welcome.  FWIW, it seems to survive
> the beating here.

It looks ok to me, but I *hate* that "cookie" thing we have. And I
don't actually see any reason for it.

Every single user except for a couple seem to just put the resulting
symlink address into the cookie (and then put_link() will kfree it or
put_page() it). And the calling convention would be *so* much clearer
if we just made things be

   char *get_link(dentry, inode);
   put_link(dentry, inode, char *);

and get rid of that cookie entirely.

The only users of the cookie seem to be

 - we currently use it for the page pointer vs the page data address.

   That's because of the HIGHMEM crap that you removed. Without
highmem, the page data address specifies the page fine without any
cookie.

 - overlayfs, which uses it to create a temporary data storage.

   Why? Mainly because it wants to hide the cookie in there. D'oh!

 - proc, which uses it to save the proc_dir_entry

   Why? The reason seems to be that we don't call put_link() with the
dentry that get_link got, so it can't look it up the sane way.

In other words, as far as I can tell, the cookie is just garbage now.

But maybe I missed something.

Other than that, I like the series. And I realize that technically the
"cookie removal" would be a separate thing, but since you're changing
the calling convention from "follow_link()" to "get_link()" in this
series, I _really_ think that as part of that calling convention
change we should just get rid of the cookie, rather than have
*another* calling convention change later.

Hmm?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Wed, Dec 09, 2015 at 09:24:32AM -0800, Linus Torvalds wrote:

>That's because of the HIGHMEM crap that you removed. Without
> highmem, the page data address specifies the page fine without any
> cookie.
> 
>  - overlayfs, which uses it to create a temporary data storage.
> 
>Why? Mainly because it wants to hide the cookie in there. D'oh!

Not really - it also wants to stash a way to underlying inode in there.
See below...

>  - proc, which uses it to save the proc_dir_entry
> 
>Why? The reason seems to be that we don't call put_link() with the
> dentry that get_link got, so it can't look it up the sane way.

What's more, that dentry might very well have gone negative by that
point.  Think what happens if, during the symlink traversal, we run
into the hard "restart from scratch in non-RCU mode".  We'll need to
do ->put_link() on everything we have in stack.  Regardless of what
might've happened to dentries/inodes of symlinks involved - all we
are really promised is that RCU-delayed parts of their destruction hadn't
been entered yet.

Note, BTW, that *all* ->put_link() instances ignore the inode argument
and I seriously considered dropping it completely.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread NeilBrown

Hi Al,
 thanks a lot for this.  I particularly like that you renamed
 "follow_link" which didn't really describe what the function did any
 more.

 Apart from the changelog comment in 4/11 which starts
   "It was to needed for ..."

 I can't fault anything.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Wed, Dec 09, 2015 at 06:23:09PM +, Al Viro wrote:

> What's more, that dentry might very well have gone negative by that
> point.  Think what happens if, during the symlink traversal, we run
> into the hard "restart from scratch in non-RCU mode".  We'll need to
> do ->put_link() on everything we have in stack.  Regardless of what
> might've happened to dentries/inodes of symlinks involved - all we
> are really promised is that RCU-delayed parts of their destruction hadn't
> been entered yet.
> 
> Note, BTW, that *all* ->put_link() instances ignore the inode argument
> and I seriously considered dropping it completely.

PS: I toyed with the idea of replacing cookie a struct callback_head, to make
it really obvious what's going on - we are leaving a closure to be evaluated
once the caller is done with our return value.  That way ->put_link() as a
method would've been gone and we would've had something like
init_callback(destructor, page_put_link, page);
instead of
*cookie = page;
plus
.put_link = page_put_link,
we have right now, etc.

The reason not to go that way (and not all that strong, at that) is that
we get an extra pointer in struct saved, i.e. slightly heavier nameidata
(there are two struct saved embedded into it).  That, or play silly buggers
with
union {
struct callback_head callback;
struct inode *inode;
};
in struct saved - we need to stash a pointer to inode between pick_link()
and get_link().

Hell knows, still might make sense to go there...  What we want is to set
a closure to be evaluated once the caller of ->get_link() is done with
the string ->get_link() returns.  We end up picking the function to be
called at saved->inode->i_op->put_link and store its argument in
saved->cookie.  Might as well just use the representation we use for such
stuff (rcu_head, etc.) and be done with that...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-09 Thread Al Viro
On Thu, Dec 10, 2015 at 12:10:12AM +, Al Viro wrote:

> PS: I toyed with the idea of replacing cookie a struct callback_head, to make

... except that it would have to be something like work_struct, since
callback_head gets embedded into the argument and here we don't want to
do that.  _And_ we've no use for ->entry in work_struct, which makes it
a bad fit as well.  IOW, please ignore that idiocy...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-08 Thread Al Viro
On Tue, Nov 17, 2015 at 10:57:52PM +, Al Viro wrote:

>   Right now we stay in RCU mode for fast symlink traversal.
> However, anything trickier drops out of RCU mode - back in 4.2
> the symlink-related pile had grown too large to add this on top
> of everything else.  Below is an attempt to do that now.
> 
> Those who prefer to use git for review can find that series in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.symlinks

Compared to the previous iteration:
* rebased to 4.4-rc4.
* NFS (in case of pagecache hit, of course) also stays in RCU mode.
* renamed inode_nohigh() as suggested by dchinner, moved to
fs/inode.c - less header PITA that way.

What's there:
01/11: switch befs long symlinks to page_symlink_operations
02/11: logfs: don't duplicate page_symlink_inode_operations
03/11: udf: don't duplicate page_symlink_inode_operations
04/11: ufs: get rid of ->setattr() for symlinks
Simplifying things a bit by switching them to page_symlink_operations
where possible.

05/11: namei: page_getlink() and page_follow_link_light() are the same thing
Get rid of some code duplication

06/11: don't put symlink bodies in pagecache into highmem
Bugfix for a long-standing mess.  For pagecache-based symlinks we end
up with kmap() of the body for the duration of traversal.  Which could take
a long time if we walk into e.g. slow automount along the way.  It's DoSable,
actually.  Not hard to fix - there's no reason to use GFP_HIGHUSER_MOVABLE
for those guys; GFP_USER serves just as well.

07/11: replace ->follow_link() with new method that could stay
Meat of the series - we switch to a new method (->get_link()) that
differs from ->follow_link() in getting dentry and inode separately; it
can be called in RCU mode (with NULL dentry) and it should bail out with
ERR_PTR(-ECHILD) if it needs non-RCU.  All instances converted, most of them
by making them bail out immediately in RCU mode.  Some are trivialy
RCU-safe, though; those do not bail out at all.

08/11: teach page_get_link() to work in RCU mode
09/11: teach shmem_get_link() to work in RCU mode
10/11: teach proc_self_get_link()/proc_thread_self_get_link()
11/11: teach nfs_get_link() to work in RCU mode
Teaching more instances to stay in RCU mode if they can.  (8) and
(11) are similar to what Neil had done back in March, except for the lack of
kmap mess to deal with - that's one of the benefits of having (6) done...

Missing, probably worth adding: XFS, overlayfs.  Maybe gfs2 - if it's
at all possible (not sure if it is).  Maybe autofs4 as well.

Procfs magical symlinks definitely are *not* worth trying to handle
in RCU mode - ->d_revalidate() on them is such that it'll kick us out of
RCU mode before we even see that it's a symlink, let alone try to follow it.

Any help with review and testing would be welcome.  FWIW, it seems to survive
the beating here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHSET v2] ->follow_link() without dropping from RCU mode

2015-12-08 Thread Al Viro
On Tue, Nov 17, 2015 at 10:57:52PM +, Al Viro wrote:

>   Right now we stay in RCU mode for fast symlink traversal.
> However, anything trickier drops out of RCU mode - back in 4.2
> the symlink-related pile had grown too large to add this on top
> of everything else.  Below is an attempt to do that now.
> 
> Those who prefer to use git for review can find that series in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.symlinks

Compared to the previous iteration:
* rebased to 4.4-rc4.
* NFS (in case of pagecache hit, of course) also stays in RCU mode.
* renamed inode_nohigh() as suggested by dchinner, moved to
fs/inode.c - less header PITA that way.

What's there:
01/11: switch befs long symlinks to page_symlink_operations
02/11: logfs: don't duplicate page_symlink_inode_operations
03/11: udf: don't duplicate page_symlink_inode_operations
04/11: ufs: get rid of ->setattr() for symlinks
Simplifying things a bit by switching them to page_symlink_operations
where possible.

05/11: namei: page_getlink() and page_follow_link_light() are the same thing
Get rid of some code duplication

06/11: don't put symlink bodies in pagecache into highmem
Bugfix for a long-standing mess.  For pagecache-based symlinks we end
up with kmap() of the body for the duration of traversal.  Which could take
a long time if we walk into e.g. slow automount along the way.  It's DoSable,
actually.  Not hard to fix - there's no reason to use GFP_HIGHUSER_MOVABLE
for those guys; GFP_USER serves just as well.

07/11: replace ->follow_link() with new method that could stay
Meat of the series - we switch to a new method (->get_link()) that
differs from ->follow_link() in getting dentry and inode separately; it
can be called in RCU mode (with NULL dentry) and it should bail out with
ERR_PTR(-ECHILD) if it needs non-RCU.  All instances converted, most of them
by making them bail out immediately in RCU mode.  Some are trivialy
RCU-safe, though; those do not bail out at all.

08/11: teach page_get_link() to work in RCU mode
09/11: teach shmem_get_link() to work in RCU mode
10/11: teach proc_self_get_link()/proc_thread_self_get_link()
11/11: teach nfs_get_link() to work in RCU mode
Teaching more instances to stay in RCU mode if they can.  (8) and
(11) are similar to what Neil had done back in March, except for the lack of
kmap mess to deal with - that's one of the benefits of having (6) done...

Missing, probably worth adding: XFS, overlayfs.  Maybe gfs2 - if it's
at all possible (not sure if it is).  Maybe autofs4 as well.

Procfs magical symlinks definitely are *not* worth trying to handle
in RCU mode - ->d_revalidate() on them is such that it'll kick us out of
RCU mode before we even see that it's a symlink, let alone try to follow it.

Any help with review and testing would be welcome.  FWIW, it seems to survive
the beating here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/