Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-05 Thread Paolo Abeni
On Tue, 2018-12-04 at 18:12 +, Edward Cree wrote:
> On 04/12/18 17:44, Paolo Abeni wrote:
> > On Tue, 2018-12-04 at 17:13 +, Edward Cree wrote:
> > > On 03/12/18 11:40, Paolo Abeni wrote:
> > > > This header define a bunch of helpers that allow avoiding the
> > > > retpoline overhead when calling builtin functions via function pointers.
> > > > It boils down to explicitly comparing the function pointers to
> > > > known builtin functions and eventually invoke directly the latter.
> > > > 
> > > > The macros defined here implement the boilerplate for the above schema
> > > > and will be used by the next patches.
> > > > 
> > > > rfc -> v1:
> > > >  - use branch prediction hint, as suggested by Eric
> > > > 
> > > > Suggested-by: Eric Dumazet 
> > > > Signed-off-by: Paolo Abeni 
> > > > ---
> > > I'm not sure I see the reason why this is done with numbers and
> > >  'name ## NR', adding extra distance between the callsite and the
> > >  list of callees.  In particular it means that each callable needs
> > >  to specify its index.
> > > Wouldn't it be simpler just to have
> > > #define 1(f, f1, ...) \
> > > (likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__))
> > > #define INDIRECT_CALL_2(f, f2, f1, ...) \
> > > (likely(f == f2) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, 
> > > __VA_ARGS__))
> > > etc.?  Removing the need for INDIRECT_CALLABLE_DECLARE_* entirely.
> > Thank you for the review!
> > 
> > As some of the builtin symbols are static, we would still need some
> > macro wrappers to properly specify the scope when retpoline is enabled.
> Ah I see, it hadn't occurred to me that static callees might not be
>  available at the callsite.  Makes sense now.  In that case, have my
>  Acked-By for this patch, if you want it.

I gave a shot to your idea, and after all I think is cleaner. So I'll
send v2 with that change.

Thanks,

Paolo



Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-04 Thread Edward Cree
On 04/12/18 17:44, Paolo Abeni wrote:
> On Tue, 2018-12-04 at 17:13 +, Edward Cree wrote:
>> On 03/12/18 11:40, Paolo Abeni wrote:
>>> This header define a bunch of helpers that allow avoiding the
>>> retpoline overhead when calling builtin functions via function pointers.
>>> It boils down to explicitly comparing the function pointers to
>>> known builtin functions and eventually invoke directly the latter.
>>>
>>> The macros defined here implement the boilerplate for the above schema
>>> and will be used by the next patches.
>>>
>>> rfc -> v1:
>>>  - use branch prediction hint, as suggested by Eric
>>>
>>> Suggested-by: Eric Dumazet 
>>> Signed-off-by: Paolo Abeni 
>>> ---
>> I'm not sure I see the reason why this is done with numbers and
>>  'name ## NR', adding extra distance between the callsite and the
>>  list of callees.  In particular it means that each callable needs
>>  to specify its index.
>> Wouldn't it be simpler just to have
>> #define 1(f, f1, ...) \
>> (likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__))
>> #define INDIRECT_CALL_2(f, f2, f1, ...) \
>> (likely(f == f2) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, 
>> __VA_ARGS__))
>> etc.?  Removing the need for INDIRECT_CALLABLE_DECLARE_* entirely.
> Thank you for the review!
>
> As some of the builtin symbols are static, we would still need some
> macro wrappers to properly specify the scope when retpoline is enabled.
Ah I see, it hadn't occurred to me that static callees might not be
 available at the callsite.  Makes sense now.  In that case, have my
 Acked-By for this patch, if you want it.

> This:
> https://lore.kernel.org/lkml/cover.1543200841.git.jpoim...@redhat.com/T/#ma30f6b2aa655c99e93cfb267fef75b8fe9fca29b
>
> is possibly related to what you are planning.
Yes!  That looks like exactly the sort of machinery I need, thanks.
My idea is to build on that something that counts stats of where each
 indirect call goes, then every now and then patches in a new list of
 possible targets to an INDIRECT_CALL_*()-style jump tree, based on
 the N callees that were seen the most often.
(Such a solution could even cope with callees in modules: the jump
 table doesn't need a reference on the module because it only calls
 the function (from the module) if the function pointer is pointing at
 it, in which case (hopefully) whoever supplied the function pointer
 has a reference to the module.)
But I'm not sure I have the mad hacker skillz to make it work.

-Ed


Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-04 Thread Paolo Abeni
On Tue, 2018-12-04 at 17:13 +, Edward Cree wrote:
> On 03/12/18 11:40, Paolo Abeni wrote:
> > This header define a bunch of helpers that allow avoiding the
> > retpoline overhead when calling builtin functions via function pointers.
> > It boils down to explicitly comparing the function pointers to
> > known builtin functions and eventually invoke directly the latter.
> > 
> > The macros defined here implement the boilerplate for the above schema
> > and will be used by the next patches.
> > 
> > rfc -> v1:
> >  - use branch prediction hint, as suggested by Eric
> > 
> > Suggested-by: Eric Dumazet 
> > Signed-off-by: Paolo Abeni 
> > ---
> I'm not sure I see the reason why this is done with numbers and
>  'name ## NR', adding extra distance between the callsite and the
>  list of callees.  In particular it means that each callable needs
>  to specify its index.
> Wouldn't it be simpler just to have
> #define 1(f, f1, ...) \
> (likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__))
> #define INDIRECT_CALL_2(f, f2, f1, ...) \
> (likely(f == f2) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, 
> __VA_ARGS__))
> etc.?  Removing the need for INDIRECT_CALLABLE_DECLARE_* entirely.

Thank you for the review!

As some of the builtin symbols are static, we would still need some
macro wrappers to properly specify the scope when retpoline is enabled.

Also, I think that f1, f2... declaration before INDIRECT_CALL_ would be
uglier, as we need to list there the function names (so we would have
the same list in 2 places).

Anyway this sounds really one thing that will enrage guys on lklm.
Suggestions for alternative solutions more than welcome ;)

> PS: this has reminded me of my desire to try runtime creation of
> these kinds of branch tables with self-modifying code

This:

https://lore.kernel.org/lkml/cover.1543200841.git.jpoim...@redhat.com/T/#ma30f6b2aa655c99e93cfb267fef75b8fe9fca29b

is possibly related to what you are planning. AFAICS should work only
for global function pointers, not for e.g. function ptr inside lists,
so the above and this series should be complementary.

Cheers,

Paolo





Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-04 Thread Edward Cree
On 03/12/18 11:40, Paolo Abeni wrote:
> This header define a bunch of helpers that allow avoiding the
> retpoline overhead when calling builtin functions via function pointers.
> It boils down to explicitly comparing the function pointers to
> known builtin functions and eventually invoke directly the latter.
>
> The macros defined here implement the boilerplate for the above schema
> and will be used by the next patches.
>
> rfc -> v1:
>  - use branch prediction hint, as suggested by Eric
>
> Suggested-by: Eric Dumazet 
> Signed-off-by: Paolo Abeni 
> ---
I'm not sure I see the reason why this is done with numbers and
 'name ## NR', adding extra distance between the callsite and the
 list of callees.  In particular it means that each callable needs
 to specify its index.
Wouldn't it be simpler just to have
    #define INDIRECT_CALL_1(f, f1, ...) \
    (likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__))
    #define INDIRECT_CALL_2(f, f2, f1, ...) \
    (likely(f == f2) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, 
__VA_ARGS__))
etc.?  Removing the need for INDIRECT_CALLABLE_DECLARE_* entirely.

At least the commit message should explain the rationale for not
 doing things this way.

-Ed

PS: this has reminded me of my desire to try runtime creation of
 these kinds of branch tables with self-modifying code; is there
 any documentation on how to go about writing to kernel .text at
 runtime?  Last time I had a try at it I got very confused.


Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-04 Thread David Miller
From: Paolo Abeni 
Date: Tue, 04 Dec 2018 12:27:51 +0100

> On Mon, 2018-12-03 at 10:04 -0800, Eric Dumazet wrote:
>> On 12/03/2018 03:40 AM, Paolo Abeni wrote:
>> > This header define a bunch of helpers that allow avoiding the
>> > retpoline overhead when calling builtin functions via function pointers.
>> > It boils down to explicitly comparing the function pointers to
>> > known builtin functions and eventually invoke directly the latter.
>> > 
>> > The macros defined here implement the boilerplate for the above schema
>> > and will be used by the next patches.
>> > 
>> > rfc -> v1:
>> >  - use branch prediction hint, as suggested by Eric
>> > 
>> > Suggested-by: Eric Dumazet 
>> > Signed-off-by: Paolo Abeni 
>> > ---
>> >  include/linux/indirect_call_wrapper.h | 77 +++
>> >  1 file changed, 77 insertions(+)
>> >  create mode 100644 include/linux/indirect_call_wrapper.h
>> 
>> This needs to be discussed more broadly, please include lkml 
> 
> Agreed. @David: please let me know if you prefer a repost or a v2 with
> the expanded recipients list.

v2 probably works better and will help me better keep track of things.

Thanks for asking.


Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-04 Thread Paolo Abeni
On Mon, 2018-12-03 at 10:04 -0800, Eric Dumazet wrote:
> On 12/03/2018 03:40 AM, Paolo Abeni wrote:
> > This header define a bunch of helpers that allow avoiding the
> > retpoline overhead when calling builtin functions via function pointers.
> > It boils down to explicitly comparing the function pointers to
> > known builtin functions and eventually invoke directly the latter.
> > 
> > The macros defined here implement the boilerplate for the above schema
> > and will be used by the next patches.
> > 
> > rfc -> v1:
> >  - use branch prediction hint, as suggested by Eric
> > 
> > Suggested-by: Eric Dumazet 
> > Signed-off-by: Paolo Abeni 
> > ---
> >  include/linux/indirect_call_wrapper.h | 77 +++
> >  1 file changed, 77 insertions(+)
> >  create mode 100644 include/linux/indirect_call_wrapper.h
> 
> This needs to be discussed more broadly, please include lkml 

Agreed. @David: please let me know if you prefer a repost or a v2 with
the expanded recipients list.

> Also please include Paul Turner   in the discussion, since 
> he was
> the inventor of this idea.

I will do.

Thanks,
Paolo



Re: [PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-03 Thread Eric Dumazet



On 12/03/2018 03:40 AM, Paolo Abeni wrote:
> This header define a bunch of helpers that allow avoiding the
> retpoline overhead when calling builtin functions via function pointers.
> It boils down to explicitly comparing the function pointers to
> known builtin functions and eventually invoke directly the latter.
> 
> The macros defined here implement the boilerplate for the above schema
> and will be used by the next patches.
> 
> rfc -> v1:
>  - use branch prediction hint, as suggested by Eric
> 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Paolo Abeni 
> ---
>  include/linux/indirect_call_wrapper.h | 77 +++
>  1 file changed, 77 insertions(+)
>  create mode 100644 include/linux/indirect_call_wrapper.h


This needs to be discussed more broadly, please include lkml 

Also please include Paul Turner   in the discussion, since he 
was
the inventor of this idea.



[PATCH net-next 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin

2018-12-03 Thread Paolo Abeni
This header define a bunch of helpers that allow avoiding the
retpoline overhead when calling builtin functions via function pointers.
It boils down to explicitly comparing the function pointers to
known builtin functions and eventually invoke directly the latter.

The macros defined here implement the boilerplate for the above schema
and will be used by the next patches.

rfc -> v1:
 - use branch prediction hint, as suggested by Eric

Suggested-by: Eric Dumazet 
Signed-off-by: Paolo Abeni 
---
 include/linux/indirect_call_wrapper.h | 77 +++
 1 file changed, 77 insertions(+)
 create mode 100644 include/linux/indirect_call_wrapper.h

diff --git a/include/linux/indirect_call_wrapper.h 
b/include/linux/indirect_call_wrapper.h
new file mode 100644
index ..d6894ffc33f6
--- /dev/null
+++ b/include/linux/indirect_call_wrapper.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_INDIRECT_CALL_WRAPPER_H
+#define _LINUX_INDIRECT_CALL_WRAPPER_H
+
+#ifdef CONFIG_RETPOLINE
+
+/*
+ * INDIRECT_CALL_$NR - wrapper for indirect calls with $NR known builtin
+ *  @f: function pointer
+ *  @name: base name for builtin functions, see INDIRECT_CALLABLE_DECLARE_$NR
+ *  @__VA_ARGS__: arguments for @f
+ *
+ * Avoid retpoline overhead for known builtin, checking @f vs each of them and
+ * eventually invoking directly the builtin function. Fallback to the indirect
+ * call
+ */
+#define INDIRECT_CALL_1(f, name, ...)  \
+   ({  \
+   likely(f == name ## 1) ? name ## 1(__VA_ARGS__) :   \
+f(__VA_ARGS__);\
+   })
+#define INDIRECT_CALL_2(f, name, ...)  \
+   ({  \
+   likely(f == name ## 2) ? name ## 2(__VA_ARGS__) :   \
+INDIRECT_CALL_1(f, name, __VA_ARGS__); \
+   })
+
+/*
+ * INDIRECT_CALLABLE_DECLARE_$NR - declare $NR known builtin for
+ * INDIRECT_CALL_$NR usage
+ *  @type: return type for the builtin function
+ *  @name: base name for builtin functions, the full list is generated 
appending
+ *the numbers in the 1..@NR range
+ *  @__VA_ARGS__: arguments type list for the builtin function
+ *
+ * Builtin with higher $NR will be checked first by INDIRECT_CALL_$NR
+ */
+#define INDIRECT_CALLABLE_DECLARE_1(type, name, ...)   \
+   extern type name ## 1(__VA_ARGS__)
+#define INDIRECT_CALLABLE_DECLARE_2(type, name, ...)   \
+   extern type name ## 2(__VA_ARGS__); \
+   INDIRECT_CALLABLE_DECLARE_1(type, name, __VA_ARGS__)
+
+/*
+ * INDIRECT_CALLABLE - allow usage of a builtin function from INDIRECT_CALL_$NR
+ *  @f: builtin function name
+ *  @nr: id associated with this builtin, higher values will be checked first 
by
+ *  INDIRECT_CALL_$NR
+ *  @type: function return type
+ *  @name: base name used by INDIRECT_CALL_ to access the builtin list
+ *  @__VA_ARGS__: arguments type list for @f
+ */
+#define INDIRECT_CALLABLE(f, nr, type, name, ...)  \
+   __alias(f) type name ## nr(__VA_ARGS__)
+
+#else
+#define INDIRECT_CALL_1(f, name, ...) f(__VA_ARGS__)
+#define INDIRECT_CALL_2(f, name, ...) f(__VA_ARGS__)
+#define INDIRECT_CALLABLE_DECLARE_1(type, name, ...)
+#define INDIRECT_CALLABLE_DECLARE_2(type, name, ...)
+#define INDIRECT_CALLABLE(f, nr, type, name, ...)
+#endif
+
+/*
+ * We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is
+ * builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6
+ * alternatives
+ */
+#if IS_BUILTIN(CONFIG_IPV6)
+#define INDIRECT_CALL_INET INDIRECT_CALL_2
+#elif IS_ENABLED(CONFIG_INET)
+#define INDIRECT_CALL_INET INDIRECT_CALL_1
+#else
+#define INDIRECT_CALL_INET(...)
+#endif
+
+#endif
-- 
2.19.2