Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread Jason Vas Dias
On 18/03/2018, Jason Vas Dias  wrote:
(should have CC'ed to list, sorry)
> On 17/03/2018, Andi Kleen  wrote:
>>
>> That's quite a mischaracterization of the issue. gcc works as intended,
>> but the kernel did not correctly supply a indirect call retpoline thunk
>> to the vdso, and it just happened to work by accident with the old
>> vdso.
>>
>>>
>>>  The automated test builds should now succeed with this patch.
>>
>> How about just adding the thunk function to the vdso object instead of
>> this cheap hack?
>>
>> The other option would be to build vdso with inline thunks.
>>
>> But just disabling is completely the wrong action.
>>
>> -Andi
>>
>
> Aha! Thanks for the clarification , Andi!
>
> I will do so and resend the 2nd patch.
>
> But is everyone agreed we should accept any slowdown for the timer
> functions ? I personally don't think it is a good idea, but I will
> regenerate the patch with the thunk function and without
> the Makefile change.
>
> Thanks & Best Regards,
> Jason
>


I am wondering if it is not better to avoid the thunk being generated
and remove the Makefile patch ?

I know that changing the switch in __vdso_clock_gettime() like
this avoids the thunk :

   switch(clock) {
   case CLOCK_MONOTONIC:
if (do_monotonic(ts) == VCLOCK_NONE)
goto fallback;
break;
default:
switch (clock) {
case CLOCK_REALTIME:
if (do_realtime(ts) == VCLOCK_NONE)
   goto fallback;
break;
 case CLOCK_MONOTONIC_RAW:
 if (do_monotonic_raw(ts) == VCLOCK_NONE)
   goto fallback;
 break;
 case CLOCK_REALTIME_COARSE:
 do_realtime_coarse(ts);
 break;
 case CLOCK_MONOTONIC_COARSE:
 do_monotonic_coarse(ts);
 break;
 default:
goto fallback;
 }
 return 0;
 fallback: ...
}


So at the cost of an unnecessary extra test of the clock parameter,
the thunk is avoided .

I wonder if the whole switch should be changed to an if / else clause ?

Or, I know this might be unorthodox, but might work :
#define _CAT(V1,V2) V1##V2
#define GTOD_CLK_LABEL(CLK) _CAT(_VCG_L_,CLK)
#define  MAX_CLK 16
//^^ ??
 __vdso_clock_gettime(  ... ) { ...
 static const void *clklbl_tab[MAX_CLK]
 ={ [ CLOCK_MONOTONIC ]
  =   &_CLK_LABEL(CLOCK_MONOTONIC) ,
 [ CLOCK_MONOTONIC_RAW ]
  =   &_CLK_LABEL(CLOCK_MONOTONIC_RAW) ,
// and similarly for all clocks handled ...
};

   goto clklbl_tab[ clock & 0xf ] ;

   GTOD_CLK_LABEL(CLOCK_MONOTONIC) :
if ( do_monotonic(ts) == VCLOCK_NONE )
  goto fallback ;

   GTOD_CLK_LABEL(CLOCK_MONOTONIC_RAW) :
if ( do_monotonic_raw(ts) == VCLOCK_NONE )
  goto fallback ;

 ... // similarly for all clocks

fallback:
 return vdso_fallback_gettime(clock,ts);
}


If a restructuring like that might be acceptable (with correct tab-based
formatting) , and the VDSO can have such a table in its .BSS ,  I think it
would avoid the thunk, and have the advantage of
precomputing the jump table at compile-time, and would not require any
indirect branches, I think.

Any thoughts ?

Thanks & Best regards,
Jason




  ;

 G


Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread Jason Vas Dias
On 18/03/2018, Jason Vas Dias  wrote:
(should have CC'ed to list, sorry)
> On 17/03/2018, Andi Kleen  wrote:
>>
>> That's quite a mischaracterization of the issue. gcc works as intended,
>> but the kernel did not correctly supply a indirect call retpoline thunk
>> to the vdso, and it just happened to work by accident with the old
>> vdso.
>>
>>>
>>>  The automated test builds should now succeed with this patch.
>>
>> How about just adding the thunk function to the vdso object instead of
>> this cheap hack?
>>
>> The other option would be to build vdso with inline thunks.
>>
>> But just disabling is completely the wrong action.
>>
>> -Andi
>>
>
> Aha! Thanks for the clarification , Andi!
>
> I will do so and resend the 2nd patch.
>
> But is everyone agreed we should accept any slowdown for the timer
> functions ? I personally don't think it is a good idea, but I will
> regenerate the patch with the thunk function and without
> the Makefile change.
>
> Thanks & Best Regards,
> Jason
>


I am wondering if it is not better to avoid the thunk being generated
and remove the Makefile patch ?

I know that changing the switch in __vdso_clock_gettime() like
this avoids the thunk :

   switch(clock) {
   case CLOCK_MONOTONIC:
if (do_monotonic(ts) == VCLOCK_NONE)
goto fallback;
break;
default:
switch (clock) {
case CLOCK_REALTIME:
if (do_realtime(ts) == VCLOCK_NONE)
   goto fallback;
break;
 case CLOCK_MONOTONIC_RAW:
 if (do_monotonic_raw(ts) == VCLOCK_NONE)
   goto fallback;
 break;
 case CLOCK_REALTIME_COARSE:
 do_realtime_coarse(ts);
 break;
 case CLOCK_MONOTONIC_COARSE:
 do_monotonic_coarse(ts);
 break;
 default:
goto fallback;
 }
 return 0;
 fallback: ...
}


So at the cost of an unnecessary extra test of the clock parameter,
the thunk is avoided .

I wonder if the whole switch should be changed to an if / else clause ?

Or, I know this might be unorthodox, but might work :
#define _CAT(V1,V2) V1##V2
#define GTOD_CLK_LABEL(CLK) _CAT(_VCG_L_,CLK)
#define  MAX_CLK 16
//^^ ??
 __vdso_clock_gettime(  ... ) { ...
 static const void *clklbl_tab[MAX_CLK]
 ={ [ CLOCK_MONOTONIC ]
  =   &_CLK_LABEL(CLOCK_MONOTONIC) ,
 [ CLOCK_MONOTONIC_RAW ]
  =   &_CLK_LABEL(CLOCK_MONOTONIC_RAW) ,
// and similarly for all clocks handled ...
};

   goto clklbl_tab[ clock & 0xf ] ;

   GTOD_CLK_LABEL(CLOCK_MONOTONIC) :
if ( do_monotonic(ts) == VCLOCK_NONE )
  goto fallback ;

   GTOD_CLK_LABEL(CLOCK_MONOTONIC_RAW) :
if ( do_monotonic_raw(ts) == VCLOCK_NONE )
  goto fallback ;

 ... // similarly for all clocks

fallback:
 return vdso_fallback_gettime(clock,ts);
}


If a restructuring like that might be acceptable (with correct tab-based
formatting) , and the VDSO can have such a table in its .BSS ,  I think it
would avoid the thunk, and have the advantage of
precomputing the jump table at compile-time, and would not require any
indirect branches, I think.

Any thoughts ?

Thanks & Best regards,
Jason




  ;

 G


Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread Andi Kleen
On Sat, Mar 17, 2018 at 02:29:34PM +, jason.vas.d...@gmail.com wrote:
>  This patch allows compilation to succeed with compilers that support 
> -DRETPOLINE -
>  it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908
> 
>  Apparently the GCC retpoline implementation has a limitation that it 
> cannot
>  handle switch statements with more than 5 clauses, which 
> vclock_gettime.c's
>  __vdso_clock_gettime function now conts.

That's quite a mischaracterization of the issue. gcc works as intended,
but the kernel did not correctly supply a indirect call retpoline thunk
to the vdso, and it just happened to work by accident with the old
vdso.

> 
>  The automated test builds should now succeed with this patch.

How about just adding the thunk function to the vdso object instead of
this cheap hack? 

The other option would be to build vdso with inline thunks.

But just disabling is completely the wrong action.

-Andi


> 
> 
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 1943aeb..cb64e10 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 
> -fasynchronous-unwind-tables -m64 \
> -fno-omit-frame-pointer -foptimize-sibling-calls \
> -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
>  
> -$(vobjs): KBUILD_CFLAGS := $(filter-out 
> $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
> +$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) 
> $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)
>  
>  #
>  # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
> @@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out 
> -mcmodel=kernel,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
> +KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) 
> -DRETPOLINE,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
>  KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
>  KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
> 


Re: [PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread Andi Kleen
On Sat, Mar 17, 2018 at 02:29:34PM +, jason.vas.d...@gmail.com wrote:
>  This patch allows compilation to succeed with compilers that support 
> -DRETPOLINE -
>  it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908
> 
>  Apparently the GCC retpoline implementation has a limitation that it 
> cannot
>  handle switch statements with more than 5 clauses, which 
> vclock_gettime.c's
>  __vdso_clock_gettime function now conts.

That's quite a mischaracterization of the issue. gcc works as intended,
but the kernel did not correctly supply a indirect call retpoline thunk
to the vdso, and it just happened to work by accident with the old
vdso.

> 
>  The automated test builds should now succeed with this patch.

How about just adding the thunk function to the vdso object instead of
this cheap hack? 

The other option would be to build vdso with inline thunks.

But just disabling is completely the wrong action.

-Andi


> 
> 
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 1943aeb..cb64e10 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 
> -fasynchronous-unwind-tables -m64 \
> -fno-omit-frame-pointer -foptimize-sibling-calls \
> -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
>  
> -$(vobjs): KBUILD_CFLAGS := $(filter-out 
> $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
> +$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) 
> $(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)
>  
>  #
>  # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
> @@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out 
> -mcmodel=kernel,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
> +KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) 
> -DRETPOLINE,$(KBUILD_CFLAGS_32))
>  KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
>  KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
>  KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
> 


[PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread jason . vas . dias
 This patch allows compilation to succeed with compilers that support 
-DRETPOLINE -
 it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908

 Apparently the GCC retpoline implementation has a limitation that it 
cannot
 handle switch statements with more than 5 clauses, which 
vclock_gettime.c's
 __vdso_clock_gettime function now contains.

 The automated test builds should now succeed with this patch.


diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aeb..cb64e10 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 
-fasynchronous-unwind-tables -m64 \
-fno-omit-frame-pointer -foptimize-sibling-calls \
-DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
 
-$(vobjs): KBUILD_CFLAGS := $(filter-out 
$(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
+$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) 
$(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)
 
 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
@@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out 
-mcmodel=kernel,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) 
-DRETPOLINE,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
 KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
 KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)


[PATCH v4.16-rc5 2/2] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-17 Thread jason . vas . dias
 This patch allows compilation to succeed with compilers that support 
-DRETPOLINE -
 it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908

 Apparently the GCC retpoline implementation has a limitation that it 
cannot
 handle switch statements with more than 5 clauses, which 
vclock_gettime.c's
 __vdso_clock_gettime function now contains.

 The automated test builds should now succeed with this patch.


diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aeb..cb64e10 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -76,7 +76,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 
-fasynchronous-unwind-tables -m64 \
-fno-omit-frame-pointer -foptimize-sibling-calls \
-DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
 
-$(vobjs): KBUILD_CFLAGS := $(filter-out 
$(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
+$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) 
$(RETPOLINE_CFLAGS) -DRETPOLINE,$(KBUILD_CFLAGS)) $(CFL)
 
 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
@@ -143,6 +143,7 @@ KBUILD_CFLAGS_32 := $(filter-out 
-mcmodel=kernel,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS) 
-DRETPOLINE,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
 KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
 KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)