Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-13 Thread Florian Weimer

On 09/13/2018 09:35 PM, Andy Lutomirski wrote:


Somewhat special, yes, but not overly so, and not in the type-polymorphic 
sense.  We can't give applications direct access to the vDSO implementation 
because the kernel does not know about the userspace errno variable.  We do 
that for time() on x86_64, where applications call into the vDSO directly, 
bypassing glibc completely after binding.


If the vDSO adds special helpers for CLOCK_MONOTONIC and CLOCK_REALTIME, I 
think we can reasonably safely promise that they never fail. (seccomp can 
obviously break that promise if there’s no TSC, but I think that seccomp users 
who do that get to keep both pieces.)


I agree; I thought about the same thing.  We already do not return 
EFAULT for invalid pointers, for obvious reasons.  And if the clock ID 
is fixed, the EINVAL error is impossible.


That would shave off a few nanoseconds more if the calling convention is 
identical to what glibc exposes to applications.  If the vDSO is not 
available or the symbol is missing, we can provide an implementation 
based on the current clock_gettime in glibc.
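
A minimal sketch of such a wrapper (assuming a hypothetical, never-failing
__vdso_clock_gettime_monotonic export; the fallback path is ordinary
syscall dispatch, not actual glibc code):

#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Resolved from the vDSO at startup; NULL if the symbol is missing. */
static int (*vdso_cgt_mono)(struct timespec *);

int my_clock_gettime_monotonic(struct timespec *ts)
{
	if (vdso_cgt_mono)
		return vdso_cgt_mono(ts);	/* fast path, cannot fail */
	/* Fallback: the real syscall, with the usual errno handling. */
	return syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}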


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-13 Thread Andy Lutomirski



> On Sep 13, 2018, at 12:07 PM, Florian Weimer  wrote:
> 
> On 09/13/2018 05:22 PM, Andy Lutomirski wrote:
>>> On Sep 13, 2018, at 1:07 AM, Florian Weimer  wrote:
>>> 
>>> On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>>>>> The multiplexer interfaces need much more surgery and talking about futex,
>>>>> we'd need to sit down with quite some people and identify the things they
>>>>> actually care about before just splitting it up and keeping the existing
>>>>> overloaded trainwreck the same.
>>>>> 
>>>> There’s also the issue of how much the speedup matters. For futex, maybe a 
>>>> better interface saves 3ns, but a futex syscall is hundreds of ns. 
>>>> clock_gettime() is called at high frequency and can be ~25ns. Saving a few 
>>>> ns is a bigger deal.
>>> 
>>> My concern is that the userspace system call wrappers currently do not know 
>>> how many arguments the individual operations take and what types the 
>>> arguments have (hence the “type-polymorphic” nature I mentioned). This 
>>> could be a problem for on-stack argument passing (where you might read 
>>> values beyond the end of the stack, and glibc avoids that most of the time 
>>> by having enough cruft on the stack), and for architectures which pass 
>>> pointers and integers in different registers (like some m68k ABIs do for 
>>> the return value).
> 
>> Isn’t clock_gettime already special because of the vDSO entry point, though?
> 
> Somewhat special, yes, but not overly so, and not in the type-polymorphic 
> sense.  We can't give applications direct access to the vDSO implementation 
> because the kernel does not know about the userspace errno variable.  We do 
> that for time() on x86_64, where applications call into the vDSO directly, 
> bypassing glibc completely after binding.

If the vDSO adds special helpers for CLOCK_MONOTONIC and CLOCK_REALTIME, I 
think we can reasonably safely promise that they never fail. (seccomp can 
obviously break that promise if there’s no TSC, but I think that seccomp users 
who do that get to keep both pieces.)
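
(Sketch: with a hypothetical __vdso_clock_gettime_monotonic helper, a
never-failing entry point needs no errno translation, so its result can be
used unchecked:)

#include <time.h>

extern int __vdso_clock_gettime_monotonic(struct timespec *ts);

unsigned long long now_ns(void)
{
	struct timespec ts;

	__vdso_clock_gettime_monotonic(&ts);	/* promised never to fail */
	return ts.tv_sec * 1000000000ull + ts.tv_nsec;
}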


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-13 Thread Florian Weimer

On 09/13/2018 05:22 PM, Andy Lutomirski wrote:




On Sep 13, 2018, at 1:07 AM, Florian Weimer  wrote:

On 09/12/2018 07:11 PM, Andy Lutomirski wrote:

The multiplexer interfaces need much more surgery and talking about futex,
we'd need to sit down with quite some people and identify the things they
actually care about before just splitting it up and keeping the existing
overloaded trainwreck the same.


There’s also the issue of how much the speedup matters. For futex, maybe a 
better interface saves 3ns, but a futex syscall is hundreds of ns. 
clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns 
is a bigger deal.


My concern is that the userspace system call wrappers currently do not know how 
many arguments the individual operations take and what types the arguments have 
(hence the “type-polymorphic” nature I mentioned). This could be a problem for 
on-stack argument passing (where you might read values beyond the end of the 
stack, and glibc avoids that most of the time by having enough cruft on the 
stack), and for architectures which pass pointers and integers in different 
registers (like some m68k ABIs do for the return value).



Isn’t clock_gettime already special because of the vDSO entry point, though?


Somewhat special, yes, but not overly so, and not in the 
type-polymorphic sense.  We can't give applications direct access to 
the vDSO implementation because the kernel does not know about the 
userspace errno variable.  We do that for time() on x86_64, where 
applications call into the vDSO directly, bypassing glibc completely 
after binding.


I suspect most Linux libcs that know about the vDSO at all have generic 
vsyscall support, just like they have generic support for plain system 
calls.
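
(A sketch of where generic vDSO support starts: the kernel passes the vDSO's
load address to every process in the auxiliary vector, and the libc walks
the ELF dynamic symbol table from there. Only the discovery step is shown;
the symbol-table walk is elided.)

#include <elf.h>
#include <sys/auxv.h>

void *vdso_ehdr(void)
{
	/* Address of the vDSO's ELF header, or 0 if no vDSO is mapped. */
	return (void *)getauxval(AT_SYSINFO_EHDR);
}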


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-13 Thread Andy Lutomirski



> On Sep 13, 2018, at 1:07 AM, Florian Weimer  wrote:
> 
> On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>>> The multiplexer interfaces need much more surgery and talking about futex,
>>> we'd need to sit down with quite some people and identify the things they
>>> actually care about before just splitting it up and keeping the existing
>>> overloaded trainwreck the same.
>>> 
>> There’s also the issue of how much the speedup matters. For futex, maybe a 
>> better interface saves 3ns, but a futex syscall is hundreds of ns. 
>> clock_gettime() is called at high frequency and can be ~25ns. Saving a few 
>> ns is a bigger deal.
> 
> My concern is that the userspace system call wrappers currently do not know 
> how many arguments the individual operations take and what types the 
> arguments have (hence the “type-polymorphic” nature I mentioned). This could 
> be a problem for on-stack argument passing (where you might read values 
> beyond the end of the stack, and glibc avoids that most of the time by having 
> enough cruft on the stack), and for architectures which pass pointers and 
> integers in different registers (like some m68k ABIs do for the return value).
> 
> 

Isn’t clock_gettime already special because of the vDSO entry point, though?


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-13 Thread Florian Weimer

On 09/12/2018 07:11 PM, Andy Lutomirski wrote:

The multiplexer interfaces need much more surgery and talking about futex,
we'd need to sit down with quite some people and identify the things they
actually care about before just splitting it up and keeping the existing
overloaded trainwreck the same.


There’s also the issue of how much the speedup matters. For futex, maybe a 
better interface saves 3ns, but a futex syscall is hundreds of ns. 
clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns 
is a bigger deal.


My concern is that the userspace system call wrappers currently do not 
know how many arguments the individual operations take and what types 
the arguments have (hence the “type-polymorphic” nature I mentioned). 
This could be a problem for on-stack argument passing (where you might 
read values beyond the end of the stack, and glibc avoids that most of 
the time by having enough cruft on the stack), and for architectures 
which pass pointers and integers in different registers (like some m68k 
ABIs do for the return value).
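
(A sketch of the on-stack over-read problem with a glibc-style generic
wrapper; raw_syscall() is a hypothetical stand-in for the arch-specific
trap. The wrapper must load a fixed number of argument slots because it
cannot know how many the particular operation consumes:)

#include <stdarg.h>

extern long raw_syscall(long nr, long, long, long, long, long, long);

long generic_syscall(long nr, ...)
{
	long a[6];
	va_list ap;
	int i;

	va_start(ap, nr);
	for (i = 0; i < 6; i++)
		a[i] = va_arg(ap, long);	/* may read beyond the caller's
						   last real argument */
	va_end(ap);
	return raw_syscall(nr, a[0], a[1], a[2], a[3], a[4], a[5]);
}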


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-12 Thread Andy Lutomirski



> On Sep 12, 2018, at 7:29 AM, Thomas Gleixner  wrote:
> 
>> On Wed, 12 Sep 2018, Florian Weimer wrote:
>>> On 09/12/2018 04:17 PM, Thomas Gleixner wrote:
 On Wed, 12 Sep 2018, Florian Weimer wrote:
 Does this mean glibc can keep using a single vDSO entrypoint, the one we
 have today?
>>> 
>>> We have no intention to change that.
>> 
>> Okay, I was wondering because Andy seemed to have proposed just that.
>> 
>>> But we surely could provide separate entry points as an extra to avoid a
>>> bunch of conditionals.
>> 
>> We could adjust to that, but the benefit would be long-term because it's an
>> ABI change for glibc, and they tend to take a long time to propagate.
>> 
>> But I must say that clock_gettime is an odd place to start.  I would have
>> expected any of the type-polymorphic multiplexer interfaces (fcntl, ioctl,
>> ptrace, futex) to be a more natural starting point. 8-)
> 
> Well, the starting point of this was to provide CLOCK_TAI support in the
> vdso. clock_gettime() in the vdso is about a factor of 10 faster than the
> real syscall, and clock_gettime() is a pretty hot function in some workloads.
> 
> Andy then noticed that some conditionals could be avoided entirely by using
> a different entry point and offered one along with a 10% speedup. We don't
> have to go there, but we can.
> 
> The multiplexer interfaces need much more surgery and talking about futex,
> we'd need to sit down with quite some people and identify the things they
> actually care about before just splitting it up and keeping the existing
> overloaded trainwreck the same.
> 

There’s also the issue of how much the speedup matters. For futex, maybe a 
better interface saves 3ns, but a futex syscall is hundreds of ns. 
clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns 
is a bigger deal.
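
(For concreteness, the per-call cost is easy to estimate with a loop like
this sketch; the absolute numbers depend on the clocksource, CPU, and
mitigations:)

#include <stdio.h>
#include <time.h>

int main(void)
{
	enum { N = 10000000 };
	struct timespec t0, t1, ts;
	double ns;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < N; i++)
		clock_gettime(CLOCK_MONOTONIC, &ts);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns/call\n", ns / N);
	return 0;
}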

Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-12 Thread Thomas Gleixner
On Wed, 12 Sep 2018, Florian Weimer wrote:
> On 09/12/2018 04:17 PM, Thomas Gleixner wrote:
> > On Wed, 12 Sep 2018, Florian Weimer wrote:
> > > Does this mean glibc can keep using a single vDSO entrypoint, the one we
> > > have today?
> > 
> > We have no intention to change that.
> 
> Okay, I was wondering because Andy seemed to have proposed just that.
> 
> > But we surely could provide separate entry points as an extra to avoid a
> > bunch of conditionals.
> 
> We could adjust to that, but the benefit would be long-term because it's an
> ABI change for glibc, and they tend to take a long time to propagate.
> 
> But I must say that clock_gettime is an odd place to start.  I would have
> expected any of the type-polymorphic multiplexer interfaces (fcntl, ioctl,
> ptrace, futex) to be a more natural starting point. 8-)

Well, the starting point of this was to provide CLOCK_TAI support in the
vdso. clock_gettime() in the vdso is about a factor of 10 faster than the
real syscall, and clock_gettime() is a pretty hot function in some workloads.

Andy then noticed that some conditionals could be avoided entirely by using
a different entry point and offered one along with a 10% speedup. We don't
have to go there, but we can.

The multiplexer interfaces need much more surgery and talking about futex,
we'd need to sit down with quite some people and identify the things they
actually care about before just splitting it up and keeping the existing
overloaded trainwreck the same.

Thanks,

tglx





Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-12 Thread Florian Weimer

On 09/12/2018 04:17 PM, Thomas Gleixner wrote:

On Wed, 12 Sep 2018, Florian Weimer wrote:

On 09/09/2018 10:05 PM, Thomas Gleixner wrote:

See the patch below. It's integrating TAI without slowing down everything
and it definitely does not result in indirect calls.

On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
on the uarch, and I also observed compiler-version-dependent variations.


Does this mean glibc can keep using a single vDSO entrypoint, the one we
have today?


We have no intention to change that.


Okay, I was wondering because Andy seemed to have proposed just that.


But we surely could provide separate entry points as an extra to avoid a
bunch of conditionals.


We could adjust to that, but the benefit would be long-term because it's 
an ABI change for glibc, and they tend to take a long time to propagate.


But I must say that clock_gettime is an odd place to start.  I would 
have expected any of the type-polymorphic multiplexer interfaces (fcntl, 
ioctl, ptrace, futex) to be a more natural starting point. 8-)
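
("Type-polymorphic" meaning one entry point whose third argument changes
type with the command, as in this sketch of two standard calls:)

#include <fcntl.h>
#include <sys/ioctl.h>
#include <termios.h>

void multiplexer_examples(int fd)
{
	struct winsize ws;

	ioctl(fd, TIOCGWINSZ, &ws);	/* third argument: struct pointer */
	fcntl(fd, F_SETFL, O_NONBLOCK);	/* third argument: plain int */
}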


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-12 Thread Thomas Gleixner
On Wed, 12 Sep 2018, Florian Weimer wrote:
> On 09/09/2018 10:05 PM, Thomas Gleixner wrote:
> > See the patch below. It's integrating TAI without slowing down everything
> > and it definitely does not result in indirect calls.
> > 
> > On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
> > by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
> > on the uarch, and I also observed compiler-version-dependent variations.
> 
> Does this mean glibc can keep using a single vDSO entrypoint, the one we
> have today?

We have no intention to change that.

But we surely could provide separate entry points as an extra to avoid a
bunch of conditionals.

Thanks,

tglx



Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-12 Thread Florian Weimer

On 09/09/2018 10:05 PM, Thomas Gleixner wrote:

See the patch below. It's integrating TAI without slowing down everything
and it definitely does not result in indirect calls.

On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
on the uarch, and I also observed compiler-version-dependent variations.


Does this mean glibc can keep using a single vDSO entrypoint, the one we 
have today?


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-10 Thread Thomas Gleixner
On Sun, 9 Sep 2018, Thomas Gleixner wrote:
>  #ifdef BUILD_VDSO32_64
>  typedef u64 gtod_long_t;
>  #else
>  typedef unsigned long gtod_long_t;
>  #endif
> +
> +struct vgtod_ts {
> + gtod_long_t sec;

and actually this wants to become u64 unconditionally as we need to provide
the full seconds even on 32bit for the upcoming y2038 support. We still
have to truncate it for the current 32bit interface, but the core code
can be made ready now.
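
(A sketch of the resulting layout, assuming a matching nsec field alongside
it; 32-bit callers would still truncate sec at the current ABI boundary:)

struct vgtod_ts {
	u64	sec;	/* full 64-bit seconds on all ABIs, y2038-ready */
	u64	nsec;
};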

Thanks,

tglx


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-09 Thread Thomas Gleixner
On Fri, 31 Aug 2018, Andy Lutomirski wrote:

> (Hi, Florian!)
> 
> On Fri, Aug 31, 2018 at 6:59 PM, Matt Rickard  wrote:
> > Process clock_gettime(CLOCK_TAI) in vDSO.
> > This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:
> >
> >              nanoseconds
> >    before   after   clockname
> >    ------   -----   ---------
> >       233      87   CLOCK_TAI
> >        96      93   CLOCK_REALTIME
> >        88      87   CLOCK_MONOTONIC
> 
> Are you sure you did this right?  With the clocksource set to TSC
> (which is the only reasonable choice unless KVM has seriously cleaned
> up its act), with retpolines enabled, I get 24ns for CLOCK_MONOTONIC
> without your patch and 32ns with your patch.  And there is indeed a
> retpoline in the disassembled output:
> 
>   e5:   e8 07 00 00 00        callq  f1 <__vdso_clock_gettime+0x31>
>   ea:   f3 90                 pause
>   ec:   0f ae e8              lfence
>   ef:   eb f9                 jmp    ea <__vdso_clock_gettime+0x2a>
>   f1:   48 89 04 24           mov    %rax,(%rsp)
>   f5:   c3                    retq
> 
> You're probably going to have to set -fno-jump-tables or do something
> clever like adding a whole array of (seconds, nsec) in gtod and
> indexing that array by the clock id.

See the patch below. It's integrating TAI without slowing down everything
and it definitely does not result in indirect calls.

On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
on the uarch, and I also observed compiler-version-dependent variations.

Thanks,

tglx

--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -203,39 +203,18 @@ notrace static inline u64 vgetsns(int *mode)
return v * gtod->mult;
 }
 
-/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
-notrace static int __always_inline do_realtime(struct timespec *ts)
+notrace static int __always_inline do_hres(struct timespec *ts, clockid_t clk)
 {
-   unsigned long seq;
-   u64 ns;
+   struct vgtod_ts *base = &gtod->basetime[clk & VGTOD_HRES_MASK];
+   unsigned int seq;
int mode;
-
-   do {
-   seq = gtod_read_begin(gtod);
-   mode = gtod->vclock_mode;
-   ts->tv_sec = gtod->wall_time_sec;
-   ns = gtod->wall_time_snsec;
-   ns += vgetsns(&mode);
-   ns >>= gtod->shift;
-   } while (unlikely(gtod_read_retry(gtod, seq)));
-
-   ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-   ts->tv_nsec = ns;
-
-   return mode;
-}
-
-notrace static int __always_inline do_monotonic(struct timespec *ts)
-{
-   unsigned long seq;
u64 ns;
-   int mode;
 
do {
seq = gtod_read_begin(gtod);
mode = gtod->vclock_mode;
-   ts->tv_sec = gtod->monotonic_time_sec;
-   ns = gtod->monotonic_time_snsec;
+   ts->tv_sec = base->sec;
+   ns = base->nsec;
ns += vgetsns(&mode);
ns >>= gtod->shift;
} while (unlikely(gtod_read_retry(gtod, seq)));
@@ -246,58 +225,50 @@ notrace static int __always_inline do_monotonic(struct timespec *ts)
return mode;
 }
 
-notrace static void do_realtime_coarse(struct timespec *ts)
+notrace static void do_coarse(struct timespec *ts, clockid_t clk)
 {
+   struct vgtod_ts *base = &gtod->basetime[clk];
unsigned long seq;
-   do {
-   seq = gtod_read_begin(gtod);
-   ts->tv_sec = gtod->wall_time_coarse_sec;
-   ts->tv_nsec = gtod->wall_time_coarse_nsec;
-   } while (unlikely(gtod_read_retry(gtod, seq)));
-}
 
-notrace static void do_monotonic_coarse(struct timespec *ts)
-{
-   unsigned long seq;
do {
seq = gtod_read_begin(gtod);
-   ts->tv_sec = gtod->monotonic_time_coarse_sec;
-   ts->tv_nsec = gtod->monotonic_time_coarse_nsec;
+   ts->tv_sec = base->sec;
+   ts->tv_nsec = base->nsec;
} while (unlikely(gtod_read_retry(gtod, seq)));
 }
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
-   switch (clock) {
-   case CLOCK_REALTIME:
-   if (do_realtime(ts) == VCLOCK_NONE)
-   goto fallback;
-   break;
-   case CLOCK_MONOTONIC:
-   if (do_monotonic(ts) == VCLOCK_NONE)
-   goto fallback;
-   break;
-   case CLOCK_REALTIME_COARSE:
-   do_realtime_coarse(ts);
-   break;
-   case CLOCK_MONOTONIC_COARSE:
-   do_monotonic_coarse(ts);
-   break;
-   default:
-   goto fallback;
-   }
+   unsigned int msk;
 
-   return 0;
-fallback:
+   /* Sort out negative (CPU/FD) and invalid clocks */
+   if (unlikely((unsigned int) clock >= MAX_CLOCKS))
+   return vdso_fallback_gettime(clock, ts);
+
+   /*

Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-01 Thread Andy Lutomirski
On Sat, Sep 1, 2018 at 2:33 AM, Florian Weimer  wrote:
> On 09/01/2018 05:39 AM, Andy Lutomirski wrote:
>>
>> Florian, do you think
>> glibc would be willing to add some magic to turn
>> clock_gettime(CLOCK_MONOTONIC, t) into
>> __vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?
>
>
> What's the goal here?  Turn the indirect call/conditional jump/indirect call
> sequence into a single indirect call, purely for performance reasons?

Almost.  It's to bypass some of the branches in
__vdso_clock_gettime(), which is supposed to be very fast.  AFAIK most
user code that uses clock_gettime() passes a constant for the first
argument, and we can squeeze out some performance by optimizing that
case.  The indirect branches internal to the vDSO are a separate issue
and should be solved separately.
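
(One way glibc could implement that "magic": a sketch using a hypothetical
__vdso_clock_gettime_monotonic symbol, not an actual glibc macro. The
__builtin_constant_p test folds away at compile time:)

#include <time.h>

extern int __vdso_clock_gettime_monotonic(struct timespec *ts);

#define clock_gettime(id, ts)					\
	(__builtin_constant_p(id) && (id) == CLOCK_MONOTONIC	\
		? __vdso_clock_gettime_monotonic(ts)		\
		: (clock_gettime)(id, ts))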

(It's really too bad that x86 doesn't have a 64-bit call instruction.
If it did, then the PLT could get rewritten at dynamic link time to
avoid indirect calls entirely, and presumably glibc could use the same
technique to call into the vDSO without indirect calls.)


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-09-01 Thread Florian Weimer

On 09/01/2018 05:39 AM, Andy Lutomirski wrote:

Florian, do you think
glibc would be willing to add some magic to turn
clock_gettime(CLOCK_MONOTONIC, t) into
__vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?


What's the goal here?  Turn the indirect call/conditional jump/indirect 
call sequence into a single indirect call, purely for performance reasons?


Thanks,
Florian


Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-08-31 Thread Andy Lutomirski
(Hi, Florian!)

On Fri, Aug 31, 2018 at 6:59 PM, Matt Rickard  wrote:
> Process clock_gettime(CLOCK_TAI) in vDSO.
> This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:
>
>              nanoseconds
>    before   after   clockname
>    ------   -----   ---------
>       233      87   CLOCK_TAI
>        96      93   CLOCK_REALTIME
>        88      87   CLOCK_MONOTONIC

Are you sure you did this right?  With the clocksource set to TSC
(which is the only reasonable choice unless KVM has seriously cleaned
up its act), with retpolines enabled, I get 24ns for CLOCK_MONOTONIC
without your patch and 32ns with your patch.  And there is indeed a
retpoline in the disassembled output:

  e5:   e8 07 00 00 00        callq  f1 <__vdso_clock_gettime+0x31>
  ea:   f3 90                 pause
  ec:   0f ae e8              lfence
  ef:   eb f9                 jmp    ea <__vdso_clock_gettime+0x2a>
  f1:   48 89 04 24           mov    %rax,(%rsp)
  f5:   c3                    retq

You're probably going to have to set -fno-jump-tables or do something
clever like adding a whole array of (seconds, nsec) in gtod and
indexing that array by the clock id.
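
(A sketch of that "array indexed by the clock id" layout, using names close
to the ones tglx's follow-up patch in this thread settles on; VGTOD_BASES is
assumed to cover all vDSO-served clock ids:)

struct vgtod_ts {
	u64	sec;
	u64	nsec;
};

struct vsyscall_gtod_data {
	/* ... */
	struct vgtod_ts	basetime[VGTOD_BASES];	/* one slot per clock id */
};

/* Reader side: the switch over clock ids collapses into an array index. */
static void read_base(const struct vsyscall_gtod_data *gtod, clockid_t clk,
		      u64 *sec, u64 *nsec)
{
	const struct vgtod_ts *base = &gtod->basetime[clk];

	*sec  = base->sec;
	*nsec = base->nsec;	/* hres readers add the vgetsns() delta */
}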

Meanwhile, I wrote the following trivial patch to add a
__vdso_clock_gettime_monotonic export.  It runs in 21ns, and I suspect
that the speedup is even a bit bigger when cache-cold because it
avoids some branches.  What do you all think?  Florian, do you think
glibc would be willing to add some magic to turn
clock_gettime(CLOCK_MONOTONIC, t) into
__vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 91ed1bb2a3bb..4f22e9cb97a5 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -319,6 +319,14 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 int clock_gettime(clockid_t, struct timespec *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));
 
+notrace int __vdso_clock_gettime_monotonic(struct timespec *ts)
+{
+	if (likely(do_monotonic(ts) != VCLOCK_NONE))
+		return 0;
+
+	return vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
+}
+
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	if (likely(tv != NULL)) {
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..28e23cbc02c9 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -15,6 +15,11 @@
  * This controls what userland symbols we export from the vDSO.
  */
 VERSION {
+	LINUX_4.19 {
+	global:
+		__vdso_clock_gettime_monotonic;
+	};
+
 	LINUX_2.6 {
 	global:
 		clock_gettime;
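
(How a client might bind the new symbol at runtime; a sketch using GNU
extensions, with the version string matching the LINUX_4.19 block above.
On glibc the vDSO shows up in the link map as "linux-vdso.so.1".)

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <time.h>

typedef int (*cgt_mono_fn)(struct timespec *);

static cgt_mono_fn resolve_cgt_mono(void)
{
	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_NOLOAD);

	if (!vdso)
		return NULL;
	return (cgt_mono_fn)dlvsym(vdso, "__vdso_clock_gettime_monotonic",
				   "LINUX_4.19");
}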


[PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO

2018-08-31 Thread Matt Rickard
Process clock_gettime(CLOCK_TAI) in vDSO.
This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:

              nanoseconds
    before   after   clockname
    ------   -----   ---------
       233      87   CLOCK_TAI
        96      93   CLOCK_REALTIME
        88      87   CLOCK_MONOTONIC

Signed-off-by: Matt Rickard 
---
 arch/x86/entry/vdso/vclock_gettime.c    | 25 +++++++++++++++++++++++++
 arch/x86/entry/vsyscall/vsyscall_gtod.c |  2 ++
 arch/x86/include/asm/vgtod.h            |  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index f19856d95c60..91ed1bb2a3bb 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -246,6 +246,27 @@ notrace static int __always_inline do_monotonic(struct timespec *ts)
return mode;
 }
 
+notrace static int __always_inline do_tai(struct timespec *ts)
+{
+   unsigned long seq;
+   u64 ns;
+   int mode;
+
+   do {
+   seq = gtod_read_begin(gtod);
+   mode = gtod->vclock_mode;
+   ts->tv_sec = gtod->tai_time_sec;
+   ns = gtod->wall_time_snsec;
+   ns += vgetsns(&mode);
+   ns >>= gtod->shift;
+   } while (unlikely(gtod_read_retry(gtod, seq)));
+
+   ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+   ts->tv_nsec = ns;
+
+   return mode;
+}
+
 notrace static void do_realtime_coarse(struct timespec *ts)
 {
unsigned long seq;
@@ -277,6 +298,10 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
if (do_monotonic(ts) == VCLOCK_NONE)
goto fallback;
break;
+   case CLOCK_TAI:
+   if (do_tai(ts) == VCLOCK_NONE)
+   goto fallback;
+   break;
case CLOCK_REALTIME_COARSE:
do_realtime_coarse(ts);
break;
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index e1216dd95c04..d61392fe17f6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -53,6 +53,8 @@ void update_vsyscall(struct timekeeper *tk)
vdata->monotonic_time_snsec = tk->tkr_mono.xtime_nsec
+ ((u64)tk->wall_to_monotonic.tv_nsec
<< tk->tkr_mono.shift);
+   vdata->tai_time_sec = tk->xtime_sec
+   + tk->tai_offset;
while (vdata->monotonic_time_snsec >=
(((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
vdata->monotonic_time_snsec -=
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9f0449..adc9f7b20b9c 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -32,6 +32,7 @@ struct vsyscall_gtod_data {
gtod_long_t wall_time_coarse_nsec;
gtod_long_t monotonic_time_coarse_sec;
gtod_long_t monotonic_time_coarse_nsec;
+   gtod_long_t tai_time_sec;
 
int tz_minuteswest;
int tz_dsttime;

