Re: VDSO on amd64

2021-11-26 Thread Ed Maste
On Thu, 25 Nov 2021 at 00:36, Kurt Jaeger  wrote:
>
> Eleven years ago Giuseppe Cocomazzi posted this:
>
> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031553.html
>
> vdso and shared page patch

I see the patch generated a couple of responses on the list when it
was posted, including a plan to follow up with a detailed review that
appears not to have happened. That's unfortunate, and as a project we
definitely have a problem with contributions not being addressed in a
timely manner.

One of the goals of the Git working group, and Warner's newer
development practices working group, is to make it easier to handle
contributions. Of course, contributions can be overlooked regardless of
whether they arrive as patches on a mailing list, attachments to a Bugzilla
PR, Phabricator reviews, or GitHub or GitLab pull/merge requests. There isn't
a technical solution that will fully address this, but we can reduce friction
as much as possible.



Re: VDSO on amd64

2021-11-25 Thread Shawn Webb
On Thu, Nov 25, 2021 at 09:53:19PM +0200, Konstantin Belousov wrote:
> On Thu, Nov 25, 2021 at 09:35:53AM +, David Chisnall wrote:
> > Great news!
> > 
> > Note that your example of throwing an exception from a signal handler works
> > because the signal is delivered during a system call.  The compiler
> > generates correct unwind tables for calls because any call may throw.
> The syscalls themselves are not annotated; I am considering fixing this after
> the vdso lands.
> 
> > 
> > If you did something like a division by zero to get a SIGFPE or a
> > null-pointer dereference to get a SIGSEGV then the throw would probably not
> > work (or, rather, would be delivered to the right place but might corrupt
> > some register state).  Neither clang nor GCC currently supports non-call
> > exceptions by default.
> Well, yes, part of it was that the signal was synchronous.  I was always
> curious how good the unwind tables generated by -fasynchronous-unwind-tables
> are in this regard.
> 
> But still, the fact that the unwinder stepped over the signal frame amused me.
> 
> > 
> > This mechanism is more useful for Java VMs and similar.  Some Linux-based
> > implementations (including Android) use this to avoid null-pointer checks in
> > Java.
> > 
> > The VDSO mechanism in Linux is also used for providing some syscall
> > implementations.  In particular, getting the current approximate time and
> > getting the current CPU (either by reading from the VDSO's data section or
> > by doing a real syscall, without userspace knowing which). It also provides
> > the syscall stub that is used for the kernel transition for all 'real'
> > syscalls.  This doesn't matter so much on amd64, but on i386 it lets them
> > select between int 80h, syscall or sysenter, depending on what the hardware
> > supports.
> > 
> > 
> > A few questions about future plans:
> > 
> >  - Do you have plans to extend the VDSO to provide system call entry points
> > and fast-path syscalls?  It would be really nice if we could move all of the
> > libsyscalls bits into the VDSO so that any compartmentalisation mechanism
> > that wanted to interpose on syscalls just needed to provide a replacement
> > for the VDSO.
> No.
> 
> Moving the syscall entry point to the VDSO is pointless:
> - it would add one more level of indirection before SYSCALL,
> - we do not have a slow syscall entry point on amd64, so there is nothing to
>   choose between.
> 
> And optimizing 32bit binaries (where we could implement a slightly faster
> syscall entry) is no longer important.
> 
> Basically, we do not have to split libc into libc proper and a VDSO, as
> Linux does. We can implement features spanning the syscall boundary from
> both sides, because libc and the kernel are developed under the same
> project.  Usermode timehands, fast signal blocks, and the upcoming rseq
> support, to name a few, all benefit from this model.
> 
> For us, the VDSO is only needed to provide the unwind annotations on the
> signal trampoline, in the way expected by unwinders.
> 
> > 
> >  - It looks as if the Linux VDSO mechanism isn't yet using this.  Do you
> > plan on moving it over?
> No.
> 
> > 
> >  - I can't quite tell from kern_sharedpage.c (this file has almost no
> > comments) - is the userspace mapping of the VDSO randomised?  This has been
> > done on Linux for a while because the VDSO is an incredibly high-value
> > target for code reuse attacks (it can do system calls and it can restore the
> > entire register state from the contents of an on-stack buffer if you can
> > jump into it).
> Not now.  Randomizing the shared page location is not too hard, but there are
> some ABI issues to sort out.  We have lived with a fixed-mapped shared page
> for more than 10 years.

As a point of reference, HardenedBSD's PaX-inspired ASLR
implementation has randomized the shared page for more than half a
decade now without issue. I suspect FreeBSD will find that, if applied
properly, randomization of the shared page (now the VDSO) likely won't
break anything.

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc



Re: VDSO on amd64

2021-11-25 Thread Konstantin Belousov
On Thu, Nov 25, 2021 at 06:34:19AM +0100, Kurt Jaeger wrote:
> Hi!
> 
> > I have mostly finished the implementation of a "proper" vdso for amd64
> > native binaries, both 64bit and 32bit.  The vdso wraps the signal trampolines
> > into a real dynamic shared object, which is prelinked into the dynamically
> > linked image.
> 
> Eleven years ago Giuseppe Cocomazzi posted this:
> 
> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031553.html
> 
> vdso and shared page patch
> 
> My question: What's the difference between
> 
> https://reviews.freebsd.org/D32960
> 
> and those changes from 2010? I'm curious, and maybe a little explanation
> would help me understand what happened between 2010 and now.

No idea.  If you are so curious, read both and compare.



Re: VDSO on amd64

2021-11-25 Thread Konstantin Belousov
On Thu, Nov 25, 2021 at 09:35:53AM +, David Chisnall wrote:
> Great news!
> 
> Note that your example of throwing an exception from a signal handler works
> because the signal is delivered during a system call.  The compiler
> generates correct unwind tables for calls because any call may throw.
The syscalls themselves are not annotated; I am considering fixing this after
the vdso lands.

> 
> If you did something like a division by zero to get a SIGFPE or a
> null-pointer dereference to get a SIGSEGV then the throw would probably not
> work (or, rather, would be delivered to the right place but might corrupt
> some register state).  Neither clang nor GCC currently supports non-call
> exceptions by default.
Well, yes, part of it was that the signal was synchronous.  I was always
curious how good the unwind tables generated by -fasynchronous-unwind-tables
are in this regard.

But still, the fact that the unwinder stepped over the signal frame amused me.

> 
> This mechanism is more useful for Java VMs and similar.  Some Linux-based
> implementations (including Android) use this to avoid null-pointer checks in
> Java.
> 
> The VDSO mechanism in Linux is also used for providing some syscall
> implementations.  In particular, getting the current approximate time and
> getting the current CPU (either by reading from the VDSO's data section or
> by doing a real syscall, without userspace knowing which). It also provides
> the syscall stub that is used for the kernel transition for all 'real'
> syscalls.  This doesn't matter so much on amd64, but on i386 it lets them
> select between int 80h, syscall or sysenter, depending on what the hardware
> supports.
> 
> 
> A few questions about future plans:
> 
>  - Do you have plans to extend the VDSO to provide system call entry points
> and fast-path syscalls?  It would be really nice if we could move all of the
> libsyscalls bits into the VDSO so that any compartmentalisation mechanism
> that wanted to interpose on syscalls just needed to provide a replacement
> for the VDSO.
No.

Moving the syscall entry point to the VDSO is pointless:
- it would add one more level of indirection before SYSCALL,
- we do not have a slow syscall entry point on amd64, so there is nothing to
  choose between.

And optimizing 32bit binaries (where we could implement a slightly faster
syscall entry) is no longer important.

Basically, we do not have to split libc into libc proper and a VDSO, as
Linux does. We can implement features spanning the syscall boundary from
both sides, because libc and the kernel are developed under the same
project.  Usermode timehands, fast signal blocks, and the upcoming rseq
support, to name a few, all benefit from this model.
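As a rough illustration of what such a userspace fast path looks like on the
read side, here is a seqlock-style sketch; the structure, field names, and
stand-in data are invented for the example and are not the actual shared-page
timehands layout:

#include <atomic>
#include <cstdint>
#include <cstdio>

struct fake_timehands {
	std::atomic<uint32_t> gen;	// 0 means "fast path unusable, fall back to the syscall"
	uint64_t scale;
	uint64_t offset;
};

// Return 0 with a consistent snapshot, or -1 if the caller should do the
// real system call instead.  Retries while the (simulated) kernel-updated
// generation counter changes under us.
static int
read_timehands(const fake_timehands &th, uint64_t &scale, uint64_t &offset)
{
	uint32_t g;

	do {
		g = th.gen.load(std::memory_order_acquire);
		if (g == 0)
			return -1;
		scale = th.scale;
		offset = th.offset;
		std::atomic_thread_fence(std::memory_order_acquire);
	} while (g != th.gen.load(std::memory_order_relaxed));
	return 0;
}

int
main()
{
	fake_timehands th;	// stand-in data; in reality the kernel fills this in
	th.gen = 1;
	th.scale = 42;
	th.offset = 7;

	uint64_t s, o;
	if (read_timehands(th, s, o) == 0)
		std::printf("scale=%ju offset=%ju\n", (uintmax_t)s, (uintmax_t)o);
	return 0;
}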

For us, the VDSO is only needed to provide the unwind annotations on the
signal trampoline, in the way expected by unwinders.

> 
>  - It looks as if the Linux VDSO mechanism isn't yet using this.  Do you
> plan on moving it over?
No.

> 
>  - I can't quite tell from kern_sharedpage.c (this file has almost no
> comments) - is the userspace mapping of the VDSO randomised?  This has been
> done on Linux for a while because the VDSO is an incredibly high-value
> target for code reuse attacks (it can do system calls and it can restore the
> entire register state from the contents of an on-stack buffer if you can
> jump into it).
Not now.  Randomizing the shared page location is not too hard, but there are
some ABI issues to sort out.  We have lived with a fixed-mapped shared page
for more than 10 years.

> 
> David
> 
> On 25/11/2021 02:36, Konstantin Belousov wrote:
> > I have mostly finished the implementation of a "proper" vdso for amd64
> > native binaries, both 64bit and 32bit.  The vdso wraps the signal trampolines
> > into a real dynamic shared object, which is prelinked into the dynamically
> > linked image.
> > 
> > The main (and in fact, now the only) reason for wrapping the trampolines
> > into a vdso is to provide proper unwind annotations for the signal frame,
> > without a need to teach each unwinder about special frame types.  In
> > reality, most of them are already aware of our signal trampolines,
> > since there is no other way to walk over them except to match the
> > instruction sequence in the frame.  Also, we provide the sysctl
> > kern.proc.sigtramp, which reports the location of the trampoline.
> > 
> > So this patch should not make much difference for e.g. gdb or lldb.
> > On the other hand, I noted that the llvm13 unwinder with the vdso is able to
> > catch exceptions thrown from the signal handler, which was a surprise
> > to me.  Corresponding test code is available at
> > https://gist.github.com/b886401fcc92dc37b49316eaf0e871ca
> > 
> > Another advantage for us is that having a vdso allows us to change the
> > trampoline code without breaking unwinders.

Re: VDSO on amd64

2021-11-25 Thread David Chisnall

Great news!

Note that your example of throwing an exception from a signal handler 
works because the signal is delivered during a system call.  The 
compiler generates correct unwind tables for calls because any call may 
throw.


If you did something like a division by zero to get a SIGFPE or a 
null-pointer dereference to get a SIGSEGV then the throw would probably 
not work (or, rather, would be delivered to the right place but might 
corrupt some register state).  Neither clang nor GCC currently supports 
non-call exceptions by default.


This mechanism is more useful for Java VMs and similar.  Some 
Linux-based implementations (including Android) use this to avoid 
null-pointer checks in Java.
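As a rough sketch of that technique: with GCC's -fnon-call-exceptions (and an
unwinder able to step over the signal frame), a handler for a synchronous
fault can throw a C++ exception.  Whether this works in practice depends on
the compiler flags and on the platform's signal-trampoline unwind support; the
code below is illustrative only.

// Compile with: g++ -fnon-call-exceptions noncall.cpp
#include <csignal>
#include <cstdio>
#include <stdexcept>

static void segv_handler(int)
{
	throw std::runtime_error("null dereference");
}

int main()
{
	struct sigaction sa = {};
	sa.sa_handler = segv_handler;
	sigaction(SIGSEGV, &sa, nullptr);

	volatile int *p = nullptr;
	try {
		std::printf("%d\n", *p);	// faults here: not a call site
	} catch (const std::exception &e) {
		std::printf("caught: %s\n", e.what());
	}
	return 0;
}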


The VDSO mechanism in Linux is also used for providing some syscall 
implementations.  In particular, getting the current approximate time 
and getting the current CPU (either by reading from the VDSO's data 
section or by doing a real syscall, without userspace knowing which). 
It also provides the syscall stub that is used for the kernel transition 
for all 'real' syscalls.  This doesn't matter so much on amd64, but on 
i386 it lets them select between int 80h, syscall or sysenter, depending 
on what the hardware supports.



A few questions about future plans:

 - Do you have plans to extend the VDSO to provide system call entry 
points and fast-path syscalls?  It would be really nice if we could move 
all of the libsyscalls bits into the VDSO so that any 
compartmentalisation mechanism that wanted to interpose on syscalls just 
needed to provide a replacement for the VDSO.


 - It looks as if the Linux VDSO mechanism isn't yet using this.  Do 
you plan on moving it over?


 - I can't quite tell from kern_sharedpage.c (this file has almost no 
comments) - is the userspace mapping of the VDSO randomised?  This has 
been done on Linux for a while because the VDSO is an incredibly 
high-value target for code reuse attacks (it can do system calls and it 
can restore the entire register state from the contents of an on-stack 
buffer if you can jump into it).


David

On 25/11/2021 02:36, Konstantin Belousov wrote:

I have mostly finished the implementation of a "proper" vdso for amd64
native binaries, both 64bit and 32bit.  The vdso wraps the signal trampolines
into a real dynamic shared object, which is prelinked into the dynamically
linked image.

The main (and in fact, now the only) reason for wrapping the trampolines
into a vdso is to provide proper unwind annotations for the signal frame,
without a need to teach each unwinder about special frame types.  In
reality, most of them are already aware of our signal trampolines,
since there is no other way to walk over them except to match the
instruction sequence in the frame.  Also, we provide the sysctl
kern.proc.sigtramp, which reports the location of the trampoline.

So this patch should not make much difference for e.g. gdb or lldb.
On the other hand, I noted that the llvm13 unwinder with the vdso is able to
catch exceptions thrown from the signal handler, which was a surprise
to me.  Corresponding test code is available at
https://gist.github.com/b886401fcc92dc37b49316eaf0e871ca

Another advantage for us is that having a vdso allows us to change the
trampoline code without breaking unwinders.

Vdsos for both the 64bit and 32bit ABIs are put into the existing shared page.
This means that the total size of both objects must stay below 4k, and
some more space needs to be left available for stuff like timehands
and fxrng.  Using linker tricks, which is where most of the complexity in
this patch lies, I was able to reduce the size of the objects to below 1.5k.
I believe some more space savings could be achieved, but I stopped
there for now.  Or we might extend the shared region object to two pages,
if the current situation turns out to be too tight.

The implementation can be found at https://reviews.freebsd.org/D32960

Signal delivery for old i386 ELF (FreeBSD 4.x) and a.out binaries has
not yet been tested.

Your reviews, testing, and any other form of feedback are welcome.
The work was sponsored by The FreeBSD Foundation.





Re: VDSO on amd64

2021-11-24 Thread Kurt Jaeger
Hi!

> I have mostly finished the implementation of a "proper" vdso for amd64
> native binaries, both 64bit and 32bit.  The vdso wraps the signal trampolines
> into a real dynamic shared object, which is prelinked into the dynamically
> linked image.

Eleven years ago Giuseppe Cocomazzi posted this:

http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031553.html

vdso and shared page patch

My question: What's the difference between

https://reviews.freebsd.org/D32960

and those changes from 2010? I'm curious, and maybe a little explanation
would help me understand what happened between 2010 and now.

-- 
p...@opsec.eu    +49 171 3101372    Now what ?



VDSO on amd64

2021-11-24 Thread Konstantin Belousov
I have mostly finished the implementation of a "proper" vdso for amd64
native binaries, both 64bit and 32bit.  The vdso wraps the signal trampolines
into a real dynamic shared object, which is prelinked into the dynamically
linked image.

The main (and in fact, now the only) reason for wrapping the trampolines
into a vdso is to provide proper unwind annotations for the signal frame,
without a need to teach each unwinder about special frame types.  In
reality, most of them are already aware of our signal trampolines,
since there is no other way to walk over them except to match the
instruction sequence in the frame.  Also, we provide the sysctl
kern.proc.sigtramp, which reports the location of the trampoline.
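A minimal sketch of querying that sysctl for the current process, assuming
the kinfo_sigtramp structure from sys/user.h and the KERN_PROC_SIGTRAMP MIB:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <unistd.h>
#include <cstdio>

int main()
{
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_SIGTRAMP, (int)getpid() };
	struct kinfo_sigtramp kst;
	size_t len = sizeof(kst);

	// Ask the kernel where the signal trampoline is mapped for this process.
	if (sysctl(mib, 4, &kst, &len, nullptr, 0) == -1) {
		perror("sysctl kern.proc.sigtramp");
		return 1;
	}
	std::printf("signal trampoline: %p .. %p\n",
	    kst.ksigtramp_start, kst.ksigtramp_end);
	return 0;
}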

So this patch should not make much difference for e.g. gdb or lldb.
On the other hand, I noted that the llvm13 unwinder with the vdso is able to
catch exceptions thrown from the signal handler, which was a surprise
to me.  Corresponding test code is available at
https://gist.github.com/b886401fcc92dc37b49316eaf0e871ca
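The gist is the authoritative test; the following is only a minimal sketch of
the scenario it exercises, with a signal delivered while the process is
blocked in a system call and the handler throwing:

#include <csignal>
#include <cstdio>
#include <stdexcept>
#include <unistd.h>

static void alrm_handler(int)
{
	throw std::runtime_error("thrown from signal handler");
}

int main()
{
	struct sigaction sa = {};
	sa.sa_handler = alrm_handler;	// deliberately no SA_RESTART
	sigaction(SIGALRM, &sa, nullptr);

	alarm(1);
	try {
		pause();	// SIGALRM arrives while blocked in the system call
	} catch (const std::exception &e) {
		// Catching here requires unwinding through the signal frame.
		std::printf("caught: %s\n", e.what());
		return 0;
	}
	return 1;
}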

Another advantage for us is that having a vdso allows us to change the
trampoline code without breaking unwinders.

Vdsos for both the 64bit and 32bit ABIs are put into the existing shared page.
This means that the total size of both objects must stay below 4k, and
some more space needs to be left available for stuff like timehands
and fxrng.  Using linker tricks, which is where most of the complexity in
this patch lies, I was able to reduce the size of the objects to below 1.5k.
I believe some more space savings could be achieved, but I stopped
there for now.  Or we might extend the shared region object to two pages,
if the current situation turns out to be too tight.

The implementation can be found at https://reviews.freebsd.org/D32960

Signal delivery for old i386 ELF (FreeBSD 4.x) and a.out binaries has
not yet been tested.

Your reviews, testing, and any other form of feedback are welcome.
The work was sponsored by The FreeBSD Foundation.