Re: [osv-dev] OSv can run statically linked executable

2023-04-25 Thread Waldek Kozaczuk


On Monday, April 24, 2023 at 2:08:26 PM UTC-4 Waldek Kozaczuk wrote:

On Mon, Apr 24, 2023 at 4:34 AM Nadav Har'El  wrote:

On Mon, Apr 24, 2023 at 6:26 AM Waldek Kozaczuk  wrote:

Hi,

Over the recent week, I have been working to get OSv to run a simple "hello 
world" app (aka native-example) built as a position-dependent statically 
linked executable. 


Nice!
 


   1. Support the arch_prctl syscall that sets the app TLS
      - this was by far the most complicated element that required changing
      OSv to store new per-cpu data pointer in GS register and enhancing both
      syscall handler and interrupt/page fault handler to detect and switch if
      needed the FS base to the kernel TLS on entry and back to the app one on
      exit (see
      https://github.com/cloudius-systems/osv/issues/1137#issuecomment-1512315880)
   

If this has noticeable overhead, perhaps it makes sense to make it optional?

I have not measured it in any formal way. But when testing some of the
earlier versions of the code, I could see the context switch time (the
colocated one measured by misc-ctxsw) go up from 313 to 362 ns after
adding this line:

processor::wrmsr(msr::IA32_FS_KERNEL_BASE, reinterpret_cast<u64>(_tcb));

which may indirectly measure the cost of changing the GS or FS base
with the WRMSR instruction at ~50 ns (yikes). I would think the FSGSBASE
instructions should be faster.


BTW, I have indirectly measured the cost of wrmsr and wrgsbase by modifying
thread::switch_to() like this and running misc-ctxsw (I assume the cost of
wrfsbase would be identical):
 
+uint32_t IA32_GS_BASE = 0xc0000101;
 void thread::switch_to()
 {
     thread* old = current();
@@ -81,6 +82,8 @@ void thread::switch_to()
     // barriers
     barrier();
     set_fsbase(reinterpret_cast<u64>(_tcb));
+    //asm volatile("wrgsbase %0" : : "r"(reinterpret_cast<u64>(_tcb)));
+    //processor::wrmsr(IA32_GS_BASE, reinterpret_cast<u64>(_tcb));
     barrier();
     auto c = _detached_state->_cpu;
     old->_state.exception_stack = c->arch.get_exception_stack();

With the wrgsbase line uncommented, the cost of a colocated context switch
barely budged: at most a 1-2 ns difference on average, and sometimes the
times were identical. So wrgsbase seems pretty cheap, though we should
still avoid calling it in the interrupt/page fault/syscall handlers.

On the other hand, with the wrmsr line uncommented the cost of the context
switch went up by ~50 ns, so this instruction is very expensive. That is
also why we especially need to avoid it when wrgsbase is not available.
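
As a rough illustration of the "is wrgsbase available" check mentioned above, here is a minimal sketch, assuming a GCC or Clang toolchain that ships <cpuid.h>. CPUID leaf 7 (subleaf 0) reports the FSGSBASE instructions in EBX bit 0; the kernel additionally has to set CR4.FSGSBASE before they can be executed, which this sketch does not cover. The function names are hypothetical and are not taken from the OSv tree.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang wrapper around the CPUID instruction
#endif

// CPUID.(EAX=7,ECX=0):EBX bit 0 advertises RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE.
constexpr std::uint32_t fsgsbase_bit = 1u << 0;

// Hypothetical pure helper: tests the feature bit in an EBX value.
constexpr bool ebx_reports_fsgsbase(std::uint32_t ebx)
{
    return (ebx & fsgsbase_bit) != 0;
}

// Hypothetical wrapper: queries the CPU; always false on non-x86 builds.
inline bool cpu_reports_fsgsbase()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;   // leaf 7 is not supported by this CPU
    }
    return ebx_reports_fsgsbase(ebx);
#else
    return false;
#endif
}
```

A kernel could run such a check once at boot and cache the result, so that the hot entry paths only test a flag before choosing between wrfsbase and the slower wrmsr.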


Here is a subset of the changes I had to make to the context-switching code 
and the interrupt/syscall handler:

1. Add 2 new fields to the thread control block:

unsigned long app_tcb;   // holds the address the app passed to arch_prctl

long kernel_tcb_counter; // 0 means we have to do an app/kernel/app FS base switch

2. Setup new per-cpu data intended to hold a pointer to the tcb:

--- a/arch/x64/arch-cpu.hh
+++ b/arch/x64/arch-cpu.hh

+struct tcb_data {
+    u64 kernel_tcb;
+    u64 tmp[2];
+};
+
 struct arch_cpu {
     arch_cpu();
     processor::aligned_task_state_segment atss;
@@ -46,6 +52,7 @@ struct arch_cpu {
     u32 apic_id;
     u32 acpi_id;
     u64 gdt[nr_gdt];
+    tcb_data _tcb_data;
     void init_on_cpu();
     void set_ist_entry(unsigned ist, char* base, size_t size);
     char* get_ist_entry(unsigned ist);
@@ -181,6 +188,8 @@ inline void arch_cpu::init_on_cpu()
     processor::init_fpu();
 
     processor::init_syscall();
+
+    processor::wrmsr(msr::IA32_GS_BASE, reinterpret_cast<u64>(&_tcb_data.kernel_tcb));
 }



3. Change kernel fs pointer on each context switch.

--- a/arch/x64/arch-switch.hh
+++ b/arch/x64/arch-switch.hh

@@ -81,11 +81,13 @@ void thread::switch_to()
...
     c->arch.set_exception_stack(_state.exception_stack);
+    c->arch._tcb_data.kernel_tcb = reinterpret_cast<u64>(_tcb); //This should be very fast
     auto fpucw = processor::fnstcw();
...
@@ -258,6 +260,7 @@ void thread::setup_tcb()
     else {
         _tcb->syscall_stack_top = 0;
     }
+    _tcb->kernel_tcb_counter = 1; //By default disable fs base switch
 }

4. Handle fs switch if necessary on entry/exit of syscall/exception/page 
fault handler:

This is just a code change around syscall entry but we have to do the 
opposite for exit and similar for page fault/interrupt handler (possibly 
signal handler as well) 

@@ -174,6 +214,26 @@ syscall_entry:
     .cfi_register rip, rcx # rcx took previous rip value
     .cfi_register rflags, r11 # r11 took previous rflags value
     # There is no ring transition and rflags are left unchanged.
+    #
+    # app->kernel tcb switch
+    movq %rax, %gs:8  # save register rax so we can restore it later
+    movq %gs:0, %rax  # copy address of kernel tcb to the temp register rax
+    #1. Check if kernel_tcb_counter 0 and jump over to 3 if not (no need to

Re: [osv-dev] OSv can run statically linked executable

2023-04-24 Thread 'Nadav Har'El' via OSv Development
On Mon, Apr 24, 2023 at 9:08 PM Waldek Kozaczuk 
wrote:

>>> 1. Fixing a potential bug in handling TCGETS in the console driver.
>>
>> I'm curious what this bug was - I am personally fond of this area of
>> this code, as you can see from the history
>> lesson in drivers/line-discipline.cc :-)
>>
> I think it may have to do with some size difference of the termios struct
> between glibc and OSv. The symptom seemed to be a corrupted stack after
> ioctl syscall call that ended up calling the code to handle TCGETS. This
> change seems to fix it:
>
> --- a/drivers/console.cc
> +++ b/drivers/console.cc
> @@ -68,7 +68,16 @@ console_ioctl(u_long request, void *arg)
>  {
>      switch (request) {
>      case TCGETS:
> -        *static_cast<termios*>(arg) = tio;
> +        //*static_cast<termios*>(arg) = tio;
> +        {
> +          termios *in = static_cast<termios*>(arg);
> +          in->c_iflag = tio.c_iflag;
> +          in->c_oflag = tio.c_oflag;
> +          in->c_cflag = tio.c_cflag;
> +          in->c_lflag = tio.c_lflag;
> +          in->c_line = tio.c_line;
> +        }
>          return 0;
>
> I think I have missed the c_cc field.
>
> Here is the relevant code in glibc -
> https://github.com/bminor/glibc/blob/ae612c45efb5e34713859a5facf92368307efb6e/sysdeps/unix/sysv/linux/tcgetattr.c
>

OK, I think I know what's going on.

OSv's "struct termios" from include/api/termios.h is identical to the one
glibc defines in /usr/include/bits/termios-struct.h. But looking at the
code above, it turns out that glibc does NOT assume that the kernel uses
this termios structure, but something else called __kernel_termios.

So although our tcgetattr() function should return our usual termios
structure as-is, TCGETS should do something different - it should write a
__kernel_termios structure.
I think __kernel_termios is the ktermios that you have in
/usr/include/asm-generic/termbits.h - you can see there that NCCS (the
number of control characters) is lower, just 19 instead of 32, which
explains the overflow you noticed.

I think the fix is simple - just copy a part of the termios struct - only
up to the 19th c_cc member, not the whole thing.
Nadav.
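
A minimal sketch of the partial copy suggested above. The two structs are illustrative stand-ins (hypothetical names) modelled on the layouts discussed in this thread: a libc-style termios with 32 control characters and a kernel-style one with 19. They are not the real headers, and the function is not the actual OSv fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t user_nccs = 32;    // c_cc size in the libc-style struct
constexpr std::size_t kernel_nccs = 19;  // c_cc size the TCGETS caller expects

// Stand-in modelled on the libc struct termios (hypothetical name).
struct user_termios {
    unsigned int c_iflag, c_oflag, c_cflag, c_lflag;
    unsigned char c_line;
    unsigned char c_cc[user_nccs];
    unsigned int c_ispeed, c_ospeed;
};

// Stand-in modelled on the smaller struct a TCGETS caller passes in.
struct kernel_termios {
    unsigned int c_iflag, c_oflag, c_cflag, c_lflag;
    unsigned char c_line;
    unsigned char c_cc[kernel_nccs];
};

// The size difference is what makes a whole-struct assignment overflow.
static_assert(sizeof(kernel_termios) < sizeof(user_termios),
              "the caller's buffer is smaller than the libc-style struct");

// Copy the flag words, the line discipline, and only the first 19 c_cc bytes.
inline void fill_kernel_termios(kernel_termios& out, const user_termios& in)
{
    out.c_iflag = in.c_iflag;
    out.c_oflag = in.c_oflag;
    out.c_cflag = in.c_cflag;
    out.c_lflag = in.c_lflag;
    out.c_line = in.c_line;
    std::memcpy(out.c_cc, in.c_cc, kernel_nccs);
}
```

Writing through the smaller struct type means the handler cannot touch bytes past the 19th control character, whatever the size of the internal termios copy.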

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjvWQgtbqas_qbznRhGRpNOheEJMBfuhhVa9snSQoA1vjg%40mail.gmail.com.


Re: [osv-dev] OSv can run statically linked executable

2023-04-24 Thread Waldek Kozaczuk
On Mon, Apr 24, 2023 at 4:34 AM Nadav Har'El  wrote:

> On Mon, Apr 24, 2023 at 6:26 AM Waldek Kozaczuk 
> wrote:
>
>> Hi,
>>
>> Over the recent week, I have been working to get OSv to run a simple
>> "hello world" app (aka native-example) built as a position-dependent
>> statically linked executable.
>>
>
> Nice!
>
>
>>
>>    1. Support the arch_prctl syscall that sets the app TLS
>>       - this was by far the most complicated element that required
>>       changing OSv to store new per-cpu data pointer in GS register and
>>       enhancing both syscall handler and interrupt/page fault handler to
>>       detect and switch if needed the FS base to the kernel TLS on entry
>>       and back to the app one on exit (see
>>       https://github.com/cloudius-systems/osv/issues/1137#issuecomment-1512315880)
>>
>>
> If this has noticeable overhead, perhaps it makes sense to make it
> optional?
>
I have not measured it in any formal way. But when testing some of the
earlier versions of the code, I could see the context switch time (the
colocated one measured by misc-ctxsw) go up from 313 to 362 ns after
adding this line:

processor::wrmsr(msr::IA32_FS_KERNEL_BASE, reinterpret_cast<u64>(_tcb));

which may indirectly measure the cost of changing the GS or FS base
with the WRMSR instruction at ~50 ns (yikes). I would think the FSGSBASE
instructions should be faster.

Here is a subset of the changes I had to make to the context-switching code
and the interrupt/syscall handler:

1. Add 2 new fields to the thread control block:

unsigned long app_tcb;   // holds the address the app passed to arch_prctl

long kernel_tcb_counter; // 0 means we have to do an app/kernel/app FS base switch

2. Setup new per-cpu data intended to hold a pointer to the tcb:

--- a/arch/x64/arch-cpu.hh
+++ b/arch/x64/arch-cpu.hh

+struct tcb_data {
+    u64 kernel_tcb;
+    u64 tmp[2];
+};
+
 struct arch_cpu {
     arch_cpu();
     processor::aligned_task_state_segment atss;
@@ -46,6 +52,7 @@ struct arch_cpu {
     u32 apic_id;
     u32 acpi_id;
     u64 gdt[nr_gdt];
+    tcb_data _tcb_data;
     void init_on_cpu();
     void set_ist_entry(unsigned ist, char* base, size_t size);
     char* get_ist_entry(unsigned ist);
@@ -181,6 +188,8 @@ inline void arch_cpu::init_on_cpu()
     processor::init_fpu();
 
     processor::init_syscall();
+
+    processor::wrmsr(msr::IA32_GS_BASE, reinterpret_cast<u64>(&_tcb_data.kernel_tcb));
 }



3. Change kernel fs pointer on each context switch.

--- a/arch/x64/arch-switch.hh
+++ b/arch/x64/arch-switch.hh

@@ -81,11 +81,13 @@ void thread::switch_to()
...
     c->arch.set_exception_stack(_state.exception_stack);
+    c->arch._tcb_data.kernel_tcb = reinterpret_cast<u64>(_tcb); //This should be very fast
     auto fpucw = processor::fnstcw();
...
@@ -258,6 +260,7 @@ void thread::setup_tcb()
     else {
         _tcb->syscall_stack_top = 0;
     }
+    _tcb->kernel_tcb_counter = 1; //By default disable fs base switch
 }

4. Handle fs switch if necessary on entry/exit of syscall/exception/page
fault handler:

This is just a code change around syscall entry but we have to do the
opposite for exit and similar for page fault/interrupt handler (possibly
signal handler as well)

@@ -174,6 +214,26 @@ syscall_entry:
     .cfi_register rip, rcx # rcx took previous rip value
     .cfi_register rflags, r11 # r11 took previous rflags value
     # There is no ring transition and rflags are left unchanged.
+    #
+    # app->kernel tcb switch
+    movq %rax, %gs:8  # save register rax so we can restore it later
+    movq %gs:0, %rax  # copy address of kernel tcb to the temp register rax
+    #1. Check if kernel_tcb_counter 0 and jump over to 3 if not (no need to do fsbase switch)
+    cmpq $0, 40(%rax)
+    jne on_kernel_tcb
+
+    #2. If zero set fs MSR to kernel tcb
+    movq %rbx, %gs:16  # save register rbx so we can restore it later
+    movq (%rax), %rbx # set kernel tcb
+    wrfsbase %rbx //TODO: In reality we need to check if wrfsbase is available and use wrmsr if not
+    movq %gs:16, %rbx
+
+on_kernel_tcb:
+    #3. Increment counter (for nested case)
+    incq 40(%rax)
+    #4. Restore %rax
+    movq %gs:8, %rax
+
     #
     # Unfortunately the mov instruction cannot be used to dereference an address
     # on syscall stack pointed by address in TCB (%fs:16) - double memory dereference.


I did measure that the context switch code is not affected in any way. I am
sure the syscall/page fault/interrupt handlers are affected, but hopefully
only by a tiny bit in all cases except when the application thread (of the
static ELF) gets interrupted, triggers a page fault, or makes a syscall. In
other words, I hope that kernel threads and normal (non-static-ELF) threads
would not be affected. We could also add the necessary #ifdef static_elf.
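
To make the counter logic easier to follow, here is a small user-space model of it, as a sketch only. It mirrors the entry path from the assembly in this message; the exit path is an assumption (the message only says it has to do "the opposite"). The struct and function names are hypothetical, and the FS base write is simulated with a plain variable instead of wrfsbase/wrmsr.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-thread fields named in this message.
struct tcb_model {
    std::uint64_t kernel_tcb;   // FS base that kernel code needs
    std::uint64_t app_tcb;      // FS base the app set via arch_prctl
    long kernel_tcb_counter;    // 0: the app FS base is currently loaded
};

// Stand-in for the FS base register; counts how often it is written.
struct cpu_model {
    std::uint64_t fs_base = 0;
    unsigned fs_writes = 0;
    void write_fs_base(std::uint64_t v) { fs_base = v; ++fs_writes; }
};

// Entry to a syscall/interrupt/page fault handler: switch only if the
// counter is 0, then bump it so nested entries skip the switch.
inline void enter_kernel(cpu_model& cpu, tcb_model& t)
{
    if (t.kernel_tcb_counter == 0) {
        cpu.write_fs_base(t.kernel_tcb);
    }
    ++t.kernel_tcb_counter;
}

// Assumed mirror image for handler exit: restore the app FS base only
// when the outermost handler returns.
inline void leave_kernel(cpu_model& cpu, tcb_model& t)
{
    assert(t.kernel_tcb_counter > 0);
    if (--t.kernel_tcb_counter == 0) {
        cpu.write_fs_base(t.app_tcb);
    }
}
```

In this model a thread that keeps the default counter of 1 never reaches 0, so it pays only for the compare and the increment/decrement, which matches the hope that kernel threads and normal threads stay mostly unaffected.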

Any ideas on how to measure how much slower the interrupt/syscall/page
fault handler 

Re: [osv-dev] OSv can run statically linked executable

2023-04-24 Thread 'Nadav Har'El' via OSv Development
On Mon, Apr 24, 2023 at 6:26 AM Waldek Kozaczuk 
wrote:

> Hi,
>
> Over the recent week, I have been working to get OSv to run a simple
> "hello world" app (aka native-example) built as a position-dependent
> statically linked executable.
>

Nice!


>
>    1. Support the arch_prctl syscall that sets the app TLS
>       - this was by far the most complicated element that required
>       changing OSv to store new per-cpu data pointer in GS register and
>       enhancing both syscall handler and interrupt/page fault handler to
>       detect and switch if needed the FS base to the kernel TLS on entry
>       and back to the app one on exit (see
>       https://github.com/cloudius-systems/osv/issues/1137#issuecomment-1512315880)
>
>
If this has noticeable overhead, perhaps it makes sense to make it optional?


>
>1. Fixing a potential bug in handling TCGETS in the console driver.
>

I'm curious what this bug was - I am personally fond of this area of this
code, as you can see from the history
lesson in drivers/line-discipline.cc :-)


>
>1. Implement sys_prlimit
>2. Enable the readlink, geteuid and getegid
>

I think we already had those - or did you mean the system call?


> This was enough to run a single-threaded app but we will need to implement
> the clone syscall to support multi-threaded apps.
>

Very nice. You can probably start by implementing the "simple" case of
clone() used by a simple multi-threaded application and
leave the other cases with UNIMPLEMENTED (or "ignore" various parameters
and leave them to be perfected later, with WARN_ONCE)
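
One way to picture the "simple case first" approach is a flag check like the sketch below. The flag values are the ones from the Linux UAPI header linux/sched.h, and the accepted combination is the one commonly seen in strace output for pthread_create on glibc. The namespace, constant names, and function name are hypothetical, and a real handler would report the unsupported case with UNIMPLEMENTED or WARN_ONCE rather than just returning an error.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

namespace clone_flag {
// Values as defined in linux/sched.h.
constexpr std::uint64_t vm             = 0x00000100;
constexpr std::uint64_t fs             = 0x00000200;
constexpr std::uint64_t files          = 0x00000400;
constexpr std::uint64_t sighand        = 0x00000800;
constexpr std::uint64_t thread         = 0x00010000;
constexpr std::uint64_t sysvsem        = 0x00040000;
constexpr std::uint64_t settls         = 0x00080000;
constexpr std::uint64_t parent_settid  = 0x00100000;
constexpr std::uint64_t child_cleartid = 0x00200000;
}

// The combination a typical pthread_create call passes to clone.
constexpr std::uint64_t thread_create_flags =
    clone_flag::vm | clone_flag::fs | clone_flag::files | clone_flag::sighand |
    clone_flag::thread | clone_flag::sysvsem | clone_flag::settls |
    clone_flag::parent_settid | clone_flag::child_cleartid;

static_assert(thread_create_flags == 0x3d0f00, "expected pthread_create flag set");

// Hypothetical check: accept only the thread-creation case for now and
// report everything else (fork-like clones, namespaces, ...) as unsupported.
inline int check_clone_flags(std::uint64_t flags)
{
    constexpr std::uint64_t exit_signal_mask = 0xff;  // low byte: exit signal
    if ((flags & ~exit_signal_mask) == thread_create_flags) {
        return 0;
    }
    return -ENOSYS;
}
```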



Re: [osv-dev] OSv can run statically linked executable

2023-04-23 Thread 'Dor Laor' via OSv Development
Very impressive, it improves OSv's ability to be a very safe and
fast sandbox

On Mon, Apr 24, 2023 at 6:26 AM Waldek Kozaczuk 
wrote:

> Hi,
>
> Over the recent week, I have been working to get OSv to run a simple
> "hello world" app (aka native-example) built as a position-dependent
> statically linked executable. In essence, I picked up where Pekka Enberg
> left off over 8 years ago (see
> https://github.com/cloudius-systems/osv/tree/static-elf). Obviously,
> given that these days OSv has pretty robust support for over 70 syscalls
> (and 60 more that should be trivial to add), the remaining work is much
> more manageable.
>
> ./scripts/firecracker.py -e /hello
> OSv v0.57.0-37-g0de155a4
> Booted up in 5.23 ms
> Cmdline: /hello
>  -> syscall: 107
>  -> syscall: 102
>  -> syscall: 108
>  -> syscall: 104
>  -> syscall: 158
>  -> syscall: 012
>  -> syscall: 012
>  -> syscall: 158
>  -> syscall: 218
>  -> syscall: 273
>  -> syscall: 063
>  -> syscall: 302
>  -> syscall: 089
>  -> syscall: 318
>  -> syscall: 228
>  -> syscall: 228
>  -> syscall: 012
>  -> syscall: 012
>  -> syscall: 010
>  -> syscall: 262
>  -> syscall: 016
>  -> syscall: 001
> Hello from C code
>  -> syscall: 231
>
> I will be sending a series of proper patches later after I clean some
> issues but in essence here is a list of things I had to do including what
> Pekka started:
>
>    1. Tweak dynamic linker to support static executable:
>       - Handle missing DT_SYMTAB, DT_STRTAB and DT_NEEDED.
>       - Handle ET_EXEC
>       - Support statically-linked executable base address
>    2. Add basic handling of static ELF entry point and initial stack
>    state setup (see figure 3.9 ("Initial Process Stack") of the x86-64 ABI
>    specification - https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf)
>       - make sure the RDX register is zeroed and the basic AUX vector
>       with AT_RANDOM is set up
>       - more is left to do to fully support argv and full aux vector
>    3. Add support of the brk() syscall (see issue 1138)
>    4. Add dummy support of sys_set_robust_list and set_tid_address
>    syscalls (possibly needs something more for multithreaded apps).
>    5. Support the arch_prctl syscall that sets the app TLS
>       - this was by far the most complicated element that required
>       changing OSv to store new per-cpu data pointer in GS register and
>       enhancing both syscall handler and interrupt/page fault handler to
>       detect and switch if needed the FS base to the kernel TLS on entry
>       and back to the app one on exit (see
>       https://github.com/cloudius-systems/osv/issues/1137#issuecomment-1512315880)
>    6. Fixing a potential bug in handling TCGETS in the console driver.
>    7. Implement sys_prlimit
>    8. Enable the readlink, geteuid and getegid
>
> This was enough to run a single-threaded app but we will need to implement
> the clone syscall to support multi-threaded apps. In addition, we would
> want to support the static pies as well which I hope should not be very
> difficult.
>
> Regards,
> Waldek
>



[osv-dev] OSv can run statically linked executable

2023-04-23 Thread Waldek Kozaczuk
Hi,

Over the recent week, I have been working to get OSv to run a simple "hello
world" app (aka native-example) built as a position-dependent statically
linked executable. In essence, I picked up where Pekka Enberg left off over
8 years ago (see https://github.com/cloudius-systems/osv/tree/static-elf).
Obviously, given that these days OSv has pretty robust support for over 70
syscalls (and 60 more that should be trivial to add), the remaining work is
much more manageable.

./scripts/firecracker.py -e /hello
OSv v0.57.0-37-g0de155a4
Booted up in 5.23 ms
Cmdline: /hello 
 -> syscall: 107
 -> syscall: 102
 -> syscall: 108
 -> syscall: 104
 -> syscall: 158
 -> syscall: 012
 -> syscall: 012
 -> syscall: 158
 -> syscall: 218
 -> syscall: 273
 -> syscall: 063
 -> syscall: 302
 -> syscall: 089
 -> syscall: 318
 -> syscall: 228
 -> syscall: 228
 -> syscall: 012
 -> syscall: 012
 -> syscall: 010
 -> syscall: 262
 -> syscall: 016
 -> syscall: 001
Hello from C code
 -> syscall: 231
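
For readers who do not have the numbers memorized, the log above can be decoded with a small lookup like the sketch below. The names are those of the standard Linux x86-64 syscall table; the struct and function names are hypothetical, and the table only covers the numbers that appear in this log.

```cpp
#include <cassert>
#include <string>

struct syscall_entry_name {
    int nr;
    const char* name;
};

// Linux x86-64 syscall numbers that show up in the trace above.
constexpr syscall_entry_name seen_syscalls[] = {
    {1, "write"},             {10, "mprotect"},       {12, "brk"},
    {16, "ioctl"},            {63, "uname"},          {89, "readlink"},
    {102, "getuid"},          {104, "getgid"},        {107, "geteuid"},
    {108, "getegid"},         {158, "arch_prctl"},    {218, "set_tid_address"},
    {228, "clock_gettime"},   {231, "exit_group"},    {262, "newfstatat"},
    {273, "set_robust_list"}, {302, "prlimit64"},     {318, "getrandom"},
};

// Hypothetical helper: returns the name, or "unknown" for other numbers.
inline std::string syscall_to_name(int nr)
{
    for (const auto& e : seen_syscalls) {
        if (e.nr == nr) {
            return e.name;
        }
    }
    return "unknown";
}
```

Read this way, the trace starts with the uid/gid queries, sets up TLS with arch_prctl, grows the heap with brk, and ends with write for the greeting followed by exit_group.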

I will be sending a series of proper patches later after I clean some 
issues but in essence here is a list of things I had to do including what 
Pekka started:

   1. Tweak dynamic linker to support static executable:
      - Handle missing DT_SYMTAB, DT_STRTAB and DT_NEEDED.
      - Handle ET_EXEC
      - Support statically-linked executable base address
   2. Add basic handling of static ELF entry point and initial stack state
   setup (see figure 3.9 ("Initial Process Stack") of the x86-64 ABI
   specification - https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf)
      - make sure the RDX register is zeroed and the basic AUX vector with
      AT_RANDOM is set up
      - more is left to do to fully support argv and full aux vector
   3. Add support of the brk() syscall (see issue 1138)
   4. Add dummy support of sys_set_robust_list and set_tid_address syscalls
   (possibly needs something more for multithreaded apps).
   5. Support the arch_prctl syscall that sets the app TLS
      - this was by far the most complicated element that required changing
      OSv to store new per-cpu data pointer in GS register and enhancing both
      syscall handler and interrupt/page fault handler to detect and switch if
      needed the FS base to the kernel TLS on entry and back to the app one on
      exit (see
      https://github.com/cloudius-systems/osv/issues/1137#issuecomment-1512315880)
   6. Fixing a potential bug in handling TCGETS in the console driver.
   7. Implement sys_prlimit
   8. Enable the readlink, geteuid and getegid
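
The initial stack from item 2 can be pictured with the sketch below, which lays out the 8-byte words in the order the x86-64 ABI figure describes: argc, the argv pointers, a null terminator, the envp pointers, a null terminator, then the auxiliary vector as (type, value) pairs ending with AT_NULL. This is a simplified model with hypothetical names, not the OSv code; it ignores the string area and the 16-byte alignment of %rsp that the ABI also requires.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::uint64_t aux_null = 0;     // AT_NULL in <elf.h>
constexpr std::uint64_t aux_random = 25;  // AT_RANDOM in <elf.h>

using aux_pair = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: returns the stack words in ascending address order,
// starting with the word that %rsp points to at the entry point.
inline std::vector<std::uint64_t> build_initial_stack(
    const std::vector<std::uint64_t>& argv_ptrs,
    const std::vector<std::uint64_t>& envp_ptrs,
    const std::vector<aux_pair>& auxv)
{
    std::vector<std::uint64_t> words;
    words.push_back(argv_ptrs.size());                               // argc
    words.insert(words.end(), argv_ptrs.begin(), argv_ptrs.end());   // argv[]
    words.push_back(0);                                              // argv terminator
    words.insert(words.end(), envp_ptrs.begin(), envp_ptrs.end());   // envp[]
    words.push_back(0);                                              // envp terminator
    for (const auto& entry : auxv) {                                 // auxv entries
        words.push_back(entry.first);
        words.push_back(entry.second);
    }
    words.push_back(aux_null);                                       // AT_NULL entry
    words.push_back(0);
    return words;
}
```

For a program with one argument, no environment, and a single AT_RANDOM entry, the result is eight words: argc, one argv pointer, two terminators, the AT_RANDOM pair, and the AT_NULL pair.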

This was enough to run a single-threaded app but we will need to implement 
the clone syscall to support multi-threaded apps. In addition, we would 
want to support the static pies as well which I hope should not be very 
difficult.

Regards,
Waldek 
