Re: [PATCH v3] Implement switch to separate SYSCALL call stack

2018-05-13 Thread Nadav Har'El
Hi, thanks for being so insistent on this important patch.

I really wanted to commit this version, but unfortunately after another
reading,
I'm really worried if what you're doing with rbx is safe. Please see my
comments below.

Sorry about dragging this on for so long :-(


On Fri, Mar 16, 2018 at 4:56 AM, Waldemar Kozaczuk 
wrote:

> This patch implements separate syscall call stack needed
> when runtimes like Golang use SYSCALL instruction to execute
> system calls. More specifically in case of Golang tiny stacks
> used by coroutines are not deep enough to execute system
> calls on OSv which handles them as regular function calls.
> In case of OS like Linux tiny Golang stacks are not a problem as
> the system calls are executed on separate kernel stack.
>
> More specifically each application thread pre-alllocates
> "tiny" (1024-bytes deep) syscall call stack. When SYSCALL
> instruction is called first time for given thread the call stack
> is switched to the tiny syscall stack in order to setup the "large"
> stack and switch to. Eventually execution of syscall continues
> on large syscall stack. From this point, second and subsequent
> executions of SYSCALL instruction are handled on large syscall stack.
>
> Please note that because the stack is switched from original
> caller stack to the syscall stack that is allocated lazily
> the stack pointers "jumps" to higher addresses. From gdb
> perspective this results in "Backtrace stopped: previous
> frame inner to this frame (corrupt stack?)" message.
> Gdb assumes that stack pointer should decrease in value as the stack
> "grows"
> which is not the case when stack is switched to syscall stack.
>
> This patch is based on the original patch provided by @Hawxchen.
>
> Fixes #808
>
> ---
>  arch/x64/arch-switch.hh | 108 ++
> ++
>  arch/x64/arch-tls.hh|  15 +++
>  arch/x64/entry.S| 102 ++
> ---
>  include/osv/sched.hh|   4 +-
>  4 files changed, 205 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
> index d1a039a..b7635d2 100644
> --- a/arch/x64/arch-switch.hh
> +++ b/arch/x64/arch-switch.hh
> @@ -12,6 +12,38 @@
>  #include 
>  #include 
>
> +//
> +// The last 16 bytes of the syscall stack are reserved for 2 fields -
> +// caller stack pointer and tiny/large indicator.
> +// The reserved space has to be a multiple of 16 bytes to make it 16 bytes
> +// aligned as Linux x64 ABI mandates.
> +#define SYSCALL_STACK_RESERVED_SPACE_SIZE (2 * 8)
> +//
> +// The tiny stack has to be large enough to allow for execution of
> +// thread::setup_large_syscall_stack() that allocates and sets up
> +// large syscall stack. It was measured that as of this writing
> +// setup_large_syscall_stack() needs a little over 600 bytes of stack
> +// to properly operate. This makes 1024 bytes to be an adequate size
> +// of tiny stack.
> +// All application threads pre-allocate tiny syscall stack so there
> +// is a tiny penalty with this solution.
> +#define TINY_SYSCALL_STACK_SIZE 1024
> +#define TINY_SYSCALL_STACK_DEPTH (TINY_SYSCALL_STACK_SIZE -
> SYSCALL_STACK_RESERVED_SPACE_SIZE)
> +//
> +// The large syscall stack is setup and switched to on first
> +// execution of SYSCALL instruction for given application thread.
> +#define LARGE_SYSCALL_STACK_SIZE (16 * PAGE_SIZE)
> +#define LARGE_SYSCALL_STACK_DEPTH (LARGE_SYSCALL_STACK_SIZE -
> SYSCALL_STACK_RESERVED_SPACE_SIZE)
> +
> +#define SET_SYSCALL_STACK_TYPE_INDICATOR(value) \
> +*((long*)(_tcb->syscall_stack_top + 8)) = value;
> +
> +#define GET_SYSCALL_STACK_TYPE_INDICATOR() \
> +*((long*)(_tcb->syscall_stack_top + 8))
> +
> +#define TINY_SYSCALL_STACK_INDICATOR 0l
> +#define LARGE_SYSCALL_STACK_INDICATOR 1l
> +
>  extern "C" {
>  void thread_main(void);
>  void thread_main_c(sched::thread* t);
> @@ -146,6 +178,65 @@ void thread::setup_tcb()
>  _tcb = static_cast(p + tls.size +
> user_tls_size);
>  _tcb->self = _tcb;
>  _tcb->tls_base = p + user_tls_size;
> +
> +if (is_app()) {
> +//
> +// Allocate TINY syscall call stack
> +void* tiny_syscall_stack_begin = malloc(TINY_SYSCALL_STACK_SIZE);
> +assert(tiny_syscall_stack_begin);
> +//
> +// The top of the stack needs to be 16 bytes lower to make space
> for
> +// two fields - SYSCALL caller stack pointer and OSv syscall
> stack type indicator
> +_tcb->syscall_stack_top = tiny_syscall_stack_begin +
> TINY_SYSCALL_STACK_DEPTH;
> +SET_SYSCALL_STACK_TYPE_INDICATOR(TINY_SYSCALL_STACK_INDICATOR);
> +}
> +else {
> +_tcb->syscall_stack_top = 0;
> +}
> +}
> +
> +void thread::setup_large_syscall_stack()
> +{
> +assert(is_app());
> +assert(GET_SYSCALL_STACK_TYPE_INDICATOR() ==
> TINY_SYSCALL_STACK_INDICATOR);
> +//
> +// Allocate LARGE syscall stack
> +void* 

Re: [PATCH v3] Implement switch to separate SYSCALL call stack

2018-04-23 Thread Waldek Kozaczuk
This patch should is required to make golang-httpserver app work.

On Sunday, April 8, 2018 at 12:05:48 AM UTC-4, Waldek Kozaczuk wrote:
>
> This is the patch makes golang httpserver work.
>
> On Thursday, March 15, 2018 at 10:56:26 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> This patch implements separate syscall call stack needed 
>> when runtimes like Golang use SYSCALL instruction to execute 
>> system calls. More specifically in case of Golang tiny stacks 
>> used by coroutines are not deep enough to execute system 
>> calls on OSv which handles them as regular function calls. 
>> In case of OS like Linux tiny Golang stacks are not a problem as 
>> the system calls are executed on separate kernel stack. 
>>
>> More specifically each application thread pre-alllocates 
>> "tiny" (1024-bytes deep) syscall call stack. When SYSCALL 
>> instruction is called first time for given thread the call stack 
>> is switched to the tiny syscall stack in order to setup the "large" 
>> stack and switch to. Eventually execution of syscall continues 
>> on large syscall stack. From this point, second and subsequent 
>> executions of SYSCALL instruction are handled on large syscall stack. 
>>
>> Please note that because the stack is switched from original 
>> caller stack to the syscall stack that is allocated lazily 
>> the stack pointers "jumps" to higher addresses. From gdb 
>> perspective this results in "Backtrace stopped: previous 
>> frame inner to this frame (corrupt stack?)" message. 
>> Gdb assumes that stack pointer should decrease in value as the stack 
>> "grows" 
>> which is not the case when stack is switched to syscall stack. 
>>
>> This patch is based on the original patch provided by @Hawxchen. 
>>
>> Fixes #808 
>>
>> --- 
>>  arch/x64/arch-switch.hh | 108 
>>  
>>  arch/x64/arch-tls.hh|  15 +++ 
>>  arch/x64/entry.S| 102 
>> ++--- 
>>  include/osv/sched.hh|   4 +- 
>>  4 files changed, 205 insertions(+), 24 deletions(-) 
>>
>> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh 
>> index d1a039a..b7635d2 100644 
>> --- a/arch/x64/arch-switch.hh 
>> +++ b/arch/x64/arch-switch.hh 
>> @@ -12,6 +12,38 @@ 
>>  #include  
>>  #include  
>>   
>> +// 
>> +// The last 16 bytes of the syscall stack are reserved for 2 fields - 
>> +// caller stack pointer and tiny/large indicator. 
>> +// The reserved space has to be a multiple of 16 bytes to make it 16 
>> bytes 
>> +// aligned as Linux x64 ABI mandates. 
>> +#define SYSCALL_STACK_RESERVED_SPACE_SIZE (2 * 8) 
>> +// 
>> +// The tiny stack has to be large enough to allow for execution of 
>> +// thread::setup_large_syscall_stack() that allocates and sets up 
>> +// large syscall stack. It was measured that as of this writing 
>> +// setup_large_syscall_stack() needs a little over 600 bytes of stack 
>> +// to properly operate. This makes 1024 bytes to be an adequate size 
>> +// of tiny stack. 
>> +// All application threads pre-allocate tiny syscall stack so there 
>> +// is a tiny penalty with this solution. 
>> +#define TINY_SYSCALL_STACK_SIZE 1024 
>> +#define TINY_SYSCALL_STACK_DEPTH (TINY_SYSCALL_STACK_SIZE - 
>> SYSCALL_STACK_RESERVED_SPACE_SIZE) 
>> +// 
>> +// The large syscall stack is setup and switched to on first 
>> +// execution of SYSCALL instruction for given application thread. 
>> +#define LARGE_SYSCALL_STACK_SIZE (16 * PAGE_SIZE) 
>> +#define LARGE_SYSCALL_STACK_DEPTH (LARGE_SYSCALL_STACK_SIZE - 
>> SYSCALL_STACK_RESERVED_SPACE_SIZE) 
>> + 
>> +#define SET_SYSCALL_STACK_TYPE_INDICATOR(value) \ 
>> +*((long*)(_tcb->syscall_stack_top + 8)) = value; 
>> + 
>> +#define GET_SYSCALL_STACK_TYPE_INDICATOR() \ 
>> +*((long*)(_tcb->syscall_stack_top + 8)) 
>> + 
>> +#define TINY_SYSCALL_STACK_INDICATOR 0l 
>> +#define LARGE_SYSCALL_STACK_INDICATOR 1l 
>> + 
>>  extern "C" { 
>>  void thread_main(void); 
>>  void thread_main_c(sched::thread* t); 
>> @@ -146,6 +178,65 @@ void thread::setup_tcb() 
>>  _tcb = static_cast(p + tls.size + 
>> user_tls_size); 
>>  _tcb->self = _tcb; 
>>  _tcb->tls_base = p + user_tls_size; 
>> + 
>> +if (is_app()) { 
>> +// 
>> +// Allocate TINY syscall call stack 
>> +void* tiny_syscall_stack_begin = 
>> malloc(TINY_SYSCALL_STACK_SIZE); 
>> +assert(tiny_syscall_stack_begin); 
>> +// 
>> +// The top of the stack needs to be 16 bytes lower to make space 
>> for 
>> +// two fields - SYSCALL caller stack pointer and OSv syscall 
>> stack type indicator 
>> +_tcb->syscall_stack_top = tiny_syscall_stack_begin + 
>> TINY_SYSCALL_STACK_DEPTH; 
>> +SET_SYSCALL_STACK_TYPE_INDICATOR(TINY_SYSCALL_STACK_INDICATOR); 
>> +} 
>> +else { 
>> +_tcb->syscall_stack_top = 0; 
>> +} 
>> +} 
>> + 
>> +void thread::setup_large_syscall_stack() 
>> +{ 
>> +assert(is_app()); 
>> +

Re: [PATCH v3] Implement switch to separate SYSCALL call stack

2018-04-07 Thread Waldek Kozaczuk
This is the patch makes golang httpserver work.

On Thursday, March 15, 2018 at 10:56:26 PM UTC-4, Waldek Kozaczuk wrote:
>
> This patch implements separate syscall call stack needed 
> when runtimes like Golang use SYSCALL instruction to execute 
> system calls. More specifically in case of Golang tiny stacks 
> used by coroutines are not deep enough to execute system 
> calls on OSv which handles them as regular function calls. 
> In case of OS like Linux tiny Golang stacks are not a problem as 
> the system calls are executed on separate kernel stack. 
>
> More specifically each application thread pre-alllocates 
> "tiny" (1024-bytes deep) syscall call stack. When SYSCALL 
> instruction is called first time for given thread the call stack 
> is switched to the tiny syscall stack in order to setup the "large" 
> stack and switch to. Eventually execution of syscall continues 
> on large syscall stack. From this point, second and subsequent 
> executions of SYSCALL instruction are handled on large syscall stack. 
>
> Please note that because the stack is switched from original 
> caller stack to the syscall stack that is allocated lazily 
> the stack pointers "jumps" to higher addresses. From gdb 
> perspective this results in "Backtrace stopped: previous 
> frame inner to this frame (corrupt stack?)" message. 
> Gdb assumes that stack pointer should decrease in value as the stack 
> "grows" 
> which is not the case when stack is switched to syscall stack. 
>
> This patch is based on the original patch provided by @Hawxchen. 
>
> Fixes #808 
>
> --- 
>  arch/x64/arch-switch.hh | 108 
>  
>  arch/x64/arch-tls.hh|  15 +++ 
>  arch/x64/entry.S| 102 
> ++--- 
>  include/osv/sched.hh|   4 +- 
>  4 files changed, 205 insertions(+), 24 deletions(-) 
>
> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh 
> index d1a039a..b7635d2 100644 
> --- a/arch/x64/arch-switch.hh 
> +++ b/arch/x64/arch-switch.hh 
> @@ -12,6 +12,38 @@ 
>  #include  
>  #include  
>   
> +// 
> +// The last 16 bytes of the syscall stack are reserved for 2 fields - 
> +// caller stack pointer and tiny/large indicator. 
> +// The reserved space has to be a multiple of 16 bytes to make it 16 
> bytes 
> +// aligned as Linux x64 ABI mandates. 
> +#define SYSCALL_STACK_RESERVED_SPACE_SIZE (2 * 8) 
> +// 
> +// The tiny stack has to be large enough to allow for execution of 
> +// thread::setup_large_syscall_stack() that allocates and sets up 
> +// large syscall stack. It was measured that as of this writing 
> +// setup_large_syscall_stack() needs a little over 600 bytes of stack 
> +// to properly operate. This makes 1024 bytes to be an adequate size 
> +// of tiny stack. 
> +// All application threads pre-allocate tiny syscall stack so there 
> +// is a tiny penalty with this solution. 
> +#define TINY_SYSCALL_STACK_SIZE 1024 
> +#define TINY_SYSCALL_STACK_DEPTH (TINY_SYSCALL_STACK_SIZE - 
> SYSCALL_STACK_RESERVED_SPACE_SIZE) 
> +// 
> +// The large syscall stack is setup and switched to on first 
> +// execution of SYSCALL instruction for given application thread. 
> +#define LARGE_SYSCALL_STACK_SIZE (16 * PAGE_SIZE) 
> +#define LARGE_SYSCALL_STACK_DEPTH (LARGE_SYSCALL_STACK_SIZE - 
> SYSCALL_STACK_RESERVED_SPACE_SIZE) 
> + 
> +#define SET_SYSCALL_STACK_TYPE_INDICATOR(value) \ 
> +*((long*)(_tcb->syscall_stack_top + 8)) = value; 
> + 
> +#define GET_SYSCALL_STACK_TYPE_INDICATOR() \ 
> +*((long*)(_tcb->syscall_stack_top + 8)) 
> + 
> +#define TINY_SYSCALL_STACK_INDICATOR 0l 
> +#define LARGE_SYSCALL_STACK_INDICATOR 1l 
> + 
>  extern "C" { 
>  void thread_main(void); 
>  void thread_main_c(sched::thread* t); 
> @@ -146,6 +178,65 @@ void thread::setup_tcb() 
>  _tcb = static_cast(p + tls.size + 
> user_tls_size); 
>  _tcb->self = _tcb; 
>  _tcb->tls_base = p + user_tls_size; 
> + 
> +if (is_app()) { 
> +// 
> +// Allocate TINY syscall call stack 
> +void* tiny_syscall_stack_begin = malloc(TINY_SYSCALL_STACK_SIZE); 
> +assert(tiny_syscall_stack_begin); 
> +// 
> +// The top of the stack needs to be 16 bytes lower to make space 
> for 
> +// two fields - SYSCALL caller stack pointer and OSv syscall 
> stack type indicator 
> +_tcb->syscall_stack_top = tiny_syscall_stack_begin + 
> TINY_SYSCALL_STACK_DEPTH; 
> +SET_SYSCALL_STACK_TYPE_INDICATOR(TINY_SYSCALL_STACK_INDICATOR); 
> +} 
> +else { 
> +_tcb->syscall_stack_top = 0; 
> +} 
> +} 
> + 
> +void thread::setup_large_syscall_stack() 
> +{ 
> +assert(is_app()); 
> +assert(GET_SYSCALL_STACK_TYPE_INDICATOR() == 
> TINY_SYSCALL_STACK_INDICATOR); 
> +// 
> +// Allocate LARGE syscall stack 
> +void* large_syscall_stack_begin = malloc(LARGE_SYSCALL_STACK_SIZE); 
> +void* large_syscall_stack_top = large_syscall_stack_begin + 
> 

Re: [PATCH v3] Implement switch to separate SYSCALL call stack

2018-03-15 Thread Waldek Kozaczuk
I realized that V2 version of this patch would break aarch64 build. To fix
this I have moved the "extern"ed versions of setup_large_syscall_stack() and
free_tiny_syscall_stack() to arch/x64/arch-switch.hh.

That is the only difference with V2.

On Thu, Mar 15, 2018 at 10:56 PM, Waldemar Kozaczuk 
wrote:

> This patch implements separate syscall call stack needed
> when runtimes like Golang use SYSCALL instruction to execute
> system calls. More specifically in case of Golang tiny stacks
> used by coroutines are not deep enough to execute system
> calls on OSv which handles them as regular function calls.
> In case of OS like Linux tiny Golang stacks are not a problem as
> the system calls are executed on separate kernel stack.
>
> More specifically each application thread pre-alllocates
> "tiny" (1024-bytes deep) syscall call stack. When SYSCALL
> instruction is called first time for given thread the call stack
> is switched to the tiny syscall stack in order to setup the "large"
> stack and switch to. Eventually execution of syscall continues
> on large syscall stack. From this point, second and subsequent
> executions of SYSCALL instruction are handled on large syscall stack.
>
> Please note that because the stack is switched from original
> caller stack to the syscall stack that is allocated lazily
> the stack pointers "jumps" to higher addresses. From gdb
> perspective this results in "Backtrace stopped: previous
> frame inner to this frame (corrupt stack?)" message.
> Gdb assumes that stack pointer should decrease in value as the stack
> "grows"
> which is not the case when stack is switched to syscall stack.
>
> This patch is based on the original patch provided by @Hawxchen.
>
> Fixes #808
>
> ---
>  arch/x64/arch-switch.hh | 108 ++
> ++
>  arch/x64/arch-tls.hh|  15 +++
>  arch/x64/entry.S| 102 ++
> ---
>  include/osv/sched.hh|   4 +-
>  4 files changed, 205 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
> index d1a039a..b7635d2 100644
> --- a/arch/x64/arch-switch.hh
> +++ b/arch/x64/arch-switch.hh
> @@ -12,6 +12,38 @@
>  #include 
>  #include 
>
> +//
> +// The last 16 bytes of the syscall stack are reserved for 2 fields -
> +// caller stack pointer and tiny/large indicator.
> +// The reserved space has to be a multiple of 16 bytes to make it 16 bytes
> +// aligned as Linux x64 ABI mandates.
> +#define SYSCALL_STACK_RESERVED_SPACE_SIZE (2 * 8)
> +//
> +// The tiny stack has to be large enough to allow for execution of
> +// thread::setup_large_syscall_stack() that allocates and sets up
> +// large syscall stack. It was measured that as of this writing
> +// setup_large_syscall_stack() needs a little over 600 bytes of stack
> +// to properly operate. This makes 1024 bytes to be an adequate size
> +// of tiny stack.
> +// All application threads pre-allocate tiny syscall stack so there
> +// is a tiny penalty with this solution.
> +#define TINY_SYSCALL_STACK_SIZE 1024
> +#define TINY_SYSCALL_STACK_DEPTH (TINY_SYSCALL_STACK_SIZE -
> SYSCALL_STACK_RESERVED_SPACE_SIZE)
> +//
> +// The large syscall stack is setup and switched to on first
> +// execution of SYSCALL instruction for given application thread.
> +#define LARGE_SYSCALL_STACK_SIZE (16 * PAGE_SIZE)
> +#define LARGE_SYSCALL_STACK_DEPTH (LARGE_SYSCALL_STACK_SIZE -
> SYSCALL_STACK_RESERVED_SPACE_SIZE)
> +
> +#define SET_SYSCALL_STACK_TYPE_INDICATOR(value) \
> +*((long*)(_tcb->syscall_stack_top + 8)) = value;
> +
> +#define GET_SYSCALL_STACK_TYPE_INDICATOR() \
> +*((long*)(_tcb->syscall_stack_top + 8))
> +
> +#define TINY_SYSCALL_STACK_INDICATOR 0l
> +#define LARGE_SYSCALL_STACK_INDICATOR 1l
> +
>  extern "C" {
>  void thread_main(void);
>  void thread_main_c(sched::thread* t);
> @@ -146,6 +178,65 @@ void thread::setup_tcb()
>  _tcb = static_cast(p + tls.size +
> user_tls_size);
>  _tcb->self = _tcb;
>  _tcb->tls_base = p + user_tls_size;
> +
> +if (is_app()) {
> +//
> +// Allocate TINY syscall call stack
> +void* tiny_syscall_stack_begin = malloc(TINY_SYSCALL_STACK_SIZE);
> +assert(tiny_syscall_stack_begin);
> +//
> +// The top of the stack needs to be 16 bytes lower to make space
> for
> +// two fields - SYSCALL caller stack pointer and OSv syscall
> stack type indicator
> +_tcb->syscall_stack_top = tiny_syscall_stack_begin +
> TINY_SYSCALL_STACK_DEPTH;
> +SET_SYSCALL_STACK_TYPE_INDICATOR(TINY_SYSCALL_STACK_INDICATOR);
> +}
> +else {
> +_tcb->syscall_stack_top = 0;
> +}
> +}
> +
> +void thread::setup_large_syscall_stack()
> +{
> +assert(is_app());
> +assert(GET_SYSCALL_STACK_TYPE_INDICATOR() ==
> TINY_SYSCALL_STACK_INDICATOR);
> +//
> +// Allocate LARGE syscall stack
> +void* large_syscall_stack_begin =